
Coursera: Getting and Cleaning Data

with Jeff Leek
Before you can work with data, you have to get some. This course covers the basic ways data can be obtained: from the web, from APIs, from databases, and from colleagues in various formats. It also covers the basics of data cleaning and how to make data “tidy”; tidy data dramatically speed up downstream data analysis tasks. The course also covers the components of a complete data set, including raw data, processing instructions, codebooks, and processed data. Together these are the basics needed for collecting, cleaning, and sharing data.

Syllabus

Week 1
In this first week of the course, we look at finding data and reading different file types.

Week 2
Welcome to Week 2 of Getting and Cleaning Data! The primary goal is to introduce you to the most common data storage systems and the appropriate tools to extract data from the web or from databases like MySQL.

Week 3
Welcome to Week 3 of Getting and Cleaning Data! This week the lectures will focus on organizing, merging and managing the data you have collected using the lectures from Weeks 1 and 2.

Week 4
Welcome to Week 4 of Getting and Cleaning Data! This week we finish up with lectures on text and date manipulation in R. In this final week we will also focus on peer grading of Course Projects.

55 student reviews
Cost: Free Online Course (Audit)
Provider: Coursera
Language: English
Certificates: Paid Certificate Available
Hours: 4-9 hours a week
Calendar: 4 weeks long

55 reviews for Coursera's Getting and Cleaning Data

28 out of 32 people found the following review useful
3 years ago
Life is Study completed this course.
Getting and Cleaning Data is the third course in the first wave of Johns Hopkins's data science specialization track on Coursera. It is recommended that you take this course after the Data Scientist's Toolbox and R Programming courses.

The title of the course pretty well sums up the content: the entire class is about loading data into R and cleaning it up so that it can be used for data analysis. You'll learn how to load various data formats into R, such as JSON, XML, CSV, and Excel files, and get data from other sources like MySQL and web APIs. The course also discusses subsetting data, adding variables, merging data, regular expressions and working with dates.

This course is a good summary of many of the things that are useful to know when trying to access and prepare data for analysis. Similar to R programming, it suffers from overuse of static slides with voice-overs, a lack of instructor face time and a lack of interactive content or in-lecture quizzes to help you learn and retain as you go along. You'll be introduced to many R packages and syntax that you probably won't remember after a week or two, but you'll be exposed to many common data formats so that you can refer back to the course materials or other web resources to deal with them in the future.
18 out of 21 people found the following review useful
2 years ago
Anonymous is taking this course right now.
I'm a fresh beginner to R and my only experience with it is from the previous 2 courses in this specialization.

The lectures aren't so bad... they're a little bit boring and not engaging since they rarely are more than just a voiceover and slides. If that's important to you, don't take this class. However, I do think the instructors explain the lecture topics well and there is some value in their short walkthroughs.

Unfortunately... this only applies to the lecture topics... which are often only a small part of the quizzes and programming assignments. The previous course in the specialization, R Programming, was MUCH worse in this regard. That said, if your R background is minimal and you utilize outside sources (Stack Exchange, forums, etc.), you WILL learn a LOT. But at times you may feel a bit that the course itself didn't play a large role in the learning process, aside from giving you assignments.

Overall, the material is quite difficult for someone with no background (aside from the other courses in the specialization). I have to give a lot of props to the people in the discussion forums and the CTAs: these people really help close the gap and it's because of them that I keep pushing through!
29 out of 31 people found the following review useful
2 years ago
Stephen B partially completed this course.
Class information is very sparse. There's a huge gap between the (minimal) content provided in the lectures and the class project required for completion of the course. This is the worst constructed college course and worst MOOC I have ever encountered. I've completed 12 MOOCs, 2 bachelor's degrees, and several graduate courses at Stanford, so that is a distinction earned by Johns Hopkins U from among a very wide field. A complete overhaul of this course and series is desperately needed.
21 out of 24 people found the following review useful
2 years ago
Anonymous is taking this course right now.
Dropping this course because there is such a disconnect between what is taught and what is expected to complete the project and quizzes. I found myself using external sources to learn all of the material necessary. Many of the questions are vague, leaving you spending hours trying to complete tasks only to realize that the objective is different and just not communicated effectively. There is no coherent order to how they deliver the material, teaching basic concepts in week 3 which should have been covered in week 1 or the prior course in R programming. So, I will just use others' tutorials to learn data science in R. Ridiculous that I wasted so much time on this!
1 out of 1 people found the following review useful
a year ago
Brandt Pence completed this course, spending 3 hours a week on it and found the course difficulty to be easy.
This is the third course in the Data Science specialization. The course is all about how to read data of different formats into R and how to create tidy datasets (one variable per column, one observation per row, one observational unit type per table). There are brief introductions to reading datasets from online resources such as XML files, website APIs, and MySQL, and the quizzes for weeks 1 and 2 require you to work with these tools. Week 3 introduces subsetting and reshaping data and tools like dplyr, and week 4 introduces working with text strings and regular expressions.

I found this course to be quite a bit easier than the prerequisite R Programming. The quoted time commitment of 4-9 hours/week seems pretty reasonable, and I was probably at the lower end of that even though I front-loaded everything and finished by the middle of the second week of the course. The course project was fairly straightforward but also open-ended, and there was some concern on the discussion boards about how certain aspects of the project (e.g. descriptive variable names) would be evaluated.

All of the quizzes required a fair bit of programming, but nothing was too difficult. There were some technical hurdles in several of the quizzes that caused people problems. For example, R cannot read XML files over an https connection, and that caused some problems for quiz 1, although several solutions were quickly posted. Quiz 2 contained probably the most difficult programming task in the course, which required reading information from the instructor's GitHub account using the GitHub API. With some searching I found solutions online, but if you're having trouble and can't find good answers elsewhere, the forum will eventually help once a sufficient number of people get around to taking the quiz.

Overall, four stars. The course was fairly straightforward, and the information here may or may not be valuable to you depending on the type of data analyses you plan to perform and where/how your data are stored. I had previous experience with dplyr from the first course in the edX PH525 series, so the most valuable portion of the course for me was the section on working with regular expressions.
25 out of 30 people found the following review useful
3 years ago
Anonymous completed this course.
Extremely frustrating class. I spent tons of time wondering what it is that I am actually supposed to do...

I am considering dropping the specialization.
20 out of 24 people found the following review useful
3 years ago
Anonymous completed this course.
Course is lacking any kind of logic or structure. It's simply methods/functions thrown one after another. Complete lack of perspective.
12 out of 15 people found the following review useful
2 years ago
Anonymous is taking this course right now.
A rather poor and confusing course. The lectures are not so great. I'm rather disappointed with it. Normally these courses are rather good, but not this one.
9 out of 11 people found the following review useful
2 years ago
Syed Aslam completed this course, spending 3 hours a week on it and found the course difficulty to be medium.
I didn't learn much from the course lectures or materials; rather, I learned most from Stack Overflow. Really a big disappointment.
2 out of 2 people found the following review useful
a year ago
Anonymous is taking this course right now.
This is the third course in the series, and it's taken me this long to realize that everything I learn comes from external sources and not the course itself. If you do this, you'll learn something. If you don't, you'll lose your mind and waste a ton of time in the process.

I started out by watching the videos, taking copious notes and then realizing that I didn't have the information I needed to complete the assignments. I was very stressed about it until my friend -- who uses R programming regularly for work -- shrugged and said, "That's how it works in the real world. You search Stack Overflow or GitHub to find others who have already solved your problem. Don't go reinventing the wheel." This took a huge amount of pressure off.

I find that I look at others' solutions, work through them line by line to figure out the how and why of it, test to see if they're correct (about 40% of the time they don't appear to be), and learn by doing. I listen to the course videos in the background, and try to tie what they're talking about to actual code that I'm seeing, which makes a big difference.
6 out of 9 people found the following review useful
2 years ago
Ramesh Natarajan completed this course, spending 20 hours a week on it and found the course difficulty to be very hard.
This course just provides an outline of the subject. It's up to you to figure out how to get the assignment done... Google and Stack Overflow are your instructors... Really! To make things worse, the course assignment instructions are very ambiguous, and you spend tons of time trying to understand the problem rather than solving it. If that's the intent of this course, they have succeeded, but when you have a course deadline (and a full-time job, as many of you do), it's extremely frustrating.
1 out of 1 people found the following review useful
10 months ago
Anonymous completed this course.
There is a complete disconnect between what is taught and what is expected in the project and tests. The course is pretty bad. I was considering doing the specialization in Data Science and this course is making me re-think this goal.

I understand that you need to be good at 'hacking' to be a good data scientist, but if that's the case then what's the point of paying money to have to Google everything?
3 out of 4 people found the following review useful
a year ago
Hongmei Li partially completed this course.
There is a significant gap between the video lecture and the assignments/quizzes.

Very horrible... I paid for certification in this course, and I can't retake it for free.
3 out of 7 people found the following review useful
3 years ago
Scott orr partially completed this course.
Getting and Cleaning Data promises to teach students how to extract data from common data storage formats (including databases, specifically SQL, XML, JSON, and HDF5), and from the web using APIs and web scraping. The syllabus also includes tips on using R to clean and recode data, and, in the last lecture, a long list of links to sources of data. It's also worth noting that the style of the video lectures is a bit different from those of other classes I've taken: there's never any video of the instructor, just the instructor's voice over the lecture notes.
a year ago
Jason Michael Cherry completed this course, spending 4 hours a week on it and found the course difficulty to be hard.
This course teaches a lot of extremely important skills in data science. No matter what you end up doing, dealing with data quality is going to be a part of it. This is a challenging class, and rightly so, as the work is tedious, but oh-so-important! The lectures do get a bit bland, but are informative.
a year ago
Daniel Rosquete completed this course, spending 6 hours a week on it and found the course difficulty to be medium.
Ok, this course is really helpful!

Nothing in it is wasted; this course is a must for a data scientist!
1 out of 3 people found the following review useful
2 years ago
Kuhnrl30 completed this course.
1 out of 3 people found the following review useful
2 years ago
Jevgeni Martjushev completed this course.
0 out of 2 people found the following review useful
2 years ago
Anonymous is taking this course right now.
0 out of 1 people found the following review useful
2 years ago
Lars Killingdalen completed this course.
0 out of 1 people found the following review useful
2 years ago
Caio Taniguchi dropped this course.
0 out of 1 people found the following review useful
2 years ago
Eytan completed this course.
0 out of 1 people found the following review useful
2 years ago
Mkchaitanya is taking this course right now.
3 weeks ago
Adam Hjerpe completed this course.
a year ago
Jan Tatham completed this course.
a year ago
Shaun Moate completed this course.
a year ago
Radomir Nowacki completed this course.
8 months ago
Davide Madrisan completed this course.
a year ago
Colin Khein completed this course.
3 months ago
Sonsoles López completed this course.
a year ago
Rajanand completed this course.
3 weeks ago
Atila Romero completed this course.
a year ago
Anal Khan partially completed this course.
a year ago
Jinwook completed this course.
12 months ago
Janet Wesner audited this course.
a year ago
Mario completed this course.
4 months ago
Hong Xu completed this course.
12 months ago
Paolo Midali completed this course.
7 months ago
Gary Baggett completed this course.
a year ago
Mark Henry Butler completed this course.
12 months ago
Nicole Fox completed this course.
a year ago
Sebastien Pujadas completed this course.
2 out of 6 people found the following review useful
3 years ago
Ricardo Vladimiro completed this course, spending 2 hours a week on it and found the course difficulty to be easy.
0 out of 6 people found the following review useful
2 years ago
Rahul Agarwal completed this course and found the course difficulty to be very easy.
1 out of 5 people found the following review useful
2 years ago
Huy completed this course, spending 6 hours a week on it and found the course difficulty to be easy.
1 out of 4 people found the following review useful
2 years ago
Rafael Prados completed this course.
1 out of 4 people found the following review useful
2 years ago
Bob Fridley completed this course.
1 out of 4 people found the following review useful
2 years ago
Bill Seliger completed this course.
1 out of 4 people found the following review useful
2 years ago
Sérgio Den Boer completed this course.