Exploratory Data Analysis

with Roger Peng
This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.

Syllabus

Week 1
This week covers the basics of analytic graphics and the base plotting system in R. We've also included some background material to help you install R if you haven't done so already.

Week 2
Welcome to Week 2 of Exploratory Data Analysis. This week covers some of the more advanced graphing systems available in R: the Lattice system and the ggplot2 system. While the base graphics system provides many important tools for visualizing data, it was part of the original R system and lacks many features that may be desirable in a plotting system, particularly when visualizing high-dimensional data. The Lattice and ggplot2 systems also simplify the layout of plots, making it a much less tedious process.

Week 3
Welcome to Week 3 of Exploratory Data Analysis. This week covers some of the workhorse statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques that allow you to make graphical displays of very high-dimensional data (many, many variables). We also cover novel ways to specify colors in R so that you can use color as an important and useful dimension when making data graphics. All of this material is covered in chapters 9-12 of my book Exploratory Data Analysis with R.
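
For readers unfamiliar with clustering, here is a rough sketch of the idea behind k-means, one of the clustering methods reviewers mention from this week. The course itself works in R; this example uses plain Python purely as an illustration and is not taken from the course materials. The function name `kmeans`, the sample points, and the starting centroids are all made up for the example.

```python
# Minimal k-means sketch (pure Python, standard library only).
# The data points and starting centroids are hypothetical.
from math import dist


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning each point to its nearest centroid and
    moving each centroid to the mean of the points assigned to it."""
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for each point.
        labels = [
            min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points
        ]
        # Update step: recompute each centroid as the mean of its cluster.
        new_centroids = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # An empty cluster keeps its previous centroid.
                new_centroids.append(centroids[k])
        if new_centroids == centroids:
            break  # centroids stopped moving, so the assignment is stable
        centroids = new_centroids
    return labels, centroids


points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(labels)     # cluster index for each point
print(centroids)  # final cluster centers
```

The result depends on the starting centroids, which is one reason exploratory work usually tries several starts and inspects the clusters graphically rather than trusting a single run.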

Week 4
This week, we'll look at two case studies in exploratory data analysis. The first involves the use of cluster analysis techniques, and the second is a more involved analysis of some air pollution data. How one goes about doing EDA is often personal, but I'm providing these videos to give you a sense of how you might proceed with a specific type of dataset.

37 student reviews
Cost: Free Online Course
Provider: Coursera
Language: English
Certificates: Paid Certificate Available
Hours: 4-9 hours a week
Calendar: 4 weeks long

37 reviews
14 out of 15 people found the following review useful
3 years ago
Life is Study completed this course.
The first 2 weeks of the course provide a thorough overview of plotting in R using the base graphical package, the lattice package and the ggplot2 package. Week 3 takes a sudden detour into data clustering and the fairly advanced topics of principal components analysis and singular value decomposition, only to jump back to plotting with a section on color. The clustering section seems a little out of place since there is no introduction explaining the purpose of clustering. What's worse, the SVD and PCA sections require a fairly high level of linear algebra knowledge to understand, which is not a prerequisite for this course. I suspect that section will leave many students scratching their heads. Week 4 consists of 2 case studies where the professor shows you how to perform an exploratory analysis on a couple of different data sets.
8 out of 9 people found the following review useful
2 years ago
Prose Simian completed this course, spending 4 hours a week on it and found the course difficulty to be hard.
A painful, dull offline course on plotting & clustering in R slapped online with minimal conversion like the rest of JHU's execrable Data Science specialisation*. Hard only due to the appalling pedagogy. (Have these guys heard of labs? Apparently not...)

*Which, tragically, is apparently one of Coursera's top moneyspinners. Sigh.
6 out of 6 people found the following review useful
2 years ago
Anonymous completed this course.
Another boring course you'll have to slog through. It's half learning a few things about making plots, half topics that have been better covered elsewhere (k-means). You can actually graduate from those courses with horrible programming. As usual, you'll learn more by surfing Stack Overflow than from the videos. I've done half the assignments before looking at the vids.
1 out of 1 people found the following review useful
11 months ago
Brandt Pence completed this course, spending 3 hours a week on it and found the course difficulty to be easy.
This is the fourth course in the Data Science specialization. The course covers exploratory analyses in R, primarily making figures using the three most common packages: base R, lattice, and ggplot2. The instructors also manage to throw hierarchical clustering, k-means, and PCA into the 3rd week of the course, which seems a little odd as these topics might be better left for the machine learning course. The course ends with a peer-graded course project, similar to other courses in the specialization.

I found this course to be fairly useful, on par with the preceding courses but perhaps a bit worse than Getting and Cleaning Data. As with the previous courses, I front-loaded my work and finished fairly early, in part because I was taking Reproducible Research and Bioconductor for Genomic Data Science concurrently. I found the quizzes and project to be relatively straightforward, although again the peer grading is somewhat less-than-useful.

Overall, three stars. A reasonable introduction to graphing in R, with some basic clustering and dimension reduction strategies tacked on to the end. Experience with R at the level of R Programming is almost certainly required, as stated in the course prerequisites.
4 out of 4 people found the following review useful
2 years ago
Anonymous dropped this course.
A boring and pointless money-generating vehicle from JH. And yes - reviews should be at least 20 words - I wonder if I find a way around that.
1 out of 1 people found the following review useful
a year ago
Jason Michael Cherry completed this course, spending 4 hours a week on it and found the course difficulty to be hard.
This is a good starting point for any data analysis work, and the course covers the basics, and a bit more, rather well. It's a bit light on what you should do with the information you gather from your data exploration though.
a year ago
Markus Stenemo completed this course.
Quite good, quite basic for those who want to review their knowledge. Should be good for those with no previous experience.
0 out of 1 people found the following review useful
2 years ago
Rafael Prados completed this course.
0 out of 1 people found the following review useful
2 years ago
Bob Fridley completed this course.
0 out of 1 people found the following review useful
2 years ago
Bill Seliger completed this course.
0 out of 1 people found the following review useful
2 years ago
Sérgio Den Boer is taking this course right now.
0 out of 1 people found the following review useful
2 years ago
Hamid Aalla is taking this course right now.
0 out of 1 people found the following review useful
2 years ago
Jevgeni Martjushev completed this course.
2 years ago
Lars Killingdalen completed this course.
a year ago
Jan Tatham completed this course.
a year ago
Karri S completed this course.
2 years ago
Kuhnrl30 completed this course.
a year ago
Radomir Nowacki completed this course.
7 months ago
Davide Madrisan completed this course.
a year ago
Colin Khein completed this course.
a year ago
Shaun Moate completed this course.
a year ago
Daniel Rosquete partially completed this course.
a year ago
Jinwook completed this course.
10 months ago
William Hunt completed this course.
a year ago
Mario completed this course.
3 months ago
Hong Xu completed this course.
11 months ago
Paolo Midali completed this course.
6 months ago
Gary Baggett completed this course.
9 months ago
Avinish completed this course.
9 months ago
Kashyap Uppuluri completed this course.
5 months ago
Zhe Li completed this course.
11 months ago
Nicole Fox completed this course.
a year ago
Sebastien Pujadas completed this course.
a year ago
Mark Henry Butler completed this course.
1 out of 4 people found the following review useful
2 years ago
Huy completed this course, spending 5 hours a week on it and found the course difficulty to be easy.