This course covers the essential exploratory techniques for summarizing data. These techniques are typically applied before formal modeling commences and can help inform the development of more complex statistical models. Exploratory techniques are also important for eliminating or sharpening potential hypotheses about the world that can be addressed by the data. We will cover in detail the plotting systems in R as well as some of the basic principles of constructing data graphics. We will also cover some of the common multivariate statistical techniques used to visualize high-dimensional data.
Week 1 This week covers the basics of analytic graphics and the base plotting system in R. We've also included some background material to help you install R if you haven't done so already.
Week 2 Welcome to Week 2 of Exploratory Data Analysis. This week covers some of the more advanced graphing systems available in R: the Lattice system and the ggplot2 system. While the base graphics system provides many important tools for visualizing data, it was part of the original R system and lacks many features that may be desirable in a plotting system, particularly when visualizing high-dimensional data. The Lattice and ggplot2 systems also simplify the layout of plots, making it a much less tedious process.
Week 3 Welcome to Week 3 of Exploratory Data Analysis. This week covers some of the workhorse statistical methods for exploratory analysis. These methods include clustering and dimension reduction techniques that allow you to make graphical displays of very high-dimensional data (many, many variables). We also cover novel ways to specify colors in R so that you can use color as an important and useful dimension when making data graphics. All of this material is covered in chapters 9-12 of my book Exploratory Data Analysis with R.
Week 4 This week, we'll look at two case studies in exploratory data analysis. The first involves the use of cluster analysis techniques, and the second is a more involved analysis of some air pollution data. How one goes about doing EDA is often personal, but I'm providing these videos to give you a sense of how you might proceed with a specific type of dataset.
The first 2 weeks of the course provide a thorough overview of plotting in R using the base graphics package, the lattice package and the ggplot2 package. Week 3 takes a sudden detour into data clustering and the fairly advanced topics of principal components analysis and singular value decomposition, only to jump back to plotting with a section on color. The clustering section seems a little out of place since there is no introduction explaining the purpose of clustering. What's worse, the SVD and PCA sections require a fairly high level of linear algebra knowledge to understand, which is not a prerequisite for this course. I suspect that section will leave many students scratching their heads. Week 4 consists of 2 case studies where the professor shows you how to perform an exploratory analysis on a couple of different data sets.
Prose Simian completed this course, spending 4 hours a week on it and found the course difficulty to be hard.
A painful, dull offline course on plotting & clustering in R slapped online with minimal conversion like the rest of JHU's execrable Data Science specialisation*. Hard only due to the appalling pedagogy. (Have these guys heard of labs? Apparently not...)
*Which, tragically, is apparently one of Coursera's top moneyspinners. Sigh.
Another boring course you'll have to slog through. It's half learning a few things about making plots, half topics that have been better covered elsewhere (k-means). You can actually pass these courses with horrible programming. As usual, you'll learn more by surfing Stack Overflow than from the videos. I did half the assignments before looking at the vids.
Brandt Pence completed this course, spending 3 hours a week on it and found the course difficulty to be easy.
This is the fourth course in the Data Science specialization. The course covers exploratory analyses in R, primarily making figures using the three most common plotting systems: base R, lattice, and ggplot2. The instructors also manage to throw hierarchical clustering, k-means, and PCA into the 3rd week of the course, which seems a little odd as these topics might be better left for the machine learning course. The course ends with a peer-graded course project, similar to other courses in the specialization.
I found this course to be fairly useful, on par with the preceding courses but perhaps a bit worse than Getting and Cleaning Data. As with the previous courses, I front-loaded my work and finished fairly early, in part because I was taking Reproducible Research and Bioconductor for Genomic Data Science concurrently. I found the quizzes and project to be relatively straightforward, although again the peer grading is somewhat less-than-useful.
Overall, three stars. A reasonable introduction to graphing in R, with some basic clustering and dimension reduction strategies tacked on to the end. Experience with R at the level of R Programming is almost certainly required, as stated in the course prerequisites.
Jason Michael Cherry completed this course, spending 4 hours a week on it and found the course difficulty to be hard.
This is a good starting point for any data analysis work, and the course covers the basics, and a bit more, rather well. It's a bit light on what you should do with the information you gather from your data exploration though.