Brandt Pence

The Bioconductor course is technically sixth in the Genomic Data Science specialization, although I skipped directly to this after taking the third course covering Python. The course covers a bit of basic R programming (although experience with R at the level of R Programming is almost a necessity unless you have previous programming experience and are able to pick up the language very quickly). The course then dives right into many of the most common Bioconductor packages and data structures for genomic data analysis, including GenomicRanges, AnnotationHub, Biostrings, ExpressionSets, and others. The final week includes an overview of getting your data into Bioconductor, as well as some advanced packages.

I found this to be a very good course. I had previously taken the EdX Introduction to Bioconductor course, so I had some experience with most of these constructs, but I still found this course to be an almost perfect balance between challenging and achievable. There were some technical issues along the way, including compatibility issues due to the new release (at the time) of Windows 10, but Dr. Hansen was very involved on the discussion board, and most of these were resolved relatively quickly. Bioconductor packages also undergo frequent updates, and this often can result in the same code giving different answers across versions, so it is critical to use the same Bioconductor version that was used in designing the quizzes, even though it will likely be far out-of-date by the time you take the course.

Overall, four stars. I enjoyed this course more than any others in the specialization so far, and for someone like me coming in with some experience with R, it had just the right amount of challenge. Dr. Hansen's presence in the course forums also enhanced the course, especially since there were no community TAs at the time to answer questions, despite it being apparently the fourth iteration of the course when I took it.

Brandt Pence **completed** this course, spending **4 hours** a week on it and found the course difficulty to be **medium**.

This is the first course in the new (at the time of this writing) Genomic Data Science specialization, offered by Johns Hopkins through Coursera. This course is similar to the Data Scientist’s Toolbox course which leads off the Data Science specialization in that it is scheduled for 4 weeks but really only takes a few hours to complete.

Indeed, this course is even less useful than the Data Scientist’s Toolbox, which at least had some practical aspects (installing and learning Git, for example). Here, the meat of the course is video lectures with titles such as “Just Enough Cell Biology”, “Why Care About Statistics?”, and “What is Computational Biology Software?”. The level of detail in these will certainly not impress anyone with training in biology or computer science, and most individuals (or at least those who post in the forums) come from one of those backgrounds, with a majority from the latter.

The quizzes (there are four) are very easy. Without referring to my notes, I needed a second attempt to score 10/10 for only one of them, and that was due to a silly error on my part. The final project asks you to read a Science paper (published, not coincidentally, by the course directors) and to take a quiz covering information from that paper. The paper is dense but not particularly difficult if you have a biology background (as I do), and you are given 3 attempts at the quiz here as well, which should allow those with weak biology backgrounds a reasonable chance to score highly.

Overall, one star. This course was essentially useless and should not cost $49. I expect that, with sufficient complaint volume, the course will drop to $29 similar to the intro course in the Data Science sequence. I finished this course Tuesday morning after starting it Monday night and spending most of that night working on lectures and quizzes for R Programming, and I signed up for and started the next specialization course on the Galaxy platform immediately.

Brandt Pence **completed** this course, spending **1 hour** a week on it and found the course difficulty to be **very easy**.

This is the first course in the Data Science specialization, offered by Johns Hopkins through Coursera. The official schedule lists the time commitment as 4 weeks of study with 1-4 hours/week of work. In reality, this course can be easily completed in a few hours by anyone with a reasonable background in computers. Most of the course is dedicated to installing programs like RStudio, which is less-than-helpful for most users. However, for those unfamiliar with Git and GitHub, the introductory lectures on those topics are reasonably valuable, although there are better introductions available elsewhere.

Overall, this course is probably not even worth the $29 currently charged, but it is a required part of the specialization and thus necessary for those planning to complete it. For others, it may be a better use of time to simply download Git, R, and RStudio and to set up a GitHub account, then to move directly to R Programming. For those on the specialization track, this course can be easily completed in one night, and there should be no problem starting R Programming concurrently with this course.

The final project asks you to submit a screenshot to prove you can install and open RStudio, and to do a few simple tasks in GitHub (create a repo, create and push a markdown file to that repo, and fork someone else’s repo). The assignment is peer-graded, which historically has been a cause for concern in many of the Data Science track courses, but the questions are yes/no and straightforward enough that they leave little to a fellow student’s (sometimes rather skewed) interpretation. For example, a forum post early in the course asked if it would be preferable to reserve perfect scores on the peer-based assignment for students who had “exceeded the requirements” for the final project. Fortunately, this idea was shot down by subsequent posters (and it’s not realistically possible for this assignment anyway), but it displays a mentality that may cause good students to lose points unfairly in later classes. Reviews from other former students available online suggest that this is an ongoing problem.

Overall, two stars, mostly for the Git and GitHub material to which I had not been previously exposed.

Brandt Pence **completed** this course, spending **1 hour** a week on it and found the course difficulty to be **very easy**.

This is the second course in the Data Science specialization, and it is the foundation course for the rest of the classes. Many reviewers online will disagree with the relatively high score I have given this course, and with good reason. It is a very difficult course for those without a proper programming background. My fuzzy memory of for- and while-loops from my intro to programming course a decade ago was not enough to leave me adequately prepared for this course. Thus, I spent a lot of time working on the programming assignments to get them to work. The video lectures and quizzes are at a decidedly beginner level, but the assignments are at what would probably be described as an intermediate level, and the instructors do not provide sufficient information to bridge the gap.

This is something that not all of the students in the course are willing to accept. There are a number of students who, by virtue of their backgrounds, clearly find this class to be very easy, and some are gracious enough to provide help to struggling students. It is very easy to google to find the answers to the programming assignments since so many people have completed this course, a strategy which has dubious educational value. Much more critically, previous students have provided bridge problems that attempt to help struggling students apply what they learned in lectures to complete the programming assignments. Examples are linked here for assignment 1, assignment 2, and assignment 3.

Assignments 1 and 3 are graded by submission script, while assignment 2 is peer-graded. The latter is rather simpler than the other two assignments and mostly consists of taking a set of functions and substituting items in order to get those functions to output something else. The other two assignments generally deal with manipulating data and calculating simple statistics (e.g. mean). One excellent resource for students is the Swirl assignments. Swirl is an interactive R program originally conceived by one of Brian Caffo's graduate students (Dr. Caffo teaches Statistical Inference, Regression Models, and Developing Data Products in the Data Science Specialization). Students can get up to 5 extra credit points (1 point per module) for completing Swirl modules in the course, but all the modules are valuable resources for learning some rudimentary syntax in R.

Overall, four stars. This class is difficult, and the instructors do the bare minimum to allow students to successfully complete the assignments, but success in this course is achievable with hard work, even for students without a strong programming background.

Brandt Pence **completed** this course, spending **4 hours** a week on it and found the course difficulty to be **medium**.

This is the third course in the Data Science specialization. The course is all about how to read data of different formats into R and how to create tidy datasets (one variable per column, one observation per row, one observational unit type per table). There are brief introductions to reading datasets from online resources such as XML files, website APIs, and MySQL, and the quizzes for weeks 1 and 2 require you to work with these tools. Week 3 introduces subsetting and reshaping data and tools like dplyr, and week 4 introduces working with text strings and regular expressions.
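
To make the tidy-data rule concrete, here is a minimal sketch of reshaping a "wide" table into tidy form. It is written in plain Python rather than R and is not taken from the course; the table, the column names, and the `to_tidy` helper are all made up for illustration.

```python
# Hypothetical wide table: one row per subject, one column per visit.
# All names and values here are invented for illustration.
wide_rows = [
    {"subject": "s1", "visit1": 5.1, "visit2": 5.6},
    {"subject": "s2", "visit1": 4.8, "visit2": 5.0},
]


def to_tidy(rows, id_key, variable_name, value_name):
    """Reshape wide records so that each output row is one observation."""
    tidy = []
    for row in rows:
        for key, value in row.items():
            if key == id_key:
                continue  # the identifier is repeated on every output row
            tidy.append({id_key: row[id_key], variable_name: key, value_name: value})
    return tidy


tidy_rows = to_tidy(wide_rows, "subject", "visit", "score")
for record in tidy_rows:
    print(record)
```

After reshaping, each variable (subject, visit, score) has its own column and each row holds a single measurement, which is the layout the course asks for.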

I found this course to be quite a bit easier than the prerequisite R Programming. The quoted time commitment of 4-9 hours/week seems pretty reasonable, and I was probably at the lower end of that even though I front-loaded everything and finished by the middle of the second week of the course. The course project was fairly straightforward but also open-ended, and there was some concern on the discussion boards about how certain aspects of the project (e.g. descriptive variable names) would be evaluated.

All of the quizzes required a fair bit of programming, but nothing was too difficult. There were some technical hurdles in several of the quizzes that caused people problems. For example, R's XML parsing functions cannot read files directly over an https connection, and that caused some problems for quiz 1, although several solutions were quickly posted. Quiz 2 contained probably the most difficult programming task in the course, which required reading information from the instructor's GitHub account using the GitHub API. With some searching I found solutions online, but if you're having trouble and can't find good answers elsewhere, the forum will eventually help once a sufficient number of people get around to taking the quiz.

Overall, four stars. The course was fairly straightforward, and the information here may or may not be valuable to you depending on the type of data analyses you plan to perform and where/how your data are stored. I had previous experience with dplyr from the first course in the EdX PH525 series, so the most valuable portion of the course for me was the section on working with regular expressions.

Brandt Pence **completed** this course, spending **3 hours** a week on it and found the course difficulty to be **easy**.

This is the fourth course in the Data Science specialization. The course covers exploratory analyses in R, primarily making figures using the three most common plotting systems: base R, lattice, and ggplot2. The instructors also manage to throw hierarchical clustering, k-means, and PCA into the 3rd week of the course, which seems a little odd as these topics might be better left for the machine learning course. The course ends with a peer-graded course project, similar to other courses in the specialization.

I found this course to be fairly useful, on par with the preceding courses but perhaps a bit worse than Getting and Cleaning Data. As with the previous courses, I front-loaded my work and finished fairly early, in part because I was taking Reproducible Research and Bioconductor for Genomic Data Science concurrently. I found the quizzes and project to be relatively straightforward, although again the peer grading is somewhat less-than-useful.

Overall, three stars. A reasonable introduction to graphing in R, with some basic clustering and dimension reduction strategies tacked on to the end. Experience with R at the level of R Programming is almost certainly required, as stated in the course prerequisites.

Brandt Pence **completed** this course, spending **3 hours** a week on it and found the course difficulty to be **easy**.

Statistical Inference is the sixth course in the Data Science specialization, and the first course in the analytical portion of the specialization (followed by Regression Models and Practical Machine Learning). The course covers probability, variance, distributions (normal, binomial, Poisson), hypothesis testing and p-values, power, multiple comparisons, and finally resampling. Overall this is a rather poor introduction to statistical methods, and the only really relevant hypothesis test covered is the simple t-test.

This is the first course taught by Brian Caffo, who is more mathematically inclined, and he doesn't do a particularly good job of explaining the material in an intuitive way. There are a few good portions of the course, though, and I thought the explanation of statistical power using the manipulate package in R was particularly good, and quite a bit better than the coverage I've received in face-to-face university courses I've taken. Otherwise, though, this course in no way will prepare students to actually conduct most common statistical tests, and it doesn't cover non-parametric statistics in any depth whatsoever. I have heard good things about the former Duke statistics course on Coursera, and that course has just (at the time of this writing) been released as a new specialization (Statistics with R), so that might be a better choice for learners looking for a better coverage of statistics using R packages.
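
For readers who have not met statistical power before, the idea can be sketched in a few lines. This is not the course's manipulate demo: it is a simplified one-sided z-test with known standard deviation, written in Python with only the standard library, and the `z_test_power` helper and its inputs are my own illustrative choices.

```python
from statistics import NormalDist


def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a one-sided z-test of H0: mu = mu0 against mu = mu0 + effect.

    Assumes a known standard deviation `sigma` and a sample of size `n`.
    """
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)  # rejection cutoff under H0
    shift = effect * (n ** 0.5) / sigma       # mean of the test statistic under H1
    return 1 - std_normal.cdf(critical - shift)


# Power rises as the sample size grows, for a fixed effect size.
for n in (8, 16, 32, 64):
    print(n, round(z_test_power(effect=0.5, sigma=1.0, n=n), 3))
```

With no true effect the function returns alpha, the false positive rate, and the power climbs toward 1 as either the effect or the sample size increases.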

Overall, three stars. There are a few gems hidden among the rest of the course content, but overall the course is not particularly good for learning statistical techniques, and it is unlikely that you'll come away from this with any real understanding of how to apply statistical hypothesis testing unless you have pre-existing experience in this area.

Brandt Pence **completed** this course, spending **2 hours** a week on it and found the course difficulty to be **very easy**.

Regression Models is the seventh course in the Data Science specialization. As with Statistical Inference, it is taught by Brian Caffo and suffers from the same issues as the preceding course. The course covers least squares, simple linear regression, multiple linear regression, regression model diagnostics, and logistic and Poisson regression.

The material here is rather strangely presented. As with Statistical Inference, it is light on intuition, so students will have a hard time applying techniques learned here in an appropriate way without previous experience. Although a fair amount of math is presented for each topic, it is not at a level deep enough for students to really grasp how it works, so the course tries to walk a line between mathematical rigor and application and mostly fails at both. The course project was relatively straightforward, using the mtcars dataset in R to predict miles per gallon by transmission type (automatic versus manual), with adjustments for other variables in the dataset. Overall I think I spent less than 10-12 hours on the entire course, and I technically took this concurrently with Statistical Inference, although I enrolled in this course only after completing the majority of the previous course.
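
As a rough illustration of the kind of intuition I would have liked, here is a minimal sketch of least squares with a 0/1 predictor, written in Python rather than R. The numbers are made up and are not the mtcars values, the `fit_line` helper is hypothetical, and the fit is unadjusted (no other variables), unlike the course project.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented example data: 0 = automatic, 1 = manual (NOT the real mtcars values).
manual = [0, 0, 0, 1, 1, 1]
mpg = [18.0, 20.0, 22.0, 25.0, 27.0, 29.0]

intercept, slope = fit_line(manual, mpg)
print("mean mpg, automatic:", intercept)
print("manual minus automatic:", slope)
```

With a binary predictor the intercept is the mean of the group coded 0 and the slope is the difference between the two group means, which is why a regression coefficient on transmission type can be read as an average mpg difference.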

Overall, two stars. Probably the least useful course in the specialization so far. I have some previous experience in linear regression and have taken a graduate-level course in the area and have published several papers using these techniques, and I did not find this course to be particularly intuitive or useful. A new specialization (Statistics with R) from Duke will include a course covering these topics, so this may be a better choice for someone wanting to learn how to apply these techniques to his/her own data.

Brandt Pence **completed** this course, spending **3 hours** a week on it and found the course difficulty to be **easy**.

This is the first course in what was a new (at the time I took it) specialization covering data analysis using Python 3 or SAS. This first course mostly covered data manipulation, summarization, and visualization. The production value for this course is far better than other courses I've taken either in Coursera specializations (Data Science, Genomic Data Science, Python for Everybody, Machine Learning, etc.) or on EdX.

However, the material is not presented at a particularly deep level, and the instruction in programming techniques is virtually non-existent. This is probably better as a course for academics who want to do data analysis using one of these methods and need a quick introduction to coding it in either Python (which is what I chose) or SAS, or alternately for experienced Python programmers who want to start doing a bit of data analysis. I was not a huge fan of the assignment submission system, which relies on the student signing up for a blog in order to post their results. I used Tumblr, and it took me longer to get my formatting the way I wanted it when posting my assignments than it did to actually do the assignment in the first place.

Overall, 3 stars. An ok course, but there are better courses out there for programming and statistics/data analysis, so this course (and likely specialization) will probably be mostly in a niche for Python programmers who want to start data analysis and need a gentle introduction, or alternately for individuals who absolutely have to use SAS and have no other resource for instruction.

Brandt Pence **completed** this course, spending **2 hours** a week on it and found the course difficulty to be **very easy**.

This course, offered by the Karolinska Institute, is a gentle introduction to the R language. I have to admit that I only watched a few of the videos. These are very high quality compared to the Johns Hopkins Data Science courses, but this course suffers from the same issues as many of the other introductory R courses out there, namely, that there is very little true challenge in these courses and that most of the exercises are simply copying code and interpreting the output. Despite the criticisms of R Programming and the other JHU courses, those courses do force students to learn to write code in order to solve the problems. I finished this course with a 92% in a bit over 2 hours of total work, and I imagine anyone who has any experience with R could do likewise. A few of the questions were a bit tricky (but didn't require any programming), and they only give you two attempts, so it might still be difficult to get 100% in the course without careful study, but passing this one (pass line 56%) should be no problem.

Overall, two stars. This might be useful as a gentle introduction to the R language, but you won't learn to code well by taking this course.

Brandt Pence **completed** this course, spending **1 hour** a week on it and found the course difficulty to be **very easy**.

(Note: I took this before the reorganization of the courses. I believe the material in the first two or three courses remains the same, so my comments should still be valid here.)

This is the first course in the PH525 sequence offered by HarvardX on the EdX platform. The sequence is taught by Rafael Irizarry, a noted computational biologist at Harvard and the Dana-Farber Cancer Institute. The course offers a relatively gentle introduction to biostatistics, and there's little emphasis on genomic analyses here. Topics that are covered include probability, the normal distribution, some inferential statistics (t-tests, confidence intervals, power calculations, association tests, and simulation), and exploratory data analysis.

The introduction to R is rather cursory, and I have to imagine that the homework assignments might be challenging for those unfamiliar with the language, although there is a fair bit of handholding for the most difficult parts. For those that have taken R Programming, as I had, this course will seem very easy. I took it during a self-paced period and finished the entire course in a little more than three days, working only a few hours a night and a bit here and there during free periods at work, and I don't think I spent much more than 10-20 minutes on any of the programming problems.

There is some value to be had here even for those with experience in R, though. The basic introduction of actual statistical tests in this course is likely to give students taking the statistical inference courses in the Data Science and Genomic Data Science specializations a bit of a head start. The section on dplyr, a package that provides a more intuitive way than base R to split datasets and perform operations on their contents, is also reasonably good. Additional follow-up courses are available for matrix operations and advanced statistics.

Overall, four stars. The actual instruction in programming in R is a bit slim here, but for those who know the language but have little experience on the statistics side of R (which would describe most everyone currently taking or having recently taken R Programming), there is a lot of value here for little effort. The EdX platform is not as nice as Coursera's, especially when it comes to the discussion boards, but this doesn't detract much from the course.

(Note: I took these prior to their reorganization/combination. Back then they were 4 one-week courses, of which I took three, so I will review those modules below and assume that the material in the new course is similar.)

>>>>RNA-seq:

This is the first case study course and 5th overall course in the PH525 sequence offered by HarvardX through EdX. I had high hopes for these case studies going in, but I left somewhat disappointed. I had hoped to learn how to work through an entire genomics pipeline, but instead most of the questions were similar to those in the Intro to Bioconductor course where much of the hard work was done for you.

RNA-seq is a powerful technique that has essentially replaced microarrays and may someday replace many RT-PCR studies for analysis of gene expression. The major problem with these courses is the massive datasets generated by these studies and the advanced techniques necessary to analyze them. Most personal computers cannot run an RNA-seq pipeline without running out of memory, so computer clusters are often used. This makes it next to impossible to let students go through the entire analysis pipeline, which is what I had hoped would happen in this class. Online resources like Galaxy help with this to an extent by letting individuals run their pipelines on remote servers, but for a programming class this type of analytical strategy isn't appropriate.

Nevertheless, I did learn something about how to use Bioconductor software to explore RNA-seq data, and that aligns well with the rest of the courses in this sequence. The value here is in giving students an overview of this field so that they are prepared for further study or to explore available datasets on their own. I had hoped for a bit more out of this sequence, but the relatively small time commitment and the fact that the courses were free mean I can't complain too much.

Overall, three stars. I had hoped for a little more exposure to some of the RNA-seq analysis workflow, but the technical difficulty involved in using such large datasets makes that impossible for this type of course. This course only took a few hours of work to complete, so it is a minimal time investment that yields a decent introduction to RNA-seq analysis.

>>>>ChIP-seq:

This is the third case study and 7th overall course in the PH525 sequence. This was by far the least useful course in the sequence, and I'm not saying that due to the low grade I received. The whole course probably took 2-3 hours to complete, and there was not a single programming exercise in the entire class. The questions were not particularly straightforward, and many of them had multiple answers with only 2 attempts allowed. While Dr. Liu did cover the material reasonably well, the answers to some of the homework questions were either not apparent from the lectures or not covered at all, and the relatively small number of points available meant that missing a few questions reduced your grade considerably.

ChIP-seq (chromatin immunoprecipitation sequencing) is a powerful technique used to find transcription factor binding sites on genomic DNA by pulling down bound DNA fragments using antibodies against the transcription factor, then sequencing the resulting DNA fragments. While the class itself was a decent overview of the technique and some of the associated technologies, there was no aspect of it related to actually introducing students to performing analyses themselves, which until now had been the theme of the PH525 sequence.

Overall, two stars. This course could use a revamp to focus more on introducing data analysis techniques and making the homework questions a little clearer. There were some complaints in the forums about the homework difficulty in this course, which in my experience is unusual for EdX (but almost a default on Coursera).

>>>>DNA Methylation:

This is the fourth and final case study and 8th and final course overall in the PH525 sequence. This was the most useful case study course in the sequence of the three I've taken (I skipped the course on variant discovery due to its reliance on a Linux virtual machine, although I will probably go back and complete that course at some point). One of Dr. Irizarry's research areas involves analysis of DNA methylation through various techniques, and as a result the material in this course is somewhat better than in the other case studies.

DNA methylation is an epigenetic modification that generally reduces gene expression. Epigenetics is an extremely hot field at the moment, and the chance to learn a bit about how the data for these studies are generated was exciting. The course suffered a bit from the same things that plagued the other courses in this sequence in that much of the data analysis pipeline isn't really feasible for the average student (although much of this is based on the limited ability of normal laptops to handle the volume of data these studies generate). Therefore, many of the difficult parts were already completed for you as in previous classes, but this course was still a great introduction to the topic.

Overall, four stars. The best of the case studies I've taken so far. I would like to see longer courses that go more in depth on these data analysis techniques (perhaps including introducing command line analyses for fastq processing and the like), but I don't know if that will be possible without asking students to pay for cloud computing resources like Amazon AWS.

Brandt Pence **completed** this course, spending **3 hours** a week on it and found the course difficulty to be **medium**.

(Note that I took these before the recent reorganization. I believe portions of this course are now included in a different course in this sequence.)

This is the fourth course in the PH525 sequence offered by EdX. It is technically possible (based on the stated prerequisites) to take this course second after PH525.1x, but I would highly recommend finishing all 3 previous courses (PH525.1x, PH525.2x, and PH525.3x) before taking this course.

The course is an introduction to the Bioconductor packages for R. This course is certainly somewhat easier than the preceding course (PH525.3x), especially in the first few weeks, but the difficulty level ramps up considerably towards the end. This is a decent introduction to Bioconductor, but it suffers from the same issue that plagues the previous courses in this sequence. Basically, the analyses that are necessary for this type of genomic data are, if not difficult to understand, then at least difficult to program. This is mostly a problem of data wrangling, as once the data are in their proper form, the analyses are not particularly difficult. However, because the data are so complicated, there is a lot of code pre-written for the students, and thus students will come out of the class with an overview of various Bioconductor packages, but most won't be able to run through a complete analysis of actual data de novo. I have hopes that the upcoming case studies will rectify this somewhat.

Otherwise, this is a pretty good class, about on par with the previous courses in the sequence. The lowest three homework scores are dropped here, which is intended to allow students to choose (if they desire) to do the homework only for the microarray or for the next generation sequencing tracks in week 3. I did both, even though microarrays are close to obsolete due to the advent of RNA-seq. Doing both meant the dropped scores let me skip several questions in a few of the homework assignments, omit the last section on parallel programming and software engineering entirely, and still finish with a 100% score in the course.

Overall, four stars, the same score I've given the previous courses in this sequence. You won't come away from this course as an expert in Bioconductor or in analysis of genomic data with R, but it's a solid foundation that will allow you to more easily learn the details when/if you have to analyze your own original data using these software packages.

Brandt Pence **completed** this course, spending **4 hours** a week on it and found the course difficulty to be **medium**.

(Note: these case studies seem to have been combined into one course recently.)

This is the fourth and final case study and 8th and final course overall in the PH525 sequence. This was the most useful case study course in the sequence of the three I've taken (I skipped the course on variant discovery due to its reliance on a Linux virtual machine, although I will probably go back and complete that course at some point). One of Dr. Irizarry's research areas involves analysis of DNA methylation through various techniques, and as a result the material in this course is somewhat better than in the other case studies.

DNA methylation is an epigenetic modification that generally reduces gene expression. Epigenetics is an extremely hot field at the moment, and the chance to learn a bit about how the data for these studies are generated was exciting. The course suffered a bit from the same things that plagued the other courses in this sequence in that much of the data analysis pipeline isn't really feasible for the average student (although much of this is based on the limited ability of normal laptops to handle the volume of data these studies generate). Therefore, many of the difficult parts were already completed for you as in previous classes, but this course was still a great introduction to the topic.

Overall, four stars. The best of the case studies I've taken so far. I would like to see longer courses that go more in depth on these data analysis techniques (perhaps including introducing command line analyses for fastq processing and the like), but I don't know if that will be possible without asking students to pay for cloud computing resources like Amazon AWS.

Reproducible Research is the fifth course in the Data Science specialization, and the last course in what could reasonably be considered the basic R introduction portion of the series. Following this course, students move into Statistical Inference, Regression Models, and Practical Machine Learning, courses which are more about analytical techniques than basic programming skills.

The idea behind reproducible research is to inform students about the reproducibility crisis in science and to give them tools to make their analysis reproducible. This is something that has long had importance in programming but has only recently been given much credence in experimental sciences, so this course is relatively timely for those of us in the latter fields. The course covers R Markdown and knitr as a method of producing reports with integrated analyses, and also covers RPubs for communicating results. Overall this is probably the most useful and important course in the first part of the specialization. The course itself is easy, although the two projects (week 2 and week 4) can be very time-consuming depending on how much effort you choose to expend. I went for the option of providing the minimum to meet the stated requirements, and I still received 100% on my peer feedback, although this strategy can be risky, so proceed with caution if you choose to do this.

Overall, four stars. The best course in the first half of the specialization, and it gives a good overview of strategies to disseminate your analyses and results in an understandable and reproducible way. I took this concurrently with Exploratory Data Analysis, but some of the material from that course is useful here, so make sure you have enough time to complete both if you choose this route.

I took the first offering of this, the second course in the Genomic Data Science specialization, and there are a number of issues that I hope the specialization team can work out. The video lectures are short (20-30 minutes total per module), and the introduction to working with Galaxy is reasonably interesting. The explanations given by Dr. Taylor are fairly good, but as with other courses in this specialization and in the Data Science specialization, the depth of the instruction is not quite enough to prepare students for the final project.

A fair amount of time is spent on demonstrating how to run the Galaxy software system through the cloud or locally on your own machine. Some of this is problematic for several reasons. First, Galaxy does not play well with Windows, and the only reliable way to install Galaxy on a Windows machine is to run an instance of Linux (e.g. Ubuntu) either as a second OS or as a virtual machine. The instructors also suggest Amazon AWS as a cloud provider for those wanting to run Galaxy on the cloud.

You do not need to do any of this! By all means watch the videos so you know how it's done (it's required to take the associated quiz anyway). One person posted in the forums that he was charged $60 after he left several instances running in his Amazon AWS account overnight, even though they weren't actually doing anything. Others later reported similar charges of $100-$300. The Galaxy website allows 250GB storage per registered user and has plenty of processor time to allow students to run the tools to complete the demonstrations and final project in a reasonably timely fashion, especially if you do it at night when most of the researchers using the platform have finished their work for the day.

The final project requires you to determine the number and type of variants from sequence data from a father/mother/daughter trio. The tools are all available in Galaxy main, but there is not enough background information given to make the process intuitive, nor were certain essential questions answered (e.g., should I analyze each sample individually or pool all the subjects into the same sample before analysis?). One issue was that, this being the first instance of this course, there were no community TAs available to answer questions, so students had to rely on the instructor for guidance, and he was of course not often involved on the discussion forums.

Some internet resources are available to help with this (see this guide on variant calling and this Nature Genetics article). As a note, though, I could not get the listed workflow from the first resource to work for the data we were given and ended up trying to work through it by essentially picking tools based on their names/descriptions. I was able to get an answer, but there was no way to tell prior to submission if my answer was correct, close, or completely wrong. Unfortunately, it turned out that there was no way to tell when evaluating other students' projects whether their answers were right or wrong either, despite the fact that the rubric asks you to do just that. Additionally, the rubric asks evaluators to assess whether a particular variant was present in the .vcf file, and this variant was not called using the hg19 reference genome that was used for the Galaxy demonstrations (and confirmed by the instructor in one of his rare forum posts to be appropriate for the project). To his credit, the instructor resolved this (after someone sent him a direct email).

Overall, two stars. There is some value here, but the expectations aren't particularly clear, and the course project (if done correctly) is well beyond what is taught in the lectures. This would be fine (and indeed it's characteristic of the Data Science Specialization and this specialization), but the resources online that might help with actually completing the project are confusing, contradictory, or deprecated.

This is the third course in the Genomic Data Science specialization. I have mixed feelings about this course. Much of the lecture material and the quizzes are devoted to introducing students to the Python programming language. Python is used commonly in bioinformatics due to its simple syntax and the wealth of packages (e.g. Biopython, NumPy) available for data processing and genomics. There's very little in the course material that will assist students in actually learning to code prior to the final exam, as most of the quiz questions simply ask students to determine which given lines of code will produce the desired output.
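
To give a sense of the "simple syntax" point, here is a minimal sketch of the kind of sequence handling Python makes easy. It is not taken from the course: the function names and the example sequence are my own, and it uses only the standard library rather than Biopython.

```python
# Illustrative only: hypothetical helper names and a made-up DNA string,
# not material from the course.

def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    complement = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(complement[base] for base in reversed(seq.upper()))


example = "ATGCGC"
print(gc_content(example))
print(reverse_complement(example))
```

Being able to read code like this is roughly what the quizzes test. Writing it from scratch is what the final exam expects.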

The final exam then throws students into the deep end. For those without significant Python coding experience or years of experience in another language, the final exam will be somewhere between extremely difficult and impossible. There is a serious loophole in the final exam that allows students to score highly without doing any coding, however. As with all the other classes in the JHU specializations, students are allowed 3 attempts at the exam, and the exam itself is multiple choice with 4 answers per question. Additionally, the instructors have coded more than 4 answers for each question, so incorrect answers cycle in and out of the potential answer bank between multiple tests. This means that students who take the test 3 times are often presented with a situation where only 1 or 2 of the 4 choices shows up on all 3 attempts, making it easy to determine which is the right answer if you keep your previous attempts open while taking the third exam and carefully note your previous wrong answers and the choices which appear in all 3 tests.
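
The elimination logic is just a set intersection. Here is a rough sketch with made-up answer labels. The function name and the data are hypothetical, and it assumes you have recorded which choices were shown on each attempt and which of your picks were marked wrong.

```python
# Hypothetical illustration of the elimination strategy described above.
# The answer labels are invented; nothing here comes from the actual exam.

def narrow_candidates(attempts, known_wrong):
    """Return choices shown on every attempt, minus those already marked wrong.

    attempts: list of lists, the choices displayed for one question per attempt
    known_wrong: set of choices previously selected and graded as incorrect
    """
    common = set(attempts[0])
    for shown in attempts[1:]:
        common &= set(shown)
    return common - set(known_wrong)


attempts = [
    ["A", "B", "C", "D"],  # choices seen on attempt 1
    ["A", "C", "E", "F"],  # choices seen on attempt 2
    ["A", "C", "G", "H"],  # choices seen on attempt 3
]
print(sorted(narrow_candidates(attempts, known_wrong={"C"})))
```

Since the correct answer has to appear on every attempt, anything that drops out of the intersection can be ruled out.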

My minimal coding experience and careful use of Biopython were enough to allow me to answer the first 3 of 10 questions on the exam, but I scored 5/10, 8/10, and finally 10/10 using the strategy above in my 3 tries.

Overall, I'm giving this course two stars, the same as the Galaxy course. There must be a better way to introduce students to these topics than what the instructors have done in these courses so far. There has been too much of a disconnect between assumed knowledge and expectations in these first two courses (I'm ignoring Intro to Genomic Technologies, which is another story altogether). This is true of the Data Science specialization to a certain extent as well, but the jump between lecture content and course project expectations is likely too wide for most students here, unless they come from a background where they happen to have a fair bit of training in whatever topic happens to be the focus of the course and/or final project.

Brandt Pence **completed** this course, spending **3 hours** a week on it and found the course difficulty to be **medium**.

(Note, I took these courses before the recent reorganization. I believe the material for the first few courses is the same, so my comments should still be valid.)

This is the second PH525 sequence course offered through HarvardX. Most of the course is dedicated to demonstrating mathematical operations for performing matrix manipulations and calculations with R. Most of what is shown can be done more easily using built-in functions in R (ex: lm()), but there is still some good information here. The descriptions of collinearity, interactions, and the demonstration of building a multivariate linear model to compare treatments and interactions in particular were better than in the dedicated statistics courses I took as a graduate student.
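
As an illustration of what "doing it by hand" means, here is a minimal sketch of fitting a straight line by solving the normal equations directly. The course does this in R with matrix algebra. This version is my own, written in plain Python for a single predictor, and the function name and data are made up.

```python
# A sketch, not course code: ordinary least squares for y = b0 + b1 * x,
# obtained by solving the normal equations (X'X) b = X'y by hand, where the
# design matrix X has a column of ones and a column of x values.

def fit_line(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    sx = sum(x)
    sxx = sum(v * v for v in x)
    sy = sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    # X'X = [[n, sx], [sx, sxx]] and X'y = [sy, sxy]
    det = n * sxx - sx * sx
    if det == 0:
        # The two columns of X are collinear, so X'X cannot be inverted.
        raise ValueError("x values are all identical; slope is not identifiable")
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Made-up data lying exactly on y = 1 + 2x
print(fit_line([1, 2, 3, 4], [3, 5, 7, 9]))
```

The zero-determinant case is the simplest form of the collinearity problem the lectures discuss: when the columns of the design matrix are not independent, the coefficients cannot be estimated.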

The programming assignments are in general much easier than the material covered in the videos, and as with the first course there is a fair amount of hand-holding throughout the assignments, which is a major difference from similar courses such as the Hopkins Data Science sequence. This is a two-week course, and since I took it self-paced, it took me only a few hours (spread over two days) to finish it. Despite my 100% score, I still feel I could learn more by repeating this course down the road, and this is probably a side-effect of the ease of the homework assignments compared to the course material. There are also a lot of instances of "here is some advanced code we're using to demonstrate how this works, but we're not going to teach you how to code this", which I found to be frustrating at times. Complete .Rmd files are available for each lecture, though, so students can go through the code independently.

Overall, four stars. Maybe not the most effective use of time if all you want to do is be able to run linear regression analyses in R, but there is a lot of good information here.

Brandt Pence **completed** this course, spending **2 hours** a week on it and found the course difficulty to be **easy**.

(Note I took these before the recent reorganization. I believe most of the material from the first few courses has remained relatively the same.)

This is the third course in the PH525 sequence offered by HarvardX. This course ended up being a bit of a surprise to me, as it was far more difficult than the previous two courses (PH525.1x and PH525.2x). Whereas previously, the lectures were at a higher level than the assignments, the assignments in this course were more difficult than the material covered in the lectures, and there was quite a bit less hand-holding compared to previous courses. Part of this may have been the material, as I had a solid background in the topics covered in the previous courses (basic statistics, R programming, and regression analyses), but I have little background in multivariate analyses.

The materials covered in this course include statistical inference for high throughput data, cluster and factor analysis, principal component analysis, hierarchical modeling, and more. I needed quite a bit of help from the discussion boards to get through some of the homework problems. Fortunately, although the EdX discussion boards are relatively poor, there was sufficient information there to get through most of the problems. I found the homework to be much less intuitive compared to previous classes, but I did learn a lot of programming and analysis tricks in this class.

Overall, four stars. This has been the most difficult course I've taken to this point, but getting the right answers is rewarding, and the instructors have set up the homeworks so that you have sufficient attempts to get the right answer (barely, in some cases).

This is the first class in the new (at the time I took it) Python for Everybody specialization, which grew out of Dr. Charles Severance's popular course of the same name. As I understand it, the first two courses of this specialization will cover the material from the previous course, while the third and fourth courses and the capstone will cover new material.

This is a very gentle introduction to programming in Python. The videos are very thorough, and Dr. Chuck does a good job of going over everything he's teaching in great detail. As I had a fair amount of experience in R and some experience with Python (Codecademy course and Genomic Data Science with Python course), I found this to be very easy, and I raced through the class in a few hours, listening to the videos on 2x speed. One of the big problems I have with this course and similar ones (like the Rice Python courses) is the use of web-based coding platforms. While they're useful for real-time checking of code, students who learn to use these platforms may end up completely lost if they try to apply their programming experience from these courses to a real-world problem. I would much prefer to see programming taught via command line or using an IDE like Spyder.

Overall, four stars. Very thorough, very slow-paced. I imagine that finishing this specialization will only get students to the novice programmer stage, but it is a good first step, and probably the most approachable programming course out there for those with no experience.

This is the first course in the new (at the time of this writing) Statistics with R specialization from Duke. This specialization comes out of the popular Data Analysis and Statistical Inference course which used to be offered on Coursera.

Introduction to Probability and Data covers study design, types of data, probability, and several common distributions (normal, binomial) over 4 weeks, with a 5th week devoted to a project. Everything is done in R, and R markdown documents are provided for the programming labs and the final project to help students get started. There are also theory-based quizzes each week.
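
For a flavor of the distribution material, here is a small sketch of a binomial probability and a normal cumulative probability. The course labs use R. This Python version, including the function names, is my own illustration and uses only the standard `math` module.

```python
import math

# Illustrative sketch only; the course itself works in R.

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for X ~ Normal(mu, sigma), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))


# Probability of exactly 3 heads in 10 fair coin flips
print(binomial_pmf(3, 10, 0.5))
# Probability that a standard normal value falls below 1.96
print(normal_cdf(1.96))
```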

I found the course extraordinarily easy. However, I had completed the vast majority of the Johns Hopkins Data Science specialization prior to taking this course, and I have graduate-level training in applied statistics. The lab instructions for each week hold the learner's hand throughout, so there's essentially no challenge to them, and the R code is fed directly to the student. This is not a course where you will learn to program using R to any great extent. For that purpose, I think the JHU specialization on Coursera, the HarvardX genomic statistics series on EdX, or the Analytics Edge on EdX are the best bets, probably in that order, although I have no experience with Udacity or other providers.

Overall, three stars. It's a decent course for an introductory statistics/probability course, but there are many better options out there for both statistics and R programming.

Brandt Pence **completed** this course, spending **2 hours** a week on it and found the course difficulty to be **very easy**.

This is the final course before the capstone in the Data Science specialization from Johns Hopkins on Coursera. Overall this is one of the coolest courses in the specialization. It covers production of data products (as the name suggests), such as web-based data apps using Shiny and HTML-based presentations using Slidify, among other things. These two procedures are then used for the final project, in which the student creates his/her own Shiny-based app as well as a Slidify-produced presentation introducing the app.

This may not be the most directly useful course in the specialization for most students, but it is very interesting and more fun than most of the preceding courses. I spent a fair amount of time getting my Shiny app (a simple body mass index calculator and population distribution visualizer) to work the way I wanted, and I was pretty happy with the way it turned out. I can see a lot of use for these tools in education, and there are probably myriad applications in business as well.

Overall, four stars. One of my favorite courses in the specialization, even if I will probably rarely use what I learned.

This is the second-to-last course in the Data Science specialization from Johns Hopkins, and the final of three courses covering actual data analysis techniques (preceded by Statistical Inference and Regression Models).

This was one of the better courses in the series, and I thought it lived up to its name. This was certainly a practical overview of machine learning techniques. There was very little discussion of the algorithms behind these techniques, certainly much less than even in Andrew Ng's Coursera course, which is itself supposedly fairly watered-down compared to many university courses on the subject.

I was taking Analytics Edge on EdX at the same time as this (and still am, actually, at the time of this writing), and I found them to be fairly similar in depth in areas where they overlapped (e.g. random forests). PML also covered boosting, bagging, and regularized regression among other things. The final project I thought was fairly easy, and my random forest-based model correctly predicted all 20 test cases on the first try.

Overall, four stars. This is a nice overview course if your goal is to understand a bit about how to implement machine learning procedures in R. You won't gain much of a deep understanding of these techniques from this course, but it's enough to get you started.

Brandt Pence **dropped** this course, spending **3 hours** a week on it and found the course difficulty to be **very easy**.

This is the final course in the Genomic Data Science specialization from Johns Hopkins. This course covers some statistical techniques in genomics using R and Bioconductor packages. It has most of the same problems as the previous courses in this specialization in that the work is at a level for which the student really needs some significant background in the technical aspects in order to complete the course. Fortunately, I have enough background in statistics and R programming that I was able to complete this course fairly easily, but this will not be the case for most people without this background.

The course covered exploratory analysis, clustering, regression, batch effects, generalized linear models, p-values, and several other topics, and the final week also included an introduction to some of the more common genomic experiments (RNA-seq, ChIP-seq, GWAS, etc.). Similar to the Bioconductor class, there were no peer-reviewed assignments here, only 1 quiz at the end of each week. The material covered in the lectures was for the most part sufficient to complete the programming needed in order to get the answers to these quizzes. One note is that because Bioconductor packages change so frequently, for some of the questions I was only able to get answers that were somewhat close to what ended up being the correct choice on the quiz, so students need to take care to either realize this or to download and use the versions of the packages used by the instructor when creating the course. This problem is likely to get worse over time.

Overall, three stars. A fair class for someone with an interest in this field who also happens to have a decent background in R programming.

Brandt Pence **completed** this course, spending **3 hours** a week on it and found the course difficulty to be **medium**.

This is the second course in the Python for Everybody specialization, and corresponds to the second half of the previous course of the same name. As with the first course (Getting Started with Python), I found Dr. Chuck's thorough approach to the material likely to be very approachable for a beginning programmer. I have some experience with Python, including the previous course and a course in the Genomic Data Science specialization, and I have a fair amount more experience programming using R, so I found this course very easy.

The course introduces the common data structures in Python (lists, strings, files, dictionaries, tuples) and the functions used to manipulate them. Dr. Chuck does an excellent job of introducing each piece, although the course is a bit light on practice problems, and there are probably better resources for actually getting comfortable using these on a regular basis. The book "Learn Python the Hard Way" and the EdX course from MITx (6.00.1x) are supposed to be two of the best resources out there. I recently bought the first and enrolled in the second, so I'm hopeful that they will help me to become as comfortable programming in Python as I am in R.
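
To show how these structures fit together, here is a minimal word-counting sketch of my own (not an exercise from the course). It uses strings, a dictionary, a list, and tuples. The function names and sample lines are hypothetical, and a list of strings stands in for an open file, which can be looped over line by line in the same way.

```python
# Illustrative sketch with made-up input; not a course assignment.

def word_counts(lines):
    """Count how often each lowercase word appears across an iterable of lines."""
    counts = {}
    for line in lines:
        for word in line.lower().split():
            counts[word] = counts.get(word, 0) + 1
    return counts


def top_words(counts, n=3):
    """Return the n most common (word, count) tuples, ties broken alphabetically."""
    pairs = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return pairs[:n]


lines = ["the cat sat", "the cat ran", "a dog sat"]
print(top_words(word_counts(lines), 2))
```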

Overall, four stars. A gentle introduction to data structures in Python, but a little light on the exercises needed to become really proficient in using them.
