Profile

Gregory J Hamel ( Life Is Study)

Writer with a passion for E-Learning and data science. Class Central has linked to many of my course reviews on my blog: Life is Study. I am using this account to post extra reviews.

MN
Economics, Computer Science, Data Science
Bachelors Degree

Completed ( 25 )

Audited ( 17 )

Dropped ( 1 )


HTML5 Game Development

Written 2 years ago
This course is a perfect example that having smart instructors who are passionate about what they are doing is not enough to make for good instruction or a good class. Udacity's course offerings are generally top notch in quality, but this one seems to be the lemon of the lot.

The course is structured around an HTML5 game that the profs created and quizzes are centered around having you fill in bits of code into a skeleton of hundreds of lines of their game code. The video lectures are too brief and don't discuss commands at a pace that allows students to learn what they are doing before taking quizzes expecting them to use those commands.

Using an already-made game is a poor instructional decision. Building something from the ground up, piece by piece, over the course of a class is a much better system for learning that doesn't confuse students with tons of lines of unfamiliar code. The profs seem to assume that students should know much more than they actually would having watched the video lectures. Picture a bunch of scientists who are so wrapped up in their own world that they are unable to explain things in terms that a novice can understand. I love Udacity, but this is one to avoid.
My rating
Gregory J Hamel ( Life Is Study) dropped this course and found the course difficulty to be medium.

Cluster Analysis in Data Mining

Written 2 years ago
Cluster Analysis in Data Mining is the third course in Coursera's new data mining specialization offered by the University of Illinois at Urbana-Champaign. The course is a 4-week overview of data clustering: unsupervised learning methods that attempt to group data into clusters of related or similar observations. The course covers the two most common clustering methods--k-means and hierarchical clustering--as well as more than a dozen other clustering algorithms. Grading is based on 4 weekly quizzes with 3 attempts each.

Cluster Analysis is taught by Professor Jiawei Han, who was the instructor for the first course in the data mining specialization: Pattern Discovery in Data Mining. The quality of the slides, instruction and organization of materials in this course is slightly better than the pattern discovery course, but that isn't saying much: it is still below Coursera's usual high standards. The course rushes from one topic to another with instruction that is mediocre at best and downright confusing at worst. That's not to say you can't learn anything from this course, but the instruction is often more of a hindrance than a help. There are occasional in-lecture quizzes, but the graded quizzes largely fail to foster any understanding of the material. An optional programming assignment was added halfway through the course; in a course about data mining, programming assignments should be front and center, not added as an afterthought to quell an outcry from students.

Cluster Analysis in Data Mining is another disappointing entry in Coursera's data mining specialization. Although the course covers many different clustering methods, poor instruction makes it hard to gain a good understanding of the material unless you are extremely attentive or watch the videos several times.

I give Cluster Analysis in Data Mining 2 out of 5 stars: Poor.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.

Introduction to Probability - The Science of Uncertainty

Written 2 years ago
6.041x: Introduction to Probability - The Science of Uncertainty is a comprehensive 16-week introduction to probability offered by MIT through the edX MOOC platform. Although this course is dubbed an “introduction” it is not easy. You need familiarity with differential and integral calculus to understand some of the material, and the course can easily take 10-15 hours per week. Given its 16-week duration, the time commitment required to get through everything is much higher than the average MOOC. The course touches on all the major topics you need to gain a solid understanding of probability including basic axioms of probability, conditional probability and independence, discrete and continuous random variables, Bayesian inference and the probabilistic underpinnings of classical statistics. The course grade is based on lecture comprehension questions, weekly homework assignments, 2 midterms and 1 final exam. The midterms are worth 15% apiece and the final is worth 30% so good performance on the exams is paramount to getting a good score. You need a total of 60% to pass and it isn't quite as easy to achieve that mark as it is in most MOOCs.

Weekly content consists of 2-4 lecture sequences covering different aspects of a particular topic in probability. Each lecture sequence contains about an hour of video in 5 to 15 minute segments and most video segments are followed by graded comprehension questions. The lecture videos themselves are crisp and the professor is good at explaining the material at a pace that doesn't overload you with too much information too quickly. There can be quite a bit of mathematical notation on the screen at times, but it is well-organized. Each week also has a series of solved problem videos where TAs walk you through applying the material in lecture to problems that are similar to those you will see in the homework. The solved problems sections add another 1 to 2 hours of video content per week.

Pure math courses usually aren't that fun because they spend a lot of time dealing with proofs and theory and not so much time dealing with the real world. This course can be a slog at times because it is long and there is a lot to absorb and remember, but after building up the basic tools of probability in the first few weeks, later weeks focus on more interesting extensions and applications. You won’t find another intro to probability with greater depth and breadth. This course is best suited for technical and math-minded people who will have to work with and apply probability in future coursework or in their professional lives. If you're looking for an intro that just gets you up to speed on the rudiments of everyday probability like coin flipping and dice rolling, this course is overkill.

6.041x: Introduction to Probability is a great course for those serious about forming a solid foundation in probability. As professor Tsitsiklis states early on, "the first step in fighting an enemy like randomness is to study and understand your enemy." At the end of this course you will be armed with the tools necessary to wage a well-reasoned war against uncertainty.

I give 6.041x: Introduction to Probability 5 out of 5 stars: Excellent.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be hard.

Applications of Linear Algebra Part 2

Written 2 years ago
Applications of Linear Algebra Part 2 is the second part of an introductory linear algebra course offered by Davidson College through the edX MOOC platform. The course spans 6 units and runs for 6 weeks, but all the lecture content and activities are available as soon as the course opens. The topics presented in part 2 build on the foundation laid in part 1 and include: least squares, correlation, eigenvectors, singular value decomposition, Markov chains, principal component analysis and sports prediction.

Applications of Linear Algebra Part 2 follows the same pattern as part 1: each week consists of 2 to 3 short lectures, each with a corresponding activity that illustrates an application of the topic covered in lecture. This formula worked well in part 1 because the topics were relatively simple and the activities were provided via basic web apps. In part 2, the concepts are more complicated--too complicated for students to develop a solid understanding of them after one short lecture video. In addition, most of the activities in part 2 require running code in MATLAB. The course provides a free MATLAB license and tutorial videos, but it takes more effort to jump into activities. On the plus side, once you get them up and running, the applications in part 2 are even more interesting and fun to play with than the activities in part 1.

Professor Chartier is personable and engaging in the lectures despite following a prompter/script. Although his voice is clear, he spends a bit too much time reading off the numeric contents of matrices, when it would be more instructive to have the matrices and other information on screen in persistent slides. Given the complexity of the material and brevity of the lectures, students aren't likely to fully understand the math unless they have taken a course in linear algebra before. I suspect the lectures are going to leave a lot of students scratching their heads. It might have been wiser for the course not to purport to teach all the math behind the applications, but instead give a general overview of concepts before each activity and provide resources/references for students to learn about the math in greater detail. I don't normally advocate hand-waving, but as a course prioritizing applications over mathematical understanding, there are some instances where it may have been warranted.

Overall, Applications of Linear Algebra Part 2 is another solid course that has a lot of interesting activities, but it is not as approachable as part 1 and tends to rush through complicated topics to get to interesting applications.

I give Applications of Linear Algebra Part 2 a score of 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

CS188.1x: Artificial Intelligence

Written 2 years ago
CS188.1x: Artificial Intelligence is an introductory AI course offered by UC Berkeley through the edX MOOC platform. CS188.1x covers roughly the first half of the material in the full on-campus AI course in the span of 12 weeks. Major course topics include search algorithms and heuristics, constraint satisfaction problems, Markov decision processes and reinforcement learning. The course assumes you have taken a first course in algorithms, are familiar with basic data structures, have basic Python programming skills and are comfortable with mathematical notation. There isn't any particularly hairy math, but there are a lot of variables and symbols flying around at times. Grading is based on weekly homework assignments that allow unlimited attempts, 3 programming projects and a final exam that allows 1 or 2 attempts per question.

CS188.1x is a direct adaptation of the on-campus AI course. The lecture videos are edited versions of lectures delivered on-campus but instead of seeing the professor, we mostly see the presentation slides themselves with a voice-over from the professor. Direct adaptations of on-campus courses don't always work so well with MOOCs, but this course pulls it off perfectly. The professor speaks clearly and explains topics well. The lecture slides are extremely well-made, with clean text and even a bunch of cute robot and pacman art to go along with the content. The videos are cut down into digestible 5 to 15 minute segments and there are practice comprehension questions following most of the videos that allow you to take a second to reflect and digest the content.

Many courses that have great presentation fall flat when it comes to assignments. This is not one of those courses. The three pacman-themed programming projects are among the best programming assignments I've encountered in any online course. Each project consists of several parts that involve implementing AI algorithms you study in class in the context of a pacman game. The course provides you with all the code you need to run the game, a variety of convenience functions and skeleton code that you have to fill in with algorithms that accomplish the prescribed tasks. The assignments can be frustrating at times, but seeing your code in action with a little pacman racing around gobbling food pellets and ghosts is surprisingly gratifying. It also helps you gain a better understanding of how the algorithms work.

Berkeley CS188.1x: Artificial Intelligence is one of the best MOOCs on the web. It is so good that many students on the forums were eager to take part 2. Unfortunately the professors haven't gotten around to adapting the second half of the full AI course into a MOOC (they did express the desire to do so in the future) but they will give you access to an archived version of the full course upon request.

I give Berkeley CS188.1x 5 out of 5 stars: Excellent.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be hard.

Discrete Optimization

Written 2 years ago
Discrete optimization is a quasi-self-paced programming course offered by the University of Melbourne through Coursera that is all about solving hard problems. Hard problems in the context of this course means NP-hard problems--problems for which all known exact algorithms have exponential worst-case running times. The course differs from most classes on Coursera and elsewhere on the web in that all the materials are available as soon as the course opens, but there is a final deadline for the programming assignments, so it is not a self-paced course in the truest sense. The entire course grade is based on 5 programming assignments: the knapsack problem, graph coloring, traveling salesman, warehouse location and vehicle routing. An average score of 7 (out of 10) on each part of each programming assignment is required to earn a certificate.

Discrete optimization opens with an introductory lecture series on the knapsack problem that lasts a couple of hours, followed by three longer lecture series covering constraint programming, local search and mixed integer programming. The lectures do not need to be viewed in any particular order. Similarly, students can work on the homework projects in any order they choose. This level of freedom is great for students who want to work ahead, but it may make it difficult to complete the course if you don't plan ahead because the programming assignments can be very time consuming. The assignment skeleton and submission code are written in Python 2.7, but you can use other languages if you want.

The professor, Pascal Van Hentenryck, is extremely energetic and passionate about the subject. He makes the lecture videos surprisingly fun for such a dense subject. The lecture videos themselves are well-made and the professor does a good job explaining the material, although I sometimes felt like the course was trying to cover too many different topics and it wasn't always clear how one would go about applying the methods in lecture to the assignments or using them without using some external package or solver. A little more instruction and direction in that regard would be helpful.

Discrete optimization is a challenging course with great programming assignments that introduces many different tools and leaves them on the table for you to play with. The tools don't always come with full instruction manuals, so you'll have to figure out many of the details yourself. You won't have time to apply every tool to every problem, but if you focus on one and budget your time well, you'll have a good shot at making it through.

I give discrete optimization 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be very hard.

Text Retrieval and Search Engines

Written 2 years ago
Text Retrieval and Search Engines is the second course in Coursera's new data mining specialization offered by the University of Illinois at Urbana-Champaign. The course covers a variety of topics in text data mining and natural language processing, including text retrieval, query ranking and evaluation methods, and the basics of recommender systems. Grading is based entirely on 4 weekly quizzes, each consisting of 10 multiple choice questions. You only get 1 attempt on the quizzes.

The weekly content in Text Retrieval and Search Engines consists of around 10 video lectures that range from 5 to 20 minutes followed by a short 10 question quiz. If that sounds like a lot of lecture per question, it is, and there are no in-lecture quizzes to reinforce concepts as you go along. The lectures themselves are definitely a step up from the first course in the specialization, Pattern Discovery in Data Mining. The professor isn't hard to understand this time around and he explains concepts well enough to grasp them without having to re-watch videos. As with many of Coursera's other 4-week specializations, however, lectures sometimes turn into information dumps where the professor ends up reading off slides. The course does have a C++ programming assignment which was nice to see.

Text Retrieval and Search Engines is a decent course that is worth a look if you are interested in text data mining and search engines. Although the lectures are lackluster, they have some good information. If you're planning on getting a verified certificate, it is a good idea to try the practice quizzes before submitting the real one.

I give this course 2.75 out of 5 stars: Fair.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.

Discrete Time Signals and Systems, Part 1: Time Domain

Written 2 years ago
Discrete Time Signals and Systems, Part 1: Time Domain is a 4-week introduction to discrete time signals offered by Rice University through the edX platform. This course was originally 8 weeks, but edX split it up into two parts, one covering the time domain and one addressing the frequency domain. Major course topics include signal properties, signals as vectors, linear time-invariant systems and convolution. The course requires some linear algebra and calculus (it has a pre-course assessment) as well as some basic programming in MATLAB. You don't need to know any MATLAB going in, but if you do you can skip the tutorial. Grading is based on a combination of comprehension questions, homework quizzes, peer graded free responses and a final exam. All of the course content other than assignments is available immediately so you can work ahead if you want to.

Discrete Time Signals and Systems started around the same time as a similar signal processing course on Coursera called "Digital Signal Processing." I found Discrete Time Signals to be much more approachable than the Coursera course; it introduces concepts at a steady but manageable pace and doesn't overload you with math right out of the gate. The course isn't easy, but it isn't too difficult considering the topic. The lecture videos are well-done and the instruction is very good, although some videos could stand to be broken up into multiple parts. Professor Baraniuk tends to stutter, but it didn't really bother me or detract from the quality of the instruction. The MATLAB programming questions are baked right into the edX website and let you get some hands-on experience with the concepts. The final exam is "closed book" which I think is a mistake as it promotes guessing over learning.

All in all, Discrete Time Signals and Systems Part 1 is an excellent introduction to signal processing that is likely to be more accessible than other courses on the same subject you may find elsewhere. The stage is set for a deeper dive into signal processing in Part 2.

I give this course 4.5 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

Applications of Linear Algebra Part 1

Written 2 years ago
Applications of Linear Algebra Part 1 is a light, activity-focused introduction to linear algebra offered by Davidson College through edX. The course is suitable for anyone who is curious about what linear algebra is and how it can be used in the real world, including high school students and advanced junior high students. The course doesn't go deep into the math, but rather focuses on thinking about data in terms of matrices and illustrating linear algebra operations with activities. The materials span 7 units that include activities ranging from image manipulation and animation to cryptography and sports prediction. Grading is very relaxed as you have unlimited attempts on comprehension quizzes and the remainder of the points are based on the activities.

If you've taken a linear algebra course before, this class will be very easy, but you can still get some entertainment out of the activities and learn a bit about sports prediction. One of the biggest failings of math education is a heavy focus on rote repetition, which disconnects math from the real world and makes it boring. Applications of Linear Algebra is the type of course that is needed to raise interest in math. It introduces concepts at a digestible pace suitable for beginners and almost every lecture video that teaches a new concept is followed by an activity devoted to seeing that concept in action. Professor Chartier is clear and personable even though he seems to be working off a script--something that is not easy to do. The video quality is good and the activities, while simple, are illustrative. You can complete each week's material in an hour or less, but you can certainly spend a lot more time if you play around with the activities.

Applications of Linear Algebra Part 1 is a great course to get beginners interested in linear algebra by getting their hands on fun activities as quickly as possible. I hope to see Professor Chartier carry the same formula into Applications of Linear Algebra Part 2.

I give Part 1 of Applications of Linear Algebra 5 out of 5 stars: Excellent.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Intro to Relational Databases

Written 2 years ago
Intro to Relational Databases is a short 4 lesson course offered by Udacity that covers the basics of SQL databases. Lessons 1 and 2 cover basic SQL querying, including grouping, ordering and inner joins, lesson 3 addresses inserts and concerns when using a database backend for a webapp and lesson 4 covers database design principles and a few more advanced features like outer joins and subqueries. I won't get into the final project as Udacity's projects tend to be geared toward students with subscriptions.

Each lesson consists of several short videos with quizzes that involve multiple choice questions and coding exercises that revolve around altering and submitting SQL queries. The instructor is easy to understand and explains things well. The content is polished and I didn't notice any bugs, which is rare for a brand new course. On the other hand, the course is a bit too short and doesn't give beginners enough practice with newly introduced syntax before moving on. It would be helpful to give students a few short drills writing queries from scratch for each newly introduced keyword. Also, to follow along with lesson 3, you have to download, install and interact with a virtual machine. The time necessary to download, install and figure out how to use the VM is probably more than is warranted for such a short course, although the VM may be used for other Udacity courses.

Intro to Relational Databases is a succinct overview of SQL basics that serves as a nice refresher for someone who has seen SQL before, but making it a little longer and providing more simple drills would probably be helpful for beginners.

I give Intro to Relational Databases 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be easy.

Linear and Integer Programming

Written 2 years ago
Linear and Integer Programming is a 7-week course covering linear programming in detail. The course focuses on teaching the simplex method for optimizing systems of linear equations with constraints for the first 4 weeks and then covers integer programming and applications. You should be comfortable with basic linear algebra and calculus before taking this course. The course includes optional programming assignments that allow students to build up their own simplex algorithms over the course of the class, but you can easily pass the course just taking the weekly quizzes.

Linear and Integer Programming does an admirable job tackling a dense, dry subject. The instructors are easy to understand and explain confusing concepts well. The presentation style and video quality seem a bit dated, but it doesn't detract much from the learning experience. I must admit that my interest waned as the course went on because I took it out of curiosity rather than a preexisting interest in the subject. That was a mistake. You should not take this course for fun; take it if you really want to learn about linear programming and have the time to get through all the lectures, supplementary materials and programming assignments.

Overall, Linear and Integer Programming is a great course if you want to learn about the simplex algorithm in depth and understand important considerations and applications of linear and integer programming.

My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.

Pattern Discovery in Data Mining

Written 2 years ago
Pattern discovery in data mining is the first course in a new 5-part data mining specialization offered by the University of Illinois at Urbana-Champaign through Coursera. Keeping with the trend of other specialization courses, pattern discovery in data mining spans 4 weeks and will likely be offered again each month or two after the first offering. The course covers a range of methods for finding different types of patterns in data, such as association rules and patterns in graphs. Grading is based exclusively on 4 weekly quizzes.

I was excited to see the new data mining specialization come up on Coursera to kick off 2015, but unfortunately, pattern discovery in data mining is a dull, poorly executed information dump. Besides an interesting topic, there’s not much going for this course. In the lectures, the professor reads information off dense slides and his delivery is more confusing than instructive. The slides, video and sound are of decent quality, but the explanations are not clear and while I normally don't have an issue with foreign accents, the professor's English made things harder to understand. To make things worse, there are few instructive in-lecture quizzes and no activities or programming assignments. A course about data mining should have programming assignments or activities that let students interact with the concepts to reinforce learning.

Pattern discovery in data mining is a disappointing start to the data mining specialization that suffers from poor instruction quality and a lack of illustrative assignments. Taking this course is like a data mining problem in and of itself: you have to spend a lot of time deciphering the lectures to uncover useful information.

I give this course 2 out of 5 stars: Poor.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.

Learning How to Learn: Powerful mental tools to help you master tough subjects

Written 2 years ago
Learning How to Learn is a 4 lesson self-paced course that summarizes key findings in neuroscience about how we learn. The course touches on brain function, working and long-term memory and various methods for improving learning as well as overcoming hurdles like procrastination.

The lecture content in Learning How to Learn is very good. Videos aren't too long, the lecturer is clear and personable and everything is easy to understand. There are more bonus/guest lectures than you'd see with a typical MOOC and I find engaging, memorable guest lectures are rare. Also, you can't fully complete the course unless you verify your identity before submitting quizzes, even if you don't want a verified certificate.

One of the main pitfalls with MOOCs is that you can get into the habit of watching hours of lecture content without taking time out to practice, recall and commit ideas to long-term memory. Good courses help students learn with quizzes and homework; this course teaches students other things they can do, such as making flash cards, taking breaks and getting adequate sleep, to maximize learning. Considering the main lecture content only takes a few hours to complete, this course offers a good amount of value for your time.

I give this course 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be very easy.

Model Building and Validation

Written 2 years ago
Model Building and Validation is an advanced data science course provided by AT&T through the Udacity MOOC platform. The course is listed as "advanced" because it assumes prior knowledge of machine learning, statistics, linear algebra and calculus. Despite the stated prerequisites, math doesn't play a large role, so you will still be able to understand most of the content even if your only preparation is Udacity's intro to machine learning. The course spans 4 lessons that detail the process of extracting value from data through questioning, modeling and validation. Lesson 1 is a general introduction to the QMV process with each of the following lessons digging into each component of QMV in more detail. The course somewhat oversells its length as none of the lessons take more than a few hours despite the course being listed at an estimated 8 weeks with 6 hours of study per week. Admittedly, I did not do the final project that involves creating a fraud detection model, which could take a significant chunk of time.

Model Building and Validation follows the same formula as other Udacity courses, with each lesson taking the form of a series of short lecture videos interspersed with quizzes. The lecturers are easy to understand and the video quality is generally good, although the videos and course materials have some glitches that need to be ironed out. I won't grade the course too harshly on bugs, since all courses are buggy at the very beginning, and they will likely be fixed in the near future.

As for the content itself, the simple idea of framing a data analysis as a tree to track and organize the decisions you make along the way is probably the most useful thing you'll take away from this course. The course also does a good job getting students to think about some of the high-level decisions that must be made when conducting a data analysis. The content gets rockier when it delves into specifics after lesson 1, particularly in the models lesson. The lectures occasionally dive too quickly into the low level details of machine learning techniques that students may not have seen before. Additionally the validation section focuses much more on model evaluation metrics like ROC curves, the confusion matrix and derived metrics that fall out of it, than validation itself.

Model Building and Validation is a good course that provides a nice framework for approaching data analysis, but it gets bogged down in some machine learning specifics that don't add much to the overarching theme.

I give Model Building and Validation 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.

UT.7.01x: Foundations of Data Analysis

Written 2 years ago
UT.7.01x: Foundations of Data Analysis is a gentle, 13-week introduction to statistics and the R programming language provided by UT Austin through the edX MOOC platform. The course covers basic descriptive statistics, the normal distribution, sampling and hypothesis testing, including t-tests, chi-square tests and ANOVA. The course has no prerequisites, although you may need to spend some extra time learning the basics of R if you haven't used it before.

Each week of Foundations of Data Analysis begins with a reading assignment, a couple of lecture videos with comprehension questions and an R programming tutorial. The videos tend to be in the 7-10 minute range and the tutorials typically total less than 10 minutes a week, so the total video content per week is usually 20-30 minutes. The videos are generally well-edited and the professor does a good job describing concepts simply and concisely. Each week has a prelab, lab and problem set that allow you to apply the concepts you learn in lecture and in the R tutorials. Each problem set consists of 3-4 mini case studies, so you'll probably end up spending most of your time on the labs and problem sets. The assignments are not very difficult, although many questions limit you to 1 or 2 attempts. You need a cumulative score of 70% to earn a certificate.

Foundations of Data Analysis introduces new concepts at a relatively slow pace and gives students a good amount of practice through the labs and assignments. Concepts are explained well in lecture so the readings are not always necessary to do the activities, but they often provide extra depth and raise considerations that are not discussed in lecture. The course did have some hiccups with homework questions and auto-graders, and many questions expect rounded answers, which can result in frustrating off-by-a-fraction errors. In addition, the course uses an external forum system called Piazza instead of the normal edX forums, which I found to be a hassle.

Bottom line: UT.7.01x is a great place for a beginner to start with stats and R as long as you don't mind an external forum.

I give Foundations of Data Analysis 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Learning From Data (Introductory Machine Learning)

Written 2 years ago
CS1156x: Learning from Data is a 10-week introductory machine learning course offered by Caltech on the edX platform focused on giving students a solid foundation in machine learning theory. Major course topics include the feasibility of learning, linear models, generalization, VC dimension, overfitting, regularization and validation. The course also covers several common machine learning algorithms including the perceptron, linear regression, logistic regression, neural networks, support vector machines and radial basis functions. As a theory-heavy course, much time is devoted to mathematical reasoning and the math behind various machine learning concepts and algorithms. You need a strong mathematical background, including knowledge of linear algebra and calculus, to understand everything in this course. You also need the ability to program in some language that allows you to perform matrix and vector operations. The course provides a temporary MATLAB license and forum support for MATLAB; many students also used R and Python.

Learning from Data is different from most MOOCs in that it isn't optimized for the web. Course content consists of 18 full-length lecture videos recorded on the Caltech campus, each spanning about 75 minutes including 10-15 minute Q&A sessions. Two lectures are posted each week for 9 weeks along with PDFs of lecture slides and 8 homeworks that each consist of 10 multiple choice questions. There are no in-video quizzes or interactive exercises, as the course is basically an online port of the on-campus course. It requires a high level of motivation and attentiveness to get through two very dense 75-minute lectures each week and, despite being multiple choice, the homework problems can be very time consuming since many require programming. You get 2 attempts at each question, but each attempt is worth half of your grade, so guessing based on your intuition can be costly. The final is an untimed test that is just like the homework except that it has 20 questions. You need a total score of 50% to earn a certificate.

Although Learning from Data isn't in the typical MOOC format, the professor is a skilled lecturer and manages to keep the lengthy lecture videos engaging. The lecture slides are packed with useful information and the forums were very helpful; students were active in helping one another and the professor was very active on the forums even though this wasn't the first run of the course. The homework questions reinforce the material more than you would expect from a 10 question multiple choice quiz if you take the time to understand the question and answer carefully.

Overall, Learning from Data is a great course that emphasizes theory, but often has practical implications. The level of mathematical maturity it requires will be a barrier for some students, although you can still get something out of this course if you don't understand all of the math. If I were taking this course as a student on campus I would probably rate it 5/5, but I think they missed some opportunities to make it a truly excellent MOOC by failing to adapt it for the online audience. This course will give you a deeper understanding of machine learning than other intro MOOCs on the same subject, but if you're more interested in learning practical tools and applying machine learning, consider taking MIT's Analytics Edge on edX or Coursera's Machine Learning course.

I give this course 4.5 out of 5 stars: Great.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be hard.

Intro to Machine Learning

Written 2 years ago
Udacity's Intro to Machine Learning is an introduction to data analysis using Python and the sklearn package. The course consists of 15 lessons covering a wide range of machine learning topics including classification algorithms (Naive Bayes, decision trees and SVMs), linear regression, clustering, selecting and transforming features and validation. As a self-paced course, you can take however long you wish on each lesson; some take less than an hour, while others can take several hours depending on how long you work on the mini projects. Intro to Machine Learning requires basic programming and math skills.

Each lesson consists of a series of video segments and quizzes introducing a new topic followed by a mini-project that gives you a chance to work with code implementing the topics you learned in Python using scikit-learn. The course instructors Katie and Sebastian (the guy who runs Udacity) do a good job explaining the material and keeping the course engaging, but they keep things simple. The quizzes, at times, are almost patronizingly easy. The mini-projects are a bit harder and contribute more to learning, although they occasionally lack adequate guidance and feedback to help students arrive at the expected output. The final project, and many of the mini-projects leading up to it, involve detecting persons of interest in the Enron scandal using a data set of emails sent by Enron employees. Interesting real-world data sets are always a plus.

Intro to Machine Learning is an accessible first course in machine learning that prioritizes breadth, high-level understanding and practical tools over depth and theory. You won't be an expert in any of the topics covered in this course by the time you're done, but you will be exposed to several major topics in machine learning and have a basic understanding of how they work. If you are interested in taking a similar course with many interesting mini-projects that uses the R programming language, try MIT's Analytics Edge on edX. Coursera's Machine Learning with Andrew Ng is a logical next step to dig deeper into machine learning algorithm design and implementation, while Caltech's Learning from Data on edX is a great course if you are interested in machine learning theory. Just be aware that both of these courses (particularly the Caltech course) require a stronger math background.

I give this course 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be easy.

Data Visualization and D3.js

Written 2 years ago
Udacity's Data Visualization and D3.js is one of two new intermediate data science courses Udacity released this month, the other being an introduction to machine learning. This course consists of 4 lessons covering visualization and D3.js basics, design principles and dimple.js, narrative structures and interaction/animation. Each lesson spends some time discussing general visualization design principles and considerations followed by technical information, which often involves combing through D3.js code. Since D3.js is a JavaScript library, it is useful to have some exposure to JavaScript, HTML and CSS before taking this course.

Data Visualization and D3.js has the same polished and streamlined content structure as Udacity's other courses, with each lesson taking the form of a series of short videos interspersed with quizzes. The content focused on visualization design and principles is well done. On the other hand, the meat of the course--the sections focused on coding and creating visualizations--was not as engaging as I'd hoped. D3.js is a low-level JavaScript library, so it takes a lot of code to generate graphs and a lot of time to explain the code and learn what it is doing. Too many videos consist of talking students through large chunks of somewhat cryptic code without much interactivity, and it takes too long to get to the point where you make visualizations. I didn't feel like I was really learning how to make visualizations myself so much as understanding bits and pieces of the instructor's code. The course doesn't give students enough opportunity and direction for writing D3.js code themselves: lessons 3 and 4 don't have problem sets. I think it was probably a mistake to use D3.js for such a short course. It might have been better to use a higher-level visualization package that gets students making their own visualizations faster with less code.

Data Visualization and D3.js is not a bad course and I could see other students liking it more than I did, but after taking Udacity's excellent Exploratory Data Analysis course, it was a disappointment. In the EDA course, you jump in and start generating tons of plots in R and actually get to the point where you are reasonably comfortable using ggplot2 to make plots by the end. If you are looking to learn D3.js specifically, this course could be a good starting point, but for learning data visualization in general, D3.js seemed to be more of a barrier to learning than an asset. I'd liken this course to an introductory programming course that uses C. Starting with a lower-level language like C can be a bit painful and it takes longer to get to the point where you are doing interesting things--time you don't have in a 4-lesson course.

I give this course 3 out of 5 stars: Satisfactory.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.

Social and Economic Networks: Models and Analysis

Written 2 years ago
Social and Economic Networks: Models and Analysis is an introductory network theory and analysis course offered by Stanford through the Coursera MOOC platform geared toward learners who are comfortable with basic statistics, probability and linear algebra. You don't need to know anything about social networks ahead of time to take this course, but having basic familiarity with networks will help things go a bit smoother. The course has 7 weeks of lecture content covering network basics, measures of centrality, network formation models and diffusion, learning and games on networks. You'll also be introduced to Gephi, a software tool for network visualization and analysis. The 8th week is reserved for a final exam.

Social and Economic Networks provides all the raw information you need to get a solid grounding in network theory and analysis, but the presentation style is impersonal so the content is not particularly engaging. The professor is knowledgeable and appears on screen while explaining lecture slides, but he shows little emotion. While the lectures can get a bit intimidating with equation after equation, the homework exercises and final exam are easier than the lectures might suggest. You get 2 attempts on each chapter quiz and 1 attempt on the final; a score of 70% or more is required for a certificate and 90% or more will earn you a certificate with distinction.

All in all, Social and Economic Networks is a worthwhile course if you are interested in social networks and aren't intimidated by a bit of math, but I wouldn't take it for fun. If you want to take a course on the same subject that is lighter on math, consider Coursera's Networked Life from UPenn. It covers similar topics in a manner that is a bit more accessible to the average person.

I give this course 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

How to Use Git and GitHub

Written 2 years ago
How to Use Git and GitHub is a 3-week introductory course offered by Udacity covering the basics of the Git version control system. As a short course with only 3 lessons, it focuses on giving students a solid grounding in the basics of Git and doesn't stray too far into any advanced topics. Lesson 1 covers version control in general, checking differences between files, commits, cloning, git log and getting Git set up on your computer. Lesson 2 covers the basics of repositories, branches, merging and merge conflicts. Lesson 3 introduces GitHub and related commands and considerations including remotes, pushing, pulling, forking and issues that may arise due to collaboration.

How to Use Git and GitHub does exactly what a short intro-level course should do: stay focused on covering the basics in detail without taking diversions into esoteric features that are likely to confuse students and distract them from forming foundational knowledge. Sarah and Caroline do a good job explaining things at a level and pace appropriate for an intro course. The course has a bit more reading embedded in the video playlist than most Udacity courses. Also, many of the quizzes require you to run commands on your computer and copy and paste output back into Udacity, which can be a bit troublesome. It would be nice if they had an interactive Git environment similar to Code School's Git course allowing you to do everything you need to do right in the browser. Still, How to Use Git and GitHub is a great place to start if you are learning about Git for the first time.

I give this course 4.5 out of 5 stars: Great.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be easy.

Intro to HTML and CSS

Written 2 years ago
Intro to HTML and CSS is a 3-lesson primer on front-end web development and design. Although the name of the course suggests you'll learn HTML and CSS basics, the content is actually focused on higher-level web development and design concepts like web page and project structure, responsive design and web frameworks. The course spends very little time talking about nuts and bolts like different HTML tags and CSS properties. The content itself is well done; it just strays a bit from what you'd expect given the title of the course. It should be entitled "Fundamentals of front-end web development/design" or something similar.

This course is hard to rate given that the content is good, but it doesn't quite fulfill the expectations set by the title. As such, I'm subtracting 1 star from what is otherwise a nice, short intro to front-end web development. If you want to learn HTML nuts and bolts, the first week of Udacity's "web development" course spends a lot more time going over HTML tags. The HTML and CSS courses on Codecademy and Code School are also good options.

I give this course 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be easy.

Explore Statistics with R

Written 2 years ago
Explore Statistics with R is a 5-week introductory-level course offered by the Karolinska Institutet through the edX platform, covering the basics of R, statistics and using R for statistical analysis. The course covers 3 main topic areas in 4 weeks--R basics, getting and manipulating data in R and statistical tests in R. The 5th week consists of a final graded assignment where you follow along with a research project conducted in R. Each week consists of a few short lecture videos followed by a series of graded quiz questions. The class awards a certificate if you achieve a total score of at least 60% on the quizzes and the final graded assignment.

Explore Statistics with R offers some quality content, but it is too short to constitute a complete intro to R or stats. The professor speaks clearly, explains concepts well and seems genuinely excited to be teaching the course. I noticed he seemed active on the course's discussion boards, which is nice to see. The quizzes were too few and a bit too easy: they generally tested conceptual knowledge and did not require the student to do much in R besides copy, paste and run code. As a course focused on statistical operations, it didn't teach basic programming concepts in R like control flow and functions. This course could benefit from having a bit more content each week and beefing up the homework exercises to force students to do a little bit more in R on their own. Extending the course by a couple of weeks would also give the professor time to cover some neglected topics like programming basics in R and more on data visualization. With expanded content this could be a great course, so hopefully they'll make some tweaks and additions and offer it again.

I give this course 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Networked Life

Written 2 years ago
Networked Life is a course offered by the University of Pennsylvania through Coursera, which provides a gentle introduction to network and graph theory. It covers the basics of network structure, network formation models and networked games. The course consists of 7 weeks of lecture content--typically three 8-20 minute videos per week--with an 8-10 question quiz for each video. The quizzes aren't too difficult and you get 2 attempts, but since there is one quiz for every lecture video, you'll be spending a significant proportion of your total class time answering quiz questions. The course doesn't get into network algorithms or computing: it focuses on basic network structure, formation and games, so you can take this course without any programming or math background.

Networked Life debuted about 2 years ago, making it among the first courses available on Coursera, so the presentation and slide quality are a bit dated. The lecturer mainly reads directly off slides and you spend the majority of lecture time looking at static slides written in Comic Sans as the lecturer explains them in greater detail. The information is solid and generally interesting but the presentation is often a bit dull when there are no illustrations on the screen. The quizzes are probably the best part of the course; even though they are easy they help reinforce the content and break what might otherwise become a tedious slog through lecture video after lecture video. The course is self-paced, so despite it having "7 weeks" of content, you can finish it faster if you want to.

Networked Life is an accessible introduction to networks and while the presentation isn't great, the topics are interesting and the frequent quizzes help keep you engaged.

I give Networked Life 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Introduction to Data Science

Written 2 years ago
Introduction to Data Science is a MOOC offered by the University of Washington on the Coursera platform. Introduction to Data Science is a misleading title for this course because it is not introductory level and it does not have a sensible flow that builds from one week to the next as you would expect from an intro course. Instead, the course acts as more of a data science sampler that introduces new topics each week that often have little to do with material covered in previous weeks. Lecture topics include relational databases, relational algebra, SQL, MapReduce, NoSQL, miscellaneous topics in statistics, machine learning, visualization and graph analytics. If that sounds like a disjointed smorgasbord of topics, it is. To make matters even more complicated, the programming assignments use three different languages: Python, R and SQL. This course is best suited for those who have some exposure to Python, R, SQL and statistics.

If you have the appropriate background knowledge, this course touches on many interesting topics and while the lecturer's delivery is not great, he is quite knowledgeable and the material usually isn't too hard to grasp. Although the homework assignments require different languages and may take you a while to complete, they are rewarding. For instance, you'll work with real Twitter data you capture from the net, implement MapReduce operations in Python and participate in a machine learning competition on Kaggle.com.

Introduction to Data Science is likely to be frustrating to those expecting a general intro to data science. The course jumps around too much and uses too many different tools to be a good first course in data science, but the breadth of topics covered and programming assignments make this course worth a look if you already have some exposure to data science or the tools the course uses. If nothing else, you can skip through the lectures and watch sections that are of particular interest to you.

I give this course 3 out of 5 stars: Satisfactory.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

Machine Learning

Written 2 years ago
Machine Learning, taught by Coursera founder and Stanford professor Andrew Ng, is one of the first programming MOOCs Coursera put online. Although Machine Learning has run several times since its first offering and doesn’t seem to have been changed or updated much since then, it holds up quite well. This course assumes that you have basic programming skills. Assignments also require many vector and matrix operations and slides include some long formulas expressed in summation notation, so it is recommended to have some familiarity with linear algebra. You don't need to know calculus or statistics to take this course, but you may gain deeper insight into some of the material if you do. The course uses the Octave programming language, a free clone of MATLAB.

The course runs 10 weeks and covers a variety of topics and algorithms in machine learning including gradient descent, linear and logistic regression, neural networks, support vector machines, clustering, anomaly detection, recommender systems and general advice for applying machine learning techniques. Lectures are split into 3 to 15 minute segments with periodic quizzes and each topic section has a corresponding quiz. Section quizzes are worth 1/3 of the total grade but you get unlimited attempts (with a 10-minute retry timer). Andrew Ng does a good job explaining dense material and slides, although the audio levels are often too low. If you don't have good speakers you might need headphones to hear him talk. The other 2/3 of the course grade is based on 8 multi-part programming assignments that typically involve filling in code for key functions to implement machine learning algorithms covered in lecture. The course gives you a lot of structure and direction for each homework, so it is generally pretty clear what you are supposed to do and how you are supposed to do it even if you don't understand 100% of the material covered in lecture. You need to achieve a total score of 80% to earn a certificate, so while you can retry quizzes and resubmit programming assignments you'll have to get most things to work in the end to get one.

Machine Learning is a great course if you can get past the quiet audio. If you've never used Octave or MATLAB before, don't let that stop you from taking this course: learning the basics necessary to do the assignments only takes a couple of hours and it will help you think of things in terms of vectorized operations.

I give this course 4.5 out of 5 stars: Great.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

Big Data Analysis with Apache Spark

Written 2 years ago


CS100.1x Introduction to Big Data with Apache Spark is a 5-week intro to distributed computing offered by UC Berkeley through the edX MOOC platform focused on teaching students how to perform large-scale computation using Apache Spark. The assignments use PySpark, Spark’s Python API, so some familiarity with Python programming is necessary. You don’t need prior exposure to big data or distributed computing to take the course. Grades are based on four programming labs (80%), easy comprehension questions that allow unlimited attempts (12%) and setup of the course virtual machine used to complete the labs (8%).

Course lectures in Introduction to Big Data with Apache Spark are relatively brief and tend to stay at a high level, discussing general big data concepts rather than the details of Apache Spark. The instructor does a fine job in the few lectures the course offers, but there were not enough of them and they often felt disconnected from the assignments. The fifth week had no lectures.

The labs are the core of this course. While you can breeze through weekly lectures in half an hour or less, each of the four labs is a lengthy reading and programming assignment packaged in an IPython notebook. Expect to spend 2 to 4 hours on labs 1, 2 and 4 and 3 to 6 hours on lab 3. The labs start by teaching basic Apache Spark manipulations and move on to some text analysis and machine learning. Using the IPython notebook to deliver labs is a convenient way to intermingle text and instructions with code. On the other hand, each exercise tends to depend on code executed somewhere above it, so a mistake made on an earlier exercise can lead to some odd errors later on and Spark’s error traces aren’t particularly helpful. The course does provide some basic tests for each exercise, but it is easy to arrive at solutions that pass the checks but cause errors later on. The course forums on Piazza are a vital resource for troubleshooting and disambiguation; I imagine some of the snags will be resolved in future offerings. Despite the occasional hiccups, the labs do a good job familiarizing students with Apache Spark’s Resilient Distributed Dataset objects and the various transformations and actions you can perform with them.

Introduction to Big Data with Apache Spark is a great place to start learning about distributed computing if you know some Python. Although the lectures don’t add much technical depth to the course, they provide some big picture background that will be useful for students who have little prior exposure to big data concepts. The labs give you adequate opportunity to get your hands dirty with Apache Spark and gain basic familiarity with the data manipulations it offers. UC Berkeley is offering a follow-up course “Scalable Machine Learning” that builds on the foundation laid in CS100.1x.

I give this course 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course.

Text Mining and Analytics

Written 2 years ago


Text Mining and Analytics is the fourth course in the Data Mining specialization offered by the University of Illinois at Urbana-Champaign through Coursera. Text Mining builds upon the second course in the specialization, Text Retrieval and Search Engines. Course topics include mining word relations, topic discovery, text clustering, text categorization and sentiment analysis. The course lists programming proficiency (especially in C++) and knowledge of probability and statistics as prerequisites. Keeping with the system established by other data mining specialization track courses, grading is based entirely upon 4 multiple choice quizzes with 10 questions apiece. You only get one attempt at the quizzes.

Text Mining and Analytics is information-packed. Each week has 2.5 to 4 hours of lecture content in video segments that generally range from 10 to 20 minutes. The video quality is satisfactory but the explanations and content on the slides could be a bit clearer. Despite the long videos, there are no comprehension questions or exercises to interact with during or after lecture segments to reinforce learning. By the time you reach the quiz at the end of the unit, you may find yourself having to go back and review certain videos to answer the questions. There is an optional programming assignment.

Text Mining and Analytics covers many useful data mining topics, but it has too much lackluster video content for its own good. I can’t help but feel like a better course would have been able to condense the videos down to cover the same topics in half the time, leaving room for more quizzes and exercises. This course could serve as useful reference material, but students watching straight through may find a lot of information going in one ear and out the other.

I give Text Mining and Analytics 2.5 out of 5 stars: Mediocre.
My rating
Gregory J Hamel ( Life Is Study) audited this course.

CS190.1x: Scalable Machine Learning

Written 2 years ago


Scalable Machine Learning is a 5-week distributed machine learning course offered by UC Berkeley through the edX platform. It is a follow-up to another UC Berkeley course: Introduction to Big Data with Apache Spark. Although the first course is not a strict prerequisite, Scalable Machine Learning uses the same virtual machine and even has some overlap with the homework labs, so it is beneficial to take Introduction to Big Data first. Scalable Machine Learning teaches distributed machine learning basics using PySpark, Apache Spark’s Python API. Basic proficiency with Python is necessary to pass the course and some exposure to algorithms and machine learning concepts is helpful. Course evaluation is based primarily on 5 labs distributed as IPython notebooks.

The first two weeks of the course cover machine learning basics and introduce Apache Spark. For students already familiar with machine learning basics who took Introduction to Big Data, there’s not much new to learn during the first two weeks. Week 2 is essentially an exact clone of week 2 of the Introduction to Big Data course, including the lab assignment. The final 3 weeks have meatier lecture content and longer labs, each covering a different machine learning technique--linear regression, logistic regression and principal component analysis.

The lecture content is clean and the lecturer speaks clearly. His delivery isn’t perfect, but the only real purpose of the lectures is to serve as background information for the meat of the course: the labs. Each lab is a lengthy IPython notebook with several sections leading you through the process of creating a pipeline for running a machine learning algorithm with PySpark. Much of the code you need is provided for you, but writing the key functions and data transformations necessary to complete the labs can still be time consuming. Little things like an ambiguous instruction or an uncaught error you made earlier in the assignment can result in bugs that take a while to squash. Despite occasional frustrations, the labs do a good job interspersing instruction with practical, hands-on learning.

Scalable Machine Learning is a quality introduction to machine learning with PySpark that focuses on labs over lectures. The lectures could be better and some of the instructions and error checks in the labs could be more comprehensive, but this is a great course for those looking to learn by doing.

I give Scalable Machine Learning 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

Statistics for Business – I

Written 2 years ago
Statistics for Business I is a spreadsheet-focused statistics course offered by the Indian Institute of Management, Bangalore through the edX platform. The course spans 5 weeks including 4 weekly lessons and one week for a final exam. Course topics include descriptive statistics, variable summaries, the shape of distributions and probability. The course has no prerequisites other than having access to Microsoft Excel. You may be able to get by with a free alternative like LibreOffice, but the course lectures use Excel. Grading is based on lecture comprehension questions, exercises, caselets and a final exam.

Weekly content in Statistics for Business I consists of a series of relatively short lectures interspersed with comprehension questions, followed by several exercises and caselets to let students apply what they’ve learned. The lectures themselves are well-made and strike a good balance between instructor face time and showing spreadsheet operations. The lead instructor, Shankar, is easy to understand and has some lighthearted yet instructive interactions with his brainy assistant Lysa (she’s a plastic brain that sits on his desk). Each week has a ton of comprehension questions and exercises to let students get practice with the spreadsheet operations and concepts presented in lecture. Hands-on practice is essential for skill building, so having plenty of exercises is a good thing.

Statistics for Business I starts out slow, but the pace picks up toward the final lessons. Some students might feel that the last couple of lessons cover too many concepts in one week. Although having plenty of exercises is generally a good thing, the large number of easy, repetitive exercises grew tiresome. The course might benefit from making some of the exercises optional so that students who need more practice can get it, while those who don’t can skip ahead.

Statistics for Business I is a good course for learning how to deal with numbers in Excel, but the large number of graded exercises can make things tedious at times. This course is best suited for beginners in statistics with basic knowledge of spreadsheets and those who know some statistics and want more experience using Excel. Statistics for Business II is set to launch in October 2015.

I give Statistics for Business I 4 out of 5 stars: Very good.

My rating
Gregory J Hamel ( Life Is Study) audited this course.

Data Visualization

Written 2 years ago
Data Visualization is the fifth and final course in the data mining specialization offered by the University of Illinois Urbana-Champaign on Coursera. The 4-week course provides a high-level overview of data visualization, covering topics like human visual perception, basic plotting constructs and design principles, visualizing networks and visualizing databases. The course doesn’t have any particular prerequisites, but knowing how to make plots with some software package or programming language will be helpful for the assignments. Grading is based on two quizzes and two peer-graded visualization projects.

The lecture content in Data Visualization is better than the lectures of the previous courses in the data mining specialization. The instructor is easy to understand and there isn’t as much dense technical content to absorb. On the downside, since the course focuses on high-level concepts, you won’t learn how to actually construct your own visualizations. It’s up to you to pick out software and figure out how to make visualizations with it. It would have been preferable for the entire data mining specialization to pick a programming language and stick with it throughout to pair concepts with specific implementations and exercises.

Data Visualization is a nice introduction to visualization at a high level, but the lack of low-level technical instruction and exercises limits its practical usefulness, especially for students who don’t already know how to create their own visualizations. The course is a relatively smooth end to what is otherwise a rocky specialization, but since the content has no real connection to the other courses in the data mining track, you could take it as a standalone course.

I give Data Visualization 3 out of 5 stars: Fair.

My rating
Gregory J Hamel ( Life Is Study) audited this course.

A Crash Course in Data Science

Written 2 years ago
A Crash Course in Data Science is a succinct, one-week overview of the field of data science produced by the same team from Johns Hopkins University that produced Coursera’s data science specialization. It is the first course in the “Executive Data Science” specialization, a data science track aimed at non-technical people like business managers. The course defines data science and then discusses different aspects of data science like statistics, machine learning and the structure, output and success metrics for data science projects. Grading is based on a handful of short multiple-choice comprehension quizzes.

A Crash Course in Data Science is good for what it is: a brief overview of a field taught at a high level so that anyone can follow along. The professors have plenty of face time, explain concepts well and the video quality is good. The content quality is a definite step up from the original Johns Hopkins data science track.

The only real knock against this course is its brevity and the fact that it costs the full $49 to get a verified certificate if you want to complete the specialization. A course that you can complete in an hour or two should not cost the same as a month-long course. Students looking to sink their teeth into something substantial for the first month of the Executive Data Science specialization may be disappointed.

A Crash Course in Data Science is a well-made primer on the data science field, but its brevity may leave paying students wanting.

For freeware students I give this course 4 out of 5 stars: Very Good.

My rating
Gregory J Hamel ( Life Is Study) completed this course.

Building a Data Science Team

Written 2 years ago
Building a Data Science Team is the second course in the “Executive Data Science” specialization offered by Johns Hopkins University on Coursera. It is a one-week course that defines the different data science roles in an organization, what to look for in data scientists and strategies for managing and communicating with data scientists. The course has no prerequisites and grading is based on a handful of multiple-choice quizzes.

The content in Building a Data Science Team is similar to the first course in the specialization: it is geared toward non-technical people who have to manage data scientists. The video quality is good and the instructor is personable, easy to understand and knowledgeable. There’s not too much to dislike about this course apart from its brevity. All of the courses in the Executive Data Science track are only a week long, so they can be completed in one or two learning sessions. This is not necessarily a bad thing: I find it refreshing to get a high-level overview of a topic in a short course, but it may not deliver the amount of content that paying students expect.

Building a Data Science Team is a good course for what it is: a succinct primer on how to assemble and manage a data science team.

I give this course 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be very easy.

Managing Data Analysis

Written 2 years ago
Managing Data Analysis is the third course in the “Executive Data Science” specialization offered by Johns Hopkins University on Coursera. The one-week course discusses the process of data analysis at a high level from formulating questions to exploratory analysis, inference, modeling and communicating results. Grading is based on several short comprehension quizzes.

The lectures in Managing Data Analysis are of good quality and the instructor is generally easy to understand. The lectures do, however, use some jargon and concepts that aren’t always adequately explained. Unlike the first two courses of the specialization, which are geared toward managers, this course is more geared toward people who are actually going to be conducting data analysis. The concepts in this course are definitely important for data science managers to understand, but non-technical students may find this to be a jarring change of pace. In addition, certain parts may be confusing if you have had no prior exposure to statistics or machine learning other than the first two courses of this specialization.

Managing Data Analysis provides a useful overview of the process of data analysis, but it is taught at a level appropriate for data analysts. “The Data Analysis Process” would be a more appropriate name for this course.

I give Managing Data Analysis 3.5 out of 5 stars: Good.

My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Data Science in Real Life

Written 2 years ago
Data Science in Real Life is the fourth and final course in the “Executive Data Science” specialization offered by Johns Hopkins University on Coursera. The one-week course examines various steps in the data analysis process and contrasts ideal outcomes against the outcomes you are likely to experience in reality. Grading is based upon a few short multiple-choice quizzes.

The lecture videos are crisp and the professor does a good job explaining the topics without being overly technical. The course does discuss some topics that you won’t fully appreciate without hands-on experience doing data science projects, but it will help prepare you for some of the problems you might encounter. Like other courses in the Executive Data Science track, there’s not too much to dislike about this course other than its brevity and the limited depth at which topics can be covered in a one-week course.

Data Science in Real Life is a nice, succinct overview of many of the challenges you are likely to face in data projects and suggestions for overcoming them. It raises considerations that could be useful for both data analysts and managers.

I give Data Science in Real Life 4 out of 5 stars: Very Good.

My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Business Metrics for Data-Driven Companies

Written 2 years ago


Business Metrics for Data-Driven Companies is the first course in the “Excel to MySQL: Analytic Techniques for Business Specialization” offered by Duke University through Coursera. This short 4-week, self-paced course introduces the concept of business metrics and the role they play in business analytics. It also spends some time discussing the various data-centric roles at different types of companies. The course has no prerequisites and grading is based on 3 multiple-choice quizzes and a final case-study assignment.

The lecture content in Business Metrics is crisp and the lecturer is easy to understand. There are only 3 short weeks of lecture content as the final week is devoted to the case study. The peer-graded case study assignment involves identifying and explaining a business metric in a fictitious business. The course explains several common business metrics in detail but doesn’t spend as much time on how to use metrics to formulate questions, inform analysis or make decisions. Hopefully these are topics that will be covered in more detail in some of the upcoming courses in the specialization.

Business Metrics for Data-Driven Companies is a good overview of business metrics and business data culture. As the first part of a larger specialization, it concludes before you get to use the metrics you learn about in any sort of analysis. The value of this course will ultimately depend upon whether the follow-up courses make good use of the foundation it lays.

As a standalone course, I give Business Metrics for Data-Driven Companies 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be easy.

Foundations of strategic business analytics

Written 2 years ago
Foundations of Strategic Business Analytics is the first course in the “Strategic Business Analytics Specialization” offered by ESSEC business school on Coursera. The 4-week course covers data analysis topics including clustering, exploring relationships between variables, forecasting and communicating results. All discussion is geared toward a business context, so the focus is on producing clear, actionable insight instead of looking at low-level details. The course uses the R programming language for analysis; basic familiarity with R is assumed. Grading is based on 3 quizzes and a peer-graded assignment.

Each week consists of two main content sections: a lecture section that introduces concepts and data analysis techniques and then a recital section that teaches you how to use the methods discussed in lecture in R. The lecture videos themselves are polished with nice text graphics. The lecturer’s English takes a little time to get used to, but he speaks clearly and he does a good job framing each topic in the context of business. The programming recitals are easy to follow and let you get some hands-on experience with lecture topics right away.

Foundations of Strategic Business Analytics is a good introduction to thinking about data analytics in a business setting, but it is a bit short if taken as a standalone course. Follow-up courses should let you sink your teeth deeper into the material. Also note that the specialization is listed as “Advanced”, but this course is not very technical and only really requires basic R knowledge as a prerequisite.

I give Foundations of Strategic Business Analytics 4 out of 5 stars: Very Good.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be easy.

Machine Learning Foundations: A Case Study Approach

Written 2 years ago
Machine Learning Foundations: A Case Study Approach is a 6-week introductory machine learning course offered by the University of Washington on Coursera. It is the first course in a 5-part Machine Learning specialization. The course provides a broad overview of key areas in machine learning, including regression, classification, clustering, recommender systems and deep learning, using short programming case studies as examples. The course assumes basic Python programming skills and it uses a software package called GraphLab that requires a 64-bit operating system running Python 2.7. Grades are based on periodic comprehension quizzes and short programming assignments.

The course covers a broad range of machine learning topics at a high level with the promise of drilling down into the details in future courses in the specialization. The lecturers have good chemistry, but they tend to get distracted when they are on screen together. The video and slide quality are very good and although the delivery is a little rough around the edges at times, the lectures are informative. The machine learning methods covered aren’t necessarily treated as complete black boxes, but the course intentionally avoids getting too deep into the details, putting the emphasis on conceptual understanding.

The weekly labs are contained in short IPython Notebooks--interactive text and code documents rendered in a web browser--that illustrate some simple models in GraphLab. The labs themselves are easy and don’t require much coding other than calling various built-in GraphLab functions. The hardest part about the class is getting your programming environment set up in the first place. If you don’t have a new version of 64-bit Python 2.7, you can’t run GraphLab. It is relatively easy to get set up if you can use the recommended Anaconda Python distribution, but getting things set up manually on an existing Python installation may prove troublesome. The instructors provided some workarounds for doing the course without GraphLab or using GraphLab on Amazon’s cloud computing service; I wouldn’t take the course without getting GraphLab working in some form. Many students decried the use of a non-open source package for an open class; I think it is useful to be exposed to new tools and GraphLab seems cleaner than Python’s popular scikit-learn package. In this sort of course, the focus should be on concepts rather than syntax.

Machine Learning Foundations: A Case Study Approach achieves its goal of introducing machine learning at a high level without rushing or trying to cram too much into any particular week. What the professors lack in terms of polish they make up for with enthusiasm. Compatibility and setup issues will be a roadblock for some, but overcoming them is worth it.

I give Machine Learning Foundations: A Case Study Approach 4.5 out of 5 stars: Great.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Analyzing and Visualizing Data with Excel

Written a year ago


Excel for Data Analysis and Visualization is an intermediate level course offered by Microsoft through the edX platform that covers cutting-edge techniques for gathering, transforming and viewing data in Excel. The course focuses on getting students up to speed with new features and techniques offered in Excel 2016, such as the Excel data model, queries, DAX (a syntax for defining functions) and Power BI, an online productivity service that integrates with Excel. This course assumes you have some familiarity with MS Excel, particularly pivot tables and slicers. You can complete the course with Excel 2010 or 2013, but if you don't have Excel 2016 you'll have to download add-ins and you'll have to work slightly harder to complete the assignments. Grading is based on 7 weekly labs and 12 comprehension quizzes.

Weekly content in DAT206x consists of one to three short video lectures describing new Excel features followed by a comprehension quiz. The amount of video content per week is usually under 30 minutes, so you shouldn't need to commit more than an hour or two a week to complete the course. The lecture videos have adequate resolution to see cell values and the lecturer's presentation is easy to follow. Weeks 1-7 have lab assignments that let you apply the techniques presented in lecture. You only get a couple of submissions for most lab and quiz questions, but most questions are not too difficult.

Excel for Data Analysis and Visualization is a succinct, informative course on new Excel features that is worth checking out for those interested in going beyond the basics. Using Excel 2016 for this course when it launched only a few months before the course debuted may partially be a ploy to convince Excel users to upgrade, but I can't fault Microsoft for teaching with the latest version of their own product and I completed the course with Excel 2010 without much difficulty.

I give Excel for Data Analysis and Visualization 4 out of 5 stars: very good.
My rating
Gregory J Hamel ( Life Is Study) completed this course.

DAT203x: Data Science and Machine Learning Essentials

Written a year ago


Data Science and Machine Learning Essentials is a 5-week introductory data science course offered by Microsoft through edX that focuses on teaching students how to use Microsoft's cloud-based machine learning platform, Azure ML. The course divides content into two tracks, an R track and a Python track, so you can complete the course with either language, but you'll need to know the basics of at least one of the two. Grading is based on 5 weekly reviews and a single 20 question exam.

The course title "Data Science and Machine Learning Essentials" is misleading because this course is not really about data science or machine learning per se. The first week attempts to cram an entire machine learning course or two worth of concepts into a handful of mediocre lectures, while the remainder of the course is all about Azure ML. Weeks 2-5 provide a nice overview of Azure ML and the fact that it has full lectures for both R and Python is a great feature that surely took a lot of extra time and effort to produce. The main lecturer's presentation skills aren't the best, but the videos are still easy to follow. Azure ML offers a lot of interesting functionality, like the ability to use Python and R scripts in the same project and publish projects as web services, but some of the exercises were tedious and ran slowly.

If "Data Science and Machine Learning Essentials" were renamed "Intro to Azure ML" and only included the content in weeks 2-5, it would be a good course. Weeks 2-5 are definitely worth checking out if you are interested in Azure ML. As it stands now, however, the first week bombards students with far too many concepts explained too quickly to foster real understanding and sets the wrong expectations for the remainder of the course.

I give Data Science and Machine Learning Essentials 2.75 out of 5: mediocre.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be easy.

Data Visualization and Communication with Tableau

Written a year ago
Data Visualization and Communication with Tableau is the third course in Duke University's "Excel to MySQL: Analytic Techniques for Business" specialization offered on Coursera. The 5-week course is essentially an introduction to Tableau (weeks 2 and 3) book-ended by some lectures on considerations and best practices for communicating data insights in a business setting (weeks 1 and 4). The final week is devoted to a peer-reviewed assignment and has no new lecture content. The course provides you with a free temporary license for the desktop version of Tableau. You can get through this course without any background knowledge, although some knowledge of MS Excel will help you appreciate some of the comparisons it makes. Grading is based on 4 weekly quizzes and a peer-graded assignment.

Data Visualization has quality lectures that do a good job introducing Tableau in the context of creating visualizations for a business context. The Tableau walkthroughs are easy to follow and give you an appreciation for how much easier it is to make nice visualizations in Tableau than it is in Excel. You use the same data sets for the entire course, one data set for walkthroughs and one for homework assignments, which provides a nice sense of consistency. Weeks 1 and 4 raise some useful considerations to keep in mind when preparing for and presenting a data analysis, but the Tableau sections in weeks 2 and 3 are the heart of the course. I would have preferred more content covering the ins and outs of Tableau instead of the 2 weeks spent on communication topics, but the mix is probably about right for business-oriented students.

I give Data Visualization and Communication with Tableau 4 out of 5 stars: very good.
My rating
Gregory J Hamel ( Life Is Study) audited this course.

Machine Learning: Regression

Written a year ago
Machine Learning: Regression is the second course in the 6-part Machine Learning specialization offered by the University of Washington on Coursera. The 6-week course builds from simple linear regression with one input feature in the first week to ridge regression, the lasso and kernel regression. Week 3 also takes a detour to discuss important machine learning topics like the bias/variance trade-off, overfitting and validation to motivate ridge and lasso regression. Like the first course in the specialization, "Regression" uses GraphLab Create, a Python package that will only run on the 64-bit version of Python 2.7. You can technically use other tools like Scikit-learn or even R to complete the course, but using GraphLab will make things much easier because all the course materials are built around it. Knowledge of basic calculus (derivatives), linear algebra and Python is recommended. Grading is based upon weekly comprehension quizzes and programming assignments.

Each week of Machine Learning: Regression tackles a specific topic related to regression in significant depth. The lectures take adequate time to build your understanding and intuition about how the techniques work and go deep enough that you could implement the algorithms presented yourself. The presentation slides are high quality and available as .pdf downloads, although the text written by the lecturer isn't particularly neat. The lecturer isn't the best orator around but she manages to explain topics well and the course takes plenty of time to cover important considerations and review key concepts at the end of each week. Overall, the pacing and organization of course materials is excellent and the presentation, while not perfect, is personable and clear.

Every lesson in "Regression" has at least one accompanying programming assignment that explores the topics covered in lecture. The assignments are contained in Jupyter (iPython) notebooks and come with all the explanatory text and support code you need to complete them. The labs walk you through implementing some key machine learning algorithms like simple linear regression, multiple linear regression with gradient descent, ridge regression, lasso with coordinate descent and k-nearest neighbors regression. The assignments are not particularly difficult, as much of the code is already written for you and most tasks you have to perform are spelled out in great detail, sometimes to the point where each line of code you have to write is noted in a text comment. Some may not appreciate this level of guidance, but it keeps the assignments moving along at a steady pace, puts the focus on understanding machine learning concepts rather than programming skills and limits time wasted troubleshooting bugs.
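
For readers who haven't seen these techniques, here is a minimal sketch of the kind of algorithm the labs have you implement: linear regression with one feature fit by gradient descent, with an optional ridge (L2) penalty on the slope. This is my own illustration in plain Python, not code from the course, which uses GraphLab; the function name, step size, iteration count and toy data are all assumptions made for the example.

```python
def ridge_gradient_descent(xs, ys, l2_penalty=0.0, step_size=0.05, iterations=2000):
    """Fit y ~ intercept + slope * x by gradient descent.

    Minimizes mean squared error plus l2_penalty * slope**2.
    The intercept is left unpenalized, a common convention.
    """
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(iterations):
        # Prediction errors under the current coefficients.
        errors = [intercept + slope * x - y for x, y in zip(xs, ys)]
        # Partial derivatives of the objective.
        grad_intercept = 2.0 * sum(errors) / n
        grad_slope = (2.0 * sum(e * x for e, x in zip(errors, xs)) / n
                      + 2.0 * l2_penalty * slope)
        # Step downhill.
        intercept -= step_size * grad_intercept
        slope -= step_size * grad_slope
    return intercept, slope


# Toy data that lies exactly on the line y = 1 + 2x (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

plain = ridge_gradient_descent(xs, ys)                  # no penalty
ridge = ridge_gradient_descent(xs, ys, l2_penalty=1.0)  # slope shrunk toward 0
```

With no penalty the fit recovers the line that generated the data; with the penalty turned on, the slope is pulled toward zero, which is the trade-off the course uses to motivate ridge regression.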

Machine Learning: Regression is an excellent introduction to regression that covers several key machine learning algorithms while building understanding of fundamental machine learning concepts that extend beyond regression. If you have any interest in regression and have an environment that can run GraphLab, take this course.

I give Machine Learning: Regression 5 out of 5 stars: Excellent.
My rating
Gregory J Hamel ( Life Is Study) completed this course and found the course difficulty to be medium.

Managing Big Data with MySQL

Written a year ago


Managing Big Data with MySQL is the fourth and final course in Duke University's Excel to MySQL: Analytic Techniques for Business specialization offered through Coursera. The 5-week course focuses on teaching students how to make relational database queries. Unlike some database courses that delve into details concerning database construction and theory, this course is all about the practical use of databases from the perspective of a business analyst. The first week introduces the concept of relational databases, entity relationship diagrams and schema, while the remainder of the course covers querying from simple select statements to summary functions, grouping, joins and subqueries. You don't need any particular background to take this course and it could be taken in isolation from the rest of the specialization. Grading is based on 4 week-end multiple-choice quizzes.
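
To give a flavor of the querying topics listed above (summary functions, grouping, joins and subqueries), here is a small sketch. It is not taken from the course: it uses Python's built-in sqlite3 module as a stand-in for MySQL or Teradata, and the table names, columns and rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for a real MySQL/Teradata server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stores (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, store_id INTEGER, amount REAL);
INSERT INTO stores VALUES (1, 'Austin'), (2, 'Denver');
INSERT INTO sales VALUES (1, 1, 20.0), (2, 1, 30.0), (3, 2, 15.0);
""")

# Join + grouping + summary function: total sales per city.
totals = conn.execute("""
SELECT st.city, SUM(sa.amount) AS total
FROM sales AS sa
JOIN stores AS st ON st.store_id = sa.store_id
GROUP BY st.city
ORDER BY total DESC
""").fetchall()

# Subquery: sales larger than the average sale amount.
above_average = conn.execute("""
SELECT sale_id
FROM sales
WHERE amount > (SELECT AVG(amount) FROM sales)
ORDER BY sale_id
""").fetchall()

conn.close()
```

The SQL itself is standard enough that the same statements should look familiar in MySQL, though dialect details differ between database systems.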

Weekly course content is divided into several lessons that typically involve watching a short video segment and then working through an exercise set in MySQL or Teradata, two relational databases used in the course. The lecture content is high quality but after the first week, you'll be spending most of your time working on exercises rather than watching videos. In fact, some lessons don't have video lectures at all: the written exercises are really the core of the course. The MySQL exercises are contained in Jupyter notebooks--interactive text and code documents--that let you read instructions and play around with code in the same place. The exercises provide plenty of opportunity to drill SQL queries and build SQL vocabulary. The answers to exercise questions are provided in PDFs (they are ungraded), which means you can skip ahead if you don't need more practice. Considering each week after the first has at least 3 exercise sets plus a quiz, each of which could take a few hours to complete in their entirety, consulting the answer keys frequently is recommended to keep things moving along at a reasonable pace.

At the end of each week after the first you'll do a final exercise set using Teradata and answer multiple choice quiz questions based on your results. You use the same real-world data set for each quiz--product information from Dillard's department stores--helping you build some familiarity with the data by the end of the course. The final week of the course doesn't cover any new material: it just contains the final quiz.

Managing Big Data with MySQL is a great course for learning practical relational database querying skills with plenty of exercises that let you interact with real-life data sets. The focus on drilling ungraded exercises combined with sparing use of lectures after the first week does, however, make the course feel impersonal. It plays out more like a collection of training materials than the sort of university-style course you may expect from Coursera.

I give Managing Big Data with MySQL 4.5 out of 5 stars: Great.

My rating
Gregory J Hamel ( Life Is Study) completed this course.

Deep Learning

Written a year ago


Udacity's "Deep Learning" is a 4-lesson data science course built by Google that covers artificial neural networks. The first lesson builds up some machine learning background on classification problems, while lesson 2 discusses the basic machinery of neural networks and deep learning (neural networks with multiple layers). Lesson 3 covers convolutional networks for image recognition and lesson 4 covers recurrent networks and issues dealing with text data. This course assumes you have intermediate Python programming experience and basic knowledge of machine learning, statistics, linear algebra and calculus.

Each lesson in the course consists of a series of short video lecture segments with occasional comprehension questions and breaks to apply topics discussed in programming assignments. The video quality itself is good and the lecture quality is adequate, but the lecture segments are very brief, with most lasting around a minute or less. The sum total of the video content in the third lesson on convnets is less than 15 minutes. The programming assignments, which use a popular neural network library called TensorFlow, are lacking in instruction and involve either running large chunks of provided code or working on open-ended questions. You likely won't be able to make much progress on the assignments without prior knowledge of machine learning and TensorFlow or doing a lot of extra research outside of the course materials. The programming problems also require significant computing resources; my laptop with 8GB of RAM ran out of memory when running the provided code in the first assignment.

Deep Learning is a shallow course that is akin to reading CliffsNotes instead of a textbook: you'll learn some terminology and be exposed to some interesting concepts but its abbreviated coverage is likely to confuse students who are new to neural networks while leaving more experienced students unsatisfied. This course seems like a rushed attempt to capitalize on the hottest buzzword in the hottest tech industry, which is a shame because it could have been a good course if it took the time to cover the topics in adequate detail.

I give Deep Learning 2 out of 5 stars: Disappointing.

*If you're interested in learning about the topics this course introduces in much more depth, check out the video lectures and course materials for CS231n, a deep learning course focused on image recognition offered by Stanford.
My rating
Gregory J Hamel ( Life Is Study) audited this course.

Machine Learning: Classification

Written a year ago


Machine Learning: Classification is the third course in the 6-part machine learning specialization offered by the University of Washington on the Coursera MOOC platform. The first two weeks of the 7-week course discuss classification in general, logistic regression and controlling overfitting with regularization. Weeks 3 and 4 cover decision trees, methods to control overfitting in tree models and handling missing data. Week 5 discusses boosting as an ensemble learning method in the context of decision trees. Weeks 6 and 7 cover precision and recall as alternatives to accuracy for assessing model performance and stochastic gradient ascent to make models scalable.

The course builds on the concepts covered in Machine Learning: Regression, so it is highly recommended that you take it first. Assignments use GraphLab, a Python package that requires the 64-bit version of Python 2.7. You can technically complete the course with whatever language and tools you like, but using Python and GraphLab will make your life much easier because the assignments are designed around it. Like the previous course, basic knowledge of Python, derivatives and matrices is recommended, but the course doesn't get too deep into math. Grading is based on weekly quizzes and programming assignments.

Machine Learning: Classification follows in the footsteps of the regression course, offering a good mix of high quality instructional videos and illustrative programming assignments. Carlos Guestrin takes the reins in the course (Emily Fox, the professor for the regression course, does not make an appearance) but the presentation format and style are mostly unchanged: videos break topics down into well-organized and digestible 1 to 7 minute chunks. The slides are crisp and generally uncluttered. Some of the most complicated sections are optional, so you can skip them without affecting your performance on the programming assignments and quizzes.

The programming assignments are provided in Jupyter notebooks--interactive text and code documents that run in your browser. They do a good job illustrating the concepts and walking you through the process of implementing machine learning algorithms. Although the course claims that you'll be implementing algorithms yourself from scratch, they provide a ton of setup, support and skeleton code: you don't need to define a single function yourself. Instead, you follow along with instructions and fill in key pieces of code in the bodies of certain pre-defined functions to get things working. Essentially every line of code you need to write has a comment giving you the gist of what you are supposed to do. Some may not appreciate this degree of hand-holding, but it keeps the assignments moving along steadily and puts the focus on learning and understanding concepts rather than coding details and debugging.

My only major gripe with this course is with some of the decisions concerning which topics to cover. The course mentions random forest models briefly at the end of the section on boosting, but the topic warrants a little more detail. A single 5-8 minute video would have been enough. The course does not mention support vector machines at all. The professor stated in the forums that he may release some videos on SVMs in the future but they were not included at launch since they are more complicated than other models and do not scale well to large datasets. The section on decision trees only discusses misclassification error as a metric for splitting, failing to mention information gain or Gini impurity, which are often preferred in practice. Similarly, the boosting section focuses on AdaBoost, while stochastic gradient boosting and XGBoost in particular are often more successful in practice. The final week's title "scaling to huge data sets and online learning" is a little misleading because it only really covers stochastic gradient ascent and mini-batch gradient ascent.
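As a rough illustration of why the choice of splitting metric matters, here is a minimal Python sketch. It is not code from the course: the class counts and the function names are made up for the example. It compares two candidate splits of a node holding 400 examples of each class. Misclassification error scores the two splits identically, while Gini impurity and entropy (the quantity behind information gain) both favor the split that produces a pure child node.

```python
import math

def misclassification_error(counts):
    """Fraction of examples not in the majority class of a node."""
    total = sum(counts)
    return 1.0 - max(counts) / total

def gini_impurity(counts):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def entropy(counts):
    """Shannon entropy in bits; empty classes contribute nothing."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def weighted_impurity(children, impurity):
    """Average impurity of the child nodes, weighted by node size."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)

# Hypothetical (class 0, class 1) counts in each child node.
split_a = [(300, 100), (100, 300)]  # two moderately mixed children
split_b = [(200, 400), (200, 0)]    # one mixed child, one pure child

for name, fn in [("misclassification", misclassification_error),
                 ("gini", gini_impurity),
                 ("entropy", entropy)]:
    a = weighted_impurity(split_a, fn)
    b = weighted_impurity(split_b, fn)
    print(f"{name:>17}: split A = {a:.4f}, split B = {b:.4f}")
```

Lower values are better for all three metrics. Misclassification error ties the two splits at 0.25, so a tree grown with it has no reason to prefer the split that isolates a pure group, whereas the other two metrics do.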

Machine Learning: Classification is a great first course for learning about classification that benefits from good organization and illustrative programming assignments. The course does, however, eschew some important topics in favor of simplicity; including a few more optional videos covering these topics would give the course the breadth and depth advanced learners desire without harming its accessibility.

I give Machine Learning: Classification 4.5 out of 5 stars: Great.
My rating
Gregory J Hamel ( Life Is Study) completed this course.

Data Science Ethics

Written 10 months ago
DS101x Data Science Ethics is a 4-week survey of ethical issues that arise in data science offered by the University of Michigan through the edX MOOC platform. The course includes 9 modules that begin with a basic overview of ethics and the history of ethics, followed by discussions of data ownership, privacy, anonymity, data validity, algorithmic fairness and societal consequences. There are no real prerequisites to take the course, although some familiarity with data science will give you more insight into the material. Grading is based on 9 quizzes (one for each module) that allow unlimited attempts and a written, peer-graded case study. You can earn enough points to pass the course without doing the peer-graded assignment.

Each module is divided into three parts: video lectures, case study lectures and a quiz. The video lecture sections generally consist of about three video segments that run from 4 to 12 minutes each that introduce and discuss major course topics. The lecturer speaks clearly and the slides and video quality are good, but the lecturer's delivery is somewhat monotonous. I recommend increasing the playback speed to keep things moving along at a reasonable pace. The case study lectures look at specific real-life instances of ethical issues presented in the main video lectures. The quizzes are mostly true/false response word problems and you can submit your answers as many times as you want so it is easy to get 100 percent.

Data Science Ethics raises a variety of ethical issues data science practitioners should consider when collecting and using data, but ethics are largely subjective so it can't provide definitive prescriptions about what you should or shouldn't do. It will help you raise relevant ethical questions, but it won't necessarily help you answer them. I found that the case studies were often more relevant and interesting than the main lectures.

Data Science Ethics provides a nice overview of some of the ethical implications of data science and requires a minimal time commitment. Just be aware that the subject matter is subjective so the professor can only really present his take on the issues.

I give Data Science Ethics 3.5 out of 5 stars: Good.
My rating
Gregory J Hamel ( Life Is Study) completed this course.

Machine Learning: Clustering & Retrieval

Written 10 months ago
Machine Learning: Clustering & Retrieval is the fourth course in the University of Washington's 6-part machine learning specialization on Coursera. The 6-week course covers several popular techniques for grouping unlabeled data and retrieving items similar to items of interest. After a short intro in week 1, the course covers k-nearest neighbor search, k-means clustering, Gaussian mixture models, latent Dirichlet allocation and hierarchical clustering. It is recommended that you complete the first 3 courses in the specialization track before taking this course, but you could take it as a standalone course as long as you know a bit of Python and probability. Grading is based on a series of comprehension quizzes and labs, but you must pay for a verified certificate to gain access to graded assignments. Thankfully you can still download and complete the labs without doing the associated quizzes, so you won't miss too much as a freeware student.

Clustering and Retrieval has a good balance of lecture content and labs that illustrate concepts covered in lecture. The professor is easy to understand and the lecture slides are well done. The course generally has good pacing and devotes plenty of time to each of the main weekly topics, taking care to explain important considerations like different algorithmic approaches to each method and similarities between different techniques. It does, however, go off on a couple of tangents, introducing MapReduce and hidden Markov models, neither of which is covered in much detail or addressed in the labs.

The labs use a data set of Wikipedia articles about famous people as an example to illustrate clustering and retrieval. Using the same data set for multiple labs is always a good idea because it lets students focus on the techniques themselves instead of having to familiarize themselves with new data. The amount of actual coding you have to do in the labs is minimal. The labs are more like interactive explorations of machine learning techniques with occasional one-line fill in the blanks than full-on coding assignments. You'll spend more time reading text, running provided code and analyzing results than writing code yourself. You can look at and answer the lab quiz questions as you go along but you can't actually submit them and get graded feedback without joining the verified track.

Machine Learning: Clustering & Retrieval is a great course that covers many of the most common clustering techniques with adequate depth while remaining accessible. Although the coding required is minimal, it is not an easy course: some of the concepts may take a couple of watch-throughs to sink in and you may struggle with certain concepts if you don't have prior knowledge of probability. Aside from the need to pay to gain access to graded quizzes and a few topics that felt tacked on, there's not much to dislike about this course.

I give Machine Learning: Clustering & Retrieval 4.5 out of 5 stars: Great.
My rating
Gregory J Hamel ( Life Is Study) audited this course and found the course difficulty to be medium.