39 minute read. Written by David Venturi. Published on May 8, 2017

Editor’s note: Drop us a note at guides@class-central.com if you have any feedback or requests for particular career guides. We are also looking for contributors!

Here are the parts of the series that have been published so far:

  1. The Best Intro to Programming Courses for Data Science
  2. The Best Statistics & Probability Courses for Data Science
  3. The Best Intro to Data Science Courses
  4. The Best Data Visualization Courses
  5. The Best Machine Learning Courses (this one)

Our Pick

The best online machine learning course is Stanford University’s Machine Learning. Taught by the famous Andrew Ng, Google Brain founder and former chief scientist at Baidu, it covers all aspects of the machine learning workflow and several algorithms. Ng is a dynamic yet gentle instructor who inspires confidence. It has a 4.7-star weighted average rating over 422 reviews.

Machine Learning by Stanford University via Coursera

A New Ivy League Introduction with a Brilliant Professor

Machine Learning by Columbia University is a relatively new offering that is part of their Artificial Intelligence MicroMasters. The few reviews it has so far are exceptionally strong, with high praise for the instructor (Dr. John Paisley). The course covers all aspects of the machine learning workflow and more algorithms than the above Stanford offering. It is also a more advanced introduction than Stanford’s. It has a 4.8-star weighted average rating over 10 reviews.

Machine Learning by Columbia University via edX

A Practical Intro in Python & R from Industry Experts

Machine Learning A-Z™ on Udemy is an impressively detailed offering that provides instruction in both Python and R. It covers the entire machine learning workflow and a huge number of algorithms. The prerequisites listed are “just some high school mathematics,” so this course might be a better option for those daunted by the Stanford and Columbia offerings. The instructors are revered for their ability to “make the complex simple.”

Machine Learning A-Z™: Hands-On Python & R In Data Science by Kirill Eremenko, Hadelin de Ponteves, and the SuperDataScience Team via Udemy

Table of Contents

  1. Why You Should Trust Us
  2. About the Data Science Career Guide
  3. How We Picked Courses to Consider
  4. How We Tested
  5. What Is Machine Learning? What Is a Workflow?
  6. Do These Courses Cover Deep Learning?
  7. Recommended Prerequisites
  8. Our Pick
  9. A New Ivy League Introduction with a Brilliant Professor
  10. A Practical Intro in Python & R from Industry Experts
  11. The Competition
  12. About Class Central Career Guides
  13. Author Bio

Why You Should Trust Us

I started creating my own data science master’s degree using online courses a year and a half ago. I have taken many data science-related courses, including a few machine learning courses, and audited portions of many more. I know the options out there, and what skills are needed for learners preparing for a data analyst or data scientist role.

For this guide, I spent a dozen hours trying to identify every online machine learning course offered as of May 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community, and its database of thousands of course ratings and reviews.

Class Central Home Page

Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.

About the Data Science Career Guide

Class Central’s Data Science Career Guide is a six-piece series that recommends the best MOOCs for launching yourself into the data science industry. The first five pieces recommend the best courses for several data science core competencies (programming, statistics, the data science process, data visualization, and machine learning). The final piece is a summary of those courses and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.

Here are the parts of the series that have been published so far:

  1. The Best Intro to Programming Courses for Data Science
  2. The Best Statistics & Probability Courses for Data Science
  3. The Best Intro to Data Science Courses
  4. The Best Data Visualization Courses
  5. The Best Machine Learning Courses (this one)

P.S. If you are looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.

How We Picked Courses to Consider

Each course must fit three criteria:

  1. It must have a significant amount of machine learning content. Ideally, machine learning is the primary topic. Note that deep learning-only courses are excluded. More on that later.
  2. It must be on-demand or offered every few months.
  3. It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn, this guide focuses on courses. Courses that are strictly videos (i.e. with no quizzes, assignments, etc.) are also excluded.

We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy, we chose to consider the most-reviewed and highest-rated ones only. There’s always a chance that we missed something, though. So please let us know in the comments section if we left a good course out.

How We Tested

We compiled average ratings and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. We read text reviews and used this feedback to supplement the numerical ratings.

We made subjective syllabus judgment calls based on three factors:

  1. Explanation of the machine learning workflow. Does the course outline the steps required for executing a successful ML project? See the next section for what a typical workflow entails.
  2. Coverage of machine learning techniques and algorithms. Are a variety of techniques (e.g. regression, classification, clustering, etc.) and algorithms (e.g. within classification: naive Bayes, decision trees, support vector machines, etc.) covered or just a select few? Preference is given to courses that cover more without skimping on detail.
  3. Usage of common data science and machine learning tools. Is the course taught using popular programming languages like Python, R, and/or Scala? How about popular libraries within those languages? These aren’t necessary, but they are helpful, so slight preference is given to courses that use them.

What Is Machine Learning? What Is a Workflow?

A popular definition originates from Arthur Samuel in 1959: machine learning is a subfield of computer science that gives “computers the ability to learn without being explicitly programmed.” In practice, this means developing computer programs that can make predictions based on data. Just as humans can learn from experience, so can computers, where data = experience.

A machine learning workflow is the process required for carrying out a machine learning project. Though individual projects can differ, most workflows share several common tasks: problem evaluation, data exploration, data preprocessing, model training/testing/deployment, etc. Below you’ll find a helpful visualization of these core steps:

Machine Learning Workflow

The machine learning workflow, via UpX Academy

The ideal course introduces the entire process and provides interactive examples, assignments, and/or quizzes where students can perform each task themselves.
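
To make these steps concrete, here is a minimal sketch of one pass through the workflow in Python with scikit-learn. The dataset, model, and parameter choices are illustrative assumptions on my part, not material from any of the courses below.

```python
# A minimal pass through a typical machine learning workflow
# (illustrative only; the dataset and model choices are arbitrary).
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Data exploration: load a labeled dataset and inspect its dimensions.
X, y = load_breast_cancer(return_X_y=True)
print(X.shape, y.shape)

# Data preprocessing: hold out a test set and standardize the features.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Model training: fit a simple classifier on the training data.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Model testing: evaluate predictions on the held-out data before
# deciding whether the model is good enough to deploy.
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```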

Do These Courses Cover Deep Learning?

First off, let’s define deep learning. Here is a succinct description:

“Deep learning is a subfield of machine learning concerned with algorithms inspired by the structure and function of the brain called artificial neural networks.”

— Jason Brownlee from Machine Learning Mastery

As would be expected, portions of some of the machine learning courses contain deep learning content. I chose not to include deep learning-only courses, however. If you are interested in deep learning specifically, we’ve got you covered with the following article.

My top three recommendations from that list would be:

Recommended Prerequisites

Several courses listed below ask students to have prior programming, calculus, linear algebra, and statistics experience. These prerequisites are understandable given that machine learning is an advanced discipline.

Missing a few subjects? Good news! Some of this experience can be acquired through our recommendations in the first two articles (programming, statistics) of this Data Science Career Guide. Several top-ranked courses below also provide gentle calculus and linear algebra refreshers and highlight the aspects most relevant to machine learning for those less familiar.

Our Pick

Machine Learning by Stanford University on Coursera

Stanford University’s Machine Learning on Coursera is the clear current winner in terms of ratings, reviews, and syllabus fit. Taught by the famous Andrew Ng, Google Brain founder and former chief scientist at Baidu, this was the class that sparked the founding of Coursera. Released in 2011, it covers all aspects of the machine learning workflow. Though it has a smaller scope than the original Stanford class upon which it is based, it still manages to cover a large number of techniques and algorithms. It has a 4.7-star weighted average rating over 422 reviews.

Ng is a dynamic yet gentle instructor whose experience is palpable. He inspires confidence, especially when sharing practical implementation tips and warnings about common pitfalls. A linear algebra refresher is provided, and Ng highlights the aspects of calculus most relevant to machine learning.

Evaluation is automatic, done via multiple-choice quizzes that follow each lesson and via programming assignments. The eight assignments can be completed in MATLAB or Octave, an open-source alternative to MATLAB. Ng explains his language choice:

In the past, I’ve tried to teach machine learning using a large variety of different programming languages including C++, Java, Python, NumPy, and also Octave … And what I’ve seen after having taught machine learning for almost a decade is that you learn much faster if you use Octave as your programming environment.

Though Python and R are likely more compelling choices in 2017 given those languages’ growing popularity, reviewers note that this shouldn’t stop you from taking the course.

Stanford Logo

Listed below are the course’s details, including its description, syllabus, and prominent reviews.

Machine Learning

Basic Information

University: Stanford University

Instructor: Andrew Ng

Platform: Coursera

Pace: Self-paced

Cost: Free and paid options available

Estimated timeline: 11 weeks

Andrew Ng

Description

Machine learning is the science of getting computers to act without being explicitly programmed. In the past decade, machine learning has given us self-driving cars, practical speech recognition, effective web search, and a vastly improved understanding of the human genome. Machine learning is so pervasive today that you probably use it dozens of times a day without knowing it. Many researchers also think it is the best way to make progress towards human-level AI. In this class, you will learn about the most effective machine learning techniques, and gain practice implementing them and getting them to work for yourself. More importantly, you’ll learn about not only the theoretical underpinnings of learning, but also gain the practical know-how needed to quickly and powerfully apply these techniques to new problems. Finally, you’ll learn about some of Silicon Valley’s best practices in innovation as it pertains to machine learning and AI.

Syllabus

View Detailed Syllabus

Week 1

  • Introduction: Welcome to Machine Learning! In this module, we introduce the core idea of teaching a computer to learn concepts using data—without being explicitly programmed.
  • Linear Regression with One Variable: Linear regression predicts a real-valued output based on an input value. We discuss the application of linear regression to housing price prediction, present the notion of a cost function, and introduce the gradient descent method for learning. (A brief illustrative sketch of these ideas follows this list.)
  • Linear Algebra Review: This optional module provides a refresher on linear algebra concepts. Basic understanding of linear algebra is necessary for the rest of the course, especially as we begin to cover models with multiple variables.
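
As a concrete (and unofficial) illustration of the Week 1 material, here is a short NumPy sketch of gradient descent minimizing the squared-error cost for one-variable linear regression. The toy data and learning rate are invented for illustration; the course’s own assignments use Octave/MATLAB.

```python
# Gradient descent for one-variable linear regression (illustrative sketch;
# the course's programming assignments use Octave/MATLAB, not Python).
import numpy as np

# Toy data standing in for (house size, house price) pairs.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 3.9, 6.1, 8.0, 9.8])

theta0, theta1 = 0.0, 0.0  # parameters of the hypothesis h(x) = theta0 + theta1 * x
alpha, m = 0.05, len(x)    # learning rate and number of training examples

for _ in range(2000):
    error = (theta0 + theta1 * x) - y      # h(x) - y for every example
    cost = (error ** 2).sum() / (2 * m)    # squared-error cost J(theta0, theta1)
    # Simultaneous gradient descent update of both parameters.
    theta0 -= alpha * error.sum() / m
    theta1 -= alpha * (error * x).sum() / m

print(theta0, theta1, cost)  # theta1 converges to roughly 2 on this toy data
```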

Week 2

  • Linear Regression with Multiple Variables: What if your input has more than one value? In this module, we show how linear regression can be extended to accommodate multiple input features. We also discuss best practices for implementing linear regression.
  • Octave/Matlab Tutorial: This course includes programming assignments designed to help you understand how to implement the learning algorithms in practice. To complete the programming assignments, you will need to use Octave or MATLAB. This module introduces Octave/Matlab and shows you how to submit an assignment.

Week 3

  • Logistic Regression: Logistic regression is a method for classifying data into discrete outcomes. For example, we might use logistic regression to classify an email as spam or not spam. In this module, we introduce the notion of classification, the cost function for logistic regression, and the application of logistic regression to multi-class classification.
  • Regularization: Machine learning models need to generalize well to new examples that the model has not seen in practice. In this module, we introduce regularization, which helps prevent models from overfitting the training data. (A brief illustrative code sketch of regularized logistic regression follows this list.)
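
Here is a short, unofficial scikit-learn sketch of the two Week 3 ideas: classifying discrete outcomes with logistic regression and using regularization to curb overfitting. The synthetic data and the specific regularization strengths are my own illustrative choices; the course has students implement these methods themselves in Octave/MATLAB.

```python
# Logistic regression with L2 regularization (illustrative sketch only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary labels standing in for a spam / not-spam problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# In scikit-learn, C is the inverse of the regularization strength:
# smaller C means stronger regularization and less overfitting.
for C in (100.0, 1.0, 0.01):
    clf = LogisticRegression(C=C).fit(X_train, y_train)
    print(C, clf.score(X_train, y_train), clf.score(X_test, y_test))
```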

Week 4

  • Neural Networks: Representation: Neural networks are models inspired by how the brain works. They are widely used today in many applications: when your phone interprets and understands your voice commands, it is likely that a neural network is helping to understand your speech; when you cash a check, the machines that automatically read the digits also use neural networks.

Week 5

  • Neural Networks: Learning: In this module, we introduce the backpropagation algorithm that is used to help learn parameters for a neural network. At the end of this module, you will be implementing your own neural network for digit recognition.

Week 6

  • Advice for Applying Machine Learning: Applying machine learning in practice is not always straightforward. In this module, we share best practices for applying machine learning in practice, and discuss the best ways to evaluate performance of the learned models.
  • Machine Learning System Design: To optimize a machine learning algorithm, you’ll need to first understand where the biggest improvements can be made. In this module, we discuss how to understand the performance of a machine learning system with multiple parts, and also how to deal with skewed data.

Week 7

  • Support Vector Machines: The support vector machine, or SVM, is a machine learning algorithm for classification. We introduce the ideas and intuitions behind SVMs and discuss how to use them in practice.

Week 8

  • Unsupervised Learning: We use unsupervised learning to build models that help us understand our data better. We discuss the k-Means algorithm for clustering, which enables us to learn groupings of unlabeled data points.
  • Dimensionality Reduction: In this module, we introduce Principal Components Analysis, and show how it can be used for data compression to speed up learning algorithms as well as for visualizations of complex datasets. (A brief illustrative code sketch of k-Means and PCA follows this list.)
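
Below is a brief, unofficial scikit-learn sketch of the two Week 8 topics: grouping unlabeled data with k-means and compressing features with principal components analysis. The dataset choice is mine, purely for illustration; the course implements these algorithms in Octave/MATLAB.

```python
# k-means clustering and PCA dimensionality reduction (illustrative sketch).
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)  # treat the measurements as unlabeled data

# k-means: partition the points into k clusters and report their centers.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(kmeans.cluster_centers_)

# PCA: compress the four original features down to two principal components,
# useful for speeding up learning or visualizing a complex dataset.
X_2d = PCA(n_components=2).fit_transform(X)
print(X_2d.shape)  # (150, 2)
```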

Week 9

  • Anomaly Detection: Given a large number of data points, we may sometimes want to figure out which ones vary significantly from the average. For example, in manufacturing, we may want to detect defects or anomalies. We show how a dataset can be modeled using a Gaussian distribution, and how the model can be used for anomaly detection.
  • Recommender Systems: When you buy a product online, most websites automatically recommend other products that you may like. Recommender systems look at patterns of activities between different users and different products to produce these recommendations. In this module, we introduce recommender algorithms such as the collaborative filtering algorithm and low-rank matrix factorization.

Week 10

  • Large Scale Machine Learning: Machine learning works best when there is an abundance of data to leverage for training. In this module, we discuss how to apply machine learning algorithms to large datasets.

Week 11

  • Application Example: Photo OCR: Identifying and recognizing objects, words, and digits in an image is a challenging task. We discuss how a pipeline can be built to tackle this problem and how to analyze and improve the performance of such a system.

Reviews

Of longstanding renown in the MOOC world, Stanford’s machine learning course really is the definitive introduction to this topic. The course broadly covers all of the major areas of machine learning … Prof. Ng precedes each segment with a motivating discussion and examples.

Andrew Ng is a gifted teacher and able to explain complicated subjects in a very intuitive and clear way, including the math behind all concepts. Highly recommended.

The only problem I see with this course is that it sets the expectation bar very high for other courses.

Link to reviews.

A New Ivy League Introduction with a Brilliant Professor

Machine Learning by Columbia University on edX

Columbia University’s Machine Learning is a relatively new offering that is part of their Artificial Intelligence MicroMasters on edX. Though it is newer and doesn’t have a large number of reviews, the ones that it does have are exceptionally strong. Professor John Paisley is noted as brilliant, clear, and clever. It has a 4.8-star weighted average rating over 10 reviews.

The course also covers all aspects of the machine learning workflow and more algorithms than the above Stanford offering. Columbia’s is a more advanced introduction, with reviewers noting that students should be comfortable with the recommended prerequisites (calculus, linear algebra, statistics, probability, and coding).

Quizzes (11), programming assignments (4), and a final exam are the modes of evaluation. Students can use either Python, Octave, or MATLAB to complete the assignments.

Columbia University Logo

Listed below are the details for the course, including its description, syllabus, and prominent reviews.

Machine Learning

Basic Information

University: Columbia University

Instructor: Professor John W. Paisley

Platform: edX

Pace: Self-paced

Cost: Free with a Verified Certificate available for purchase

Estimated timeline: Eight to ten hours per week over twelve weeks

Dr. John Paisley

Description

Machine Learning is the basis for the most exciting careers in data analysis today. You’ll learn the models and methods and apply them to real world situations ranging from identifying trending news topics, to building recommendation engines, ranking sports teams and plotting the path of movie zombies.

In the first half of the course, we will cover supervised learning techniques for regression and classification. In this framework, we possess an output or response that we wish to predict based on a set of inputs. We will discuss several fundamental methods for performing this task and algorithms for their optimization. Our approach will be more practically motivated, meaning we will fully develop a mathematical understanding of the respective algorithms, but we will only briefly touch on abstract learning theory.

In the second half of the course, we shift to unsupervised learning techniques. In these problems the end goal is less clear-cut than predicting an output based on a corresponding input. We will cover three fundamental problems of unsupervised learning: data clustering, matrix factorization, and sequential models for order-dependent data. Some applications of these models include object recommendation and topic modeling.

Syllabus

View Detailed Syllabus

Week 1

Lecture 1: We will discuss the various perspectives of the course and machine learning in general. We will then cover the maximum likelihood problem for learning parameters of a probability distribution.

Lecture 2: We move to our first supervised learning problem of linear regression. We discuss the least squares approach to linear regression and understand the geometric intuitions of the problem.

Week 2

Lecture 3: We continue our discussion of least squares by thinking probabilistically about the problem, making connections to maximum likelihood. This will motivate the ridge regression approach to linear regression through a technique called regularization. We analyze and compare these two fundamental approaches to linear regression via the SVD.

Lecture 4: We discuss the bias-variance trade-off using least squares and ridge regression as a motivating example. We then introduce Bayes rule and maximum a posteriori (MAP) inference as an alternative to maximum likelihood, making connections to ridge regression.
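
To illustrate the connection these two lectures draw, here is a short NumPy sketch comparing the closed-form least squares and ridge regression solutions; ridge’s L2 penalty corresponds to a zero-mean Gaussian prior on the weights under the MAP view. The synthetic data and the regularization value are my own illustrative assumptions, not course material.

```python
# Least squares vs. ridge regression in closed form (illustrative sketch only).
import numpy as np

rng = np.random.default_rng(0)
n, d = 50, 10
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.1, size=n)

# Ordinary least squares: w = (X^T X)^{-1} X^T y
w_ls = np.linalg.solve(X.T @ X, X.T @ y)

# Ridge regression adds an L2 penalty with weight lam (equivalently, a
# Gaussian prior on w under MAP): w = (lam * I + X^T X)^{-1} X^T y
lam = 1.0
w_ridge = np.linalg.solve(lam * np.eye(d) + X.T @ X, X.T @ y)

print(np.linalg.norm(w_ls - w_true), np.linalg.norm(w_ridge - w_true))
```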

Week 3

Lecture 5: We discuss Bayesian linear regression as a natural development of ridge regression. This leads to a discussion of forming predictive distributions and “active learning” as two features of the fully Bayesian approach.

Lecture 6: We wrap up our focus on regression by considering cases where the dimensionality of the problem is much larger than the number of samples. We first discuss a minimum L2 approach, which is more useful for introducing two key mathematical tools in machine learning: analysis and optimization. We then discuss sparsity-promoting methods for linear regression.

Week 4

Lecture 7: We shift to the supervised learning problem of classification. We cover simple nearest neighbor approaches and discuss what an optimal classifier looks like. This motivates the generic Bayes classification approach, an approximation to the optimal classifier.

Lecture 8: We move to general linear classifiers. We discuss in detail the geometric understanding of the problem, which is crucial to appreciating what a linear classifier tries to do. We discuss the first linear classifier called the Perceptron. While this method has been improved upon, the Perceptron will provide us with our first occasion to discuss iterative algorithms for model learning.

Week 5

Lecture 9: We discuss logistic regression, a discriminative linear classification model. We compare with the generative Bayes classification model via the log odds function. The likelihood distribution formed by the logistic regression model suggests matching it with a prior; through this example we discuss the general Laplace approximation technique for approximating a posterior distribution.

Lecture 10: We make a “trick” we have been using more concrete by discussing feature expansions and their use in kernel methods. After discussing kernels, we look at a specific instance of a powerful nonparametric model that makes use of them for regression (and classification): the Gaussian process.

Week 6

Lecture 11: We return to the geometric view of linear classification and remove all probabilistic interpretations of the problem. This inspires the maximum margin approach to binary classification. We discuss and analyze an optimization algorithm called the support vector machine (SVM) that achieves this max-margin goal. We show how kernels neatly fit into this model with no extra effort.

Lecture 12: We shift to a radically different classification approach to the linear classifiers we have been discussing thus far. Tree classifiers attempt to find partitions of a space by which to classify data separately in each partition. We introduce a statistical technique called the bootstrap to “bag” these trees into a “random forest.”

Week 7

Lecture 13: We discuss and analyze boosting, a method for taking any classifier and making it better. This is done by learning sequences of classifiers on various subsets of the data such that their weighted combination makes significantly better predictions than any individual classifier on its own. We prove the training error theorem of boosting, perhaps the most difficult part of the class, but well worth the effort!
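
As a rough, unofficial illustration of the boosting idea described above, the scikit-learn sketch below compares a single decision stump with an AdaBoost ensemble of stumps on synthetic data. AdaBoost stands in here for the boosting methods the lecture analyzes; the data and parameters are invented for illustration.

```python
# Boosting: a weighted ensemble of weak classifiers (decision stumps) usually
# predicts better than any single weak classifier (illustrative sketch only).
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)
boosted = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # the weak learner to be boosted
    n_estimators=200, random_state=0).fit(X_train, y_train)

print("single stump:    ", stump.score(X_test, y_test))
print("boosted ensemble:", boosted.score(X_test, y_test))
```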

Lecture 14: This lecture marks the beginning of the unsupervised learning portion of the course. The first family of algorithms we consider are clustering algorithms. We present and derive the k-means algorithm, the most fundamental clustering algorithm.

Week 8

Lecture 15: We discuss the expectation-maximization (EM) algorithm for performing maximum likelihood via an indirect route. The EM algorithm is a remarkable technique that makes many difficult problems much easier. We discuss EM in the context of a missing data problem, but it will reappear in Lectures 16, 19 and 21.

Lecture 16: We compare hard and soft clustering models and cover a modified k-means algorithm. We then focus our discussion on a probabilistic approach to clustering called the Gaussian mixture model, deriving an iterative EM algorithm to learn its parameters.

Week 9

Lecture 17: We move to another unsupervised learning problem of factorizing a matrix into the product of two smaller matrices. This is a widely-used technique for collaborative filtering, where we wish to recommend content to users. We motivate the model we discuss, called probabilistic matrix factorization, in the context of movie recommendation.

Lecture 18: We discuss non-negative matrix factorization, which differs from Lecture 17 by restricting all values in the data and model to be greater than zero. This allows for “parts-based learning” from data, of which topic modeling is a prime example. We present the two standard NMF algorithms for this problem.

Week 10

Lecture 19: We cover the fundamental matrix factorization technique called principal components analysis (PCA), a very useful dimensionality reduction approach. Extensions covered include probabilistic PCA for image denoising and inpainting, and kernel PCA for nonlinear dimensionality reduction.

Lecture 20: We move to the unsupervised problem of designing and learning sequential models. Our first topic is the Markov model. We discuss two important properties of Markov chains and apply them to the problems of ranking and semi-supervised classification.

Week 11

Lecture 21: We broaden the Markov model to the hidden Markov model (HMM). We clarify the important difference between the two and discuss an EM algorithm for learning HMMs. We give a high-level discussion of how HMMs can be used for speech recognition.

Lecture 22: We discuss a final sequential model where all unknowns are continuous valued. We present the Kalman filter for object tracking and put all our Bayesian knowledge to use in deriving the filtering algorithm for real-time learning of this continuous-state linear Gaussian model.

Week 12

Lecture 23: In the last week we shift gears to two problems that are very different from what we’ve previously discussed. In this lecture we cover association analysis, which is the problem of learning interesting highly probable combinations of the form A implies B. The clever and exact algorithm we cover makes this combinatorially “impossible” problem very possible.

Lecture 24: In this final lecture we discuss model selection. We have made many modeling choices throughout this course without knowing exactly which is the best. This lecture discusses two basic techniques for choosing the final complexity of a model.

Reviews

Over all my years of [being a] student I’ve come across professors who aren’t brilliant, professors who are brilliant but they don’t know how to explain the stuff clearly, and professors who are brilliant and know how to explain the stuff clearly. Dr. Paisley belongs to the third group.

This is a great course … The instructor’s language is precise and that is, to my mind, one of the strongest points of the course. The lectures are of high quality and the slides are great too.

Dr. Paisley and his supervisor are … students of Michael Jordan, the father of machine learning. [Dr. Paisley] is the best ML professor at Columbia because of his ability to explain stuff clearly. Up to 240 students have selected his course this semester, the largest number among all professors [teaching] machine learning at Columbia.

Link to reviews.

A Practical Intro in Python & R from Industry Experts

Machine Learning A-Z™: Hands-On Python & R In Data Science by Kirill Eremenko, Hadelin de Ponteves, and the SuperDataScience Team via Udemy

Machine Learning A-Z™ on Udemy is an impressively detailed offering that provides instruction in both Python and R, something none of the other top courses offer. It has a 4.5-star weighted average rating over 8,119 reviews, which makes it the most reviewed course of the ones considered.

It covers the entire machine learning workflow and an almost ridiculous (in a good way) number of algorithms through 40.5 hours of on-demand video. The course takes a more applied approach and is lighter math-wise than the above two courses. Each section starts with an “intuition” video from Eremenko that summarizes the underlying theory of the concept being taught. de Ponteves then walks through implementation with separate videos for both Python and R. As a “bonus,” the course includes Python and R code templates for students to download and use on their own projects. There are quizzes and homework challenges, though these aren’t the strong points of the course.

Eremenko and the SuperDataScience team are revered for their ability to “make the complex simple.” Also, the prerequisites listed are “just some high school mathematics,” so this course might be a better option for those daunted by the Stanford and Columbia offerings.

Udemy logo

Listed below are the details for the course, including its description, syllabus, and prominent reviews.

Machine Learning A-Z™: Hands-On Python & R In Data Science

Basic Information

Instructors: Kirill Eremenko, Hadelin de Ponteves, and the SuperDataScience Team

Platform: Udemy

Pace: Self-paced

Cost: Varies depending on Udemy discounts, which are frequent. Can be purchased for as little as $10.

Estimated timeline: 40.5 hours of on-demand video

Kirill Eremenko

Description

Interested in the field of Machine Learning? Then this course is for you!

This course has been designed by two professional Data Scientists so that we can share our knowledge and help you learn complex theory, algorithms and coding libraries in a simple way.

We will walk you step-by-step into the World of Machine Learning. With every tutorial you will develop new skills and improve your understanding of this challenging yet lucrative sub-field of Data Science.

This course is fun and exciting, but at the same time we dive deep into Machine Learning. Moreover, the course is packed with practical exercises which are based on live examples. So not only will you learn the theory, but you will also get some hands-on practice building your own models. And as a bonus, this course includes both Python and R code templates which you can download and use on your own projects.

Syllabus

View Detailed Syllabus

Part 1: Data Preprocessing

Part 2: Regression: Simple Linear Regression, Multiple Linear Regression, Polynomial Regression, SVR, Decision Tree Regression, Random Forest Regression
Part 3: Classification: Logistic Regression, K-NN, SVM, Kernel SVM, Naive Bayes, Decision Tree Classification, Random Forest Classification
Part 4: Clustering: K-Means, Hierarchical Clustering
Part 5: Association Rule Learning: Apriori, Eclat
Part 6: Reinforcement Learning: Upper Confidence Bound, Thompson Sampling
Part 7: Natural Language Processing: Bag-of-words model and algorithms for NLP
Part 8: Deep Learning: Artificial Neural Networks, Convolutional Neural Networks
Part 9: Dimensionality Reduction: PCA, LDA, Kernel PCA
Part 10: Model Selection & Boosting: k-fold Cross Validation, Parameter Tuning, Grid Search, XGBoost

Reviews

The course is professionally produced, the sound quality is excellent, and the explanations are clear and concise … It’s an incredible value for your financial and time investment.

It was spectacular to be able to follow the course in two different programming languages simultaneously.

Kirill is one of the absolute best instructors on Udemy (if not the Internet) and I recommend taking any class he teaches. … This course has a ton of content, like a ton!

Link to reviews.

The Competition

Our #1 pick had a weighted average rating of 4.7 out of 5 stars over 422 reviews. Let’s look at the other alternatives, sorted by descending rating. A reminder that deep learning-only courses are not included in this guide — you can find those here.

The Analytics Edge (Massachusetts Institute of Technology/edX): More focused on analytics in general, though it does cover several machine learning topics. Uses R. Strong narrative that leverages familiar real-world examples. Challenging. Ten to fifteen hours per week over twelve weeks. Free with a verified certificate available for purchase. It has a 4.9-star weighted average rating over 214 reviews.

Python for Data Science and Machine Learning Bootcamp (Jose Portilla/Udemy): Has large chunks of machine learning content, but covers the whole data science process. More of a very detailed intro to Python. Amazing course, though not ideal for the scope of this guide. 21.5 hours of on-demand video. Cost varies depending on Udemy discounts, which are frequent. It has a 4.6-star weighted average rating over 3316 reviews.

Data Science and Machine Learning Bootcamp with R (Jose Portilla/Udemy): The comments for Portilla’s above course apply here as well, except for R. 17.5 hours of on-demand video. Cost varies depending on Udemy discounts, which are frequent. It has a 4.6-star weighted average rating over 1317 reviews.

Machine Learning Series (Lazy Programmer Inc./Udemy): Taught by a data scientist/big data engineer/full stack software engineer with an impressive resume, Lazy Programmer currently has a series of 16 machine learning-focused courses on Udemy. In total, the courses have 5000+ ratings and almost all of them have 4.6 stars. A useful course ordering is provided in each individual course’s description. Uses Python. Cost varies depending on Udemy discounts, which are frequent.

Machine Learning (Georgia Tech/Udacity): A compilation of what were previously three separate courses: Supervised, Unsupervised and Reinforcement Learning. Part of Udacity’s Machine Learning Engineer Nanodegree and Georgia Tech’s Online Master’s Degree (OMS). Bite-sized videos, as is Udacity’s style. Friendly professors. Estimated timeline of four months. Free. It has a 4.56-star weighted average rating over 9 reviews.

Implementing Predictive Analytics with Spark in Azure HDInsight (Microsoft/edX): Introduces the core concepts of machine learning and a variety of algorithms. Leverages several big data-friendly tools, including Apache Spark, Scala, and Hadoop. Uses both Python and R. Four hours per week over six weeks. Free with a verified certificate available for purchase. It has a 4.5-star weighted average rating over 6 reviews.

Data Science and Machine Learning with Python — Hands On! (Frank Kane/Udemy): Uses Python. Kane has nine years of experience at Amazon and IMDb. Nine hours of on-demand video. Cost varies depending on Udemy discounts, which are frequent. It has a 4.5-star weighted average rating over 4139 reviews.

Scala and Spark for Big Data and Machine Learning (Jose Portilla/Udemy): “Big data” focus, specifically on implementation in Scala and Spark. Ten hours of on-demand video. Cost varies depending on Udemy discounts, which are frequent. It has a 4.5-star weighted average rating over 607 reviews.

Machine Learning Engineer Nanodegree (Udacity): Udacity’s flagship Machine Learning program, which features a best-in-class project review system and career support. The program is a compilation of several individual Udacity courses, which are free. Co-created by Kaggle. Estimated timeline of six months. Currently costs $199 USD per month with a 50% tuition refund available for those who graduate within 12 months. It has a 4.5-star weighted average rating over 2 reviews.

Learning From Data (Introductory Machine Learning) (California Institute of Technology/edX): Enrollment is currently closed on edX, but the course is also available via Caltech’s independent platform (see below). It has a 4.49-star weighted average rating over 42 reviews.

Learning From Data (Introductory Machine Learning) (Yaser Abu-Mostafa/California Institute of Technology): “A real Caltech course, not a watered-down version.” Reviews note it is excellent for understanding machine learning theory. The professor, Yaser Abu-Mostafa, is popular among students and also wrote the textbook upon which this course is based. Videos are taped lectures (with lecture slides picture-in-picture) uploaded to YouTube. Homework assignments are .pdf files. The course experience for online students isn’t as polished as the top three recommendations. It has a 4.43-star weighted average rating over 7 reviews.

Mining Massive Datasets (Stanford University): Machine learning with a focus on “big data.” Introduces modern distributed file systems and MapReduce. Ten hours per week over seven weeks. Free. It has a 4.4-star weighted average rating over 30 reviews.

AWS Machine Learning: A Complete Guide With Python (Chandra Lingam/Udemy): A unique focus on cloud-based machine learning and specifically Amazon Web Services. Uses Python. Nine hours of on-demand video. Cost varies depending on Udemy discounts, which are frequent. It has a 4.4-star weighted average rating over 62 reviews.

Introduction to Machine Learning & Face Detection in Python (Holczer Balazs/Udemy): Uses Python. Eight hours of on-demand video. Cost varies depending on Udemy discounts, which are frequent. It has a 4.4-star weighted average rating over 162 reviews.

StatLearning: Statistical Learning (Stanford University): Based on the excellent textbook, “An Introduction to Statistical Learning, with Applications in R” and taught by the professors who wrote it. Reviewers note that the MOOC isn’t as good as the book, citing “thin” exercises and mediocre videos. Five hours per week over nine weeks. Free. It has a 4.35-star weighted average rating over 84 reviews.

Machine Learning Specialization (University of Washington/Coursera): Great courses, but the last two classes (including the capstone project) were canceled. Reviewers note that this series is more digestible (read: easier for those without strong technical backgrounds) than other top machine learning courses (e.g. Stanford’s or Caltech’s). Be aware that the series is incomplete: the recommender systems and deep learning courses and the summary are missing. Free and paid options available. It has a 4.31-star weighted average rating over 80 reviews.

From 0 to 1: Machine Learning, NLP & Python-Cut to the Chase (Loony Corn/Udemy): “A down-to-earth, shy but confident take on machine learning techniques.” Taught by four-person team with decades of industry experience together. Uses Python. Cost varies depending on Udemy discounts, which are frequent. It has a 4.2-star weighted average rating over 494 reviews.

Principles of Machine Learning (Microsoft/edX): Uses R, Python, and Microsoft Azure Machine Learning. Part of the Microsoft Professional Program Certificate in Data Science. Three to four hours per week over six weeks. Free with a verified certificate available for purchase. It has a 4.09-star weighted average rating over 11 reviews.

Big Data: Statistical Inference and Machine Learning (Queensland University of Technology/FutureLearn): A nice, brief exploratory machine learning course with a focus on big data. Covers a few tools like R, H2O Flow, and WEKA. Only three weeks in duration at a recommended two hours per week, but one reviewer noted that six hours per week would be more appropriate. Free and paid options available. It has a 4-star weighted average rating over 4 reviews.

Genomic Data Science and Clustering (Bioinformatics V) (University of California, San Diego/Coursera): For those interested in the intersection of computer science and biology and how it represents an important frontier in modern science. Focuses on clustering and dimensionality reduction. Part of UCSD’s Bioinformatics Specialization. Free and paid options available. It has a 4-star weighted average rating over 3 reviews.

Intro to Machine Learning (Udacity): Prioritizes topic breadth and practical tools (in Python) over depth and theory. The instructors, Sebastian Thrun and Katie Malone, make this class so fun. Consists of bite-sized videos and quizzes followed by a mini-project for each lesson. Currently part of Udacity’s Data Analyst Nanodegree. Estimated timeline of ten weeks. Free. It has a 3.95-star weighted average rating over 19 reviews.

Machine Learning for Data Analysis (Wesleyan University/Coursera): A brief intro to machine learning and a few select algorithms. Covers decision trees, random forests, lasso regression, and k-means clustering. Part of Wesleyan’s Data Analysis and Interpretation Specialization. Estimated timeline of four weeks. Free and paid options available. It has a 3.6-star weighted average rating over 5 reviews.

Programming with Python for Data Science (Microsoft/edX): Produced by Microsoft in partnership with Coding Dojo. Uses Python. Eight hours per week over six weeks. Free and paid options available. It has a 3.46-star weighted average rating over 37 reviews.

Machine Learning for Trading (Georgia Tech/Udacity): Focuses on applying probabilistic machine learning approaches to trading decisions. Uses Python. Part of Udacity’s Machine Learning Engineer Nanodegree and Georgia Tech’s Online Master’s Degree (OMS). Estimated timeline of four months. Free. It has a 3.29-star weighted average rating over 14 reviews.

Practical Machine Learning (Johns Hopkins University/Coursera): A brief, practical introduction to a number of machine learning algorithms. Several one/two-star reviews expressing a variety of concerns. Part of JHU’s Data Science Specialization. Four to nine hours per week over four weeks. Free and paid options available. It has a 3.11-star weighted average rating over 37 reviews.

Machine Learning for Data Science and Analytics (Columbia University/edX): Introduces a wide range of machine learning topics. Some passionate negative reviews with concerns including content choices, a lack of programming assignments, and uninspiring presentation. Seven to ten hours per week over five weeks. Free with a verified certificate available for purchase. It has a 2.74-star weighted average rating over 36 reviews.

Recommender Systems Specialization (University of Minnesota/Coursera): Strong focus on one specific type of machine learning: recommender systems. A four-course specialization plus a capstone project, which is a case study. Taught using LensKit (an open-source toolkit for recommender systems). Free and paid options available. It has a 2-star weighted average rating over 2 reviews.

Machine Learning With Big Data (University of California, San Diego/Coursera): Terrible reviews that highlight poor instruction and evaluation. Some noted it took them mere hours to complete the whole course. Part of UCSD’s Big Data Specialization. Free and paid options available. It has a 1.86-star weighted average rating over 14 reviews.

Practical Predictive Analytics: Models and Methods (University of Washington/Coursera): A brief intro to core machine learning concepts. One reviewer noted that there was a lack of quizzes and that the assignments were not challenging. Part of UW’s Data Science at Scale Specialization. Six to eight hours per week over four weeks. Free and paid options available. It has a 1.75-star weighted average rating over 4 reviews.

The following courses had one or no reviews as of May 2017.

Machine Learning for Musicians and Artists (Goldsmiths, University of London/Kadenze): Unique. Students learn algorithms, software tools, and machine learning best practices to make sense of human gesture, musical audio, and other real-time data. Seven sessions in length. Audit (free) and premium ($10 USD per month) options available. It has one 5-star review.

Applied Machine Learning in Python (University of Michigan/Coursera): Taught using Python and the scikit-learn toolkit. Part of the Applied Data Science with Python Specialization. Scheduled to start May 29th. Free and paid options available.

Applied Machine Learning (Microsoft/edX): Taught using various tools, including Python, R, and Microsoft Azure Machine Learning (note: Microsoft produces the course). Includes hands-on labs to reinforce the lecture content. Three to four hours per week over six weeks. Free with a verified certificate available for purchase.

Machine Learning with Python (Big Data University): Taught using Python. Targeted towards beginners. Estimated completion time of four hours. Big Data University is affiliated with IBM. Free.

Machine Learning with Apache SystemML (Big Data University): Taught using Apache SystemML, which is a declarative style language designed for large-scale machine learning. Estimated completion time of eight hours. Big Data University is affiliated with IBM. Free.

Machine Learning for Data Science (University of California, San Diego/edX): Doesn’t launch until January 2018. Programming examples and assignments are in Python, using Jupyter notebooks. Eight hours per week over ten weeks. Free with a verified certificate available for purchase.

Introduction to Analytics Modeling (Georgia Tech/edX): The course advertises R as its primary programming tool. Five to ten hours per week over ten weeks. Free with a verified certificate available for purchase.

Predictive Analytics: Gaining Insights from Big Data (Queensland University of Technology/FutureLearn): Brief overview of a few algorithms. Uses Hewlett Packard Enterprise’s Vertica Analytics platform as an applied tool. Start date to be announced. Two hours per week over four weeks. Free with a Certificate of Achievement available for purchase.

Introducción al Machine Learning (Universitas Telefónica/Miríada X): Taught in Spanish. An introduction to machine learning that covers supervised and unsupervised learning. A total of twenty estimated hours over four weeks.

Machine Learning Path Step (Dataquest): Taught in Python using Dataquest’s interactive in-browser platform. Multiple guided projects and a “plus” project where you build your own machine learning system using your own data. Subscription required.


The following six courses are offered by DataCamp. DataCamp’s hybrid teaching style leverages video and text-based instruction with lots of examples through an in-browser code editor. A subscription is required for full access to each course.

Introduction to Machine Learning (DataCamp): Covers classification, regression, and clustering algorithms. Uses R. Fifteen videos and 81 exercises with an estimated timeline of six hours.

Supervised Learning with scikit-learn (DataCamp): Uses Python and scikit-learn. Covers classification and regression algorithms. Seventeen videos and 54 exercises with an estimated timeline of four hours.

Unsupervised Learning in R (DataCamp): Provides a basic introduction to clustering and dimensionality reduction in R. Sixteen videos and 49 exercises with an estimated timeline of four hours.

Machine Learning Toolbox (DataCamp): Teaches the “big ideas” in machine learning. Uses R. 24 videos and 88 exercises with an estimated timeline of four hours.

Machine Learning with the Experts: School Budgets (DataCamp): A case study from a machine learning competition on DrivenData. Involves building a model to automatically classify items in a school’s budget. DataCamp’s “Supervised Learning with scikit-learn” is a prerequisite. Fifteen videos and 51 exercises with an estimated timeline of four hours.

Unsupervised Learning in Python (DataCamp): Covers a variety of unsupervised learning algorithms using Python, scikit-learn, and scipy. The course ends with students building a recommender system to recommend popular musical artists. Thirteen videos and 52 exercises with an estimated timeline of four hours.


Machine Learning (Tom Mitchell/Carnegie Mellon University): Carnegie Mellon’s graduate introductory machine learning course. A prerequisite to their second graduate level course, “Statistical Machine Learning.” Taped university lectures with practice problems, homework assignments, and a midterm (all with solutions) posted online. A 2011 version of the course also exists. CMU is one of the best graduate schools for studying machine learning and has a whole department dedicated to ML. Free.

Statistical Machine Learning (Larry Wasserman/Carnegie Mellon University): Likely the most advanced course in this guide. A follow-up to Carnegie Mellon’s Machine Learning course. Taped university lectures with practice problems, homework assignments, and a midterm (all with solutions) posted online. Free.

Undergraduate Machine Learning (Nando de Freitas/University of British Columbia): An undergraduate machine learning course. Lectures are filmed and put on YouTube with the slides posted on the course website. The course assignments are posted as well (no solutions, though). de Freitas is now a full-time professor at the University of Oxford and receives praise for his teaching abilities in various forums. Graduate version available (see below).

Machine Learning (Nando de Freitas/University of British Columbia): A graduate machine learning course. The comments in de Freitas’ undergraduate course (above) apply here as well.

About Class Central Career Guides

Class Central Career Guides are recommendations for the best online courses and MOOCs. They have one goal: to enable you to quickly figure out which courses can help you learn new skills and advance your career. Our editorial picks are thoroughly researched using reviews written by Class Central users, as well as data from other sources and our own subjective analysis.

These guides are updated frequently to always reflect the best in online education.

Drop us a note at guides@class-central.com if you have any feedback or requests for particular career guides — it will help us prioritize. Also, reach out to us if you want to help us create more of these career guides. We are looking for contributors!

Author Bio

David Venturi

David Venturi created a personalized data science master’s curriculum for himself using MOOCs. He has a dual degree in Chemical Engineering and Economics, and especially enjoys math, stats, and coding. He’s a huge baseball and hockey fan, and writes about the latter with a focus on analytics.



  • BorderGuard

    That Columbia University edX course on Machine Learning is probably the WORST. I have taken every one on the web. I would say the Analytics course edX from MIT for practical machine learning is good, though hasn’t been updated in years. The University of Washington series on Coursera is better than Ng’s course though not as comprehensive in the subject areas covered. Areas they do cover are done much better than Ng.

    • Trang Le

      Thanks for your input. I agree that the Analytics course by MIT is excellent and I highly recommend it for anyone in any field.

      I don’t quite get the fuss about Coursera. Having done a few courses on that website, I find most of the courses to lack depth. If a course asks you to spend only a few hours every week, then the outcomes you’ll get are just that.
