Editor’s note: Drop us a note at guides@classcentral.com if you have any feedback or requests for particular career guides. We are also looking for contributors!
Here are the parts of the series that have been published so far:
 The Best Intro to Programming Courses for Data Science
 The Best Statistics & Probability Courses for Data Science
 The Best Intro to Data Science Courses (this one)
 The Best Data Visualization Courses
 The Best Machine Learning Courses
Our pick
The best online introduction to data science course is Kirill Eremenko’s “Data Science A-Z.” The course, which has a 4.5-star weighted average rating over 3,071 reviews, is among the highest rated and most reviewed courses of the ones considered. It is the clear winner in terms of breadth and depth of coverage of the data science process. The instructor’s natural teaching ability is frequently praised by reviewers.
Data Science A-Z™: Real-Life Data Science Exercises Included by Kirill Eremenko on Udemy
A great Python-focused introduction
Udacity’s Intro to Data Analysis covers the data science process cohesively using Python, though it lacks a bit in the modeling aspect. It has a 5-star rating over one review. It is a relatively new offering that is part of Udacity’s popular Data Analyst Nanodegree. The videos are well-produced and the instructor (Caroline Buckey) is clear and personable.
Intro to Data Analysis by Udacity
An impressive offering with no review data
Data Science Fundamentals is a four-course series provided by Big Data University, which is an IBM initiative. The series covers the full data science process and introduces Python, R, and several other open-source tools. The courses have tremendous production value. Unfortunately, they have no review data on the major review sites that were used for this analysis.
Data Science Fundamentals by Big Data University
Table of Contents
 Why You Should Trust Us
 About the Data Science Career Guide
 How We Picked Courses to Consider
 How We Tested
 What is the Data Science Process?
 Basic Coding, Stats, and Probability Required
 Our Pick
 A Great Python-focused Introduction
 An Impressive Offering with No Review Data
 The Competition
 About Class Central Career Guides
 Author Bio
Why You Should Trust Us
I started creating my own data science master’s degree using online courses almost a year ago. I have taken many data science-related courses and audited portions of many more. I know the options out there, and what skills are needed for learners preparing for a data analyst or data scientist role.
For this guide, I spent 10+ hours trying to identify every online intro to data science course offered as of January 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community and its database of thousands of course ratings and reviews.
Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.
About the Data Science Career Guide
Class Central’s Data Science Career Guide is a six-piece series that recommends the best MOOCs for launching yourself into the data science industry. The first five pieces recommend the best courses for several data science core competencies (programming, statistics, the data science process, data visualization, and machine learning). The final piece is a summary of those courses and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.
Here are the parts of the series that have been published so far:
 The Best Intro to Programming Courses for Data Science
 The Best Statistics & Probability Courses for Data Science
 The Best Intro to Data Science Courses (this one)
 The Best Data Visualization Courses
P.S. If you are looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.
How We Picked Courses to Consider
Each course must fit three criteria:
 It must teach the data science process. More on that soon.
 It must be on-demand or offered every few months.
 It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn, this guide focuses on courses.
We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy, we chose to consider the most-reviewed and highest-rated ones only. There’s always a chance that we missed something, though. So please let us know in the comments section if we left a good course out.
How We Tested
We compiled average rating and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. We read text reviews and used this feedback to supplement the numerical ratings.
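As a rough illustration of the calculation described above, here is a minimal sketch of a weighted average rating. It assumes each review site's average is weighted by its number of reviews; the guide does not spell out the exact weighting, so treat the function name, the input shape, and the sample numbers as hypothetical.

```python
def weighted_average_rating(site_ratings):
    """Combine per-site ratings into one score, weighting each by review count.

    site_ratings: list of (average_rating, number_of_reviews) pairs, one per site.
    Returns None when there are no reviews at all.
    """
    total_reviews = sum(count for _, count in site_ratings)
    if total_reviews == 0:
        return None
    weighted_sum = sum(rating * count for rating, count in site_ratings)
    return weighted_sum / total_reviews


# Hypothetical numbers for illustration only, not taken from any real course.
example = [(4.6, 300), (4.0, 100)]
print(round(weighted_average_rating(example), 2))
```

Weighting by review count keeps a site with a handful of reviews from moving the combined score as much as a site with hundreds.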
We made subjective syllabus judgment calls based on two factors:
 Coverage of the data science process. Does the course brush over or skip certain subjects? Does it cover certain subjects in too much detail? See the next section for what this process entails.
 Usage of common data science tools. Is the course taught using popular programming languages like Python and/or R? These aren’t necessary but are helpful in most cases, so slight preference is given to courses that use them.
What is the Data Science Process?
What is data science? What does a data scientist do? These are the types of fundamental questions that an intro to data science course should answer. The following infographic from Harvard professors Joe Blitzstein and Hanspeter Pfister outlines a typical data science process, which will help us answer these questions.
Viz from Opera Solutions.
Our goal with this introduction to data science course is to become familiar with the data science process. We don’t want in-depth coverage of specific aspects of the process, hence the “intro to” portion of the title. For each aspect, the ideal course explains key concepts within the framework of the process, introduces common tools, and provides a few examples (preferably hands-on).
We are only looking for an introduction. This guide therefore won’t include full specializations or programs like Johns Hopkins University’s Data Science Specialization on Coursera or Udacity’s Data Analyst Nanodegree. These compilations of courses defeat the purpose of this series: to find the best individual courses for each subject to comprise a data science education. The next guides in the series will cover each aspect of the data science process in detail.
Basic Coding, Stats, and Probability Experience Required
Several courses listed below require basic programming, statistics, and probability experience. This requirement is understandable given that the new content is reasonably advanced and that these subjects often have several courses dedicated to them.
This experience can be acquired through our recommendations in the first two articles (programming, statistics) in this Data Science Career Guide.
Our Pick
Data Science A-Z™: Real-Life Data Science Exercises Included by Kirill Eremenko on Udemy
Kirill Eremenko’s “Data Science A-Z” on Udemy is the clear winner in terms of breadth and depth of coverage of the data science process of the 20+ courses that qualified. It has a 4.5-star weighted average rating over 3,071 reviews, which places it among the highest rated and most reviewed courses of the ones considered. It outlines the full process, makes it clear that it can be iterative, and provides real-life examples. Reviewers love the instructor’s delivery and the organization of the content.
Though it doesn’t check our “usage of common data science tools” box, the non-Python/R tool choices (gretl, Tableau, Excel) are used effectively in context. Eremenko mentions the following when explaining the gretl choice (gretl is a statistical software package), though it applies to all of the tools he uses (emphasis mine):
In gretl, we will be able to do the same modeling just like in R and Python but we won’t have to code. That’s the big deal here. Some of you may already know R very well, but some may not know it at all. My goal is to show you how to build a robust model and give you a framework that you can apply in any tool you choose. gretl will help us avoid getting bogged down in our coding.
Listed below are the details for each course, including their description, syllabus, and prominent reviews.
Data Science A-Z™: Real-Life Data Science Exercises Included
Basic Information
Instructor: Kirill Eremenko, SuperDataScience Team
Platform: Udemy
Pace: Self-paced
Cost: Varies depending on Udemy discounts, which are frequent. Can be purchased for as little as $10.
Estimated timeline: 21 hours
Description
This course will give you a full overview of the Data Science journey. Upon completing this course you will know:
 How to clean and prepare your data for analysis
 How to perform basic visualization of your data
 How to model your data
 How to curve-fit your data
 And finally, how to present your findings and wow the audience
This course will give you so much practical exercises that real world will seem like a piece of cake when you graduate this class. This course has homework exercises that are so thought provoking and challenging that you will want to cry… But you won’t give up! You will crush it. In this course you will develop a good understanding of the following tools:
 SQL
 SSIS
 Tableau
 Gretl
This course has pre-planned pathways. Using these pathways you can navigate the course and combine sections into YOUR OWN journey that will get you the skills that YOU need.
Or you can do the whole course and set yourself up for an incredible career in Data Science. The choice is yours. Join the class and start learning today!
Syllabus
Sections
 1: Get Excited
 2: What is Data Science?
 3: Part 1: Visualisation
 4: Introduction to Tableau
 5: How to use Tableau for Data Mining
 6: Advanced Data Mining With Tableau
 7: Part 2: Modelling
 8: Stats Refresher
 9: Simple Linear Regression
 10: Multiple Linear Regression
 11: Logistic Regression
 12: Building a robust geodemographic segmentation model
 13: Assessing your model
 14: Drawing insights from your model
 15: Model maintenance
 16: Part 3: Data Preparation
 17: Business Intelligence (BI) Tools
 18: ETL Phase 1: Data Wrangling before the Load
 19: ETL Phase 2: Step-by-step guide to uploading data using SSIS
 20: Handling errors during ETL (Phases 1 & 2)
 21: SQL Programming for Data Science
 22: ETL Phase 3: Data Wrangling after the load
 23: Handling errors during ETL (Phase 3)
 24: Part 4: Communication
 25: Working with people
 26: Presenting for Data Scientists
 27: Homework Solutions
 28: Bonus Lectures
Reviews
“Kirill is the best teacher I’ve found online. He uses real life examples and explains common problems so that you get a deeper understanding of the coursework. He also provides a lot of insight as to what it means to be a data scientist from working with insufficient data all the way to presenting your work to C-class management. I highly recommend this course for beginner students to intermediate data analysts!”
“This course has been absolutely amazing. Very valuable actually being *shown* the whole process of data science while working through it yourself.”
“Outstanding content delivered in a user-friendly way. Kirill has a natural ability to teach. Everything is explained to the exact level of detail you would need with no assumptions made of previous knowledge. Highly recommended.”
Link to reviews (bottom of the page).
A Great Python-focused Introduction
Intro to Data Analysis by Udacity
Udacity’s Intro to Data Analysis is a relatively new offering that is part of Udacity’s popular Data Analyst Nanodegree. It covers the data science process clearly and cohesively using Python, though it lacks a bit in the modeling aspect. It has a 5-star rating over one review.
The videos are well-produced and the instructor (Caroline Buckey) is clear and personable. Lots of programming quizzes reinforce the concepts learned in the videos. Students will leave the course confident in their new and/or improved NumPy and Pandas skills (these are popular Python libraries). The final project, which is graded and reviewed in the Nanodegree but not in the free individual course, can be a nice addition to a portfolio.
Listed below are the details for the course, including its description and syllabus.
Intro to Data Analysis
Basic Information
Instructors: Caroline Buckey
Platform: Udacity
Pace: Self-paced
Cost: Free
Estimated timeline: Six weeks at six hours per week (for a total of 36 hours), though it is shorter in my experience.
Description
This course will introduce you to the world of data analysis. You’ll learn how to go through the entire data analysis process, which includes:
 Posing a question
 Wrangling your data into a format you can use and fixing any problems with it
 Exploring the data, finding patterns in it, and building your intuition about it
 Drawing conclusions and/or making predictions
 Communicating your findings
You’ll also learn how to use the Python libraries NumPy, Pandas, and Matplotlib to write code that’s cleaner, more concise, and runs faster.
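The five steps listed above can be sketched in pure Python, in the spirit of the course's first lesson. This is only a toy walk-through under stated assumptions: the data, the column names, and the question are all invented for illustration and are not taken from the course.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Toy data invented for this sketch; a real analysis would load a file instead.
RAW_CSV = """student,lesson,minutes
a,1,30
a,2,
b,1,45
b,2,60
c,1,15
"""

# 1. Pose a question: which lesson do students spend the most time on, on average?

# 2. Wrangle: parse the text and drop rows with a missing value.
rows = []
for row in csv.DictReader(io.StringIO(RAW_CSV)):
    if row["minutes"] == "":
        continue  # skip incomplete records
    rows.append({"lesson": int(row["lesson"]), "minutes": float(row["minutes"])})

# 3. Explore: group minutes by lesson.
minutes_by_lesson = defaultdict(list)
for row in rows:
    minutes_by_lesson[row["lesson"]].append(row["minutes"])

# 4. Draw a conclusion: compare the per-lesson averages.
average_by_lesson = {lesson: mean(values) for lesson, values in minutes_by_lesson.items()}
longest_lesson = max(average_by_lesson, key=average_by_lesson.get)

# 5. Communicate the finding.
print(f"Lesson {longest_lesson} has the highest average time: "
      f"{average_by_lesson[longest_lesson]:.1f} minutes")
```

Libraries such as NumPy and Pandas replace the explicit loops in steps 2 through 4 with shorter, vectorized calls, which is the motivation the course description gives for learning them.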
Syllabus
LESSON 1: Data Analysis Process
 Learn about the data analysis process.
 Pose a question, wrangle your data, draw conclusions and/or make predictions.
 Complete an analysis of Udacity student data using pure Python, with few additional libraries.
LESSON 2: NumPy and Pandas for 1D Data
 Start learning to use NumPy and Pandas to make the data analysis process easier.
 Features that apply to one-dimensional data.
 Learn to use NumPy arrays, Pandas Series, and vectorized operations.
LESSON 3: NumPy and Pandas for 2D Data
 Continue learning about NumPy and Pandas, this time focusing on two-dimensional data.
 Learn to use two-dimensional NumPy arrays and Pandas DataFrames.
 Learn to group your data and to combine data from multiple files.
LESSON 4: Investigate a Dataset
 Use NumPy and Pandas to go through the data analysis process on one of a list of recommended datasets.
An Impressive Offering with No Review Data
Data Science Fundamentals by Big Data University
Data Science Fundamentals is a four-course series provided by IBM’s Big Data University. It includes courses titled Data Science 101, Data Science Methodology, Data Science Hands-on with Open Source Tools, and R 101. It covers the full data science process and introduces Python, R, and several other open-source tools. Unfortunately, it has no review data on the major review sites that we used for this analysis, so we can’t recommend it over the above two options yet.
The courses have tremendous production value. The 5-hour “R 101” course at the end isn’t necessary for the purpose of this guide.
Listed below are the details for the specialization, including each course’s description and syllabus.
Data Science Fundamentals
Basic Information
Instructors: Multiple
Platform: Big Data University
Pace: Self-paced
Cost: Free
Estimated timeline: 13–18 hours, depending on if you take the “R 101” course at the end, which isn’t necessary for the purpose of this guide.
Description
Dust off your labcoat and stretch out your fingers and get ready for the journey of a lifetime that will have you see the everyday through a new lens. Looking at mundane events becomes interesting from the speed of your windshield wipers wiping off the rain to the rate of plant growth in ditches along highways under different conditions. As the study that leads into all things pertinent to humans in present, this path is a must for all who have even the slightest interest in this field.
This learning path currently consists of one course that introduces you to Data Science from a practitioner point of view, to courses that discuss topics such as data compilation, preparation and modeling throughout the lifecycle of data science from basic concepts and methodologies to advanced algorithms. It also discusses how to get some practical knowledge with open source tools, and introduces you to one of the most popular programming languages used by data scientists: R.
Syllabus
Course 1: Data Science 101
 Module 1: Defining Data Science
 Module 2: What do data science people do?
 A day in the life of a data science person
 R versus Python?
 Data science tools and technology
 “Regression”
 Module 3: Data Science in Business
 How should companies get started in data science?
 Module 4: Use Cases for Data Science
 Applications for data science
 “The Report Structure”
 Module 5: Data Science People
 Things data science people say
 “What Makes Someone a Data Scientist?”
Course 2: Data Science Methodology
 Module 1: From Problem to Approach
 Business Understanding – Concepts & Case Study
 Analytic Approach – Concepts & Case Study
 Module 2: From Requirements to Collection
 Data Requirements – Concepts & Case Study
 Data Collection – Concepts & Case Study
 Module 3: From Understanding to Preparation
 Data Understanding – Concepts & Case Study
 Data Preparation – Concepts & Case Study
 Module 4: From Modeling to Evaluation
 Modeling – Concepts & Case Study
 Evaluation – Concepts & Case Study
 Module 5: From Deployment to Feedback
 Deployment – Concepts & Case Study
 Feedback – Concepts & Case Study
Course 3: Data Science Hands-on with Open Source Tools
 Module 1: Introducing Data Scientist Workbench
 What is Data Scientist Workbench?
 DSWB Account features
 Creating a DSWB account
 Managing data within My Data
 Preparing data with OpenRefine
 Module 2: Introducing Jupyter Notebooks
 What are Jupyter notebooks?
 Getting started with Jupyter
 Data and Notebooks in Jupyter
 Sharing your Jupyter Notebooks and data
 Apache Spark in Jupyter Notebooks
 Module 3: Introducing Zeppelin Notebooks
 What are Zeppelin Notebooks?
 Zeppelin for Scala
 Getting started with Zeppelin
 Managing your Interpreters in Zeppelin
 Apache Spark in Zeppelin Notebooks
 Module 4: Introducing RStudio IDE
 What is RStudio IDE?
 Uploading files, Installing Packages and loading libraries in RStudio IDE
 Getting started with RStudio IDE
 RStudio Environment and History
 Apache Spark in RStudio IDE
 Module 5: Introducing Seahorse
 What is Seahorse?
 A Glimpse of Seahorse’s Features
 Getting started with Seahorse on DSWB
 Creating and uploading Seahorse Workflows on DSWB
 Exporting and Cloning the Seahorse Examples on DSWB
Course 4: R 101
 Module 1: R basics
 Math, Variables, and Strings
 Vectors and Factors
 Vector operations
 Module 2: Data structures in R
 Arrays & Matrices
 Lists
 Dataframes
 Module 3: R programming fundamentals
 Conditions and loops
 Functions in R
 Objects and Classes
 Debugging
 Module 4: Working with data in R
 Reading CSV and Excel Files
 Reading text files
 Writing and saving data objects to file in R
 Module 5: Strings and Dates in R
 String operations in R
 Regular Expressions
 Dates in R
The Competition
Our #1 pick had a weighted average rating of 4.5 out of 5 stars over 3,071 reviews. Let’s look at the other alternatives, sorted by descending rating. Below you’ll find several R-focused courses, if you are set on an introduction in that language.
 Python for Data Science and Machine Learning Bootcamp (Jose Portilla/Udemy): Full process coverage with a tool-heavy focus (Python). Less process-driven and more of a very detailed intro to Python. Amazing course, though not ideal for the scope of this guide. It, like Jose’s R course below, can double as both intros to Python/R and intros to data science. 21.5 hours of content. It has a 4.7-star weighted average rating over 1,644 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Data Science and Machine Learning Bootcamp with R (Jose Portilla/Udemy): Full process coverage with a tool-heavy focus (R). Less process-driven and more of a very detailed intro to R. Amazing course, though not ideal for the scope of this guide. It, like Jose’s Python course above, can double as both intros to Python/R and intros to data science. 18 hours of content. It has a 4.6-star weighted average rating over 847 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Data Science and Machine Learning with Python - Hands On! (Frank Kane/Udemy): Partial process coverage. Focuses on statistics and machine learning. Decent length (nine hours of content). Uses Python. It has a 4.5-star weighted average rating over 3,104 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Introduction to Data Science (Data Hawk Tech/Udemy): Full process coverage, though limited depth of coverage. Quite short (three hours of content). Briefly covers both R and Python. It has a 4.4-star weighted average rating over 62 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Applied Data Science: An Introduction (Syracuse University/Open Education by Blackboard): Full process coverage, though not evenly spread. Heavily focuses on basic statistics and R. Too applied and not enough process focus for the purpose of this guide. Online course experience feels disjointed. It has a 4.33-star weighted average rating over 6 reviews. Free.
 Introduction To Data Science (Nina Zumel & John Mount/Udemy): Partial process coverage only, though good depth in the data preparation and modeling aspects. Okay length (six hours of content). Uses R. It has a 4.3-star weighted average rating over 101 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Applied Data Science with Python (V2 Maestros/Udemy): Full process coverage with good depth of coverage for each aspect of the process. Decent length (8.5 hours of content). Uses Python. It has a 4.3-star weighted average rating over 92 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Want to be a Data Scientist? (V2 Maestros/Udemy): Full process coverage, though limited depth of coverage. Quite short (3 hours of content). Limited tool coverage. It has a 4.3-star weighted average rating over 790 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Data to Insight: an Introduction to Data Analysis (University of Auckland/FutureLearn): Breadth of coverage unclear. Claims to focus on data exploration, discovery, and visualization. Not offered on demand. 24 hours of content (three hours per week over eight weeks). It has a 4-star weighted average rating over 2 reviews. Free with paid certificate available.
 Data Science Orientation (Microsoft/edX): Partial process coverage (lacks modeling aspect). Uses Excel, which makes sense given it is a Microsoft-branded course. 12–24 hours of content (two to four hours per week over six weeks). It has a 3.95-star weighted average rating over 40 reviews. Free with Verified Certificate available for $25.
 Data Science Essentials (Microsoft/edX): Full process coverage with good depth of coverage for each aspect. Covers R, Python, and Azure ML (a Microsoft machine learning platform). Several 1-star reviews citing tool choice (Azure ML) and the instructor’s poor delivery. 18–24 hours of content (three to four hours per week over six weeks). It has a 3.81-star weighted average rating over 67 reviews. Free with Verified Certificate available for $49.
 Applied Data Science with R (V2 Maestros/Udemy): The R companion to V2 Maestros’ Python course above. Full process coverage with good depth of coverage for each aspect of the process. Decent length (11 hours of content). Uses R. It has a 3.8-star weighted average rating over 212 reviews. Cost varies depending on Udemy discounts, which are frequent.
 Intro to Data Science (Udacity): Partial process coverage, though good depth for the topics covered. Lacks the exploration aspect, though Udacity has a great, full course on exploratory data analysis (EDA). Claims to be 48 hours in length (six hours per week over eight weeks), but is shorter in my experience. Some reviewers think the setup to the advanced content is lacking. Feels disorganized. Uses Python. It has a 3.61-star weighted average rating over 18 reviews. Free.
 Introduction to Data Science in Python (University of Michigan/Coursera): Partial process coverage. No modeling or visualization, though courses #2 and #3 in the Applied Data Science with Python Specialization cover these aspects. Taking all three courses would be too in depth for the purpose of this guide. Uses Python. Four weeks in length. It has a 3.6-star weighted average rating over 15 reviews. Free and paid options available.
 Data-driven Decision Making (PwC/Coursera): Partial coverage (lacks modeling) with a business focus. Introduces many tools, including R, Python, Excel, SAS, and Tableau. Four weeks in length. It has a 3.5-star weighted average rating over 2 reviews. Free and paid options available.
 A Crash Course in Data Science (Johns Hopkins University/Coursera): An extremely brief overview of the full process. Too brief for the purpose of this series. Two hours in length. It has a 3.4-star weighted average rating over 19 reviews. Free and paid options available.
 The Data Scientist’s Toolbox (Johns Hopkins University/Coursera): An extremely brief overview of the full process. More of a setup course for Johns Hopkins University’s Data Science Specialization. Claims to have 4–16 hours of content (one to four hours per week over four weeks), though one reviewer noted it could be completed in two hours. It has a 3.22-star weighted average rating over 182 reviews. Free and paid options available.
 Data Management and Visualization (Wesleyan University/Coursera): Partial process coverage (lacks modeling). Four weeks in length. Good production value. Uses Python and SAS. It has a 2.67-star weighted average rating over 6 reviews. Free and paid options available.
The following courses had no reviews as of January 2017.
 CS109 Data Science (Harvard University): Full process coverage in great depth (probably too in depth for the purpose of this series). A full 12-week undergraduate course. Course navigation is difficult since the course is not designed for online consumption. Actual Harvard lectures are filmed. The above data science process infographic originates from this course. Uses Python. No review data. Free.
 Introduction to Data Analytics for Business (University of Colorado Boulder/Coursera): Partial process coverage (lacks modeling and visualization aspects) with a focus on business. The data science process is disguised as the “Information-Action Value chain” in their lectures. Four weeks in length. Describes several tools, though only covers SQL in any depth. No review data. Free and paid options available.
 Introduction to Data Science (Lynda): Full process coverage, though limited depth of coverage. Quite short (three hours of content). Introduces both R and Python. No review data. Cost depends on Lynda subscription.
About Class Central Career Guides
Class Central Career Guides are recommendations for the best online courses and MOOCs. They have one goal: to enable you to quickly figure out which courses can help you learn new skills and advance your career. Our editorial picks are thoroughly researched using reviews written by Class Central users, as well as data from other sources and our own subjective analysis.
These guides are updated frequently to always reflect the best in online education.
Drop us a note at guides@classcentral.com if you have any feedback or requests for particular career guides — it will help us prioritize. Also, reach out to us if you want to help us create more of these career guides. We are looking for contributors!
Author Bio
David Venturi created a personalized data science master’s curriculum for himself using MOOCs. He has a dual degree in Chemical Engineering and Economics, and especially enjoys math, stats, and coding. He’s a huge baseball and hockey fan, and writes about the latter with a focus on analytics.