23 minute read. Published on January 25, 2017

Editor’s note: Drop us a note at guides@class-central.com if you have any feedback or requests for particular career guides. We are also looking for contributors!

Here are the parts of the series that have been published so far:

  1. The Best Intro to Programming Courses for Data Science
  2. The Best Statistics & Probability Courses for Data Science
  3. The Best Intro to Data Science Courses (this one)
  4. The Best Data Visualization Courses

The Best Intro To Data Science Courses

Our pick

The best online introduction to data science course is Kirill Eremenko’s “Data Science A-Z.” The course, which has a 4.5-star weighted average rating over 3,071 reviews, is among the highest rated and most reviewed courses of the ones considered. It is the clear winner in terms of breadth and depth of coverage of the data science process. The instructor’s natural teaching ability is frequently praised by reviewers.

Data Science A-Z™: Real-Life Data Science Exercises Included by Kirill Eremenko on Udemy

A great Python-focused introduction

Udacity’s Intro to Data Analysis covers the data science process cohesively using Python, though it lacks a bit in the modeling aspect. It has a 5-star rating over one review. It is a relatively new offering that is part of Udacity’s popular Data Analyst Nanodegree. The videos are well-produced and the instructor (Caroline Buckey) is clear and personable.

Intro to Data Analysis by Udacity

An impressive offering with no review data

Data Science Fundamentals is a four-course series provided by Big Data University, which is an IBM initiative. The series covers the full data science process and introduces Python, R, and several other open-source tools. The courses have tremendous production value. Unfortunately, they have no review data on the major review sites that were used for this analysis.

Data Science Fundamentals by Big Data University

Table of Contents

  1. Why You Should Trust Us
  2. About the Data Science Career Guide
  3. How We Picked Courses to Consider
  4. How We Tested
  5. What is the Data Science Process?
  6. Basic Coding, Stats, and Probability Experience Required
  7. Our Pick
  8. A Great Python-focused Introduction
  9. An Impressive Offering with No Review Data
  10. The Competition
  11. About Class Central Career Guides
  12. Author Bio

Why You Should Trust Us

I started creating my own data science master’s degree using online courses almost a year ago. I have taken many data science-related courses and audited portions of many more. I know the options out there, and what skills are needed for learners preparing for a data analyst or data scientist role.

For this guide, I spent 10+ hours trying to identify every online intro to data science course offered as of January 2017, extracting key bits of information from their syllabi and reviews, and compiling their ratings. For this task, I turned to none other than the open source Class Central community and its database of thousands of course ratings and reviews.

Class Central Home Page

Since 2011, Class Central founder Dhawal Shah has kept a closer eye on online courses than arguably anyone else in the world. Dhawal personally helped me assemble this list of resources.

About the Data Science Career Guide

Class Central’s Data Science Career Guide is a six-piece series that recommends the best MOOCs for launching yourself into the data science industry. The first five pieces recommend the best courses for several data science core competencies (programming, statistics, the data science process, data visualization, and machine learning). The final piece is a summary of those courses and the best MOOCs for other key topics such as data wrangling, databases, and even software engineering.

P.S. If you are looking for a complete list of Data Science MOOCs, you can find them on Class Central’s Data Science and Big Data subject page.

How We Picked Courses to Consider

Each course must fit three criteria:

  1. It must teach the data science process. More on that soon.
  2. It must be on-demand or offered every few months.
  3. It must be an interactive online course, so no books or read-only tutorials. Though these are viable ways to learn, this guide focuses on courses.

We believe we covered every notable course that fits the above criteria. Since there are seemingly hundreds of courses on Udemy, we chose to consider the most-reviewed and highest-rated ones only. There’s always a chance that we missed something, though. So please let us know in the comments section if we left a good course out.

How We Tested

We compiled average rating and number of reviews from Class Central and other review sites to calculate a weighted average rating for each course. We read text reviews and used this feedback to supplement the numerical ratings.
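To make the weighted average concrete, here is a minimal Python sketch. It assumes each site's average rating is weighted by that site's number of reviews; the function name and the sample figures are hypothetical and are not taken from our data.

```python
def weighted_average_rating(site_ratings):
    """Combine per-site (average_rating, review_count) pairs into one rating.

    Assumption: each site's average is weighted by its number of reviews.
    Returns (weighted_average, total_reviews), or (None, 0) if there are
    no reviews at all.
    """
    total_reviews = sum(count for _, count in site_ratings)
    if total_reviews == 0:
        return None, 0
    weighted_sum = sum(rating * count for rating, count in site_ratings)
    return weighted_sum / total_reviews, total_reviews


# Hypothetical figures for one course listed on two review sites.
example = [(4.5, 300), (3.5, 100)]
average, reviews = weighted_average_rating(example)
print(f"{average:.2f} stars over {reviews} reviews")
```

Weighting by review count means a site with only a handful of reviews can't swing a course's overall rating as much as a site with hundreds.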

We made subjective syllabus judgment calls based on two factors:

  1. Coverage of the data science process. Does the course brush over or skip certain subjects? Does it cover certain subjects in too much detail? See the next section for what this process entails.
  2. Usage of common data science tools. Is the course taught using popular programming languages like Python and/or R? These aren’t necessary, but they are helpful in most cases, so slight preference is given to courses that use them.

What is the Data Science Process?

What is data science? What does a data scientist do? These are the types of fundamental questions that an intro to data science course should answer. The following infographic from Harvard professors Joe Blitzstein and Hanspeter Pfister outlines a typical data science process, which will help us answer these questions.

Data Science Process

Viz from Opera Solutions.

Our goal with this introduction to data science course is to become familiar with the data science process. We don’t want in-depth coverage of specific aspects of the process, hence the “intro to” portion of the title. For each aspect, the ideal course explains key concepts within the framework of the process, introduces common tools, and provides a few examples (preferably hands-on).

We are only looking for an introduction. This guide therefore won’t include full specializations or programs like Johns Hopkins University’s Data Science Specialization on Coursera or Udacity’s Data Analyst Nanodegree. These compilations of courses fall outside the purpose of this series: to find the best individual courses for each subject to comprise a data science education. The next guides in the series will cover each aspect of the data science process in detail.

Basic Coding, Stats, and Probability Experience Required

Several courses listed below require basic programming, statistics, and probability experience. This requirement is understandable given that the new content is reasonably advanced and that these subjects often have several courses dedicated to them.

This experience can be acquired through our recommendations in the first two articles (programming, statistics) in this Data Science Career Guide.

Our Pick

Data Science A-Z™: Real-Life Data Science Exercises Included by Kirill Eremenko on Udemy

Kirill Eremenko’s “Data Science A-Z” on Udemy is the clear winner in terms of breadth and depth of coverage of the data science process of the 20+ courses that qualified. It has a 4.5-star weighted average rating over 3,071 reviews, which places it among the highest rated and most reviewed courses of the ones considered. It outlines the full process, makes it clear that it can be iterative, and provides real-life examples. Reviewers love the instructor’s delivery and the organization of the content.

Though it doesn’t check our “usage of common data science tools” box, the non-Python/R tool choices (gretl, Tableau, Excel) are used effectively in context. Eremenko mentions the following when explaining the gretl choice (gretl is a statistical software package), though it applies to all of the tools he uses (emphasis mine):

In gretl, we will be able to do the same modeling just like in R and Python but we won’t have to code. That’s the big deal here. Some of you may already know R very well, but some may not know it at all. My goal is to show you how to build a robust model and give you a framework that you can apply in any tool you choose. gretl will help us avoid getting bogged down in our coding. 

Listed below are the details for the course, including its description, syllabus, and prominent reviews.

Data Science A-Z™: Real-Life Data Science Exercises Included

Basic Information

Instructor: Kirill Eremenko, SuperDataScience Team

Platform: Udemy

Pace: Self-paced

Cost: Varies depending on Udemy discounts, which are frequent. Can be purchased for as little as $10.

Estimated timeline: 21 hours

Kirill Eremenko in Data Science A-Z Class

Description

This course will give you a full overview of the Data Science journey. Upon completing this course you will know:

  • How to clean and prepare your data for analysis
  • How to perform basic visualization of your data
  • How to model your data
  • How to curve-fit your data
  • And finally, how to present your findings and wow the audience

This course will give you so much practical exercises that real world will seem like a piece of cake when you graduate this class. This course has homework exercises that are so thought provoking and challenging that you will want to cry… But you won’t give up! You will crush it. In this course you will develop a good understanding of the following tools:

  • SQL
  • SSIS
  • Tableau
  • Gretl

This course has pre-planned pathways. Using these pathways you can navigate the course and combine sections into YOUR OWN journey that will get you the skills that YOU need.

Or you can do the whole course and set yourself up for an incredible career in Data Science. The choice is yours. Join the class and start learning today!

Syllabus

Sections

  • 1: Get Excited
  • 2: What is Data Science?
  • 3: Part 1: Visualisation
  • 4: Introduction to Tableau
  • 5: How to use Tableau for Data Mining
  • 6: Advanced Data Mining With Tableau
  • 7: Part 2: Modelling
  • 8: Stats Refresher
  • 9: Simple Linear Regression
  • 10: Multiple Linear Regression
  • 11: Logistic Regression
  • 12: Building a robust geodemographic segmentation model
  • 13: Assessing your model
  • 14: Drawing insights from your model
  • 15: Model maintenance
  • 16: Part 3: Data Preparation
  • 17: Business Intelligence (BI) Tools
  • 18: ETL Phase 1: Data Wrangling before the Load
  • 19: ETL Phase 2: Step-by-step guide to uploading data using SSIS
  • 20: Handling errors during ETL (Phases 1 & 2)
  • 21: SQL Programming for Data Science
  • 22: ETL Phase 3: Data Wrangling after the load
  • 23: Handling errors during ETL (Phase 3)
  • 24: Part 4: Communication
  • 25: Working with people
  • 26: Presenting for Data Scientists
  • 27: Homework Solutions
  • 28: Bonus Lectures

Reviews

“Kirill is the best teacher I’ve found online. He uses real life examples and explains common problems so that you get a deeper understanding of the coursework. He also provides a lot of insight as to what it means to be a data scientist from working with insufficient data all the way to presenting your work to C-class management. I highly recommend this course for beginner students to intermediate data analysts!”

“This course has been absolutely amazing. Very valuable actually being *shown* the whole process of data science while working through it yourself.”

“Outstanding content delivered in a user-friendly way. Kirill has a natural ability to teach. Everything is explained to the exact level of detail you would need with no assumptions made of previous knowledge. Highly recommended.”

Link to reviews (bottom of the page).

A Great Python-focused Introduction

Intro to Data Analysis by Udacity

Udacity’s Intro to Data Analysis is a relatively new offering that is part of Udacity’s popular Data Analyst Nanodegree. It covers the data science process clearly and cohesively using Python, though it lacks a bit in the modeling aspect. It has a 5-star rating over one review.

The videos are well-produced and the instructor (Caroline Buckey) is clear and personable. Lots of programming quizzes reinforce the concepts learned in the videos. Students will leave the course confident in their new and/or improved NumPy and Pandas skills (these are popular Python libraries). The final project, which is graded and reviewed in the Nanodegree but not in the free individual course, can be a nice addition to a portfolio.

Listed below are the details for the course, including its description and syllabus.

Intro to Data Analysis

Basic Information

Instructor: Caroline Buckey

Platform: Udacity

Pace: Self-paced

Cost: Free

Estimated timeline: Six weeks at six hours per week (for a total of 36 hours), though it is shorter in my experience.

Udacity Instructor Caroline Buckey

Description

This course will introduce you to the world of data analysis. You’ll learn how to go through the entire data analysis process, which includes:

  • Posing a question
  • Wrangling your data into a format you can use and fixing any problems with it
  • Exploring the data, finding patterns in it, and building your intuition about it
  • Drawing conclusions and/or making predictions
  • Communicating your findings

You’ll also learn how to use the Python libraries NumPy, Pandas, and Matplotlib to write code that’s cleaner, more concise, and runs faster.

Syllabus

LESSON 1: Data Analysis Process

  • Learn about the data analysis process.
  • Pose a question, wrangle your data, draw conclusions and/or make predictions.
  • Complete an analysis of Udacity student data using pure Python, with few additional libraries.

LESSON 2: NumPy and Pandas for 1D Data

  • Start learning to use NumPy and Pandas to make the data analysis process easier.
  • Features that apply to one-dimensional data.
  • Learn to use NumPy arrays, Pandas Series, and vectorized operations.

LESSON 3: NumPy and Pandas for 2D Data

  • Continue learning about NumPy and Pandas, this time focusing on two-dimensional data.
  • Learn to use two-dimensional NumPy arrays and Pandas DataFrames.
  • Group your data and combine data from multiple files.

LESSON 4: Investigate a Dataset

  • Use NumPy and Pandas to go through the data analysis process on one of a list of recommended datasets.

An Impressive Offering with No Review Data

Data Science Fundamentals by Big Data University

Data Science Fundamentals is a four-course series provided by IBM’s Big Data University. It includes courses titled Data Science 101, Data Science Methodology, Data Science Hands-on with Open Source Tools, and R 101. It covers the full data science process and introduces Python, R, and several other open-source tools. Unfortunately, it has no review data on the major review sites that we used for this analysis, so we can’t recommend it over the above two options yet.

The courses have tremendous production value. The 5-hour “R 101” course at the end isn’t necessary for the purpose of this guide.

Listed below are the details for the series, including each course’s description and syllabus.

Data Science Fundamentals

Basic Information

Instructors: Multiple

Platform: Big Data University

Pace: Self-paced

Cost: Free

Estimated timeline: 13–18 hours, depending on whether you take the “R 101” course at the end, which isn’t necessary for the purpose of this guide.

Murtaza Haider

Description

Dust off your lab-coat and stretch out your fingers and get ready for the journey of a lifetime that will have you see the everyday through a new lens. Looking at mundane events becomes interesting from the speed of your windshield wipers wiping off the rain to the rate of plant growth in ditches along highways under different conditions. As the study that leads into all things pertinent to humans in present, this path is a must for all who have even the slightest interest in this field.

This learning path currently consists of one course that introduces you to Data Science from a practitioner point of view, to courses that discuss topics such as data compilation, preparation and modeling throughout the life-cycle of data science from basic concepts and methodologies to advanced algorithms. It also discusses how to get some practical knowledge with open source tools, and introduces you to one of the most popular programming languages used by data scientists: R.

Syllabus

Course 1: Data Science 101

  • Module 1: Defining Data Science
    • What is data science?
    • There are many paths to data science
    • Any advice for a new data scientist?
    • What is the cloud?
    • “Data Science: The Sexiest Job in the 21st Century”
  • Module 2: What do data science people do?
    • A day in the life of a data science person
    • R versus Python?
    • Data science tools and technology
    • “Regression”
  • Module 3: Data Science in Business
    • How should companies get started in data science?
    • Tips for recruiting data science people
    • “The Final Deliverable”
  • Module 4: Use Cases for Data Science
    • Applications for data science
    • “The Report Structure”
  • Module 5: Data Science People
    • Things data science people say
    • “What Makes Someone a Data Scientist?”

Course 2: Data Science Methodology

  • Module 1: From Problem to Approach
    • Business Understanding – Concepts & Case Study
    • Analytic Approach – Concepts & Case Study
  • Module 2: From Requirements to Collection
    • Data Requirements – Concepts & Case Study
    • Data Collection – Concepts & Case Study
  • Module 3: From Understanding to Preparation
    • Data Understanding – Concepts & Case Study
    • Data Preparation – Concepts & Case Study
  • Module 4: From Modeling to Evaluation
    • Modeling – Concepts & Case Study
    • Evaluation – Concepts & Case Study
  • Module 5: From Deployment to Feedback
    • Deployment – Concepts & Case Study
    • Feedback – Concepts & Case Study

Course 3: Data Science Hands-on with Open Source Tools

  • Module 1: Introducing Data Scientist Workbench
    • What is Data Scientist Workbench?
    • DSWB Account features
    • Creating a DSWB account
    • Managing data within My Data
    • Preparing data with OpenRefine
  • Module 2: Introducing Jupyter Notebooks
    • What are Jupyter notebooks?
    • Getting started with Jupyter
    • Data and Notebooks in Jupyter
    • Sharing your Jupyter Notebooks and data
    • Apache Spark in Jupyter Notebooks
  • Module 3: Introducing Zeppelin Notebooks
    • What are Zeppelin Notebooks?
    • Zeppelin for Scala
    • Getting started with Zeppelin
    • Managing your Interpreters in Zeppelin
    • Apache Spark in Zeppelin Notebooks
  • Module 4: Introducing RStudio IDE
    • What is RStudio IDE?
    • Uploading files, Installing Packages and loading libraries in RStudio IDE
    • Getting started with RStudio IDE
    • RStudio Environment and History
    • Apache Spark in RStudio IDE
  • Module 5: Introducing Seahorse
    • What is Seahorse?
    • A Glimpse of Seahorse’s Features
    • Getting started with Seahorse on DSWB
    • Creating and uploading Seahorse Workflows on DSWB
    • Exporting and Cloning the Seahorse Examples on DSWB

Course 4: R 101

  • Module 1: R basics
    • Math, Variables, and Strings
    • Vectors and Factors
    • Vector operations
  • Module 2: Data structures in R
    • Arrays & Matrices
    • Lists
    • Dataframes
  • Module 3: R programming fundamentals
    • Conditions and loops
    • Functions in R
    • Objects and Classes
    • Debugging
  • Module 4: Working with data in R
    • Reading CSV and Excel Files
    • Reading text files
    • Writing and saving data objects to file in R
  • Module 5: Strings and Dates in R
    • String operations in R
    • Regular Expressions
    • Dates in R

The Competition

Our #1 pick had a weighted average rating of 4.5 out of 5 stars over 3,071 reviews. Let’s look at the other alternatives, sorted by descending rating. Below you’ll find several R-focused courses, if you are set on an introduction in that language.

  • Python for Data Science and Machine Learning Bootcamp (Jose Portilla/Udemy): Full process coverage with a tool-heavy focus (Python). Less process-driven and more of a very detailed intro to Python. Amazing course, though not ideal for the scope of this guide. It, like Jose’s R course below, can double as both intros to Python/R and intros to data science. 21.5 hours of content. It has a 4.7-star weighted average rating over 1,644 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Data Science and Machine Learning Bootcamp with R (Jose Portilla/Udemy): Full process coverage with a tool-heavy focus (R). Less process-driven and more of a very detailed intro to R. Amazing course, though not ideal for the scope of this guide. It, like Jose’s Python course above, can double as both intros to Python/R and intros to data science. 18 hours of content. It has a 4.6-star weighted average rating over 847 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Data Science and Machine Learning with Python — Hands On! (Frank Kane/Udemy): Partial process coverage. Focuses on statistics and machine learning. Decent length (nine hours of content). Uses Python. It has a 4.5-star weighted average rating over 3,104 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Introduction to Data Science (Data Hawk Tech/Udemy): Full process coverage, though limited depth of coverage. Quite short (three hours of content). Briefly covers both R and Python. It has a 4.4-star weighted average rating over 62 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Applied Data Science: An Introduction (Syracuse University/Open Education by Blackboard): Full process coverage, though not evenly spread. Heavily focuses on basic statistics and R. Too applied and not enough process focus for the purpose of this guide. Online course experience feels disjointed. It has a 4.33-star weighted average rating over 6 reviews. Free.
  • Introduction To Data Science (Nina Zumel & John Mount/Udemy): Partial process coverage only, though good depth in the data preparation and modeling aspects. Okay length (six hours of content). Uses R. It has a 4.3-star weighted average rating over 101 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Applied Data Science with Python (V2 Maestros/Udemy): Full process coverage with good depth of coverage for each aspect of the process. Decent length (8.5 hours of content). Uses Python. It has a 4.3-star weighted average rating over 92 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Want to be a Data Scientist? (V2 Maestros/Udemy): Full process coverage, though limited depth of coverage. Quite short (3 hours of content). Limited tool coverage. It has a 4.3-star weighted average rating over 790 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Data to Insight: an Introduction to Data Analysis (University of Auckland/FutureLearn): Breadth of coverage unclear. Claims to focus on data exploration, discovery, and visualization. Not offered on demand. 24 hours of content (three hours per week over eight weeks). It has a 4-star weighted average rating over 2 reviews. Free with paid certificate available.
  • Data Science Orientation (Microsoft/edX): Partial process coverage (lacks modeling aspect). Uses Excel, which makes sense given it is a Microsoft-branded course. 12–24 hours of content (two to four hours per week over six weeks). It has a 3.95-star weighted average rating over 40 reviews. Free with Verified Certificate available for $25.
  • Data Science Essentials (Microsoft/edX): Full process coverage with good depth of coverage for each aspect. Covers R, Python, and Azure ML (a Microsoft machine learning platform). Several 1-star reviews citing tool choice (Azure ML) and the instructor’s poor delivery. 18–24 hours of content (three to four hours per week over six weeks). It has a 3.81-star weighted average rating over 67 reviews. Free with Verified Certificate available for $49.
  • Applied Data Science with R (V2 Maestros/Udemy): The R companion to V2 Maestros’ Python course above. Full process coverage with good depth of coverage for each aspect of the process. Decent length (11 hours of content). Uses R. It has a 3.8-star weighted average rating over 212 reviews. Cost varies depending on Udemy discounts, which are frequent.
  • Intro to Data Science (Udacity): Partial process coverage, though good depth for the topics covered. Lacks the exploration aspect, though Udacity has a great, full course on exploratory data analysis (EDA). Claims to be 48 hours in length (six hours per week over eight weeks), but is shorter in my experience. Some reviewers think the set-up to the advanced content is lacking. Feels disorganized. Uses Python. It has a 3.61-star weighted average rating over 18 reviews. Free.
  • Introduction to Data Science in Python (University of Michigan/Coursera): Partial process coverage. No modeling or visualization, though courses #2 and #3 in the Applied Data Science with Python Specialization cover these aspects. Taking all three courses would be too in depth for the purpose of this guide. Uses Python. Four weeks in length. It has a 3.6-star weighted average rating over 15 reviews. Free and paid options available.
  • Data-driven Decision Making (PwC/Coursera): Partial coverage (lacks modeling) with a business focus. Introduces many tools, including R, Python, Excel, SAS, and Tableau. Four weeks in length. It has a 3.5-star weighted average rating over 2 reviews. Free and paid options available.
  • A Crash Course in Data Science (Johns Hopkins University/Coursera): An extremely brief overview of the full process. Too brief for the purpose of this series. Two hours in length. It has a 3.4-star weighted average rating over 19 reviews. Free and paid options available.
  • The Data Scientist’s Toolbox (Johns Hopkins University/Coursera): An extremely brief overview of the full process. More of a set-up course for Johns Hopkins University’s Data Science Specialization. Claims to have 4–16 hours of content (one to four hours per week over four weeks), though one reviewer noted it could be completed in two hours. It has a 3.22-star weighted average rating over 182 reviews. Free and paid options available.
  • Data Management and Visualization (Wesleyan University/Coursera): Partial process coverage (lacks modeling). Four weeks in length. Good production value. Uses Python and SAS. It has a 2.67-star weighted average rating over 6 reviews. Free and paid options available.

The following courses had no reviews as of January 2017.

  • CS109 Data Science (Harvard University): Full process coverage in great depth (probably too in depth for the purpose of this series). A full 12-week undergraduate course. Course navigation is difficult since the course is not designed for online consumption. Actual Harvard lectures are filmed. The above data science process infographic originates from this course. Uses Python. No review data. Free.
  • Introduction to Data Analytics for Business (University of Colorado Boulder/Coursera): Partial process coverage (lacks modeling and visualization aspects) with a focus on business. The data science process is disguised as the “Information-Action Value chain” in their lectures. Four weeks in length. Describes several tools, though only covers SQL in any depth. No review data. Free and paid options available.
  • Introduction to Data Science (Lynda): Full process coverage, though limited depth of coverage. Quite short (three hours of content). Introduces both R and Python. No review data. Cost depends on Lynda subscription.

About Class Central Career Guides

Class Central Career Guides are recommendations for the best online courses and MOOCs. They have one goal: to enable you to quickly figure out which courses can help you learn new skills and advance your career. Our editorial picks are thoroughly researched using reviews written by Class Central users, as well as data from other sources and our own subjective analysis.

These guides are updated frequently to always reflect the best in online education.

Drop us a note at guides@class-central.com if you have any feedback or requests for particular career guides — it will help us prioritize. Also, reach out to us if you want to help us create more of these career guides. We are looking for contributors!

Author Bio

David Venturi

David Venturi created a personalized data science master’s curriculum for himself using MOOCs. He has a dual degree in Chemical Engineering and Economics, and especially enjoys math, stats, and coding. He’s a huge baseball and hockey fan, and writes about the latter with a focus on analytics.
