
# Udacity: Intro to Data Science

with Dave Holtz

The Introduction to Data Science class will survey the foundational topics in data science, namely:

* Data Manipulation
* Data Analysis with Statistics and Machine Learning
* Data Communication with Information Visualization
* Data at Scale -- Working with Big Data

The class will focus on breadth and present the topics briefly instead of focusing on a single topic in depth. This will give you the opportunity to sample and apply the basic techniques of data science.

This course is also a part of our Data Analyst Nanodegree.

## Why Take This Course?

You will have an opportunity to work through a data science project end to end, from analyzing a dataset to visualizing and communicating your data analysis.

Through working on the class project, you will be exposed to and understand the skills that are needed to become a data scientist yourself.

## Syllabus

### Lesson 1: Introduction to Data Science

- Introduction to Data Science
- What is a Data Scientist
- Pi-Chuan (Data Scientist @ Google): What is Data Science?
- Gabor (Data Scientist @ Twitter): What is Data Science?
- Problems Solved by Data Science
- Pandas
- Dataframes
- Create a New Dataframe

### Lesson 2: Data Wrangling

- What is Data Wrangling?
- Acquiring Data
- Common Data Formats
- What are Relational Databases?
- Aadhaar Data and Relational Databases
- Introduction to Database Schemas
- APIs
- Data in JSON Format
- How to Access an API Efficiently
- Missing Values
- Easy Imputation
- Impute using Linear Regression
- Tip of the Imputation Iceberg
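
As a rough illustration of the "Easy Imputation" topic listed above, here is a minimal sketch of mean imputation: each missing value is replaced by the mean of the observed values. It is not taken from the course materials; the function name `impute_with_mean` and the sample `weights` data are hypothetical, and missing entries are assumed to be represented as `None`.

```python
from statistics import mean


def impute_with_mean(values):
    """Return a copy of `values` with None entries replaced by the mean
    of the observed (non-None) entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Hypothetical data: five weights, two of them missing.
weights = [60.0, None, 80.0, 70.0, None]
imputed = impute_with_mean(weights)
print(imputed)
```

Mean imputation keeps the column mean unchanged but shrinks its variance, which is one reason regression-based imputation is often considered as a next step.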

### Lesson 3: Data Analysis

- Statistical Rigor
- Kurt (Data Scientist @ Twitter): Why is Stats Useful?
- Introduction to Normal Distribution
- T Test
- Welch T Test
- Non-Parametric Tests
- Non-Normal Data
- Stats vs. Machine Learning
- Different Types of Machine Learning
- Prediction with Regression
- Cost Function
- How to Minimize Cost Function
- Coefficient of Determination
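
The last few topics in this lesson (cost function, minimizing it, and the coefficient of determination) fit together in a small example. The sketch below is not the course's own code: it assumes a single-feature linear model `theta0 + theta1 * x`, a squared-error cost, and plain batch gradient descent. The function names, learning rate, and data are all hypothetical.

```python
def compute_cost(xs, ys, theta0, theta1):
    """Squared-error cost: average of squared residuals, divided by 2."""
    m = len(xs)
    return sum((theta0 + theta1 * x - y) ** 2 for x, y in zip(xs, ys)) / (2 * m)


def gradient_descent(xs, ys, alpha=0.05, iterations=5000):
    """Repeatedly step both parameters against the gradient of the cost."""
    theta0, theta1 = 0.0, 0.0
    m = len(xs)
    for _ in range(iterations):
        errors = [theta0 + theta1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m                               # d(cost)/d(theta0)
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m    # d(cost)/d(theta1)
        theta0 -= alpha * grad0
        theta1 -= alpha * grad1
    return theta0, theta1


def r_squared(xs, ys, theta0, theta1):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (theta0 + theta1 * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

theta0, theta1 = gradient_descent(xs, ys)
print(theta0, theta1, compute_cost(xs, ys, theta0, theta1), r_squared(xs, ys, theta0, theta1))
```

Each iteration moves the parameters a small step in the direction that lowers the cost, so on this noise-free data the fit should approach an intercept near 1 and a slope near 2.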

### Lesson 4: Data Visualization

- Effective Information Visualization
- Napoleon's March on Russia
- Don (Principal Data Scientist @ AT&T): Communicating Findings
- Rishiraj (Principal Data Scientist @ AT&T): Communicating Findings Well
- Visual Encodings
- Perception of Visual Cues
- Plotting in Python
- Data Scales
- Visualizing Time Series Data

### Lesson 5: MapReduce

- Big Data and MapReduce
- Basics of MapReduce
- Mapper
- Reducer
- MapReduce with Subway Data
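
To make the mapper and reducer roles listed above concrete, here is a minimal single-process sketch of the MapReduce pattern. It is only an illustration under assumptions: the record format (`station,entries`), the station IDs, and the helper `run_mapreduce` are hypothetical and are not the course's subway dataset or any real MapReduce framework's API.

```python
from collections import defaultdict


def mapper(line):
    """Map step: turn one 'station,entries' record into a (key, value) pair."""
    station, entries = line.strip().split(",")
    yield station, int(entries)


def reducer(key, values):
    """Reduce step: combine all values that share a key into one total."""
    return key, sum(values)


def run_mapreduce(lines):
    """Simulate the shuffle phase locally by grouping mapper output by key,
    then apply the reducer to each group."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


# Hypothetical turnstile records: station ID and number of entries.
records = ["R001,10", "R002,5", "R001,7", "R002,3", "R003,1"]
totals = run_mapreduce(records)
print(totals)
```

In a real cluster the mapper and reducer would run on different machines and the framework would handle the grouping; the local dictionary here only stands in for that step.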

Course details:

* Cost: Free Online Course
* Pace: Self Paced
* Subject: Data Science
* Provider: Udacity
* Language: English
* Hours: 6 hours a week
* Calendar: 8 weeks long


##### FAQ
What are MOOCs?
MOOC stands for Massive Open Online Course. These are free online courses from universities around the world (e.g., Stanford, Harvard, MIT) offered to anyone with an internet connection.
How do I register?
To register for a course, click the "Go to Class" button on the course page. This will take you to the provider's website, where you can register for the course.
How do these MOOCs or free online courses work?
MOOCs are designed for an online audience, teaching primarily through short (5-20 min.) pre-recorded video lectures that you watch on a weekly schedule when convenient for you. They also have student discussion forums, homework/assignments, and online quizzes or exams.

## Reviews for Udacity's Intro to Data Science

4.2, based on 13 reviews

• 5 stars 38%
• 4 stars 38%
• 3 stars 23%
• 2 stars 0%
• 1 star 0%

4.0 4 years ago
Completed this course.
Intro to Data Science is an intermediate-level course that assumes basic Python programming skills and knowledge of statistics. The course focuses on gathering, manipulating, analyzing and visualizing data using Python and various Python packages such as numpy, scipy and pandas. One of the best parts about this course is getting some exposure to Python packages in the scipy stack, although I wish more time was devoted to explaining what the various modules in the scipy stack do, how to set them up at home and when to use them.

The first lesson was a fairly gentle introduction with an interesting homework project dealing with data from the Titanic disaster. Lesson 2 goes into more detail about gathering and cleaning data using Pandas and an additional module that lets you make SQL queries to extract data from Pandas data frames. Lesson 3 jumps into data analysis with a T test and linear regression using gradient descent. Going from basic data manipulation into these topics was a bit jarring in terms of difficulty, and more time could have been spent explaining how the functions worked. I left without a great appreciation of what gradient descent is really doing. Lesson 4 is focused on making visualizations using a module that attempts to port the functionality of the R language's ggplot2 plotting package. Finally, lesson 5 introduces the concept of big data and MapReduce as a solution for dealing with large data sets. Each homework assignment after the first has students dealing with New York subway turnstile data, which allows students to get some level of familiarity with the data throughout the course. This was a very good decision, since it lets students focus on learning new concepts rather than spending time familiarizing themselves with new data sets over and over again.
4.0 3 years ago
Completed this course and found the course difficulty to be medium.
It provides an introduction to many areas but does not go into depth in any of them. For more advanced classes, look for other courses on Udacity. Good as an introduction.
3.0 2 years ago
Partially completed this course, spending 5 hours a week on it, and found the course difficulty to be easy.
Though the course uses interesting examples for teaching concepts in relation to data science, the over-reliance on the online grader for practice often makes learning redundant. A big part of learning programming is experimentation, which the grader does not allow for.
5.0 4 months ago
Is taking this course right now, spending 8 hours a week on it, and found the course difficulty to be medium.
I was skeptical when I enrolled in UDACITY's Data Analysis Nano Degree Program, but not only have they provided the experience they said they would, they have steadily made improvements since I enrolled. How many times in your life have you had that experience? Here are SOME of the improvements they have made while I have been enrolled. Initially one could get one-on-one help; usually it was 1 to 2 days out, but then it was video chat.

This was great. I had tried a competitor's course, and sometimes one just cannot figure out why something is not working. But not with Udacity. Then they scrapped that and instituted a MENTOR program. Here one could instant message someone who would get back to you in a few hours. Then they scrapped that and now offer LIVE HELP. It is a chat box that one types the gist of your question into. In less than 10 min, often in 3 min, someone comes on. Usually they can immediately figure out your mistake (it seems students make a finite # of errors), but if they can't, they ask you to copy and paste your code. And if they still cannot figure it out, i.e., if you have really made a mess of things, they do a screen sharing session to get you back on the rails. Don't make a mistake. Just sign up for Udacity.
4.0 3 years ago
Completed this course.
3.0 2 years ago
Is taking this course right now.
4.0 3 years ago
Is taking this course right now.
5.0 3 years ago
Is taking this course right now.
4.0 3 years ago
Is taking this course right now.
5.0 2 years ago
Partially completed this course.
5.0 11 months ago
Partially completed this course.