
edX: Big Data Analysis with Apache Spark

with Anthony D. Joseph
Class Central Course Rank
#2 in Subjects > Data Science > Big Data

HIGHEST RATED MOOC

This course is a Top 50 MOOC of All Time based on thousands of reviews written by Class Central users. It's guaranteed to be good!


Organizations use their data to support and influence decisions and build data-intensive products and services, such as recommendation, prediction, and diagnostic systems. The collection of skills required by organizations to support these functions has been grouped under the term ‘data science’.

This statistics and data analysis course will attempt to articulate the expected output of data scientists and then teach students how to use PySpark (part of Spark) to deliver against these expectations. The course assignments include log mining, textual entity recognition, and collaborative filtering exercises that teach students how to manipulate data sets using parallel processing with PySpark.

This course covers advanced undergraduate-level material. It requires a programming background and experience with Python (or the ability to learn it quickly). All exercises will use PySpark (the Python API for Spark), and previous experience with Spark, equivalent to Introduction to Apache Spark, is required.

43 student reviews
Cost: Free Online Course
Pace: Finished
Subject: Big Data
Provider: edX
Language: English
Hours: 5-7 hours a week
Calendar: 4 weeks long

43 reviews for edX's Big Data Analysis with Apache Spark

6 out of 6 people found the following review useful
2 years ago
CS100.1x Introduction to Big Data with Apache Spark is a 5-week intro to distributed computing offered by UC Berkeley through the edX MOOC platform focused on teaching students how to perform large-scale computation using Apache Spark. The assignments use PySpark, Spark’s Python API, so some familiarity with Python programming is necessary. You don’t need prior exposure to big data or distributed computing to take the course. Grades are based on four programming labs (80%), easy comprehension questions that allow unlimited attempts (12%) and setup of the course virtual machine used to complete the labs (8%).

Course lectures in Introduction to Big Data with Apache Spark are relatively brief and tend to stay at a high level, discussing general big data concepts rather than the details of Apache Spark. The instructor does a fine job in the few lectures the course offers, but there were not enough of them and they often felt disconnected from the assignments. The fifth week had no lectures.

The labs are the core of this course. While you can breeze through weekly lectures in half an hour or less, each of the four labs is a lengthy reading and programming assignment packaged in an IPython notebook. Expect to spend 2 to 4 hours on labs 1, 2 and 4, and 3 to 6 hours on lab 3. The labs start by teaching basic Apache Spark manipulations and move on to some text analysis and machine learning. Using the IPython notebook to deliver labs is a convenient way to intermingle text and instructions with code. On the other hand, each exercise tends to depend on code executed somewhere above it, so a mistake made on an earlier exercise can lead to some odd errors later on, and Spark’s error traces aren’t particularly helpful. The course does provide some basic tests for each exercise, but it is easy to arrive at solutions that pass the checks but cause errors later on. The course forums on Piazza are a vital resource for troubleshooting and disambiguation; I imagine some of the snags will be resolved in future offerings. Despite the occasional hiccups, the labs do a good job familiarizing students with Apache Spark’s Resilient Distributed Dataset objects and the various transformations and actions you can perform with them.

Introduction to Big Data with Apache Spark is a great place to start learning about distributed computing if you know some Python. Although the lectures don’t add much technical depth to the course, they provide some big-picture background that will be useful for students who have little prior exposure to big data concepts. The labs give you adequate opportunity to get your hands dirty with Apache Spark and gain basic familiarity with the data manipulations it offers. UC Berkeley is offering a follow-up course, “Scalable Machine Learning”, that builds on the foundation laid in CS100.1x.

I give this course 4 out of 5 stars: Very Good.
2 out of 2 people found the following review useful
2 years ago
Martin Strandbygaard completed this course, spending 4 hours a week on it and found the course difficulty to be medium.
Overall, a good course that is worth spending the time on if you want to get familiar with Spark and the map-reduce programming model.

The lecture videos and quizzes are pretty lightweight, and nothing spectacular. However, I found the assignments really well structured, interesting, and informative. They use IPython notebooks, which I found to be a really awesome format for this kind of course and assignments.

The course is not heavy on mathematics and statistics, but the assignments will challenge you to really understand the stated problems and the map-reduce programming model in order to complete them successfully.

1 out of 1 people found the following review useful
a year ago
Wendao Liu is taking this course right now.
Slightly disappointed by the content; not very informative. If you want to learn more about Spark, you definitely need to explore more material.
1 out of 1 people found the following review useful
2 years ago
Gaurav Srivastva is taking this course right now, spending 4 hours a week on it and found the course difficulty to be medium.
Lectures are very light in content and disappointing, but the labs are good and do require students to investigate in order to complete them.
2 years ago
Charlie Soliman completed this course, spending 5 hours a week on it and found the course difficulty to be hard.
This is an excellent course for beginners to the world of Spark, but it would be a good idea to have some programming knowledge in Python as well as a basic understanding of what big data means. The problem sets are organized methodically with much explanation, so even if you don't know much statistics you can still follow along with the programming. I'm no statistician but managed to go through all problem sets with few mistakes. It certainly was fun on top of being educational and informative.
2 years ago
Anoop Toffy is taking this course right now and found the course difficulty to be medium.
It was a nice course. I loved it.

Good intro to the PySpark API.

Nice set of problem sets.

As part of it, if you are lucky, you will get access to the Databricks cloud.
2 years ago
Shuang Wu completed this course.
2 years ago
Tabish Sada audited this course.
2 years ago
Gabriel Trautmann is taking this course right now.
2 years ago
V M completed this course.
2 years ago
Klaas Naaijkens audited this course.
2 years ago
Prakhar Srivastav completed this course.
2 years ago
Rogier Werschkull completed this course.
2 years ago
Karri S completed this course.
2 years ago
Hamza Rashid is taking this course right now.
2 years ago
Kuronosuke is taking this course right now.
2 years ago
Vlad Podgurschi completed this course.
2 years ago
Chema Cortés completed this course.
2 years ago
Rakesh is taking this course right now.
2 years ago
Matteo Ferrara partially completed this course.
2 years ago
Геночка Кузнецов completed this course.
2 years ago
Igor Subbotin is taking this course right now.
2 years ago
Jevgeni Martjushev is taking this course right now.
9 months ago
Mertez completed this course.
a year ago
Mark Henry Butler completed this course.
a year ago
François Allain completed this course.
10 months ago
Davide Madrisan completed this course.
2 years ago
Asr partially completed this course.
a year ago
Colin Khein completed this course.
12 months ago
César Alba completed this course.
a year ago
Shayan Fahimi completed this course.
a year ago
Cristina completed this course.
7 months ago
Ronny De Winter completed this course.
2 years ago
Sundeep is taking this course right now.
2 years ago
Gerhard Gasseling completed this course.
2 years ago
Gregory Deangelis completed this course.
a year ago
Sebastien Pujadas completed this course.
3 months ago
Adam Hjerpe completed this course.
2 years ago
Sauro Grandi completed this course.
2 years ago
Alejandro Mercado is taking this course right now.
2 years ago
Abdul Qadir Ibrahim is taking this course right now.
