6 minute read. Published on May 27, 2015

‘Data scientist’ has been called the sexiest job of the 21st Century. So how do you become one? Charlie Chung of Class Central sat down with Roger Peng, professor in the Department of Biostatistics at the Bloomberg School of Public Health at Johns Hopkins University (JHU), and co-director of the Data Science Specialization on the Coursera platform.

Roger Peng

Data science is the practice of capturing and analyzing data to develop insights. In some ways, this is as old as rational thought: inductive reasoning is based on drawing conclusions from related facts; analyzing data is a core component of the scientific method, perhaps the biggest driver of progress in modern civilization. So what has changed? Two things: the amount of data we have, and the tools we have to analyze it. With technology an integral part of our lives, we are generating data at an astounding rate. And not just manually-generated data (like the hundreds of terabytes uploaded by users to Facebook each day): a modern PET scan of a person’s brain generates about 70MB of raw data, and a Boeing 787 airplane is said to generate half a terabyte of data per flight.

Thankfully, we have also had advances in computing hardware, software, and algorithms. With these trends, it seems advisable for most people to become familiar with data science, many people to learn about it at some level, and some people to develop strong skills (some becoming ‘data scientists’). Prof. Peng notes:

Often, the people who collect the data don’t necessarily know how to do it because they haven’t been trained. They need to collaborate with a data scientist to help them make sense of their data. So, the demand for these skills is ‘off the charts’ right now, more so than it was even when I was first starting out.

If you are looking to get into data science, you may be interested in taking the JHU Data Science Specialization. In addition to a solid understanding of the fundamentals, it is focused on building practical skills, using real tools on real data, resulting in tangible data products that can be seen by others.

What the Specialization Courses will Teach

Prof. Peng describes how the concept of the data science specialization came about:

We started thinking about the specialization almost a year and a half ago, back in 2013. Brian Caffo, Jeff Leek, and I had always thought that the statistics curriculum that we taught at Johns Hopkins could use a refresher. There’s been a lot of change over time in the technology of computing. When Johns Hopkins developed the partnership with Coursera, we thought it was a prime opportunity to make what we thought would be the ideal curriculum. So we just started with a completely blank slate, and mapped out everything we would want in a curriculum that we would teach.

The Specialization consists of a series of 9 courses, each four weeks long (Prof. Peng suggests taking no more than two at a time, thus requiring between 4 and 9 months), plus a Capstone project (described further in the next section). To take the Capstone and earn the Specialization, you will need to register and pay for a verified certificate for each course.

Prof. Peng describes the sequence of topics in the Specialization:

Learners who go through these courses will also utilize and become familiar with a number of tools that are used in analyzing data, including these:

• GitHub to store data analysis code

• Plotly to create interactive graphs

• R Markdown to embed documentation in stats code (to be published on RPubs)

• Shiny to create interactive webapps for your data analysis

Learning data science requires rolling up your sleeves, and being familiar with some of the main tools used is very important. But it gets more real-world than this–after these nine courses, there’s an applied project that you work on in the Capstone course.

Real Data and a Real Project – the Capstone Course

In the Capstone course, you work on a complete data science project, from beginning to end, using a publicly available data set. To help develop the project, JHU partnered with app developer SwiftKey, maker of the most popular keyboard typing apps on both the iPhone and Android systems–they use algorithms that rely heavily on data analytics to predict what people will type. Prof. Peng described the rationale for partnering with an outside company:

From the very beginning, we always felt like it would be important to have a partner for the capstone project that would give the learners an outside perspective. Early on, we talked with SwiftKey about building a project, and they were very enthusiastic. We discussed the skills that learners would learn in the sequence and what they should be able to do at the end of it. They helped us design and write up the project.

SwiftKey’s involvement is not just in the initial design. They also made themselves available at specific times to interact with learners:

As the Capstone was running, SwiftKey was very generous in doing a Google Hangout with learners for an hour in the middle of the capstone. That gives a chance for learners to ask questions of the engineers and get some hints and feedback…they were really enthusiastic to interact with the learners and see how they were doing.

Throughout the Capstone course, the instructors and Community TAs answer questions in the discussion forums to help learners work through their projects. JHU is also in talks with other companies to design new Capstone projects in the future.

A Journey Begins with a Single Step / Course

For those interested in pursuing a career in data science, this is a unique time to get involved. The field is so new that few people have much in the way of formal credentials–it is a level playing field from that standpoint. Also, as you learn, you will be generating tangible results that provide some demonstration of your level of competence and insight–your online portfolio will include data products, data, analysis plans, and source code.

A final thought: data science is applicable in many more contexts than you can imagine. Journalists are using data journalism to look for patterns of evidence to support a point. Political campaigns are using data science to target their outreach efforts. Facebook keeps tweaking its News Feed algorithm (to the dismay of many). So keep your mind open to the possibilities. Even if you don’t want to be a full-time data scientist, take a moment to consider whether some data science skills might be helpful for you, whatever your career, field, or interests.

If you want to explore further or sign up, you can go to the JHU Data Science Specialization on Coursera.