5 minute read. Published on July 2, 2014

Editor’s Note: This post was originally published on June 11th here and written by Fras and Sabine. They are a couple in their mid 20s living in Thailand and designing their own DIY Data Science Masters. Follow their journey here.

Free education platforms have put some of the world’s most prestigious courses online in the last few years. This is our plan to use them to create our own custom, open-source data science Master's. Before we begin, though, in the spirit of openness we should explain where we are starting from:

We both have Physics degrees and are comfortable with maths, logic, algorithms and manipulating data. But perhaps most importantly, we enjoy this type of work. The program we have designed does not require any prior knowledge of the topics below; however, we do feel it is an advantage to come from a numerical background.

 

So what does it take to become a data scientist?

 

Statistics

Statistics is perhaps the starting point of data science. For most questions in the world, we have neither measured every phenomenon nor asked every person what they think; instead, we have a small recorded subset of conversations and measurements. Statistics helps us understand what we can and, just as importantly, cannot reasonably learn from that smaller group.

Visualisation

“There is no such thing as information overload. There is only bad design.” Edward Tufte

There is no point having a story to tell but being unable to tell it. With the number of new visualisation tools springing up each year there is no excuse not to make your story beautiful and compelling. This involves elements of design and artistic principles, not things you pick up in your average Physics warehouse.

Programming

As far as I understand, programming is making the computer do what you can’t be bothered to do, or do not have a long enough lifespan to do, yourself. Most analysis is crunched by programs, and now most beautiful data visuals are drawn by them too. Although we had done basic programming before (simple loops and stringing together conditional statements), we needed programming as the glue that tied everything else together. As a data scientist, you could probably get away with a certain level of R or Python, but you’d be reliant on your back-end developers to retrieve and manipulate data and on your front-end developers to showcase it for you! Limiting, huh?

Data Manipulation

Having both worked as data scientists before leaving for Thailand, we quickly understood that the majority of Data Science is actually finding, cleaning and reformatting data. Although it doesn’t sound exciting, a thorough understanding of current data formats, querying databases and building interfaces for your data models will allow you to work well in a team and more importantly actually leave the office on time!

Machine Learning & Algorithms

This topic is broad, with a master's worth of active research in numerous fields and areas. However, it is also where our motivation to become data scientists came from: building self-driving cars, identifying people based on their ear lobes 🙂. The common process these all share is teaching a computer to recognise patterns the way the human brain does, whether the pattern is oncoming traffic or YouTube cats. We will be studying some of the most common current tools for uncovering patterns, but it is an active field and a lifelong learning process.

The Plan

This is our study plan. It is divided into two terms (circa 2 months each) and contains the courses we have found to be most suitable for learning each of the above topics. NB: To mimic the structure of a Masters, which would include a final project, it also contains some time to work on projects that apply what we are learning.

Prerequisites & Pre-Read

We completed the courses below in the evenings and at weekends whilst still in our corporate jobs, before fully embarking on this journey. We consider them prerequisites because some of the new courses build upon them.

Machine Learning

Computing for Data Analysis

Data Analysis

Programming Methodology: CS106A

Also, as we didn’t have previous JavaScript experience and needed some concepts for the visualisation course, we set reading Eloquent JavaScript as a pre-read.

Term 1

Time      Mon         Tue         Wed         Thu         Fri         Sat
9 – 10    NLP         CS106B      NLP         CS106B      CS106B
10 – 11   NLP         CS106B      NLP         CS106B      CS106B      Spanish [1]
11 – 12   NLP         CS106B      NLP         CS106B      Stats Work  Spanish
12 – 13   Lunch       Lunch       Lunch       Lunch       Lunch       Lunch
13 – 14   NLP         Stats       NLP         Stats       Vis
14 – 15   DB          Stats       DB          Stats       Vis         Catch-up [2]
15 – 16   DB          Stats Work  DB          Stats Work  DB          Catch-up
16 – 17   CS106B      Vis         CS106B      Vis         DB
17 – 18   CS106B      Vis         CS106B      Vis

Term 2

Time      Mon          Tue          Wed          Thu          Fri          Sat          Sun
9 – 10    CS169 [3]    Ruby         CS169        Project [4]  CS169        Project
10 – 11   CS169        Ruby         CS169        Project      CS169        Project      Spanish
11 – 12   CS169        RoR          CS169        Project      CS169        Project      Spanish
12 – 13   Lunch        RoR          Lunch        Project      Lunch        Project      Lunch
13 – 14   CS169        Lunch        RoR          Project      Lunch        Project
14 – 15   API          Web Dev [5]  API          Project      API          Project
15 – 16   Choice [6]   CS169        Choice       Project      Choice       Project
16 – 17   Choice       CS169        Choice       Project      Choice       Project
17 – 18   Choice       CS169        Choice       Project      Choice       Project

[1] Yep, we thought it would be fun and a useful skill to learn Spanish!

[2] Whatever work needs catching up on, as and if we fall behind…

[3] Part 2 of CS169 is also running on edX.

[4] Project work hasn’t yet been defined, as we hope ideas will firm up as we work through the programme.

[5] This hour will comprise a mix of web development skills, such as Twitter Bootstrap.

[6] This course will be one of the following, or another that we decide on nearer the time as our skills and interests become more defined:

– Programming Paradigms CS106C

– Probability and Random Variables

– Developing iOS 7 Apps for iPhone and iPad

selection from