Resources

Data Science Resources

Below you will find a compiled list of all my favorite data science resources, broken down into the following subject categories:

  • General Guidance
  • Process & Skill Breakdowns
  • Industry Roles
  • Job Search
  • Building a Portfolio
  • Bootcamps
  • Online Courses
  • Python
  • Asking Questions
  • SQL
  • Pandas
  • Tidy Data
  • Scientific Computing
  • Inferential Statistics
  • Experimental Design
  • Machine Learning
  • Data Storytelling

Note that these resources are meant to be referenced on an as-needed basis. I do not recommend trying to read through everything at once.

Please consider contributing your own favorites, via the submission form below. This page will be a work in progress, so I’d love your help in expanding it.

Share YOUR Favorite Resources:


AJ’s Favorites

General Guidance

These articles helped set the foundation for my approach to learning Data Science.

  1. How to (actually) learn data science (DataQuest)
  2. How to learn data science without a degree (Springboard)
  3. Raj Bandyopadhyay (Quora)
  4. 5 Things You Should Know Before Getting a Degree in Data Science (Medium)
  5. To Become a Data Scientist, Focus on Competencies Before Skills

Process & Skill Breakdowns

[Infographic: DS Deconstructed]

These articles were the main online sources for the “Data Science Deconstructed” infographic I created above:

  1. The Data Science Process: What a data scientist actually does day-to-day (Medium)
  2. The Data Science Process, Rediscovered (KDnuggets)
  3. 8 Skills You Need to Be a Data Scientist (Udacity)
  4. 10 Must Have Data Science Skills (KDnuggets)
  5. The 22 Skills of a Data Scientist (Dataconomy)
  6. A Comprehensive Review of Skills Required for Data Scientist Jobs (Dataversity)
  7. Five Essential Traits of a Data Scientist (DataQuest)

Industry Roles

These are helpful for understanding the current data science job market.

  1. The Data Science Industry: Who Does What Infographic (DataCamp)
  2. Data Science Career Paths: Different Roles in the Industry (Springboard)
  3. The State of Data Science & Machine Learning (Kaggle)

Job Search

These are longer PDFs and case studies that explain how to find DS jobs & interview well.

  1. Guide to Data Science Jobs (70-page PDF) (Springboard)
  2. Guide to Data Science Interviews (90-page PDF) (Springboard)
  3. Lessons from Analyzing Hundreds of Data Science Interviews (Springboard)

Building a Portfolio

Your portfolio is your most important asset for landing a job. Here are some brilliant blog posts from Vik at DataQuest on how to do it right.

  1. Building a data science portfolio: Storytelling with data (DataQuest)
  2. The key to building a data science portfolio that will get you a job (DataQuest)

Bootcamps

Bootcamps tend to complement accelerated learning well. Personally, I ended up choosing Springboard’s Data Science Intensive (here’s a coupon code for $100 off any Springboard course), but I also considered Udacity & Metis:

  1. Springboard’s Data Science Intensive
  2. Udacity’s Data Analyst Nanodegree
    1. REVIEW: Udacity Data Analyst Nanodegree
  3. Metis Data Science Bootcamp
    1. PDF: Curriculum

Online Courses

Both of these courses have been absolutely indispensable for me. Some of the best online instructional videos out there, hands down.

  1. Andrew Ng’s Coursera Machine Learning
  2. Harvard’s CS109: Intro to Data Science

Python – (Steps 3-6)

This graph from Kaggle’s 2017 industry-wide survey shows why I recommend starting with Python rather than R.

[Graph: Kaggle 2017 industry-wide survey]

  1. Learn Python (Codecademy)
  2. Intermediate Python for Data Science (DataCamp)
  3. Python Data Science Handbook (GitHub)

Asking Questions – (Step 1: Frame the Problem)

Asking intelligent questions is a skill you have to develop. Here are some insights on how to ask questions that data can answer.

  1. How to ask questions data science can solve (Medium)
  2. Ask a question you can answer with data (Microsoft)

SQL – (Step 2: Collect raw data)

There are many SQL tutorials out there to choose from, but here are my personal top 3.

  1. Mode Analytics SQL Tutorial
  2. GalaXQL SQL tutorial game
  3. W3Schools SQL Tutorial

Pandas – (Step 3: Process the Data)

Pandas is an indispensable tool in the Data Science toolbox. It’s important to become very comfortable using it for data wrangling & data cleaning.

  1. 10 Minute Pandas Tutorial (Pandas Docs)
  2. Data analysis in Python with pandas (YouTube)
  3. Data Wrangling Cheat Sheet (GitHub)
  4. Reshaping in Pandas – Pivot, Pivot-Table, Stack, & Unstack Explained with Pictures
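
For a quick taste of the reshaping operations named in the last link, here is a minimal sketch. The table, its column names, and its values are made up for illustration and are not taken from any of the resources above.

```python
import pandas as pd

# Hypothetical long-format sales data, invented for illustration.
long_df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston"],
    "year": [2016, 2017, 2016, 2017],
    "sales": [10, 12, 7, 9],
})

# pivot: one row per city, one column per year.
wide_df = long_df.pivot(index="city", columns="year", values="sales")

# stack: move the year columns into the row index,
# giving a Series indexed by (city, year).
stacked = wide_df.stack()

# unstack: the inverse of stack, spreading the innermost
# index level (year) back out into columns.
round_trip = stacked.unstack()

print(wide_df)
print(stacked)
print(round_trip)
```

Printing each intermediate result is probably the fastest way to see what each call does to the shape of the data.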

Tidy Data – (Step 3: Process the Data)

This research paper is a must-read to understand the conventions behind data cleaning:

  1. Tidy Data research paper by Hadley Wickham
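
The paper describes tidy data as a layout where each variable is a column and each observation is a row. As a rough sketch of that idea in pandas (the table below is hypothetical, and `melt` is just one way to get there):

```python
import pandas as pd

# Hypothetical "messy" table, invented for illustration:
# the 2016 and 2017 column headers are values of a variable
# (the year), not variable names.
messy = pd.DataFrame({
    "city": ["Austin", "Boston"],
    "2016": [10, 7],
    "2017": [12, 9],
})

# melt gathers the year columns into two new columns,
# so each row becomes one (city, year, sales) observation.
tidy = messy.melt(id_vars="city", var_name="year", value_name="sales")

print(tidy)
```

Note that the years stay as strings here because they started life as column headers; converting them to integers would be a separate cleaning step.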

Scientific Computing – (Step 4: Explore the Data)

Everything you need to know about Exploratory Data Analysis (EDA) and more:

  1. Exploratory Data Analysis Conceptual Handbook (NIST)
    • the assumptions, principles, and techniques necessary to gain insight into data via EDA
  2. Ultimate guide for Data Exploration in Python (Analytics Vidhya)
    • a walkthrough of examples with numpy, matplotlib, and pandas

Inferential Statistics – (Step 4: Explore the Data)

  1. Inferential Statistics by Univ of Amsterdam (Coursera)
  2. Statistical Aspects of Data Mining – Google TechTalk (YouTube)

Experimental Design – (Step 4: Explore the Data)

  1. Experimental Design by Johns Hopkins (Coursera)

Machine Learning – (Step 5: In-depth Analysis)

This section could use some more work. Does anyone have any good resources they’re able to submit above?

  1. 7 Steps to Mastering Machine Learning in Python
  2. Cheatsheet: Scikit-Learn & Caret Package for Python & R respectively
  3. Feature Engineering (Microsoft)

Data Storytelling – (Step 6: Communicate Results)

Some amazing examples of data storytelling:

  1. Wealth Inequality in America
  2. How Mariano Rivera Dominates Hitters
  3. Hans Rosling TEDTalk – The best stats you’ve ever seen
  4. Top 10 TEDTalks for Data Scientists

These two lectures are what Tim Ferriss would call the “Minimum Effective Dose” (MED) of data storytelling:

  1. Harvard’s CS109: Communication & Storytelling
  2. Harvard’s CS109: EDA & Visualizations

Here are some other inspiring blogs/courses on data storytelling:

  1. Presentation Zen (blog)
  2. The Art of Storytelling course (Khan Academy)