Data Science Resources
Below you will find a compiled list of all my favorite data science resources, broken down into the following subject categories:
- General Guidance
- Process & Skills Breakdowns
- Industry Roles
- Job Search
- Building a Portfolio
- Online Courses
- Asking Questions
- Tidy Data
- Scientific Computing
- Inferential Statistics
- Experimental Design
- Machine Learning
- Data Storytelling
Note that these resources are meant to be referenced on an as-needed basis. I do not recommend trying to read through everything at once.
Please consider contributing your own favorites via the submission form below. This page is a work in progress, so I’d love your help expanding it.
Share YOUR Favorite Resources:
General Guidance
These articles helped set the foundation for my approach to learning data science.
- How to (actually) learn data science (DataQuest)
- How to learn data science without a degree (Springboard)
- Raj Bandyopadhyay (Quora)
- 5 Things You Should Know Before Getting a Degree in Data Science (Medium)
- To Become a Data Scientist, Focus on Competencies Before Skills
Process & Skills Breakdowns
These articles were the main online sources for the “Data Science Deconstructed” infographic I created above:
- The Data Science Process: What a data scientist actually does day-to-day (Medium)
- The Data Science Process, Rediscovered (KDnuggets)
- 8 Skills You Need to Be a Data Scientist (Udacity)
- 10 Must Have Data Science Skills (KDnuggets)
Industry Roles
These are helpful for understanding the current data science job market.
- The Data Science Industry: Who Does What Infographic (DataCamp)
- Data Science Career Paths: Different Roles in the Industry (Springboard)
- The State of Data Science & Machine Learning (Kaggle)
Job Search
These are longer PDFs and case studies that explain how to find data science jobs & interview well.
- Guide to Data Science Jobs (70-page PDF) (Springboard)
- Guide to Data Science Interviews (90-page PDF) (Springboard)
- Lessons from Analyzing Hundreds of Data Science Interviews (Springboard)
Building a Portfolio
Your portfolio is your most important asset for landing a job. Here are some brilliant blog posts from Vik at DataQuest on how to do it right.
- Building a data science portfolio: Storytelling with data (DataQuest)
- The key to building a data science portfolio that will get you a job (DataQuest)
Online Courses
Bootcamps tend to complement accelerated learning well. Personally, I ended up choosing Springboard’s Data Science Intensive (here’s a coupon code for $100 off any Springboard course), but I also considered Udacity & Metis:
- Springboard’s Data Science Intensive
- Udacity’s Data Analyst Nanodegree
- Metis Data Science Bootcamp
Both of these courses have been absolutely indispensable for me. Some of the best online instructional videos out there, hands down.
Python (Steps 3-6)
This graph from Kaggle’s 2017 industry-wide survey shows why I would recommend you start with learning Python over R.
- Learn Python (Codecademy)
- Intermediate Python for Data Science (DataCamp)
- Python Data Science Handbook (GitHub)
Asking Questions – (Step 1: Frame the Problem)
Asking intelligent questions is a skill you have to develop. Here are some insights on how to ask questions that data can answer.
- How to ask questions data science can solve (Medium)
- Ask a question you can answer with data (Microsoft)
SQL – (Step 2: Collect raw data)
There are many SQL tutorials out there to choose from, but here are my personal top 3.
Pandas – (Step 3: Process the Data)
Pandas is an indispensable part of the data science toolbox. It’s important to become very comfortable with using it for data wrangling & data cleaning.
- 10 Minute Pandas Tutorial (Pandas Docs)
- Data analysis in Python with pandas (YouTube)
- Data Wrangling Cheat Sheet (GitHub)
- Reshaping in Pandas – Pivot, Pivot-Table, Stack, & Unstack Explained with Pictures
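For a quick taste of the reshaping operations named in the last link, here is a minimal sketch. The DataFrame, its column names, and the numbers are made up for illustration and are not taken from any of the tutorials above; only the pandas calls themselves (`pivot`, `pivot_table`, `stack`, `unstack`) are real.

```python
import pandas as pd

# Hypothetical long-format data: one row per (city, year) observation.
long_df = pd.DataFrame({
    "city": ["Austin", "Austin", "Boston", "Boston"],
    "year": [2016, 2017, 2016, 2017],
    "sales": [10, 12, 20, 25],
})

# pivot: spread the unique values of "year" into columns (long -> wide).
wide_df = long_df.pivot(index="city", columns="year", values="sales")

# stack: fold the column labels back into the row index (wide -> long Series).
stacked = wide_df.stack()

# unstack: the inverse of stack, moving the inner index level back to columns.
round_trip = stacked.unstack()

# pivot_table: like pivot, but aggregates when index values repeat.
totals = long_df.pivot_table(index="city", values="sales", aggfunc="sum")

print(wide_df)
print(totals)
```

The rule of thumb I find useful: reach for `pivot` when each index/column pair is unique, and for `pivot_table` when duplicates need to be aggregated.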
Tidy Data – (Step 3: Process the Data)
This research paper is a must-read to understand the conventions behind data cleaning:
Scientific Computing – (Step 4: Explore the Data)
Everything you need to know about Exploratory Data Analysis (EDA) and more:
- Exploratory Data Analysis Conceptual Handbook (NIST)
  - the assumptions, principles, and techniques necessary to gain insight into data via EDA
- Ultimate guide for Data Exploration in Python (Analytics Vidhya)
  - a walkthrough of examples with numpy, matplotlib, and pandas
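As a rough idea of what a first EDA pass can look like in pandas, here is a small sketch. The dataset, column names, and values are invented for illustration; the guides above go much deeper and add plotting on top.

```python
import pandas as pd

# Hypothetical toy dataset with one missing value.
df = pd.DataFrame({
    "height_cm": [160.0, 172.5, 181.0, None, 168.0],
    "weight_kg": [55.0, 70.0, 85.0, 62.0, 64.0],
    "group": ["a", "b", "b", "a", "a"],
})

# How many values are missing in each column?
missing = df.isna().sum()

# Summary statistics (count, mean, std, quartiles) for the numeric columns.
summary = df.describe()

# Frequency of each category.
group_counts = df["group"].value_counts()

# Pairwise correlation between numeric columns (missing rows are skipped).
corr = df[["height_cm", "weight_kg"]].corr()

print(missing)
print(summary)
print(group_counts)
print(corr)
```

Checking missing values and basic distributions before anything else tends to surface data quality problems early, which is the spirit of the EDA material linked above.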
Inferential Statistics – (Step 4: Explore the Data)
- Inferential Statistics by Univ of Amsterdam (Coursera)
- Statistical Aspects of Data Mining – Google TechTalk (YouTube)
Experimental Design – (Step 4: Explore the Data)
Machine Learning – (Step 5: In-depth Analysis)
This section could use some more work. Does anyone have any good resources they’re able to submit above?
- 7 Steps to Mastering Machine Learning in Python
- Cheatsheet: Scikit-Learn & Caret Package for Python & R respectively
- Feature Engineering (Microsoft)
Data Storytelling – (Step 6: Communicate Results)
Some amazing examples of data storytelling:
- Wealth Inequality in America
- How Mariano Rivera Dominates Hitters
- Hans Rosling TEDTalk – The best stats you’ve ever seen
- Top 10 TEDTalks for Data Scientists
These two lectures are what Tim Ferriss would call the “Minimum Effective Dose” (MED) of data storytelling:
- Harvard’s CS109: Communication & Storytelling
- Harvard’s CS109: EDA & Visualizations
Here are some other inspiring blogs/courses on data storytelling: