Deconstructing Data Science: Breaking The Complex Craft Into Its Simplest Parts

This is the SECOND in a series of posts on applying Tim Ferriss’ accelerated learning framework to Data Science. My goal is to become a world-class (top 5%) Data Scientist in < 6 months, while open-sourcing everything I find & learn on the way.

The purpose of this post is to empower others to start accelerating their own learning by:

  1. deconstructing the complex craft of Data Science into its simple micro-skills
  2. identifying the 20% of skills that contribute to 80% of outcomes

And if you stick around until the end, you’re in for a special treat.

Estimated reading time: 15 min (to save you hours of spinning in circles 😉)

The Problem

A simple Google search of “how to learn Data Science” returns thousands of learning plans, degree programs, tutorials, and bootcamps. It’s never been more difficult for a beginner to find signal in the noise.

Everyone seems to have a different opinion, and the only common approach appears to be dumping a long list of courses to take and books to read, all the while providing little to no context into how these concepts fit into the bigger picture.

This post is my attempt to convert all the buzzwords & fluffy terminology into explicitly-learnable skills. To do this, I’ll be walking through my application of the first two steps to Tim Ferriss’ accelerated learning framework: Deconstruction & Selection.

Rather than jump right into a roadmap of my own learning journey (that’ll be the next post), I want to empower you to begin your own. And if you haven’t read my first post, I’d highly recommend starting there: www.ajgoldstein.com/learning-without-limits/

Deconstruction: The Data Science Process

“The whole is greater than the sum of its parts.” – Aristotle

DS Deconstructed
I’ll be walking through this infographic step-by-step below

It’s true: Data Science is not a single discipline, but a craft at the intersection of many. So in order to appreciate how the seemingly disparate puzzle pieces fit together, I present to you a story. It’s called “The Data Science Process”, and it has six parts:

  1. Frame the problem: who are you helping? what do they need?
  2. Collect raw data: what data is available? which parts are useful?
  3. Process the data: what do the variables actually mean? what cleaning is required?
  4. Explore the data: what patterns exist? are they significant?
  5. Perform in-depth analysis: how can the past inform the future? to what degree?
  6. Communicate results: why do the numbers matter? what should be done differently?

But before we begin, a couple quick caveats:

1) In large organizations, “The Data Science Process” is often carried out by an entire team, not a single individual. An individual can specialize in any one of the six steps, but for simplicity, we’ll be assuming a one-person team.

2) The insights that follow are a compilation of various expert interpretations; not my original ideas. I am not (yet) an expert Data Scientist, but over the past 6 weeks I’ve learned from many. Thus, I’m simply serving as the filter between hundreds of hours of research and the actionable insights you’ll find below.

In particular, I’ll be pulling from favorite online articles (linked throughout) and conversations with the following 10 experts:

  1. Chris Brooks — Director of Learning Analytics at the University of Michigan
  2. Andrew Cassidy — Freelance Data Scientist & Online Educator
  3. Jim Guszcza — US Chief Data Scientist at Deloitte Consulting
  4. Kirk Borne — Principal Data Scientist at Booz Allen Hamilton
  5. Michael Moliterno — Data Scientist + Design Lead at IDEO
  6. Chris Teplovs — Research Investigator at the University of Michigan
  7. Jonathan Stroud — Co-Founder of the Michigan Data Science Team (MDST)
  8. Josh Gardner — Data Science Research Associate, Team Leader on MDST
  9. Jared Webb — PhD Candidate in Applied Math, Data Manager at MDST
  10. Alex Chojnacki — Data Application Manager for Flint-Water-Crisis project

And to bring each step of the process to life, I’ll be using my work at Calm.com, Inc. in San Francisco this summer as a real-world case study.

While there, I leveraged analytics insights from Calm’s database of 11 million users to develop & launch Calm College — the first US platform geared toward using mindfulness to improve college student mental health.

Alright, let’s get started!

Step One: Frame The Problem

The first step of The Data Science process involves asking a lot of questions.

The exact manner in which you do this will depend on the context in which you’re working, but whether you’re in the private sector, public sector, or academia, the key idea is the same: before you can start to solve a problem, you have to deeply understand it.

Your goal here is to get into the client’s head to understand their view of the problem and desired solution. In the case of a corporation, this will first involve speaking with managers & supervisors to identify the business priorities and strategy decisions that’ll influence your work.

It’s not uncommon for the first request that a Data Scientist receives to be entirely ambiguous (e.g. “we want to increase sales”). But it’ll be your job to translate the task into a concrete, well-defined data problem (e.g. “predict conversion rate & return-on-investment across customer segments”).

This is where domain knowledge and product intuition are crucial. Speaking with subject-matter experts to cut through confusing acronyms & dense terminology can be incredibly helpful here. And familiarizing yourself with the product/service will be essential to understanding the intuition behind metrics.

For example…

With Calm College, the ambiguous request we started with was to establish partnerships with universities to offer the Calm app as a student wellness resource.

To better understand our specific domain, we started by spending two weeks speaking on the phone with as many college administrators as possible.

We asked questions like:

  • How would you describe the mental health climate on your campus?
  • How high of a priority is improving student mental health?
  • What main resources do you currently offer students?
  • What have been the greatest challenges?
  • Is there precedent for offering 3rd party services?

By the time we got to the final question, nearly every administrator had described their campus’ mental health climate as nothing short of “toxic”, and expressed improving it as their #1 priority.

They explained that the greatest challenge to students seeking help has been overcoming logistical issues (i.e. wait-time, transportation, & money) with the counseling services they currently offer.

Finally, here’s where our ambiguous request became a data problem…

Administrators told us that, before a 3rd party service can be adopted, precedent requires evidence supporting its use. In other words, showing that students on campus are already using the Calm app would be crucial to getting a deal done.

Step Two: Collect Raw Data

The second step of the Data Science Process is typically the most straightforward: collect raw data.

This is where your first technical skill — querying structured databases with SQL — comes into play. But fret not; it’s not as complicated as it may sound.

Here’s an awesome tutorial by Mode Analytics that’ll get you started with SQL in just a couple hours.

More important than the querying itself, however, is your ability to identify all the relevant data sources available to you (e.g. web, internal/external databases) and extract that data into a useable format (e.g. .csv, .json, .xml).

Oftentimes, an analysis requires more than one dataset, so you’ll likely need to speak with backend-engineers in your organization who are more familiar with what data is being collected and where it currently resides. Communication is key.

For example…

With Calm College, this required me sitting down with Calm’s lead engineer and exploring ways to pull usage data for specific college campuses.

Ultimately, I found out that we could simply query user activity by email address and school location. So for the University of Michigan, for example, I simply searched the database for emails ending in “umich.edu” or locations listed as “Ann Arbor, MI”.

This approach wasn’t foolproof (turns out not all students were using their school email) but it did the job by giving us a representative sample of ~1000 users per college to compare different campuses’ activity head-to-head.
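To make the idea concrete, here’s a rough sketch of that kind of query using Python’s built-in sqlite3 module. The `users` table, its columns, and the sample rows are all made up for illustration; Calm’s actual schema isn’t described in this post.

```python
import sqlite3

# Hypothetical schema and rows, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO users (email, location) VALUES (?, ?)",
    [
        ("alice@umich.edu", "Ann Arbor, MI"),
        ("bob@gmail.com", "Ann Arbor, MI"),
        ("carol@umich.edu", "Detroit, MI"),
        ("dave@gmail.com", "Chicago, IL"),
    ],
)


def campus_users(conn, email_domain, location):
    """Return users whose email ends with the domain OR whose location matches."""
    query = """
        SELECT id, email, location
        FROM users
        WHERE email LIKE ? OR location = ?
        ORDER BY id
    """
    # "%" is the SQL wildcard, so "%umich.edu" means "ends in umich.edu".
    return conn.execute(query, ("%" + email_domain, location)).fetchall()


umich_users = campus_users(conn, "umich.edu", "Ann Arbor, MI")
print(umich_users)
```

Passing the domain and location as parameters (the `?` placeholders) rather than pasting them into the query string keeps the same function reusable for every campus.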

Step Three: Process The Data

The third step of the Data Science Process is the most underrated: process the data.

This is where a scripting language like Python or R comes into play, and a data wrangling tool like Python’s Pandas is absolutely indispensable.

To get started, here’s a breakdown of Python vs. R, intro to Python on Codecademy, 10-minute tutorial to Pandas, and colorful data wrangling cheat-sheet.

Data cleaning is typically the most time-intensive part of data wrangling. In fact, in expert surveys it’s been estimated that up to 80% of a Data Scientist’s time is spent here: cleaning & preparing the data for analysis (more on this below).

The reason this can be so time-consuming is that, before you can analyze data, you have to go column-by-column, developing an understanding of the meaning of every variable and then checking for bad values accordingly.

The tricky part is that a bad value can be defined as many things: input errors, missing values, corrupt records, etc. And once you’ve identified a “bad value”, you have to decide whether it’s most appropriate (given the situation) to throw it away or replace it.
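Here’s a minimal sketch of that “throw it away or replace it” decision in Pandas. The table, column names, and values are invented for illustration, and filling with the median is just one reasonable choice among many.

```python
import pandas as pd

# Toy data, invented for illustration: None marks a missing value and
# -5 is an impossible session count (an input error).
raw = pd.DataFrame(
    {
        "user_id": [1, 2, 3, 4, 5],
        "sessions": [12.0, None, 7.0, -5.0, 3.0],
        "school": ["umich", "umich", None, "harvard", "harvard"],
    }
)

# Throw away: a row with no school can't be assigned to a campus.
cleaned = raw.dropna(subset=["school"]).copy()

# Treat impossible negative counts as missing...
cleaned["sessions"] = cleaned["sessions"].mask(cleaned["sessions"] < 0)

# ...then replace: fill missing counts with the median of the valid ones.
median_sessions = cleaned["sessions"].median()
cleaned["sessions"] = cleaned["sessions"].fillna(median_sessions)

print(cleaned)
```

Whether dropping or filling is the right call depends on what the column means and what the analysis is for, which is why understanding each variable comes first.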

For example…

With Calm College, I faced two significant roadblocks here:

  1. There was little to no company documentation on database variables
  2. I didn’t know Python’s Pandas and felt too intimidated to try and learn

Each of these presented its own challenge:

  1. It took me several days to figure out how to define an “active user” (i.e. should ‘active’ mean opening the app, starting a session, or completing a session?)
  2. I had to use an analytics tool called Amplitude rather than coding in a script file.

After talking with Calm’s Product Manager, I was able to define an active user as someone who “starts a meditation session” and identify the right variables. Then I had to clean the data by filtering out students who hadn’t been active in the last 365 days.

The thought process here was that administrators (i.e. our client) would primarily be interested in student activity from the past academic year, and non-active students (i.e. “null” values) were outliers that, if included, would only skew the results.
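I did this filtering in Amplitude, but for anyone working in Pandas, the same step might look roughly like the sketch below. The session log, its column names, and the dates are all hypothetical.

```python
import pandas as pd

# Hypothetical session log; column names and dates are invented.
sessions = pd.DataFrame(
    {
        "user_id": [1, 1, 2, 3],
        "started_at": pd.to_datetime(
            ["2017-09-01", "2016-01-15", "2015-06-30", "2017-03-10"]
        ),
    }
)

# Fix the "as of" date so the example is reproducible.
as_of = pd.Timestamp("2017-10-01")
cutoff = as_of - pd.Timedelta(days=365)

# An "active user" here is anyone who started a session on or after the cutoff.
last_session = sessions.groupby("user_id")["started_at"].max()
active_user_ids = last_session[last_session >= cutoff].index.tolist()
print(active_user_ids)
```

Pinning the “as of” date, instead of using today’s date, means the same script gives the same answer every time it’s run.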

Noticing a theme here? It’s about your clients’ interests, not your own.

Step Four: Explore The Data 

The fourth step of the Data Science Process is where you explore the data, and the real adventure begins.

This is where the core competency of scientific computing (i.e. Python’s numpy, matplotlib, scipy, & pandas libraries) comes into play.

To begin, here’s an awesome breakdown of the “SciPy ecosystem” (a collection of libraries in Python), extensive guide to data exploration, and a conceptual handbook of assumptions/principles/techniques.

Using these libraries, you’ll split, segment, & plot the data, in search of patterns. Thus, the key is becoming really comfortable with producing quick & simple bar graphs, box plots, histograms, etc. that’ll let you catch trends early on.
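As a small sketch of the “split and segment” part, here’s a Pandas group-by on made-up numbers. The school labels and session counts are invented for illustration.

```python
import pandas as pd

# Made-up numbers, purely for illustration.
activity = pd.DataFrame(
    {
        "school": ["A", "A", "A", "B", "B", "B"],
        "sessions_per_month": [10, 14, 12, 3, 5, 4],
    }
)

# Split by school, then summarize each segment.
summary = activity.groupby("school")["sessions_per_month"].agg(
    ["count", "mean", "median"]
)
print(summary)
```

With matplotlib installed, `activity.boxplot(column="sessions_per_month", by="school")` should give a quick visual version of the same comparison.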

Remember that analysts who produce beautiful externally-facing visualizations often have to iterate through hundreds of internally-facing ones first. So playing around with possibilities in this way is more of a guess-and-check art than a hard-and-fast science.

Finally, once you’ve identified some patterns, you’ll want to test them for statistical significance to determine which are worth including in a model. This is where a strong grounding in inferential statistics (e.g. hypothesis testing, confidence intervals) and experimental design (e.g. A/B tests, controlled trials) is essential.
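One simple way to check whether a difference between two groups is more than chance is a permutation test, sketched here with only the standard library. This is a generic illustration, not the test I ran at Calm, and the two groups of numbers are made up.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Returns the observed difference and an approximate p-value.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    extreme = 0
    for _ in range(n_permutations):
        # Shuffle the labels: if the groups don't differ, any split is equally likely.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1

    # Add-one smoothing keeps the estimate from ever being exactly zero.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Made-up monthly session counts for two hypothetical groups of schools.
elite = [15, 18, 14, 17, 16, 19, 15, 18]
other = [9, 11, 10, 8, 12, 10, 9, 11]

observed_diff, p_value = permutation_test(elite, other)
print(f"difference in means: {observed_diff:.2f}, p ~ {p_value:.4f}")
```

A small p-value says that a gap this large would rarely show up if the group labels were assigned at random. It doesn’t say why the gap exists.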

For example…

With Calm College, I started by exploring factors that would influence a potential partnership: monthly engagement, week-by-week retention, and subscription rate.

My hypothesis going in was that elite schools known for student stress (e.g. Cornell, Harvard, MIT) would have significantly higher numbers across the three statistics. Or, in other words, I suspected that stressed-out kids need more calm.

To test this, I began by segmenting universities into their regional groups and then splitting areas into specific college towns. From there, I was able to test whether each school’s activity differed significantly from its local, regional, and national averages.

After several iterations of my experimental design (and hundreds of internally-facing visualizations), I found what I was looking for: a list of outlier schools that we would ultimately call “Calm’s Most Popular Colleges”.

Step Five: In-Depth Analysis

The fifth step of the Data Science process is where you create a model to explain or predict your findings.

This is where most people lose the forest for the trees, as they enter into the land of shiny algorithms and fancy mathematics. Creating models is by far the most over-glorified part of Data Science, which is why most degree programs solely focus on this single step.

But before jumping into a particular solution, it’s important to pause and return to the bigger picture by asking yourself: “what am I really trying to do and why does it matter?”

From here, you’ll:

  1. apply your knowledge of algorithms’ contextual pros/cons to choose one approach best-suited for the situation
  2. carry forward statistically significant variables (from the exploratory phase) using what Data Scientists call “feature engineering”
  3. use a machine learning library like scikit-learn for implementation.

The overall goal is to use training data to build a model that generalizes to new (unseen) test data. So while building, it’s important that you’re keenly aware of (and capable of recognizing) overfitting and underfitting.
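To see what underfitting and overfitting look like in numbers, here’s a small sketch that fits polynomials of increasing degree to noisy synthetic data and compares the error on training data with the error on held-out test data. The sine-curve data, noise level, and degrees are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples from a sine curve (synthetic data)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y


x_train, y_train = make_data(20)
x_test, y_test = make_data(200)


def mse(coeffs, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


results = {}
for degree in (1, 3, 9):
    # Fit on the training data only; the test data stays unseen.
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    train_err, test_err = results[degree]
    print(f"degree {degree}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

Training error can only go down as the polynomial gets more flexible, so it’s the gap between training and test error that hints at overfitting. A model whose errors are both high is more likely underfitting. The exact numbers will vary with the random seed.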

Here are some amazing free videos from Andrew Ng’s Machine Learning course and Harvard’s CS109 “Intro to Data Science” class that will teach you how to do this for different algorithm types. A great place to practice is through Kaggle tutorials.

NOTE: I’d recommend starting by watching just one or two videos on a simple model type like logistic regression or decision trees, and then immediately applying what you’ve learned on a dataset you care about.

For example…

With Calm College, the model I was building was more “explanatory” than “predictive”.

That is, I was simply trying to identify the universities most suitable for a partnership and understand what factors about a school were contributing to that.

So what I ultimately built was a simple linear regression model (in Excel, no less) that used features like active user count, student enrollment, & university endowment to explain a university’s user activity over time.
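I built mine in Excel, but for readers who’d rather try this in Python, here’s a sketch of the same idea using NumPy’s least-squares solver. Every number is synthetic: the feature columns loosely mirror the ones named above, and the “true” coefficients are made up so that the fit can be checked.

```python
import numpy as np

# Synthetic, made-up data: one row per hypothetical school.
# Columns: active users (thousands), enrollment (thousands), endowment ($ billions).
features = np.array(
    [
        [1.2, 30.0, 10.0],
        [0.8, 20.0, 2.0],
        [2.0, 45.0, 12.0],
        [0.5, 10.0, 1.0],
        [1.5, 25.0, 35.0],
        [0.9, 15.0, 5.0],
    ]
)

# Build the target from known coefficients so the fit can be checked.
true_intercept = 2.0
true_weights = np.array([3.0, 0.1, 0.05])
activity = true_intercept + features @ true_weights

# Add a column of ones so the model learns an intercept too.
design = np.column_stack([np.ones(len(features)), features])
coeffs, *_ = np.linalg.lstsq(design, activity, rcond=None)

intercept, weights = coeffs[0], coeffs[1:]
print("intercept:", round(float(intercept), 3))
print("weights:", np.round(weights, 3))
```

In an explanatory model like this, the fitted weights are the point: each one estimates how much the outcome moves when that feature changes and the others are held fixed.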

Sure, building a predictive model would’ve been the “cool” thing to do, but the goal wasn’t to predict sales leads for the future; it was to establish partnerships with universities NOW.

Lesson learned: the job of a Data Scientist is NOT to build a fancy model; it’s to do whatever it takes to solve a real-world human problem. 

Step Six: Communicate Results

The sixth step of the Data Science Process is where you bring it all together and communicate results.

This is where you practice the most underrated skill in the Data Science toolbox; the X-factor that separates the good Data Scientists from the great ones: data storytelling.

Speaking with experts, I heard it time and time again: your worth as a Data Scientist will be ultimately determined by your ability to convert insights into a clear and actionable story.

In other words, the ability to create and present simple, effective data visualizations to a non-technical audience is among the most sought-after skills in business today.

For a perfect example of how to do it right, here’s the most well-put-together data story I’ve ever seen on “Wealth Inequality in America”.

And here’s a lecture by Harvard’s CS109 that’s a brilliant encapsulation of the art of data storytelling. The professor covers everything from understanding your audience to providing memorable examples. If you don’t have time to watch the lecture, you can check out my Evernote notes that sum it all up.

Finally, to create beautiful data visualizations, I’d recommend going beyond Python’s basic matplotlib library and checking out seaborn (statistical) and bokeh (interactive).

For example…

With Calm College, we had to weave our findings on student activity into an actionable story for campus administrators.

First, I used our list of “Calm’s Most Popular Colleges” to generate sales leads, by reaching out to 50 schools that the model identified as most suitable for a partnership.

Then, for each of the 50 schools, I crafted a personalized story about their students’ activity on the Calm app.

For example, with Harvard, we reached out to the head of campus wellness to let her know that Harvard’s campus was a top 5 most popular college for the Calm app. Then we included 4 graphs depicting the following insights:

  1. 6% of the Cambridge, Massachusetts population (17,000+ people) are Calm users.
  2. More than 82% of Harvard users are active on a monthly basis, with an average of 15 (fifteen!) sessions/month!
  3. Week-by-week retention amongst Harvard users is 3x that of the average Calm user.
  4. Yet, despite all of this, Harvard students’ subscription rate is still well below average.

The first 3 graphs told a story of extraordinary interest in the Calm app on Harvard’s campus. But what really drove home our program was the last point:

“despite all this amazing interest, it’s clear that your students cannot afford Calm’s $60/year subscription. That’s why you need Calm College: to make the Calm app a FREE wellness resource for your students.”

Rather than sell our product, we were selling their students’ past and present use of our product. And it worked like a charm.

Repeating this approach for other colleges, we were able to successfully get our foot-in-the-door at many of the most elite institutions in the country.

And eventually, thanks to this application of The Data Science Process, we were able to launch the program at 8 schools this Fall:

CalmPartnerSchools
the 8 schools Calm College launched at this Fall

Selection: The Core 20%

“You are not flailing through a rainforest of information with a machete; you are a sniper with a single bull’s-eye in the cross-hairs.” — Tim Ferriss, The Four Hour Chef

The greatest mistake you can make in accelerated learning is trying to master everything. This is not Pokémon. You are not going to catch ’em all.

Instead, the key is being relentlessly focused with the micro-skills you choose to develop. Through rigorous application of the 80/20 rule, it’s possible to cut down a long list of possibilities to the highest frequency material. Then, once you’ve cleared your plate, it’s depth over breadth all the way.
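One way to make the 80/20 cut mechanical is to tally how often each skill comes up and keep the smallest set that covers most of the mentions. The sketch below does exactly that. The tallies are hypothetical placeholders, not counts from my interviews.

```python
def core_skills(mention_counts, coverage=0.8):
    """Pick the fewest skills whose mentions cover `coverage` of the total."""
    total = sum(mention_counts.values())
    ranked = sorted(mention_counts.items(), key=lambda item: item[1], reverse=True)

    selected, running = [], 0
    for skill, count in ranked:
        # Stop as soon as the skills chosen so far reach the coverage target.
        if running >= coverage * total:
            break
        selected.append(skill)
        running += count
    return selected


# Hypothetical tallies of how often each skill came up; not real survey data.
mentions = {
    "data wrangling": 40,
    "SQL": 25,
    "exploratory analysis": 17,
    "data storytelling": 8,
    "deep learning": 4,
    "Hadoop": 3,
    "Spark": 3,
}

print(core_skills(mentions))
```

The output is only as good as the tallies going in, so treat it as a way to structure your notes rather than a verdict.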

In his book “The Four Hour Chef”, Tim Ferriss discusses this selection process by introducing the idea of a “Minimum Effective Dose” (MED). Simply put, an MED is the smallest dose that will produce a desired outcome.

Here, I’ve broken down the MED for all 6 steps of The Data Science Process:

The Core 20
the 20% of Data Science skills that result in 80% of outcomes

In conversations with experts, these 8 skills continuously came up as the most essential.

In particular, Data Wrangling (i.e. Python’s Pandas) was said to be the #1 skill (in terms of time spent doing) by every Data Scientist I spoke with. Data cleaning is not sexy, but it accounts for up to 80% of the job.

You may be wondering where big data tools like Hadoop & Spark, or modeling techniques like neural networks & deep learning fall into all this. The answer: surely outside the core 20%.

To my surprise, many Data Scientists I spoke with emphasized that only a small percentage of companies have data that even requires something as complex as a neural network!

Instead, an overwhelming majority of employers need simpler services like data cleaning, exploratory analysis, and logistic regression models (as recently reflected in an industry-wide survey by Kaggle).

When choosing what to learn, remember: you can always revisit the heavier topics later, but don’t weigh yourself down at the start. The goal is to accelerate learning. So wait until your house of expertise has a strong foundation before adding the shiny stuff.

If you’re looking to master the fundamentals of Data Science in 6 months or less, you’ll want to simply focus on the core 20%.

Next Steps

“Live as if you were to die tomorrow. Learn as if you were to live forever.” — Mahatma Gandhi

I do not believe knowledge is useful for the sake of knowledge; only if you use what you’ve learned to improve your life, or the lives of others. So I would encourage you to pause, reflect, & ask yourself: “what’s the smallest possible action I can take right now with what I’ve learned?”.

For instance, a great place to start would be picking one of the six steps you’re most interested in and exploring the skills/resources associated with it. Then find a dataset that’s of interest to you and start learning by doing through a mini-side-project.

The key is trusting yourself by following the path that you’re instinctually most drawn to… because that’s where you’ll find the most short-term motivation & long-term fulfillment.

Personally, after deconstructing data science and identifying the core 20%, I decided to enroll in Springboard’s Data Science Intensive online bootcamp (recently renamed to “Intermediate Data Science”). I chose this program because it was the only curriculum I could find that covered all 6 steps of the data science process while focusing in on all 8 skills of the core 20%.

For more information on the program, I’d recommend checking out Raj Bandyopadhyay’s brilliant Quora answers (here and here) on the methodology behind Springboard’s approach to Data Science education. And here’s a discount code for $100 off any Springboard course.

Whatever you choose to do with this information, the important thing is that you do something. Getting started is always the hardest part, so I challenge you to turn intention into action.

Final Thoughts

Over the past few weeks, the power of the internet has sure become apparent. In just the first 7 days, my first post — Learning Without Limits — had 3000+ views from 66 countries around the world. Never did I expect it to spread so far and wide, but I guess I have all of you to thank for that.

So as long as you all continue to pay it forward, I’ll continue to be an open book. As promised, I’ve compiled and will continue to open-source all my favorite resources, insights, and findings via this new page: ajgoldstein.com/resources.

All I ask of you is that you share this with people you think would benefit. That’s my call-to-action. Share. Why? Because we’re all in this together and true happiness comes from other people.

____________________________________

To follow along this journey, feel free to drop your email in the sign-up bar below. By signing up, you’ll receive one (just one) email when I’ve posted a new update.

And don’t hesitate to leave any questions, thoughts, or feedback you have in the comments box below. I’d love to hear from you.

Learning Without Limits: How Indigenous Tribes Prepared Me To Master Data Science

This is the first in a series of posts on applying Tim Ferriss’ accelerated learning framework to Data Science. My goal is to become a world-class (top 5%) Data Scientist in < 6 months, while open-sourcing everything I find and learn along the way. Here’s the story behind the journey and an invitation to follow along:

Fear

There I was, ten yards out, staring my dinner in the face. The only problem was, the wild boar was still alive.

For the past 4 weeks I had been backpacking solo throughout Southeast Asia, and just yesterday had decided to spend the final 2 days of my trip doing jungle survival training in Bario, Malaysia.

With the help of a local guide, I was effectively learning how to live off the land: boil my own water, kill my own food, build my own shelter, and more.

Now here I was, in the most isolated region of the country, face-to-face with my next meal; having to come to terms with yet another deep-seated fear of mine.

Pic1 - JungleSurvivalCollage
Living off the land

In many ways, this was nothing new. I had spent the majority of the past month living with indigenous tribes — finding myself in a long list of situations that had me totally outside my comfort zone.

In other ways, this situation felt just as scary as the last one. And the one before that. And the one before that. Even after facing hundreds of these “oh shit” moments, the fear never went away.

Crossing a makeshift bamboo-tree-bridge over a rapid river just 30 minutes earlier was no less terrifying than this moment of staring a 130-pound wild animal right in the face.

Throughout my 30 days in Cambodia and Borneo-Malaysia, the fear never faded. But, what did change, was how I learned to handle the fear.

Pic2 - JungleSurvivalCollage2
Me, my guide Phillip, and the “bridge” that had me so scared I was physically shaking.

Ground Zero

Adjusting to such a different way of life was not always quick or easy.

As part of adopting indigenous culture for 30 days, I sometimes found myself eating insects & reptiles, sleeping on wooden boards, taking ice-cold showers, and using hole-in-the-ground toilets.

Virtually every guarantee and absolute of my life back home had been either stripped away or disproved. Left with only the bare essentials, I quickly found myself at ground zero.

One week before Bario, I was in another village 300 miles south, staying with the Iban indigenous people in Kanowit, Malaysia. I was the first young white male that the children had ever seen in-person and, like every other tribe I’d visited, only a few people spoke more than a little broken English.

Pic3 - IbanVillage
The Iban indigenous children & my host-family

As someone who places such a high value on building connection through conversation, the language-barrier was especially difficult for me. I often found myself feeling incredibly lonely and in need of some intellectual stimulation.

Thankfully, a couple weeks earlier I discovered that listening to podcasts was an amazing way to keep myself company. In particular, I’d quickly become obsessed with The Tim Ferriss Show. Each episode, Tim would deconstruct the habits and routines of world-class performers in order to distill actionable tips/tricks for his listeners.

Listening while experimenting with a totally different way of life, I found these podcasts to be the perfect recipe for re-examining my life from the ground up. The combination of fresh ideas with new scenery channeled a level of creativity within me that I never knew was there.

Pic5 - Snake
Yum, lunch 😛

Accelerated Learning

One theme that really resonated was Tim’s framework for accelerated learning. As a self-proclaimed human guinea pig, he had spent the previous decade mastering various skills like Brazilian jiu-jitsu, language learning, tango dancing, and swimming. In each case, he had reached a world-class level in less than 6 months, all by following the same framework (described in detail below).

Hearing his personal stories of emulating the world’s fastest learners completely opened my mind to the sheer breadth of possibility around self-education. In particular, his accelerated learning framework had me asking two questions of myself:

  1. What skill do I want to learn the most?

  2. What fears are stopping me?

The answer to the first question felt somewhat obvious. In high school I’d developed a fascination for data analytics through my childhood love of baseball. And in just a couple weeks I’d be starting my senior year at the University of Michigan — where I’d spent the past three years studying for a degree in Data Science Engineering. So mastering the technical expertise of a Data Scientist seemed like the clear choice.

Pic6 - DS Venn Diagram
“What’s Data Science?” Drew Conway’s famous 2010 venn-diagram depicts it as the intersection of hacking skills, math & statistics knowledge, and substantive expertise.

But as I started to try and answer the second question, I came up totally short. Reflecting on the past few summers, during which I’d had three internships involving data analytics, I realized that I’d continually shied away from hard-and-fast engineering work, choosing instead to focus in on “softer” business-development skills where I was already most comfortable.

As a result, in none of these experiences did I take things to the full extent of my technical capabilities. Every time, I felt myself just barely scratching the surface of something I’d been seemingly fascinated by for as long as I could remember.

It just didn’t make any sense. Clearly I’ve always wanted to learn this stuff, but why haven’t I given myself the chance?

Limiting Beliefs

On August 17 – my last evening with the Iban people – I was relaxing on the back porch of the tribal chief’s home, listening to yet another Tim Ferriss podcast. This one was a conversation with Tara Brach, a world-renowned meditation teacher.

Toward the end of the episode, while discussing how she’s overcome fear, Tara posed a simple question to listeners that left me dead in my tracks:

“What are you believing that’s limiting you?”

For the past few weeks I’d been forced to come to terms with physical fears big & small, but only now did I begin to consider the mental fears that had been limiting me back home.

Returning back to the second, lingering question above, the answers came pouring out. I quickly grabbed my journal and scribbled down the following entry:

Pic7 - Journal Entry
The journal entry from the Iban village, answering Tara Brach’s question

What came out was both surprising and enlightening. By seeing my fears as just that — fears — I was able to take a step back and ask myself if any of these were actually worth being afraid of.

I was able to see that the only blockades preventing me from growth were internal insecurities around my own worthiness and capacity to learn. With nearly all the information I need free and readily available online, the only thing really standing in my way were mental barriers I had created for myself.

Accelerating My Learning

With all limiting walls identified, I was able to begin to knock them down.

Now circling back to Tim’s learning framework, I revisited podcasts, articles, & videos about his personal story, in search of ways to apply the same principles to Data Science.

And by the way, this was possible due to the fact that – in the indigenous villages of Cambodia and Borneo-Malaysia – free Wi-Fi is more available than clean drinking water. Go figure.

So throughout the final two weeks of my backpacking trip, while motorbiking across towns, trekking through jungles, climbing mountains, and taking long bus rides, I slowly created a learning plan to execute upon returning home.

Below, I’ve outlined an overview of that plan, step-by-step. Please note that the framework described below was originally published in Tim Ferriss’ epic book: The Four Hour Chef.

The Framework

DiSSS: the recipe to becoming world class in anything in less than 6 months

  1. Deconstruction
  2. Selection
  3. Sequencing
  4. Stakes

Step One: Deconstruction

The first step of Tim’s framework is to break down the complex skill you want to learn into its simplest parts. The key question here is:

“What are the LEGO blocks (e.g. micro-skills) that make up the big scary wall?”

Two main tools (and supporting examples) for accomplishing this are:

  1. Reducing: break down each micro-skill into its individual components.
  • While learning Japanese, Tim broke each written character (kanji) down into its building-block components, called radicals. With only 214 traditional radicals in the language, this turned a near-impossible task (learning 1,945 characters) into something much more manageable.
  2. Interviewing: consult experts about learning strategies, key principles, common mistakes, etc.
  • While learning basketball, Tim cold-emailed Rick Torbett (who coached the Warriors to the highest 3-point shooting percentage in NBA history) for learning strategies like “framing the goal on the follow-through” and key principles like “legs for distance, arms for aim” — in exchange for a feature on his blog.

During my first two weeks back home, I started with “reducing” by spending 50+ hours reading every article I could find online about the core/top/essential/critical “skills of a Data Scientist”.

Not surprisingly, much of what I found at first was filled with buzz-words and fluffy terminology, but about 10 articles in, I started to notice some consistent, substantive patterns.

Pic9 - Data Science Summary
a summary of my first 50 hours of research. In my next post I’ll walk through these findings, step-by-step.

Next, I created a list of questions and started reaching out to every expert Data Scientist I could find (co-workers, university professors, industry professionals, etc).

In most cases, they were more than happy to help. So over the past two weeks, I’ve conducted 10 informational interviews with local Data Scientists, and learned a ton of “do’s” and “don’ts” in the process (details to come in my next post).

Step Two: Selection

The second step of the framework is to apply the 80/20 rule by asking:

“What 20% of micro-skills will result in 80% of the outcome I’m trying to create?”

The tagline here is “Material beats Method”. That is, carefully choosing WHAT you learn is more important than HOW you learn that material. Thus, to apply the framework effectively, you must identify and focus on the highest frequency material.

For example, Tim notes that, of the 171,476 English words in the Oxford Dictionary, the 100 most commonly written words (a mere 0.06%) make up more than 50% of all written material.

In the case of Data Science, this has required me to continuously separate the “hot-topics” of today (e.g. deep learning, neural networks) from the core fundamentals (e.g. data cleaning, data wrangling).

Over and over again, I found the following 8 micro-skills (4 technical, 4 non-technical) to be responsible for more than 80% of experts’ results:

Pic10 - DS Skills.png
the micro-skills I’ll be focusing in on for the next 6 months. More details to come next post.
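
To make the selection step concrete, here’s a minimal Python sketch of the 80/20 idea. The mention counts below are made up purely for illustration (they are not my actual research tallies), and `smallest_covering_set` is a hypothetical helper name. It ranks items by frequency and keeps adding them until they cover the target share, then double-checks the word-frequency percentage quoted above.

```python
def smallest_covering_set(counts, target=0.8):
    """Return the most frequent items whose combined share first reaches `target`."""
    total = sum(counts.values())
    # Rank items from most to least frequent.
    ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
    selected, covered = [], 0
    for item, n in ranked:
        if covered >= target * total:
            break  # already at or past the target share
        selected.append(item)
        covered += n
    return selected, covered / total


# ASSUMPTION: invented tallies of how often each skill came up while researching.
mentions = {
    "data cleaning": 50,
    "data wrangling": 35,
    "deep learning": 10,
    "neural networks": 5,
}

selected, coverage = smallest_covering_set(mentions, target=0.8)
print(selected, coverage)

# Sanity check on the word example: 100 words out of 171,476, as a percentage.
word_share_pct = 100 / 171_476 * 100
print(round(word_share_pct, 2))
```

With these invented numbers, two of the four skills already cover 85% of the mentions. Real tallies will be messier, but the same ranking-and-cutoff logic applies.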

Step Three: Sequencing

The third step is to lay out the selected LEGO blocks into the most logical progression possible.

Two things to consider here are any dependencies that may exist between the skills, as well as which ones will provide the most early wins (because humans quit if they’re not having fun).

For instance, Tim grew up 5 minutes from the beach in Long Island, NY, but didn’t learn to swim until he was 31. What finally did the trick was a program called Total Immersion, with a progression that wouldn’t allow him to fail.

Each exercise was built upon the previous, and failure points like kickboards were completely avoided. Skills were layered on one at a time, and within 10 days he’d gone from a two-pool-length (40 yards) maximum to swimming more than 40 lengths per workout.

With Data Science, I’ve found that learning the basics of a programming language like Python or R is a prerequisite for the 8 LEGO blocks listed above. Beyond that, the order between micro-skills is fairly flexible, while the order within each one is more obvious.

For example, it doesn’t really matter whether you start by learning Machine Learning models with clean data or Data Wrangling with messy data. Neither is directly dependent on the other.

However, when learning how to implement Machine Learning models, it’s best to start with simpler algorithms like linear regression and decision trees before moving on to more complicated approaches like random forests. That is, basic principles tend to carry up to higher-level techniques.
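
For the programmers in the room, sequencing by dependencies is essentially a topological sort. Here’s a rough sketch using Python’s standard-library `graphlib` module (Python 3.9+). The dependency map is my own simplified assumption based on the examples above, not a definitive curriculum.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# ASSUMPTION: a simplified, illustrative dependency map.
# Each key maps to the set of skills that should be learned first.
prerequisites = {
    "python basics": set(),
    "data wrangling": {"python basics"},
    "linear regression": {"python basics"},
    "decision trees": {"python basics"},
    "random forests": {"linear regression", "decision trees"},
}

# static_order() yields every skill only after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())

for step, skill in enumerate(order, start=1):
    print(step, skill)
```

Skills with no dependency between them (like data wrangling and linear regression here) can come out in either order, which is exactly where you’d pick whichever gives you the earliest win.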

Pic11 - Adventure2
climbing 2500 feet up Mt. Santubong via dangling ladders & scrambling rocks

Step Four: Stakes

The fourth and final step is to build in consequences and rewards for yourself that will ensure you actually do what you say you’re going to do.

As Tim explains, “if you were to sum up the last 50 years of behavioral psychology in two words, they would be: LOGIC FAILS.” No matter how good a plan is, or how sincere our intentions, humans are horrible at self-discipline.

This is something that I’ve struggled with immensely in the past. As an insatiably curious person, I’ve often found myself jumping from one project to the next as motivations change day-by-day.

So I’ve decided that, if I can stay focused enough to land my first PAID Data Science freelance/contracting/consulting gig by Dec 1, I’ll reward myself with another solo backpacking trip over Winter Break. This one to Patagonia.

Why I’m Doing This

My interest in mastering Data Science is entirely driven by two motivating factors:

  1. Social Impact

One year ago, I published a personal narrative about my recent road-to-recovery from depression. It was a non-traditional path that began through daily mindfulness meditation practice with the Calm app.

Pic12 - Meditation
Meditating with Calm

Since then, the scientific evidence for non-clinical approaches to mental health has only deepened. Day by day, it’s becoming increasingly clear: taking a pill with side-effects or waiting in line for therapy is not always necessary. In many cases, building healthy habits (e.g. meditation practice, physical exercise, diet change) is a better long-term solution to mental illnesses like depression and anxiety.

This summer I took my first step in this direction by going to work for Calm.com, Inc. in San Francisco. Alongside one of my best friends, I developed & launched Calm College: the first US platform geared toward using mindfulness to improve mental health on college campuses throughout the country.

And thus far, Calm College has launched at 8 schools this Fall — Harvard, Princeton, NYU, Northwestern, Cornell, Johns Hopkins, USC, and the University of Pennsylvania.

Pic13 - CalmCollege
Calm College’s Harvard University landing page

Now back on campus, I have two semi-fleshed-out ideas for how Data Science could be applied to mental health and mindfulness:

  1. Use national survey data (e.g. Healthy Minds for college student mental health) to build a predictive model that identifies high-risk students who need a helping hand.

  2. Survey existing users of particular interventions (e.g. Calm) to identify which demographics benefit most from mindfulness-based interventions and match people to solutions accordingly.

I’m still fleshing out these two ideas (and considering many others), so I’d love to hear what you may have in mind.

  2. Freedom & Growth

My second motivator comes from this simple truth: my three backpacking trips abroad have taught me more than anything else I’ve ever done.

Even just my most recent trip to Southeast Asia has already caused countless lifestyle changes back home. To give two examples: I’ve been regulating my use of technology by putting my phone on Airplane Mode for 12 hours/day, and just last week I donated half my clothes/possessions to Goodwill.

In essence, I’m loving all the ways travel has helped me grow and I don’t want it to stop anytime soon. So by learning Data Science (the most high-demand craft of the 21st century), I’m earning my freedom to work/live wherever I want after graduation.

Pic15 - Adventure3
Temple-hopping (the new bar-hopping) in Siem Reap, Cambodia

Moreover, I hope to face my fears while building skills that add real-world value. In the best case, I’ll achieve total financial independence. And in the worst case, I’ll have learned the art and discipline of independent/accelerated learning — a skill that’s transferable to anything I hope to learn in the future.

But it’s not just about my own growth. I want to bring you along for the ride. By blogging about my experiences over the next 6 months, I hope to empower anyone else who’s interested to learn along with me.

That’s why I’ll be open-sourcing every single resource, insight, and finding I come across over the next 6 months, through this blog. I hope to build off Tim Ferriss’ framework by creating my own: a free framework for anyone — now or in the future — to master Data Science in less than 6 months.

Already, learning as much as I have since returning from Southeast Asia has made me feel like Superman. And that’s exactly how I want you to feel too: a powerful being in charge of your own destiny.

The Road Ahead

Pic15 - RoadAhead

My goal is straightforward: by the time that I graduate college in 6 months, I aim to be a world-class (top 5%) Data Scientist, as measured by the caliber of my professional project portfolio.

Doing this is as much about pursuing my purpose of helping people live more mindfully as it is about empowering others to learn and grow along with me.

So if you’re interested in following along on this journey, please feel free to drop your email in the sign-up bar below.

By signing up, you’ll receive one (just one) email every couple of weeks when I’ve posted a new update. And of course you can opt-out anytime.

In my next post, I’ll be going into more detail around the LEGO blocks of Data Science deconstructed, actionable tips & tricks from interviews with experts, a bootcamp I’ve already enrolled in, conferences I’ll be attending, and much more.

Let the fun begin 🙂