I started Aware Healthcare because I believe the greatest problem in mental healthcare is not stigma or access, but measurement. Mental health patients often don’t receive the care they need because we lack a universal vocabulary for describing how we’re mentally feeling. And mental health providers often can’t treat those who need it most because we lack an objective measure for prioritizing one condition or approach over another.
So why do we have a thermometer for the body but not a thermometer for the mind?
We’re starting with addiction because there is simply no more costly, preventable, or unmanaged disease in the United States of America. Addiction costs our nation $700 billion every year and leads to 70 other comorbid medical conditions. 16% of Americans meet criteria for clinical addiction and another 32% classify as risky users. Yet, with nearly half of Americans *directly* affected, we continue to turn our attention away from this chronic, complex brain disease.
So how big of a problem does addiction need to become before we give it the attention it deserves?
Now in a post-COVID world, the need for remote monitoring of mental healthcare has never been greater. This is no longer just an idea. Real patients in recovery from a substance-use disorder are being touched by our work every day. Which is why we’re inviting 7 new people to join us, in paid positions, starting immediately.
Interested? Check out our open positions below, and message me directly if any of the roles are a good fit for you or somebody you know.
Over the past 4 months, I’ve been starting a new technology company focused on predicting and preventing addiction relapse.
Then yesterday, on Christmas morning, I found out that one of my teammates had overdosed. Overnight late last week, he passed away in his sleep. The next morning, his mother found him lying in his bed, his glasses still on… his body cold. He was 23 years old.
On Christmas morning, as I sat for a half-hour on the phone with his mother, listening to her crying hysterically… still mourning the loss of her son… she told me that in the days leading up to his death she had never seen him so happy.
“I’m sorry, I think the phone may have cut out. Did you say ‘happy’?”… “Yes. He told me about the promotion you gave him last week. He was so excited. He said it was his dream job… that he finally felt successful. He told me ‘Mom, I want to make you proud’.”
The Story, As I Understand It
Speaking with his childhood friends thereafter, I learned that, for at least the past decade, he’d struggled with addiction, depression, insomnia, and anxiety. One of his friends from middle school told me he started seeing a therapist for it all when he was 15 or 16. A second friend told me “he took Xanax a lot”. A third told me he “had a thing with opioids” and “kind of had an alcohol addiction”. He never told me any of this.
His mother explained that her husband (his father) has struggled with “these issues” his entire life. I know from my own research that addiction is a complex brain disease originating in the reward circuitry of the brain, and that genetics account for 50-75% of the risk. As one addiction psychiatrist put it, “people may choose to take drugs, but nobody chooses to be an addict”.
Two days before his death, I promoted him to lead our engineering team. While we had met just 3 months earlier, his skillset was incredibly impressive. He was a full-stack engineer. A product manager. A digital designer. On our last Zoom call, he showed more initiative than he ever had before… volunteering to stand up our technology live in the cloud on Amazon Web Services (AWS), to help with customer discovery, to reach out to every investor he knew. It was just last year that he graduated from the University of Michigan with a degree in Computer Science… but boy did he know how to hustle.
The night he passed, I set him up with a new email on our company’s Google Admin console. Then we texted to coordinate him setting up a new AWS account for the company. I emailed him asking if he could take ownership over finding us the AWS credits we needed in the most cost-effective way possible. He emailed back in less than 3 minutes, saying “Yup, got it covered. Have rough estimates of the cost, will let you know once it’s finalized.”
Then I never heard from him again.
When Facts Become Feeling
The thing I just can’t get over is the irony of it all. He overdosed while building a tool to prevent addiction relapse. On one hand, it makes absolutely no sense. On the other hand, it makes perfect sense… but in a way that I can’t exactly wrap my head around.
It gives me chills to think that I was likely the last person he communicated with before he died… that our work together was one of the last things he thought about. The night he passed, he told his mom he would be “up late, finishing a project”. Nothing unusual: he was a hard worker who, when he couldn’t sleep, often worked through the night. Nobody thought it would be his last.
Over the past 2 months I’ve been learning everything I can about addiction. I just finished a 2012 report by CASA Columbia which spanned 250 pages. I’ve been listening to people’s struggles first-hand, through attending addiction support groups — like AA, NA, SMART, and Refuge Recovery — several times per week in my area. I’ve even been holding near-daily meetings with addiction psychiatrists, to learn everything they know about treating and managing the disease.
And yet, despite all of this, I couldn’t see it right in front of my own face… on my own team.
Everything I’ve been reading… everything I’ve been learning… did not truly come alive until just yesterday, when I was on the phone with the mother of my teammate, listening to her sobbing over the death of her only son.
Our mission just got real.
What Comes Next
While we’re still an early-stage startup in stealth-mode, this unexpected loss has me feeling the need to share a bit about where we’re headed.
The one-liner is this: my company, Conscious Insights, is a consulting group developing an AI technology to predict and prevent substance-use disorder relapse using passive meta-data from patients’ smartphones.
Our mission is to build the ‘thermometer for the mind’; enabling care providers to ‘check the mental temperature’ of their patients (with explicit permission) — in an objective, continuous, ecological, and passive fashion — at any time. This way, they can determine which patients require heightened attention and intervention, in advance of relapse.
In my teammate’s case, our technology could have potentially let his mother and doctor know he needed help, days or weeks in advance.
And while there is still a ton of work to do, what started as ‘a crazy idea’ just a few months ago is starting to come together.
In March 2020 Conscious Insights will be kicking off a 300+ person clinical study at several Community Health Centers across the state of California. It will be the first study of its kind, and the largest to ever use technology to predict/prevent addiction relapse.
This is why my teammate was up late that night. Because he saw setting up our AWS account as the first step in a larger opportunity to help hundreds more with the same struggle that’s plagued him and his father their entire lives.
I just wish I knew before it was too late. For others in the future, it’s our mission to change that.
Join the Mission
If you were moved by this story and are interested in supporting our mission, here are some ways you can get involved:
RESEARCH WITH US: We are seeking additional research partners (e.g. health centers, treatment clinics) to join our upcoming clinical studies. Through partnering with us, we are offering to provide research staff, obtain IRB approval, grant early access to the eventual product, and fully compensate all parties involved for their time. Not to mention the opportunity to be recognized as a leader in the field of addiction medicine as we learn, together, what early warning signs exist for addiction relapse… and ultimately develop a tool that alerts trained medical professionals to intervene before it’s too late.
BUILD WITH US: We are openly hiring for paid Data Science, Machine Learning, and Backend Engineering positions on our team. We have several part-time roles (10-20 hours/week) starting in January 2020 with our consulting services business, which you can apply for here. Likewise, we have several full-time positions (40 hours/week) starting in August 2020 (involving signal processing analysis with smartphone meta-data) for this new product business.
SPEAK WITH US: We are actively recruiting advisors, with stock-options, primarily across three different areas:
(A) experts in treating substance-use disorders (e.g. addiction psychiatrists),
(B) health professionals in managing addiction (e.g. nurse care managers),
(C) healthcare leaders whose systems financially ‘bear risk’ for addiction relapse (e.g. capitated insurance payers)
How many people know someone who’s depressed despite the fact that they take anti-depressants? Or have people in your lives affected by addiction – opioid, alcohol, or otherwise?
Well, there’s hope.
The scientific research that continues to come out around the use of psychedelics to treat depression, anxiety, opioid addiction, alcoholism, and a whole host of other mental disorders… is nothing short of amazing.
In two such examples, cited in the video below:
1) [4:10-7:10] In the largest study to-date examining the effect of psilocybin on depression and anxiety in individuals with life-threatening cancer diagnoses (people freaking out because they’re going to die), a single high-dose session of so-called magic mushrooms resulted in sustained reduction of depression and anxiety, from clinically-severe levels (23/25, 26/30) to nearly-symptom-free levels (6/25, 7/30) a full 6 months out.
To put this in perspective: current depression medication (most commonly SSRIs) hasn’t evolved since the 1980s, requires people to swallow a pill every day, comes with a long list of side-effects, and does absolutely nothing for the 1/3 of depressed adults with treatment-resistant depression (TRD).
2) [7:10-9:00] In a pilot study examining the effect of psilocybin in the treatment of tobacco addiction (people who want to but cannot quit smoking), three low-doses of mushrooms led to 80% of participants being biologically-confirmed (e.g. breath samples, urine samples) as smoke-free 6 months out. And these results held up at 60% 2.5 years after their target quit-date.
Comparatively, the best FDA-approved medication we currently have in treating tobacco addiction is less than half as effective, averaging 35% abstinence 6 months out.
And the best part?
Imperial College London just announced they’re launching the world’s first Centre for Psychedelic Research (2.5 minute teaser video here) so it appears this is just the beginning.
As someone with a family history of mental illness, who’s had my own fair share of battles with depression, and lost family/friends to the addictions described above… watching this video and reading these studies, I can’t help but feel incredibly hopeful for the future.
Today I found myself digging through a time-capsule worth of bookmarks from the past 5 years in search of a master list of design resources requested by a co-worker.
In the process of all this digging, I came across countless articles I’ve since forgotten existed, yet at the time of reading were nothing short of mind-expanding for me.
It’s funny how learning works. When we really learn something, the lesson becomes part of who we are. But somewhere along the way, we tend to forget the source.
Perhaps this is where the myth of a self-made person comes from? In ourselves and others, all we ever see is the end-result; too easily forgetting all the people that’ve helped along the way.
As I look back on the 1000+ articles I’ve read and bookmarked over the past 5 years, I thought I would share the 5 that have been most influential on my thinking.
I’ve chosen these articles because, since reading each of them, I’ve experienced a distinct before/after in how I approach the given topic. And collectively, I would go so far as to say that the lessons I’ve taken away have served as more of an education than school ever could:
In what has been quite the unexpected turn of events over the past week, I have some big news to share.
This week I will be packing up my things, moving from San Francisco, California, and putting my business (Conscious Insights) on hold to go work full-time for Kevin Rose at his latest venture, Oak Meditation, as employee #7 / data scientist #1 in Portland, Oregon.
For the past year and a half, I’ve been fully focused on applying data science to mindfulness-technology, and up until now have been convinced that consulting for many companies in this space was the best way to make the most positive impact.
Just 10 days ago, I would have never considered this a possibility.
However, after several face-to-face conversations with Kevin and the Oak team in Portland last week, and an abundance of video chats thereafter, I have been successfully convinced otherwise; given an offer and opportunity that I just cannot refuse.
So instead of working for many companies in this space, I’ll be going all-in on just one.
At Oak Meditation, not only will I be able to continue contributing to the cause I care so much about, but I’ll also have the chance to lead all-things data within this new organization, and build a world-class team around me in the process.
I’ll be growing in tangible and measurable ways each and every day, creating a modern data architecture from the ground up, participating in BoD meetings + VC pitches, weaving data into every aspect of the business, and (most importantly) working alongside just a fantastic bunch of humans.
It’s ironic that, as I write this, I’m on a plane from SFO to NYC for a 2-day Mindfulness in America conference. Since starting to meditate every day 908 days ago (April 27 2016 was when it all began), this practice has found its way to the total center of my work and life.
Climbing to the top of Corona Heights late last night to say goodbye to San Francisco, a friend and I reflected on how much of a turning point that day has been for me. Now, when I think about how deeply the practice has transformed my life, and all the ways it’s allowed me to give back to others’ lives, I feel a sense of appreciation and purpose that’s impossible to put into words.
Working for Oak is the next step in this journey.
The hardest part about this decision was leaving SF: a city I just arrived in 5 weeks ago, and a place where most of my closest friends in the world reside. But alas, after many long-hikes and late-night-hangouts over the past week, I feel — deep in my bones — this is what I’m meant to do.
One of the best lessons I’ve learned over the past few years is that — when faced with a difficult decision or challenging situation — rarely is more information the answer.
Instead, what I’ve found to be so incredibly helpful is this idea of “simple truths” (what some may call “first principles”): short snippets of wisdom that have been gathered, carefully curated, and repeatedly learned over years of life experience.
I originally learned of the concept from a Swiss-born British philosopher named Alain de Botton, whose ideas on education reform are fascinating, and who is often quoted as saying “we overeducate ourselves out of simple truths”.
As Alain explains, school teaches us that once we know something, that’s it; you know it, and it’s time to move onto the next chapter. But this is dangerous, because it leads us to believe we understand more than we actually do. That is, knowing something in your head is entirely different than feeling it in your bones.
To truly understand requires repetition; repeatedly learning the same lesson over and over again until it translates from conceptual understanding to daily practice. Understanding an idea is on a much lower dimension than acting on it. For example, knowing that daily exercise is good for you is much different than actually going to the gym every day.
So every time I move into a new place (5 times and counting over the past 3 years), I start a new wall of simple truths. And then, over the course of my time there, I gather these ‘simple truths’ from all sources: conversations with friends & mentors, books I read, podcasts I listen to, or even just experiences I have. Then, every time I’m faced with a difficult decision or challenging situation, I return to the wall as my source of guiding light.
Friends who know me well (and have received my countless text messages sharing new additions to the wall) often give me shit about my seeming obsession with “post-it note wisdom”. But in a way, this is my religion. The difference is, instead of becoming defensive or dogmatic about it, I start over more than once a year. I’m always beginning again, gathering new lessons and repeatedly learning them until I can’t *not* remember.
Truthfully, up until now I’ve been pretty insecure about sharing these simple truths outside a core group of friends… mostly because I realize that 95% of them won’t resonate with others in the same way they make sense to me. But, as I take down my 5th wall in the past 3 years, I feel a need to be more vulnerable than I’m naturally willing, and share.
For the past 11 months, these 25 simple truths (below) have been my guiding signals in a world of noise. There’s nothing complex about them, but they’ve been so very helpful for me in what may have otherwise felt like hopeless situations.
My hope is that, even if 24/25 pass you by, just 1 (4%) sticks with you; enough that you’re able to feel it in your bones, and not just know it in your head whenever you need it most. After all, as Derek Sivers says, “if information was the answer, we’d all be billionaires with perfect abs”.
Data Journeys is a podcast for aspiring Data Scientists where I’ll be interviewing world-class Data Scientists about their learning journeys.
In each episode, the goal is to have them tell their story and equip up-and-comers with the strategies, tactics, and tools that the best in the world have used to get to where they are today.
I’m speaking with guests ranging from the US Military to Silicon Valley, from the top-ranks of academia to down-under in Australia, with a focus on how they’ve bridged the gap between acquiring technical skills and creating real-world impact.
For example, two upcoming guests are Andrew Ng — the co-founder of Coursera — at Stanford University and Fernando Perez — the creator of Jupyter Notebooks — at UC Berkeley.
This is the SECOND in a series of posts on applying Tim Ferriss’ accelerated learning framework to Data Science. My goal is to become a world-class (top 5%) Data Scientist in < 6 months, while open-sourcing everything I find & learn on the way.
The purpose of this post is to empower others to start accelerating their own learning by:
deconstructing the complex craft of Data Science into its simple micro-skills
identifying the 20% of skills that contribute to 80% of outcomes
And if you stick around until the end, you’re in for a special treat.
Estimated reading time: 15 min ( to save you hours of spinning in circles 😉 )
A simple Google search of “how to learn Data Science” returns thousands of learning plans, degree programs, tutorials, and bootcamps. It’s never been more difficult for a beginner to find signal in the noise.
Everyone seems to have a different opinion, and the only common approach appears to be dumping a long list of courses to take and books to read, all the while providing little to no context into how these concepts fit into the bigger picture.
This post is my attempt to convert all the buzzwords & fluffy terminology into explicitly-learnable skills. To do this, I’ll be walking through my application of the first two steps to Tim Ferriss’ accelerated learning framework: Deconstruction & Selection.
Rather than jump right in to a roadmap of my own learning journey (that’ll be next post), I want to empower you to begin your own. And if you haven’t read my first post, I’d highly recommend starting there: www.ajgoldstein.com/learning-without-limits/
Deconstruction: The Data Science Process
“The whole is greater than the sum of its parts.” – Aristotle
It’s true: Data Science is not a single discipline, but a craft at the intersection of many. So in order to appreciate how the seemingly disparate puzzle pieces fit together, I present to you a story. It’s called “The Data Science Process”, and it has six parts:
1) Frame the problem: who are you helping? what do they need?
2) Collect raw data: what data is available? which parts are useful?
3) Process the data: what do the variables actually mean? what cleaning is required?
4) Explore the data: what patterns exist? are they significant?
5) Perform in-depth analysis: how can the past inform the future? to what degree?
6) Communicate results: why do the numbers matter? what should be done differently?
But before we begin, a couple quick caveats:
1) In large organizations, “The Data Science Process” is often carried out by an entire team, not a single individual. An individual can specialize in any one of the six steps, but for simplicity, we’ll be assuming a one-person team.
2) The insights that follow are a compilation of various expert interpretations; not my original ideas. I am not (yet) an expert Data Scientist, but over the past 6 weeks I’ve learned from many. Thus, I’m simply serving as the filter between hundreds of hours of research and the actionable insights you’ll find below.
In particular, I’ll be pulling from favorite online articles (linked throughout) and conversations with the following 10 experts:
Chris Brooks — Director of Learning Analytics at the University of Michigan
Josh Gardner — Data Science Research Associate, Team Leader on MDST
Jared Webb — PhD Candidate in Applied Math, Data Manager at MDST
Alex Chojnacki — Data Application Manager for Flint-Water-Crisis project
And to bring each step of the process to life, I’ll be using my work at Calm.com, Inc. in San Francisco this summer as a real-world case study.
While there, I leveraged analytics insights from Calm’s database of 11 million users to develop & launch Calm College — the first US platform geared toward using mindfulness to improve college student mental health.
Alright, let’s get started!
Step One: Frame The Problem
The first step of The Data Science process involves asking a lot of questions.
The exact manner in which you do this will depend on the context in which you’re working, but whether you’re in the private sector, public sector, or academia, the key idea is the same: before you can start to solve a problem, you have to deeply understand it.
Your goal here is to get into the client’s head to understand their view of the problem and desired solution. In the case of a corporation, this will first involve speaking with managers & supervisors to identify the business priorities and strategy decisions that’ll influence your work.
It’s not uncommon for the first request that a Data Scientist receives to be entirely ambiguous (e.g. “we want to increase sales”). But it’ll be your job to translate the task into a concrete, well-defined data problem (e.g. “predict conversion rate & return-on-investment across customer segments.”)
This is where domain knowledge and product intuition are crucial. Speaking with subject-matter-experts to cut through confusing acronyms & dense terminology can be incredibly helpful here. And familiarizing yourself with the product/service will be essential to understanding the intuition behind metrics.
With Calm College, the ambiguous request we started with was to establish partnerships with universities to offer the Calm app as a student wellness resource.
To better understand our specific domain, we started by spending two weeks speaking on the phone with as many college administrators as possible.
We asked questions like:
How would you describe the mental health climate on your campus?
How high of a priority is improving student mental health?
What main resources do you currently offer students?
What have been the greatest challenges?
Is there precedent for offering 3rd party services?
By the time we got to the final question, nearly every administrator had described their campus’ mental health climate as nothing short of “toxic”, and expressed improving it as their #1 priority.
They explained that the greatest challenge to students seeking help has been overcoming logistical issues (i.e. wait-time, transportation, & money) with the counseling services they currently offer.
Finally, here’s where our ambiguous request became a data problem…
Administrators told us that, before a 3rd party service can be adopted, precedent requires evidence supporting its use. In other words, showing that students on campus are already using the Calm app would be crucial to getting a deal done.
Step Two: Collect Raw Data
The second step of the Data Science Process is typically the most straightforward: collect raw data.
This is where your first technical skill — querying structured databases with SQL — comes into play. But fret not; it’s not as complicated as it may sound.
More important than the querying itself, however, is your ability to identify all the relevant data sources available to you (e.g. web, internal/external databases) and extract that data into a useable format (e.g. .csv, .json, .xml).
Oftentimes, an analysis requires more than one dataset, so you’ll likely need to speak with backend-engineers in your organization who are more familiar with what data is being collected and where it currently resides. Communication is key.
With Calm College, this required me sitting down with Calm’s lead engineer and exploring ways to pull usage data for specific college campuses.
Ultimately, I found out that we could simply query user activity by email address and school location. So for the University of Michigan, for example, I simply searched the database for emails ending in “umich.edu” or locations listed as “Ann Arbor, MI”.
This approach wasn’t foolproof (turns out not all students were using their school email) but it did the job by giving us a representative sample of ~1000 users per college to compare different campuses’ activity head-to-head.
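To make the querying step concrete, here is a minimal sketch of that kind of lookup using Python's built-in sqlite3 module. The `users` table, its `email` and `location` columns, and the sample rows are all made up for illustration; Calm's actual schema and database are not described here, so treat this as one plausible shape of the query rather than the one I ran.

```python
import sqlite3

# Hypothetical schema: a `users` table with `email` and `location` columns.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO users (email, location) VALUES (?, ?)",
    [
        ("student1@umich.edu", "Ann Arbor, MI"),
        ("student2@gmail.com", "Ann Arbor, MI"),
        ("student3@umich.edu", "Detroit, MI"),
        ("someone@gmail.com", "Portland, OR"),
    ],
)


def campus_users(conn, email_domain, location):
    """Return users whose email ends in the domain OR whose location matches."""
    query = """
        SELECT id, email, location
        FROM users
        WHERE email LIKE ? OR location = ?
        ORDER BY id
    """
    # '%' matches any prefix, so the pattern means "ends with email_domain".
    return conn.execute(query, ("%" + email_domain, location)).fetchall()


umich = campus_users(conn, "umich.edu", "Ann Arbor, MI")
for row in umich:
    print(row)
```

Parameterized queries (the `?` placeholders) keep the domain and location out of the SQL string itself, which is a good habit even for one-off analysis scripts.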
Step Three: Process The Data
The third step of the Data Science Process is the most underrated: process the data.
This is where a scripting language like Python or R comes into play, and a data wrangling tool like Python’s Pandas is absolutely indispensable.
Data cleaning is typically the most time-intensive part of data wrangling. In fact, in expert surveys it’s been estimated that up to 80% of a Data Scientist’s time is spent here: cleaning & preparing the data for analysis (more on this below).
The reason this can be so time-consuming is that, before you can analyze data, you have to go column-by-column, developing an understanding of the meaning of every variable and then checking for bad values accordingly.
The tricky part is that a bad value can be defined as many things: input errors, missing values, corrupt records, etc. And once you’ve identified a “bad value”, you have to decide whether it’s most appropriate (given the situation) to throw it away or replace it.
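As a small illustration of that "throw it away or replace it" decision, here is a sketch in plain Python (the same logic maps onto pandas once you're comfortable with it). The column of session lengths and the rule that negative durations count as corrupt are assumptions I've invented for the example.

```python
from statistics import median

# Hypothetical raw column: session length in minutes, with typical "bad values".
raw_minutes = ["10", "12", "", "abc", "-5", "15", None, "11"]


def parse_minutes(value):
    """Return a float for a valid non-negative number, else None."""
    try:
        minutes = float(value)
    except (TypeError, ValueError):
        return None  # missing value (None, "") or input error ("abc")
    # Assumption for this sketch: a negative duration is a corrupt record.
    return minutes if minutes >= 0 else None


parsed = [parse_minutes(v) for v in raw_minutes]
valid = [m for m in parsed if m is not None]

# Option 1: throw the bad values away and analyze only the valid ones.
dropped = valid

# Option 2: replace each bad value with the median of the valid ones.
fill_value = median(valid)
imputed = [m if m is not None else fill_value for m in parsed]

print(dropped)
print(imputed)
```

Which option is appropriate depends on the question being asked: dropping shrinks the sample, while replacing keeps every row but quietly pulls the distribution toward the middle.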
With Calm College, I faced two significant roadblocks here:
1) There was little to no company documentation on database variables
2) I didn’t know Python’s Pandas and felt too intimidated to try and learn
Each of these presented their own challenge:
1) It took me several days to figure out how to define an “active user” (i.e. should ‘active’ mean opening the app, starting a session, or completing a session?)
2) I had to use an analytics tool called Amplitude rather than coding in a script file.
After talking with Calm’s Product Manager, I was able to define an active user as someone who “starts a meditation session” and identify the right variables. Then I had to clean the data by filtering out students who hadn’t been active in the last 365 days.
The thought process here was that administrators (i.e. our client) would primarily be interested in student activity from the past academic year, and non-active students (i.e. “null” values) were outliers that, if included, would only skew the results.
Noticing a theme here? It’s about your client’s interests, not your own.
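For anyone who would rather script this than click through an analytics tool, the "active in the last 365 days" filter described above can be sketched in a few lines of Python. The record layout (a user id plus the date of the most recent started session) and the dates are hypothetical; I did this step in Amplitude, not in code.

```python
from datetime import date, timedelta

# Hypothetical records: (user_id, date of most recent *started* session, or None).
last_session_start = [
    ("u1", date(2017, 6, 1)),
    ("u2", date(2016, 3, 15)),
    ("u3", None),  # never started a session
    ("u4", date(2016, 9, 10)),
]


def active_users(records, as_of, window_days=365):
    """Keep users who started a session within `window_days` before `as_of`."""
    cutoff = as_of - timedelta(days=window_days)
    return [
        user_id
        for user_id, last_start in records
        if last_start is not None and last_start >= cutoff
    ]


active = active_users(last_session_start, as_of=date(2017, 8, 1))
print(active)
```

Writing the definition of "active" as a function with an explicit `window_days` parameter makes it easy to revisit the decision later without redoing the whole analysis.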
Step Four: Explore The Data
The fourth step of the Data Science Process is where you explore the data, and the real adventure begins.
This is where the core competency of scientific computing (i.e. Python’s numpy, matplotlib, scipy, & pandas libraries) comes into play.
Using these libraries, you’ll split, segment, & plot the data, in search of patterns. Thus, the key is becoming really comfortable with producing quick & simple bar graphs, box plots, histograms, etc. that’ll let you catch trends early on.
Remember that analysts who produce beautiful externally-facing visualizations often have to iterate through hundreds of internally-facing ones first. So playing around with possibilities in this way is more of a guess-and-check art than a hard-and-fast science.
Finally, once you’ve identified some patterns, you’ll want to test them for statistical significance to determine which are worth including in a model. This is where a strong grounding in inferential statistics (e.g. hypothesis testing, confidence intervals) and experimental design (e.g. A/B tests, controlled trials) is essential.
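One approachable way to get a feel for hypothesis testing is a permutation test, which needs nothing beyond Python's standard library. The sketch below asks whether the gap between two groups' means could plausibly come from random shuffling alone. The two lists of weekly sessions are made-up numbers, and the function name and defaults are my own choices, not a standard API.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffle the group labels and recompute the gap between means.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    return at_least_as_extreme / n_permutations


# Made-up weekly sessions per user at two hypothetical schools.
school_a = [5, 6, 7, 6, 8, 7, 6, 9]
school_b = [2, 3, 2, 4, 3, 2, 3, 1]

p_value = permutation_test(school_a, school_b)
print(p_value)
```

A small p-value says that shuffled labels rarely produce a gap as large as the observed one, which is the intuition behind "statistically significant". In practice a library routine such as a t-test would usually do this job.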
With Calm College, I started by exploring factors that would influence a potential partnership: monthly engagement, week-by-week retention, and subscription rate.
My hypothesis going in was that elite schools known for student stress (e.g. Cornell, Harvard, MIT) would have significantly higher numbers across the three statistics. Or, in other words, I suspected that stressed-out kids need more calm.
To test this, I began by segmenting universities into their regional groups and then splitting areas into specific college towns. From there, I was able to compare the statistical significance of schools’ activity across local, regional, and national averages.
After several iterations of my experimental design (and hundreds of internally-facing visualizations), I found what I was looking for: a list of outlier schools that we would ultimately call “Calm’s Most Popular Colleges”.
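One simple way to surface outlier schools like this is to score each one by how far it sits from the overall average, measured in standard deviations (a z-score). This is a sketch of the general idea rather than the exact procedure I followed; the school names, the rates, and the threshold of 2 are all invented for illustration.

```python
from statistics import mean, stdev

# Made-up active-user rates (active users per 1,000 enrolled students).
rates = {
    "School A": 12.0, "School B": 14.0, "School C": 13.0, "School D": 15.0,
    "School E": 11.0, "School F": 13.0, "School G": 14.0, "School H": 40.0,
}


def high_outliers(values_by_name, z_threshold=2.0):
    """Return names whose value is more than z_threshold std devs above the mean."""
    values = list(values_by_name.values())
    mu = mean(values)
    sigma = stdev(values)  # sample standard deviation
    z_scores = {name: (v - mu) / sigma for name, v in values_by_name.items()}
    return sorted(name for name, z in z_scores.items() if z > z_threshold)


popular = high_outliers(rates)
print(popular)
```

With only a handful of schools, a single extreme value inflates the standard deviation itself, so for small samples it's worth eyeballing the raw numbers alongside any threshold.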
Step Five: In-Depth Analysis
The fifth step of the Data Science process is where you create a model to explain or predict your findings.
This is where most people lose the forest for the trees, as they enter into the land of shiny algorithms and fancy mathematics. Creating models is by far the most over-glorified part of Data Science, which is why most degree programs solely focus on this single step.
But before jumping in to a particular solution, it’s important to pause and return to the bigger picture by asking yourself: “what am I really trying to do and why does it matter?”.
From here, you’ll:
apply your knowledge of algorithms’ contextual pros/cons to choose one approach best-suited for the situation
carry forward statistically significant variables (from the exploratory phase) using what Data Scientists call “feature engineering”
use a machine learning library like scikit-learn for implementation.
The overall goal is to use training data to build a model that generalizes to new (unseen) test data. So while building, it’s important that you’re keenly aware of (and capable of recognizing) overfitting and underfitting.
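To see underfitting and overfitting side by side, here is a toy sketch using only the standard library. It splits synthetic data into training and test sets, then compares a model that ignores the input entirely (always predicts the training mean) against one that simply memorizes the training set (returns the nearest training point's value). The data, the split sizes, and both "models" are assumptions chosen to make the contrast obvious.

```python
import random

rng = random.Random(42)

# Synthetic data: y = 2x + noise, with distinct x values.
points = [(float(x), 2.0 * x + rng.gauss(0, 3)) for x in range(40)]
rng.shuffle(points)
train, test = points[:30], points[30:]


def mse(predict, data):
    """Mean squared error of a prediction function over (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


# Underfit: ignores x and always predicts the average training y.
train_mean = sum(y for _, y in train) / len(train)


def predict_mean(x):
    return train_mean


# Overfit: memorizes the training set and returns the nearest neighbor's y.
def predict_memorized(x):
    _, nearest_y = min(train, key=lambda p: abs(p[0] - x))
    return nearest_y


for name, model in [("mean", predict_mean), ("memorized", predict_memorized)]:
    print(name, "train:", round(mse(model, train), 2), "test:", round(mse(model, test), 2))
```

The memorizing model scores a perfect zero error on the data it has already seen, and that perfection does not carry over to the held-out points. The mean model is poor on both sets. A gap between training and test error is the thing to watch for while building.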
NOTE: I’d recommend starting by watching just one or two videos on a simple model type like logistic regression or decision trees, and then immediately applying what you’ve learned on a dataset you care about.
With Calm College, the model I was building was more “explanatory” than “predictive”.
That is, I was simply trying to identify the universities most suitable for a partnership and understand what factors about a school were contributing to that.
So what I ultimately built was a simple linear regression model (in Excel, no less) that used features like active user count, student enrollment, & university endowment to explain a university’s user activity over time.
Sure, building a predictive model would’ve been the “cool” thing to do, but the goal wasn’t to predict sales leads for the future; it was to establish partnerships with universities NOW.
Lesson learned: the job of a Data Scientist is NOT to build a fancy model; it’s to do whatever it takes to solve a real-world human problem.
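For a sense of what an "explanatory" regression looks like outside of Excel, here is a minimal sketch in plain Python. It is not the Calm model: the schools and numbers are invented, and it uses a single feature (enrollment) for brevity where the model described above used several. The fitted slope says how activity scales with enrollment, and the residuals show which schools are more active than their size alone would suggest.

```python
# Hypothetical data: (school, enrollment in thousands, monthly sessions in thousands).
SCHOOLS = [
    ("School A", 1.0, 2.1),
    ("School B", 2.0, 3.9),
    ("School C", 3.0, 6.2),
    ("School D", 4.0, 7.8),
    ("School E", 5.0, 10.0),
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    s_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    s_xx = sum((x - x_mean) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_mean - slope * x_mean
    return slope, intercept


xs = [enrollment for _, enrollment, _ in SCHOOLS]
ys = [sessions for _, _, sessions in SCHOOLS]
slope, intercept = fit_line(xs, ys)

# R^2: share of the variation in activity that enrollment explains.
y_mean = sum(ys) / len(ys)
ss_total = sum((y - y_mean) ** 2 for y in ys)
ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
r_squared = 1 - ss_residual / ss_total

# Residuals: activity above (+) or below (-) what enrollment alone would suggest.
residuals = {
    name: sessions - (intercept + slope * enrollment)
    for name, enrollment, sessions in SCHOOLS
}
most_above_expectation = max(residuals, key=residuals.get)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, R^2={r_squared:.3f}")
print("Most active relative to size:", most_above_expectation)
```

Nothing here predicts the future. The output is simply a ranked explanation of the present, which is all a partnership shortlist needs.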
Step Six: Communicate Results
The sixth step of the Data Science Process is where you bring it all together and communicate results.
This is where you practice the most underrated skill in the Data Science toolbox, the X-factor that separates the good Data Scientists from the great ones: data storytelling.
Speaking with experts, I heard it time and time again: your worth as a Data Scientist will be ultimately determined by your ability to convert insights into a clear and actionable story.
In other words, the ability to create and present simple, effective data visualizations to a non-technical audience is the most sought-after skill in business today.
Finally, to create beautiful data visualizations, I’d recommend going beyond Python’s basic matplotlib library and checking out seaborn (statistical) and bokeh (interactive).
With Calm College, we had to weave our findings on student activity into an actionable story for campus administrators.
First, I used our list of “Calm’s Most Popular Colleges” to generate sales leads, by reaching out to 50 schools that the model identified as most suitable for a partnership.
Then, for each of the 50 schools, I crafted a personalized story about their students’ activity on the Calm app.
For example, with Harvard, we reached out to the head of campus wellness to let her know that Harvard’s campus was a top 5 most popular college for the Calm app. Then we included 4 graphs depicting the following insights:
6% of the Cambridge, Massachusetts population (17,000+ people) are Calm users.
More than 82% of Harvard users are active on a monthly basis, with an average of 15 (fifteen!) sessions/month!
Week-by-week retention amongst Harvard users is 3x that of the average Calm user.
Yet, despite all of this, Harvard students’ subscription rate is still well below average.
The first 3 graphs told a story of extraordinary interest in the Calm app on Harvard’s campus. But what really drove home our program was the last point:
“despite all this amazing interest, it’s clear that your students cannot afford Calm’s $60/year subscription. That’s why you need Calm College: to make the Calm app a FREE wellness resource for your students.”
Rather than sell our product, we were selling their students’ past and present use of our product. And it worked like a charm.
Repeating this approach for other colleges, we were able to get our foot in the door at many of the most elite institutions in the country.
And eventually, thanks to this application of The Data Science Process, we were able to launch the program at 8 schools this Fall:
Selection: The Core 20%
“You are not flailing through a rainforest of information with a machete; you are a sniper with a single bull’s-eye in the cross-hairs.” — Tim Ferriss, The Four Hour Chef
The greatest mistake you can make in accelerated learning is trying to master everything. This is not Pokémon. You are not going to catch ’em all.
Instead, the key is being relentlessly focused with the micro-skills you choose to develop. Through rigorous application of the 80/20 rule, it’s possible to cut down a long list of possibilities to the highest frequency material. Then, once you’ve cleared your plate, it’s depth over breadth all the way.
In his book “The Four Hour Chef”, Tim Ferriss discusses this selection process by introducing the idea of a “Minimum Effective Dose” (MED). Simply put, an MED is the smallest dose that will produce a desired outcome.
Here, I’ve broken down the MED for all 6 steps of The Data Science Process:
In conversations with experts, these 8 skills continuously came up as the most essential.
In particular, Data Wrangling (e.g. with Python’s pandas) was said to be the #1 skill (in terms of time spent doing) by every Data Scientist I spoke with. Data cleaning is not sexy, but it makes up as much as 80% of the job.
You may be wondering where big data tools like Hadoop & Spark, or modeling techniques like neural networks & deep learning fall into all this. The answer: surely outside the core 20%.
To my surprise, many Data Scientists I spoke with emphasized that only a small percentage of companies have data that even requires something as complex as a neural network!
Instead, an overwhelming majority of employers need simpler services like data cleaning, exploratory analysis, and logistic regression models (as recently reflected in an industry-wide survey by Kaggle).
When choosing what to learn, remember: you can always revisit the heavier topics later, but don’t weigh yourself down at the start. The goal is to accelerate learning. So wait until your house of expertise has a strong foundation before adding the shiny stuff.
If you’re looking to master the fundamentals of Data Science in 6 months or less, you’ll want to simply focus on the core 20%.
“Live as if you were to die tomorrow. Learn as if you were to live forever.” — Mahatma Gandhi
I do not believe knowledge is useful for its own sake; it is useful only if you use what you’ve learned to improve your life, or the lives of others. So I would encourage you to pause, reflect, & ask yourself: “What’s the smallest possible action I can take right now with what I’ve learned?”
For instance, a great place to start would be picking one of the six steps you’re most interested in and exploring the skills/resources associated with it. Then find a dataset that’s of interest to you and start learning by doing through a mini-side-project.
The key is trusting yourself by following the path that you’re instinctively most drawn to… because that’s where you’ll find the most short-term motivation & long-term fulfillment.
Personally, after deconstructing data science and identifying the core 20%, I decided to enroll in Springboard’s Data Science Intensive online bootcamp (recently renamed to “Intermediate Data Science”). I chose this program because it was the only curriculum I could find that covered all 6 steps of the data science process while focusing in on all 8 skills of the core 20%.
Whatever you choose to do with this information, the important thing is that you do something. Getting started is always the hardest part, so I challenge you to turn intention into action.
Over the past few weeks, the power of the internet has sure become apparent. In just the first 7 days, my first post — Learning Without Limits — had 3000+ views from 66 countries around the world. Never did I expect it to spread so far and wide, but I guess I have all of you to thank for that.
So as long as you all continue to pay it forward, I’ll continue to be an open book. As promised, I’ve compiled and will continue to open-source all my favorite resources, insights, and findings via this new page: ajgoldstein.com/resources.
All I ask of you is that you share this with people you think would benefit. That’s my call-to-action. Share. Why? Because we’re all in this together and true happiness comes from other people.
To follow along on this journey, feel free to drop your email in the sign-up bar below. By signing up, you’ll receive one (just one) email when I’ve posted a new update.
And don’t hesitate to leave any questions, thoughts, or feedback you have in the comments box below. I’d love to hear from you.