A Hitchhiker’s Guide to Data Science
Sudeep Das
Senior Machine Learning Researcher
@datamusing
My Journey
● Ph.D. Astrophysics: Cosmic Microwave Background, Gravitational Lensing
● Beats Music: Core Recommendation Systems Group
What do I do?
The Grand Innovation Workflow

Identify Problem
● Understand what is important to the business
● Deep data dives
● Visualizations
● Communicate to stakeholders
● Sometimes top-down, sometimes ground-up idea generation

Prepare Data
● Slice/dice/massage data
● Work with data teams to ensure data integrity
● Make sure the data tables/feeds you need are stood up
● Offline/online data integrity

Build Models
● Prototype features
● Modeling extremes: from out-of-the-box logistic regression and GBMs to adapting an emergent idea from a recent paper!
● Set up an offline training pipeline
● Monitor offline metrics

Implement in Production
● Integrate your models with the production systems (code review, load testing)
● Hook up with the testing platform

Test Hypotheses
● Design the experiment/hypothesis/cell structure
● Read results of experiments to determine significance
● Slice and dice the online data to determine if your test affected the intended audience
● If results are flat, rinse and repeat!
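The "read results of experiments to determine significance" step can be illustrated, for a simple binary metric such as conversion, with a textbook two-proportion z-test. This is a minimal sketch under stated assumptions: two cells, a yes/no outcome per member, and samples large enough for the normal approximation. The function name and the numbers are hypothetical, and a production testing platform may use more elaborate methods.

```python
import math
from statistics import NormalDist


def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference in conversion rate between cells A and B.

    Hypothetical helper for illustration. Returns (rate_b - rate_a, z, p_value).
    """
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    # Under the null hypothesis both cells share one rate, so pool them.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return rate_b - rate_a, z, p_value


# Hypothetical test: 10.0% vs 11.0% conversion with 10,000 members per cell.
diff, z, p = two_proportion_ztest(1000, 10_000, 1100, 10_000)
```

With these made-up numbers the difference clears a 0.05 threshold; a near-identical treatment rate (for example 1005 conversions) gives a large p-value, which is the "flat" outcome that sends you back to rinse and repeat.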
In some companies, this is a data scientist
In some other companies, this is a data scientist
Yet in some other companies, this is a data scientist
At Netflix, this is broadly what I do
Tools of the trade
SQL, Spark (Scala), PySpark, Python/pandas, Hive, AWS S3
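To make the slice/dice step concrete with the first tool on this list, here is a minimal SQL sketch that breaks a conversion metric down by audience segment and test cell. It uses Python's built-in `sqlite3` module as a stand-in for Hive or Spark SQL; the `events` table, its columns, and the rows are all hypothetical.

```python
import sqlite3


def conversion_by_segment(rows):
    """Return {(segment, cell): conversion rate} using a SQL GROUP BY.

    rows: iterable of (member_id, segment, cell, converted) tuples
    (a hypothetical schema chosen for illustration).
    """
    conn = sqlite3.connect(":memory:")  # in-memory stand-in for a warehouse table
    conn.execute(
        "CREATE TABLE events (member_id INTEGER, segment TEXT, cell TEXT, converted INTEGER)"
    )
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT segment, cell, AVG(converted) AS rate
        FROM events
        GROUP BY segment, cell
        ORDER BY segment, cell
    """
    result = {(segment, cell): rate for segment, cell, rate in conn.execute(query)}
    conn.close()
    return result
```

The same `GROUP BY segment, cell` pattern is how you would check whether a test moved the metric for the intended audience rather than only in aggregate.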
Matplotlib, Tableau, Vega, Plotly, custom JavaScript (D3)
Hive, S3, APIs in Flask/Django/Java
Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin ...
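For the workflow's "monitor offline metrics" step, one commonly tracked metric for a binary classifier is ROC AUC (the metric choice here is an assumption; the talk does not name one). In practice you would call scikit-learn's `sklearn.metrics.roc_auc_score`; the pure-Python sketch below only makes the definition explicit: the probability that a randomly chosen positive example is scored above a randomly chosen negative one.

```python
def roc_auc(labels, scores):
    """ROC AUC as P(score of random positive > score of random negative).

    Ties count as half a win. This pairwise version is O(P * N), which is
    fine for small offline sanity checks but not for large evaluation sets.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positives) * len(negatives))
```

A value of 0.5 means the model ranks no better than chance, and 1.0 means every positive outranks every negative.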
Docker, company-specific platforms
Java, Scala, in some cases Python, company specific
Types of Problems
● Personalization
● Search
● Object recognition
● Voice/speech recognition
● Pattern recognition
● Natural Language Processing
● Trend prediction
● Segmentation/clustering
● Dynamic Pricing
● Optimization
● Outlier Detection
At Netflix, we do a bit of everything
Emergent Trends
● Probabilistic Graphical Models - Bayes Nets
● Deep Learning
● Causal Inference
● (Deep) Reinforcement Learning
What academia prepares you for
● Perseverance
● Ability to pick up new technical skills
● Presentation skills
● Some quantitative visualization skills
● Ability to distil technical research in related areas and adapt it to the problem at hand
● If you are from a quantitative and experimental field:
○ Mathematical abilities
○ Knowledge of Basic Statistics - error analysis, experiment design
○ Some exposure to parameter estimation and Bayesian inference
○ Some ability to write code
○ Some exposure to general machine learning
● Learning from failure: Most A/B tests fail - so do experiments in academia
● Writing papers, technical blogs, etc.
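The basic statistics and experiment design listed above carry over directly to A/B testing. For example, a common reason for "flat" results is an underpowered test. Below is a sketch of the standard normal-approximation sample-size formula for comparing two proportions with a two-sided test; the function name, baseline rate, and lift are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_cell(p_control, p_treatment, alpha=0.05, power=0.8):
    """Approximate members needed per cell to detect p_control -> p_treatment.

    Normal approximation for a two-sided test of two proportions:
        n = (z_{1-alpha/2} + z_{power})^2 * (p1(1-p1) + p2(1-p2)) / (p1 - p2)^2
    """
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    n = (z_alpha + z_power) ** 2 * variance / (p_control - p_treatment) ** 2
    return math.ceil(n)  # round up, since a fraction of a member is not possible
```

With these assumptions, detecting a lift from 10% to 11% needs just under 15,000 members per cell, while a larger lift (10% to 12%) needs roughly a quarter as many.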
What academia doesn’t prepare you for
● Being a good listener
● Asking questions
● Understanding and articulating the business value of your technical pursuit
● Writing clean, maintainable code with documentation and unit tests
● Ability to collaborate across teams and cultures - cross-functionally
● Admitting that “Good enough” is better than perfect
● Coping with quick project timelines
● Documenting, sharing, getting early input on projects
● Dealing with live, large, and exceptionally dirty datasets.
● Understanding that research in industry is results-driven, not publication-driven.
● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed.
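Two items on this list, clean and tested code and exceptionally dirty data, often meet in the same small function. The sketch below shows what that can look like: a documented helper that drops the kinds of bad values live logs tend to contain, plus `unittest` cases pinning down that behavior. The field ("watch minutes") and the cleaning rules are hypothetical choices for illustration.

```python
import math
import unittest


def parse_watch_minutes(raw_values):
    """Convert raw watch-time strings into clean, non-negative floats.

    Dirty inputs are dropped rather than guessed at: None, blank strings,
    non-numeric text, NaN/inf, and negative values.

    Args:
        raw_values: iterable of str or None (a hypothetical log field).

    Returns:
        list of float minutes, in input order.
    """
    cleaned = []
    for value in raw_values:
        if value is None:
            continue
        text = value.strip()
        if not text:
            continue
        try:
            minutes = float(text)
        except ValueError:
            continue  # non-numeric text such as "n/a"
        if not math.isfinite(minutes) or minutes < 0:
            continue  # NaN, inf, or negative durations are treated as corrupt
        cleaned.append(minutes)
    return cleaned


class ParseWatchMinutesTest(unittest.TestCase):
    def test_keeps_valid_values(self):
        self.assertEqual(parse_watch_minutes(["12.5", " 3 "]), [12.5, 3.0])

    def test_drops_dirty_values(self):
        dirty = [None, "", "n/a", "-4", "nan", "inf"]
        self.assertEqual(parse_watch_minutes(dirty), [])
```

Saved as a module, the tests run with `python -m unittest <module_name>`.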
Marketing Yourself

Fill in your basic skills gaps
● Databases, SQL, Spark familiarity
● Data structures, algorithms/CS 101
● Get really strong in one language; I highly recommend Python (pandas, the scikit ecosystem)
● Good coding practices: documentation, modular code, unit tests

Amp up your ML knowledge
● Your friends: online courses and open datasets!
● Do mini projects on ML, esp. Deep Learning and Reinforcement Learning. Get creative!
● Get a rock-solid foundation in basic stats.
● Kaggle competitions

Create an online presence
● A GitHub repo so recruiters can look at your code
● Put your hobby projects online
● Write a blog post on something new you learned
● Follow/contribute to Stack Overflow

Landing the First Job!

Improve soft skills
● Identify weaknesses in your communication skills and work on them.
● Pick up speaking engagements at meetups, at your university, and at conferences such as PyData
● Do collaborative projects with people who are also transitioning

Interview prep
● Practise whiteboarding and collaborative coding on CoderPad
● Standard books like Cracking the Coding Interview, plus Glassdoor
● Go for some “dry run” interviews.
● Do background research on the company; be inquisitive, ask questions

Keep at it!
@datamusing

Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boodykojalkojal131
 
Biography of Sundar Pichai, the CEO Google
Biography of Sundar Pichai, the CEO GoogleBiography of Sundar Pichai, the CEO Google
Biography of Sundar Pichai, the CEO GoogleHafizMuhammadAbdulla5
 
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)Delhi Call girls
 
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 

Recently uploaded (20)

Internship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmkInternship Report].pdf iiwmoosmsosmshkssmk
Internship Report].pdf iiwmoosmsosmshkssmk
 
Dark Dubai Call Girls O525547819 Skin Call Girls Dubai
Dark Dubai Call Girls O525547819 Skin Call Girls DubaiDark Dubai Call Girls O525547819 Skin Call Girls Dubai
Dark Dubai Call Girls O525547819 Skin Call Girls Dubai
 
VIP Call Girls Service Film Nagar Hyderabad Call +91-8250192130
VIP Call Girls Service Film Nagar Hyderabad Call +91-8250192130VIP Call Girls Service Film Nagar Hyderabad Call +91-8250192130
VIP Call Girls Service Film Nagar Hyderabad Call +91-8250192130
 
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
Nandini Layout Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangal...
 
Experience Certificate - Marketing Analyst-Soham Mondal.pdf
Experience Certificate - Marketing Analyst-Soham Mondal.pdfExperience Certificate - Marketing Analyst-Soham Mondal.pdf
Experience Certificate - Marketing Analyst-Soham Mondal.pdf
 
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
Delhi Call Girls Preet Vihar 9711199171 ☎✔👌✔ Whatsapp Body to body massage wi...
 
Presentation on Workplace Politics.ppt..
Presentation on Workplace Politics.ppt..Presentation on Workplace Politics.ppt..
Presentation on Workplace Politics.ppt..
 
Personal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando NegronPersonal Brand Exploration - Fernando Negron
Personal Brand Exploration - Fernando Negron
 
VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In East Of Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
内布拉斯加大学林肯分校毕业证录取书( 退学 )学位证书硕士
内布拉斯加大学林肯分校毕业证录取书( 退学 )学位证书硕士内布拉斯加大学林肯分校毕业证录取书( 退学 )学位证书硕士
内布拉斯加大学林肯分校毕业证录取书( 退学 )学位证书硕士
 
Booking open Available Pune Call Girls Ambegaon Khurd 6297143586 Call Hot In...
Booking open Available Pune Call Girls Ambegaon Khurd  6297143586 Call Hot In...Booking open Available Pune Call Girls Ambegaon Khurd  6297143586 Call Hot In...
Booking open Available Pune Call Girls Ambegaon Khurd 6297143586 Call Hot In...
 
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Greater Noida 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Résumé (2 pager - 12 ft standard syntax)
Résumé (2 pager -  12 ft standard syntax)Résumé (2 pager -  12 ft standard syntax)
Résumé (2 pager - 12 ft standard syntax)
 
PM Job Search Council Info Session - PMI Silver Spring Chapter
PM Job Search Council Info Session - PMI Silver Spring ChapterPM Job Search Council Info Session - PMI Silver Spring Chapter
PM Job Search Council Info Session - PMI Silver Spring Chapter
 
Top Rated Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated  Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...Top Rated  Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
Top Rated Pune Call Girls Deccan ⟟ 6297143586 ⟟ Call Me For Genuine Sex Serv...
 
Delhi Call Girls Nehru Place 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Nehru Place 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Nehru Place 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Nehru Place 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big BoodyDubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
Dubai Call Girls Demons O525547819 Call Girls IN DUbai Natural Big Boody
 
Biography of Sundar Pichai, the CEO Google
Biography of Sundar Pichai, the CEO GoogleBiography of Sundar Pichai, the CEO Google
Biography of Sundar Pichai, the CEO Google
 
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
WhatsApp 📞 8448380779 ✅Call Girls In Salarpur Sector 81 ( Noida)
 
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Patparganj 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 

Academia to Data Science - A Hitchhiker's Guide

  • 1. A Hitchhiker’s Guide to Data Science. Sudeep Das, Senior Machine Learning Researcher (@datamusing)
  • 3. Ph.D., Astrophysics: Cosmic Microwave Background, Gravitational Lensing
  • 5. What do I do?
  • 6. The Grand Innovation Workflow: Identify Problem → Idea Generation → Prepare Data → Build Models → Implement in Production → Test Hypotheses
    ● Identify Problem / Idea Generation: understand what is important to the business; deep data dives; visualizations; communicate to stakeholders. Ideas come sometimes top-down, sometimes ground-up.
    ● Prepare Data: slice/dice/massage data; work with data teams to ensure data integrity; make sure the data tables/feeds you need are stood up; check offline/online data integrity.
    ● Build Models: prototype features; modeling extremes, from out-of-the-box logistic regression and GBMs to adapting an emergent idea from a recent paper; set up an offline training pipeline; monitor offline metrics.
    ● Implement in Production: design the experiment/hypothesis/cell structure; integrate your models with the production systems (code review, load testing); hook up with the testing platform.
    ● Test Hypotheses: read the results of experiments to determine significance; slice and dice the online data to determine whether your test affected the intended audience. If results are flat, rinse and repeat!
  • 7. [Same workflow as slide 6] In some companies, this is a data scientist.
  • 8. [Same workflow as slide 6] In some other companies, this is a data scientist.
  • 9. [Same workflow as slide 6] Yet in some other companies, this is a data scientist.
  • 10. [Same workflow as slide 6] At Netflix, this is broadly what I do.
  • 11. Tools of the trade
  • 12. [Same workflow as slide 6] Tools: SQL, Spark (Scala), PySpark, Python (pandas), Hive, AWS S3
  • 13. [Same workflow as slide 6] Tools: Matplotlib, Tableau, Vega, Plotly, custom JavaScript (D3)
  • 14. [Same workflow as slide 6] Tools: Hive, S3, APIs in Flask/Django/Java
  • 15. [Same workflow as slide 6] Tools: Python, scikit-learn, Jupyter notebooks, TensorFlow/Keras, XGBoost, SparkML/Scala, Zeppelin, ...
  • 16. [Same workflow as slide 6] Tools: Docker, company-specific platforms
  • 17. [Same workflow as slide 6] Tools: Java, Scala, in some cases Python, company-specific tooling
  • 19. At Netflix, we do a bit of everything:
    ● Personalization
    ● Search
    ● Object recognition
    ● Voice/speech recognition
    ● Pattern recognition
    ● Natural language processing
    ● Trend prediction
    ● Segmentation/clustering
    ● Dynamic pricing
    ● Optimization
    ● Outlier detection
  • 21.
    ● Probabilistic graphical models: Bayes nets
    ● Deep learning
    ● Causal inference
    ● (Deep) reinforcement learning
  • 23.
    ● Perseverance
    ● Ability to pick up new technical skills
    ● Presentation skills
    ● Some quantitative visualization skills
    ● Ability to distil technical research in related areas and adapt it to the problem at hand
    ● If you are from a quantitative and experimental field:
      ○ Mathematical abilities
      ○ Knowledge of basic statistics: error analysis, experiment design
      ○ Some exposure to parameter estimation and Bayesian inference
      ○ Some ability to write code
      ○ Some exposure to general machine learning
    ● Learning from failure: most A/B tests fail, and so do experiments in academia
    ● Writing papers, technical blogs, etc.
  • 24. What academia doesn’t prepare you for
  • 25.
    ● Being a good listener
    ● Asking questions
    ● Understanding and articulating the business value of your technical pursuit
    ● Writing clean, maintainable code with documentation and unit tests
    ● Collaborating cross-functionally, across teams and cultures
    ● Admitting that "good enough" is better than perfect
    ● Coping with quick project timelines
    ● Documenting, sharing, and getting early input on projects
    ● Dealing with live, large, and exceptionally dirty datasets
    ● Understanding that research in industry is results-driven, not publication-driven
    ● Stepping out of your focus area and seeing your problem in the bigger context of where your company is headed
  • 27. Landing the First Job!
    ● Fill in your basic skills gaps
      ○ Databases, SQL, Spark familiarity
      ○ Data structures, algorithms/CS 101
      ○ Get really strong in one language; Python (pandas, scikit ecosystem) is highly recommended
      ○ Good coding practices: documentation, modular code, unit tests
    ● Amp up your ML knowledge
      ○ Your friends: online courses and open datasets!
      ○ Do mini projects on ML, especially deep learning and reinforcement learning. Get creative!
      ○ Get a rock-solid foundation in basic stats
      ○ Kaggle competitions
    ● Create an online presence
      ○ A GitHub repo so recruiters can look at your code
      ○ Put your hobby projects online
      ○ Write a blog post on something new you learned
      ○ Follow/contribute to Stack Overflow
    ● Improve soft skills
      ○ Identify weaknesses in your communication skills and work on them
      ○ Pick up speaking engagements at meetups, at your university, and at conferences such as PyData
      ○ Do collaborative projects with people who are also transitioning
    ● Interview prep
      ○ Practise whiteboarding and collaborative coding on CoderPad
      ○ Standard books like Cracking the Coding Interview; Glassdoor
      ○ Go for some "dry run" interviews
      ○ Do background research on the company; be inquisitive, ask questions
    ● Keep at it!
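The "Test Hypotheses" stage of the workflow on slide 6 asks you to read experiment results for significance and to slice the online data to check whether the test reached the intended audience. The following is a minimal Python sketch of that step under stated assumptions; it is not the method used at Netflix or described in the talk. It assumes a binary conversion metric, two cells named "control" and "treatment", and a two-sided two-proportion z-test with a pooled standard error. The record layout and the function names `two_proportion_z_test` and `results_by_segment` are hypothetical.

```python
"""Sketch: read a two-cell A/B test and slice it by audience segment.

Assumptions (hypothetical, for illustration only): each record is a tuple
(cell, segment, converted) with cell in {"control", "treatment"} and
converted in {0, 1}; significance uses a two-sided two-proportion z-test
with the pooled proportion under the null hypothesis.
"""
import math
from collections import defaultdict


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for H0: rate_a == rate_b.

    conv_* are conversion counts and n_* are exposure counts per cell.
    """
    if n_a == 0 or n_b == 0:
        raise ValueError("both cells need at least one observation")
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Every user converted, or none did: the cells cannot differ.
        return 0.0, 1.0
    z = (rate_b - rate_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


def results_by_segment(records):
    """Group (cell, segment, converted) records and test each segment.

    Returns {segment: (z, p_value)}; segments missing a cell are skipped.
    Small segments are noisy, and testing many segments inflates false
    positives, so read these as diagnostics rather than verdicts.
    """
    # segment -> cell -> [conversions, exposures]
    counts = defaultdict(lambda: {"control": [0, 0], "treatment": [0, 0]})
    for cell, segment, converted in records:
        tally = counts[segment][cell]
        tally[0] += converted
        tally[1] += 1
    results = {}
    for segment, cells in counts.items():
        (c_a, n_a), (c_b, n_b) = cells["control"], cells["treatment"]
        if n_a and n_b:
            results[segment] = two_proportion_z_test(c_a, n_a, c_b, n_b)
    return results


if __name__ == "__main__":
    # Illustrative numbers only: 2.0% vs 2.6% conversion on 10,000 users each.
    z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
    print(f"overall: z={z:.2f}, p={p:.4f}")
```

Keeping the statistic in one small, documented function makes it easy to unit-test (slide 25) and to reuse for each audience slice. If the overall result is flat, the per-segment view can suggest where to look next, but a real analysis would also correct for multiple comparisons.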