THE OTHER 99% OF A DATA SCIENCE PROJECT
Open Data Science Conference
Santa Clara | November 4-6, 2016
Eugene Mandel
@eugmandel
∎ @eugmandel
∎ lead of data science at Directly
∎ formerly:
□ data science team at Jawbone
□ co-founder of Qualaroo and jaxtr
ABOUT ME
DATA SCIENCE NEEDS PRODUCT MANAGEMENT
The success of a data science project has as much to do with product management as with data science.
2 KINDS OF DATA SCIENCE
B
ANALYZE
A
BUILD
PAY FOR PARKING WITH YOUR PHONE
DON’T YOU KNOW ME?!
∎ “don’t you know me?!” -> “you get me!”
∎ get smarter with every interaction
∎ reduce search space
SMART PRODUCTS
SMART PRODUCTS
BUT NOT THAT SMART...
SMART PRODUCTS GO PROBABILISTIC
THE OTHER 99%
algorithms
PARKING APP
ON-DEMAND CUSTOMER SUPPORT
LOOKING FOR OPPORTUNITIES
PROBLEM: choose support tickets that expert users can resolve
LOOKING FOR OPPORTUNITIES
CHOOSE RESOLVABLE TICKETS WITH MACHINE LEARNING
GETTING THE DATA
GETTING ALLIES
GETTING THE DATA
CLEAN YOUR DATA
Automated bug reports
Surveys
Bounced emails
Internal tickets
Email metadata
Email threads
...
GUYS, CLEAN A DATASET, GET RICH
FEATURE ENGINEERING
TRAINING - COLD START PROBLEM
all tickets
tickets seen by expert
TRAINING - GET LABELS
“Is there a cat in this picture?” “Is this support ticket resolvable?”
TRAINING - GET LABELS
∎ label manually
∎ derive labels from user behavior
∎ derive labels from external sources
∎ mix
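For the ticket problem, the label sources above can be sketched roughly as follows. This is a minimal sketch, not the method used in the talk: the ticket fields (`seen_by_expert`, `resolved_by_expert`, `rating`) are hypothetical, and the rule "resolved and not rated below 3 counts as resolvable" is an arbitrary assumption. It derives labels from user behavior, leaves tickets that no expert has seen unlabeled (the cold start gap), and lets manual labels override derived ones (the "mix").

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Ticket:
    # Hypothetical fields; a real help-desk export will look different.
    ticket_id: str
    seen_by_expert: bool
    resolved_by_expert: bool
    rating: Optional[int]  # assumed 1-5 customer rating, None if unrated


def derive_label(t: Ticket) -> Optional[int]:
    """Label from user behavior: 1 = resolvable, 0 = not, None = unknown."""
    if not t.seen_by_expert:
        return None  # cold start: no expert saw it, so behavior says nothing
    # Assumed success rule: resolved and not rated below 3.
    if t.resolved_by_expert and (t.rating is None or t.rating >= 3):
        return 1
    return 0


def mixed_labels(tickets: List[Ticket], manual: Dict[str, int]) -> Dict[str, int]:
    """Combine derived and manual labels; manual labels take priority."""
    labels = {}
    for t in tickets:
        label = manual[t.ticket_id] if t.ticket_id in manual else derive_label(t)
        if label is not None:
            labels[t.ticket_id] = label
    return labels


tickets = [
    Ticket("a", True, True, 5),
    Ticket("b", True, False, None),
    Ticket("c", False, False, None),  # never seen by an expert
]
print(mixed_labels(tickets, manual={"c": 1}))  # {'a': 1, 'b': 0, 'c': 1}
```

Without the manual label, ticket "c" would simply be dropped from the training set, which is one way the cold start problem shows up in practice.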
My favorite data science
algorithm is division.
Monica Rogati
Former VP of Data, Jawbone & LinkedIn data scientist
Tokenization
Bag of words (BOW)
Tf–idf
Random Forest Classifier
MODEL DEVELOPMENT
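The feature side of the pipeline listed above (tokenization, bag of words, tf-idf) can be illustrated with a standard-library-only sketch. It assumes the plainest variant, `tf = count` and `idf = log(N / df)`; libraries differ (scikit-learn's `TfidfVectorizer`, for instance, smooths idf by default), so the weights will not match theirs exactly. The Random Forest step is not shown: in practice it would typically come from a library such as scikit-learn's `RandomForestClassifier`, trained on these vectors and the labels, rather than be hand-written.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    # Deliberately simple: lowercase, then keep runs of letters, digits, apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def bag_of_words(tokens: List[str]) -> Counter:
    # Bag of words: term -> count, word order discarded.
    return Counter(tokens)


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """One {term: weight} dict per document, with tf = count, idf = log(N / df)."""
    bows = [bag_of_words(tokenize(d)) for d in docs]
    n_docs = len(bows)
    df = Counter()
    for bow in bows:
        df.update(bow.keys())  # count each term once per document
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in bow.items()}
        for bow in bows
    ]


tickets = ["Can't reset my password", "Password reset link broken", "Refund my order"]
for weights in tf_idf(tickets):
    # Highest-weighted terms are the ones that distinguish this ticket from the others.
    print(sorted(weights.items(), key=lambda kv: -kv[1])[:2])
```

A term that appears in every document gets weight 0, which is the point of the idf factor: words common to all tickets carry no signal for telling them apart.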
PLAYING WELL WITH ENGINEERING
∎ gaining trust
∎ development process
POINTS OF INTEGRATION
online or offline?
DEVELOPMENT
integration - broad APIs
“NAPKIN ARCHITECTURE”
IS IT WORKING? Evaluating data products
Image source: https://themouseandthewindmill.wordpress.com
accuracy
precision/recall
driven by business
EVALUATION METRICS
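To show why the metric should be chosen by the business rather than by default, here is a small sketch that computes accuracy, precision and recall from binary labels (1 = resolvable ticket). The example data is made up: when few tickets are resolvable, a model that never routes anything still scores high accuracy while its precision and recall are zero.

```python
from typing import Dict, List, Tuple


def confusion_counts(y_true: List[int], y_pred: List[int]) -> Tuple[int, int, int, int]:
    # Returns (true positives, false positives, false negatives, true negatives).
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        # Precision: of tickets sent to experts, how many were resolvable.
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        # Recall: of resolvable tickets, how many were sent to experts.
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# 1 resolvable ticket out of 10, and a model that predicts "not resolvable" for all.
print(evaluate([1] + [0] * 9, [0] * 10))
# {'accuracy': 0.9, 'precision': 0.0, 'recall': 0.0}
```

Which of precision or recall matters more depends on the cost of each mistake: routing an unresolvable ticket to an expert may hurt the customer experience, while missing a resolvable one mostly costs an opportunity.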
IS IT WORKING? QA’ing data products
PLAYING WELL WITH DEVOPS
BRIDGING TECH STACKS IN PRODUCTION
THE KNOBS: HOW TO CONTROL THE PRODUCT
∎ on/off switch per customer
∎ prediction threshold
∎ exclusions
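The three knobs above can be pictured as a small routing check that runs before any ticket is sent to expert users. This is a sketch under assumptions: the names `Knobs` and `route_to_experts`, the 0.8 default threshold, and the idea of excluding by ticket category are all hypothetical, chosen only to illustrate the slide.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class Knobs:
    # Hypothetical names for the three controls on the slide.
    enabled_customers: Set[str] = field(default_factory=set)    # on/off switch per customer
    threshold: float = 0.8                                        # prediction threshold (arbitrary default)
    excluded_categories: Set[str] = field(default_factory=set)  # exclusions


def route_to_experts(customer: str, category: str, score: float, knobs: Knobs) -> bool:
    """Return True if a ticket with model score `score` should go to expert users."""
    if customer not in knobs.enabled_customers:
        return False  # product switched off for this customer
    if category in knobs.excluded_categories:
        return False  # never route excluded ticket types, whatever the score
    return score >= knobs.threshold


knobs = Knobs(enabled_customers={"acme"}, threshold=0.8, excluded_categories={"billing"})
print(route_to_experts("acme", "login", 0.93, knobs))    # True
print(route_to_experts("acme", "billing", 0.99, knobs))  # False
```

Keeping these checks outside the model means they can be changed per customer without retraining or redeploying it.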
“... SMART…”
“... AI …”
“...MACHINE LEARNING…”
“...INTELLIGENT…”
NAMING THINGS
UPDATING THE MODEL
∎ input data changes
∎ user behavior changes
∎ dataset grows
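One way to turn the triggers above into a check is a simple retraining rule. This is a hypothetical sketch, not a method from the talk: it covers "dataset grows" directly and uses a shift in the share of tickets labeled resolvable as a crude proxy for "user behavior changes". Both thresholds are illustrative assumptions, and detecting changes in the input data itself would need separate checks (for example on feature distributions or schemas).

```python
def should_retrain(
    n_rows_at_training: int,
    n_rows_now: int,
    train_positive_rate: float,
    recent_positive_rate: float,
    growth_trigger: float = 0.2,   # illustrative: retrain after 20% more data
    drift_trigger: float = 0.05,   # illustrative: 5-point shift in positive rate
) -> bool:
    """Hypothetical rule: retrain if the dataset grew enough or the label rate moved."""
    grew = (n_rows_now - n_rows_at_training) / n_rows_at_training >= growth_trigger
    drifted = abs(recent_positive_rate - train_positive_rate) >= drift_trigger
    return grew or drifted


print(should_retrain(10_000, 12_500, 0.30, 0.31))  # True  (dataset grew 25%)
print(should_retrain(10_000, 10_500, 0.30, 0.38))  # True  (positive rate shifted)
print(should_retrain(10_000, 10_500, 0.30, 0.31))  # False
```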
NEGATIVE SAMPLING
send a small % of predicted negatives as if they were predicted positive
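Negative sampling matters because, once the model is live, experts only ever see tickets it predicted positive, so new behavioral labels only arrive for that slice. A minimal sketch of the routing decision, assuming a hypothetical function name and a 2% sample rate chosen only for illustration: each decision also records whether the ticket was an exploration sample, so those outcomes can be analyzed separately.

```python
import random
from typing import Tuple


def route_with_negative_sampling(
    score: float, threshold: float, sample_rate: float, rng: random.Random
) -> Tuple[bool, bool]:
    """Return (send_to_experts, is_exploration_sample)."""
    if score >= threshold:
        return True, False        # ordinary predicted positive
    if rng.random() < sample_rate:
        return True, True         # predicted negative sent anyway, to observe its outcome
    return False, False           # predicted negative, handled as usual


rng = random.Random(42)  # fixed seed so the run is reproducible
decisions = [route_with_negative_sampling(0.1, 0.5, 0.02, rng) for _ in range(10_000)]
print(sum(sent for sent, _ in decisions), "of 10000 predicted negatives sent to experts")
```

The trade-off is that every sampled negative is a ticket the model expected experts to fail on, so the rate should stay small enough that the extra bad experiences are acceptable.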
NEGATIVE LABELING
send a small % of predicted negatives for manual labeling
predicted positive
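Manually labeling a sample of predicted negatives lets you estimate what the model is missing, which is otherwise invisible. A minimal sketch, assuming the sample is drawn uniformly at random from predicted negatives (the estimate is biased otherwise); the function names and numbers are hypothetical. The true-positive count would come from predicted positives that experts actually resolved.

```python
from typing import List, Tuple


def estimate_missed(sample_labels: List[int], n_predicted_negative: int) -> Tuple[float, float]:
    """Estimate how many predicted negatives were actually resolvable.

    sample_labels: manual labels (1 = resolvable, 0 = not) for a uniformly
    random sample of predicted-negative tickets.
    """
    if not sample_labels:
        raise ValueError("need at least one manually labeled ticket")
    miss_rate = sum(sample_labels) / len(sample_labels)
    return miss_rate, miss_rate * n_predicted_negative


def estimated_recall(true_positives: int, estimated_missed: float) -> float:
    # recall = TP / (TP + FN), with FN replaced by its estimate
    return true_positives / (true_positives + estimated_missed)


rate, missed = estimate_missed([1, 0, 0, 0, 0, 0, 0, 0, 0, 0], n_predicted_negative=5_000)
print(rate, missed)                     # 0.1 500.0
print(estimated_recall(1_500, missed))  # 0.75
```

With only ten labeled tickets the estimate is very noisy; a larger sample narrows it, at the cost of more manual labeling.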
∎ “Would you be able to resolve this ticket successfully?”
∎ “Would an expert user be able to resolve this ticket successfully?”
∎ “Would an expert user be able to resolve this ticket successfully without getting a negative rating?”
LABELING - HOW TO PHRASE THE QUESTION?
∎ customers
∎ sales
∎ account managers
∎ marketing
∎ execs
MESSAGING
CUSTOMER ENGAGEMENT PLAYBOOK
DATA ETHICS
INTERPRETABILITY
Image source: https://en.wikipedia.org/wiki/File:Blue_Poles_(Jackson_Pollock_painting).jpg
THANKS!
Eugene Mandel
@eugmandel
∎ Presentation template by SlidesCarnival
∎ Images:
□ http://jedismedicine.blogspot.com/
□ Jawbone
□ Directly
□ Wikipedia
□ https://themouseandthewindmill.wordpress.com
□ http://www.imdb.com/
CREDITS

Climate Science Flows: Enabling Petabyte-Scale Climate Analysis with the Eart...
 
Understanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSageUnderstanding Globus Data Transfers with NetSage
Understanding Globus Data Transfers with NetSage
 
GlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote sessionGlobusWorld 2024 Opening Keynote session
GlobusWorld 2024 Opening Keynote session
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, BetterWebinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
Webinar: Salesforce Document Management 2.0 - Smarter, Faster, Better
 
Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024Explore Modern SharePoint Templates for 2024
Explore Modern SharePoint Templates for 2024
 
Vitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume MontevideoVitthal Shirke Microservices Resume Montevideo
Vitthal Shirke Microservices Resume Montevideo
 
Large Language Models and the End of Programming
Large Language Models and the End of ProgrammingLarge Language Models and the End of Programming
Large Language Models and the End of Programming
 
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
In 2015, I used to write extensions for Joomla, WordPress, phpBB3, etc and I ...
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 

The Other 99% of a Data Science Project

Editor's Notes

  1. There are two big areas of data science: A for “analyze” and B for “build”. A is product development informed by data, and by now it has been widely adopted. Having analytics, running A/B tests, and doing cohort and funnel analysis have become part of product management culture. The “build” kind of data science is about building smarts into the product itself, and this is the kind I want to talk about. Implementing some of it requires machine learning, and product managers need to understand how complex some of the techniques that apply to their products are. However, when machine learning is discussed, too much emphasis is put on the algorithms. More needs to be said about how a smart product gains people’s trust and makes them feel good about using it.
  2. Take an app that lets you pay for parking. You fire it up and it shows 3 choices, including starting a new parking session and seeing your old sessions. You choose “Start a new session” and go to the next screen, where you select a parking zone from several options. Done. On a desktop, I would not give this a second thought.
  3. But when I use this app, I’m late. I’m holding the phone in one hand and trying to pay for parking while running to the ferry. I am running, fumbling with the phone, and thinking: DON’T YOU KNOW ME?! It’s a weekday morning, I am at the parking lot next to the ferry terminal, and you have seen me here before. More than once. Just give me one button, PAY NOW, and a small link to all the other features.
  4. Every time a user has this “DON’T YOU KNOW ME?!” moment, it is an opportunity to make a product a little bit smarter. Smart products convert “DON’T YOU KNOW ME?!” into “YOU GET ME!” Even when they don’t know my next step exactly, they reduce the search space.
  5. Smarter products bring new problems. The complexity goes well beyond the algorithms.
  6. Take the Nest smart thermostat: great visual design, easy to install, and powered by machine learning that learns your preferences. It’s a good product, but even Nest can’t get it quite right. We got it just when our baby was born. We both like the house pretty cool, but after the birth my wife felt cold, and that was exactly when Nest was learning. Once it had learned, for some reason it was very hard for it to adjust. Another thing: when the heater turns on, there is no indication of whether a person in the house or the software turned it on. I am OK with correcting Nest, but my wife is not. Making products smarter introduces probabilistic behavior, and because probabilistic behavior feels a bit like life, you start having different expectations. Northern California has some very hot days with cold mornings. On a day like that I would not turn the heater on in the morning, but Nest would. It just knows “get to 68 degrees”; it has no context. Something that is easy and intuitive for a human is not easy for software.
  7. Getting the relationship between the user and a smart product right is tricky. Product managers are the best people in a company to get the tradeoffs right. Just as a PM does not have to be a developer to manage a software product, she does not have to be a mathematician or a data scientist to manage a data product. But she does need to understand some core concepts. I’ll use 2 data products to demonstrate some of them.
  8. Here is the second data product. This one is B2B and works in the background. Directly helps companies like Airbnb, LinkedIn, and Pinterest with on-demand customer support. When a user submits a support ticket, some tickets are sent to Directly, which distributes them to a network of expert users who are ready to answer them. If experts resolve a question successfully, they get paid and Directly takes a cut. Otherwise, the experts can reroute the ticket back to the customer’s call center.
  9. When questions are created in the helpdesk, how do we find the ones that expert users can (and want to) solve? Initially, we relied on our customers to configure categories that their users chose when filling out the support form, but users are not great at categorizing their issues. We tried keywords, which were very cumbersome to manage. We need to pick as many tickets as we can without creating too much noise for the experts.
  11. Solution: let us look at ALL your tickets as they come in, and a machine learning model will choose which ones are sent to the expert users. Here is how it works: … (explain the image). The model is a classifier, and it needs examples to learn what a good ticket looks like. It can learn them by watching how the experts responded to tickets they saw earlier. If the experts took a ticket and resolved it successfully, it becomes a positive example. If they sent the question back, or resolved it but the user reviewed their answer negatively, the question becomes a negative example.
  12. ML startups ask companies to “give us all your data.” I was preparing for a tough conversation about getting access to more and better data… “Is it a yes?” Think about getting data early, before you need it. Legal: strip out anything personal. Insist on storing the data.
  13. Customer success (account managers): interested. One of the main metrics they are responsible for is our ticket share, the percentage of a customer’s tickets that we handle.
  15. The improvements you can get from cleaning your data are great. The plot of the movie The Big Short can be summarized as “guys clean a dataset, get rich”. In the case of Jawbone meal logging, the biggest lift in performance came from realizing that breakfasts are different from other meals: spinach in the morning was probably part of an omelette, while spinach at lunch was most likely a salad. Sometimes cleaning your data requires a good understanding of the domain you are working in. Which properties of your data you do and don’t use is, to a significant degree, a product management decision. For example, different cuisines disagree on which foods are best eaten together. Do you use this knowledge somehow? It depends on what you know about your users.
  16. Monica Rogati, who used to be VP of Data at Jawbone, has this saying: ... Yes, you could use a much more advanced algorithm, but this simple one can get you pretty far. The biggest improvements were achieved by cleaning the data and understanding it deeply.
  17. Account managers - interested
  18. How do we know if a model is good? When “normal software” breaks, it breaks with high visibility. An issue with ML is that it will ALWAYS give you an answer. How do we compare models? An obvious metric is accuracy: the percentage of predictions the algorithm gets right. However, in product data science this can be a very bad metric, because how much it tells you depends on how balanced or unbalanced the classes you are predicting are. Examples: fraud detection, rare-disease testing. If 0.1% of transactions are fraudulent, you can create a “very sophisticated” predictive model that, when asked “Is this transaction fraudulent?”, always says “no”. The accuracy of this model will be about 99.9%. Thinking this through is exactly the PM’s job, and it does not require knowing the math that underlies the predictive model. How do we QA data products?
  20. How do we QA data products? When “normal software” breaks, it breaks with high visibility. An issue with ML is that it will ALWAYS give you an answer. Monitoring in production.
  21. Unless you are making the ultimate data product, a make-money-while-you-sleep fund runner :), your system lives in the world and interacts with people. Once the product is out, other people carry the message, and you cannot control it. Listen to how an account manager talks about it with a client and how a salesperson talks about it with a prospect. ML/DS is uniquely susceptible to BS; how do you control that?
  22. "Why did you show me ‘french fries’?" Well, because this is the item most frequently logged together with a burger. "Why did you decide that this transaction is fraudulent? Why did you decide that this customer support ticket is resolvable?" The simpler the model, the more interpretable it is. When a model is not easily interpreted but performs well, it’s your job to manage expectations.
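Note 4’s idea of reducing the search space for the parking app can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any real parking app works. It assumes a hypothetical log of past sessions stored as (weekday, hour, zone) tuples. It proposes a one-tap default zone only when the same context (weekday vs. weekend, part of day) has produced the same zone more than once, and otherwise falls back to the full menu. The context buckets and the `min_count` threshold are our own choices.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

# Hypothetical session record: (weekday with Monday = 0, hour 0-23, zone id)
Session = Tuple[int, int, str]


def context_key(weekday: int, hour: int) -> Tuple[bool, str]:
    """Coarse context: is it a weekday, and which part of the day is it?"""
    if 5 <= hour < 12:
        part = "morning"
    elif 12 <= hour < 18:
        part = "afternoon"
    else:
        part = "other"
    return weekday < 5, part


def suggest_zone(history: Iterable[Session], weekday: int, hour: int,
                 min_count: int = 2) -> Optional[str]:
    """Most frequent zone in the current context, or None to fall back to the full menu."""
    key = context_key(weekday, hour)
    counts = Counter(zone for d, h, zone in history if context_key(d, h) == key)
    if not counts:
        return None
    zone, n = counts.most_common(1)[0]
    # Offer a one-tap "PAY NOW" default only once the pattern has repeated.
    return zone if n >= min_count else None


history = [(0, 8, "ferry-lot"), (2, 7, "ferry-lot"), (3, 8, "ferry-lot"), (5, 14, "downtown")]
print(suggest_zone(history, weekday=4, hour=8))   # ferry-lot
print(suggest_zone(history, weekday=6, hour=20))  # None
```

Returning `None` instead of guessing keeps the product from getting overconfident. The smart default appears only after it has “seen me here before. More than once.”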
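The tradeoff in note 9 (pick as many tickets as possible without flooding the experts with noise) can be framed as choosing a score cutoff for the classifier. The sketch below is one way to make that choice, not the method Directly actually used. It assumes hypothetical validation data: model scores plus whether experts resolved each ticket. It returns the lowest cutoff whose selected tickets still meet a precision floor that the PM picks.

```python
from typing import Optional, Sequence


def pick_threshold(scores: Sequence[float], resolvable: Sequence[bool],
                   min_precision: float) -> Optional[float]:
    """Lowest score cutoff whose selected tickets still meet the precision floor.

    A lower cutoff sends more tickets to experts (more volume); the precision
    floor limits how many unresolvable tickets (noise) they have to wade through.
    """
    best = None
    for cutoff in sorted(set(scores), reverse=True):
        selected = [r for s, r in zip(scores, resolvable) if s >= cutoff]
        precision = sum(selected) / len(selected)
        if precision >= min_precision:
            # Cutoffs are visited from high to low, so the last one kept
            # is the one that selects the most tickets.
            best = cutoff
    return best


# Hypothetical validation data: model scores and whether experts resolved each ticket
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.4]
resolvable = [True, True, True, False, True, False]
print(pick_threshold(scores, resolvable, min_precision=0.75))  # 0.6
print(pick_threshold(scores, resolvable, min_precision=0.9))   # 0.8
```

The precision floor is the product decision here. Raising it from 0.75 to 0.9 cuts the selected tickets from five to three.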
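The labeling rule in note 11 can be written down as a small function. This is a hedged sketch: the `Outcome` values and the `user_rating_negative` flag are hypothetical stand-ins for whatever the real ticket history records. We also assume that “resolved with no negative review” counts as resolved successfully. Tickets that were never shown to an expert get no label, which is the cold-start gap between “all tickets” and “tickets seen by expert”.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Outcome(Enum):
    NOT_SEEN = "not_seen"    # never shown to an expert: no label can be derived
    REROUTED = "rerouted"    # expert sent it back to the customer's call center
    RESOLVED = "resolved"    # expert answered it


@dataclass
class TicketHistory:
    ticket_id: str
    outcome: Outcome
    user_rating_negative: bool = False  # hypothetical flag from the user's review


def derive_label(t: TicketHistory) -> Optional[int]:
    """1 = resolvable (positive), 0 = not resolvable (negative), None = unlabeled."""
    if t.outcome is Outcome.NOT_SEEN:
        return None
    if t.outcome is Outcome.REROUTED:
        return 0
    # Resolved, but the user reviewed the answer negatively -> negative example.
    return 0 if t.user_rating_negative else 1


tickets = [
    TicketHistory("a", Outcome.RESOLVED),
    TicketHistory("b", Outcome.RESOLVED, user_rating_negative=True),
    TicketHistory("c", Outcome.REROUTED),
    TicketHistory("d", Outcome.NOT_SEEN),
]
print([derive_label(t) for t in tickets])  # [1, 0, 0, None]
```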
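Notes 15 and 16 (breakfasts differ from other meals; “my favorite data science algorithm is division”) fit together in one small example. The sketch below is our own illustration, not Jawbone’s code. It assumes a hypothetical meal log of (meal type, foods) pairs and estimates how often each food appears alongside spinach. It does this by dividing co-occurrence counts by spinach’s own count, separately for each meal type.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical log entry: (meal type, set of foods logged together in that meal)
Meal = Tuple[str, Set[str]]


def companions(meals: Iterable[Meal], food: str, meal_type: str) -> Dict[str, float]:
    """For meals of `meal_type` that contain `food`, the share that also contain each other food."""
    with_food = 0
    together = Counter()
    for kind, foods in meals:
        if kind == meal_type and food in foods:
            with_food += 1
            together.update(foods - {food})
    if with_food == 0:
        return {}
    # The whole "algorithm": divide co-occurrence counts by the food's own count.
    return {other: n / with_food for other, n in together.items()}


meals = [
    ("breakfast", {"spinach", "eggs"}),
    ("breakfast", {"spinach", "eggs", "toast"}),
    ("lunch", {"spinach", "tomato"}),
    ("lunch", {"spinach", "tomato", "feta"}),
    ("lunch", {"spinach", "chicken"}),
]
for meal_type in ("breakfast", "lunch"):
    shares = companions(meals, "spinach", meal_type)
    print(meal_type, max(shares, key=shares.get))
# breakfast eggs
# lunch tomato
```

Splitting by meal type is the “data understanding” step. Without it, the breakfast omelettes and the lunch salads would blur into one spinach profile.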
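Note 18’s fraud example can be checked directly. This sketch builds a hypothetical set of 100,000 transactions, 0.1% of them fraudulent, and scores the model that always answers “no”. Precision is reported as 0 when the model flags nothing. That is a convention we chose, because precision is otherwise undefined in that case.

```python
from typing import Sequence, Tuple


def accuracy_precision_recall(y_true: Sequence[bool],
                              y_pred: Sequence[bool]) -> Tuple[float, float, float]:
    """Accuracy, precision, and recall for binary labels (True = fraudulent)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t and p for t, p in pairs)          # fraud correctly flagged
    fp = sum((not t) and p for t, p in pairs)    # legitimate transaction flagged
    fn = sum(t and (not p) for t, p in pairs)    # fraud missed
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0  # convention: 0 when nothing is flagged
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


# 100,000 hypothetical transactions, 0.1% (100) of them fraudulent
y_true = [i < 100 for i in range(100_000)]
always_no = [False] * len(y_true)  # the "very sophisticated" model
print(accuracy_precision_recall(y_true, always_no))  # (0.999, 0.0, 0.0)
```

The 99.9% accuracy hides a recall of zero: the model never catches a single fraudulent transaction.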
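Because an ML model “will ALWAYS give you an answer” (note 20), a silent failure in production rarely raises an error. It usually shows up as a shift in what the model outputs. The sketch below illustrates one simple monitoring check, under the assumption that we track the daily share of incoming tickets the model sends to experts. It flags a day that falls outside the recent mean ± k standard deviations. The choice of signal and the value of `k` are assumptions; a real setup would also watch downstream outcomes such as resolution rates.

```python
from statistics import mean, stdev
from typing import Sequence


def selection_rate_alert(recent_rates: Sequence[float], today: float, k: float = 3.0) -> bool:
    """Flag today's share of selected tickets if it falls outside mean ± k * stdev of recent days."""
    if len(recent_rates) < 2:
        return False  # stdev needs at least two data points
    mu, sigma = mean(recent_rates), stdev(recent_rates)
    return abs(today - mu) > k * sigma


# Hypothetical daily share of incoming tickets the model sent to experts
recent = [0.30, 0.32, 0.31, 0.29, 0.30, 0.31, 0.30]
print(selection_rate_alert(recent, today=0.31))  # False
print(selection_rate_alert(recent, today=0.05))  # True
```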
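Note 22’s point that simpler models are easier to explain can be illustrated with a linear bag-of-words scorer. The deck’s model slide mentions tf–idf and a random forest. This sketch deliberately swaps in a simpler linear model, because each word’s contribution can be read off directly as an answer to “why did you decide this ticket is resolvable?”. The `WEIGHTS` and `BIAS` values are made up for illustration, not learned from real tickets.

```python
import math
import re
from typing import Dict, List, Tuple

# Hypothetical weights; positive words push a ticket toward "resolvable".
WEIGHTS: Dict[str, float] = {"password": 1.2, "reset": 0.9, "how": 0.4,
                             "refund": -1.5, "charged": -1.1, "account": 0.2}
BIAS = -0.3


def explain(ticket: str, top_n: int = 3) -> Tuple[float, List[Tuple[str, float]]]:
    """Return P(resolvable) and the words that contributed most to the score."""
    tokens = re.findall(r"[a-z']+", ticket.lower())
    contributions: Dict[str, float] = {}
    for tok in tokens:
        if tok in WEIGHTS:
            contributions[tok] = contributions.get(tok, 0.0) + WEIGHTS[tok]
    score = BIAS + sum(contributions.values())
    prob = 1 / (1 + math.exp(-score))  # logistic link turns the score into a probability
    top = sorted(contributions.items(), key=lambda kv: abs(kv[1]), reverse=True)[:top_n]
    return prob, top


prob, reasons = explain("How do I reset my password?")
print(round(prob, 2), reasons)  # 0.9 [('password', 1.2), ('reset', 0.9), ('how', 0.4)]
```

A random forest would probably rank tickets better. It cannot answer “why?” this directly, though, which is exactly when expectations need to be managed.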