Evaluating Machine
Learning Models
– A Beginner’s Guide
Alice Zheng, Dato
September 15, 2015
1
2
My machine learning trajectory
Applied machine learning
(Data science)
Build ML tools
Shortage of experts
and good tools.
3
Why machine learning?
Model data.
Make predictions.
Build intelligent
applications.
4
Machine learning pipeline
I fell in love the instant I laid
my eyes on that puppy. His
big eyes and playful tail, his
soft furry paws, …
Raw data
Features
Models
Predictions
Deploy in
production
GraphLab Create
Dato Distributed
Dato Predictive Services
The ML Jargon Challenge
Typical machine learning paper
6
… semi-supervised model for large-scale
learning from sparse data … sub-modular
optimization for distributed computation…
evaluated on real and synthetic datasets…
performance exceeds state-of-the-art methods
What it looks like to ML researchers
7
Such regularize!
Much optimal
So sparsity
Wow!
Amaze
Very scale
What it looks like to normal people
8
What it’s like in practice
9
Doesn’t scale
Brittle
?
Hard to tune
?
?
Doesn’t solve my problem on my data
Achieve Machine Learning Zen
10
11
Why is evaluation important?
• So you know when you’ve succeeded
• So you know how much you’ve succeeded
• So you can decide when to stop
• So you can decide when to update the model
12
Basic questions for evaluation
• When to evaluate?
• What metric to use?
• On what data?
13
When to evaluate
Online
evaluation
Historical
data
Online
evaluation
results
Offline
evaluation
Live
data
Offline
evaluation
on live data
Prototype
model
Training
results
Validation
results
Deployed model
Evaluation Metrics
15
Types of evaluation metric
• Training metric
• Validation metric
• Tracking metric
• Business metric
“But they may
not match!”
Uh-oh Penguin
16
Example: recommender system
• Given data on which users liked which items, recommend
other items to users
• Training metric
- How well is it predicting the preference score?
- Residual mean squared error: (actual – predicted)^2
• Validation metric
- Does it rank known preferences correctly?
- Ranking loss
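A minimal sketch of the training metric above, assuming preference scores are plain numeric ratings. The function names and the sample ratings are illustrative, not from the talk. It averages the squared residuals (actual – predicted)^2 and also reports the square root, which is the form usually quoted as RMSE.

```python
import math

def mean_squared_error(actual, predicted):
    """Mean of the squared residuals (actual - predicted)^2."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sum(squared) / len(squared)

def rmse(actual, predicted):
    """Square root of the mean squared error, in the same units as the ratings."""
    return math.sqrt(mean_squared_error(actual, predicted))

# Hypothetical ratings for four (user, item) pairs.
actual_ratings = [5.0, 3.0, 4.0, 1.0]
predicted_ratings = [4.0, 3.0, 5.0, 3.0]
print(mean_squared_error(actual_ratings, predicted_ratings))
print(rmse(actual_ratings, predicted_ratings))
```

A lower value means the predicted scores sit closer to the observed ones. It says nothing directly about whether items are ranked in the right order, which is why the validation metric differs.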
17
Example: recommender system
• Tracking metric
- Does it rank items correctly, especially for top items?
- Normalized Discounted Cumulative Gain (NDCG)
• Business metric
- Does it increase the amount of time the user spends on the
site/service?
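A minimal sketch of NDCG, assuming one common formulation (linear gain, log2 position discount). Other variants exist, and the talk does not say which one is meant. The relevance lists are made-up examples, ordered the way the recommender ranked the items.

```python
import math

def dcg(relevances, k):
    """Discounted cumulative gain of the top-k items, in ranked order."""
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0

# Hypothetical relevance of items in the order the model ranked them.
good_ranking = [3, 2, 1, 0]
poor_ranking = [0, 1, 2, 3]
print(ndcg(good_ranking, k=3))
print(ndcg(poor_ranking, k=3))
```

The discount makes mistakes near the top of the list cost more than mistakes further down, which matches the "especially for top items" emphasis of the tracking metric.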
18
Dealing with metrics
• Many possible metrics at different
stages
• Defining the right metric is an art
- What’s useful? What’s feasible?
• Aligning the metrics will make
everyone happier
- Not always possible: cannot directly
train model to optimize for user
engagement
“Do the
best you
can!”
Okedokey Donkey
Model Selection and Tuning
Model Selection and Tuning
Historical
data
Hyperparameter
tuning
Training
data
Validation
data
Model
training
Model
Training
results
Validation
results
21
Key questions for model selection
• What’s validation?
• What’s a hyperparameter and how do you tune it?
22
Model validation
• Measure generalization error
- How well the model works on new data
- “New” data = data not used during training
• Train on one dataset, validate on another
• Where to find “new” data for validation?
- Clever re-use of old data
23
Methods for simulating new data
Hold-out validation
Data
Training Validation
K-fold cross validation
Data
1 2 3 … K
Validation set
Bootstrap resampling
Data
Resampled dataset
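The three methods above can be sketched as index-splitting helpers. This is an illustrative sketch using only the standard library. The function names, the 20% validation fraction and the fixed seeds are assumptions, not part of the talk.

```python
import random

def holdout_split(n, validation_fraction=0.2, seed=0):
    """Hold-out: shuffle once, set aside a fixed fraction for validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = int(n * validation_fraction)
    return indices[n_val:], indices[:n_val]  # (training, validation)

def kfold_splits(n, k=5, seed=0):
    """K-fold: each fold serves as the validation set exactly once."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]

def bootstrap_sample(n, seed=0):
    """Bootstrap: draw n indices with replacement; unseen rows are 'out of bag'."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    chosen = set(sample)
    out_of_bag = [i for i in range(n) if i not in chosen]
    return sample, out_of_bag

train, validation = holdout_split(10)
print(len(train), len(validation))
for train, validation in kfold_splits(10, k=5):
    print(sorted(validation))
sample, out_of_bag = bootstrap_sample(10)
print(sorted(sample), out_of_bag)
```

In each case the model is trained only on the training indices and scored on the rest, so the score approximates performance on data it has not seen.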
24
Hyperparameter tuning vs. model training
Best model
parameters
Best
hyperparameters
Hyperparameter
tuning
Model
training
25
Hyperparameters != model parameters
Feature 2
Feature 1
Classification between two classes
Model parameter
Hyperparameter:
How many features to use
26
Why is hyperparameter tuning hard?
• Involves model training as a sub-process
- Can’t optimize directly
• Methods:
- Grid search
- Random search
- Smart search
  • Gaussian processes/Bayesian optimization
  • Random forests
  • Derivative-free optimization
  • Genetic algorithms
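A minimal sketch of the two simplest methods listed above, grid search and random search. The `validation_error` function is a hypothetical stand-in: in practice it would train a model with the given hyperparameters and return its error on validation data, which is the expensive sub-process the slide refers to. The hyperparameter names and ranges are made up for illustration.

```python
import itertools
import random

def validation_error(learning_rate, regularization):
    # Hypothetical stand-in for "train with these settings, then score on
    # validation data". A toy bowl with its minimum at (0.1, 1.0).
    return (learning_rate - 0.1) ** 2 + (regularization - 1.0) ** 2

def grid_search(grid):
    """Try every combination of the listed values; keep the lowest error."""
    names = list(grid)
    best = None
    for values in itertools.product(*(grid[name] for name in names)):
        candidate = dict(zip(names, values))
        error = validation_error(**candidate)
        if best is None or error < best[0]:
            best = (error, candidate)
    return best

def random_search(ranges, n_trials=50, seed=0):
    """Sample each hyperparameter uniformly from its range; keep the lowest error."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = {name: rng.uniform(low, high)
                     for name, (low, high) in ranges.items()}
        error = validation_error(**candidate)
        if best is None or error < best[0]:
            best = (error, candidate)
    return best

print(grid_search({"learning_rate": [0.01, 0.1, 1.0],
                   "regularization": [0.1, 1.0, 10.0]}))
print(random_search({"learning_rate": (0.0, 1.0),
                     "regularization": (0.0, 10.0)}))
```

Both loops treat training as a black box, so their cost is the number of trials times the cost of one training run. The "smart search" methods try to spend those trials more carefully.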
Online Evaluations
28
ML in production - 101
Model
Historical
Data
Predictions
Live
Data
Feedback
Batch training Real-time predictions
29
ML in production - 101
Model
Historical
Data
Batch training Real-time predictions
Predictions
Model 2
Live
Data
30
Why evaluate models online?
• Track real performance of model over time
• Decide which model to use when
31
Choosing between ML models
Model 2
Model 1
2000 visits
10% CTR
Group A
Everybody gets
Model 2
2000 visits
30% CTR
Group B
Strategy 1: A/B testing—select the best model and use it all the time
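One way to check whether the gap between the two groups is larger than chance would explain is a two-proportion z-test. The slide gives only visits and CTR, so the click counts below (200 and 600) are derived from those figures, and the choice of test is an assumption rather than something the talk prescribes.

```python
import math

def two_proportion_z_test(clicks_a, visits_a, clicks_b, visits_b):
    """Two-sided z-test for a difference in click-through rate."""
    rate_a = clicks_a / visits_a
    rate_b = clicks_b / visits_b
    # Pooled rate under the null hypothesis that both models perform equally.
    pooled = (clicks_a + clicks_b) / (visits_a + visits_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / visits_a + 1 / visits_b))
    z = (rate_b - rate_a) / std_err
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value

# Group A: Model 1, 2000 visits at 10% CTR. Group B: Model 2, 2000 visits at 30% CTR.
z, p_value = two_proportion_z_test(200, 2000, 600, 2000)
print(f"z = {z:.2f}, p = {p_value:.3g}")
```

This sketch assumes visits are independent and that the sample size was fixed before the test started. Checking the result repeatedly and stopping early invalidates the p-value.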
32
Choosing between ML models
A statistician walks into a casino…
Pay-off $1:$1000 Pay-off $1:$200 Pay-off $1:$500
Play this 85% of the
time
Play this 10% of the
time
Play this 5% of the
time
Multi-armed
bandits
33
Choosing between ML models
A statistician walks into an ML production environment
Pay-off $1:$1000 Pay-off $1:$200 Pay-off $1:$500
Use this 85% of the
time
(Exploitation)
Use this 10% of the
time
(Exploration)
Use this 5% of the
time
(Exploration)
Model 1 Model 2 Model 3
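The exploration/exploitation split above can be sketched with epsilon-greedy, one of the simplest bandit strategies. The talk does not name a specific algorithm, so this choice is an assumption, as are the click-through rates and the 15% exploration rate. Note that epsilon-greedy explores uniformly, so it only approximates the 85/10/5 split on the slide.

```python
import random

def epsilon_greedy(true_rates, n_rounds=10000, epsilon=0.15, seed=0):
    """Simulate serving models whose (unknown) click rates are true_rates."""
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms      # how often each model was served
    rewards = [0.0] * n_arms  # total clicks observed per model
    for _ in range(n_rounds):
        if rng.random() < epsilon or 0 in pulls:
            arm = rng.randrange(n_arms)  # exploration: pick a model at random
        else:
            # exploitation: pick the model with the best observed click rate
            arm = max(range(n_arms), key=lambda i: rewards[i] / pulls[i])
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward
    return pulls, rewards

# Hypothetical click-through rates for Model 1, Model 2, Model 3.
pulls, rewards = epsilon_greedy([0.10, 0.30, 0.05])
print(pulls)
print(sum(rewards))
```

Traffic shifts toward the best-performing model while the test is still running, which is the "maximize overall reward" property. The cost is that no single fixed-sample comparison is produced at the end.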
34
MAB vs. A/B testing
Why MAB?
• Continuous optimization, “set and forget”
• Maximize overall reward
Why A/B test?
• Simple to understand
• Single winner
• Tricky to do right
35
That’s not all, folks!
Read the details
• Blog posts: http://blog.dato.com/topic/machine-learning-primer
• Report: http://oreil.ly/1L7dS4a
• Dato is hiring! jobs@dato.com
alicez@dato.com @RainyData

Editor's Notes

  1. Features sit between raw data and model. They can make or break an application.