Evaluating Machine Learning Models – A Beginner’s Guide

This is an overview of evaluating machine learning models: when to evaluate, which metrics to use, and on what data. Topics include evaluation metrics, validation, hyperparameter tuning, A/B testing, and multi-armed bandits. It's a summary of my short report on the topic: http://oreil.ly/1LkP2tn.

1. Evaluating Machine Learning Models – A Beginner’s Guide
   Alice Zheng, Dato
   September 15, 2015
2. My machine learning trajectory
   • Applied machine learning (data science)
   • Build ML tools
   • Shortage of experts and good tools.
3. Why machine learning?
   • Model data.
   • Make predictions.
   • Build intelligent applications.
4. Machine learning pipeline
   “I fell in love the instant I laid my eyes on that puppy. His big eyes and playful tail, his soft furry paws, …”
   Raw data → Features → Models → Predictions → Deploy in production
   (Tooling shown: GraphLab Create, Dato Distributed, Dato Predictive Services)
5. The ML Jargon Challenge
6. Typical machine learning paper
   “… semi-supervised model for large-scale learning from sparse data … sub-modular optimization for distributed computation … evaluated on real and synthetic datasets … performance exceeds state-of-the-art methods”
7. What it looks like to ML researchers
   Such regularize! Much optimal. So sparsity. Wow! Amaze. Very scale.
8. What it looks like to normal people
9. What it’s like in practice
   • Doesn’t scale
   • Brittle
   • Hard to tune
   • Doesn’t solve my problem on my data
10. Achieve Machine Learning Zen
11. Why is evaluation important?
    • So you know when you’ve succeeded
    • So you know how much you’ve succeeded
    • So you can decide when to stop
    • So you can decide when to update the model
12. Basic questions for evaluation
    • When to evaluate?
    • What metric to use?
    • On what data?
13. When to evaluate
    (Diagram: a prototype model is trained and validated offline on historical data, yielding training results and validation results; the deployed model is evaluated online on live data, yielding online evaluation results; offline evaluation on live data sits in between.)
14. Evaluation Metrics
15. Types of evaluation metric
    • Training metric
    • Validation metric
    • Tracking metric
    • Business metric
    “But they may not match!” (Uh-oh Penguin)
16. Example: recommender system
    • Given data on which users liked which items, recommend other items to users
    • Training metric
      - How well is it predicting the preference score?
      - Root mean squared error (RMSE): the square root of the mean of (actual − predicted)²
    • Validation metric
      - Does it rank known preferences correctly?
      - Ranking loss
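
Slide 16’s training metric is simple to compute directly. A minimal sketch, assuming NumPy arrays of actual and predicted preference scores (the rating values are made up for illustration):

```python
import numpy as np

def rmse(actual, predicted):
    """Root mean squared error of predicted preference scores."""
    return np.sqrt(np.mean((actual - predicted) ** 2))

# Hypothetical ratings: what users actually gave vs. what the model predicted.
actual = np.array([5.0, 3.0, 4.0, 1.0])
predicted = np.array([4.5, 2.5, 4.0, 2.0])
print(rmse(actual, predicted))  # ~0.61
```
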
17. Example: recommender system
    • Tracking metric
      - Does it rank items correctly, especially for top items?
      - Normalized Discounted Cumulative Gain (NDCG)
    • Business metric
      - Does it increase the amount of time the user spends on the site/service?
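
Slide 17’s tracking metric, NDCG, rewards putting highly relevant items near the top of the ranked list. A minimal sketch, assuming graded relevance scores are available for the ranked items (function names are mine):

```python
import numpy as np

def dcg(relevances):
    """Discounted cumulative gain: relevance discounted by log2 of rank position."""
    relevances = np.asarray(relevances, dtype=float)
    discounts = np.log2(np.arange(2, relevances.size + 2))  # ranks 1..n
    return np.sum(relevances / discounts)

def ndcg(relevances):
    """Normalize by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Relevance of items in the order the recommender ranked them:
print(ndcg([3, 2, 0, 1]))  # ~0.99: the top items are ranked mostly correctly
```
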
18. Dealing with metrics
    • Many possible metrics at different stages
    • Defining the right metric is an art
      - What’s useful? What’s feasible?
    • Aligning the metrics will make everyone happier
      - Not always possible: you cannot directly train a model to optimize for user engagement
    “Do the best you can!” (Okedokey Donkey)
19. Model Selection and Tuning
20. Model Selection and Tuning
    (Diagram: historical data is split into training data and validation data; model training on the training data produces a model and training results; validation results feed hyperparameter tuning, which steers further model training.)
21. Key questions for model selection
    • What’s validation?
    • What’s a hyperparameter and how do you tune it?
22. Model validation
    • Measure generalization error
      - How well the model works on new data
      - “New” data = data not used during training
    • Train on one dataset, validate on another
    • Where to find “new” data for validation?
      - Clever re-use of old data
23. Methods for simulating new data
    • Hold-out validation: split the data once into a training set and a validation set
    • K-fold cross validation: split the data into K folds; each fold takes a turn as the validation set
    • Bootstrap resampling: draw a resampled dataset from the data
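
A minimal sketch of the first two methods on slide 23, using scikit-learn’s splitting utilities with a placeholder dataset and model (any estimator would do):

```python
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.random((200, 5))      # placeholder features
y = rng.integers(0, 2, 200)   # placeholder labels

# Hold-out validation: one split into training and validation data.
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)
print("hold-out accuracy:", model.score(X_val, y_val))

# K-fold cross validation: each of K folds takes a turn as the validation set.
scores = []
for train_idx, val_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    fold_model = LogisticRegression().fit(X[train_idx], y[train_idx])
    scores.append(fold_model.score(X[val_idx], y[val_idx]))
print("5-fold mean accuracy:", np.mean(scores))
```
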
24. Hyperparameter tuning vs. model training
    (Diagram: model training outputs the best model parameters; hyperparameter tuning wraps around model training and outputs the best hyperparameters.)
25. Hyperparameters != model parameters
    (Diagram: classification between two classes on a Feature 1 vs. Feature 2 plot; the decision boundary is set by model parameters, while “how many features to use” is a hyperparameter.)
26. Why is hyperparameter tuning hard?
    • Involves model training as a sub-process
      - Can’t optimize directly
    • Methods:
      - Grid search
      - Random search
      - Smart search
        • Gaussian processes / Bayesian optimization
        • Random forests
        • Derivative-free optimization
        • Genetic algorithms
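
Grid search and random search from slide 26 are easy to sketch. The scoring function below is a stand-in for the expensive “train the model, measure validation error” sub-process the slide warns about (names and the toy score are mine):

```python
import itertools, random

def validation_score(params):
    """Stand-in for: train a model with these hyperparameters and score it on
    validation data. Each call to the real thing is a full training run."""
    return -(params["max_depth"] - 4) ** 2 - 10 * (params["learning_rate"] - 0.1) ** 2

grid = {"max_depth": [2, 4, 8], "learning_rate": [0.01, 0.1, 0.3]}

# Grid search: one (simulated) training run per point on the grid.
candidates = [dict(zip(grid, values)) for values in itertools.product(*grid.values())]
print("grid search best:", max(candidates, key=validation_score))

# Random search: sample points instead of enumerating them; often competitive
# with far fewer training runs when only a few hyperparameters matter.
random.seed(0)
sampled = [{k: random.choice(v) for k, v in grid.items()} for _ in range(5)]
print("random search best:", max(sampled, key=validation_score))
```
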
27. Online Evaluations
28. ML in production - 101
    (Diagram: batch training turns historical data into a model; the model serves real-time predictions on live data, and the predictions generate feedback.)
29. ML in production - 101
    (Same diagram, with a second model, Model 2, added alongside the first to serve real-time predictions on live data.)
30. Why evaluate models online?
    • Track real performance of the model over time
    • Decide which model to use when
31. Choosing between ML models
    Strategy 1 (A/B testing): select the best model and use it all the time.
    • Group A: Model 1, 2000 visits, 10% CTR
    • Group B: Model 2, 2000 visits, 30% CTR
    • Result: everybody gets Model 2
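
Slide 31 doesn’t prescribe a statistical test, but a two-proportion z-test is a common way to check that the CTR gap is not noise. A sketch using the slide’s numbers:

```python
from math import sqrt

# From slide 31: 2000 visits per group, 10% vs. 30% CTR.
n_a, clicks_a = 2000, 200   # Group A (Model 1)
n_b, clicks_b = 2000, 600   # Group B (Model 2)

p_a, p_b = clicks_a / n_a, clicks_b / n_b
p_pool = (clicks_a + clicks_b) / (n_a + n_b)           # pooled CTR under the null
se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
z = (p_b - p_a) / se
print(f"lift: {p_b - p_a:.0%}, z = {z:.1f}")  # z ~ 15.8, far above 1.96: significant
```
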
32. Choosing between ML models
    A statistician walks into a casino… Multi-armed bandits:
    • Pay-off $1:$1000 machine: play this 85% of the time
    • Pay-off $1:$200 machine: play this 10% of the time
    • Pay-off $1:$500 machine: play this 5% of the time
33. Choosing between ML models
    A statistician walks into an ML production environment…
    • Model 1, pay-off $1:$1000: use this 85% of the time (exploitation)
    • Model 2, pay-off $1:$200: use this 10% of the time (exploration)
    • Model 3, pay-off $1:$500: use this 5% of the time (exploration)
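
The slides don’t name a specific bandit strategy; epsilon-greedy is about the simplest one and shows the exploit/explore split from slide 33. A minimal sketch with simulated click-through rates (all names and numbers are mine):

```python
import random

class EpsilonGreedyBandit:
    """Epsilon-greedy multi-armed bandit: one arm per candidate model."""

    def __init__(self, n_arms, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * n_arms      # times each model was served
        self.rewards = [0.0] * n_arms   # cumulative reward (e.g., clicks)

    def choose(self):
        if random.random() < self.epsilon:             # exploration
            return random.randrange(len(self.counts))
        means = [r / c if c else float("inf")          # unserved arms go first
                 for r, c in zip(self.rewards, self.counts)]
        return max(range(len(means)), key=means.__getitem__)  # exploitation

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.rewards[arm] += reward

random.seed(0)
bandit = EpsilonGreedyBandit(n_arms=3)
true_ctr = [0.10, 0.30, 0.20]            # simulated per-model click-through rates
for _ in range(10_000):
    arm = bandit.choose()                # which model serves this request
    bandit.update(arm, random.random() < true_ctr[arm])
print(bandit.counts)                     # traffic concentrates on the best model
```
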
34. MAB vs. A/B testing
    Why MAB?
    • Continuous optimization, “set and forget”
    • Maximize overall reward
    Why A/B test?
    • Simple to understand
    • Single winner
    • Tricky to do right
35. That’s not all, folks!
    Read the details:
    • Blog posts: http://blog.dato.com/topic/machine-learning-primer
    • Report: http://oreil.ly/1L7dS4a
    Dato is hiring! jobs@dato.com
    alicez@dato.com | @RainyData