2nd edition
#MLSEV 2
Automating Model Selection
Taking (most of) the work out of model tuning
Charles Parker
VP Algorithms, BigML, Inc
#MLSEV 3
Machine Learning for Machine Learning
#MLSEV 4
Parameter Optimization
• There are lots of algorithms and lots of parameters
• We don’t have time to try even close to everything
• If only we had a way to make a prediction . . .
Did I hear someone say
Machine Learning?
#MLSEV 5
The Allure of ML
“Why don’t we just use
machine learning to predict
the quality of a set of
modeling parameters before
we train a model on them?”
— Every first year ML grad student ever
#MLSEV 6
Bayesian Parameter Optimization
• The performance of an ML algorithm (with associated parameters) is data
dependent
• So: Learn from your previous attempts
• Train a model, then evaluate it
• After you’ve done a number of evaluations, learn a regression model to predict the
performance of future, as-yet-untrained models
• Use this model to choose a promising set of “next models” to evaluate
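As a rough sketch of that loop: the version below is a greedy surrogate search over two made-up parameters (n_estimators and max_depth), using a random-forest regressor as the surrogate. It is illustrative only, not BigML's implementation, and it skips the acquisition-function machinery a full Bayesian optimizer would use.

```python
# Minimal sketch of surrogate-driven parameter search (illustrative only)
import random
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

def evaluate(params):
    """Train and evaluate one candidate parameter set (the expensive step)."""
    model = RandomForestClassifier(random_state=0, **params)
    return cross_val_score(model, X, y, cv=3).mean()

def to_features(params):
    """Encode a parameter dict as a numeric feature vector for the surrogate."""
    return [params["n_estimators"],
            -1 if params["max_depth"] is None else params["max_depth"]]

random.seed(0)
history = []                                 # (params, observed score) pairs so far
for step in range(15):
    # Propose a pool of random candidate parameter sets
    candidates = [{"n_estimators": random.choice([10, 50, 100, 200]),
                   "max_depth": random.choice([3, 5, 10, None])}
                  for _ in range(50)]
    if len(history) < 5:
        chosen = random.choice(candidates)   # too little history: just explore
    else:
        # Surrogate regression model: parameters -> predicted performance
        surrogate = RandomForestRegressor(random_state=0).fit(
            [to_features(p) for p, _ in history], [s for _, s in history])
        preds = surrogate.predict([to_features(c) for c in candidates])
        chosen = candidates[preds.argmax()]  # evaluate the most promising candidate
    history.append((chosen, evaluate(chosen)))

best_params, best_score = max(history, key=lambda h: h[1])
print(best_params, round(best_score, 3))
```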
#MLSEV 7
Bayesian Parameter Optimization
[Diagram: six candidate parameter sets (Parameters 1–6) queued up for the “Model and Evaluate” step]
#MLSEV 8
Bayesian Parameter Optimization
[Diagram: the first parameter set has been modeled and evaluated, scoring 0.75; the rest are still queued]
#MLSEV 9
Bayesian Parameter Optimization
[Diagram: two parameter sets evaluated so far, scoring 0.75 and 0.56]
#MLSEV 10
Bayesian Parameter Optimization
[Diagram: three parameter sets evaluated so far, scoring 0.75, 0.56, and 0.92]
#MLSEV 11
Bayesian Parameter Optimization
[Diagram: same build as the previous slide, with three evaluations completed (0.75, 0.56, 0.92)]
#MLSEV 12
Bayesian Parameter Optimization
[Diagram: the three completed evaluations (0.75, 0.56, 0.92) feed a machine learning model that maps parameters ⟶ performance. “Machine Learning!”]
#MLSEV 13
Bayesian Parameter Optimization
[Diagram: same build as the previous slide, showing the evaluations 0.75, 0.56, 0.92 and the learned parameters ⟶ performance model]
#MLSEV 14
Wow, Magic!
• So all of my problems are solved, right?
NO NO NO
• First, you’re selecting a model based on
held-out data, so you have to have
enough data to do an accurate model
selection
• Second, there are still important
remaining issues and possible ways to
screw up
#MLSEV 15
Remaining Issue #1: Metric Choice
#MLSEV 16
Driving The Search
• So how do we measure the performance of
each model, to figure out what to do next?
• If we choose the wrong metric, we’ll get
models that are the best at something that we
don’t really care about
• But there are so many metrics! How do we
choose the right one?
• Hmmmm, all of this sounds awfully familiar . . .
#MLSEV 17
Flashback #1
Accuracy = (TP + TN) / Total
• “Percentage correct” - like an exam
• If Accuracy = 1 then no mistakes
• If Accuracy = 0 then all mistakes
• Intuitive but not always useful
• Watch out for unbalanced classes!
• Remember, only 1 in 1000 have the disease
• A silly model which always predicts “well” is 99.9% accurate
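That arithmetic, checked with a tiny sketch (the labels below are the made-up 1-in-1000 disease example from above):

```python
# Accuracy of a model that always predicts "well" on a 1-in-1000 disease dataset
from sklearn.metrics import accuracy_score

y_true = ["sick"] * 1 + ["well"] * 999   # 1 in 1000 actually has the disease
y_pred = ["well"] * 1000                 # the "silly" model never predicts "sick"

print(accuracy_score(y_true, y_pred))    # 0.999, yet it catches zero sick patients
```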
#MLSEV 18
A Metric Selection Flowchart
[Flowchart: yes/no questions (Is yours a “ranking” problem? Do you care more about the top-ranked instances? Will you bother about threshold setting? Is your dataset imbalanced?) route you to a metric: Area Under the ROC / PR curve, Kendall’s Tau, Spearman’s Rho, Max. Phi, KS-statistic, Phi coefficient, F-measure, or Accuracy]
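For reference, nearly all of the metrics in the flowchart are available off the shelf in scikit-learn and SciPy. The sketch below uses synthetic labels and scores purely for illustration, and assumes the usual identification of the phi coefficient with the Matthews correlation coefficient for binary classes; it may not match BigML's exact definitions.

```python
# Computing the flowchart's metrics on synthetic data (illustrative only)
import numpy as np
from scipy.stats import kendalltau, spearmanr, ks_2samp
from sklearn.metrics import (accuracy_score, f1_score, matthews_corrcoef,
                             roc_auc_score, average_precision_score)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)            # binary labels
y_score = y_true * 0.3 + rng.random(1000) * 0.7   # noisy scores correlated with labels
y_pred = (y_score > 0.5).astype(int)              # thresholded predictions

print("Accuracy:       ", accuracy_score(y_true, y_pred))
print("F-measure:      ", f1_score(y_true, y_pred))
print("Phi coefficient:", matthews_corrcoef(y_true, y_pred))  # phi == Matthews corr.
print("ROC AUC:        ", roc_auc_score(y_true, y_score))
print("PR AUC:         ", average_precision_score(y_true, y_score))
print("Kendall's tau:  ", kendalltau(y_true, y_score)[0])
print("Spearman's rho: ", spearmanr(y_true, y_score)[0])
# KS statistic: separation between the score distributions of the two classes
print("KS statistic:   ", ks_2samp(y_score[y_true == 1], y_score[y_true == 0])[0])
```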
#MLSEV 19
Ranking Problems
Medical Diagnosis (no) vs. Stock Picking (yes)
#MLSEV 20
Remaining Issue #2: Holdout Choice
#MLSEV 21
Is Cross-Validation Right for You?
• Cross-validation is a good tool some of the time
• Many other times, it is disastrously bad (overly optimistic)
• This is why BigML offers the option for a specific holdout set.
• Should you use it?
#MLSEV 22
Flashback #2
• Okay, so I’m not testing on the training
data, so I’m good, right? NO NO NO
• You also have to worry about information
leakage between training and test data.
• What is this? Let’s try to predict the daily
closing price of the stock market
• What happens if you hold out 10 random
days from your dataset?
• What if you hold out the last 10 days?
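To make the two holdouts concrete, here is a small pandas sketch on made-up daily closing prices (the column names and data are hypothetical):

```python
# Random vs. time-based holdout on daily closing prices (hypothetical data)
import numpy as np
import pandas as pd

dates = pd.bdate_range("2019-01-01", periods=500)
df = pd.DataFrame({"date": dates,
                   "close": 100 + np.cumsum(np.random.default_rng(0).normal(size=500))})

# Leaky: 10 random days held out; the surrounding days in training leak their values
test_random = df.sample(n=10, random_state=0)
train_random = df.drop(test_random.index)

# Honest: hold out the last 10 days, mimicking how the model will actually be used
train_time, test_time = df.iloc[:-10], df.iloc[-10:]
```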
#MLSEV 23
Flashback #3
• This is common when you have time-distributed
data, but can also happen in other instances:
• Let’s say we have a dataset of 10,000 pictures
from 20 people, each labeled with the year in which
it was taken
• We want to predict the year from the image
• What happens if we hold out random data?
• Solution: Hold out users instead
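One way to hold out whole people rather than random images, sketched with scikit-learn's GroupShuffleSplit on synthetic stand-in data:

```python
# Hold out whole people (groups), not random images, to avoid person-level leakage
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
X = rng.random((10000, 64))                   # stand-in image features
y = rng.integers(2000, 2020, size=10000)      # year each picture was taken
person_ids = rng.integers(0, 20, size=10000)  # which of the 20 people took it

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=person_ids))
X_train, X_test = X[train_idx], X[test_idx]
y_train, y_test = y[train_idx], y[test_idx]

# Every person's images land entirely in train or entirely in test
assert set(person_ids[train_idx]).isdisjoint(person_ids[test_idx])
```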
#MLSEV 24
Again, Take Care!
• These situations are very common whenever
data comes in groups (days, users, etc.)
• The solution is to hold out whole groups of data
• It’s possible that it isn’t a problem in your
dataset, but when in doubt, try both!
#MLSEV 25
Remaining Issue #3: Model Choice?
#MLSEV 26
Which Model is Best?
• Performance isn’t the only issue!
• Retraining: Will the amount of data you have be different in the future?
• Fit stability: How confident must you be that the model’s behavior is
invariant to small data changes?
• Prediction speed: The difference can be orders of magnitude
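To give a feel for that last point, a rough timing sketch on synthetic data; the actual gap depends heavily on the data, the model settings, and the hardware:

```python
# Rough comparison of prediction speed: linear model vs. a largish ensemble
import time
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=20000, n_features=50, random_state=0)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=500, random_state=0)):
    model.fit(X, y)
    start = time.perf_counter()
    model.predict(X)
    print(type(model).__name__, f"{time.perf_counter() - start:.3f}s to predict")
```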
#MLSEV 27
Flashback #4
Amount of data required: Linear models < trees, ensembles < deep learning
Potential to overfit: Linear models < ensembles < trees, deep learning
Speed: Linear models, trees < ensembles < deep learning
Representational power: Linear models < trees < ensembles < deep learning
• How much data do you have?
• How fast do you need things to go?
• How much performance do you really need?
#MLSEV 28
Modeling Tradeoffs
[Diagram: a spectrum from Simple (Logistic) to Complex (Deepnets), trading Interpretability vs. Representability, Weak vs. Slow, Confidence vs. Performance, and Biased vs. Data-hungry]
#MLSEV 29
Summary
• We can do some simple tricks and use
machine learning to help us search
through the space of possible models
• Even with this, however, there is still
lots of work for the domain expert
• Automated model selection relies on
data. If you don’t have enough, it will
go poorly!