Valencian Summer School in Machine Learning
4th edition
September 13-14, 2018
OptiML
Hands-Free Parameter Tuning
Charles Parker
VP Algorithms, BigML, Inc
Parameter Optimization
• There are lots of algorithms and lots of parameters
• We don’t have time to try even close to everything
• If only we had a way to make a prediction . . .
Did I hear someone say
Machine Learning?
The Allure of ML
“Why don’t we just use
machine learning to predict
the quality of a set of
modeling parameters before
we train a model on them?”
— Every first year ML grad student ever
In This Talk
• Technology Review
• Metric Selection
• The Dangers of Naive Cross-validation
• Selecting the “Best” Model
• Caveat Emptor!
Bayesian Parameter Optimization
• The performance of an ML algorithm (with associated parameters) is data dependent
• So: learn from your previous attempts
• Train a model, then evaluate it
• After you’ve done a number of evaluations, learn a regression model to predict the performance of future, as-yet-untrained models
• Use this model to choose a promising set of “next models”
• Sound familiar?
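To make the loop concrete, here is a minimal Python sketch of the idea, using scikit-learn stand-ins rather than BigML’s actual implementation. A real Bayesian optimizer would also use the surrogate’s uncertainty (via an acquisition function) to balance exploration against exploitation; this sketch greedily exploits the surrogate’s point predictions.

```python
# Sketch of Bayesian-style parameter optimization; all names are illustrative.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=500, random_state=0)

def evaluate(params):
    """Train and evaluate one candidate: parameters -> performance."""
    model = RandomForestClassifier(n_estimators=int(params[0]),
                                   max_depth=int(params[1]), random_state=0)
    return cross_val_score(model, X, y, cv=3).mean()

rng = np.random.default_rng(0)
sample = lambda: [rng.integers(10, 200), rng.integers(2, 20)]

# Seed with a few random candidates (metalearning could pick these instead).
tried = [sample() for _ in range(5)]
scores = [evaluate(p) for p in tried]

for _ in range(10):
    # Learn a regression model: parameters -> predicted performance.
    surrogate = RandomForestRegressor(random_state=0).fit(tried, scores)
    # Score a pool of as-yet-untrained candidates; actually train the most promising.
    pool = [sample() for _ in range(100)]
    best = pool[int(np.argmax(surrogate.predict(pool)))]
    tried.append(best)
    scores.append(evaluate(best))

print("best parameters found:", tried[int(np.argmax(scores))])
```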
Bayesian Parameter Optimization
[Diagram: candidate parameter sets (Parameters 1–6) are each modeled and evaluated, yielding scores such as 0.75, 0.56, and 0.92; machine learning then fits the mapping parameters ⟶ performance]
Some Other Tricks
• Use metalearning to select a good set of initial candidates
• Cross-validation is expensive, and there’s no reason to do it for models with terrible performance; stop early in these cases (see the sketch below)
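A minimal sketch of that early-stopping trick, assuming scikit-learn-style models; the slack threshold and the give-up rule are illustrative, not BigML’s actual stopping criterion.

```python
# Abandon cross-validation early for candidates that are clearly hopeless.
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.tree import DecisionTreeClassifier

def lazy_cv_score(model, X, y, best_so_far, n_splits=5, slack=0.1):
    """Cross-validate, but stop as soon as partial results look hopeless."""
    scores = []
    for train_idx, test_idx in StratifiedKFold(n_splits=n_splits).split(X, y):
        fitted = clone(model).fit(X[train_idx], y[train_idx])
        scores.append(fitted.score(X[test_idx], y[test_idx]))
        if np.mean(scores) < best_so_far - slack:
            return None  # terrible partial performance rarely recovers
    return np.mean(scores)

X, y = make_classification(n_samples=400, random_state=0)
print(lazy_cv_score(DecisionTreeClassifier(max_depth=1), X, y, best_so_far=0.95))
```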
Metric Selection
A Metric Selection Flowchart
[Flowchart: four yes/no questions route you to a metric. Is yours a “ranking” problem? If not: will you bother about threshold setting? If yes, use Max. Phi or the KS-statistic, which consider all thresholds; if no: is your dataset imbalanced? If yes, use the phi coefficient or the f-measure; if no, accuracy. If it is a ranking problem: do you care more about the top-ranked instances? If yes, use the area under the ROC / PR curve; if no, Kendall’s Tau or Spearman’s Rho.]
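For reference, here is how the flowchart’s metrics map onto common library calls; y_true, y_pred, and y_score are tiny placeholder arrays, and Max. Phi is obtained by sweeping matthews_corrcoef over candidate thresholds rather than by a single call.

```python
import numpy as np
from scipy.stats import kendalltau, ks_2samp, spearmanr
from sklearn.metrics import (accuracy_score, average_precision_score, f1_score,
                             matthews_corrcoef, roc_auc_score)

y_true = np.array([0, 0, 1, 1, 0, 1])
y_pred = np.array([0, 1, 1, 1, 0, 0])           # thresholded class labels
y_score = np.array([.1, .6, .8, .7, .2, .4])    # ranking scores

print(accuracy_score(y_true, y_pred))            # balanced data, fixed threshold
print(f1_score(y_true, y_pred))                  # f-measure, for imbalanced data
print(matthews_corrcoef(y_true, y_pred))         # the phi coefficient
print(roc_auc_score(y_true, y_score))            # area under the ROC curve
print(average_precision_score(y_true, y_score))  # area under the PR curve

ks, _ = ks_2samp(y_score[y_true == 1], y_score[y_true == 0])  # KS-statistic
tau, _ = kendalltau(y_true, y_score)                          # Kendall's Tau
rho, _ = spearmanr(y_true, y_score)                           # Spearman's Rho
print(ks, tau, rho)
```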
Ranking Problems
Medical Diagnosis (no) vs. Stock Picking (yes)
Top-heavy Importance
Draft-Style Selections (no) vs. Customer Churn (yes)
The Dangers of Naive Cross-validation
Is Cross-Validation Right for You?
• Cross-validation is a good tool some of the time
• Many other times it is disastrously bad:
• Overly optimistic
• False confidence in results
• This is why we offer the option of a specific holdout set
Case #1: Market Direction
• Suppose you want to predict the direction of the stock market, or of any particular stock (disclaimer: this is hard)
• You have information for that market for each minute of each day
• But adjacent minutes are strongly correlated in both the input and the objective field
• So if you have the answer for one minute, you can trivially predict the rest!
• Cross-validation will tell you your classifier is near-perfect! (A time-ordered split, sketched below, avoids this.)
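A small illustration of the safer alternative, using made-up minute-level data: scikit-learn’s TimeSeriesSplit keeps every test point strictly after its training points, so the answer for one minute can never leak in from a neighboring minute in the training fold.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                # placeholder per-minute features
y = (rng.normal(size=1000) > 0).astype(int)   # placeholder up/down objective

# Each fold trains on an earlier window and tests on a strictly later one.
scores = cross_val_score(LogisticRegression(), X, y,
                         cv=TimeSeriesSplit(n_splits=5))
print(scores)   # usually far humbler than shuffled cross-validation
```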
Case #2: Photo Age Prediction
• Suppose you want to predict the age of a printed photograph (based on dye fade, paper watermarks, the presence and type of border, etc.)
• Your training set: a few thousand photos from a few dozen people
• But the ages of one person’s photos are correlated in both the input and output spaces! (same age, camera, storage conditions, etc.)
• So you can trivially do well predicting the ages of some of one person’s photos if you know the ages of the rest
• Cross-validation will tell you your classifier is near-perfect!
Take Care!
• These situations are very common whenever data comes in batches (days, users, etc.)
• The solution is to hold out whole batches of data (e.g., a specific test set) rather than just random points from each one (as in cross-validation); see the sketch below
• It’s possible that it isn’t a problem in your dataset, but when in doubt, try both!
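A minimal sketch of that batch-wise holdout on synthetic data: scikit-learn’s GroupKFold guarantees that no person’s photos are split between training and test, which is exactly the whole-batch holdout described above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GroupKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))             # placeholder photo features
y = rng.uniform(1950, 2000, size=300)     # placeholder photo ages
person = np.repeat(np.arange(30), 10)     # 30 people, 10 photos each

# Every fold holds out whole people, never individual photos.
scores = cross_val_score(RandomForestRegressor(n_estimators=50), X, y,
                         groups=person, cv=GroupKFold(n_splits=5))
print(scores)
```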
Selecting the “Best” Model
Which Model is Best?
• Performance isn’t the only issue!
• Retraining: will the amount of data you have be different in the future?
• Fit stability: how confident must you be that the model’s behavior is invariant to small data changes?
• Prediction speed: the difference can be orders of magnitude
Modeling Tradeoffs
Moving from simple models (logistic regression) to complex ones (deepnets), you trade:
• Interpretability vs. Representability
• Weak vs. Slow
• Confidence vs. Performance
• Biased vs. Data-hungry
Caveat Emptor!
Mo’ Problems
• Model selection tends to take a lot of data, and the more accurate you want the search to be, the more data you need
• We had to define a search space that would suit “most” datasets; it’s possible that the right model for your data isn’t in there!
Fusions
Just Slam A Bunch of Stuff Together
Charles Parker
VP Algorithms, BigML, Inc
Much Ado About Fusions
• Diving into Fusions
• Some Pros and Cons
• Aside: Prediction Explanations
• Creating a Diverse Ensemble
Mixture of Experts
[Diagram, built up over three slides: several expert models each produce a prediction for the same input, raising the question of how those predictions should be combined]
Ensemble?
[Diagram: the constituent models’ predictions are aggregated into a single prediction]
Creating a Fusion
[Diagram: the same aggregation, built as a Fusion]
Fusion = Diverse Ensemble
[Diagram: a fusion aggregates the predictions of a diverse set of models into one prediction]
Other Techniques?
Stacking
[Diagram: a second-level model learns to combine the constituent models’ predictions]
Boosting
[Diagram: models are trained in sequence, each correcting the errors of the ones before it]
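To make the aggregation styles concrete, here is a scikit-learn sketch of both flavors (stand-ins, not BigML’s fusion implementation): plain probability averaging for a fusion-style diverse ensemble, and a trained second-level combiner for stacking. Predicting on the training data here is purely illustrative; a real comparison needs a holdout.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
members = [LogisticRegression(max_iter=1000),
           DecisionTreeClassifier(max_depth=5, random_state=0),
           RandomForestClassifier(n_estimators=50, random_state=0)]

# Fusion-style: aggregate by averaging the members' predicted probabilities.
probs = np.mean([m.fit(X, y).predict_proba(X) for m in members], axis=0)
fused = probs.argmax(axis=1)

# Stacking: a second-level model learns how to combine the members.
stack = StackingClassifier(
    estimators=[("lr", members[0]), ("dt", members[1]), ("rf", members[2])],
    final_estimator=LogisticRegression(max_iter=1000)).fit(X, y)
```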
Some Pros and Cons
Fusions vs. Single Models
Single models:
• A bit wobbly
• Regions of the input space might have under-performing predictions
• Probably pretty fast
• With OptiML, it’s the best thing we could find
Fusions:
• More stable
• Errors tend to be “smoothed out” across the entire input space
• Maybe somewhat slow
• You’ll have to do some additional validation to check performance
What About Performance?
• This is not typically a step that will result in huge performance gains, unless you’ve got significant feature diversity
• You’re usually better off doing feature engineering or acquiring more data
• Do it for stability
• . . . or to improve the importance profile
Importance Tuning
Feature Importance
• Importance is measured in different ways depending on the model type
• This is the global importance reported alongside each model
• Global importance is different from local importance!
• Local importance is given by prediction explanations
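As an illustration of the global side only, here is a scikit-learn sketch contrasting a model’s built-in importances with model-agnostic permutation importance. Local, per-prediction attributions (what BigML surfaces as prediction explanations) require a separate mechanism, such as SHAP or LIME, and are not shown here.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X, y = load_breast_cancer(return_X_y=True)
model = RandomForestClassifier(random_state=0).fit(X, y)

print(model.feature_importances_)   # model-type-specific global importance
result = permutation_importance(model, X, y, n_repeats=5, random_state=0)
print(result.importances_mean)      # model-agnostic global importance
```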
Global Importance
• What’s really important? Does it make sense?
Local Importance
Creating a Diverse Ensemble
Fusions Love Diversification
• Fusions work better if the predictions of the constituent models are all good but not correlated with one another
• One way to increase the chances of this is to use different feature sets that are not well-correlated
• Text provides a good opportunity to do this, because so many possible features can be generated from text data
Text Feature Makeover
• Stem / don’t stem
• Change aggressiveness of stop word removal
• Longer n-grams / ignore unigrams
(One such makeover is sketched below.)
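A sketch of such a makeover in scikit-learn terms: several weakly-correlated text views of the same documents, each feeding its own model, with the predictions averaged fusion-style. The crude_stem helper and the toy documents are purely illustrative (scikit-learn has no built-in stemmer).

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = ["the model ran quickly", "running models is fun",
        "slow models are frustrating", "fast runs make happy users"]
labels = np.array([1, 1, 0, 1])

def crude_stem(text):
    # Toy stand-in for a real stemmer: chop one common suffix.
    return " ".join(w[:-3] if w.endswith("ing") else w for w in text.split())

views = [TfidfVectorizer(),                         # plain unigrams
         TfidfVectorizer(preprocessor=crude_stem),  # "stemmed"
         TfidfVectorizer(stop_words="english"),     # aggressive stop words
         TfidfVectorizer(ngram_range=(2, 3))]       # longer n-grams, no unigrams

# One model per feature view; average their probabilities fusion-style.
probs = [LogisticRegression().fit(v.fit_transform(docs), labels)
             .predict_proba(v.transform(docs)) for v in views]
fused = np.mean(probs, axis=0)
print(fused.argmax(axis=1))
```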