MLSEV. Machine Learning: Technical Perspective

BigML, Inc
ML: Technical Perspective
What is the Big Deal?
Poul Petersen
CIO, BigML

Sampling the Audience
• Expert: Published papers at KDD, ICML, NIPS, etc., or developed own ML algorithms used at large scale
• Aficionado: Understands pros/cons of different techniques and/or can tweak algorithms as needed
• Practitioner: Very familiar with ML packages (Weka, Scikit, BigML, etc.)
• Newbie: Just taking a Coursera ML class or reading an introductory book on ML
• Absolute beginner: ML sounds like science fiction

A Present for You

Free 1-Month Boosted Subscription
https://bigml.com/accounts/register/
MLSEV

What is Machine Learning?

Let’s start with what is NOT Machine Learning…
• Sentience
• Killer robots
• Generalized Artificial Intelligence
• Anything to do with the word “singularity”

Oh the Hype!
AlphaGo Zero beats a human at Go… killer robots far off?
• First of all, AlphaGo Zero is impressive!
• But there is no need to fear killer robots powered by AlphaGo Zero:
  • Learning is not transferable: it must be retrained for chess, etc.
  • Works only for rule-based systems / perfect simulators
  • Relies on games/systems with clear objectives (win/lose)
  • Cost $25 million [1]
“While AlphaGo Zero is a step towards a general-purpose AI, it can only work on problems that can be perfectly simulated in a computer, making tasks such as driving a car out of the question. AIs that match humans at a huge range of tasks are still a long way off” - Demis Hassabis, CEO of DeepMind [2]
[1] https://www.inc.com/lisa-calhoun/google-artificial-intelligence-alpha-go-zero-just-pressed-reset-on-how-we-learn.html
[2] https://www.theguardian.com/science/2017/oct/18/its-able-to-create-knowledge-itself-google-unveils-ai-learns-all-on-its-own

Three Domains
• Artificial Intelligence: Cool/Scary things… that mostly don’t exist
• Machine Learning: AI concepts applied to very specific problems
• Deep Learning: Specific techniques of Machine Learning

Let’s start with what is NOT Machine Learning…
• Sentience
• Killer robots
• Generalized Artificial Intelligence
• Anything to do with the word “singularity”
• Something “new”
  • First International Conference on ML held in 1980
  • Top-performing algorithms have been around for decades
How do these things relate?

A practical definition…
AIRLINE  ORIGIN  DESTINATION  DEPARTURE DELAY  DISTANCE  ARRIVAL DELAY
AS       ANC     SEA          -11              1448      -22
AA       LAX     PBI          -8               2330      -9
US       SFO     CLT          -2               2296      5
AA       LAX     MIA          -5               2342      -9
AS       SEA     ANC          -1               1448      -21
DL       SFO     MSP          -5               1589      8
NK       LAS     MSP          -6               1299      -17
US       LAX     CLT          14               2125      -10
AA       SFO     DFW          -11              1464      -13
DL       LAS     ATL          3                1747      -15
Finding patterns in data that can be used to make inferences
Predictive Models

Machine Learning Terminology
[Diagram labels]
• Training / Learning: Data (Instances, Features, Label), ML algorithm, Predictive model
• Predicting / Scoring: New Instance, Predictive model, Prediction, Confidence
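To make the terminology concrete, here is a minimal sketch. It assumes the flight table from the earlier slide as training data, treats "arrived late" (arrival delay > 0) as the label, and uses a 3-nearest-neighbor vote as a stand-in ML algorithm. The label definition, the choice of k-NN, the feature scaling, and every function name are illustrative assumptions, not something taken from the talk.

```python
# Instances: one row per flight. Features: departure delay and distance.
# Label (assumed for illustration): did the flight arrive late?
ROWS = [
    # (airline, origin, destination, dep_delay, distance, arr_delay)
    ("AS", "ANC", "SEA", -11, 1448, -22),
    ("AA", "LAX", "PBI", -8, 2330, -9),
    ("US", "SFO", "CLT", -2, 2296, 5),
    ("AA", "LAX", "MIA", -5, 2342, -9),
    ("AS", "SEA", "ANC", -1, 1448, -21),
    ("DL", "SFO", "MSP", -5, 1589, 8),
    ("NK", "LAS", "MSP", -6, 1299, -17),
    ("US", "LAX", "CLT", 14, 2125, -10),
    ("AA", "SFO", "DFW", -11, 1464, -13),
    ("DL", "LAS", "ATL", 3, 1747, -15),
]


def features(row):
    # Crude scaling so that distance does not dominate the delay feature.
    return (row[3], row[4] / 100.0)


def label(row):
    return row[5] > 0


def train(rows):
    """Training / learning: for k-NN this just stores the labeled instances."""
    return [(features(r), label(r)) for r in rows]


def predict(model, x, k=3):
    """Predicting / scoring: vote among the k closest training instances."""
    def squared_distance(item):
        return sum((a - b) ** 2 for a, b in zip(item[0], x))

    votes = [lab for _, lab in sorted(model, key=squared_distance)[:k]]
    late = sum(votes)
    prediction = late * 2 > k
    confidence = max(late, k - late) / k  # share of neighbors that agree
    return prediction, confidence


model = train(ROWS)
new_instance = (-4, 16.0)  # hypothetical flight: departs 4 early, distance 1600
prediction, confidence = predict(model, new_instance)
print("late?", prediction, "confidence:", round(confidence, 2))
```

With only ten instances the result is not meaningful; the point is only to show where each term (instance, feature, label, model, prediction, confidence) appears in code.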

Why Machine Learning?

Why Machine Learning
[Chart: COMPLEXITY OF TASKS (- to +) versus TIME (20th century to 21st century)]

Traditional Programming
Lost Baggage Policy
• Explicit rules defined by requirements and experience
• How do we program when the rules are unknown or very difficult to determine?

Programming with ML
AIRLINE  ORIGIN  DESTINATION  DEPARTURE DELAY  DISTANCE  ARRIVAL DELAY
AS       ANC     SEA          -11              1448      -22
AA       LAX     PBI          -8               2330      -9
US       SFO     CLT          -2               2296      5
AA       LAX     MIA          -5               2342      -9
AS       SEA     ANC          -1               1448      -21
DL       SFO     MSP          -5               1589      8
NK       LAS     MSP          -6               1299      -17
US       LAX     CLT          14               2125      -10
AA       SFO     DFW          -11              1464      -13
DL       LAS     ATL          3                1747      -15
Want: Flight Delay Prediction
Flight Delay Model????
What else can ML do?

Machine Learning Tasks

Machine Learning Tasks
• SUPERVISED: Classification and Regression
• UNSUPERVISED: Cluster Analysis, Anomaly Detection, Association Discovery, Topic Modeling
• TIME SERIES

Predictive Maintenance
• CLASSIFICATION: Will this component fail?
• REGRESSION: How many days until this component fails?
• TIME SERIES FORECASTING: How many components will fail a week from now?
• CLUSTER ANALYSIS: Which machines behave similarly?
• ANOMALY DETECTION: Is this behavior normal?
• ASSOCIATION DISCOVERY: What alerts are triggered together before a failure?

Personalized Music
• CLASSIFICATION: Will this song be a hit?
• REGRESSION: How many users will play this song next month?
• TIME SERIES FORECASTING: How many downloads will this song have in 3 months?
• CLUSTER ANALYSIS: Which songs are similar?
• ANOMALY DETECTION: Is this song being played more than normal?
• ASSOCIATION DISCOVERY: What songs do people like to play together?

Airline Revenue Management
• CLASSIFICATION: Will this flight be booked at 80% 14 days out?
• REGRESSION: How many passengers will book this flight 7 days out?
• TIME SERIES FORECASTING: How many tickets will be cancelled this week?
• CLUSTER ANALYSIS: Which flight booking patterns are similar?
• ANOMALY DETECTION: Are these flights’ booking patterns normal?
• ASSOCIATION DISCOVERY: What price changes help overbook sooner?

Network Security
• CLASSIFICATION: Is this email part of a phishing attack?
• REGRESSION: How many logins after work per week?
• TIME SERIES FORECASTING: What will be the number of false alarms next week?
• CLUSTER ANALYSIS: Are these users behaving similarly?
• ANOMALY DETECTION: Is this user behavior worth inspecting?
• ASSOCIATION DISCOVERY: What alerts were triggered before this attack?

All ML Models are Wrong

All ML Models are WRONG
[Figure: four models (DEEPNET, ENSEMBLE, LOGISTIC REGRESSION, DECISION TREE), each predicting TRUE or FALSE]
Same patient… different models… different predictions!
Some model(s) must be wrong… which one?
Insight: Need a way to measure model fitness

Evaluating Models
[Diagram: PATIENT DATA split into TRAINING and TEST sets; ENSEMBLE; PREDICTION and CONFIDENCE (%); EVALUATION (%)]
Stay Tuned: You will see this in Evaluations
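A hold-out evaluation like the one in the diagram can be sketched in a few lines. The patient data below is made up, the 80/20 split is an assumed convention, and the one-feature threshold learner is only a stand-in for a real model such as an ensemble.

```python
import random


def train_test_split(instances, test_fraction=0.2, seed=42):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = instances[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_threshold(rows):
    """Toy learner: choose the cut on x with the fewest training mistakes."""
    def mistakes(t):
        return sum((x >= t) != y for x, y in rows)

    return min(sorted({x for x, _ in rows}), key=mistakes)


def accuracy(threshold, rows):
    return sum((x >= threshold) == y for x, y in rows) / len(rows)


# Made-up patient data: one measurement x and a TRUE/FALSE diagnosis.
# Values 48..52 are labeled against the overall trend to mimic noise.
patients = [(x, (x >= 50) != (48 <= x <= 52)) for x in range(100)]

train, test = train_test_split(patients)
threshold = fit_threshold(train)  # the model only ever sees the training set
print("training accuracy:", accuracy(threshold, train))
print("test accuracy:    ", accuracy(threshold, test))
```

The test accuracy is the number to report, since it is measured on instances the model never saw during training.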

Measuring ML Mistakes
               ACTUAL TRUE       ACTUAL FALSE
MODEL TRUE     TRUE POSITIVE     FALSE POSITIVE
MODEL FALSE    FALSE NEGATIVE    TRUE NEGATIVE
We can bend the rules a bit…
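The four cells of the table can be counted directly from paired lists of actual outcomes and model predictions. The eight example instances below are made up, and the helper name `confusion_matrix` is illustrative.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Count the four outcomes for a binary (True/False) classifier."""
    counts = Counter(zip(predicted, actual))
    return {
        "TP": counts[(True, True)],    # model TRUE,  actual TRUE
        "FP": counts[(True, False)],   # model TRUE,  actual FALSE
        "FN": counts[(False, True)],   # model FALSE, actual TRUE
        "TN": counts[(False, False)],  # model FALSE, actual FALSE
    }


# Made-up example: eight instances with known outcomes and predictions.
actual = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, False, False, True]

cm = confusion_matrix(actual, predicted)
accuracy = (cm["TP"] + cm["TN"]) / len(actual)
precision = cm["TP"] / (cm["TP"] + cm["FP"])
recall = cm["TP"] / (cm["TP"] + cm["FN"])
print(cm, accuracy, precision, recall)
```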

Operating Point
[Figure: the Operating Point slides between TRUE (100% to 0%) and FALSE (0% to 100%); one direction gives More False Positives, the other More False Negatives]
Why would you do this?
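One way to see the trade-off: if the model outputs a score rather than a hard TRUE/FALSE, the operating point is simply the threshold applied to that score. The scores and labels below are made up for illustration.

```python
def predict_at(scores, threshold):
    """Predict TRUE whenever the model's score reaches the operating point."""
    return [s >= threshold for s in scores]


def count_errors(actual, predicted):
    """Return (false positives, false negatives)."""
    fp = sum(p and not a for a, p in zip(actual, predicted))
    fn = sum(a and not p for a, p in zip(actual, predicted))
    return fp, fn


# Made-up scores (estimated probability of TRUE) and the real outcomes.
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
actual = [True, True, False, True, False, True, False, False, False, False]

for threshold in (0.25, 0.50, 0.75):
    fp, fn = count_errors(actual, predict_at(scores, threshold))
    print(f"threshold={threshold:.2f}  false positives={fp}  false negatives={fn}")
```

Lowering the threshold trades false negatives for false positives, and raising it does the reverse.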

Comparing Models
[Figure: % TRUE POSITIVES versus % FALSE POSITIVES, showing the IDEAL MODEL, GOOD and BETTER models, the RANDOM diagonal, two TRIVIAL MODELs, and the WORST(?) MODEL]
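A curve of this kind can be traced by sweeping the operating point from strict to lenient and recording the false positive and true positive rates at each step. The sketch below assumes distinct scores (no ties) and uses made-up data; the area under the curve is one common single-number summary for comparing models.

```python
def roc_points(actual, scores):
    """Sweep the threshold from high to low; return (FPR, TPR) points."""
    positives = sum(actual)
    negatives = len(actual) - positives
    ranked = sorted(zip(scores, actual), reverse=True)
    points, tp, fp = [(0.0, 0.0)], 0, 0
    for _, is_positive in ranked:
        if is_positive:
            tp += 1
        else:
            fp += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up scores (estimated probability of TRUE) and the real outcomes.
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
actual = [True, True, False, True, False, True, False, False, False, False]

points = roc_points(actual, scores)
print("model AUC: ", auc(points))
print("random AUC:", auc([(0.0, 0.0), (1.0, 1.0)]))  # the diagonal
```

An ideal model would reach the top-left corner, for an area of 1.0; the random diagonal has an area of 0.5.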

Mistakes can be Costly
[Figure: two images combined (+ =), with outcomes labeled FUN! and DANGER!]

Cost Functions
• What is the cost of predicting cancer incorrectly?
• What is the cost of labeling a fraudulent transaction as valid?
• What is the cost of incorrectly predicting an aircraft part is safe?
• Why can’t I just have a perfect model?
[Figure: % TRUE POSITIVES versus % FALSE POSITIVES for a GOOD model and a BETTER? model, annotated with FALSE NEGATIVE COST, FALSE POSITIVE COST, and “One possibility”]
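If each kind of mistake can be given a price, the operating point can be chosen to minimize total cost rather than the raw error count. The following is a sketch under that assumption; the scores, labels, and cost values are all made up.

```python
def total_cost(actual, scores, threshold, fp_cost, fn_cost):
    """Cost of all mistakes made at a given operating point."""
    cost = 0
    for is_positive, score in zip(actual, scores):
        predicted = score >= threshold
        if predicted and not is_positive:
            cost += fp_cost
        elif is_positive and not predicted:
            cost += fn_cost
    return cost


def cheapest_threshold(actual, scores, fp_cost, fn_cost):
    """Try every observed score (plus 'never predict TRUE') as the threshold."""
    candidates = sorted(set(scores)) + [float("inf")]
    return min(
        candidates,
        key=lambda t: total_cost(actual, scores, t, fp_cost, fn_cost),
    )


# Made-up scores (estimated probability of TRUE) and the real outcomes.
scores = [0.95, 0.80, 0.70, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10, 0.05]
actual = [True, True, False, True, False, True, False, False, False, False]

# Hypothetical costs: a missed positive is 10x worse than a false alarm...
print(cheapest_threshold(actual, scores, fp_cost=1, fn_cost=10))
# ...versus a false alarm being 10x worse than a missed positive.
print(cheapest_threshold(actual, scores, fp_cost=10, fn_cost=1))
```

The same model ends up with a lenient threshold when false negatives are expensive and a strict one when false positives are expensive.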

How it Goes All Wrong
• Over-fitting
• Under-fitting

Hunting Dog Image Classifier
Which images are pictures of dogs that are bred to be hunters?
[Images labeled TRUE and FALSE]

Over-fitting…
“Hunting dogs are short-haired spotted puppies that lie out on the grass”

A perfect model! How about some new images…
[Images labeled TRUE and FALSE]

Over-fitting
• This is an example of poor generalization
• The model “fit” the training data perfectly
• But it does not generalize well to new instances
[New images: Model: true, Reality: false; Model: false, Reality: true]

Under-fitting
Only use ear shape:
“Dogs with drop or pendant ears are hunters”

An imperfect model… now we are making some mistakes on the training data.
[Images labeled TRUE and FALSE]

Under-fitting
• This is an example of good generalization
• The model “under-fit” the training data
• But it is generalizing to new instances better
[New images: Model: true, Reality: true; Model: false, Reality: false]

Under-fitting
[New images: Model: false, Reality: true; Model: false, Reality: true]

Learning Problems / Complexity
Under-fitting
• Low Complexity Model
• Not fitting the data very well
Over-fitting
• High Complexity Model
• Fitting the data too well
One way to mitigate this is with different types of models…
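The contrast can be reproduced on a tiny made-up dataset with three models of increasing complexity: a constant (too simple), a single threshold, and a memorizer that copies the label of the nearest training point (too complex). The data, the deliberately flipped labels, and all function names are assumptions made for this sketch.

```python
def true_label(x):
    return x >= 50


NOISY = {20, 40, 60, 80}  # training points whose labels are deliberately flipped
train = [(x, true_label(x) != (x in NOISY)) for x in range(0, 100, 4)]
test = [(x, true_label(x)) for x in range(1, 100, 4)]


def fit_constant(rows):
    """Lowest complexity: always predict the majority class."""
    majority = sum(y for _, y in rows) * 2 > len(rows)
    return lambda x: majority


def fit_threshold(rows):
    """Medium complexity: one cut point chosen to minimize training mistakes."""
    def mistakes(t):
        return sum((x >= t) != y for x, y in rows)

    best = min(sorted(x for x, _ in rows), key=mistakes)
    return lambda x: x >= best


def fit_memorizer(rows):
    """Highest complexity: copy the label of the nearest training point."""
    return lambda x: min(rows, key=lambda row: abs(row[0] - x))[1]


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


results = {}
for name, fit in [("constant", fit_constant),
                  ("threshold", fit_threshold),
                  ("memorizer", fit_memorizer)]:
    model = fit(train)
    results[name] = (accuracy(model, train), accuracy(model, test))
    print(f"{name:10s} train={results[name][0]:.2f} test={results[name][1]:.2f}")
```

The memorizer is perfect on the training data because it also learns the flipped labels, and it repeats those mistakes on new instances; the constant model does poorly everywhere.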

Choosing the ML Algorithm
[Chart: horizontal axis: Decreasing Interpretability / Better Representation / Longer Training; vertical axis: Increasing Data Size / Complexity]
• Stages: Early Stage (Rapid Prototyping), Mid Stage (Proven Application), Late Stage (Critical Performance)
• Models shown: Single Tree Model, Logistic Regression, Decision Forest, Random Decision Forest, Boosted Trees, Deepnets
Hard?

Automating Machine Learning

Deepnet Structure
[Diagram: Inputs x1 x2 x3 x4 (4 Features); several Hidden layers of nodes (h1, h2, h3, …); Outputs y1 y2 y3 (3 Classes)]
h1 = activation?(wx, x) ?
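As a rough illustration of the structure above, the sketch below pushes one instance through a network with 4 inputs, two hidden layers of 5 nodes, and 3 output classes. It assumes the usual form where each node computes activation(w · x + b), picks ReLU for the hidden layers and softmax for the output, and uses random untrained weights; none of these choices come from the slide, and they are exactly the open questions it points at.

```python
import math
import random


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each node."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def relu(z):
    return max(0.0, z)


def identity(z):
    return z


def softmax(values):
    """Turn raw output values into class probabilities that sum to 1."""
    top = max(values)
    exps = [math.exp(v - top) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def random_layer(n_in, n_out, rng):
    """Random (untrained) weights and zero biases, for illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


rng = random.Random(0)
layers = [random_layer(4, 5, rng), random_layer(5, 5, rng), random_layer(5, 3, rng)]


def forward(x):
    h = x
    for weights, biases in layers[:-1]:
        h = dense(h, weights, biases, relu)  # hidden layers
    out_weights, out_biases = layers[-1]
    return softmax(dense(h, out_weights, out_biases, identity))


probabilities = forward([5.1, 3.5, 1.4, 0.2])  # arbitrary 4-feature instance
print(probabilities)
```

Training would adjust the weights and biases; the number of layers, nodes per layer, and activation function are the structural parameters that have to be chosen beforehand.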

BigML Deepnet
• The success of a Deepnet is dependent on getting the right network structure for the dataset
• But, there are too many parameters:
  • Nodes, layers, activation function, learning rate, etc…
  • And setting them takes significant expert knowledge
• Solution: Metalearning (a good initial guess)
• Solution: Network search (try a bunch)

http://www.clparker.org/ml_benchmark/

• Each resource has several parameters that impact quality
  • Number of trees, missing splits, nodes, weight
• Rather than trial and error, we can use ML to find ideal parameters
• Why not make the model type (Decision Tree, Boosted Tree, etc.) a parameter as well?
• Similar to Deepnet network search, but finds the optimum machine learning algorithm and parameters for your data automatically
Key Insight: We can solve any parameter selection problem in a similar way.
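The idea of treating the model type as just another parameter can be sketched with a plain random search. This is a simplified stand-in, not BigML's actual search procedure: the two toy learners, the search space, the made-up data, and all names are assumptions.

```python
import random

# Made-up 1-D data: TRUE for x >= 50, with a few flipped training labels.
NOISY = {20, 40, 60, 80}
train = [(x, (x >= 50) != (x in NOISY)) for x in range(0, 100, 4)]
valid = [(x, x >= 50) for x in range(1, 100, 4)]


def fit_knn(rows, k):
    """k-nearest-neighbor vote on the single feature x."""
    def predict(x):
        nearest = sorted(rows, key=lambda row: abs(row[0] - x))[:k]
        return sum(y for _, y in nearest) * 2 > k
    return predict


def fit_threshold(rows, step):
    """Single cut point, searched on a grid with the given step."""
    best = min(range(0, 101, step),
               key=lambda t: sum((x >= t) != y for x, y in rows))
    return lambda x: x >= best


FITTERS = {"knn": fit_knn, "threshold": fit_threshold}
SEARCH_SPACE = {
    "knn": lambda rng: {"k": rng.choice([1, 3, 5, 7])},
    "threshold": lambda rng: {"step": rng.choice([1, 5, 10, 25])},
}


def accuracy(model, rows):
    return sum(model(x) == y for x, y in rows) / len(rows)


def random_search(n_trials=20, seed=0):
    """Sample (model type, parameters) pairs; keep the best on validation data."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        model_type = rng.choice(sorted(SEARCH_SPACE))
        params = SEARCH_SPACE[model_type](rng)
        score = accuracy(FITTERS[model_type](train, **params), valid)
        if best is None or score > best[0]:
            best = (score, model_type, params)
    return best


score, model_type, params = random_search()
print(model_type, params, score)
```

A smarter search would use the results of earlier trials to decide what to try next, which is where using ML to tune ML comes in.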

BigML OptiML

Fusions
Key Insight: ML algorithms each have unique strengths and weaknesses
• Single Tree: output changes abruptly with inputs near decision boundary
• Tree + Deepnet: output changes smoothly with inputs near decision boundary
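A small sketch of the effect, assuming the fusion simply averages the predicted probabilities of its component models. The step function and the sigmoid below are stand-ins for a trained tree and a trained deepnet, and the boundary at x = 5 is arbitrary.

```python
import math


def tree_probability(x):
    """Stand-in for a single decision tree: a hard split at x = 5."""
    return 1.0 if x >= 5 else 0.0


def smooth_probability(x):
    """Stand-in for a deepnet: a sigmoid centered on the same boundary."""
    return 1.0 / (1.0 + math.exp(-(x - 5)))


def fusion_probability(x):
    """Fuse the two models by averaging their predicted probabilities."""
    return (tree_probability(x) + smooth_probability(x)) / 2


for x in (4.0, 4.9, 5.0, 5.1, 6.0):
    print(f"x={x:.1f}  tree={tree_probability(x):.2f}  "
          f"fusion={fusion_probability(x):.2f}")
```

In this toy version the tree's jump from 0 to 1 at the boundary is halved and the output keeps changing on either side of it; with more smooth components in the average, the remaining jump shrinks further.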

Fusions
Model Skills: Some ML algorithms “generally” do better on some feature types:
• RDF for sparse text vectors
• LR/Deepnets for numeric features
• Trees for categorical features
[Figure panels: Full, Numeric, Text]

Summary
• Machine Learning is a subset of “Artificial Intelligence”
  • Finds patterns in data that can be used to make inferences
  • Can be thought of as “programming with data”
  • Has been around for a long time (only recently practical)
  • Already being used to solve real-world problems
• Caveat Emptor:
  • Machine Learning mistakes are expected
  • Care must be taken to address the cost of mistakes
• Automating Machine Learning
  • Powerful application of ML to parameterizing ML
  • Models can be fused to address specific data complexities
