Scalable Machine Learning
For Smarter Applications
Agenda
Data Science
Machine Learning
Trees and Power of Algorithmic Methods
Examples using H2O Scalable Machine
Learning Engine
Who am I?
Hank Roark
Data Scientist & Hacker @ H2O.ai
Lecturer in Systems Thinking, UIUC
13 years at John Deere: Research, New Product Development, New High Tech Ventures
Previously at startups and consulting
Physics, Georgia Tech
Systems Design & Management, MIT
Data Science
Data Science
Interdisciplinary
Data is an electronic commodity; to work with it, one must speak 'hacker'
Extract insights from data
Discovery and building
knowledge
http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
Data Science
Jeff Hammerbacher (Facebook, Cloudera)
• Identify problem
• Instrument data sources
• Collect data
• Prepare data (integrate, transform, clean,
impute, filter, aggregate)
• Build model
• Evaluate model
• Communicate results
Data Science
Ben Fry (data visualization expert)
• Acquire
• Parse
• Filter
• Mine
• Represent
• Refine
• Interact
Agenda
 Data Science
Machine Learning
Trees and Power of Algorithmic Methods
Examples using H2O Scalable Machine
Learning Engine
WHAT IS MACHINE
LEARNING?
Field of study that gives computers the ability to learn
without being explicitly programmed.
Arthur Samuel, 1959
A computer program is said to learn from experience E
with respect to some task T
and some performance measure P,
if its performance on T,
as measured by P,
improves with experience E.
Tom Mitchell, 1998
Types of Learning
• Supervised Learning
• Inferring function from labeled data
• Classification
• Regression
• Unsupervised Learning
• Finding hidden structure in unlabeled data
• Clustering
• Anomaly detection
• Reinforcement Learning
• Learning from delayed feedback
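To make the supervised vs. unsupervised distinction concrete, here is a minimal sketch using H2O's Python client (the "examples.csv" file and its "label" column are illustrative assumptions, not from the slides):

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.kmeans import H2OKMeansEstimator

h2o.init()  # start or connect to a local H2O cluster

# Hypothetical labeled dataset with a categorical "label" column
frame = h2o.import_file("examples.csv")
frame["label"] = frame["label"].asfactor()
predictors = [c for c in frame.columns if c != "label"]

# Supervised learning: infer a function from labeled data (classification)
clf = H2OGradientBoostingEstimator(ntrees=50)
clf.train(x=predictors, y="label", training_frame=frame)

# Unsupervised learning: find hidden structure in the unlabeled predictors (clustering)
km = H2OKMeansEstimator(k=5)
km.train(x=predictors, training_frame=frame)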
Isn’t this just statistics repackaged?
x → nature → y
Shared goals of data analysis:
Prediction
Information extraction
L. Breiman
Statistical Analysis
x → [linear regression, logistic regression, Cox models] → y
Assume some process that creates observed data
Model validation:
Yes–no using goodness-of-fit tests
Residual examination
L. Breiman
Algorithmic Analysis (aka ML)
x → [unknown] → y
Process that creates observed data is unknowable
Model validation:
Measured by predictive accuracy
L. Breiman
Decision trees
Neural networks
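A minimal sketch of Breiman's point about validation, on synthetic data (scikit-learn is used here purely for illustration; the slides name no library): both a classical data model and an algorithmic model are judged the same way, by predictive accuracy on held-out data rather than by goodness-of-fit tests.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic observations standing in for data from an unknown process
X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

for model in (LogisticRegression(max_iter=1000), RandomForestClassifier(n_estimators=200)):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, "held-out accuracy:", round(acc, 3))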
Why Big Data + Machine Learning
Why Big Data + Machine Learning
Agenda
 Data Science
 Machine Learning
Trees and Power of Algorithmic Methods
Examples using H2O Scalable Machine
Learning Engine
Trees
Short exploration of one algorithmic method
Can be used for regression and classification
Segments the predictor space into a number
of simple regions
Often referred to as decision trees
Baseball Salary
Salary is color coded from low
(blue) to high (red)
Tibshirani and Hastie
Baseball Salary
Salary is color coded from low
(blue) to high (red)
Tibshirani and Hastie
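In the spirit of the baseball salary figures above, a minimal sketch of a regression tree segmenting the predictor space (the "Hitters.csv" file path and the use of scikit-learn are assumptions for illustration; the slides show figures from Tibshirani and Hastie, not code):

import pandas as pd
from sklearn.tree import DecisionTreeRegressor, export_text

# Hypothetical CSV of the baseball data with Years, Hits, and Salary columns
hitters = pd.read_csv("Hitters.csv").dropna(subset=["Years", "Hits", "Salary"])

# A shallow tree splits the (Years, Hits) plane into a few rectangular regions,
# each predicting the mean salary of the players that fall inside it
tree = DecisionTreeRegressor(max_depth=2)
tree.fit(hitters[["Years", "Hits"]], hitters["Salary"])

print(export_text(tree, feature_names=["Years", "Hits"]))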
Pros and Cons
Simple, thought to mirror human decision
making
Not competitive with the best supervised
learning approaches in terms of predictive
accuracy
Combining a large number of trees results in
dramatic improvements, with some loss of
interpretability
Methods to Improve Predictive
Performance of Trees
Bagging
• Short for bootstrap aggregation
• Averaging a set of observations reduces variance
• Individual trees are built on samples, with replacement, of the data (bootstrap)
• Many trees are built and the results 'averaged' (aggregation)
Random Forest
• Builds on bagging by considering a random subset of the predictors at each tree split
• This further decorrelates the trees, resulting in improved predictive performance
• Implemented in H2O as Random Forest
Boosting
• Builds multiple models sequentially, using information from prior trees
• Slowly fit the residuals of prior models
• A general method, not limited to trees
• Implemented in H2O as GBM (Gradient Boosted Models); first ever parallel, distributed GBM
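A minimal sketch of the two H2O implementations named above, via the H2O Python client (the file name, "response" column, and parameter values are illustrative assumptions, not tuned settings):

import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
frame = h2o.import_file("train.csv")               # hypothetical dataset
frame["response"] = frame["response"].asfactor()
train, test = frame.split_frame(ratios=[0.8], seed=1234)
predictors = [c for c in frame.columns if c != "response"]

# Bagged trees with a random subset of predictors at each split (Random Forest)
rf = H2ORandomForestEstimator(ntrees=200, max_depth=20)
rf.train(x=predictors, y="response", training_frame=train)

# Boosting: trees built sequentially, slowly fitting the residuals of prior trees (GBM)
gbm = H2OGradientBoostingEstimator(ntrees=200, learn_rate=0.05, max_depth=5)
gbm.train(x=predictors, y="response", training_frame=train)

print("RF  test AUC:", rf.model_performance(test).auc())
print("GBM test AUC:", gbm.model_performance(test).auc())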
Which Algorithm Is Best?
(Figure: decision boundaries of linear models vs. a decision tree; Tibshirani and Hastie)
Which Algorithm Is Best?
We have dubbed the associated results No Free Lunch theorems
because they demonstrate that if an algorithm performs well on a certain
class of problems then it necessarily pays for that with degraded
performance on the set of all remaining problems. (Wolpert and Macready)
Agenda
 Data Science
 Machine Learning
 Trees and Power of Algorithmic Methods
Examples using H2O Scalable Machine
Learning Engine
• Founded: 2011 venture-backed, debuted in 2012
• Product: H2O open source in-memory prediction engine
• Team: 37 distributed systems engineers doing ML
• HQ: Mountain View, CA
H2O.ai Overview
25,000 commits / 3yrs
H2O World Conference 2014
Team Work @ H2O.ai
Join H2O World Nov 9-11 2015!
What is H2O?
Math Platform: Open source in-memory prediction engine
• Parallelized and distributed algorithms making the most use out of multithreaded systems
• GLM, Random Forest, GBM, Deep Learning, etc.
API: Easy to use and adopt
• Written in Java – perfect for Java programmers
• REST API (JSON) – drives H2O from R, Python, Excel, Tableau
Big Data: More data? Or better models? BOTH
• Use all of your data – model without down sampling
• Run a simple GLM or a more complex GBM to find the best fit for the data
• More Data + Better Models = Better Predictions
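As a sketch of driving H2O from Python through that REST/JSON API (the "big_dataset.csv" file and "y" column are assumptions; a GLM is shown, but GBM or Deep Learning estimators slot in the same way):

import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()                                          # connect to (or start) an H2O cluster

data = h2o.import_file("big_dataset.csv")           # hypothetical file; no down sampling needed
data["y"] = data["y"].asfactor()
train, valid = data.split_frame(ratios=[0.8], seed=7)

# A simple GLM as a first fit; the Python client just drives the Java engine over REST
glm = H2OGeneralizedLinearEstimator(family="binomial")
glm.train(x=[c for c in data.columns if c != "y"], y="y",
          training_frame=train, validation_frame=valid)

print(glm.model_performance(valid).auc())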
Accuracy with Speed and Scale
Customer Use Cases
• Ad Optimization (200% CPA Lift with H2O)
• P2B Model Factory (60k models, 15x faster with H2O than before)
• Fraud Detection (11% higher accuracy with H2O Deep Learning - saves millions)
• Real-time marketing (H2O is 10x faster than anything else)
…and many large insurance, financial services, and manufacturing companies!
Customer Stories
• Propensity to Buy model
• AdTech
• Fraud prevention
Propensity to Buy modeling factory
Cisco Predictive Modeling Factories
Problem
• Need to predict whether a company will buy a certain product at a given time
• Spend a lot of time preparing models
• Less time for scoring and less time left for using the scores in the sales activities
Why H2O?
• P2B factory is 15x faster with H2O
• Newer buying patterns incorporated immediately into models
• Scores are published sooner
• More time for planning and executing activities
• R + H2O is a robust and powerful combination
Who uses it?
• Lou Carvalheira, advanced analytics manager
• Customer Intelligence data scientists
P2B factory is 15x faster with H2O
(Timeline. Before, without H2O: in each quarter, the data refresh, P2B training, and scoring of models consumed most of the time, leaving little room to prepare and execute Mktg & Sales activities. Now, with H2O: after each data refresh, train & score completes quickly, leaving most of each quarter to prepare and execute Mktg & Sales activities.)
Modeling conversion rate on multiple campaigns
ShareThis AdTech Optimization
Problem
• ShareThis ONLY targets users within 24 hours to ensure ads reach them at the most relevant moment for maximum ROI
Why H2O?
• Maximized ROI by optimizing campaign performance and budget allocation
• Increased accuracy and better anomaly removal
• Reduced R&D time significantly
• Used all data and built models faster, with faster scoring
• Smooth model-building pipeline with R and Spark API
Who uses it?
• Prasanta Behera, VP of Engineering
• Ad Products team
(Figure: a user's interest level over time for example persona "Dan" (male 25-45, tech enthusiast, $HHI $75K+). A trigger event drives excitement past the standard targeting threshold to peak readiness for engagement, then interest fades. The ShareThis messaging trigger fires at that peak: ShareThis ONLY targets users within 24 hours to ensure ads reach them at the most relevant moment. Real-time messaging reaches users during peak interest.)
(Chart: live tests on different campaigns, observed CPA lift using H2O)
Fraud prevention using Deep Learning
PayPal Fraud Prevention
Problem
• Flag fraudulent behavior upfront
• Monitor account activity and account-to-account transactions for suspicious behavior and changes
• Need to model new and complex attack patterns quickly
Why H2O?
• Fast, scalable, and accurate
• Flexible deployment
• Works seamlessly with Hadoop
• Simple interface
• 11% improvement in accuracy w/ Deep Learning
Who uses it?
• Fraud Prevention data science team
Fraud Prevention at PayPal
Experiment
• Dataset: 160 million records; 1500 features (150 categorical); 0.6TB compressed in HDFS
• Infrastructure: 800-node Hadoop (CDH3) cluster
• Decision: fraud / not-fraud
Results
• Network architecture: 6 layers with 600 neurons each performed the best
• Activation function: RectifierWithDropout performed the best
• 11% accuracy improvement with a limited feature set & a deep network: a third of the original feature set, 6 hidden layers, 600 neurons each
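A minimal sketch of a network matching the reported architecture (6 hidden layers of 600 neurons, RectifierWithDropout activation) with H2O's Python client; the file name, "fraud" column, epochs, and split are assumptions, since the actual PayPal setup is not public beyond the slide:

import h2o
from h2o.estimators.deeplearning import H2ODeepLearningEstimator

h2o.init()
txns = h2o.import_file("transactions.csv")          # hypothetical feature set
txns["fraud"] = txns["fraud"].asfactor()             # fraud / not-fraud decision
train, test = txns.split_frame(ratios=[0.8], seed=99)

dl = H2ODeepLearningEstimator(
    hidden=[600] * 6,                    # 6 layers, 600 neurons each (per the slide)
    activation="RectifierWithDropout",   # activation reported to perform best
    epochs=10,
)
dl.train(x=[c for c in txns.columns if c != "fraud"], y="fraud", training_frame=train)
print(dl.model_performance(test).auc())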
Fraud Prevention with Random Forest
Customer selects song to purchase ($) → payment information entered → data collected → comparison with past consumer behavior → Random Forest determines fraud / not fraud → take steps to stop fraud or prevent future fraud
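A minimal sketch of the scoring step in that flow: a trained Random Forest deciding fraud / not-fraud for a newly collected transaction (H2O Python client; the file names, "fraud" column, and the flagging logic are illustrative assumptions):

import h2o
from h2o.estimators.random_forest import H2ORandomForestEstimator

h2o.init()
history = h2o.import_file("past_transactions.csv")   # past consumer behavior
history["fraud"] = history["fraud"].asfactor()
features = [c for c in history.columns if c != "fraud"]

rf = H2ORandomForestEstimator(ntrees=300)
rf.train(x=features, y="fraud", training_frame=history)

# New purchase event: payment information entered, data collected, then scored
new_txn = h2o.import_file("new_transaction.csv")
pred = rf.predict(new_txn).as_data_frame()            # columns: predict, plus class probabilities
if str(pred.loc[0, "predict"]) == "1":                # assumes fraud is coded as "1"
    print("Flag: take steps to stop fraud or prevent future fraud")
else:
    print("Approve transaction")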
Live Demonstration
Agenda
 Data Science
 Machine Learning
 Trees and Power of Algorithmic Methods
 Examples using H2O Scalable Machine
Learning Engine
Thank You
Editor's Notes

  • #39: Moving away from outdated audience-targeting buckets to utilizing "fresher" real-time data. Other companies use standard audience targeting and bucket Dan as a "tech enthusiast"; we message him at the moments when it's most relevant.