H2O.ai 
Open Source 
Machine Learning 
for Intelligent Applications 
H2O.ai 
Machine Intelligence
Time is the only non-renewable resource 
Speed Matters! 
Sampling 
Law of Large Numbers
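The law of large numbers says that the mean of a random sample converges to the population mean as the sample grows. The sketch below shows this under stated assumptions: values are drawn uniformly from [0, 1), so the true mean is 0.5, and a fixed seed is used. `sample_mean_error` is a hypothetical helper, not part of any H2O API.

```python
import random

# Illustrative sketch; the helper name and the Uniform[0, 1) data are assumptions.
def sample_mean_error(n, seed=42):
    """Absolute gap between the mean of n Uniform[0, 1) draws and the true mean 0.5."""
    rng = random.Random(seed)
    sample_mean = sum(rng.random() for _ in range(n)) / n
    return abs(sample_mean - 0.5)

for n in (10, 1_000, 100_000):
    print(f"n = {n:>6}: |sample mean - 0.5| = {sample_mean_error(n):.5f}")
```

The typical gap shrinks roughly in proportion to 1/sqrt(n), so each extra digit of precision costs about 100 times more data.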
Data scientists & analysts will not write Java MapReduce
On Premise; On / Off Hadoop; On EC2
Per node: 2M rows ingest/sec; 50M rows regression/sec; 750M rows aggregates/sec
Tableau, R, JSON, Scala, Java, Python
H2O Prediction Engine; SDK / API; Nano Fast Scoring Engine
Deep Learning, Regression, Trees, Boosting, Forests, Solvers, Gradients, Ensembles, Cluster, Classify
Query Processor, R-engine, In-Mem Map Reduce, Distributed fork/join, Memory Manager, Columnar Compression
HDFS, S3, SQL, NoSQL, Excel
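Columnar compression stores each column separately so it can be encoded compactly. A minimal sketch of one simple scheme, assuming an integer column whose values span at most 256 consecutive integers: store the column minimum once and each value as a one-byte offset from it. This illustrates the idea only; it is not H2O's actual chunk encoding, and the function names are hypothetical.

```python
from array import array

# Illustrative sketch; function names and the 1-byte-offset scheme are assumptions.
def compress_int_column(values):
    """Store the column minimum once, then each value as a 1-byte offset from it."""
    base = min(values)
    if max(values) - base > 255:
        raise ValueError("value range too wide for 1-byte offsets")
    return base, array("B", (v - base for v in values))

def decompress_int_column(base, offsets):
    """Rebuild the original integers from the base and the offsets."""
    return [base + o for o in offsets]

ages = [34, 35, 29, 61, 47, 35, 29, 52]
base, packed = compress_int_column(ages)
print(base, len(packed) * packed.itemsize, "bytes")  # 29 8 bytes
assert decompress_int_column(base, packed) == ages
```

Eight ages fit in 8 bytes here instead of the 64 bytes they would need as 8-byte integers.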
Infrastructure 
Parallelism 
Data Parallel 
Chunking Express! 
Algorithm Parallel 
Parallel Code blocks 
Math Parallelism 
ADMM, HogWild 
Distribution 
Zero-Serialization – 
endian wars have ended
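Data parallelism splits a column into chunks, runs the same map function on every chunk concurrently, and combines the partial results with an associative reduce step. A minimal single-machine sketch using Python's `concurrent.futures`, assuming an in-memory list stands in for a distributed column and computing mean and variance; the chunk size and function names are illustrative, and H2O's own distributed fork/join framework is not shown.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce

# Illustrative sketch; names and chunk size are hypothetical, not H2O's API.
def chunks(values, size):
    """Split a column into fixed-size chunks (the last one may be shorter)."""
    return [values[i:i + size] for i in range(0, len(values), size)]

def map_chunk(chunk):
    """Per-chunk partial statistics: (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))

def reduce_pair(a, b):
    """Combine two partial results; the order of combination does not matter."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])

def parallel_mean_var(values, chunk_size=1000, workers=4):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(map_chunk, chunks(values, chunk_size)))
    n, s, ss = reduce(reduce_pair, partials)
    mean = s / n
    return mean, ss / n - mean * mean  # population variance

column = [float(i % 10) for i in range(10_000)]
print(parallel_mean_var(column))  # (4.5, 8.25)
```

Because the reduce step only adds tuples, chunks can be processed on any worker in any order; in CPython the threads show the structure rather than a real speedup.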
Scalable Machine Learning 
For Smarter Applications 
Programmable Internet 
Programmable Devices 
AdSense Sense 
Correlation Causality 
Data 
Sensors 
Devices 
Events. Signals. Time series.
Semi-structured data. JSON.
High velocity. 
High dimensions. 
Streaming Data 
Historical Data 
Scoring from prediction 
Anomaly and Outlier Detection
Unsupervised Learning 
Streaming Data 
Historical Data 
Anomaly and Outlier Detection
model 
Scoring from prediction 
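This flow trains a model on historical data and then scores each streaming record against it. A minimal sketch, assuming a single numeric signal and using a simple z-score rule (flag readings more than 3 standard deviations from the historical mean) as a stand-in model. This is not H2O's method (H2O deep learning can instead score rows by autoencoder reconstruction error), and the class name, data, and threshold are hypothetical.

```python
from statistics import mean, pstdev

# Illustrative stand-in model; class name, data, and threshold are hypothetical.
class ZScoreDetector:
    """Fit on historical data, then score streaming readings."""

    def __init__(self, threshold=3.0):
        self.threshold = threshold

    def fit(self, history):
        self.mu = mean(history)
        self.sigma = pstdev(history)
        return self

    def score(self, x):
        """Distance from the historical mean, in standard deviations."""
        return abs(x - self.mu) / self.sigma

    def is_anomaly(self, x):
        return self.score(x) > self.threshold

history = [20.1, 19.8, 20.4, 20.0, 19.9, 20.2, 20.3, 19.7]  # e.g. temperature readings
detector = ZScoreDetector().fit(history)
for reading in [20.0, 20.6, 25.0]:  # simulated streaming records
    print(reading, round(detector.score(reading), 2), detector.is_anomaly(reading))
```

Fitting happens once on the historical batch; scoring each new reading is then a constant-time operation, which is what makes it suitable for streams.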
Streaming Data 
Historical Data 
Clustering / Unsupervised Learning
model 
Scoring from prediction 
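In the clustering variant, the model learns cluster centers from historical data, and each streaming record is scored by assigning it to the nearest center; a record far from every center is also an outlier candidate. A minimal sketch of Lloyd's k-means on 2-D points, assuming Euclidean distance, farthest-first initialisation, and a fixed iteration count. The names and data are hypothetical, and this is not H2O's k-means implementation.

```python
import math

# Illustrative sketch; function names and data are hypothetical.
def nearest(point, centers):
    """Return (index, distance) of the center closest to point."""
    dists = [math.dist(point, c) for c in centers]
    best = min(range(len(centers)), key=lambda i: dists[i])
    return best, dists[best]

def init_centers(points, k):
    """Farthest-first start: the first point, then repeatedly the point
    farthest from every center chosen so far."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: nearest(p, centers)[1]))
    return centers

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: assign points to the nearest center, then move each
    center to the mean of its assigned points (keep it if none are assigned)."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centers)[0]].append(p)
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

history = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
centers = kmeans(history, k=2)
for record in [(1.1, 1.0), (5.1, 5.0), (9.0, 0.0)]:  # simulated streaming records
    cluster, dist = nearest(record, centers)
    print(record, "-> cluster", cluster, "distance", round(dist, 2))
```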
https://developer.nest.com/documentation/api-reference/devices
Take Models to Production in Java 
Onset of Rita 
Common ensemble techniques
Bayes Optimal Classifier
An ensemble of all the hypotheses in the hypothesis space.
Bagging
Each model votes with equal weight.
Bagging trains each model on a randomly drawn subset of the training set.
Boosting
Incrementally builds an ensemble by training each new model to emphasize the training instances that earlier models misclassified.
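Bagging can be sketched in a few lines. A minimal illustration, assuming a one-dimensional binary classification problem and decision stumps (a single threshold) as the base model: each stump is trained on a bootstrap sample drawn with replacement, and all stumps vote with equal weight. The function names, data, and number of models are hypothetical.

```python
import random
from collections import Counter

# Illustrative sketch; names, data, and n_models are hypothetical.
def fit_stump(xs, ys):
    """Choose the threshold t and label orientation with the fewest training errors."""
    best = None
    for t in sorted(set(xs)):
        for above in (1, 0):  # label predicted when x > t
            errors = sum((above if x > t else 1 - above) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, above)
    _, t, above = best
    return lambda x: above if x > t else 1 - above

def bagging(xs, ys, n_models=25, seed=0):
    """Train each stump on a bootstrap sample; predict by equal-weight majority vote."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows sampled with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict

xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # one noisy label at x = 6
predict = bagging(xs, ys)
print([predict(x) for x in (1.5, 9.5)])
```

An odd number of models avoids ties in a two-class vote.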
Gradient Boosting Machine 
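Gradient boosting builds the ensemble one tree at a time, fitting each new tree to the negative gradient of the loss with respect to the current predictions; for squared error that gradient is simply the residual. A minimal sketch for one-dimensional regression, assuming squared-error loss, depth-1 regression trees (stumps), and a fixed learning rate. The names and parameters are hypothetical, not H2O's GBM API.

```python
# Illustrative sketch; names, data, and parameters are hypothetical.
def fit_regression_stump(xs, residuals):
    """Single split x <= t that minimises squared error; each side predicts its mean."""
    best = None
    for t in sorted(set(xs))[:-1]:  # a split at the largest x would leave one side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def gradient_boost(xs, ys, n_trees=50, learning_rate=0.1):
    """Squared-error gradient boosting: each stump fits the current residuals."""
    base = sum(ys) / len(ys)  # best constant prediction under squared error
    preds = [base] * len(xs)
    trees = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of (y - F)^2 / 2
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for p, x in zip(preds, xs)]
    return lambda x: base + learning_rate * sum(tree(x) for tree in trees)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
model = gradient_boost(xs, ys)
print(round(model(2), 2), round(model(7), 2))
```

The learning rate shrinks each tree's contribution, so many small corrections replace one large fit.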
Variable Importance Comparison 
Gradient Boosting Machine, 50 trees 
Random Forest, 50 trees 
Generalized Linear Modeling: Variable Importance
GLM, Elastic Net (Binomial)
GLM, Elastic Net (Binomial), with categorical expansion on Age
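Categorical expansion turns one column into several indicator (one-hot) columns, one per level. As a result, a GLM fits a separate coefficient for each level, and each level receives its own importance. A minimal sketch, assuming Age has already been bucketed into string levels; the bucket labels, column naming, and function name are hypothetical.

```python
# Illustrative sketch; bucket labels, column names, and the helper are hypothetical.
def one_hot(column, prefix="Age"):
    """Expand a categorical column into one 0/1 indicator column per level."""
    levels = sorted(set(column))
    expanded = {f"{prefix}.{lvl}": [1 if v == lvl else 0 for v in column] for lvl in levels}
    return levels, expanded

ages = ["18-29", "30-44", "45-64", "65+", "30-44"]
levels, cols = one_hot(ages)
for name, values in cols.items():
    print(name, values)
```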
Variable Importance Comparison 
Deep Learning (Tanh / 4-layer) 
Deep Learning (Tanh / 3-layer) 
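These comparisons rank the same predictors with different model families, and each family computes importance in its own way. Permutation importance is one model-agnostic alternative and a different technique from those behind the slides. It shuffles one column at a time and measures how much the model's accuracy drops. A minimal sketch, assuming the model is a Python function over a row tuple and accuracy is the metric; the names, toy model, and data are hypothetical.

```python
import random

# Illustrative sketch; names, the toy model, and the data are hypothetical.
def accuracy(model, rows, labels):
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(labels)

def permutation_importance(model, rows, labels, n_repeats=5, seed=0):
    """Mean drop in accuracy when one column at a time is randomly shuffled."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    importances = []
    for j in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in rows]
            rng.shuffle(column)
            shuffled = [row[:j] + (v,) + row[j + 1:] for row, v in zip(rows, column)]
            drops.append(baseline - accuracy(model, shuffled, labels))
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical fitted model: uses column 0 only; column 1 is ignored."""
    return int(row[0] > 0.5)

data_rng = random.Random(1)
rows = [(data_rng.random(), data_rng.random()) for _ in range(200)]
labels = [toy_model(row) for row in rows]
print(permutation_importance(toy_model, rows, labels))
```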
Every generation needs to invent its math.
Our data, our tools! 
Power-Law
Code is incomplete without Community! 
Open Source Matters! 
Community
Committers: 30
Meetups: 90 in 12 months
Coverage
Conference speakers
Curriculum: Stanford, MIT, CSU, SUNY, SJSU, Purdue
Data-Driven Decision Making is hard!
Courage Matters! 
Winning customer trust, not just quarters!
Mindset matters! 
Thanks 
Courtney, Nick & MLConf 
for bringing us to ATL
Sparkling Water Application Life Cycle
(Diagram: Sparkling App jar file, spark-submit, Spark Master JVM, three Spark Worker JVMs, and the resulting Sparkling Water Cluster of three Spark Executor JVMs, each running H2O.)
(1) User submits App to Spark cluster Master node
(2) App distributed to Spark cluster Worker nodes
(3) Spark Executor JVMs start for App
(4) H2O instance starts within each Executor JVM
(5) App’s Scala main program runs
Sparkling Water Data Distribution
(Diagram: a Data Source such as HDFS feeding a Sparkling Water Cluster of three Spark Executor JVMs, each running H2O and holding a Spark RDD and an H2O RDD.)
(1) Use Spark SQL to read data into a Spark RDD
(2) Convert the Spark RDD to an H2O RDD; the H2O RDD is column-based and highly compressed
(Not shown) Run modeling and prediction workflows with H2O
(3) Convert the H2O RDD (e.g. predictions) back to a Spark RDD
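Step (2) changes the data layout: a Spark RDD partition holds rows, while the H2O side stores each column separately. A minimal sketch of that per-partition transpose, and its inverse for step (3), using plain Python lists as hypothetical stand-ins for partitions. It does not use the Sparkling Water API and omits compression; the column names and data are illustrative.

```python
# Illustrative sketch; partitions, column names, and helpers are hypothetical.
def rows_to_columns(partition, names):
    """Turn one partition's list of row tuples into a dict of column lists."""
    return {name: [row[i] for row in partition] for i, name in enumerate(names)}

def columns_to_rows(columns, names):
    """Inverse of rows_to_columns: rebuild row tuples from the column lists."""
    return list(zip(*(columns[name] for name in names)))

names = ["user", "amount", "fraud"]
partitions = [
    [("u1", 12.5, 0), ("u2", 980.0, 1)],  # partition held by executor 1
    [("u3", 7.25, 0)],                    # partition held by executor 2
]
columnar = [rows_to_columns(p, names) for p in partitions]
print(columnar[0]["amount"])  # [12.5, 980.0]
assert [columns_to_rows(c, names) for c in columnar] == partitions
```

Each partition is converted where it already lives, so this step needs no data movement between executors.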
(Diagram: three ways to run H2O on Hadoop, each over HDFS)
Standalone: H2O on HDFS
YARN: H2O on YARN, over HDFS
H2O in MR: H2O inside Hadoop MR, over HDFS
HortonWorks, Cloudera, MapR, Intel
H2O – The Killer-App for Spark
(Diagram: Sparkling Water combining MLlib, H2O and SQL, exchanging data through H2ORDD, with HDFS as the data layer)
In-Memory: Big Data, Columnar
ML: 100x faster Algos
R: CRAN, API, fast engine
API: Spark API, Java MM
Community: Devs, Data Science
examples 
Fraud / No-fraud 
1/1000 unbalanced 
Click-Stream 
Browse / Click / Buy 
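With fraud at roughly 1 in 1,000 records, a model that always predicts "no fraud" is 99.9% accurate and useless. One common remedy is to weight each class inversely to its frequency. A minimal sketch, assuming the "balanced" weighting n / (k * count_c) for k classes; the function name and labels are hypothetical.

```python
from collections import Counter

# Illustrative sketch; the helper name and labels are hypothetical.
def balanced_class_weights(labels):
    """Weight for class c = n / (k * count_c), so every class carries equal total weight."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}

labels = ["no-fraud"] * 999 + ["fraud"]
print(balanced_class_weights(labels))
```

Each class then contributes the same total weight (500 here), which a training loss can apply as per-row weights.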
Propensity Models 
Merchants-to-Users
Lifetime Value of Customer 
Pricing Engines 

H2O 0xdata MLconf
