SlideShare a Scribd company logo
1 of 32
Distributed GLM
Tom Kraljevic
January 27, 2015
Atlanta, GA @ Polygon
• About H2O.ai (the company) (5 minutes)
• About H2O (the software) (10 minutes)
• H2O’s Distributed GLM (30 minutes)
• Demo of GLM (15 minutes)
• Q & A (up to 30 minutes)
Content for today’s talk can be found at:
https://github.com/h2oai/h2o-meetups/tree/master/2015_01_27_GLM
Outline for today’s talk
H2O.ai
Machine Intelligence
• Founded: 2011 venture-backed, debuted in 2012
• Product: H2O open source in-memory prediction engine
• Team: 30
• HQ: Mountain View, CA
• SriSatish Ambati – CEO & Co-founder (Founder Platfora, DataStax; Azul)
• Cliff Click – CTO & Co-founder (Creator Hotspot, Azul, Sun, Motorola, HP)
• Tom Kraljevic – VP of Engineering (CTO & Founder Luminix, Azul, Chromatic)
H2O.ai Overview
H2O.ai
Machine Intelligence
TBD.
Customer
Support
TBD
Head of Sales
Distributed
Systems
Engineers
Making
ML Scale!
H2O.ai
Machine Intelligence
H2O.ai
Machine Intelligence
cientific Advisory Council
Stephen Boyd
Professor of EE Engineering
Stanford University
Rob Tibshirani
Professor of Health Research
and Policy, and Statistics
Stanford University
Trevor Hastie
Professor of Statistics
Stanford University
H2O.ai
Machine Intelligence
What is H2O?
Open source in-memory prediction engineMath Platform
• Parallelized and distributed algorithms making the most use out of
multithreaded systems
• GLM, Random Forest, GBM, PCA, etc.
Easy to use and adoptAPI
• Written in Java – perfect for Java Programmers
• REST API (JSON) – drives H2O from R, Python, Excel, Tableau
More data? Or better models? BOTHBig Data
• Use all of your data – model without down sampling
• Run a simple GLM or a more complex GBM to find the best fit for the data
• More Data + Better Models = Better Predictions
Ensembles
Deep Neural Networks
Algorithms on H2O
• Generalized Linear Models: Binomial,
Gaussian, Gamma, Poisson and Tweedie
• Cox Proportional Hazards Models
• Naïve Bayes
• Distributed Random Forest: Classification or
regression models
• Gradient Boosting Machine: Produces an
ensemble of decision trees with increasing
refined approximations
• Deep learning: Create multi-layer feed
forward neural networks starting with an input
layer followed by multiple layers of nonlinear
transformations
Supervised Learning
Statistical
Analysis
H2O.ai
Machine Intelligence
Dimensionality Reduction
Anomaly Detection
Algorithms on H2O
• K-means: Partitions observations into k
clusters/groups of the same spatial size
• Principal Component Analysis: Linearly
transforms correlated variables to independent
components
• Autoencoders: Find outliers using a nonlinear
dimensionality reduction using deep learning
Unsupervised Learning
Clustering
H2O.ai
Machine Intelligence
Per Node
2M Row ingest/sec
50M Row Regression/sec
750M Row Aggregates / sec
On Premise
On Hadoop & Spark
On EC2
Tableau
R
JSON
Scala
Java
H2O Prediction Engine
Nano Fast Scoring Engine
Memory Manager
Columnar Compression
Rapids Query R-engine
In-Mem Map Reduce
Distributed fork/join
Python
HDFS S3 SQL NoSQL
SDK / API
Excel
H2O.ai
Machine Intelligence
Deep Learning
Ensembles
Distributed GLM
H2O.ai
Machine Intelligence
Distributed Data Taxonomy
Vector
H2O.ai
Machine Intelligence
Distributed Data Taxonomy
Vector
The vector may be very large
(billions of rows)
- Stored as a compressed column (often 4x)
- Access as Java primitives with
on-the-fly decompression
- Support fast Random access
- Modifiable with Java memory semantics
H2O.ai
Machine Intelligence
Distributed Data Taxonomy
Vector Large vectors must be distributed
over multiple JVMs
- Vector is split into chunks
- Chunk is a unit of parallel access
- Each chunk ~ 1000 elements
- Per-chunk compression
- Homed to a single node
- Can be spilled to disk
- GC very cheap
H2O.ai
Machine Intelligence
Distributed Data Taxonomy
age gender zip_code ID
A row of data is always
stored in a single JVM
Distributed data frame
- Similar to R data frame
- Adding and removing
columns is cheap
H2O.ai
Machine Intelligence
Distributed Fork/Join
JVM
task
JVM
task
JVM
task
JVM
task
JVM
task
Task is distributed in a
tree pattern
- Results are reduced
at each inner node
- Returns with a single
result when all
subtasks done
H2O.ai
Machine Intelligence
Distributed Fork/Join
JVM
task
JVM
task
task task
tasktaskchunk
chunk chunk
On each node the task is parallelized using Fork/Join
H2O.ai
Machine Intelligence
H2O’s GLM Fit
min
b
1
N
log_likelihood(family, b)( + )l(a b 1 + b 2)
1-a
2
Classic GLM Regularization
penalty
H2O.ai
Machine Intelligence
E(y) = link-1(b0 + b1x1 + b2x2 + … bnxn)
Model interpretability
Features of H2O’s GLM
• Support for various GLM families
– Gaussian (numeric regression)
– Binomial (binary classification / logistic regression)
– Poisson
– Gamma
• Regularization
– L1 (Lasso) (+ Strong rules)
– L2 (Ridge regression)
– Elastic-net
• Automatic and efficient handling of categorical variables
• Efficient distributed n-fold cross validation
• Grid search over elastic-net parameter α
• Lambda search for best model
• Upper and lower bounds for coefficients
H2O.ai
Machine Intelligence
Input Parameters:
Predictors, Response and Family
df = iris
df$is_setosa = df$Species == 'setosa'
library(h2o)
h = h2o.init()
h2odf = as.h2o(h, df)
# Y is a T/F value
y = 'is_setosa'
x = c("Sepal.Length", "Sepal.Width",
"Petal.Length", "Petal.Width")
binary_classification_model =
h2o.glm(data = h2odf, y = y, x = x, family = "binomial")
# Y is a real value
y = 'Sepal.Length'
x = c("Sepal.Width",
"Petal.Length", "Petal.Width")
numeric_regression_model =
h2o.glm(data = h2odf, y = y, x = x, family = "gaussian")H2O.ai
Machine Intelligence
Input Parameters:
Number of iterations
H2O.ai
Machine Intelligence
# Specify maximum number of IRLSM iterations
# (default is 100)
h2o.glm(…, iter.max = 50)
Input Parameters:
Regularization
# Enable lambda_search
h2o.glm(…, lambda_search = TRUE)
# Enable strong_rules (requires lambda_search)
h2o.glm(…, lambda_search = TRUE, strong_rules = TRUE)
# Specify max_predictors (requires lambda_search)
h2o.glm(…, lambda_search = TRUE, max_predictors = 100)
# Grid search over alpha
h2o.glm(…, alpha = c(0.05, 0.5, 0.95))
# Turn off regularization entirely
h2o.glm(…, lambda = 0)
H2O.ai
Machine Intelligence
Input Parameters:
Cross validation
# Specify 5-fold cross validation
model_with_5folds = h2o.glm(data = h2odf, y = y, x = x, family = "binomial",
nfolds = 5)
print(model_with_5folds@model$auc)
print(model_with_5folds@xval[[1]]@model$auc)
print(model_with_5folds@xval[[2]]@model$auc)
print(model_with_5folds@xval[[3]]@model$auc)
print(model_with_5folds@xval[[4]]@model$auc)
print(model_with_5folds@xval[[5]]@model$auc)
H2O.ai
Machine Intelligence
Outputs
• Coefficients
• Normalized coefficients (variable importance)
H2O.ai
Machine Intelligence
Coefficients:
Dest.ABQ Dest.ACY Dest.ALB Dest.AMA Dest.ANC
0.80322 -0.06288 0.13333 0.14092 0.92581
Dest.AT Dest.AUS Dest.AVL Dest.AVP Dest.BDL
-0.21849 0.78392 -0.34974 -0.31825 0.38924
. .
. .
DayofMonth DayOfWeek CRSDepTime CRSArrTime FlightNum
-0.03087 0.02110 0.00029 0.00027 0.00007
Distance Intercept
0.00024 136.69614
Outputs
• Null deviance, residual deviance
• AIC (Akaike information criterion)
• Deviance explained
H2O.ai
Machine Intelligence
Degrees of Freedom: 43942 Total (i.e. Null); 43668 Residual
Null Deviance: 60808
Residual Deviance: 54283 AIC: 54833
Deviance Explained: 0.10731
Outputs
• Confusion matrix (for logistic regression)
• AUC (for logistic regression)
H2O.ai
Machine Intelligence
Confusion Matrix:
Predicted
Actual false true Error
false 6747 14126 0.67676
true 2490 20580 0.10793
Totals 9237 34706 0.37813
AUC = 0.7166454 (on train)
GLM Lifecycle
H2O.ai
Machine Intelligence
(1) Calc gradient of null model (mean)
Done
(2) Get next l
(3) Calc active predictors
(4) Calc Gram Matrix (XTX)
(5) ADMM + Cholesky decomposition
(6) Calc gradient on all predictors
(7) KKT checks
Start
Iterative
Reweighted
Least Squares
Method
to compute b
GLM Runtime Cost
H2O.ai
Machine Intelligence
Calc Gram Matrix (XTX)
ADMM + Cholesky
decomposition
CPU Memory
O(
M * N2
p * n
) O(M * N) + O(N2 * p * n)
O(
N3
p
)
M Number of rows in the training data
N Number of predictors in the training data
p Number of CPUs per node
n Number of nodes in the cluster
O(N2)
(the training data)
Best Practices
• GLM works best on tall and skinny datasets
– But if you have a wide dataset, use L1 penalty and Strong Rules
to eliminate columns from the model
• Give lambda search a shot
– But specify strong_rules and/or max_predictors if it is taking too
long
– 90% of the time is spent on the larger models with the small
lambdas, so specifying max_predictors helps a lot
• Keep a little bit of L2 for numerical stability (i.e. don’t use
alpha 1.0, use 0.95 instead)
• Use symmetric nodes in your cluster
• Bigger nodes can help the ADMM / Cholesky run faster
• Impute if you need to before running GLM
H2O.ai
Machine Intelligence
Things to Watch Out for
• Look for suspiciously different cross-validation results between
folds
• Look for explained deviance
– Too close to 0: model doesn’t predict well
– Too close to 1: model predicts “too” well (one of your input cols is
cheating)
• Same for AUC
– Too close to 0.5: model doesn’t predict well
– Too close to 1: model predicts “too” well
• See if GLM stops early for a particular lambda that interests you
(performing all the iterations probably means the solution isn’t
good)
• Too many N/As in your data (GLM discards rows with N/A values)
– If you have a really bad column, you might accidentally be losing all
your rows.
H2O.ai
Machine Intelligence
H2O.ai
Machine Intelligence
Demonstration
H2O.ai
Machine Intelligence
Q & A
H2O.ai
Machine Intelligence
Thanks for attending!
Content for today’s talk can be found at:
https://github.com/h2oai/h2o-meetups/tree/master/2015_01_27_GLM

More Related Content

What's hot

Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Spark Summit
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...Databricks
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache SparkCloudera, Inc.
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
 
Optimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone MLOptimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone MLSpark Summit
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXrhatr
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Jen Aman
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OSri Ambati
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsJen Aman
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Spark Summit
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!DataWorks Summit
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersJen Aman
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkFlink Forward
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkCloudera, Inc.
 
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Databricks
 
Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Milind Bhandarkar
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerSpark Summit
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce AlgorithmsAmund Tveit
 
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit
 

What's hot (20)

Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
Sketching Data with T-Digest In Apache Spark: Spark Summit East talk by Erik ...
 
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
ADMM-Based Scalable Machine Learning on Apache Spark with Sauptik Dhar and Mo...
 
Anomaly Detection with Apache Spark
Anomaly Detection with Apache SparkAnomaly Detection with Apache Spark
Anomaly Detection with Apache Spark
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Optimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone MLOptimizing Terascale Machine Learning Pipelines with Keystone ML
Optimizing Terascale Machine Learning Pipelines with Keystone ML
 
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphXIntroduction into scalable graph analysis with Apache Giraph and Spark GraphX
Introduction into scalable graph analysis with Apache Giraph and Spark GraphX
 
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
Massive Simulations In Spark: Distributed Monte Carlo For Global Health Forec...
 
High Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2OHigh Performance Machine Learning in R with H2O
High Performance Machine Learning in R with H2O
 
Enhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable StatisticsEnhancing Spark SQL Optimizer with Reliable Statistics
Enhancing Spark SQL Optimizer with Reliable Statistics
 
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
Engineering Fast Indexes for Big-Data Applications: Spark Summit East talk by...
 
Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!Surge: Rise of Scalable Machine Learning at Yahoo!
Surge: Rise of Scalable Machine Learning at Yahoo!
 
Time-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity ClustersTime-Evolving Graph Processing On Commodity Clusters
Time-Evolving Graph Processing On Commodity Clusters
 
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in FlinkAnwar Rizal – Streaming & Parallel Decision Tree in Flink
Anwar Rizal – Streaming & Parallel Decision Tree in Flink
 
Large Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache SparkLarge Scale Machine Learning with Apache Spark
Large Scale Machine Learning with Apache Spark
 
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
Flare: Scale Up Spark SQL with Native Compilation and Set Your Data on Fire! ...
 
Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011Modeling with Hadoop kdd2011
Modeling with Hadoop kdd2011
 
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S OptimizerDeep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
Deep Dive Into Catalyst: Apache Spark 2.0'S Optimizer
 
Mapreduce Algorithms
Mapreduce AlgorithmsMapreduce Algorithms
Mapreduce Algorithms
 
Distributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark MeetupDistributed Deep Learning + others for Spark Meetup
Distributed Deep Learning + others for Spark Meetup
 
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
Accumulo Summit 2015: Using D4M for rapid prototyping of analytics for Apache...
 

Viewers also liked

Sparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public SafetySparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public SafetySri Ambati
 
Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Sri Ambati
 
H2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelH2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelSri Ambati
 
Machine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of ThingsMachine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of ThingsSri Ambati
 
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SFH2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SFSri Ambati
 
H2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellH2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellSri Ambati
 
General Linear Model | Statistics
General Linear Model | StatisticsGeneral Linear Model | Statistics
General Linear Model | StatisticsTransweb Global Inc
 
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramH2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramSri Ambati
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Intro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverIntro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverSri Ambati
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...Sri Ambati
 
Making of R2D3 - Tony Chu
Making of R2D3 - Tony ChuMaking of R2D3 - Tony Chu
Making of R2D3 - Tony ChuSri Ambati
 
実装ディープラーニング
実装ディープラーニング実装ディープラーニング
実装ディープラーニングYurie Oka
 
Reading the Lindley-Smith 1973 paper on linear Bayes estimators
Reading the Lindley-Smith 1973 paper on linear Bayes estimatorsReading the Lindley-Smith 1973 paper on linear Bayes estimators
Reading the Lindley-Smith 1973 paper on linear Bayes estimatorsChristian Robert
 

Viewers also liked (15)

Sparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public SafetySparkling Water Meetup: Deep Learning for Public Safety
Sparkling Water Meetup: Deep Learning for Public Safety
 
Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015Python and H2O with Cliff Click at PyData Dallas 2015
Python and H2O with Cliff Click at PyData Dallas 2015
 
H2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno CandelH2O World - Welcome to H2O World with Arno Candel
H2O World - Welcome to H2O World with Arno Candel
 
Machine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of ThingsMachine Learning for the Sensored Internet of Things
Machine Learning for the Sensored Internet of Things
 
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SFH2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
H2O Machine Learning and Kalman Filters for Machine Prognostics - Galvanize SF
 
H2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine UdellH2O World - Generalized Low Rank Models - Madeleine Udell
H2O World - Generalized Low Rank Models - Madeleine Udell
 
General Linear Model | Statistics
General Linear Model | StatisticsGeneral Linear Model | Statistics
General Linear Model | Statistics
 
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda IngramH2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
H2O World - Migrating from Proprietary Analytics Software - Fonda Ingram
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Intro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - DenverIntro to Machine Learning with H2O and Python - Denver
Intro to Machine Learning with H2O and Python - Denver
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
H2O World - Transamerica's Product Recommender Platform - Vishal Bamba & Niti...
 
Making of R2D3 - Tony Chu
Making of R2D3 - Tony ChuMaking of R2D3 - Tony Chu
Making of R2D3 - Tony Chu
 
実装ディープラーニング
実装ディープラーニング実装ディープラーニング
実装ディープラーニング
 
Reading the Lindley-Smith 1973 paper on linear Bayes estimators
Reading the Lindley-Smith 1973 paper on linear Bayes estimatorsReading the Lindley-Smith 1973 paper on linear Bayes estimators
Reading the Lindley-Smith 1973 paper on linear Bayes estimators
 

Similar to Distributed GLM with H2O - Atlanta Meetup

Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupSri Ambati
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatternsgrepalex
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopAvkash Chauhan
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudRevolution Analytics
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13Dave Gardner
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the TrenchesYan Cui
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13Dave Gardner
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on RailsAvi Kedar
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth PresentationTimothy Fitz
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth PresentationEric Ries
 
Gopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracowGopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracowMateuszSzczyrzyca
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdDatabricks
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...MLconf
 
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...DevClub_lv
 
MassTLC Cloud Summit Keynote
MassTLC Cloud Summit KeynoteMassTLC Cloud Summit Keynote
MassTLC Cloud Summit KeynoteAriel Tseitlin
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Java Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real worldJava Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real worldКирилл Толкачёв
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Storyvanphp
 
CI/CD on AWS Deploy Everything All the Time
CI/CD on AWS Deploy Everything All the TimeCI/CD on AWS Deploy Everything All the Time
CI/CD on AWS Deploy Everything All the TimeAmazon Web Services
 

Similar to Distributed GLM with H2O - Atlanta Meetup (20)

Machine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville MeetupMachine Learning for Smarter Apps - Jacksonville Meetup
Machine Learning for Smarter Apps - Jacksonville Meetup
 
Avoiding big data antipatterns
Avoiding big data antipatternsAvoiding big data antipatterns
Avoiding big data antipatterns
 
Applied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R WorkshopApplied Machine learning using H2O, python and R Workshop
Applied Machine learning using H2O, python and R Workshop
 
Speed up R with parallel programming in the Cloud
Speed up R with parallel programming in the CloudSpeed up R with parallel programming in the Cloud
Speed up R with parallel programming in the Cloud
 
Planning to Fail #phpne13
Planning to Fail #phpne13Planning to Fail #phpne13
Planning to Fail #phpne13
 
AWS Lambda from the Trenches
AWS Lambda from the TrenchesAWS Lambda from the Trenches
AWS Lambda from the Trenches
 
Planning to Fail #phpuk13
Planning to Fail #phpuk13Planning to Fail #phpuk13
Planning to Fail #phpuk13
 
Web Development using Ruby on Rails
Web Development using Ruby on RailsWeb Development using Ruby on Rails
Web Development using Ruby on Rails
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
 
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
Just In Time Scalability  Agile Methods To Support Massive Growth PresentationJust In Time Scalability  Agile Methods To Support Massive Growth Presentation
Just In Time Scalability Agile Methods To Support Massive Growth Presentation
 
Gopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracowGopher in performance_tales_ms_go_cracow
Gopher in performance_tales_ms_go_cracow
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
Justin Basilico, Research/ Engineering Manager at Netflix at MLconf SF - 11/1...
 
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
“Machine Learning in Production + Case Studies” by Dmitrijs Lvovs from Epista...
 
MassTLC Cloud Summit Keynote
MassTLC Cloud Summit KeynoteMassTLC Cloud Summit Keynote
MassTLC Cloud Summit Keynote
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Java Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real worldJava Day Minsk 2016 Keynote about Microservices in real world
Java Day Minsk 2016 Keynote about Microservices in real world
 
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling StoryPHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
PHP At 5000 Requests Per Second: Hootsuite’s Scaling Story
 
CI/CD on AWS Deploy Everything All the Time
CI/CD on AWS Deploy Everything All the TimeCI/CD on AWS Deploy Everything All the Time
CI/CD on AWS Deploy Everything All the Time
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrandmasabamasaba
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...masabamasaba
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...Jittipong Loespradit
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...masabamasaba
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastPapp Krisztián
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...masabamasaba
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024VictoriaMetrics
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Bert Jan Schrijver
 

Recently uploaded (20)

WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
WSO2CON 2024 - API Management Usage at La Poste and Its Impact on Business an...
 
WSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go PlatformlessWSO2CON2024 - It's time to go Platformless
WSO2CON2024 - It's time to go Platformless
 
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
Abortion Pill Prices Tembisa [(+27832195400*)] 🏥 Women's Abortion Clinic in T...
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand%in Midrand+277-882-255-28 abortion pills for sale in midrand
%in Midrand+277-882-255-28 abortion pills for sale in midrand
 
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
%+27788225528 love spells in Knoxville Psychic Readings, Attraction spells,Br...
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
MarTech Trend 2024 Book : Marketing Technology Trends (2024 Edition) How Data...
 
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With SimplicityWSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
WSO2Con2024 - Enabling Transactional System's Exponential Growth With Simplicity
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
%+27788225528 love spells in new york Psychic Readings, Attraction spells,Bri...
 
Architecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the pastArchitecture decision records - How not to get lost in the past
Architecture decision records - How not to get lost in the past
 
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
%+27788225528 love spells in Atlanta Psychic Readings, Attraction spells,Brin...
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
Large-scale Logging Made Easy: Meetup at Deutsche Bank 2024
 
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
WSO2CON 2024 - Cloud Native Middleware: Domain-Driven Design, Cell-Based Arch...
 
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
Devoxx UK 2024 - Going serverless with Quarkus, GraalVM native images and AWS...
 

Distributed GLM with H2O - Atlanta Meetup

  • 1. Distributed GLM Tom Kraljevic January 27, 2015 Atlanta, GA @ Polygon
  • 2. • About H2O.ai (the company) (5 minutes) • About H2O (the software) (10 minutes) • H2O’s Distributed GLM (30 minutes) • Demo of GLM (15 minutes) • Q & A (up to 30 minutes) Content for today’s talk can be found at: https://github.com/h2oai/h2o-meetups/tree/master/2015_01_27_GLM Outline for today’s talk H2O.ai Machine Intelligence
  • 3. • Founded: 2011 venture-backed, debuted in 2012 • Product: H2O open source in-memory prediction engine • Team: 30 • HQ: Mountain View, CA • SriSatish Ambati – CEO & Co-founder (Founder Platfora, DataStax; Azul) • Cliff Click – CTO & Co-founder (Creator Hotspot, Azul, Sun, Motorola, HP) • Tom Kraljevic – VP of Engineering (CTO & Founder Luminix, Azul, Chromatic) H2O.ai Overview H2O.ai Machine Intelligence
  • 5. H2O.ai Machine Intelligence cientific Advisory Council Stephen Boyd Professor of EE Engineering Stanford University Rob Tibshirani Professor of Health Research and Policy, and Statistics Stanford University Trevor Hastie Professor of Statistics Stanford University
  • 6. H2O.ai Machine Intelligence What is H2O? Open source in-memory prediction engineMath Platform • Parallelized and distributed algorithms making the most use out of multithreaded systems • GLM, Random Forest, GBM, PCA, etc. Easy to use and adoptAPI • Written in Java – perfect for Java Programmers • REST API (JSON) – drives H2O from R, Python, Excel, Tableau More data? Or better models? BOTHBig Data • Use all of your data – model without down sampling • Run a simple GLM or a more complex GBM to find the best fit for the data • More Data + Better Models = Better Predictions
  • 7. Ensembles Deep Neural Networks Algorithms on H2O • Generalized Linear Models: Binomial, Gaussian, Gamma, Poisson and Tweedie • Cox Proportional Hazards Models • Naïve Bayes • Distributed Random Forest: Classification or regression models • Gradient Boosting Machine: Produces an ensemble of decision trees with increasing refined approximations • Deep learning: Create multi-layer feed forward neural networks starting with an input layer followed by multiple layers of nonlinear transformations Supervised Learning Statistical Analysis H2O.ai Machine Intelligence
  • 8. Dimensionality Reduction Anomaly Detection Algorithms on H2O • K-means: Partitions observations into k clusters/groups of the same spatial size • Principal Component Analysis: Linearly transforms correlated variables to independent components • Autoencoders: Find outliers using a nonlinear dimensionality reduction using deep learning Unsupervised Learning Clustering H2O.ai Machine Intelligence
  • 9. Per Node 2M Row ingest/sec 50M Row Regression/sec 750M Row Aggregates / sec On Premise On Hadoop & Spark On EC2 Tableau R JSON Scala Java H2O Prediction Engine Nano Fast Scoring Engine Memory Manager Columnar Compression Rapids Query R-engine In-Mem Map Reduce Distributed fork/join Python HDFS S3 SQL NoSQL SDK / API Excel H2O.ai Machine Intelligence Deep Learning Ensembles
  • 12. Distributed Data Taxonomy Vector The vector may be very large (billions of rows) - Stored as a compressed column (often 4x) - Access as Java primitives with on-the-fly decompression - Support fast Random access - Modifiable with Java memory semantics H2O.ai Machine Intelligence
  • 13. Distributed Data Taxonomy Vector Large vectors must be distributed over multiple JVMs - Vector is split into chunks - Chunk is a unit of parallel access - Each chunk ~ 1000 elements - Per-chunk compression - Homed to a single node - Can be spilled to disk - GC very cheap H2O.ai Machine Intelligence
  • 14. Distributed Data Taxonomy age gender zip_code ID A row of data is always stored in a single JVM Distributed data frame - Similar to R data frame - Adding and removing columns is cheap H2O.ai Machine Intelligence
  • 15. Distributed Fork/Join JVM task JVM task JVM task JVM task JVM task Task is distributed in a tree pattern - Results are reduced at each inner node - Returns with a single result when all subtasks done H2O.ai Machine Intelligence
  • 16. Distributed Fork/Join JVM task JVM task task task tasktaskchunk chunk chunk On each node the task is parallelized using Fork/Join H2O.ai Machine Intelligence
  • 17. H2O’s GLM Fit min b 1 N log_likelihood(family, b)( + )l(a b 1 + b 2) 1-a 2 Classic GLM Regularization penalty H2O.ai Machine Intelligence E(y) = link-1(b0 + b1x1 + b2x2 + … bnxn) Model interpretability
  • 18. Features of H2O’s GLM • Support for various GLM families – Gaussian (numeric regression) – Binomial (binary classification / logistic regression) – Poisson – Gamma • Regularization – L1 (Lasso) (+ Strong rules) – L2 (Ridge regression) – Elastic-net • Automatic and efficient handling of categorical variables • Efficient distributed n-fold cross validation • Grid search over elastic-net parameter α • Lambda search for best model • Upper and lower bounds for coefficients H2O.ai Machine Intelligence
  • 19. Input Parameters: Predictors, Response and Family df = iris df$is_setosa = df$Species == 'setosa' library(h2o) h = h2o.init() h2odf = as.h2o(h, df) # Y is a T/F value y = 'is_setosa' x = c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width") binary_classification_model = h2o.glm(data = h2odf, y = y, x = x, family = "binomial") # Y is a real value y = 'Sepal.Length' x = c("Sepal.Width", "Petal.Length", "Petal.Width") numeric_regression_model = h2o.glm(data = h2odf, y = y, x = x, family = "gaussian")H2O.ai Machine Intelligence
  • 20. Input Parameters: Number of iterations H2O.ai Machine Intelligence # Specify maximum number of IRLSM iterations # (default is 100) h2o.glm(…, iter.max = 50)
  • 21. Input Parameters: Regularization # Enable lambda_search h2o.glm(…, lambda_search = TRUE) # Enable strong_rules (requires lambda_search) h2o.glm(…, lambda_search = TRUE, strong_rules = TRUE) # Specify max_predictors (requires lambda_search) h2o.glm(…, lambda_search = TRUE, max_predictors = 100) # Grid search over alpha h2o.glm(…, alpha = c(0.05, 0.5, 0.95)) # Turn off regularization entirely h2o.glm(…, lambda = 0) H2O.ai Machine Intelligence
  • 22. Input Parameters: Cross validation # Specify 5-fold cross validation model_with_5folds = h2o.glm(data = h2odf, y = y, x = x, family = "binomial", nfolds = 5) print(model_with_5folds@model$auc) print(model_with_5folds@xval[[1]]@model$auc) print(model_with_5folds@xval[[2]]@model$auc) print(model_with_5folds@xval[[3]]@model$auc) print(model_with_5folds@xval[[4]]@model$auc) print(model_with_5folds@xval[[5]]@model$auc) H2O.ai Machine Intelligence
  • 23. Outputs • Coefficients • Normalized coefficients (variable importance) H2O.ai Machine Intelligence Coefficients: Dest.ABQ Dest.ACY Dest.ALB Dest.AMA Dest.ANC 0.80322 -0.06288 0.13333 0.14092 0.92581 Dest.AT Dest.AUS Dest.AVL Dest.AVP Dest.BDL -0.21849 0.78392 -0.34974 -0.31825 0.38924 . . . . DayofMonth DayOfWeek CRSDepTime CRSArrTime FlightNum -0.03087 0.02110 0.00029 0.00027 0.00007 Distance Intercept 0.00024 136.69614
  • 24. Outputs • Null deviance, residual deviance • AIC (Akaike information criterion) • Deviance explained H2O.ai Machine Intelligence Degrees of Freedom: 43942 Total (i.e. Null); 43668 Residual Null Deviance: 60808 Residual Deviance: 54283 AIC: 54833 Deviance Explained: 0.10731
  • 25. Outputs • Confusion matrix (for logistic regression) • AUC (for logistic regression) H2O.ai Machine Intelligence Confusion Matrix: Predicted Actual false true Error false 6747 14126 0.67676 true 2490 20580 0.10793 Totals 9237 34706 0.37813 AUC = 0.7166454 (on train)
  • 26. GLM Lifecycle H2O.ai Machine Intelligence (1) Calc gradient of null model (mean) Done (2) Get next l (3) Calc active predictors (4) Calc Gram Matrix (XTX) (5) ADMM + Cholesky decomposition (6) Calc gradient on all predictors (7) KKT checks Start Iterative Reweighted Least Squares Method to compute b
  • 27. GLM Runtime Cost H2O.ai Machine Intelligence Calc Gram Matrix (XTX) ADMM + Cholesky decomposition CPU Memory O( M * N2 p * n ) O(M * N) + O(N2 * p * n) O( N3 p ) M Number of rows in the training data N Number of predictors in the training data p Number of CPUs per node n Number of nodes in the cluster O(N2) (the training data)
  • 28. Best Practices • GLM works best on tall and skinny datasets – But if you have a wide dataset, use L1 penalty and Strong Rules to eliminate columns from the model • Give lambda search a shot – But specify strong_rules and/or max_predictors if it is taking too long – 90% of the time is spent on the larger models with the small lambdas, so specifying max_predictors helps a lot • Keep a little bit of L2 for numerical stability (i.e. don’t use alpha 1.0, use 0.95 instead) • Use symmetric nodes in your cluster • Bigger nodes can help the ADMM / Cholesky run faster • Impute if you need to before running GLM H2O.ai Machine Intelligence
  • 29. Things to Watch Out for • Look for suspiciously different cross-validation results between folds • Look for explained deviance – Too close to 0: model doesn’t predict well – Too close to 1: model predicts “too” well (one of your input cols is cheating) • Same for AUC – Too close to 0.5: model doesn’t predict well – Too close to 1: model predicts “too” well • See if GLM stops early for a particular lambda that interests you (performing all the iterations probably means the solution isn’t good) • Too many N/As in your data (GLM discards rows with N/A values) – If you have a really bad column, you might accidentally be losing all your rows. H2O.ai Machine Intelligence
  • 32. Q & A H2O.ai Machine Intelligence Thanks for attending! Content for today’s talk can be found at: https://github.com/h2oai/h2o-meetups/tree/master/2015_01_27_GLM

Editor's Notes

  1. L1 penalization (lasso) is a technique for eliminating predictors from the model. L1 leads to a sparse solution, which is good for model interpretability L2 penalization (ridge) is a technique that tends to improve numerical stability and push the coefficients of correlated columns toward each other Elastic-net combines L1 and L2 with an alpha parameter
  2. The null deviance shows how well the response is predicted by the model with nothing but an intercept. The residual deviance shows how well the response is predicted by the model when the predictors are included. The Akaike information criterion (AIC) is a measure of the relative quality of a statistical model for a given set of data. That is, given a collection of models for the data, AIC estimates the quality of each model, relative to the other models. Hence, AIC provides a means for model selection. AIC is founded on information theory: it offers a relative estimate of the information lost when a given model is used to represent the process that generates the data. In doing so, it deals with the trade-off between the goodness of fit of the model and the complexity of the model. AIC does not provide a test of a model in the sense of testing a null hypothesis; i.e. AIC can tell nothing about the quality of the model in an absolute sense. If all the candidate models fit poorly, AIC will not give any warning of that. Null and Residual deviance can be combined to give the proportion deviance explained, a generalization of r^2.
  3. This diagram shows the full GLM lifecycle (1) GLM starts by creating a null model, which is just the mean (2) When using Lambda search, GLM tries many different values starting at LambdaMax and decreasing from there (3) Strong rules calculate the active predictors based on a warm start from the previous iteration (4, 5) We convert the log likelihood problem into an Iterative Reweighted Least Squares problem so we can distribute it across many nodes. (4) The Gram matrix calculation (Green box) is distributed. This is big data. (5) The ADMM and Cholesky calculations are done on a single node (although the Cholesky is done in parallel). ADMM solves for the non-differentiable Lasso penalty. This is (usually) small data. ADMM solves on the smaller Gram, not the entire training dataset of X. (6, 7) Calculate gradient again based on new Beta values. Check for KKT stopping criteria.