2. H2O.ai
Machine Intelligence
Outline
• Introduction to H2O
• GLM Overview
• Quick demo on Airlines Data
• Overview of H2O GLM features
• Common usage patterns
  • finding optimal regularization
  • handling wide datasets
• Kaggle Example
  • Avito Dataset overview
  • basic model
  • feature engineering
  • final model building
H2O - Product Overview
• In-Memory ML: Memory-Efficient Data Structures; Cutting-Edge Algorithms
• Distributed: Use all your Data (No Sampling); Accuracy with Speed and Scale
• Open Source: Ownership of Methods (Apache V2); Easy to Deploy: Bare, Hadoop, Spark, etc.
• APIs: Java, Scala, R, Python, JavaScript, JSON; NanoFast Scoring Engine (POJO)
Scientific Advisory Council
• Stephen Boyd, Professor of Electrical Engineering, Stanford University
• Rob Tibshirani, Professor of Health Research and Policy, and Statistics, Stanford University
• Trevor Hastie, Professor of Statistics, Stanford University
Strong Community & Growth

Active Users       Mar 2014   July 2014   Mar 2015
Companies               103         634      2,789
Users                   463       2,887     13,237

Source: @kdnuggets, 5/25/15 (t.co/4xSgleSIdY)
Actual Customer Use Cases
- Ad Optimization: 200% CPA lift with H2O
- P2B Model Factory: 60k models, 15x faster with H2O than before
- Fraud Detection: 11% higher accuracy with H2O Deep Learning, saving millions
- Real-time Marketing: H2O is 10x faster than anything else
- …and many large insurance and financial services companies!
Generalized Linear Models
- Well-known statistical/machine-learning method
- Fits a linear model
  - link(y) = c1*x1 + c2*x2 + … + cn*xn + intercept
- Produces a (relatively) simple model
  - easy to fit
  - easy to understand and interpret
  - well-known statistical properties
- Regression problems
  - gaussian, poisson, gamma, tweedie
- Classification
  - binomial, multinomial
- Requires good features
  - not as powerful on raw data as some other models (GBM, Deep Learning)
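The linear predictor and link above can be sketched in plain Python (a toy illustration with made-up coefficient values, not H2O code):

```python
import math

def linear_predictor(coeffs, intercept, x):
    # link(y) = c1*x1 + c2*x2 + ... + cn*xn + intercept
    return sum(c * xi for c, xi in zip(coeffs, x)) + intercept

def predict_gaussian(coeffs, intercept, x):
    # Identity link (gaussian family): the prediction is the linear predictor itself.
    return linear_predictor(coeffs, intercept, x)

def predict_binomial(coeffs, intercept, x):
    # Logit link (binomial family): invert the link to get a probability.
    eta = linear_predictor(coeffs, intercept, x)
    return 1.0 / (1.0 + math.exp(-eta))  # inverse logit (sigmoid)

# Made-up coefficients for a two-predictor model
coeffs, intercept = [0.5, -1.2], 0.3
print(predict_gaussian(coeffs, intercept, [1.0, 2.0]))  # unbounded linear response
print(predict_binomial(coeffs, intercept, [1.0, 2.0]))  # probability in (0, 1)
```

The same coefficient vector serves both tasks; only the link (and the family's likelihood during fitting) changes.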
Generalized Linear Models 2
- Linear Model
  - defined by a vector of coefficients
  - 1 number per predictor
- Parametrized by Family and Link
  - Family
    - our assumption about the distribution of the response
    - e.g. poisson for regression on counts, binomial for two-class classification
  - Link
    - non-linear transform of the response
    - e.g. logit to generate the s-curve for logistic regression
- Fitted by maximum likelihood
  - pick the model with the maximum probability of seeing the data
  - needs an iterative solver (e.g. Newton's method, L-BFGS)
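A toy sketch of maximum-likelihood fitting via Newton's method, for a one-coefficient logistic regression on made-up data (illustrative only; H2O's solvers are distributed, multi-predictor versions of this idea):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_1d(xs, ys, iters=50):
    """Newton's method for logit(P(y=1)) = b * x (toy model, no intercept)."""
    b = 0.0
    for _ in range(iters):
        # First and second derivatives of the log-likelihood w.r.t. b
        grad = sum((y - sigmoid(b * x)) * x for x, y in zip(xs, ys))
        hess = -sum(sigmoid(b * x) * (1 - sigmoid(b * x)) * x * x for x in xs)
        b -= grad / hess  # Newton step: follow the curvature to the maximum
    return b

# Made-up, non-separable data: larger x -> more likely y = 1
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
b = fit_logistic_1d(xs, ys)
print(b)  # positive coefficient: probability rises with x
```

At convergence the gradient of the log-likelihood is (numerically) zero, which is exactly the "max probability of seeing the data" condition.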
Generalized Linear Models 3
Simple 2-class classification example:
- Linear Regression fit (family = gaussian, link = identity)
- Logistic Regression fit (family = binomial, link = logit)
Penalized Generalized Linear Models
- Problems
  - can overfit: works great on training data, fails on test data
  - solution is not unique with correlated variables
- Solution: add regularization
  - add a penalty to reduce the size of the coefficient vector
  - L1 or L2 norm of the coefficient vector
- L1 versus L2
  - L2: dense solution
    - coefficients of correlated variables are pushed to the same value
  - L1: sparse solution
    - picks one of the correlated variables, discards the others
- Elastic Net
  - combination of L1 and L2
  - sparse solution; correlated variables are grouped and enter/leave the model together
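The elastic net penalty can be sketched as follows (a minimal illustration of the penalty term only; H2O's actual objective also normalizes by the number of observations):

```python
def l1_norm(coeffs):
    return sum(abs(c) for c in coeffs)

def l2_norm_sq(coeffs):
    return sum(c * c for c in coeffs)

def elastic_net_penalty(coeffs, lam, alpha):
    """lam sets the overall strength; alpha mixes L1 (alpha=1) and L2 (alpha=0)."""
    return lam * (alpha * l1_norm(coeffs) + (1 - alpha) * 0.5 * l2_norm_sq(coeffs))

def penalized_objective(neg_log_likelihood, coeffs, lam, alpha):
    # Minimize: data-fit term (negative log-likelihood) + regularization penalty
    return neg_log_likelihood + elastic_net_penalty(coeffs, lam, alpha)

coeffs = [0.5, -1.2, 0.0]
print(elastic_net_penalty(coeffs, lam=0.1, alpha=0.5))   # elastic net
print(elastic_net_penalty(coeffs, lam=0.1, alpha=1.0))   # pure L1 (lasso)
print(elastic_net_penalty(coeffs, lam=0.1, alpha=0.0))   # pure L2 (ridge)
```

Setting alpha between 0 and 1 is what gives the elastic net its behavior: the L1 part zeroes out weak coefficients, while the L2 part keeps correlated variables grouped.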
GLM on H2O
- Fully distributed and parallel
  - handles datasets with up to hundreds of thousands of predictors
  - scales linearly with the number of rows
  - processes datasets with hundreds of millions of rows in seconds
- All standard GLM features
  - standard families
  - support for observation weights and offsets
- Elastic Net regularization
  - lambda search: efficient computation of the optimal regularization strength
  - applies strong rules to filter out inactive coefficients
- Several solvers for different problems
  - iteratively re-weighted least squares (IRLSM) with an ADMM solver
  - L-BFGS for wide problems
  - Coordinate Descent (experimental)
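L1 solvers such as coordinate descent lean on the soft-thresholding operator, and lambda search amounts to sweeping a decreasing sequence of penalties. A toy sketch (the per-coordinate update values are made up; this is the operator's shape, not H2O's distributed implementation):

```python
def soft_threshold(z, lam):
    """Soft-thresholding: the closed-form L1 update used by coordinate descent."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0  # coefficient is zeroed out: this is what makes L1 sparse

# Lambda search: fit over a decreasing sequence of lambdas. Strong penalties
# zero out weak coefficients; weaker penalties let more predictors enter.
raw_updates = [0.05, -0.3, 1.4]  # made-up unpenalized per-coordinate updates
for lam in [1.0, 0.5, 0.1]:
    coeffs = [soft_threshold(z, lam) for z in raw_updates]
    nonzero = sum(1 for c in coeffs if c != 0.0)
    print(lam, coeffs, nonzero)
```

As lambda shrinks, the active set grows, which is also why strong rules can cheaply discard coefficients that will stay at zero for the current lambda.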
GLM on H2O 2
- Automatic handling of categorical variables
  - automatically expands categoricals into 1-hot encoded binary vectors
  - efficient handling (sparse access, sparse covariance matrix)
  - (unlike R) uses all levels by default when running with regularization
- Missing value handling
  - missing values are not handled; rows with any missing value are omitted from the training dataset
  - impute missing values up front if there are many
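Both behaviors can be sketched on toy Python lists (H2O does this internally on its own frames; the data here is made up):

```python
def one_hot(values):
    """Expand a categorical column into one binary column per level (all levels kept)."""
    levels = sorted(set(values))
    return levels, [[1 if v == lvl else 0 for lvl in levels] for v in values]

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

levels, encoded = one_hot(["ORD", "ATL", "ORD", "DFW"])
print(levels)                          # ['ATL', 'DFW', 'ORD']
print(encoded)                         # one binary vector per row
print(impute_mean([1.0, None, 3.0]))   # [1.0, 2.0, 3.0]
```

Note that all levels are encoded here; with regularization the penalty resolves the resulting collinearity, which is why H2O can keep every level while unregularized R glm must drop one.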
Results in Seconds on Big Data
- Logistic Regression: ~20s
  - elastic net, alpha=0.5, lambda=1.379e-4 (auto)
- Deep Learning: ~70s (+9% AUC)
  - 4 hidden ReLU layers of 20 neurons, 2 epochs
- 8-node EC2 cluster: 64 virtual cores, 1GbE; all cores maxed out
- Year, Month, Scheduled Departure Time have non-linear impact
- Chicago, Atlanta, Dallas: often delayed
Output Fields
- Standard metrics as in other H2O algos, plus:
  - residual deviance
  - null deviance
  - degrees of freedom
- Coefficients / standardized coefficients
  - the actual model: one number per predictor
  - the model is fitted on standardized data by default (controlled by a parameter)
    - standardized coefficients are the actual coefficients fitted on the standardized data
    - (non-standardized) coefficients are the de-scaled version of the standardized coefficients, so that they can be applied to the original dataset
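The de-scaling relationship can be sketched directly: if a predictor was standardized as (x - mu) / sigma, a standardized coefficient b_std maps back to b = b_std / sigma, with the intercept absorbing the shift (a toy illustration of the math with made-up values, not H2O code):

```python
def descale(std_coeffs, std_intercept, means, sds):
    """Map coefficients fitted on standardized data back to the original scale."""
    coeffs = [b / s for b, s in zip(std_coeffs, sds)]
    intercept = std_intercept - sum(b * m / s for b, m, s in zip(std_coeffs, means, sds))
    return coeffs, intercept

# Both forms must give the same linear predictor for any row x:
std_coeffs, std_intercept = [0.8, -0.4], 1.0
means, sds = [10.0, 2.0], [5.0, 0.5]
coeffs, intercept = descale(std_coeffs, std_intercept, means, sds)

x = [12.0, 1.5]
eta_std = sum(b * (xi - m) / s for b, xi, m, s in zip(std_coeffs, x, means, sds)) + std_intercept
eta_raw = sum(b * xi for b, xi in zip(coeffs, x)) + intercept
print(abs(eta_std - eta_raw) < 1e-9)  # True: the two models are the same model
```

This is also why standardized coefficients are the ones to compare for relative variable importance: they are on a common scale.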
Avito Dataset: Predict User Clicks
- Running GLM straight on the raw data is fast, but accuracy is not great
- We will try to improve it by:
  - turning variables into categoricals
  - imputing NAs with means
Avito Dataset: Predict User Clicks 2
- Further improvements
  - cut numerical columns into intervals to make new categoricals
    - use h2o.hist + h2o.cut
  - add interactions
    - use h2o.interaction
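The binning idea can be sketched in plain Python (a toy stand-in for the h2o.hist + h2o.cut workflow; the break points and values are made up):

```python
def cut(values, breaks):
    """Assign each value to a half-open interval label, like cutting a numeric column."""
    labels = []
    for v in values:
        label = None  # stays None if v falls outside all intervals
        for lo, hi in zip(breaks, breaks[1:]):
            if lo <= v < hi:
                label = "[%g,%g)" % (lo, hi)
                break
        labels.append(label)
    return labels

# Made-up break points, e.g. chosen from a histogram of the column
breaks = [0, 10, 100, 1000]
print(cut([5, 42, 999, -1], breaks))
# ['[0,10)', '[10,100)', '[100,1000)', None]
```

The resulting labels become a new categorical column, which lets the (linear) GLM model non-linear effects of the original numeric variable, one coefficient per interval.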
General Usage Guidelines
- Solver selection
  - IRLSM: the default choice with L1 penalty
    - works great with a small number of predictors
    - efficient L1 solver
    - can handle wide datasets with lambda search and L1 penalty
  - L-BFGS
    - handles wide data well, but
    - can take many iterations (a long time), especially with L1 penalty
    - tune the objective epsilon: often many iterations are spent on minor improvements
- Regularization selection
  - compare sparse versus dense solutions
    - compare runs with alpha >= 0.5 and alpha == 0
    - generally L1 does slightly better
  - run lambda search to pick the optimal regularization strength
General Usage Guidelines 2
- Do not pre-expand categorical variables
  - H2O expands categorical variables automatically, which is far more efficient
- Adding features
  - splitting numerical variables into intervals helps
  - adding categorical interactions helps
- Using lambda search
  - always use a validation dataset
    - otherwise it picks lambda.min
    - the validation dataset is used to pick the best lambda value, so you need a separate test set!
  - check that lambda.best > lambda.min
    - otherwise the search did not start overfitting, and a smaller lambda may be better
    - re-run with a smaller lambda.min