SlideShare a Scribd company logo
1 of 25
Download to read offline
Penalized GLM
and
GBM Guide
Arno Candel, PhD
Tomas Nykodym
Patrick Hall
Penalized Generalized Linear Models
Part 1
Linear Modeling Methods
Ordinary Least Squares
Carl Friedrich Gauss
(1777–1855)
Elastic Net
Hui Zou and Trevor Hastie
Regularization and variable selection via the elastic net,
Journal of the Royal Statistical Society, 2005
Ordinary Least Squares Requirements
Requirements If	broken	…
Linear	relationship between	inputs	and	
targets;	normal	y,	normal	errors
Inappropriate	application/unreliable results; use	a	machine	learning	
technique;	use	GLM
N >	p Underspecified/unreliable results; use	LASSO or	elastic net	penalized	
regression	
No	strong	multicollinearity Ill-conditioned/unstable/unreliable results;	Use	
ridge(L2/Tikhonov)/elastic net	penalized	regression
No influential	outliers Biased predictions,	parameters,	and	statistical	tests;	use	robust	
methods,	i.e.	IRLS,	Huber	loss,	investigate/remove	outliers
Constant	variance/no heteroskedasticity Lessened	predictive accuracy,	invalidates	statistical	tests;	use	GLM	in	
some	cases
Limited	correlation	between	input	rows	
(no	autocorrelation)
Invalidates	statistical	tests;	use time-series	methods	or	machine	
learning	technique
Anatomy of GLM
Family/distribution defines
mean and variance of Y
Family/distribution allows for non-constant variance
Nonlinear link function between
linear component and E(Y)
Linear
component
Distributions / Loss Functions
For regression problems, there’s a large choice of different distributions
and related loss functions:
• Gaussian distribution, squared error loss, sensitive to outliers
• Laplace distribution, absolute error loss, more robust to outliers
• Huber loss, hybrid of squared error & absolute error, robust to outliers
• Poisson distribution (e.g., number of claims in a time period)
• Gamma distribution (e.g, size of insurance claims)
• Tweedie distribution (compound Poisson-Gamma)
• Binomial distribution, log-loss for binary classification
Also, H2O supports:
• Offsets
• Observation weights
Iteratively reweighted least squares (IRLS) complements
model fitting methods in the presence of outliers by:
• Initially setting all observations to an equal weight
• Fitting GLM parameters (β’s)
• Calculating the residuals of the fitted GLM
• Assigning observations with high residuals a lower weight
• Repeating until GLM parameters (β’s) converge
Iteratively Reweighed Least Squares
!" = min
'
( )* − !, − ( -*. ∗ !.
0
.12
3
4
*12
“Inner loop”
(ADMM optimization in H2O)
“Outer loop”
Regularization (e.g. Penalties)
Combining GLM, IRLS and Regularization
!" = min
'
( )* − !, − ( -*. ∗ !.
0
.12
3
4
*12
+ 	7((9 ∗ |!.| + 1 − 9 ∗ !.
3
)
0
.12
Error function for a GLM.
𝜆 controls magnitude of penalties. Variable selection conducted by refitting model
many times while varying 𝜆. Decreasing 𝜆 allows more variables in the model.
“Outermost loop” L2/Ridge/Tinkhonov penalty
helps address multicollinearity.
L1/LASSO helps
with variable
selection.
𝛼 tunes balance between L1 and L2 penalties,
i.e. elastic net.
• Inner loop: Fitting GLM parameters for a given 𝜆 and 𝛼
• Outer loop: IRLS until β’s converge
• Outermost loop: 𝜆 varies from 𝜆max to 0
Elastic net advantages over L1 or L2:
• Does not saturate at min(p, N)
• Allows groups of correlated variables
Outer most loop(s):
• 𝜆 search from 𝜆max (where all coefficients = 0) to 𝜆 = 0
• Grid search on alpha usually not necessary
• Just try a few: 0, 0.5, 0.95
• Always keep some L2
• Set max_predictors, large models take longer
• Models can also be validated:
• Validation and test partitioning available
• Cross-validated (k-fold, CV predictions available)
Hyperparameter Selection
•P-values available for non-penalized models
•β constraints available, i.e. for all positive β’s
•Use IRLS optimization for tall, skinny data sets
•L1 OR LBFGS for wide data (> 500 predictors)
• (L1 AND LBFGS possible, but can be slower)
Practical Pointers
Gradient Boosting Machines
Part 2
GBM builds a sequential ensemble of decision trees: each tree attempts to correct the mis-
predictions of all previous trees
…
First tree models
the data.
Data is sampled with
replacement for each tree
(iteration). Both rows and
columns can be sampled.
Each subsequent tree models the residuals of the previous trees.
(Equivalent to a gradient-based update to model predictions)
Gradient Boosting Machines
Jerome H. Friedman
Greedy function approximation: a gradient boosting machine
Annals of statistics, 2001
Ensemble models: Intuition
•Stability: the predictions of ensemble models are stable w.r.t. minor perturbations of
training data
• Variable hiding: important variables are often correlated and can hide one-another
(only the single most important variable from a group of important correlated
variables will be used in many models); in different samples, many different
important variables can shine through
• Representative samples: some samples can be highly representative of new data
15
Distributed Gradient Boosting Machine
find optimal split
(feature & value)
• H2O: First open-source implementation of scalable,
distributed Gradient Boosting Machine - fully featured
• Each tree is built in parallel on the entire compute cluster
• Discretization (binning) for speedup without loss of
accuracy
age < 25 ?
Y N
all data
age
1 118
income
1 1M
Analytical error
best split: age 25
H2O: discretized into
1 118
age 25
age
16
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
map()
driver
Algo calls

histogram
method
data parallelism - global histogram computation
Scalable Distributed Histogram Calculation
global histogram
local histogram - one each for w, w*y and w*y2
w: observation weights
y : response
Same results as
if computed on
a single
compute node
• GBM builds a sequential ensemble of decision trees: each tree
attempts to correct the mis-predictions of all previous trees
• Each decision tree partitions the training data iteratively (left vs right)
• Splits are found by an approximated exhaustive search
(all features, all values)
• Stop splitting if
• max depth is reached
• no error decrease from further splitting
• Number of leaves grows exponentially, model complexity (training/
scoring time and storage) grows as O(ntrees * 2^max_depth)
• Training is iterative: can stop at any point and continue training
Model Complexity
Validation Schemes
• Validation data can be created upfront (e.g., a split)
• for big data, it’s usually fine to not use all the data for training
• 60%/20%/20% train/valid/test splits are usually fine for big data
• Validation data can be held out in N-fold cross-validation
• best for smaller datasets or when validation is critical
• e.g.: 80%/20% train/test, 5-fold CV on the 80%, final testing on
20%
• Random shuffling only works for i.i.d. distribution
• If the data is non-i.i.d., has structure (time, events, etc.), you must
stratify!
Validate properly
Training data Model training
Validation data Parameter tuning
Testing data Final estimate of generalization error
Establishing Baselines
• Train default GBM, GLM, DRF, DL models and inspect
convergence plots, train/validation metrics and variable
importances
• Train a few extra default GBM models, with:
- max_depth=3 “Do Interactions matter at all?”
- max_depth=7 “I feel lucky”
- max_depth=10 “Master memorizer”
• Develop a feel for the problem and the holdout performance of
the different models
Develop a Feel
Viewing Models in Flow
• Inspect the model in Flow during training: getModel ‘model_id’
• If the model is wrong (wrong architecture, response, parameters, etc.), cancel it
• If the model is taking too long, cancel and decrease model complexity
• If the model is performing badly, cancel and increase model complexity
Cancel early, cancel often
Use Early Stopping!
(Off by default)
Before: build as many trees as
specified - can overfit on training
data
Now: stop training once user-specified metric
converges, e.g., stopping_rounds=2,
stopping_metric=“logloss”,
stopping_tolerance=1e-3, score_tree_interval=10
Early stopping saves time and leads to higher accuracy
validation error increases
overfitting
best model
training error
decreases
early stopping finds
optimal number of
trees
BNP Paribas Kaggle dataset
Distributions / Loss Functions
• For regression problems, there’s a large choice of different distributions
and related loss functions, just like in GLM.
• GBM also supports Quantile regression (e.g., predict the 80-th percentile)
Do your math!
y
X
90th
50th
10th
HyperParameter Search
Just need to find one of the many good models
There are a lot of hyper parameters for H2O GBM. Many are for expert-level
fine control and do not significantly affect the model performance.
The most important parameters to tune for GBM are

(e.g., with a Random search via strategy = “RandomDiscrete”):
• max_depth = [1,2,…,25]
• learn_rate = [0.01,0.02,0.03,0.05,0.1]
• sample_rate = [0.2,…,1]
• col_sample_rate = [0.2,…,1]
• min_rows = [1,2,…,20]
• nbins_cats = [10, 100, 1000, 10000]
• histogram_type = [“UniformAdaptive”, “QuantilesGlobal”]
• stopping_rounds = 3 (together with ntrees=5000 or so)
Partial Dependence Plots
Trust and Understanding
Partial dependence plots for the well-known California housing data set.
Partial dependence plots display the mean prediction for a given model and a given value of a dependent variable,
over the range of the dependent variable.
Partial dependence plots now available in R, Python, and Flow.
Questions?

More Related Content

What's hot

General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsMark Peng
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reductionMarco Quartulli
 
Learning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaLearning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaSangwoo Mo
 
A review of net lift models
A review of net lift modelsA review of net lift models
A review of net lift modelsZixia Wang
 
How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into productionDataWorks Summit
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal InferenceNBER
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in productionTuri, Inc.
 
Confusion matrix and classification evaluation metrics
Confusion matrix and classification evaluation metricsConfusion matrix and classification evaluation metrics
Confusion matrix and classification evaluation metricsMinesh A. Jethva
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsDarius Barušauskas
 
Image-to-Image Translation
Image-to-Image TranslationImage-to-Image Translation
Image-to-Image TranslationJunho Kim
 
Introduction to Uplift Modelling
Introduction to Uplift ModellingIntroduction to Uplift Modelling
Introduction to Uplift ModellingPierre Gutierrez
 
Santander bank product recommendation.pptx
Santander bank product recommendation.pptxSantander bank product recommendation.pptx
Santander bank product recommendation.pptxSwaraj Teja Machiraju
 
Frequently Bought Together Recommendations Based on Embeddings
Frequently Bought Together Recommendations Based on EmbeddingsFrequently Bought Together Recommendations Based on Embeddings
Frequently Bought Together Recommendations Based on EmbeddingsDatabricks
 
Machine learning pipeline with spark ml
Machine learning pipeline with spark mlMachine learning pipeline with spark ml
Machine learning pipeline with spark mldatamantra
 
Energy based models and boltzmann machines - v2.0
Energy based models and boltzmann machines - v2.0Energy based models and boltzmann machines - v2.0
Energy based models and boltzmann machines - v2.0Soowan Lee
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyNeville Li
 
Paper description of "Reformer"
Paper description of "Reformer"Paper description of "Reformer"
Paper description of "Reformer"GenkiYasumoto
 
Topological data analysis
Topological data analysisTopological data analysis
Topological data analysisSunghyon Kyeong
 

What's hot (20)

General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
07 dimensionality reduction
07 dimensionality reduction07 dimensionality reduction
07 dimensionality reduction
 
Learning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat MinimaLearning Theory 101 ...and Towards Learning the Flat Minima
Learning Theory 101 ...and Towards Learning the Flat Minima
 
A review of net lift models
A review of net lift modelsA review of net lift models
A review of net lift models
 
How to deploy machine learning models into production
How to deploy machine learning models into productionHow to deploy machine learning models into production
How to deploy machine learning models into production
 
Tabnet presentation
Tabnet presentationTabnet presentation
Tabnet presentation
 
Machine Learning and Causal Inference
Machine Learning and Causal InferenceMachine Learning and Causal Inference
Machine Learning and Causal Inference
 
Machine learning in production
Machine learning in productionMachine learning in production
Machine learning in production
 
Confusion matrix and classification evaluation metrics
Confusion matrix and classification evaluation metricsConfusion matrix and classification evaluation metrics
Confusion matrix and classification evaluation metrics
 
Tips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitionsTips and tricks to win kaggle data science competitions
Tips and tricks to win kaggle data science competitions
 
Image-to-Image Translation
Image-to-Image TranslationImage-to-Image Translation
Image-to-Image Translation
 
Introduction to Uplift Modelling
Introduction to Uplift ModellingIntroduction to Uplift Modelling
Introduction to Uplift Modelling
 
Santander bank product recommendation.pptx
Santander bank product recommendation.pptxSantander bank product recommendation.pptx
Santander bank product recommendation.pptx
 
Frequently Bought Together Recommendations Based on Embeddings
Frequently Bought Together Recommendations Based on EmbeddingsFrequently Bought Together Recommendations Based on Embeddings
Frequently Bought Together Recommendations Based on Embeddings
 
Machine learning pipeline with spark ml
Machine learning pipeline with spark mlMachine learning pipeline with spark ml
Machine learning pipeline with spark ml
 
Energy based models and boltzmann machines - v2.0
Energy based models and boltzmann machines - v2.0Energy based models and boltzmann machines - v2.0
Energy based models and boltzmann machines - v2.0
 
Scala Data Pipelines @ Spotify
Scala Data Pipelines @ SpotifyScala Data Pipelines @ Spotify
Scala Data Pipelines @ Spotify
 
Causal Inference in Marketing
Causal Inference in MarketingCausal Inference in Marketing
Causal Inference in Marketing
 
Paper description of "Reformer"
Paper description of "Reformer"Paper description of "Reformer"
Paper description of "Reformer"
 
Topological data analysis
Topological data analysisTopological data analysis
Topological data analysis
 

Similar to GLM & GBM in H2O

Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Universitat Politècnica de Catalunya
 
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Maninda Edirisooriya
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Introduction to MARS (1999)
Introduction to MARS (1999)Introduction to MARS (1999)
Introduction to MARS (1999)Salford Systems
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideSalford Systems
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? HackerEarth
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandrySri Ambati
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsSri Ambati
 
Performance OR Capacity #CMGimPACt2016
Performance OR Capacity #CMGimPACt2016 Performance OR Capacity #CMGimPACt2016
Performance OR Capacity #CMGimPACt2016 Alex Gilgur
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models ananth
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandrySri Ambati
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017StampedeCon
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Machine learning how are things going on
Machine learning how are things going onMachine learning how are things going on
Machine learning how are things going onRajasekhar364622
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
An introduction to machine learning and statistics
An introduction to machine learning and statisticsAn introduction to machine learning and statistics
An introduction to machine learning and statisticsSpotle.ai
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programmingSoumya Mukherjee
 

Similar to GLM & GBM in H2O (20)

Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
Training Deep Networks with Backprop (D1L4 Insight@DCU Machine Learning Works...
 
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
Lecture 7 - Bias, Variance and Regularization, a lecture in subject module St...
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Introduction to MARS (1999)
Introduction to MARS (1999)Introduction to MARS (1999)
Introduction to MARS (1999)
 
CART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User GuideCART Classification and Regression Trees Experienced User Guide
CART Classification and Regression Trees Experienced User Guide
 
Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ? How to Win Machine Learning Competitions ?
How to Win Machine Learning Competitions ?
 
H2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark LandryH2O World - Top 10 Data Science Pitfalls - Mark Landry
H2O World - Top 10 Data Science Pitfalls - Mark Landry
 
Top 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner PitfallsTop 10 Data Science Practitioner Pitfalls
Top 10 Data Science Practitioner Pitfalls
 
Performance OR Capacity #CMGimPACt2016
Performance OR Capacity #CMGimPACt2016 Performance OR Capacity #CMGimPACt2016
Performance OR Capacity #CMGimPACt2016
 
Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models Artificial Intelligence Course: Linear models
Artificial Intelligence Course: Linear models
 
Top 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark LandryTop 10 Data Science Practioner Pitfalls - Mark Landry
Top 10 Data Science Practioner Pitfalls - Mark Landry
 
Regression ppt
Regression pptRegression ppt
Regression ppt
 
Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017Foundations of Machine Learning - StampedeCon AI Summit 2017
Foundations of Machine Learning - StampedeCon AI Summit 2017
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Machine learning how are things going on
Machine learning how are things going onMachine learning how are things going on
Machine learning how are things going on
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
An introduction to machine learning and statistics
An introduction to machine learning and statisticsAn introduction to machine learning and statistics
An introduction to machine learning and statistics
 
Intro to ml_2021
Intro to ml_2021Intro to ml_2021
Intro to ml_2021
 
Machine learning and linear regression programming
Machine learning and linear regression programmingMachine learning and linear regression programming
Machine learning and linear regression programming
 

More from Sri Ambati

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DaySri Ambati
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxSri Ambati
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek Sri Ambati
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thSri Ambati
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Sri Ambati
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMsSri Ambati
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the WaySri Ambati
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OSri Ambati
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Sri Ambati
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersSri Ambati
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Sri Ambati
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...Sri Ambati
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability Sri Ambati
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email AgainSri Ambati
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Sri Ambati
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...Sri Ambati
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...Sri Ambati
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneySri Ambati
 

More from Sri Ambati (20)

H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Generative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptxGenerative AI Masterclass - Model Risk Management.pptx
Generative AI Masterclass - Model Risk Management.pptx
 
AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek AI and the Future of Software Development: A Sneak Peek
AI and the Future of Software Development: A Sneak Peek
 
LLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5thLLMOps: Match report from the top of the 5th
LLMOps: Match report from the top of the 5th
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
Building LLM Solutions using Open Source and Closed Source Solutions in Coher...
 
Risk Management for LLMs
Risk Management for LLMsRisk Management for LLMs
Risk Management for LLMs
 
Open-Source AI: Community is the Way
Open-Source AI: Community is the WayOpen-Source AI: Community is the Way
Open-Source AI: Community is the Way
 
Building Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2OBuilding Custom GenAI Apps at H2O
Building Custom GenAI Apps at H2O
 
Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical Applied Gen AI for the Finance Vertical
Applied Gen AI for the Finance Vertical
 
Cutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM PapersCutting Edge Tricks from LLM Papers
Cutting Edge Tricks from LLM Papers
 
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
Practitioner's Guide to LLMs: Exploring Use Cases and a Glimpse Beyond Curren...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
KGM Mastering Classification and Regression with LLMs: Insights from Kaggle C...
 
LLM Interpretability
LLM Interpretability LLM Interpretability
LLM Interpretability
 
Never Reply to an Email Again
Never Reply to an Email AgainNever Reply to an Email Again
Never Reply to an Email Again
 
Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)Introducción al Aprendizaje Automatico con H2O-3 (1)
Introducción al Aprendizaje Automatico con H2O-3 (1)
 
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
From Rapid Prototypes to an end-to-end Model Deployment: an AI Hedge Fund Use...
 
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
AI Foundations Course Module 1 - Shifting to the Next Step in Your AI Transfo...
 
AI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation JourneyAI Foundations Course Module 1 - An AI Transformation Journey
AI Foundations Course Module 1 - An AI Transformation Journey
 

Recently uploaded

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsAndrey Dotsenko
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsPrecisely
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024BookNet Canada
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfjimielynbastida
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsHyundai Motor Group
 

Recently uploaded (20)

Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptxVulnerability_Management_GRC_by Sohang Sengupta.pptx
Vulnerability_Management_GRC_by Sohang Sengupta.pptx
 
Unlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power SystemsUnlocking the Potential of the Cloud for IBM Power Systems
Unlocking the Potential of the Cloud for IBM Power Systems
 
The transition to renewables in India.pdf
The transition to renewables in India.pdfThe transition to renewables in India.pdf
The transition to renewables in India.pdf
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC BiblioShare - Tech Forum 2024
 
Science&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdfScience&tech:THE INFORMATION AGE STS.pdf
Science&tech:THE INFORMATION AGE STS.pdf
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter RoadsSnow Chain-Integrated Tire for a Safe Drive on Winter Roads
Snow Chain-Integrated Tire for a Safe Drive on Winter Roads
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 

GLM & GBM in H2O

  • 1. Penalized GLM and GBM Guide Arno Candel, PhD Tomas Nykodym Patrick Hall
  • 3. Linear Modeling Methods Ordinary Least Squares Carl Friedrich Gauss (1777–1855) Elastic Net Hui Zou and Trevor Hastie Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, 2005
  • 4. Ordinary Least Squares Requirements Requirements If broken … Linear relationship between inputs and targets; normal y, normal errors Inappropriate application/unreliable results; use a machine learning technique; use GLM N > p Underspecified/unreliable results; use LASSO or elastic net penalized regression No strong multicollinearity Ill-conditioned/unstable/unreliable results; Use ridge(L2/Tikhonov)/elastic net penalized regression No influential outliers Biased predictions, parameters, and statistical tests; use robust methods, i.e. IRLS, Huber loss, investigate/remove outliers Constant variance/no heteroskedasticity Lessened predictive accuracy, invalidates statistical tests; use GLM in some cases Limited correlation between input rows (no autocorrelation) Invalidates statistical tests; use time-series methods or machine learning technique
  • 5. Anatomy of GLM Family/distribution defines mean and variance of Y Family/distribution allows for non-constant variance Nonlinear link function between linear component and E(Y) Linear component
  • 6. Distributions / Loss Functions For regression problems, there’s a large choice of different distributions and related loss functions: • Gaussian distribution, squared error loss, sensitive to outliers • Laplace distribution, absolute error loss, more robust to outliers • Huber loss, hybrid of squared error & absolute error, robust to outliers • Poisson distribution (e.g., number of claims in a time period) • Gamma distribution (e.g, size of insurance claims) • Tweedie distribution (compound Poisson-Gamma) • Binomial distribution, log-loss for binary classification Also, H2O supports: • Offsets • Observation weights
  • 7. Iteratively reweighted least squares (IRLS) complements model fitting methods in the presence of outliers by: • Initially setting all observations to an equal weight • Fitting GLM parameters (β’s) • Calculating the residuals of the fitted GLM • Assigning observations with high residuals a lower weight • Repeating until GLM parameters (β’s) converge Iteratively Reweighed Least Squares !" = min ' ( )* − !, − ( -*. ∗ !. 0 .12 3 4 *12 “Inner loop” (ADMM optimization in H2O) “Outer loop”
  • 9. Combining GLM, IRLS and Regularization !" = min ' ( )* − !, − ( -*. ∗ !. 0 .12 3 4 *12 + 7((9 ∗ |!.| + 1 − 9 ∗ !. 3 ) 0 .12 Error function for a GLM. 𝜆 controls magnitude of penalties. Variable selection conducted by refitting model many times while varying 𝜆. Decreasing 𝜆 allows more variables in the model. “Outermost loop” L2/Ridge/Tinkhonov penalty helps address multicollinearity. L1/LASSO helps with variable selection. 𝛼 tunes balance between L1 and L2 penalties, i.e. elastic net. • Inner loop: Fitting GLM parameters for a given 𝜆 and 𝛼 • Outer loop: IRLS until β’s converge • Outermost loop: 𝜆 varies from 𝜆max to 0 Elastic net advantages over L1 or L2: • Does not saturate at min(p, N) • Allows groups of correlated variables
  • 10. Outer most loop(s): • 𝜆 search from 𝜆max (where all coefficients = 0) to 𝜆 = 0 • Grid search on alpha usually not necessary • Just try a few: 0, 0.5, 0.95 • Always keep some L2 • Set max_predictors, large models take longer • Models can also be validated: • Validation and test partitioning available • Cross-validated (k-fold, CV predictions available) Hyperparameter Selection
  • 11. •P-values available for non-penalized models •β constraints available, i.e. for all positive β’s •Use IRLS optimization for tall, skinny data sets •L1 OR LBFGS for wide data (> 500 predictors) • (L1 AND LBFGS possible, but can be slower) Practical Pointers
  • 13. GBM builds a sequential ensemble of decision trees: each tree attempts to correct the mis- predictions of all previous trees … First tree models the data. Data is sampled with replacement for each tree (iteration). Both rows and columns can be sampled. Each subsequent tree models the residuals of the previous trees. (Equivalent to a gradient-based update to model predictions) Gradient Boosting Machines Jerome H. Friedman Greedy function approximation: a gradient boosting machine Annals of statistics, 2001
  • 14. Ensemble models: Intuition •Stability: the predictions of ensemble models are stable w.r.t. minor perturbations of training data • Variable hiding: important variables are often correlated and can hide one-another (only the single most important variable from a group of important correlated variables will be used in many models); in different samples, many different important variables can shine through • Representative samples: some samples can be highly representative of new data
  • 15. 15 Distributed Gradient Boosting Machine find optimal split (feature & value) • H2O: First open-source implementation of scalable, distributed Gradient Boosting Machine - fully featured • Each tree is built in parallel on the entire compute cluster • Discretization (binning) for speedup without loss of accuracy age < 25 ? Y N all data age 1 118 income 1 1M Analytical error best split: age 25 H2O: discretized into 1 118 age 25 age
  • 16. 16 map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() map() driver Algo calls
 histogram method data parallelism - global histogram computation Scalable Distributed Histogram Calculation global histogram local histogram - one each for w, w*y and w*y2 w: observation weights y : response Same results as if computed on a single compute node
  • 17. • GBM builds a sequential ensemble of decision trees: each tree attempts to correct the mis-predictions of all previous trees • Each decision tree partitions the training data iteratively (left vs right) • Splits are found by an approximated exhaustive search (all features, all values) • Stop splitting if • max depth is reached • no error decrease from further splitting • Number of leaves grows exponentially, model complexity (training/ scoring time and storage) grows as O(ntrees * 2^max_depth) • Training is iterative: can stop at any point and continue training Model Complexity
  • 18. Validation Schemes • Validation data can be created upfront (e.g., a split) • for big data, it’s usually fine to not use all the data for training • 60%/20%/20% train/valid/test splits are usually fine for big data • Validation data can be held out in N-fold cross-validation • best for smaller datasets or when validation is critical • e.g.: 80%/20% train/test, 5-fold CV on the 80%, final testing on 20% • Random shuffling only works for i.i.d. distribution • If the data is non-i.i.d., has structure (time, events, etc.), you must stratify! Validate properly Training data Model training Validation data Parameter tuning Testing data Final estimate of generalization error
  • 19. Establishing Baselines • Train default GBM, GLM, DRF, DL models and inspect convergence plots, train/validation metrics and variable importances • Train a few extra default GBM models, with: - max_depth=3 “Do Interactions matter at all?” - max_depth=7 “I feel lucky” - max_depth=10 “Master memorizer” • Develop a feel for the problem and the holdout performance of the different models Develop a Feel
  • 20. Viewing Models in Flow • Inspect the model in Flow during training: getModel ‘model_id’ • If the model is wrong (wrong architecture, response, parameters, etc.), cancel it • If the model is taking too long, cancel and decrease model complexity • If the model is performing badly, cancel and increase model complexity Cancel early, cancel often
  • 21. Use Early Stopping! (Off by default) Before: build as many trees as specified - can overfit on training data Now: stop training once user-specified metric converges, e.g., stopping_rounds=2, stopping_metric=“logloss”, stopping_tolerance=1e-3, score_tree_interval=10 Early stopping saves time and leads to higher accuracy validation error increases overfitting best model training error decreases early stopping finds optimal number of trees BNP Paribas Kaggle dataset
  • 22. Distributions / Loss Functions • For regression problems, there’s a large choice of different distributions and related loss functions, just like in GLM. • GBM also supports Quantile regression (e.g., predict the 80-th percentile) Do your math! y X 90th 50th 10th
  • 23. HyperParameter Search Just need to find one of the many good models There are a lot of hyper parameters for H2O GBM. Many are for expert-level fine control and do not significantly affect the model performance. The most important parameters to tune for GBM are
 (e.g., with a Random search via strategy = “RandomDiscrete”): • max_depth = [1,2,…,25] • learn_rate = [0.01,0.02,0.03,0.05,0.1] • sample_rate = [0.2,…,1] • col_sample_rate = [0.2,…,1] • min_rows = [1,2,…,20] • nbins_cats = [10, 100, 1000, 10000] • histogram_type = [“UniformAdaptive”, “QuantilesGlobal”] • stopping_rounds = 3 (together with ntrees=5000 or so)
  • 24. Partial Dependence Plots Trust and Understanding Partial dependence plots for the well-known California housing data set. Partial dependence plots display the mean prediction for a given model and a given value of a dependent variable, over the range of the dependent variable. Partial dependence plots now available in R, Python, and Flow.