Gradient Boosted Regression Trees

scikit-learn

Peter Prettenhofer (@pprett)

Gilles Louppe (@glouppe)

DataRobot

Université de Liège, Belgium
Motivation
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
About us
Peter
• @pprett
• Python & ML ∼ 6 years
• sklearn dev since 2010

Gilles
• @glouppe
• PhD student (Liège, Belgium)
• sklearn dev since 2011

Chief tree hugger
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
Machine Learning 101
• Data comes as...
• A set of examples {(x_i, y_i) | 0 ≤ i < n_samples}, with
• Feature vector x ∈ R^{n_features}, and
• Response y ∈ R (regression) or y ∈ {−1, 1} (classification)

• Goal is to...
• Find a function ŷ = f(x)
• Such that the error L(y, ŷ) on new (unseen) x is minimal
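
In symbols, this goal is empirical risk minimization; a standard formulation (my notation, not verbatim from the slides):

\hat{f} = \arg\min_{f} \; \frac{1}{n} \sum_{i=1}^{n} L\big(y_i, f(x_i)\big)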
Classification and Regression Trees [Breiman et al., 1984]

[Figure: a regression tree for the California housing data; internal nodes
split on MedInc, AveRooms, and AveOccup, with leaf predictions ranging from
1.16 to 4.57.]

sklearn.tree.DecisionTreeClassifier|Regressor
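
A minimal sketch of fitting such a regression tree on synthetic 1-D data (the toy data and max_depth=3 are illustrative, not from the slides):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy 1-D regression problem as a stand-in for the housing data.
rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# A shallow tree: each leaf predicts the mean target of the training
# samples routed to it, giving a piecewise-constant fit.
tree = DecisionTreeRegressor(max_depth=3).fit(X, y)
print(tree.predict(X[:5]))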
Function approximation with Regression Trees

[Figure: regression trees of depth 1, 3, and 20 fit to a noisy 1-D ground
truth; deeper trees follow the data more closely.]
Function approximation with Regression Trees

[Figure: the same fits as above, annotated "Deprecated".]

• Nowadays seldom used alone
• Ensembles: Random Forest, Bagging, or Boosting (see sklearn.ensemble)
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
Gradient Boosted Regression Trees

Advantages
• Handles heterogeneous data (features measured on different scales),
• Supports different loss functions (e.g. huber),
• Automatically detects (non-linear) feature interactions,

Disadvantages
• Requires careful tuning
• Slow to train (but fast to predict)
• Cannot extrapolate
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its predecessor
• Iteratively re-weights training examples based on errors

[Figure: decision boundaries on a 2-D toy dataset (x0, x1) over successive
boosting iterations.]

sklearn.ensemble.AdaBoostClassifier|Regressor
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its predecessor
• Iteratively re-weights training examples based on errors

Huge success
• Viola-Jones Face Detector (2001)
• Freund & Schapire won the Gödel Prize 2003

[Figure: the same boosting decision-boundary sequence as on the previous
slide.]

sklearn.ensemble.AdaBoostClassifier|Regressor
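
A minimal sketch of the estimator named above (the toy dataset and n_estimators are illustrative):

from sklearn.datasets import make_gaussian_quantiles
from sklearn.ensemble import AdaBoostClassifier

# 2-D toy classification problem in the spirit of the figure.
X, y = make_gaussian_quantiles(n_samples=500, n_features=2,
                               n_classes=2, random_state=0)

# Each round fits a depth-1 tree (the default base estimator) on a
# re-weighted training set that emphasizes previous mistakes.
clf = AdaBoostClassifier(n_estimators=100).fit(X, y)
print(clf.score(X, y))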
Gradient Boosting [J. Friedman, 1999]
Statistical view on boosting
• ⇒ Generalization of boosting to arbitrary loss functions

Residual fitting

[Figure: ground truth ∼ tree 1 + tree 2 + tree 3 + ..., with each
successive tree fit to the residuals of the ensemble so far.]

sklearn.ensemble.GradientBoostingClassifier|Regressor
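
The residual-fitting picture can be reproduced by hand; a sketch on synthetic data (constants such as the 0.1 step size are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.3, size=200)

# Start from the mean, then repeatedly fit a small tree to the current
# residuals and add a shrunken version of it to the ensemble.
pred = np.full_like(y, y.mean())
for _ in range(100):
    residual = y - pred                   # negative gradient of squared loss
    tree = DecisionTreeRegressor(max_depth=1).fit(X, residual)
    pred += 0.1 * tree.predict(X)         # shrinkage (learning rate 0.1)

print(np.mean((y - pred) ** 2))           # training MSE shrinks stage by stage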
Functional Gradient Descent
Least Squares Regression
• Squared loss: L(y_i, f(x_i)) = (y_i − f(x_i))^2
• The residual is proportional to the negative gradient:
  y_i − f(x_i) ∼ −∂L(y_i, f(x_i)) / ∂f(x_i)

Steepest Descent
• Regression trees approximate the (negative) gradient
• Each tree is a successive gradient descent step
[Figure: regression losses L(y, f(x)) as a function of y − f(x) (squared,
absolute, Huber) and classification losses as a function of y · f(x)
(zero-one, log, exponential).]
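
In formulas, stage m fits a regression tree h_m to the pointwise negative gradient, then takes a shrunken step of size ν, the learning rate (standard notation, consistent with Friedman's formulation, not copied from the slides):

r_{im} = -\left[\frac{\partial L(y_i, f(x_i))}{\partial f(x_i)}\right]_{f = f_{m-1}},
\qquad
f_m(x) = f_{m-1}(x) + \nu \, h_m(x)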
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
GBRT in scikit-learn
How to use it
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.datasets import make_hastie_10_2
>>> X, y = make_hastie_10_2(n_samples=10000)
>>> est = GradientBoostingClassifier(n_estimators=200, max_depth=3)
>>> est.fit(X, y)
...
>>> # get predictions
>>> pred = est.predict(X)
>>> est.predict_proba(X)[0] # class probabilities
array([ 0.67, 0.33])

Implementation
• Written in pure Python/Numpy (easy to extend).
• Builds on top of sklearn.tree.DecisionTreeRegressor (Cython).
• Custom node splitter that uses pre-sorting (better for shallow trees).
Example
from sklearn.ensemble import GradientBoostingRegressor
est = GradientBoostingRegressor(n_estimators=2000, max_depth=1).fit(X, y)
for pred in est.staged_predict(X):
    plt.plot(X[:, 0], pred, color='r', alpha=0.1)

[Figure: staged GBRT (d=1) predictions on the 1-D toy problem, with RT d=1
and d=3 for comparison; the sequence moves from "High bias - low variance"
in early stages to "Low bias - high variance" in late stages.]
Model complexity & Overfitting
test_score = np.empty(len(est.estimators_))
for i, pred in enumerate(est.staged_predict(X_test)):
    test_score[i] = est.loss_(y_test, pred)
plt.plot(np.arange(n_estimators) + 1, test_score, label='Test')
plt.plot(np.arange(n_estimators) + 1, est.train_score_, label='Train')

[Figure: train and test error vs. n_estimators (up to 1000); the test curve
flattens at the lowest test error while the train-test gap keeps widening.]
Model complexity & Overfitting

Regularization
GBRT provides a number of knobs to control overfitting:
• Tree structure
• Shrinkage
• Stochastic Gradient Boosting
Regularization: Tree structure
• The max_depth of the trees controls the degree of feature interactions
• Use min_samples_leaf to ensure a sufficient number of samples per leaf.
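
A sketch of these two knobs together (the dataset and the specific values are illustrative):

from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=1200, random_state=0)

# max_depth bounds the interaction order each tree can model;
# min_samples_leaf keeps leaf estimates from fitting noise.
est = GradientBoostingRegressor(max_depth=4, min_samples_leaf=9).fit(X, y)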
Regularization: Shrinkage
• Slow learning by shrinking tree predictions with 0 < learning_rate <= 1
• A lower learning_rate requires a higher n_estimators
[Figure: train/test error vs. n_estimators for the default setting and for
learning_rate=0.1; the shrunken model requires more trees but reaches a
lower test error.]
Regularization: Stochastic Gradient Boosting
• Samples: random subset of the training set (subsample)
• Features: random subset of features (max_features)
• Improved accuracy – reduced runtime
[Figure: train/test error vs. n_estimators; subsample=0.5 alone does poorly,
but combined with learning_rate=0.1 it reaches an even lower test error.]
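
A sketch combining shrinkage with row and column subsampling (the parameter values mirror the figure; the dataset is illustrative):

from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_friedman1(n_samples=1200, random_state=0)

# Each stage sees a random half of the rows (subsample) and 30% of the
# features (max_features), taking small steps (learning_rate).
est = GradientBoostingRegressor(n_estimators=1000, learning_rate=0.1,
                                subsample=0.5, max_features=0.3).fit(X, y)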
Hyperparameter tuning
1. Set n_estimators as high as possible (e.g. 3000)
2. Tune hyperparameters via grid search.
from sklearn.grid_search import GridSearchCV
param_grid = {'learning_rate': [0.1, 0.05, 0.02, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [3, 5, 9, 17],
              'max_features': [1.0, 0.3, 0.1]}
est = GradientBoostingRegressor(n_estimators=3000)
gs_cv = GridSearchCV(est, param_grid).fit(X, y)
# best hyperparameter setting
gs_cv.best_params_
3. Finally, set n_estimators even higher and tune learning_rate.
Outline

1 Basics

2 Gradient Boosting

3 Gradient Boosting in Scikit-learn

4 Case Study: California housing
Case Study
California Housing dataset
• Predict log(medianHouseValue)
• Block groups in 1990 census
• 20,640 groups with 8 features
(median income, median age, lat,
lon, ...)

• Evaluation: Mean absolute error
on 80/20 split

Challenges
• Heterogeneous features
• Non-linear interactions
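
A sketch of this setup (sklearn.datasets.fetch_california_housing and sklearn.model_selection postdate these slides; the hyperparameters here are illustrative, not the ones used for the results below):

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

data = fetch_california_housing()
X, y = data.data, np.log(data.target)     # predict log(medianHouseValue)

# 80/20 train/test split, evaluated by mean absolute error.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
est = GradientBoostingRegressor(n_estimators=3000).fit(X_train, y_train)
print(mean_absolute_error(y_test, est.predict(X_test)))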
Predictive accuracy & runtime

Method  Train time [s]  Test time [ms]  MAE
Mean    –               –               0.4635
Ridge   0.006           0.11            0.2756
SVR     28.0            2000.00         0.1888
RF      26.3            605.00          0.1620
GBRT    192.0           439.00          0.1438

[Figure: GBRT train/test error vs. n_estimators (up to 3000).]
Model interpretation
Which features are important?
>>> est.feature_importances_
array([ 0.01, 0.38, ...])

[Figure: relative importance (0.00–0.18) per feature: MedInc, AveRooms,
Longitude, AveOccup, Latitude, AveBedrms, Population, HouseAge.]
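
A sketch of how such a bar chart can be drawn (assumes the fitted est and the list names of feature names carried over from the earlier slides):

import numpy as np
import matplotlib.pyplot as plt

# Sort features by importance and draw a horizontal bar chart.
imp = est.feature_importances_
order = np.argsort(imp)
plt.barh(np.arange(len(imp)), imp[order])
plt.yticks(np.arange(len(imp)), np.array(names)[order])
plt.xlabel('Relative importance')
plt.show()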
Model interpretation
What is the effect of a feature on the response?
from sklearn.ensemble import partial_dependence as pd

[Figure: partial dependence of house value on the non-location features of
the California housing dataset (MedInc, AveOccup, HouseAge, AveRooms), plus
a two-way AveOccup × HouseAge plot.]

features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms',
            ('AveOccup', 'HouseAge')]
fig, axs = pd.plot_partial_dependence(est, X_train, features,
                                      feature_names=names)
Model interpretation

Automatically detects spatial effects

[Figure: two maps of partial dependence on median house value over
longitude and latitude.]
Summary

• Flexible non-parametric classification and regression technique
• Applicable to a variety of problems
• Solid, battle-worn implementation in scikit-learn
Thanks! Questions?
Benchmarks

[Figure: error, train time, and test time of gbm vs. sklearn-0.15 across
datasets: Arcene, Boston, California, Covtype, Example 10.2, Expedia,
Madelon, Solar, Spam, YahooLTRC, bioresp.]
Tips & Tricks 1

Input layout
Use dtype=np.float32 to avoid memory copies and Fortran layout for a slight
runtime benefit.
X = np.asfortranarray(X, dtype=np.float32)
Tips & Tricks 2

Feature interactions
GBRT automatically detects feature interactions, but explicit interaction
features often help.
Trees required to approximate X1 − X2: 10 (left), 1000 (right).

[Figure: 3-D surfaces of the GBRT approximation to x − y over the unit
square with 10 trees (left) and 1000 trees (right).]
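
A sketch of adding an explicit interaction feature (the data and the learning_rate=1.0 choice are illustrative, chosen to mirror the 10-tree figure):

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.RandomState(0)
X = rng.uniform(size=(1000, 2))
y = X[:, 0] - X[:, 1]

# With the difference appended as a third feature, single depth-1 splits
# can capture what otherwise needs many axis-aligned trees.
X_ext = np.c_[X, X[:, 0] - X[:, 1]]
est = GradientBoostingRegressor(n_estimators=10, max_depth=1,
                                learning_rate=1.0).fit(X_ext, y)
print(est.score(X_ext, y))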
Tips & Tricks 3

Categorical variables
Sklearn requires that categorical variables are encoded as numerics.
Tree-based methods work well with ordinal encoding:
df = pd.DataFrame(data={'icao': ['CRJ2', 'A380', 'B737', 'B737']})
# ordinal encoding
df_enc = pd.DataFrame(data={'icao': np.unique(df.icao,
                                              return_inverse=True)[1]})
X = np.asfortranarray(df_enc.values, dtype=np.float32)
