Deepak George
Senior Data Scientist – Machine Learning
Decision Tree Ensembles
Bagging, Random Forest & Gradient Boosting Machines
December 2015
 Education
 Computer Science Engineering – College Of Engineering Trivandrum
 Business Analytics & Intelligence – Indian Institute Of Management Bangalore
 Career
 Mu Sigma
 Accenture Analytics
 Data Science
 1st Prize Best Data Science Project (BAI 5) – IIM Bangalore
 Top 10% finish (out of 1,100) in the Kaggle Coupon Purchase Prediction competition (Recommender System)
 SAS Certified Statistical Business Analyst: Regression and Modeling Credentials
 Statistical Learning – Stanford University
 Passion
 Photography, Football, Data Science, Machine Learning
 Contact
 Deepak.george14@iimb.ernet.in
 linkedin.com/in/deepakgeorge7
Copyright @ Deepak George, IIM Bangalore
2
About Me
Copyright @ Deepak George, IIM Bangalore
3
Bias-Variance Tradeoff
Expected test MSE = Variance of the fitted model + (Bias of the fitted model)² + irreducible error
 Bias
 Error introduced by approximating a complicated relationship with a much simpler model.
 The difference between the truth and what you expect to learn
 Underfitting
 Variance
 Amount by which the model would change if we estimated it using different training data.
 If a model has high variance, then small changes in the training data can result in large changes in the model.
 Overfitting
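A minimal, illustrative R sketch of this tradeoff (simulated data and names not from the slides): a depth-1 tree underfits (high bias), while an unpruned deep tree fits the training data almost perfectly but does worse on new data (high variance).
library(rpart)
set.seed(1)
x <- runif(500, -3, 3)
y <- sin(x) + rnorm(500, sd = 0.3)                          # nonlinear truth + noise
train <- data.frame(x = x[1:250],   y = y[1:250])
test  <- data.frame(x = x[251:500], y = y[251:500])
shallow <- rpart(y ~ x, train, control = rpart.control(maxdepth = 1, cp = 0))                 # high bias
deep    <- rpart(y ~ x, train, control = rpart.control(maxdepth = 30, minsplit = 2, cp = 0))  # high variance
mse <- function(fit, d) mean((d$y - predict(fit, d))^2)
c(shallow_test = mse(shallow, test), deep_train = mse(deep, train), deep_test = mse(deep, test))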
Copyright @ Deepak George, IIM Bangalore
4
Bias-Variance Tradeoff
[Figure: three fits illustrating Underfitting, an Ideal Learner, and Overfitting]
 Problem: Decision trees have low bias but suffer from high variance
 Goal: Reduce the variance of decision trees
 Hint: Given a set of n independent observations Z1, . . . , Zn, each with variance σ², the variance of the mean of the observations is σ²/n.
 In other words, averaging a set of observations reduces variance.
 Theoretically: Take multiple independent samples S1, S2, …, Sn from the population
 Fit “bushy”/deep decision trees on each S1, S2, …, Sn
 Trees are grown deep and are not pruned
 Variance reduces linearly & bias remains unchanged
 Practically: We only have one sample/training set & not the population.
 So take bootstrap samples, i.e. multiple samples drawn with replacement from the single sample
 Variance reduces sub-linearly & bias often increases slightly because bootstrap samples are correlated.
 Final Classifier: Average of predictions for regression or majority vote for classification (see the R sketch at the end of this slide).
 The high variance introduced by deep decision trees is mitigated by averaging the predictions from the individual trees.
Copyright @ Deepak George, IIM Bangalore
5
Bagging
[Figure: bagging illustrated with example data tables: independent samples S1, S2, …, Sn drawn from the population (theoretical view), and bootstrap samples S1, S2, …, Sn drawn with replacement from a single sample (practical view).]
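A minimal bagging sketch in R (function names are illustrative; the Boston data from MASS is used only because it reappears on the later implementation slides): grow deep, unpruned trees on B bootstrap samples and average their predictions.
library(rpart)
library(MASS)                                   #Contains Boston dataframe
bag_trees <- function(data, B = 100) {
  lapply(1:B, function(b) {
    boot <- data[sample(nrow(data), nrow(data), replace = TRUE), ]               # bootstrap sample
    rpart(medv ~ ., data = boot, control = rpart.control(cp = 0, minsplit = 5))  # deep, unpruned tree
  })
}
predict_bagged <- function(trees, newdata) {
  rowMeans(sapply(trees, predict, newdata = newdata))   # average predictions (regression)
}
set.seed(1)
bag  <- bag_trees(Boston, B = 100)
yhat <- predict_bagged(bag, Boston)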
Copyright @ Deepak George, IIM Bangalore
6
Bootstrap sampling
A bootstrap sample has the same sample size as the original sample.
Sampling with replacement results in repetition of values.
On average, a bootstrap sample contains only about 2/3 of the distinct observations in the original sample.
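A quick, illustrative check of the “about 2/3” claim: the probability that a given observation appears in a bootstrap sample is 1 − (1 − 1/n)^n, which approaches 1 − 1/e ≈ 0.632.
set.seed(1)
n    <- 10000
boot <- sample(n, n, replace = TRUE)      # bootstrap indices: same size, drawn with replacement
length(unique(boot)) / n                  # observed fraction of distinct original rows, ~0.632
1 - (1 - 1/n)^n                           # theoretical fraction, approaching 1 - 1/e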
Copyright @ Deepak George, IIM Bangalore
7
Random Forest
 Problem: Bagging still has relatively high variance
 Goal: Reduce the variance of bagging
 Solution: Along with the sampling of observations in bagging, sample the features as well!
 In other words, when building a random forest, use only a random subset of the features at each split in the tree instead of all the features.
 This de-correlates the trees.
 A good rule of thumb for the predictor subset size (mtry/max_features) is √(number of predictors).
 Evaluation: A bootstrap sample uses only approximately 2/3 of the observations of the original sample.
 The remaining training data (out-of-bag, OOB) are used to estimate error and variable importance (see the randomForest sketch below).
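A minimal sketch of these ideas with the randomForest package directly (the caret-based tuning code appears on a later slide; mtry is set to the √p rule of thumb here, though the package default for regression is p/3):
library(randomForest)
library(MASS)                                    #Contains Boston dataframe
set.seed(1)
p  <- ncol(Boston) - 1                           # number of predictors
rf <- randomForest(medv ~ ., data = Boston,
                   mtry = floor(sqrt(p)),        # random feature subset considered at each split
                   ntree = 500, importance = TRUE)
rf                                               # printed output includes the OOB error estimate
importance(rf)                                   # OOB-based variable importance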
 Hyperparameters are knobs to control the bias-variance tradeoff of any machine learning algorithm.
 Key hyperparameters
 Max features – de-correlates the trees
 Number of trees in the forest – a higher number reduces variance further
Random Forest - Key Hyperparameters
8
Copyright @ Deepak George, IIM Bangalore
Copyright @ Deepak George, IIM Bangalore
9
Random Forest – R Implementation
library(randomForest)
library(MASS) #Contains Boston dataframe
library(caret)
View(Boston)
#Cross Validation
cv.ctrl <- trainControl(method = "repeatedcv", repeats = 2,number = 5, allowParallel=T)
#Grid search
rf.grid <- expand.grid(mtry = 2:13)
set.seed(1861) ## make reproducible here, but not if generating many random samples
#Hyper parameter tuning
rf_tune <-train(medv~.,
data=Boston,
method="rf",
trControl=cv.ctrl,
tuneGrid=rf.grid,
ntree = 1000,
importance = TRUE)
#Cross Validation results
rf_tune
plot(rf_tune)
#Variable Importance
varImp(rf_tune)
plot(varImp(rf_tune), top = 10)
Copyright @ Deepak George, IIM Bangalore
10
Boosting
 Intuition: Ensemble many “weak” classifiers (typically decision trees) to produce a final “strong” classifier
 Weak classifier: error rate is only slightly better than random guessing.
 Boosting is a forward stagewise additive model
 Boosting sequentially applies the weak classifiers, one by one, to repeatedly reweighted versions of the data.
 Each new weak learner in the sequence tries to correct the misclassifications/errors made by the previous weak learners.
 Initially all of the weights are set to wi = 1/N
 At each successive step the observation weights are individually modified and a new weak learner is fitted on the reweighted observations.
 At step m, observations that were misclassified by the classifier Gm−1(x) induced at the previous step have their weights increased, whereas the weights are decreased for those that were classified correctly.
 The final “strong” classifier is based on a weighted vote of the weak classifiers (see the AdaBoost sketch below).
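A minimal AdaBoost.M1 sketch in R with decision stumps as weak learners (function and variable names are illustrative, not from the slides; y is assumed to be coded as -1/+1):
library(rpart)
adaboost_stumps <- function(X, y, M = 50) {
  n <- nrow(X); w <- rep(1 / n, n)                        # initial weights w_i = 1/N
  d <- data.frame(X, y = factor(y))
  stumps <- vector("list", M); alpha <- numeric(M)
  for (m in 1:M) {
    fit  <- rpart(y ~ ., data = d, weights = w,           # weak learner G_m on reweighted data
                  control = rpart.control(maxdepth = 1, cp = 0))
    pred <- ifelse(predict(fit, d, type = "class") == "1", 1, -1)
    err  <- sum(w * (pred != y)) / sum(w)                 # weighted misclassification error
    alpha[m] <- log((1 - err) / err)                      # weight of classifier G_m
    w <- w * exp(alpha[m] * (pred != y)); w <- w / sum(w) # up-weight misclassified observations
    stumps[[m]] <- fit
  }
  list(stumps = stumps, alpha = alpha)                    # final classifier: sign(sum_m alpha_m * G_m(x))
}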
X1
X2
AdaBoost – Illustration
11
Copyright @ Deepak George, IIM Bangalore
Step 1
Input Data
Initially all observations are assigned equal weight (1/N)
Observations that are misclassified in the ith iteration are given higher weights in the (i+1)th iteration
Observations that are correctly classified in the ith iteration are given lower weights in the (i+1)th iteration
12
Copyright @ Deepak George, IIM Bangalore
Step 2
Step 3
AdaBoost – Illustration
13
Copyright @ Deepak George, IIM Bangalore
Final Ensemble/Model
AdaBoost – Illustration
AdaBoost - Algorithm
14
Copyright @ Deepak George, IIM Bangalore
 Generalization of AdaBoost to work with arbitrary loss functions resulted in GBM.
Gradient Boosting = Gradient Descent + Boosting
 GBM uses the gradient descent algorithm, which can optimize any differentiable loss function.
 In AdaBoost, “shortcomings” are identified by high-weight data points.
 In Gradient Boosting, “shortcomings” are identified by negative gradients (also called pseudo residuals).
 In GBM, instead of the reweighting used in AdaBoost, each new tree is fit to the negative gradients (pseudo residuals) of the current model.
 Each tree in GBM is a successive gradient descent step (see the R sketch below).
Gradient Boosting Machines
15
Copyright @ Deepak George, IIM Bangalore
 AdaBoost is equivalent to forward stagewise additive modeling using the
exponential loss function.
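A minimal gradient-boosting sketch for regression with squared loss (illustrative names, not from the slides): each tree is fit to the negative gradient, which for squared loss is simply the current residual, and is added with a shrunken step eta.
library(rpart)
gbm_sketch <- function(X, y, M = 100, eta = 0.1, depth = 3) {
  Fhat  <- rep(mean(y), length(y))                 # F_0: best constant prediction
  trees <- vector("list", M)
  for (m in 1:M) {
    r    <- y - Fhat                               # negative gradient of 0.5*(y - F)^2 = residuals
    d    <- data.frame(X, r = r)
    tree <- rpart(r ~ ., data = d,
                  control = rpart.control(maxdepth = depth, cp = 0))
    Fhat <- Fhat + eta * predict(tree, d)          # one shrunken gradient-descent step
    trees[[m]] <- tree
  }
  list(f0 = mean(y), trees = trees, eta = eta)
}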
Gradient Boosting - Algorithm
16
Copyright @ Deepak George, IIM Bangalore
 GBM has 3 types of hyperparameters
 Tree structure
 Max depth of the trees – controls the degree of feature interactions
 Min samples per leaf – minimum number of samples in a leaf node
 Number of trees
 Shrinkage
 Learning rate – slows learning by shrinking tree predictions.
 Unlike fitting a single large decision tree to the data, which amounts to fitting the data hard and potentially overfitting, the boosting approach instead learns slowly.
 Stochastic Gradient Boosting
 Subsample: fit each tree on a random subset of the training set rather than on the complete training data.
 Max features: use a random subset of the features for each tree.
(A gbm() sketch mapping these onto R arguments follows this slide.)
GBM – Key Hyperparameters
17
Copyright @ Deepak George, IIM Bangalore
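As a hedged illustration (not from the slides), here is how these hyperparameter types map onto the gbm package's arguments; the xgboost code on the next slides uses the analogous nrounds/max_depth/eta/subsample names.
library(gbm)
library(MASS)                                    #Contains Boston dataframe
set.seed(1)
gbm_fit <- gbm(medv ~ ., data = Boston,
               distribution = "gaussian",        # squared-error loss
               n.trees = 1000,                   # number of trees
               interaction.depth = 4,            # tree depth, i.e. degree of feature interactions
               n.minobsinnode = 10,              # minimum samples in a leaf node
               shrinkage = 0.01,                 # learning rate
               bag.fraction = 0.8,               # stochastic GBM: row subsample per tree
               cv.folds = 5)
gbm.perf(gbm_fit, method = "cv")                 # CV-chosen number of trees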
Copyright @ Deepak George, IIM Bangalore
18
Tree Ensembles- Interpretation
library(xgboost)
library(MASS) #Contains Boston dataframe
library(caret)
#Cross Validation
cv.ctrl <- trainControl(method = "repeatedcv", repeats = 2,number = 5, allowParallel=T)
#Grid search
xgb.grid <- expand.grid(nrounds=1000,eta = c(0.005,0.01,0.05,0.1) ,max_depth = c(4,5,6,7,8))
set.seed(1860)
#Model training
xgb_tune <-train(medv~.,
data=Boston,
method="xgbTree",
trControl=cv.ctrl,
tuneGrid=xgb.grid,
importance = TRUE,
subsample =0.8)
#Cross Validation results
xgb_tune
plot(xgb_tune)
#Variable Importance
plot(varImp(xgb_tune), top = 10)
Copyright @ Deepak George, IIM Bangalore
19
GBM – R Implementation
Copyright @ Deepak George, IIM Bangalore
20
End
Questions ?