Tree: Advanced Topic 
Bagging, Random forest, Boosting 
Jinseob Kim 
GSPH, SNU 
August 22, 2014 
Jinseob Kim (GSPH, SNU) Tree: Advanced Topic August 22, 2014 1 / 16
Tree: Pros & Cons 
Pros 
Computed very quickly 
Simple interpretations. 
Built-in feature selection: if a predictor is not used in any split, the 
model is completely independent of that predictor. 
Cons 
Do not usually have optimal performance 
Small change in data → drastic change in the tree 
Ensemble methods 
Many trees are fit and predictions are aggregated across the trees. 
Bagging, boosting and random forests 
Bagging 
Contents 
1 Bagging 
2 Random Forest 
3 Boosting 
Bagging 
Bootstrap aggregation 
Basic idea 
Resampling and recalculating tree 
Averaging (continuous) or majority vote (categorical) 
Note 
Similar Bias 
Reduced variance 
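The resample, refit, and aggregate steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the base learner `fit_stump` is a hypothetical one-split classifier on a single feature (standing in for a full tree), and the names `bootstrap_sample`, `bagged_classifier`, and `bagged_mean` are made up for this example.

```python
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap resample)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(rows):
    """Hypothetical base learner: a one-split classifier on (x, label) pairs.

    Tries every observed x as a threshold and keeps the split with the
    fewest training errors.
    """
    best = None
    for threshold, _ in rows:
        left = [y for x, y in rows if x <= threshold]
        right = [y for x, y in rows if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label


def bagged_classifier(rows, n_trees=25, seed=0):
    """Fit one stump per bootstrap resample; predict by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump(bootstrap_sample(rows, rng)) for _ in range(n_trees)]

    def predict(x):
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


def bagged_mean(predictions):
    """For a continuous outcome, aggregate by averaging instead of voting."""
    return mean(predictions)


# Toy data (made up for illustration): blue on the left, red on the right
rows = [(0.1, "blue"), (0.2, "blue"), (0.3, "blue"), (0.4, "blue"),
        (0.6, "red"), (0.7, "red"), (0.8, "red"), (0.9, "red")]
predict = bagged_classifier(rows)
print(predict(0.0), predict(1.0))
```

Each resampled model has roughly the same bias as a single model fit to the full data; averaging or voting over many of them is what lowers the variance.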
Bagging reduces variance: Example 1 
Two categories of samples: blue, red 
Two predictors: x1 and x2 
Diagonal separation: the hardest case for a tree-based classifier 
Single tree decision boundary in orange. 
Bagged predictor decision boundary in red. 
Source: Albert A. Montillo (UPenn & Rutgers), slide 12 of 28 
Bagging reduces variance: Example 2 
Ellipsoid separation 
Two categories, two predictors 
[Figure: single tree decision boundary vs. 100 bagged trees] 
Source: Albert A. Montillo (UPenn & Rutgers), slide 13 of 28 
Random Forest 
Basic idea: Decorrelated Tree 
Bootstrap samples 
At each split, consider only a random subset of the variables 
Grow multiple trees and vote 
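Why decorrelating the trees helps can be seen from a standard variance identity. The sketch below assumes the B tree predictions T_b(x) are identically distributed with variance σ² and pairwise correlation ρ; these symbols are introduced here for illustration and do not appear in the slides.

```latex
\operatorname{Var}\!\left(\frac{1}{B}\sum_{b=1}^{B} T_b(x)\right)
  = \frac{1}{B^2}\left[B\sigma^2 + B(B-1)\rho\sigma^2\right]
  = \rho\,\sigma^2 + \frac{1-\rho}{B}\,\sigma^2
```

Growing more trees shrinks only the second term; the first term stays at ρσ² however large B gets, so lowering the correlation between trees is what pushes the variance down further.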
Pros 
Accuracy 
Cons 
Speed 
Interpretability 
Overfitting 
Bagging vs Random Forest 
Bagging alone utilizes the same full set of predictors to determine 
each split. 
Random forest applies another judicious injection of randomness: 
namely by selecting a random subset of the predictors for each split 
Number of predictors to try at each split (mtry), given k predictors: 
$\text{mtry} = \sqrt{k}$ for classification 
$\text{mtry} = k/3$ for regression 
Bagging is a special case of random forest where mtry = k 
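As a worked example of the mtry rule, the sketch below computes the default for a given k and draws the candidate predictors for one split. Rounding down and using at least one predictor are assumptions of this sketch, and the function names and the predictor names `x1` to `x16` are hypothetical.

```python
import math
import random


def default_mtry(k, task):
    """Number of predictors to try per split, given k predictors in total.

    Rounding down and the floor of 1 are assumptions of this sketch.
    """
    if task == "classification":
        return max(1, math.isqrt(k))      # integer part of sqrt(k)
    if task == "regression":
        return max(1, k // 3)             # integer part of k / 3
    raise ValueError(f"unknown task: {task!r}")


def candidate_predictors(predictors, mtry, rng):
    """Pick the predictors eligible for one split, without replacement."""
    return rng.sample(predictors, mtry)


rng = random.Random(42)
predictors = [f"x{i}" for i in range(1, 17)]    # k = 16 hypothetical predictors

mtry = default_mtry(len(predictors), "classification")
subset = candidate_predictors(predictors, mtry, rng)
print(mtry, subset)

# With mtry = k every predictor is a candidate at every split: plain bagging
all_of_them = candidate_predictors(predictors, len(predictors), rng)
```

A fresh subset is drawn at every split, so a predictor left out at one node can still be used at another node of the same tree.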
Boosting 
Boosting Algorithms 
A method to “boost” weak learning algorithms (e.g. single trees) into 
strong learning algorithms. 
Boosted trees try to improve the model fit over different trees by 
considering past fits (not unlike iteratively reweighted least squares) 
The basic tree boosting algorithm: 
Initialize equal weights per sample; 
for j = 1, ..., M iterations do 
    Fit a classification tree using sample weights (denote the model 
    equation as f_j(x)); 
    forall the misclassified samples do 
        increase sample weight 
    end 
    Save a “stage-weight” for iteration j based on the performance of the 
    current model; 
end 
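The loop above can be made concrete with a small Python sketch. Several details are assumptions, since the slide does not give them: labels are coded as -1 and +1, the weak learner `fit_weighted_stump` is a hypothetical single-threshold rule rather than a full classification tree, and the stage weight uses the discrete AdaBoost formula 0.5 * ln((1 - err) / err).

```python
import math


def fit_weighted_stump(xs, ys, weights):
    """Hypothetical weak learner: best single-threshold rule under sample weights.

    xs are numbers, ys are labels in {-1, +1}.
    Returns (predict_fn, weighted_error).
    """
    best = None
    for threshold in sorted(set(xs)):
        for sign in (+1, -1):
            # Predict `sign` at or below the threshold and `-sign` above it
            preds = [sign if x <= threshold else -sign for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, sign)
    err, threshold, sign = best
    return (lambda x: sign if x <= threshold else -sign), err


def boost(xs, ys, n_rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n                     # equal weights per sample
    ensemble = []                               # (stage_weight, model) pairs
    for _ in range(n_rounds):
        model, err = fit_weighted_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # guard against log(0)
        # Stage weight: discrete AdaBoost choice (an assumption of this sketch)
        stage_weight = 0.5 * math.log((1 - err) / err)
        # Misclassified samples (y * prediction == -1) get larger weights
        weights = [w * math.exp(-stage_weight * y * model(x))
                   for w, x, y in zip(weights, xs, ys)]
        total = sum(weights)
        weights = [w / total for w in weights]  # renormalize to sum to 1
        ensemble.append((stage_weight, model))
    return ensemble


def predict(ensemble, x):
    """Sign of the stage-weighted vote over all fitted models."""
    score = sum(stage_weight * model(x) for stage_weight, model in ensemble)
    return 1 if score >= 0 else -1


# Toy data (made up): no single threshold separates the classes
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, 1]
ensemble = boost(xs, ys, n_rounds=3)
print([predict(ensemble, x) for x in xs])
```

No single threshold rule can fit this toy data, but the stage-weighted combination of three of them can, which is the sense in which weak learners are boosted into a stronger one.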
Source: Max Kuhn (Pfizer), Predictive Modeling, slide 83 of 132 
