Hyunwoo Kim (김현우), a.k.a. "Reindeer Kim" (순록킴)
yBigTa, 9th cohort
Psychology & Computer Science
Random Forest
yBigTa
the Tree
Contents { random forest,
ensemble,
bias & variance,
bagging };
Bias
Variance
Ensemble
Bootstrap
Statistics
A confidence interval for the
mean CSAT score?
The exam board (KICE) knows the population parameters
It has a model for the grade cutoffs and the mean
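[Figure: individual scores (X_i, Y_i) compared with the population parameter and the estimated model value; labels: data, parameter, model value, estimate, error, residual]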
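A minimal sketch of the figure's two kinds of deviation, using made-up numbers: we pretend the population mean is known (as the exam board knows it), draw a small hypothetical sample of scores, and compare each score's error (distance from the true parameter) with its residual (distance from the estimate computed from the sample). TRUE_MEAN and TRUE_SD are assumed values, not real CSAT statistics.

```python
import random
import statistics

random.seed(0)

# Hypothetical population parameters (assumed values, known only to the "exam board")
TRUE_MEAN = 60.0
TRUE_SD = 15.0

# A small sample of individual scores X_i drawn from that population
scores = [random.gauss(TRUE_MEAN, TRUE_SD) for _ in range(10)]

# The estimate of the parameter, computed from the sample alone
estimated_mean = statistics.fmean(scores)

# Error: deviation from the true (in practice unobservable) parameter
errors = [x - TRUE_MEAN for x in scores]
# Residual: deviation from the estimated model value
residuals = [x - estimated_mean for x in scores]

print(f"estimated mean   = {estimated_mean:.2f}")
print(f"sum of residuals = {sum(residuals):.6f}")  # ~0 by construction
print(f"sum of errors    = {sum(errors):.2f}")     # generally not 0
```

Residuals around the sample mean always sum to (numerically) zero and are all we can compute in practice; errors require the true parameter, which only someone like the exam board would have.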
Problem of Overfitting
Bias & Variance
Bias-variance trade-off [or dilemma]
Does she like me…?
High bias (usually with low variance)
High variance (usually with low bias)
So what is overfitting?
High variance, low bias
Grow the tree large (= many leaf nodes),
and high variance becomes the problem
Bias-variance dilemma [or trade-off]
But shrink the tree by pruning,
and now high bias becomes the problem
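To see the dilemma on data, here is a small sketch with a hand-rolled one-dimensional regression tree in plain Python (not any library's implementation), fit to made-up noisy data. max_depth=1 plays the role of a heavily pruned tree and max_depth=None a fully grown one.

```python
import random

random.seed(1)

def make_data(n):
    """Noisy samples of a made-up target y = x^2 on [0, 1]."""
    points = []
    for _ in range(n):
        x = random.random()
        points.append((x, x * x + random.gauss(0, 0.05)))
    return points

def mean(values):
    return sum(values) / len(values)

def sse(values):
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def build_tree(points, depth=0, max_depth=None):
    """Greedy 1-D regression tree. Leaves are floats (mean y); nodes are tuples."""
    ys = [y for _, y in points]
    if len(points) < 2 or (max_depth is not None and depth >= max_depth):
        return mean(ys)
    pts = sorted(points)
    best = None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold separates identical x values
        cost = sse([y for _, y in pts[:i]]) + sse([y for _, y in pts[i:]])
        if best is None or cost < best[0]:
            best = (cost, (pts[i - 1][0] + pts[i][0]) / 2, i)
    if best is None or best[0] >= sse(ys):
        return mean(ys)  # no split lowers the squared error: make a leaf
    _, threshold, i = best
    return (threshold,
            build_tree(pts[:i], depth + 1, max_depth),
            build_tree(pts[i:], depth + 1, max_depth))

def predict(tree, x):
    while isinstance(tree, tuple):
        threshold, left, right = tree
        tree = left if x <= threshold else right
    return tree

def mse(tree, data):
    return mean([(predict(tree, x) - y) ** 2 for x, y in data])

train, test = make_data(60), make_data(200)
results = {}
for max_depth in (1, 3, None):  # None: grow until every leaf is pure
    tree = build_tree(train, max_depth=max_depth)
    results[max_depth] = (mse(tree, train), mse(tree, test))
    print(f"max_depth={max_depth}: train MSE={results[max_depth][0]:.4f}, "
          f"test MSE={results[max_depth][1]:.4f}")
```

The fully grown tree memorizes the training set (training MSE of zero), which is the high-variance end; the depth-1 tree cannot follow the curve, which is the high-bias end. Comparing the printed test MSEs shows where the sweet spot lies for this particular data; the exact numbers depend on the seed.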
Decision Tree
Regression
Overfitting
A solution to overfitting?
Tree → Random Forest
Random Forest: an ensemble learning method
that constructs a multitude of decision trees
Ensemble?
As in a musical ensemble
Democracy in the Forest
How the Forest works: classification
Chicken, Pizza, Chicken, Chicken
= Chicken night! (majority vote)
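For classification, the forest's vote can be sketched in a few lines. The votes below are the slide's example; breaking ties by whichever class appears first is one simple choice among several, assumed here.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the class predicted by the most trees (ties: first class seen wins)."""
    return Counter(predictions).most_common(1)[0][0]

tree_votes = ["chicken", "pizza", "chicken", "chicken"]
print(majority_vote(tree_votes))  # chicken
```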
How the Forest works: regression
2.5, 4, 3.7, 2.9 bottles
= 3.275 ≈ 3.3 bottles (average)
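For regression, the forest averages the trees' numeric predictions. A minimal sketch with the slide's four predictions:

```python
def average_prediction(predictions):
    """Regression forest output: the mean of the individual tree predictions."""
    return sum(predictions) / len(predictions)

tree_outputs = [2.5, 4, 3.7, 2.9]  # bottles, one prediction per tree
print(round(average_prediction(tree_outputs), 3))  # 3.275
```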
How it looks
How to make a Forest
Bagging
Bagging?
Resampling Methods
1. Cross-validation
- Validation set approach
- Leave-One-Out Cross-validation
- k-Fold Cross-validation
2. Bootstrap
Training & Testing
Validation set approach
Cross-validation
[Figure: the data are randomly split into a training set and a validation set (or hold-out set), usually about half each]
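• Random split
• Roughly half the data is held out
• Undermines model stability: the error estimate depends on which split you happen to draw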
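A sketch of the validation set approach, assuming made-up data and a deliberately trivial stand-in model that always predicts the training mean. Changing only the random split changes the validation error, which is the instability noted above.

```python
import random
import statistics

def validation_split(data, holdout_fraction=0.5, seed=None):
    """Randomly split data into (training set, validation set)."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    n_holdout = int(len(shuffled) * holdout_fraction)
    return shuffled[n_holdout:], shuffled[:n_holdout]

def validation_mse(ys, seed):
    """Fit a trivial stand-in model (predict the training mean); score it on the hold-out half."""
    train, valid = validation_split(ys, seed=seed)
    prediction = statistics.fmean(train)
    return statistics.fmean((y - prediction) ** 2 for y in valid)

rng = random.Random(42)
ys = [rng.gauss(0, 1) for _ in range(20)]  # made-up target values
for seed in range(3):
    print(f"split seed {seed}: validation MSE = {validation_mse(ys, seed):.3f}")
```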
Leave-one-out CV [LOOCV]
Cross-validation
[Figure: the training set, with one observation held out at a time]
• No randomness
• Always gives the same result
• Each fit uses n-1 observations (data points)
→ The model must be fit n times
→ Computationally expensive for large n
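A minimal LOOCV sketch, again with a trivial mean-predicting stand-in model (an assumption for illustration, not a tree). The loop makes the n fits explicit, and since nothing is random the result is identical on every run.

```python
import statistics

def loocv_mse(ys):
    """LOOCV for a trivial stand-in model that predicts the mean of its training data."""
    squared_errors = []
    for i in range(len(ys)):                 # n fits, one per observation
        train = ys[:i] + ys[i + 1:]          # the other n - 1 observations
        prediction = statistics.fmean(train)
        squared_errors.append((ys[i] - prediction) ** 2)
    return statistics.fmean(squared_errors)

ys = [3.0, 5.0, 4.0, 8.0, 6.0]
print(loocv_mse(ys))   # 4.625, identical on every run (no randomness)
```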
k-Fold CV
Cross-validation
[Figure: the training set randomly divided into k folds, one of which serves as the test set]
• k: the number of folds (splits)
• Usually k = 10
• Lower variance than LOOCV
• Computational advantage: only k fits instead of n
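A k-fold sketch in the same style: the indices are shuffled once and dealt into k folds, so only k models are fit. The data and the mean-predicting stand-in model are assumptions for illustration.

```python
import random
import statistics

def k_fold_indices(n, k, seed=None):
    """Shuffle indices 0..n-1 once, then deal them into k folds of (nearly) equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def k_fold_mse(ys, k, seed=None):
    """k-fold CV for a trivial stand-in model that predicts the training mean."""
    fold_mses = []
    for fold in k_fold_indices(len(ys), k, seed):     # only k fits
        held_out = set(fold)
        train = [y for i, y in enumerate(ys) if i not in held_out]
        prediction = statistics.fmean(train)
        fold_mses.append(statistics.fmean((ys[i] - prediction) ** 2 for i in fold))
    return statistics.fmean(fold_mses)

rng = random.Random(0)
ys = [rng.gauss(0, 1) for _ in range(50)]             # made-up data
print(f"10-fold CV MSE: {k_fold_mse(ys, k=10, seed=1):.3f}")

# With k = n every fold holds one observation: this is exactly LOOCV
print(k_fold_mse([3.0, 5.0, 4.0, 8.0, 6.0], k=5, seed=0))  # 4.625
```

Setting k equal to n turns k-fold CV into LOOCV, so the two methods sit on one dial: smaller k means fewer fits.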
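Cross-validation compared
• Validation set approach (a single split): higher bias, since the model is fit on only about half the data; the estimate also varies a lot with the split
• LOOCV (n folds): low bias, but high variance
• k-fold CV (k folds): intermediate bias, and lower variance than LOOCV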
Bootstrap
"The bootstrap is one of the biggest statistical breakthroughs of the 20th century."
Joe Blitzstein, Harvard statistics professor
Bootstrap?
pull oneself up by one's bootstraps
An idiom for accomplishing the impossible
Also: improving a bad situation on your own, without anyone's help
Training & Testing
Decision Trees have high variance
A slight change in the sample →
a huge change in the result (overfitting)
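Just as the properties of a population can be estimated from a sample,
the properties of a sample can be estimated from resamples.
That is, from the given sample we repeatedly draw new samples (resamples),
many times (1,000 to 10,000 times or more),
to find out how the sample's mean, variance, and other statistics are distributed.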
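A sketch of this idea on a small made-up set of exam scores: draw 2,000 resamples with replacement, each as large as the original sample, and look at how their means are distributed. The spread of those means estimates the standard error, and their 2.5th and 97.5th percentiles give a simple percentile interval, one possible answer to the confidence-interval question about the mean CSAT score. The scores and the resample count are assumptions.

```python
import random
import statistics

def bootstrap_means(sample, n_resamples=2000, seed=None):
    """Means of n_resamples resamples, each drawn WITH replacement and as large as sample."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(sample, k=len(sample)))
            for _ in range(n_resamples)]

# Made-up exam scores standing in for "the given sample"
scores = [72, 85, 64, 90, 58, 77, 81, 69, 95, 60, 74, 88]

means = bootstrap_means(scores, seed=0)
standard_error = statistics.stdev(means)
# Cut points at 2.5%, 5%, ..., 97.5%; the outermost two bound a 95% percentile interval
cuts = statistics.quantiles(means, n=40)
lower, upper = cuts[0], cuts[-1]

print(f"sample mean             = {statistics.fmean(scores):.2f}")
print(f"bootstrap std. error    = {standard_error:.2f}")
print(f"95% percentile interval = ({lower:.2f}, {upper:.2f})")
```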
Bootstrap
Sampling with replacement
Let's try it ourselves
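The core move, drawing a sample of the same size with replacement, fits in two lines. A sketch with ten hypothetical labeled items:

```python
import random

rng = random.Random(7)
original = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]

# Sampling WITH replacement: same size as the original, items may repeat
resample = rng.choices(original, k=len(original))
never_drawn = sorted(set(original) - set(resample))

print("bootstrap sample:", resample)
print("never drawn     :", never_drawn)
```

Some items almost always show up more than once and others not at all; those never-drawn items are what out-of-bag estimation later puts to use.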
Bagging?
Bootstrap Aggregating
1. [C] a total, a sum
2. [U, C] (technical) aggregate: crushed stone, gravel, etc. used in construction
Aggregating
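A minimal bagging sketch in plain Python. To keep it short, each "tree" is only a depth-1 regression stump (a simplification of a real decision tree), and the data are a made-up noisy step function. Each stump is fit to its own bootstrap sample, and the bagged prediction is the average of all stumps.

```python
import random
import statistics

def fit_stump(points):
    """Depth-1 regression tree (stand-in for a full tree): one threshold, one mean per side."""
    pts = sorted(points)
    best_cost, rule = float("inf"), None
    for i in range(1, len(pts)):
        if pts[i - 1][0] == pts[i][0]:
            continue  # no threshold separates identical x values
        left = [y for _, y in pts[:i]]
        right = [y for _, y in pts[i:]]
        left_mean, right_mean = statistics.fmean(left), statistics.fmean(right)
        cost = (sum((y - left_mean) ** 2 for y in left)
                + sum((y - right_mean) ** 2 for y in right))
        if cost < best_cost:
            best_cost = cost
            rule = ((pts[i - 1][0] + pts[i][0]) / 2, left_mean, right_mean)
    if rule is None:  # every x identical: fall back to a constant prediction
        constant = statistics.fmean(y for _, y in pts)
        return lambda x: constant
    threshold, left_mean, right_mean = rule
    return lambda x: left_mean if x <= threshold else right_mean

def bagging(points, n_models=50, seed=None):
    """Bootstrap AGGregatING: fit one model per bootstrap sample, then average them."""
    rng = random.Random(seed)
    models = [fit_stump(rng.choices(points, k=len(points))) for _ in range(n_models)]
    return lambda x: statistics.fmean(m(x) for m in models)

# Made-up data: a noisy step from 0 to 1 at x = 0.5
rng = random.Random(3)
data = []
for _ in range(40):
    x = rng.random()
    data.append((x, (1.0 if x > 0.5 else 0.0) + rng.gauss(0, 0.3)))

model = bagging(data, seed=0)
print(f"bagged prediction at x=0.2: {model(0.2):.2f}")
print(f"bagged prediction at x=0.8: {model(0.8):.2f}")
```

Averaging stumps whose thresholds wobble from one bootstrap sample to the next produces a smoother step than any single stump.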
[Figure: a forest of many trees]
Averaging a set of observations reduces variance
A slight change in the sample →
still only a slight change in the averaged result
Even if I get my score raised after checking my exam paper,
it hardly affects the class average
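A quick simulation of the variance claim, under the simplifying assumption that the B predictions being averaged are independent, each with variance σ². In that case the average has variance σ²/B. SIGMA, B and TRIALS are assumed values.

```python
import random
import statistics

rng = random.Random(0)
SIGMA = 1.0        # assumed spread of one model's prediction
B = 25             # number of independent predictions averaged
TRIALS = 10_000

singles = [rng.gauss(0, SIGMA) for _ in range(TRIALS)]
averages = [statistics.fmean(rng.gauss(0, SIGMA) for _ in range(B))
            for _ in range(TRIALS)]

print(f"variance of a single prediction : {statistics.variance(singles):.3f}")
print(f"variance of the average of {B}   : {statistics.variance(averages):.3f}")
print(f"theory, sigma^2 / B             : {SIGMA ** 2 / B:.3f}")
```

Bagged trees are not independent, since they are grown from overlapping bootstrap samples of the same data. With pairwise correlation ρ, the variance of the average is ρσ² + (1 - ρ)σ²/B, so the first term does not shrink as B grows; lowering ρ is the motivation for random forests.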
Why Random Forest?
A random sample
of predictors per split
e.g. major, GPA, school, English score, CSAT score
√p: at each split, only about √p of the p predictors are considered
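A sketch of per-split predictor sampling using the slide's five example predictors: with p = 5, m = ⌊√5⌋ = 2 candidates are drawn without replacement, fresh at every split. Rounding down and the variable names are assumptions.

```python
import math
import random

predictors = ["major", "GPA", "school", "English score", "CSAT score"]
p = len(predictors)
m = max(1, int(math.sqrt(p)))   # sqrt(5) ≈ 2.24, rounded down to 2

rng = random.Random(0)
for split in range(3):
    # A new random subset for every split; only these may be used to split the node
    candidates = rng.sample(predictors, m)
    print(f"split {split}: only consider {candidates}")
```

Because a dominant predictor (say, CSAT score) is unavailable at many splits, the trees end up less alike, and averaging less correlated trees reduces variance more.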
Training & Testing
Out-of-Bag Error Estimation
OOB
Out-of-Bag?
[Figure: a forest of many trees]
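Out-of-Bag
Because we sample with replacement,
some data points are never drawn into a given tree's bootstrap sample.
Those out-of-bag points act like a validation set for that tree,
giving an error estimate similar to cross-validation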
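A sketch of the out-of-bag bookkeeping with made-up targets and a trivial stand-in "model" (the mean of its bootstrap sample) in place of a tree. It checks that roughly (1 - 1/n)^n ≈ 1/e ≈ 36.8% of the points are left out of each bootstrap sample, and scores every point only with the models that never saw it.

```python
import math
import random
import statistics

rng = random.Random(0)
n, n_models = 100, 200
ys = [rng.gauss(10, 2) for _ in range(n)]   # made-up targets

oob_fractions = []
oob_predictions = [[] for _ in range(n)]    # predictions from models that never saw point i
for _ in range(n_models):
    drawn = [rng.randrange(n) for _ in range(n)]           # bootstrap indices
    in_bag = set(drawn)
    model_output = statistics.fmean(ys[i] for i in drawn)  # stand-in "model": bag mean
    oob_fractions.append(1 - len(in_bag) / n)
    for i in range(n):
        if i not in in_bag:
            oob_predictions[i].append(model_output)

oob_errors = [(ys[i] - statistics.fmean(preds)) ** 2
              for i, preds in enumerate(oob_predictions) if preds]
print(f"average OOB fraction: {statistics.fmean(oob_fractions):.3f}")
print(f"theory (1 - 1/n)^n  : {(1 - 1 / n) ** n:.3f}  (limit 1/e = {1 / math.e:.3f})")
print(f"OOB MSE estimate    : {statistics.fmean(oob_errors):.3f}")
```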
Tree advantages
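1. Easy to understand: chew it, bite it, taste it, enjoy it [white box]
2. Little data cleaning needed: just feed it in
3. Handles both numerical and categorical data: just put it in
4. Handy for seeing what patterns the data has: try it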
A dark forest
Inside the Forest
Black Box
Decision Tree
[Figure: a forest of many trees]
Feature Importance
Excluding the features
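One common way to measure how much a forest relies on a feature is permutation importance: instead of literally excluding the feature and retraining, shuffle its column so it carries no information about y, then measure how much the error grows. Below, model is a hypothetical stand-in for a trained forest's prediction function, and the data are made up so the true answer is known.

```python
import random
import statistics

rng = random.Random(0)
FEATURES = ["x0", "x1", "x2 (noise)"]

# Made-up data: y depends strongly on x0, weakly on x1, not at all on x2
rows = [[rng.random(), rng.random(), rng.random()] for _ in range(500)]
ys = [3 * r[0] + 0.5 * r[1] + rng.gauss(0, 0.1) for r in rows]

def model(row):
    """Hypothetical stand-in for a trained forest's prediction function."""
    return 3 * row[0] + 0.5 * row[1]

def mse(rows, ys):
    return statistics.fmean((model(r) - y) ** 2 for r, y in zip(rows, ys))

def permutation_importance(rows, ys, j, rng):
    """Increase in MSE after shuffling column j (breaking its link to y)."""
    column = [r[j] for r in rows]
    rng.shuffle(column)
    shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
    return mse(shuffled, ys) - mse(rows, ys)

importances = {name: permutation_importance(rows, ys, j, rng)
               for j, name in enumerate(FEATURES)}
for name, value in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name:12s} {value:.4f}")
```

Dropping a column and retraining (drop-column importance) is the more literal version of excluding a feature, at the cost of one refit per feature.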
Thank you

Random Forest Intro [Random Forest explained]