Decision Tree
HANSAM CHO
GROOT SEMINAR
Definition
Decision tree learning is a method commonly used in data mining. The goal is to
create a model that predicts the value of a target variable based on several input
variables.
Issues
1. How to split the training records → Impurity measure, Algorithm
2. When to stop splitting → Stopping condition, Pruning
Impurity Measure
• A measure of how good the result of a split is (homogeneity)
• Misclassification error
• Gini impurity
• Information gain
• Variance reduction
Misclassification error
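The slide's formula did not survive extraction. This is a minimal sketch that assumes the usual definition: for a node with class proportions p_i, the misclassification error is 1 − max_i p_i. The 6/4 node is hypothetical.

```python
from collections import Counter


def misclassification_error(labels):
    """Node impurity 1 - max_i p_i, where p_i is the share of class i."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for an empty node (an assumption of this sketch)
    return 1.0 - max(Counter(labels).values()) / n


# Hypothetical node with 6 "yes" and 4 "no" records.
node = ["yes"] * 6 + ["no"] * 4
print(misclassification_error(node))
```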
Gini impurity
• Used by the CART (classification and regression tree) algorithm for classification trees
• Gini impurity is a measure of how often a randomly chosen element from the set
would be incorrectly labeled if it was randomly labeled according to the distribution
of labels in the subset.
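Gini = Σ_i p_i (1 − p_i), where p_i is the probability that the drawn element belongs to class i and (1 − p_i) is the probability that it is then labeled incorrectly.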
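A minimal sketch of the Gini impurity, computed term by term as Σ_i p_i (1 − p_i), which equals 1 − Σ_i p_i². The node contents are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Sum over classes of p_i * (1 - p_i): the chance of drawing class i
    times the chance of then assigning a different label."""
    n = len(labels)
    if n == 0:
        return 0.0
    return sum((c / n) * (1 - c / n) for c in Counter(labels).values())


node = ["yes"] * 6 + ["no"] * 4   # hypothetical node
print(gini_impurity(node))        # 0.6 * 0.4 + 0.4 * 0.6
```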
Information gain
• Used by ID3, C4.5, and C5.0
• Information (the less probable an event, the more information it carries; e.g., winning the lottery)
• Entropy (expectation of information) / Deviance
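Both ideas in code, as a minimal sketch using base-2 logarithms (bits): a fair coin flip carries 1 bit, while a hypothetical one-in-a-million ticket carries about 20 bits. Entropy is the expected information of the class label.

```python
import math
from collections import Counter


def information(p):
    """Self-information -log2(p): the rarer the event, the larger the value."""
    return -math.log2(p)


def entropy(labels):
    """Expected information of the class label, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


print(information(0.5))     # a fair coin flip
print(information(1e-6))    # a hypothetical one-in-a-million ticket
print(entropy(["a", "b"]))  # two equally likely classes
```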
Information gain
• Information gain
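Information gain is the parent's entropy minus the size-weighted entropy of the child subsets. A minimal sketch with hypothetical labels, contrasting a split that separates the classes with one that does not:

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) - sum_k |child_k| / |parent| * Entropy(child_k)."""
    n = len(parent)
    return entropy(parent) - sum(len(ch) / n * entropy(ch) for ch in children)


parent = ["+"] * 4 + ["-"] * 4
perfect = [["+"] * 4, ["-"] * 4]        # each child is pure
useless = [["+", "+", "-", "-"]] * 2    # children look like the parent
print(information_gain(parent, perfect), information_gain(parent, useless))
```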
Variance reduction
• Introduced in CART, variance reduction is often employed in cases where the target
variable is continuous (regression tree)
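A minimal sketch of variance reduction for a regression split, assuming each node's variance is the population variance of its target values. The targets are hypothetical.

```python
from statistics import pvariance


def variance_reduction(parent, children):
    """Var(parent) - sum_k |S_k| / |S| * Var(S_k), using population variance."""
    n = len(parent)
    return pvariance(parent) - sum(len(ch) / n * pvariance(ch) for ch in children)


y = [1.0, 1.0, 5.0, 5.0]                                 # hypothetical targets
print(variance_reduction(y, [[1.0, 1.0], [5.0, 5.0]]))   # clean split
print(variance_reduction(y, [[1.0, 5.0], [1.0, 5.0]]))   # uninformative split
```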
Algorithm
• How good is the result of a split? → Impurity measure
• How should the records be split? → Algorithm
• ID3
• C4.5
• C5.0
• CART
ID3 - algorithm
• For every attribute a, calculate the entropy of the data set S after splitting on a.
• Partition ("split") S into subsets using the attribute for which the resulting
entropy after splitting is minimized or, equivalently, the information gain is maximized.
• Make a decision tree node containing that attribute.
• Recurse on subsets using the remaining attributes.
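The split-selection step can be sketched as follows. The records, the attribute names (`outlook`, `windy`) and the target (`play`) are hypothetical, and only the attribute choice is shown, not the recursion.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_entropy(rows, attr, target):
    """Size-weighted entropy of `target` after partitioning `rows` on `attr`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attr]].append(row[target])
    return sum(len(g) / len(rows) * entropy(g) for g in groups.values())


def best_attribute(rows, attributes, target):
    """Minimal post-split entropy, i.e. maximal information gain."""
    return min(attributes, key=lambda a: split_entropy(rows, a, target))


rows = [
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rain", "windy": False, "play": "yes"},
    {"outlook": "rain", "windy": True, "play": "yes"},
]
for a in ("outlook", "windy"):
    print(a, split_entropy(rows, a, "play"))
print(best_attribute(rows, ["outlook", "windy"], "play"))
```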
ID3 - example
ID3 – stopping condition
• Every element in the subset belongs to the same class; in which case the node is
turned into a leaf node and labelled with the class of the examples.
• There are no more attributes to be selected, but the examples still do not belong to
the same class. In this case, the node is made a leaf node and labelled with the most
common class of the examples in the subset.
• There are no examples in the subset, which happens when no example in the parent
set was found to match a specific value of the selected attribute. An example could be
the absence of a person among the population with age over 100 years. Then a leaf
node is created and labelled with the most common class of the examples in the
parent node's set.
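Putting the steps and the three stopping conditions together gives a minimal recursive sketch. It assumes categorical attributes and a hypothetical `domains` map that lists every value each attribute can take; without it, the third (empty-subset) case could never arise. The training records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def id3(rows, attributes, target, domains, parent_labels=None):
    """Return a class label (leaf) or {attribute: {value: subtree}}.

    `rows` must be non-empty at the root call.
    """
    if not rows:                                   # 3) empty subset
        return majority(parent_labels)
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:                      # 1) all one class
        return labels[0]
    if not attributes:                             # 2) no attributes left
        return majority(labels)

    def split_entropy(attr):
        groups = defaultdict(list)
        for r in rows:
            groups[r[attr]].append(r[target])
        return sum(len(g) / len(rows) * entropy(g) for g in groups.values())

    best = min(attributes, key=split_entropy)      # max information gain
    rest = [a for a in attributes if a != best]
    return {best: {value: id3([r for r in rows if r[best] == value],
                              rest, target, domains, labels)
                   for value in domains[best]}}


rows = [  # hypothetical training records
    {"outlook": "sunny", "windy": False, "play": "no"},
    {"outlook": "sunny", "windy": True, "play": "no"},
    {"outlook": "rain", "windy": False, "play": "yes"},
    {"outlook": "rain", "windy": True, "play": "yes"},
    {"outlook": "rain", "windy": False, "play": "yes"},
]
domains = {"outlook": ["sunny", "rain", "overcast"], "windy": [False, True]}
print(id3(rows, ["outlook", "windy"], "play", domains))
```

Here no record has `outlook == "overcast"`, so that branch becomes a leaf labeled with the parent's most common class.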
C4.5 – Information gain ratio
• A notable problem occurs when information gain is applied to attributes that can take
on a large number of distinct values (e.g., a customer ID, which puts every record in its own branch and overfits).
• Information gain ratio
• Intrinsic value (a penalty for splitting into many branches: the entropy of the split itself)
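A minimal sketch of gain ratio = information gain / intrinsic value, where the intrinsic value is the entropy of the attribute's own values (the partition sizes). The hypothetical data set a unique `customer_id` against a coarser `region`: plain information gain prefers the ID, while the gain ratio prefers the region.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_and_ratio(rows, attr, target):
    """Return (information gain, gain ratio) for splitting `rows` on `attr`."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[attr]].append(r[target])
    n = len(rows)
    gain = entropy([r[target] for r in rows]) - sum(
        len(g) / n * entropy(g) for g in groups.values())
    intrinsic = entropy([r[attr] for r in rows])   # entropy of the split itself
    return gain, (gain / intrinsic if intrinsic > 0 else 0.0)


labels = ["yes"] * 4 + ["no"] * 4
regions = ["A", "A", "A", "B", "B", "B", "B", "B"]
rows = [{"customer_id": i, "region": r, "buy": y}
        for i, (r, y) in enumerate(zip(regions, labels))]
print(gain_and_ratio(rows, "customer_id", "buy"))
print(gain_and_ratio(rows, "region", "buy"))
```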
C4.5 – Improvements over the ID3 algorithm
C4.5 – Handling continuous attributes (mid-point thresholds)
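A minimal sketch of the mid-point approach: sort the values, try a threshold halfway between each pair of adjacent distinct values, and keep the binary split x ≤ t with the highest information gain. The data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(xs, ys):
    """Return (threshold, information gain) of the best binary split x <= t."""
    pairs = sorted(zip(xs, ys), key=lambda p: p[0])
    values = [x for x, _ in pairs]
    n = len(pairs)
    base = entropy(ys)
    best_t, best_gain = None, -1.0
    for lo, hi in zip(values, values[1:]):
        if lo == hi:
            continue                      # no threshold between equal values
        t = (lo + hi) / 2                 # mid-point candidate
        left = [y for x, y in pairs if x <= t]
        right = [y for x, y in pairs if x > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


print(best_threshold([1, 2, 3, 10, 11, 12], ["a", "a", "a", "b", "b", "b"]))
```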
Pruning
• Pre-pruning / Post-pruning
• Reduced error pruning
• Prune a subtree if removing it does not degrade performance (measured on a validation set)
• Cost complexity pruning
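Cost complexity pruning scores a tree T by R_α(T) = R(T) + α·|leaves(T)|, where R(T) is its training error. A minimal weakest-link sketch, assuming each node of a hypothetical `Node` structure stores how many training records it would misclassify as a leaf: the internal node with the smallest effective α is pruned first.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    errors_as_leaf: int                          # training errors if this node were a leaf
    children: list = field(default_factory=list)


def leaves(node):
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def tree_error(node):
    """R(T_t): total training errors of the leaves under `node`."""
    return sum(leaf.errors_as_leaf for leaf in leaves(node))


def cost_complexity(node, alpha):
    """R_alpha(T) = R(T) + alpha * |leaves(T)|."""
    return tree_error(node) + alpha * len(leaves(node))


def effective_alpha(node):
    """Alpha at which collapsing `node` into a leaf costs the same as keeping it."""
    return (node.errors_as_leaf - tree_error(node)) / (len(leaves(node)) - 1)


def internal_nodes(node):
    if not node.children:
        return []
    return [node] + [n for child in node.children for n in internal_nodes(child)]


# Hypothetical tree: root -> (leaf, internal node -> (leaf, leaf)).
right = Node(4, [Node(1), Node(1)])
root = Node(10, [Node(2), right])
weakest = min(internal_nodes(root), key=effective_alpha)
print(effective_alpha(root), effective_alpha(right), weakest is right)
```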
C5.0
CART
• The basic concept is similar to C4.5
• Supports regression
• Binary split
• Choose attribute recursively
• Classification – Gini impurity / Regression – Variance reduction
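A minimal sketch of a CART-style binary split on one numeric feature, with the impurity function passed in: Gini for classification and population variance for regression. For regression, minimizing the weighted child variance is the same as maximizing variance reduction, because the parent's variance is fixed. Thresholds are taken at observed values (x ≤ t) rather than mid-points, which is a simplification; the data are hypothetical.

```python
from collections import Counter
from statistics import pvariance


def gini(ys):
    n = len(ys)
    return 1.0 - sum((c / n) ** 2 for c in Counter(ys).values())


def best_binary_split(xs, ys, impurity):
    """Return (threshold, weighted child impurity) of the best split x <= t."""
    n = len(xs)
    best_t, best_score = None, float("inf")
    for t in sorted(set(xs))[:-1]:      # the largest value would leave one side empty
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        score = len(left) / n * impurity(left) + len(right) / n * impurity(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


xs = [1, 2, 3, 4]
print(best_binary_split(xs, ["a", "a", "b", "b"], gini))        # classification
print(best_binary_split(xs, [1.0, 1.2, 5.0, 5.2], pvariance))   # regression
```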
Ensemble
Bootstrap Aggregation (Bagging)
◦ Random Forest
Boosting
◦ AdaBoost
◦ Gradient Boosting
Bootstrap Aggregation (Bagging)
• Bootstrapping
• Given a standard training set D of size n, bagging generates m new training sets D_i, each of
size n′, by sampling from D uniformly and with replacement. Because sampling is with replacement,
some observations may be repeated in each D_i; such a sample is known as a bootstrap sample.
If n′ = n, then for large n each D_i is expected to contain the fraction (1 − 1/e) (≈63.2%) of the
unique examples of D, the rest being duplicates.
• Aggregation
• m models are fitted using the m bootstrap samples and combined by averaging the outputs
(for regression) or voting (for classification).
$$\lim_{n \to \infty} \left[ 1 - \left( 1 - \frac{1}{n} \right)^{n} \right] = 1 - \frac{1}{e} \approx 0.632$$
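A quick numerical check of the 63.2% figure, plus the two aggregation rules, as a minimal standard-library sketch. The seed and the sizes are arbitrary choices.

```python
import math
import random
from collections import Counter
from statistics import mean


def bootstrap_sample(data, rng):
    """n' = n draws from `data`, uniformly and with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def aggregate(predictions, task="classification"):
    """Average the outputs for regression; majority vote for classification."""
    if task == "regression":
        return mean(predictions)
    return Counter(predictions).most_common(1)[0][0]


for n in (10, 100, 10_000):
    print(n, 1 - (1 - 1 / n) ** n)        # approaches 1 - 1/e

rng = random.Random(0)
data = list(range(10_000))
unique_fraction = len(set(bootstrap_sample(data, rng))) / len(data)
print(unique_fraction, 1 - 1 / math.e)    # should be close to each other
```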
Random Forest – random subspace
• Random forests differ in only one way from this general scheme: they use a modified
tree learning algorithm that selects, at each candidate split in the learning process, a
random subset of the features. This process is sometimes called "feature bagging".
• The reason for doing this is the correlation of the trees in an ordinary bootstrap
sample: if one or a few features are very strong predictors for the response variable
(target output), these features will be selected in many of the m trees, causing them to
become correlated.
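A minimal sketch of feature bagging: at each split, a tree may only consider a random subset of k feature indices. The default k = ⌊√p⌋ is a common heuristic for classification and an assumption here, not part of the slide. The simulation shows how often a single (hypothetically) strongest feature is even available at a split.

```python
import math
import random


def candidate_features(n_features, rng, max_features=None):
    """Feature indices a random-forest tree may consider at one split."""
    k = max_features if max_features is not None else max(1, math.isqrt(n_features))
    return sorted(rng.sample(range(n_features), k))


rng = random.Random(42)
draws = [candidate_features(16, rng) for _ in range(10_000)]
# Share of splits at which feature 0 is among the candidates.
share = sum(0 in d for d in draws) / len(draws)
print(draws[0], share)   # share should be near 4 / 16
```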
Extra-Trees
• Its two main differences from other tree-based ensemble methods are that it splits
nodes by choosing cut-points fully at random and that it uses the whole learning
sample (rather than a bootstrap replica) to grow the trees.
• n_min: the minimum sample size for splitting a node
• Reduces computational cost while maintaining similar performance
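A minimal sketch of one Extra-Trees node split for a regression target. It assumes a hypothetical parameter k for how many randomly chosen features receive a random cut-point; the best of those k random splits is kept, and nodes with fewer than n_min samples become leaves. Scoring children by their summed squared deviation is a choice of this sketch.

```python
import random
from statistics import pvariance


def extra_trees_split(X, y, k, n_min, rng):
    """Return (feature, cut_point) for one node, or None if it should be a leaf."""
    if len(y) < n_min or len(set(y)) == 1:
        return None
    usable = [j for j in range(len(X[0]))
              if min(r[j] for r in X) < max(r[j] for r in X)]
    if not usable:                                  # every feature is constant
        return None
    best, best_score = None, float("inf")
    for j in rng.sample(usable, min(k, len(usable))):
        lo, hi = min(r[j] for r in X), max(r[j] for r in X)
        cut = rng.uniform(lo, hi)                   # random cut-point, no search
        left = [t for r, t in zip(X, y) if r[j] < cut]
        right = [t for r, t in zip(X, y) if r[j] >= cut]
        if not left or not right:
            continue
        # len * population variance = sum of squared deviations in the child
        score = len(left) * pvariance(left) + len(right) * pvariance(right)
        if score < best_score:
            best, best_score = (j, cut), score
    return best


X = [[0.0, 5.0], [1.0, 5.0], [2.0, 5.0], [3.0, 5.0]]   # feature 1 is constant
y = [0.0, 0.0, 1.0, 1.0]
print(extra_trees_split(X, y, k=2, n_min=2, rng=random.Random(0)))
```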
Random Forest – Variable importance
Boosting
Boosting algorithms consist of iteratively learning weak classifiers with respect to a
distribution and adding them to a final strong classifier. When they are added, they
are typically weighted in some way that is usually related to the weak learners'
accuracy.
AdaBoost
AdaBoost Algorithm
https://www.youtube.com/watch?v=LsK-xG1cLYA
Decision Stump (Weak learner)
error↓ → α↑
Logit function
When the error is 0 or 1, α diverges
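A minimal sketch of the learner weight as a logit, assuming the convention α = ln((1 − ε)/ε) = logit(1 − ε). Some references, including the Wikipedia AdaBoost article, use half of this value; the factor rescales every α equally and does not change the sign of the weighted vote.

```python
import math


def adaboost_alpha(weighted_error):
    """alpha = ln((1 - eps) / eps): lower error gives a larger weight."""
    eps = weighted_error
    if not 0.0 < eps < 1.0:
        raise ValueError("alpha diverges when the weighted error is 0 or 1")
    return math.log((1.0 - eps) / eps)


for eps in (0.1, 0.3, 0.5, 0.7):
    print(eps, adaboost_alpha(eps))
```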
Weighted Gini Index
Bootstrapping
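A weak learner can use the AdaBoost sample weights in two ways: plug them into the impurity (a weighted Gini index) or resample the training set in proportion to them. A minimal sketch of both, with hypothetical labels and weights:

```python
import random
from collections import defaultdict


def weighted_gini(labels, weights):
    """Gini impurity with class shares measured in total sample weight."""
    total = sum(weights)
    mass = defaultdict(float)
    for label, w in zip(labels, weights):
        mass[label] += w
    return 1.0 - sum((m / total) ** 2 for m in mass.values())


def weighted_bootstrap(rows, weights, rng):
    """Resample len(rows) rows, each drawn with probability proportional to its weight."""
    return rng.choices(rows, weights=weights, k=len(rows))


print(weighted_gini(["+", "-"], [0.5, 0.5]))   # equal weights
print(weighted_gini(["+", "-"], [0.9, 0.1]))   # one heavy point dominates
print(weighted_bootstrap(["x", "y", "z"], [0.8, 0.1, 0.1], random.Random(0)))
```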
Exponential error
Assume the first m − 1 models have already been built and the m-th weak learner is being added.
Optimize only with respect to α_m and y_m.
T_m: points correctly classified by y_m / M_m: points misclassified by y_m
Minimize (14.23) with respect to y_m, treating α_m as a constant → gives (14.15)
Minimize (14.23) with respect to α_m → gives (14.17)
https://en.wikipedia.org/wiki/AdaBoost
exp(−α_m / 2) does not depend on n, so it can be dropped (it rescales every weight equally).
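The pieces above combine into the full algorithm. This is a minimal sketch, assuming labels t_n ∈ {−1, +1}, decision stumps as weak learners, α_m = ln((1 − ε_m)/ε_m), and the update that multiplies only misclassified points by exp(α_m), which is what remains once the common factor exp(−α_m/2) is dropped. Stopping early when ε_m is 0 or at least 0.5 is a choice of this sketch. The toy data are hypothetical and cannot be fit by a single stump.

```python
import math


def fit_stump(X, t, w):
    """Pick (feature, threshold, polarity) with the smallest weighted error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            for pol in (1, -1):
                err = sum(wn for row, tn, wn in zip(X, t, w)
                          if (pol if row[j] > thr else -pol) != tn)
                if best is None or err < best[0]:
                    best = (err, j, thr, pol)
    return best[1:]


def stump_predict(stump, row):
    j, thr, pol = stump
    return pol if row[j] > thr else -pol


def adaboost(X, t, rounds):
    """Return a list of (alpha_m, stump_m); labels in t must be -1 or +1."""
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        stump = fit_stump(X, t, w)
        miss = [stump_predict(stump, row) != tn for row, tn in zip(X, t)]
        eps = sum(wn for wn, m in zip(w, miss) if m) / sum(w)
        if eps == 0.0:                       # perfect stump: keep it and stop
            model.append((1.0, stump))
            break
        if eps >= 0.5:                       # no better than chance: stop
            break
        alpha = math.log((1.0 - eps) / eps)
        # Only misclassified points grow; exp(-alpha/2) was dropped as a common factor.
        w = [wn * math.exp(alpha) if m else wn for wn, m in zip(w, miss)]
        model.append((alpha, stump))
    return model


def predict(model, row):
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in model)
    return 1 if score >= 0 else -1


X = [[1], [2], [3], [4], [5], [6]]
t = [1, 1, -1, -1, 1, 1]
for rounds in (1, 3):
    model = adaboost(X, t, rounds)
    print(rounds, [predict(model, row) for row in X])
```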
Gradient Boosting
Gradient Boosting Algorithm
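The slide's algorithm did not survive extraction. As a minimal sketch, assuming squared loss L = ½(y − F(x))², whose negative gradient with respect to F is the residual y − F(x), each round fits a regression stump to the current residuals and adds it with a learning rate. The data and the values of `rounds` and `lr` are arbitrary.

```python
from statistics import mean


def fit_regression_stump(xs, residuals):
    """Single-split regression tree minimizing the squared error of the residuals."""
    best = None
    for thr in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= thr]
        right = [r for x, r in zip(xs, residuals) if x > thr]
        lm, rm = mean(left), mean(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    _, thr, lm, rm = best
    return lambda x: lm if x <= thr else rm


def gradient_boost(xs, ys, rounds=50, lr=0.1):
    """Squared-loss gradient boosting with stumps; xs needs two distinct values."""
    f0 = mean(ys)                      # F_0: the constant that minimizes squared loss
    stumps = []

    def predict(x):
        return f0 + lr * sum(s(x) for s in stumps)

    for _ in range(rounds):
        # Negative gradient of (y - F)^2 / 2 with respect to F is the residual.
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_regression_stump(xs, residuals))
    return predict


xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
model = gradient_boost(xs, ys)
print([round(model(x), 3) for x in xs])
```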
Beyond this…
XGBoost
LightGBM
CatBoost
Optimal decision tree

More Related Content

What's hot

Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forestsMarc Garcia
 
Classification and regression trees (cart)
Classification and regression trees (cart)Classification and regression trees (cart)
Classification and regression trees (cart)Learnbay Datascience
 
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Simplilearn
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmPalin analytics
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision treeyazad dumasia
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsPalin analytics
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree LearningMilind Gokhale
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018digitalzombie
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Edureka!
 
Random forest
Random forestRandom forest
Random forestUjjawal
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithmRashid Ansari
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and BoostingMohit Rajput
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 

What's hot (20)

Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
 
Decision tree
Decision treeDecision tree
Decision tree
 
Classification and regression trees (cart)
Classification and regression trees (cart)Classification and regression trees (cart)
Classification and regression trees (cart)
 
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
Decision Tree In R | Decision Tree Algorithm | Data Science Tutorial | Machin...
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Decision tree
Decision treeDecision tree
Decision tree
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision tree
 
Random Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin AnalyticsRandom Forest Classifier in Machine Learning | Palin Analytics
Random Forest Classifier in Machine Learning | Palin Analytics
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018Random forest sgv_ai_talk_oct_2_2018
Random forest sgv_ai_talk_oct_2_2018
 
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
Decision Tree Algorithm & Analysis | Machine Learning Algorithm | Data Scienc...
 
Random forest
Random forestRandom forest
Random forest
 
Random Forest
Random ForestRandom Forest
Random Forest
 
Random forest algorithm
Random forest algorithmRandom forest algorithm
Random forest algorithm
 
Understanding Bagging and Boosting
Understanding Bagging and BoostingUnderstanding Bagging and Boosting
Understanding Bagging and Boosting
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 

Similar to Decision tree

Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptxRaflyRizky2
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfAdityaSoraut
 
5.Module_AIML Random Forest.pptx
5.Module_AIML Random Forest.pptx5.Module_AIML Random Forest.pptx
5.Module_AIML Random Forest.pptxPRIYACHAURASIYA25
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Kush Kulshrestha
 
module_3_1.pptx
module_3_1.pptxmodule_3_1.pptx
module_3_1.pptxWanderer20
 
module_3_1.pptx
module_3_1.pptxmodule_3_1.pptx
module_3_1.pptxWanderer20
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Maninda Edirisooriya
 
Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat omarodibat
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest Rupak Roy
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 r-kor
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning TechniquesBabu Priyavrat
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxAbhishekSingh43430
 

Similar to Decision tree (20)

Chapter 4.pdf
Chapter 4.pdfChapter 4.pdf
Chapter 4.pdf
 
data mining.pptx
data mining.pptxdata mining.pptx
data mining.pptx
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx20211229120253D6323_PERT 06_ Ensemble Learning.pptx
20211229120253D6323_PERT 06_ Ensemble Learning.pptx
 
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdfMachine Learning Unit-5 Decesion Trees & Random Forest.pdf
Machine Learning Unit-5 Decesion Trees & Random Forest.pdf
 
5.Module_AIML Random Forest.pptx
5.Module_AIML Random Forest.pptx5.Module_AIML Random Forest.pptx
5.Module_AIML Random Forest.pptx
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
 
module_3_1.pptx
module_3_1.pptxmodule_3_1.pptx
module_3_1.pptx
 
module_3_1.pptx
module_3_1.pptxmodule_3_1.pptx
module_3_1.pptx
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat
 
Introduction to Random Forest
Introduction to Random Forest Introduction to Random Forest
Introduction to Random Forest
 
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로 모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
모듈형 패키지를 활용한 나만의 기계학습 모형 만들기 - 회귀나무모형을 중심으로
 
Bank loan purchase modeling
Bank loan purchase modelingBank loan purchase modeling
Bank loan purchase modeling
 
Classification
ClassificationClassification
Classification
 
Classification
ClassificationClassification
Classification
 
Ensemble learning Techniques
Ensemble learning TechniquesEnsemble learning Techniques
Ensemble learning Techniques
 
CS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptxCS109a_Lecture16_Bagging_RF_Boosting.pptx
CS109a_Lecture16_Bagging_RF_Boosting.pptx
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Issues in DTL.pptx
Issues in DTL.pptxIssues in DTL.pptx
Issues in DTL.pptx
 

More from SEMINARGROOT

Metric based meta_learning
Metric based meta_learningMetric based meta_learning
Metric based meta_learningSEMINARGROOT
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMCSEMINARGROOT
 
Demystifying Neural Style Transfer
Demystifying Neural Style TransferDemystifying Neural Style Transfer
Demystifying Neural Style TransferSEMINARGROOT
 
Towards Deep Learning Models Resistant to Adversarial Attacks.
Towards Deep Learning Models Resistant to Adversarial Attacks.Towards Deep Learning Models Resistant to Adversarial Attacks.
Towards Deep Learning Models Resistant to Adversarial Attacks.SEMINARGROOT
 
The ways of node embedding
The ways of node embeddingThe ways of node embedding
The ways of node embeddingSEMINARGROOT
 
Graph Convolutional Network
Graph  Convolutional NetworkGraph  Convolutional Network
Graph Convolutional NetworkSEMINARGROOT
 
Denoising With Frequency Domain
Denoising With Frequency DomainDenoising With Frequency Domain
Denoising With Frequency DomainSEMINARGROOT
 
Bayesian Statistics
Bayesian StatisticsBayesian Statistics
Bayesian StatisticsSEMINARGROOT
 
Coding Test Review 3
Coding Test Review 3Coding Test Review 3
Coding Test Review 3SEMINARGROOT
 
Time Series Analysis - ARMA
Time Series Analysis - ARMATime Series Analysis - ARMA
Time Series Analysis - ARMASEMINARGROOT
 
Differential Geometry for Machine Learning
Differential Geometry for Machine LearningDifferential Geometry for Machine Learning
Differential Geometry for Machine LearningSEMINARGROOT
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GANSEMINARGROOT
 
Understanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence FunctionsUnderstanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence FunctionsSEMINARGROOT
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You NeedSEMINARGROOT
 
WWW 2020 XAI Tutorial Review
WWW 2020 XAI Tutorial ReviewWWW 2020 XAI Tutorial Review
WWW 2020 XAI Tutorial ReviewSEMINARGROOT
 
Coding test review 2
Coding test review 2Coding test review 2
Coding test review 2SEMINARGROOT
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashingSEMINARGROOT
 
Coding Test Review1
Coding Test Review1Coding Test Review1
Coding Test Review1SEMINARGROOT
 

More from SEMINARGROOT (20)

Metric based meta_learning
Metric based meta_learningMetric based meta_learning
Metric based meta_learning
 
Sampling method : MCMC
Sampling method : MCMCSampling method : MCMC
Sampling method : MCMC
 
Demystifying Neural Style Transfer
Demystifying Neural Style TransferDemystifying Neural Style Transfer
Demystifying Neural Style Transfer
 
Towards Deep Learning Models Resistant to Adversarial Attacks.
Towards Deep Learning Models Resistant to Adversarial Attacks.Towards Deep Learning Models Resistant to Adversarial Attacks.
Towards Deep Learning Models Resistant to Adversarial Attacks.
 
The ways of node embedding
The ways of node embeddingThe ways of node embedding
The ways of node embedding
 
Graph Convolutional Network
Graph  Convolutional NetworkGraph  Convolutional Network
Graph Convolutional Network
 
Denoising With Frequency Domain
Denoising With Frequency DomainDenoising With Frequency Domain
Denoising With Frequency Domain
 
Bayesian Statistics
Bayesian StatisticsBayesian Statistics
Bayesian Statistics
 
Coding Test Review 3
Coding Test Review 3Coding Test Review 3
Coding Test Review 3
 
Time Series Analysis - ARMA
Time Series Analysis - ARMATime Series Analysis - ARMA
Time Series Analysis - ARMA
 
Differential Geometry for Machine Learning
Differential Geometry for Machine LearningDifferential Geometry for Machine Learning
Differential Geometry for Machine Learning
 
Generative models : VAE and GAN
Generative models : VAE and GANGenerative models : VAE and GAN
Generative models : VAE and GAN
 
Effective Python
Effective PythonEffective Python
Effective Python
 
Understanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence FunctionsUnderstanding Blackbox Prediction via Influence Functions
Understanding Blackbox Prediction via Influence Functions
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Attention
AttentionAttention
Attention
 
WWW 2020 XAI Tutorial Review
WWW 2020 XAI Tutorial ReviewWWW 2020 XAI Tutorial Review
WWW 2020 XAI Tutorial Review
 
Coding test review 2
Coding test review 2Coding test review 2
Coding test review 2
 
Locality sensitive hashing
Locality sensitive hashingLocality sensitive hashing
Locality sensitive hashing
 
Coding Test Review1
Coding Test Review1Coding Test Review1
Coding Test Review1
 

Recently uploaded

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...ssifa0344
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfmuntazimhurra
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxjana861314
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptxRajatChauhan518211
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxgindu3009
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 

Recently uploaded (20)

TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptxBroad bean, Lima Bean, Jack bean, Ullucus.pptx
Broad bean, Lima Bean, Jack bean, Ullucus.pptx
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Green chemistry and Sustainable development.pptx
Green chemistry  and Sustainable development.pptxGreen chemistry  and Sustainable development.pptx
Green chemistry and Sustainable development.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 

Decision tree

  • 2. Definition Decision tree learning is a method commonly used in data mining. The goal is to create a model that predicts the value of a target variable based on several input variables.
  • 3. Issues 1. How to split the training records → Impurity measure, Algorithm 2. When to stop splitting → Stopping condition, Pruning
  • 4. Impurity Measure • Splitting의 결과가 얼마나 좋은지에 대한 평가 척도 (homogeneity) • Misclassification error • Gini impurity • Information gain • Variance reduction
  • 7. Gini impurity • Used by the CART (classification and regression tree) algorithm for classification trees • Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it was randomly labeled according to the distribution of labels in the subset. 뽑힌 element가 특정 클래스에 속할 확률 잘못 분류될 확률
  • 9. Information gain • Used by ID3, C4.5 and C5.0 • Information (the lower the probability of an event, the more information it carries; e.g., winning the lottery) • Entropy (expectation of information) / Deviance
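Information, entropy, and information gain can be sketched with base-2 logarithms, so the results are in bits. The helper names and example labels are hypothetical.

```python
import math
from collections import Counter


def information(p):
    """Self-information in bits: the rarer the event, the more information it carries."""
    return -math.log2(p)


def entropy(labels):
    """Entropy = expected information of the class label (in bits)."""
    n = len(labels)
    return sum(c / n * information(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy of the parent minus the size-weighted entropy of the children."""
    n = len(parent)
    return entropy(parent) - sum(len(child) / n * entropy(child) for child in children)


print(information(1 / 2), information(1 / 8))  # 1.0 3.0
parent = ["y"] * 5 + ["n"] * 5
print(entropy(parent))  # 1.0
print(information_gain(parent, [["y"] * 5, ["n"] * 5]))  # 1.0: a perfect split
print(information_gain(parent, [["y", "y", "y", "n"], ["y", "y", "n", "n", "n", "n"]]))  # smaller
```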
  • 12. Variance reduction • Introduced in CART, variance reduction is often employed in cases where the target variable is continuous (regression tree)
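For a continuous target, the same pattern works with variance in place of class impurity. A minimal sketch using the population variance from Python's `statistics` module; the function name and toy targets are hypothetical.

```python
from statistics import pvariance


def variance_reduction(parent, children):
    """Variance of the parent's targets minus the size-weighted variance of the children."""
    n = len(parent)
    return pvariance(parent) - sum(len(child) / n * pvariance(child) for child in children)


y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
print(variance_reduction(y, [y[:3], y[3:]]))    # ≈ 4.0: separates low and high targets
print(variance_reduction(y, [y[::2], y[1::2]]))  # much smaller for a poor split
```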
  • 13. Algorithm • How good is a split? → Impurity measure • How should we split? → Algorithm • ID3 • C4.5 • C5.0 • CART
  • 14. ID3 - algorithm • For every attribute a of the data set S, calculate the entropy that results from splitting S on a. • Partition ("split") the set S into subsets using the attribute for which the resulting entropy after splitting is minimized or, equivalently, for which the information gain is maximized. • Make a decision tree node containing that attribute. • Recurse on subsets using the remaining attributes.
  • 16. ID3 – stopping condition • Every element in the subset belongs to the same class; in which case the node is turned into a leaf node and labelled with the class of the examples. • There are no more attributes to be selected, but the examples still do not belong to the same class. In this case, the node is made a leaf node and labelled with the most common class of the examples in the subset. • There are no examples in the subset, which happens when no example in the parent set was found to match a specific value of the selected attribute. An example could be the absence of a person among the population with age over 100 years. Then a leaf node is created and labelled with the most common class of the examples in the parent node's set.
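The ID3 steps and the three stopping conditions can be combined into one recursive sketch for categorical attributes. It assumes each row is a dict and that `domains` lists every possible value of each attribute, so a branch can be empty (the third stopping condition). The dataset, including the never-observed value `"snow"`, is hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def information_gain(rows, labels, attr):
    """Entropy before the split minus the weighted entropy of each attribute-value subset."""
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [y for row, y in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / len(labels) * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attributes, domains, parent_labels=None):
    """Return a class label (leaf) or {attribute: {value: subtree}}."""
    if not rows:                    # no examples: most common class in the parent
        return majority(parent_labels)
    if len(set(labels)) == 1:       # all examples share one class
        return labels[0]
    if not attributes:              # attributes exhausted: most common class
        return majority(labels)
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted(domains[best]):  # one branch per possible value, seen or not
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              remaining, domains, labels)
    return {best: branches}


data = [
    ("sunny", "no", "no"), ("sunny", "yes", "no"),
    ("overcast", "no", "yes"), ("overcast", "yes", "yes"), ("overcast", "no", "yes"),
    ("rain", "no", "yes"), ("rain", "yes", "no"),
]
rows = [{"outlook": o, "windy": w} for o, w, _ in data]
labels = [play for _, _, play in data]
domains = {"outlook": {"sunny", "overcast", "rain", "snow"}, "windy": {"no", "yes"}}
tree = id3(rows, labels, ["outlook", "windy"], domains)
# "snow" never occurs in the data, so its leaf is the root's majority class ("yes").
print(tree)
```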
  • 17. C4.5 – Information gain ratio • A notable problem occurs when information gain is applied to attributes that can take on a large number of distinct values (e.g., a customer ID, which leads to overfitting). • Information gain ratio = information gain / intrinsic value • Intrinsic value (a penalty for splitting into many branches: the entropy of the split itself)
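The customer-ID problem can be reproduced on toy data. The sketch below computes both information gain and the gain ratio; the intrinsic value is simply the entropy of the attribute's own values. The records and the `segment` attribute are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def gain_and_ratio(attr_values, labels):
    """Return (information gain, gain ratio) for splitting on one categorical attribute."""
    groups = {}
    for value, label in zip(attr_values, labels):
        groups.setdefault(value, []).append(label)
    n = len(labels)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    intrinsic_value = entropy(attr_values)  # entropy of the partition itself
    return gain, (gain / intrinsic_value if intrinsic_value > 0 else 0.0)


labels = ["y", "y", "y", "y", "n", "n", "n", "y"]
customer_id = [101, 102, 103, 104, 105, 106, 107, 108]  # unique per record
segment = ["A", "A", "A", "A", "B", "B", "B", "B"]       # hypothetical binary attribute
print(gain_and_ratio(customer_id, labels))  # highest raw gain, but intrinsic value = 3 bits
print(gain_and_ratio(segment, labels))      # lower raw gain, higher gain ratio
```

The unique ID wins on raw information gain because every branch is pure, but dividing by its 3-bit intrinsic value ranks the binary attribute first.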
  • 18. C4.5 – Improvements over the ID3 algorithm
  • 19. C4.5 – Handling continuous attributes (mid-point thresholds)
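One reading of the mid-point approach: sort the distinct values of the continuous attribute, take the mid-points between neighbours as candidate thresholds, and keep the one with the highest information gain. This is a minimal sketch under that reading (C4.5 itself may differ in details such as how the final threshold is reported); names and data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def candidate_thresholds(values):
    """Mid-points between consecutive distinct sorted values."""
    xs = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(xs, xs[1:])]


def best_threshold(values, labels):
    """Mid-point t whose binary split (x <= t vs. x > t) maximizes information gain."""
    n = len(labels)

    def gain(t):
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        return entropy(labels) - len(left) / n * entropy(left) - len(right) / n * entropy(right)

    return max(candidate_thresholds(values), key=gain)


temperature = [60, 62, 65, 70, 75, 80]
play = ["yes", "yes", "yes", "no", "no", "no"]
print(candidate_thresholds(temperature))  # [61.0, 63.5, 67.5, 72.5, 77.5]
print(best_threshold(temperature, play))  # 67.5
```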
  • 20. Pruning • Pre-pruning / Post-pruning • Reduced error pruning • If removing a subtree does not change performance, prune it • Cost complexity pruning
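Reduced error pruning can be sketched bottom-up against a held-out validation set: a subtree is replaced by a leaf whenever doing so does not increase the validation error on the examples that reach it. The tree encoding (dicts with `attr`, `majority`, `branches`, where `majority` is assumed to be the training majority class at that node) and the toy data are hypothetical. The function rewrites the tree in place.

```python
def predict(node, row):
    """Leaves are class labels; internal nodes are dicts with attr / majority / branches."""
    while isinstance(node, dict):
        node = node["branches"].get(row[node["attr"]], node["majority"])
    return node


def errors(node, rows, labels):
    return sum(predict(node, row) != label for row, label in zip(rows, labels))


def reduced_error_prune(node, rows, labels):
    """Bottom-up: replace a subtree by its majority-class leaf when the
    validation error on the examples reaching it does not increase."""
    if not isinstance(node, dict):
        return node
    for value, child in list(node["branches"].items()):
        idx = [i for i, row in enumerate(rows) if row[node["attr"]] == value]
        node["branches"][value] = reduced_error_prune(
            child, [rows[i] for i in idx], [labels[i] for i in idx])
    leaf = node["majority"]
    if errors(leaf, rows, labels) <= errors(node, rows, labels):
        return leaf
    return node


tree = {"attr": "outlook", "majority": "yes", "branches": {
    "sunny": {"attr": "windy", "majority": "no", "branches": {"no": "no", "yes": "yes"}},
    "overcast": "yes",
}}
val_rows = [{"outlook": "sunny", "windy": "no"}, {"outlook": "sunny", "windy": "yes"},
            {"outlook": "overcast", "windy": "no"}]
val_labels = ["no", "no", "yes"]
print(reduced_error_prune(tree, val_rows, val_labels))
```

Here the `windy` subtree under `sunny` is collapsed to the leaf `"no"` (it made one validation error, the leaf makes none), while the root split is kept because collapsing it would add two errors.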
  • 21. C5.0
  • 22. CART • The basic concept is similar to C4.5 • Supports regression • Binary splits • Chooses attributes recursively • Classification – Gini impurity / Regression – Variance reduction
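CART's binary split on numeric features can be sketched as an exhaustive search over features and mid-point thresholds, scoring each candidate by the size-weighted Gini impurity of the two children. A minimal sketch; the function name and data are hypothetical.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_binary_split(X, y):
    """Exhaustive CART-style search over (feature, threshold) for the binary split
    x[j] <= t vs. x[j] > t that minimizes the size-weighted Gini impurity."""
    best = None
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best  # (weighted Gini, feature index, threshold); None if nothing can be split


X = [[2.0, 5.0], [3.0, 1.0], [1.0, 2.0], [4.0, 6.0]]
y = ["a", "b", "a", "b"]
print(best_binary_split(X, y))  # (0.0, 0, 2.5): feature 0 separates the classes perfectly
```

For a regression tree, replacing `gini` with the population variance of the targets turns the same search into variance reduction.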
  • 23. Ensemble • Bootstrap Aggregation (Bagging) ◦ Random Forest • Boosting ◦ AdaBoost ◦ Gradient Boosting
  • 24. Bootstrap Aggregation (Bagging) • Bootstrapping • Given a standard training set $D$ of size $n$, bagging generates $m$ new training sets $D_i$, each of size $n'$, by sampling from $D$ uniformly and with replacement. By sampling with replacement, some observations may be repeated in each $D_i$. If $n' = n$, then for large $n$ the set $D_i$ is expected to have the fraction $1 - 1/e$ (≈ 63.2%) of the unique examples of $D$, the rest being duplicates: $\lim_{n\to\infty}\left[1 - \left(1 - \frac{1}{n}\right)^{n}\right] = 1 - \frac{1}{e} \approx 0.632$ • Aggregation • This kind of sample is known as a bootstrap sample. Then, $m$ models are fitted using the above $m$ bootstrap samples and combined by averaging the output (for regression) or voting (for classification).
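The bootstrap and aggregation steps can be sketched in a few lines. The 63.2% figure is checked empirically on one bootstrap sample, and a toy threshold classifier (hypothetical, standing in for a decision tree) is bagged by majority vote.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly with replacement."""
    return [rng.choice(data) for _ in data]


def fit_stump(points):
    """Toy base learner (hypothetical): threshold t for 'predict 1 if x > t else 0'."""
    xs = sorted({x for x, _ in points})
    best_t, best_err = None, None
    for t in [xs[0] - 1] + xs:
        err = sum(int(x > t) != y for x, y in points)
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return lambda x: int(x > best_t)


def bagging(points, m, seed=0):
    """Fit m models on m bootstrap samples; aggregate by majority vote."""
    rng = random.Random(seed)
    models = [fit_stump(bootstrap_sample(points, rng)) for _ in range(m)]
    return lambda x: Counter(model(x) for model in models).most_common(1)[0][0]


# The fraction of unique examples in one bootstrap sample approaches 1 - 1/e ≈ 0.632.
n = 10_000
sample = bootstrap_sample(list(range(n)), random.Random(1))
print(len(set(sample)) / n)

points = [(x, int(x >= 5)) for x in range(10)]
vote = bagging(points, m=25)
print(vote(0), vote(9))  # 0 1
```

For regression, the vote would be replaced by an average of the models' outputs, for example with `statistics.fmean`.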
  • 25. Random Forest – random subspace • Random forests differ in only one way from this general scheme: they use a modified tree learning algorithm that selects, at each candidate split in the learning process, a random subset of the features. This process is sometimes called "feature bagging". • The reason for doing this is the correlation of the trees in an ordinary bootstrap sample: if one or a few features are very strong predictors for the response variable (target output), these features will be selected in many of the B trees, causing them to become correlated.
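A compact sketch of feature bagging, assuming a toy setup: each tree is grown on a bootstrap sample, and every split draws a fresh random subset of `max_features` features (here ⌊√p⌋, a common default for classification) before searching thresholds by Gini impurity. The depth limit, the dataset, and all function names are hypothetical; libraries such as scikit-learn implement the same idea far more efficiently.

```python
import math
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Best (score, feature, threshold) by weighted Gini, searching only `features`."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow_tree(X, y, max_features, depth, rng):
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    # Feature bagging: draw a fresh random subset of features at every split.
    features = rng.sample(range(len(X[0])), max_features)
    split = best_split(X, y, features)
    if split is None:  # the sampled features are constant in this node
        return majority
    _, j, t = split
    left = [i for i, row in enumerate(X) if row[j] <= t]
    right = [i for i, row in enumerate(X) if row[j] > t]
    return (j, t,
            grow_tree([X[i] for i in left], [y[i] for i in left], max_features, depth - 1, rng),
            grow_tree([X[i] for i in right], [y[i] for i in right], max_features, depth - 1, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def random_forest(X, y, n_trees=25, max_depth=3, seed=0):
    rng = random.Random(seed)
    max_features = max(1, int(math.sqrt(len(X[0]))))
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]  # bootstrap sample
        trees.append(grow_tree([X[i] for i in idx], [y[i] for i in idx],
                               max_features, max_depth, rng))
    return lambda x: Counter(predict_tree(tree, x) for tree in trees).most_common(1)[0][0]


data_rng = random.Random(7)
X, y = [], []
for i in range(40):
    signal = (i + 0.5) / 40
    X.append([signal, signal, data_rng.random(), data_rng.random()])  # 2 informative, 2 noise
    y.append(int(signal > 0.5))
forest = random_forest(X, y)
print(forest([0.05, 0.05, 0.5, 0.5]), forest([0.95, 0.95, 0.5, 0.5]))
```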
  • 26. Extra-Trees • Its two main differences from other tree-based ensemble methods are that it splits nodes by choosing cut-points fully at random and that it uses the whole learning sample (rather than a bootstrap replica) to grow the trees. • $n_{\min}$: the minimum sample size for splitting a node • Reduces computational cost while maintaining similar performance
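A sketch of how one Extra-Trees node might be split, following the description above: draw K candidate features at random, draw one cut-point uniformly between each feature's minimum and maximum in the node, and keep the best of these K random splits. Scoring with Gini impurity, and the parameter names `k` and `n_min`, are assumptions for illustration; the original method's score function may differ.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def extra_trees_split(X, y, k, n_min, rng):
    """Split one node Extra-Trees style; return (feature, cut_point) or None for a leaf."""
    if len(y) < n_min or len(set(y)) == 1:  # stop: too few samples or pure node
        return None
    # Only non-constant features can be split.
    candidates = [j for j in range(len(X[0])) if len({row[j] for row in X}) > 1]
    if not candidates:
        return None
    best = None
    for j in rng.sample(candidates, min(k, len(candidates))):
        lo = min(row[j] for row in X)
        hi = max(row[j] for row in X)
        cut = rng.uniform(lo, hi)  # cut-point drawn at random, no threshold search
        left = [label for row, label in zip(X, y) if row[j] < cut]
        right = [label for row, label in zip(X, y) if row[j] >= cut]
        if not left or not right:  # degenerate draw (e.g. cut == lo)
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if best is None or score < best[0]:
            best = (score, j, cut)
    return None if best is None else (best[1], best[2])


X = [[0.0, 7.0], [1.0, 7.0], [2.0, 7.0], [3.0, 7.0]]  # feature 1 is constant
y = [0, 0, 1, 1]
print(extra_trees_split(X, y, k=2, n_min=2, rng=random.Random(42)))  # feature 0, random cut
print(extra_trees_split(X, y, k=2, n_min=5, rng=random.Random(42)))  # None: fewer than n_min samples
```

Because no threshold search is done, each node costs K random draws instead of a sort and scan per feature, which is where the computational saving comes from.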
  • 27. Random Forest – Variable importance
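One common way to measure variable importance in a random forest is permutation importance: shuffle one feature column and record how much accuracy drops (mean decrease in impurity is another common choice; which one the slide uses is not stated in the text). A minimal sketch, assuming a fitted `predict` function is available; the toy model below is hypothetical.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when a single feature column is randomly shuffled."""
    rng = random.Random(seed)

    def accuracy(rows):
        return sum(predict(row) == target for row, target in zip(rows, y)) / len(y)

    baseline = accuracy(X)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(shuffled))
        importances.append(sum(drops) / n_repeats)
    return importances


def model(row):
    """Hypothetical fitted model that only looks at feature 0."""
    return int(row[0] > 0.5)


X = [[i / 10, (7 * i % 10) / 10] for i in range(10)]
y = [int(row[0] > 0.5) for row in X]
print(permutation_importance(model, X, y))  # feature 0 > 0, feature 1 == 0.0
```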
  • 28. Boosting Boosting algorithms consist of iteratively learning weak classifiers with respect to a distribution and adding them to a final strong classifier. When they are added, they are typically weighted in some way that is usually related to the weak learners' accuracy.
  • 31. AdaBoost • error ↓ → α ↑ • α is a logit function of the error, which diverges when the error is 0 or 1
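To make the update rules concrete, here is a minimal AdaBoost sketch with decision stumps as weak learners and labels in {−1, +1}. It assumes the convention $\alpha_m = \ln((1 - \epsilon_m)/\epsilon_m)$ with weights multiplied by $\exp(\alpha_m)$ on misclassified points, as in Bishop's PRML, Section 14.3, whose equation numbers the following slides appear to cite. The Wikipedia article uses $\alpha_m = \tfrac{1}{2}\ln((1 - \epsilon_m)/\epsilon_m)$ instead, which halves every $\alpha_m$ and leaves the sign of the weighted vote unchanged. Clipping $\epsilon_m$ away from 0 and 1 is an added safeguard, and all names are hypothetical.

```python
import math


def fit_stump(X, y, w):
    """Weighted decision stump: minimizes sum_n w_n * [h(x_n) != y_n]."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for s in (1, -1):
                err = sum(wn for wn, row, yn in zip(w, X, y) if (s if row[j] > t else -s) != yn)
                if best is None or err < best[0]:
                    best = (err, j, t, s)
    _, j, t, s = best
    return lambda x: s if x[j] > t else -s


def adaboost(X, y, n_rounds=10):
    """Labels y must be -1/+1. Returns the weighted-vote classifier."""
    w = [1.0 / len(y)] * len(y)
    learners = []
    for _ in range(n_rounds):
        h = fit_stump(X, y, w)
        miss = [h(row) != yn for row, yn in zip(X, y)]
        eps = sum(wn for wn, m in zip(w, miss) if m) / sum(w)
        eps = min(max(eps, 1e-10), 1 - 1e-10)  # alpha diverges at eps = 0 or 1
        alpha = math.log((1 - eps) / eps)      # error down -> alpha up
        w = [wn * math.exp(alpha * m) for wn, m in zip(w, miss)]  # upweight mistakes
        learners.append((alpha, h))
    return lambda x: 1 if sum(a * h(x) for a, h in learners) >= 0 else -1


X = [[1], [2], [3], [4], [5], [6]]
y = [1, 1, -1, -1, 1, 1]  # no single stump can fit this pattern
clf = adaboost(X, y, n_rounds=3)
print([clf(row) for row in X])  # [1, 1, -1, -1, 1, 1]
```

With uniform weights the best single stump still misclassifies two of the six points; three rounds of reweighting combine three stumps into a perfect fit on this toy data.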
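  • 36. AdaBoost • Assume that the models up to the (m − 1)-th have already been built and that we are adding the m-th weak learner; optimization is carried out only with respect to $\alpha_m$ and $y_m$. • $\mathcal{T}_m$: points correctly classified by $y_m$ / $\mathcal{M}_m$: points misclassified by $y_m$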
  • 37. AdaBoost • Minimize (14.23) with respect to $y_m$, treating $\alpha_m$ as a constant → yields Eq. (14.15)
  • 38. AdaBoost • Minimize (14.23) with respect to $\alpha_m$ → yields Eq. (14.17) https://en.wikipedia.org/wiki/AdaBoost
  • 39. AdaBoost • $\exp(-\alpha_m/2)$ is independent of $n$, so it can be dropped
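The derivation summarized on slides 36–39 can be written out as follows. This reconstruction assumes the equation numbers refer to Bishop's *Pattern Recognition and Machine Learning*, Section 14.3 (exponential error with targets $t_n \in \{-1, +1\}$); the notation follows that source rather than the slides' original images.

```latex
E = \sum_{n=1}^{N} \exp\{-t_n f_m(\mathbf{x}_n)\},
\qquad f_m(\mathbf{x}) = \tfrac{1}{2}\sum_{l=1}^{m} \alpha_l \, y_l(\mathbf{x})

% Fix learners 1..m-1 and absorb them into w_n^{(m)} = \exp\{-t_n f_{m-1}(\mathbf{x}_n)\}:
E = \sum_{n} w_n^{(m)} \exp\{-\tfrac{1}{2} t_n \alpha_m y_m(\mathbf{x}_n)\}
  = e^{-\alpha_m/2} \sum_{n \in \mathcal{T}_m} w_n^{(m)} + e^{\alpha_m/2} \sum_{n \in \mathcal{M}_m} w_n^{(m)}

% (14.23)
E = \left(e^{\alpha_m/2} - e^{-\alpha_m/2}\right) \sum_{n} w_n^{(m)} I(y_m(\mathbf{x}_n) \neq t_n)
  + e^{-\alpha_m/2} \sum_{n} w_n^{(m)}

% Minimizing over y_m with \alpha_m > 0 held fixed gives (14.15):
J_m = \sum_{n} w_n^{(m)} I(y_m(\mathbf{x}_n) \neq t_n)

% Setting \partial E / \partial \alpha_m = 0 gives (14.17):
\alpha_m = \ln\frac{1 - \epsilon_m}{\epsilon_m},
\qquad \epsilon_m = \frac{\sum_n w_n^{(m)} I(y_m(\mathbf{x}_n) \neq t_n)}{\sum_n w_n^{(m)}}

% Weight update, using t_n y_m(\mathbf{x}_n) = 1 - 2 I(y_m(\mathbf{x}_n) \neq t_n):
w_n^{(m+1)} = w_n^{(m)} \exp\{-\tfrac{1}{2} t_n \alpha_m y_m(\mathbf{x}_n)\}
            = w_n^{(m)} \exp(-\alpha_m/2) \exp\{\alpha_m I(y_m(\mathbf{x}_n) \neq t_n)\}
```

Because $\exp(-\alpha_m/2)$ multiplies every weight by the same factor, it changes neither $\epsilon_{m+1}$ nor the minimizer of $J_{m+1}$, which is why it can be dropped from the update.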