DA 5230 – Statistical & Machine Learning
Lecture 9 – Decision Trees and Ensemble
Methods
Maninda Edirisooriya
manindaw@uom.lk
Decision Tree (DT)
• A tree-like ML modelling structure
• Each internal node tests a categorical feature, and its branches correspond to the classes of that feature
• During prediction, a data point starts at the root and is passed down until it reaches a leaf
• The leaf node decides the final prediction
[Figure: "Cardiovascular Disease Predictor" example tree. The root node tests BMI > 80; internal nodes test Age > 35, Smoking, Vegetarian and Exercise along True/False branches; each leaf node predicts Disease or Healthy.]
Decision Trees
• Suppose you have a binary classification problem with 3 independent
binary categorical variables X1, X2, X3, and 1 dependent variable Y
• You can draw a decision tree starting from one of the X variables
• If this X variable cannot classify the training dataset perfectly, add
another X variable as a child node on the tree branches where there
are misclassifications (i.e. branches that are not Pure)
• If some branches still contain misclassifications after adding the
second X variable, you can add the third X variable as well, OR you
can use the third variable as the second node from the root
Decision Trees
You will be able to draw several such trees, depending on the
training set and the X variables (note that outputs are not shown here)
[Figure: several example tree shapes over X1, X2 and X3 — X1 at the root with X2 and X3 as children (depth 1), X1 with a chain of X2 and X3 below it (depth 2), and a tree with X1 alone at the root.]
Optimizing Decision Trees
• In order to find the maximum Purity of classification, you would have
to try many different decision trees
• As the number of parameters (nodes and their classes) differs
between decision trees, there is no direct optimization algorithm to
minimize the error (or impurity)
• Known algorithms for finding the globally optimal Decision Tree are
computationally expensive (the problem is NP-hard)
• Therefore, heuristic techniques are used to get good performance
out of Decision Trees
CART Algorithm
• CART (Classification And Regression Tree) is one of the best-known
heuristic Decision Tree algorithms
• There are 2 key decisions to be taken in the CART algorithm:
1. Which X variable should be selected to split on at each node?
2. What is the stopping criterion for splitting?
• Decision 1 is taken on the basis of maximizing the Purity of
classification at the selected node
• Decision 2 is based on either,
• The impurity reduction achieved by adding new nodes
• The increased computational/memory complexity of new nodes
Stopping Criteria of Splitting
Splitting a node further (leaving it as a leaf node) can be stopped by one
of the following criteria
• All the data in the current node belong to a single Y class
• Adding a new node would exceed the maximum depth of the tree
• The impurity reduction is less than a pre-defined threshold
• The number of data points in the current node is less than a
pre-defined threshold
Adding a New Node (Splitting)
• A new node is added to a tree node only when that branch has data
belonging to more than one Y class (i.e. when there is impurity)
• When a new node is added, the total impurity of the new node's
branches should be less than that of the current node
• Therefore, the split that reduces the impurity (i.e. increases the
purity) as much as possible is selected (see the sketch after this list)
• There are mainly 3 measures used to evaluate the impurity reduction,
1. Gini Index
2. Entropy
3. Variance (in Regression Trees)
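As a rough illustration of how a candidate split is scored, the sketch below (Python, with hypothetical helper names) computes the impurity reduction: the parent node's impurity minus the size-weighted impurity of its two branches. Any of the measures above can be plugged in as the impurity function.

```python
# A rough sketch (hypothetical names): score a candidate split as the
# parent's impurity minus the size-weighted impurity of its branches.
# `impurity` can be any node-level measure, e.g. the Gini Index or
# Entropy defined on the next slides.
def split_impurity_reduction(parent_labels, left_labels, right_labels, impurity):
    n = len(parent_labels)
    weighted_children = (
        len(left_labels) / n * impurity(left_labels)
        + len(right_labels) / n * impurity(right_labels)
    )
    return impurity(parent_labels) - weighted_children
```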
Gini Index
• Gini Index (or Gini Impurity) is a well-known measure of how well a
node discriminates between classes, based on class frequencies
• Gini Impurity is defined as,

Gini Impurity = 1 − Σᵢ Pᵢ²  (sum over the C classes)

• Where,
• C is the number of Y classes and
• Pᵢ is the data proportion of the iᵗʰ class:

Pᵢ = (Number of data points in the iᵗʰ class of the node) / (Total number of data points in the node)
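A minimal sketch of this formula in Python (illustrative, not from the lecture):

```python
from collections import Counter

def gini_impurity(labels):
    """Gini Impurity = 1 - sum of P_i^2 over the classes in the node."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini_impurity(["Disease"] * 10))                   # 0.0 (pure node)
print(gini_impurity(["Disease"] * 5 + ["Healthy"] * 5))  # 0.5 (50/50 node)
```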
Entropy
• Entropy is a measure of randomness or chaos in a system
• Entropy is defined as,

Entropy = H = − Σᵢ Pᵢ log₂(Pᵢ)  (sum over the C classes)

• Where,
• C is the number of Y classes and
• Pᵢ is the data proportion of the iᵗʰ class:

Pᵢ = (Number of data points in the iᵗʰ class of the node) / (Total number of data points in the node)

• The sum Σᵢ Pᵢ log₂(Pᵢ) is never positive, so the leading minus sign
makes the entropy value non-negative
• For 100% purely classified nodes, the entropy is zero
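A matching sketch for entropy (illustrative, not from the lecture):

```python
import math
from collections import Counter

def entropy(labels):
    """H = -sum of P_i * log2(P_i) over the classes in the node."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["Disease"] * 5 + ["Healthy"] * 5))  # 1.0 (maximum for 2 classes)
print(entropy(["Disease"] * 10))                   # zero for a pure node (prints -0.0)
```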
Gini Impurity and Entropy vs. Proportion
Source: https://zerowithdot.com/decision-tree/
Classification Geometry
• Unlike many other classifiers (e.g. the Logistic Classifier), Decision
Trees have linear decision boundaries (hyperplanes) that are
perpendicular to the feature axes
• This makes it difficult for a DT to define diagonal decision
boundaries
• But this simplicity makes the algorithm faster
[Figure: the Age (X1) vs. BMI (X2) plane partitioned into rectangular regions by the axis-aligned splits Age > 35 and BMI > 80.]
Convert Continuous Features to Categorical
• Some of the X variables (e.g. BMI) can be continuous
• They have to be converted to categorical variables before being
applied to DTs
• To convert a continuous variable to a binary categorical variable,
• Consider all the possible splits, using each data point's value as a candidate split point
• Calculate the total (weighted) entropy for each candidate
• Select the candidate with the least entropy as the splitting point
• Encode all the data values with the new binary categorical variable
• Now you can apply this new feature to DTs (a sketch of this search follows)
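The following sketch (Python, hypothetical function names) walks through the procedure above: it tries every observed value as a threshold and keeps the one with the lowest size-weighted child entropy.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split_point(values, labels):
    """Try each observed value as a threshold; keep the one whose
    size-weighted child entropy is lowest."""
    best_threshold, best_score = None, float("inf")
    for threshold in sorted(set(values)):
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        if not left or not right:
            continue  # a split must produce two non-empty branches
        n = len(labels)
        score = len(left) / n * entropy(left) + len(right) / n * entropy(right)
        if score < best_score:
            best_threshold, best_score = threshold, score
    return best_threshold

# E.g. BMI values with disease labels: yields 31, i.e. a binary feature "BMI > 31"
bmi = [22, 27, 31, 36, 41, 45]
y = ["Healthy", "Healthy", "Healthy", "Disease", "Disease", "Disease"]
print(best_split_point(bmi, y))  # 31
```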
Bias-Variance Metrics of DT
• With a sufficient number of X variables, a DT can classify the training
set almost purely (with close to 100% accuracy)
• But such a DT may not fit the test data well
• Therefore, such a DT is generally considered a High Variance
(overfitting) and Low Bias ML algorithm
• However, we can increase regularization and build much smaller DTs
that have Lower Variance (which may somewhat increase the Bias)
Decision Tree Regularization
Following are some of the regularization techniques used to reduce a DT's high
variance (the sketch after this slide shows how they map to common library hyperparameters)
1. Having a minimum limit on data points per node – avoids adding new nodes
just to classify a small amount of data
2. Having a maximum depth – avoids growing large, overfitting trees
3. Having a maximum number of nodes – avoids growing large, overfitting trees
4. Having a minimum decrease in loss – avoids adding new nodes for only a small
purity improvement
5. Pruning the tree for misclassifications against a validation data set (a special
test set) – avoids keeping large, overfitting trees
• However, the variance can be hugely reduced when many different DTs are used
together
• This is known as building Ensemble Models
• This is practical because the computation cost of a single DT is very small, due to its simplicity
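As an illustration, techniques 1–5 correspond roughly to these scikit-learn DecisionTreeClassifier hyperparameters (values are arbitrary; ccp_alpha cost-complexity pruning is the library's closest analogue of validation-set pruning):

```python
from sklearn.tree import DecisionTreeClassifier

# Values are arbitrary, for illustration only
tree = DecisionTreeClassifier(
    min_samples_split=20,        # 1. minimum data points required to split a node
    max_depth=5,                 # 2. maximum depth of the tree
    max_leaf_nodes=32,           # 3. maximum number of leaf nodes
    min_impurity_decrease=0.01,  # 4. minimum decrease in impurity per split
    ccp_alpha=0.001,             # 5. cost-complexity pruning strength
)
```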
Ensemble Methods
• Ensemble methods combine multiple ML models to produce a
stronger model than any of its individual constituent models
• They leverage the concept of the “Wisdom of the Crowd”, where the
collective decision making of a group produces more accurate
decisions than any individual person
• There are several main types of ensemble models
1. Bagging
2. Boosting
3. Stacking (combining heterogenous ML algorithms)
Bootstrapping
• Bootstrapping is a resampling technique used in statistics and ML
• The idea is to treat the dataset as a data distribution, from which
every sample is drawn randomly with replacement
• “With replacement” means that when a data point is drawn from the
distribution into the sample, the same data point remains available in the
distribution to be drawn again
• In other words, a sample taken from the training dataset can contain
multiple copies of the same data point
• This technique helps to increase the amount of training data without
actually collecting more data (a sketch follows)
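A minimal sketch of bootstrapping with NumPy:

```python
import numpy as np

rng = np.random.default_rng(seed=0)
dataset = np.array([10, 20, 30, 40, 50])

# Same size as the dataset, drawn WITH replacement: some points can
# appear multiple times while others are left out entirely
sample = rng.choice(dataset, size=len(dataset), replace=True)
print(sample)  # e.g. [50 30 30 10 40]
```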
Bootstrapping – Example
Source: https://dgarcia-eu.github.io/SocialDataScience/2_SocialDynamics/025_Bootstrapping/Bootstrapping.html
Bagging
• Bagging stands for Bootstrap Aggregating
• In this ensemble method, multiple models are built, each trained on
Bootstrapped data from the original training dataset
• As all the resultant models have similar predictive power, their
outputs are averaged (aggregated) to get a prediction
• When it is a classification problem, voting is used for the
aggregation (see the sketch below)
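A hand-rolled sketch of Bagging for classification (hypothetical function name; assumes integer-encoded class labels):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X_train, y_train, X_test, n_models=25, seed=0):
    """Train each tree on a bootstrap sample; aggregate by majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X_train), size=len(X_train))  # with replacement
        model = DecisionTreeClassifier().fit(X_train[idx], y_train[idx])
        votes.append(model.predict(X_test))
    # Majority vote across the models for each test point
    return np.array([np.bincount(col).argmax() for col in np.array(votes).T])

X, y = load_iris(return_X_y=True)
print(bagging_predict(X, y, X[:5]))  # e.g. [0 0 0 0 0]
```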
Random Forest
• Random Forest uses a modified version of the Bagging algorithm with
Decision Trees
• Instead of using all the X variables for every model in the ensemble,
Random Forest selects a smaller random subset of the available X variables
• For a large number of X variables this is generally √(Number of X variables)
• This algorithm has significantly less variance, with almost no increase
in bias, compared to an individual DT
• Random Forest can be used with unscaled data and gives faster results
• Random Forest is used as a way of doing Feature Selection as well (see the sketch below)
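A minimal sketch using scikit-learn's RandomForestClassifier on a toy dataset; max_features="sqrt" applies the √(number of X variables) heuristic, and feature_importances_ supports the feature-selection use mentioned above:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)

# max_features="sqrt" applies the sqrt(number of X variables) heuristic
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X, y)

# Per-feature importance scores, usable for Feature Selection
print(forest.feature_importances_)
```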
Random Forest – Example
Source: https://www.javatpoint.com/machine-learning-random-forest-algorithm
Boosting
• Though DTs are generally considered high variance ML algorithms,
it is possible to build highly regularized DTs that are low in variance but
higher in bias
• It was found that combining many such high bias DTs can produce a low
bias ensemble model, with very little increase in variance; this technique is
known as Boosting
• There are many Boosting algorithms, such as AdaBoost, Gradient
Boosting Machines (GBM), LightGBM and XGBoost
XGBoost
• Like Bagging, XGBoost also samples data for each individual
DT by Bootstrapping
• But unlike in Bagging, in XGBoost each new DT is generated
sequentially, after evaluating the earlier DT model on the data
• When selecting data to train a new DT, data points that the earlier
DTs failed to classify are given higher priority
• The idea is to generate new DTs that classify the data points that
previous DTs could not
• XGBoost has even more advanced features for tuning in its
implementation than Random Forest (see the sketch below)
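A minimal sketch, assuming the xgboost Python package is installed; the hyperparameter values are illustrative only:

```python
from sklearn.datasets import load_iris
from xgboost import XGBClassifier  # assumes the xgboost package is installed

X, y = load_iris(return_X_y=True)

model = XGBClassifier(
    n_estimators=200,   # trees are added sequentially
    max_depth=4,        # each tree is kept shallow (regularized)
    learning_rate=0.1,  # how strongly each new tree corrects the earlier ones
    subsample=0.8,      # fraction of the training data sampled for each tree
)
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```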
XGBoost – Example
Source: https://www.geeksforgeeks.org/xgboost/
Decision Tree - Advantages
• DT ensembles are very fast to train compared to alternatives like
Neural Networks
• Feature scaling does not significantly impact learning
performance in DT ensemble models
• Smaller DT ensembles have higher interpretability
• They help with Feature Selection
• There are fewer hyperparameters to tune compared to Neural
Networks
Decision Tree - Disadvantages
• DT ensembles cannot learn as deep insights from data as Neural
Networks can
• DTs and DT ensembles are not as capable of Transfer Learning
(transferring the knowledge learnt by one larger, generic model to a
new one)
One Hour Homework
• Officially we have one more hour of work after the end of the lecture
• Therefore, for this week's extra hour you have homework
• DT ensembles are actually the most widely used ML algorithms in
competitions, especially with non-pre-processed datasets
• As Random Forest and XGBoost can work well on the first attempt, it is very
important to practice them with real world datasets
• On the other hand, these algorithms can be used as feature selection
algorithms
• Good Luck!
Questions?