Lesson 17
Classification Techniques – Decision Trees
Kush Kulshrestha
Decision Tree Algorithm
A decision tree is a flowchart-like tree structure in which each internal node represents a feature (or attribute), each branch represents a decision rule, and each leaf node represents an outcome.
The topmost node in a decision tree is known as the root node. The algorithm learns to partition the data on the basis of attribute values, splitting it recursively in a process called recursive partitioning. This flowchart-like structure helps in decision making.
Decision Tree Algorithm
A decision tree is a white-box type of ML algorithm: it exposes its internal decision-making logic, which makes it one of the most interpretable algorithms in machine learning.
Its training time is faster than that of neural network algorithms. The decision tree is a distribution-free, or non-parametric, method that does not depend on probability-distribution assumptions, and decision trees can handle high-dimensional data with good accuracy.
Tree-based learning algorithms are considered among the best and most widely used supervised learning methods. Tree-based methods give predictive models high accuracy, stability, and ease of interpretation, and unlike linear models they map non-linear relationships quite well. A small illustration of this white-box behaviour follows.
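As a minimal sketch only (scikit-learn and its bundled Iris dataset are assumptions; the slides name no specific library or data), the following trains a tree and prints its learned decision rules, which is exactly the internal logic that makes the model a white box:

```python
# Minimal sketch: a decision tree as a "white box" model.
# scikit-learn and the Iris dataset are assumptions, not from the slides.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
clf.fit(X_train, y_train)

print("Test accuracy:", clf.score(X_test, y_test))
# export_text prints the learned if/else rules, exposing the
# internal decision-making logic that makes trees interpretable.
print(export_text(clf, feature_names=load_iris().feature_names))
```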
Common Terms
• Root Node: Represents the entire population or sample; it is further divided into two or more homogeneous sets.
• Splitting: The process of dividing a node into two or more sub-nodes.
• Decision Node: A sub-node that splits into further sub-nodes.
• Leaf / Terminal Node: A node that does not split.
• Pruning: Removing the sub-nodes of a decision node; the opposite of splitting.
• Branch / Sub-Tree: A sub-section of the entire tree.
• Parent and Child Node: A node that is divided into sub-nodes is the parent of those sub-nodes, and the sub-nodes are its children.
Working of the Algo
1. Select the best attribute using an Attribute Selection Measure (ASM) to split the records.
2. Make that attribute a decision node and break the dataset into smaller subsets.
3. Build the tree by repeating this process recursively for each child until one of these conditions is met (see the sketch after this list):
1. All the tuples belong to the same class (the same value of the target attribute).
2. There are no remaining attributes.
3. There are no remaining instances.
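A schematic sketch of this recursive loop follows. Here `best_attribute` (which would wrap an ASM such as information gain) and `majority_class` are hypothetical helpers invented for illustration; they are not part of the original slides:

```python
# Schematic sketch of recursive partitioning (ID3-style).
# `best_attribute` (an ASM such as information gain) and
# `majority_class` are hypothetical helpers, not from the slides.
def build_tree(rows, attributes, target):
    if not rows or not attributes:       # conditions 2 and 3: stop growing
        return majority_class(rows)      # fall back to the majority label
    labels = {row[target] for row in rows}
    if len(labels) == 1:                 # condition 1: all one class
        return labels.pop()
    best = best_attribute(rows, attributes, target)   # step 1: apply the ASM
    node = {best: {}}                                 # step 2: decision node
    for value in {row[best] for row in rows}:         # step 3: recurse per branch
        subset = [row for row in rows if row[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, target)
    return node
```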
Attribute Selection Measures
An attribute selection measure is a heuristic for choosing the splitting criterion that partitions the data in the best possible way.
ASMs are also known as splitting rules because they determine breakpoints for the tuples at a given node.
An ASM assigns a score to each feature (or attribute) with respect to the given dataset; the attribute with the best score is selected as the splitting attribute.
The most popular selection measures are Information Gain and the Gini Index.
Information Gain
• Shannon introduced the concept of entropy in information theory; it measures the impurity of the input set. In physics and mathematics, entropy refers to randomness or disorder in a system; in information theory, it refers to the impurity in a group of examples.
• Information gain is the decrease in entropy.
• Information gain computes the difference between the entropy before a split and the average entropy after the split of the dataset, based on the given attribute's values.
• The ID3 (Iterative Dichotomiser) decision tree algorithm uses information gain.
• A less impure node requires less information to describe it; a more impure node requires more. Information theory defines this degree of disorganization in a system as entropy.
• If the sample is completely homogeneous, the entropy is zero; if the sample is equally divided (50% - 50%), the entropy is one.
• For a two-class node, entropy can be calculated with the formula (implemented in the sketch below):
Entropy = -p log2(p) - q log2(q),
where p and q are the probabilities of success and failure, respectively, in that node.
• Entropy is also used with a categorical target variable. The algorithm chooses the split with the lowest entropy relative to the parent node and the other candidate splits; the lower the entropy, the better.
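A minimal sketch of this two-class entropy formula, confirming the boundary cases stated above (0 for a completely homogeneous node, 1 for a 50/50 node):

```python
import math

def entropy(p):
    """Two-class entropy: -p*log2(p) - q*log2(q), with q = 1 - p.
    By convention 0*log2(0) = 0, so pure nodes have entropy 0."""
    q = 1.0 - p
    return -sum(x * math.log2(x) for x in (p, q) if x > 0)

print(entropy(1.0))   # 0.0   -> completely homogeneous node
print(entropy(0.5))   # 1.0   -> equally divided (50% - 50%) node
print(entropy(0.2))   # ~0.72 -> somewhere in between
```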
To build a decision tree, we need to calculate two types of entropy using frequency tables:
1) Entropy using the frequency table of one attribute (the target on its own): E(S) = sum over classes i of -p_i log2(p_i), where p_i is the proportion of examples in class i.
2) Entropy using the frequency table of two attributes (the target against a candidate feature X): E(T, X) = sum over values c of X of P(c) * E(c), i.e., the weighted average of the entropies of the branches created by X.
The information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
Step 1: Calculate the entropy of the target.
Step 2: Split the dataset on each candidate attribute and calculate the entropy of each resulting branch. Add the branch entropies proportionally to get the total entropy of the split, then subtract this from the entropy before the split. The result is the information gain, or decrease in entropy. (The sketch after Step 4 implements this computation.)
In symbols, Gain(T, X) = E(T) - E(T, X).
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
Step 4(a): A branch with entropy of 0 is a leaf node.
Step 4(b): A branch with entropy more than 0 needs further splitting.
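Putting Steps 1 and 2 together, here is a minimal sketch of the gain computation. The tiny `rows` dataset and its column names are hypothetical, invented purely for illustration:

```python
from collections import Counter
import math

def entropy_of_labels(labels):
    """E(S) = sum over classes of -p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(T, X) = E(T) - E(T, X): entropy before the split minus the
    weighted average entropy of the branches created by `attribute`."""
    base = entropy_of_labels([row[target] for row in rows])   # Step 1
    after = 0.0
    for value in {row[attribute] for row in rows}:            # Step 2
        branch = [row[target] for row in rows if row[attribute] == value]
        after += len(branch) / len(rows) * entropy_of_labels(branch)
    return base - after

# Hypothetical toy data (not from the slides):
rows = [
    {"outlook": "sunny", "play": "no"},
    {"outlook": "sunny", "play": "no"},
    {"outlook": "rainy", "play": "yes"},
    {"outlook": "rainy", "play": "yes"},
]
print(information_gain(rows, "outlook", "play"))  # 1.0: a perfect split
```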
Gini Index
Steps to calculate Gini for a split:
1. Calculate Gini for each sub-node using the formula: the sum of the squares of the probabilities of success and failure (p² + q²).
2. Calculate the Gini for the split as the weighted Gini score of the nodes of that split.
Consider an example where we want to segregate students based on the target variable (playing cricket or not). In the snapshot below, we split the population using two input variables, Gender and Class, and want to identify which split produces more homogeneous sub-nodes according to the Gini index.
Split on Gender:
• Gini for sub-node Female = (0.2)*(0.2)+(0.8)*(0.8)=0.68
• Gini for sub-node Male = (0.65)*(0.65)+(0.35)*(0.35)=0.55
• Weighted Gini for Split Gender = (10/30)*0.68+(20/30)*0.55 = 0.59
Similarly, for the split on Class:
• Gini for sub-node Class IX = (0.43)*(0.43)+(0.57)*(0.57)=0.51
• Gini for sub-node Class X = (0.56)*(0.56)+(0.44)*(0.44)=0.51
• Weighted Gini for Split Class = (14/30)*0.51+(16/30)*0.51 = 0.51
Gini score for the split on Gender: 0.59
Gini score for the split on Class: 0.51
The Gini score for the split on Gender is higher than for the split on Class, and with this formulation (p² + q²) a higher score means purer sub-nodes, so the node will be split on Gender. The sketch below reproduces this arithmetic.
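A minimal check on the numbers above. Note, as an aside, that CART and scikit-learn conventionally use the impurity form 1 - (p² + q²), where lower is better; the slides use the complementary p² + q² form, where higher is better:

```python
# Reproducing the slide's Gini arithmetic (p^2 + q^2 formulation,
# where a HIGHER score means a purer node).
def gini(p):
    q = 1.0 - p
    return p * p + q * q

female, male = gini(0.2), gini(0.65)          # 0.68, ~0.55
gender = (10 / 30) * female + (20 / 30) * male
class_ix, class_x = gini(0.43), gini(0.56)    # ~0.51 each
klass = (14 / 30) * class_ix + (16 / 30) * class_x
print(round(gender, 2), round(klass, 2))      # 0.59 0.51 -> split on Gender
```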
Constraints on Tree Size
1. Minimum samples for a node split
• Defines the minimum number of samples (or observations) required in a node for it to be considered for splitting.
• Used to control over-fitting: higher values prevent the model from learning relations that might be highly specific to the particular sample selected for a tree.
• Values that are too high can lead to under-fitting.
2. Minimum samples for a terminal node
• Defines the minimum number of samples (or observations) required in a terminal node, or leaf.
• Used to control over-fitting, similar to min_samples_split.
• Generally, lower values should be chosen for imbalanced-class problems, because the regions in which the minority class is in the majority will be very small.
3. Maximum depth of tree
• The maximum depth of the tree.
• Used to control over-fitting, as a higher depth allows the model to learn relations very specific to a particular sample.
4. Maximum number of terminal nodes
• The maximum number of terminal nodes, or leaves, in the tree.
• Can be defined in place of max_depth. Since binary trees are created, a depth of n produces a maximum of 2^n leaves.
5. Maximum features to consider for a split
• The number of features to consider while searching for the best split; these are selected at random.
• As a rule of thumb, the square root of the total number of features works well, but it is worth checking up to 30-40% of the total number of features.
• Higher values can lead to over-fitting, but this depends on the case. The sketch below maps these five constraints onto common library parameters.
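As a sketch only, the five constraints map naturally onto scikit-learn's DecisionTreeClassifier parameters (scikit-learn is an assumption; the slides name the concepts, not a library, and the values below are arbitrary examples):

```python
# Sketch: the five tree-size constraints as scikit-learn parameters.
# scikit-learn is an assumption; the numeric values are arbitrary examples.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    min_samples_split=20,   # 1. minimum samples for a node split
    min_samples_leaf=5,     # 2. minimum samples for a terminal node
    max_depth=6,            # 3. maximum depth of the tree
    max_leaf_nodes=40,      # 4. maximum number of terminal nodes
    max_features="sqrt",    # 5. features considered per split
    random_state=0,
)
# clf.fit(X_train, y_train) would then grow a tree subject to these limits.
```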
