SlideShare a Scribd company logo
1 of 12
Download to read offline
Machine
Learning - IV
Decision Trees - II
Tree Algorithms:
For Categorical target variable
1. Gini is the most widely used splitting criterion.
It gives the probability that 2 times chosen at random from the same
population are in the same class.
For a pure population. The probability is 1
#reds=2 blue=0 #reds=7 blue=10
#prop. of reds=1 #prop. of reds=7/17=.41
#prop. of blue=0 #prop. of blue=10/17=.58
Gini =1^2 + 0^2=1 Gini =.41^2 + .58^2=.504
A
Rupak Roy
Tree Algorithms:
For Categorical target variable
#reds=10 blue= 2 #reds=2 blue=10
#prop. of reds=10/12= .83 #prop. of reds=.166
#prop. of blue= 2/12= .166 #prop. of blue=.83
Gini =.83^2 + .17^2=.71 Gini =.17^2 + .83^2=.71
Gini Score for split
A: (1*2/19)+(.50*10/19)=.48 B: (.71*12/24)+(.71*12/24)=.71
Higher the Gini score the better the model is. So higher Gini score will be
chosen by Gini method. It is the default for Decision Trees
B
Rupak Roy
Tree Algorithms:
Categorical target variable
2. Information Gain
Before applying Information Gain lets understand what is Logarithm.
What is the log(10,000)? = 4
10,000 = 10 x 10 x 10 X 10 =(10)4
#of reds(8), blue(4) #of reds(4), blue(8)
#Prop. of reds(.7) Blue(.3) #Prop. of reds(.3) Blue(.7)
#Entropy of the node1 #Entropy of the node2
= -1 *(.7log 2 (.7) + .3log 2(.3)) = -1 *(.3log 2 (.3) + .7log 2(.7))
A
Tree Algorithms:
Categorical target variable
We can repeat the the same for B
And assume the entropy for the split (B) = entropy of node1+node2 =.81
Then we will compute information gain for B
Information gain for (B) = Entropy (parent node) – Entropy (split)
i.e. Information gain = 1 – entropy of the split = 1-.81 =0.19
Higher the Entropy score the better the model is. Finally the information
gain will choose the higher entropy score.
Entropy is a measure on how disorganized the systems is.
Entropy ranges from 0 to1
Pure node has an Entropy of 0 while impure node has Entropy of 1
B
Tree Algorithms:
Categorical target variable
3. Chi-Square: is a test of statistical significance developed by Karl
Pearson
Chi- Square = square root of (actual –expected)2 / Expected
Again the highest chi-square score will be selected.
Rupak Roy
Tree Algorithms:
Continuous target variable
4. Reduction in Variance
Variance measures how far each number in the set is from the mean. In
simple words variance the is fact or quality of being different, divergent,
or inconsistent.
A low variance refers most values
are close to the mean.
A high variance refers most values
are far from the mean.
Varaince = where,
So the reduction in variance split criterion is specially designed for
target variable having continuous/ numeric data type.
Pure node variance is 0,and like before the highest score will be
selected.
Over fitting & Tree Pruning
A fully grown tree tends to over fit the data. It occurs when a statistical
model describes random error or noise and generally occurs when a
model is excessively complex.
The model with over fitting will result in poor predicting power.
Rupak Roy
Over fitting & Tree Pruning
Pruning: process of eliminating unstable nodes to create simpler, robust
nodes. In other words it reduces the size of decision trees by removing
sections of the tree that provides little predicting
power. Pruning reduces the complexity of the final result, and hence
improves predictive accuracy by the reduction of over fitting.
Pruning Algorithms:
CART- Prunes the tree by imposing a complexity penalty based on
number of leaves in the tree.
C5- assumes a higher rate of error than what is seen on the training
data. The smaller the node the more the increase over observed. When
the child node estimate is higher than the parent node, tree is pruned.
Still it is advisable to study the tree in detail for any node that looks
unstable should be pruned.
Rupak Roy
Applications of Techniques
1) Classification & Regression trees (CART) algorithm uses the GINI
method to create binary splits. Most commonly used decision tree
algorithm.
2) Chi-square Automatic Interaction Detector (CHAID) – detecting
statistical relationship between variables. It uses the Chi-square
algorithm test to produce multi- splits.
3) Gini: method is used in sociology and other noisy domains.
4) Reduction in variance & F-test algorithms used in regression trees.
Rupak Roy
Summary
For Binary Target / Categorical use CART
For Noisy data use CART i.e. GINI
If you want trees with multiple splits at each level use CHAID
Numeric target variable use F-test &
Continuous target variable use Reduction in variance
Rupak Roy
Next
We will learn what are the requirements that makes a good decision
tree.
Rupak Roy

More Related Content

What's hot

Types of Probability Distributions - Statistics II
Types of Probability Distributions - Statistics IITypes of Probability Distributions - Statistics II
Types of Probability Distributions - Statistics IIRupak Roy
 
Machine learning session8(svm nlp)
Machine learning   session8(svm nlp)Machine learning   session8(svm nlp)
Machine learning session8(svm nlp)Abhimanyu Dwivedi
 
Machine learning session9(clustering)
Machine learning   session9(clustering)Machine learning   session9(clustering)
Machine learning session9(clustering)Abhimanyu Dwivedi
 
Types of Statistics
Types of Statistics Types of Statistics
Types of Statistics Rupak Roy
 
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit Rupak Roy
 
Random forest
Random forestRandom forest
Random forestUjjawal
 
Understanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsUnderstanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsRupak Roy
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)Abhimanyu Dwivedi
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmPalin analytics
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
Methods for feature/variable selection in Regression Analysis
Methods for feature/variable selection in Regression AnalysisMethods for feature/variable selection in Regression Analysis
Methods for feature/variable selection in Regression AnalysisRupak Roy
 
Machine learning session7(nb classifier k-nn)
Machine learning   session7(nb classifier k-nn)Machine learning   session7(nb classifier k-nn)
Machine learning session7(nb classifier k-nn)Abhimanyu Dwivedi
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Kush Kulshrestha
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forestsMarc Garcia
 
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai UniversityMadhav Mishra
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionDerek Kane
 

What's hot (19)

Types of Probability Distributions - Statistics II
Types of Probability Distributions - Statistics IITypes of Probability Distributions - Statistics II
Types of Probability Distributions - Statistics II
 
Machine learning session8(svm nlp)
Machine learning   session8(svm nlp)Machine learning   session8(svm nlp)
Machine learning session8(svm nlp)
 
Machine learning session9(clustering)
Machine learning   session9(clustering)Machine learning   session9(clustering)
Machine learning session9(clustering)
 
Types of Statistics
Types of Statistics Types of Statistics
Types of Statistics
 
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit
Multiple sample test - Anova, Chi-square, Test of association, Goodness of Fit
 
Random forest
Random forestRandom forest
Random forest
 
Understanding the Machine Learning Algorithms
Understanding the Machine Learning AlgorithmsUnderstanding the Machine Learning Algorithms
Understanding the Machine Learning Algorithms
 
Machine learning session6(decision trees random forrest)
Machine learning   session6(decision trees random forrest)Machine learning   session6(decision trees random forrest)
Machine learning session6(decision trees random forrest)
 
Decision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning AlgorithmDecision Trees for Classification: A Machine Learning Algorithm
Decision Trees for Classification: A Machine Learning Algorithm
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Decision tree and random forest
Decision tree and random forestDecision tree and random forest
Decision tree and random forest
 
Random forest
Random forestRandom forest
Random forest
 
Methods for feature/variable selection in Regression Analysis
Methods for feature/variable selection in Regression AnalysisMethods for feature/variable selection in Regression Analysis
Methods for feature/variable selection in Regression Analysis
 
Machine learning session7(nb classifier k-nn)
Machine learning   session7(nb classifier k-nn)Machine learning   session7(nb classifier k-nn)
Machine learning session7(nb classifier k-nn)
 
Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees Machine Learning Algorithm - Decision Trees
Machine Learning Algorithm - Decision Trees
 
Machine learning session1
Machine learning   session1Machine learning   session1
Machine learning session1
 
Understanding random forests
Understanding random forestsUnderstanding random forests
Understanding random forests
 
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai UniversityMachine Learning Unit 3 Semester 3  MSc IT Part 2 Mumbai University
Machine Learning Unit 3 Semester 3 MSc IT Part 2 Mumbai University
 
Data Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model SelectionData Science - Part III - EDA & Model Selection
Data Science - Part III - EDA & Model Selection
 

Similar to Machine Learning Decision Tree Algorithms

Probability distribution Function & Decision Trees in machine learning
Probability distribution Function  & Decision Trees in machine learningProbability distribution Function  & Decision Trees in machine learning
Probability distribution Function & Decision Trees in machine learningSadia Zafar
 
The Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive ItemThe Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive Itembarthriley
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Derek Kane
 
Common evaluation measures in NLP and IR
Common evaluation measures in NLP and IRCommon evaluation measures in NLP and IR
Common evaluation measures in NLP and IRRushdi Shams
 
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Gingles Caroline
 
Supervised learning (2)
Supervised learning (2)Supervised learning (2)
Supervised learning (2)AlexAman1
 
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Wuhyun Rico Shin
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression TreesHemant Chetwani
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CARTXueping Peng
 
Decision tree presentation
Decision tree presentationDecision tree presentation
Decision tree presentationVijay Yadav
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Maninda Edirisooriya
 
Module III - Classification Decision tree (1).pptx
Module III - Classification Decision tree (1).pptxModule III - Classification Decision tree (1).pptx
Module III - Classification Decision tree (1).pptxShivakrishnan18
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Simplilearn
 
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...IRJET Journal
 

Similar to Machine Learning Decision Tree Algorithms (20)

Decision tree
Decision tree Decision tree
Decision tree
 
16 Simple CART
16 Simple CART16 Simple CART
16 Simple CART
 
Probability distribution Function & Decision Trees in machine learning
Probability distribution Function  & Decision Trees in machine learningProbability distribution Function  & Decision Trees in machine learning
Probability distribution Function & Decision Trees in machine learning
 
The Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive ItemThe Use Of Decision Trees For Adaptive Item
The Use Of Decision Trees For Adaptive Item
 
Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests Data Science - Part V - Decision Trees & Random Forests
Data Science - Part V - Decision Trees & Random Forests
 
Common evaluation measures in NLP and IR
Common evaluation measures in NLP and IRCommon evaluation measures in NLP and IR
Common evaluation measures in NLP and IR
 
Decision Tree.pptx
Decision Tree.pptxDecision Tree.pptx
Decision Tree.pptx
 
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...Histogram-Based Method for Effective Initialization of the K-Means Clustering...
Histogram-Based Method for Effective Initialization of the K-Means Clustering...
 
Supervised learning (2)
Supervised learning (2)Supervised learning (2)
Supervised learning (2)
 
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
Paper review: Measuring the Intrinsic Dimension of Objective Landscapes.
 
CART – Classification & Regression Trees
CART – Classification & Regression TreesCART – Classification & Regression Trees
CART – Classification & Regression Trees
 
Decision Tree - C4.5&CART
Decision Tree - C4.5&CARTDecision Tree - C4.5&CART
Decision Tree - C4.5&CART
 
Unit 2-ML.pptx
Unit 2-ML.pptxUnit 2-ML.pptx
Unit 2-ML.pptx
 
Decision tree presentation
Decision tree presentationDecision tree presentation
Decision tree presentation
 
Decision Tree Learning
Decision Tree LearningDecision Tree Learning
Decision Tree Learning
 
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
Lecture 9 - Decision Trees and Ensemble Methods, a lecture in subject module ...
 
Design of experiments(
Design of experiments(Design of experiments(
Design of experiments(
 
Module III - Classification Decision tree (1).pptx
Module III - Classification Decision tree (1).pptxModule III - Classification Decision tree (1).pptx
Module III - Classification Decision tree (1).pptx
 
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
Random Forest Algorithm - Random Forest Explained | Random Forest In Machine ...
 
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
Saliency Based Hookworm and Infection Detection for Wireless Capsule Endoscop...
 

More from Rupak Roy

Hierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPHierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPRupak Roy
 
Clustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPClustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPRupak Roy
 
Network Analysis - NLP
Network Analysis  - NLPNetwork Analysis  - NLP
Network Analysis - NLPRupak Roy
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLPRupak Roy
 
Sentiment Analysis Practical Steps
Sentiment Analysis Practical StepsSentiment Analysis Practical Steps
Sentiment Analysis Practical StepsRupak Roy
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment AnalysisRupak Roy
 
Text Mining using Regular Expressions
Text Mining using Regular ExpressionsText Mining using Regular Expressions
Text Mining using Regular ExpressionsRupak Roy
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining Rupak Roy
 
Apache Hbase Architecture
Apache Hbase ArchitectureApache Hbase Architecture
Apache Hbase ArchitectureRupak Roy
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase Rupak Roy
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQLRupak Roy
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Rupak Roy
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive Rupak Roy
 
Scoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSScoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSRupak Roy
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Rupak Roy
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functionsRupak Roy
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to FlumeRupak Roy
 
Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Rupak Roy
 
Passing Parameters using File and Command Line
Passing Parameters using File and Command LinePassing Parameters using File and Command Line
Passing Parameters using File and Command LineRupak Roy
 
Apache PIG Relational Operations
Apache PIG Relational Operations Apache PIG Relational Operations
Apache PIG Relational Operations Rupak Roy
 

More from Rupak Roy (20)

Hierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLPHierarchical Clustering - Text Mining/NLP
Hierarchical Clustering - Text Mining/NLP
 
Clustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLPClustering K means and Hierarchical - NLP
Clustering K means and Hierarchical - NLP
 
Network Analysis - NLP
Network Analysis  - NLPNetwork Analysis  - NLP
Network Analysis - NLP
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
Sentiment Analysis Practical Steps
Sentiment Analysis Practical StepsSentiment Analysis Practical Steps
Sentiment Analysis Practical Steps
 
NLP - Sentiment Analysis
NLP - Sentiment AnalysisNLP - Sentiment Analysis
NLP - Sentiment Analysis
 
Text Mining using Regular Expressions
Text Mining using Regular ExpressionsText Mining using Regular Expressions
Text Mining using Regular Expressions
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining
 
Apache Hbase Architecture
Apache Hbase ArchitectureApache Hbase Architecture
Apache Hbase Architecture
 
Introduction to Hbase
Introduction to Hbase Introduction to Hbase
Introduction to Hbase
 
Apache Hive Table Partition and HQL
Apache Hive Table Partition and HQLApache Hive Table Partition and HQL
Apache Hive Table Partition and HQL
 
Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export Installing Apache Hive, internal and external table, import-export
Installing Apache Hive, internal and external table, import-export
 
Introductive to Hive
Introductive to Hive Introductive to Hive
Introductive to Hive
 
Scoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMSScoop Job, import and export to RDBMS
Scoop Job, import and export to RDBMS
 
Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode Apache Scoop - Import with Append mode and Last Modified mode
Apache Scoop - Import with Append mode and Last Modified mode
 
Introduction to scoop and its functions
Introduction to scoop and its functionsIntroduction to scoop and its functions
Introduction to scoop and its functions
 
Introduction to Flume
Introduction to FlumeIntroduction to Flume
Introduction to Flume
 
Apache Pig Relational Operators - II
Apache Pig Relational Operators - II Apache Pig Relational Operators - II
Apache Pig Relational Operators - II
 
Passing Parameters using File and Command Line
Passing Parameters using File and Command LinePassing Parameters using File and Command Line
Passing Parameters using File and Command Line
 
Apache PIG Relational Operations
Apache PIG Relational Operations Apache PIG Relational Operations
Apache PIG Relational Operations
 

Recently uploaded

Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...Suhani Kapoor
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Servicejennyeacort
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxFurkanTasci3
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 

Recently uploaded (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts ServiceCall Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
Call Girls In Noida City Center Metro 24/7✡️9711147426✡️ Escorts Service
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Data Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptxData Science Jobs and Salaries Analysis.pptx
Data Science Jobs and Salaries Analysis.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
Russian Call Girls Dwarka Sector 15 💓 Delhi 9999965857 @Sabina Modi VVIP MODE...
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 

Machine Learning Decision Tree Algorithms

  • 2. Tree Algorithms: For Categorical target variable 1. Gini is the most widely used splitting criterion. It gives the probability that 2 times chosen at random from the same population are in the same class. For a pure population. The probability is 1 #reds=2 blue=0 #reds=7 blue=10 #prop. of reds=1 #prop. of reds=7/17=.41 #prop. of blue=0 #prop. of blue=10/17=.58 Gini =1^2 + 0^2=1 Gini =.41^2 + .58^2=.504 A Rupak Roy
  • 3. Tree Algorithms: For Categorical target variable #reds=10 blue= 2 #reds=2 blue=10 #prop. of reds=10/12= .83 #prop. of reds=.166 #prop. of blue= 2/12= .166 #prop. of blue=.83 Gini =.83^2 + .17^2=.71 Gini =.17^2 + .83^2=.71 Gini Score for split A: (1*2/19)+(.50*10/19)=.48 B: (.71*12/24)+(.71*12/24)=.71 Higher the Gini score the better the model is. So higher Gini score will be chosen by Gini method. It is the default for Decision Trees B Rupak Roy
  • 4. Tree Algorithms: Categorical target variable 2. Information Gain Before applying Information Gain lets understand what is Logarithm. What is the log(10,000)? = 4 10,000 = 10 x 10 x 10 X 10 =(10)4 #of reds(8), blue(4) #of reds(4), blue(8) #Prop. of reds(.7) Blue(.3) #Prop. of reds(.3) Blue(.7) #Entropy of the node1 #Entropy of the node2 = -1 *(.7log 2 (.7) + .3log 2(.3)) = -1 *(.3log 2 (.3) + .7log 2(.7)) A
  • 5. Tree Algorithms: Categorical target variable We can repeat the the same for B And assume the entropy for the split (B) = entropy of node1+node2 =.81 Then we will compute information gain for B Information gain for (B) = Entropy (parent node) – Entropy (split) i.e. Information gain = 1 – entropy of the split = 1-.81 =0.19 Higher the Entropy score the better the model is. Finally the information gain will choose the higher entropy score. Entropy is a measure on how disorganized the systems is. Entropy ranges from 0 to1 Pure node has an Entropy of 0 while impure node has Entropy of 1 B
  • 6. Tree Algorithms: Categorical target variable 3. Chi-Square: is a test of statistical significance developed by Karl Pearson Chi- Square = square root of (actual –expected)2 / Expected Again the highest chi-square score will be selected. Rupak Roy
  • 7. Tree Algorithms: Continuous target variable 4. Reduction in Variance Variance measures how far each number in the set is from the mean. In simple words variance the is fact or quality of being different, divergent, or inconsistent. A low variance refers most values are close to the mean. A high variance refers most values are far from the mean. Varaince = where, So the reduction in variance split criterion is specially designed for target variable having continuous/ numeric data type. Pure node variance is 0,and like before the highest score will be selected.
  • 8. Over fitting & Tree Pruning A fully grown tree tends to over fit the data. It occurs when a statistical model describes random error or noise and generally occurs when a model is excessively complex. The model with over fitting will result in poor predicting power. Rupak Roy
  • 9. Over fitting & Tree Pruning Pruning: process of eliminating unstable nodes to create simpler, robust nodes. In other words it reduces the size of decision trees by removing sections of the tree that provides little predicting power. Pruning reduces the complexity of the final result, and hence improves predictive accuracy by the reduction of over fitting. Pruning Algorithms: CART- Prunes the tree by imposing a complexity penalty based on number of leaves in the tree. C5- assumes a higher rate of error than what is seen on the training data. The smaller the node the more the increase over observed. When the child node estimate is higher than the parent node, tree is pruned. Still it is advisable to study the tree in detail for any node that looks unstable should be pruned. Rupak Roy
  • 10. Applications of Techniques 1) Classification & Regression trees (CART) algorithm uses the GINI method to create binary splits. Most commonly used decision tree algorithm. 2) Chi-square Automatic Interaction Detector (CHAID) – detecting statistical relationship between variables. It uses the Chi-square algorithm test to produce multi- splits. 3) Gini: method is used in sociology and other noisy domains. 4) Reduction in variance & F-test algorithms used in regression trees. Rupak Roy
  • 11. Summary For Binary Target / Categorical use CART For Noisy data use CART i.e. GINI If you want trees with multiple splits at each level use CHAID Numeric target variable use F-test & Continuous target variable use Reduction in variance Rupak Roy
  • 12. Next We will learn what are the requirements that makes a good decision tree. Rupak Roy