Decision Tree Classifier
Dr.G.Jasmine Beulah
Kristu Jayanti College
Introduction
A decision tree consists of
• Nodes: test the value of a certain attribute
• Edges: correspond to the outcome of a test and connect to the next node or leaf
• Leaves: terminal nodes that predict the outcome
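As a minimal sketch of this structure, assuming Python and hypothetical names (`Node`, `Leaf`, `predict`) that do not come from the slides, a tree can be represented with internal nodes that test one attribute, edges keyed by the outcome of the test, and leaves that hold the predicted class. The attribute and its 'yes'/'no' values are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Terminal node: holds the predicted outcome."""
    prediction: str


@dataclass
class Node:
    """Internal node: tests the value of one attribute."""
    attribute: str
    # Each edge maps one outcome of the test to the next node or leaf.
    edges: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def predict(tree: Union[Node, Leaf], sample: Dict[str, str]) -> str:
    """Follow the edges that match the sample until a leaf is reached."""
    while isinstance(tree, Node):
        tree = tree.edges[sample[tree.attribute]]
    return tree.prediction


# Illustrative one-level tree; the attribute values are assumed for the demo.
tree = Node("Speed limit", {"yes": Leaf("slow"), "no": Leaf("fast")})
print(predict(tree, {"Speed limit": "yes"}))  # slow
```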
Decision Tree Classifier
Decision Tree Learning
What Is Information Gain?
Information Gain (IG) is the key measure used to build a Decision Tree. It indicates how
much “information” a particular feature/variable gives us about the final outcome.
Information Gain is important because it is used to choose the variable that best splits the data at each
node of a Decision Tree. The variable with the highest IG is used to split the data at the root node.
Equation For Information Gain (IG):
Entropy: Entropy is the uncertainty in our dataset, that is, a measure of disorder.
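Both quantities can be written in standard notation. The symbols $S$ (the samples at a node), $p_i$ (the fraction of samples in class $i$), $A$ (the attribute used for the split) and $S_v$ (the samples for which $A$ takes the value $v$) are notation assumed here, not taken from the slides:

```latex
\mathrm{Entropy}(S) = -\sum_{i} p_i \log_2 p_i

\mathrm{IG}(S, A) = \mathrm{Entropy}(S)
  - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)
```

The second term of IG is the weighted average entropy of the child nodes, so IG is the entropy of the parent minus the weighted average entropy of the children.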
An Example to understand
Create a Decision Tree that classifies the speed of a car (response variable) as either slow or fast,
depending on the following predictor variables:
•Road type
•Obstruction
•Speed limit
• We calculate the Entropy and Information Gain (IG) for each of the
predictor variables, starting with ‘Road type’.
• In our data set, there are four observations in the ‘Road type’ column that
correspond to four labels in the ‘Speed of car’ column.
• We shall begin by calculating the entropy of the parent node (Speed of
car).
• Step one is to find the fraction of each of the two classes present in the parent
node. There are a total of four values in the parent node, of which two
samples belong to the ‘slow’ class and the other two belong to the ‘fast’
class, therefore:
• P(slow) = fraction of ‘slow’ outcomes in the parent node = 2/4 = 0.5
• P(fast) = fraction of ‘fast’ outcomes in the parent node = 2/4 = 0.5
• Entropy(parent node) = – {0.5 log2(0.5) + 0.5 log2(0.5)} = – {-0.5 + (-0.5)} = 1
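The parent-node entropy can be checked with a short Python sketch. The helper name `entropy` is an assumption made for illustration; it applies the usual base-2 entropy formula to the two ‘slow’ and two ‘fast’ labels described above.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    # Sum -p * log2(p) over the classes that actually occur.
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


# Parent node: two 'slow' and two 'fast' outcomes.
parent = ["slow", "slow", "fast", "fast"]
print(entropy(parent))  # 1.0
```

An even split between two classes gives the maximum entropy of 1, which is why the parent node starts at 1.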
• Now that we know that the entropy of the parent node is 1, let’s see
how to calculate the Information Gain for the ‘Road type’ variable.
Remember that the root node is split using the ‘Road type’ variable
only if its Information Gain is greater than the Information Gain of
all the other predictor variables.
• In order to calculate the Information Gain of the ‘Road type’ variable, we
first need to split the root node by the ‘Road type’ variable.
• Once we’ve split the parent node by using the ‘Road type’ variable, the child
nodes denote the corresponding responses as shown in the data set. Now,
we need to measure the entropy of the child nodes.
• The entropy of the right-hand side child node (fast) is 0 because all of the
outcomes in this node belong to one class (fast). In a similar manner, we
must find the Entropy of the left-hand side node (slow, slow, fast).
• In this node there are two types of outcomes (fast and slow), therefore, we
first need to calculate the fraction of slow and fast outcomes for this
particular node.
P(slow) = 2/3 = 0.667
P(fast) = 1/3 = 0.333
Therefore, entropy is:
Entropy(left child node) = – {0.667 log2(0.667) + 0.333 log2(0.333)} =
– {-0.390 + (-0.528)}
= 0.918, rounded to 0.9 in the steps that follow
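The arithmetic for the left child node can be reproduced in a few lines of Python. This is a minimal sketch with illustrative variable names; it evaluates the same expression without intermediate rounding, which gives about 0.918 (0.9 to one decimal place).

```python
import math

# Class fractions in the left child node (slow, slow, fast).
p_slow = 2 / 3
p_fast = 1 / 3

# Entropy = -(p_slow * log2(p_slow) + p_fast * log2(p_fast))
left_entropy = -(p_slow * math.log2(p_slow) + p_fast * math.log2(p_fast))
print(round(left_entropy, 3))  # 0.918
```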
• Our next step is to calculate the Entropy(children) with weighted
average:
• Total number of outcomes in parent node: 4
• Total number of outcomes in left child node: 3
• Total number of outcomes in right child node: 1
• Formula for Entropy(children) with weighted avg. :
• [Weighted avg]Entropy(children) = (no. of outcomes in left child node)
/ (total no. of outcomes in parent node) * (entropy of left node) + (no.
of outcomes in right child node)/ (total no. of outcomes in parent
node) * (entropy of right node)
• Using the above formula, you’ll find that Entropy(children) with
weighted avg. = (3/4) * 0.9 + (1/4) * 0 = 0.675
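The weighted average can be sketched in Python as follows, assuming the rounded left-child entropy of 0.9 and an entropy of 0 for the pure right child. The variable names are illustrative only.

```python
# Outcome counts in the parent and child nodes.
n_parent, n_left, n_right = 4, 3, 1

# Child entropies: the rounded left-child value, and 0 for the pure right child.
entropy_left, entropy_right = 0.9, 0.0

# Each child's entropy is weighted by its share of the parent's outcomes.
weighted = (n_left / n_parent) * entropy_left + (n_right / n_parent) * entropy_right
print(round(weighted, 3))  # 0.675
```

With the unrounded left-child entropy of about 0.918, the same formula gives about 0.689.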
Our final step is to substitute the above weighted average into the IG formula in order to calculate the final IG of the
‘Road type’ variable:
Information gain(Road type) = Entropy(parent) – [Weighted avg]Entropy(children) = 1 – 0.675 = 0.325
Therefore, the Information Gain of the ‘Road type’ feature is 0.325.
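The whole calculation for ‘Road type’ can be put together in one sketch. The function names `entropy` and `information_gain` are assumptions made for illustration; the label lists are the parent and child nodes described above. Because nothing is rounded along the way, the result is about 0.311 rather than 0.325; the difference comes only from rounding the left-child entropy to 0.9.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted average entropy of the children."""
    weighted = sum(len(child) / len(parent) * entropy(child) for child in children)
    return entropy(parent) - weighted


parent = ["slow", "slow", "fast", "fast"]
left, right = ["slow", "slow", "fast"], ["fast"]
print(round(information_gain(parent, [left, right]), 3))  # 0.311
```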
• The Decision Tree Algorithm selects the variable with the highest
Information Gain to split the Decision Tree. Therefore, by using the
above method you need to calculate the Information Gain for all the
predictor variables to check which variable has the highest IG.
• Using the above methodology, you should get the following
values for each predictor variable:
• Information gain(Road type) = 1 – 0.675 = 0.325
• Information gain(Obstruction) = 1 – 1 = 0
• Information gain(Speed limit) = 1 – 0 = 1
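The comparison across all three predictors can be sketched as below. The data table is not reproduced in the text, so the four rows here are a hypothetical reconstruction: the attribute values (‘steep’/‘flat’, ‘yes’/‘no’) are assumptions chosen only so that the child nodes match the counts and gains stated above. The helper names are likewise illustrative.

```python
import math
from collections import Counter, defaultdict

# Hypothetical rows (road_type, obstruction, speed_limit, speed). The attribute
# values are assumptions chosen to reproduce the child nodes described in the text.
ROWS = [
    ("steep", "yes", "yes", "slow"),
    ("steep", "no", "yes", "slow"),
    ("flat", "yes", "no", "fast"),
    ("steep", "no", "no", "fast"),
]
FEATURES = {"Road type": 0, "Obstruction": 1, "Speed limit": 2}


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(rows, column):
    """IG obtained by splitting the rows on the attribute in `column`."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # group class labels by attribute value
    parent = [row[-1] for row in rows]
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy(parent) - weighted


gains = {name: information_gain(ROWS, col) for name, col in FEATURES.items()}
best = max(gains, key=gains.get)
for name, gain in gains.items():
    print(f"{name}: {gain:.3f}")
print("Best split:", best)  # Speed limit
```

Without intermediate rounding, ‘Road type’ comes out at about 0.311 instead of 0.325; the ranking of the three variables is unchanged.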
So, here we can see that the ‘Speed limit’ variable has the
highest Information Gain. Therefore, the final Decision Tree for
this dataset is built using the ‘Speed limit’ variable.
A Sample Task – Build a Decision Tree – Try out
