Decision Tree
• A decision tree is one of the most powerful and popular tools for
classification and prediction.
• A decision tree is a flowchart-like tree structure, where each
internal node denotes a test on an attribute, each branch
represents an outcome of the test, and each leaf node (terminal
node) holds a class label.
Introduction
• The set of records available for developing a
classification method (here, a decision tree) is divided into
two disjoint subsets – a training set and a test set.
• The attributes of the records are categorised into two
types:
– Attributes whose domain is numerical are called numerical
attributes.
– Attributes whose domain is not numerical are called
categorical attributes.
Decision Tree Example
• The data set has five attributes.
• There is a special attribute: the attribute class is the class label.
• The attributes, temp (temperature) and humidity are
numerical attributes.
• Other attributes are categorical, that is, they cannot be
ordered.
• Based on the training data set, we want to find a set of rules that
tells us what values of outlook, temperature, humidity and
windy determine whether or not to play golf.
Decision Tree Example
• We have five leaf nodes.
• In a decision tree, each leaf node represents a rule.
• We have the following rules corresponding to the tree given in
Figure.
• RULE 1: If it is sunny and the humidity is not above 75%, then
play.
• RULE 2: If it is sunny and the humidity is above 75%, then do
not play.
• RULE 3: If it is overcast, then play.
• RULE 4: If it is rainy and not windy, then play.
• RULE 5: If it is rainy and windy, then do not play.
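The five rules can be written directly as a small function. The attribute names, value encodings, and the function name below are assumptions for illustration, not part of the slides:

```python
def play_golf(outlook, humidity, windy):
    """Classify a record using the five rules above.

    outlook: "sunny" | "overcast" | "rain", humidity: percent, windy: bool.
    """
    if outlook == "sunny":
        # Rules 1 and 2: decision depends only on humidity
        return "play" if humidity <= 75 else "no play"
    if outlook == "overcast":
        # Rule 3: always play
        return "play"
    if outlook == "rain":
        # Rules 4 and 5: decision depends only on windy
        return "no play" if windy else "play"
    raise ValueError(f"unknown outlook: {outlook!r}")

print(play_golf("sunny", 70, False))  # Rule 1: play
print(play_golf("rain", 65, True))    # Rule 5: no play
```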
Classification
• The classification of an unknown input vector is done by
traversing the tree from the root node to a leaf node.
• A record enters the tree at the root node.
• At the root, a test is applied to determine which child node the
record will encounter next.
• This process is repeated until the record arrives at a leaf node.
• All the records that end up at a given leaf of the tree are
classified in the same way.
• There is a unique path from the root to each leaf.
• The path is a rule which is used to classify the records.
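The root-to-leaf traversal above can be sketched in code. The nested-dict tree representation is an assumption made for this sketch; it encodes the same golf tree as the figure:

```python
# The golf decision tree as nested dicts (representation assumed, not from the slides).
# Leaves are plain strings; internal nodes name a test and map outcomes to subtrees.
TREE = {
    "attr": "outlook",
    "branches": {
        "sunny":    {"attr": "humidity<=75",
                     "branches": {True: "play", False: "no play"}},
        "overcast": "play",
        "rain":     {"attr": "windy",
                     "branches": {True: "no play", False: "play"}},
    },
}

def classify(tree, record):
    """Traverse from the root, applying each node's test, until a leaf is reached."""
    while isinstance(tree, dict):
        attr = tree["attr"]
        if attr == "humidity<=75":
            key = record["humidity"] <= 75   # numerical test
        else:
            key = record[attr]               # categorical test
        tree = tree["branches"][key]
    return tree

print(classify(TREE, {"outlook": "rain", "temp": 70, "humidity": 65, "windy": True}))
```

Every record that reaches the same leaf receives the same label, and the sequence of tests applied on the way down is exactly the rule for that leaf.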
Classification
• In our tree, we can carry out the classification for an unknown
record as follows.
• Let us assume that, for the record, we know the values of the
first four attributes (but not the value of the class attribute):
outlook = rain; temp = 70; humidity = 65; and windy = true.
• We start from the root node to check the value of the attribute
associated at the root node.
Classification
• outlook= rain; temp = 70; humidity = 65; and windy= true.
• In our example, outlook is the splitting attribute at root.
• Since for the given record outlook = rain, we move to the
rightmost child node of the root.
• At this node, the splitting attribute is windy, and we find that
for the record we want to classify, windy = true.
• Hence, we move to the left child node and conclude that the
class label is "no play".
Accuracy of Classification
• The accuracy of the classifier is determined by the percentage
of the test data set that is correctly classified. For example :
RULE 1
– If it is sunny and the humidity is not above 75%, then play.
• We can see that for Rule 1 there are two records of the test
data set satisfying outlook = sunny and humidity <= 75, and
only one of these is correctly classified as play.
• Thus, the accuracy of Rule 1 is 0.5 (or 50%).
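The rule-accuracy computation above is just "correctly classified / records covered by the rule". A minimal sketch, using hypothetical test records chosen to match the counts on the slide (two records satisfy Rule 1's condition, one of them labelled play):

```python
# Hypothetical test set (the slides' actual test records are not reproduced here).
test_set = [
    {"outlook": "sunny", "humidity": 70, "cls": "play"},
    {"outlook": "sunny", "humidity": 60, "cls": "no play"},
    {"outlook": "rain",  "humidity": 80, "cls": "play"},
]

# Records covered by Rule 1's condition: sunny and humidity <= 75
covered = [r for r in test_set if r["outlook"] == "sunny" and r["humidity"] <= 75]
# Of those, records whose true class matches Rule 1's conclusion ("play")
correct = [r for r in covered if r["cls"] == "play"]

accuracy = len(correct) / len(covered)
print(accuracy)  # 0.5
```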
Advantages of Decision Tree
Classifications
• A decision tree construction process is concerned with
identifying the splitting attributes and splitting criterion at
every level of the tree.
Major strengths are:
• Decision trees are able to generate understandable rules.
• They are able to handle both numerical and categorical
attributes.
• They provide a clear indication of which fields are most
important for prediction or classification.
Shortcomings of Decision Tree
Classifications
Weaknesses are:
• The process of growing a decision tree is computationally
expensive. At each node, each candidate splitting field is
examined before its best split can be found.
• Some decision tree algorithms can only deal with binary-valued
target classes.
Iterative Dichotomizer (ID3)
Quinlan (1986)
• Each node corresponds to a splitting attribute.
• Each arc is a possible value of that attribute.
• At each node the splitting attribute is selected to be the most
informative among the attributes not yet considered in the path
from the root.
• Entropy is used to measure how informative a node is.
Iterative Dichotomizer (ID3)
The algorithm uses the criterion of information gain to
determine the goodness of a split.
The attribute with the greatest information gain is taken as the
splitting attribute, and the data set is split for all distinct values
of the attribute.
Attribute Selection Measure: Information
Gain (ID3/C4.5)
Entropy
• Entropy measures the homogeneity (purity) of a
set of examples.
• It gives the information content of the set in terms
of the class labels of the examples.
• Consider a set of examples S with two classes, P and N. Let
the set have p instances of class P and n instances of
class N.
• So the total number of instances is t = p +
n. The pair [p, n] can be seen as the class
distribution of S.
Entropy
• For a set S with class distribution [p, n] and t = p + n:
Entropy(S) = -(p/t)·log2(p/t) - (n/t)·log2(n/t)
• The entropy of a completely pure set is 0, and is 1 for a set
with equal occurrences of both classes.
e.g. Entropy[14, 0] = -(14/14)·log2(14/14) - (0/14)·log2(0/14)
= -1·log2(1) - 0·log2(0)
= 0 - 0
= 0
e.g. Entropy[7, 7] = -(7/14)·log2(7/14) - (7/14)·log2(7/14)
= -(0.5)·log2(0.5) - (0.5)·log2(0.5)
= -(0.5)·(-1) - (0.5)·(-1)
= 0.5 + 0.5
= 1
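The two worked examples above can be checked with a few lines of code (the function name and the convention 0·log2(0) = 0 are the usual ones):

```python
from math import log2

def entropy(p, n):
    """Entropy of a set with p instances of one class and n of the other."""
    t = p + n
    total = 0.0
    for k in (p, n):
        if k:  # treat 0*log2(0) as 0, so pure sets do not divide by zero
            total -= (k / t) * log2(k / t)
    return total

print(entropy(14, 0))  # 0.0  (completely pure set)
print(entropy(7, 7))   # 1.0  (equal occurrences of both classes)
```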
Information
• The information required to classify an element of S, written
I(p, n), is the entropy of the class distribution [p, n]:
I(p, n) = -(p/t)·log2(p/t) - (n/t)·log2(n/t), where t = p + n
• Example:
• I(p, n) = I(9, 5) = ?
• Here there are 9 + 5 = 14 samples, of which
9 are positive and 5 are negative.
Thus:
• I(9, 5) = -(9/14)·log2(9/14) - (5/14)·log2(5/14)
• = 0.4098 + 0.5305
• = 0.9403 ≈ 0.940
Information Gain
• E(A) is the expected information after splitting on attribute A:
the entropies of the resulting subsets, weighted by subset size.
Gain(age) = I(p, n) – E(age)
= 0.940 – 0.694
= 0.246
• Therefore, out of these 4 attributes (age,
income, student, credit rating), age will be
selected as the splitting attribute at level 0.
