Decision tree
Goal of Classification Algorithm
• Build models with good generalization capability, i.e., models that
accurately predict the class labels of previously unseen records.
• Classification Algorithm
• Naïve Bayes Classifier
• Decision Tree
• Rule-based classifiers
• Neural Network
• Support Vector Machine
Why decision tree?
• Decision trees are powerful and popular tools for
classification and prediction.
• Decision trees represent rules, which can be understood
by humans and used in knowledge systems such as
databases.
Predicting potential loan default
Predicting potential loan default (Credit)
Predicting potential loan default (Income)
Predicting potential loan default (Term)
Predicting potential loan default
(Personal Info)
Intelligent Application
Classifier Review
Input Predicted class
 Decision tree is a classifier in the form of a tree structure
 Decision tree maps out all possible decision paths in the
form of a tree.
– Root node: has no incoming edges and zero or more
outgoing edges.
– Internal node (Decision node): specifies a test on a single
attribute
– Leaf node: indicates the value of the target attribute
– Branches (Arc/edge): split on one attribute
 Decision trees classify instances or examples by starting at
the root of the tree and moving through it until a leaf node is
reached, making a locally optimal decision at each step.
Definition
Decision Tree
What does decision tree represents?
What does decision tree represents?
Scoring a loan application
Decision Tree Classification Task
Decision
Tree
Test Data
Training Data
Learn Decision tree from data ?
Decision Tree Learning Problem
Quality metric: Classification Error
• Error measures the fraction of mistakes:
Error = (# incorrect predictions) / (# total predictions)
• Best possible value: 0.0
• Worst possible value: 1.0
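As a minimal illustration (not from the slides), this metric can be computed by comparing a model's predictions against the true labels; the label values below are made up for the example.

```python
def classification_error(y_true, y_pred):
    """Fraction of mistakes: # incorrect predictions / # total predictions."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

# Toy example: 2 mistakes out of 5 predictions -> error = 0.4
print(classification_error(["yes", "no", "no", "yes", "yes"],
                           ["yes", "yes", "no", "no", "yes"]))
```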
Find the tree with lowest classification
error
How do we find the best tree
•The exponentially large number of possible decision
trees makes finding the best tree hard.
Decision tree
• A decision tree represents the learned target function
• Each internal node tests an attribute
• Each branch corresponds to an attribute value
• Each leaf node assigns a classification
• Can be represented by a
logical formula
(1) Which attribute to
start with? (root)
(2) Which node
to proceed to next?
(3) When to stop / come
to a conclusion?
Tree Induction
•Greedy strategy.
• Split the records based on an attribute test that
optimizes certain criterion.
Greedy Algorithm
Step 1: Start with an empty tree
Greedy Algorithm
Step 2: Split on a feature
Feature split explained
Step 3: Making predictions
Step 4: Recursion
Greedy Decision Tree Algorithm
Step 1: Start with an empty tree
Step 2: Select a feature to split the data.
For each split of the tree:
Step 3: If there is nothing more to do, make
predictions.
Step 4: Otherwise, go to Step 2 &
continue (recurse) on this split.
Problem 1: Feature split
selection
Problem 2:
Stopping condition
Recursion
Design Issues of Decision Tree Induction
•Issues
• How to Classify a leaf node
• Assign the majority class
• If leaf is empty, assign the default class – the class that has the
highest popularity.
• Determine how to split the records
• How to specify the attribute test condition?
• How to determine the best split?
• Determine when to stop splitting
• Every attribute has already been included along this path
through the tree.
• Stop splitting if all the records belong to the same class or have
identical attribute values
• Stop when each leaf node has uncertainty below some
threshold.
Decision Tree learning
Start with the data
Assume N = 40, 3 features
Starts with all data
Compact visual notation: Root node
Decision Stump: Single Level Tree
Visual Notation: Intermediate
Node
Making Prediction with Decision Stump
How do we learn decision stump
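One way to learn a decision stump, sketched below under the assumption that records are dicts of categorical features, is to try every feature, predict the majority class in each resulting branch, and keep the feature with the lowest classification error; the loan data at the end is purely hypothetical.

```python
from collections import Counter

def stump_error(data, labels, feature):
    """Classification error of a one-level tree that splits on `feature`
    and predicts the majority class in each branch."""
    groups = {}
    for row, y in zip(data, labels):
        groups.setdefault(row[feature], []).append(y)
    mistakes = sum(len(ys) - Counter(ys).most_common(1)[0][1]
                   for ys in groups.values())
    return mistakes / len(labels)

def learn_stump(data, labels):
    """Pick the single feature whose split gives the lowest error."""
    return min(data[0].keys(), key=lambda f: stump_error(data, labels, f))

# Hypothetical loan records: 'credit' separates safe from risky perfectly here.
data = [{"credit": "excellent", "term": "3y"}, {"credit": "poor", "term": "5y"},
        {"credit": "fair", "term": "3y"},      {"credit": "poor", "term": "3y"}]
labels = ["safe", "risky", "safe", "risky"]
print(learn_stump(data, labels))   # -> 'credit'
```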
Algorithms
•Many Algorithms:
• Hunt’s Algorithm (one of the earliest)
• ID3 (Iterative Dichotomiser)
• C4.5
• CART (Classification And Regression Tree)
• SLIQ, SPRINT
General Structure of Hunt’s Algorithm
• Basis of many existing decision tree algorithms.
• Let Dt be the set of training records that reach a
node t
• General Procedure:
• If Dt contains records that belong to the same class
yt, then t is a leaf node labeled as yt
• If Dt contains records with the same attribute
values, then t is a leaf node labeled with the
majority class yt
• If Dt is an empty set, then t is a leaf node labeled
by the default class, yd
• If Dt contains records that belong to more than
one class, use an attribute test to split the data
into smaller subsets.
• Recursively apply the procedure to each
subset.
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes
[Figure: a node t receives the record set Dt; the attribute test (?) at t is still to be chosen.]
Hunt’s Algorithm
[Figure: Hunt’s algorithm applied step by step to the training data above —
(1) a single leaf labeled Don’t Cheat;
(2) split on Refund: Yes → Don’t Cheat, No → ?;
(3) split the No branch on Marital Status: Married → Don’t Cheat, Single/Divorced → ?;
(4) split the Single/Divorced branch on Taxable Income: < 80K → Don’t Cheat, >= 80K → Cheat.
The training table is shown alongside, once in its original order and once sorted by Refund.]
Hunt’s Algorithm
•Empty node (none of the training records have this
combination of attribute values)
• The node is declared a leaf node with the same class label as
the majority class of training records associated with
its parent node.
•Non-empty node
• Same class
• Identical attribute values (except for the class label)
• The node is declared a leaf node with the same class label as the
majority class of training records associated with this node.
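Putting the cases above together, a minimal recursive sketch of Hunt's procedure could look like the following; the record format (a dict of attribute values per record, with labels in a parallel list) and the naive choice of the first remaining attribute are illustrative assumptions, not part of the original algorithm description.

```python
from collections import Counter

def hunt(records, labels, attributes, default):
    """Return a class label (leaf) or (attribute, {value: subtree}) for a node."""
    if not records:                              # empty node: default class
        return default
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1:                    # all records share one class
        return labels[0]
    if not attributes or all(r == records[0] for r in records):
        return majority                          # identical attribute values
    attr = attributes[0]                         # attribute test; a real learner
                                                 # would pick the best attribute
    tree = {}
    for value in set(r[attr] for r in records):
        sub_recs = [r for r, y in zip(records, labels) if r[attr] == value]
        sub_lbls = [y for r, y in zip(records, labels) if r[attr] == value]
        tree[value] = hunt(sub_recs, sub_lbls,
                           [a for a in attributes if a != attr], majority)
    return (attr, tree)

# Tiny subset of the Refund / Marital Status example above.
records = [{"Refund": "Yes", "Marital": "Single"},
           {"Refund": "No",  "Marital": "Married"},
           {"Refund": "No",  "Marital": "Single"}]
labels = ["No", "No", "Yes"]
print(hunt(records, labels, ["Refund", "Marital"], default="No"))
```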
Iterative Dichotomiser (ID3)
• Dichotomisation means dividing into two sharply
different categories.
[Figure: example tree — Outlook at the root; Sunny → Humidity (High → No, Normal → Yes);
Overcast → Yes; Rain → Wind (Strong → No, Weak → Yes).]
Principled Criterion
•Selection of an attribute to test at each node:
choose the most useful attribute for classifying
examples.
•Information gain
• Measures how well a given attribute separates the
training examples according to their target
classification.
• This measure is used to select among the candidate
attributes at each step while growing the tree.
• Gain is a measure of how much we can reduce
uncertainty (its value lies between 0 and 1)
How to Specify Test Condition?
•Depends on attribute types
• Binary
• Nominal
• Ordinal
• Continuous
•Depends on number of ways to split
• 2-way split
• Multi-way split
Splitting Based on Nominal Attributes
• Binary split: The test condition for a binary attribute generates
two potential outcomes
Body Temp
{Warm-blooded} {Cold-blooded}
Splitting Based on Nominal Attributes
• Multi-way split: Use as many partitions as distinct values.
• Binary split: Divides values into two subsets.
Need to find optimal partitioning.
CarType
Family
Sports
Luxury
CarType
{Family,
Luxury} {Sports}
CarType
{Sports,
Luxury} {Family}
OR
Note: CART produces only binary splits by considering all 2^(k−1) − 1
ways of creating a binary partition of k attribute values.
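As a small aside (an illustration, not from the slides), the 2^(k−1) − 1 candidate binary partitions of a k-valued nominal attribute can be enumerated by fixing one value on the left side and choosing any subset of the remaining values to join it:

```python
from itertools import combinations

def binary_partitions(values):
    """All 2**(k-1) - 1 ways to split k nominal values into two non-empty groups."""
    values = list(values)
    anchor, rest = values[0], values[1:]
    parts = []
    for r in range(len(rest) + 1):
        for combo in combinations(rest, r):
            left = {anchor, *combo}
            right = set(values) - left
            if right:                      # skip the split with an empty side
                parts.append((left, right))
    return parts

print(binary_partitions(["Family", "Sports", "Luxury"]))
# 2**(3-1) - 1 = 3 partitions, e.g. ({'Family'}, {'Sports', 'Luxury'})
```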
Splitting Based on Ordinal Attributes
• Multi-way split: Use as many partitions as distinct values.
• Binary split: Divides values into two subsets – respects the order
(values are grouped as long as the grouping does not violate the order
property of the attribute values). Need to find the optimal partitioning.
Size
Small
Medium
Large
Size
{Medium,
Large,
Extra Large} {Small}
Size
{Small,
Medium} {Large, Extra Large}
OR
Size
{Small,
Large} {Medium,
Extra Large}
(Note: this last grouping violates the order property, so it is not a valid ordinal split.)
Splitting Based on Continuous Attributes
•Different ways of handling
•Discretization to form an ordinal categorical
attribute
• Static – discretize once at the beginning
• Dynamic – ranges can be found by equal interval bucketing,
equal frequency bucketing (percentiles), or clustering.
•Binary Decision: (A < v) or (A ≥ v)
• consider all possible splits and find the best cut
• can be more compute intensive
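A rough sketch of this binary-decision approach (an assumption-laden illustration that uses classification error rather than an entropy-based impurity): sort the observed values, take midpoints between consecutive distinct values as candidate cut points, and keep the threshold with the lowest weighted error.

```python
def best_threshold(values, labels):
    """Try midpoints between sorted distinct values; return (threshold, error)."""
    pairs = sorted(zip(values, labels))
    xs = sorted(set(values))
    best = (None, float("inf"))
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2                    # candidate cut: A < t vs. A >= t
        left = [y for x, y in pairs if x < t]
        right = [y for x, y in pairs if x >= t]
        err = sum(len(side) - max(side.count(c) for c in set(side))
                  for side in (left, right)) / len(labels)
        if err < best[1]:
            best = (t, err)
    return best

# Hypothetical ages with a clean cut: everyone under ~34 is 'risky' here.
ages = [22, 25, 30, 38, 41, 55, 60]
labels = ["risky", "risky", "risky", "safe", "safe", "safe", "safe"]
print(best_threshold(ages, labels))   # -> (34.0, 0.0)
```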
Splitting Based on Continuous Attributes
Threshold Split
Splitting Based on Continuous Attributes
•Threshold Split in 1-D
Splitting Based on Continuous Attributes
•Visualizing the threshold split
Splitting Based on Continuous Attributes
Split on Age >= 38
Splitting Based on Continuous Attributes
•Split on Income >= $60K
Splitting Based on Continuous Attributes
•Each split partitions the 2D space
Splitting Based on Continuous Attributes
How to determine the Best Split
Before Splitting: 10 records of class 0,
10 records of class 1
Which test condition is the best?
• Class distribution of the records before and after splitting
How to determine the Best Split
• Greedy approach:
• Nodes with homogeneous class distribution are preferred
• Need a measure of node impurity: the smaller the degree of impurity, the
more skewed the class distribution.
• Ideas?
• Entropy and Information gain
Non-homogeneous,
High degree of impurity
Homogeneous,
Low degree of impurity
Entropy
• A measure of
• Uncertainty
• (Im)Purity
• Information content
• Given a collection S:
Entropy(S) = − p⊕ log₂ p⊕ − p⊖ log₂ p⊖
where p⊕ is the proportion of positive examples in S
and p⊖ is the proportion of negative examples in S.
• The lower the entropy, the less uniform the distribution and the
purer the node.
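A direct transcription of this formula (a sketch; log base 2, with the usual convention that 0 · log 0 = 0):

```python
import math

def entropy(pos, neg):
    """Entropy of a two-class collection with `pos` positive and `neg` negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                         # treat 0 * log(0) as 0
            p = count / total
            result -= p * math.log2(p)
    return result

print(entropy(9, 5))    # ~0.940  (the 14-example collection used later in the slides)
print(entropy(7, 7))    # 1.0  -> maximally impure
print(entropy(14, 0))   # 0.0  -> pure node
```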
Information Gain
• Gain tells us how much information would be gained by branching on A.
• Information gain is simply the expected reduction in entropy caused by
partitioning the examples according to the selected attribute.
• Information gain, Gain(S, A), of an attribute A is defined as
Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v| / |S|) · Entropy(S_v)
where Values(A) is the set of all possible values for attribute A, and S_v is the
subset of S for which attribute A has value v.
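The same definition in code (an illustrative sketch, assuming records are dicts and labels are a parallel list); the toy data reproduces the Wind example computed later in the slides:

```python
from collections import Counter
import math

def entropy_of(labels):
    """Entropy of a list of class labels (any number of classes)."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(records, labels, attribute):
    """Gain(S, A) = Entropy(S) - sum over v of |S_v|/|S| * Entropy(S_v)."""
    total = len(labels)
    remainder = 0.0
    for value in set(r[attribute] for r in records):
        subset = [y for r, y in zip(records, labels) if r[attribute] == value]
        remainder += len(subset) / total * entropy_of(subset)
    return entropy_of(labels) - remainder

# Wind example: S = [9+, 5-]; Weak -> [6+, 2-], Strong -> [3+, 3-]
records = [{"wind": "Weak"}] * 8 + [{"wind": "Strong"}] * 6
labels = ["+"] * 6 + ["-"] * 2 + ["+"] * 3 + ["-"] * 3
print(round(information_gain(records, labels, "wind"), 3))   # -> 0.048
```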
Simple Greedy Decision Tree Learning
When do we stop ?
Stopping Condition
1. All data agrees on y
Stopping Condition 2: Already split on
all features
Training Example
Example
• D is a collection of 14 examples, 9 positive and 5 negative
Entropy([9+, 5−]) = −(9/14) log₂(9/14) − (5/14) log₂(5/14) = 0.940
Entropy is 0 if all members of D belong to the same class. Entropy is 1
when the collection contains an equal number of positive and negative
examples.
Entropy
1. The entropy is 0 if the outcome is
‘certain’.
2. The entropy is maximum if we
have no knowledge of the system
(i.e., any outcome is equally likely).
 S is a sample of training examples.
 p⊕ is the proportion of positive examples in S.
 p⊖ is the proportion of negative examples in S.
 Entropy measures the impurity of S:
Entropy(S) = − p⊕ log₂ p⊕ − p⊖ log₂ p⊖
[Figure: entropy of a 2-class problem as a function of the
proportion of one of the two groups.]
Examples
• Before partitioning, the entropy is
• Info(10/20, 10/20) = − 10/20 log(10/20) − 10/20 log(10/20) = 1
• Using the “where” attribute, divide into 2 subsets
• Entropy of the first set: Info(home) = − 6/12 log(6/12) − 6/12 log(6/12) = 1
• Entropy of the second set: Info(away) = − 4/8 log(4/8) − 4/8 log(4/8) = 1
• Expected entropy after partitioning
• 12/20 * Info(home) + 8/20 * Info(away) = 1
Example
• Using the “when” attribute, divide into 3 subsets
• Entropy of the first set: Info(5pm) = − 1/4 log(1/4) − 3/4 log(3/4)
• Entropy of the second set: Info(7pm) = − 9/12 log(9/12) − 3/12 log(3/12)
• Entropy of the third set: Info(9pm) = − 0/4 log(0/4) − 4/4 log(4/4) = 0
• Expected entropy after partitioning
• 4/20 * Info(1/4, 3/4) + 12/20 * Info(9/12, 3/12) + 4/20 * Info(0/4, 4/4) = 0.65
• Information gain: 1 − 0.65 = 0.35
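These numbers can be checked directly (a small verification sketch; the "where"/"when" attributes and the 20-game counts are taken from the example above):

```python
import math

def info(*counts):
    """Entropy of a distribution given raw counts (0 log 0 treated as 0)."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

before = info(10, 10)                                       # 1.0
after_where = 12/20 * info(6, 6) + 8/20 * info(4, 4)        # 1.0 -> gain 0.0
after_when = 4/20 * info(1, 3) + 12/20 * info(9, 3) + 4/20 * info(0, 4)
print(round(before - after_where, 2))   # 0.0
print(round(before - after_when, 2))    # 0.35
```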
Training Example
Weak wind factor on decision
Strong wind factor on decision
Example
• The information gain due to sorting the original 14 examples by the
attribute Wind may then be calculated as
Values(Wind) = {Weak, Strong}
S = [9+, 5−]
S_Weak ← [6+, 2−]
S_Strong ← [3+, 3−]
Gain(S, Wind) = Entropy(S) − Σ_{v ∈ {Weak, Strong}} (|S_v| / |S|) · Entropy(S_v)
= Entropy(S) − (8/14) Entropy(S_Weak) − (6/14) Entropy(S_Strong)
Continue
= 0.940 − (8/14) ∗ 0.811 − (6/14) ∗ 1.00
= 0.048
Continue
• The Information Gain for all four attributes is:
Gain(S, Outlook) = 0.246
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
Gain(S, Temperature) = 0.029
Intermediate Resulting Tree
Overcast outlook on decision
• The decision will always be Yes if the outlook is Overcast.
Intermediate Resulting Tree
Intermediate Resulting Tree
[Figure: Outlook at the root.
Sunny → {D1, D2, D8, D9, D11} [2+, 3−], attribute still to be selected (?);
Overcast → {D3, D7, D12, D13} [4+, 0−] → Yes;
Rain → {D4, D5, D6, D10, D14} [3+, 2−], attribute still to be selected (?).
Which attribute to select at the remaining nodes?]
Sunny outlook on decision
• Here, there are 5 instances with a sunny outlook; the decision is
No for 3/5 of them and Yes for 2/5.
Cont…
Gain(S_sunny, Temperature) = 0.570
Gain(S_sunny, Wind) = 0.019
Gain(S_sunny, Humidity) = 0.970
Humidity is selected because it produces the highest gain if the outlook
is Sunny.
At this point, the decision will always be No if humidity is
High.
On the other hand, the decision will always be Yes if humidity
is Normal.
Intermediate Resulting Tree
Rain outlook on decision
Cont…
Gain(S_rain, Temperature), Gain(S_rain, Humidity) and Gain(S_rain, Wind)
are computed in the same way;
Wind produces the highest gain if the outlook is Rain.
The decision will always be Yes if the wind is Weak and the outlook is Rain.
The decision will always be No if the wind is Strong and the outlook
is Rain.
Final Tree
Information Gain: Limitation
• Problematic: attributes with a large number of values (extreme
case: an ID attribute)
• Subsets are more likely to be pure if there is a large number of
values
• Information gain is biased towards choosing attributes with a large
number of values
Gain Ratio
• A modification of the information gain that reduces its bias.
• The gain ratio measure penalizes attributes such as customer ID by
incorporating a term called split information.
• Split information is sensitive to how broadly and uniformly the
attribute splits the data.
C4.5
• C4.5, a successor of ID3, uses an extension to information gain known
as gain ratio.
• It overcomes the bias problem.
• It applies a kind of normalization to information gain using a split
information value.
SplitInfo_A(D) = − Σ_{j=1}^{v} (|D_j| / |D|) log₂(|D_j| / |D|)
SplitInfo_A(D) is the entropy of D with respect to the values of attribute
A.
GainRatio(A) = Gain(A) / SplitInfo_A(D)
The attribute with the maximum gain ratio is selected as the splitting
attribute.
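A sketch of the gain ratio computation (illustrative code, same assumed data format as the earlier information-gain sketch); the final line reproduces the Outlook row of the table below:

```python
from collections import Counter
import math

def info(counts):
    """Entropy of a distribution given a list of counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)

def gain_ratio(records, labels, attribute):
    """GainRatio(A) = Gain(A) / SplitInfo(A); SplitInfo is the entropy of the split sizes."""
    total = len(labels)
    subsets = {}
    for r, y in zip(records, labels):
        subsets.setdefault(r[attribute], []).append(y)
    remainder = sum(len(s) / total * info(list(Counter(s).values()))
                    for s in subsets.values())
    gain = info(list(Counter(labels).values())) - remainder
    split_info = info([len(s) for s in subsets.values()])
    return gain / split_info

# Outlook column of the 14-day weather data: Sunny [2+,3-], Overcast [4+,0-], Rain [3+,2-]
records = [{"outlook": "Sunny"}] * 5 + [{"outlook": "Overcast"}] * 4 + [{"outlook": "Rain"}] * 5
labels = ["yes"] * 2 + ["no"] * 3 + ["yes"] * 4 + ["yes"] * 3 + ["no"] * 2
print(round(gain_ratio(records, labels, "outlook"), 3))   # -> 0.156
```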
Gain ratios for weather data
Attribute     Info    Gain                    Split info              Gain ratio
Outlook       0.693   0.940 − 0.693 = 0.247   info([5,4,5]) = 1.577   0.247 / 1.577 = 0.156
Temperature   0.911   0.940 − 0.911 = 0.029   info([4,6,4]) = 1.362   0.029 / 1.362 = 0.021
Humidity      0.788   0.940 − 0.788 = 0.152   info([7,7]) = 1.000     0.152 / 1.000 = 0.152
Windy         0.892   0.940 − 0.892 = 0.048   info([8,6]) = 0.985     0.048 / 0.985 = 0.049