Decision Tree
R. Akerkar
TMRF, Kolhapur, India
Introduction
 A classification scheme which generates a tree and a set of rules from a given data set.
 The set of records available for developing
classification methods is divided into two disjoint
subsets – a training set and a test set.
 The attributes of the records are categorised into two types:
 Attributes whose domain is numerical are called numerical
attributes.
 Attributes whose domain is not numerical are called the
categorical attributes.
Introduction
 A decision tree is a tree with the following properties:
 An inner node represents an attribute.
 An edge represents a test on the attribute of the parent node.
 A leaf represents one of the classes.
 Construction of a decision tree
 Based on the training data
 Top-Down strategy
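
To make these properties concrete, here is a minimal sketch of one possible in-memory representation of such a tree (the class and field names are illustrative, not part of the original slides):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Node:
    """An inner node tests one attribute; a leaf carries a class label."""
    attribute: Optional[str] = None                       # splitting attribute (inner nodes only)
    children: Dict[object, "Node"] = field(default_factory=dict)  # edge value -> child node
    label: Optional[str] = None                           # class label (leaves only)

    def is_leaf(self) -> bool:
        return self.label is not None
```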
Decision Tree
Example
 The data set has five attributes.
 There is a special attribute: the attribute class is the class label.
 The attributes temp (temperature) and humidity are numerical attributes.
 The other attributes are categorical, that is, they cannot be ordered.
 Based on the training data set, we want to find a set of rules that tells us, from the values of outlook, temperature, humidity and windy, whether or not to play golf.
Decision Tree
Example
 We have five leaf nodes.
 In a decision tree, each leaf node represents a rule.
 We have the following rules corresponding to the tree given in
Figure.
 RULE 1: If it is sunny and the humidity is not above 75%, then play.
 RULE 2: If it is sunny and the humidity is above 75%, then do not play.
 RULE 3: If it is overcast, then play.
 RULE 4: If it is rainy and not windy, then play.
 RULE 5: If it is rainy and windy, then don't play.
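
These five rules can also be written down directly as a small hand-coded classifier; a sketch assuming the attribute names used in the example (outlook, humidity, windy):

```python
def play_golf(outlook: str, humidity: float, windy: bool) -> str:
    """Hand-coded version of RULE 1 - RULE 5 from the tree in the figure."""
    if outlook == "sunny":
        return "play" if humidity <= 75 else "don't play"   # RULE 1 / RULE 2
    if outlook == "overcast":
        return "play"                                        # RULE 3
    if outlook == "rain":
        return "play" if not windy else "don't play"         # RULE 4 / RULE 5
    raise ValueError(f"unexpected outlook value: {outlook}")
```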
Classification
 The classification of an unknown input vector is done by
traversing the tree from the root node to a leaf node.
 A record enters the tree at the root node.
 At the root, a test is applied to determine which child
node the record will encounter next.
 This process is repeated until the record arrives at a leaf
node.
 All the records that end up at a given leaf of the tree are
classified in the same way.
 There is a unique path from the root to each leaf.
 The path is a rule which is used to classify the records.
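
The traversal just described can be sketched in a few lines. The nested-tuple encoding of the golf tree below is our own illustrative choice, not something prescribed by the slides:

```python
# Inner nodes are (test_name, {outcome: subtree}); leaves are plain class labels.
GOLF_TREE = ("outlook", {
    "sunny":    ("humidity<=75", {True: "play", False: "don't play"}),
    "overcast": "play",
    "rain":     ("windy", {True: "don't play", False: "play"}),
})


def classify(tree, record):
    """Walk from the root to a leaf, applying one test per node, and return the leaf's class."""
    while isinstance(tree, tuple):               # still at an inner node
        test, children = tree
        if test == "humidity<=75":               # numeric test on the humidity attribute
            outcome = record["humidity"] <= 75
        else:                                    # categorical test: follow the matching edge
            outcome = record[test]
        tree = children[outcome]
    return tree                                  # leaf reached: the class label
```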
 In our tree, we can carry out the classification
for an unknown record as follows.
 Let us assume, for the record, that we know
the values of the first four attributes (but we
do not know the value of class attribute) as
 outlook = rain; temp = 70; humidity = 65; and windy = true.
 We start from the root node to check the value of the attribute
associated at the root node.
 This attribute is the splitting attribute at this node.
 For a decision tree, at every node there is an attribute associated
with the node called the splitting attribute.
 In our example, outlook is the splitting attribute at root.
 Since, for the given record, outlook = rain, we move to the right-most child node of the root.
 At this node, the splitting attribute is windy, and we find that for the record we want to classify, windy = true.
 Hence, we move to the left child node and conclude that the class label is "no play".
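
With the classify sketch given after the Classification slide, this walk looks as follows:

```python
record = {"outlook": "rain", "temp": 70, "humidity": 65, "windy": True}
print(classify(GOLF_TREE, record))   # -> "don't play", i.e. the "no play" leaf
```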
 The accuracy of the classifier is determined by the percentage of the
test data set that is correctly classified.
 We can see that for Rule 1 there are two records of the test data set
satisfying outlook= sunny and humidity < 75, and only one of these
is correctly classified as play.
 Thus, the accuracy of this rule is 0.5 (or 50%). Similarly, the
accuracy of Rule 2 is also 0.5 (or 50%). The accuracy of Rule 3 is
0.66.
RULE 1: If it is sunny and the humidity is not above 75%, then play.
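
A small, generic sketch of this accuracy computation (the function and variable names are ours; test_set stands for the slide's test data, which is not reproduced here):

```python
def rule_accuracy(test_records, condition, predicted_class):
    """Fraction of the test records covered by the rule that carry the class the rule predicts."""
    covered = [r for r in test_records if condition(r)]
    if not covered:
        return None                                        # the rule fires on no test record
    hits = sum(1 for r in covered if r["class"] == predicted_class)
    return hits / len(covered)


rule1 = lambda r: r["outlook"] == "sunny" and r["humidity"] <= 75
# rule_accuracy(test_set, rule1, "play") would give 0.5 on the test set described above.
```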
Concept of Categorical Attributes
 Consider the following training
data set.
 There are three attributes,
namely, age, pincode and class.
 The attribute class is used as the class label.
The attribute age is a numeric attribute, whereas pincode is a categorical one.
Though the domain of pincode is numeric, no ordering can be defined among pincode values.
You cannot derive any useful information from the fact that one pincode is greater than another.
 Figure gives a decision tree for the
training data.
 The splitting attribute at the root is
pincode and the splitting criterion
here is pincode = 500 046.
 Similarly, for the left child node, the
splitting criterion is age < 48 (the
splitting attribute is age).
 The right child node has the same splitting attribute (age), but the splitting criterion there is different.
At the root level, we have 9 records.
The associated splitting criterion is pincode = 500 046.
As a result, we split the records into two subsets: records 1, 2, 4, 8, and 9 go to the left child node, and the remaining records go to the right child node.
The process is repeated at every node.
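
The distinction between the splitting attribute and the splitting criterion can be illustrated with two predicates; a sketch assuming dict-like records with pincode and age fields:

```python
def split_records(records, criterion):
    """Send records satisfying the criterion to the left child, the rest to the right child."""
    left = [r for r in records if criterion(r)]
    right = [r for r in records if not criterion(r)]
    return left, right


root_criterion = lambda r: r["pincode"] == 500046   # equality test on a categorical attribute
left_criterion = lambda r: r["age"] < 48            # threshold test on a numeric attribute
```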
Advantages and Shortcomings of Decision
Tree Classifications
 A decision tree construction process is concerned with identifying
the splitting attributes and splitting criterion at every level of the tree.
 Major strengths are:
 Decision trees are able to generate understandable rules.
 They are able to handle both numerical and categorical attributes.
 They provide a clear indication of which fields are most important for prediction or classification.
 Weaknesses are:
 The process of growing a decision tree is computationally expensive. At
each node, each candidate splitting field is examined before its best split
can be found.
 Some decision trees can only deal with binary-valued target classes.
Iterative Dichotomizer (ID3)
 Quinlan (1986)
 Each node corresponds to a splitting attribute
 Each arc is a possible value of that attribute.
 At each node the splitting attribute is selected to be the most
informative among the attributes not yet considered in the path from
the root.
 Entropy is used to measure how informative a node is.
 The algorithm uses the criterion of information gain to determine the
goodness of a split.
 The attribute with the greatest information gain is taken as
the splitting attribute, and the data set is split for all distinct
values of the attribute.
Training Dataset
This follows an example from Quinlan’s ID3
The class label attribute,
buys_computer, has two distinct
values.
Thus there are two distinct
classes. (m =2)
Class C1 corresponds to yes
and class C2 corresponds to no.
There are 9 samples of class yes
and 5 samples of class no.
age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no
Extracting Classification Rules from Trees
 Represent the knowledge in
the form of IF-THEN rules
 One rule is created for each
path from the root to a leaf
 Each attribute-value pair
along a path forms a
conjunction
 The leaf node holds the class
prediction
 Rules are easier for humans
to understand
What are the rules?
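
The path-to-rule conversion described here can be sketched as a short recursion over the nested-tuple tree encoding used in the earlier classify sketch (our own encoding, not part of the slides):

```python
def extract_rules(tree, path=()):
    """One IF-THEN rule per root-to-leaf path; the tests along a path form a conjunction."""
    if not isinstance(tree, tuple):                    # leaf: emit the accumulated rule
        conds = " AND ".join(f"{a} = {v}" for a, v in path) or "TRUE"
        return [f"IF {conds} THEN class = {tree}"]
    attribute, children = tree
    rules = []
    for value, child in children.items():              # extend the path along each edge
        rules.extend(extract_rules(child, path + ((attribute, value),)))
    return rules
```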
Solution (Rules)
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
Algorithm for Decision Tree Induction
 Basic algorithm (a greedy algorithm)
 Tree is constructed in a top-down recursive divide-and-conquer
manner
 At start, all the training examples are at the root
 Attributes are categorical (if continuous-valued, they are
discretized in advance)
 Examples are partitioned recursively based on selected attributes
 Test attributes are selected on the basis of a heuristic or
statistical measure (e.g., information gain)
 Conditions for stopping partitioning
 All samples for a given node belong to the same class
 There are no remaining attributes for further partitioning –
majority voting is employed for classifying the leaf
 There are no samples left
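
A compact sketch of this greedy, top-down procedure (select_attribute stands for whichever heuristic is used, e.g. information gain as defined on the next slides; since we only branch on attribute values that actually occur in the data, the "no samples left" case does not arise in this particular sketch):

```python
from collections import Counter


def build_tree(records, attributes, select_attribute):
    """Recursive divide-and-conquer induction following the stopping conditions above."""
    classes = [r["class"] for r in records]
    if len(set(classes)) == 1:                       # all samples belong to one class
        return classes[0]
    if not attributes:                               # no attributes left: majority voting
        return Counter(classes).most_common(1)[0][0]
    best = select_attribute(records, attributes)     # heuristic choice of the test attribute
    subtree = {}
    for value in {r[best] for r in records}:         # partition on the selected attribute
        subset = [r for r in records if r[best] == value]
        remaining = [a for a in attributes if a != best]
        subtree[value] = build_tree(subset, remaining, select_attribute)
    return (best, subtree)
```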
Attribute Selection Measure: Information
Gain (ID3/C4.5)
 Select the attribute with the highest information gain
 S contains si tuples of class Ci for i = {1, …, m}
 The expected information needed to classify an arbitrary tuple (information is encoded in bits):

I(s1, s2, ..., sm) = - Σ_{i=1..m} (si/s) . log2(si/s)

 The entropy of attribute A with values {a1, a2, ..., av}:

E(A) = Σ_{j=1..v} ((s1j + ... + smj)/s) . I(s1j, ..., smj)

 The information gained by branching on attribute A:

Gain(A) = I(s1, s2, ..., sm) - E(A)
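
These three quantities translate directly into code; a sketch working from class counts, with function names of our own choosing:

```python
from math import log2


def info(*counts):
    """I(s1, ..., sm): expected bits needed to classify a tuple, given the class counts."""
    s = sum(counts)
    return -sum((si / s) * log2(si / s) for si in counts if si > 0)


def expected_info(partitions):
    """E(A): weighted info over the subsets induced by attribute A.
    `partitions` is a list of per-value class-count tuples, e.g. [(2, 3), (4, 0), (3, 2)]."""
    s = sum(sum(p) for p in partitions)
    return sum((sum(p) / s) * info(*p) for p in partitions)


def gain(class_counts, partitions):
    """Gain(A) = I(s1, ..., sm) - E(A)."""
    return info(*class_counts) - expected_info(partitions)
```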
Entropy
 Entropy measures the homogeneity (purity) of a set of examples.
 It gives the information content of the set in terms of the class labels of
the examples.
 Consider that you have a set of examples, S with two classes, P and N. Let
the set have p instances for the class P and n instances for the class N.
 So the total number of instances we have is t = p + n. The view [p, n] can
be seen as a class distribution of S.
The entropy for S is defined as
 Entropy(S) = - (p/t).log2(p/t) - (n/t).log2(n/t)
 Example: Let a set of examples consist of 9 instances of class positive and 5 instances of class negative.
 Answer: p = 9 and n = 5.
 So Entropy(S) = - (9/14).log2(9/14) - (5/14).log2(5/14)
= -(0.64286)(-0.6375) - (0.35714)(-1.48557)
= 0.40982 + 0.53056
= 0.940
Entropy
The entropy for a completely pure set is 0 and is 1 for a set with
equal occurrences for both the classes.
e.g. Entropy[14,0] = - (14/14).log2(14/14) - (0/14).log2(0/14)
= - (1).log2(1) - (0).log2(0)    (taking 0.log2(0) = 0)
= - (1).(0) - 0
= 0
e.g. Entropy[7,7] = - (7/14).log2(7/14) - (7/14).log2(7/14)
= - (0.5).log2(0.5) - (0.5).log2(0.5)
= - (0.5).(-1) - (0.5).(-1)
= 0.5 + 0.5
= 1
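
A quick numerical check of the worked example and the two boundary cases above (Entropy(p, n) here is the same quantity as I(p, n)):

```python
from math import log2


def entropy(p, n):
    """Two-class entropy of a set with p positive and n negative examples (0.log2(0) taken as 0)."""
    t = p + n
    return -sum((c / t) * log2(c / t) for c in (p, n) if c > 0)


print(round(entropy(9, 5), 3))   # 0.94  -- the worked example on the previous slide
print(entropy(14, 0))            # 0.0   -- a completely pure set
print(entropy(7, 7))             # 1.0   -- an evenly mixed set
```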
Attribute Selection by Information Gain
Computation
 Class P: buys_computer = “yes”
 Class N: buys_computer = “no”
 I(p, n) = I(9, 5) = 0.940
 Compute the entropy for age:

age      pi  ni  I(pi, ni)
<=30     2   3   0.971
31…40    4   0   0
>40      3   2   0.971

E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

Here (5/14) I(2,3) means “age <=30” has 5 out of 14 samples, with 2 yes's and 3 no's.

Hence,
Gain(age) = I(p, n) - E(age) = 0.246
Similarly,
Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048

age income student credit_rating buys_computer
<=30 high no fair no
<=30 high no excellent no
31…40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31…40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31…40 medium no excellent yes
31…40 high yes fair yes
>40 medium no excellent no

Since age has the highest information gain among the attributes, it is selected as the test attribute.
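
The E(age) and Gain(age) figures above can be reproduced from the per-value class counts alone; a self-contained sketch (the slide's 0.246 reflects rounding of the intermediate values):

```python
from math import log2


def info(*counts):
    s = sum(counts)
    return -sum((c / s) * log2(c / s) for c in counts if c > 0)


age_partitions = [(2, 3), (4, 0), (3, 2)]    # (yes, no) counts for <=30, 31...40, >40
e_age = sum((sum(p) / 14) * info(*p) for p in age_partitions)
gain_age = info(9, 5) - e_age
print(f"E(age) = {e_age:.3f}")               # 0.694
print(f"Gain(age) = {gain_age:.3f}")         # 0.247 at full precision (0.246 with rounded intermediates)
```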
Exercise 1
 The following table consists of training data from an employee
database.
 Let status be the class attribute. Use the ID3 algorithm to construct a
decision tree from the given data.
Solution 1
Other Attribute Selection Measures
 Gini index (CART, IBM IntelligentMiner)
 All attributes are assumed continuous-valued
 Assume there exist several possible split values for each
attribute
 May need other tools, such as clustering, to get the
possible split values
 Can be modified for categorical attributes
Gini Index (IBM IntelligentMiner)
 If a data set T contains examples from n classes, the gini index, gini(T), is defined as

gini(T) = 1 - Σ_{j=1..n} pj^2

where pj is the relative frequency of class j in T.
 If a data set T is split into two subsets T1 and T2 with sizes N1 and N2 respectively, the gini index of the split data, gini_split(T), is defined as

gini_split(T) = (N1/N) gini(T1) + (N2/N) gini(T2)

 The attribute that provides the smallest gini_split(T) is chosen to split the node (one needs to enumerate all possible splitting points for each attribute).
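
A direct transcription of these two definitions; a sketch with names of our own choosing, working from per-class counts:

```python
def gini(counts):
    """gini(T) = 1 - sum of squared relative class frequencies."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split(left_counts, right_counts):
    """Weighted gini index of a binary split of T into T1 and T2."""
    n1, n2 = sum(left_counts), sum(right_counts)
    n = n1 + n2
    return (n1 / n) * gini(left_counts) + (n2 / n) * gini(right_counts)
```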
Exercise 2
Solution 2
 SPLIT: Age <= 50
 ----------------------
 | High | Low | Total
 --------------------
 S1 (left) | 8 | 11 | 19
 S2 (right) | 11 | 10 | 21
 For S1: P(high) = 8/19 = 0.42 and P(low) = 11/19 = 0.58
 For S2: P(high) = 11/21 = 0.52 and P(low) = 10/21 = 0.48
 Gini(S1) = 1-[0.42x0.42 + 0.58x0.58] = 1-[0.18+0.34] = 1-0.52 = 0.48
 Gini(S2) = 1-[0.52x0.52 + 0.48x0.48] = 1-[0.27+0.23] = 1-0.5 = 0.5
 Gini-Split(Age<=50) = 19/40 x 0.48 + 21/40 x 0.5 = 0.23 + 0.26 = 0.49
 SPLIT: Salary <= 65K
 ----------------------
 | High | Low | Total
 --------------------
 S1 (top) | 18 | 5 | 23
 S2 (bottom) | 1 | 16 | 17
 --------------------
 For S1: P(high) = 18/23 = 0.78 and P(low) = 5/23 = 0.22
 For S2: P(high) = 1/17 = 0.06 and P(low) = 16/17 = 0.94
 Gini(S1) = 1-[0.78x0.78 + 0.22x0.22] = 1-[0.61+0.05] = 1-0.66 = 0.34
 Gini(S2) = 1-[0.06x0.06 + 0.94x0.94] = 1-[0.004+0.884] = 1-0.89 = 0.11
 Gini-Split(Salary<=65K) = 23/40 x 0.34 + 17/40 x 0.11 = 0.20 + 0.05 = 0.25
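
Using the gini_split sketch given after the Gini Index slide, these figures can be reproduced (full precision gives about 0.49 and 0.24; the 0.25 above comes from rounding the intermediate products):

```python
print(round(gini_split((8, 11), (11, 10)), 2))   # Age <= 50     -> 0.49
print(round(gini_split((18, 5), (1, 16)), 2))    # Salary <= 65K -> 0.24
```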
Exercise 3
 In the previous exercise, which of the two split points gives the better split of the data? Why?
Solution 3
 Intuitively, Salary <= 65K is a better split point since it produces relatively “pure” partitions, as opposed to Age <= 50, which results in more mixed partitions (i.e., just look at the distribution of Highs and Lows in S1 and S2).
 More formally, let us consider the properties of the Gini index.
If a partition is totally pure, i.e., has all elements from the same
class, then gini(S) = 1-[1x1+0x0] = 1-1 = 0 (for two classes).
On the other hand, if the classes are totally mixed, i.e., both classes have equal probability, then
gini(S) = 1 - [0.5x0.5 + 0.5x0.5] = 1 - [0.25+0.25] = 0.5.
In other words, the closer the gini value is to 0, the better the partition is. Since Salary has the lower gini, it is the better split.
Avoid Overfitting in Classification
 Overfitting: An induced tree may overfit the training data
 Too many branches, some may reflect anomalies due to noise
or outliers
 Poor accuracy for unseen samples
 Two approaches to avoid overfitting
 Prepruning: Halt tree construction early—do not split a node if
this would result in the goodness measure falling below a
threshold
 Difficult to choose an appropriate threshold
 Postpruning: Remove branches from a “fully grown” tree—get a
sequence of progressively pruned trees
 Use a set of data different from the training data to decide
which is the “best pruned tree”
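
As a minimal illustration of the prepruning idea, the guard below stops growth at a node when the best achievable split is too weak; the threshold value and names are purely illustrative, and choosing the threshold well is exactly the difficulty noted above:

```python
def grow_or_stop(best_gain, best_attribute, majority_class, min_gain=0.01):
    """Prepruning guard: return a leaf if the goodness measure falls below the threshold."""
    if best_gain < min_gain:
        return majority_class              # halt construction: this node becomes a leaf
    return ("split", best_attribute)       # otherwise keep splitting on best_attribute
```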