Prof. Pier Luca Lanzi
Classification: Decision Trees
Data Mining and Text Mining (UIC 583 @ Politecnico di Milano)
The Weather Dataset
Outlook   Temp  Humidity  Windy  Play
Sunny     Hot   High      False  No
Sunny     Hot   High      True   No
Overcast  Hot   High      False  Yes
Rainy     Mild  High      False  Yes
Rainy     Cool  Normal    False  Yes
Rainy     Cool  Normal    True   No
Overcast  Cool  Normal    True   Yes
Sunny     Mild  High      False  No
Sunny     Cool  Normal    False  Yes
Rainy     Mild  Normal    False  Yes
Sunny     Mild  Normal    True   Yes
Overcast  Mild  High      True   Yes
Overcast  Hot   Normal    False  Yes
Rainy     Mild  High      True   No
The Decision Tree for the Weather Dataset
[Figure: the decision tree for the weather dataset, shown here in text form]
outlook = sunny
|  humidity = high: No
|  humidity = normal: Yes
outlook = overcast: Yes
outlook = rainy
|  windy = true: No
|  windy = false: Yes
What is a Decision Tree?
•  An internal node is a test on an attribute
•  A branch represents an outcome of the test, e.g., outlook = sunny
•  A leaf node represents a class label or class label distribution
•  At each node, one attribute is chosen to split training examples
into distinct classes as much as possible
•  A new case is classified by following a matching path to a leaf
node
Building a Decision Tree with KNIME
Decision Tree Representations (Graphical)
[Figure: the weather decision tree drawn as a node-link diagram]
Decision Tree Representations (Text)
outlook = overcast: yes {no=0, yes=4}
outlook = rainy
| windy = FALSE: yes {no=0, yes=3}
| windy = TRUE: no {no=2, yes=0}
outlook = sunny
| humidity = high: no {no=3, yes=0}
| humidity = normal: yes {no=0, yes=2}
Computing Decision Trees
•  Top-down Tree Construction
   - Initially, all the training examples are at the root
   - Then, the examples are recursively partitioned, by choosing one attribute at a time
•  Bottom-up Tree Pruning
   - Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases
Top Down Induction of Decision Trees
function TDIDT(S) // S, a set of labeled examples
  Tree = new empty node
  if (all examples have the same class c
      or no further splitting is possible) then
    // new leaf labeled with the majority class c
    Label(Tree) = c
  else // new decision node
    (A,T) = FindBestSplit(S)
    foreach test t in T do
      St = all examples that satisfy t
      Nodet = TDIDT(St)
      AddEdge(Tree -> Nodet)
    endfor
  endif
  return Tree
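
Since the slides only sketch the procedure, here is a minimal runnable version in R for nominal attributes, using information gain as FindBestSplit; the function names (tdidt, entropy) and the weather data frame in the usage example are illustrative, not part of the original pseudocode.

# entropy of a vector of class counts
entropy <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]
  -sum(p * log2(p))
}

# a minimal TDIDT sketch for nominal attributes
tdidt <- function(data, target, attrs) {
  if (nrow(data) == 0) return(list(leaf = TRUE, label = NA))
  classes <- data[[target]]
  # stop: pure node or no attributes left; label with the majority class
  if (length(unique(classes)) == 1 || length(attrs) == 0)
    return(list(leaf = TRUE, label = names(which.max(table(classes)))))
  # FindBestSplit: pick the attribute with the highest information gain
  gains <- sapply(attrs, function(a) {
    info_after <- sum(sapply(split(classes, data[[a]]), function(s)
      length(s) / length(classes) * entropy(table(s))))
    entropy(table(classes)) - info_after
  })
  best <- attrs[which.max(gains)]
  # one child per observed value of the chosen attribute
  children <- lapply(split(data, data[[best]], drop = TRUE),
                     function(d) tdidt(d, target, setdiff(attrs, best)))
  list(leaf = FALSE, attribute = best, children = children)
}

# e.g., tdidt(weather, "Play", c("Outlook", "Temp", "Humidity", "Windy")),
# assuming weather is a data frame holding the table above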
When Should Building Stop?
•  There are several possible stopping criteria
   - All samples for a given node belong to the same class
   - There are no remaining attributes for further partitioning (in this case, majority voting is employed to label the leaf)
   - There are no samples left
   - There is nothing to gain in splitting
How is the Splitting Attribute Determined?
What Do We Need?
[Figure: repaid vs not-repaid cases along the salary axis, separated by a threshold k]
IF salary < k THEN not repaid
Which Attribute for Splitting?
•  At each node, available attributes are evaluated on the basis of
separating the classes of the training examples
•  A purity or impurity measure is used for this purpose
•  Information Gain: increases with the average purity of the subsets
that an attribute produces
•  Splitting Strategy: choose the attribute that results in greatest
information gain
•  Typical goodness functions: information gain (ID3), information
gain ratio (C4.5), gini index (CART)
Which Attribute Should We Select?
[Figure: the four candidate splits of the weather data, on outlook, temperature, humidity, and windy]
Computing Information
•  Information is measured in bits
   - Given a probability distribution, the info required to predict an event is the distribution’s entropy
   - Entropy gives the information required in bits (this can involve fractions of bits!)
•  Formula for computing the entropy:
   entropy(p1, ..., pn) = -p1 log2(p1) - p2 log2(p2) - ... - pn log2(pn)
Entropy
[Figure: for two classes (0/1), the entropy plotted as a function of the percentage of elements of class 0]
The Attribute “outlook”
•  “outlook” = “sunny”: info([2,3]) = 0.971 bits
•  “outlook” = “overcast”: info([4,0]) = 0 bits
•  “outlook” = “rainy”: info([3,2]) = 0.971 bits
•  Expected information for the attribute: info([2,3],[4,0],[3,2]) = 5/14 × 0.971 + 4/14 × 0 + 5/14 × 0.971 = 0.693 bits
Information Gain
•  Difference between the information before the split and the information after the split
•  The information before the split, info(D), is the entropy of the class distribution in D
•  The information after the split using attribute A is computed as the weighted sum of the entropies on each split, given n splits:
   infoA(D) = Σj |Dj|/|D| × info(Dj)
•  The gain is then gain(A) = info(D) - infoA(D)
•  Information gain for the attributes from the weather data:
   - gain(“outlook”) = 0.247 bits
   - gain(“temperature”) = 0.029 bits
   - gain(“humidity”) = 0.152 bits
   - gain(“windy”) = 0.048 bits
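
These values can be checked directly; the following is a minimal sketch in R, reusing the entropy() helper from the TDIDT sketch above, with the class counts taken from the weather table.

info_D <- entropy(c(9, 5))      # information before the split: 0.940 bits
# outlook splits the data into sunny [2 yes, 3 no], overcast [4, 0], rainy [3, 2]
info_outlook <- (5/14) * entropy(c(2, 3)) + (4/14) * entropy(c(4, 0)) +
                (5/14) * entropy(c(3, 2))
info_D - info_outlook           # gain("outlook") = 0.247 bits
# windy splits the data into false [6 yes, 2 no] and true [3 yes, 3 no]
info_windy <- (8/14) * entropy(c(6, 2)) + (6/14) * entropy(c(3, 3))
info_D - info_windy             # gain("windy") = 0.048 bits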
You can check the math using KNIME
Going Further
The Final Decision Tree
•  Not all the leaves need to be pure
•  Splitting stops when the data cannot be split any further
We Can Always Build a 100% Accurate Tree…
Another Version of the Weather Dataset
ID Code Outlook Temp Humidity Windy Play
A Sunny Hot High False No
B Sunny Hot High True No
C Overcast Hot High False Yes
D Rainy Mild High False Yes
E Rainy Cool Normal False Yes
F Rainy Cool Normal True No
G Overcast Cool Normal True Yes
H Sunny Mild High False No
I Sunny Cool Normal False Yes
J Rainy Mild Normal False Yes
K Sunny Mild Normal True Yes
L Overcast Mild High True Yes
M Overcast Hot Normal False Yes
N Rainy Mild High True No
Decision Tree for the New Dataset
•  Entropy for splitting using “ID Code” is zero, since each leaf node is “pure”
•  Information Gain is thus maximal for ID code
Highly-Branching Attributes
•  Attributes with a large number of values are usually problematic
•  Examples: id, primary keys, or almost primary key attributes
•  Subsets are likely to be pure if there is a large number of values
•  Information Gain is biased towards choosing attributes with a
large number of values
•  This may result in overfitting (selection of an attribute that is non-optimal for prediction)
Information Gain Ratio
•  Modification of the Information Gain that reduces the bias toward highly-branching attributes
•  Information Gain Ratio should be
   - Large when data is evenly spread
   - Small when all data belong to one branch
•  Information Gain Ratio takes the number and size of branches into account when choosing an attribute
•  It corrects the information gain by taking the intrinsic information of a split into account
Information Gain Ratio and Intrinsic Information
•  Intrinsic information computes the entropy of the distribution of instances into branches:
   intrinsic_info(S, A) = - Σi |Si|/|S| × log2(|Si|/|S|)
•  Information Gain Ratio normalizes the Information Gain by the intrinsic information:
   gain_ratio(S, A) = gain(S, A) / intrinsic_info(S, A)
Computing the Information Gain Ratio
•  The intrinsic information for ID code is intrinsic_info([1,1,...,1]) = 14 × (-1/14 × log2(1/14)) = 3.807 bits
•  The importance of an attribute decreases as its intrinsic information gets larger
•  The Information Gain Ratio of “ID code” is thus 0.940/3.807 = 0.247
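
A quick check in R, again reusing the entropy() helper from the TDIDT sketch:

entropy(rep(1, 14))           # intrinsic info of "ID code": log2(14) = 3.807 bits
0.940 / entropy(rep(1, 14))   # gain ratio of "ID code" = 0.247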
Information Gain Ratio for Weather Data

Outlook:      info: 0.693   gain: 0.940 - 0.693 = 0.247   split info: info([5,4,5]) = 1.577   gain ratio: 0.247/1.577 = 0.156
Temperature:  info: 0.911   gain: 0.940 - 0.911 = 0.029   split info: info([4,6,4]) = 1.362   gain ratio: 0.029/1.362 = 0.021
Humidity:     info: 0.788   gain: 0.940 - 0.788 = 0.152   split info: info([7,7]) = 1.000     gain ratio: 0.152/1.000 = 0.152
Windy:        info: 0.892   gain: 0.940 - 0.892 = 0.048   split info: info([8,6]) = 0.985     gain ratio: 0.048/0.985 = 0.049
More on Information Gain Ratio
•  “outlook” still comes out top; however, “ID code” has a greater Information Gain Ratio
•  The standard fix is an ad-hoc test to prevent splitting on that type
of attribute
•  First, only consider attributes with greater-than-average Information Gain; then, compare them using the Information Gain Ratio
•  Information Gain Ratio may overcompensate and choose an
attribute just because its intrinsic information is very low
What if Attributes are Numerical?
The Weather Dataset (Numerical)
Outlook Temp Humidity Windy Play
Sunny 85 85 False No
Sunny 80 90 True No
Overcast 83 78 False Yes
Rainy 70 96 False Yes
Rainy 68 80 False Yes
Rainy 65 70 True No
Overcast 64 65 True Yes
Sunny 72 95 False No
Sunny 69 70 False Yes
Rainy 75 80 False Yes
Sunny 75 70 True Yes
Overcast 72 90 True Yes
Overcast 81 75 False Yes
Rainy 71 80 True No
The Temperature Attribute
•  First, sort the temperature values, including the class labels
•  Then, check all the cut points and choose the one with the best
information gain
•  E.g., temperature < 71.5: yes/4, no/2; temperature ≥ 71.5: yes/5, no/3
•  Info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939
•  Place split points halfway between values
•  Can evaluate all split points in one pass!
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes No Yes Yes Yes No No Yes Yes Yes No Yes Yes No
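
The one-pass evaluation can be sketched in R as follows, reusing the entropy() helper from above; the function name and its interface are illustrative.

# evaluate every midpoint between consecutive values and return the best cut
best_numeric_split <- function(x, y) {
  o <- order(x); x <- x[o]; y <- y[o]
  info_D <- entropy(table(y))
  cuts <- unique((head(x, -1) + tail(x, -1)) / 2)
  gains <- sapply(cuts, function(cp) {
    left <- y[x < cp]
    w <- length(left) / length(y)
    info_D - (w * entropy(table(left)) + (1 - w) * entropy(table(y[x >= cp])))
  })
  list(cut = cuts[which.max(gains)], gain = max(gains))
}

temperature <- c(64, 65, 68, 69, 70, 71, 72, 72, 75, 75, 80, 81, 83, 85)
play <- c("Yes", "No", "Yes", "Yes", "Yes", "No", "No", "Yes",
          "Yes", "Yes", "No", "Yes", "Yes", "No")
best_numeric_split(temperature, play)   # best midpoint cut and its gain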
The Information Gain for Humidity
Humidity  Play
65        Yes
70        No
70        Yes
70        Yes
75        Yes
78        Yes
80        Yes
80        Yes
80        No
85        No
90        No
90        Yes
95        No
96        Yes

Sort the attribute values, then compute the information gain for every possible split point.
Information Gain for Humidity
Columns: Humidity and Play, followed by the class counts and percentages, weight, and entropy of the partition up to and including this row (left), the same statistics for the remaining rows (right), and the information gain of splitting at this point.
65 Yes 1 100.00% 0 0.00% 7.14% 0.00 8.00 0.62 5.00 0.38 92.86% 0.96 0.0477
70 No 1 50.00% 1 50.00% 14.29% 1.00 8.00 0.67 4.00 0.33 85.71% 0.92 0.0103
70 Yes 2 66.67% 1 33.33% 21.43% 0.92 7.00 0.64 4.00 0.36 78.57% 0.95 0.0005
70 Yes 3 75.00% 1 25.00% 28.57% 0.81 6.00 0.60 4.00 0.40 71.43% 0.97 0.0150
75 Yes 4 80.00% 1 20.00% 35.71% 0.72 5.00 0.56 4.00 0.44 64.29% 0.99 0.0453
78 Yes 5 83.33% 1 16.67% 42.86% 0.65 4.00 0.50 4.00 0.50 57.14% 1.00 0.0903
80 Yes 6 85.71% 1 14.29% 50.00% 0.59 3.00 0.43 4.00 0.57 50.00% 0.99 0.1518
80 Yes 7 87.50% 1 12.50% 57.14% 0.54 2.00 0.33 4.00 0.67 42.86% 0.92 0.2361
80 No 7 77.78% 2 22.22% 64.29% 0.76 2.00 0.40 3.00 0.60 35.71% 0.97 0.1022
85 No 7 70.00% 3 30.00% 71.43% 0.88 2.00 0.50 2.00 0.50 28.57% 1.00 0.0251
90 No 7 63.64% 4 36.36% 78.57% 0.95 2.00 0.67 1.00 0.33 21.43% 0.92 0.0005
90 Yes 8 66.67% 4 33.33% 85.71% 0.92 1.00 0.50 1.00 0.50 14.29% 1.00 0.0103
95 No 8 61.54% 5 38.46% 92.86% 0.96 1.00 1.00 0.00 0.00 7.14% 0.00 0.0477
96 Yes 9 64.29% 5 35.71% 100.00% 0.94 0.00 0.00 0.00 0.00 0.00% 0.00 0.0000
Otherwise, we can discretize…
Discretization
•  Process of converting an interval/numerical variable into a finite (discrete) set of elements (labels)
•  A discretization algorithm computes a series of cut-points which define a set of intervals that are mapped into labels
•  Motivations: many methods (also in mathematics and computer science) cannot deal with numerical variables. In these cases, discretization is required to be able to use them, despite the loss of information
•  Several approaches
   - Supervised vs unsupervised
   - Static vs dynamic
Unsupervised vs Supervised Discretization
•  Unsupervised discretization just uses the attribute values; for instance, discretize humidity as <70, 70-79, >=80
•  Supervised discretization also uses the class attribute to generate intervals with lower entropy
•  Example using temperature
   - The values alone may suggest intervals such as <60, 60-70, ...
   - Considering the class value might suggest different intervals, obtained by grouping values to maintain information about the class attribute:
     64 65 68 69 70 71 72 72 75 75 80 81 83 85
     Yes | No | Yes Yes Yes | No No | Yes | Yes Yes | No | Yes Yes | No
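
For instance, the unsupervised binning mentioned above can be written in base R with cut(); a small sketch, with the humidity values taken from the numerical weather table.

humidity <- c(85, 90, 78, 96, 80, 70, 65, 95, 70, 80, 70, 90, 75, 80)
# bins <70, 70-79, >=80, ignoring the class attribute
cut(humidity, breaks = c(-Inf, 70, 80, Inf),
    labels = c("<70", "70-79", ">=80"), right = FALSE)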
Example of Unsupervised and Supervised Discretization for Humidity
[Figure: the humidity values binned without the class attribute (unsupervised) and with it (supervised)]
The Gini Index
Another Splitting Criterion: The Gini Index
•  The gini index, for a data set T containing examples from n classes, is defined as
   gini(T) = 1 - Σj pj²
   where pj is the relative frequency of class j in T
•  gini(T) is minimized if the classes in T are skewed
The Gini Index vs Entropy
[Figure: gini index and entropy compared as functions of the class distribution]
The Gini Index
•  If a data set D is split on A into two subsets D1 and D2, then the gini index of the split is the weighted sum
   giniA(D) = |D1|/|D| × gini(D1) + |D2|/|D| × gini(D2)
•  The reduction of impurity is defined as
   Δgini(A) = gini(D) - giniA(D)
•  The attribute that provides the smallest giniA(D) (or, equivalently, the largest reduction in impurity) is chosen to split the node (this requires enumerating all the possible splitting points for each attribute)
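
A minimal sketch of the computation in R; the class counts used for the two subsets are illustrative assumptions, not the slide's figures.

gini <- function(counts) { p <- counts / sum(counts); 1 - sum(p^2) }
gini(c(9, 5))   # gini(D) = 0.459 for 9 "yes" and 5 "no" tuples
# a hypothetical binary split: D1 with class counts [7, 3], D2 with [2, 2]
(10/14) * gini(c(7, 3)) + (4/14) * gini(c(2, 2))   # weighted gini after the split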
The Gini Index: Example
•  D has 9 tuples labeled “yes” and 5 labeled “no”, so gini(D) = 1 - (9/14)² - (5/14)² = 0.459
•  Suppose the attribute income partitions D into D1 (10 tuples, branching on {low, medium}) and D2 (the remaining 4 tuples)
•  Among the candidate groupings, gini{medium,high} is 0.30 and is thus the best split since it is the lowest
Generalization and Overfitting in Decision Trees
Avoiding Overfitting in Decision Trees
•  The generated tree may overfit the training data
   - Too many branches, some of which may reflect anomalies due to noise or outliers
   - The result is poor accuracy on unseen samples
•  Two approaches to avoid overfitting
   - Prepruning
   - Postpruning
Pre-pruning vs Post-pruning
•  Prepruning
   - Halt tree construction early
   - Do not split a node if this would result in the goodness measure falling below a threshold
   - Difficult to choose an appropriate threshold
•  Postpruning
   - Remove branches from a “fully grown” tree
   - Get a sequence of progressively pruned trees
   - Use a set of data different from the training data to decide which is the “best pruned tree”
Pre-Pruning
•  Based on a statistical significance test
•  Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node
•  Problem with early stopping and halting
   - No individual attribute may exhibit any interesting information about the class, even though the structure is visible in the fully expanded tree
   - In such cases, pre-pruning won’t even expand the root node
Post-Pruning
•  First, build the full tree, then prune it
•  The fully-grown tree shows all attribute interactions
•  Problem: some subtrees might be due to chance effects
•  Two pruning operations
   - Subtree raising
   - Subtree replacement
•  Possible strategies
   - Error estimation
   - Significance testing
   - MDL principle
Subtree Raising
•  Delete a node
•  Redistribute its instances
•  Slower than subtree replacement
Subtree Replacement
•  Works bottom-up
•  Consider replacing a tree only after considering all its subtrees
Estimating Error Rates
•  Prune only if pruning reduces the estimated error
•  Error on the training data is NOT a useful estimator (it is optimistic, so it would result in very little pruning)
•  A hold-out set might be kept for pruning (“reduced-error pruning”)
•  Example (C4.5’s method)
   - Derive a confidence interval from the training data
   - Use a heuristic limit, derived from this, for pruning
   - Standard Bernoulli-process-based method
   - Shaky statistical assumptions (based on training data)
Mean and Variance
•  Mean and variance for a Bernoulli trial: p, p(1-p)
•  The expected error rate f = S/N has mean p and variance p(1-p)/N
•  For large enough N, f follows a Normal distribution
•  A c% confidence interval [-z ≤ X ≤ z] for a random variable with 0 mean is given by Pr[-z ≤ X ≤ z] = c
•  With a symmetric distribution, Pr[-z ≤ X ≤ z] = 1 - 2 × Pr[X ≥ z]
Confidence Limits
•  Confidence limits for the normal distribution with 0 mean and a variance of 1:

   Pr[X ≥ z]   z
   0.1%        3.09
   0.5%        2.58
   1%          2.33
   5%          1.65
   10%         1.28
   20%         0.84
   25%         0.69
   40%         0.25

•  Thus, for example, Pr[-1.65 ≤ X ≤ 1.65] = 90%
•  To use this we have to reduce our random variable f to have 0 mean and unit variance
C4.5’s Pruning Method
•  Given the error f on the training data, the upper bound for the error estimate for a node is computed as shown below, where
   - f is the error on the training data
   - N is the number of instances covered by the leaf
•  If c = 25% then z = 0.69 (from the normal distribution)
e = [ f + z²/(2N) + z × sqrt( f/N - f²/N + z²/(4N²) ) ] / [ 1 + z²/N ]
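
As a sketch, the bound can be evaluated in R as follows; the function name is illustrative.

c45_error <- function(f, N, z = 0.69) {
  (f + z^2 / (2 * N) + z * sqrt(f / N - f^2 / N + z^2 / (4 * N^2))) /
    (1 + z^2 / N)
}
c45_error(2 / 6, 6)     # ~0.47, a leaf in the example below
c45_error(1 / 2, 2)     # ~0.72
c45_error(5 / 14, 14)   # ~0.45, the parent node (the example below quotes 0.46)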
Example: a subtree has three leaves with observed error rates f = 0.33 (e = 0.47), f = 0.5 (e = 0.72), and f = 0.33 (e = 0.47); combining the three estimates using the ratios 6:2:6 gives 0.51. The parent node has f = 5/14 and e = 0.46; since 0.46 < 0.51, the subtree is pruned.
Examples Using KNIME
pruned tree (Gain ratio)
pruned tree (Gini index)
Regression Trees
Regression Trees for Prediction
•  Decision trees can also be used to predict the value of a numerical target variable
•  Regression trees work similarly to decision trees, analyzing several attempted splits and choosing the one that minimizes impurity
•  Differences from Decision Trees
   - The prediction is computed as the average of the numerical target variable in the subspace (instead of the majority vote)
   - Impurity is measured by the sum of squared deviations from the leaf mean (instead of information-based measures)
   - The split chosen is the one that produces the greatest separation in [y - E(y)]², i.e., the nodes with minimal within-node variance
   - Performance is measured by RMSE (root mean squared error)
[Table: the CPU performance dataset, with attributes cycle time MYCT (ns), main memory MMIN and MMAX (Kb), cache CACH (Kb), channels CHMIN and CHMAX, and the published relative performance PRP]

A linear regression on this data yields

PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX
      + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX
Mining Regression Trees in R
# Regression Tree Example
library(rpart)
library(MASS)
data(cpus)
# grow the tree
fit <- rpart(perf ~ syct + mmin + mmax + cach + chmin + chmax,
             method = "anova", data = cpus)
# create an attractive postscript plot of the tree
post(fit, file = "CPUPerformanceRegressionTree.ps",
     title = "Regression Tree for CPU Performance Dataset")
[Figure: the regression tree grown on the cpus data. The root (n=209, mean perf 105.62) splits on mmax < 28000; the subtrees split repeatedly on cach, with leaf means ranging from 39.6 (n=141) to 667.2 (n=8)]
Regression Tree for CPU Performance Dataset
Summary
•  Decision tree construction is a recursive procedure involving
   - The selection of the best splitting attribute
   - (and thus) the selection of an adequate purity measure (Information Gain, Information Gain Ratio, Gini Index)
   - Pruning to avoid overfitting
•  Information Gain is biased toward highly-branching attributes
•  Information Gain Ratio takes the number of splits that an attribute induces into account, but might not be enough
Suggested Homework
•  Try building the tree for the nominal version of the Weather dataset using the Gini Index
•  Compute the best upper bound you can for the computational complexity of decision tree building
•  Check the problems given in previous years’ exams on the use of Information Gain, Gain Ratio, and Gini Index; solve them and check the results using Weka.

More Related Content

What's hot

Decision trees for machine learning
Decision trees for machine learningDecision trees for machine learning
Decision trees for machine learningAmr BARAKAT
 
DMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification EnsemblesDMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification EnsemblesPier Luca Lanzi
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationPier Luca Lanzi
 
DMTM Lecture 04 Classification
DMTM Lecture 04 ClassificationDMTM Lecture 04 Classification
DMTM Lecture 04 ClassificationPier Luca Lanzi
 
DMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringDMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringPier Luca Lanzi
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationPier Luca Lanzi
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesPier Luca Lanzi
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationPier Luca Lanzi
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationPier Luca Lanzi
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationPier Luca Lanzi
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationPier Luca Lanzi
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringPier Luca Lanzi
 
DMTM Lecture 03 Regression
DMTM Lecture 03 RegressionDMTM Lecture 03 Regression
DMTM Lecture 03 RegressionPier Luca Lanzi
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioMarina Santini
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision treesKnoldus Inc.
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationPier Luca Lanzi
 
Mathematics online: some common algorithms
Mathematics online: some common algorithmsMathematics online: some common algorithms
Mathematics online: some common algorithmsMark Moriarty
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7Roger Barga
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision treehktripathy
 

What's hot (20)

Decision trees for machine learning
Decision trees for machine learningDecision trees for machine learning
Decision trees for machine learning
 
DMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification EnsemblesDMTM 2015 - 15 Classification Ensembles
DMTM 2015 - 15 Classification Ensembles
 
DMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data PreparationDMTM 2015 - 16 Data Preparation
DMTM 2015 - 16 Data Preparation
 
DMTM Lecture 04 Classification
DMTM Lecture 04 ClassificationDMTM Lecture 04 Classification
DMTM Lecture 04 Classification
 
DMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to ClusteringDMTM 2015 - 06 Introduction to Clustering
DMTM 2015 - 06 Introduction to Clustering
 
DMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data RepresentationDMTM 2015 - 03 Data Representation
DMTM 2015 - 03 Data Representation
 
DMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensemblesDMTM Lecture 10 Classification ensembles
DMTM Lecture 10 Classification ensembles
 
Decision tree
Decision treeDecision tree
Decision tree
 
DMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluationDMTM Lecture 15 Clustering evaluation
DMTM Lecture 15 Clustering evaluation
 
DMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparationDMTM Lecture 20 Data preparation
DMTM Lecture 20 Data preparation
 
DMTM Lecture 05 Data representation
DMTM Lecture 05 Data representationDMTM Lecture 05 Data representation
DMTM Lecture 05 Data representation
 
DMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluationDMTM Lecture 06 Classification evaluation
DMTM Lecture 06 Classification evaluation
 
DMTM Lecture 11 Clustering
DMTM Lecture 11 ClusteringDMTM Lecture 11 Clustering
DMTM Lecture 11 Clustering
 
DMTM Lecture 03 Regression
DMTM Lecture 03 RegressionDMTM Lecture 03 Regression
DMTM Lecture 03 Regression
 
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain RatioLecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
Lecture 4 Decision Trees (2): Entropy, Information Gain, Gain Ratio
 
Machine Learning with Decision trees
Machine Learning with Decision treesMachine Learning with Decision trees
Machine Learning with Decision trees
 
DMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data ExplorationDMTM 2015 - 04 Data Exploration
DMTM 2015 - 04 Data Exploration
 
Mathematics online: some common algorithms
Mathematics online: some common algorithmsMathematics online: some common algorithms
Mathematics online: some common algorithms
 
Barga Data Science lecture 7
Barga Data Science lecture 7Barga Data Science lecture 7
Barga Data Science lecture 7
 
Lect9 Decision tree
Lect9 Decision treeLect9 Decision tree
Lect9 Decision tree
 

Viewers also liked

Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016Pier Luca Lanzi
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationPier Luca Lanzi
 
DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1Pier Luca Lanzi
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringPier Luca Lanzi
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringPier Luca Lanzi
 
DMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph MiningDMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph MiningPier Luca Lanzi
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningPier Luca Lanzi
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionPier Luca Lanzi
 
DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2Pier Luca Lanzi
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesPier Luca Lanzi
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesPier Luca Lanzi
 
Procedural Content Generation with Unity
Procedural Content Generation with UnityProcedural Content Generation with Unity
Procedural Content Generation with UnityPier Luca Lanzi
 
Idea Generation and Conceptualization
Idea Generation and ConceptualizationIdea Generation and Conceptualization
Idea Generation and ConceptualizationPier Luca Lanzi
 
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014Pier Luca Lanzi
 
Working with Formal Elements
Working with Formal ElementsWorking with Formal Elements
Working with Formal ElementsPier Luca Lanzi
 

Viewers also liked (20)

Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016Focus Junior - 14 Maggio 2016
Focus Junior - 14 Maggio 2016
 
DMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to ClassificationDMTM 2015 - 10 Introduction to Classification
DMTM 2015 - 10 Introduction to Classification
 
DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1DMTM 2015 - 17 Text Mining Part 1
DMTM 2015 - 17 Text Mining Part 1
 
DMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical ClusteringDMTM 2015 - 07 Hierarchical Clustering
DMTM 2015 - 07 Hierarchical Clustering
 
DMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based ClusteringDMTM 2015 - 09 Density Based Clustering
DMTM 2015 - 09 Density Based Clustering
 
Course Introduction
Course IntroductionCourse Introduction
Course Introduction
 
DMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph MiningDMTM 2015 - 19 Graph Mining
DMTM 2015 - 19 Graph Mining
 
DMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data MiningDMTM 2015 - 02 Data Mining
DMTM 2015 - 02 Data Mining
 
DMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course IntroductionDMTM 2015 - 01 Course Introduction
DMTM 2015 - 01 Course Introduction
 
Course Organization
Course OrganizationCourse Organization
Course Organization
 
DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2DMTM 2015 - 18 Text Mining Part 2
DMTM 2015 - 18 Text Mining Part 2
 
DMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association RulesDMTM 2015 - 05 Association Rules
DMTM 2015 - 05 Association Rules
 
Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
 
Classification Using Decision tree
Classification Using Decision treeClassification Using Decision tree
Classification Using Decision tree
 
Procedural Content Generation with Unity
Procedural Content Generation with UnityProcedural Content Generation with Unity
Procedural Content Generation with Unity
 
Idea Generation and Conceptualization
Idea Generation and ConceptualizationIdea Generation and Conceptualization
Idea Generation and Conceptualization
 
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
Introduction to Procedural Content Generation - Codemotion 29 Novembre 2014
 
Working with Formal Elements
Working with Formal ElementsWorking with Formal Elements
Working with Formal Elements
 
The Structure of Games
The Structure of GamesThe Structure of Games
The Structure of Games
 
The Design Document
The Design DocumentThe Design Document
The Design Document
 

Similar to DMTM 2015 - 11 Decision Trees

Practical deep learning for computer vision
Practical deep learning for computer visionPractical deep learning for computer vision
Practical deep learning for computer visionEran Shlomo
 
unit 5 decision tree2.pptx
unit 5 decision tree2.pptxunit 5 decision tree2.pptx
unit 5 decision tree2.pptxssuser5c580e1
 
Introduction to ML and Decision Tree
Introduction to ML and Decision TreeIntroduction to ML and Decision Tree
Introduction to ML and Decision TreeSuman Debnath
 
Decision trees
Decision treesDecision trees
Decision treesNcib Lotfi
 
"Induction of Decision Trees" @ Papers We Love Bucharest
"Induction of Decision Trees" @ Papers We Love Bucharest"Induction of Decision Trees" @ Papers We Love Bucharest
"Induction of Decision Trees" @ Papers We Love BucharestStefan Adam
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision treeyazad dumasia
 
DECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.ppt
DECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.pptDECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.ppt
DECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.pptglorypreciousj
 
Quick tour all handout
Quick tour all handoutQuick tour all handout
Quick tour all handoutYi-Shin Chen
 
Lecture 1 Descriptives.pptx
Lecture 1 Descriptives.pptxLecture 1 Descriptives.pptx
Lecture 1 Descriptives.pptxABCraftsman
 
Supervised learning: Types of Machine Learning
Supervised learning: Types of Machine LearningSupervised learning: Types of Machine Learning
Supervised learning: Types of Machine LearningLibya Thomas
 
How to Measure the Security of your Network Defenses
How to Measure the Security of your Network DefensesHow to Measure the Security of your Network Defenses
How to Measure the Security of your Network DefensesPECB
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)Dinis Cruz
 
Game theory for neural networks
Game theory for neural networksGame theory for neural networks
Game theory for neural networksDavid Balduzzi
 
Master Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno SiebesMaster Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno SiebesMedia Perspectives
 
Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...
Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...
Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...butest
 
The Four Questions (Every Monitoring Engineer gets asked), by Leon Adato
The Four Questions (Every Monitoring Engineer gets asked), by Leon AdatoThe Four Questions (Every Monitoring Engineer gets asked), by Leon Adato
The Four Questions (Every Monitoring Engineer gets asked), by Leon AdatoCloud Native Day Tel Aviv
 

Similar to DMTM 2015 - 11 Decision Trees (20)

Practical deep learning for computer vision
Practical deep learning for computer visionPractical deep learning for computer vision
Practical deep learning for computer vision
 
unit 5 decision tree2.pptx
unit 5 decision tree2.pptxunit 5 decision tree2.pptx
unit 5 decision tree2.pptx
 
CS632_Lecture_15_updated.pptx
CS632_Lecture_15_updated.pptxCS632_Lecture_15_updated.pptx
CS632_Lecture_15_updated.pptx
 
Introduction to ML and Decision Tree
Introduction to ML and Decision TreeIntroduction to ML and Decision Tree
Introduction to ML and Decision Tree
 
Decision trees
Decision treesDecision trees
Decision trees
 
"Induction of Decision Trees" @ Papers We Love Bucharest
"Induction of Decision Trees" @ Papers We Love Bucharest"Induction of Decision Trees" @ Papers We Love Bucharest
"Induction of Decision Trees" @ Papers We Love Bucharest
 
Classification decision tree
Classification  decision treeClassification  decision tree
Classification decision tree
 
[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊[系列活動] 資料探勘速遊
[系列活動] 資料探勘速遊
 
DECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.ppt
DECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.pptDECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.ppt
DECISION TREEScbhwbfhebfyuefyueye7yrue93e939euidhcn xcnxj.ppt
 
Quick tour all handout
Quick tour all handoutQuick tour all handout
Quick tour all handout
 
Lecture 1 Descriptives.pptx
Lecture 1 Descriptives.pptxLecture 1 Descriptives.pptx
Lecture 1 Descriptives.pptx
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Supervised learning: Types of Machine Learning
Supervised learning: Types of Machine LearningSupervised learning: Types of Machine Learning
Supervised learning: Types of Machine Learning
 
How to Measure the Security of your Network Defenses
How to Measure the Security of your Network DefensesHow to Measure the Security of your Network Defenses
How to Measure the Security of your Network Defenses
 
How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)How to not fail at security data analytics (by CxOSidekick)
How to not fail at security data analytics (by CxOSidekick)
 
Game theory for neural networks
Game theory for neural networksGame theory for neural networks
Game theory for neural networks
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Master Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno SiebesMaster Minds on Data Science - Arno Siebes
Master Minds on Data Science - Arno Siebes
 
Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...
Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...
Software-Praktikum SoSe 2005 Lehrstuhl fuer Maschinelles ...
 
The Four Questions (Every Monitoring Engineer gets asked), by Leon Adato
The Four Questions (Every Monitoring Engineer gets asked), by Leon AdatoThe Four Questions (Every Monitoring Engineer gets asked), by Leon Adato
The Four Questions (Every Monitoring Engineer gets asked), by Leon Adato
 

More from Pier Luca Lanzi

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i VideogiochiPier Luca Lanzi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiPier Luca Lanzi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomePier Luca Lanzi
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...Pier Luca Lanzi
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaPier Luca Lanzi
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Pier Luca Lanzi
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationPier Luca Lanzi
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningPier Luca Lanzi
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningPier Luca Lanzi
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringPier Luca Lanzi
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringPier Luca Lanzi
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringPier Luca Lanzi
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionPier Luca Lanzi
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningPier Luca Lanzi
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelinePier Luca Lanzi
 
VDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityVDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityPier Luca Lanzi
 
VDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationVDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationPier Luca Lanzi
 

More from Pier Luca Lanzi (18)

11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi11 Settembre 2021 - Giocare con i Videogiochi
11 Settembre 2021 - Giocare con i Videogiochi
 
Breve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei VideogiochiBreve Viaggio al Centro dei Videogiochi
Breve Viaggio al Centro dei Videogiochi
 
Global Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning WelcomeGlobal Game Jam 19 @ POLIMI - Morning Welcome
Global Game Jam 19 @ POLIMI - Morning Welcome
 
Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018Data Driven Game Design @ Campus Party 2018
Data Driven Game Design @ Campus Party 2018
 
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
GGJ18 al Politecnico di Milano - Presentazione che precede la presentazione d...
 
GGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di aperturaGGJ18 al Politecnico di Milano - Presentazione di apertura
GGJ18 al Politecnico di Milano - Presentazione di apertura
 
Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018Presentation for UNITECH event - January 8, 2018
Presentation for UNITECH event - January 8, 2018
 
DMTM Lecture 19 Data exploration
DMTM Lecture 19 Data explorationDMTM Lecture 19 Data exploration
DMTM Lecture 19 Data exploration
 
DMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph miningDMTM Lecture 18 Graph mining
DMTM Lecture 18 Graph mining
 
DMTM Lecture 17 Text mining
DMTM Lecture 17 Text miningDMTM Lecture 17 Text mining
DMTM Lecture 17 Text mining
 
DMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clusteringDMTM Lecture 14 Density based clustering
DMTM Lecture 14 Density based clustering
 
DMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clusteringDMTM Lecture 13 Representative based clustering
DMTM Lecture 13 Representative based clustering
 
DMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clusteringDMTM Lecture 12 Hierarchical clustering
DMTM Lecture 12 Hierarchical clustering
 
DMTM Lecture 01 Introduction
DMTM Lecture 01 IntroductionDMTM Lecture 01 Introduction
DMTM Lecture 01 Introduction
 
DMTM Lecture 02 Data mining
DMTM Lecture 02 Data miningDMTM Lecture 02 Data mining
DMTM Lecture 02 Data mining
 
VDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipelineVDP2016 - Lecture 16 Rendering pipeline
VDP2016 - Lecture 16 Rendering pipeline
 
VDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with UnityVDP2016 - Lecture 15 PCG with Unity
VDP2016 - Lecture 15 PCG with Unity
 
VDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generationVDP2016 - Lecture 14 Procedural content generation
VDP2016 - Lecture 14 Procedural content generation
 

Recently uploaded

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17Celine George
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfSumit Tiwari
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxJiesonDelaCerna
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.arsicmarija21
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...Marc Dusseiller Dusjagr
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxthorishapillay1
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentInMediaRes1
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
 

Recently uploaded (20)

How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17How to Configure Email Server in Odoo 17
How to Configure Email Server in Odoo 17
 
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdfEnzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
Enzyme, Pharmaceutical Aids, Miscellaneous Last Part of Chapter no 5th.pdf
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Tilak Nagar Delhi reach out to us at 🔝9953056974🔝
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
 
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
 
CELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptxCELL CYCLE Division Science 8 quarter IV.pptx
CELL CYCLE Division Science 8 quarter IV.pptx
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 
AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.AmericanHighSchoolsprezentacijaoskolama.
AmericanHighSchoolsprezentacijaoskolama.
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
“Oh GOSH! Reflecting on Hackteria's Collaborative Practices in a Global Do-It...
 
Proudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptxProudly South Africa powerpoint Thorisha.pptx
Proudly South Africa powerpoint Thorisha.pptx
 
Meghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media ComponentMeghan Sutherland In Media Res Media Component
Meghan Sutherland In Media Res Media Component
 
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
 

DMTM 2015 - 11 Decision Trees

  • 1. Prof. Pier Luca Lanzi Classification: DecisionTrees! Data Mining andText Mining (UIC 583 @ Politecnico di Milano)
  • 2. Prof. Pier Luca Lanzi The Weather Dataset Outlook' Temp' Humidity' Windy' Play' Sunny' Hot' High' False' No' Sunny' Hot'' High'' True' No' Overcast'' Hot''' High' False' Yes' Rainy' Mild' High' False' Yes' Rainy' Cool' Normal' False' Yes' Rainy' Cool' Normal' True' No' Overcast' Cool' Normal' True' Yes' Sunny' Mild' High' False' No' Sunny' Cool' Normal' False' Yes' Rainy' Mild' Normal' False' Yes' Sunny' Mild' Normal' True' Yes' Overcast' Mild' High' True' Yes' Overcast' Hot' Normal' False' Yes' Rainy' Mild' High' True' No' 2
  • 3. Prof. Pier Luca Lanzi The Decision Tree for the ! Weather Dataset 3 humidity overcast No NoYes Yes Yes outlook windy sunny rain falsetruenormalhigh
  • 4. Prof. Pier Luca Lanzi What is a Decision Tree? •  An internal node is a test on an attribute •  A branch represents an outcome of the test, e.g., outlook=windy •  A leaf node represents a class label or class label distribution •  At each node, one attribute is chosen to split training examples into distinct classes as much as possible •  A new case is classified by following a matching path to a leaf node 4
  • 5. Prof. Pier Luca Lanzi Building a DecisionTree with KNIME
  • 8. Prof. Pier Luca Lanzi Decision Tree Representations (Graphical) 8
  • 9. Prof. Pier Luca Lanzi Decision Tree Representations (Text) outlook = overcast: yes {no=0, yes=4} outlook = rainy | windy = FALSE: yes {no=0, yes=3} | windy = TRUE: no {no=2, yes=0} outlook = sunny | humidity = high: no {no=3, yes=0} | humidity = normal: yes {no=0, yes=2} 9
  • 10. Prof. Pier Luca Lanzi Computing DecisionTrees
  • 11. Prof. Pier Luca Lanzi Computing Decision Trees •  Top-down Tree Construction ! Initially, all the training examples are at the root ! Then, the examples are recursively partitioned, ! by choosing one attribute at a time •  Bottom-up Tree Pruning ! Remove subtrees or branches, in a bottom-up manner, to improve the estimated accuracy on new cases. 11
  • 12. Prof. Pier Luca Lanzi Top Down Induction of Decision Trees function TDIDT(S) // S, a set of labeled examples Tree = new empty node if (all examples have the same class c or no further splitting is possible) then // new leaf labeled with the majority class c Label(Tree) = c else // new decision node (A,T) = FindBestSplit(S) foreach test t in T do St = all examples that satisfy t Nodet = TDIDT(St) AddEdge(Tree -> Nodet) endfor endif return Tree 12
  • 13. Prof. Pier Luca Lanzi When Should Building Stop? •  There are several possible stopping criteria •  All samples for a given node belong to the same class •  If there are no remaining attributes for further partitioning, majority voting is employed •  There are no samples left •  Or there is nothing to gain in splitting 13
  • 14. Prof. Pier Luca Lanzi How is the Splitting Attribute Determined?
  • 15. Prof. Pier Luca Lanzi What Do We Need? 15 k IF salary<kTHEN not repaid
  • 16. Prof. Pier Luca Lanzi Which Attribute for Splitting? •  At each node, available attributes are evaluated on the basis of separating the classes of the training examples •  A purity or impurity measure is used for this purpose •  Information Gain: increases with the average purity of the subsets that an attribute produces •  Splitting Strategy: choose the attribute that results in greatest information gain •  Typical goodness functions: information gain (ID3), information gain ratio (C4.5), gini index (CART) 16
  • 17. Prof. Pier Luca Lanzi Which Attribute Should We Select? 17
  • 18. Prof. Pier Luca Lanzi Computing Information •  Information is measured in bits ! Given a probability distribution, the info required to predict an event is the distribution’s entropy ! Entropy gives the information required in bits (this can involve fractions of bits!) •  Formula for computing the entropy 18
  • 19. Prof. Pier Luca Lanzi Entropy 19 two classes 0/1 % of elements of class 0 entropy
  • 20. Prof. Pier Luca Lanzi Which Attribute Should We Select? 20
  • 21. Prof. Pier Luca Lanzi The Attribute “outlook” •  outlook” = “sunny” •  “outlook” = “overcast” •  “outlook” = “rainy” •  Expected information for attribute 21
  • 22. Prof. Pier Luca Lanzi Information Gain •  Difference between the information before split and the information after split •  The information before the split, info(D), is the entropy, •  The information after the split using attribute A is computed as the weighted sum of the entropies on each split, given n splits, 22
  • 23. Prof. Pier Luca Lanzi Information Gain •  Difference between the information before split and the information after split •  Information gain for the attributes from the weather data: ! gain(“outlook”)=0.247 bits ! gain(“temperature”)=0.029 bits ! gain(“humidity”)=0.152 bits ! gain(“windy”)=0.048 bits 23
  • 24. Prof. Pier Luca Lanzi You can check the math using KNIME
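If KNIME is not at hand, the same numbers can be checked with a few lines of R. This is a self-contained sketch of our own: the vectors transcribe the nominal weather dataset, and entropy and gain are the illustrative helpers defined above, repeated here so the snippet runs on its own.

entropy <- function(counts) { p <- counts[counts > 0] / sum(counts); -sum(p * log2(p)) }
gain <- function(attr, class) {
  entropy(table(class)) -
    sum(sapply(split(class, attr),
               function(s) length(s) / length(class) * entropy(table(s))))
}
outlook <- c("sunny","sunny","overcast","rainy","rainy","rainy","overcast",
             "sunny","sunny","rainy","sunny","overcast","overcast","rainy")
windy   <- c(FALSE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,FALSE,FALSE,TRUE,TRUE,FALSE,TRUE)
play    <- c("no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no")
gain(outlook, play)   # 0.247
gain(windy, play)     # 0.048

The remaining attributes work the same way once transcribed as vectors.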
  • 26. Prof. Pier Luca Lanzi Going Further 26
  • 27. Prof. Pier Luca Lanzi The Final Decision Tree •  Not all the leaves need to be pure •  Splitting stops when the data cannot be split any further 27
  • 29. Prof. Pier Luca Lanzi We Can Always Build a 100% Accurate Tree…
  • 31. Prof. Pier Luca Lanzi Another Version of the Weather Dataset 31
ID Code  Outlook   Temp  Humidity  Windy  Play
A        Sunny     Hot   High      False  No
B        Sunny     Hot   High      True   No
C        Overcast  Hot   High      False  Yes
D        Rainy     Mild  High      False  Yes
E        Rainy     Cool  Normal    False  Yes
F        Rainy     Cool  Normal    True   No
G        Overcast  Cool  Normal    True   Yes
H        Sunny     Mild  High      False  No
I        Sunny     Cool  Normal    False  Yes
J        Rainy     Mild  Normal    False  Yes
K        Sunny     Mild  Normal    True   Yes
L        Overcast  Mild  High      True   Yes
M        Overcast  Hot   Normal    False  Yes
N        Rainy     Mild  High      True   No
  • 32. Prof. Pier Luca Lanzi Decision Tree for the New Dataset •  Entropy for splitting using “ID Code” is zero, since each leaf node is “pure” •  Information Gain is thus maximal for ID code 32
  • 33. Prof. Pier Luca Lanzi Highly-Branching Attributes •  Attributes with a large number of values are usually problematic •  Examples: IDs, primary keys, or almost-primary-key attributes •  Subsets are likely to be pure if there is a large number of values •  Information Gain is biased towards choosing attributes with a large number of values •  This may result in overfitting (selection of an attribute that is non-optimal for prediction) 33
  • 34. Prof. Pier Luca Lanzi Information Gain Ratio
  • 35. Prof. Pier Luca Lanzi Information Gain Ratio 35
•  A modification of the Information Gain that reduces its bias toward highly-branching attributes
•  It takes the number and size of branches into account when choosing an attribute
•  It corrects the information gain by taking the intrinsic information of a split into account, which is large when the data is evenly spread across the branches and small when all the data belong to one branch
  • 36. Prof. Pier Luca Lanzi Information Gain Ratio and Intrinsic Information 36
•  Intrinsic information computes the entropy of the distribution of instances into branches: intrinsic_info(A) = -sum over i of (|Di| / |D|) log2(|Di| / |D|)
•  Information Gain Ratio normalizes the Information Gain by the intrinsic information: gain_ratio(A) = gain(A) / intrinsic_info(A)
  • 37. Prof. Pier Luca Lanzi Computing the Information Gain Ratio 37
•  The intrinsic information for ID code is intrinsic_info([1,1,...,1]) = 14 * (-1/14 * log2(1/14)) = 3.807 bits
•  The importance of an attribute decreases as its intrinsic information gets larger
•  The Information Gain Ratio of "ID code" is 0.940 / 3.807 = 0.247
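The penalty is easy to verify in R (a sketch under the same assumptions as the earlier snippets; 0.940 bits is the entropy of the full dataset):

sizes <- rep(1, 14)                               # one instance per ID-code branch
intrinsic <- -sum(sizes / 14 * log2(sizes / 14))  # log2(14) = 3.807 bits
0.940 / intrinsic                                 # gain ratio of ID code = 0.247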
  • 38. Prof. Pier Luca Lanzi Information Gain Ratio for Weather Data 38
Outlook:      Info: 0.693   Gain: 0.940 - 0.693 = 0.247   Split info: info([5,4,5]) = 1.577   Gain ratio: 0.247/1.577 = 0.156
Temperature:  Info: 0.911   Gain: 0.940 - 0.911 = 0.029   Split info: info([4,6,4]) = 1.362   Gain ratio: 0.029/1.362 = 0.021
Humidity:     Info: 0.788   Gain: 0.940 - 0.788 = 0.152   Split info: info([7,7]) = 1.000     Gain ratio: 0.152/1.000 = 0.152
Windy:        Info: 0.892   Gain: 0.940 - 0.892 = 0.048   Split info: info([8,6]) = 0.985     Gain ratio: 0.048/0.985 = 0.049
  • 39. Prof. Pier Luca Lanzi More on Information Gain Ratio •  "outlook" still comes out on top, but "ID code" has a greater Information Gain Ratio •  The standard fix is an ad-hoc test to prevent splitting on that type of attribute •  First, only consider attributes with greater-than-average Information Gain; then, compare them using the Information Gain Ratio •  Information Gain Ratio may overcompensate and choose an attribute just because its intrinsic information is very low 39
  • 40. Prof. Pier Luca Lanzi What if Attributes are Numerical?
  • 41. Prof. Pier Luca Lanzi The Weather Dataset (Numerical) 41
Outlook   Temp  Humidity  Windy  Play
Sunny     85    85        False  No
Sunny     80    90        True   No
Overcast  83    78        False  Yes
Rainy     70    96        False  Yes
Rainy     68    80        False  Yes
Rainy     65    70        True   No
Overcast  64    65        True   Yes
Sunny     72    95        False  No
Sunny     69    70        False  Yes
Rainy     75    80        False  Yes
Sunny     75    70        True   Yes
Overcast  72    90        True   Yes
Overcast  81    75        False  Yes
Rainy     71    80        True   No
  • 42. Prof. Pier Luca Lanzi The Temperature Attribute 42
•  First, sort the temperature values, including the class labels:
Temp:  64   65   68   69   70   71   72   72   75   75   80   81   83   85
Play:  Yes  No   Yes  Yes  Yes  No   No   Yes  Yes  Yes  No   Yes  Yes  No
•  Then, check all the cut points and choose the one with the best information gain
•  E.g., temperature < 71.5: yes/4, no/2; temperature ≥ 71.5: yes/5, no/3
•  info([4,2],[5,3]) = 6/14 info([4,2]) + 8/14 info([5,3]) = 0.939
•  Place split points halfway between values
•  All split points can be evaluated in one pass!
  • 43. Prof. Pier Luca Lanzi The Information Gain for Humidity 43
•  Sort the attribute values, then compute the gain for every possible split point ("what is the information gain if we split here?")
Humidity:  65   70   70   70   75   78   80   80   80   85   90   90   95   96
Play:      Yes  No   Yes  Yes  Yes  Yes  Yes  Yes  No   No   No   Yes  No   Yes
  • 44. Prof. Pier Luca Lanzi Information Gain for Humidity 44
Humidity Play | left of split: #Yes %Yes #No %No Weight Entropy | right of split: #Yes %Yes #No %No Weight Entropy | Info Gain
65  Yes | 1 100.00% 0  0.00%   7.14% 0.00 | 8 0.62 5 0.38  92.86% 0.96 | 0.0477
70  No  | 1  50.00% 1 50.00%  14.29% 1.00 | 8 0.67 4 0.33  85.71% 0.92 | 0.0103
70  Yes | 2  66.67% 1 33.33%  21.43% 0.92 | 7 0.64 4 0.36  78.57% 0.95 | 0.0005
70  Yes | 3  75.00% 1 25.00%  28.57% 0.81 | 6 0.60 4 0.40  71.43% 0.97 | 0.0150
75  Yes | 4  80.00% 1 20.00%  35.71% 0.72 | 5 0.56 4 0.44  64.29% 0.99 | 0.0453
78  Yes | 5  83.33% 1 16.67%  42.86% 0.65 | 4 0.50 4 0.50  57.14% 1.00 | 0.0903
80  Yes | 6  85.71% 1 14.29%  50.00% 0.59 | 3 0.43 4 0.57  50.00% 0.99 | 0.1518
80  Yes | 7  87.50% 1 12.50%  57.14% 0.54 | 2 0.33 4 0.67  42.86% 0.92 | 0.2361
80  No  | 7  77.78% 2 22.22%  64.29% 0.76 | 2 0.40 3 0.60  35.71% 0.97 | 0.1022
85  No  | 7  70.00% 3 30.00%  71.43% 0.88 | 2 0.50 2 0.50  28.57% 1.00 | 0.0251
90  No  | 7  63.64% 4 36.36%  78.57% 0.95 | 2 0.67 1 0.33  21.43% 0.92 | 0.0005
90  Yes | 8  66.67% 4 33.33%  85.71% 0.92 | 1 0.50 1 0.50  14.29% 1.00 | 0.0103
95  No  | 8  61.54% 5 38.46%  92.86% 0.96 | 1 1.00 0 0.00   7.14% 0.00 | 0.0477
96  Yes | 9  64.29% 5 35.71% 100.00% 0.94 | 0 0.00 0 0.00   0.00% 0.00 | 0.0000
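The scan over cut points can be written compactly in R. The sketch below is our own illustrative code: it places candidate thresholds halfway between consecutive sorted humidity values and reports the best feasible one. Note that table rows falling inside a run of tied values (such as the 0.2361 row, which splits between two instances that both have humidity 80) cannot be realized by a single threshold; the best feasible threshold is 82.5 with gain ≈ 0.102.

entropy <- function(counts) { p <- counts[counts > 0] / sum(counts); -sum(p * log2(p)) }
humidity <- c(85,90,78,96,80,70,65,95,70,80,70,90,75,80)
play     <- c("no","no","yes","yes","yes","no","yes","no","yes","yes","yes","yes","yes","no")

o <- order(humidity); h <- humidity[o]; cls <- play[o]
cuts <- unique((head(h, -1) + tail(h, -1)) / 2)   # midpoints between sorted values
gains <- sapply(cuts, function(thr) {
  l <- cls[h < thr]; r <- cls[h >= thr]
  entropy(table(cls)) - (length(l) * entropy(table(l)) +
                         length(r) * entropy(table(r))) / length(cls)
})
cuts[which.max(gains)]   # 82.5
max(gains)               # about 0.102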
  • 45. Prof. Pier Luca Lanzi Otherwise, we can discretize…
  • 46. Prof. Pier Luca Lanzi Discretization 46
•  The process of converting an interval/numerical variable into a finite (discrete) set of elements (labels)
•  A discretization algorithm computes a series of cut points which define a set of intervals that are mapped into labels
•  Motivation: many methods (also in mathematics and computer science) cannot deal with numerical variables; in these cases, discretization is required to be able to use them, despite the loss of information
•  Several approaches: supervised vs unsupervised, static vs dynamic
  • 47. Prof. Pier Luca Lanzi Unsupervised vs Supervised Discretization 47
•  Unsupervised discretization uses only the attribute values; for instance, discretize humidity as <70, 70-79, ≥80
•  Supervised discretization also uses the class attribute, to generate intervals with lower entropy
•  Example using the sorted temperature values:
64 65 68 69 70 71 72 72 75 75 80 81 83 85
Yes | No | Yes Yes Yes | No No | Yes | Yes Yes | No | Yes Yes | No
The values alone might suggest equal-width intervals; considering the class suggests different intervals, grouping values so as to maintain information about the class attribute
  • 48. Prof. Pier Luca Lanzi Example of Unsupervised and Supervised Discretization for Humidity 48 (figures: the humidity values binned with unsupervised and with supervised discretization)
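In R, unsupervised discretization is a one-liner with cut(); the breakpoints below simply reproduce the <70, 70-79, ≥80 example from the previous slide. A supervised method would instead search for cut points that minimize the class entropy of the resulting intervals.

humidity <- c(85,90,78,96,80,70,65,95,70,80,70,90,75,80)
bins <- cut(humidity, breaks = c(-Inf, 69.5, 79.5, Inf),
            labels = c("<70", "70-79", ">=80"))
table(bins)   # how many instances fall in each interval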
  • 49. Prof. Pier Luca Lanzi The Gini Index
  • 50. Prof. Pier Luca Lanzi Another Splitting Criterion: The Gini Index 50
•  The gini index, for a data set T containing examples from n classes, is defined as gini(T) = 1 - sum over j of pj^2, where pj is the relative frequency of class j in T
•  gini(T) is minimized if the classes in T are skewed (it is zero when T is pure and maximal when the classes are evenly distributed)
  • 51. Prof. Pier Luca Lanzi The Gini Index vs Entropy 51
  • 52. Prof. Pier Luca Lanzi The Gini Index 52
•  If a data set D is split on A into two subsets D1 and D2, then gini_A(D) = (|D1|/|D|) gini(D1) + (|D2|/|D|) gini(D2)
•  The reduction of impurity is defined as Δgini(A) = gini(D) - gini_A(D)
•  The attribute that provides the smallest gini_A(D) (or, equivalently, the largest reduction in impurity) is chosen to split the node (this requires enumerating all the possible splitting points for each attribute)
  • 53. Prof. Pier Luca Lanzi The Gini Index: Example 53
•  D has 9 tuples labeled "yes" and 5 labeled "no", so gini(D) = 1 - (9/14)^2 - (5/14)^2 = 0.459
•  Suppose the attribute income partitions D into 10 tuples in D1 (branching on {low, medium}) and 4 in D2 ({high})
•  gini for the partition on {medium, high} is 0.30 and is thus the best, since it is the lowest
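A minimal R sketch of the computation (our own helper names; plug in the class counts of D1 and D2 to reproduce the income numbers):

gini <- function(counts) { p <- counts / sum(counts); 1 - sum(p^2) }
gini(c(9, 5))                     # 0.459 for the whole dataset

gini_split <- function(c1, c2) {  # weighted gini of a binary split
  n <- sum(c1) + sum(c2)
  sum(c1) / n * gini(c1) + sum(c2) / n * gini(c2)
}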
  • 55. Prof. Pier Luca Lanzi Generalization and Overfitting in Decision Trees
  • 57. Prof. Pier Luca Lanzi Avoiding Overfitting in Decision Trees 57
•  The generated tree may overfit the training data
•  Too many branches, some of which may reflect anomalies due to noise or outliers
•  The result is poor accuracy on unseen samples
•  Two approaches to avoid overfitting: prepruning and postpruning
  • 58. Prof. Pier Luca Lanzi Pre-pruning vs Post-pruning •  Prepruning ! Halt tree construction early ! Do not split a node if this would result in the goodness measure falling below a threshold ! Difficult to choose an appropriate threshold •  Postpruning ! Remove branches from a “fully grown” tree ! Get a sequence of progressively pruned trees ! Use a set of data different from the training data to decide which is the “best pruned tree” 58
  • 59. Prof. Pier Luca Lanzi Pre-Pruning 59
•  Based on a statistical significance test
•  Stop growing the tree when there is no statistically significant association between any attribute and the class at a particular node
•  Early stopping can fail: when no individual attribute exhibits any interesting information about the class (as in XOR-like problems), the structure is only visible in the fully expanded tree, so pre-pruning won't even expand the root node
  • 60. Prof. Pier Luca Lanzi Post-Pruning •  First, build full tree, then prune it •  Fully-grown tree shows all attribute interactions •  Problem: some subtrees might be due to chance effects •  Two pruning operations ! Subtree raising ! Subtree replacement •  Possible strategies ! Error estimation ! Significance testing ! MDL principle 60
  • 61. Prof. Pier Luca Lanzi Subtree Raising 61
•  Delete a node and redistribute its instances
•  Slower than subtree replacement
(figure: the deleted node is marked with an X)
  • 62. Prof. Pier Luca Lanzi Subtree Replacement 62 •  Works bottom-up •  Consider replacing a tree only after considering all its subtrees
  • 63. Prof. Pier Luca Lanzi Estimating Error Rates 63
•  Prune only if pruning reduces the estimated error
•  The error on the training data is NOT a useful estimator (using it would result in very little pruning, since the tree was grown to fit the training data)
•  A hold-out set might be kept for pruning ("reduced-error pruning")
•  Example (C4.5's method)
! Derive a confidence interval from the training data
! Use a heuristic limit, derived from this, for pruning
! Standard Bernoulli-process-based method
! Shaky statistical assumptions (based on training data)
  • 64. Prof. Pier Luca Lanzi Mean and Variance 64
•  Mean and variance for a Bernoulli trial with success probability p: p and p(1-p)
•  The observed error rate f = S/N has mean p and variance p(1-p)/N
•  For large enough N, f follows a Normal distribution
•  The c% confidence interval [-z ≤ X ≤ z] for a random variable X with 0 mean is given by Pr[-z ≤ X ≤ z] = c
•  With a symmetric distribution, Pr[-z ≤ X ≤ z] = 1 - 2 Pr[X ≥ z]
  • 65. Prof. Pier Luca Lanzi Confidence Limits 65
•  Confidence limits for the normal distribution with 0 mean and a variance of 1:
Pr[X ≥ z]    z
0.1%         3.09
0.5%         2.58
1%           2.33
5%           1.65
10%          1.28
20%          0.84
25%          0.69
40%          0.25
•  Thus, for example, Pr[-1.65 ≤ X ≤ 1.65] = 90%
•  To use this, we have to reduce our random variable f to have 0 mean and unit variance
  • 66. Prof. Pier Luca Lanzi C4.5's Pruning Method 66
•  Given the error f on the training data, the upper bound of the error estimate for a node is computed as

e = ( f + z^2/(2N) + z * sqrt( f/N - f^2/N + z^2/(4N^2) ) ) / ( 1 + z^2/N )

•  If c = 25% then z = 0.69 (from the normal distribution)
! f is the error on the training data
! N is the number of instances covered by the leaf
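The bound is straightforward to implement; the short R sketch below (our own function name) reproduces the numbers used in the worked example on the next slide:

pessimistic_error <- function(f, N, z = 0.69) {   # z = 0.69 for c = 25%
  (f + z^2 / (2 * N) +
     z * sqrt(f / N - f^2 / N + z^2 / (4 * N^2))) / (1 + z^2 / N)
}
pessimistic_error(2/6, 6)     # 0.47
pessimistic_error(1/2, 2)     # 0.72
pessimistic_error(5/14, 14)   # 0.46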
  • 67. Prof. Pier Luca Lanzi (worked example: a node with three leaves)
•  Leaf 1: f = 0.33, e = 0.47; Leaf 2: f = 0.5, e = 0.72; Leaf 3: f = 0.33, e = 0.47
•  Combining the leaf estimates using the ratios 6:2:6 gives 0.51
•  The parent node has f = 5/14 and e = 0.46; since 0.46 < 0.51, prune!
  • 68. Prof. Pier Luca Lanzi Examples Using KNIME
  • 69. Prof. Pier Luca Lanzi Pruned Tree (Gain Ratio)
  • 70. Prof. Pier Luca Lanzi Pruned Tree (Gini Index)
  • 71. Prof. Pier Luca Lanzi Regression Trees
  • 72. Prof. Pier Luca Lanzi Regression Trees for Prediction 72
•  Decision trees can also be used to predict the value of a numerical target variable
•  Regression trees work similarly to decision trees: several splits are attempted, and the one that minimizes impurity is chosen
•  Differences from decision trees:
! The prediction is computed as the average of the numerical target variable in the subspace (instead of the majority vote)
! Impurity is measured by the sum of squared deviations from the leaf mean (instead of information-based measures)
! Find the split that produces the greatest separation in [y - E(y)]^2, i.e., the nodes with minimal within-node variance
! Performance is measured by RMSE (root mean squared error)
  • 73. Prof. Pier Luca Lanzi The CPU Performance Dataset
(columns: MYCT = cycle time (ns), MMIN/MMAX = main memory (Kb), CACH = cache (Kb), CHMIN/CHMAX = channels, PRP = performance)
      MYCT  MMIN  MMAX   CACH  CHMIN  CHMAX  PRP
1     125   256   6000   256   16     128    198
2     29    8000  32000  32    8      32     269
…
208   480   512   8000   32    0      0      67
209   480   1000  4000   0     0      0      45
PRP = -55.9 + 0.0489 MYCT + 0.0153 MMIN + 0.0056 MMAX + 0.6410 CACH - 0.2700 CHMIN + 1.480 CHMAX
  • 74. Prof. Pier Luca Lanzi Mining Regression Trees in R 74
# Regression Tree Example
library(rpart)
library(MASS)
data(cpus)
# grow the tree: "anova" gives a regression tree (sum-of-squares impurity)
fit <- rpart(perf ~ syct + mmin + mmax + cach + chmin + chmax,
             method = "anova", data = cpus)
# create an attractive postscript plot of the tree
post(fit, file = "CPUPerformanceRegressionTree.ps",
     title = "Regression Tree for CPU Performance Dataset")
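After fitting, the tree shown on the next slide can also be inspected without the postscript file; print() and plot()/text() are the standard rpart inspection functions:

print(fit)            # text dump of splits, n, and mean perf per node
plot(fit); text(fit)  # quick base-graphics rendering of the same tree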
  • 75. Prof. Pier Luca Lanzi Regression Tree for CPU Performance Dataset
mean perf 105.62, n=209
| mmax < 28000: 60.72, n=182
|   | cach < 27: 39.64, n=141
|   | cach >= 27: 133.22, n=41
|   |   | cach < 96.5: 114.44, n=34
|   |   | cach >= 96.5: 224.43, n=7
| mmax >= 28000: 408.26, n=27
|   | cach < 80: 299.21, n=19
|   | cach >= 80: 667.25, n=8
  • 76. Prof. Pier Luca Lanzi Summary
  • 77. Prof. Pier Luca Lanzi Summary 77
•  Decision tree construction is a recursive procedure involving:
! The selection of the best splitting attribute
! (and thus) the selection of an adequate purity measure (Information Gain, Information Gain Ratio, Gini Index)
! Pruning to avoid overfitting
•  Information Gain is biased toward highly-splitting attributes
•  Information Gain Ratio takes the number of splits that an attribute induces into account, but might not be enough
  • 78. Prof. Pier Luca Lanzi Suggested Homework 78
•  Try building the tree for the nominal version of the Weather dataset using the Gini Index
•  Compute the best upper bound you can for the computational complexity of decision tree building
•  Check the problems given in previous years' exams on the use of Information Gain, Gain Ratio, and the Gini index; solve them and check the results using Weka