Chapter ML:III
III. Decision Trees
K Decision Trees Basics
K Impurity Functions
K Decision Tree Algorithms
K Decision Tree Pruning
ML:III-1 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Specification of Classification Problems [ML Introduction]
Characterization of the model (model world):
K X is a set of feature vectors, also called feature space.
K C is a set of classes.
K c : X → C is the ideal classifier for X.
K D = {(x1, c(x1)), . . . , (xn, c(xn))} ⊆ X × C is a set of examples.
ML:III-2 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Decision Tree for the Concept “EnjoySport”
Example Sky Temperature Humidity Wind Water Forecast EnjoySport
1 sunny warm normal strong warm same yes
2 sunny warm high strong warm same yes
3 rainy cold high strong warm change no
4 sunny warm high strong cool change yes
[Figure: decision tree for the example set. The root node tests the attribute Sky with branches sunny, cloudy, and rainy; the sunny branch tests Temperature (cold, warm), the rainy branch tests Wind (strong, weak), the cloudy branch leads directly to a leaf labeled yes, and the remaining branches end in the leaf labels yes/no shown in the figure.]
Partitioning of X at the root node:
X = {x ∈ X : x|Sky = sunny} ∪ {x ∈ X : x|Sky = cloudy} ∪ {x ∈ X : x|Sky = rainy}
ML:III-4 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Definition 1 (Splitting)
Let X be a feature space and let D be a set of examples. A splitting of X is a
partitioning of X into mutually exclusive subsets X1, . . . , Xs. I.e., X = X1 ∪ . . . ∪ Xs
with Xj ≠ ∅ and Xj ∩ Xj′ = ∅, where j, j′ ∈ {1, . . . , s}, j ≠ j′.
A splitting X1, . . . , Xs of X induces a splitting D1, . . . , Ds of D, where Dj,
j = 1, . . . , s, is defined as {(x, c(x)) ∈ D | x ∈ Xj}.
ML:III-5 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Definition 1 (Splitting) (continued)
A splitting depends on the measurement scale of a feature:
[Figure: projection of a feature vector x = (x1, x2, x3, . . . , xp) onto a feature A with domain(A) = {a1, a2, a3, . . . , am}; the projection x|A selects the component of x (here x3) that is associated with A.]
1. m-ary splitting induced by a (nominal) feature A with finite domain:
A = {a1, . . . , am} : X = {x ∈ X : x|A = a1} ∪ . . . ∪ {x ∈ X : x|A = am}
2. Binary splitting induced by a (nominal) feature A:
A′ ⊂ A : X = {x ∈ X : x|A ∈ A′} ∪ {x ∈ X : x|A ∉ A′}
3. Binary splitting induced by an ordinal feature A:
v ∈ dom(A) : X = {x ∈ X : x|A ≤ v} ∪ {x ∈ X : x|A > v}
ML:III-8 Decision Trees © STEIN/LETTMANN 2005-2017
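To make the three splitting types concrete, the following minimal Python sketch (not part of the slides) applies them to feature vectors represented as dictionaries; the function names and the toy feature values are illustrative assumptions.

```python
# Sketch of the three splitting types from Definition 1.
# Feature vectors are plain dicts: feature name -> feature value.

def m_ary_split(X, A):
    """m-ary splitting induced by a nominal feature A with finite domain."""
    parts = {}
    for x in X:
        parts.setdefault(x[A], []).append(x)   # one subset X_j per value a_j
    return list(parts.values())

def binary_nominal_split(X, A, A_prime):
    """Binary splitting induced by a nominal feature A and a subset A' of its domain."""
    left = [x for x in X if x[A] in A_prime]        # x|A in A'
    right = [x for x in X if x[A] not in A_prime]   # x|A not in A'
    return [left, right]

def binary_ordinal_split(X, A, v):
    """Binary splitting induced by an ordinal feature A and a threshold v."""
    left = [x for x in X if x[A] <= v]    # x|A <= v
    right = [x for x in X if x[A] > v]    # x|A > v
    return [left, right]

if __name__ == "__main__":
    X = [{"Sky": "sunny", "Temperature": 25},
         {"Sky": "rainy", "Temperature": 8},
         {"Sky": "cloudy", "Temperature": 18}]
    print(m_ary_split(X, "Sky"))
    print(binary_nominal_split(X, "Sky", {"sunny", "cloudy"}))
    print(binary_ordinal_split(X, "Temperature", 15))
```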
Remarks:
K The syntax x|A denotes the projection operator, which returns that vector component
(dimension) of x = (x1, . . . , xp) that is associated with A. Without loss of generality, this
projection can be presumed to be unique.
K A splitting of X into two disjoint, non-empty subsets is called a binary splitting.
K We consider only splittings of X that are induced by a splitting of a single feature A of X.
Keyword: monothetic splitting
ML:III-9 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Definition 2 (Decision Tree)
Let X be a feature space and let C be a set of classes. A decision tree T for X and C
is a finite tree with a distinguished root node. A non-leaf node t of T is assigned
(1) a set X(t) ⊆ X, (2) a splitting of X(t), and (3) a one-to-one mapping of the
subsets of the splitting to its successors.
X(t) = X iff t is the root node. A leaf node of T is assigned a class from C.
Classification of some x ∈ X given a decision tree T:
1. Let t be the root node of T.
2. If t is a non-leaf node, find among its successors the node whose subset of
the splitting of X(t) contains x; let t be this node and repeat this step.
3. If t is a leaf node, label x with the respective class.
§ The set of possible decision trees forms the hypothesis space H.
ML:III-11 Decision Trees © STEIN/LETTMANN 2005-2017
Remarks:
K The classification of an x ∈ X determines a unique path from the root node of T to some leaf
node of T.
K At each non-leaf node a particular feature of x is evaluated in order to find the next node
along with a possible next feature to be analyzed.
K Each path from the root node to some leaf node corresponds to a conjunction of feature
values, which are successively tested. This test can be formulated as a decision rule.
Example:
IF Sky=rainy AND Wind=weak THEN EnjoySport=yes
If all tests in T are of the kind shown in the example, namely, an equality test regarding a
feature value, all feature domains must be finite.
K If in all non-leaf nodes of T only one feature is evaluated at a time, T is called a monothetic
decision tree. Examples of polythetic decision trees are the so-called oblique decision trees.
K Decision trees became popular in 1986, with the introduction of the ID3 Algorithm by
J. R. Quinlan.
ML:III-12 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Notation
Let T be a decision tree for X and C, let D be a set of examples, and let t be a node
of T. Then we agree on the following notation:
K X(t) denotes the subset of the feature space X that is represented by t.
(as used in the decision tree definition)
K D(t) denotes the subset of the example set D that is represented by t, where
D(t) = {(x, c(x)) ∈ D | x ∈ X(t)}. (see the splitting definition)
Illustration:
[Figure: a decision tree with root node t1, inner nodes, and leaf nodes t2, . . . , t6; next to the tree, the example set D(t1) = D is shown partitioned into the subsets D(t2), . . . , D(t6) represented by the respective nodes, with the class labels c1, c2, c3 of the contained examples indicated.]
ML:III-13 Decision Trees © STEIN/LETTMANN 2005-2017
Remarks:
K The set X(t) is comprised of those members x of X that are filtered by a path from the root
node of T to the node t.
K leaves(T) denotes the set of all leaf nodes of T.
K A single node t of a decision tree T, and hence T itself, encode a piecewise constant
function. This way, t as well as T can form complex, non-linear classifiers. The functions
encoded by t and T differ in the number of evaluated features of x, which is one for t and the
tree height for T.
K In the following we will use the symbols “t” and “T” to denote also the classifiers that are
encoded by a node t and a tree T respectively:
t, T : X → C (instead of yt, yT : X → C)
ML:III-14 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Algorithm Template: Construction
Algorithm: DT-construct Decision Tree Construction
Input: D (Sub)set of examples.
Output: t Root node of a decision (sub)tree.
DT-construct(D)
1. t = newNode()
label(t) = representativeClass(D)
2. IF impure(D)
THEN criterion = splitCriterion(D)
ELSE return(t)
3. {D1, . . . , Ds} = decompose(D, criterion)
4. FOREACH D′ IN {D1, . . . , Ds} DO
addSuccessor(t, DT-construct(D′))
ENDDO
5. return(t)
[Illustration]
ML:III-15 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Algorithm Template: Classification
Algorithm: DT-classify Decision Tree Classification
Input: x Feature vector.
t Root node of a decision (sub)tree.
Output: y(x) Class of feature vector x in the decision (sub)tree below t.
DT-classify(x, t)
1. IF isLeafNode(t)
THEN return(label(t))
ELSE return(DT-classify(x, splitSuccessor(t, x)))
ML:III-16 Decision Trees © STEIN/LETTMANN 2005-2017
Remarks:
K Since DT-construct assigns to each node of a decision tree T a class, each subtree of T (as
well as each pruned version of a subtree of T) represents a valid decision tree on its own.
K Functions of DT-construct:
– representativeClass(D)
Returns a representative class for the example set D. Note that, due to pruning, each
node may become a leaf node.
– impure(D)
Evaluates the (im)purity of a set D of examples.
– splitCriterion(D)
Returns a split criterion for X(t) based on the examples in D(t).
– decompose(D, criterion)
Returns a splitting of D according to criterion.
– addSuccessor(t, t′)
Inserts the successor t′ for node t.
K Functions of DT-classify:
– isLeafNode(t)
Tests whether t is a leaf node.
– splitSuccessor(t, x)
Returns the (unique) successor t′ of t for which x ∈ X(t′) holds.
ML:III-17 Decision Trees © STEIN/LETTMANN 2005-2017
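The two algorithm templates and the helper functions listed above can be made concrete in a few lines. The following self-contained Python sketch is an assumption, not the original implementation: it handles nominal features only, uses majority-class labels for representativeClass, and replaces an impurity-based splitCriterion with a deliberately naive one (the first feature whose values still vary in D); during classification, an unseen feature value falls back to the node's representative class.

```python
# Sketch of DT-construct and DT-classify for nominal features.
from collections import Counter

class Node:
    def __init__(self, label):
        self.label = label        # representative class (kept even for inner nodes)
        self.feature = None       # feature used for the splitting, None for leaves
        self.successors = {}      # feature value -> successor node

def representative_class(D):
    return Counter(c for _, c in D).most_common(1)[0][0]

def impure(D):
    return len({c for _, c in D}) > 1

def split_criterion(D):
    # Stand-in for an impurity-based criterion: first feature whose values differ in D.
    for a in D[0][0]:
        if len({x[a] for x, _ in D}) > 1:
            return a
    return None

def decompose(D, a):
    parts = {}
    for x, c in D:
        parts.setdefault(x[a], []).append((x, c))
    return parts

def dt_construct(D):
    t = Node(representative_class(D))
    if not impure(D):
        return t
    a = split_criterion(D)
    if a is None:                 # impure, but no feature left to split on
        return t
    t.feature = a
    for value, D_sub in decompose(D, a).items():
        t.successors[value] = dt_construct(D_sub)
    return t

def dt_classify(x, t):
    if not t.successors:          # leaf node
        return t.label
    succ = t.successors.get(x[t.feature])
    return t.label if succ is None else dt_classify(x, succ)

if __name__ == "__main__":
    # The four EnjoySport examples from the slides.
    D = [({"Sky": "sunny", "Temperature": "warm", "Humidity": "normal", "Wind": "strong",
           "Water": "warm", "Forecast": "same"}, "yes"),
         ({"Sky": "sunny", "Temperature": "warm", "Humidity": "high", "Wind": "strong",
           "Water": "warm", "Forecast": "same"}, "yes"),
         ({"Sky": "rainy", "Temperature": "cold", "Humidity": "high", "Wind": "strong",
           "Water": "warm", "Forecast": "change"}, "no"),
         ({"Sky": "sunny", "Temperature": "warm", "Humidity": "high", "Wind": "strong",
           "Water": "cool", "Forecast": "change"}, "yes")]
    tree = dt_construct(D)        # splits on Sky: sunny -> yes, rainy -> no
    print(dt_classify(D[2][0], tree))   # -> "no"
```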
Decision Trees Basics
When to Use Decision Trees
Problem characteristics that may suggest a decision tree classifier:
K the objects can be described by feature-value combinations.
K the domain and range of the target function are discrete
K hypotheses take the form of disjunctions
K the training set contains noise
Selected application areas:
K medical diagnosis
K fault detection in technical systems
K risk analysis for credit approval
K basic scheduling tasks such as calendar management
K classification of design flaws in software engineering
ML:III-18 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
On the Construction of Decision Trees
K How to exploit an example set both efficiently and effectively?
K According to what rationale should a node become a leaf node?
K How to assign a class for nodes of impure example sets?
K How to evaluate decision tree performance?
ML:III-19 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Performance of Decision Trees
1. Size
Among those theories that can explain an observation, the simplest one is
to be preferred (Ockham’s Razor):
Entia non sunt multiplicanda sine necessitate.
[Johannes Clauberg 1622-1665]
Here: among all decision trees of minimum classification error we choose the
one of smallest size.
2. Classification error
Quantifies the rigor according to which a class label is assigned to x in a leaf
node of T, based on the examples in D.
If each leaf node of a decision tree T represents a single example of D, the
classification error of T with respect to D is zero.
ML:III-20 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Performance of Decision Trees: Size
K Leaf node number
The leaf node number corresponds to the number of rules that are encoded in a
decision tree.
K Tree height
The tree height corresponds to the maximum rule length and bounds the
number of premises to be evaluated to reach a class decision.
K External path length
The external path length totals the lengths of all paths from the root of a tree
to its leaf nodes. It corresponds to the space to store all rules that are
encoded in a decision tree.
K Weighted external path length
The weighted external path length is defined as the external path length with
each length value weighted by the number of examples in D that are
classified by this path.
ML:III-22 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Performance of Decision Trees: Size (continued)
The following trees correctly classify all examples in D :
[Figure: two decision trees over the same example set D.
Tree (a): the root tests color (red, brown, green); one branch tests size (small, large) and ends in the leaves toxic 1x and eatable 1x, while the other two branches are the leaves eatable 2x and eatable 1x.
Tree (b): the root tests size (small, large); the large branch is the leaf eatable 2x, and the small branch tests points (yes, no), ending in the leaves toxic 1x and eatable 2x.]
Criterion (a) (b)
Leaf node number 4 3
Tree height 2 2
External path length 6 5
Weighted external path length 7 8
ML:III-24 Decision Trees © STEIN/LETTMANN 2005-2017
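The four size criteria can be checked mechanically. The sketch below (not from the slides) encodes a plausible reconstruction of trees (a) and (b) as nested dictionaries, where each leaf records how many examples of D it classifies, and reproduces the numbers in the table; the exact assignment of branch values to subtrees is an assumption read off the figure.

```python
# Leaf node number, tree height, external path length, weighted external path length.

def leaves_with_depth(tree, depth=0):
    """Yield (leaf, depth) pairs; a leaf is a dict with an 'examples' count."""
    if "children" not in tree:
        yield tree, depth
    else:
        for child in tree["children"]:
            yield from leaves_with_depth(child, depth + 1)

def size_criteria(tree):
    pairs = list(leaves_with_depth(tree))
    return {
        "leaf node number": len(pairs),
        "tree height": max(d for _, d in pairs),
        "external path length": sum(d for _, d in pairs),
        "weighted external path length": sum(d * leaf["examples"] for leaf, d in pairs),
    }

# Tree (a): root tests color; one branch tests size, the other two are leaves.
tree_a = {"children": [
    {"children": [{"examples": 1}, {"examples": 1}]},  # size: toxic 1x / eatable 1x
    {"examples": 2},                                    # eatable 2x
    {"examples": 1},                                    # eatable 1x
]}
# Tree (b): root tests size; the small branch tests points, the large branch is a leaf.
tree_b = {"children": [
    {"children": [{"examples": 1}, {"examples": 2}]},  # points: toxic 1x / eatable 2x
    {"examples": 2},                                    # eatable 2x
]}

print(size_criteria(tree_a))  # -> 4 leaves, height 2, external 6, weighted 7
print(size_criteria(tree_b))  # -> 3 leaves, height 2, external 5, weighted 8
```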
Decision Trees Basics
Performance of Decision Trees: Size (continued)
Theorem 3 (External Path Length Bound)
The problem of deciding, for a set of examples D, whether or not a decision tree exists
whose external path length is bounded by b is NP-complete.
ML:III-25 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Performance of Decision Trees: Classification Error
Given a decision tree T, a set of examples D, and a node t of T that represents the
example subset D(t) ⊆ D, the class that is assigned to t, label(t), is defined as follows:
label(t) = argmax_{c ∈ C} |{(x, c(x)) ∈ D(t) : c(x) = c}| / |D(t)|
Misclassification rate of node classifier t wrt. D(t):
Err(t, D(t)) = |{(x, c(x)) ∈ D(t) : c(x) ≠ label(t)}| / |D(t)| = 1 − max_{c ∈ C} |{(x, c(x)) ∈ D(t) : c(x) = c}| / |D(t)|
Misclassification rate of decision tree classifier T wrt. D:
Err(T, D) = Σ_{t ∈ leaves(T)} (|D(t)| / |D|) · Err(t, D(t))
ML:III-28 Decision Trees © STEIN/LETTMANN 2005-2017
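As a quick illustration of these formulas, the following Python sketch (not from the slides) computes the majority label, the node error Err(t, D(t)), and the weighted tree error Err(T, D) from lists of (example, class) pairs; the toy data is made up.

```python
# Majority label, node error, and weighted tree error from class counts.
from collections import Counter

def label_of(D_t):
    """Majority class label for the examples D(t) represented by a node t."""
    return Counter(c for _, c in D_t).most_common(1)[0][0]

def err_node(D_t):
    """Err(t, D(t)) = 1 - max_c relative frequency of class c in D(t)."""
    counts = Counter(c for _, c in D_t)
    return 1 - max(counts.values()) / len(D_t)

def err_tree(leaf_example_sets, D_size):
    """Err(T, D) = sum over leaves t of |D(t)|/|D| * Err(t, D(t))."""
    return sum(len(D_t) / D_size * err_node(D_t) for D_t in leaf_example_sets)

leaf1 = [("x1", "yes"), ("x2", "yes")]                 # pure leaf
leaf2 = [("x3", "no"), ("x4", "yes"), ("x5", "yes")]   # one misclassified example
print(label_of(leaf2))                # -> "yes"
print(err_node(leaf2))                # -> 1/3
print(err_tree([leaf1, leaf2], 5))    # -> 2/5*0 + 3/5*(1/3) = 0.2
```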
Remarks:
K Observe the difference between max(f) and argmax(f). Both expressions maximize f, but
the former returns the maximum f-value (the image) while the latter returns the argument
(the preimage) for which f becomes maximum:
– max_{c ∈ C} f(c) = max{f(c) | c ∈ C}
– argmax_{c ∈ C} f(c) = c∗, where f(c∗) = max_{c ∈ C} f(c)
K The classifiers t and T may not have been constructed using D(t) as training data. I.e., the
example set D(t) is in the role of a holdout test set.
K The true misclassification rate Err∗(T) is based on a probability measure P on X × C (and
not on relative frequencies). For a node t of T this probability becomes minimum iff:
label(t) = argmax_{c ∈ C} P(c | X(t))
K If D has been used as training set, a reliable interpretation of the (training) error Err(T, D) in
terms of Err∗(T) requires the Inductive Learning Hypothesis to hold. This implies, among
others, that the distribution of C over the feature space X corresponds to the distribution of C
over the training set D.
ML:III-29 Decision Trees © STEIN/LETTMANN 2005-2017
Decision Trees Basics
Performance of Decision Trees: Misclassification Costs
Given a decision tree T, a set of examples D, a node t of T that represents the
example subset D(t) ⊆ D, and a cost measure cost(c′ | c) for assigning class c′ to an
example of true class c, the class that is assigned to t, label(t), is defined as follows:
label(t) = argmin_{c′ ∈ C} Σ_{c ∈ C} (|{(x, c(x)) ∈ D(t) : c(x) = c}| / |D(t)|) · cost(c′ | c)
Misclassification costs of node classifier t wrt. D(t):
Errcost(t, D(t)) = (1 / |D(t)|) · Σ_{(x, c(x)) ∈ D(t)} cost(label(t) | c(x)) = min_{c′ ∈ C} Σ_{c ∈ C} (|{(x, c(x)) ∈ D(t) : c(x) = c}| / |D(t)|) · cost(c′ | c)
Misclassification costs of decision tree classifier T wrt. D:
Errcost(T, D) = Σ_{t ∈ leaves(T)} (|D(t)| / |D|) · Errcost(t, D(t))
ML:III-32 Decision Trees © STEIN/LETTMANN 2005-2017
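The cost-sensitive label and the node costs Errcost(t, D(t)) can be computed directly from the class frequencies in D(t). The sketch below (not from the slides) does this for a made-up cost measure, where cost[c_pred][c_true] plays the role of cost(c′ | c); note that with asymmetric costs the chosen label can differ from the majority label.

```python
# Cost-sensitive label and node-level misclassification costs.
from collections import Counter

def cost_label_and_err(D_t, classes, cost):
    """Return (label(t), Errcost(t, D(t))) under the given cost measure."""
    freq = Counter(c for _, c in D_t)
    n = len(D_t)
    expected = {c_pred: sum(freq[c] / n * cost[c_pred][c] for c in classes)
                for c_pred in classes}
    label = min(expected, key=expected.get)   # argmin over the predicted class
    return label, expected[label]

D_t = [("x1", "no"), ("x2", "yes"), ("x3", "yes"), ("x4", "yes")]
cost = {"yes": {"yes": 0, "no": 5},   # predicting "yes" for a true "no" costs 5
        "no":  {"yes": 1, "no": 0}}   # predicting "no" for a true "yes" costs 1
print(cost_label_and_err(D_t, ["yes", "no"], cost))   # -> ('no', 0.75)
```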
Remarks:
K Again, observe the difference between min(f) and argmin(f). Both expressions minimize f,
but the former returns the minimum f-value (the image) while the latter returns the argument
(the preimage) for which f becomes minimum.
ML:III-33 Decision Trees © STEIN/LETTMANN 2005-2017