By: Abdelfattah Al Zaqqa, PSUT - Amman, Jordan
Agenda

Introduction
AQ
ID3
C4.5
ILA

Introduction - Machine Learning

Machine learning is a branch of artificial intelligence concerned with the construction and study of systems that can learn from data.

Machine learning and Data mining

Machine learning: prediction, based on known properties learned from the training data.
Data mining: discovery of previously unknown properties in the data.
The two fields overlap.

What is a Decision Tree?

A decision tree is a tree in which each branch node represents a choice between a number of alternatives, and each leaf node represents a decision.

Root node: attribute
Edges: attribute values
Leaf node: output, class, or decision
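
For illustration (not part of the original slides), a minimal Python sketch of this structure, where an internal node stores the attribute it tests plus one child per attribute value, and a leaf stores the decision:

    class Leaf:
        def __init__(self, decision):
            self.decision = decision        # output class, e.g. "Yes" / "No"

    class Node:
        def __init__(self, attribute, children):
            self.attribute = attribute      # attribute tested at this branch node
            self.children = children        # dict: attribute value (edge) -> Node or Leaf

    def classify(tree, example):
        # Follow the edge matching the example's attribute value until a leaf is reached.
        while isinstance(tree, Node):
            tree = tree.children[example[tree.attribute]]
        return tree.decision

    # Hypothetical one-level tree: play ball only when the outlook is Overcast.
    tree = Node("Outlook", {"Sunny": Leaf("No"), "Overcast": Leaf("Yes"), "Rain": Leaf("No")})
    print(classify(tree, {"Outlook": "Overcast"}))   # Yes
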
Introduction

ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan, used to generate a decision tree from a dataset using Shannon entropy.

Typically used in the machine learning and natural language processing domains.

ID3 basics

ID3 employs Top-Down Induction of Decision Trees (a greedy algorithm).
Attribute selection is the fundamental step in constructing a decision tree: at each step we select which attribute becomes the next node of the tree, and so on.
Two measures, Entropy and Information Gain, are used for attribute selection.

Entropy

Entropy H(S) is a measure of the amount of uncertainty in the (data) set S.

The more uniform the class distribution, the more information we can gain.
The more entropy, the more information we can gain.

Entropy

For a set S divided into Positive and Negative examples:

Entropy(S) = -P(positive) log2 P(positive) - P(negative) log2 P(negative)
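
As an illustration (not from the slides), a minimal Python sketch of this two-class entropy, with counts taken from the play-ball example on the following slides:

    import math

    def entropy(positives, negatives):
        # Two-class Shannon entropy of a set with the given class counts.
        total = positives + negatives
        result = 0.0
        for count in (positives, negatives):
            if count:                       # treat 0 * log2(0) as 0
                p = count / total
                result -= p * math.log2(p)
        return result

    print(round(entropy(9, 5), 3))          # 0.94  (the 9+/5- play-ball set)
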
Information Gain

Information gain is the measure of the difference in entropy from before to after the set is split on an attribute.
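
More generally, for an attribute A that splits S into subsets S_v (one per value v of A):

Gain(S, A) = Entropy(S) - sum over v of (|S_v| / |S|) * Entropy(S_v)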

Example

Outlook   Temperature  Humidity  Wind    Play ball
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No
Total: 14

Example-Dataset Elements
Collection (S): all 14 records in the table above are referred to as the collection S.

Example-Dataset Elements
Attributes: Outlook, Temperature, Humidity, Wind.

Class (C) or Classifier: Play ball.

Because, based on Outlook, Temperature, Humidity and Wind, we need to decide whether we can play ball or not, Play ball is the classifier used to make the decision.
ID3 Algorithm

1. Compute Entropy(S) = -(9/14)log2(9/14) - (5/14)log2(5/14) = 0.940

2. Compute the information gain for each attribute. For Wind:
   Wind: Weak = 8 (6+, 2-), Strong = 6 (3+, 3-)
   Entropy(S_weak) = -(6/8)log2(6/8) - (2/8)log2(2/8) = 0.811
   Entropy(S_strong) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1
   Gain(S, Wind) = Entropy(S) - (8/14)Entropy(S_weak) - (6/14)Entropy(S_strong)
                 = 0.940 - (8/14)(0.811) - (6/14)(1) = 0.048
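
A hedged Python sketch that reproduces these numbers (the class counts come from the example table; the split into Weak (6+, 2-) and Strong (3+, 3-) corresponds to the Wind attribute):

    import math

    def entropy(counts):
        total = sum(counts)
        return -sum(c / total * math.log2(c / total) for c in counts if c)

    def gain(class_counts, partitions):
        # Information gain = Entropy(S) minus the weighted entropy of the partitions.
        total = sum(class_counts)
        remainder = sum(sum(part) / total * entropy(part) for part in partitions)
        return entropy(class_counts) - remainder

    print(round(entropy([9, 5]), 3))                    # 0.94
    print(round(gain([9, 5], [[6, 2], [3, 3]]), 3))     # 0.048
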
ID3 Algorithm

3. Select the attribute with the maximum information gain for splitting:
   Gain(S, Wind) = 0.048
   Gain(S, Humidity) = 0.151
   Gain(S, Temperature) = 0.029
   Gain(S, Outlook) = 0.246
   Outlook has the highest gain, so it is chosen as the root node.

ID3 Algorithm

4. Apply ID3 recursively to each child node of this root, until leaf nodes (nodes with entropy = 0) are reached.
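
A minimal recursive sketch of this procedure (illustrative Python, assuming each example is a dict and the target column is named "Play ball"):

    import math
    from collections import Counter

    def entropy(examples, target):
        counts = Counter(e[target] for e in examples)
        n = len(examples)
        return -sum(c / n * math.log2(c / n) for c in counts.values())

    def id3(examples, attributes, target="Play ball"):
        labels = {e[target] for e in examples}
        if len(labels) == 1:                          # pure node (entropy = 0) -> leaf
            return labels.pop()
        if not attributes:                            # no attributes left -> majority leaf
            return Counter(e[target] for e in examples).most_common(1)[0][0]

        def gain(attr):                               # information gain of splitting on attr
            rem = 0.0
            for v in {e[attr] for e in examples}:
                subset = [e for e in examples if e[attr] == v]
                rem += len(subset) / len(examples) * entropy(subset, target)
            return entropy(examples, target) - rem

        best = max(attributes, key=gain)              # greedy choice of the split attribute
        tree = {best: {}}
        for v in {e[best] for e in examples}:         # one subtree per attribute value
            subset = [e for e in examples if e[best] == v]
            rest = [a for a in attributes if a != best]
            tree[best][v] = id3(subset, rest, target)
        return tree

Called on the 14 examples above with attributes ["Outlook", "Temperature", "Humidity", "Wind"], this sketch produces a tree rooted at Outlook, matching the gains in step 3.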

C4.5

C4.5 is an extension of Quinlan's earlier ID3 algorithm. It adds:
- Handling of both continuous and discrete attributes.
- Handling of training data with missing attribute values.
- Pruning of trees after creation.

Continuous-valued attributes

The same 14-example data set, but with Humidity recorded as a continuous value (0.49, 0.59, 0.68, 0.72, 0.74, 0.77, 0.80, 0.84, 0.86, 0.87, 0.89, 0.90, 0.91, 0.93) instead of High/Normal.

Continuous-valued attributes

To split on a continuous attribute such as Humidity, pick a threshold:
1. Sort the numeric attribute values.
2. Identify adjacent examples that differ in their target classification and take the midpoint between them as a candidate threshold.

For the sorted values 0.68 (yes), 0.72 (yes), 0.87 (no), 0.9 (no), 0.91 (no), the classification changes between 0.72 and 0.87, so the split becomes:
Humidity > (0.72 + 0.87)/2, i.e. Humidity > 0.795
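
An illustrative sketch of this threshold search (Python; the values below are the sorted humidity/label pairs shown above):

    def candidate_thresholds(values, labels):
        # Sort by value and return the midpoints between adjacent examples
        # whose target classifications differ.
        pairs = sorted(zip(values, labels))
        return [(pairs[i][0] + pairs[i + 1][0]) / 2
                for i in range(len(pairs) - 1)
                if pairs[i][1] != pairs[i + 1][1]]

    thresholds = candidate_thresholds([0.68, 0.72, 0.87, 0.9, 0.91],
                                      ["yes", "yes", "no", "no", "no"])
    print([round(t, 3) for t in thresholds])          # [0.795]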

Continuous-valued attributes (figure)

Overfitting

(Figure: "Under fitting", "Just right", "Over fitting")

Overfitting: if we have too many attributes (features), the learned hypothesis may fit the training set very well but fail to generalize to new examples (e.g. predicting the price on new examples).

Why does overfitting happen?

- Presence of errors (noise) in the training examples (a general issue in machine learning).
- When only a small number of examples is associated with a leaf node.

Reduce Overfitting

- Stop growing the tree earlier, before it reaches the point where it perfectly classifies the training data (difficult).
- Allow the tree to overfit the data, and then post-prune the tree.

Rule post-pruning

(Outlook = Sunny ∧ Humidity = Normal) → P
(Outlook = Sunny ∧ Humidity = High) → N
(Outlook = Overcast) → P
(Outlook = Rain ∧ Wind = Strong) → N
(Outlook = Rain ∧ Wind = Weak) → P

Rule post-pruning
• Prune preconditions

Outlook  Temp  Humidity  Wind    Tennis
Rain     Low   High      Weak    No
Rain     Hot   High      Strong  No

(Outlook = Sunny ∧ Humidity = High) → N
(Outlook = Sunny ∧ Humidity = Normal) → P
(Outlook = Overcast) → P
(Outlook = Rain ∧ Wind = Strong) → N
(Outlook = Rain ∧ Wind = Weak) → P

Rule post-pruning
• Prune preconditions

Outlook  Temp  Humidity  Wind    Tennis
Rain     Low   High      Weak    No
Rain     Hot   High      Strong  No

(Outlook = Sunny ∧ Humidity = High) → N
(Outlook = Sunny ∧ Humidity = Normal) → P
(Outlook = Overcast) → P
(Outlook = Rain) → N
(Outlook = Rain ∧ Wind = Weak) → P

New instances:

Outlook  Temp  Humidity  Wind  Tennis
Sunny    Low   Low       Weak  yes
Rain     Hot   High      Weak  No
Rule post-pruning

Validation set:
- Save a portion of the data for validation (training set / validation set / test set).
- If s <= t, prune the subtree (s = validation performance with the subtree at the node, t = validation performance with a leaf instead of the subtree).

Rule post-pruning (Quinlan 1993):
- Can remove smaller elements than whole subtrees.
- Improved readability.
- Reduced-error pruning (Quinlan 1987)
…


Missing information

Example: missing information in mammography data

BI-RAD  Age  Shape  Margin  Density  Class
4       48   4      5       ?        1
5       67   3      5       3        1
5       57   4      4       3        1
5       60   ?      5       1        1
4       53   ?      4       3        1
4       28   1      1       3        0
4       70   ?      2       3        0
2       66   1      1       ?        0
5       63   3      ?       3        0
4       78   1      1       1        0

Missing information - according to most common

Fill in the missing data according to the most common value (given the class):

BI-RAD  Age  Shape  Margin  Density  Class
4       48   4      5       3        1
5       67   3      5       3        1
5       57   4      4       3        1
5       60   4      5       1        1
4       53   4      4       3        1
4       28   1      1       3        0
4       70   1      2       3        0
2       66   1      1       3        0
5       63   3      ?       3        0
4       78   1      1       1        0
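
A hedged pandas sketch of this fill-in rule (column names follow the table; the two-column frame below is a toy stand-in for the full dataset):

    import pandas as pd

    df = pd.DataFrame({
        "Shape": [4, None, 3, 4, None, 1],
        "Class": [1, 1,    1, 1, 0,    0],
    })

    def fill_most_common_by_class(df, column, target="Class"):
        # Replace a missing value by the most common value of that attribute
        # among examples of the same class.
        mode_per_class = df.groupby(target)[column].agg(lambda s: s.mode().iloc[0])
        filled = df[column].fillna(df[target].map(mode_per_class))
        return df.assign(**{column: filled})

    print(fill_most_common_by_class(df, "Shape"))
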
Missing information - according to proportions

Fraction  BI-RAD  Age  Shape  Margin  Density  Class
0.75      4       48   4      5       3        1
0.25      4       48   4      5       1        1
1         5       67   3      5       3        1
1         5       57   4      4       3        1
0.66      5       60   4      5       1        1
0.33      5       60   3      5       1        1
0.66      4       53   4      4       3        1
0.33      4       53   3      4       3        1
1         4       28   1      1       3        0
0.75      4       70   1      2       3        0
0.25      4       70   3      2       3        0
0.25      2       66   1      1       1        0
0.75      2       66   1      1       3        0
0.75      5       63   3      1       3        0
0.25      5       63   3      2       3        0
1         4       78   1      1       1        0

33/4
11/4
Summary

ID3 and C4.5 are algorithms developed by Ross Quinlan to generate decision trees; they are typically used in the machine learning and natural language processing domains.

ID3 and C4.5 use the entropy of an attribute and pick the attribute with the highest reduction in entropy (highest information gain) to determine which attribute the data should be split on first; then, through a series of recursive calls that compute the entropy of each node, the process continues until all leaf nodes are pure.
ID3 / C4.5 algorithm (figures)

Editor's Notes

• #10: To minimize the decision tree depth, when we traverse the tree path we need to select the optimal attribute for splitting the tree node; from this we can easily imply that the attribute with the most entropy reduction is the best choice.