Decision Tree, Entropy
Md Saeed Siddik
Khaza Moinuddin Mazumder
Decision Tree
A decision tree is a decision support tool that uses a tree-like model of decisions and their possible consequences.
A decision tree is a flow-chart-like structure in which:
- each internal node represents a test on an attribute,
- each branch represents an outcome of that test, and
- each leaf node represents a class label (the decision taken after evaluating the attributes along the path).
A minimal code sketch of this structure follows below.
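To make the structure concrete, here is a minimal Python sketch (my own illustration, not from the slides; the Node class and classify function are hypothetical names):

```python
# Illustrative sketch only: a flow-chart-like tree of attribute tests and class labels.
class Node:
    def __init__(self, attribute=None, branches=None, label=None):
        self.attribute = attribute      # internal node: the attribute to test
        self.branches = branches or {}  # test outcome -> child Node
        self.label = label              # leaf node: the class label (decision)

def classify(node, example):
    # Follow the branch matching the example's value for each tested attribute.
    while node.label is None:
        node = node.branches[example[node.attribute]]
    return node.label
```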
Components of a DT
A decision tree consists of 3 types of nodes:
1. Decision nodes
2. Chance nodes
3. End nodes
Types of variables in DT
Four types of tree can be generated from a variable:
- Terminal
- Both on the left side
- Both on the right side
- Separated on both sides
[Figure: small tree diagrams illustrating each shape.]
Decision Table
Evidence   Action   Author    Thread   Length
e1         skip     known     new      long
e2         read     unknown   new      short
e3         skip     unknown   old      long
e4         skip     known     old      long
e5         read     known     new      short
e6         skip     known     old      long
Decision Tree
[Figure: decision tree for the table above. Root node: Author; the known branch tests Length (long leads to skip, short leads to read), while the unknown branch tests Thread (new leads to read, old leads to skip).]
Decision
- Known ∧ Long ⇒ Skip
- Known ∧ Short ⇒ Read
- Unknown ∧ New ⇒ Read
- Unknown ∧ Old ⇒ Skip
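The table and tree above can be encoded directly; the sketch below is my own illustration (the dictionary keys and the predict helper are hypothetical names), and the final assert checks that the tree's rules reproduce every row of the table:

```python
# The six training examples from the decision table (illustrative encoding).
examples = [
    {"author": "known",   "thread": "new", "length": "long",  "action": "skip"},
    {"author": "unknown", "thread": "new", "length": "short", "action": "read"},
    {"author": "unknown", "thread": "old", "length": "long",  "action": "skip"},
    {"author": "known",   "thread": "old", "length": "long",  "action": "skip"},
    {"author": "known",   "thread": "new", "length": "short", "action": "read"},
    {"author": "known",   "thread": "old", "length": "long",  "action": "skip"},
]

def predict(e):
    # Apply the tree: test Author first, then Length (known) or Thread (unknown).
    if e["author"] == "known":
        return "skip" if e["length"] == "long" else "read"
    return "read" if e["thread"] == "new" else "skip"

assert all(predict(e) == e["action"] for e in examples)
```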
Entropy
Entropy is a measure of the uncertainty in a random variable.
The term entropy usually refers to the Shannon entropy, which quantifies the expected value of the information contained in a message.
Given a random variable V with values v_k, the entropy of V is defined by

$$H(V) = -\sum_k P(v_k)\,\log_2 P(v_k)$$
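A direct translation of this definition into Python (a minimal sketch; the entropy function name is mine):

```python
import math

def entropy(probabilities):
    # H(V) = -sum_k P(v_k) * log2 P(v_k); zero-probability values contribute nothing.
    return sum(-p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit (fair coin)
print(entropy([1 / 6] * 6))  # ~2.585 bits (fair die)
```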
Entropy Measurement Unit
- bit
  - {0, 1}
  - logarithmic unit, based on 2
- nat
  - also known as nit or nepit
  - logarithmic unit, based on e
  - 1 nat = 1.44 bits = 0.434 ban
- ban
  - also known as hartley or dit (short for decimal digit)
  - logarithmic unit, based on 10
  - introduced by Alan Turing and I. J. Good
  - 1 ban = 3.32 bits = 2.30 nats
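These conversion factors are just change-of-base identities, which can be checked quickly:

```python
import math

# Change-of-base identities behind the conversion factors above.
print(1 / math.log(2))   # ~1.4427 bits per nat
print(1 / math.log(10))  # ~0.4343 bans per nat
print(math.log2(10))     # ~3.3219 bits per ban
print(math.log(10))      # ~2.3026 nats per ban
```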
Entropy of a Boolean Variable
Given a Boolean random variable that is true with probability q and false with probability (1 - q), its entropy is

$$B(q) = -\bigl(q \log_2 q + (1-q)\log_2(1-q)\bigr)$$
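A minimal sketch of this formula (the function name B follows the slide's notation):

```python
import math

def B(q):
    # Entropy of a Boolean variable that is true with probability q.
    if q in (0.0, 1.0):
        return 0.0
    return -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

print(B(0.5))   # 1.0 bit: maximum uncertainty
print(B(0.99))  # ~0.081 bits: almost certain
```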
Entropy for p + n examples
If we have p + n examples, where p are positive and n are negative, the entropy of the goal attribute is

$$B\!\left(\frac{p}{p+n}\right) = -\frac{p}{p+n}\log_2\frac{p}{p+n} \;-\; \frac{n}{p+n}\log_2\frac{n}{p+n}$$
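For example, in the six-example decision table above there are p = 2 read and n = 4 skip examples (my own count from the table), so

$$B\!\left(\tfrac{2}{6}\right) = -\tfrac{1}{3}\log_2\tfrac{1}{3} - \tfrac{2}{3}\log_2\tfrac{2}{3} \approx 0.918 \text{ bits}$$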
Remainder
The expected entropy (EH), or remainder, left after testing attribute A, whose values split the examples into branches k = 1, 2, ..., d, is:

$$Remainder(A) = \sum_{k=1}^{d} \frac{p_k + n_k}{p + n}\; B\!\left(\frac{p_k}{p_k + n_k}\right)$$
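As a worked sketch using the decision table above (my own counts: splitting on Author gives a known branch with 1 read / 3 skip and an unknown branch with 1 read / 1 skip):

```python
import math

def B(q):
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def remainder(branches, total):
    # branches: list of (p_k, n_k) counts per branch; total: p + n over all examples.
    return sum((pk + nk) / total * B(pk / (pk + nk)) for pk, nk in branches)

# Splitting the table above on Author: known -> 1 read / 3 skip, unknown -> 1 read / 1 skip.
print(remainder([(1, 3), (1, 1)], 6))  # ~0.874 bits
```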
Information Gain (IG)
In decision tree learning, the information gain of an attribute A is the expected reduction in entropy obtained by testing A. (The same term is also used for the Kullback-Leibler divergence, a non-symmetric measure of the difference between two probability distributions P and Q.)

$$Gain(A) = B\!\left(\frac{p}{p+n}\right) - Remainder(A)$$
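Continuing the worked example (values I computed from the table above, not stated on the slides):

$$Gain(Author) = B\!\left(\tfrac{2}{6}\right) - Remainder(Author) \approx 0.918 - 0.874 = 0.044 \text{ bits}$$

so knowing the Author alone removes very little uncertainty about the Action.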
Calculate the root
Choose the attribute with the highest information gain as the root of the tree; a full worked sketch follows below.
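Putting everything together, a minimal end-to-end sketch (my own code, reusing the toy table; the function and attribute names are illustrative) that computes the gain of each attribute and picks the root; on this particular table Length happens to have the largest gain:

```python
import math
from collections import defaultdict

# (author, thread, length, action) rows from the decision table above.
examples = [
    ("known",   "new", "long",  "skip"),
    ("unknown", "new", "short", "read"),
    ("unknown", "old", "long",  "skip"),
    ("known",   "old", "long",  "skip"),
    ("known",   "new", "short", "read"),
    ("known",   "old", "long",  "skip"),
]
attributes = {"Author": 0, "Thread": 1, "Length": 2}

def B(q):
    return 0.0 if q in (0.0, 1.0) else -(q * math.log2(q) + (1 - q) * math.log2(1 - q))

def gain(attr_index):
    p = sum(1 for e in examples if e[3] == "read")
    n = len(examples) - p
    counts = defaultdict(lambda: [0, 0])  # attribute value -> [p_k, n_k]
    for e in examples:
        counts[e[attr_index]][0 if e[3] == "read" else 1] += 1
    rem = sum((pk + nk) / (p + n) * B(pk / (pk + nk)) for pk, nk in counts.values())
    return B(p / (p + n)) - rem

for name, idx in attributes.items():
    print(name, round(gain(idx), 3))  # Author ~0.044, Thread ~0.459, Length ~0.918
root = max(attributes, key=lambda a: gain(attributes[a]))
print("root:", root)
```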