Application Examples
• Predicting tumor cells as benign or malignant
• Classifying credit card transactions
as legitimate or fraudulent
• Classifying secondary structures of protein
as alpha-helix, beta-sheet, or random
coil
• Categorizing news stories as finance,
weather, entertainment, sports, etc.
Application Example (Training Set)
name   address  salary  loan amount  # of dependents  own home?  loan paid?
sami   nablus   5000    20000        7                yes        +
jamal  jenin    4500    15000        8                no         -
…      …        …       …            …                …          …
Application Example, Cont. (Test Set)
name   address   salary  loan amount  # of dependents  own home?  loan paid?
ahmad  ramallah  6000    40000        6                no         ?
Decision Trees
A decision tree consists of:
• Nodes
– test for the value of a certain attribute
• Edges
– correspond to the outcome of a test
– connect to the next node or leaf
• Leaves
– terminal nodes that predict the outcome
Classifying an Example
1. Start at the root.
2. Perform the test.
3. Follow the edge corresponding to the outcome.
4. Go to step 2 unless the current node is a leaf.
5. Predict the outcome associated with the leaf.
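The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the `Node` class, the `classify` function, and the `toy_tree` used to exercise them are all hypothetical names and data introduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Node:
    """Internal node: tests one attribute and has one child per outcome."""
    attribute: str
    children: Dict[str, "Tree"] = field(default_factory=dict)


# A tree is either an internal Node or a leaf holding a class label.
Tree = Union[Node, str]


def classify(tree: Tree, example: Dict[str, str]) -> str:
    """Walk from the root to a leaf and return the leaf's prediction."""
    node = tree                              # 1. start at the root
    while isinstance(node, Node):            # 4. repeat until a leaf is reached
        outcome = example[node.attribute]    # 2. perform the test
        node = node.children[outcome]        # 3. follow the matching edge
    return node                              # 5. predict the leaf's outcome


# Hypothetical tree, for illustration only.
toy_tree = Node("size", {
    "small": "+",
    "big": Node("color", {"blue": "+", "red": "-", "green": "-"}),
})

print(classify(toy_tree, {"size": "big", "color": "red"}))  # prints -
```

Representing leaves as plain labels keeps the loop condition simple: the walk ends as soon as the current node is no longer a `Node`.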
Decision Tree Induction
• Many algorithms:
– Hunt’s Algorithm (one of the earliest)
– CART
– ID3, C4.5
– SLIQ, SPRINT
Decision Tree Induction
• Given a set of examples (the training set), build an
appropriate decision tree to classify these examples
• Divide the problem into sub-problems and solve each
sub-problem (divide and conquer):
1. Select a test for the root node and create a branch for each
possible outcome of the test.
2. Split the examples into subsets, one for each branch
extending from the node.
3. Repeat recursively for each branch, using only the examples
that reach the branch.
4. Stop the recursion for a branch if all its instances have the
same class.
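The divide-and-conquer recursion above might look like the following sketch. It makes two assumptions that the steps leave open: the test at each node is simply the first remaining attribute (no quality measure), and a branch with no attributes left predicts its most common class. The function name `build_tree` and the three training rows are hypothetical, chosen only to exercise the code.

```python
from collections import Counter


def build_tree(examples, attributes):
    """examples: list of (attribute-value dict, class label) pairs.
    Returns a class label (leaf) or a pair (attribute, {outcome: subtree})."""
    labels = [label for _, label in examples]
    # Step 4: stop the recursion when all instances have the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Assumption: with no attributes left, predict the most common class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: select a test (here: the first remaining attribute).
    attribute, remaining = attributes[0], attributes[1:]
    # Step 2: split the examples into one subset per outcome of the test.
    subsets = {}
    for values, label in examples:
        subsets.setdefault(values[attribute], []).append((values, label))
    # Step 3: recurse on each branch, using only the examples that reach it.
    return (attribute, {outcome: build_tree(subset, remaining)
                        for outcome, subset in subsets.items()})


# Hypothetical training set, for illustration only.
toy_examples = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "sunny", "windy": "yes"}, "stay"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]
print(build_tree(toy_examples, ["outlook", "windy"]))
```

Because branches are created only for outcomes that actually occur in the examples, no branch ever receives an empty subset in this sketch.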
Example: Build a decision tree to classify the four examples
color  size   shape   class
Blue   Big    Round   +
Red    Big    Square  -
Blue   Small  Round   +
Green  Big    Square  -
1. Select size (chosen at random; not the best choice)
Object  color  size   shape   class
1       Blue   Big    Round   +
2       Red    Big    Square  -
3       Blue   Small  Round   +
4       Green  Big    Square  -
size (root)
2. Attach size children
size
  big
  small
3. Classify the four examples at size
size: 1+, 2-, 3+, 4-
  big: 1+, 2-, 4-
  small: 3+ (leaf: +)
4. Select color (chosen at random; not the best choice)
size: 1+, 2-, 3+, 4-
  big: 1+, 2-, 4- (next test: color)
  small: 3+ (leaf: +)
5. Add color children
size: 1+, 2-, 3+, 4-
  big: 1+, 2-, 4- (next test: color)
    blue
    red
    green
  small: 3+ (leaf: +)
6. Classify the three examples at color
size: 1+, 2-, 3+, 4-
  big: 1+, 2-, 4- (next test: color)
    blue: 1+ (leaf: +)
    red: 2- (leaf: -)
    green: 4- (leaf: -)
  small: 3+ (leaf: +)
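The two splits in this walkthrough can be checked with a short script. It is only a sketch: the helper names `split` and `summary` are made up here, and the attribute values are lowercased for consistency.

```python
from collections import defaultdict

# The four training examples: (object id, color, size, shape, class).
# Values are lowercased here for consistency.
EXAMPLES = [
    (1, "blue", "big", "round", "+"),
    (2, "red", "big", "square", "-"),
    (3, "blue", "small", "round", "+"),
    (4, "green", "big", "square", "-"),
]
COLUMN = {"color": 1, "size": 2, "shape": 3}


def split(examples, attribute):
    """Group the examples by their value for the given attribute."""
    groups = defaultdict(list)
    for example in examples:
        groups[example[COLUMN[attribute]]].append(example)
    return dict(groups)


def summary(examples):
    """Render a subset as object id plus class, e.g. '1+, 2-, 4-'."""
    return ", ".join(f"{ex[0]}{ex[4]}" for ex in examples)


# Steps 1 to 3: test size at the root.
by_size = split(EXAMPLES, "size")
for value, subset in by_size.items():
    print("size =", value, "->", summary(subset))

# Steps 4 to 6: the 'big' branch is still mixed, so test color there.
by_color = split(by_size["big"], "color")
for value, subset in by_color.items():
    print("  color =", value, "->", summary(subset))
```

Every subset under color contains a single class, which is why the recursion stops there.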
ID3 Algorithm
buildtree(examples, questions, default)
  // examples: a list of training examples
  // questions: a set of candidate questions (attributes)
  // default: default label prediction, e.g., the overall majority class
  IF empty(examples) THEN return default
  IF (all examples have the same class x) THEN return x
  IF empty(questions) THEN return majority_vote(examples)
  q = best_question(examples, questions)
  Let there be n answers to q
  – Create and return an internal node with n children
  – The i-th child is built by calling
    buildtree({example | q = i-th answer}, questions - {q}, default)
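A runnable Python rendering of this pseudocode could look as follows. It is a sketch under stated assumptions: the pseudocode leaves `best_question` unspecified, so this version picks the question whose split leaves the lowest weighted entropy (the information-gain criterion usually associated with ID3). Children are created only for answers that occur in the examples, and the `partition` helper and the tuple-based tree representation are choices made here.

```python
import math
from collections import Counter


def entropy(examples):
    """Shannon entropy (in bits) of the class labels in `examples`."""
    counts = Counter(label for _, label in examples)
    total = len(examples)
    return sum(c / total * math.log2(total / c) for c in counts.values())


def majority_vote(examples):
    return Counter(label for _, label in examples).most_common(1)[0][0]


def partition(examples, q):
    """Group examples by their answer to question q (hypothetical helper)."""
    groups = {}
    for attrs, label in examples:
        groups.setdefault(attrs[q], []).append((attrs, label))
    return groups


def best_question(examples, questions):
    """Assumed criterion: lowest weighted entropy after the split,
    which is the same as highest information gain."""
    def remaining_entropy(q):
        return sum(len(g) / len(examples) * entropy(g)
                   for g in partition(examples, q).values())
    return min(questions, key=remaining_entropy)


def buildtree(examples, questions, default):
    if not examples:
        return default
    classes = {label for _, label in examples}
    if len(classes) == 1:
        return classes.pop()
    if not questions:
        return majority_vote(examples)
    q = best_question(examples, questions)
    rest = [other for other in questions if other != q]
    # One child per answer observed in the examples.
    return (q, {answer: buildtree(subset, rest, default)
                for answer, subset in partition(examples, q).items()})


# The four color/size/shape examples, as (attributes, class) pairs.
EXAMPLES = [
    ({"color": "blue", "size": "big", "shape": "round"}, "+"),
    ({"color": "red", "size": "big", "shape": "square"}, "-"),
    ({"color": "blue", "size": "small", "shape": "round"}, "+"),
    ({"color": "green", "size": "big", "shape": "square"}, "-"),
]
print(buildtree(EXAMPLES, ["size", "shape"], default="+"))
```

On these four examples, shape separates the classes perfectly while size does not, so the resulting tree has a single test. That is smaller than the tree obtained by picking size and then color at random.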
What is a Good Attribute (Question)?
• We want to grow a simple tree
• A good attribute splits the examples so that each child
node is as pure as possible (the examples in each child
node belong almost entirely to a single class)
• Maximum order: all examples are of the same class
• Minimum order: all classes are equally likely
• We want a measure of the degree of "order"
that an attribute's split produces
• Entropy is a measure of disorder
• Entropy is the amount of information needed to
classify the examples; if all examples are of the same class, no
information is needed to classify them
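To make the two extremes concrete, the sketch below computes Shannon entropy in bits, H = sum of p_i * log2(1 / p_i) over the class proportions p_i. The formula is the standard definition rather than something stated on this slide, and the function name `entropy` and the label lists are illustrative.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy in bits: sum of p * log2(1 / p) over class proportions p."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(labels).values())


print(entropy(["+", "+", "+", "+"]))  # maximum order: prints 0.0
print(entropy(["+", "-", "+", "-"]))  # minimum order, two classes: prints 1.0
print(entropy(["+", "-", "-"]))       # mixed node: a value between 0 and 1
```

A pure node needs no further information to classify its examples, so its entropy is 0. A node whose two classes are equally likely needs one full bit per example.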