1. UNIT – III: CLASSIFICATION
Topic 5: DECISION TREES
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
2. UNIT III: TEXT CLASSIFICATION AND CLUSTERING
1. A Characterization of Text Classification
2. Unsupervised Algorithms: Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality Reduction
9. Evaluation Metrics
10. Accuracy and Error
11. Organizing the Classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing
3. DECISION TREES
4. INTRODUCTION TO DECISION TREES
• What is a decision tree?
• A decision tree is a structure that includes a root node, branches, and leaf nodes.
a) Each internal node denotes a test on an attribute,
b) each branch denotes the outcome of a test, and
c) each leaf node holds a class label.
• The topmost node in the tree is the root node.
5. INTRODUCTION TO DECISION TREES
• ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down recursive divide-and-conquer manner.
• Most algorithms for decision tree induction also follow such a top-down approach, which starts with a training set of tuples and their associated class labels.
• The training set is recursively partitioned into smaller subsets as the tree is being built.
6. INTRODUCTION TO DECISION TREES
• Decision tree induction is the learning of decision trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute,
• each branch represents an outcome of the test, and
• each leaf node (or terminal node) holds a class label.
• The topmost node in a tree is the root node.
7. INTRODUCTION TO DECISION TREES
• A decision tree is a tree where
• internal node = a test on an attribute
• tree branch = an outcome of the test
• leaf node = class label or class distribution
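The three correspondences above can be sketched as a minimal Python class. This is a hypothetical illustration (the names `Node` and `classify` are not from the slides), not any library's API:

```python
class Node:
    """A decision-tree node: internal nodes test an attribute, leaves hold a class label."""
    def __init__(self, attribute=None, label=None):
        self.attribute = attribute   # test attribute (internal nodes only)
        self.children = {}           # tree branch: test outcome -> child Node
        self.label = label           # class label (leaf nodes only)

    def is_leaf(self):
        return self.label is not None

def classify(node, x):
    """Follow branches matching x's attribute values until a leaf is reached."""
    while not node.is_leaf():
        node = node.children[x[node.attribute]]
    return node.label

# A toy one-level tree whose root tests 'outlook'.
root = Node(attribute="outlook")
root.children = {
    "sunny": Node(label="Don't"),
    "overcast": Node(label="Play"),
    "rain": Node(label="Play"),
}
print(classify(root, {"outlook": "overcast"}))  # -> Play
```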
8. Benefits of Decision Trees
• The benefits of having a decision tree are as follows:
a) It does not require any domain knowledge.
b) It is easy to comprehend.
c) The learning and classification steps of a decision tree are simple and fast.
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER โ VIII
PROFESSIONAL ELECTIVE โ IV
CS8080- INFORMATION RETRIEVAL TECHNIQUES
9. Brief History of Decision Trees
CLS (Hunt et al., 1966) --- cost-driven
ID3 (Quinlan, 1986, MLJ) --- information-driven
C4.5 (Quinlan, 1993) --- gain ratio + pruning ideas
CART (Breiman et al., 1984) --- Gini index
10. Elegance of Decision Trees
11. Structure of Decision Trees
• If x1 > a1 and x2 > a2, then it's class A
• C4.5 and CART are two of the most widely used
• Easy to interpret, but accuracy is generally less attractive than that of other classifiers
[Figure: a decision tree with a root node testing x1, internal nodes testing x2, x3, and x4, branches labeled by test outcomes such as > a1 and > a2, and leaf nodes labeled with classes A and B.]
12. Example of Decision Tree
13. Another Example of Decision Tree
14. Decision Tree Classification Tasks
15. Apply Model to Test Data
(Data Mining: Concepts and Techniques, 5/16/2022)
Model (decision tree):
  Refund = Yes -> NO
  Refund = No -> MarSt
    MarSt = Married -> NO
    MarSt = Single, Divorced -> TaxInc
      TaxInc < 80K -> NO
      TaxInc > 80K -> YES
Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
Assign Cheat to "No"
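Tracing the test tuple through this tree can be written out directly as nested conditionals. A minimal sketch (the function name `predict_cheat` is hypothetical, not from the slides):

```python
def predict_cheat(refund, marital_status, taxable_income):
    """Hand-coded traversal of the Refund/MarSt/TaxInc tree above."""
    if refund == "Yes":                  # Refund = Yes -> NO
        return "No"
    if marital_status == "Married":      # MarSt = Married -> NO
        return "No"
    # Single or Divorced: split on taxable income at 80K
    return "Yes" if taxable_income > 80_000 else "No"

# The test tuple from the slide: Refund = No, Married, 80K.
print(predict_cheat("No", "Married", 80_000))  # -> No
```

The Married branch is reached before income is ever examined, which is why the tuple is assigned Cheat = "No" regardless of its income.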
21. Decision Tree Classification Task
Training Set -> Tree Induction algorithm -> (Induction: Learn Model) -> Model (Decision Tree)
Model + Test Set -> (Deduction: Apply Model) -> class predictions

Training Set:
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set:
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
22. Constructing a Decision Tree
23. Constructing a Decision Tree
• Two phases of decision tree generation:
1. Tree construction
  • at the start, all the training examples are at the root
  • partition examples based on selected attributes
  • test attributes are selected based on a heuristic or a statistical measure
2. Tree pruning
  • identify and remove branches that reflect noise or outliers
24. Constructing a Decision Tree
• Basic step: determination of the root node of the tree and the root nodes of its sub-trees
• Most discriminatory feature
  • Every feature can be used to partition the training data.
  • If the partitions contain a pure class of training instances, then this feature is the most discriminatory.
25. Constructing a Decision Tree :- Example of Partitions
• Categorical feature
  • The number of partitions of the training data is equal to the number of values of this feature.
• Numerical feature
  • Two partitions
26. Decision Tree Induction Algorithm
• A machine learning researcher named J. Ross Quinlan developed, around 1980, a decision tree algorithm known as ID3 (Iterative Dichotomiser).
• Later, he presented C4.5, the successor of ID3. ID3 and C4.5 adopt a greedy approach.
• In this algorithm, there is no backtracking; the trees are constructed in a top-down recursive divide-and-conquer manner.
• Generating a decision tree from the training tuples of data partition D
27. Algorithm: Generate_decision_tree
• Input:
  • Data partition D, which is a set of training tuples and their associated class labels.
  • attribute_list, the set of candidate attributes.
  • Attribute_selection_method, a procedure to determine the splitting criterion that best partitions the data tuples into individual classes. This criterion includes a splitting_attribute and either a split point or a splitting subset.
• Output: a decision tree
28. Algorithm: Generate_decision_tree
• Method
1) create a node N;
2) if the tuples in D are all of the same class C, then
3)   return N as a leaf node labeled with class C;
4) if attribute_list is empty, then
5)   return N as a leaf node labeled with the majority class in D; // majority voting
6) apply Attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
7) label node N with splitting_criterion;
8) if splitting_attribute is discrete-valued and multiway splits are allowed, then // not restricted to binary trees
9)   attribute_list = attribute_list - splitting_attribute; // remove splitting_attribute
10) for each outcome j of splitting_criterion
11)   // partition the tuples and grow subtrees for each partition
12)   let Dj be the set of data tuples in D satisfying outcome j; // a partition
13)   if Dj is empty, then
14)     attach a leaf labeled with the majority class in D to node N;
      else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
    end for
15) return N;
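The method above translates almost line for line into a short recursive Python function. This is a sketch for discrete-valued attributes only, with the attribute-selection heuristic left pluggable; the demo data and the trivial "pick the first attribute" heuristic are illustrative assumptions, not part of the algorithm:

```python
from collections import Counter

def majority_class(rows, target):
    """Majority voting over the class labels of the given rows (steps 4-5, 13-14)."""
    return Counter(r[target] for r in rows).most_common(1)[0][0]

def generate_decision_tree(rows, attributes, target, select_attribute):
    """Recursive sketch of Generate_decision_tree.

    Returns either a class label (leaf) or {attribute: {value: subtree}}.
    """
    classes = {r[target] for r in rows}
    if len(classes) == 1:                    # steps 2-3: all tuples share one class
        return classes.pop()
    if not attributes:                       # steps 4-5: no attributes left
        return majority_class(rows, target)
    best = select_attribute(rows, attributes, target)   # step 6
    remaining = [a for a in attributes if a != best]    # step 9
    branches = {}
    for value in {r[best] for r in rows}:    # steps 10-14; Dj is never empty here,
        part = [r for r in rows if r[best] == value]    # since values come from D itself
        branches[value] = generate_decision_tree(part, remaining, target, select_attribute)
    return {best: branches}

# Tiny demo with a trivial heuristic: always pick the first candidate attribute.
rows = [
    {"student": "yes", "buys": "yes"},
    {"student": "no",  "buys": "no"},
    {"student": "yes", "buys": "yes"},
]
tree = generate_decision_tree(rows, ["student"], "buys",
                              lambda rows, attrs, target: attrs[0])
print(tree)  # splits on student: yes -> 'yes', no -> 'no'
```

In a real induction run, `select_attribute` would be an Attribute_selection_method such as information gain (ID3), gain ratio (C4.5), or the Gini index (CART).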
29. Constructing Decision Tree Example :- Weather Forecasting
30. Constructing Decision Tree :- A Simple Dataset
9 "Play" samples and 5 "Don't" samples, a total of 14.
31. Constructing Decision Tree :- A Simple Dataset
Instance #  Outlook   Temp  Humidity  Windy  Class
1           Sunny     75    70        true   Play
2           Sunny     80    90        true   Don't
3           Sunny     85    85        false  Don't
4           Sunny     72    95        true   Don't
5           Sunny     69    70        false  Play
6           Overcast  72    90        true   Play
7           Overcast  83    78        false  Play
8           Overcast  64    65        true   Play
9           Overcast  81    75        false  Play
10          Rain      71    80        true   Don't
11          Rain      65    70        true   Don't
12          Rain      75    80        false  Play
13          Rain      68    80        false  Play
14          Rain      70    96        false  Play
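The attribute-selection heuristic can be made concrete on this dataset. The sketch below (an illustration; the slides do not show this computation for the weather data) computes the entropy of the 14 class labels and the information gain of splitting on Outlook:

```python
from math import log2
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# Class labels from the 14-instance table above, grouped by Outlook.
by_outlook = {
    "sunny":    ["Play", "Don't", "Don't", "Don't", "Play"],
    "overcast": ["Play", "Play", "Play", "Play"],
    "rain":     ["Don't", "Don't", "Play", "Play", "Play"],
}
all_labels = [c for labels in by_outlook.values() for c in labels]
n = len(all_labels)
gain = entropy(all_labels) - sum(len(v) / n * entropy(v) for v in by_outlook.values())
print(round(entropy(all_labels), 3), round(gain, 3))  # -> 0.94 0.247
```

The overcast partition is pure (all Play, entropy 0), which is what makes Outlook a strong split.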
32. Constructing Decision Tree :- A Simple Dataset
outlook = sunny    -> humidity <= 75 -> Play (2); humidity > 75 -> Don't (3)
outlook = overcast -> Play (4)
outlook = rain     -> windy = false -> Play (3); windy = true -> Don't (2)
33. Partitioning on Outlook
Total: 14 training instances
Outlook = sunny: instances 1, 2, 3, 4, 5 -> P, D, D, D, P
Outlook = overcast: instances 6, 7, 8, 9 -> P, P, P, P
Outlook = rain: instances 10, 11, 12, 13, 14 -> D, D, P, P, P
34. Partitioning on Temperature
Total: 14 training instances
Temperature <= 70: instances 5, 8, 11, 13, 14 -> P, P, D, P, P
Temperature > 70: instances 1, 2, 3, 4, 6, 7, 9, 10, 12 -> P, D, D, D, P, P, P, D, P
35. Constructing Decision Tree :- A Simple Dataset
The resulting tree:
outlook = sunny    -> humidity <= 75 -> Play (2); humidity > 75 -> Don't (3)
outlook = overcast -> Play (4)
outlook = rain     -> windy = false -> Play (3); windy = true -> Don't (2)
36. Constructing Decision Tree Example :- Decision on Buying a Computer (is a customer likely to purchase a computer?)
37. Constructing Decision Tree Example :-
Decision on Buying a Computer
The following decision tree is for the concept buys_computer; it indicates whether a customer at a company is likely to buy a computer or not.
• Each internal node represents a test on an attribute.
• Each leaf node represents a class.
38. Constructing Decision Tree :- Training Dataset
age    income  student  credit_rating  buys_computer
<=30   high    no       fair           no
<=30   high    no       excellent      no
31…40  high    no       fair           yes
>40    medium  no       fair           yes
>40    low     yes      fair           yes
>40    low     yes      excellent      no
31…40  low     yes      excellent      yes
<=30   medium  no       fair           no
<=30   low     yes      fair           yes
>40    medium  yes      fair           yes
<=30   medium  yes      excellent      yes
31…40  medium  no       excellent      yes
31…40  high    yes      fair           yes
>40    medium  no       excellent      no
This follows an example of Quinlan's ID3 (Playing Tennis).
39. Constructing Decision Tree :- Output: A Decision Tree for "buys_computer"
age = <=30   -> student? (no -> no; yes -> yes)
age = 31..40 -> yes
age = >40    -> credit rating? (excellent -> no; fair -> yes)
From the training dataset, compute the entropy-based information gain of each attribute; the result indicates that the best splitting attribute is age.
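That computation can be reproduced directly from the 14-tuple table. A sketch (the function names are illustrative; ID3's criterion is information gain as shown):

```python
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, attr, target="buys_computer"):
    """Entropy of the whole partition minus the weighted entropy after splitting on attr."""
    labels = [r[target] for r in rows]
    gain = entropy(labels)
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= len(subset) / len(rows) * entropy(subset)
    return gain

# The 14 training tuples from the buys_computer table (ASCII "31...40" for "31…40").
data = [
    ("<=30","high","no","fair","no"), ("<=30","high","no","excellent","no"),
    ("31...40","high","no","fair","yes"), (">40","medium","no","fair","yes"),
    (">40","low","yes","fair","yes"), (">40","low","yes","excellent","no"),
    ("31...40","low","yes","excellent","yes"), ("<=30","medium","no","fair","no"),
    ("<=30","low","yes","fair","yes"), (">40","medium","yes","fair","yes"),
    ("<=30","medium","yes","excellent","yes"), ("31...40","medium","no","excellent","yes"),
    ("31...40","high","yes","fair","yes"), (">40","medium","no","excellent","no"),
]
cols = ["age", "income", "student", "credit_rating", "buys_computer"]
rows = [dict(zip(cols, r)) for r in data]
for attr in cols[:-1]:
    print(attr, round(info_gain(rows, attr), 3))
# age has the largest gain, so it is chosen as the split attribute at the root
```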
40. A Decision Tree for "buys_computer"
41. A Decision Tree for "buys_computer"
From the training data set, age = youth (<=30) has two classes, split based on the student attribute.
42. A Decision Tree for "buys_computer"
Based on majority voting on the student attribute, RID = 3 is grouped under the "yes" group.
43. A Decision Tree for "buys_computer"
From the training data set, age = senior (>40) has two classes, split based on the credit rating attribute.
44. A Decision Tree for "buys_computer"
Final Decision Tree
45. Classification by Decision Tree
46. Classification by Decision Tree
• A typical decision tree represents the concept buys_computer; that is, it predicts whether a customer at AllElectronics is likely to purchase a computer.
• Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals.
• Some decision tree algorithms produce only binary trees (where each internal node branches to exactly two other nodes), whereas others can produce nonbinary trees.
47. Classification by Decision Tree
• "How are decision trees used for classification?"
• Given a tuple X for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
• Decision trees can easily be converted to classification rules.
48. Classification by Decision Tree
Why are decision tree classifiers so popular?
• The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and therefore is appropriate for exploratory knowledge discovery.
• Decision trees can handle high-dimensional data.
• Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate.
• The learning and classification steps of decision tree induction are simple and fast.
• In general, decision tree classifiers have good accuracy.
49. Classification by Decision Tree
Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
IF age = "<=30" AND student = "no" THEN buys_computer = "no"
IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
IF age = "31…40" THEN buys_computer = "yes"
IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
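The one-rule-per-root-to-leaf-path idea can be sketched as a short recursive walk over a tree stored as nested dicts (a hypothetical representation; the function name `extract_rules` is not from the slides):

```python
def extract_rules(tree, conditions=()):
    """Yield one IF-THEN rule per root-to-leaf path of a nested-dict tree."""
    if not isinstance(tree, dict):  # leaf: emit the accumulated conjunction
        antecedent = " AND ".join(f"{a} = '{v}'" for a, v in conditions)
        yield f"IF {antecedent} THEN buys_computer = '{tree}'"
        return
    (attr, branches), = tree.items()
    for value, subtree in branches.items():
        yield from extract_rules(subtree, conditions + ((attr, value),))

# The buys_computer tree from the slides, as nested dicts (ASCII "31...40").
tree = {"age": {
    "<=30": {"student": {"no": "no", "yes": "yes"}},
    "31...40": "yes",
    ">40": {"credit_rating": {"excellent": "no", "fair": "yes"}},
}}
for rule in extract_rules(tree):
    print(rule)
```

Each printed line matches one of the five example rules above, since every attribute-value pair on a path becomes one conjunct.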
50. Classification by Decision Tree
Training Set and Its AVC Sets
AVC-set on Age:
Age     buys_computer=yes  buys_computer=no
<=30    3                  2
31..40  4                  0
>40     3                  2

AVC-set on income:
income  yes  no
high    2    2
medium  4    2
low     3    1

AVC-set on student:
student  yes  no
yes      6    1
no       3    4

AVC-set on credit_rating:
credit_rating  yes  no
fair           6    2
excellent      3    3

Training examples: the 14 buys_computer tuples shown on slide 38.
51. Any Questions?