UNIT – III: CLASSIFICATION
Topic 5: DECISION TREES
AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
SEMESTER – VIII
PROFESSIONAL ELECTIVE – IV
CS8080 - INFORMATION RETRIEVAL TECHNIQUES
UNIT III: TEXT CLASSIFICATION AND CLUSTERING
1. A Characterization of Text Classification
2. Unsupervised Algorithms: Clustering
3. Naïve Text Classification
4. Supervised Algorithms
5. Decision Tree
6. k-NN Classifier
7. SVM Classifier
8. Feature Selection or Dimensionality Reduction
9. Evaluation Metrics
10. Accuracy and Error
11. Organizing the Classes
12. Indexing and Searching
13. Inverted Indexes
14. Sequential Searching
15. Multi-dimensional Indexing
DECISION TREES
INTRODUCTION TO DECISION TREES
• What is a decision tree?
• A decision tree is a structure that includes a root node, branches, and leaf nodes.
a) Each internal node denotes a test on an attribute,
b) each branch denotes the outcome of a test, and
c) each leaf node holds a class label.
• The topmost node in the tree is the root node.
• ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down recursive divide-and-conquer manner.
• Most algorithms for decision tree induction also follow such a top-down approach, which starts with a training set of tuples and their associated class labels.
• The training set is recursively partitioned into smaller subsets as the tree is being built.
• Decision tree induction is the learning of decision trees from class-labeled training tuples.
• A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (or terminal node) holds a class label.
• The topmost node in a tree is the root node.
• A decision tree is a tree where
  • internal node = a test on an attribute
  • tree branch = an outcome of the test
  • leaf node = class label or class distribution
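The mapping of internal nodes, branches, and leaf nodes can be sketched as a small data structure. This is only an illustration and not part of the original slides: the class names `Leaf` and `Node`, the helper `count_leaves`, and the toy attributes `x1` and `x2` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds a class distribution (class label -> count)."""
    distribution: Dict[str, int]

    @property
    def label(self) -> str:
        # The predicted class is the most frequent one in the distribution.
        return max(self.distribution, key=self.distribution.get)


@dataclass
class Node:
    """Internal node: a test on one attribute, with one branch per outcome."""
    attribute: str
    branches: Dict[str, Union["Node", Leaf]] = field(default_factory=dict)


def count_leaves(tree: Union[Node, Leaf]) -> int:
    """Count the leaf nodes reachable from the given node."""
    if isinstance(tree, Leaf):
        return 1
    return sum(count_leaves(child) for child in tree.branches.values())


# Hypothetical toy tree: the root tests x1; one outcome leads to a test on x2.
toy_tree = Node("x1", {
    "low": Leaf({"A": 3, "B": 1}),
    "high": Node("x2", {"low": Leaf({"B": 4}), "high": Leaf({"A": 2})}),
})
print(count_leaves(toy_tree))
```

Storing a distribution in each leaf covers both cases named above: a single class label is simply the most frequent class of the distribution.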
Benefits of Decision Trees
• The benefits of having a decision tree are as follows:
a) It does not require any domain knowledge.
b) It is easy to comprehend.
c) The learning and classification steps of a decision tree are simple and fast.
Brief History of Decision Trees
• CLS (Hunt et al., 1966): cost-driven
• ID3 (Quinlan, 1986, MLJ): information-driven
• C4.5 (Quinlan, 1993): gain ratio + pruning ideas
• CART (Breiman et al., 1984): Gini index
Elegance of Decision Trees
Structure of Decision Trees
• If x1 > a1 and x2 > a2, then the class is A.
• C4.5 and CART are two of the most widely used decision tree algorithms.
• Easy to interpret, but accuracy is generally unattractive.
[Figure: a tree whose nodes test x1, x2, x3, and x4 (branches labeled > a1 and > a2) and whose leaf nodes are labeled A or B; the root node, internal nodes, and leaf nodes are marked.]
Example of Decision Tree
Another Example of Decision Tree
Decision Tree classification Tasks
Data Mining: Concepts and Techniques (5/16/2022)
Apply Model to Test Data
[Figure, repeated over six slides: the test record is traced through the tree one node at a time.]
Refund?
  Yes → NO
  No → MarSt?
    Married → NO
    Single, Divorced → TaxInc?
      < 80K → NO
      > 80K → YES
Test Data
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?
Path followed: Refund = No → MarSt = Married → leaf NO.
Assign Cheat to “No”
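The walk through the tree can be written as a few nested tests. The sketch below assumes the tree reads Refund = Yes → NO; Refund = No → MarSt; Married → NO; Single or Divorced → TaxInc; < 80K → NO; > 80K → YES. The function name `classify_cheat` and its parameter names are hypothetical, and sending an income of exactly 80K to the "> 80K" branch is an assumption, since the figure does not say.

```python
def classify_cheat(refund: str, marital_status: str, taxable_income_k: float) -> str:
    """Trace one record through the Refund / MarSt / TaxInc tree."""
    if refund == "Yes":
        return "No"
    if marital_status == "Married":
        return "No"
    # Single or Divorced: the taxable-income test decides.
    # Assumption: exactly 80K is treated like the "> 80K" branch.
    return "No" if taxable_income_k < 80 else "Yes"


# The test record from the slide: Refund = No, Married, 80K.
test_record = {"refund": "No", "marital_status": "Married", "taxable_income_k": 80}
prediction = classify_cheat(**test_record)
print("Cheat =", prediction)
```

For the test record the income test is never reached, because the Married branch already ends in a leaf.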
Decision Tree Classification Task
[Figure: Training Set → Tree Induction algorithm → Learn Model (induction) → Model (decision tree) → Apply Model to the Test Set (deduction)]

Training Set
Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set
Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?
Constructing a Decision Tree
• Two phases of decision tree generation:
1. Tree construction
  • At the start, all the training examples are at the root.
  • Partition the examples based on selected attributes.
  • Test attributes are selected based on a heuristic or a statistical measure.
2. Tree pruning
  • Identify and remove branches that reflect noise or outliers.
• Basic step: determine the root node of the tree and the root node of each of its subtrees.
• Most discriminatory feature
  • Every feature can be used to partition the training data.
  • If each partition contains training instances of a single class (a pure partition), then this feature is the most discriminatory.
Constructing a Decision Tree: Example of Partitions
• Categorical feature
  • The number of partitions of the training data is equal to the number of values of this feature.
• Numerical feature
  • Two partitions
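The two kinds of partition can be sketched as follows. The instance numbers, Outlook values, and temperatures are taken from the 14-instance weather dataset used later in these slides, and the threshold of 70 is the one used in the temperature example there. The function names `partition_categorical` and `partition_numerical` are hypothetical.

```python
from collections import defaultdict

# (instance number, outlook, temperature) from the weather dataset in these slides.
INSTANCES = [
    (1, "sunny", 75), (2, "sunny", 80), (3, "sunny", 85), (4, "sunny", 72),
    (5, "sunny", 69), (6, "overcast", 72), (7, "overcast", 83),
    (8, "overcast", 64), (9, "overcast", 81), (10, "rain", 71),
    (11, "rain", 65), (12, "rain", 75), (13, "rain", 68), (14, "rain", 70),
]


def partition_categorical(rows, index):
    """One partition per distinct value of the categorical feature."""
    parts = defaultdict(list)
    for row in rows:
        parts[row[index]].append(row[0])
    return dict(parts)


def partition_numerical(rows, index, threshold):
    """Two partitions: value <= threshold and value > threshold."""
    parts = {"<=": [], ">": []}
    for row in rows:
        key = "<=" if row[index] <= threshold else ">"
        parts[key].append(row[0])
    return parts


by_outlook = partition_categorical(INSTANCES, 1)
by_temperature = partition_numerical(INSTANCES, 2, 70)
print(by_outlook)
print(by_temperature)
```

Outlook has three values and so yields three partitions, while the temperature threshold always yields exactly two.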
Decision Tree Induction Algorithm
• J. Ross Quinlan, a machine learning researcher, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser) in 1980.
• Later, he presented C4.5, the successor of ID3. ID3 and C4.5 adopt a greedy approach.
• In these algorithms there is no backtracking; the trees are constructed in a top-down recursive divide-and-conquer manner.
• Generate_decision_tree generates a decision tree from the training tuples of data partition D.
Algorithm: Generate_decision_tree
• Input:
  • Data partition, D, which is a set of training tuples and their associated class labels.
  • attribute_list, the set of candidate attributes.
  • Attribute_selection_method, a procedure to determine the splitting criterion that best partitions the data tuples into individual classes. This criterion includes a splitting_attribute and either a split point or a splitting subset.
• Output: A decision tree
• Method:
create a node N;
if tuples in D are all of the same class, C, then
    return N as a leaf node labeled with the class C;
if attribute_list is empty then
    return N as a leaf node labeled with the majority class in D; // majority voting
apply Attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
label node N with splitting_criterion;
if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
    attribute_list = attribute_list - splitting_attribute; // remove splitting_attribute
for each outcome j of splitting_criterion
    // partition the tuples and grow subtrees for each partition
    let Dj be the set of data tuples in D satisfying outcome j; // a partition
    if Dj is empty then
        attach a leaf labeled with the majority class in D to node N;
    else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
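The method can be sketched in Python for the case of discrete attributes with multiway splits. Several choices here are assumptions, not part of the slides: information gain is used as the attribute selection method, the tree is stored as nested dicts with class labels as leaves, and all function and variable names are hypothetical. The training tuples are those of the buys_computer dataset that appears later in these slides.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ["age", "income", "student", "credit_rating"]
ROWS = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]
D = [dict(zip(ATTRIBUTES + ["class"], row)) for row in ROWS]


def entropy(tuples):
    """Shannon entropy (in bits) of the class labels of the tuples."""
    counts = Counter(t["class"] for t in tuples)
    return -sum(n / len(tuples) * log2(n / len(tuples)) for n in counts.values())


def majority_class(tuples):
    return Counter(t["class"] for t in tuples).most_common(1)[0][0]


def select_attribute(tuples, attribute_list):
    """Assumed attribute selection method: highest information gain."""
    def gain(attr):
        values = Counter(t[attr] for t in tuples)
        remainder = sum(
            n / len(tuples) * entropy([t for t in tuples if t[attr] == v])
            for v, n in values.items()
        )
        return entropy(tuples) - remainder
    return max(attribute_list, key=gain)


def generate_decision_tree(tuples, attribute_list, domains):
    classes = {t["class"] for t in tuples}
    if len(classes) == 1:            # all tuples belong to the same class
        return classes.pop()
    if not attribute_list:           # no attributes left: majority voting
        return majority_class(tuples)
    best = select_attribute(tuples, attribute_list)
    remaining = [a for a in attribute_list if a != best]
    node = {"attribute": best, "branches": {}}
    for value in domains[best]:      # one branch per outcome of the test
        subset = [t for t in tuples if t[best] == value]
        if not subset:               # empty partition: majority class of D
            node["branches"][value] = majority_class(tuples)
        else:
            node["branches"][value] = generate_decision_tree(subset, remaining, domains)
    return node


DOMAINS = {a: sorted({t[a] for t in D}) for a in ATTRIBUTES}
tree = generate_decision_tree(D, ATTRIBUTES, DOMAINS)
print(tree)
```

Passing the full value domain of each attribute lets the empty-partition case be reached when a value does not occur in a subset; on this dataset it does not arise.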
Constructing Decision Tree Example :- Weather Forecasting
Constructing Decision Tree: A Simple Dataset
The dataset has a total of 14 samples: 9 Play and 5 Don’t.

Instance #  Outlook   Temp  Humidity  Windy  Class
1           Sunny     75    70        true   Play
2           Sunny     80    90        true   Don’t
3           Sunny     85    85        false  Don’t
4           Sunny     72    95        true   Don’t
5           Sunny     69    70        false  Play
6           Overcast  72    90        true   Play
7           Overcast  83    78        false  Play
8           Overcast  64    65        true   Play
9           Overcast  81    75        false  Play
10          Rain      71    80        true   Don’t
11          Rain      65    70        true   Don’t
12          Rain      75    80        false  Play
13          Rain      68    80        false  Play
14          Rain      70    96        false  Play
Constructing Decision Tree: A Simple Dataset
[Figure: decision tree for the dataset, with the number of training instances at each leaf]
outlook?
  sunny → humidity?
    <= 75 → Play (2)
    > 75 → Don’t (3)
  overcast → Play (4)
  rain → windy?
    true → Don’t (2)
    false → Play (3)
Partitioning the 14 training instances on Outlook (P = Play, D = Don’t):
• Outlook = sunny: instances 1, 2, 3, 4, 5 with classes P, D, D, D, P
• Outlook = overcast: instances 6, 7, 8, 9 with classes P, P, P, P
• Outlook = rain: instances 10, 11, 12, 13, 14 with classes D, D, P, P, P
Partitioning the 14 training instances on Temperature (P = Play, D = Don’t):
• Temperature <= 70: instances 5, 8, 11, 13, 14 with classes P, P, D, P, P
• Temperature > 70: instances 1, 2, 3, 4, 6, 7, 9, 10, 12 with classes P, D, D, D, P, P, P, D, P
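One way to compare the two splits is to measure how pure their partitions are. The sketch below uses entropy-based information gain, which is an assumption here: the slides only say that a heuristic or statistical measure is used, and they mention entropy later. The class labels per partition are copied from the two splits above; the function names are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(partitions):
    """Entropy of the pooled labels minus the weighted entropy of the partitions."""
    pooled = [label for part in partitions for label in part]
    remainder = sum(len(part) / len(pooled) * entropy(part) for part in partitions)
    return entropy(pooled) - remainder


# Class labels per partition (P = Play, D = Don't), as listed in the slides.
outlook_split = [list("PDDDP"), list("PPPP"), list("DDPPP")]
temperature_split = [list("PPDPP"), list("PDDDPPPDP")]

gain_outlook = information_gain(outlook_split)
gain_temperature = information_gain(temperature_split)
print(f"gain(Outlook) = {gain_outlook:.3f}")
print(f"gain(Temperature <= 70) = {gain_temperature:.3f}")
```

Under this measure Outlook scores about 0.247 bits against about 0.045 bits for the temperature split, mainly because the overcast partition is pure.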
Constructing Decision Tree Example: Decision on Buying a Computer
The following decision tree is for the concept buys_computer; it indicates whether a customer at a company is likely to buy a computer.
• Each internal node represents a test on an attribute.
• Each leaf node represents a class.
Constructing Decision Tree :- Training Dataset
age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no
This follows an example of Quinlan’s ID3 (Playing Tennis).
Constructing Decision Tree: Output: A Decision Tree for “buys_computer”
age?
  <=30 → student?
    no → no
    yes → yes
  31…40 → yes
  >40 → credit rating?
    excellent → no
    fair → yes
Calculating the entropy of each attribute on the training dataset indicates that the splitting attribute at the root is age.
A Decision Tree for “buys_computer”
• From the training data set, the partition age = youth (<=30) has 2 classes, which are separated by the student attribute.
• Based on majority voting in the student attribute, RID = 3 is grouped under the yes group.
• From the training data set, the partition age = senior (>40) has 2 classes, which are separated by credit rating.
Final Decision Tree
Classification by Decision Tree
• A typical decision tree represents the concept buys_computer; that is, it predicts whether a customer at AllElectronics is likely to purchase a computer.
• Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals.
• Some decision tree algorithms produce only binary trees (where each internal node branches to exactly two other nodes), whereas others can produce nonbinary trees.
• “How are decision trees used for classification?”
• Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
• A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
• Decision trees can easily be converted to classification rules.
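Tracing a path from the root to a leaf can be sketched as a short loop. The tree below is a hand-written nested-dict copy of the buys_computer tree, with leaf labels taken from the training dataset; the representation, the function name `classify`, and the example tuple `X` are hypothetical.

```python
# Hand-written copy of the buys_computer tree; leaves are plain class labels.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}


def classify(tree, x):
    """Follow the branch matching each tested attribute value until a leaf is reached."""
    path = []
    node = tree
    while isinstance(node, dict):
        value = x[node["attribute"]]
        path.append(f'{node["attribute"]} = {value}')
        node = node["branches"][value]
    return node, path


# Hypothetical tuple whose class label is unknown.
X = {"age": "<=30", "income": "medium", "student": "yes", "credit_rating": "fair"}
label, path = classify(TREE, X)
print(" -> ".join(path), "=>", label)
```

Only the attributes on the path are tested: income and credit_rating are never examined for this tuple.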
Why are decision tree classifiers so popular?
• The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and therefore is appropriate for exploratory knowledge discovery.
• Decision trees can handle high-dimensional data.
• Their representation of acquired knowledge in tree form is intuitive and generally easy for humans to assimilate.
• The learning and classification steps of decision tree induction are simple and fast.
• In general, decision tree classifiers have good accuracy.
Extracting Classification Rules from Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
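Rule extraction can be sketched as a recursive walk that collects one conjunction per root-to-leaf path. The nested-dict tree and the function name `extract_rules` are hypothetical; the leaf labels follow the buys_computer training dataset, in which customers over 40 with an excellent credit rating do not buy and those with a fair rating do.

```python
# Hand-written copy of the buys_computer tree; leaves are plain class labels.
TREE = {
    "attribute": "age",
    "branches": {
        "<=30": {"attribute": "student", "branches": {"no": "no", "yes": "yes"}},
        "31..40": "yes",
        ">40": {
            "attribute": "credit_rating",
            "branches": {"excellent": "no", "fair": "yes"},
        },
    },
}


def extract_rules(node, conditions=()):
    """Yield one IF-THEN rule per path from the root to a leaf."""
    if not isinstance(node, dict):  # leaf: emit the accumulated conjunction
        antecedent = " AND ".join(f'{a} = "{v}"' for a, v in conditions)
        yield f'IF {antecedent} THEN buys_computer = "{node}"'
        return
    for value, child in node["branches"].items():
        yield from extract_rules(child, conditions + ((node["attribute"], value),))


rules = list(extract_rules(TREE))
for rule in rules:
    print(rule)
```

The tree has five leaves, so the walk produces five rules, one per path.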
Training Set and Its AVC Sets
An AVC-set (Attribute-Value, Class label) lists, for each value of one attribute, the number of training tuples in each class.
Training examples: the 14-tuple buys_computer training dataset given above.

AVC-set on age (counts of buys_computer)
age     yes  no
<=30    2    3
31…40   4    0
>40     3    2

AVC-set on income (counts of buys_computer)
income  yes  no
high    2    2
medium  4    2
low     3    1

AVC-set on student (counts of buys_computer)
student  yes  no
yes      6    1
no       3    4

AVC-set on credit_rating (counts of buys_computer)
credit_rating  yes  no
fair           6    2
excellent      3    3
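The AVC-sets can be recomputed from the training examples by counting (attribute value, class label) pairs. This is a minimal sketch; the function name `avc_set` and the tuple layout are hypothetical, and the rows are the buys_computer training dataset from these slides.

```python
from collections import Counter

ATTRIBUTES = ["age", "income", "student", "credit_rating"]
ROWS = [  # (age, income, student, credit_rating, buys_computer)
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def avc_set(rows, attribute_index):
    """Count (attribute value, class label) pairs for one attribute."""
    return Counter((row[attribute_index], row[-1]) for row in rows)


avc_sets = {name: avc_set(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
for name, counts in avc_sets.items():
    print(name, dict(counts))
```

A `Counter` returns 0 for pairs that never occur, which matches the zero entry for age 31..40 with class no.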
Any Questions?
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Bookingroncy bisnoi
ย 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
ย 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .DerechoLaboralIndivi
ย 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLManishPatel169454
ย 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
ย 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
ย 

Recently uploaded (20)

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
ย 
AKTU Computer Networks notes --- Unit 3.pdf
AKTU Computer Networks notes ---  Unit 3.pdfAKTU Computer Networks notes ---  Unit 3.pdf
AKTU Computer Networks notes --- Unit 3.pdf
ย 
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
The Most Attractive Pune Call Girls Budhwar Peth 8250192130 Will You Miss Thi...
ย 
Online banking management system project.pdf
Online banking management system project.pdfOnline banking management system project.pdf
Online banking management system project.pdf
ย 
Glass Ceramics: Processing and Properties
Glass Ceramics: Processing and PropertiesGlass Ceramics: Processing and Properties
Glass Ceramics: Processing and Properties
ย 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
ย 
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort ServiceCall Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
Call Girls in Ramesh Nagar Delhi ๐Ÿ’ฏ Call Us ๐Ÿ”9953056974 ๐Ÿ” Escort Service
ย 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
ย 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
ย 
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
Call Girls Pimpri Chinchwad Call Me 7737669865 Budget Friendly No Advance Boo...
ย 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
ย 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
ย 
Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar โ‰ผ๐Ÿ” Delhi door step de...
Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar  โ‰ผ๐Ÿ” Delhi door step de...Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar  โ‰ผ๐Ÿ” Delhi door step de...
Call Now โ‰ฝ 9953056974 โ‰ผ๐Ÿ” Call Girls In New Ashok Nagar โ‰ผ๐Ÿ” Delhi door step de...
ย 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
ย 
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Walvekar Nagar Call Me 7737669865 Budget Friendly No Advance Booking
ย 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
ย 
NFPA 5000 2024 standard .
NFPA 5000 2024 standard                                  .NFPA 5000 2024 standard                                  .
NFPA 5000 2024 standard .
ย 
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELLPVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
PVC VS. FIBERGLASS (FRP) GRAVITY SEWER - UNI BELL
ย 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
ย 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
ย 

CS8080_IRT_UNIT - III T5 DECISION TREES.pdf

  • 1. P1WU UNIT – III: CLASSIFICATION Topic 5: DECISION TREES AALIM MUHAMMED SALEGH COLLEGE OF ENGINEERING DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING SEMESTER – VIII PROFESSIONAL ELECTIVE – IV CS8080 - INFORMATION RETRIEVAL TECHNIQUES
  • 2. UNIT III: TEXT CLASSIFICATION AND CLUSTERING
      1. A Characterization of Text Classification
      2. Unsupervised Algorithms: Clustering
      3. Naïve Text Classification
      4. Supervised Algorithms
      5. Decision Tree
      6. k-NN Classifier
      7. SVM Classifier
      8. Feature Selection or Dimensionality Reduction
      9. Evaluation metrics
      10. Accuracy and Error
      11. Organizing the classes
      12. Indexing and Searching
      13. Inverted Indexes
      14. Sequential Searching
      15. Multi-dimensional Indexing
  • 3. DECISION TREES
  • 4. INTRODUCTION TO DECISION TREES
      • What is a decision tree?
      • A decision tree is a structure that includes a root node, branches, and leaf nodes.
          a) Each internal node denotes a test on an attribute,
          b) each branch denotes the outcome of a test, and
          c) each leaf node holds a class label.
      • The topmost node in the tree is the root node.
  • 5. INTRODUCTION TO DECISION TREES
      • ID3, C4.5, and CART adopt a greedy (i.e., non-backtracking) approach in which decision trees are constructed in a top-down recursive divide-and-conquer manner.
      • Most algorithms for decision tree induction also follow such a top-down approach, which starts with a training set of tuples and their associated class labels.
      • The training set is recursively partitioned into smaller subsets as the tree is being built.
  • 6. INTRODUCTION TO DECISION TREES
      • Decision tree induction is the learning of decision trees from class-labeled training tuples.
      • A decision tree is a flowchart-like tree structure, where each internal node (nonleaf node) denotes a test on an attribute,
      • each branch represents an outcome of the test, and
      • each leaf node (or terminal node) holds a class label.
      • The topmost node in a tree is the root node.
  • 7. INTRODUCTION TO DECISION TREES
      • A decision tree is a tree where
          • internal node = a test on an attribute
          • tree branch = an outcome of the test
          • leaf node = class label or class distribution
  • 8. Benefits of Decision Trees
      • The benefits of having a decision tree are as follows:
          a) It does not require any domain knowledge.
          b) It is easy to comprehend.
          c) The learning and classification steps of a decision tree are simple and fast.
  • 9. Brief History of Decision Trees
      • CLS (Hunt et al., 1966): cost-driven
      • ID3 (Quinlan, 1986, MLJ): information-driven
      • C4.5 (Quinlan, 1993): gain ratio + pruning ideas
      • CART (Breiman et al., 1984): Gini index
  • 10. Elegance of Decision Trees
  • 11. Structure of Decision Trees
      • If x1 > a1 & x2 > a2, then it's class A
      • C4.5 and CART are two of the most widely used algorithms
      • Easy interpretation, but accuracy generally unattractive
      [Figure: tree with a root node, internal nodes, and leaf nodes; tests on x1, x2, x3, x4 with thresholds a1 and a2; leaves labeled A or B]
  • 12. Example of Decision Tree
  • 13. Another Example of Decision Tree
  • 14. Decision Tree Classification Tasks
  • 15. Apply Model to Test Data (slide footer: 5/16/2022, Data Mining: Concepts and Techniques)
      [Figure: decision tree with test nodes Refund (Yes / No), MarSt (Married / Single, Divorced), and TaxInc (< 80K / > 80K); leaves labeled YES or NO]
      Test Data: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?
      Assign Cheat to “No”
  • 16. Apply Model to Test Data (same tree and test record as slide 15)
  • 17. Apply Model to Test Data (same tree and test record as slide 15)
  • 18. Apply Model to Test Data (same tree and test record as slide 15)
  • 19. Apply Model to Test Data (same tree and test record as slide 15)
  • 20. Apply Model to Test Data (same tree and test record as slide 15)
      Assign Cheat to “No”
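The traversal shown on slides 15 to 20 can be sketched in Python. This is a minimal sketch, not the slide author's code: the wiring of the tree (Refund at the root, MarSt under Refund = No, TaxInc under Single/Divorced) is an assumption reconstructed from the flattened figure, the `Node` class and `classify` function are hypothetical names, and sending exactly 80K to the upper branch is also an assumption, because the slide only labels the branches "< 80K" and "> 80K".

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple, Union


@dataclass
class Node:
    """Internal node: tests one attribute and picks a branch label."""
    attribute: str
    branch_of: Callable[[object], str]       # attribute value -> branch label
    children: Dict[str, Union["Node", str]]  # branch label -> subtree or class label


# Assumed structure of the tree in the figure (leaf values are the Cheat class).
tax_inc = Node(
    "TaxInc",
    # Assumption: a value of exactly 80 (thousand) follows the upper branch.
    lambda k: "< 80K" if k < 80 else ">= 80K",
    {"< 80K": "No", ">= 80K": "Yes"},
)
mar_st = Node(
    "MarSt",
    lambda v: "Married" if v == "Married" else "Single, Divorced",
    {"Married": "No", "Single, Divorced": tax_inc},
)
refund = Node("Refund", lambda v: v, {"Yes": "No", "No": mar_st})


def classify(node: Union[Node, str], record: dict) -> Tuple[str, List[str]]:
    """Walk from the root to a leaf, recording each test outcome on the way."""
    path = []
    while isinstance(node, Node):
        branch = node.branch_of(record[node.attribute])
        path.append(f"{node.attribute} = {branch}")
        node = node.children[branch]
    return node, path


# Test record from the slides: Refund = No, Married, Taxable Income = 80K.
record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
label, path = classify(refund, record)
print(path, "=> Cheat =", label)
```

Under these assumptions the record reaches a leaf after two tests (Refund, then MarSt), so the income test is never evaluated for it.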
  • 21. Decision Tree Classification Task
      Training Set (input to the tree induction algorithm):
      Tid  Attrib1  Attrib2  Attrib3  Class
      1    Yes      Large    125K     No
      2    No       Medium   100K     No
      3    No       Small    70K      No
      4    Yes      Medium   120K     No
      5    No       Large    95K      Yes
      6    No       Medium   60K      No
      7    Yes      Large    220K     No
      8    No       Small    85K      Yes
      9    No       Medium   75K      No
      10   No       Small    90K      Yes
      Test Set (the model is applied to these tuples):
      Tid  Attrib1  Attrib2  Attrib3  Class
      11   No       Small    55K      ?
      12   Yes      Medium   80K      ?
      13   Yes      Large    110K     ?
      14   No       Small    95K      ?
      15   No       Large    67K      ?
      [Figure: Training Set, Tree Induction algorithm, Learn Model (Induction), Model (Decision Tree), Apply Model (Deduction), Test Set]
  • 22. Constructing a Decision Tree
  • 23. Constructing a Decision Tree
      • Two phases of decision tree generation:
          1. Tree construction
              • at the start, all the training examples are at the root
              • partition examples based on selected attributes
              • test attributes are selected based on a heuristic or a statistical measure
          2. Tree pruning
              • identify and remove branches that reflect noise or outliers
  • 24. Constructing a Decision Tree
      • Basic step: determination of the root node of the tree and the root node of its sub-trees
      • Most discriminatory feature
          • Every feature can be used to partition the training data
          • If the partitions contain a pure class of training instances, then this feature is most discriminatory
  • 25. Constructing a Decision Tree: Example of Partitions
      • Categorical feature
          • Number of partitions of the training data is equal to the number of values of this feature
      • Numerical feature
          • Two partitions
  • 26. Decision Tree Induction Algorithm
      • Around 1980, J. Ross Quinlan, a machine learning researcher, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser).
      • Later, he presented C4.5, which was the successor of ID3. ID3 and C4.5 adopt a greedy approach.
      • In this algorithm, there is no backtracking; the trees are constructed in a top-down recursive divide-and-conquer manner.
      • Generating a decision tree from the training tuples of data partition D
  • 27. Algorithm: Generate_decision_tree
      • Input:
          • Data partition, D, which is a set of training tuples and their associated class labels.
          • attribute_list, the set of candidate attributes.
          • Attribute_selection_method, a procedure to determine the splitting criterion that best partitions the data tuples into individual classes. This criterion includes a splitting_attribute and either a splitting point or a splitting subset.
      • Output: a decision tree
  • 28. Algorithm: Generate_decision_tree
      Method:
       1) create a node N;
       2) if tuples in D are all of the same class C then
       3)     return N as a leaf node labeled with class C;
       4) if attribute_list is empty then
       5)     return N as a leaf node labeled with the majority class in D; // majority voting
       6) apply attribute_selection_method(D, attribute_list) to find the best splitting_criterion;
       7) label node N with splitting_criterion;
       8) if splitting_attribute is discrete-valued and multiway splits allowed then // not restricted to binary trees
       9)     attribute_list = attribute_list - splitting_attribute; // remove splitting_attribute
      10) for each outcome j of splitting_criterion
      11)     // partition the tuples and grow subtrees for each partition
      12)     let Dj be the set of data tuples in D satisfying outcome j; // a partition
      13)     if Dj is empty then
      14)         attach a leaf labeled with the majority class in D to node N;
              else attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
          end for
      15) return N;
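The Generate_decision_tree method can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the deck's implementation: attribute selection uses entropy-based information gain (one possible attribute_selection_method), only discrete attributes with multiway splits are handled, and the names `entropy`, `info_gain`, `generate_decision_tree`, and `domains` are hypothetical. The data is the weather table from the later "A Simple Dataset" slide, with humidity discretized at 75 as in that slide's tree and the temperature column left out.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def info_gain(rows, labels, attr):
    """Entropy of the labels minus the weighted entropy after splitting on attr."""
    n = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [l for r, l in zip(rows, labels) if r[attr] == value]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def generate_decision_tree(rows, labels, attribute_list, domains):
    """Return a class label (leaf) or an (attribute, {outcome: subtree}) pair."""
    if len(set(labels)) == 1:                      # steps 2-3: pure partition
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not attribute_list:                         # steps 4-5: majority voting
        return majority
    # Steps 6-7: assumed selection method is highest information gain.
    best = max(attribute_list, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attribute_list if a != best]   # step 9
    children = {}
    for outcome in domains[best]:                  # steps 10-14
        part = [(r, l) for r, l in zip(rows, labels) if r[best] == outcome]
        if not part:                               # empty partition: majority leaf
            children[outcome] = majority
        else:
            sub_rows, sub_labels = (list(x) for x in zip(*part))
            children[outcome] = generate_decision_tree(
                sub_rows, sub_labels, remaining, domains)
    return best, children                          # step 15


# (outlook, humidity, windy, class) for instances 1..14 of the weather slide.
RAW = [
    ("sunny", 70, "true", "Play"), ("sunny", 90, "true", "Don't"),
    ("sunny", 85, "false", "Don't"), ("sunny", 95, "true", "Don't"),
    ("sunny", 70, "false", "Play"), ("overcast", 90, "true", "Play"),
    ("overcast", 78, "false", "Play"), ("overcast", 65, "true", "Play"),
    ("overcast", 75, "false", "Play"), ("rain", 80, "true", "Don't"),
    ("rain", 70, "true", "Don't"), ("rain", 80, "false", "Play"),
    ("rain", 80, "false", "Play"), ("rain", 96, "false", "Play"),
]
rows = [{"outlook": o, "humidity": "<=75" if h <= 75 else ">75", "windy": w}
        for o, h, w, _ in RAW]
labels = [c for *_, c in RAW]
attributes = ["outlook", "humidity", "windy"]
domains = {a: sorted({r[a] for r in rows}) for a in attributes}

tree = generate_decision_tree(rows, labels, attributes, domains)
print(tree)
```

With information gain as the selection method, the sketch picks outlook at the root, humidity under sunny, and windy under rain. A different selection measure (gain ratio, Gini index) could be swapped in by replacing the `max(...)` line.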
  • 29. Constructing Decision Tree Example :- Weather Forecasting
  • 30. Constructing Decision Tree :- A Simple Dataset
      9 Play samples and 5 Don't samples, a total of 14.
  • 31. Constructing Decision Tree :- A Simple Dataset
      Instance #  Outlook   Temp  Humidity  Windy  Class
      1           Sunny     75    70        true   Play
      2           Sunny     80    90        true   Don't
      3           Sunny     85    85        false  Don't
      4           Sunny     72    95        true   Don't
      5           Sunny     69    70        false  Play
      6           Overcast  72    90        true   Play
      7           Overcast  83    78        false  Play
      8           Overcast  64    65        true   Play
      9           Overcast  81    75        false  Play
      10          Rain      71    80        true   Don't
      11          Rain      65    70        true   Don't
      12          Rain      75    80        false  Play
      13          Rain      68    80        false  Play
      14          Rain      70    96        false  Play
  • 32. Constructing Decision Tree :- A Simple Dataset
      [Figure: decision tree with root outlook and branches sunny, overcast, rain; one subtree tests humidity (<= 75 / > 75) and another tests windy (false / true); leaves are labeled Play or Don't, with instance counts 2, 2, 4, 3, 3]
  • 33. Partitioning the 14 training instances on Outlook
      Outlook = sunny:    instances 1, 2, 3, 4, 5       classes P, D, D, D, P
      Outlook = overcast: instances 6, 7, 8, 9          classes P, P, P, P
      Outlook = rain:     instances 10, 11, 12, 13, 14  classes D, D, P, P, P
  • 34. Partitioning the 14 training instances on Temperature
      Temperature <= 70: instances 5, 8, 11, 13, 14            classes P, P, D, P, P
      Temperature > 70:  instances 1, 2, 3, 4, 6, 7, 9, 10, 12  classes P, D, D, D, P, P, P, D, P
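The two kinds of partition above (one branch per value for a categorical feature, two branches around a threshold for a numerical feature) can be reproduced with a short Python sketch. The helper names `partition` and `is_pure` are hypothetical; the data is the weather table from the dataset slide, reduced to the columns needed here.

```python
# (instance #, outlook, temperature, class) from the weather dataset slide.
DATA = [
    (1, "sunny", 75, "Play"), (2, "sunny", 80, "Don't"), (3, "sunny", 85, "Don't"),
    (4, "sunny", 72, "Don't"), (5, "sunny", 69, "Play"), (6, "overcast", 72, "Play"),
    (7, "overcast", 83, "Play"), (8, "overcast", 64, "Play"), (9, "overcast", 81, "Play"),
    (10, "rain", 71, "Don't"), (11, "rain", 65, "Don't"), (12, "rain", 75, "Play"),
    (13, "rain", 68, "Play"), (14, "rain", 70, "Play"),
]


def partition(data, branch_of):
    """Group rows by the branch label that branch_of assigns to each row."""
    parts = {}
    for row in data:
        parts.setdefault(branch_of(row), []).append(row)
    return parts


def is_pure(rows):
    """A partition is pure when all of its rows share one class."""
    return len({row[-1] for row in rows}) == 1


# Categorical feature: one partition per distinct value.
by_outlook = partition(DATA, lambda row: row[1])
# Numerical feature: two partitions around a threshold (70, as on the slide).
by_temp = partition(DATA, lambda row: "<=70" if row[2] <= 70 else ">70")

for name, parts in (("Outlook", by_outlook), ("Temperature", by_temp)):
    for branch, rows in parts.items():
        ids = [row[0] for row in rows]
        classes = [row[-1][0] for row in rows]  # 'P' for Play, 'D' for Don't
        print(f"{name} = {branch}: {ids} {classes} pure={is_pure(rows)}")
```

Only the overcast partition is pure, which is one informal way to see why Outlook is a more discriminatory first test than the temperature threshold.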
  • 35. Constructing Decision Tree :- A Simple Dataset
      [Figure: the same decision tree as on slide 32, with root outlook and subtrees testing humidity and windy]
  • 36. Constructing Decision Tree Example :- Decision on Buying a Computer / customer likely to purchase a computer
  • 37. Constructing Decision Tree Example :- Decision on Buying a Computer
      The following decision tree is for the concept buy_computer, which indicates whether a customer at a company is likely to buy a computer or not.
      • Each internal node represents a test on an attribute.
      • Each leaf node represents a class.
  • 38. Constructing Decision Tree :- Training Dataset
      age     income  student  credit_rating  buys_computer
      <=30    high    no       fair           no
      <=30    high    no       excellent      no
      31…40   high    no       fair           yes
      >40     medium  no       fair           yes
      >40     low     yes      fair           yes
      >40     low     yes      excellent      no
      31…40   low     yes      excellent      yes
      <=30    medium  no       fair           no
      <=30    low     yes      fair           yes
      >40     medium  yes      fair           yes
      <=30    medium  yes      excellent      yes
      31…40   medium  no       excellent      yes
      31…40   high    yes      fair           yes
      >40     medium  no       excellent      no
      This follows an example of Quinlan's ID3 (Playing Tennis).
  • 39. Constructing Decision Tree :- Output: A Decision Tree for “buys_computer”
      [Figure: decision tree with root age? and branches <=30, 31..40, >40; the <=30 branch tests student? (no / yes), the 31..40 branch is a yes leaf, and the >40 branch tests credit rating? (fair / excellent); leaves are labeled yes or no]
      From the training dataset, calculate the entropy value, which indicates that the splitting attribute is: age
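The entropy calculation behind the choice of age can be sketched as follows. The slide does not show the computation, so this assumes the usual ID3 information-gain measure; `entropy`, `information_gain`, and `CLASS_COUNTS` are hypothetical names, and the (yes, no) counts were tallied by hand from the 14 training tuples.

```python
import math


def entropy(counts):
    """Entropy in bits of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)


# (buys_computer = yes, buys_computer = no) per attribute value,
# tallied from the training dataset slide.
CLASS_COUNTS = {
    "age": {"<=30": (2, 3), "31..40": (4, 0), ">40": (3, 2)},
    "income": {"high": (2, 2), "medium": (4, 2), "low": (3, 1)},
    "student": {"yes": (6, 1), "no": (3, 4)},
    "credit_rating": {"fair": (6, 2), "excellent": (3, 3)},
}


def information_gain(partitions):
    """Entropy before the split minus the weighted entropy of the partitions."""
    totals = [sum(col) for col in zip(*partitions.values())]  # overall (yes, no)
    n = sum(totals)
    remainder = sum(sum(c) / n * entropy(c) for c in partitions.values())
    return entropy(totals) - remainder


gains = {attr: information_gain(parts) for attr, parts in CLASS_COUNTS.items()}
best_attribute = max(gains, key=gains.get)

for attr, gain in sorted(gains.items(), key=lambda kv: -kv[1]):
    print(f"Gain({attr}) = {gain:.3f}")
print("splitting attribute:", best_attribute)
```

Under this measure age has the largest gain, mainly because its 31..40 partition is pure (four yes, zero no) and contributes no entropy to the weighted sum.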
  • 40. A Decision Tree for “buys_computer”
  • 41. A Decision Tree for “buys_computer”
      From the training data set, age = youth (<=30) has 2 classes based on the student attribute.
  • 42. A Decision Tree for “buys_computer”
      Based on majority voting in the student attribute, RID = 3 is grouped under the yes group.
  • 43. A Decision Tree for “buys_computer”
      From the training data set, age = senior (>40) has 2 classes based on credit rating.
  • 44. A Decision Tree for “buys_computer”
      Final Decision Tree
  • 45. Classification by Decision Tree
  • 46. Classification by Decision Tree
      • A typical decision tree that represents the concept buys_computer, that is, it predicts whether a customer at AllElectronics is likely to purchase a computer.
      • Internal nodes are denoted by rectangles, and leaf nodes are denoted by ovals.
      • Some decision tree algorithms produce only binary trees (where each internal node branches to exactly two other nodes), whereas others can produce nonbinary trees.
  • 47. Classification by Decision Tree
      • “How are decision trees used for classification?”
      • Given a tuple, X, for which the associated class label is unknown, the attribute values of the tuple are tested against the decision tree.
      • A path is traced from the root to a leaf node, which holds the class prediction for that tuple.
      • Decision trees can easily be converted to classification rules.
  • 48. Classification by Decision Tree
      Why are decision tree classifiers so popular?
      • The construction of decision tree classifiers does not require any domain knowledge or parameter setting, and therefore is appropriate for exploratory knowledge discovery.
      • Decision trees can handle high-dimensional data.
      • Their representation of acquired knowledge in tree form is intuitive and generally easy to assimilate by humans.
      • The learning and classification steps of decision tree induction are simple and fast.
      • In general, decision tree classifiers have good accuracy.
  • 49. Classification by Decision Tree: Extracting Classification Rules from Trees
      • Represent the knowledge in the form of IF-THEN rules
      • One rule is created for each path from the root to a leaf
      • Each attribute-value pair along a path forms a conjunction
      • The leaf node holds the class prediction
      • Rules are easier for humans to understand
      • Example (the last two rules cover age = “>40” and follow the training tuples, where excellent credit gives “no” and fair credit gives “yes”):
          IF age = “<=30” AND student = “no” THEN buys_computer = “no”
          IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
          IF age = “31…40” THEN buys_computer = “yes”
          IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
          IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
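Rule extraction, one rule per root-to-leaf path, can be sketched in Python. The nested-tuple tree encoding and the function name `extract_rules` are hypothetical, and the leaf labels under age = ">40" are an assumption taken from the training tuples on the dataset slide (fair credit gives yes, excellent credit gives no).

```python
# A tree is either a class label (str) or (attribute, {value: subtree}).
TREE = ("age", {
    "<=30": ("student", {"no": "no", "yes": "yes"}),
    "31..40": "yes",
    # Assumed from the training tuples: >40 with fair credit buys, excellent does not.
    ">40": ("credit_rating", {"excellent": "no", "fair": "yes"}),
})


def extract_rules(node, conditions=()):
    """Return one IF-THEN rule string for every path from node down to a leaf."""
    if isinstance(node, str):  # leaf: emit the conjunction collected so far
        body = " AND ".join(f'{attr} = "{value}"' for attr, value in conditions)
        return [f'IF {body} THEN buys_computer = "{node}"']
    attribute, children = node
    rules = []
    for value, child in children.items():
        # Each attribute-value pair on the path adds one conjunct.
        rules.extend(extract_rules(child, conditions + ((attribute, value),)))
    return rules


rules = extract_rules(TREE)
for rule in rules:
    print(rule)
```

Because the tree has five leaves, the sketch yields five rules, and the rules are mutually exclusive since each tuple follows exactly one path.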
  • 50. Classification by Decision Tree: Training Set and Its AVC Sets
      Training Examples:
      age     income  student  credit_rating  buys_computer
      <=30    high    no       fair           no
      <=30    high    no       excellent      no
      31…40   high    no       fair           yes
      >40     medium  no       fair           yes
      >40     low     yes      fair           yes
      >40     low     yes      excellent      no
      31…40   low     yes      excellent      yes
      <=30    medium  no       fair           no
      <=30    low     yes      fair           yes
      >40     medium  yes      fair           yes
      <=30    medium  yes      excellent      yes
      31…40   medium  no       excellent      yes
      31…40   high    yes      fair           yes
      >40     medium  no       excellent      no
      AVC-set on Age (counts follow the training examples above):
      Age     Buy_Computer = yes  Buy_Computer = no
      <=30    2                   3
      31..40  4                   0
      >40     3                   2
      AVC-set on income:
      income  Buy_Computer = yes  Buy_Computer = no
      high    2                   2
      medium  4                   2
      low     3                   1
      AVC-set on Student:
      student  Buy_Computer = yes  Buy_Computer = no
      yes      6                   1
      no       3                   4
      AVC-set on credit_rating:
      Credit rating  Buy_Computer = yes  Buy_Computer = no
      fair           6                   2
      excellent      3                   3
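An AVC-set (attribute, value, class label counts) can be built in one pass over the training tuples. The sketch below is illustrative only: `avc_set`, `ROWS`, and `ATTRIBUTES` are hypothetical names, and the counts are computed directly from the 14 training examples rather than copied from the slide.

```python
from collections import Counter

ATTRIBUTES = ("age", "income", "student", "credit_rating")

# (age, income, student, credit_rating, buys_computer) for the 14 training examples.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def avc_set(rows, attr_index):
    """Count (attribute value, class label) pairs for one attribute."""
    return Counter((row[attr_index], row[-1]) for row in rows)


avc = {name: avc_set(ROWS, i) for i, name in enumerate(ATTRIBUTES)}

for i, name in enumerate(ATTRIBUTES):
    print(f"AVC-set on {name}")
    # dict.fromkeys keeps the values in first-seen order without duplicates.
    for value in dict.fromkeys(row[i] for row in ROWS):
        yes, no = avc[name][(value, "yes")], avc[name][(value, "no")]
        print(f"  {value:<10} yes={yes} no={no}")
```

These per-attribute counts are all that an entropy or Gini calculation needs, so a split can be evaluated from the AVC-sets without revisiting the individual tuples.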
  • 51. Any Questions?