Classification - Decision Tree

Classification by Decision Tree Induction
• Decision tree
– A flow-chart-like tree structure
– Internal node denotes a test on an attribute
– Branch represents an outcome of the test
– Leaf nodes represent class labels or class distribution
– The topmost node in the tree is the root node.
• Decision tree generation consists of two phases
– Tree construction
• At start, all the training examples are at the root
• Partition examples recursively based on selected attributes
– Tree pruning
• Identify and remove branches that reflect noise or outliers
• Use of decision tree: Classifying an unknown sample
– Test the attribute values of the sample against the decision tree
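To make the two phases concrete, here is a minimal sketch using scikit-learn (not part of the original slides; the tiny encoded dataset and column meanings are illustrative assumptions): `fit` plays the role of tree construction, `predict` the role of classifying an unknown sample.

```python
# Minimal sketch of the two phases: tree construction (fit) and
# classifying an unknown sample (predict). Assumes scikit-learn is installed;
# the tiny weather-style dataset below is illustrative only.
from sklearn.tree import DecisionTreeClassifier

# Encoded training examples: [outlook_code, humidity_code]
X_train = [[0, 0], [0, 1], [1, 0], [2, 1], [2, 0]]
y_train = ["No", "Yes", "Yes", "Yes", "No"]

# Phase 1: tree construction - recursively partition the training examples.
clf = DecisionTreeClassifier(criterion="entropy")
clf.fit(X_train, y_train)

# Phase 2: classify an unknown sample by testing its attribute values
# against the learned tree.
print(clf.predict([[0, 0]]))  # e.g. ['No']
```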
Decision Tree for PlayTennis
Outlook (root): Sunny / Overcast / Rain; the Sunny branch tests Humidity (High → No, Normal → Yes).
Each internal node tests an attribute.
Each branch corresponds to an attribute value.
Each leaf node assigns a classification.
(1) Which attribute do we start with (the root)?
(2) Which node do we proceed to next?
(3) When do we stop and come to a conclusion?
Decision trees classify instances or examples by starting at the root of the tree
and moving through it until a leaf node is reached.
Decision Tree for Conjunction
Outlook = Sunny → Wind (Strong → No, Weak → Yes)
Outlook = Overcast → No
Outlook = Rain → No
Target concept: Outlook=Sunny ∧ Wind=Weak
Decision Tree for Disjunction
Outlook = Sunny → Yes
Outlook = Overcast → Wind (Strong → No, Weak → Yes)
Outlook = Rain → Wind (Strong → No, Weak → Yes)
Target concept: Outlook=Sunny ∨ Wind=Weak
Decision Tree for XOR
Outlook = Sunny → Wind (Strong → Yes, Weak → No)
Outlook = Overcast → Wind (Strong → No, Weak → Yes)
Outlook = Rain → Wind (Strong → No, Weak → Yes)
Target concept: Outlook=Sunny XOR Wind=Weak
Outlook = Sunny → Humidity (High → No, Normal → Yes)
Outlook = Overcast → Yes
Outlook = Rain → Wind (Strong → No, Weak → Yes)
• Decision trees represent disjunctions of conjunctions:
(Outlook=Sunny ∧ Humidity=Normal)
∨ (Outlook=Overcast)
∨ (Outlook=Rain ∧ Wind=Weak)
Decision Tree
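The same tree can be written out as nested conditionals, making the disjunction-of-conjunctions reading explicit; the sketch below is a hand-coded illustration, not part of the slides.

```python
def play_tennis(outlook: str, humidity: str, wind: str) -> str:
    """Hand-coded version of the PlayTennis tree shown above."""
    if outlook == "Sunny":
        # (Outlook=Sunny AND Humidity=Normal) -> Yes
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        # (Outlook=Overcast) -> Yes
        return "Yes"
    # Outlook == "Rain"
    # (Outlook=Rain AND Wind=Weak) -> Yes
    return "Yes" if wind == "Weak" else "No"

print(play_tennis("Rain", "High", "Weak"))  # Yes
```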
When to consider Decision Trees
• Instances describable by attribute-value pairs
• Target function is discrete valued
• Disjunctive hypothesis may be required
• Possibly noisy training data
• Missing attribute values
• Examples:
– Medical diagnosis
– Credit risk analysis
– Object classification for robot manipulator (Tan
1993)
A simple example
• You want to guess the outcome of next week's game
between the MallRats and the Chinooks.
• Available knowledge / attributes:
– Was the game at Home or Away?
– Was the starting time 5pm, 7pm, or 9pm?
– Did Joe play center or forward?
– Was the opponent's center tall or not?
– …
Basketball data
What do we know?
• The game will be away, at 9pm, and Joe will play
center on offense…
• A classification problem
• Generalizing the learned rule to new examples
• What you don't know, of course, is who will win this game.
• Of course, it is reasonable to assume that this future game will
resemble the past games. Note, however, that there are no previous games
that match these specific values; i.e., no previous game was exactly
[Where=Away, When=9pm, FredStarts=No, JoeOffense=Center,
JoeDefends=Forward, OppC=Tall].
We therefore need to generalize, using the known examples to infer
the likely outcome of this new situation. But how?
Use a Decision Tree to determine who should win the game
Since we did not indicate the outcome of this game, we call it an
"unlabeled instance"; the goal of a classifier is to find the class label for
such unlabeled instances.
An instance that also includes the outcome is called a "labeled instance"
(e.g., the first row of the table corresponds to a labeled instance).
Decision Trees
In general, a decision tree is a tree structure; see the left-hand
figure below.
Example of a Decision Tree
Training Data:
Tid  Refund  Marital Status  Taxable Income  Cheat
1    Yes     Single          125K            No
2    No      Married         100K            No
3    No      Single          70K             No
4    Yes     Married         120K            No
5    No      Divorced        95K             Yes
6    No      Married         60K             No
7    Yes     Divorced        220K            No
8    No      Single          85K             Yes
9    No      Married         75K             No
10   No      Single          90K             Yes

Model: Decision Tree
Refund = Yes → NO
Refund = No → MarSt
  MarSt = Married → NO
  MarSt = Single, Divorced → TaxInc
    TaxInc < 80K → NO
    TaxInc > 80K → YES
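For illustration (not part of the slides), a tree of this kind could be learned from the training table with pandas and scikit-learn; the one-hot encoding of the categorical attributes is an assumption, and the splits the library finds may differ from the hand-drawn model.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The training table above; TaxInc is kept numeric so the learner can
# choose a threshold such as 80K on its own.
data = pd.DataFrame({
    "Refund": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "MarSt":  ["Single", "Married", "Single", "Married", "Divorced",
               "Married", "Divorced", "Single", "Married", "Single"],
    "TaxInc": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],  # in K
    "Cheat":  ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})

# One-hot encode the categorical attributes (an encoding choice, not mandated
# by the slides) and fit a tree using entropy as the splitting criterion.
X = pd.get_dummies(data[["Refund", "MarSt"]]).join(data["TaxInc"])
y = data["Cheat"]
model = DecisionTreeClassifier(criterion="entropy").fit(X, y)
print(model.score(X, y))  # accuracy on the training data itself
```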
Apply Model to Test Data
Apply the model above to the test record:
Refund  Marital Status  Taxable Income  Cheat
No      Married         80K             ?

Start at the root of the tree.
The test record is routed through the tree one test at a time:
Refund = No → follow the No branch to the MarSt node.
MarSt = Married → follow the Married branch and reach the leaf labelled NO.
Assign Cheat to “No”.
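The same traversal can be hand-coded as a small function (an illustrative sketch mirroring the model above):

```python
def classify(refund: str, marital_status: str, taxable_income_k: float) -> str:
    """Walk the decision tree above for a single record."""
    if refund == "Yes":
        return "No"                      # Refund = Yes -> leaf NO
    if marital_status == "Married":
        return "No"                      # MarSt = Married -> leaf NO
    return "Yes" if taxable_income_k > 80 else "No"   # TaxInc test

print(classify("No", "Married", 80))     # -> 'No' (the test record above)
```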
Principle
‒ Basic algorithm (adopted by ID3, C4.5 and CART): a greedy algorithm
‒ Tree is constructed in a top-down recursive divide-and-conquer manner
‒ Attributes are categorical (if continuous-valued, they are discretized in
advance)
‒ Choose the best attribute(s) to split the remaining instances and make
that attribute a decision node
Iterations
‒ At start, all the training tuples are at the root
‒ Tuples are partitioned recursively based on selected attributes
‒ Test attributes are selected on the basis of a heuristic or statistical
measure (e.g., information gain)
Stopping conditions
‒ All samples for a given node belong to the same class
‒ There are no remaining attributes for further partitioning – majority
voting is employed for classifying the leaf
‒ There are no samples left
Decision Tree Algorithm
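The greedy, top-down, recursive procedure can be sketched as follows (illustrative Python pseudocode, not from the slides; the attribute-scoring heuristic is left abstract as best_attribute):

```python
from collections import Counter

def build_tree(examples, attributes, best_attribute):
    """Greedy top-down induction (ID3-style sketch).

    examples: list of (attribute_dict, label) pairs
    attributes: attribute names still available for splitting
    best_attribute: callable choosing the attribute with the best score
                    (e.g. highest information gain)
    """
    labels = [label for _, label in examples]

    # Stopping condition 1: all examples belong to the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stopping condition 2: no attributes left -> majority vote at the leaf.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    attr = best_attribute(examples, attributes)   # heuristic selection
    node = {attr: {}}
    for value in {ex[attr] for ex, _ in examples}:
        subset = [(ex, label) for ex, label in examples if ex[attr] == value]
        remaining = [a for a in attributes if a != attr]
        node[attr][value] = build_tree(subset, remaining, best_attribute)
    return node
```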
Example
Example
Example
Three Possible Partition Scenarios
How to choose An Attribute?
• An attribute selection measure is a heuristic for selecting the splitting
criterion that “best” separates a given data partition, D, of class
labeled training tuples into individual classes.
Ideally
‒ Each resulting partition would be pure
‒ A pure partition is a partition containing tuples that all belong to the
same class
• Attribute selection measures (splitting rules)
‒ Determine how the tuples at a given node are to be split
‒ Provide a ranking for each attribute describing the tuples
‒ The attribute with the highest score is chosen
‒ Determine a split point or a splitting subset
• Methods
– Information gain (ID3 (Iterative Dichotomiser 3) /C4.5)
– Gain ratio
– Gini Index (IBM IntelligentMiner)
Attribute Selection Measures
Before Describing Information Gain
Entropy is a measure of the average information content one
is missing when one does not know the value of the random
variable.
– Shannon's metric of "Entropy" of information is a foundational
concept of information theory.
– The entropy of a variable is the "amount of information"
contained in the variable.
High Entropy
– X is from a uniform-like distribution
– Flat histogram
– Values sampled from it are less predictable
Low Entropy
– X is from a varied (peaks and valleys) distribution
– Histogram has many lows and highs
– Values sampled from it are more predictable
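To make the high-vs-low entropy contrast concrete, a small sketch (not from the slides) computing Shannon entropy for a flat and a peaked distribution:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2 p), ignoring zero terms."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))  # uniform: 2.0 bits (high entropy)
print(entropy([0.97, 0.01, 0.01, 0.01]))  # peaked: ~0.24 bits (low entropy)
```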
1st approach: Information Gain Approach
Information Gain Approach
Assume there are two classes, P and N
Let the set of examples D contain p elements of class P
and n elements of class N
The amount of information needed to decide if an
arbitrary example in D belongs to P or N is defined as
$$Info(D) = I(p, n) = -\frac{p}{p+n}\log_2\frac{p}{p+n} - \frac{n}{p+n}\log_2\frac{n}{p+n}$$
Information Gain Approach
$\log_2 x = \log_{10} x / \log_{10} 2$
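As an illustration (not from the slides), I(p, n) translates directly into code:

```python
import math

def info(p, n):
    """I(p, n): expected information (bits) to classify an example in D."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            frac = count / total
            result -= frac * math.log2(frac)
    return result

print(info(9, 5))  # e.g. 9 positive / 5 negative examples: ~0.940 bits
```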
Info(D): Example
Information Gain in Attribute
• Assume that using attribute A a set D will be
partitioned into sets {D1, D2 , …, Dv}
– If Di contains pi examples of P and ni examples of N, the
entropy, or the expected information needed to classify
objects in all the subtrees Di, is
$$E(A) = \sum_{i=1}^{v} \frac{p_i + n_i}{p + n}\, I(p_i, n_i)$$
• The encoding information that would be gained by
branching on A is
$$Gain(A) = I(p, n) - E(A)$$
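Likewise E(A) and Gain(A) can be computed directly; the partition counts used below are the age attribute of the classic buys_computer example and are shown only for illustration:

```python
import math

def info(p, n):
    """I(p, n) from the previous slide."""
    return -sum((c / (p + n)) * math.log2(c / (p + n)) for c in (p, n) if c)

def expected_info(partitions):
    """E(A) = sum over subsets Di of (pi + ni)/(p + n) * I(pi, ni)."""
    p = sum(pi for pi, _ in partitions)
    n = sum(ni for _, ni in partitions)
    return sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in partitions)

def gain(partitions):
    """Gain(A) = I(p, n) - E(A)."""
    p = sum(pi for pi, _ in partitions)
    n = sum(ni for _, ni in partitions)
    return info(p, n) - expected_info(partitions)

# Illustrative counts: the age attribute of the buys_computer data,
# partitioned into <=30 (2 yes, 3 no), 31..40 (4, 0), >40 (3, 2).
print(round(gain([(2, 3), (4, 0), (3, 2)]), 3))  # ~0.246
```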
Information Gain in Attribute
Info_age(D): Example
Information Gain in Attribute
Info_age(D): Example
Extracting Classification Rules from
Trees
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction
• The leaf node holds the class prediction
• Rules are easier for humans to understand
• Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “no”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “yes”
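For a tree learned with scikit-learn, the root-to-leaf paths (and hence the IF-THEN rules) can be printed with export_text; a minimal sketch, with an assumed toy encoding of the same attributes:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy encoded data: [age_code, student, credit_code] -> buys_computer
# (values are illustrative, not the full textbook table)
X = [[0, 0, 0], [0, 1, 0], [1, 0, 1], [2, 0, 0], [2, 0, 1]]
y = ["no", "yes", "yes", "yes", "no"]

clf = DecisionTreeClassifier(criterion="entropy").fit(X, y)

# Each printed path from root to leaf corresponds to one IF-THEN rule.
print(export_text(clf, feature_names=["age", "student", "credit_rating"]))
```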
Avoid Overfitting in
Classification
• The generated tree may overfit the training data
– Too many branches, some may reflect anomalies
due to noise or outliers
– The result is poor accuracy on unseen samples
• Two approaches to avoid overfitting
– Prepruning: Halt tree construction early—do not
split a node if this would result in the goodness
measure falling below a threshold
• Difficult to choose an appropriate threshold
– Postpruning: Remove branches from a “fully grown”
tree—get a sequence of progressively pruned trees
• Use a set of data different from the training data
to decide which is the “best pruned tree”
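One concrete form of postpruning is cost-complexity pruning; the sketch below (using scikit-learn and synthetic data, both assumptions not made by the slides) grows a full tree, derives a sequence of progressively pruned trees, and keeps the one that scores best on data not used for training.

```python
# Sketch of postpruning via cost-complexity pruning in scikit-learn.
# make_classification generates synthetic data purely for illustration.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# Each ccp_alpha value corresponds to one tree in the pruning sequence.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = tree.score(X_valid, y_valid)   # held-out data, not used for training
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha, best_score)  # the "best pruned tree" by validation accuracy
```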
Approaches to Determine the Final
Tree Size
• Separate training (2/3) and testing (1/3) sets
• Use cross validation, e.g., 10-fold cross
validation
• Use all the data for training
– but apply a statistical test (e.g., chi-square) to
estimate whether expanding or pruning a node
may improve the entire distribution
• Use minimum description length (MDL) principle:
– halting growth of the tree when the encoding is minimized
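For example, 10-fold cross-validation over candidate tree sizes can be written directly (a sketch with assumed synthetic data):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Compare candidate tree sizes by 10-fold cross-validated accuracy.
for depth in (2, 4, 8, None):
    scores = cross_val_score(
        DecisionTreeClassifier(max_depth=depth, random_state=0), X, y, cv=10)
    print(depth, scores.mean())
```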
Enhancements to basic decision
tree induction
• Allow for continuous-valued attributes
– Dynamically define new discrete-valued attributes that
partition the continuous attribute value into a discrete
set of intervals
• Handle missing attribute values
– Assign the most common value of the attribute
– Assign probability to each of the possible values
• Attribute construction
– Create new attributes based on existing ones that are
sparsely represented
– This reduces fragmentation, repetition, and replication
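A small pandas sketch of two of these enhancements (column names and values are illustrative, not from the slides): discretizing a continuous attribute into intervals, and filling missing values with the most common value.

```python
import pandas as pd

df = pd.DataFrame({
    "income": [25, 40, 61, 85, 120],                       # continuous-valued
    "credit": ["fair", None, "excellent", "fair", None],   # has missing values
})

# Discretize the continuous attribute into a discrete set of intervals.
df["income_band"] = pd.cut(df["income"], bins=[0, 50, 100, float("inf")],
                           labels=["low", "medium", "high"])

# Handle missing values: assign the most common value of the attribute.
df["credit"] = df["credit"].fillna(df["credit"].mode()[0])

print(df)
```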
Sore Throat Fever Swollen Glands Congestion Headache Diagnosis
YES YES YES YES YES Strep Throat
NO NO NO YES YES Allergy
YES YES NO YES NO Cold
YES NO YES NO NO Strep Throat
NO YES NO YES NO Cold
NO NO NO YES NO Allergy
NO NO YES NO NO Strep Throat
YES NO NO YES YES Allergy
NO YES NO YES YES Cold
YES YES NO YES YES Cold
Exercise: For the Medical Diagnosis Data above, create a
decision tree.

S = Strep Throat (3) + Allergy (3) + Cold (4) = 10 examples
$$Info(S) = -\frac{3}{10}\log_2\frac{3}{10} - \frac{3}{10}\log_2\frac{3}{10} - \frac{4}{10}\log_2\frac{4}{10}$$
Using $\log_2 x = \log_{10} x / \log_{10} 2$: $\log_2(0.3) = -0.522/0.301 \approx -1.73$ and $\log_2(0.4) = -0.397/0.301 \approx -1.32$, so
$$Info(S) \approx 0.3(1.73) + 0.3(1.73) + 0.4(1.32) \approx 1.57$$
Finding Splitting Attribute
• Select Attribute with highest Gain
Sore Throat   Strep Throat   Allergy   Cold
YES           2              1         2
NO            1              2         2

E(Sore Throat) is the sum over the two branches of Info(branch) weighted by
the branch probability P: Info(YES)×P(YES) + Info(NO)×P(NO) = Entropy.
Sore Throat = YES (5 examples: 2 Strep Throat, 1 Allergy, 2 Cold):
$$Info(YES) = -\frac{2}{5}\log_2\frac{2}{5} - \frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 1.52$$
Sore Throat = NO (5 examples: 1 Strep Throat, 2 Allergy, 2 Cold):
$$Info(NO) = -\frac{1}{5}\log_2\frac{1}{5} - \frac{2}{5}\log_2\frac{2}{5} - \frac{2}{5}\log_2\frac{2}{5} \approx 1.52$$
Entropy E(Sore Throat) = P(YES)×1.52 + P(NO)×1.52
= (5/10)×1.52 + (5/10)×1.52 = 1.52
Gain(Sore Throat) = Info(S) − E(Sore Throat)
≈ 1.57 − 1.52 = 0.05
• Gain for each Attribute
Attribute        Gain
Sore Throat      0.05
Fever            0.72
Swollen Glands   0.88
Congestion       0.45
Headache         0.05
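These gains can be reproduced programmatically; the following sketch (not part of the slides) recomputes the information gain of every attribute from the medical-diagnosis table above:

```python
import math
from collections import Counter

# The medical-diagnosis table above, one tuple per patient:
# (Sore Throat, Fever, Swollen Glands, Congestion, Headache, Diagnosis)
rows = [
    ("YES", "YES", "YES", "YES", "YES", "Strep Throat"),
    ("NO",  "NO",  "NO",  "YES", "YES", "Allergy"),
    ("YES", "YES", "NO",  "YES", "NO",  "Cold"),
    ("YES", "NO",  "YES", "NO",  "NO",  "Strep Throat"),
    ("NO",  "YES", "NO",  "YES", "NO",  "Cold"),
    ("NO",  "NO",  "NO",  "YES", "NO",  "Allergy"),
    ("NO",  "NO",  "YES", "NO",  "NO",  "Strep Throat"),
    ("YES", "NO",  "NO",  "YES", "YES", "Allergy"),
    ("NO",  "YES", "NO",  "YES", "YES", "Cold"),
    ("YES", "YES", "NO",  "YES", "YES", "Cold"),
]
attributes = ["Sore Throat", "Fever", "Swollen Glands", "Congestion", "Headache"]

def info(labels):
    """Information (entropy) of a list of class labels, in bits."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

all_labels = [r[-1] for r in rows]
for i, attr in enumerate(attributes):
    # E(attr): info of each value-subset, weighted by the subset's share of S.
    e = 0.0
    for value in {r[i] for r in rows}:
        subset_labels = [r[-1] for r in rows if r[i] == value]
        e += len(subset_labels) / len(rows) * info(subset_labels)
    print(attr, round(info(all_labels) - e, 2))
# Expected: Sore Throat 0.05, Fever 0.72, Swollen Glands 0.88,
#           Congestion 0.45, Headache 0.05 (matching the table above)
```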
Decision Tree
Swollen Glands = Yes → Diagnosis = Strep Throat
Swollen Glands = No → Fever
  Fever = Yes → Diagnosis = Cold
  Fever = No → Diagnosis = Allergy

IF Swollen Glands = “YES”, THEN Diagnosis = Strep Throat
IF Swollen Glands = “NO” AND Fever = “YES”, THEN Diagnosis = Cold
IF Swollen Glands = “NO” AND Fever = “NO”, THEN Diagnosis = Allergy
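The extracted rules translate directly into a tiny classifier function (an illustrative sketch):

```python
def diagnose(swollen_glands: str, fever: str) -> str:
    """Apply the three extracted IF-THEN rules."""
    if swollen_glands == "YES":
        return "Strep Throat"
    if fever == "YES":
        return "Cold"
    return "Allergy"

print(diagnose("NO", "YES"))  # Cold
```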