DECISION TREES
Contents
• Decision Trees
• ID3 Algorithm
• C4.5 Algorithm
• Random Forest
Decision Trees
• A decision tree is a simple but powerful learning paradigm.
• It is a type of classification algorithm for supervised learning.
• A decision tree is a tree in which each branch node represents
a choice between a number of alternatives and each leaf node
represents a decision.
• A node with outgoing edges is called an internal or test node.
All other nodes are called leaves (also known as terminal or
decision nodes).
Decision Trees
Construction of decision tree
1) First test all attributes and select the one that will function
as the best root.
2) Break up the training set into subsets based on the branches of the root node.
3) Test the remaining attributes to see which ones fit best underneath the branches of the root node.
4) Continue this process for all other branches until:
I. All the examples of a subset are of one type.
II. There are no examples left.
III. There are no more attributes left.
Decision Tree Example
• Given a collection of shape and colour examples, learn a decision tree that represents them. Use this representation to classify new examples.
• Decision Trees are classifiers for instances represented as feature vectors. (color = ; shape = ; label = )
• Nodes are tests for feature values;
• There is one branch for each value of the feature
• Leaves specify the categories (labels)
• Can categorize instances into multiple disjoint categories – multi-class
[Figure: Decision Trees – The Representation. An example decision tree whose root node tests Color (branches blue, red, green); two of the colour branches lead to further Shape tests (square, triangle, circle), and the leaves carry the category labels A, B, C. The slide also illustrates evaluating the tree on the instance (color = red; shape = triangle) and learning the tree from examples.]
They can represent any Boolean function
• Can be rewritten as rules in Disjunctive Normal Form (DNF)
• green ∧ square → positive
• blue ∧ circle → positive
• blue ∧ square → positive
• The disjunction of these rules is equivalent to the Decision Tree
[Figure: Binary-classified Decision Tree – the same Color/Shape tree with binary labels (+ / –) at the leaves, corresponding to the DNF rules above.]
Limitation of Decision Trees
• Learning a tree that classifies the training data perfectly may
not lead to the tree with the best generalization performance.
- There may be noise in the training data that the tree is fitting.
- The algorithm might be making decisions based on very little data.
• A hypothesis h is said to overfit the training data if there is another hypothesis, h′, such that h has smaller error than h′ on the training data but h has larger error than h′ on the test data.
Decision Trees
Evaluation Methods for Decision Trees
• Two basic approaches
- Pre-pruning: Stop growing the tree at some point during
construction when it is determined that there is not enough
data to make reliable choices.
- Post-pruning: Grow the full tree and then remove nodes
that seem not to have sufficient evidence.
• Methods for evaluating subtrees to prune:
- Cross-validation: Reserve hold-out set to evaluate utility.
- Statistical testing: Test whether the observed regularity can be dismissed as likely to have occurred by chance.
- Minimum Description Length: Is the additional complexity of
the hypothesis smaller than remembering the exceptions ?
ID3 (Iterative Dichotomiser 3)
• Invented by J. Ross Quinlan in 1975.
• Used to generate a decision tree from a given data set by
employing a top-down, greedy search, to test each attribute at
every node of the tree.
• The resulting tree is used to classify future samples.
Training Set (attributes: Gender, Car Ownership, Travel Cost, Income Level; class: Transportation)

Gender | Car Ownership | Travel Cost | Income Level | Transportation
Male   | 0 | Cheap     | Low    | Bus
Male   | 1 | Cheap     | Medium | Bus
Female | 1 | Cheap     | Medium | Train
Female | 0 | Cheap     | Low    | Bus
Male   | 1 | Cheap     | Medium | Bus
Male   | 0 | Standard  | Medium | Train
Female | 1 | Standard  | Medium | Train
Female | 1 | Expensive | High   | Car
Male   | 2 | Expensive | Medium | Car
Female | 2 | Expensive | High   | Car
Algorithm
• Calculate the Entropy of every attribute using the data set.
• Split the set into subsets using the attribute for which entropy is
minimum (or, equivalently, information gain is maximum).
• Make a decision tree node containing that attribute.
• Recurse on subsets using remaining attributes.
Entropy
• In order to define Information Gain precisely, we need to discuss
Entropy first.
• A formula to calculate the homogeneity of a sample.
• A completely homogeneous sample has entropy of 0 (leaf node).
• An equally divided sample (two classes in equal proportion) has entropy of 1.
• The formula for entropy is:
Entropy(S) = −∑ p(I) log2 p(I)
• where p(I) is the proportion of S belonging to class I, and the sum ∑ runs over all classes (outcomes).
Entropy
• Example 1
If S is a collection of 14 examples with 9 YES and 5 NO
examples then
Entropy(S) = - (9/14) Log2 (9/14) - (5/14) Log2 (5/14)
= 0.940
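As a quick check of Example 1 (illustrative Python, not part of the original slides), the entropy formula can be computed directly from the class counts:

```python
from math import log2

def entropy(counts):
    """Entropy of a class distribution given as a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))  # ~0.94, matching Example 1
```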
Information Gain
• The information gain is based on the decrease in entropy after a
dataset is split on an attribute.
• The formula for calculating information gain is:
Gain(S, A) = Entropy(S) − ∑v (|Sv| / |S|) · Entropy(Sv), summing over all values v of attribute A
Where, Sv = subset of S for which attribute A has value v
• |Sv| = number of elements in Sv
• |S| = number of elements in S
Procedure
• First the entropy of the total dataset is calculated.
• The dataset is then split on the different attributes.
• The entropy for each branch is calculated.
• These branch entropies are then added proportionally to get the total entropy for the split.
• The resulting entropy is subtracted from the entropy before the split.
• The result is the Information Gain, or decrease in entropy.
• The attribute that yields the largest IG is chosen for the decision node (a sketch of this computation follows below).
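The procedure can be sketched in Python as follows. This is illustrative only, not code from the slides: it assumes each row is a dictionary mapping column names to values, with the class label stored under the key "Transportation" (as in the running example), and it reuses the `entropy` helper from the previous sketch.

```python
from collections import Counter

def information_gain(rows, attribute, target="Transportation"):
    """Gain(S, A): entropy of the whole set minus the weighted entropy of each branch."""
    def class_counts(subset):
        return list(Counter(r[target] for r in subset).values())

    total_entropy = entropy(class_counts(rows))
    split_entropy = 0.0
    for value in {r[attribute] for r in rows}:           # one branch per value of the attribute
        branch = [r for r in rows if r[attribute] == value]
        split_entropy += len(branch) / len(rows) * entropy(class_counts(branch))
    return total_entropy - split_entropy                  # the decrease in entropy
```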
Our Problem:

Training Set (attributes: Gender, Car Ownership, Travel Cost, Income Level; class: Transportation)

Gender | Car Ownership | Travel Cost | Income Level | Transportation
Male   | 0 | Cheap     | Low    | Bus
Male   | 1 | Cheap     | Medium | Bus
Female | 1 | Cheap     | Medium | Train
Female | 0 | Cheap     | Low    | Bus
Male   | 1 | Cheap     | Medium | Bus
Male   | 0 | Standard  | Medium | Train
Female | 1 | Standard  | Medium | Train
Female | 1 | Expensive | High   | Car
Male   | 2 | Expensive | Medium | Car
Female | 2 | Expensive | High   | Car
Calculate the entropy of the total dataset
• First compute the entropy of the given training set.
Probability: Bus: 4/10 = 0.4, Train: 3/10 = 0.3, Car: 3/10 = 0.3
E(S) = −∑ p(I) log2 p(I)
E(S) = −(0.4) log2(0.4) − (0.3) log2(0.3) − (0.3) log2(0.3) = 1.571
Split the dataset on the ‘Gender’ attribute

Male subset – Probability: Bus: 3/5 = 0.6, Train: 1/5 = 0.2, Car: 1/5 = 0.2
E(S_Male) = −0.6 log2(0.6) − 0.2 log2(0.2) − 0.2 log2(0.2) = 1.371

Female subset – Probability: Bus: 1/5 = 0.2, Train: 2/5 = 0.4, Car: 2/5 = 0.4
E(S_Female) = −0.2 log2(0.2) − 0.4 log2(0.4) − 0.4 log2(0.4) = 1.522

I(S,A) = 1.371 · (5/10) + 1.522 · (5/10) = 1.447
Gain(S,A) = E(S) − I(S,A) = 1.571 − 1.447 ≈ 0.125
Split the dataset on the ‘Car Ownership’ attribute

Ownership = 0 (3 rows) – Probability: Bus: 2/3 ≈ 0.67, Train: 1/3 ≈ 0.33, Car: 0/3 = 0
Entropy = 0.918

Ownership = 1 (5 rows) – Probability: Bus: 2/5 = 0.4, Train: 2/5 = 0.4, Car: 1/5 = 0.2
Entropy = 1.522

Ownership = 2 (2 rows) – Probability: Bus: 0/2 = 0, Train: 0/2 = 0, Car: 2/2 = 1
Entropy = 0

I(S,A) = 0.918 · (3/10) + 1.522 · (5/10) + 0 · (2/10) = 1.0364
Gain(S,A) = E(S) − I(S,A) = 1.571 − 1.0364 ≈ 0.534
• If we choose Travel Cost as the splitting attribute:
Entropy for Cheap = 0.722, Standard = 0, Expensive = 0
IG = 1.21
• If we choose Income Level as the splitting attribute:
Entropy for Low = 0, Medium = 1.459, High = 0
IG = 0.695
Attribute     | Information Gain
Gender        | 0.125
Car Ownership | 0.534
Travel Cost   | 1.21
Income Level  | 0.695
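Assuming the training set is encoded as a list of row dictionaries (with the class under the hypothetical key "Transportation") and reusing the `information_gain` helper sketched earlier, the table above can be reproduced roughly as follows; the printed values match the slides' own figures.

```python
rows = [
    {"Gender": "Male",   "Car Ownership": 0, "Travel Cost": "Cheap",     "Income Level": "Low",    "Transportation": "Bus"},
    {"Gender": "Male",   "Car Ownership": 1, "Travel Cost": "Cheap",     "Income Level": "Medium", "Transportation": "Bus"},
    {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Cheap",     "Income Level": "Medium", "Transportation": "Train"},
    {"Gender": "Female", "Car Ownership": 0, "Travel Cost": "Cheap",     "Income Level": "Low",    "Transportation": "Bus"},
    {"Gender": "Male",   "Car Ownership": 1, "Travel Cost": "Cheap",     "Income Level": "Medium", "Transportation": "Bus"},
    {"Gender": "Male",   "Car Ownership": 0, "Travel Cost": "Standard",  "Income Level": "Medium", "Transportation": "Train"},
    {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Standard",  "Income Level": "Medium", "Transportation": "Train"},
    {"Gender": "Female", "Car Ownership": 1, "Travel Cost": "Expensive", "Income Level": "High",   "Transportation": "Car"},
    {"Gender": "Male",   "Car Ownership": 2, "Travel Cost": "Expensive", "Income Level": "Medium", "Transportation": "Car"},
    {"Gender": "Female", "Car Ownership": 2, "Travel Cost": "Expensive", "Income Level": "High",   "Transportation": "Car"},
]

for attr in ["Gender", "Car Ownership", "Travel Cost", "Income Level"]:
    print(attr, round(information_gain(rows, attr), 3))
# Gender ~0.125, Car Ownership ~0.534, Travel Cost ~1.21, Income Level ~0.695
```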
Diagram: Decision Tree (after the first split)
[Figure: root node Travel Cost; the Standard branch leads to the leaf Train, the Expensive branch to the leaf Car, and the Cheap branch to a node that is still to be resolved (?).]
Iteration on the Subset of the Training Set with Travel Cost = Cheap
Probability: Bus: 4/5 = 0.8, Train: 1/5 = 0.2, Car: 0/5 = 0
E(S) = −∑ p(I) log2 p(I)
E(S) = −(0.8) log2(0.8) − (0.2) log2(0.2) = 0.722
• If we choose Gender as the splitting attribute:
Entropy for Male = 0, Female = 1
IG = 0.322
• If we choose Car Ownership as the splitting attribute:
Entropy for 0 = 0, 1 = 0.918
IG = 0.171
• If we choose Income Level as the splitting attribute:
Entropy for Low = 0, Medium = 0.918
IG = 0.171
Attribute     | Information Gain
Gender        | 0.322
Car Ownership | 0.171
Income Level  | 0.171
Diagram: Decision Tree (final)
[Figure: root node Travel Cost; Standard → Train, Expensive → Car; the Cheap branch leads to a Gender node, where Male → Bus and Female leads to a Car Ownership node whose branches (0, 1) end in the leaves Train and Bus.]
Solution to Our Problem:

Name | Gender | Car Ownership | Travel Cost | Income Level | Transportation
Abhi | Male   | 1 | Standard | High   | Train
Pavi | Male   | 0 | Cheap    | Medium | Bus
Ammu | Female | 1 | Cheap    | High   | Bus
Drawbacks of ID3
• Data may be over-fitted or over-classified if a small sample is tested.
• Only one attribute at a time is tested for making a decision.
• Classifying continuous data may be computationally
expensive.
• Does not handle numeric attributes and missing values.
C4.5 Algorithm
• This algorithm was also developed by J. Ross Quinlan, for generating decision trees.
• It can be treated as an extension of ID3.
• C4.5 is mainly used for classification of data.
• It is called a statistical classifier because of its ability to handle noisy data.
Construction
• Base cases are determined.
• For each attribute a, the normalized information gain obtained by splitting on a is computed (see the gain-ratio sketch below).
• Let a-best be the attribute with the highest normalized information gain.
• Create a decision node that splits on a-best.
• Recur on the sub-lists obtained by splitting on a-best, and add those nodes as children of the node.
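The "normalized information gain" referred to above is commonly computed as the gain ratio: the information gain divided by the split information, i.e. the entropy of the attribute's own value distribution. A minimal sketch, reusing the hypothetical `entropy` and `information_gain` helpers from the ID3 section:

```python
from collections import Counter

def gain_ratio(rows, attribute, target="Transportation"):
    """C4.5-style gain ratio = information gain / split information."""
    value_counts = list(Counter(r[attribute] for r in rows).values())
    split_info = entropy(value_counts)      # entropy of the attribute's own value distribution
    if split_info == 0:                     # attribute has a single value: no useful split
        return 0.0
    return information_gain(rows, attribute, target) / split_info
```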
• The new features include:
(i) accepts both continuous and discrete features;
(ii) handles incomplete data points;
(iii) solves the over-fitting problem by a bottom-up technique usually known as "pruning";
(iv) different weights can be applied to the features that comprise the training data.
Application based on features
• Continuous attribute categorisation:
discretisation – partitions the continuous attribute values into a discrete set of intervals (see the threshold sketch below).
• Handling missing values:
uses probability theory – probabilities are calculated from the observed frequencies, and the most probable value is taken after computing the frequencies.
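As an illustration of the discretisation idea (a sketch only, under the common C4.5-style assumption that candidate thresholds are the midpoints between consecutive sorted values of a numeric attribute; it reuses the hypothetical `information_gain` helper):

```python
def best_threshold(rows, attribute, target="Transportation"):
    """Pick the midpoint threshold that best splits a continuous attribute in two."""
    values = sorted({r[attribute] for r in rows})
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]   # midpoints between sorted values

    def gain_for(t):
        # Re-encode the attribute as a temporary binary feature: <= t vs > t.
        binarised = [{**r, attribute: r[attribute] <= t} for r in rows]
        return information_gain(binarised, attribute, target)

    return max(candidates, key=gain_for) if candidates else None
```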
Random Forest
• Random forest is an ensemble classifier which uses many
decision trees.
• It builds multiple decision trees and merges them together to
get a more accurate and stable prediction.
• It can be used for both classification and regression problems.
Algorithm
1) Create a bootstrapped dataset.
2) Create a decision tree using the bootstrapped dataset, but only use a random subset of variables (or columns) at each step.
3) Go back to step 1 and repeat.
Using a bootstrapped sample and considering only a subset of
variables at each step results in a wide variety of trees.
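A minimal sketch of steps 1–3, assuming two hypothetical helpers that are not shown here: `build_tree(rows, n_features)`, which grows a single decision tree while considering only a random subset of `n_features` attributes at each split, and `tree_predict(tree, row)`, which classifies one row with one tree.

```python
import random
from collections import Counter

def random_forest(rows, n_trees=100, n_features=2):
    """Steps 1-3: bootstrap the data, grow one tree per bootstrap sample, repeat."""
    forest = []
    for _ in range(n_trees):
        bootstrap = [random.choice(rows) for _ in rows]   # sample with replacement, same size as original
        forest.append(build_tree(bootstrap, n_features))  # build_tree: hypothetical tree builder
    return forest

def forest_predict(forest, row):
    """Aggregate the trees' votes (bagging) and return the majority class."""
    votes = Counter(tree_predict(tree, row) for tree in forest)  # tree_predict: hypothetical
    return votes.most_common(1)[0][0]
```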
Explanation
• First, the process starts with randomisation of the data sets (done by bagging).
• Bagging helps to reduce the variance and leads to more effective classification.
• RF contains many classification trees.
• Bagging – bootstrapping the data plus using the aggregate to make a decision.
• Out-of-Bag Dataset – the dataset formed from the training-set tuples that were not chosen during bootstrapping.
Gini Index
• Random Forest uses the gini index taken from the CART learning
system to construct decision trees. The gini index of node
impurity is the measure most commonly chosen for
classification-type problems.
• If a dataset T contains examples from n classes, the gini index Gini(T) is defined as:
Gini(T) = 1 − ∑ (pj)²  (summing over the classes j = 1, …, n)
where pj is the relative frequency of class j in T.
• If a dataset T is split into two subsets T1 and T2 with sizes N1 and N2 respectively (N = N1 + N2), the gini index of the split is defined as:
• Ginisplit(T) = (N1/N) · Gini(T1) + (N2/N) · Gini(T2)
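The two definitions can be sketched as follows (illustrative only; `rows` follows the same hypothetical dictionary encoding used in the earlier sketches, with the class under the key "Transportation"):

```python
from collections import Counter

def gini(rows, target="Transportation"):
    """Gini(T) = 1 - sum of squared class frequencies."""
    total = len(rows)
    return 1.0 - sum((c / total) ** 2 for c in Counter(r[target] for r in rows).values())

def gini_split(rows1, rows2, target="Transportation"):
    """Weighted gini index of a two-way split (N = N1 + N2)."""
    n = len(rows1) + len(rows2)
    return len(rows1) / n * gini(rows1, target) + len(rows2) / n * gini(rows2, target)
```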
Flow Chart of Algorithm
Advantages
• It produces a highly accurate classifier, and learning is fast.
• It runs efficiently on large databases.
• It can handle thousands of input variables without variable
deletion.
• It computes proximities between pairs of cases that can be
used in clustering, locating outliers or (by scaling) give
interesting views of the data.
• It offers an experimental method for detecting variable
interactions.
Editor's Notes
  1. A classification technique (or classifier) is a systematic approach to building classification models from an input data set. Examples include decision tree classifiers, rule-based classifiers, neural networks, support vector machines, and naïve Bayes classifiers. Each technique employs a learning algorithm to identify a model that best fits the relationship between the attribute set and the class label of the input data. The model generated by a learning algorithm should both fit the input data well and correctly predict the class labels of records it has never seen before. Therefore, a key objective of the learning algorithm is to build models with good generalization capability; i.e., models that accurately predict the class labels of previously unknown records.
  2. Ensemble methods are learning algorithms that construct a set of classifiers and then classify new data points by taking a (weighted) vote of their predictions. To create a bootstrapped dataset that is the same size as the original, we just randomly select samples from the original dataset (with replacement).