Decision Tree and Bayesian Classification
Made by: Komal Kotak

Decision tree
▸A decision tree is a decision support tool
that uses a tree-like graph or model of
decisions and their possible consequences,
including chance event outcomes,
resource costs, and utility. It is one way to
display an algorithm.
Why decision tree?
▸Decision trees are powerful and popular
tools for classification and prediction.
▸Decision trees represent rules, which can be
understood by humans and used in knowledge
systems such as databases.
Requirements
▸Attribute-value description: each object or case must be
expressible in terms of a fixed collection of
properties or attributes (e.g., hot, mild, cold).
▸Predefined classes (target values): the target
function has discrete output values (Boolean or
multiclass).
▸Sufficient data: enough training cases should be
provided to learn the model.
Easy Example 1
A decision tree with two choices:
▸Go to Graduate School to get my MBA.
▸Go to Work “in the Real World”
Notation used in Decision trees
▸A box is used to show a choice that the
manager has to make.
▸A circle is used to show that a probability
outcome will occur.
▸Lines connect outcomes to their choice
or probability outcome.
Easy Example 1 Revisited
What are some of the costs we should take into account
when deciding whether or not to go to business school?
• Tuition and Fees
• Rent / Food / etc.
• Opportunity cost of salary
• Anticipated future earnings
Simple Decision Tree Model
Choice 1: Go to Graduate School to get my MBA.
2 years of tuition: $55,000; 2 years of room/board: $20,000;
2 years of opportunity cost of salary: $100,000.
Total cost = $175,000.
PLUS anticipated 5-year salary after business school = $600,000.
NPV (business school) = $600,000 - $175,000 = $425,000
Choice 2: Go to Work “in the Real World”.
First two years’ salary = $100,000 (from above), minus expenses of
$20,000, leaving $80,000.
Final five years’ salary = $330,000.
NPV (no b-school) = $80,000 + $330,000 = $410,000
Decision: go to business school, since $425,000 > $410,000.
Easy Example 2
▸The following decision tree (shown on the
original slide) is for the concept “buy
computer”; it indicates whether a customer at a
company is likely to buy a computer
or not. Each internal node represents
a test on an attribute. Each leaf node
represents a class.
How does a tree decide where to split?
▸The choice of where to split heavily affects a tree’s accuracy. The
splitting criteria are different for classification and regression trees.
▸Decision trees use multiple algorithms to decide how to split a node into two or
more sub-nodes. The creation of sub-nodes increases the homogeneity of the
resultant sub-nodes; in other words, the purity of the nodes
increases with respect to the target variable. The decision tree tries splits on
all available variables and then selects the split which results in the most
homogeneous sub-nodes.
The algorithm selection also
depends on the type of target
variable. Let’s look at the
four most commonly used
splitting criteria in decision trees:
▸Gini Index
▸Chi-Square
▸Information Gain
▸Reduction in Variance
Gini Index
The Gini index says: if we select two items from a population at random, then
they must be of the same class, and the probability of this is 1 if the population is pure.
▸It works with a categorical target variable (“Success” or “Failure”).
▸It performs only binary splits.
▸The higher the value of Gini, the higher the homogeneity.
▸CART (Classification and Regression Tree) uses the Gini method to create
binary splits.
Gini Index
Steps to calculate Gini for a split:
▸Calculate Gini for each sub-node using the formula p^2 + q^2, the sum of
the squares of the probabilities of success and failure.
▸Calculate Gini for the split as the weighted Gini score of the nodes of that split.
▸Example: we want to segregate students based on the target variable (playing
cricket or not). We split the population using two
input variables, Gender and Class, and identify which split
produces more homogeneous sub-nodes using the Gini index.
(The original slide shows the data as an image: of 30 students, 15 play cricket.
Gender splits into Female, 10 students of whom 2 play, and Male, 20 of whom 13 play;
Class splits into Class IX, 14 students of whom 6 play, and Class X, 16 of whom 9 play.)
Gini Index
Split on Gender:
▸Gini for sub-node Female = (0.2)*(0.2)+(0.8)*(0.8) = 0.68
▸Gini for sub-node Male = (0.65)*(0.65)+(0.35)*(0.35) = 0.55
▸Weighted Gini for the split on Gender = (10/30)*0.68+(20/30)*0.55 = 0.59
Similarly for the split on Class:
▸Gini for sub-node Class IX = (0.43)*(0.43)+(0.57)*(0.57) = 0.51
▸Gini for sub-node Class X = (0.56)*(0.56)+(0.44)*(0.44) = 0.51
▸Weighted Gini for the split on Class = (14/30)*0.51+(16/30)*0.51 = 0.51
Above, you can see that the Gini score for the split on Gender is higher than for the
split on Class; hence, the node split will take place on Gender. A short sketch of the
calculation follows.
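To make the arithmetic concrete, here is a minimal Python sketch of the Gini
calculation above. The function names are my own; the counts are the slide’s
(Female 2/10, Male 13/20, Class IX 6/14, Class X 9/16).

def gini_score(success, total):
    # Gini score as defined on this slide: p^2 + q^2 (higher = more homogeneous)
    p = success / total
    return p * p + (1 - p) * (1 - p)

def weighted_gini(nodes):
    # nodes: list of (success_count, total_count) pairs, one per sub-node
    population = sum(total for _, total in nodes)
    return sum(total / population * gini_score(s, total) for s, total in nodes)

print(weighted_gini([(2, 10), (13, 20)]))  # Gender split: ~0.59
print(weighted_gini([(6, 14), (9, 16)]))   # Class split:  ~0.51 -> split on Gender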
Chi-Square
Chi-square is an algorithm to find the statistical significance of the differences
between sub-nodes and the parent node. We measure it by the sum of squares of the
standardized differences between the observed and expected frequencies of the target
variable.
▸It works with a categorical target variable (“Success” or “Failure”).
▸It can perform two or more splits.
▸The higher the value of Chi-square, the higher the statistical significance of the
differences between a sub-node and the parent node.
▸The Chi-square of each node is calculated using the formula:
Chi-square = sqrt((Actual - Expected)^2 / Expected)
▸It generates a tree called CHAID (Chi-square Automatic Interaction Detector).
Chi-Square
Steps to calculate Chi-square for a split:
▸Calculate the Chi-square for each individual node by computing the deviation for
both Success and Failure.
▸Calculate the Chi-square of the split as the sum of the Success and Failure
Chi-square values of each node of the split.
Example: let’s work with the same example that we used to calculate Gini.
Chi-Square
Split on Gender:
▸First populate the actual values for the Female node: “Play Cricket” and “Not
Play Cricket” are 2 and 8 respectively.
▸Calculate the expected values for “Play Cricket” and “Not Play Cricket”: here it would be 5 for both,
because the parent node has a probability of 50% and we apply the same probability to the Female
count (10).
▸Calculate the deviations using Actual - Expected: for “Play Cricket” (2 - 5 = -3) and
for “Not Play Cricket” (8 - 5 = 3).
▸Calculate the Chi-square of the node for “Play Cricket” and “Not Play Cricket” using the
formula sqrt((Actual - Expected)^2 / Expected). (The original slide tabulates these values.)
▸Follow similar steps to calculate the Chi-square values for the Male node.
▸Now add all Chi-square values to get the Chi-square for the split on Gender.
Chi-Square
Split on Class:
Perform similar calculations for the split on Class (the original slide shows the
resulting table). You can see that Chi-square also identifies the Gender split as
more significant than the Class split. A sketch of the calculation follows.
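A minimal Python sketch of the chi-square computation above. The helper name is
mine; the parent success rate of 0.5 and the node counts come from the slide.

from math import sqrt

def node_chi_square(actual_success, total, parent_rate):
    # Chi-square contribution of one node, using the slide's formula
    # sqrt((actual - expected)^2 / expected) for both success and failure
    expected_s = total * parent_rate
    expected_f = total - expected_s
    actual_f = total - actual_success
    chi_s = sqrt((actual_success - expected_s) ** 2 / expected_s)
    chi_f = sqrt((actual_f - expected_f) ** 2 / expected_f)
    return chi_s + chi_f

# Parent node: 15 of 30 students play cricket -> rate 0.5
gender = node_chi_square(2, 10, 0.5) + node_chi_square(13, 20, 0.5)
klass = node_chi_square(6, 14, 0.5) + node_chi_square(9, 16, 0.5)
print(gender, klass)  # ~4.58 vs ~1.46 -> the Gender split is more significant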
Information Gain
The original slide shows three nodes, A, B, and C. Think about which node can be
described most easily: the answer is C, because it requires the least information,
as all its values are similar. B requires more information to describe, and A
requires the maximum information. In other words, C is a pure node, B is less
impure, and A is the most impure.
We can conclude that a less impure node requires less information to describe it,
and a more impure node requires more information. Information theory defines a
measure of this degree of disorganization in a system, known as entropy. If the
sample is completely homogeneous, the entropy is zero; if the sample is equally
divided (50%-50%), it has an entropy of one.
Information Gain
Entropy can be calculated using the formula:
Entropy = -p log2(p) - q log2(q)
▸Here p and q are the probabilities of success and failure respectively in that node. Entropy
is also used with a categorical target variable. It chooses the split which has the lowest
entropy compared to the parent node and other splits. The lower the entropy, the better.
Steps to calculate entropy for a split:
▸Calculate the entropy of the parent node.
▸Calculate the entropy of each individual node of the split, and take the weighted
average of all sub-nodes in the split.
Information Gain
Example: let’s use this method to identify the best split for the student example.
▸Entropy for the parent node = -(15/30) log2(15/30) - (15/30) log2(15/30) = 1. Here 1 shows
that it is an impure node.
▸Entropy for the Female node = -(2/10) log2(2/10) - (8/10) log2(8/10) = 0.72, and for the Male
node, -(13/20) log2(13/20) - (7/20) log2(7/20) = 0.93.
▸Entropy for the split on Gender = weighted entropy of sub-nodes = (10/30)*0.72 + (20/30)*0.93 =
0.86
▸Entropy for the Class IX node = -(6/14) log2(6/14) - (8/14) log2(8/14) = 0.99, and for the Class
X node, -(9/16) log2(9/16) - (7/16) log2(7/16) = 0.99.
▸Entropy for the split on Class = (14/30)*0.99 + (16/30)*0.99 = 0.99
Above, you can see that the entropy for the split on Gender is the lowest, so the tree will
split on Gender. Information gain is the parent entropy minus the weighted split entropy;
since the parent entropy here is 1, the gain is 1 - entropy. A sketch of the calculation follows.
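A minimal Python sketch of the entropy calculation above; helper names are my
own, and the counts come from the slide’s example.

from math import log2

def entropy(success, total):
    p = success / total
    q = 1 - p
    if p == 0 or q == 0:  # a pure node has zero entropy
        return 0.0
    return -p * log2(p) - q * log2(q)

def split_entropy(nodes):
    # Weighted average entropy of the sub-nodes of one candidate split
    population = sum(total for _, total in nodes)
    return sum(total / population * entropy(s, total) for s, total in nodes)

print(split_entropy([(2, 10), (13, 20)]))  # Gender: ~0.86
print(split_entropy([(6, 14), (9, 16)]))   # Class:  ~0.99 -> split on Gender
# Information gain = parent entropy (1 here) - split entropy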
Reduction in Variance
▸So far we have discussed algorithms for a categorical target variable. Reduction
in variance is an algorithm used for continuous target variables (regression
problems). It uses the standard formula of variance to choose the best split;
the split with the lower weighted variance is selected:
Variance = sum((X - X̄)^2) / n
▸Here X̄ is the mean of the values, X is an actual value, and n is the number of values.
▸Steps to calculate variance:
▸Calculate the variance for each node.
▸Calculate the variance of each split as the weighted average of the node variances.
Reduction in Variance
▸Example: let’s assign the numerical value 1 for playing cricket and 0 for not playing
cricket. Now follow the steps to identify the right split:
▸Variance for the root node: the mean value is (15*1 + 15*0)/30 = 0.5, and we have 15 ones and 15
zeros. The variance is ((1-0.5)^2+(1-0.5)^2+…15 times+(0-0.5)^2+(0-0.5)^2+…15 times) /
30, which can be written as (15*(1-0.5)^2+15*(0-0.5)^2) / 30 = 0.25
▸Mean of the Female node = (2*1+8*0)/10 = 0.2 and variance = (2*(1-0.2)^2+8*(0-0.2)^2) / 10 = 0.16
▸Mean of the Male node = (13*1+7*0)/20 = 0.65 and variance = (13*(1-0.65)^2+7*(0-0.65)^2) / 20 = 0.23
▸Variance for the split on Gender = weighted variance of sub-nodes = (10/30)*0.16 + (20/30)*0.23 = 0.21
▸Mean of the Class IX node = (6*1+8*0)/14 = 0.43 and variance = (6*(1-0.43)^2+8*(0-0.43)^2) / 14 = 0.24
▸Mean of the Class X node = (9*1+7*0)/16 = 0.56 and variance = (9*(1-0.56)^2+7*(0-0.56)^2) / 16 = 0.25
▸Variance for the split on Class = (14/30)*0.24 + (16/30)*0.25 = 0.25
Above, you can see that the Gender split has lower variance (0.21) than the Class split (0.25),
so the split would take place on the Gender variable. A sketch of the calculation follows.
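A minimal Python sketch of the variance calculation above, under the slide’s
0/1 coding of the target; helper names are my own.

def node_variance(ones, total):
    # Variance of a node whose target is coded 0/1; mean = ones/total
    mean = ones / total
    return (ones * (1 - mean) ** 2 + (total - ones) * (0 - mean) ** 2) / total

def split_variance(nodes):
    # Weighted average variance over the sub-nodes of one candidate split
    population = sum(total for _, total in nodes)
    return sum(total / population * node_variance(o, total) for o, total in nodes)

print(node_variance(15, 30))                # root:   0.25
print(split_variance([(2, 10), (13, 20)]))  # Gender: ~0.21 -> split on Gender
print(split_variance([(6, 14), (9, 16)]))   # Class:  ~0.25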
TREE PRUNING
▸Pruning is a technique in machine learning that reduces the
size of decision trees by removing sections of the tree that
provide little power to classify instances. Pruning reduces the
complexity of the final classifier, and hence improves predictive
accuracy by reducing overfitting.
Techniques of TREE PRUNING
▸Pruning can occur in a top-down or bottom-up fashion. Top-down
pruning traverses nodes and trims subtrees starting at the root, while
bottom-up pruning starts at the leaf nodes. Below are two popular
pruning algorithms.
▸Reduced error pruning
▸One of the simplest forms of pruning is reduced error pruning. Starting at
the leaves, each node is replaced with its most popular class. If the
prediction accuracy is not affected, the change is kept. While somewhat
naive, reduced error pruning has the advantage of simplicity and speed.
A rough sketch follows.
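The slide gives no pseudocode, so the following is only a rough Python sketch of
the idea under assumed data structures: internal nodes are dicts holding an
attribute index, children keyed by attribute value, and a precomputed majority
class; leaves are plain class labels; accuracy is checked on a held-out
validation set (an assumption, since the slide does not say which set is used).

def predict(tree, x):
    # Walk down until we reach a leaf (a plain class label);
    # unseen attribute values fall back to the node's majority class
    while isinstance(tree, dict):
        tree = tree["children"].get(x[tree["attr"]], tree["majority"])
    return tree

def accuracy(tree, rows):
    # rows: tuples whose last element is the true class label
    return sum(predict(tree, r[:-1]) == r[-1] for r in rows) / len(rows)

def reduced_error_prune(tree, validation):
    if not isinstance(tree, dict):
        return tree
    for value, child in tree["children"].items():
        tree["children"][value] = reduced_error_prune(child, validation)
    # Replace this node with its most popular class; keep the change
    # only if validation accuracy does not drop
    if accuracy(tree["majority"], validation) >= accuracy(tree, validation):
        return tree["majority"]
    return tree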
Techniques of TREE PRUNING
▸Cost complexity pruning
▸Cost complexity pruning generates a series of trees T0 … Tm, where T0 is the initial tree
and Tm is the root alone. At step i, the tree is created by removing a subtree from tree i-1
and replacing it with a leaf node whose value is chosen as in the tree-building algorithm.
The subtree to remove is chosen as follows. Define the error rate of tree T over data set
S as err(T, S). The subtree that minimizes

(err(prune(T, t), S) - err(T, S)) / (|leaves(T)| - |leaves(prune(T, t))|)

is chosen for removal. The function prune(T, t) defines the tree obtained by pruning the
subtree t from the tree T. Once the series of trees has been created, the best tree is chosen
by generalized accuracy as measured by a training set or cross-validation. (A scikit-learn
illustration follows.)
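For practical use, scikit-learn implements minimal cost-complexity pruning via
the ccp_alpha parameter of its tree estimators. A hedged illustration, using
the Iris dataset purely as a stand-in:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Effective alphas of the pruning path: alpha = 0 keeps the full tree (T0),
# the largest alpha prunes all the way back to the root (Tm)
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)

# Pick the tree in the series with the best cross-validated accuracy
best_alpha = max(
    path.ccp_alphas,
    key=lambda a: cross_val_score(
        DecisionTreeClassifier(random_state=0, ccp_alpha=a), X, y
    ).mean(),
)
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=best_alpha).fit(X, y)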
Bayesian Classification
Bayesian Classification: Why?
▸Probabilistic learning: calculates explicit probabilities for hypotheses;
among the most practical approaches to certain types of learning problems.
▸Incremental: each training example can incrementally
increase or decrease the probability that a hypothesis is correct. Prior
knowledge can be combined with observed data.
▸Probabilistic prediction: predicts multiple hypotheses, weighted by
their probabilities.
▸Standard: even when Bayesian methods are computationally intractable,
they provide a standard of optimal decision making against which other
methods can be measured.
Bayes’ Theorem
▸Given training data D, the posterior probability of a hypothesis h,
P(h|D), follows Bayes’ theorem:

P(h|D) = P(D|h) P(h) / P(D)

▸MAP (maximum a posteriori) hypothesis:

h_MAP = argmax_{h in H} P(h|D) = argmax_{h in H} P(D|h) P(h)

▸Practical difficulty: requires initial knowledge of many
probabilities, and has significant computational cost.
Naïve Bayes Classifier
▸A simplifying assumption: attributes are
conditionally independent given the class:

P(C_j|D) = P(C_j) * prod_{i=1..n} P(d_i|C_j)

▸This greatly reduces the computation cost: we only need to
count the class distribution.
Naïve Bayes Classifier
▸If the i-th attribute is categorical:
P(d_i|C) is estimated as the relative frequency of samples having value
d_i as the i-th attribute in class C.
▸If the i-th attribute is continuous:
P(d_i|C) is estimated through a Gaussian density function.
▸Computationally easy in both cases.
Play-tennis example: estimating P(xi|C)

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N

P(p) = 9/14, P(n) = 5/14

outlook:     P(sunny|p) = 2/9, P(sunny|n) = 3/5; P(overcast|p) = 4/9, P(overcast|n) = 0; P(rain|p) = 3/9, P(rain|n) = 2/5
temperature: P(hot|p) = 2/9, P(hot|n) = 2/5; P(mild|p) = 4/9, P(mild|n) = 2/5; P(cool|p) = 3/9, P(cool|n) = 1/5
humidity:    P(high|p) = 3/9, P(high|n) = 4/5; P(normal|p) = 6/9, P(normal|n) = 2/5
windy:       P(true|p) = 3/9, P(true|n) = 3/5; P(false|p) = 6/9, P(false|n) = 2/5
Naive Bayesian Classifier
▸Given a training set, we can compute the probabilities:

Outlook      P    N      Humidity  P    N
sunny        2/9  3/5    high      3/9  4/5
overcast     4/9  0      normal    6/9  1/5
rain         3/9  2/5

Temperature  P    N      Windy     P    N
hot          2/9  2/5    true      3/9  3/5
mild         4/9  2/5    false     6/9  2/5
cool         3/9  1/5
Play-tennis example: classifying X
▸An unseen sample X = <rain, hot, high, false>
▸P(X|p)·P(p) = P(rain|p)·P(hot|p)·P(high|p)·P(false|p)·P(p) =
3/9 · 2/9 · 3/9 · 6/9 · 9/14 = 0.010582
▸P(X|n)·P(n) = P(rain|n)·P(hot|n)·P(high|n)·P(false|n)·P(n) =
2/5 · 2/5 · 4/5 · 2/5 · 5/14 = 0.018286
▸Sample X is classified in class n (don’t play), since 0.018286 > 0.010582.
A sketch of the calculation follows.
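To check the numbers, here is a minimal, dependency-free Python sketch of the
naïve Bayes tally. The structure and names are my own; the fourteen rows are
the slide’s table.

from collections import Counter, defaultdict

data = [  # (outlook, temperature, humidity, windy, class)
    ("sunny", "hot", "high", "false", "N"), ("sunny", "hot", "high", "true", "N"),
    ("overcast", "hot", "high", "false", "P"), ("rain", "mild", "high", "false", "P"),
    ("rain", "cool", "normal", "false", "P"), ("rain", "cool", "normal", "true", "N"),
    ("overcast", "cool", "normal", "true", "P"), ("sunny", "mild", "high", "false", "N"),
    ("sunny", "cool", "normal", "false", "P"), ("rain", "mild", "normal", "false", "P"),
    ("sunny", "mild", "normal", "true", "P"), ("overcast", "mild", "high", "true", "P"),
    ("overcast", "hot", "normal", "false", "P"), ("rain", "mild", "high", "true", "N"),
]

priors = Counter(row[-1] for row in data)      # class counts: P=9, N=5
cond = defaultdict(Counter)                    # cond[(attr_index, class)][value]
for row in data:
    for i, value in enumerate(row[:-1]):
        cond[(i, row[-1])][value] += 1

def score(x, c):
    # P(X|c) * P(c) with relative-frequency estimates, as on the slide
    p = priors[c] / len(data)
    for i, value in enumerate(x):
        p *= cond[(i, c)][value] / priors[c]
    return p

x = ("rain", "hot", "high", "false")
print(score(x, "P"), score(x, "N"))  # ~0.010582 vs ~0.018286 -> class n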
THANKS!