Decision Tree, Naive Bayes,
Association rule Mining, Support
Vector Machine, KNN, Kmeans
Clustering, Random Forest
Presented to
Prof. Vibhakar Mansotra
Dean of Mathematical Science, University of Jammu
Presented by
Akanksha Bali
Research Scholar,Batch-2019, University of Jammu
Contents
 Decision Tree
 Naive Bayes Classifier
 Support Vector Machine
 Association Rule Mining
 Apriori Algorithm
 K Nearest Neighbour
 K means Clustering
 Random forest
2
Decision Trees
 A decision tree is a flowchart-like tree structure in which the data is continuously
split according to a certain parameter.
 Each internal node (decision node) denotes a test on an
attribute.
 Each branch represents an outcome of the test.
 There are two main types of decision trees:
Classification trees (yes/no types)
An example is a tree whose outcome is a variable like 'fit' or 'unfit';
here the decision variable
is categorical.
Regression trees (continuous data types)
Here the decision or outcome variable is continuous, e.g. a number
like 12.
3
4
Entropy
Entropy, also called Shannon entropy and denoted by H(S) for a finite set S,
is the measure of the amount of uncertainty or randomness in data.
H(S) = - ∑ p(x) log2 p(x)
Information gain
Information gain, also called Kullback-Leibler divergence and denoted by
IG(S,A) for a set S, is the effective change in entropy after deciding on a
particular attribute A. It measures the relative change in entropy with respect
to the independent variables.
IG(S,A) = H(S) - H(S,A)
IG(S,A) = H(S) - ∑ P(x)*H(x)
where IG(S,A) is the information gain obtained by applying feature A, H(S) is the
entropy of the entire set, and the second term calculates the entropy after
applying feature A; p(x) is the probability of event x.
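A minimal Python sketch of these two formulas (not part of the original slides; the helper names are mine), computed directly from class labels:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """H(S) = -sum p(x) * log2 p(x), computed from a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(examples, labels, attribute):
    """IG(S, A) = H(S) - sum_v P(S_v) * H(S_v), splitting on one attribute."""
    total = len(labels)
    gain = entropy(labels)
    for value in set(ex[attribute] for ex in examples):
        subset = [lab for ex, lab in zip(examples, labels) if ex[attribute] == value]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

# Quick check against the Play Tennis example used later in these slides:
# the entropy of a set with 9 positive and 5 negative examples is about 0.940.
print(round(entropy(["Yes"] * 9 + ["No"] * 5), 3))   # 0.94
```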
5
Top-Down Induction of Decision Trees: ID3
The ID3 algorithm performs the following tasks recursively:
1. Create a root node for the tree.
2. If all examples are positive, return the leaf node 'positive'.
3. Else if all examples are negative, return the leaf node 'negative'.
4. Calculate the entropy of the current state, H(S).
5. For each attribute, calculate the entropy with respect to the attribute
'x', denoted by H(S, x).
6. Calculate IG(S, x) = H(S) - H(S, x).
7. Select the attribute which has the maximum value of IG(S, x).
8. Remove the attribute that offers the highest IG from the set of attributes.
9. Repeat until we run out of attributes, or the decision tree has all
leaf nodes.
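A minimal recursive sketch of these steps in Python (the function name is mine, and it reuses the entropy and information_gain helpers from the sketch above; each example is a dict mapping attribute names to values):

```python
from collections import Counter

def id3(examples, labels, attributes):
    """Recursive ID3 sketch: returns a nested dict tree or a class label."""
    if len(set(labels)) == 1:              # all positive or all negative -> leaf
        return labels[0]
    if not attributes:                     # no attributes left -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    # Steps 5-7: pick the attribute with the largest information gain.
    best = max(attributes, key=lambda a: information_gain(examples, labels, a))
    tree = {best: {}}
    # Steps 8-9: remove the chosen attribute and recurse on each branch.
    for value in set(ex[best] for ex in examples):
        idx = [i for i, ex in enumerate(examples) if ex[best] == value]
        sub_examples = [examples[i] for i in idx]
        sub_labels = [labels[i] for i in idx]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(sub_examples, sub_labels, remaining)
    return tree
```

Run on the Play Tennis table on the next slide (attributes Outlook, Temp., Humidity and Wind), this should reproduce the tree shown later: Outlook at the root, Humidity under Sunny and Wind under Rain.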
6
Training Example
Day   Outlook    Temp.   Humidity   Wind     Play Tennis
D1    Sunny      Hot     High       Weak     No
D2    Sunny      Hot     High       Strong   No
D3    Overcast   Hot     High       Weak     Yes
D4    Rain       Mild    High       Weak     Yes
D5    Rain       Cool    Normal     Weak     Yes
D6    Rain       Cool    Normal     Strong   No
D7    Overcast   Cool    Normal     Strong   Yes
D8    Sunny      Mild    High       Weak     No
D9    Sunny      Cool    Normal     Weak     Yes
D10   Rain       Mild    Normal     Weak     Yes
D11   Sunny      Mild    Normal     Strong   Yes
D12   Overcast   Mild    High       Strong   Yes
D13   Overcast   Hot     Normal     Weak     Yes
D14   Rain       Mild    High       Strong   No
7
Selecting the Next Attribute
S = [9+,5-]   E = 0.940

Humidity
  High:   [3+, 4-]   E = 0.985
  Normal: [6+, 1-]   E = 0.592
Gain(S,Humidity) = 0.940 - (7/14)*0.985 - (7/14)*0.592 = 0.151

Wind
  Weak:   [6+, 2-]   E = 0.811
  Strong: [3+, 3-]   E = 1.0
Gain(S,Wind) = 0.940 - (8/14)*0.811 - (6/14)*1.0 = 0.048
Humidity provides greater info. gain than Wind, w.r.t target classification.
8
Selecting the Next Attribute
Outlook
  Sunny:    [2+, 3-]   E = 0.971
  Overcast: [4+, 0-]   E = 0.0
  Rain:     [3+, 2-]   E = 0.971
S = [9+,5-]   E = 0.940
Gain(S,Outlook)
= 0.940 - (5/14)*0.971 - (4/14)*0.0 - (5/14)*0.971
= 0.247
9
Selecting the Next Attribute
The information gain values for the 4 attributes
are:
• Gain(S,Outlook) =0.247
• Gain(S,Humidity) =0.151
• Gain(S,Wind) =0.048
• Gain(S,Temperature) =0.029
where S denotes the collection of training
examples
10
ID3 Algorithm
[D1,D2,...,D14]   [9+,5-]
Outlook
  Sunny:    Ssunny = [D1,D2,D8,D9,D11]   [2+,3-]   ?
  Overcast: [D3,D7,D12,D13]              [4+,0-]   Yes
  Rain:     [D4,D5,D6,D10,D14]           [3+,2-]   ?
Gain(Ssunny, Humidity)=0.970-(3/5)0.0 – 2/5(0.0) = 0.970
Gain(Ssunny, Temp.)=0.970-(2/5)0.0 –2/5(1.0)-(1/5)0.0 = 0.570
Gain(Ssunny, Wind) = 0.970 - (2/5)1.0 - (3/5)0.918 = 0.019
11
ID3 Algorithm
Outlook
  Sunny -> Humidity
     High:   No    [D1,D2,D8]
     Normal: Yes   [D9,D11]
  Overcast -> Yes  [D3,D7,D12,D13]
  Rain -> Wind
     Strong: No    [D6,D14]
     Weak:   Yes   [D4,D5,D10]
12
Converting a Tree to Rules
Outlook
  Sunny -> Humidity
     High:   No
     Normal: Yes
  Overcast -> Yes
  Rain -> Wind
     Strong: No
     Weak:   Yes
R1: If (Outlook=Sunny) ∧ (Humidity=High) Then PlayTennis=No
R2: If (Outlook=Sunny) ∧ (Humidity=Normal) Then PlayTennis=Yes
R3: If (Outlook=Overcast) Then PlayTennis=Yes
R4: If (Outlook=Rain) ∧ (Wind=Strong) Then PlayTennis=No
R5: If (Outlook=Rain) ∧ (Wind=Weak) Then PlayTennis=Yes
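These five rules translate directly into a chain of conditionals; a minimal sketch (the function name is mine):

```python
def play_tennis(outlook, humidity, wind):
    """Apply rules R1-R5 from the tree above."""
    if outlook == "Sunny":
        return "No" if humidity == "High" else "Yes"   # R1, R2
    if outlook == "Overcast":
        return "Yes"                                   # R3
    if outlook == "Rain":
        return "No" if wind == "Strong" else "Yes"     # R4, R5

print(play_tennis("Rain", "High", "Weak"))  # Yes
```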
13
Overfitting
 One of the biggest problems with decision trees
is Overfitting
14
Avoid Overfitting
 Stop growing the tree when a split is not statistically
significant.
 Grow the full tree, then post-prune.
NAÏVE BAYES ALGORITHM
 The Bayesian classification represents a supervised
learning method as well as a statistical method for
classification.
 It can solve diagnostic and predictive problems.
 It is named after Thomas Bayes (c. 1701-61).
 It works on the principle of conditional probability as
given by Bayes' theorem.
15
Derivation
D : set of tuples.
Each tuple is an 'n'-dimensional attribute vector X = (x1, x2, x3, ..., xn).
Let there be 'm' classes: C1, C2, C3, ..., Cm.
Maximum a posteriori hypothesis:
P(Ci|X) = P(X|Ci) P(Ci) / P(X)   (Bayes' theorem)
16
Problem Statement
 Consider the given data set, apply the naive Bayes
algorithm, and predict which type of fruit it is if the fruit has the
following properties:
Fruit = { yellow, sweet, long }
Fruit Yellow Sweet Long Total
Orange 350 450 0 650
Banana 400 300 350 400
others 50 100 50 150
Total 800 850 400 1200
Problem
 Step 1: Compute the prior probabilities for each of the class of fruits:
 P(C=orange) = 650/1200 = 0.54
 P(C=banana) = 400/1200 = 0.33
 P(C=others) = 150/1200 = 0.125
 Step 2: Compute the probability of evidence
 P(X1=long) = 400/1200=0.33
 P(X2=sweet) = 850/1200 = 0.708
 P(X3=yellow) = 800/1200 = 0.66
 Step 3: Compute the probability of each class given each piece of evidence
 P(C=orange|X1=long) = 0/400 = 0
 P(C=orange|X2=sweet) = 450/850 = 0.52
 P(C=orange|X3=yellow) = 350/800 = 0.43
 P(C=Banana|X1=long) = 350/400 = 0.875
 P(C=Banana|X2=sweet) = 300/850 = 0.35
 P(C=Banana|X3=yellow) = 400/800 = 0.5
 P(C=others|X1=long) = 50/400 = 0.125
 P(C=others|X2=sweet) = 100/850 = 0.117
 P(C=others|X3=yellow) = 50/800 = 0.0625
18
Problem
 Step 5: Calculate the likelihood of each attribute given the class, using Bayes' theorem
 P(Yellow|Orange) = P(Orange|Yellow)*P(Yellow) / P(Orange)
= (0.43*0.66)/0.54 ≈ 0.53
 P(Sweet|Orange) = 0.69
 P(Long|Orange) = 0
Step 6: P(fruit|Orange) = 0.53*0.69*0 = 0
In a similar way, P(fruit|Banana) = 1*0.75*0.875 ≈ 0.65
P(fruit|others) = 0.33*0.66*0.33 = 0.072
Step 7: Prediction: the type of fruit is Banana
19
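A minimal sketch (my own layout, not from the slides) that computes the naive Bayes score for each class directly from the count table above. Unlike Step 6 it also multiplies by the prior P(C); the winner is still Banana:

```python
# Counts from the table: attribute counts per class and class totals.
counts = {
    "Orange": {"Yellow": 350, "Sweet": 450, "Long": 0,   "Total": 650},
    "Banana": {"Yellow": 400, "Sweet": 300, "Long": 350, "Total": 400},
    "Others": {"Yellow": 50,  "Sweet": 100, "Long": 50,  "Total": 150},
}
grand_total = 1200
evidence = ["Yellow", "Sweet", "Long"]

scores = {}
for fruit, c in counts.items():
    prior = c["Total"] / grand_total          # P(C)
    likelihood = 1.0
    for attr in evidence:                     # product of P(x|C) over the evidence
        likelihood *= c[attr] / c["Total"]
    scores[fruit] = prior * likelihood        # proportional to P(C|evidence)

print(scores)
print(max(scores, key=scores.get))            # 'Banana'
```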
Association rule mining
 Association rule learning is a
rule-based machine learning method for discovering
interesting relations between variables.
 Using association rule learning, a supermarket can
determine which products are frequently bought
together and use this information for marketing
purposes. This is sometimes referred to as market
basket analysis.
20
Market basket analysis
Association rule mining
Important concepts of Association Rule Mining:
The support supp(X) of an itemset X is defined as the proportion of transactions in
the data set which contain the itemset. In the example database, the itemset
{milk, bread, butter} has a support of 1/5 = 0.2 since it occurs in 20% of all
transactions (1 out of 5 transactions).
The confidence of a rule is defined as conf(X=>Y) = supp(X∪Y)/supp(X).
For example, the rule {butter, bread} => {milk}
has a confidence of supp({butter, bread, milk})/supp({butter, bread}) = 0.2/0.2 = 1
in the database, which means that for 100% of the transactions containing butter
and bread the rule is correct (100% of the time a customer buys butter and bread,
milk is bought as well).
22
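A minimal Python sketch of these two measures. The five transactions below are an illustrative assumption (the slide does not list them); they are chosen so the numbers match the example above, i.e. supp({milk, bread, butter}) = 0.2 and conf({butter, bread} => {milk}) = 1:

```python
# Hypothetical 5-transaction database (assumed for illustration only).
transactions = [
    {"milk", "bread"},
    {"butter"},
    {"beer", "diapers"},
    {"milk", "bread", "butter"},
    {"bread"},
]

def support(itemset):
    """supp(X): fraction of transactions that contain every item of X."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return support(lhs | rhs) / support(lhs)

print(support({"milk", "bread", "butter"}))        # 0.2
print(confidence({"butter", "bread"}, {"milk"}))   # 1.0
```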
APRIORI ALGORITHM
 The name of the algorithm is based on the fact that the
algorithm uses prior knowledge of frequent itemset properties.
 Apriori employs an iterative approach known as a level-wise
search, where k-itemsets are used to explore (k+1)-itemsets.
 First, the set of frequent 1-itemsets is found by scanning the
database to accumulate the count for each item, and collecting
those items that satisfy minimum support.
 The resulting set is denoted L1.
 Next, L1 is used to find L2, the set of frequent 2-itemsets, which
is used to find L3, and so on, until no more frequent k-itemsets
can be found.
 The finding of each Lk requires one full scan of the database.
Problem Statement
For the following transaction dataset, generate rules
using the Apriori algorithm. Consider minimum support =
50% and minimum confidence = 50%.
24
Transaction ID Items
Purchased
I1 A,B,C
I2 A,C
I3 A,D
I4 B,E,F
Problem Statement
 Step 1: Create a table of 1-item frequent sets and calculate
their support
25
Items   Frequency   Support
{A}     3           3/4 = 75%
{B}     2           2/4 = 50%
{C}     2           2/4 = 50%
{D}     1           1/4 = 25%
{E}     1           1/4 = 25%
{F}     1           1/4 = 25%
Problem Statement
 Step 2: Choose rows whose support is equal to or
greater than 50%
26
Items   Frequency   Support
{A}     3           3/4 = 75%
{B}     2           2/4 = 50%
{C}     2           2/4 = 50%
Problem Statement
 Step 3: Create a table of 2-item frequent sets and calculate
their frequency and support
27
Items   Frequency   Support
{A,B}   1           1/4 = 25%
{A,C}   2           2/4 = 50%
{B,C}   1           1/4 = 25%
Problem Statement
 Step 4: Choose rows whose support is equal to or
greater than 50%
 Formulate the final rules and calculate their confidence
28
Items   Frequency   Support
{A,C}   2           2/4 = 50%

Association rule   Support   Confidence   Conf %
A -> C             2         2/3 = 0.66   66%
C -> A             2         2/2 = 1      100%
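A brute-force sketch of the level-wise search on the four transactions above (my own helper names; no candidate pruning, so it is not a full Apriori implementation). It should reproduce the frequent itemsets {A}, {B}, {C}, {A,C} and the rules A->C (66%) and C->A (100%):

```python
from itertools import combinations

transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]
min_support, min_confidence = 0.5, 0.5
n = len(transactions)

def support(itemset):
    return sum(itemset <= t for t in transactions) / n

# Level-wise search: frequent 1-itemsets, then 2-itemsets built from them, etc.
items = sorted(set().union(*transactions))
frequent = [frozenset([i]) for i in items if support({i}) >= min_support]
all_frequent = list(frequent)
k = 2
while frequent:
    candidates = {a | b for a in frequent for b in frequent if len(a | b) == k}
    frequent = [c for c in candidates if support(c) >= min_support]
    all_frequent.extend(frequent)
    k += 1

# Generate rules X -> Y from every frequent itemset with at least two items.
for itemset in all_frequent:
    if len(itemset) < 2:
        continue
    for r in range(1, len(itemset)):
        for lhs in map(frozenset, combinations(itemset, r)):
            rhs = itemset - lhs
            conf = support(itemset) / support(lhs)
            if conf >= min_confidence:
                print(set(lhs), "->", set(rhs), f"confidence = {conf:.2f}")
```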
SUPPORT VECTOR
MACHINE
 Support Vector Machine (SVM) is a supervised machine
learning algorithm which can be used for both classification and
regression challenges.
 It is mostly used in classification problems.
 We perform classification by finding the hyper-plane
that differentiates the two classes very well.
29
Identify the right hyper-plane
scenario 1: scenario 2:
30
scenario 3:
Identify the right hyper-plane
31
scenario 4:
Support vector machine
 Pros:
 It works really well with a clear margin of separation.
 It is effective in high-dimensional spaces.
 It is effective in cases where the number of dimensions is greater
than the number of samples.
 It uses a subset of training points in the decision function
(called support vectors), so it is also memory efficient.
 Cons:
 It doesn't perform well when we have a large data set, because
the required training time is higher.
 It also doesn't perform very well when the data set has more
noise, i.e. the target classes are overlapping.
32
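A minimal scikit-learn sketch of a linear SVM (the toy 2-D data below is made up for illustration and is not from the slides):

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated groups of points and their class labels.
X = np.array([[1, 2], [2, 3], [2, 1], [6, 5], [7, 7], [8, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

print(clf.coef_, clf.intercept_)    # w and b of the hyper-plane w.x + b = 0
print(clf.support_vectors_)         # the training points that define the margin
print(clf.predict([[3, 2], [7, 6]]))
```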
K Nearest Neighbour
 Contents
 Introduction
 Closeness
 Algorithm
 Example
33
K Nearest Neighbour
• K-Nearest Neighbors is one of the most basic yet
essential classification algorithms in Machine Learning.
It belongs to the supervised learning domain and finds
intense application in pattern recognition, data mining
and intrusion detection.
• It was first described in the early 1950s.
• It gained popularity when increased computing power
became available.
• It is used widely in the areas of pattern recognition and
statistical estimation.
34
Closeness
 The Euclidean distance between two points
or tuples, say,
X1 = (x11, x12, ..., x1n) and X2 = (x21, x22, ..., x2n), is
dist(X1, X2) = sqrt( (x11-x21)^2 + (x12-x22)^2 + ... + (x1n-x2n)^2 )
35
KNN Classifier Algorithm
36
Example
•  We have data from a questionnaire survey and objective
testing, with two attributes (acid durability and strength), to classify
whether a special paper tissue is good or not. Here are four training
samples:

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Y = Classification
7                                7                                 Bad
7                                4                                 Bad
3                                4                                 Good
1                                4                                 Good
Now the factory produces a new paper tissue that passes the
laboratory test with X1 = 3 and X2 = 7. Guess the classification
of this new tissue.
 Step 1 : Initialize and define k.
Let's say, k = 3.
(Choose k as an odd number when there are two classes, to avoid
a tie in the class prediction.)
 Step 2 : Compute the distance between the input sample and
each training sample.
- The co-ordinate of the input sample is (3, 7).
- Instead of calculating the Euclidean distance, we
calculate the squared Euclidean distance.

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Squared Euclidean distance
7                                7                                 (7-3)^2 + (7-7)^2 = 16
7                                4                                 (7-3)^2 + (4-7)^2 = 25
3                                4                                 (3-3)^2 + (4-7)^2 = 9
1                                4                                 (1-3)^2 + (4-7)^2 = 13
 Step 3 : Sort the distances and determine the nearest
neighbours based on the kth minimum distance:

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Squared Euclidean distance   Rank (minimum distance)   Included in 3-Nearest Neighbours?
7                                7                                 16                           3                         Yes
7                                4                                 25                           4                         No
3                                4                                 9                            1                         Yes
1                                4                                 13                           2                         Yes
Example
Step 4 : Take the 3 nearest neighbours and
gather the category Y of each nearest neighbour.

X1 = Acid Durability (seconds)   X2 = Strength (kg/square meter)   Squared Euclidean distance   Rank (minimum distance)   Included in 3-Nearest Neighbours?   Y = Category of the nearest neighbour
7                                7                                 16                           3                         Yes                                 Bad
7                                4                                 25                           4                         No                                  -
3                                4                                 9                            1                         Yes                                 Good
1                                4                                 13                           2                         Yes                                 Good
Example
Step 5 : Apply a simple majority vote.
Use the simple majority of the categories of the nearest
neighbours as the prediction value for the query
instance.
We have 2 "Good" and 1 "Bad". Thus we conclude
that the new paper tissue that passes the laboratory
test with X1 = 3 and X2 = 7 falls in the
"Good" category.
Example
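A minimal pure-Python sketch of the same 3-NN computation for the tissue example (helper names are mine):

```python
from collections import Counter

# Training samples: ((acid durability, strength), classification)
train = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)
k = 3

def sq_dist(a, b):
    """Squared Euclidean distance, as used in Step 2."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

neighbours = sorted(train, key=lambda s: sq_dist(s[0], query))[:k]
votes = Counter(label for _, label in neighbours)
print(neighbours)                    # squared distances 9, 13, 16 -> Good, Good, Bad
print(votes.most_common(1)[0][0])    # 'Good'
```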
K – Means Clustering
 Contents
 Introduction
 Algorithm
 Example
 Application
42
Clustering
 Clustering: the process of grouping a set of objects into
classes of similar objects.
 Documents within a cluster should be similar.
 Documents from different clusters should be dissimilar.
 Clustering is the commonest form of unsupervised learning.
 Unsupervised learning = learning from raw data, as
opposed to supervised learning, where a classification of
examples is given.
 In principle, the optimal partition is achieved by minimising the sum
of squared distances to the "representative object" of each cluster,
43
e.g., using squared Euclidean distance, the objective is
d = ∑_{n=1}^{N} (x_n - m_k)^2
where m_k is the representative (mean) of the cluster containing point x_n.
K-means Algorithm
44
A Simple example showing the implementation
of k-means algorithm
(using K=2)
Step 1:
Initialization: we randomly choose the following two centroids
(k=2) for the two clusters.
In this case the 2 centroids are: m1=(1.0,1.0) and m2=(5.0,7.0).
Step 2:
 Thus, we obtain two clusters
containing:
{1,2,3} and {4,5,6,7}.
 Their new centroids are:
Step 3:
 Now using these centroids
we compute the Euclidean
distance of each object, as
shown in table.
 Therefore, the new
clusters are:
{1,2} and {3,4,5,6,7}
 Next centroids are:
m1=(1.25,1.5) and m2 =
(3.9,5.1)
 Step 4 :
The clusters obtained are:
{1,2} and {3,4,5,6,7}
 Therefore, there is no
change in the clusters.
 Thus, the algorithm comes
to a halt here, and the final
result consists of 2 clusters:
{1,2} and {3,4,5,6,7}.
Example
Subject A B
1 1.0 1.0
2 1.5 2.0
3 3.0 4.0
4 5.0 7.0
5 3.5 5.0
6 4.5 5.0
7 3.5 4.5
50
Consider the following data set consisting of the scores
of two variables on each of seven individuals:
Example
51
This data set is to be grouped into two clusters. As a first step in
finding a sensible initial partition, let the A & B values of the two
individuals furthest apart (using the Euclidean distance measure)
define the initial cluster means, giving:

          Individual   Mean Vector (centroid)
Group 1   1            (1.0, 1.0)
Group 2   4            (5.0, 7.0)
Example
 The remaining individuals are now examined in sequence and
allocated to the cluster to which they are closest, in terms of
Euclidean distance to the cluster mean. The mean vector is
recalculated each time a new member is added.
52
Step   Cluster 1 Individuals   Cluster 1 Mean (centroid)   Cluster 2 Individuals   Cluster 2 Mean (centroid)
1      1                       (1.0, 1.0)                  4                       (5.0, 7.0)
2      1, 2                    (1.2, 1.5)                  4                       (5.0, 7.0)
3      1, 2, 3                 (1.8, 2.3)                  4                       (5.0, 7.0)
4      1, 2, 3                 (1.8, 2.3)                  4, 5                    (4.2, 6.0)
5      1, 2, 3                 (1.8, 2.3)                  4, 5, 6                 (4.3, 5.7)
6      1, 2, 3                 (1.8, 2.3)                  4, 5, 6, 7              (4.1, 5.4)
Example
 Now the initial partition has changed, and the two
clusters at this stage have the following
characteristics:
53
            Individuals    Mean Vector (centroid)
Cluster 1   1, 2, 3        (1.8, 2.3)
Cluster 2   4, 5, 6, 7     (4.1, 5.4)
Example
Individual   Distance to mean (centroid) of Cluster 1   Distance to mean (centroid) of Cluster 2
1            1.5                                        5.4
2            0.4                                        4.3
3            2.1                                        1.8
4            5.7                                        1.8
5            3.2                                        0.7
6            3.8                                        0.6
7            2.8                                        1.1
54
But we cannot yet be sure that each individual has been assigned
to the right cluster. So, we compare each individual’s distance to
its own cluster mean and to
that of the opposite cluster.
Example
 The iterative relocation would now continue from this new
partition until no more relocations occur. However, in this
example each individual is now nearer its own cluster mean than
that of the other cluster and the iteration stops, choosing the
latest partitioning as the final cluster solution.
55
            Individuals       Mean Vector (centroid)
Cluster 1   1, 2              (1.3, 1.5)
Cluster 2   3, 4, 5, 6, 7     (3.9, 5.1)
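A minimal sketch of the k-means loop (Lloyd's algorithm) on the seven subjects, starting from the same initial centroids used in the example; the helper names are mine:

```python
# Subjects 1-7 and the two initial means, as in the example above.
points = [(1.0, 1.0), (1.5, 2.0), (3.0, 4.0), (5.0, 7.0),
          (3.5, 5.0), (4.5, 5.0), (3.5, 4.5)]
centroids = [(1.0, 1.0), (5.0, 7.0)]

def sq_dist(a, b):
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

for _ in range(10):                                   # iterate until stable
    # Assignment step: each point goes to its nearest centroid.
    clusters = [[], []]
    for p in points:
        nearest = min(range(2), key=lambda j: sq_dist(p, centroids[j]))
        clusters[nearest].append(p)
    # Update step: recompute each centroid as the mean of its cluster.
    new_centroids = [tuple(sum(c) / len(cl) for c in zip(*cl)) for cl in clusters]
    if new_centroids == centroids:
        break
    centroids = new_centroids

print(clusters)     # subjects {1,2} vs {3,4,5,6,7}
print(centroids)    # approximately (1.25, 1.5) and (3.9, 5.1)
```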
Applications
 Clustering helps marketers improve their customer base
and work on the target areas. It helps group people
(according to different criteria such as willingness,
purchasing power, etc.) based on their similarity in many
ways related to the product under consideration.
 Clustering helps in the identification of groups of houses on
the basis of their value, type and geographical location.
 Clustering is used to study earthquakes. Based on the
areas hit by an earthquake in a region, clustering can
help analyse the next probable location where an
earthquake can occur.
56
Random Forest
 Contents
 Random Forest Introduction
 Pseudocode
 Prediction Pseudocode
 Example
 Random Forest vs Decision Tree
 Advantages
 Disadvantages
 Application
57
Random Forest
 The random forest algorithm is a supervised classification and
regression algorithm.
 It randomly creates a forest with several trees.
58
Random Forest pseudocode
 Randomly select "k" features from the total "m" features,
where k << m.
 Among the "k" features, calculate the node "d" using
the best split point.
 Split the node into daughter nodes using the best split.
 Repeat steps 1 to 3 until "l" number of nodes has been
reached.
 Build the forest by repeating steps 1 to 4 "n" times
to create "n" trees.
59
Prediction pseudocode
To perform prediction, the trained random forest
algorithm uses the pseudocode below:
 Take the test features and use the rules of each
randomly created decision tree to predict the outcome,
and store the predicted outcome (target).
 Calculate the votes for each predicted target.
 Consider the highest-voted predicted target as the final
prediction from the random forest algorithm.
60
Example
61
Day Outlook Humidity Wind Play
D1 Sunny High Weak Yes
D2 Sunny High Strong No
D3 Overcast High Weak Yes
D4 Rain High Weak Yes
D5 Rain Normal Weak Yes
D6 Rain Normal Strong No
D7 Overcast Normal Strong Yes
D8 Sunny High Weak No
D9 Sunny Normal Weak Yes
D10 Rain Normal Weak Yes
D11 Sunny Normal Strong Yes
D12 Overcast High Strong Yes
D13 Overcast Normal Weak Yes
D14 Rain High Strong No
Example
 Will the game happen if the weather conditions are
Outlook = Rain, Humidity = High, Wind = Weak?
Play = ?
 Step 1: divide the data into smaller subsets.
 Step 2: the subsets need not be distinct; some
subsets may overlap.
62
63
[Figure: three small decision trees grown on the subsets {D1,D2,D3},
{D3,D4,D5,D6} and {D7,D8,D9}, splitting on Humidity and Wind. For
Outlook = Rain, Humidity = High, Wind = Weak, two trees predict Play and
one predicts No Play, so the majority vote is Play.]
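A minimal scikit-learn sketch for this example: it one-hot encodes the play-tennis table from the Example slide and predicts the query (Outlook = Rain, Humidity = High, Wind = Weak). The exact votes depend on the forest's random seed, but the prediction is expected to agree with the slide's majority vote (Play):

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# The 14-row table from the Example slide.
data = pd.DataFrame({
    "Outlook":  ["Sunny", "Sunny", "Overcast", "Rain", "Rain", "Rain", "Overcast",
                 "Sunny", "Sunny", "Rain", "Sunny", "Overcast", "Overcast", "Rain"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Wind":     ["Weak", "Strong", "Weak", "Weak", "Weak", "Strong", "Strong",
                 "Weak", "Weak", "Weak", "Strong", "Strong", "Weak", "Strong"],
    "Play":     ["Yes", "No", "Yes", "Yes", "Yes", "No", "Yes",
                 "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

X = pd.get_dummies(data[["Outlook", "Humidity", "Wind"]])   # one-hot encode
y = data["Play"]

forest = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)

query = pd.get_dummies(
    pd.DataFrame({"Outlook": ["Rain"], "Humidity": ["High"], "Wind": ["Weak"]})
).reindex(columns=X.columns, fill_value=0)

print(forest.predict(query))   # each tree votes; the majority is returned
```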
Advantages
 Random forest is considered a highly accurate and
robust method.
 It is much less prone to overfitting than a single decision tree.
 The algorithm can be used in both classification and
regression problems.
 Random forests can also handle missing values.
 You can get the relative feature importance, which helps
in selecting the most contributing features for the
classifier.
64
Disadvantages
 It can take longer than expected to compute a large
number of trees.
 The model is difficult to interpret compared to a
decision tree.
65
Random forest vs Decision
Trees
 A random forest is a set of multiple decision trees.
 Deep decision trees may suffer from overfitting, but
a random forest reduces overfitting by building trees on
random subsets.
 Decision trees are computationally faster.
 A random forest is difficult to interpret, while a decision
tree is easily interpretable and can be converted to
rules.
66
Applications
 Banking
 Medicine
 Stock Market
 E-Commerce
67
68