Overview of Machine Learning
Part 02
DT, RF, KNN, Clustering
Evaluation of models
How do you measure whether your model is performing well?
Classification: Accuracy, Precision, Recall, F1-score, Confusion Matrix
Regression: MAE, MSE, RMSE, R-squared
Mobile Price Range Prediction Dataset
Accuracy
accuracy = correct predictions / total predictions = 6/9 = 0.667, or 66.7% accuracy
Confusion Matrix

Actual     Predicted
Positive   Positive  = True Positive (TP)
Positive   Negative  = False Negative (FN)
Negative   Positive  = False Positive (FP)
Negative   Negative  = True Negative (TN)
Confusion Matrix

Actual    Predicted
Cat       Cat
Cat       Not Cat
Cat       Cat
Not Cat   Not Cat
Not Cat   Cat
Not Cat   Not Cat
Not Cat   Not Cat
Cat       Cat
Not Cat   Cat
Not Cat   Not Cat
Cat       Not Cat
Cat       Cat
Not Cat   Not Cat

                 Predicted Cat   Predicted Not Cat
Actual Cat            4                 2
Actual Not Cat        2                 5
Performance Quiz
Can you tell the accuracy from the
confusion matrix?
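As a check, here is a minimal sketch of the Cat / Not Cat example above, assuming scikit-learn is available; it reproduces the matrix and computes the accuracy asked for in the quiz.

```python
# Minimal sketch, assuming scikit-learn; labels are plain strings.
from sklearn.metrics import confusion_matrix, accuracy_score

actual    = ["Cat", "Cat", "Cat", "Not Cat", "Not Cat", "Not Cat", "Not Cat",
             "Cat", "Not Cat", "Not Cat", "Cat", "Cat", "Not Cat"]
predicted = ["Cat", "Not Cat", "Cat", "Not Cat", "Cat", "Not Cat", "Not Cat",
             "Cat", "Cat", "Not Cat", "Not Cat", "Cat", "Not Cat"]

# Rows = actual, columns = predicted; label order fixed to [Cat, Not Cat]
print(confusion_matrix(actual, predicted, labels=["Cat", "Not Cat"]))
# [[4 2]
#  [2 5]]

# Accuracy = (TP + TN) / total = (4 + 5) / 13
print(accuracy_score(actual, predicted))  # 0.6923...
```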
Precision
Of all the samples the model predicts as positive, how many are actually positive?
Precision = TP / (TP + FP)

                  Predicted
                  Positive   Negative
Actual Positive   TP         FN
Actual Negative   FP         TN
Recall
Of all the samples that were actually positive, how many did the model catch as positive?
Recall = TP / (TP + FN)

                  Predicted
                  Positive   Negative
Actual Positive   TP         FN
Actual Negative   FP         TN
F1 Score
● Why is the F1 score better than accuracy, precision, or recall alone?
Terrorist Detection Model —> Accuracy? (with so few real positives, a model that predicts "not terrorist" for everyone still scores near-perfect accuracy)
In a model with TP=40, FP=1, FN=20, TN=39 —> Precision? (40/41 ≈ 0.98, even though 20 positives were missed)
In a model with TP=40, FP=20, FN=1, TN=39 —> Recall? (40/41 ≈ 0.98, even though there were 20 false alarms)
● Why the harmonic average instead of the normal average?
Let P = 99 and R = 20:
(P + R)/2 = 59.5, but F1 = 2PR/(P + R) = 33.277
The harmonic average stays low unless both precision and recall are high.
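The arithmetic above in a short plain-Python sketch, so the quiz values can be checked directly; P and R are kept in percent as on the slide.

```python
# Precision, recall, and F1 from raw counts, plus the harmonic-mean effect.
def precision(tp, fp):
    return tp / (tp + fp)

def recall(tp, fn):
    return tp / (tp + fn)

def f1(p, r):
    return 2 * p * r / (p + r)   # harmonic mean of precision and recall

print(precision(40, 1))   # TP=40, FP=1  -> 40/41 = 0.9756...
print(recall(40, 1))      # TP=40, FN=1  -> 40/41 = 0.9756...

p, r = 99, 20
print((p + r) / 2)        # 59.5   (arithmetic mean looks fine)
print(f1(p, r))           # 33.277 (harmonic mean exposes the low recall)
```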
Decision Tree
● Programmatically → it is a giant structure of nested if-else conditions (see the sketch below)
● Mathematically → it uses hyperplanes to cut the coordinate system into regions
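For instance, a tree trained on the Gender/Occupation table below could compile down to this hypothetical nested if-else (a sketch, not any library's actual output):

```python
# Hypothetical illustration: the learned tree is just nested if-else
# conditions over the features.
def suggest(gender, occupation):
    if occupation == "Student":
        return "PUBG"
    else:                          # Programmer
        if gender == "M":
            return "Whatsapp"
        else:
            return "Github"

print(suggest("F", "Programmer"))  # Github
```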
Based on Gender (female / male branches):

Gender   Occup.       Sugges.
F        Student      PUBG
F        Programmer   Github
F        Programmer   Github

Gender   Occup.       Sugges.
M        Programmer   Whatsapp
M        Student      PUBG
M        Student      PUBG

Based on Occupation (student / programmer branches):

Gender   Occup.       Sugges.
F        Student      PUBG
M        Student      PUBG
M        Student      PUBG

Gender   Occup.       Sugges.
F        Programmer   Github
M        Programmer   Whatsapp
F        Programmer   Github
Entropy
Measure of purity/impurity of a node:
E = -Σ pᵢ log₂ pᵢ  (summed over the classes i in the node)
Entropy of each child node (log base 2):

Gender = F: {PUBG, Github, Github}
E = -⅓ log₂ ⅓ - ⅔ log₂ ⅔ = 0.918

Gender = M: {Whatsapp, PUBG, PUBG}
E = -⅓ log₂ ⅓ - ⅔ log₂ ⅔ = 0.918

Occupation = Student: {PUBG, PUBG, PUBG}
E = -3/3 log₂ 3/3 = 0

Occupation = Programmer: {Github, Whatsapp, Github}
E = -⅓ log₂ ⅓ - ⅔ log₂ ⅔ = 0.918
Calculating using Information Gain
Information Gain measures the quality of a split.
● Step-1: Calculate the entropy of the parent
E(Parent) = -1/6 log₂ 1/6 - 2/6 log₂ 2/6 - 3/6 log₂ 3/6 = 1.459
● Step-2: Calculate the entropy of the children
[done above]
● Step-3: Calculate the weighted information I of the children
I(Gender) = (3/6 × 0.918) + (3/6 × 0.918) = 0.918
I(Occupation) = (3/6 × 0) + (3/6 × 0.918) = 0.459
● Step-4: Calculate the gain for each split
gain(Gender) = E(Parent) - I(Gender) = 1.459 - 0.918 = 0.541
gain(Occupation) = E(Parent) - I(Occupation) = 1.459 - 0.459 = 1.0
Occupation gives the higher gain, so it is the better attribute to split on first.
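A minimal plain-Python sketch of the whole calculation, to verify the numbers above (log base 2, standard library only):

```python
# Entropy and information gain for the Gender vs Occupation split.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def gain(parent, children):
    n = len(parent)
    info = sum(len(c) / n * entropy(c) for c in children)  # weighted info I
    return entropy(parent) - info

parent = ["PUBG", "PUBG", "PUBG", "Github", "Github", "Whatsapp"]
print(entropy(parent))                       # 1.459

female = ["PUBG", "Github", "Github"]        # children of the Gender split
male   = ["Whatsapp", "PUBG", "PUBG"]
student    = ["PUBG", "PUBG", "PUBG"]        # children of the Occupation split
programmer = ["Github", "Whatsapp", "Github"]

print(gain(parent, [female, male]))          # 0.541
print(gain(parent, [student, programmer]))   # 1.0
```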
Overfitting & Underfitting
Random Forest
● Wisdom of the Crowd
the collective opinion of a diverse, independent group of individuals
Examples: IMDb ratings, democracy
● Ensemble Learning
a collection of multiple machine learning models.
An ensemble method requires variation. Ways to bring variation:
1) Using different models
2) Using the same model but different datasets
3) Mixing both of the above
Types of Ensemble Learning
Voting: the same dataset (Dataset1) is fed to several different models (DT, LgR, SVM), and their predictions are combined by vote.
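A minimal sketch of hard voting, assuming scikit-learn is available and using the built-in iris data as a stand-in for Dataset1:

```python
# Hard voting: DT, LgR and SVM all see the same data; majority vote wins.
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier()),
                ("lgr", LogisticRegression(max_iter=1000)),
                ("svm", SVC())],
    voting="hard",                 # majority vote over predicted class labels
)
clf.fit(X, y)
print(clf.predict(X[:3]))
```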
Types of Ensemble Learning
Stacking: gives priority to the models that are more accurate. The base models (DT, LgR, SVM) are trained on Dataset1, and a meta-model assigns each of them a weight (w0, w1, w2).
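A minimal stacking sketch under the same assumptions (scikit-learn, iris as stand-in data); the meta-model here is a logistic regression whose coefficients play the role of the weights:

```python
# Stacking: base models feed their predictions to a meta-model.
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
stack = StackingClassifier(
    estimators=[("dt", DecisionTreeClassifier()), ("svm", SVC())],
    final_estimator=LogisticRegression(max_iter=1000),  # learns the weights
)
stack.fit(X, y)
print(stack.predict(X[:3]))
```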
Types of Ensemble Learning
Bagging (Bootstrap Aggregation): the dataset is resampled into Dataset1, Dataset2, and Dataset3, and the same model type (e.g. DT) is trained on each resample.
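A minimal bagging sketch under the same assumptions; note the `estimator` parameter name assumes scikit-learn 1.2+:

```python
# Bagging: the same model type on bootstrap resamples, aggregated by vote.
from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
bag = BaggingClassifier(
    estimator=DecisionTreeClassifier(),
    n_estimators=3,       # three resampled datasets, three trees
    bootstrap=True,       # sample with replacement (bootstrap)
)
bag.fit(X, y)
print(bag.score(X, y))
```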
Types of Ensemble Learning
Boosting: models (e.g. DTs) are trained one after another on the dataset, each focusing on the mistakes of the one before it.
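A minimal boosting sketch under the same assumptions (scikit-learn 1.2+, iris as stand-in data), using AdaBoost as one concrete boosting algorithm:

```python
# Boosting: sequential weak learners, each reweighting the previous errors.
from sklearn.datasets import load_iris
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
boost = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1),  # weak learner (a stump)
    n_estimators=50,
)
boost.fit(X, y)
print(boost.score(X, y))  # training accuracy of the boosted ensemble
```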
Why Do Ensemble Techniques Work?
Random Forest
If all the models in bagging are decision trees, then it's a random forest (which typically also considers a random subset of features at each split).
Out of Bag (OOB) Evaluation
Out-of-bag samples: samples that are never picked in any bootstrap sample.
Dataset = {1,2,3,4,5,6,7,8,9}
● DT1 = {1,3,2,5,6}
● DT2 = {2,9,6,5,2}
● DT3 = {4,1,2,9,4}
Samples 7 and 8 are never used; they are out-of-bag samples. Mathematically, about 37% of samples end up OOB: the chance a sample is never drawn in n draws with replacement is (1 - 1/n)^n, which tends to 1/e ≈ 0.37.
They are used as a validation set, because they were never seen by the model.
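A minimal sketch tying bagging, random forest, and OOB evaluation together, assuming scikit-learn; `oob_score=True` scores each sample only with the trees that never saw it in their bootstrap sample:

```python
# Random forest with out-of-bag scoring: a free validation estimate.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
rf = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=42)
rf.fit(X, y)
print(rf.oob_score_)   # accuracy measured on the out-of-bag samples
```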
K-Nearest Neighbour
Eager learning: a model that learns from the training data and makes predictions using the knowledge gained during training.
Examples: Linear Regression, Logistic Regression, Decision Tree, Random Forest

Lazy learning: a model that doesn't learn from the training data up front and uses the training data only while making predictions.
Examples: Naive Bayes, K-Nearest Neighbor
KNN
If a student misses class, as the teacher, whom will you ask about the absence? The classmates sitting nearest to that student. KNN makes predictions the same way, using two ingredients: Distance and Voting.
KNN
Training dataset: see the table in Step-1 below.
Testing sample: Height = 172 cm and Weight = 70 kg; Class = ?
Step-1: Calculate the distance from the test sample to every training sample

Height   Weight   Class         Distance                                Nearest?
160      55       Athlete       sqrt((172-160)² + (70-55)²) = 19.21     4
170      65       Athlete       sqrt((172-170)² + (70-65)²) = 5.38      1
175      75       Non-Athlete   sqrt((172-175)² + (70-75)²) = 5.83      2
180      85       Non-Athlete   sqrt((172-180)² + (70-85)²) = 17        3
Step-2: Select the K nearest examples and assign the most common class
● K=1 → class = Athlete
● K=2 → class = Tie
● K=3 → class = Non-Athlete
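A minimal sketch of this worked example in plain Python (standard library only): it ranks the training samples by Euclidean distance and votes among the K nearest.

```python
# KNN by hand: Euclidean distance + majority voting.
from collections import Counter
from math import dist

train = [((160, 55), "Athlete"), ((170, 65), "Athlete"),
         ((175, 75), "Non-Athlete"), ((180, 85), "Non-Athlete")]
test = (172, 70)

# Step-1: sort training samples by distance to the test sample
ranked = sorted(train, key=lambda s: dist(s[0], test))

# Step-2: take the K nearest and vote
for k in (1, 2, 3):
    votes = Counter(label for _, label in ranked[:k])
    print(k, votes.most_common())
# K=1 -> Athlete; K=2 -> 1-1 tie; K=3 -> Non-Athlete wins 2-1
```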
How to resolve a tie?
● Reduce the value of K
● Use weighted voting based on distance
● Use a tiebreaker rule: select the class that occurs most frequently in the entire dataset (the global majority class)
Distance Measure
● Euclidean Distance
● Manhattan Distance
● Minkowski Distance
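A small sketch for intuition: Minkowski distance generalises the other two through the exponent r (r=2 gives Euclidean, r=1 gives Manhattan). The point values are reused from the KNN example above.

```python
# Minkowski distance; r=2 is Euclidean, r=1 is Manhattan.
def minkowski(p, q, r):
    return sum(abs(a - b) ** r for a, b in zip(p, q)) ** (1 / r)

p, q = (172, 70), (170, 65)
print(minkowski(p, q, 2))   # Euclidean: 5.385
print(minkowski(p, q, 1))   # Manhattan: 7.0
```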
Clustering
Welcome to unsupervised learning: learning by observation.
Types of clustering
● Hierarchical Clustering: Agglomerative (bottom-up), Divisive (top-down)
● Density-Based Clustering: DBSCAN
● Grid-Based Clustering
● Partitioning-Based Clustering: K-means, PAM, CLARA
K-means algorithm

    v1    v2
1   1.0   1.0
2   1.5   2.0
3   3.0   4.0
4   1.0   3.0
5   3.5   5.0
6   4.5   5.0
7   3.5   4.5
K-means Clustering
Step-1: Take any k samples as the initial centroids of your clusters.
Let k1 = (1, 1) and k2 = (1.5, 2)
K-means clustering
Step-2: Calculate the distance from each centroid to every sample

Point        Distance from k1 (1,1)   Distance from k2 (1.5, 2)
(1, 1)       0                        1.11
(1.5, 2)     1.11                     0
(3, 4)       3.6                      2.5
(1, 3)       2                        1.11
(3.5, 5)     4.71                     3.6
(4.5, 5)     5.31                     4.24
(3.5, 4.5)   4.3                      3.2
K-means clustering
Step-3: For each point, find the nearest centroid and assign the point to that cluster

Point        Distance from k1 (1,1)   Distance from k2 (1.5, 2)   Assigned cluster
(1, 1)       0                        1.11                        K1
(1.5, 2)     1.11                     0                           K2
(3, 4)       3.6                      2.5                         K2
(1, 3)       2                        1.11                        K2
(3.5, 5)     4.71                     3.6                         K2
(4.5, 5)     5.31                     4.24                        K2
(3.5, 4.5)   4.3                      3.2                         K2
K-means clustering
Now the two clusters are: {(1, 1)} and
{(1.5, 2), (3, 4), (1, 3), (3.5, 5), (4.5, 5), (3.5, 4.5)}
K-means clustering
Step-4: Calculate a new centroid for each cluster, which is the average of all the samples in that cluster.
For the first cluster, K1 = (1, 1)
For the second cluster,
K2 = ((1.5+3+1+3.5+4.5+3.5)/6, (2+4+3+5+5+5)/6) = (2.83, 4)
K-means clustering
Repeat steps 2-4 with the new centroids, until the centroids no longer change.

Point        Distance from k1 (1,1)   Distance from k2 (2.83, 4)   Assigned cluster
(1, 1)       0                        3.51                         K1
(1.5, 2)     1.11                     2.4                          K1
(3, 4)       3.6                      0.17                         K2
(1, 3)       2                        2.08                         K1
(3.5, 5)     4.71                     1.2                          K2
(4.5, 5)     5.31                     1.94                         K2
(3.5, 4.5)   4.3                      0.83                         K2
K-means clustering
Now the two clusters are: {(1, 1), (1.5, 2), (1, 3)} and
{(3, 4), (3.5, 5), (4.5, 5), (3.5, 4.5)}
K-means Clustering
New centroids:
for the first cluster: ((1+1.5+1)/3, (1+2+3)/3) = (1.17, 2)
for the second cluster: ((3+3.5+4.5+3.5)/4, (4+5+5+4.5)/4) = (3.62, 4.62)

Do it yourself: run the same process again with the new centroids and see whether the centroids change any more.
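As a check, here is a minimal sketch of the same run, assuming scikit-learn is available: k=2 with the same two starting centroids from Step-1.

```python
# K-means on the 7 points above, starting from the Step-1 centroids.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.5, 2.0], [3.0, 4.0], [1.0, 3.0],
              [3.5, 5.0], [4.5, 5.0], [3.5, 4.5]])
init = np.array([[1.0, 1.0], [1.5, 2.0]])   # k1 and k2 from Step-1

km = KMeans(n_clusters=2, init=init, n_init=1).fit(X)
print(km.labels_)            # cluster assignment of each point
print(km.cluster_centers_)   # converged centroids, ~ (1.17, 2) and (3.62, 4.62)
```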