Machine Learning
Various ways to evaluate a machine learning model’s performance
February 26, 2020
Mala Deep Upadhaya
Table of Contents
1. Warm Up With Concept
2. Confusion matrix
3. Accuracy
4. Precision
5. Recall
6. Specificity
7. F1 score
8. Precision-Recall (PR) curve
9. ROC (Receiver Operating Characteristic) curve
10. PR vs ROC curve
Warm Up With Concept
• We are dealing with a binary classification problem.
Example: deciding whether an image shows a cat or a dog,
or whether a patient has cancer (positive) or is healthy (negative).
Source: https://towardsdatascience.com/decoding-the-confusion-matrix-bb4801decbb
Confusion Matrix
• True positives (TP): predicted positive and actually positive.
• False positives (FP): predicted positive but actually negative.
• True negatives (TN): predicted negative and actually negative.
• False negatives (FN): predicted negative but actually positive.
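A minimal sketch of counting these four quantities by hand in Python, with labels encoded as 1 (positive) and 0 (negative); the label vectors are invented for illustration (scikit-learn's confusion_matrix does the same thing in one call).

```python
# Toy labels: 1 = positive (e.g. cancer), 0 = negative (healthy) -- invented for illustration.
y_true = [1, 0, 1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 0, 1]

# Count the four cells of the confusion matrix.
tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=1 TN=3 FN=1
```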
Accuracy
• The most commonly used metric to judge a model.
• Not always a clear indicator of performance.
Example: a cancer detection model
• The chance of having cancer is very low. Say that out of 100 patients, 90 don't have cancer and the remaining 10 do.
• We don't want to miss a patient who has cancer but goes undetected (a false negative).
• Predicting everyone as not having cancer gives an accuracy of 90% straight away.
The model did nothing useful here; it simply returned "cancer free" for all 100 predictions.
It gets worse when classes are imbalanced.
Formula: Accuracy = (TP + TN) / (TP + TN + FP + FN)
We surely need better alternatives.
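A minimal sketch of the 90/10 example above; the "model" is a dummy that predicts "no cancer" for every patient.

```python
# 100 patients: 90 healthy (0) and 10 with cancer (1), as in the example above.
y_true = [0] * 90 + [1] * 10
# A useless "model" that declares every patient cancer free.
y_pred = [0] * 100

accuracy = sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
print(accuracy)  # 0.9 -- 90% accuracy while missing every single cancer patient
```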
Precision
• Gives the percentage of truly positive instances out of the total predicted positive instances.
Formula: Precision = TP / (TP + FP)
The denominator is the total number of instances the model predicted as positive in the whole dataset.
Use it when you want to answer: how often is the model right when it says it is right?
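A minimal sketch of the formula above, reusing the hand-counted TP and FP from the confusion-matrix example (the counts are illustrative only).

```python
def precision(tp, fp):
    # Of everything the model called positive, how much was truly positive?
    return tp / (tp + fp) if (tp + fp) else 0.0

print(precision(tp=3, fp=1))  # 0.75
```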
Recall/Sensitivity/True Positive Rate
• Gives the percentage of correctly predicted positive instances out of the total actual positive instances.
Formula: Recall = TP / (TP + FN)
The denominator is the actual number of positive instances present in the dataset.
Use it when you want to answer: how many of the actual positives did the model miss?
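A minimal sketch of the formula above, again with illustrative counts.

```python
def recall(tp, fn):
    # Of all actual positives, how many did the model catch?
    return tp / (tp + fn) if (tp + fn) else 0.0

print(recall(tp=3, fn=1))  # 0.75
```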
Specificity
• Gives the percentage of correctly predicted negative instances out of the total actual negative instances.
• Like recall, but the focus shifts to the negative instances.
Formula: Specificity = TN / (TN + FP)
The denominator is the actual number of negative instances present in the dataset.
In the cancer example, it tells us how many healthy patients were correctly told they don't have cancer.
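A minimal sketch of the formula above, with illustrative counts.

```python
def specificity(tn, fp):
    # Of all actual negatives, how many were correctly called negative?
    return tn / (tn + fp) if (tn + fp) else 0.0

print(specificity(tn=3, fp=1))  # 0.75
```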
F1 score
• It is the harmonic mean of precision and recall, so the higher the F1 score, the better.
• A model does well on F1 if its positive predictions are actually positive (precision) and it does not miss positives by predicting them as negative (recall).
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
One drawback: precision and recall are given equal importance, but depending on the application we may need one to be higher than the other, so the F1 score may not be the exact metric for it.
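A minimal sketch of the harmonic mean; the second call uses made-up values to show how one weak component drags the score down.

```python
def f1(precision, recall):
    # Harmonic mean: pulled toward the smaller of the two values.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f1(0.75, 0.75))  # 0.75
print(f1(0.90, 0.10))  # ~0.18 -- one weak component drags the score down
```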
PR curve
• It is the curve of precision against recall for various threshold values.
• In the figure, six predictors are shown with their respective precision-recall curves for various threshold values.
• The top-right part of the graph is the ideal region, where we get both high precision and high recall.
• PR AUC is just the area under the curve.
How do we choose the predictor and the threshold value? Based on the nature of our application.
Source: https://github.com/MenuPolis/MLT/wiki/PR-Curve
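A minimal sketch of tracing a PR curve and its area with scikit-learn (an assumed dependency; the slides show no code, and the labels and scores below are made up).

```python
from sklearn.metrics import auc, precision_recall_curve

# Made-up true labels and predicted probabilities for the positive class.
y_true  = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print("PR AUC:", auc(recall, precision))  # area under the precision-recall curve
```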
ROC curve
• ROC stands for Receiver Operating Characteristic.
• The graph plots TPR against FPR for various threshold values.
• As TPR increases, FPR also increases.
• E.g. among the four categories shown in the figure, we want the threshold value that leads us closest to the top-left corner.
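A minimal sketch of sweeping threshold values to get (FPR, TPR) points, in plain Python with made-up scores; it illustrates the trade-off described above.

```python
# Made-up true labels and scores for the positive class.
y_true  = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]

for thr in [0.2, 0.4, 0.6, 0.8]:
    y_pred = [1 if s >= thr else 0 for s in y_score]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    tpr = tp / (tp + fn)  # recall / sensitivity
    fpr = fp / (fp + tn)  # 1 - specificity
    print(f"threshold={thr}: TPR={tpr:.2f}, FPR={fpr:.2f}")  # lowering the threshold raises both
```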
ROC curve
• Comparing different predictors (here three) on a given dataset also becomes easy, as the figure on the right shows; one can choose the threshold according to the application at hand.
• ROC AUC is just the area under the curve; the higher its numerical value, the better.
Source: https://en.wikipedia.org/wiki/Receiver_operating_characteristic
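A minimal sketch of comparing two predictors by ROC AUC with scikit-learn; the two score vectors stand in for two hypothetical models and are invented for illustration.

```python
from sklearn.metrics import roc_auc_score

y_true   = [0, 0, 1, 0, 1, 1, 0, 1]
scores_a = [0.1, 0.4, 0.35, 0.8, 0.65, 0.9, 0.2, 0.7]    # hypothetical model A
scores_b = [0.5, 0.6, 0.55, 0.4, 0.7, 0.8, 0.45, 0.75]   # hypothetical model B

# The predictor whose ROC curve encloses more area ranks positives above negatives more often.
print("model A ROC AUC:", roc_auc_score(y_true, scores_a))
print("model B ROC AUC:", roc_auc_score(y_true, scores_b))
```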
PR vs ROC curve
• Which one to use, PR or ROC?
The answer lies in TRUE NEGATIVES.
PR vs ROC curve
• Because TN does not appear in the precision or recall equations, PR curves are useful for imbalanced classes.
• Class imbalance here means the negative class is in the majority.
• Since the metric largely ignores the high number of TRUE NEGATIVES from the majority negative class, it offers better resistance to the imbalance.
This is important when detecting the positive class matters most.
• E.g. detecting cancer patients, where there is a strong class imbalance because very few of all diagnosed patients actually have it. We certainly don't want a person with cancer to go undetected (recall), and we want to be sure that anyone flagged as positive actually has it (precision).
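A minimal sketch contrasting ROC AUC with PR AUC (average precision) on a heavily imbalanced toy dataset, using scikit-learn and NumPy; the data and the "model" scores are invented for illustration.

```python
import numpy as np
from sklearn.metrics import average_precision_score, roc_auc_score

rng = np.random.default_rng(0)
# 1000 patients, only 2% positive -- a strong class imbalance.
y_true = np.array([1] * 20 + [0] * 980)
# A mediocre scorer: positives get somewhat higher scores on average.
y_score = np.concatenate([rng.normal(0.7, 0.15, 20), rng.normal(0.4, 0.15, 980)])

print("ROC AUC:", roc_auc_score(y_true, y_score))            # looks reassuringly high
print("PR AUC :", average_precision_score(y_true, y_score))  # much lower -- exposes the false positives among the majority class
```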
PR vs ROC curve
• Because the ROC equation takes TN, i.e. the negative class, into account, it is useful when both classes are important to us.
• For example, detecting cats and dogs: counting true negatives ensures both classes are given importance, as with the output of a CNN deciding whether an image shows a cat or a dog.
Conclusion
• Confusion matrix: representation of the true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN) in a matrix format.
• Accuracy: gets worse as an indicator when classes are imbalanced.
• Precision: answers how often the model is right when it says it is right.
• Recall: answers how many of the actual positives the model missed.
• Specificity: like recall, but the focus shifts to the negative instances.
• F1 score: the harmonic mean of precision and recall; the higher the F1 score, the better.
• Precision-Recall (PR) curve: the curve of precision against recall for various threshold values.
• ROC curve: the graph of TPR against FPR for various threshold values.
The evaluation metric to use depends heavily on the task at hand.
References
• https://towardsdatascience.com/various-ways-to-evaluate-a-machine-learning-models-performance-230449055f15
