5. Need for Performance Metrics
✦How do you rank machine learning algorithms?
✦How can you pick one algorithm over another?
✦How do you measure and compare these algorithms?
6. Need for Performance Metrics
✦A performance metric is the answer to these questions.
✦It helps measure and compare algorithms.
7. Performance Metrics
“Numbers have an important story to tell. They rely on you to give them a voice.”
- Stephen Few
8. Performance Metrics: Assess Machine Learning Algorithms
Machine learning models are evaluated against your selected performance metrics.
Performance metrics help evaluate the efficiency and accuracy of machine learning models.
9. Key Methods of Performance Metrics
✦Confusion Matrix
✦Accuracy
✦Precision
✦Recall
✦Specificity
✦F1 Score
10. Meaning of Confusion Matrix
                         Actual
                         Positives (1)   Negatives (0)
Predicted Positives (1)       TP              FP
Predicted Negatives (0)       FN              TN
One of the most intuitive and easiest tools used to find the correctness and accuracy of a model.
It is not a performance measure in itself, but almost all performance metrics are based on the confusion matrix.
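A minimal Python sketch of how these four counts are tallied from actual and predicted labels; the sample lists are illustrative assumptions, not data from the slides:

    # Tally the four confusion matrix cells from two label lists.
    actual    = [1, 1, 0, 0, 1, 0, 0, 1]   # ground-truth classes (assumed sample)
    predicted = [1, 0, 0, 1, 1, 0, 0, 0]   # model outputs (assumed sample)

    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)

    print(tp, fp, fn, tn)   # 2 1 2 3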
11. Confusion Matrix : Example
Cancer Prediction System
There are different approaches that can help the center predict cancer.
Let me introduce you to one of the easiest tools that can help you evaluate a model that predicts whether a person has cancer: the confusion matrix.
12. Confusion Matrix : Classification Problem
How to predict if a person has cancer?
Give a label / class to the target variable:
1: when a person is diagnosed with cancer
0: when a person does not have cancer
13. Confusion Matrix : Classification Problem
                         Actual
                         Positives (1)   Negatives (0)
Predicted Positives (1)       TP              FP
Predicted Negatives (0)       FN              TN
The same set of classes appears along both dimensions: actual and predicted.
15. True Positive
True positives are the cases where the actual class of the data point is 1 (true) and the predicted value is also 1 (true).
The case where a person has cancer and the model classifies the case as cancer positive comes under true positive.
16. True Negative
True negatives are the cases when the actual class of the data point is 0 (false) and the predicted value is also 0 (false). It is negative because the class predicted was negative.
The case where a person does not have cancer and the model classifies the case as cancer negative comes under true negative.
17. False Positive
False positives are the cases when the actual class of the data point is 0 (false) and the predicted value is 1 (true). It is false because the model has predicted incorrectly, and positive because the class predicted was positive.
The case where a person does not have cancer and the model classifies the case as cancer positive comes under false positive.
18. False Negative
• False negatives are the cases when the actual class of the data point is 1 (true) and the predicted value is 0 (false).
• It is false because the model has predicted incorrectly.
• It is negative because the class predicted was negative.
The case where a person has cancer and the model classifies the case as cancer negative comes under false negative.
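To make the four definitions concrete, here is a small illustrative Python helper; the function name outcome and the sample call are assumptions for demonstration, not part of the original material:

    def outcome(actual, predicted):
        # Name the confusion matrix cell for one (actual, predicted) pair.
        if actual == 1 and predicted == 1:
            return "TP"   # has cancer, classified as cancer positive
        if actual == 0 and predicted == 0:
            return "TN"   # no cancer, classified as cancer negative
        if actual == 0 and predicted == 1:
            return "FP"   # no cancer, but classified as cancer positive
        return "FN"       # has cancer, but classified as cancer negative

    print(outcome(1, 0))   # FN: the missed cancer patient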
19. Minimize False Cases
What should be minimised?
✦A model is best identified by its accuracy.
✦No universal rule defines which false cases to minimise.
✦It depends on the business requirements and the context of the problem.
20. Minimize False Negative : Example
Out of 100 people, the actual cancer patients = 5.
A bad model predicts everyone as non-cancerous, so its accuracy = 95%.
A false negative occurs when a person who has cancer is classified as non-cancerous.
Missing a cancer patient would be a huge mistake.
21. Minimize False Positive : Example
The model needs to classify an email as spam or ham (the term used for genuine email).
Assign a label / class to the target variable:
1: email is spam
0: email is not spam
22. Minimize False Positive : Example
The model classifies each incoming mail as spam or ham.
In the case of a false positive, an important email is marked as spam.
! The business stands a chance to miss an important communication.
An important email marked as spam is more business-critical than a spam email diverted to the inbox.
25. Accuracy : Example
When do we use accuracy?
When the target variable classes in the data are nearly balanced.
26. Accuracy : Example
The machine learning model will have approximately 97% accuracy in any new predictions.
27. Accuracy : Example
When do you NOT use accuracy?
When the target variable classes in the data are overwhelmingly of one class.
5 out of 100 people have cancer. A bad model predicts every case as non-cancerous:
it classifies the 95 non-cancerous patients correctly and the 5 cancerous patients as non-cancerous.
The accuracy of the model is 95%, yet it never detects cancer.
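A short Python sketch of this accuracy trap, built from the numbers above (the code itself is an illustrative assumption):

    # 5 of 100 people have cancer; the bad model predicts everyone as negative.
    actual    = [1] * 5 + [0] * 95
    predicted = [0] * 100

    accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
    print(accuracy)   # 0.95 -> 95% accuracy, despite missing every cancer patient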
28. Precision
• In measurement, precision refers to the closeness of two or more measurements to each other.
• In classification, it is the proportion of positive identifications that are actually correct.
30. Precision : Example
When do we use precision?
5 out of 100 people have cancer. It is a bad model that predicts every case as cancer:
everyone has been predicted as having cancer, so the precision of the model is 5%.
31. Recall or Sensitivity
Recall or sensitivity measures the proportion of actual positives that are correctly identified.
33. Recall or Sensitivity : Example
When do we use recall?
5 out of 100 people have cancer, and the model predicts every case as cancer.
Recall is 100%, while the precision of the model is 5%.
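The same scenario as a Python sketch (an illustrative assumption), using precision = TP / (TP + FP) and recall = TP / (TP + FN):

    # 5 of 100 people have cancer; the model predicts every case as cancer.
    tp, fp, fn = 5, 95, 0

    precision = tp / (tp + fp)   # 5 / 100 = 0.05 -> 5%
    recall    = tp / (tp + fn)   # 5 / 5   = 1.00 -> 100%
    print(precision, recall)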
34. Recall as a Measure
When do we use precision and when do we use recall?
Precision is about being precise, whereas recall is about capturing all the cases.
35. Recall as a Measure
When do we use precision and when do we use recall?
If the model flags only one case as cancer positive and that case is correct, it is 100% precise.
36. Recall as a Measure
When do we use precision and when do we use recall?
If the model flags every case as cancer positive, you have 100% recall.
37. Recall as a Measure
When do we use precision and when do we use recall?
To focus on minimising false negatives, you would want 100% recall with a good precision score.
38. Recall as a Measure
When do we use precision and when do we use recall?
To focus on minimising false positives, you should aim for 100% precision.
39. Specificity
• Measures the proportion of actual negatives that are correctly identified.
• Estimates the probability of a negative prediction when given a truly negative example.
41. Specificity : Example
5 out of 100 people have cancer, and the model predicts every case as cancer.
Specificity is 0%. In this sense, specificity is the exact opposite of recall: it is recall for the negative class.
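Sketching the same numbers in Python (an illustrative assumption), with specificity = TN / (TN + FP):

    # Predict-everything-as-cancer model: no negatives are ever predicted.
    tn, fp = 0, 95

    specificity = tn / (tn + fp)   # 0 / 95 = 0.0 -> 0%
    print(specificity)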
42. F1 Score
Do you have to carry both precision and recall in your pockets every time you make a model to solve a classification problem?
No. To avoid juggling both precision and recall, it is best to get a single score (F1 score) that can represent both precision (P) and recall (R).
43. F1 Score : Calculation
                       Actual
                       Fraud   Not Fraud
Predicted Fraud          3        97
Predicted Not Fraud      0         0
F1 Score = (2 * Precision * Recall) / (Precision + Recall)
44. F1 Score : Example
When do you use F1 score? Fraud detection.
97 out of 100 credit card transactions are legitimate and 3 are fraud, and the model predicts everything as fraud.
45. F1 Score : Example
Precision = 3 / 100 = 3%
Recall = 3 / 3 = 100%
Arithmetic Mean = (3% + 100%) / 2 = 51.5%
46. Harmonic Mean
• The harmonic mean of x and y equals their arithmetic mean only when x and y are equal.
• When x and y differ, the harmonic mean is smaller, pulled toward the lower value.
With reference to the fraud detection example, the F1 score can be calculated as
F1 Score = (2 * Precision * Recall) / (Precision + Recall) = (2 * 3% * 100%) / (3% + 100%) ≈ 5.8%
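Running the fraud example through both means in a short Python sketch (illustrative; the percentages above are used as fractions):

    precision, recall = 0.03, 1.00   # fraud example: 3% precision, 100% recall

    arithmetic = (precision + recall) / 2                # 0.515  -> 51.5%
    f1 = 2 * precision * recall / (precision + recall)   # ~0.058 -> ~5.8%
    print(arithmetic, round(f1, 3))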
47. Key Takeaways
✦The confusion matrix is used to find the correctness and accuracy of machine learning models. It is used for classification problems where the output can be one of two or more types of classes.
✦Accuracy is the number of correct predictions made by the model over all predictions made.
✦Precision refers to the closeness of two or more measurements to each other.
✦Recall measures the proportion of actual positives that are identified correctly.
✦Specificity measures the proportion of actual negatives that are identified correctly.
✦F1 Score gives a single score that represents both precision (P) and recall (R).
✦The harmonic mean is used when the sample data contains extreme values because it is more balanced than the arithmetic mean.
Editor's Notes
So many algorithms around. How do you decide which one is best?
Cancer research
No model is 100% accurate, so to get closer to accuracy we have to minimise the errors in the false cases.