2. Confusion Matrix
Rows are the predicted Y; columns are the real Y.
• True positive (predicted Class 1, real Y positive): predict positive, and it is positive → correct / true prediction
• False positive (predicted Class 1, real Y negative): predict positive, but it is negative → incorrect / false prediction (Type I error)
• False negative (predicted Class 0, real Y positive): predict negative, but it is positive → incorrect / false prediction (Type II error)
• True negative (predicted Class 0, real Y negative): predict negative, and it is negative → correct / true prediction
4. Most popular classification metrics
• Accuracy = (true positives + true negatives) / all data
  The proportion of predictions y that exactly match the real y. Best is 1.0.
• Misclassification rate = 1 − accuracy = (false positives + false negatives) / all data
  The proportion of predictions that do not match the real y. Best is 0.
• Precision = true positives / (true positives + false positives)
  How many selected items are relevant? Best is 1.0.
• Recall (AKA sensitivity, hit rate) = true positives / (true positives + false negatives)
  How many relevant items are selected? Best is 1.0.
• F-1 score = 2 × (precision × recall) / (precision + recall)
  A measure that balances precision and recall. Best is 1.0.
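The metrics above can be sketched directly from the four confusion-matrix counts. A minimal example, with made-up counts (tp, fp, fn, tn) chosen only for illustration:

```python
# Made-up confusion-matrix counts for illustration.
tp, fp, fn, tn = 40, 10, 5, 45
total = tp + fp + fn + tn  # all data

accuracy = (tp + tn) / total                  # 0.85
misclassification_rate = (fp + fn) / total    # 0.15, equals 1 - accuracy
precision = tp / (tp + fp)                    # 0.8
recall = tp / (tp + fn)                       # 40/45 ~ 0.889
f1 = 2 * precision * recall / (precision + recall)  # ~ 0.842
```

Note that accuracy and misclassification rate always sum to 1, while precision and recall each ignore one of the two correct cells (tn plays no role in either).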
5. Predictions and errors
(Confusion matrix as in slide 2.)
Precision = true positives / (true positives + false positives)
  How many selected items are relevant? When it predicts yes, how often is it correct?
Recall = true positives / (true positives + false negatives)
  How many relevant items are selected? When it's actually yes, how often does it predict yes?
https://en.wikipedia.org/wiki/Precision_and_recall
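To see the two questions side by side, here is a minimal sketch that counts tp, fp, and fn from paired label lists (the toy labels are made up for illustration):

```python
# Toy labels, made up for illustration: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # 2
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # 1
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # 2

precision = tp / (tp + fp)  # 2/3: when it predicts yes, how often is it correct?
recall = tp / (tp + fn)     # 2/4: when it's actually yes, how often does it predict yes?
```

Here the model selected 3 items, of which 2 are relevant (precision 2/3), but there were 4 relevant items in total, of which it found only 2 (recall 1/2).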
7. Consider this example
The mafia syndicate makes sure to get the right person for the family. When in doubt, reject. Only accept when absolutely sure. Does the Don look for high precision or high recall?
8.
(Confusion matrix as in slide 2.)
Class 1: the Don's family. Class 0: not in the family.
Willing to err on the False Negative side.
Precision = true positives / (true positives + false positives)
Recall = true positives / (true positives + false negatives)
So… precision OR recall?
9.
(Confusion matrix as in slide 2.)
Class 1: the Don's family. Class 0: not in the family.
Willing to err on the False Negative side.
Precision = true positives / (true positives + false positives)
Recall = true positives / (true positives + false negatives)
Precision OR recall? The two error types trade off: when False Positive is high, False Negative is low, and vice versa. The best case here is that False Negative makes up ALL of the misclassification rate → False Positive is 0.
Misclassification rate = (false positives + false negatives) / all data
11. Therefore…
• It is about… PRECISION!
Precision = true positives / (true positives + false positives)
How many selected items are relevant? When it predicts yes, how often is it correct?
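One common way to favor precision, as the Don does, is to raise the decision threshold so the model only "accepts" when it is very confident. A minimal sketch with made-up scores (the `scores` values and `precision_at` helper are illustrative, not from the slides):

```python
# Made-up ground truth and model confidence scores for illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.95, 0.80, 0.55, 0.60, 0.30, 0.10]

def precision_at(threshold):
    """Precision when predicting positive only for scores >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, pred) if t == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 1.0

precision_at(0.5)  # 0.75: one false positive slips into the family
precision_at(0.9)  # 1.0: fewer accepted, but everyone accepted belongs
```

Raising the threshold trims away false positives first, which is exactly what "when in doubt, reject" asks for, at the cost of turning some true positives into false negatives.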
12. Consider another example
You want to marry, so you go purchase a $$$ ring. The credit card company rejects your purchase, saying it "might be" fraud, and calls you to verify. Does the credit card company look for high precision or high recall?
13.
(Confusion matrix as in slide 2.)
Class 1: fraud. Class 0: not fraud.
Willing to err on the False Positive side.
Precision = true positives / (true positives + false positives)
Recall = true positives / (true positives + false negatives)
So… precision OR recall?
14.
(Confusion matrix as in slide 2.)
Class 1: fraud. Class 0: not fraud.
Willing to err on the False Positive side.
Precision = true positives / (true positives + false positives)
Recall = true positives / (true positives + false negatives)
Precision OR recall? The two error types trade off: when False Positive is high, False Negative is low, and vice versa. The best case here is that False Positive makes up ALL of the misclassification rate → False Negative is 0.
Misclassification rate = (false positives + false negatives) / all data
15. Therefore…
• It is about… RECALL!
Recall = true positives / (true positives + false negatives)
How many relevant items are selected? When it's actually yes, how often does it predict yes?
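Mirroring the precision example, favoring recall usually means lowering the threshold so anything that "might be" fraud gets flagged. A minimal sketch with made-up scores (the `scores` values and `recall_at` helper are illustrative, not from the slides):

```python
# Made-up ground truth (1 = fraud) and fraud-model scores for illustration.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.90, 0.60, 0.20, 0.40, 0.30, 0.10]

def recall_at(threshold):
    """Recall when flagging as fraud only scores >= threshold."""
    pred = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, pred) if t == 1 and p == 0)
    return tp / (tp + fn)

recall_at(0.5)   # 2/3: one fraud case is missed
recall_at(0.15)  # 1.0: every fraud is flagged, at the cost of false alarms
```

Lowering the threshold converts false negatives into true positives, which is what the card company wants: an annoying verification call (false positive) is cheap, a missed fraud (false negative) is expensive.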