20. 14 in total
● 3 incorrect
● 11 correct
80% accuracy (11 / 14, rounded)
(only 6 percentage points worse!)
21. Accuracy
● Easy to understand
● Misleading in cases of class imbalance (our case: only about 20% are spam)
● 86% seems quite good, until compared to the baseline's 80%
Is there anything else?
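The comparison between the model and the dummy baseline can be checked with a few lines of Python. This is a minimal sketch: the two label lists are an assumed reconstruction that matches the counts on the slides (14 messages, 3 of them spam, 2 model mistakes), and the `accuracy` helper is a hypothetical name, not something from the talk.

```python
# Assumed toy data consistent with the slides: 1 = spam, 0 = not spam.
actual    = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0]


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the actual label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Dummy baseline: always predict "not spam".
baseline = [0] * len(actual)

model_acc = accuracy(actual, predicted)      # 12 correct out of 14
baseline_acc = accuracy(actual, baseline)    # 11 correct out of 14

print(f"model:    {model_acc:.0%}")
print(f"baseline: {baseline_acc:.0%}")
```

With only 3 spam messages out of 14, a model that never flags anything already gets 11 of 14 right, which is why accuracy alone says little here.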
23. Precision
Among the messages predicted as spam, how many are actually spam?
[Figure: the 14 messages shown as icons in two groups, Actual and Prediction]
28. Precision
Among the messages predicted as spam, how many are actually spam?
[Figure: the Prediction icons, with the 3 messages predicted as spam marked; 2 of them are actual spam]
Precision = 2 / 3 ≈ 67%
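The precision figure can be reproduced by counting true and false positives. A minimal sketch, assuming the same reconstructed label lists (they match the slide's counts but the exact positions are an assumption):

```python
# Assumed reconstruction of the example: 1 = spam, 0 = not spam.
actual    = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0]

# True positives: predicted as spam and actually spam.
tp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 1)
# False positives: predicted as spam but actually not spam.
fp = sum(1 for a, p in zip(actual, predicted) if p == 1 and a == 0)

# Precision only looks at the messages that were predicted as spam.
precision = tp / (tp + fp)
print(f"Precision = {tp} / {tp + fp} = {precision:.0%}")
```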
30. Recall
Among all the actual spam messages, how many are classified as spam?
[Figure: the 14 messages shown as icons in two groups, Actual and Prediction]
34. Recall
Among all the actual spam messages, how many are classified as spam?
[Figure: the Prediction icons, with the 3 actual spam messages marked; 2 of them were predicted as spam]
Recall = 2 / 3 ≈ 67%
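Recall uses the same true positives but divides by all actual spam messages, including the ones the model missed. A minimal sketch, again assuming reconstructed label lists that match the slide's counts:

```python
# Assumed reconstruction of the example: 1 = spam, 0 = not spam.
actual    = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]
predicted = [0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 1, 0]

# True positives: actual spam that was predicted as spam.
tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
# False negatives: actual spam that the model left in the inbox.
fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)

# Recall looks at every actual spam message, wherever it ended up.
recall = tp / (tp + fn)
print(f"Recall = {tp} / {tp + fn} = {recall:.0%}")
```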
36. Precision vs Recall
Recall:
Fraction of all spam messages that ended up in the spam folder.
To measure it, we check both the inbox and the spam folder.
Precision:
Fraction of messages in the spam folder that are actually spam.
To measure it, we check only the spam folder.
38. Baseline
Dummy baseline
(always predict not spam)
R = 0 / 3
P = 0 / 0
Home task:
Check these numbers!
Do it for two baselines:
● Always predict "spam"
● Always predict "no spam"
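One way to check the baseline numbers is a small helper that guards against division by zero. This is a sketch under assumptions: `precision_recall` is a hypothetical helper, the label list is a reconstruction matching the slide's counts, and returning `None` for an undefined ratio is a design choice made here, not something the slides prescribe.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall); a value is None when its denominator is 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) > 0 else None
    recall = tp / (tp + fn) if (tp + fn) > 0 else None
    return precision, recall


# Assumed toy data: 14 messages, 3 of them spam (1 = spam, 0 = not spam).
actual = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 0, 0, 0]

# Dummy baseline: always predict "not spam".
always_not_spam = [0] * len(actual)

p, r = precision_recall(actual, always_not_spam)
print("P =", p)  # nothing was predicted as spam, so precision is 0 / 0
print("R =", r)  # 0 of the 3 spam messages were caught
```

Passing `[1] * len(actual)` instead gives the numbers for the "always predict spam" baseline.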
39. Summary
● Accuracy may be misleading in cases of class imbalance
● Precision: fraction of messages in the spam folder that are actually spam
● Recall: fraction of all spam messages that correctly ended up in the spam folder
40. mlbookcamp.com
● Learn Machine Learning by doing projects
● http://bit.ly/mlbookcamp
● Get 40% off with code "grigorevpc"
Machine Learning Bookcamp