3. Outline
• What is Machine Learning?
• Main Types of Learning
• Model Validation, Selection, and Evaluation
• Applied Machine Learning Process
• Cautions
5. –Arthur Samuel (1959)
“Field of study that gives computers the ability
to learn without being explicitly programmed.”
6. –Tom Mitchell (1988)
“A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.”
9. Programming
“Given a specification of a function f,
implement f that meets the specification.”
Machine Learning
“Given example (x, y) pairs, induce f such
that y = f(x) for given pairs and generalizes
well for unseen x”
–Peter Norvig (2014)
10. Why is Machine Learning so hard?
http://veronicaforand.com/
36. Calculating Conditional Probability
• Probability that I eat bread for breakfast, P(A), is 0.6.
• Probability that I eat steak for lunch, P(B), is 0.5.
• Given I eat steak for lunch, the probability that I eat bread
for breakfast, P(A | B), is 0.7.
• What is P(B | A)?
• What about when A and B are independent?
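A minimal sketch that applies Bayes' rule, P(B | A) = P(A | B) · P(B) / P(A), to the numbers on this slide. The helper name `bayes` is hypothetical. Note that P(A | B) = 0.7 differs from P(A) = 0.6, so with these numbers A and B are not independent. The second call shows what the independent case would give: P(A | B) = P(A), and therefore P(B | A) = P(B).

```python
def bayes(p_a: float, p_b: float, p_a_given_b: float) -> float:
    """Bayes' rule: P(B | A) = P(A | B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a


P_A = 0.6          # P(A): bread for breakfast
P_B = 0.5          # P(B): steak for lunch
P_A_GIVEN_B = 0.7  # P(A | B): bread for breakfast, given steak for lunch

p_b_given_a = bayes(P_A, P_B, P_A_GIVEN_B)
print(round(p_b_given_a, 4))  # 0.5833 (= 7/12)

# Independent case: P(A | B) = P(A), so P(B | A) collapses to P(B).
p_b_given_a_indep = bayes(P_A, P_B, P_A)
print(round(p_b_given_a_indep, 4))  # 0.5
```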
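37. Naive Bayes
(Diagram: class node Ck with arrows to attribute nodes A1, A2, A3, …, An.)
$$P(C_k \mid A_1, \dots, A_n) = \frac{P(C_k)\, P(A_1, \dots, A_n \mid C_k)}{P(A_1, \dots, A_n)}$$
With the (naive) assumption that the attributes are independent given the class, we then have
$$P(C_k \mid A_1, \dots, A_n) \propto P(C_k) \prod_{i=1}^{n} P(A_i \mid C_k)$$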
38. Naive Bayes
No. Content Spam?
1 Party Yes
2 Sale Discount Yes
3 Party Sale Discount Yes
4 Python Party No
5 Python Programming No
39. Naive Bayes
P(Spam | Party, Programming) ∝ P(Spam) * P(Party | Spam) * P(Programming | Spam)
P(NotSpam | Party, Programming) ∝ P(NotSpam) * P(Party | NotSpam) * P(Programming | NotSpam)
We want to decide whether “Party Programming” is spam.
To do so, we need:
P(Spam), P(NotSpam)
P(Party | Spam), P(Party | NotSpam)
P(Programming | Spam), P(Programming | NotSpam)
40. Naive Bayes
No. Content Spam?
1 Party Yes
2 Sale Discount Yes
3 Party Sale Discount Yes
4 Python Party No
5 Python Programming No
P(Spam) = ? P(NotSpam) = ?
P(Party | Spam) = ? P(Party | NotSpam) = ?
P(Programming | Spam) = ? P(Programming | NotSpam) = ?
41. Naive Bayes
No. Content Spam?
1 Party Yes
2 Sale Discount Yes
3 Party Sale Discount Yes
4 Python Party No
5 Python Programming No
P(Spam) = 3/5 P(NotSpam) = 2/5
P(Party | Spam) = 2/3 P(Party | NotSpam) = 1/2
P(Programming | Spam) = 0 P(Programming | NotSpam) = 1/2
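A small sketch that recomputes these estimates from the five-message table and scores “Party Programming”. It uses exact fractions so the numbers match the slide. As on the slide, P(word | class) is taken as the fraction of that class's messages containing the word, and only the words in the query are multiplied in. The names `DATA`, `prior`, `likelihood`, and `score` are illustrative, not from any library.

```python
from fractions import Fraction

# Training data from the slide: (set of words in the message, is_spam)
DATA = [
    ({"Party"}, True),
    ({"Sale", "Discount"}, True),
    ({"Party", "Sale", "Discount"}, True),
    ({"Python", "Party"}, False),
    ({"Python", "Programming"}, False),
]


def prior(label: bool) -> Fraction:
    """P(class): fraction of messages with this label."""
    return Fraction(sum(1 for _, y in DATA if y == label), len(DATA))


def likelihood(word: str, label: bool) -> Fraction:
    """P(word | class): fraction of this class's messages that contain the word."""
    docs = [words for words, y in DATA if y == label]
    return Fraction(sum(1 for words in docs if word in words), len(docs))


def score(words, label: bool) -> Fraction:
    """Unnormalized Naive Bayes score: P(class) * prod P(word | class)."""
    s = prior(label)
    for w in words:
        s *= likelihood(w, label)
    return s


query = ["Party", "Programming"]
spam = score(query, True)        # 3/5 * 2/3 * 0
not_spam = score(query, False)   # 2/5 * 1/2 * 1/2
print(spam, not_spam)            # 0 1/10
print("Spam" if spam > not_spam else "Not spam")  # Not spam
```

Because no spam message contains “Programming”, the spam score is exactly zero, so the classifier answers “Not spam”.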
43. Decision Tree
Outlook
• Sunny → Humidity
  • High → No
  • Normal → Yes
• Overcast → Yes
• Rain → Wind
  • Strong → No
  • Weak → Yes
Day Outlook Temp Humidity Wind Play
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Mild High Strong Yes
D4 Rain Cool Normal Strong No
Play tennis?
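A minimal sketch of how the tree above classifies a day. It walks from the root, following the branch that matches the day's attribute value, until it reaches a Yes/No leaf. The nested-tuple encoding and the `classify` helper are assumptions made for illustration. The code only applies a given tree; it does not show how the tree is learned.

```python
# The tree from the slide: an internal node is (attribute, {value: subtree}); a leaf is "Yes"/"No".
TREE = ("Outlook", {
    "Sunny": ("Humidity", {"High": "No", "Normal": "Yes"}),
    "Overcast": "Yes",
    "Rain": ("Wind", {"Strong": "No", "Weak": "Yes"}),
})

# The four days from the table.
DAYS = {
    "D1": {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak", "Play": "No"},
    "D2": {"Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Strong", "Play": "No"},
    "D3": {"Outlook": "Overcast", "Temp": "Mild", "Humidity": "High", "Wind": "Strong", "Play": "Yes"},
    "D4": {"Outlook": "Rain", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
}


def classify(tree, example: dict) -> str:
    """Follow branches matching the example's attribute values until a leaf label is reached."""
    while not isinstance(tree, str):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree


for day, row in DAYS.items():
    print(day, "predicted:", classify(TREE, row), "actual:", row["Play"])
```

Temp is never tested by this tree. Days D1 and D2 reach the same leaf through Humidity, even though their Wind values differ.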
51. Should I recommend “The Last Witch Hunter” to Roofimon? (User-Based)
Movies (in column order): The Hunger Games, Warcraft: The Beginning, The Good Dinosaur, The Last Witch Hunter
Kan 5 4 1 3
Roofimon 5 4 3 ?
Juacompe 1 3 3
John 4 1
Find the user most similar to Roofimon.
What should the rating be?
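A sketch of the user-based idea under stated assumptions. Similarity is measured with cosine similarity over co-rated movies; the slide does not name a measure. The predicted rating is simply the most similar user's rating. Only Kan's and Roofimon's rows are included, because the extracted slide does not show which movies Juacompe's and John's ratings belong to. `cosine` and `predict_from_nearest` are hypothetical helper names.

```python
import math

MOVIES = ["The Hunger Games", "Warcraft: The Beginning", "The Good Dinosaur", "The Last Witch Hunter"]

# Rows with unambiguous columns; None marks a missing rating.
# Juacompe and John are omitted: their column positions are not recoverable from the slide.
RATINGS = {
    "Kan": [5, 4, 1, 3],
    "Roofimon": [5, 4, 3, None],
}


def cosine(u, v) -> float:
    """Cosine similarity computed only over movies rated by both users (an assumed measure)."""
    pairs = [(a, b) for a, b in zip(u, v) if a is not None and b is not None]
    dot = sum(a * b for a, b in pairs)
    norm_u = math.sqrt(sum(a * a for a, _ in pairs))
    norm_v = math.sqrt(sum(b * b for _, b in pairs))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def predict_from_nearest(target: str, movie_idx: int):
    """Copy the rating of the most similar user who has rated the movie (1-nearest neighbor)."""
    candidates = [
        (cosine(RATINGS[target], ratings), name)
        for name, ratings in RATINGS.items()
        if name != target and ratings[movie_idx] is not None
    ]
    sim, neighbor = max(candidates)
    return neighbor, sim, RATINGS[neighbor][movie_idx]


neighbor, sim, rating = predict_from_nearest("Roofimon", MOVIES.index("The Last Witch Hunter"))
print(neighbor, round(sim, 3), rating)
```

With more users, a common refinement is a similarity-weighted average over several neighbors instead of copying a single neighbor's rating.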
52. Should I recommend “The Last Witch Hunter” to Roofimon? (Item-Based)
Movies (in column order): The Hunger Games, Warcraft: The Beginning, The Good Dinosaur, The Last Witch Hunter
Kan 5 4 1 3
Roofimon 5 4 3 ?
Juacompe 1 3 3
John 4 1
Find the item most similar to The Last Witch Hunter.
What should the rating be?
53. Should I recommend “The Last Witch Hunter” to Roofimon? (Matrix Factorization)
Movies (in column order): The Hunger Games, Warcraft: The Beginning, The Good Dinosaur, The Last Witch Hunter
Roofimon 5 4 3 ?
User Scary Kiddy
Roofimon 2 5
Movie Scary Kiddy
The Last Witch Hunter (TLWH) 3/4 1/4
Predicted rating = (2 × 3/4) + (5 × 1/4) = 2.75
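The prediction step above is a dot product between the user's latent-factor vector and the movie's latent-factor vector. This sketch uses the factor values given on the slide. How those factors are learned (for example by gradient descent or alternating least squares) is not shown here. The dictionary layout and the `predict` name are illustrative assumptions.

```python
from fractions import Fraction

# Latent factors from the slide (factor names: Scary, Kiddy).
USER_FACTORS = {"Roofimon": {"Scary": 2, "Kiddy": 5}}
MOVIE_FACTORS = {"TLWH": {"Scary": Fraction(3, 4), "Kiddy": Fraction(1, 4)}}


def predict(user: str, movie: str) -> Fraction:
    """Predicted rating = dot product of the user's and the movie's factor vectors."""
    u, m = USER_FACTORS[user], MOVIE_FACTORS[movie]
    return sum(u[f] * m[f] for f in u)


print(float(predict("Roofimon", "TLWH")))  # 2.75
```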
66. How to Avoid Overfitting and Underfitting
• Using more data does NOT always help.
• Recommendations:
  • find a good number of features;
  • perform cross-validation;
  • use regularization when overfitting is found.
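A minimal k-fold cross-validation sketch in plain Python, illustrating the second recommendation. Each fold is held out once while the model is fit on the rest, and the held-out errors are averaged. Assumptions: folds are contiguous and unshuffled, the error is mean squared error, and the two toy models (a constant "mean" predictor and a least-squares line) plus all helper names are hypothetical examples, not from the slides.

```python
from typing import List, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n-1 into k contiguous folds; each fold serves as the test set once."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        splits.append((train, test))
        start += size
    return splits


def cross_validate(xs, ys, fit, predict, k: int = 5) -> float:
    """Mean squared error on held-out folds, averaged over the k folds."""
    fold_errors = []
    for train, test in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq = [(predict(model, xs[i]) - ys[i]) ** 2 for i in test]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / len(fold_errors)


# Two toy models: a constant (underfits) and a least-squares line.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model


def fit_line(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


xs = list(range(10))
ys = [2 * x + 1 for x in xs]
print(cross_validate(xs, ys, fit_mean, predict_mean, k=5))  # large held-out error
print(cross_validate(xs, ys, fit_line, predict_line, k=5))  # about 0
```

The held-out error exposes the constant model's underfitting, which its training error alone might not make obvious. On real data, shuffling before splitting is usually advisable.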
76. Applied Machine Learning Process
http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/
77. Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000)
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining
84. Some Cautions
• Curse of dimensionality
• Correlation does NOT imply causation.
• Learn many models, not just ONE.
• More data beats a clever algorithm.
• Data alone are not enough.
A Few Useful Things to Know about Machine Learning, Pedro Domingos (2012)
http://newventurist.com/
86. Machine Learning and Feature Representation
Input → Feature Representation → Learning Algorithm
Feature engineering is the key.
87. Garbage In - Garbage Out
http://blog.marksgroup.net/2013/05/zoho-crm-garbage-in-garbage-out-its.html
88. Example of Feature Engineering
Width (m) Length (m) Cost (baht)
100 100 1,200,000
500 50 1,300,000
100 80 1,000,000
400 100 1,500,000
Are these raw features good for modeling the cost of the land?
Area (m²) Cost (baht)
10,000 1,200,000
25,000 1,300,000
8,000 1,000,000
40,000 1,500,000
Engineered feature: area = width × length.
The relationship to cost looks clearer with this feature.
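A short sketch of the engineered feature. It derives area = width × length from the slide's four rows, then checks each candidate feature with one simple test: whether cost rises strictly as the feature rises. The monotonicity check and the helper names `area` and `cost_rises_with` are illustrative assumptions; the slide does not prescribe a particular test.

```python
# (width_m, length_m, cost_baht) rows from the slide
PLOTS = [
    (100, 100, 1_200_000),
    (500, 50, 1_300_000),
    (100, 80, 1_000_000),
    (400, 100, 1_500_000),
]


def area(width: int, length: int) -> int:
    """Engineered feature: plot area in square meters."""
    return width * length


def cost_rises_with(feature) -> bool:
    """True if, sorted by the feature, both the feature and the cost strictly increase."""
    pairs = sorted((feature(w, l), cost) for w, l, cost in PLOTS)
    return all(f1 < f2 and c1 < c2 for (f1, c1), (f2, c2) in zip(pairs, pairs[1:]))


print([area(w, l) for w, l, _ in PLOTS])     # [10000, 25000, 8000, 40000]
print(cost_rises_with(lambda w, l: w))       # False: width alone
print(cost_rises_with(lambda w, l: l))       # False: length alone
print(cost_rises_with(area))                 # True: area
```

Neither width nor length alone orders the costs, but their product does. This is the sense in which the engineered feature "looks better" for modeling cost.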