Machine Learning at Geeky Base 2

Machine Learning
http://www.bigdata-madesimple.com/
Kan Ouivirach
Geeky Base (2015)

About Me
Research & Development Engineer

www.kanouivirach.com
Kan Ouivirach

Outline
• What is Machine Learning?
• Main Types of Learning
• Model Validation, Selection, and Evaluation
• Applied Machine Learning Process
• Cautions

–Arthur Samuel (1959)
“Field of study that gives computers the ability
to learn without being explicitly programmed.”

–Tom Mitchell (1997)
“A computer program is said to learn from
experience E with respect to some class of
tasks T and performance measure P, if its
performance at tasks in T, as measured by P,
improves with experience E.”

Statistics vs. Data Mining vs. Machine Learning vs. …?

Programming vs. Machine Learning?

Programming
“Given a specification of a function f,
implement f that meets the specification.”
Machine Learning
“Given example (x, y) pairs, induce f such
that y = f(x) for given pairs and generalizes
well for unseen x”
–Peter Norvig (2014)

Why is Machine Learning so hard?
http://veronicaforand.com/

http://www.thinkgeek.com/product/f0ba/
What do you see?
While the computer sees this:
11111110
11100101
00001010

Machine Learning and Feature Representation
Input → Feature Representation → Learning Algorithm

Dog and Cat?
http://thisvsthatshow.com/

Applications of Machine Learning
• Search Engines
• Medical Diagnosis
• Object Recognition
• Stock Market Analysis
• Credit Card Fraud Detection
• Speech Recognition
• etc.

Recommendation System on Amazon.com

http://www.npr.org/sections/money/2011/11/15/142366953/the-tuesday-podcast-from-harvard-economist-to-casino-ceo
Caesars Entertainment Corporation
Gary Loveman

God’s Eye
Fast & Furious 7
http://www.standbyformindcontrol.com/2015/04/furious-7-gets-completely-untethered/

PREDictive POLicing - type of crime, place of crime, and time of crime
http://www.predpol.com/

Speech Recognition from Microsoft

Robot Localization
https://github.com/mjl/particle_filter_demo

Classification
Regression
Similarity Matching
Clustering
Co-Occurrence Grouping
Profiling
Link Prediction
Data Reduction
Causal Modeling

Main Types of Learning
• Supervised Learning
• Unsupervised Learning
• Reinforcement Learning

Supervised Learning
y = f(x)
Given x, y pairs, find a function f that will map
new x to a proper y.

Supervised Learning Problems
• Regression
• Classification

http://thisvsthatshow.com/
Classification

k-Nearest Neighbors
http://bdewilde.github.io/blog/blogger/2012/10/26/classification-of-hand-written-digits-3/

Perceptron
One or more inputs, a processor, and a single output
Input 0, Input 1 → Processor → Output

Perceptron Algorithm
Inputs: 12 and 4
Weights: 0.5 and -1
Processor: (12 x 0.5) + (4 x -1) = 2
Output: sign(2) = +1
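The arithmetic on this slide can be reproduced in a few lines of Python. This is a minimal sketch: the function names `weighted_sum` and `sign` are illustrative, and treating a sum of exactly zero as +1 is an assumption the slide does not address.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight and add up the results."""
    return sum(x * w for x, w in zip(inputs, weights))


def sign(value):
    """Return +1 for non-negative values, -1 otherwise (zero -> +1 is an assumption)."""
    return 1 if value >= 0 else -1


inputs = [12, 4]
weights = [0.5, -1]

total = weighted_sum(inputs, weights)  # (12 x 0.5) + (4 x -1)
output = sign(total)
print(total, output)
```

The processor is just a weighted sum followed by a threshold; learning means finding weights that make this output match the labels.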

Perceptron’s Goal
https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
w0x0 + w1x1

How Perceptron Learning Works
https://datasciencelab.wordpress.com/2014/01/10/machine-learning-classics-the-perceptron/
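One common way to describe perceptron learning is the mistake-driven update rule: whenever a training example is misclassified, nudge every weight toward the correct label. The sketch below assumes that rule, a constant bias input x0 = 1, a learning rate of 1, and a toy AND-gate dataset; none of these details come from the slides.

```python
def predict(weights, x):
    """Return +1 if the weighted sum is non-negative, else -1."""
    s = sum(w * xi for w, xi in zip(weights, x))
    return 1 if s >= 0 else -1


def train_perceptron(samples, learning_rate=1, max_epochs=1000):
    """Apply the perceptron update rule until an epoch has no mistakes."""
    weights = [0] * len(samples[0][0])
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in samples:
            if predict(weights, x) != y:
                # Misclassified: move the weights toward the correct label.
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                mistakes += 1
        if mistakes == 0:
            break
    return weights


# Toy data (assumed): AND gate, x = (bias, x1, x2), label in {-1, +1}.
samples = [
    ((1, 0, 0), -1),
    ((1, 0, 1), -1),
    ((1, 1, 0), -1),
    ((1, 1, 1), +1),
]

weights = train_perceptron(samples)
predictions = [predict(weights, x) for x, _ in samples]
print(weights, predictions)
```

Because the toy data is linearly separable, the loop stops after a handful of epochs; on data that is not separable it would simply run until `max_epochs`.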

Let’s implement k-Nearest Neighbors!
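A minimal sketch of one possible implementation, assuming Euclidean distance and a simple majority vote. The toy points and the names `euclidean` and `knn_predict` are illustrative and not from the slides.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two points of equal dimension."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def knn_predict(training_data, query, k=3):
    """Return the majority label among the k training points closest to query."""
    by_distance = sorted(training_data, key=lambda item: euclidean(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]


# Toy data (assumed): (features, label)
training_data = [
    ((1.0, 1.0), "cat"),
    ((1.0, 2.0), "cat"),
    ((2.0, 1.0), "cat"),
    ((5.0, 5.0), "dog"),
    ((6.0, 5.0), "dog"),
    ((6.0, 6.0), "dog"),
]

print(knn_predict(training_data, (1.5, 1.5)))
print(knn_predict(training_data, (5.5, 5.5)))
```

There is no training step: the "model" is the stored data, and all the work happens at prediction time.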

Probability Theory
https://seisanshi.wordpress.com/tag/probability/

Calculating Conditional Probability
• Probability that I eat bread for breakfast, P(A), is 0.6.
• Probability that I eat steak for lunch, P(B), is 0.5.
• Given I eat steak for lunch, the probability that I eat bread
for breakfast, P(A | B), is 0.7.
• What is P(B | A)?
• What about when A and B are independent?
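The two questions above can be worked through with Bayes' rule. A short sketch; the variable names are illustrative.

```python
p_a = 0.6          # P(A): bread for breakfast
p_b = 0.5          # P(B): steak for lunch
p_a_given_b = 0.7  # P(A | B)

# Bayes' rule: P(B | A) = P(A | B) * P(B) / P(A)
p_b_given_a = p_a_given_b * p_b / p_a
print(round(p_b_given_a, 4))

# If A and B were independent, conditioning would change nothing:
# P(A | B) = P(A) and P(B | A) = P(B).
p_b_given_a_if_independent = p_b
print(p_b_given_a_if_independent)
```

With the numbers given, P(A | B) = 0.7 differs from P(A) = 0.6, so A and B are not independent in this example.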

Naive Bayes
Ck → A1, A2, A3, …, An
P(Ck | A1, …, An) = P(Ck) * P(A1, …, An | Ck) / P(A1, …, An)
With the independence assumption, we then have
P(Ck | A1, …, An) ∝ P(Ck) * Prod P(Ai | Ck)

Naive Bayes
No. Content Spam?
1 Party Yes
2 Sale Discount Yes
3 Party Sale Discount Yes
4 Python Party No
5 Python Programming No

Naive Bayes
No. Content Spam?
1 Party Yes
2 Sale Discount Yes
3 Party Sale Discount Yes
4 Python Party No
5 Python Programming No
P(Spam) = ? P(NotSpam) = ?
P(Party | Spam) = ? P(Party | NotSpam) = ?
P(Programming | Spam) = ? P(Programming | NotSpam) = ?

Naive Bayes
No. Content Spam?
1 Party Yes
2 Sale Discount Yes
3 Party Sale Discount Yes
4 Python Party No
5 Python Programming No
P(Spam) = 3/5 P(NotSpam) = 2/5
P(Party | Spam) = 2/3 P(Party | NotSpam) = 1/2
P(Programming | Spam) = 0 P(Programming | NotSpam) = 1/2

Naive Bayes
P(Spam | Party, Programming) ∝ 3/5 * 2/3 * 0 = 0
P(NotSpam | Party, Programming) ∝ 2/5 * 1/2 * 1/2 = 0.1
P(NotSpam | Party, Programming) > P(Spam | Party, Programming)
“Party Programming” is NOT spam.
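The numbers above can be recomputed from the five example emails. This sketch estimates P(word | class) as the fraction of that class's emails containing the word and uses no smoothing, which matches the slide's arithmetic; the function names are illustrative.

```python
from fractions import Fraction

# The five example emails: (words, is_spam)
emails = [
    ({"Party"}, True),
    ({"Sale", "Discount"}, True),
    ({"Party", "Sale", "Discount"}, True),
    ({"Python", "Party"}, False),
    ({"Python", "Programming"}, False),
]


def prior(is_spam):
    """P(class): fraction of emails with this label."""
    matching = [e for e in emails if e[1] == is_spam]
    return Fraction(len(matching), len(emails))


def likelihood(word, is_spam):
    """P(word | class): fraction of this class's emails containing the word."""
    in_class = [words for words, label in emails if label == is_spam]
    containing = [words for words in in_class if word in words]
    return Fraction(len(containing), len(in_class))


def score(words, is_spam):
    """Unnormalized posterior: P(class) * product of P(word | class)."""
    result = prior(is_spam)
    for word in words:
        result *= likelihood(word, is_spam)
    return result


message = ["Party", "Programming"]
spam_score = score(message, True)
not_spam_score = score(message, False)
print(spam_score, not_spam_score)
print("spam" if spam_score > not_spam_score else "not spam")
```

A single unseen word ("Programming" never appears in spam) drives the whole spam score to zero, which is why practical implementations usually add smoothing.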

Decision Tree
Outlook
• Sunny → Humidity
  • High → No
  • Normal → Yes
• Overcast → Yes
• Rain → Wind
  • Strong → No
  • Weak → Yes
Play tennis?
Day Outlook Temp Humidity Wind Play
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Mild High Strong Yes
D4 Rain Cool Normal Strong No
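One way to see how the tree makes a decision is to encode it as nested dictionaries and walk from the root to a leaf. This is a sketch of prediction only, not of how the tree is learned; the nested-dict encoding and the name `classify` are assumptions.

```python
# The tree as nested dicts: {attribute: {value: subtree_or_label}}
tree = {
    "Outlook": {
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}


def classify(node, example):
    """Walk from the root until a leaf label (a plain string) is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))  # the attribute tested at this node
        node = node[attribute][example[attribute]]
    return node


# The four example days from the table.
days = [
    {"Day": "D1", "Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Weak", "Play": "No"},
    {"Day": "D2", "Outlook": "Sunny", "Temp": "Hot", "Humidity": "High", "Wind": "Strong", "Play": "No"},
    {"Day": "D3", "Outlook": "Overcast", "Temp": "Mild", "Humidity": "High", "Wind": "Strong", "Play": "Yes"},
    {"Day": "D4", "Outlook": "Rain", "Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},
]

predictions = [classify(tree, day) for day in days]
print(predictions)
```

The tree never tests Temp, so that column plays no part in the prediction.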

Support Vector Machines
“Kernel Trick”
Current Coordinate System (x, y) → New Coordinate System (x, z)

Support Vector Machines
http://www.mblondel.org/journal/2010/09/19/support-vector-machines-in-python/
3 support vectors

Unsupervised Learning
f(x)
Given x, find a function f that gives a compact
description of x.

Unsupervised Learning
• k-Means Clustering
• Hierarchical Clustering
• Gaussian Mixture Models (GMMs)

k-Means Clustering
http://stackoverflow.com/questions/24645068/k-means-clustering-major-understanding-issue/24645894#24645894

Should I recommend “The Last Witch Hunter” to
Roofimon? (User-Based)
Movies: The Hunger Game, Warcraft The Beginning, The Good Dinosaur, The Last Witch Hunter
Kan 5 4 1 3
Roofimon 5 4 3 ?
Juacompe 1 3 3
John 4 1
Find the most similar user to Roofimon
What should the rating be?
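A rough sketch of the user-based idea, assuming cosine similarity over co-rated movies and taking the nearest neighbor's rating as the prediction. Only Kan's and Roofimon's rows are used, because the slide text does not show which movies Juacompe and John rated; the function names are illustrative.

```python
import math

# Only the two complete rows are used; Juacompe and John are left out
# because the column placement of their ratings is not recoverable.
ratings = {
    "Kan": {
        "The Hunger Game": 5,
        "Warcraft The Beginning": 4,
        "The Good Dinosaur": 1,
        "The Last Witch Hunter": 3,
    },
    "Roofimon": {
        "The Hunger Game": 5,
        "Warcraft The Beginning": 4,
        "The Good Dinosaur": 3,
    },
}


def cosine_similarity(a, b):
    """Cosine similarity over the movies both users have rated."""
    shared = sorted(set(a) & set(b))
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def most_similar_user(target, movie):
    """Among users who rated `movie`, return the one most similar to `target`."""
    candidates = [u for u in ratings if u != target and movie in ratings[u]]
    return max(candidates, key=lambda u: cosine_similarity(ratings[target], ratings[u]))


neighbor = most_similar_user("Roofimon", "The Last Witch Hunter")
predicted = ratings[neighbor]["The Last Witch Hunter"]
print(neighbor, predicted)
```

With more users, a weighted average over several neighbors is a common refinement of taking a single neighbor's rating.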

Should I recommend “The Last Witch Hunter” to
Roofimon? (Item-Based)
Movies: The Hunger Game, Warcraft The Beginning, The Good Dinosaur, The Last Witch Hunter
Kan 5 4 1 3
Roofimon 5 4 3 ?
Juacompe 1 3 3
John 4 1
Find the most similar item to The Last Witch Hunter
What should the rating be?

Should I recommend “The Last Witch Hunter” to
Roofimon? (Matrix Factorization)
Movies: The Hunger Game, Warcraft The Beginning, The Good Dinosaur, The Last Witch Hunter
Roofimon 5 4 3 ?
User Scary Kiddy
Roofimon 2 5
Movie Scary Kiddy
TLWH 3/4 1/4
(2 x 3/4) + (5 x 1/4) = 2.75
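The prediction is a dot product between the user's factor vector and the movie's factor vector. The sketch below takes the factors as given and does not show how they are learned; the dictionary layout and the name `predict_rating` are illustrative.

```python
from fractions import Fraction

# Latent factors: how much the user likes each trait,
# and how much of each trait the movie has.
user_factors = {"Roofimon": {"Scary": 2, "Kiddy": 5}}
movie_factors = {"TLWH": {"Scary": Fraction(3, 4), "Kiddy": Fraction(1, 4)}}


def predict_rating(user, movie):
    """Dot product of the user's and the movie's factor vectors."""
    u = user_factors[user]
    m = movie_factors[movie]
    return sum(u[trait] * m[trait] for trait in u)


rating = predict_rating("Roofimon", "TLWH")
print(float(rating))
```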

Anomaly Detection
http://modernfarmer.com/2013/11/farm-pop-idioms/

http://boxesandarrows.com/designing-screens-using-cores-and-paths/

1D k-Means Clustering
• Given these items: {2, 4, 10, 12, 3, 20, 30, 11, 25}
• Given these initial centroids: m1 = 2 and m2 = 4
• Find me the final clusters!
Initialize → Assign → Update Centroids → Converge? (Yes: Done; No: back to Assign)
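The exercise can be checked with a small script that follows the same loop: assign, update centroids, and repeat until nothing changes. A minimal sketch; the function name `kmeans_1d` and the tie-breaking rule are assumptions.

```python
def kmeans_1d(items, centroids, max_iterations=100):
    """Plain k-means on numbers: assign, update centroids, repeat until stable."""
    centroids = list(centroids)
    for _ in range(max_iterations):
        # Assign: each item joins the cluster of its nearest centroid
        # (ties go to the lower-indexed centroid, an arbitrary choice).
        clusters = [[] for _ in centroids]
        for item in items:
            nearest = min(range(len(centroids)), key=lambda i: abs(item - centroids[i]))
            clusters[nearest].append(item)
        # Update: move each centroid to the mean of its cluster.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
        # Converge? Stop when the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return clusters, centroids


items = [2, 4, 10, 12, 3, 20, 30, 11, 25]
clusters, centroids = kmeans_1d(items, [2, 4])
print(sorted(clusters[0]), sorted(clusters[1]), centroids)
```

Starting from m1 = 2 and m2 = 4, the centroids move through (2.5, 16), (3, 18), (4.75, 19.6) and settle at (7, 25).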

Recap: Supervised vs. Unsupervised?

Reinforcement Learning
y = f(x)
Given x and z, find a function f that generates y.
z

Flappy Bird Hack using
Reinforcement Learning
http://sarvagyavaish.github.io/FlappyBirdRL/

I’ve got a perfect classifier!
https://500px.com/photo/65907417/like-a-frog-trapped-inside-a-coconut-shell-by-ellena-susanti

http://blog.csdn.net/love_tea_cat/article/details/25972921
Overfitting (High Variance)
Normal fit Overfitting

http://blog.csdn.net/love_tea_cat/article/details/25972921
Underfitting (High Bias)
Normal fit Underfitting

How to Avoid Overfitting and Underfitting
• Using more data does NOT always help.
• Recommendations:
  • find a good number of features;
  • perform cross validation;
  • use regularization when overfitting is found.

Model Selection
• Use cross validation to find the best parameters for
the model.
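As an illustration, the sketch below uses 3-fold cross validation to pick k for a tiny 1D k-nearest-neighbors classifier. The toy data, the candidate values of k, and all function names are assumptions, and it is written in plain Python rather than with any particular library.

```python
from collections import Counter


def k_fold_indices(n_samples, n_folds):
    """Split indices 0..n_samples-1 into n_folds interleaved validation folds."""
    return [list(range(fold, n_samples, n_folds)) for fold in range(n_folds)]


def knn_predict(train, query, k):
    """Majority label among the k training points nearest to query (1D)."""
    nearest = sorted(train, key=lambda point: abs(point[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cross_val_accuracy(data, k, n_folds=3):
    """Average validation accuracy of k-NN over the folds."""
    scores = []
    for fold in k_fold_indices(len(data), n_folds):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        valid = [data[i] for i in fold]
        correct = sum(knn_predict(train, x, k) == y for x, y in valid)
        scores.append(correct / len(valid))
    return sum(scores) / len(scores)


# Toy data (assumed): (feature, label)
data = [(1, "a"), (2, "a"), (3, "a"), (4, "a"), (5, "a"), (6, "a"),
        (7, "b"), (8, "b"), (9, "b"), (10, "b"), (11, "b"), (12, "b")]

candidates = [1, 3, 5]
results = {k: cross_val_accuracy(data, k) for k in candidates}
best_k = max(results, key=results.get)
print(results, best_k)
```

Each candidate is scored only on data it was not fitted to, so the chosen parameter reflects generalization rather than memorization.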

Metrics
• Accuracy
• True Positive, False Positive, True Negative, False
Negative
• Precision and Recall
• F1 Score
• etc.

Let’s evaluate this Giving Cats system!

Give me cats!
3 True Positives
1 False Positive
2 False Negatives
4 True Negatives
System
User
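The four counts on this slide are enough to compute the metrics listed earlier. A short sketch using the standard definitions; the variable names are illustrative.

```python
tp, fp, fn, tn = 3, 1, 2, 4  # counts from the "Give me cats!" example

accuracy = (tp + tn) / (tp + fp + fn + tn)  # share of all decisions that were right
precision = tp / (tp + fp)                  # of the items returned, how many were cats
recall = tp / (tp + fn)                     # of all the cats, how many were returned
f1 = 2 * precision * recall / (precision + recall)

print(accuracy, precision, recall, round(f1, 3))
```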

Precision and Recall
http://en.wikipedia.org/wiki/Precision_and_recall

False Positive or False Negative?

Metrics Summary
https://en.wikipedia.org/wiki/Receiver_operating_characteristic

Applied Machine Learning Process
http://machinelearningmastery.com/process-for-working-through-machine-learning-problems/

Cross Industry Standard Process for Data Mining (CRISP-DM; Shearer, 2000)
https://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining

Define the Problem
https://youmustdesireit.wordpress.com/2014/03/05/developing-and-nurturing-creative-problem-solving/

Prepare Data
http://vpnexpress.net/big-data-use-a-vpn-block-data-collection/

Spot Check Algorithms
https://www.flickr.com/photos/withassociates/4385364607/sizes/l/

If two models fit the data equally well,
choose the simpler one.

Improve Results
http://www.mobilemechanicprosaustin.com/

Present Results
http://www.langevin.com/blog/2013/04/25/5-tips-for-projecting-confidence/presentation-skills-2/

Some Cautions
http://newventurist.com/
• Curse of dimensionality
• Correlation does NOT imply causation.
• Learn many models, not just ONE.
• More data beats a cleverer algorithm.
• Data alone are not enough.
A Few Useful Things to Know about Machine Learning, Pedro Domingos (2012)

–John G. Richardson
“Learning Best
Through Experience”
https://studio.azureml.net/

Machine Learning and Feature Representation
Input → Feature Representation → Learning Algorithm
Feature engineering is the key.

Garbage In - Garbage Out
http://blog.marksgroup.net/2013/05/zoho-crm-garbage-in-garbage-out-its.html

Example of Feature Engineering
Width (m) Length (m) Cost (baht)
100 100 1,200,000
500 50 1,300,000
100 80 1,000,000
400 100 1,500,000
Are the data good to
model the area’s cost?
Engineer features.
Size (m x m) Cost (baht)
10,000 1,200,000
25,000 1,300,000
8,000 1,000,000
40,000 1,500,000
They look better here.
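The engineered feature here is just width multiplied by length. A small sketch that derives it from the raw rows; the variable names are illustrative.

```python
# Rows from the example: (width in m, length in m, cost in baht)
plots = [
    (100, 100, 1_200_000),
    (500, 50, 1_300_000),
    (100, 80, 1_000_000),
    (400, 100, 1_500_000),
]

# Engineer one feature, area = width x length, to replace the two raw ones.
engineered = [(width * length, cost) for width, length, cost in plots]

for area, cost in sorted(engineered):
    print(f"{area:>7,} m^2 -> {cost:>9,} baht")
```

Sorted by area, the cost rises steadily, a relationship that is hard to see from width and length separately.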

Deep Learning at Microsoft’s Speech Group

http://www.barnstable.k12.ma.us/domain/210

https://github.com/zkan/intro-to-machine-learning
