Clear Lines Consulting · clear-lines.com 5/20/2013 · 1
F# Coding Dojo
A gentle introduction to Machine Learning with F#
The goal tonight
» Take a Kaggle data science contest
» Write some code and have fun
» Write a classifier, from scratch, using F#
» Learn some Machine Learning concepts
» Stretch goal: send results to Kaggle
Real data
» 28 x 28 pixels
» Grayscale: each pixel 0 (white) to 255 (black)
» Flattened: one record = Number + 784 Pixels
» CSV file
Illustration (simplified data)
1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0
» First field: Actual Number
» Remaining fields: Pixels (real: 784 fields, from 0 to 255)
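To make the record layout concrete, here is a minimal sketch of how one line of the file could be split into its Number and its pixels. The names `Example`, `Label`, `Pixels` and `parseRecord` are hypothetical and not from the slides; a real file may also start with a header row, which would need to be skipped first.

```fsharp
// Hypothetical record type: one known example = its Number plus its pixels.
type Example = { Label: int; Pixels: int[] }

// Split one CSV line into the leading Number and the pixel values that follow.
let parseRecord (line: string) : Example =
    let fields = line.Split(',') |> Array.map int
    { Label = fields.[0]; Pixels = fields.[1..] }

// The simplified record from the illustration: 1 Number followed by 16 pixels.
let sample = parseRecord "1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0"
printfn "Number = %d, pixel count = %d" sample.Label sample.Pixels.Length
```

On the real data, the same function would return 784 pixels per record instead of 16.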
What’s a Classifier?
» “Give me an unknown data point, and I will
predict what class it belongs to”
» In this case, classes = 0, 1, 2, … 9
» Unknown data point = scanned digit, without
the class it belongs to
The KNN Classifier
» KNN = K-Nearest-Neighbors algorithm
» Given an unknown subject to classify,
» Look up all the known examples,
» Find the K closest examples,
» Take a majority vote,
» Predict what the majority says
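The steps above can be sketched in a few lines of F#. This is one possible shape, not a reference implementation: the names `Example`, `knn` and `squaredDistance` are hypothetical, the distance is passed in as a parameter, and ties in the vote are broken arbitrarily.

```fsharp
// Hypothetical record type: one known example = its Number plus its pixels.
type Example = { Label: int; Pixels: int[] }

// One possible distance (an assumption): sum of squared pixel differences.
let squaredDistance (a: int[]) (b: int[]) : float =
    Array.map2 (fun x y -> float ((x - y) * (x - y))) a b |> Array.sum

let knn (distance: int[] -> int[] -> float) (k: int) (known: Example[]) (unknown: int[]) : int =
    known
    |> Seq.sortBy (fun ex -> distance ex.Pixels unknown) // look up all known examples, closest first
    |> Seq.truncate k                                    // keep the K closest
    |> Seq.countBy (fun ex -> ex.Label)                  // count the votes for each class
    |> Seq.maxBy snd                                     // take the class with the most votes
    |> fst                                               // predict what the majority says

// Tiny made-up images (4 pixels each), just to exercise the function.
let known =
    [| { Label = 1; Pixels = [| 0; 255; 0; 255 |] }
       { Label = 1; Pixels = [| 0; 255; 0; 250 |] }
       { Label = 0; Pixels = [| 255; 0; 255; 0 |] } |]

let predicted = knn squaredDistance 3 known [| 0; 250; 0; 255 |]
printfn "Predicted: %d" predicted
```

With K = 3, two of the three neighbors carry the label 1, so the vote returns 1.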
Illustration: 1 nearest neighbor
Suppose we have just 2 examples in the sample, and want to predict the class of Unknown.
Which item from the sample is nearest / closest to the Unknown item we want to predict?
[Figure: Sample with two example images, labeled 1 and 0; Unknown image, labeled ?]
What does “close” mean?
» To define “close” we need a distance
» We can use the Euclidean distance between images, computed pixel by pixel, as a measure for “close”
» Other distances can be used as well
» Note: the square root is not important here, since it does not change which example is closest
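As a sketch of what such a distance could look like: the note about the square root suggests the Euclidean distance, so that is what is assumed here. The function names and the three tiny images are made up for illustration.

```fsharp
// Sum of squared pixel differences (the squared Euclidean distance).
let squaredDistance (a: int[]) (b: int[]) : float =
    Array.map2 (fun x y -> float ((x - y) * (x - y))) a b |> Array.sum

// Euclidean distance: the same sum, with the square root applied.
let euclidean (a: int[]) (b: int[]) : float =
    sqrt (squaredDistance a b)

// Made-up 3-pixel images.
let unknown = [| 0; 255; 0 |]
let first   = [| 0; 250; 0 |]
let second  = [| 255; 0; 255 |]

// The square root is an increasing function, so both versions agree on which image is closer.
let closerWithoutRoot = squaredDistance unknown first < squaredDistance unknown second
let closerWithRoot    = euclidean unknown first < euclidean unknown second
printfn "%b %b" closerWithoutRoot closerWithRoot
```

Since only the ranking of the examples matters, skipping the square root saves work without changing the prediction.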
Illustration: 1 nearest neighbor
Let’s compute the distance between Unknown and our two examples…
[Figure: Sample (examples labeled 1 and 0) and Unknown (?); under “Differences”, each pixel that differs from Unknown is marked X]
Illustration: 1 nearest neighbor
The first example is closest to our Unknown candidate: we predict that Unknown has the same Number, 1
[Figure: Sample (examples labeled 1 and 0) and Unknown (?)]
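The same reasoning can be replayed in code. This is a sketch under assumptions: the 4 x 4 images below are made up (they are not the ones pictured on the slide), and the names `Example`, `squaredDistance` and `nearestNeighbor` are hypothetical.

```fsharp
// Hypothetical record type: one known example = its Number plus its pixels.
type Example = { Label: int; Pixels: int[] }

// Sum of squared pixel differences, used here as the distance between images.
let squaredDistance (a: int[]) (b: int[]) : float =
    Array.map2 (fun x y -> float ((x - y) * (x - y))) a b |> Array.sum

// With K = 1 there is no vote to take: predict the Number of the single closest example.
let nearestNeighbor (known: Example[]) (unknown: int[]) : int =
    let closest = known |> Array.minBy (fun ex -> squaredDistance ex.Pixels unknown)
    closest.Label

// Two made-up 4 x 4 examples, flattened row by row: a "1" and a "0".
let one  = { Label = 1; Pixels = [| 0;0;255;0;  0;255;255;0;  0;0;255;0;  0;0;255;0 |] }
let zero = { Label = 0; Pixels = [| 0;255;255;0;  255;0;0;255;  255;0;0;255;  0;255;255;0 |] }

// A made-up Unknown: a plain vertical bar, differing from `one` in a single pixel.
let unknown = [| 0;0;255;0;  0;0;255;0;  0;0;255;0;  0;0;255;0 |]

let prediction = nearestNeighbor [| one; zero |] unknown
printfn "Predicted Number: %d" prediction
```

Here Unknown differs from the "1" example in one pixel and from the "0" example in eight, so the "1" example is the nearest neighbor.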