# FSharp and Machine Learning Dojo

Intro slides for a simple Machine Learning Dojo with F#; the companion code is at bit.ly/FSharp-ML-Dojo

Published in: Technology, Education

### FSharp and Machine Learning Dojo

1. F# Coding Dojo: A gentle introduction to Machine Learning with F#
    - Clear Lines Consulting · clear-lines.com · 5/20/2013
2. The goal tonight
    - Take a Kaggle data science contest
    - Write some code and have fun
    - Write a classifier, from scratch, using F#
    - Learn some Machine Learning concepts
    - Stretch goal: send results to Kaggle
3. What you may need to know
4. Kaggle Digit Recognizer contest
    - Full description on Kaggle.com
    - Dataset: hand-written digits (0, 1, … , 9)
    - Goal = automatically recognize digits
    - Training sample = 50,000 examples
    - Contest: predict 20,000 “unknown” digits
5. The data “looks like that”
    - (image: a sample hand-written digit)
6. Real data
    - 28 x 28 pixels
    - Grayscale: each pixel 0 (white) to 255 (black)
    - Flattened: one record = Number + 784 Pixels
    - CSV file
7. Illustration (simplified data)
    - Record: 1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0
    - First field: Actual Number
    - Remaining fields: Pixels (real: 784 fields, from 0 to 255)
8. What’s a Classifier?
    - “Give me an unknown data point, and I will predict what class it belongs to”
    - In this case, classes = 0, 1, 2, … 9
    - Unknown data point = scanned digit, without the class it belongs to
9. The KNN Classifier
    - KNN = K-Nearest-Neighbors algorithm
    - Given an unknown subject to classify,
    - Look up all the known examples,
    - Find the K closest examples,
    - Take a majority vote,
    - Predict what the majority says
10. Illustration: 1 nearest neighbor
    - Suppose we have just 2 examples in the sample, and want to predict the class of Unknown
    - Which item from the sample is nearest / closest to the Unknown item we want to predict?
    - (figure: two Sample images labeled 1 and 0, and an Unknown image labeled ?)
11. What does “close” mean?
    - To define “close” we need a distance
    - We can use the distance between images as a measure for “close”
    - Other distances can be used as well
    - Note: Square root not important here
12. Illustration: 1 nearest neighbor
    - Let’s compute the distance between Unknown and our two examples…
    - (figure: the Sample images labeled 1 and 0 next to Unknown, with the differing pixels marked X under “Differences”)
13. Illustration: 1 nearest neighbor
    - Square each pixel difference between Unknown and an example, sum the squares, and take the square root: $\sqrt{(255-0)^2 + (255-0)^2 + (255-0)^2 + (0-255)^2 + \dots}$
    - Distance = 721 (example labeled 0)
    - Distance = 255 (example labeled 1)
14. Illustration: 1 nearest neighbor
    - The first example is closest to our Unknown candidate: we predict that Unknown has the same Number, 1
15. Questions?
16. Let’s start coding!
    - Code 1-nearest-neighbor classifier
    - “Guided script” available at:
    - bit.ly/FSharp-ML-Dojo
    - https://gist.github.com/mathias-brandewinder/5558573
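
The ideas on the slides (flattened CSV records, a distance between images, and a majority vote among the K closest examples) can be sketched in a few lines of F#. This is only an illustrative sketch, not the dojo's guided script: the names `Example`, `parseRecord`, `distance`, and `classify` are hypothetical, the record layout "Number, then pixels" is taken from the "Real data" slide, and the distance is assumed to be the sum of squared pixel differences with the square root left out. The `one` record is the simplified record from the slides; the `zero` record and the `unknown` pixels are made up for this example, chosen so that they differ from `unknown` in 8 pixels and 1 pixel respectively.

```fsharp
// Hypothetical names throughout; this is not the dojo's guided script.
// Assumes one record = "Number,pixel,pixel,...", as on the "Real data" slide.
type Example = { Label : int; Pixels : int[] }

// Split one CSV line into its label (first field) and pixels (remaining fields).
let parseRecord (line : string) : Example =
    let fields = line.Split(',') |> Array.map int
    { Label = fields.[0]; Pixels = fields.[1..] }

// Sum of squared pixel differences. The square root is skipped because
// it does not change which example is nearest.
let distance (a : int[]) (b : int[]) : int =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

// Keep the k examples nearest to the unknown pixels, then return the
// most frequent label among them. k = 1 gives 1-nearest-neighbor.
let classify (k : int) (sample : Example[]) (unknown : int[]) : int =
    sample
    |> Seq.sortBy (fun ex -> distance ex.Pixels unknown)
    |> Seq.truncate k
    |> Seq.countBy (fun ex -> ex.Label)
    |> Seq.maxBy snd
    |> fst

// The simplified 4 x 4 record from the slides (a "1").
let one = parseRecord "1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0"
// Made-up 4 x 4 record for a "0" (illustration only).
let zero = parseRecord "0,0,255,255,0,255,0,0,255,255,0,0,255,0,255,255,0"
// Made-up unknown image: the "1" above with one pixel switched off.
let unknown = [| 0; 0; 255; 0;  0; 0; 255; 0;  0; 0; 255; 0;  0; 0; 255; 0 |]

printfn "%d" (distance one.Pixels unknown)        // 65025  = 1 * 255 * 255
printfn "%d" (distance zero.Pixels unknown)       // 520200 = 8 * 255 * 255
printfn "%d" (classify 1 [| one; zero |] unknown) // 1
```

Taking square roots of the two squared distances gives 255 and roughly 721, and the ordering is the same either way, which is why the root can be dropped when only the nearest example matters.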