### FSharp and Machine Learning Dojo

1. F# Coding Dojo: A gentle introduction to Machine Learning with F# (Clear Lines Consulting · clear-lines.com · 5/20/2013)
2. The goal tonight
   - Take a Kaggle data science contest
   - Write some code and have fun
   - Write a classifier, from scratch, using F#
   - Learn some Machine Learning concepts
   - Stretch goal: send results to Kaggle
3. What you may need to know
4. Kaggle Digit Recognizer contest
   - Full description on Kaggle.com
   - Dataset: hand-written digits (0, 1, …, 9)
   - Goal: automatically recognize digits
   - Training sample: 50,000 examples
   - Contest: predict 20,000 “unknown” digits
5. The data “looks like that” (image: a sample hand-written digit, 1)
6. Real data
   - 28 x 28 pixels
   - Grayscale: each pixel ranges from 0 (white) to 255 (black)
   - Flattened: one record = Number + 784 Pixels
   - CSV file
7. Illustration (simplified data)
   - Record: `1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0`
   - First field: Actual Number (here, 1)
   - Remaining fields: Pixels (real: 784 fields, from 0 to 255)
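
The record layout above can be read into F# in a few lines. This is a minimal sketch, not the dojo's guided script: the `Example` type and the `parseLine` function are names invented here for illustration, and the input is the simplified 16-pixel record from the illustration.

```fsharp
// Hypothetical record type: the names are illustrative, not from the slides.
type Example = { Label: int; Pixels: int[] }

// Parse one CSV line: the first field is the label, the rest are pixel values.
let parseLine (line: string) : Example =
    let fields = line.Split(',') |> Array.map int
    { Label = fields.[0]; Pixels = fields.[1..] }

// The simplified record from the illustration (16 pixels instead of 784).
let sample = parseLine "1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0"
printfn "Label = %d, pixel count = %d" sample.Label sample.Pixels.Length
```

On the real file, the same function would be applied to every data line; a header row, if the file has one, would need to be skipped first.
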
8. What’s a Classifier?
   - “Give me an unknown data point, and I will predict what class it belongs to”
   - In this case, classes = 0, 1, 2, …, 9
   - Unknown data point = scanned digit, without the class it belongs to
9. The KNN Classifier
   - KNN = K-Nearest-Neighbors algorithm
   - Given an unknown subject to classify:
     - Look up all the known examples
     - Find the K closest examples
     - Take a majority vote
     - Predict what the majority says
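
The KNN steps can be sketched as a single F# pipeline. This is one possible shape under stated assumptions, not the dojo's reference solution: the `Example` type, the `classify` function, and the `manhattan` placeholder distance are all hypothetical names introduced here. The distance is passed in as a parameter so that any notion of "close" can be plugged in.

```fsharp
// Hypothetical record type, introduced for illustration only.
type Example = { Label: int; Pixels: int[] }

// K nearest neighbors: keep the k examples closest to the unknown item,
// then predict the most frequent label among them.
let classify (distance: int[] -> int[] -> float) (k: int) (examples: Example[]) (unknown: int[]) : int =
    examples
    |> Seq.sortBy (fun ex -> distance ex.Pixels unknown)  // closest first
    |> Seq.truncate k                                     // keep at most k neighbors
    |> Seq.countBy (fun ex -> ex.Label)                   // votes per label
    |> Seq.maxBy snd                                      // label with the most votes
    |> fst

// Placeholder distance for the demo: sum of absolute pixel differences.
let manhattan (a: int[]) (b: int[]) : float =
    Array.fold2 (fun acc x y -> acc + abs (x - y)) 0 a b |> float

// Tiny made-up sample with 2 "pixels" per example.
let examples =
    [| { Label = 0; Pixels = [| 0; 0 |] }
       { Label = 0; Pixels = [| 10; 0 |] }
       { Label = 1; Pixels = [| 255; 255 |] }
       { Label = 1; Pixels = [| 250; 255 |] }
       { Label = 1; Pixels = [| 255; 240 |] } |]

let predicted = classify manhattan 3 examples [| 245; 250 |]
printfn "Predicted: %d" predicted
```

With `k = 1` this reduces to the 1-nearest-neighbor classifier. When two labels tie, this sketch simply returns the first label to reach the top vote count; a real classifier may want an explicit tie-breaking rule.
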
10. Illustration: 1 nearest neighbor
    - Suppose we have just 2 examples in the sample (labeled 1 and 0), and want to predict the class of Unknown (?)
    - Which item from the sample is nearest / closest to the Unknown item we want to predict?
11. What does “close” mean?
    - To define “close” we need a distance
    - We can use the Euclidean distance between images, computed pixel by pixel, as a measure for “close”
    - Other distances can be used as well
    - Note: the square root is not important here, because dropping it does not change which example is closest
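
A minimal sketch of such a distance, assuming images are flat arrays of pixel values of equal length and that the intended distance is the Euclidean one. The function names are illustrative. The second version skips the square root: since the square root is increasing, both versions rank examples in the same order.

```fsharp
// Euclidean distance between two images given as flat pixel arrays.
let distance (a: int[]) (b: int[]) : float =
    Array.map2 (fun x y -> float (x - y) ** 2.0) a b
    |> Array.sum
    |> sqrt

// Same computation without the square root: cheaper, same ranking.
let squaredDistance (a: int[]) (b: int[]) : int =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b
    |> Array.sum

// Made-up 3-pixel "images" to compare both versions.
let a = [| 0; 0; 255 |]
let near = [| 0; 0; 250 |]
let far = [| 255; 255; 0 |]

// Both comparisons agree on which image is closer to `a`.
printfn "%b" (distance a near < distance a far)
printfn "%b" (squaredDistance a near < squaredDistance a far)
```

For 784 pixels with values up to 255, the largest possible sum of squares is 784 × 255² = 50,979,600, which fits comfortably in a 32-bit `int`.
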
12. Illustration: 1 nearest neighbor
    - Let’s compute the distance between Unknown and our two examples…
    - Differences (figure: for each example, the pixels that differ from Unknown are marked with an X)
13. Illustration: 1 nearest neighbor
    - Squared pixel differences: (255-0)², (255-0)², (255-0)², (0-255)², etc.
    - Distance to example 0 = 721; distance to example 1 = 255
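
The two distances can be checked by hand under one assumption: the difference marks in the previous illustration suggest that Unknown differs from example 0 in 8 pixels and from example 1 in 1 pixel, each time by the full 255. These pixel counts are read off the figure residue and are not stated in the slides.

$$
d_0 = \sqrt{8 \times 255^2} = 255\sqrt{8} \approx 721, \qquad d_1 = \sqrt{1 \times 255^2} = 255
$$
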
14. Illustration: 1 nearest neighbor
    - The first example is closest to our Unknown candidate: we predict that Unknown has the same Number, 1
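
The whole illustration can be replayed in a short F# script. The three 4 x 4 images below are invented for this sketch (the slide images are not recoverable); they were chosen so that Unknown differs from the example labeled 1 in one pixel and from the example labeled 0 in eight pixels. The names `sample`, `distance` and `predict` are likewise illustrative.

```fsharp
// Hypothetical 4 x 4 images (0 = white, 255 = black), invented for illustration.
let one     = [| 0;0;255;0;    0;255;255;0;  0;0;255;0;    0;0;255;0 |]
let zero    = [| 0;255;255;0;  255;0;0;255;  255;0;0;255;  0;255;255;0 |]
let unknown = [| 0;0;255;0;    0;0;255;0;    0;0;255;0;    0;0;255;0 |]

// The sample: known examples paired with their actual number.
let sample = [| (1, one); (0, zero) |]

// Euclidean distance between two flat pixel arrays.
let distance (a: int[]) (b: int[]) : float =
    Array.map2 (fun x y -> float ((x - y) * (x - y))) a b
    |> Array.sum
    |> sqrt

// 1 nearest neighbor: predict the label of the single closest example.
let predict (candidate: int[]) : int =
    sample
    |> Array.minBy (fun (_, pixels) -> distance pixels candidate)
    |> fst

for (label, pixels) in sample do
    printfn "Distance to %d: %.0f" label (distance pixels unknown)
printfn "Prediction: %d" (predict unknown)
```

With these made-up images the script reports a distance of 255 to the example labeled 1 and about 721 to the example labeled 0, so the prediction is 1.
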
15. Questions?
16. Let’s start coding!
    - Code a 1-nearest-neighbor classifier
    - “Guided script” available at:
      - Bit.ly/FSharp-ML-Dojo
      - https://gist.github.com/mathias-brandewinder/5558573