FSharp and Machine Learning Dojo

Intro slides for a simple Machine Learning Dojo with F#; the companion code is at bit.ly/FSharp-ML-Dojo

Published in: Technology, Education

  1. Clear Lines Consulting · clear-lines.com · 5/20/2013 · F# Coding Dojo: A gentle introduction to Machine Learning with F#
  2. The goal tonight » Take a Kaggle data science contest » Write some code and have fun » Write a classifier, from scratch, using F# » Learn some Machine Learning concepts » Stretch goal: send results to Kaggle
  3. What you may need to know
  4. Kaggle Digit Recognizer contest » Full description on Kaggle.com » Dataset: hand-written digits (0, 1, … , 9) » Goal = automatically recognize digits » Training sample = 50,000 examples » Contest: predict 20,000 “unknown” digits
  5. The data “looks like that” [figure; surviving label: 1]
  6. Real data » 28 x 28 pixels » Grayscale: each pixel 0 (white) to 255 (black) » Flattened: one record = Number + 784 Pixels » CSV file
  7. Illustration (simplified data) » Record: 1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0 » First field: Actual Number » Remaining fields: Pixels (real: 784 fields, from 0 to 255)
  8. What’s a Classifier? » “Give me an unknown data point, and I will predict what class it belongs to” » In this case, classes = 0, 1, 2, … 9 » Unknown data point = scanned digit, without the class it belongs to
  9. The KNN Classifier » KNN = K-Nearest-Neighbors algorithm » Given an unknown subject to classify, » Look up all the known examples, » Find the K closest examples, » Take a majority vote, » Predict what the majority says
  10. Illustration: 1 nearest neighbor [figure: Sample (1, 0), Unknown (?)] » Suppose we have just 2 examples in the sample, and want to predict the class of Unknown » Which item from the sample is nearest / closest to the Unknown item we want to predict?
  11. What does “close” mean? » To define “close” we need a distance » We can use the distance between images as a measure for “close” » Other distances can be used as well » Note: Square root not important here
  12. Illustration: 1 nearest neighbor [figure: Sample (1, 0), Unknown (?), Differences marked X] » Let’s compute the distance between Unknown and our two examples…
  13. Illustration: 1 nearest neighbor [figure: Sample (1, 0), Unknown (?)] » Squared differences: (255-0)², (255-0)², (255-0)², (0-255)², etc… » Distance = 721 » Distance = 255
  14. Illustration: 1 nearest neighbor [figure: Sample, Unknown] » The first example is closest to our Unknown candidate: we predict that Unknown has the same Number, 1
  15. Questions?
  16. Let’s start coding! » Code 1-nearest-neighbor classifier » “Guided script” available at: » Bit.ly/FSharp-ML-Dojo » https://gist.github.com/mathias-brandewinder/5558573
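
The steps described in slides 6 to 14 (flattened records, a distance between images, nearest neighbors with a majority vote) can be sketched in F# roughly as follows. This is a minimal illustration, not the companion guided script: the names `Example`, `parseLine`, `distance`, and `classify` are assumptions made for this sketch, the second sample record is made-up data, and the distance is assumed to be the squared Euclidean distance suggested by the squared differences on slide 13.

```fsharp
// Hypothetical record type: the actual Number plus the flattened pixels.
type Example = { Label: int; Pixels: int[] }

// Parse one CSV line of the form "number,pixel1,pixel2,...".
let parseLine (line: string) =
    let values = line.Split(',') |> Array.map int
    { Label = values.[0]; Pixels = values.[1..] }

// Sum of squared pixel differences. The square root is skipped:
// it does not change which example is closest.
let distance (a: int[]) (b: int[]) =
    Array.fold2 (fun acc x y -> acc + (x - y) * (x - y)) 0 a b

// K-nearest-neighbors: keep the k closest examples,
// then predict the label that gets the most votes.
let classify (k: int) (sample: Example[]) (unknown: int[]) =
    sample
    |> Array.sortBy (fun ex -> distance ex.Pixels unknown)
    |> Array.truncate k
    |> Array.countBy (fun ex -> ex.Label)
    |> Array.maxBy snd
    |> fst

// Usage with simplified 4 x 4 data: the first record is the one from
// slide 7, the second is invented for illustration only.
let sample =
    [| parseLine "1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0"
       parseLine "0,255,255,255,255,255,0,0,255,255,0,0,255,255,255,255,255" |]

let unknown = [| 0; 0; 255; 0; 0; 0; 255; 0; 0; 0; 255; 0; 0; 0; 255; 0 |]

// With k = 1 this is the 1-nearest-neighbor classifier from the slides.
let prediction = classify 1 sample unknown
```

Here `unknown` differs from the first record in a single pixel, so `prediction` is 1. Sorting the whole sample for every prediction is simple but slow on the real 784-pixel data; for k = 1, `Array.minBy` over the distance would avoid the sort.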
