Clear Lines Consulting · clear-lines.com 5/20/2013 · 1
F# Coding Dojo
A gentle introduction to Machine Learning with F#
The goal tonight
» Take a Kaggle data science contest
» Write some code and have fun
» Write a classifier, from scratch, using F#
» Learn some Machine Learning concepts
» Stretch goal: send results to Kaggle
What you may need to know
Kaggle Digit Recognizer contest
» Full description on Kaggle.com
» Dataset: hand-written digits (0, 1, … , 9)
» Goal = automatically recognize digits
» Training sample = 50,000 examples
» Contest: predict 20,000 “unknown” digits
The data “looks like that”
[Image: a hand-written digit, labeled 1]
Real data
» 28 x 28 pixels
» Grayscale: each pixel 0 (white) to 255 (black)
» Flattened: one record = Number + 784 Pixels
» CSV file
Illustration (simplified data)
Actual Number: 1
Pixels (real: 784 fields, from 0 to 255): 0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0
As stored in the CSV file: 1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0
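A record in this flattened format could be parsed along the following lines. This is only a sketch: the `Example` type, its field names, and `parseLine` are hypothetical names chosen for illustration, and the input is the simplified 16-pixel line from this slide rather than a real 784-pixel record.

```fsharp
// Hypothetical record type: the names Example, Label and Pixels are illustrative.
type Example = { Label: int; Pixels: int[] }

// Split one CSV line into the leading number and the remaining pixel values.
let parseLine (line: string) : Example =
    let values = line.Split(',') |> Array.map int
    { Label = values.[0]; Pixels = values.[1..] }

// The simplified record from the slide: 1 number followed by 16 pixels.
let example = parseLine "1,0,0,255,0,0,255,255,0,0,0,255,0,0,0,255,0"
printfn "Label = %d, %d pixels" example.Label example.Pixels.Length
```

A real file would presumably also need its header row skipped before parsing; that step is left out here.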
What’s a Classifier?
» “Give me an unknown data point, and I will predict what class it belongs to”
» In this case, classes = 0, 1, 2, …, 9
» Unknown data point = scanned digit, without the class it belongs to
The KNN Classifier
» KNN = K-Nearest-Neighbors algorithm
» Given an unknown subject to classify,
» Look up all the known examples,
» Find the K closest examples,
» Take a majority vote,
» Predict what the majority says
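The steps above can be sketched in F# as follows. This is one possible shape, not the dojo's guided script: the `Example` type, `classify`, and the toy `dist` function are hypothetical names, and the distance is passed in as a parameter so that any notion of "close" can be plugged in.

```fsharp
// Hypothetical types and names, for illustration only.
type Example = { Label: int; Pixels: int[] }

// Look up all known examples, keep the k closest, and take a majority vote.
let classify (k: int) (distance: int[] -> int[] -> int) (known: Example[]) (unknown: int[]) =
    known
    |> Array.sortBy (fun e -> distance e.Pixels unknown)   // closest first
    |> Array.truncate k                                     // the k closest examples
    |> Array.countBy (fun e -> e.Label)                     // votes per label
    |> Array.maxBy snd                                      // label with the most votes
    |> fst

// A toy distance (sum of absolute differences) and made-up 2-pixel data.
let dist (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> abs (x - y)) a b |> Array.sum

let known =
    [| { Label = 0; Pixels = [| 0; 0 |] }
       { Label = 0; Pixels = [| 10; 10 |] }
       { Label = 1; Pixels = [| 255; 255 |] } |]

printfn "k = 1 predicts %d" (classify 1 dist known [| 250; 250 |])
printfn "k = 3 predicts %d" (classify 3 dist known [| 250; 250 |])
```

With this made-up data the single closest example is labeled 1, while two of the three closest are labeled 0, so the choice of K changes the prediction. Ties in the vote are not handled specially in this sketch.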
Illustration: 1 nearest neighbor
Sample: two examples, labeled 1 and 0. Unknown: ?
Suppose we have just 2 examples in the sample, and want to predict the class of Unknown.
Which item from the sample is nearest / closest to the Unknown item we want to predict?
What does “close” mean?
» To define “close” we need a distance
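» We can use the Euclidean distance between images, computed pixel by pixel, as a measure for “close”
» Other distances can be used as well
» Note: the square root is not important here, because dropping it does not change which example is closest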
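A minimal sketch of such a distance, assuming images are held as integer arrays of equal length. The function names and the tiny 4-pixel "images" are made up for illustration. It computes the distance with and without the square root, to show that the ordering of the examples is the same either way.

```fsharp
// Squared Euclidean distance: sum of squared pixel-by-pixel differences.
let squaredDistance (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

// Euclidean distance: the square root of the sum above.
let euclideanDistance (a: int[]) (b: int[]) =
    sqrt (float (squaredDistance a b))

// Made-up 4-pixel "images", for illustration only.
let unknown = [| 0; 255; 0; 255 |]
let first   = [| 0; 255; 0; 0 |]     // differs from unknown in 1 pixel
let second  = [| 255; 0; 0; 255 |]   // differs from unknown in 2 pixels

printfn "squared:   %d vs %d" (squaredDistance unknown first) (squaredDistance unknown second)
printfn "euclidean: %.1f vs %.1f" (euclideanDistance unknown first) (euclideanDistance unknown second)
```

Because the square root is an increasing function, `first` is closer than `second` under both measures, so a nearest-neighbor search can skip the square root.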
Illustration: 1 nearest neighbor
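Sample: two examples, labeled 1 and 0. Unknown: ?
Differences (each X marks a pixel that differs from Unknown): 1 X for the example labeled 1, 8 X for the example labeled 0
Let’s compute the distance between Unknown and our two examples…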
Illustration: 1 nearest neighbor
Sample: two examples, labeled 1 and 0. Unknown: ?
Distance to the example labeled 1: sqrt((255-0)^2) = 255
Distance to the example labeled 0: sqrt((255-0)^2 + (255-0)^2 + (0-255)^2 + etc…) ≈ 721
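The two distances on this slide can be checked with a little arithmetic. The sketch below assumes, as the X marks on the previous slide suggest, that 1 pixel differs for the example labeled 1 and 8 pixels differ for the example labeled 0, each by the full 0-to-255 range. The helper name `distanceFor` is hypothetical.

```fsharp
// Each differing pixel contributes (255 - 0)^2 to the sum of squares;
// the distance is the square root of that sum.
let distanceFor (differingPixels: int) =
    sqrt (float differingPixels * 255.0 * 255.0)

printfn "%.0f" (distanceFor 1)   // 255
printfn "%.0f" (distanceFor 8)   // 721
```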
Illustration: 1 nearest neighbor
Sample: two examples, labeled 1 and 0. Unknown: ?
The first example is closest to our Unknown candidate: we predict that Unknown has the same Number, 1
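The prediction step for a single nearest neighbor might look like the sketch below. The `Example` type, `distance`, `predict`, and the 3-pixel sample data are all hypothetical and are not the images drawn on the slides.

```fsharp
// Hypothetical types and names, for illustration only.
type Example = { Label: int; Pixels: int[] }

// Squared Euclidean distance (the square root is skipped, as it does not change the ranking).
let distance (a: int[]) (b: int[]) =
    Array.map2 (fun x y -> (x - y) * (x - y)) a b |> Array.sum

// 1-nearest-neighbor: return the label of the single closest known example.
let predict (sample: Example[]) (unknown: int[]) =
    let closest = sample |> Array.minBy (fun e -> distance e.Pixels unknown)
    closest.Label

// Made-up 3-pixel images: one example per label.
let sample =
    [| { Label = 1; Pixels = [| 0; 255; 0 |] }
       { Label = 0; Pixels = [| 255; 0; 255 |] } |]

printfn "Predicted: %d" (predict sample [| 0; 255; 255 |])
```

Here the unknown image differs from the first example in one pixel and from the second in two, so the first example's label, 1, is predicted.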
Questions?
Let’s start coding!
» Code 1-nearest-neighbor classifier
» “Guided script” available at:
» Bit.ly/FSharp-ML-Dojo
» https://gist.github.com/mathias-brandewinder/5558573
