Machine Learning in a Flash
Kory Becker
August, 2017, http://primaryobjects.com
AI IS GOOD
AI !== Machine Learning
• Logical AI, Symbolic, Knowledge-based
• Pattern Recognition, Representation
• Inference, Common Sense, Planning
• Heuristics, Ontology, Artificial Life, Genetic
• Machine Learning, Statistics
Machine Learning
Algorithms
Supervised
Linear Regression
Logistic Regression
Support Vector Machines
Neural Networks
Unsupervised
K-means Clustering
Principal Component Analysis (Dimensionality Reduction)
Linear Regression
Logistic Regression
Logistic Regression
Linear Classification
Support Vector Machine
Non-Linear Classification
Support Vector Machine
Gaussian Kernel
Pop Quiz!
Question 1: Supervised or Unsupervised?
• You are designing an agent for The Matrix.
• Its task is to classify people who are threats to the system.
• Feature Set:
  • Age
  • IQ
  • Level of Education
  • # of Times They Watched the Movie The Matrix
• Training Set of 100,000 people: 50k threats, 50k non-threats
Question 2: Supervised or Unsupervised?
• You are designing the brain of a battle robot.
• Its primary attack is hand-to-hand combat. Your task is to find the most effective move combos.
• Feature Set:
  • # of Kicks
  • # of Punches
  • # of Head-butts
  • # of Leg Sweeps
• Training Set of 100,000 winning battles
Natural Language
Processing
Convert text into a numerical representation
Find commonalities within data
Clustering
Make predictions from data
Classification
Category, Popularity, Sentiment,
Relationships
Bag of Words Model
Corpus
Cats like to chase mice.
Dogs like to eat big bones.
Create a Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Digitize Text
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Vector Length = 8
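The dictionary and digitizing steps can be sketched in a few lines of Python. This is only an illustration, not the presenter's code: the function names are made up for this example, and dropping "to" as a stop word is an assumption inferred from the fact that it does not appear in the slide's dictionary.

```python
import re

# Assumption: "to" is treated as a stop word, since the slide's dictionary omits it.
STOP_WORDS = {"to"}

def tokenize(text):
    """Lowercase the text, keep alphabetic words, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_dictionary(corpus):
    """Assign each new word the next free index, in order of first appearance."""
    dictionary = {}
    for doc in corpus:
        for word in tokenize(doc):
            if word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary

def vectorize(text, dictionary):
    """Return a 0/1 vector marking which dictionary words occur in the text."""
    vec = [0] * len(dictionary)
    for word in tokenize(text):
        idx = dictionary.get(word)
        if idx is not None:  # words outside the dictionary are ignored
            vec[idx] = 1
    return vec

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
dictionary = build_dictionary(corpus)
vectors = [vectorize(doc, dictionary) for doc in corpus]
new_vector = vectorize("Bats eat bugs.", dictionary)
```

Words that never appeared in the corpus ("bats", "bugs") have no dictionary slot, so a new sentence is represented only by the known words it contains.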
Classify Documents (eating)
Cats like to chase mice.
1 1 1 1 0 0 0 0 → 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 → 1
Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0 → 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1 → 1
Bats eat bugs.
0 0 0 0 0 1 0 0 → ?
Predicted label for "Bats eat bugs.": 1
Does it Really Work?
Document Term Matrix
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
Training (100% Accuracy)
> predict(fit, newdata = train)
[1] 0 1
Test Case
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1
Success! Source code: https://goo.gl/UxjPBs
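The slide does not say which model `fit` is, so the following Python sketch swaps in a small logistic regression trained by gradient descent as a stand-in. The function names, learning rate, and epoch count are assumptions chosen for illustration; the vectors are the bag-of-words rows from the earlier slides.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, learning_rate=0.5, epochs=200):
    """Batch gradient descent on the logistic loss (stand-in for the slide's `fit`)."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for features, label in zip(X, y):
            z = sum(w * f for w, f in zip(weights, features)) + bias
            error = sigmoid(z) - label  # predicted probability minus true label
            for j, f in enumerate(features):
                grad_w[j] += error * f
            grad_b += error
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

def predict(features, weights, bias):
    """Return 1 when the predicted probability is at least 0.5, else 0."""
    z = sum(w * f for w, f in zip(weights, features)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

# Dictionary order: cats, like, chase, mice, dogs, eat, big, bones
X_train = [[1, 1, 1, 1, 0, 0, 0, 0],   # "Cats like to chase mice."     label 0
           [0, 1, 0, 0, 1, 1, 1, 1]]   # "Dogs like to eat big bones."  label 1 (eating)
y_train = [0, 1]
x_test = [0, 0, 0, 0, 0, 1, 0, 0]      # "Bats eat bugs."

weights, bias = train_logistic(X_train, y_train)
train_predictions = [predict(x, weights, bias) for x in X_train]
test_prediction = predict(x_test, weights, bias)
```

With only two training documents, the model mostly learns that "eat" occurs in the positive example, which is enough to label the new sentence as eating-related.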
Unsupervised Learning
Finding patterns in data
Grouping similar data into clusters
Does not require labeled data
Exploratory data analysis
Predict clusters for new data!
K-Means Clustering
Popular clustering algorithm
Groups data into k clusters
Data points belong to the cluster with closest mean
Each cluster has a centroid (center)
k-Means Algorithm
Choose a value for k (number of clusters)
• Guess
• Rule of thumb: ~~(Math.sqrt(points.length * 0.5))
Initialize centroids
• Random
• Farthest Point
• K-means++
Assign data points to closest centroid
Move centroids to center of assigned points
Repeat the last two steps until the assignments stop changing
Demo: https://goo.gl/AjNEJk
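The steps above can be sketched in plain Python. This is a minimal illustration rather than the code behind the demo link: it uses random initialization (one of the three options listed), and the names `choose_k`, `closest`, and `kmeans` are hypothetical. `choose_k` is a direct translation of the JavaScript rule of thumb, where `~~` truncates to an integer.

```python
import math
import random

def choose_k(n_points):
    """Rule of thumb from the slide: truncate sqrt(n / 2) to an integer."""
    return max(1, int(math.sqrt(n_points * 0.5)))

def closest(point, centroids):
    """Index of the centroid nearest to the point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def kmeans(points, k, iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random initialization
    assignments = None
    for _ in range(iterations):
        # Assign each data point to its closest centroid.
        new_assignments = [closest(p, centroids) for p in points]
        if new_assignments == assignments:
            break  # nothing moved, so the clustering has converged
        assignments = new_assignments
        # Move each centroid to the mean of its assigned points.
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignments = kmeans(points, k=2)
```

Random initialization is the simplest choice but can give different clusters from run to run; the fixed `seed` here only makes the sketch repeatable.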
Clustering Example 1
Clustering Example 2
Predicting Color Groups
Encoding: (Red * 256 * 256) + (Green * 256) + (Blue)
Color               Encoding   Group
rgb(255, 0, 0)      16711680   red
rgb(0, 255, 0)      65280      green
rgb(0, 0, 255)      255        blue
rgb(200, 0, 150)    13107350   ?
rgb(50, 199, 135)   3327879    ?
rgb(100, 180, 255)  6599935    ?
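As a quick check of the encoding formula, here is a small Python sketch. The function names are hypothetical, and the decoder is an addition for illustration that does not appear on the slide. Under this formula, rgb(255, 0, 0) evaluates to 255 * 65536 = 16711680.

```python
def encode_rgb(red, green, blue):
    """Pack three 0-255 channels into one integer, as in the slide's formula."""
    return (red * 256 * 256) + (green * 256) + blue

def decode_rgb(value):
    """Inverse of encode_rgb: recover the (red, green, blue) channels."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

colors = [(255, 0, 0), (0, 255, 0), (0, 0, 255),
          (200, 0, 150), (50, 199, 135), (100, 180, 255)]
encoded = [encode_rgb(*c) for c in colors]
```

Because red is weighted most heavily, colors that differ mainly in red land far apart on this one-dimensional scale, while differences in blue barely move the value.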
1000 Colors
100 Colors
Calculating Centroids
Classifying Colors to a Cluster
Grouping Colors into their Cluster
Predicting Color Groups
rgb(241, 52, 11)
rgb(80, 187, 139)
rgb(34, 15, 194)
?
?
?
Predicting on New Data
Categorizing Stocks & Bonds
Data Source: Vanguard ETF funds
Data Fields:
• Ticker, Asset Class, Expense Ratio
• Price, Change 1, Change 2, SEC Yield
• YTD, Year 1, Year 5, Year 10
• Since Inception
Can We Predict a Category?
[Figure labels: International, Stocks, Interm Bond, Long Bond]
Asset Classes
Stock Sector
Stock Mid-Cap Blend
Stock Large-Cap Value
International
Bond Inter-term Investment
Bond Inter-term Government
Bond Long-term Government
Features
Ticker, Asset Class, Expense Ratio
Price, Change 1, Change 2, SEC Yield
YTD, Year 1, Year 5, Year 10
Since Inception
Example Data
VYM, Stock - Large-Cap Value, 0.08%, $77.95, -$0.11, -0.14%, 3.09%B, 4.49%, 12.73%, 13.64%, 7.08%, 7.53% (11/10/2006)
VIG, Stock - Large-Cap Blend, 0.08%, $92.39, -$0.21, -0.23%, 1.93%B, 9.62%, 13.75%, 12.75%, 7.41%, 7.92% (04/21/2006)
Which of these look similar? (YTD, Year 1, Year 5, Year 10 returns, in %)
► VYM, Stock - Large-Cap Value
► 4.49, 12.73, 13.64, 7.08
► VIG, Stock - Large-Cap Blend
► 9.62, 13.75, 12.75, 7.41
► EDV, Bond - Long-term
► 6.48, -10.37, 3.38, 0
► VCIT, Bond - Inter-term
► 3.47, 1.1, 4.04, 0
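One way to make "similar" concrete is to measure the Euclidean distance between the four return figures of each fund. The sketch below is an illustration under that assumption; the slides do not state which distance measure was used, and `nearest_neighbor` is a hypothetical helper.

```python
import math

# (YTD, Year 1, Year 5, Year 10) returns taken from the slide
funds = {
    "VYM": (4.49, 12.73, 13.64, 7.08),    # Stock - Large-Cap Value
    "VIG": (9.62, 13.75, 12.75, 7.41),    # Stock - Large-Cap Blend
    "EDV": (6.48, -10.37, 3.38, 0.0),     # Bond - Long-term
    "VCIT": (3.47, 1.1, 4.04, 0.0),       # Bond - Inter-term
}

def nearest_neighbor(ticker):
    """Return the other fund whose return vector is closest to this one."""
    others = (t for t in funds if t != ticker)
    return min(others, key=lambda t: math.dist(funds[ticker], funds[t]))

nearest = {ticker: nearest_neighbor(ticker) for ticker in funds}
```

On these four funds the two stock funds pair up with each other, and so do the two bond funds, which is the intuition the clustering step builds on.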
Group Into Five Clusters
1. Stock
2. StockBigGain
3. International
4. SmallMidLargeCap
5. Bond
Why? $\lfloor \sqrt{55 / 2} \rfloor = 5$ (rule of thumb for k)
Predicting on New Data
Use centroids from training
Determine cluster for each test point
Assign label
Easy!
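The three prediction steps can be sketched as a nearest-centroid lookup. The centroids below are NOT the trained centroids from the talk; as a loudly labeled assumption, the VYM and EDV return vectors from the slides stand in for a "Stock" and a "Bond" centroid so that the example runs on its own.

```python
import math

def predict_cluster(point, centroids, labels):
    """Find the closest centroid and return its index and label."""
    index = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    return index, labels[index]

# Hypothetical stand-in centroids (VYM and EDV rows), not real training output.
centroids = [(4.49, 12.73, 13.64, 7.08), (6.48, -10.37, 3.38, 0.0)]
labels = ["Stock", "Bond"]

vti = (9.04, 18.49, 14.55, 7.39)  # VTI: YTD, Year 1, Year 5, Year 10
cluster_index, cluster_label = predict_cluster(vti, centroids, labels)
```

No retraining is needed at prediction time; the new point is only compared against the stored centroids.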
Predicting on New Data
► VYM, Stock - Large-Cap Value
► 4.49, 12.73, 13.64, 7.08 → Stock
► EDV, Bond - Long-term
► 6.48, -10.37, 3.38, 0 → Bond
► VTI, Stock - Large-Cap Blend
► 9.04, 18.49, 14.55, 7.39 → ?
Results – Did It Work?
(Ticker, Asset Class, Cluster, Cluster Label)
► VEU, International, 1, International
► VNQI, International, 1, International
► VXUS, International, 1, International
► BLV, Bond - Long-term, 3, Stock
► BIV, Bond - Inter-term, 4, Bond
► VCLT, Bond - Long-term, 4, Bond
► BSV, Bond - Short-term, 4, Bond
► VIG, Stock - Large-Cap Blend, 5, SmallMidLargeCap
► VUG, Stock - Large-Cap Growth, 5, SmallMidLargeCap
► VTI, Stock - Large-Cap Blend, 5, SmallMidLargeCap
Not Bad!
Thank you!
Kory Becker
http://primaryobjects.com
@primaryobjects

Machine Learning in a Flash (Extended Edition): An Introduction to Natural Language Processing and Clustering
