Machine Learning in a Flash
Kory Becker
August, 2017, http://primaryobjects.com
AI IS GOOD
AI !== Machine Learning
• Logical AI, Symbolic, Knowledge-based
• Pattern Recognition, Representation
• Inference, Common Sense, Planning
• Heuristics, Ontology, Artificial Life, Genetic
• Machine Learning, Statistics
Machine Learning Algorithms
Supervised
• Linear Regression
• Logistic Regression
• Support Vector Machines
• Neural Networks
Unsupervised
• K-means Clustering
• Principal Component Analysis (Dimensionality Reduction)
Linear Regression
Logistic Regression
Logistic Regression: Linear Classification
Support Vector Machine: Non-Linear Classification
Support Vector Machine: Gaussian Kernel
Pop Quiz!
Question 1: Supervised or Unsupervised?
• You are designing an agent for The Matrix.
• Its task is to classify people who are threats to the system.
• Feature Set:
  • Age
  • IQ
  • Level of Education
  • # of Times They Watched the Movie The Matrix
• Training Set of 100,000 people: 50k threats, 50k non-threats
Question 2: Supervised or Unsupervised?
• You are designing the brain of a battle robot.
• Its primary attack is hand-to-hand combat. Your task is to find the most effective move combos.
• Feature Set:
  • # of Kicks
  • # of Punches
  • # of Head-butts
  • # of Leg Sweeps
• Training Set of 100,000 winning battles
Natural Language Processing
• Convert text into a numerical representation
• Find commonalities within data (Clustering)
• Make predictions from data (Classification)
  • Category, Popularity, Sentiment, Relationships
Bag of Words Model
Corpus
Cats like to chase mice.
Dogs like to eat big bones.
Create a Dictionary
0 - cats
1 - like
2 - chase
3 - mice
4 - dogs
5 - eat
6 - big
7 - bones
Digitize Text
Cats like to chase mice.
1 1 1 1 0 0 0 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1
Vector Length = 8
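The dictionary and digitizing steps can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the slides: the function names are hypothetical, and it assumes "to" is dropped as a stop word, since it does not appear in the slide's dictionary.

```python
import re

# Assumption: "to" is treated as a stop word because the slide's
# dictionary does not contain it.
STOP_WORDS = {"to"}

def tokenize(text):
    """Lowercase the text and return its words, minus stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def build_dictionary(corpus):
    """Map each distinct word to an index, in order of first appearance."""
    dictionary = {}
    for document in corpus:
        for word in tokenize(document):
            if word not in dictionary:
                dictionary[word] = len(dictionary)
    return dictionary

def vectorize(text, dictionary):
    """Return a 0/1 vector marking which dictionary words occur in the text."""
    vector = [0] * len(dictionary)
    for word in tokenize(text):
        if word in dictionary:  # words outside the dictionary are ignored
            vector[dictionary[word]] = 1
    return vector

corpus = ["Cats like to chase mice.", "Dogs like to eat big bones."]
dictionary = build_dictionary(corpus)
vectors = [vectorize(document, dictionary) for document in corpus]

print(dictionary)
for document, vector in zip(corpus, vectors):
    print(document, vector)
```

Because each vector has one slot per dictionary word, every document maps to the same length (8 here), however many words it contains.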
Classify Documents (eating)
Cats like to chase mice.
1 1 1 1 0 0 0 0    y = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1    y = 1
Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0    y = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1    y = 1
Bats eat bugs.
0 0 0 0 0 1 0 0    y = ?
Predict on New Data
Cats like to chase mice.
1 1 1 1 0 0 0 0    y = 0
Dogs like to eat big bones.
0 1 0 0 1 1 1 1    y = 1
Bats eat bugs.
0 0 0 0 0 1 0 0    y = 1
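One way to see why "Bats eat bugs." lands in the eating class is to compare its vector with the two labeled ones. The sketch below uses a 1-nearest-neighbor rule with Hamming distance. This is a stand-in chosen for brevity, not the model fitted in the slides, and the function names are hypothetical.

```python
def hamming(a, b):
    """Count the positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b))

def predict_nearest(vector, training):
    """Return the label of the training vector closest to `vector`.

    `training` is a list of (vector, label) pairs.
    """
    _, label = min(training, key=lambda pair: hamming(vector, pair[0]))
    return label

# Labeled vectors from the slides: 0 = not about eating, 1 = about eating.
training = [
    ([1, 1, 1, 1, 0, 0, 0, 0], 0),  # Cats like to chase mice.
    ([0, 1, 0, 0, 1, 1, 1, 1], 1),  # Dogs like to eat big bones.
]

bats = [0, 0, 0, 0, 0, 1, 0, 0]     # Bats eat bugs.
prediction = predict_nearest(bats, training)
print(prediction)
```

The new vector differs from the cats sentence in five positions and from the dogs sentence in four, because it shares the word "eat" with the latter, so the nearest neighbor carries label 1.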
Does it Really Work?
Training (Document Term Matrix)
> data
[1] "Cats like to chase mice." "Dogs like to eat big bones."
> train
  big bone cat chase dog eat like mice y
1   0    0   1     1   0   0    1    1 0
2   1    1   0     0   1   1    1    0 1
> predict(fit, newdata = train)
[1] 0 1
100% Accuracy
Test Case
> data2
[1] "Bats eat bugs."
> test
  big bone cat chase dog eat like mice
1   0    0   0     0   0   1    0    0
> predict(fit, newdata = test)
[1] 1
Success!
Source code: https://goo.gl/UxjPBs
Unsupervised Learning
Finding patterns in data
Grouping similar data into clusters
Does not require labeled data
Exploratory data analysis
Predict clusters for new data!
K-Means Clustering
Popular clustering algorithm
Groups data into k clusters
Data points belong to the cluster with closest mean
Each cluster has a centroid (center)
k-Means Algorithm
1. Choose a value for k (number of clusters)
   • Guess
   • Rule of thumb: ~~(Math.sqrt(points.length * 0.5))
2. Initialize centroids
   • Random
   • Farthest Point
   • K-means++
3. Assign data points to closest centroid
4. Move centroids to center of assigned points
Repeat the last two steps until the centroids stop moving.
Demo: https://goo.gl/AjNEJk
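The algorithm above can be sketched in plain Python. This is a minimal illustration and not the code behind the demo: it uses random initialization only (Farthest Point and K-means++ are not implemented), and the function names, the `seed` parameter and the sample points are assumptions made for the example.

```python
import math
import random

def rule_of_thumb_k(n_points):
    """k = floor(sqrt(n / 2)), the same value as ~~(Math.sqrt(n * 0.5))."""
    return int(math.sqrt(n_points * 0.5))

def distance(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def closest(point, centroids):
    """Index of the centroid nearest to the point."""
    return min(range(len(centroids)), key=lambda i: distance(point, centroids[i]))

def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))

def k_means(points, k, max_iterations=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)          # random initialization
    for _ in range(max_iterations):
        # Assign every point to its closest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[closest(point, centroids)].append(point)
        # Move each centroid to the mean of its points (keep it if empty).
        moved = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if moved == centroids:                 # nothing moved: converged
            break
        centroids = moved
    return centroids

# Hypothetical sample data: two well-separated groups of three points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centroids = k_means(points, k=2)
print(sorted(centroids))
print(rule_of_thumb_k(55))
```

With well-separated groups like these, the loop settles on one centroid per group after a few iterations. On harder data the result can depend on the starting centroids, which is the reason for the alternative initialization strategies listed on the slide.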
Clustering Example 1
Clustering Example 2
Predicting Color Groups
rgb(255, 0, 0)      red
rgb(0, 255, 0)      green
rgb(0, 0, 255)      blue
rgb(200, 0, 150)    ?
rgb(50, 199, 135)   ?
rgb(100, 180, 255)  ?
Predicting Color Groups: Encoding
rgb(255, 0, 0)      red     = 16711680
rgb(0, 255, 0)      green   = 65280
rgb(0, 0, 255)      blue    = 255
rgb(200, 0, 150)    ?       = 13107350
rgb(50, 199, 135)   ?       = 3327879
rgb(100, 180, 255)  ?       = 6599935
(Red * 256 * 256) + (Green * 256) + (Blue)
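The encoding formula is easy to check in code. The sketch below applies it to the six colors on the slide. The function names are hypothetical, and `decode_rgb` is an addition for illustration that simply reverses the formula.

```python
def encode_rgb(red, green, blue):
    """Pack three 0-255 channels into one integer, as in the slide's formula."""
    return (red * 256 * 256) + (green * 256) + blue

def decode_rgb(value):
    """Reverse of encode_rgb: split the integer back into (red, green, blue)."""
    return (value // (256 * 256), (value // 256) % 256, value % 256)

colors = [
    (255, 0, 0),
    (0, 255, 0),
    (0, 0, 255),
    (200, 0, 150),
    (50, 199, 135),
    (100, 180, 255),
]

for color in colors:
    print(color, "=", encode_rgb(*color))
```

One consequence of this encoding is that the red channel carries the largest weight, so two colors that differ slightly in red end up farther apart numerically than two colors that differ greatly in blue.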
1000 Colors
100 Colors
Calculating Centroids
Classifying Colors to a Cluster
Grouping Colors into their Cluster
Predicting Color Groups: Predicting on New Data
rgb(241, 52, 11)    ?
rgb(80, 187, 139)   ?
rgb(34, 15, 194)    ?
Categorizing Stocks & Bonds
Data Source: Vanguard ETF funds
Data Fields:
• Ticker, Asset Class, Expense Ratio
• Price, Change 1, Change 2, SEC Yield
• YTD, Year 1, Year 5, Year 10
• Since Inception
Can We Predict a Category?
(Chart with groups labeled International, Stocks, Interm Bond, Long Bond)
Asset Classes
Stock Sector
Stock Mid-Cap Blend
Stock Large-Cap Value
International
Bond Inter-term Investment
Bond Inter-term Government
Bond Long-term Government
Features
Ticker, Asset Class, Expense Ratio
Price, Change 1, Change 2, SEC Yield
YTD, Year 1, Year 5, Year 10
Since Inception
Example Data
VYM, Stock - Large-Cap Value, 0.08%, $77.95, -$0.11, -0.14%, 3.09%B, 4.49%, 12.73%, 13.64%, 7.08%, 7.53% (11/10/2006)
VIG, Stock - Large-Cap Blend, 0.08%, $92.39, -$0.21, -0.23%, 1.93%B, 9.62%, 13.75%, 12.75%, 7.41%, 7.92% (04/21/2006)
Which of these look similar?
► VYM, Stock - Large-Cap Value
► 4.49, 12.73, 13.64, 7.08
► VIG, Stock - Large-Cap Blend
► 9.62, 13.75, 12.75, 7.41
► EDV, Bond - Long-term
► 6.48, -10.37, 3.38, 0
► VCIT, Bond - Inter-term
► 3.47, 1.1, 4.04, 0
Group Into Five Clusters
1. Stock
2. StockBigGain
3. International
4. SmallMidLargeCap
5. Bond
Why? By the rule of thumb: $k = \lfloor \sqrt{55 / 2} \rfloor = 5$
Predicting on New Data
Use centroids from training
Determine cluster for each test point
Assign label
Easy!
Predicting on New Data
► VYM, Stock - Large-Cap Value (cluster: Stock)
► 4.49, 12.73, 13.64, 7.08
► EDV, Bond - Long-term (cluster: Bond)
► 6.48, -10.37, 3.38, 0
► VTI, Stock - Large-Cap Blend (cluster: ?)
► 9.04, 18.49, 14.55, 7.39
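Assigning a new fund to a cluster comes down to finding the nearest centroid and taking its label. The sketch below illustrates this with the four return figures per fund (YTD, Year 1, Year 5, Year 10). The slides do not list the trained centroids, so the VYM and EDV rows are used here as hypothetical stand-ins for a "Stock" and a "Bond" centroid, and the function name is an assumption.

```python
import math

def nearest_label(point, labeled_centroids):
    """Return the label whose centroid is closest to the point (Euclidean)."""
    return min(
        labeled_centroids,
        key=lambda label: math.dist(point, labeled_centroids[label]),
    )

# Hypothetical centroids: real ones would come from k-means training.
# The VYM and EDV figures from the slide stand in for them here.
labeled_centroids = {
    "Stock": (4.49, 12.73, 13.64, 7.08),   # VYM
    "Bond": (6.48, -10.37, 3.38, 0.0),     # EDV
}

vti = (9.04, 18.49, 14.55, 7.39)           # VTI: YTD, Year 1, Year 5, Year 10
print(nearest_label(vti, labeled_centroids))
```

With only these two stand-in centroids, VTI falls on the "Stock" side, mostly because its one-year return is far from EDV's negative one. A model trained with five clusters would choose among five labels instead.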
Results - Did It Work?
Ticker, Asset Class, Cluster, Cluster Label
► VEU, International, 1, International
► VNQI, International, 1, International
► VXUS, International, 1, International
► BLV, Bond - Long-term, 3, Stock
► BIV, Bond - Inter-term, 4, Bond
► VCLT, Bond - Long-term, 4, Bond
► BSV, Bond - Short-term, 4, Bond
► VIG, Stock - Large-Cap Blend, 5, SmallMidLargeCap
► VUG, Stock - Large-Cap Growth, 5, SmallMidLargeCap
► VTI, Stock - Large-Cap Blend, 5, SmallMidLargeCap
Not Bad! :)
Thank you!
Kory Becker
http://primaryobjects.com
@primaryobjects
