American Sign Language Recognizer
By Ming Rutar
ASL Recognizer is a Udacity AI Course Project
Udacity is an online school founded by top AI gurus. http://www.udacity.com
Countless ideas float around the academic world, but only a few make it into industry. Udacity bridges science/theory and technology/practice: it teaches cutting-edge industry technologies with academic depth and hands-on practice.
❖ A course lasts 3 to 6 months and includes 3 to 7 projects.
❖ The projects are product-like.
❖ Courses focus on core technologies and provide helpers for utility tasks, such as environment setup.
❖ The online communities are very active, and course instructors also participate.
❖ Student projects are reviewed by subject-matter experts.
❖ Graduates keep access to the course materials, which are updated to follow technology trends.
❖ The price is affordable.
The task
The overall goal of this project is to build a word recognizer for American Sign Language video sequences, demonstrating the power of probabilistic models. In particular, this project employs hidden Markov models (HMMs) to analyze a series of measurements taken from videos of American Sign Language (ASL) collected for research (see the RWTH-BOSTON-104 Database). In this video, the right-hand x and y locations are plotted as the speaker signs the sentence. The raw data, train, and test sets are pre-defined. You will derive a variety of feature sets.
The Dataset
We understand ASL by watching the speaker's hand movements, and the computer mimics this. Today's technology can tag video automatically, but that was not possible in the 1990s, so the hand-gesture data, such as the Cartesian coordinates of the left and right hands and of the nose (which serves as a reference point), has been preprocessed (extracted from the video). After loading the data, the ‘asl’ dataframe looks like this:
Figure: X-Y plane showing the nose (nx, ny), left-hand (lx, ly), and right-hand (rx, ry) coordinates
More about the data
The training input file (excerpt):
video,speaker,word,startframe,endframe
1,woman-1,JOHN,8,17
1,woman-1,WRITE,22,50
1,woman-1,HOMEWORK,51,77
3,woman-2,IX-1P,4,11
3,woman-2,SEE,12,20
3,woman-2,JOHN,20,31
3,woman-2,YESTERDAY,31,40
3,woman-2,IX,44,52
4,woman-1,JOHN,2,13
4,woman-1,IX-1P,13,18
4,woman-1,SEE,19,27
4,woman-1,IX,28,35
4,woman-1,YESTERDAY,36,47
5,woman-2,LOVE,12,21
The test input file (excerpt):
video,speaker,word,startframe,endframe
2,woman-1,JOHN,7,20
2,woman-1,WRITE,23,36
2,woman-1,HOMEWORK,38,63
7,man-1,JOHN,22,39
7,man-1,CAN,42,47
7,man-1,GO,48,56
7,man-1,CAN,62,73
12,woman-2,JOHN,9,15
12,woman-2,CAN,19,24
12,woman-2,GO,25,34
12,woman-2,CAN,35,51
21,woman-2,JOHN,6,26
The training data contains 112 unique words and the test data contains 66 unique words; the test data consists of 40 sentences made up of 178 words.
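Each row labels one word occurrence with its video, speaker, and start/end frames, so a sentence is simply the words of one video in frame order. A minimal sketch, using only the standard library, that loads rows like the excerpt above and regroups them into per-video sentences; the helper names `load_words` and `sentences_by_video` are hypothetical, not part of the project code.

```python
import csv
import io

# A few rows copied from the training-file excerpt above.
TRAIN_CSV = """video,speaker,word,startframe,endframe
1,woman-1,JOHN,8,17
1,woman-1,WRITE,22,50
1,woman-1,HOMEWORK,51,77
3,woman-2,IX-1P,4,11
3,woman-2,SEE,12,20
3,woman-2,JOHN,20,31
3,woman-2,YESTERDAY,31,40
3,woman-2,IX,44,52
"""


def load_words(text):
    """Return one dict per labeled word occurrence, with numeric fields as ints."""
    rows = list(csv.DictReader(io.StringIO(text)))
    for row in rows:
        for key in ("video", "startframe", "endframe"):
            row[key] = int(row[key])
    return rows


def sentences_by_video(rows):
    """Group word labels into one sentence per video, ordered by start frame."""
    sentences = {}
    for row in sorted(rows, key=lambda r: (r["video"], r["startframe"])):
        sentences.setdefault(row["video"], []).append(row["word"])
    return sentences


rows = load_words(TRAIN_CSV)
print(len({r["word"] for r in rows}))  # 7 unique words in this excerpt
print(sentences_by_video(rows)[1])     # ['JOHN', 'WRITE', 'HOMEWORK']
```

The same grouping applied to the full test file is what turns 178 labeled words into 40 sentences.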
Feature Extraction
Features are the data we feed into the models. Choosing good features is crucial to a model's success, so use common sense to select them. Examples:
Figure: ground features g-lx, g-ly, g-rx, g-ry (hand coordinates measured relative to the nose)
features_ground = ['grnd-rx', 'grnd-ry', 'grnd-lx', 'grnd-ly']
asl.df['grnd-ly'] = asl.df['left-y'] - asl.df['nose-y']
asl.df['grnd-lx'] = asl.df['left-x'] - asl.df['nose-x']
...
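The `...` above elides the right-hand lines. A self-contained sketch that computes all four ground features on a tiny hypothetical dataframe; the column names follow the snippet, while the coordinate values and the (video, frame) index are made up for illustration.

```python
import pandas as pd

# Hypothetical toy frames; the real asl.df holds many videos and frames.
df = pd.DataFrame(
    {
        "left-x": [149, 149], "left-y": [181, 181],
        "right-x": [170, 171], "right-y": [175, 176],
        "nose-x": [161, 162], "nose-y": [62, 63],
    },
    index=pd.MultiIndex.from_tuples([(98, 0), (98, 1)], names=["video", "frame"]),
)

features_ground = ["grnd-rx", "grnd-ry", "grnd-lx", "grnd-ly"]
for hand in ("right", "left"):
    for axis in ("x", "y"):
        # Measuring the hand relative to the nose removes where the speaker
        # happens to stand in the frame from the feature.
        df[f"grnd-{hand[0]}{axis}"] = df[f"{hand}-{axis}"] - df[f"nose-{axis}"]

print(df[features_ground])
```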
Figure: polar features rr, rtheta, lr, ltheta (distance and angle of each hand from the nose)
features_polar = ['polar-rr', 'polar-rtheta', 'polar-lr', 'polar-ltheta']
asl.df['polar-rr'] = np.sqrt((asl.df['right-x']- asl.df['nose-x'])**2 + (asl.df['right-y']-asl.df['nose-y'])**2)
asl.df['polar-rtheta'] = np.arctan2(asl.df['right-x']- asl.df['nose-x'],asl.df['right-y'] - asl.df['nose-y'])
...
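Note that the snippet above passes the x difference as the first argument of np.arctan2 (whose signature is arctan2(y, x)). This swap appears deliberate: it moves the ±π jump in the angle from directly left of the nose to directly above the head (image y grows downward), a region rarely used in signing. A self-contained sketch computing both hands' polar features on hypothetical coordinates:

```python
import numpy as np
import pandas as pd

# Hypothetical coordinates chosen so the expected radii are easy to check.
df = pd.DataFrame({
    "right-x": [164.0, 161.0], "right-y": [66.0, 72.0],
    "left-x": [158.0, 161.0], "left-y": [58.0, 62.0],
    "nose-x": [161.0, 161.0], "nose-y": [62.0, 62.0],
})

features_polar = ["polar-rr", "polar-rtheta", "polar-lr", "polar-ltheta"]
for side, p in (("right", "r"), ("left", "l")):
    dx = df[f"{side}-x"] - df["nose-x"]
    dy = df[f"{side}-y"] - df["nose-y"]
    df[f"polar-{p}r"] = np.sqrt(dx**2 + dy**2)
    # x first on purpose: theta is measured from the +y axis, so the
    # discontinuity sits straight above the nose.
    df[f"polar-{p}theta"] = np.arctan2(dx, dy)

print(df[features_polar])
```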
HMMLearn
HMMLearn is a Python library for unsupervised learning and inference of Hidden Markov Models (HMMs). An HMM can be represented as a Bayesian network in which each hidden state depends only on the previous hidden state and emits the observed features.
We use hmmlearn's GaussianHMM class, which models the observations of each hidden state with a Gaussian distribution (the familiar bell curve).
Figure: Gaussian curves of the word ‘Chocolate’ with different numbers of hidden states
● We instantiate the class with the number of hidden states, the number of iterations, and other options; see the reference at http://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.GaussianHMM
● For training, we call the method fit() and pass in the training data; it returns the fitted model itself.
● For inference, we call the method score() with a word's feature data; it returns a float, the log-likelihood of the input under the model.
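The log-likelihood that score() reports is log P(X | model), which for an HMM is computed with the forward algorithm. The sketch below is a plain-NumPy illustration of that computation for diagonal-covariance Gaussian emissions; it is not hmmlearn's implementation, and the function names and array layout are assumptions made for this example.

```python
import numpy as np


def diag_gaussian_logpdf(X, means, variances):
    """Log density of each frame under each state's diagonal Gaussian.

    X: (T, D) feature sequence; means, variances: (K, D). Returns (T, K).
    """
    diff = X[:, None, :] - means[None, :, :]  # (T, K, D)
    per_dim = np.log(2.0 * np.pi * variances)[None, :, :] + diff**2 / variances[None, :, :]
    return -0.5 * per_dim.sum(axis=2)


def log_likelihood(X, startprob, transmat, means, variances):
    """Forward algorithm in log space: returns log P(X | model)."""
    log_b = diag_gaussian_logpdf(X, means, variances)  # (T, K)
    log_alpha = np.log(startprob) + log_b[0]           # (K,)
    for t in range(1, X.shape[0]):
        m = log_alpha.max()  # rescale before exponentiating to avoid underflow
        # alpha_t(j) = sum_i alpha_{t-1}(i) * A[i, j] * b_j(x_t)
        log_alpha = m + np.log(np.exp(log_alpha - m) @ transmat) + log_b[t]
    m = log_alpha.max()
    return float(m + np.log(np.exp(log_alpha - m).sum()))


# Two hidden states, one feature, three frames (all values made up).
X = np.array([[0.0], [1.0], [0.5]])
start = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.2, 0.8]])
mu = np.array([[0.0], [1.0]])
var = np.array([[1.0], [0.5]])
print(log_likelihood(X, start, A, mu, var))
```

With hmmlearn itself, the equivalent is to fit a GaussianHMM on a word's sequences and call score() on new data; the recognizer compares these numbers across word models, so only their relative size matters.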
How do we do it
● We train the models one word at a time with the training data.
● Each word is encoded as a unique integer, the word id.
● Each word has an associated list of feature sequences, one per training example.
● We train a GaussianHMM model on a word's feature sequences, trying different numbers of hidden states, then select the best model for the word.
● So after training, each word has a model.
● We test the models by building a recognizer that
○ picks a feature set and a model selector and tests them on full sentences:
■ for each word in a sentence, reads its feature sequence,
■ scores it with every word model and picks the model with the highest score,
■ looks up the word id of that model;
○ decodes the sequence of word ids into a sentence;
○ compares the recognized sentence with the original sentence to compute the word error rate (WER).
● The criterion for passing the project is a WER below 60%, i.e. recognizing more than 40% of the words correctly.
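The recognition loop and the error rate can be sketched as follows. This assumes each trained word model is wrapped as a function returning a log-likelihood (for example, a fitted model's score method); the `Scorer` type, the function names, and the toy scorers are hypothetical. Because the test file already marks each word's frame range, guesses and truth align one-to-one and WER reduces to mismatches divided by total words.

```python
from typing import Callable, Dict, List, Sequence

# Hypothetical interface: a word model reduced to "features -> log-likelihood".
Scorer = Callable[[Sequence[float]], float]


def recognize(models: Dict[str, Scorer], test_items: List[Sequence[float]]) -> List[str]:
    """For each test word, return the word whose model scores it highest."""
    guesses = []
    for features in test_items:
        scores = {word: score(features) for word, score in models.items()}
        guesses.append(max(scores, key=scores.get))
    return guesses


def word_error_rate(guesses: List[str], truth: List[str]) -> float:
    """Fraction of words recognized incorrectly (position-by-position)."""
    if len(guesses) != len(truth):
        raise ValueError("guesses and truth must have the same length")
    return sum(g != t for g, t in zip(guesses, truth)) / len(truth)


# Toy usage with made-up scorers: each "model" prefers sequences near its own mean.
models = {
    "JOHN": lambda xs: -abs(sum(xs) / len(xs) - 1.0),
    "CAN": lambda xs: -abs(sum(xs) / len(xs) - 5.0),
}
guesses = recognize(models, [[0.9, 1.1], [4.0, 6.0], [1.2, 0.8]])
print(guesses)                                          # ['JOHN', 'CAN', 'JOHN']
print(word_error_rate(guesses, ["JOHN", "CAN", "CAN"]))  # 0.3333333333333333
```

As an arithmetic check, 101 correct out of 178 words gives WER = 77/178 ≈ 0.4326, comfortably under the 60% threshold.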
Model Selection
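The baseline Gaussian model is a rough cut. In my test, it correctly recognized 58 of 178 words (about a 67% error rate). We improve model selection by using two popular information criteria:
● Bayesian Information Criterion (BIC)
○ The purpose is to penalize models with more parameters (more hidden states) to prevent overfitting; a lower BIC is better.
○ BIC = −2 log L + p log N
■ where L is the likelihood of the fitted model (log L is what score() returns), p is the number of free parameters, and N is the number of data points (frames).
■ For a GaussianHMM with n hidden states, d features, and diagonal covariances, p = n² + 2nd − 1 (transition, starting, mean, and variance parameters).
■ To learn more, see http://www2.imm.dtu.dk/courses/02433/doc/ch6_slides.pdf
● Discriminative Information Criterion (DIC)
○ DIC measures how well a model discriminates its own word from competing words: DIC = log P(X(i)) − 1/(M − 1) · Σ_{j≠i} log P(X(j)), where X(i) is the training data of the word being modeled and the sum runs over the other M − 1 words. A higher DIC is better.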
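Both criteria reduce to simple arithmetic once the log-likelihoods are known. A sketch, assuming a diagonal-covariance GaussianHMM so that p = n² + 2nd − 1; the function names are hypothetical and are not the project's selector classes, and the log-likelihood values in the usage line are made up.

```python
import math
from typing import Dict, List


def n_free_params(n_states: int, n_features: int) -> int:
    """Transitions n(n-1) + start probs (n-1) + means n*d + diagonal variances n*d."""
    return n_states * n_states + 2 * n_states * n_features - 1


def bic(log_l: float, n_states: int, n_features: int, n_points: int) -> float:
    """BIC = -2 log L + p log N (lower is better)."""
    return -2.0 * log_l + n_free_params(n_states, n_features) * math.log(n_points)


def dic(log_l_own: float, log_l_others: List[float]) -> float:
    """DIC = log P(X(i)) - mean over j != i of log P(X(j)) (higher is better)."""
    return log_l_own - sum(log_l_others) / len(log_l_others)


def select_by_bic(log_ls: Dict[int, float], n_features: int, n_points: int) -> int:
    """Return the number of hidden states whose fitted model has the lowest BIC."""
    return min(log_ls, key=lambda n: bic(log_ls[n], n, n_features, n_points))


# Made-up log-likelihoods: more states fit slightly better but cost many parameters.
print(select_by_bic({2: -500.0, 3: -495.0, 10: -480.0}, n_features=4, n_points=100))  # 2
```

The penalty term grows with the square of the number of states, so a small likelihood gain from a larger model does not justify its extra parameters.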
Testing and Output
model_selector=SelectorBIC_orig, features=scale_podel
**** WER = 0.43258426966292135
Total correct: 101 out of 178
Partial listing; misrecognized words are marked with *.
Video Recognized                                      Correct
=====================================================================================================
2:    JOHN WRITE HOMEWORK                             JOHN WRITE HOMEWORK
7:    JOHN *HAVE GO *ARRIVE                           JOHN CAN GO CAN
12:   JOHN *WHAT *GO1 CAN                             JOHN CAN GO CAN
21:   JOHN FISH WONT *WHO BUT *CAR *CHICKEN CHICKEN   JOHN FISH WONT EAT BUT CAN EAT CHICKEN
25:   JOHN *TELL *LOVE *WHO IX                        JOHN LIKE IX IX IX
28:   JOHN *WHO *WHO *WHO IX                          JOHN LIKE IX IX IX
30:   JOHN *MARY *MARY *MARY *MARY                    JOHN LIKE IX IX IX
36:   MARY VEGETABLE *GIRL *GIVE *MARY *MARY          MARY VEGETABLE KNOW IX LIKE CORN1
40:   JOHN *VISIT *CORN *JOHN *MARY                   JOHN IX THINK MARY LOVE
43:   JOHN *SHOULD BUY HOUSE                          JOHN MUST BUY HOUSE
50:   *JOHN *SEE BUY CAR SHOULD                       FUTURE JOHN BUY CAR SHOULD
54:   JOHN *JOHN *MARY BUY HOUSE                      JOHN SHOULD NOT BUY HOUSE
57:   JOHN *PREFER VISIT MARY                         JOHN DECIDE VISIT MARY
67:   JOHN *YESTERDAY NOT BUY HOUSE                   JOHN FUTURE NOT BUY HOUSE
71:   JOHN *FUTURE VISIT MARY                         JOHN WILL VISIT MARY
74:   *IX *MARY *MARY MARY                            JOHN NOT VISIT MARY
77:   *JOHN BLAME MARY                                ANN BLAME MARY
The Results
features_customer2 is the winner; it consists of scaled Cartesian coordinates plus time deltas. Simply scaling the values of features_podel also pays off: scale_podel outperforms features_podel, recognizing 101 words versus 89.
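The slides do not show exactly how the scaled and delta features were computed. As one plausible sketch, assuming "scaling" means a per-speaker z-score and "time delta" means the frame-to-frame difference within each video (first frame set to 0); the column names and values below are hypothetical.

```python
import pandas as pd

# Hypothetical frames for two speakers, one video each.
df = pd.DataFrame({
    "speaker": ["woman-1", "woman-1", "woman-1", "man-1", "man-1", "man-1"],
    "video": [1, 1, 1, 7, 7, 7],
    "right-x": [150.0, 160.0, 170.0, 100.0, 110.0, 120.0],
})


def scale_per_speaker(frame: pd.DataFrame, col: str) -> pd.Series:
    """z-score a column within each speaker, so speakers of different size align."""
    grouped = frame.groupby("speaker")[col]
    return (frame[col] - grouped.transform("mean")) / grouped.transform("std")


def delta_per_video(frame: pd.DataFrame, col: str) -> pd.Series:
    """Difference from the previous frame of the same video; the first frame gets 0."""
    return frame.groupby("video")[col].diff().fillna(0)


df["norm-rx"] = scale_per_speaker(df, "right-x")
df["delta-rx"] = delta_per_video(df, "right-x")
print(df)
```

Scaling per speaker makes the two speakers' hand positions comparable even though their raw coordinates differ, which is one reason scaled features can beat unscaled ones.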
