American Sign Language Recognizer
By Ming Rutar
ASL Recognizer is a Udacity AI Course Project
Udacity is an online school founded by top AI gurus. http://www.udacity.com
Countless ideas float around in academia (science/theory), but few of them make it into industry (technology/practice).
Udacity teaches cutting-edge industry technologies with academic depth and hands-on practice.
❖ A course lasts 3-6 months and includes 3-7 projects.
❖ The projects are product-like.
❖ Courses focus on core technologies and provide helpers for utility tasks such as environment setup.
❖ The online communities are very active, and course instructors also participate.
❖ Student projects are reviewed by subject-matter experts.
❖ Graduates keep access to the course materials, which are updated to follow technology trends.
❖ Affordable price.
The task
The overall goal of this project is to build a word recognizer for American Sign Language video
sequences, demonstrating the power of probabilistic models. In particular, this project employs hidden
Markov models (HMMs) to analyze a series of measurements taken from videos of American Sign
Language (ASL) collected for research (see the RWTH-BOSTON-104 Database). In this video, the
right-hand x and y locations are plotted as the speaker signs the sentence. The raw data, train, and test
sets are pre-defined. You will derive a variety of feature sets.
The Dataset
We recognize the meaning of ASL by watching the speaker's hand movements, and the computer mimics
us. Today, technology can tag video automatically, but that was not the case in the 1990s. The hand gesture data, such as the
Cartesian coordinates of the left and right hands and of the nose (which serves as a reference), are
preprocessed (extracted from the video). After loading the data, the ‘asl’ dataframe looks like this:
[Figure: video frame with X and Y axes, marking the nose (nx, ny), the left hand (lx, ly), and the right hand (rx, ry).]
More about the data
The training input file:
video,speaker,word,startframe,endframe
1,woman-1,JOHN,8,17
1,woman-1,WRITE,22,50
1,woman-1,HOMEWORK,51,77
3,woman-2,IX-1P,4,11
3,woman-2,SEE,12,20
3,woman-2,JOHN,20,31
3,woman-2,YESTERDAY,31,40
3,woman-2,IX,44,52
4,woman-1,JOHN,2,13
4,woman-1,IX-1P,13,18
4,woman-1,SEE,19,27
4,woman-1,IX,28,35
4,woman-1,YESTERDAY,36,47
5,woman-2,LOVE,12,21
The test input file:
video,speaker,word,startframe,endframe
2,woman-1,JOHN,7,20
2,woman-1,WRITE,23,36
2,woman-1,HOMEWORK,38,63
7,man-1,JOHN,22,39
7,man-1,CAN,42,47
7,man-1,GO,48,56
7,man-1,CAN,62,73
12,woman-2,JOHN,9,15
12,woman-2,CAN,19,24
12,woman-2,GO,25,34
12,woman-2,CAN,35,51
21,woman-2,JOHN,6,26
The training data contains 112 unique words and the test data contains 66 unique words. The test data
consists of 40 sentences made up of 178 words.
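The file layout above can be read with nothing more than the standard library. The following is a minimal sketch, not the project's own loader: the helper name `load_words` and the inline `TRAIN_CSV` string are assumptions made for illustration, and only the first five rows of the training excerpt are included.

```python
import csv
import io
from collections import defaultdict

# First rows of the training file, copied from the excerpt above.
TRAIN_CSV = """video,speaker,word,startframe,endframe
1,woman-1,JOHN,8,17
1,woman-1,WRITE,22,50
1,woman-1,HOMEWORK,51,77
3,woman-2,IX-1P,4,11
3,woman-2,SEE,12,20
"""


def load_words(text):
    """Parse the CSV text into a list of dicts with integer frame numbers."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({
            "video": int(row["video"]),
            "speaker": row["speaker"],
            "word": row["word"],
            "startframe": int(row["startframe"]),
            "endframe": int(row["endframe"]),
        })
    return rows


rows = load_words(TRAIN_CSV)

# Group the words by video number to rebuild each signed sentence.
sentences = defaultdict(list)
for r in rows:
    sentences[r["video"]].append(r["word"])

unique_words = sorted({r["word"] for r in rows})
print(dict(sentences))
print(unique_words)
```

Each row marks where one word starts and ends inside a video, so grouping by the video column recovers the sentence, and the set of the word column gives the vocabulary counts quoted above.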
Feature Extraction
Features are the data we feed into a model. Feature selection is crucial to the model's success. Use common sense to
select features. Examples:
[Figure: features_ground. Hand positions measured relative to the nose: g-rx, g-ry, g-lx, g-ly.]
features_ground = ['grnd-rx', 'grnd-ry', 'grnd-lx', 'grnd-ly']
asl.df['grnd-ly'] = asl.df['left-y'] - asl.df['nose-y']
asl.df['grnd-lx'] = asl.df['left-x'] - asl.df['nose-x']
...
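As a self-contained illustration of the ground features, here is a small sketch on a toy pandas DataFrame. The frame values are made up, a plain `df` stands in for `asl.df`, and the right-hand columns are assumed to follow the same pattern as the left-hand lines shown above.

```python
import pandas as pd

# Toy frames; the coordinate values are invented for illustration.
df = pd.DataFrame({
    "left-x": [149, 150], "left-y": [181, 180],
    "right-x": [170, 172], "right-y": [175, 170],
    "nose-x": [161, 161], "nose-y": [62, 62],
})

# Ground features: each hand coordinate minus the matching nose coordinate.
for hand, short in (("right", "r"), ("left", "l")):
    for axis in ("x", "y"):
        df[f"grnd-{short}{axis}"] = df[f"{hand}-{axis}"] - df[f"nose-{axis}"]

features_ground = ["grnd-rx", "grnd-ry", "grnd-lx", "grnd-ly"]
print(df[features_ground])
```

Subtracting the nose position removes the effect of where the speaker stands in the frame, so the same sign produces similar numbers across videos.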
[Figure: features_polar. Polar coordinates of each hand relative to the nose: rr, rtheta, lr, ltheta.]
features_polar = ['polar-rr', 'polar-rtheta', 'polar-lr', 'polar-ltheta']
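# r: distance of the right hand from the nose
asl.df['polar-rr'] = np.sqrt((asl.df['right-x'] - asl.df['nose-x'])**2 + (asl.df['right-y'] - asl.df['nose-y'])**2)
# theta: arctan2 is called with (x, y) rather than the usual (y, x), so the angle is measured from the y-axis
asl.df['polar-rtheta'] = np.arctan2(asl.df['right-x'] - asl.df['nose-x'], asl.df['right-y'] - asl.df['nose-y'])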
...
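The polar features can be checked on a toy example with known answers. This is a sketch under assumptions: the coordinates are invented, the nose is placed at the origin, a plain `df` stands in for `asl.df`, and the left hand is assumed to use the same formulas as the right hand. It keeps the `(x, y)` argument order used in the snippet above.

```python
import numpy as np
import pandas as pd

# Toy frames with the nose at the origin so the results are easy to verify by hand.
df = pd.DataFrame({
    "right-x": [3.0, 0.0], "right-y": [4.0, 2.0],
    "left-x": [-1.0, 0.0], "left-y": [0.0, -5.0],
    "nose-x": [0.0, 0.0], "nose-y": [0.0, 0.0],
})

for hand, short in (("right", "r"), ("left", "l")):
    dx = df[f"{hand}-x"] - df["nose-x"]
    dy = df[f"{hand}-y"] - df["nose-y"]
    # Radius: Euclidean distance from the nose.
    df[f"polar-{short}r"] = np.sqrt(dx**2 + dy**2)
    # Angle: arguments passed as (x, y), matching the snippet above.
    df[f"polar-{short}theta"] = np.arctan2(dx, dy)

features_polar = ["polar-rr", "polar-rtheta", "polar-lr", "polar-ltheta"]
print(df[features_polar])
```

With this argument order a hand directly along the positive y-axis has angle 0, and the jump between -π and π falls on the negative y-axis.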
HMMLearn
hmmlearn is a Python library for unsupervised learning with hidden Markov models (HMMs). Like a neural network, an HMM can be
drawn as a graph, in this case a Bayesian network:
We use hmmlearn's GaussianHMM class. The Gaussian curve is the famous bell curve. Below are the curves for the word
‘Chocolate’ with different numbers of hidden states:
● We instantiate the class with the number of hidden states, the number of iterations, and more; see the reference at
http://hmmlearn.readthedocs.io/en/latest/api.html#hmmlearn.hmm.GaussianHMM
● For training, we call the fit() method and pass in the training data; it returns the model itself.
● For inference, we call the score() method with a word's feature sequence; it returns a float, the
log-likelihood of the input under the model.
How do we do it
● We train one model per word, using the training data.
● Each word is encoded as a unique integer, the word id.
● Each word has an associated list of feature sequences.
● We train a GaussianHMM model on a word's feature sequences, try different numbers of hidden states, then
select the best model for the word.
● After training, each word has its own model.
● We test the models by building a recognizer that works as follows:
○ Pick a feature set and its trained models, then test them on full sentences:
■ For each word in a sentence, read its feature sequence.
■ Score it against every word model and pick the model with the highest score.
■ From that model, look up the word id.
○ Decode the sequence of word ids into a sentence.
○ Compare the recognized sentence with the original sentence to get the error rate.
● The criterion for passing the project is an error rate below 60%, i.e., recognizing more than 40% of the words correctly.
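The recognizer loop described above boils down to an argmax over per-word scores. The sketch below uses a stand-in `StubModel` class (hypothetical, standard library only) in place of trained GaussianHMM models; anything that exposes a `score()` method would slot in the same way. Words are used directly as dictionary keys instead of integer word ids to keep the example short.

```python
class StubModel:
    """Stand-in for a trained per-word model; only score() is needed here."""

    def __init__(self, center):
        self.center = center

    def score(self, features):
        # Higher (less negative) when the features lie closer to this word's center.
        return -sum((x - self.center) ** 2 for x in features)


def recognize(models, test_items):
    """Return the best-scoring word for each test feature sequence."""
    guesses = []
    for features in test_items:
        scores = {}
        for word, model in models.items():
            try:
                scores[word] = model.score(features)
            except ValueError:
                # A model that cannot score this input is treated as the worst match.
                scores[word] = float("-inf")
        guesses.append(max(scores, key=scores.get))
    return guesses


models = {"JOHN": StubModel(0.0), "CAN": StubModel(5.0), "GO": StubModel(10.0)}
test_items = [[0.2, -0.1], [4.8, 5.3], [9.9, 10.4], [5.1, 4.7]]
guesses = recognize(models, test_items)
print(" ".join(guesses))
```

Every test word is scored against every word model, so recognition cost grows with the vocabulary size times the number of test words.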
Model Selection
The raw Gaussian model is a rough cut. In my test, it correctly recognized 58 words out of 178 (about a 67% error rate). We
improve model selection by using two popular information criteria:
● Bayesian information criterion (BIC)
○ The purpose is to penalize more complex models, and more so for words with longer sequences, to prevent overfitting.
○ BIC = −2 log L + p log N
■ where p is the number of free parameters, log L is the log-likelihood returned by score(), and N is the number of data points for the word.
■ p is very magical!
■ To learn more, see http://www2.imm.dtu.dk/courses/02433/doc/ch6_slides.pdf
● Discriminative information criterion (DIC)
○ DIC scores how well the model for one word discriminates that word's training data from competing words.
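Both criteria are short formulas once the log-likelihoods are available. The sketch below is illustrative only: the parameter count assumes a GaussianHMM with diagonal covariance (transitions, start probabilities, means, and variances), the DIC form used is the common "own log-likelihood minus the average log-likelihood of the competing words", and the log-likelihood values in `candidates` are hypothetical numbers, not results from the project.

```python
import math


def count_free_params(n_states, n_features):
    """Free parameters of a diagonal-covariance Gaussian HMM (an assumed way to count p).

    transitions n(n-1) + start probabilities (n-1) + means n*d + variances n*d
    """
    return n_states ** 2 + 2 * n_states * n_features - 1


def bic(log_l, n_params, n_points):
    """BIC = -2 log L + p log N; lower is better."""
    return -2.0 * log_l + n_params * math.log(n_points)


def dic(own_log_l, other_log_ls):
    """Own-word log-likelihood minus the mean over competing words; higher is better."""
    return own_log_l - sum(other_log_ls) / len(other_log_ls)


# Hypothetical log-likelihoods for models with 2, 3, and 4 hidden states.
candidates = {2: -430.0, 3: -395.0, 4: -390.0}
n_features, n_points = 4, 60

scores = {n: bic(ll, count_free_params(n, n_features), n_points)
          for n, ll in candidates.items()}
best_n = min(scores, key=scores.get)
print(scores, best_n)
```

In this toy case the 4-state model has the best raw likelihood, but its extra parameters cost more than the gain, so BIC prefers 3 states.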
Testing and Output
model_selector=SelectorBIC_orig, features=scale_podel
**** WER = 0.43258426966292135
Total correct: 101 out of 178
Video  Recognized                                     Correct
==========================================================================================
    2: JOHN WRITE HOMEWORK                            JOHN WRITE HOMEWORK
    7: JOHN *HAVE GO *ARRIVE                          JOHN CAN GO CAN
   12: JOHN *WHAT *GO1 CAN                            JOHN CAN GO CAN
   21: JOHN FISH WONT *WHO BUT *CAR *CHICKEN CHICKEN  JOHN FISH WONT EAT BUT CAN EAT CHICKEN
   25: JOHN *TELL *LOVE *WHO IX                       JOHN LIKE IX IX IX
   28: JOHN *WHO *WHO *WHO IX                         JOHN LIKE IX IX IX
   30: JOHN *MARY *MARY *MARY *MARY                   JOHN LIKE IX IX IX
   36: MARY VEGETABLE *GIRL *GIVE *MARY *MARY         MARY VEGETABLE KNOW IX LIKE CORN1
   40: JOHN *VISIT *CORN *JOHN *MARY                  JOHN IX THINK MARY LOVE
   43: JOHN *SHOULD BUY HOUSE                         JOHN MUST BUY HOUSE
   50: *JOHN *SEE BUY CAR SHOULD                      FUTURE JOHN BUY CAR SHOULD
   54: JOHN *JOHN *MARY BUY HOUSE                     JOHN SHOULD NOT BUY HOUSE
   57: JOHN *PREFER VISIT MARY                        JOHN DECIDE VISIT MARY
   67: JOHN *YESTERDAY NOT BUY HOUSE                  JOHN FUTURE NOT BUY HOUSE
   71: JOHN *FUTURE VISIT MARY                        JOHN WILL VISIT MARY
   74: *IX *MARY *MARY MARY                           JOHN NOT VISIT MARY
   77: *JOHN BLAME MARY                               ANN BLAME MARY
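The WER figure in this output is consistent with a simple position-by-position comparison: 178 - 101 = 77 wrong words, and 77 / 178 gives the value printed. The sketch below reproduces that calculation on the first three sentences of the listing. The function name `word_error_rate` is an assumption, and this is not the edit-distance WER used in speech recognition; it relies on the recognized and correct sentences having the same length, which holds here because the word boundaries are given.

```python
def word_error_rate(recognized, correct):
    """Position-wise error rate over sentences of equal length.

    Returns (error rate, number of correct words, total words).
    """
    errors = 0
    total = 0
    for guess_sentence, true_sentence in zip(recognized, correct):
        if len(guess_sentence) != len(true_sentence):
            raise ValueError("sentences must have the same number of words")
        total += len(true_sentence)
        errors += sum(g != t for g, t in zip(guess_sentence, true_sentence))
    return errors / total, total - errors, total


# Videos 2, 7, and 12 from the listing above (without the '*' markers).
recognized = [
    "JOHN WRITE HOMEWORK".split(),
    "JOHN HAVE GO ARRIVE".split(),
    "JOHN WHAT GO1 CAN".split(),
]
correct = [
    "JOHN WRITE HOMEWORK".split(),
    "JOHN CAN GO CAN".split(),
    "JOHN CAN GO CAN".split(),
]

wer, n_correct, n_total = word_error_rate(recognized, correct)
print(f"WER = {wer:.4f}, correct: {n_correct} out of {n_total}")

# The full run reported above: 101 correct out of 178 words.
full_run_wer = (178 - 101) / 178
print(full_run_wer)
```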
The Results
features_customer2 is the winner. features_customer2 is scaled Cartesian coordinates plus time deltas.
Simply by scaling the values of features_podel, scale_podel outperforms features_podel: 101 vs. 89 words recognized correctly.
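The slides do not say how the scaling or the time delta was computed, so the following is only one plausible reading: a z-score per speaker for the scaling and a frame-to-frame difference within each video for the delta. The column names `norm-rx` and `delta-rx` and all values are invented for illustration.

```python
import pandas as pd

# Toy data: two videos by two speakers, three frames each (values invented).
df = pd.DataFrame({
    "video": [1, 1, 1, 7, 7, 7],
    "speaker": ["woman-1"] * 3 + ["man-1"] * 3,
    "right-x": [10.0, 12.0, 14.0, 100.0, 110.0, 120.0],
})

# Assumed scaling: z-score per speaker, so speakers of different size become comparable.
by_speaker = df.groupby("speaker")["right-x"]
df["norm-rx"] = (df["right-x"] - by_speaker.transform("mean")) / by_speaker.transform("std")

# Assumed time delta: change since the previous frame of the same video (0 for the first frame).
df["delta-rx"] = df.groupby("video")["right-x"].diff().fillna(0.0)

print(df)
```

Under this reading, scaling removes differences in body size and camera distance between speakers, while the delta adds information about how fast the hand is moving.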
