ISOLATED SIGN RECOGNITION WITH A SIAMESE
NEURAL NETWORK OF RGB AND DEPTH STREAMS
TUR Anil Osman, YALIM KELES Hacer
Ankara University Computer Engineering Department
anilosmantur@gmail.com, hkeles@ankara.edu.tr
Contact Information
Anil Osman TUR
e-mail: anilosmantur@gmail.com
LinkedIn: linkedin.com/in/anilosmantur
Research Gate: researchgate.net/profile/Anil_Tur
GitHub: github.com/AnilOsmanTur
Hacer YALIM KELES
e-mail: hkeles@ankara.edu.tr
LinkedIn: linkedin.com/in/haceryalimkeles
Research Gate: researchgate.net/profile/Hacer_Keles
AU CVML LAB
Website: cvml.ankara.edu.tr
LinkedIn: linkedin.com/in/aucvmllab
GitHub: github.com/aucvmllab
References
[1] S. Escalera, X. Baró, J. Gonzàlez, M.A. Bautista, M. Madadi,
M. Reyes, V. Ponce, H.J. Escalante, J. Shotton, I. Guyon,
"ChaLearn Looking at People Challenge 2014: Dataset and
results". In: ECCV workshop. 2014.
[2] K. He, X. Zhang, S. Ren, J. Sun, "Deep Residual Learning for
Image Recognition". Proceedings of the IEEE conference on
computer vision and pattern recognition. 2016.
[3] K. Simonyan, A. Zisserman, "Very deep convolutional
networks for large-scale image recognition". arXiv preprint
arXiv:1409.1556, 2014.
[4] I. Sutskever, O. Vinyals, Q. V. Le, "Sequence to sequence
learning with neural networks". In: Advances in neural
information processing systems, pp. 3104-3112, 2014.
[5] J. Chung, C. Gulcehre, K. Cho, Y. Bengio, "Empirical
evaluation of gated recurrent neural networks on sequence
modeling". arXiv preprint arXiv:1412.3555, 2014.
[6] TensorFlow machine learning framework.
https://www.tensorflow.org.
[7] F. Chollet et al., "Keras." https://keras.io, 2015.
Abstract
Sign recognition is a challenging problem due to
the high variance of the signs among different signers
and the multiple modalities of the input information.
In addition, the challenges of action classification
in computer vision, such as variations in illumination
and background, also apply in this domain. In this work,
we propose a Siamese Neural Network (SNN)
architecture that extracts features from
the RGB and the depth streams of a sign frame in
parallel. We use a pretrained model for the SNN
without fine-tuning it on our training data. We
then apply global feature pooling to the depth
and color features that the SNN generates and
feed the concatenation of the selected features to a
recurrent neural network (RNN) to discriminate
the signs. We trained our model parameters
on the Montalbano dataset and achieved 93.19%
test accuracy with ResNet-50 and 91.61% with
VGG-16 network models.
Introduction
Motivation
• To solve communication problems between
the deaf and the hearing communities.
• To support human-machine interfaces that
control machines with human gestures
for other purposes.
Problem and challenges
Recognizing signs independently of each other.
• High variance of the signs among different
signers, e.g. body and pose variations and
variance in the duration of the signs.
• Multiple modalities of the input information,
with illumination changes, occlusion
problems, etc.
Solution
1. To represent the inputs in a more effective
feature space, we employed pretrained
Convolutional Neural Networks (CNNs).
2. To classify the feature vectors generated by
the CNNs, the sequences must be interpreted,
so we used Recurrent Neural Networks (RNNs),
specifically the Long Short-Term Memory (LSTM)
model.
3. To generalize over the inputs and be robust to
changes and variations in training, e.g. lighting
and person, we used regularization methods.
Dataset
The Montalbano Gesture Dataset [1] was used in our
experiments.
• Video samples are 640x480 pixels and
recorded at 20 fps.
• 20 different Italian hand gestures from 27
different users.
• The dataset includes changes in clothing,
lighting and background.
[Figure panels: RGB, Depth, User Index, Skeletal]
Preprocessing
• RGB and depth inputs were cropped to 400 by 400 square
images, using the x value of the shoulder center
joint from the skeletal data.
• All samples were fixed to 40 frames by sampling.
• A median filter was applied to both inputs.
• The user index data was used as a mask on the depth input
to subtract the background.
[Figure labels: cutting; sequence distribution]
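The preprocessing steps above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, even spacing is an assumed sampling scheme, the vertical crop is assumed to be centred, and user pixels are assumed to be nonzero in the user index map. The median filter is left out because the poster does not give its kernel size.

```python
import numpy as np

TARGET_FRAMES = 40  # sequence length stated in the poster
CROP_SIZE = 400     # crop size stated in the poster


def sample_frame_indices(num_frames, target=TARGET_FRAMES):
    """Pick `target` evenly spaced frame indices (assumed sampling scheme)."""
    # Short clips repeat some indices; long clips skip frames.
    return np.linspace(0, num_frames - 1, target).round().astype(int)


def crop_around_x(frame, center_x, size=CROP_SIZE):
    """Crop a size-by-size window horizontally centred on `center_x`."""
    height, width = frame.shape[:2]
    # Clamp the window so it stays inside the frame.
    left = int(np.clip(center_x - size // 2, 0, width - size))
    top = (height - size) // 2  # assumption: the vertical crop is centred
    return frame[top:top + size, left:left + size]


def mask_depth(depth, user_index):
    """Zero out depth pixels that do not belong to the user."""
    # Assumption: user pixels are nonzero in the user index map.
    return np.where(user_index > 0, depth, 0)
```

For a 640x480 frame, `crop_around_x(frame, shoulder_x)` returns a 400x400 window even when the shoulder center lies close to the left or right border, because the window position is clamped.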
Model Architecture and Training
• The convolutional parts of the pretrained ResNet-50 [2] and VGG-16 [3] models were used.
• Global max pooling or global average pooling layers were applied to the outputs of the pretrained networks.
• The pooling layer outputs were connected to fully connected (FC) layers.
• We experimented with ReLU, Sigmoid and ReLU + Batch Normalization configurations for the FC layers.
• The RGB and depth outputs of the FC layers were concatenated and connected to an LSTM.
• The output of the LSTM was connected to a softmax layer to classify the gestures.
• We used the Adam optimizer with a learning rate of 1e-3.
• We chose a batch size of 16.
• The pretrained models were used as feature extractors; no fine-tuning was applied to them.
• We experimented with the L2 norm and Dropout as regularization methods.
• We chose a lambda constant of 0.2 for the L2 norm and a probability of 0.5 for Dropout.
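The feature path that feeds the recurrent layer (pooling, FC layer, concatenation) can be sketched in plain NumPy. This is an illustrative sketch rather than the authors' implementation: random arrays stand in for the pretrained CNN outputs, the map size, channel count and FC width are hypothetical, and separate FC weights per stream are an assumption that the poster does not confirm.

```python
import numpy as np

rng = np.random.default_rng(0)


def global_pool(feature_maps, mode="avg"):
    """Collapse (frames, H, W, C) feature maps to (frames, C)."""
    if mode == "avg":
        return feature_maps.mean(axis=(1, 2))
    if mode == "max":
        return feature_maps.max(axis=(1, 2))
    raise ValueError(f"unknown pooling mode: {mode}")


def fc_relu(x, weights, bias):
    """Fully connected layer followed by ReLU."""
    return np.maximum(x @ weights + bias, 0.0)


def fuse_streams(rgb_maps, depth_maps, params, mode="avg"):
    """Pool, project and concatenate the two streams frame by frame."""
    rgb = fc_relu(global_pool(rgb_maps, mode), *params["rgb"])
    depth = fc_relu(global_pool(depth_maps, mode), *params["depth"])
    return np.concatenate([rgb, depth], axis=-1)  # (frames, 2 * fc_units)


# Hypothetical sizes: 40 frames, 7x7x512 feature maps, 256 FC units.
frames, channels, fc_units = 40, 512, 256
rgb_maps = rng.standard_normal((frames, 7, 7, channels))
depth_maps = rng.standard_normal((frames, 7, 7, channels))
params = {
    name: (rng.standard_normal((channels, fc_units)) * 0.01, np.zeros(fc_units))
    for name in ("rgb", "depth")
}

sequence = fuse_streams(rgb_maps, depth_maps, params)
print(sequence.shape)  # (40, 512)
```

The resulting array has one fused feature vector per frame, which is the input shape a recurrent layer such as an LSTM expects for a single sample.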
Results
Accuracy Results
VGG-16
                   Global Average Pooling             Global Max Pooling
LSTM               ReLU   Sigmoid  ReLU + Batch norm  ReLU   Sigmoid  ReLU + Batch norm
No Regularization  87.27  88.15    87.56              85.45  83.32    85.39
L2                 88.35  86.57    84.60              87.46  85.78    84.01
Dropout            89.24  89.14    87.86              9.08   86.28    88.25
Dropout + L2       89.73  88.25    87.86              5.63   88.55    85.88
Accuracy Results
VGG-16
                   Global Average Pooling             Global Max Pooling
GRU                ReLU   Sigmoid  ReLU + Batch norm  ReLU   Sigmoid  ReLU + Batch norm
No Regularization  89.63  87.07    85.39              82.43  87.96    84.5
L2                 86.48  81.44    68.41              87.46  81.34    54.59
Dropout            91.51  90.82    89.34              81.84  90.03    89.24
Dropout + L2       91.61  87.27    87.46              86.38  87.46    79.76
Accuracy Results
ResNet-50
Global Average Pooling Global Max Pooling
LSTM ReLU Sigmoid ReLU + Batch norm ReLU Sigmoid ReLU + Batch norm
No Regularization 85.49 84.70 87.36 87.86 79.96 90.03
L2 87.17 86.97 85.88 86.08 87.46
Dropout 89.34 89.34 6.12 6.12 93.19
Dropout + L2 90.92 54.89 89.04
Accuracy Results
ResNet-50
Global Average Pooling Global Max Pooling
GRU ReLU Sigmoid ReLU + Batch norm ReLU Sigmoid ReLU + Batch norm
No Regularization 89.04 88.15 85.59 90.03 86.87 90.92
L2 85.19 80.75 79.17 82.43 67.82
Dropout 90.92 85.09 27.34 91.91
Dropout + L2 89.26 89.63 82.92 81.54
• Pretrained ResNet-50 and VGG-16 networks were used as feature extractors.
• We obtained the best result, 93.19% accuracy, using ResNet-50 with LSTM.
• We did not apply hand or face segmentation to the inputs.
• We proposed a simple yet effective architecture.
• We observed that when the LSTM model starts memorizing, the GRU model solves the memorization
problem.
Future Work
In this research we wanted to create a baseline
classifier for the sign recognition task. As future
work, we plan to use it with our Turkish
Sign Language (TSL) dataset.
• 228 words were recorded with general use and
challenging factors in mind.
• We recorded our videos in 5 different modalities
(i.e. HD RGB, Depth, Infrared, Skeletal,
User mask) with Microsoft Kinect V2.
• Our signers are 6 sign language mentors,
one deaf person and 5 trained signers:
12 people in total.
• Each sign was recorded with 10 repetitions;
professional signers also provided 10
repetitions wearing black clothes.
• 228 words × ∼150 samples ≈ 34,200 sample
videos
The signs in the TSL corpus are divided into 7
categories:
• Group 1: no occlusion, crossing or contact
with other body parts; includes 63
words.
• Group 2: hands can occlude each other, or
contact can occur between the hands;
includes 52 words.
• Group 3: hands can occlude or touch the face
of the signer; includes 58 words.
• Group 4: crossing hands, where occlusions
can occur; includes 14 words.
• Group 5: depth information is essential;
includes 22 words.
• Group 6: compound words, i.e. words
that contain more than one sign;
includes 19 words.
• Group 7: similar signs, especially
challenging because of their similar sign patterns.
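As a quick arithmetic check of the dataset figures above, the snippet below sums the word counts listed for groups 1 to 6 and multiplies the vocabulary size by the approximate number of samples per word. The variable names are illustrative only, and group 7 is left out because no word count is given for it.

```python
# Word counts for groups 1-6 as listed above; group 7 has no count in the text.
group_words = {1: 63, 2: 52, 3: 58, 4: 14, 5: 22, 6: 19}
total_words = 228
samples_per_word = 150  # approximate figure from the text

listed_total = sum(group_words.values())
total_videos = total_words * samples_per_word

print(listed_total)  # 228
print(total_videos)  # 34200
```

The six listed counts add up to the stated 228 words, and 228 × 150 gives the stated estimate of about 34,200 sample videos.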
[Figure: TSL dataset]
Acknowledgement
The research presented is part of a project funded
by TUBITAK (The Scientific and Technological
Research Council of Turkey) under grant number
217E022.
