Xavier Giro-i-Nieto
@DocXavi
xavier.giro@upc.edu
Towards
Sign Language Translation & Production
Universitat de Barcelona
April 1, 2022
PhD candidate Amanda Duarte
2
2022
2018
2016 2020
CVPR
Caffe2 Research
Award 2017
Facebook
Research grant.
Amanda
Duarte
CVPR
Outline
3
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Applications
Open Challenges
Conclusion
Motivation: Accessibility
4
“World Report on Hearing”. World Health Organization 2021.
60.5M people in
Severe > Complete hearing loss
Motivation: Accessibility
5
Shelly Chadha, “Launch of the World Report on Hearing”. World Health Organization 2021.
Classic Motivation: Accessibility to basic services
6
“World Report on Hearing”.
World Health Organization 2021.
https://whereistheinterpreter.com/
#whereistheinterpreter
Motivation: Accessibility
7
Amit Moryossef, “Google Translate for Sign Language”. 2021. [talk] [code]
Motivation: Accessibility
8
Motivation: Accessibility
9
Motivation: Learning SL from Personal Assistants
10
Computer Human
Teaching
that scales
Interaction
Interaction
Human
Motivation: Multimodal translation in a Metaverse
11
Helping Hands, “Using ASL in Virtual Reality (VRChat)” (2020)
12
Spoken
language
Sign language
Motivation: Multimodal translation in a Metaverse
Outline
13
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Applications
Open Challenges
Conclusion
A crash course on Sign Languages (SL)
Cultural diversity of sign languages, similar to spoken languages
○ American (ASL), British (BSL), German (GSL), Chinese (CSL)… sign languages.
14
Irish Sign Language (ISL) Catalan Sign Language (LSC)
A crash course on Sign Languages (SL)
Sign languages are NOT a one-to-one mapping from spoken languages.
15
Look-Up
Table
Hi, I’m Amelia and I’m
going to talk to you
about how to remove
gum from hair.
Sign Language
(video)
Spoken Language
(transcription)
A crash course on Sign Languages (SL)
There exists a textual transcription method named “glosses”.
16
HI, ME FS-AMELIA WILL
EXPLAIN HOW REMOVE
GUM FROM YOUR HAIR
Hi, I’m Amelia and I’m
going to talk to you about
how to remove gum from
hair.
Spoken Language
(transcription)
Sign Language
(transcription)
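The gloss line on this slide can be handled as a plain token sequence. Below is a minimal sketch, assuming glosses are separated by whitespace and commas and that the `FS-` prefix marks a finger-spelled word, as in the example above. The function name `parse_gloss` is hypothetical and not part of any gloss toolkit.

```python
def parse_gloss(line):
    """Split a gloss line into (token, is_fingerspelled) pairs.

    Assumption: tokens are separated by spaces or commas, and a
    leading "FS-" marks finger-spelling (as in the slide example).
    """
    tokens = []
    for raw in line.replace(",", " ").split():
        upper = raw.upper()
        if upper.startswith("FS-"):
            tokens.append((upper[3:], True))   # drop the FS- prefix
        else:
            tokens.append((upper, False))
    return tokens


gloss = "HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR"
parsed = parse_gloss(gloss)
for token, fingerspelled in parsed:
    print(token, "(finger-spelled)" if fingerspelled else "")
```

Real gloss conventions vary between corpora, so the prefix handling would need to be adapted to each annotation scheme.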
A crash course on Sign Languages (SL)
● Manual features:
○ Handshape
○ Palm
● Non-manual features
○ Head (nod / shake / tilt)
○ Mouth
○ Eyebrows
○ Cheeks
○ Facial grammar (or expressions)
○ Body position
...orientation, movement, location.
17
Stokoe Jr, William C. "Sign language structure: An outline of the visual communication systems of the American deaf." Journal of
deaf studies and deaf education (2005).
Figure: Arizona State University
A crash course on Sign Languages (SL)
SLs use persistent spatial grounding (e.g. by pointing & placing)!
18
Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996).
“Right along here…” ...immobile entity is
located here,
A crash course on Sign Languages (SL)
SLs use persistent spatial grounding (e.g. by pointing & placing)!
19
Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996).
“Not far and to the
right of,
...tall, vertical entity at this place.
Outline
20
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Applications
Challenges
Conclusion
Sign-to-Spoken Language Tasks
21
SL Translation Hi, I’m Amelia and I’m going to talk to you
about how to remove gum from hair.
GIPHY / SIGN WITH ROBERT
Isolated SL Recognition
Continuous SL Recognition
Finger-spelling
HI, ME FS-AMELIA WILL EXPLAIN
HOW REMOVE GUM FROM YOUR
HAIR
“I”
A, B, C, D...
Sign-to-Spoken Language Tasks
22
SL Translation Hi, I’m Amelia and I’m going to talk to you
about how to remove gum from hair.
Sign-Spoken Language Tasks
SL Production
SL Translation
Sign Language
(video)
23
Spoken Language
(transcription)
Hi, I’m Amelia and
I’m going to talk
to you about how
to remove gum
from hair.
Neural Machine Translation
24
Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." NeurIPS 2014.
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase
representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
Encoder Decoder
Representation
Hi, I’m Amelia and
I’m going to talk to
you about how to
remove gum from
hair.
Dia duit, is mise
Amelia agus beidh
mé ag caint leat faoi
conas guma a bhaint
de ghruaig.
Automatic Speech Recognition (ASR)
25
Encoder Decoder
Representation
Hi, I’m Amelia and
I’m going to talk to
you about how to
remove gum from
hair.
Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with recurrent neural networks." ICML 2014.
#LAS Chan, William, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. "Listen, attend and spell: A neural network for large vocabulary conversational speech
recognition." ICASSP 2016.
Image Captioning
26
Encoder Decoder
Representation
A group of people
shopping at an
outdoor market.
Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015.
Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015.
Neural Sign Language Translation
27
Encoder Decoder
Representation
Hi, I’m Amelia and
I’m going to talk to
you about how to
remove gum from
hair.
Neural Sign Language Translation
28
Camgoz, Necati Cihan, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden.
"Neural sign language translation." CVPR 2018.
Neural Sign Language Translation
29
Camgoz, Necati Cihan, Oscar Koller, Simon Hadfield, and Richard Bowden. "Sign language
transformers: Joint end-to-end sign language recognition and translation." CVPR 2020.
Neural Sign Language Production
30
Encoder Decoder
Representation
Hi, I’m Amelia and
I’m going to talk to
you about how to
remove gum from
hair.
Neural Sign Language Production
31
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Mixed SIGNals: Sign Language Production via
a Mixture of Motion Primitives." ICCV 2021.
Neural Sign Language Production
32
Encoder Decoder
Representation
Hi, I’m Amelia and
I’m going to talk to
you about how to
remove gum from
hair.
Neural Sign Language Production
33
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end
sign language production." ECCV 2020.
Neural Sign Language Production
34
Stoll, Stephanie, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. "Text2Sign: Towards sign
language production using neural machine translation and generative adversarial networks." IJCV 2020.
Neural Sign Language Production
35
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Signing at Scale: Learning to Co-Articulate
Signs for Large-Scale Photo-Realistic Sign Language Production." CVPR 2022.
Outline
36
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Applications
Challenges
Conclusion
Parallel corpus
37
Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning
phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
Continuous Sign Language Datasets
38
The How2Sign dataset
39
Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X.
How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
The How2Sign dataset
40
Multi-view RGB videos RGB-D videos
Body-face-hands keypoints
2D keypoints estimation from OpenPose [2]
How2 dataset [1]
Speech Signal
English Transcription
Hi, I’m Amelia and I’m going
to talk to you about how to
remove gum from hair.
Instructional videos
Multi-view VGA and HD videos [3]
Multi-view recordings (only for a subset)
3D keypoints
estimation
Gloss Annotation
HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR
Continuous Sign Language Datasets
41
The largest dataset in ASL
42
43
Built on top of How2
Built on top of How2
Spoken Language
(speech)
SL Production
SL Translation
Sign Language
(video)
44
Spoken Language
(transcription)
Hi, I’m Amelia and I’m going to
talk to you about how to
remove gum from hair.
Synthesis
ASR
#How2 Sanabria, Ramon, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. "How2: a large-scale dataset for
multimodal language understanding." arXiv 2018.
Built on top of How2
How2 dataset [1]
Speech Signal
English Transcription
Hi, I’m Amelia and I’m going
to talk to you about how to
remove gum from hair.
Instructional videos
English Speech
Speech track available for end-to-end English to ASL.
English Transcriptions
Automatically generated subtitles aligned at the
sentence level.
English to Brazilian Portuguese Translations
Allows multilingual research.
45
#How2 Sanabria, Ramon, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. "How2: a large-scale dataset for
multimodal language understanding." arXiv 2018.
46
Built on top of How2
Front+side RGB, Front Depth & Multi-view RGB
47
Green Studio
Multi-view RGB videos
RGB-D videos
Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S.,
Sheikh, Y.: Panoptic Studio: A massively multiview system for social motion capture. In:
ICCV, 2015.
Panoptic Studio
Multi-view recordings (only for a subset)
Multi-view VGA and HD videos
48
2D & 3D pose estimation
49
2D & 3D pose estimation
Multi-view RGB videos
Body-face-hands keypoints
2D keypoints estimation from OpenPose [1]
Multi-view recordings (only for a subset)
3D keypoints estimation [2]
[1] Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei and Y. A. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" in TPAMI, 2019.
[2] Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic Studio: A massively multiview system for social motion capture. In: ICCV, 2015.
Multi-view VGA and HD videos
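The 2D body-face-hands keypoints on this slide can be read as flat lists of (x, y, confidence) values. Below is a minimal sketch, assuming the standard OpenPose JSON output keys (`people`, `pose_keypoints_2d`, `hand_left_keypoints_2d`, `hand_right_keypoints_2d`, `face_keypoints_2d`). The helper names, the confidence threshold and the tiny sample frame are hypothetical; a real frame carries many more keypoints per part.

```python
import json

PARTS = (
    "pose_keypoints_2d",
    "hand_left_keypoints_2d",
    "hand_right_keypoints_2d",
    "face_keypoints_2d",
)


def triples(flat):
    """Group a flat [x0, y0, c0, x1, y1, c1, ...] list into (x, y, c) tuples."""
    usable = len(flat) - len(flat) % 3  # ignore a trailing incomplete triple
    return [tuple(flat[i:i + 3]) for i in range(0, usable, 3)]


def load_keypoints(frame_json, min_conf=0.1):
    """Return {part: [(x, y) or None, ...]} for the first detected person.

    Keypoints below `min_conf` (an assumed threshold) are marked None,
    which is where missed hand joints would show up.
    """
    frame = json.loads(frame_json)
    if not frame.get("people"):
        return {}
    person = frame["people"][0]
    parts = {}
    for key in PARTS:
        points = triples(person.get(key, []))
        parts[key] = [(x, y) if c >= min_conf else None for x, y, c in points]
    return parts


# Hypothetical, heavily truncated frame for illustration only.
sample = json.dumps({
    "people": [{
        "pose_keypoints_2d": [10.0, 20.0, 0.9, 0.0, 0.0, 0.0],
        "hand_left_keypoints_2d": [5.0, 6.0, 0.05],
    }]
})
keypoints = load_keypoints(sample)
print(keypoints["pose_keypoints_2d"])
```

Marking low-confidence joints as None, rather than keeping zeros, makes it easier to count how often the hands are lost in a clip.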
50
51
Dataset statistics
Dataset hierarchy
52
Camera view
Recording
Video
Clip
Frame
Green studio: Frontal or side
Panoptic: Multi-view
ASL Gloss
English transcription
RGB, Depth
OpenPose
Category
Signer
Studio
Green studio
Panoptic (multi-view)
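The hierarchy on this slide can be sketched as nested records. This is only an illustration, assuming a video carries its category, signer, studio and camera view, and that each clip carries its English transcription and optional gloss. The class names, field names and timestamps are hypothetical and do not describe the released file layout.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clip:
    start_s: float                 # clip start within the video, in seconds
    end_s: float                   # clip end within the video, in seconds
    english: str                   # sentence-level English transcription
    gloss: Optional[str] = None    # ASL gloss, when annotated


@dataclass
class Video:
    video_id: str
    category: str
    signer: str
    studio: str                    # "green" or "panoptic"
    camera_view: str               # "front", "side" or "multi-view"
    clips: List[Clip] = field(default_factory=list)

    def duration_s(self):
        """Total signed time covered by the clips of this video."""
        return sum(c.end_s - c.start_s for c in self.clips)


# Hypothetical values for illustration only.
video = Video("demo-video", "(hypothetical category)", "signer-1", "green", "front")
video.clips.append(Clip(
    0.0, 4.5,
    "Hi, I'm Amelia and I'm going to talk to you about how to remove gum from hair.",
    "HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR",
))
video.clips.append(Clip(4.5, 7.0, "(next sentence)"))
print(video.duration_s())
```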
Dataset statistics
53
Dataset statistics
Clips length Sentences length
54
Outline
55
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Application: Human motion transfer
Challenges
Conclusion
Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video
generation from 2D poses." ECCV 2020 SLRTP Workshop.
Application: Human motion transfer
56
2D Pose
estimation
[OpenPose]
GAN-
generated
[Everybody
dance now]
Application: Human motion transfer
57
Application: Human motion transfer
58
59
“Choose one category”
Can ASL signers understand our generated videos?
Skeleton
GAN-generated
Classification
accuracy
60
Can ASL signers understand our generated videos?
Skeleton
GAN-generated
Mean Opinion
Score
“How well could you understand the video?”
61
“Translate the ASL signs into written English.”
Can ASL signers understand our generated videos?
Skeleton
GAN-generated
Outline
62
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Application: Sign Language Video Retrieval
Challenges
Conclusion
Duarte, Amanda, Samuel Albanie, Xavier Giró-i-Nieto, and Gül Varol. "Sign Language Video Retrieval with Free-Form
Textual Queries." arXiv preprint arXiv:2201.02495 (2022).
Sign Language Video Retrieval
63
Encoder Encoder
Representation
Hi, I’m Amelia and
I’m going to talk to
you about how to
remove gum from
hair.
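With two encoders mapping text and sign video into one shared representation, retrieval reduces to ranking videos by similarity to the query embedding. Below is a minimal sketch, assuming cosine similarity as the score and using tiny hand-written vectors in place of real encoder outputs. The function names, video names and numbers are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero embedding as unrelated
    return dot / (norm_a * norm_b)


def rank_videos(query_vec, video_vecs):
    """Return (video_id, score) pairs sorted from best to worst match."""
    scored = [(vid, cosine(query_vec, vec)) for vid, vec in video_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Stand-ins for the text-encoder and video-encoder outputs.
query = [1.0, 0.0, 1.0]
videos = {
    "video_a": [0.9, 0.1, 0.8],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.5],
}
ranking = rank_videos(query, videos)
for video_id, score in ranking:
    print(video_id, round(score, 3))
```

The top entries of `ranking` play the role of the "Top Hit" results shown on the qualitative slides.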
64
Sign Language Video Retrieval
65
Sign Language Video Retrieval
Challenge: How to train
without annotated datasets
for continuous SL?
Approach: Produce
pseudo-annotations from
How2 + How2Sign.
66
Sign Spotting: Mouthing (M)
Albanie, S., Varol, G., Momeni, L., Afouras, T., Chung, J. S., Fox, N., & Zisserman, A. BSL-1K: Scaling up co-articulated sign language recognition using
mouthing cues. ECCV 2020.
67
Sign Spotting: Visual Dictionaries (D)
Momeni, L., Varol, G., Albanie, S., Afouras, T., & Zisserman, A. Watch, read and lookup: learning to spot signs from multiple supervisors. ACCV 2020.
68
Sign Video Embeddings are learned with automatic annotations from:
1. Sign spotting: Mouthing (M)
2. Sign spotting from visual dictionaries WLASL & MSASL (D)
Sign Language Video Retrieval
69
Effect of retraining an
I3D backbone with
automatic annotations.
1079
words
1887
words
Sign Language Video Retrieval
70
Sign Language Video Retrieval
71
Sign Language Video Retrieval
72
Top Hit #1 (query) Top Hit #2
Sign Language Video Retrieval
More qualitative results.
Top Hit #3
Video category: 1 (Sports and Fitness)
"Then bring your feet together and by
this time you should be able to have built
up enough strength to do a full push up."
Video category: 1 (Sports and Fitness)
"A proper cardio vascular program should
incorporate various aspects of training
through intensity, frequency, as well as
time..."
Video category: 1 (Sports and Fitness)
"Then when you get strong, then you can
start picking up your feet."
Outline
73
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Applications
Open Challenges
Conclusion
Challenges
74
Computer Vision
Speech
NLP
Training Data
Challenges in Computer Vision
75
Off-the-shelf pose detectors and generators struggle with hands.
76
Zhou, Yuxiao, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, and Feng Xu. "Monocular real-time
hand shape and motion capture using multi-modal data." CVPR 2020.
Challenges in Computer Vision
77
Weinzaepfel, Philippe, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, and Grégory Rogez. "Dope: Distillation of
part experts for whole-body 3d pose estimation in the wild." ECCV 2020.
Challenges in Computer Vision
78
Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language
production." ECCV 2020.
Challenges in Computer Vision
79
Ng, Evonne, Shiry Ginosar, Trevor Darrell, and Hanbyul Joo. "Body2hands: Learning to infer 3d hands from
conversational gesture body dynamics." CVPR 2021.
Challenges in Computer Vision
Challenges
80
Computer Vision
Speech
NLP
Training Data
Challenges in NLP
Sign Languages are:
81
🤔
(Very) low-resource
languages…
...in a (very) high
dimensional space (video).
Challenges in NLP
82
Figure: TensorFlow tutorial
Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of machine learning
research 3, no. Feb (2003): 1137-1155.
🤔
What are “language
models” in sign
language?
Challenges in NLP
83
How to transfer from
large pre-trained
(“foundation”) models?
#GPT-3 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. Language models
are few-shot learners. NeurIPS 2020 (best paper award).
Source: [OpenAI API]
English: My name is Barbara.
ASL: ME NAME fs-B-A-R-B-A-R-A.
English: Is he a teacher?
ASL: HE TEACHER HE
English: Amir is tall.
ASL: fs-A-M-I-R, HE TALL HE
English: I’m not sad.
ASL: ME SAD ME 🤔
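The English/ASL pairs above follow a few-shot prompt layout. Below is a minimal sketch of assembling such a prompt as plain text, assuming the first three pairs serve as demonstrations and the last English sentence is the query left for the model to complete. The function name `build_prompt` is hypothetical and no model is called.

```python
# Demonstration pairs copied from the slide.
EXAMPLES = [
    ("My name is Barbara.", "ME NAME fs-B-A-R-B-A-R-A."),
    ("Is he a teacher?", "HE TEACHER HE"),
    ("Amir is tall.", "fs-A-M-I-R, HE TALL HE"),
]


def build_prompt(examples, query):
    """Format demonstrations plus an open-ended query as one prompt string."""
    lines = []
    for english, gloss in examples:
        lines.append(f"English: {english}")
        lines.append(f"ASL: {gloss}")
    lines.append(f"English: {query}")
    lines.append("ASL:")  # left open for the language model to complete
    return "\n".join(lines)


prompt = build_prompt(EXAMPLES, "I'm not sad.")
print(prompt)
```

Any completion would still need checking by a signer, since a text-only model has no guarantee of producing valid gloss order or negation.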
Challenges
84
Computer Vision
Speech
NLP
Training Data
Challenges in Speech Translation
85
Jia, Ye, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. "Translatotron 2: Robust direct speech-to-speech
translation." arXiv preprint arXiv:2107.08661 (2021).
Speech Video
Speech Speech
End-to-end End-to-end
🤔
Challenges
86
Computer Vision
Speech
NLP
Training Data
Challenges in Training Data
87
Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet]
X
88
Challenges in Training Data: Pseudo-glosses
Yin, Kayo, and Jesse Read. "Better Sign Language Translation with
STMC-Transformer." COLING 2020. [talk]
Moryossef, Amit, Kayo Yin, Graham Neubig, and Yoav Goldberg. "Data
Augmentation for Sign Language Gloss Translation." arXiv 2021.
Generation of gloss pseudo-labels by training a transformer.
Moreno D, Duarte A, Costa-jussà MR, Giró-i-Nieto X.
English to ASL Translator for Speech2Signs. UPC 2018.
89
Challenges in Training Data: Self-supervision
#SignBERT Hu, Hezhen, Weichao Zhao, Wengang Zhou, Yuechen Wang, and Houqiang Li. "SignBERT: Pre-Training of Hand-Model-Aware
Representation for Sign Language Recognition." ICCV 2021.
Outline
90
Motivation
A crash course on sign languages (SL)
State of the art
How2Sign dataset
Applications
Open Challenges
Conclusion
91
Conclusion: Speech2Signs (and Signs2Speech)
End-to-end translation & production
Hi, I’m Amelia and I’m going
to talk to you about how to
remove gum from hair.
HI, ME FS-AMELIA WILL
EXPLAIN HOW REMOVE
GUM FROM YOUR HAIR
Speech Language Gloss [1] Sign transcription [2] Video
3D Poses 2D Poses Segments [3]
Multiple vision, natural language & speech challenges for a societally impactful task.
[1] Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020.
[2] Hanke, Thomas. "HamNoSys-representing sign language data in language resources and language processing contexts." In LREC, vol. 4, pp. 1-6. 2004.
[3] Renz, Katrin, Nicolaj C. Stache, Samuel Albanie, and Gül Varol. "Sign language segmentation with temporal convolutional networks." ICASSP 2021.
Fellow researchers
92
Shruti
Palaskar
Deepti
Ghadiyaram
Florian
Metze
Francesc
Moreno
Jordi
Torres
Kevin
McGuinness
Gül
Varol
Samuel
Albanie
Marta R.
Costa-jussà
Kenneth
DeHaan
93
Benet
Oriol
Jordi
Aguilar
Cayetana
López
Lucas
Ventura
Sandra
Roca
Daniel
Moreno
Janna
Escur
Mireia
Hernández
Peter
Muschick
Pol
Pérez
Görkem
Camli
Jordi
López
Gerard
Gállego
Current & former students
Amanda
Duarte
Laia
Tarrés
Cristina
Puntí
Andrea
Iturralde
Maram A.
Mohamed
Álvaro
Budria
Patricia
Cabot
Divya
Chhipani
Javier
Sanz
Thank you
Supported by
Facebook AI

 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 

Recently uploaded

Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理e4aez8ss
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceSapana Sha
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 

Recently uploaded (20)

Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Call Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts ServiceCall Girls In Dwarka 9654467111 Escorts Service
Call Girls In Dwarka 9654467111 Escorts Service
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 

Towards Sign Language Translation & Production | Xavier Giro-i-Nieto

  • 1. Xavier Giro-i-Nieto @DocXavi xavier.giro@upc.edu Towards Sign Language Translation & Production Universitat de Barcelona April 1, 2022
  • 2. PhD candidate Amanda Duarte 2 2022 2018 2016 2020 CVPR Caffe2 Research Award 2017 Facebook Research grant. Amanda Duarte CVPR
  • 3. Outline 3 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Applications Open Challenges Conclusion
  • 4. Motivation: Accessibility 4 “World Report on Hearing”. World Health Organization 2021. 60.5M people in Severe > Complete hearing loss
  • 5. Motivation: Accessibility 5 Shelly Chadha, “Launch of the World Report on Hearing”. World Health Organization 2021.
  • 6. Classic Motivation: Accessibility to basic services 6 “World Report on Hearing”. World Health Organization 2021. https://whereistheinterpreter.com/ #whereistheinterpreter
  • 7. Motivation: Accessibility 7 Amit Moryossef, “Google Translate for Sign Language”. 2021. [talk] [code]
  • 10. Motivation: Learning SL from Personal Assistants 10 Computer Human Teaching that scales Interaction Interaction Human
  • 11. Motivation: Multimodal translation in a Metaverse 11 Helping Hands, “Using ASL in Virtual Reality (VRChat)” (2020)
  • 13. Outline 13 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Applications Open Challenges Conclusion
  • 14. A crash course on Sign Languages (SL) Cultural diversity of sign languages, similar to spoken languages ○ American (ASL), British (BSL), German (GSL), Chinese (CSL)… sign languages. 14 Irish Sign Language (ISL) Catalan Sign Language (LSC)
  • 15. A crash course on Sign Languages (SL) Sign languages are NOT a one-to-one mapping from spoken languages. 15 Look-Up Table Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Sign Language (video) Spoken Language (transcription)
  • 16. A crash course on Sign Languages (SL) There exists a textual transcription method named “glosses”. 16 HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Spoken Language (transcription) Sign Language (transcription)
  • 17. A crash course on Sign Languages (SL) ● Manual features: ○ Handshape ○ Palm orientation, movement, location ● Non-manual features: ○ Head (nod / shake / tilt) ○ Mouth ○ Eyebrows ○ Cheeks ○ Facial grammar (or expressions) ○ Body position 17 Stokoe Jr, William C. "Sign language structure: An outline of the visual communication systems of the American deaf." Journal of Deaf Studies and Deaf Education (2005). Figure: Arizona State University
  • 18. A crash course on Sign Languages (SL) SLs use persistent spatial grounding (e.g. by pointing & placing)! 18 Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996). “Right along here…” ...immobile entity is located here,
  • 19. A crash course on Sign Languages (SL) SLs use persistent spatial grounding (e.g. by pointing & placing)! 19 Liddell, Scott K. "Spatial representations in discourse: Comparing spoken and signed language." Lingua (1996). “Not far and to the right of, ...tall, vertical entity at this place.
  • 20. Outline 20 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Applications Challenges Conclusion
  • 21. Sign-to-Spoken Language Tasks 21 SL Translation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. GIPHY/SIGN WITH ROBERT Isolated SL Recognition Continuous SL Recognition Finger-spelling HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR “I” A, B, C, D...
  • 22. Sign-to-Spoken Language Tasks 22 SL Translation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
  • 23. Sign-Spoken Language Tasks SL Production SL Translation Sign Language (video) 23 Spoken Language (transcription) Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
  • 24. Neural Machine Translation 24 Sutskever, Ilya, Oriol Vinyals, and Quoc V. Le. "Sequence to sequence learning with neural networks." NeurIPS 2014. Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014. Encoder Decoder Representation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Dia duit, is mise Amelia agus beidh mé ag caint leat faoi conas guma a bhaint de ghruaig.
  • 25. Automatic Speech Recognition (ASR) 25 Encoder Decoder Representation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Graves, Alex, and Navdeep Jaitly. "Towards end-to-end speech recognition with recurrent neural networks." ICML 2014. #LAS Chan, William, Navdeep Jaitly, Quoc Le, and Oriol Vinyals. "Listen, attend and spell: A neural network for large vocabulary conversational speech recognition." ICASSP 2016.
  • 26. Image Captioning 26 Encoder Decoder Representation A group of people shopping at an outdoor market. Vinyals, Oriol, Alexander Toshev, Samy Bengio, and Dumitru Erhan. "Show and tell: A neural image caption generator." CVPR 2015. Karpathy, Andrej, and Li Fei-Fei. "Deep visual-semantic alignments for generating image descriptions." CVPR 2015.
  • 27. Neural Sign Language Translation 27 Encoder Decoder Representation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
  • 28. Neural Sign Language Translation 28 Camgoz, Necati Cihan, Simon Hadfield, Oscar Koller, Hermann Ney, and Richard Bowden. "Neural sign language translation." CVPR 2018.
  • 29. Neural Sign Language Translation 29 Camgoz, Necati Cihan, Oscar Koller, Simon Hadfield, and Richard Bowden. "Sign language transformers: Joint end-to-end sign language recognition and translation." CVPR 2020.
  • 30. Neural Sign Language Production 30 Encoder Decoder Representation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
  • 31. Neural Sign Language Production 31 Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Mixed SIGNals: Sign Language Production via a Mixture of Motion Primitives." ICCV 2021.
  • 32. Neural Sign Language Production 32 Encoder Decoder Representation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
  • 33. Neural Sign Language Production 33 Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020.
  • 34. Neural Sign Language Production 34 Stoll, Stephanie, Necati Cihan Camgoz, Simon Hadfield, and Richard Bowden. "Text2Sign: Towards sign language production using neural machine translation and generative adversarial networks." IJCV 2020.
  • 35. Neural Sign Language Production 35 Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Signing at Scale: Learning to Co-Articulate Signs for Large-Scale Photo-Realistic Sign Language Production." CVPR 2022.
  • 36. Outline 36 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Applications Challenges Conclusion
  • 37. Parallel corpus 37 Cho, Kyunghyun, Bart Van Merriënboer, Caglar Gulcehre, Dzmitry Bahdanau, Fethi Bougares, Holger Schwenk, and Yoshua Bengio. "Learning phrase representations using RNN encoder-decoder for statistical machine translation." EMNLP 2014.
  • 39. The How2Sign dataset 39 Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 40. The How2Sign dataset 40 Multi-view RGB videos RGB-D videos Body-face-hands keypoints 2D keypoints estimation from OpenPose [2] How2 dataset [1] Speech Signal English Transcription Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Instructional videos Multi-view VGA and HD videos [3] Multi-view recordings (only for a subset) 3D keypoints estimation Gloss Annotation HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 41. Continuous Sign Language Datasets 41 Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 42. The largest dataset in ASL 42 Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 43. 43 Built on top of How2 Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 44. Built on top of How2 Spoken Language (speech) SL Production SL Translation Sign Language (video) 44 Spoken Language (transcription) Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Synthesis ASR #How2 Sanabria, Ramon, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. "How2: a large-scale dataset for multimodal language understanding." arXiv 2018.
  • 45. Built on top of How2 How2 dataset [1] Speech Signal English Transcription Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. Instructional videos English Speech Speech track available for end-to-end English to ASL. English Transcriptions Automatically generated subtitles aligned at the sentence level. English to Brazilian Portuguese Translations Allows multilingual research. 45 #How2 Sanabria, Ramon, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, and Florian Metze. "How2: a large-scale dataset for multimodal language understanding." arXiv 2018.
  • 46. 46 Built on top of How2
  • 47. Front+side RGB, Front Depth & Multi-view RGB 47
  • 48. Green Studio Multi-view RGB videos RGB-D videos Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015. Panoptic Studio Multi-view recordings (only for a subset) Multi-view VGA and HD videos 48
  • 49. 2D & 3D pose estimation 49 Duarte, A., Palaskar, S., Ventura, L., Ghadiyaram, D., DeHaan, K., Metze, F., ... & Giro-i-Nieto, X. How2Sign: a large-scale multimodal dataset for continuous American sign language. CVPR 2021.
  • 50. 2D & 3D pose estimation Multi-view RGB videos Body-face-hands keypoints 2D keypoints estimation from OpenPose [1] Multi-view recordings (only for a subset) 3D keypoints estimation [2] [1] Z. Cao, G. Hidalgo Martinez, T. Simon, S. Wei and Y. A. Sheikh, "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" in TPAMI, 2019. [2] Joo, H., Liu, H., Tan, L., Gui, L., Nabbe, B., Matthews, I., Kanade, T., Nobuhara, S., Sheikh, Y.: Panoptic studio: A massively multiview system for social motion capture. In: ICCV, 2015. Multi-view VGA and HD videos 50
  • 52. Dataset hierarchy 52 Camera view Recording Video Clip Frame Green studio: Frontal or side Panoptic: Multi-view ASL Gloss English transcription RGB, Depth OpenPose Category Signer Studio Green studio Panoptic (multi-view)
  • 54. Dataset statistics Clips length Sentences length 54
  • 55. Outline 55 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Application: Human motion transfer Challenges Conclusion Ventura, Lucas, Amanda Duarte, and Xavier Giró-i-Nieto. "Can everybody sign now? Exploring sign language video generation from 2D poses." ECCV 2020 SLRTP Workshop.
  • 56. Application: Human motion transfer 56 2D Pose estimation [OpenPose] GAN-generated [Everybody Dance Now]
  • 59. 59 “Choose one category” Can ASL signers understand our generated videos? Skeleton GAN-generated Classification accuracy
  • 60. 60 Can ASL signers understand our generated videos? Skeleton GAN-generated Mean Opinion Score “How well could you understand the video?”
  • 61. 61 “Translate the ASL signs into written English.” Can ASL signers understand our generated videos? Skeleton GAN-generated
  • 62. Outline 62 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Application: Sign Language Video Retrieval Challenges Conclusion Duarte, Amanda, Samuel Albanie, Xavier Giró-i-Nieto, and Gül Varol. "Sign Language Video Retrieval with Free-Form Textual Queries." arXiv preprint arXiv:2201.02495 (2022).
  • 63. Sign Language Video Retrieval 63 Encoder Encoder Representation Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair.
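The two-encoder diagram on this slide suggests retrieval in a shared representation space. As a rough sketch, assuming both encoders output fixed-length vectors and that videos are ranked by cosine similarity to the text query (the slide does not state the similarity function), the ranking step could look like this. The embeddings below are toy values, not model outputs, and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_videos(
    query_embedding: Sequence[float],
    video_embeddings: Dict[str, Sequence[float]],
) -> List[Tuple[str, float]]:
    """Return (video_id, score) pairs sorted from most to least similar."""
    scored = [
        (video_id, cosine_similarity(query_embedding, embedding))
        for video_id, embedding in video_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy embeddings standing in for the text and sign-video encoder outputs.
query = [1.0, 0.0, 0.5]
videos = {
    "clip_a": [0.9, 0.1, 0.4],
    "clip_b": [0.0, 1.0, 0.0],
    "clip_c": [-1.0, 0.0, -0.5],
}
ranking = rank_videos(query, videos)
```

In this toy setting `clip_a` ranks first because it points in nearly the same direction as the query, while `clip_c` (the negated query) ranks last.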
  • 65. 65 Sign Language Video Retrieval Challenge: How to train without annotated datasets for continuous SL? Approach: Produce pseudo-annotations from How2 + How2Sign.
  • 66. 66 Sign Spotting: Mouthing (M) Albanie, S., Varol, G., Momeni, L., Afouras, T., Chung, J. S., Fox, N., & Zisserman, A.. BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues. ECCV 2020.
  • 67. 67 Sign Spotting: Visual Dictionaries (D) Momeni, L., Varol, G., Albanie, S., Afouras, T., & Zisserman, A. Watch, read and lookup: learning to spot signs from multiple supervisors. ACCV 2020.
  • 68. 68 Sign Video Embeddings are learned with automatic annotations from: 1. Sign spotting: Mouthing (M) 2. Sign spotting from visual dictionaries WLASL & MSASL (Di) Sign Language Video Retrieval
  • 69. 69 Effect of retraining an I3D backbone with automatic annotations. 1079 words 1887 words Sign Language Video Retrieval
  • 72. 72 Top Hit #1 (query) Top Hit #2 Sign Language Video Retrieval More qualitative results. Top Hit #3 Video category: 1 (Sports and Fitness) "Then bring your feet together and by this time you should be able to have built up enough strength to do a full push up." Video category: 1 (Sports and Fitness) "A proper cardio vascular program should incorporate various aspects of training through intensity, frequency, as well as time..." Video category: 1 (Sports and Fitness) "Then when you get strong, then you can start picking up your feet."
  • 73. Outline 73 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Applications Open Challenges Conclusion Duarte, Amanda, Samuel Albanie, Xavier Giró-i-Nieto, and Gül Varol. "Sign Language Video Retrieval with Free-Form Textual Queries." arXiv preprint arXiv:2201.02495 (2022).
  • 75. Challenges in Computer Vision 75 Off-the-shelf pose detectors and generators struggle with hands.
  • 76. 76 Zhou, Yuxiao, Marc Habermann, Weipeng Xu, Ikhsanul Habibie, Christian Theobalt, and Feng Xu. "Monocular real-time hand shape and motion capture using multi-modal data." CVPR 2020. Challenges in Computer Vision
  • 77. 77 Weinzaepfel, Philippe, Romain Brégier, Hadrien Combaluzier, Vincent Leroy, and Grégory Rogez. "Dope: Distillation of part experts for whole-body 3d pose estimation in the wild." ECCV 2020. Challenges in Computer Vision
  • 78. 78 Saunders, Ben, Necati Cihan Camgoz, and Richard Bowden. "Progressive transformers for end-to-end sign language production." ECCV 2020. Challenges in Computer Vision
  • 79. 79 Ng, Evonne, Shiry Ginosar, Trevor Darrell, and Hanbyul Joo. "Body2hands: Learning to infer 3d hands from conversational gesture body dynamics." CVPR 2021. Challenges in Computer Vision
  • 81. Challenges in NLP Sign Languages are: 81 🤔 (Very) low-resource languages… ...in a (very) high-dimensional space (video).
  • 82. Challenges in NLP 82 Figure: TensorFlow tutorial Bengio, Yoshua, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. "A neural probabilistic language model." Journal of machine learning research 3, no. Feb (2003): 1137-1155. 🤔 What are “language models” in sign language?
  • 83. Challenges in NLP 83 How to transfer from large pre-trained (“foundation”) models? #GPT-3 Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., ... & Agarwal, S. Language models are few-shot learners. NeurIPS 2020 (best paper award). Source: [OpenAI API] English: My name is Barbara. ASL: ME NAME fs-B-A-R-B-A-R-A. English: Is he a teacher? ASL: HE TEACHER HE English: Amir is tall. ASL: fs-A-M-I-R, HE TALL HE English: I’m not sad. ASL: ME SAD ME 🤔
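The English/ASL pairs on this slide follow the usual few-shot prompting pattern: a handful of demonstrations followed by an open query. The sketch below only assembles such a prompt as plain text and calls no API. It assumes the first three pairs serve as demonstrations and the fourth sentence is the query; the slide does not say which lines were written by hand and which were produced by the model. The function name is hypothetical.

```python
from typing import List, Tuple


def build_few_shot_prompt(examples: List[Tuple[str, str]], query: str) -> str:
    """Format (English, ASL gloss) demonstrations followed by an open query."""
    lines = []
    for english, gloss in examples:
        lines.append(f"English: {english}")
        lines.append(f"ASL: {gloss}")
    lines.append(f"English: {query}")
    lines.append("ASL:")  # left open for the language model to complete
    return "\n".join(lines)


# Demonstration pairs copied from the slide.
examples = [
    ("My name is Barbara.", "ME NAME fs-B-A-R-B-A-R-A."),
    ("Is he a teacher?", "HE TEACHER HE"),
    ("Amir is tall.", "fs-A-M-I-R, HE TALL HE"),
]
prompt = build_few_shot_prompt(examples, "I'm not sad.")
```

Note that the gloss shown on the slide for the last sentence, ME SAD ME, carries no negation, which illustrates why transfer from text-only models remains an open question here.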
  • 85. Challenges in Speech Translation 85 Jia, Ye, Michelle Tadmor Ramanovich, Tal Remez, and Roi Pomerantz. "Translatotron 2: Robust direct speech-to-speech translation." arXiv preprint arXiv:2107.08661 (2021). Speech Video Speech Speech End-to-end End-to-end 🤔
  • 87. Challenges in Training Data 87 Damen, Dima, and Michael Wray. "Supervision Levels Scale (SLS)." arXiv (2020). [tweet] X
  • 88. 88 Challenges in Training Data: Pseudo-glosses Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020. [talk] Moryossef, Amit, Kayo Yin, Graham Neubig, and Yoav Goldberg. "Data Augmentation for Sign Language Gloss Translation." arXiv 2021. Generation of gloss pseudo-labels by training a transformer. Moreno D, Duarte A, Costa-jussà MR, Giró-i-Nieto X. English to ASL Translator for Speech2Signs. UPC 2018.
  • 89. 89 Challenges in Training Data: Self-supervision #SignBERT Hu, Hezhen, Weichao Zhao, Wengang Zhou, Yuechen Wang, and Houqiang Li. "SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition." ICCV 2021.
  • 90. Outline 90 Motivation A crash course on sign languages (SL) State of the art How2Sign dataset Applications Open Challenges Conclusion
  • 91. 91 Conclusion: Speech2Signs (and Signs2Speech) End-to-end translation & production Hi, I’m Amelia and I’m going to talk to you about how to remove gum from hair. HI, ME FS-AMELIA WILL EXPLAIN HOW REMOVE GUM FROM YOUR HAIR Speech Language Gloss [1] Sign transcription [2] Video 3D Poses 2D Poses Segments [3] Multiple vision, natural language & speech challenges for a societally impactful task. [1] Yin, Kayo, and Jesse Read. "Better Sign Language Translation with STMC-Transformer." COLING 2020. [2] Hanke, Thomas. "HamNoSys-representing sign language data in language resources and language processing contexts." In LREC, vol. 4, pp. 1-6. 2004. [3] Renz, Katrin, Nicolaj C. Stache, Samuel Albanie, and Gül Varol. "Sign language segmentation with temporal convolutional networks." ICASSP 2021.
  • 93. 93 Benet Oriol Jordi Aguilar Cayetana López Lucas Ventura Sandra Roca Daniel Moreno Janna Escur Mireia Hernández Peter Muschick Pol Pérez Görkem Camli Jordi López Gerard Gállego Current & former students Amanda Duarte Laia Tarrés Cristina Puntí Andrea Iturralde Maram A. Mohamed Álvaro Budria Patricia Cabot Divya Chhipani Javier Sanz