SlideShare a Scribd company logo
1 of 53
Download to read offline
The Transformer
Lecture 20
Xavier Giro-i-Nieto
Associate Professor
Universitat Politecnica de Catalunya
@DocXavi
xavier.giro@upc.edu
2
Video-lecture
3
Acknowledgments
Marta R. Costa-jussà
Associate Professor
Universitat Politècnica de Catalunya
Carlos Escolano
PhD Candidate
Universitat Politècnica de Catalunya
Gerard I. Gállego
PhD Student
Universitat Politècnica de Catalunya
gerard.ion.gallego@upc.edu
@geiongallego
Outline
1. Reminders
4
5
Reminder
Nikhil Sha, “Attention ? An other Perspective!”. 2020.
6
Reminder
Attention is a mechanism to compute a context vector (c) for a query (Q) as a
weighted sum of values (V).
Figure: Nikhil Shah, “Attention? An Other Perspective! [Part 1]” (2020)
7
Reminder
Nikhil Sha, “Attention ? An other Perspective!”. 2020.
8
Reminder: Seq2Seq with Cross-Attention
Slide concept: Abigail See, Matthew Lamm (Stanford CS224N), 2020
In this case, cross-attention
refers to the attention
between the encoder and
decoder states.
9
Nikhil Sha, “Attention ? An other Perspective!”. 2020.
What may the term “self” refer to, as a contrast of “cross”-attention ?
Outline
1. Motivation
2. Self-attention
10
11
Self-Attention (or intra-Attention)
Lin, Z., Feng, M., Santos, C. N. D., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. A structured self-attentive sentence embedding.
ICLR 2017.
Figure:
Jay Alammar,
“The Illustrated Transformer”
Self-attention refers to attending to other elements from the SAME sequence.
12
Self-Attention (or intra-Attention)
Nikhil Sha, “Attention ? An other Perspective!”. 2020.
Query (Q)
g(x) = WQ
x
Key (K)
f(x) = WK
x
Value (V)
h(x) = WV
x
WQ
, WK
and WV
are projection
layers shared across all words.
13
Self-Attention (or intra-Attention)
Which steps are necessary to compute the contextual representation of a word
embedding e2
in a sequences of four words embeddings (e1
, e2
, e3
, e4
) ?
A (scaled) dot-product is computed between each pair of word embeddings
(eg. e1
and e2
)...
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
14
Self-Attention (or intra-Attention)
Which steps are necessary to compute the contextual representation of a word
embedding e2
in a sequences of four words embeddings (e1
, e2
, e3
, e4
) ?
… a softmax layer normalizes the attention scores to obtain the attention
distribution...
15
Self-Attention (or intra-Attention)
Which steps are necessary to compute the contextual representation of a word
embedding e2
in a sequences of four words embeddings (e1
, e2
, e3
, e4
) ?
...the same word
embeddings are combined to
obtain the contextual
representation e2
’.
16
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
17
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
18
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
19
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
20
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
21
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
22
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
23
Self-Attention (or intra-Attention)
Figure: Jay Alammar, “The illustrated Transformer” (2018)
24
Self-Attention (or intra-Attention) Scaled dot-product
attention
Figure: Jay Alammar, “The illustrated Transformer” (2018)
25
Study case: Self-Attention in for image generation
#SAGAN Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial
networks." ICML 2019. [video]
Figure:
Frank Xu
Generator (G): Details can be generated using cues from all feature locations.
Discriminator: Can check consistency betweenn features in distant portions of the image.
26
Study case: Self-Attention in for image generation
#SAGAN Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial
networks." ICML 2019. [video]
Query locations Attention maps for differet query locations
Outline
1. Motivation
2. Self-attention
3. Multi-head Self-Attention (MHSA)
27
28
Multi-Head Self-Attention (MHSA)
Nikhil Sha, “Attention ? An other Perspective!”. 2020.
In vanilla self-attention, a single set of projection matrices WQ
, WK
, WV
is used.
29
Nikhil Sha, “Attention ? An other Perspective!”. 2020.
In multi-head self-attention, multiple sets of projection matrices are used, and can
provide different contextual representations for the same input token.
Multi-Head Self-Attention (MHSA)
30
The multi-head self-attended E’i
matrixes are concatenated:
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Multi-Head Self-Attention (MHSA)
31
A fully connected layer on top combines everything in a new E’.
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Multi-Head Self-Attention (MHSA)
Multi-head Self-Attention: Visualization
32
#BertViz Vig, Jesse. "A multiscale visualization of attention in the transformer model." ACL 2019. [code] [tweet]
Each colour
corresponds
to a head.
Blue: First
head only.
Multi-color:
Multiple
heads.
33
Self-Attention and Convolutional Layers
Cordonnier, J. B., Loukas, A., & Jaggi, M. On the relationship between self-attention and convolutional layers. ICLR 2020.
[tweet] [code]
Outline
1. Motivation
2. Self-attention
3. Multi-head Attention
4. Positional Encoding
34
Positional Encoding
35
Given that the attention mechanism allows accessing all input (and output)
tokens, we no longer need a memory through recurrent layers.
Positional Encoding
36
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Where is the relative relation in the sequence encoded ?
Positional Encoding
37
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Where is the relative relation in the sequence encoded ?
Positional Encoding
38
Maria Ribalta, Pere-Pau Vàzquez, “Visualization is all you need”. UPC GCED 2020.
Sinusoidal functions are typically used to provide positional encodings.
Positional Encoding
39
Figure: Jay Alammar, “The illustrated Transformer” (2018)
Outline
1. Motivation
2. Self-attention
3. Multi-head Attention
4. Positional Encoding
5. The Transformer
40
The Transformer
41
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
The Transformer removed the recurrency mechanism thanks to self-attention.
The Transformer
42
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
Positional Encoding over the output
sequence.
Positional Encoding
over the input
sequence.
Auto-regressive (at test).
The Transformer
43
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
Cross-Attention (or inter-attention)
between input and output tokens
Self-attention for
the input tokens.
Self-attention for the output tokens.
The Transformer: Layers
44
#Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention
is all you need. NeurIPS 2017.
N decoder layers
N encoder
layers
The Transformer: Layers
45
#BertViz Vig, Jesse. "A multiscale visualization of attention in the transformer model." ACL 2019. [code] [tweet]
A birds-eye view of attention across all of the model’s layers and heads
The Transformer: Visualization
46
Maria Ribalta, Pere-Pau Vàzquez, “Visualization is all you need”. UPC GCED 2020.
47
Are Transformers for Language only ? NO !!
#ViT Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa
Dehghani et al. "An image is worth 16x16 words: Transformers for image recognition at scale." ICLR 2021. [blog] [code]
Outline
1. Motivation
2. Self-attention
3. Multi-head Attention
4. Positional Encoding
5. The Transformer
48
49
(extra) PyTorch Lab on Google Colab
DL resources from UPC Telecos:
● Lectures (with Slides & Videos)
● Labs
Gerard Gallego
gerard.ion.gallego@upc.edu
Student PhD
Universitat Politecnica de Catalunya
Technical University of Catalonia
50
Software
● Transformers in HuggingFace.
● GPT-Neo by EleutherAI
○ Similar results to GPT-3, but smaller and open source.
● Andrej Karpathy, minGPT (2020).
51
Learn more
Ashish Vaswani, Stanford CS224N 2019.
52
Learn more
● Tutorials
○ Sebastian Ruder, Deep Learning for NLP Best Practices # Attention (2017).
○ Chris Olah, Shan Carter, “Attention and Augmented Recurrent Neural Networks”. distill.pub 2016.
○ Lilian Weg, The Transformer Family. Lil’Log 2020
● Twitter threads
○ Christian Wolf (INSA Lyon)
● Scientific publications
○ #Perceiver Jaegle, Andrew, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, and Joao Carreira.
"Perceiver: General perception with iterative attention." arXiv preprint arXiv:2103.03206 (2021).
○ #Longformer Beltagy, Iz, Matthew E. Peters, and Arman Cohan. "Longformer: The long-document transformer." arXiv
preprint arXiv:2004.05150 (2020).
○ Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. Transformers are RNS: Fast autoregressive transformers with linear
attention. ICML 2020.
○ Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye
Teh, Tim Harley, Razvan Pascanu, “Multiplicative Interactions and Where to Find Them”. ICLR 2020. [tweet]
○ Self-attention in language
■ Cheng, J., Dong, L., & Lapata, M. (2016). Long short-term memory-networks for machine reading. arXiv preprint
arXiv:1601.06733.
○ Self-attention in images
■ Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A., & Tran, D. (2018). Image transformer. ICML
2018.
■ Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local neural networks." In CVPR 2018.
53
Questions ?

More Related Content

What's hot

【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion ModelsDeep Learning JP
 
Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...MLconf
 
はじめての方向け GANチュートリアル
はじめての方向け GANチュートリアルはじめての方向け GANチュートリアル
はじめての方向け GANチュートリアルyohei okawa
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and ApplicationsHoang Nguyen
 
Masked Autoencoders Are Scalable Vision Learners.pptx
Masked Autoencoders Are Scalable Vision Learners.pptxMasked Autoencoders Are Scalable Vision Learners.pptx
Masked Autoencoders Are Scalable Vision Learners.pptxSangmin Woo
 
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic SegmentationSemantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation岳華 杜
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningMLAI2
 
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...SlideTeam
 
【論文読み会】Self-Attention Generative Adversarial Networks
【論文読み会】Self-Attention Generative  Adversarial Networks【論文読み会】Self-Attention Generative  Adversarial Networks
【論文読み会】Self-Attention Generative Adversarial NetworksARISE analytics
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachFerdin Joe John Joseph PhD
 
Autoencoder
AutoencoderAutoencoder
AutoencoderHARISH R
 
Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2harmonylab
 
Chainer で Tensor コア (fp16) を使いこなす
Chainer で Tensor コア (fp16) を使いこなすChainer で Tensor コア (fp16) を使いこなす
Chainer で Tensor コア (fp16) を使いこなすNVIDIA Japan
 
Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Kazuyuki Miyazawa
 
Active Learning と Bayesian Neural Network
Active Learning と Bayesian Neural NetworkActive Learning と Bayesian Neural Network
Active Learning と Bayesian Neural NetworkNaoki Matsunaga
 
[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models
[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models
[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative ModelsDeep Learning JP
 
Overcoming catastrophic forgetting in neural network
Overcoming catastrophic forgetting in neural networkOvercoming catastrophic forgetting in neural network
Overcoming catastrophic forgetting in neural networkKaty Lee
 
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic ArithmeticZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmeticharmonylab
 

What's hot (20)

【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
【DL輪読会】High-Resolution Image Synthesis with Latent Diffusion Models
 
Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...Interpretability beyond feature attribution quantitative testing with concept...
Interpretability beyond feature attribution quantitative testing with concept...
 
はじめての方向け GANチュートリアル
はじめての方向け GANチュートリアルはじめての方向け GANチュートリアル
はじめての方向け GANチュートリアル
 
GANs and Applications
GANs and ApplicationsGANs and Applications
GANs and Applications
 
Masked Autoencoders Are Scalable Vision Learners.pptx
Masked Autoencoders Are Scalable Vision Learners.pptxMasked Autoencoders Are Scalable Vision Learners.pptx
Masked Autoencoders Are Scalable Vision Learners.pptx
 
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic SegmentationSemantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
Semantic Segmentation - Fully Convolutional Networks for Semantic Segmentation
 
Attention Is All You Need
Attention Is All You NeedAttention Is All You Need
Attention Is All You Need
 
Representational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual LearningRepresentational Continuity for Unsupervised Continual Learning
Representational Continuity for Unsupervised Continual Learning
 
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
Back Propagation Neural Network In AI PowerPoint Presentation Slide Templates...
 
【論文読み会】Self-Attention Generative Adversarial Networks
【論文読み会】Self-Attention Generative  Adversarial Networks【論文読み会】Self-Attention Generative  Adversarial Networks
【論文読み会】Self-Attention Generative Adversarial Networks
 
Transforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approachTransforming deep into transformers – a computer vision approach
Transforming deep into transformers – a computer vision approach
 
Autoencoder
AutoencoderAutoencoder
Autoencoder
 
BERT+XLNet+RoBERTa
BERT+XLNet+RoBERTaBERT+XLNet+RoBERTa
BERT+XLNet+RoBERTa
 
Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2Generating Diverse High-Fidelity Images with VQ-VAE-2
Generating Diverse High-Fidelity Images with VQ-VAE-2
 
Chainer で Tensor コア (fp16) を使いこなす
Chainer で Tensor コア (fp16) を使いこなすChainer で Tensor コア (fp16) を使いこなす
Chainer で Tensor コア (fp16) を使いこなす
 
Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査Teslaにおけるコンピュータビジョン技術の調査
Teslaにおけるコンピュータビジョン技術の調査
 
Active Learning と Bayesian Neural Network
Active Learning と Bayesian Neural NetworkActive Learning と Bayesian Neural Network
Active Learning と Bayesian Neural Network
 
[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models
[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models
[DL輪読会]Transframer: Arbitrary Frame Prediction with Generative Models
 
Overcoming catastrophic forgetting in neural network
Overcoming catastrophic forgetting in neural networkOvercoming catastrophic forgetting in neural network
Overcoming catastrophic forgetting in neural network
 
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic ArithmeticZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
 

Similar to The Transformer - Xavier Giró - UPC Barcelona 2021

Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaUniversitat Politècnica de Catalunya
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Striving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational ModellingMarco Wirthlin
 
[AIoTLab]attention mechanism.pptx
[AIoTLab]attention mechanism.pptx[AIoTLab]attention mechanism.pptx
[AIoTLab]attention mechanism.pptxTuCaoMinh2
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Universitat Politècnica de Catalunya
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAJin-Hwa Kim
 
Transformer based approaches for visual representation learning
Transformer based approaches for visual representation learningTransformer based approaches for visual representation learning
Transformer based approaches for visual representation learningRyohei Suzuki
 
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Universitat Politècnica de Catalunya
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Universitat Politècnica de Catalunya
 
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018Universitat Politècnica de Catalunya
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learningDaiki Tanaka
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Universitat Politècnica de Catalunya
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Julien SIMON
 
Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255deffa5
 

Similar to The Transformer - Xavier Giró - UPC Barcelona 2021 (20)

Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC BarcelonaSelf-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
Self-supervised Visual Learning 2020 - Xavier Giro-i-Nieto - UPC Barcelona
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
alexVAE_New.pdf
alexVAE_New.pdfalexVAE_New.pdf
alexVAE_New.pdf
 
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
Deep Language and Vision by Amaia Salvador (Insight DCU 2018)
 
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
Variational Autoencoders VAE - Santiago Pascual - UPC Barcelona 2018
 
Striving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational ModellingStriving to Demystify Bayesian Computational Modelling
Striving to Demystify Bayesian Computational Modelling
 
[AIoTLab]attention mechanism.pptx
[AIoTLab]attention mechanism.pptx[AIoTLab]attention mechanism.pptx
[AIoTLab]attention mechanism.pptx
 
Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)Deep Learning Representations for All (a.ka. the AI hype)
Deep Learning Representations for All (a.ka. the AI hype)
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
 
ALASI15 Writing Analytics Workshop
ALASI15 Writing Analytics WorkshopALASI15 Writing Analytics Workshop
ALASI15 Writing Analytics Workshop
 
Multimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QAMultimodal Residual Networks for Visual QA
Multimodal Residual Networks for Visual QA
 
Transformer based approaches for visual representation learning
Transformer based approaches for visual representation learningTransformer based approaches for visual representation learning
Transformer based approaches for visual representation learning
 
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
Deep Language and Vision (DLSL D2L4 2018 UPC Deep Learning for Speech and Lan...
 
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
Self-supervised Learning from Video Sequences - Xavier Giro - UPC Barcelona 2019
 
ICED 2013 A
ICED 2013 AICED 2013 A
ICED 2013 A
 
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018One Perceptron  to Rule them All: Deep Learning for Multimedia #A2IC2018
One Perceptron to Rule them All: Deep Learning for Multimedia #A2IC2018
 
Interpretability of machine learning
Interpretability of machine learningInterpretability of machine learning
Interpretability of machine learning
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 
Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)Deep Learning: concepts and use cases (October 2018)
Deep Learning: concepts and use cases (October 2018)
 
Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255Natural Language Processing: Lecture 255
Natural Language Processing: Lecture 255
 

More from Universitat Politècnica de Catalunya

Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoUniversitat Politècnica de Catalunya
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Universitat Politècnica de Catalunya
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosUniversitat Politècnica de Catalunya
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Universitat Politècnica de Catalunya
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Universitat Politècnica de Catalunya
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Universitat Politècnica de Catalunya
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Universitat Politècnica de Catalunya
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Universitat Politècnica de Catalunya
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Universitat Politècnica de Catalunya
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Universitat Politècnica de Catalunya
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Universitat Politècnica de Catalunya
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Universitat Politècnica de Catalunya
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Universitat Politècnica de Catalunya
 

More from Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
Transcription-Enriched Joint Embeddings for Spoken Descriptions of Images and...
 
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
Object Detection with Deep Learning - Xavier Giro-i-Nieto - UPC School Barcel...
 
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
Self-supervised Audiovisual Learning 2020 - Xavier Giro-i-Nieto - UPC Telecom...
 

Recently uploaded

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130Suhani Kapoor
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...Suhani Kapoor
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...shivangimorya083
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptxthyngster
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一ffjhghh
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPramod Kumar Srivastava
 

Recently uploaded (20)

VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
VIP High Class Call Girls Jamshedpur Anushka 8250192130 Independent Escort Se...
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
Full night 🥵 Call Girls Delhi New Friends Colony {9711199171} Sanya Reddy ✌️o...
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptxEMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM  TRACKING WITH GOOGLE ANALYTICS.pptx
EMERCE - 2024 - AMSTERDAM - CROSS-PLATFORM TRACKING WITH GOOGLE ANALYTICS.pptx
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptxPKS-TGC-1084-630 - Stage 1 Proposal.pptx
PKS-TGC-1084-630 - Stage 1 Proposal.pptx
 

The Transformer - Xavier Giró - UPC Barcelona 2021

  • 1. The Transformer Lecture 20 Xavier Giro-i-Nieto Associate Professor Universitat Politecnica de Catalunya @DocXavi xavier.giro@upc.edu
  • 3. 3 Acknowledgments Marta R. Costa-jussà Associate Professor Universitat Politècnica de Catalunya Carlos Escolano PhD Candidate Universitat Politècnica de Catalunya Gerard I. Gállego PhD Student Universitat Politècnica de Catalunya gerard.ion.gallego@upc.edu @geiongallego
  • 5. 5 Reminder Nikhil Sha, “Attention ? An other Perspective!”. 2020.
  • 6. 6 Reminder Attention is a mechanism to compute a context vector (c) for a query (Q) as a weighted sum of values (V). Figure: Nikhil Shah, “Attention? An Other Perspective! [Part 1]” (2020)
  • 7. 7 Reminder Nikhil Sha, “Attention ? An other Perspective!”. 2020.
  • 8. 8 Reminder: Seq2Seq with Cross-Attention Slide concept: Abigail See, Matthew Lamm (Stanford CS224N), 2020 In this case, cross-attention refers to the attention between the encoder and decoder states.
  • 9. 9 Nikhil Sha, “Attention ? An other Perspective!”. 2020. What may the term “self” refer to, as a contrast of “cross”-attention ?
  • 11. 11 Self-Attention (or intra-Attention) Lin, Z., Feng, M., Santos, C. N. D., Yu, M., Xiang, B., Zhou, B., & Bengio, Y. A structured self-attentive sentence embedding. ICLR 2017. Figure: Jay Alammar, “The Illustrated Transformer” Self-attention refers to attending to other elements from the SAME sequence.
  • 12. 12 Self-Attention (or intra-Attention) Nikhil Sha, “Attention ? An other Perspective!”. 2020. Query (Q) g(x) = WQ x Key (K) f(x) = WK x Value (V) h(x) = WV x WQ , WK and WV are projection layers shared across all words.
  • 13. 13 Self-Attention (or intra-Attention) Which steps are necessary to compute the contextual representation of a word embedding e2 in a sequences of four words embeddings (e1 , e2 , e3 , e4 ) ? A (scaled) dot-product is computed between each pair of word embeddings (eg. e1 and e2 )... #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention is all you need. NeurIPS 2017.
  • 14. 14 Self-Attention (or intra-Attention) Which steps are necessary to compute the contextual representation of a word embedding e2 in a sequences of four words embeddings (e1 , e2 , e3 , e4 ) ? … a softmax layer normalizes the attention scores to obtain the attention distribution...
  • 15. 15 Self-Attention (or intra-Attention) Which steps are necessary to compute the contextual representation of a word embedding e2 in a sequences of four words embeddings (e1 , e2 , e3 , e4 ) ? ...the same word embeddings are combined to obtain the contextual representation e2 ’.
  • 16. 16 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 17. 17 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 18. 18 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 19. 19 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 20. 20 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 21. 21 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 22. 22 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 23. 23 Self-Attention (or intra-Attention) Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 24. 24 Self-Attention (or intra-Attention) Scaled dot-product attention Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 25. 25 Study case: Self-Attention in for image generation #SAGAN Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." ICML 2019. [video] Figure: Frank Xu Generator (G): Details can be generated using cues from all feature locations. Discriminator: Can check consistency betweenn features in distant portions of the image.
  • 26. 26 Study case: Self-Attention in for image generation #SAGAN Zhang, Han, Ian Goodfellow, Dimitris Metaxas, and Augustus Odena. "Self-attention generative adversarial networks." ICML 2019. [video] Query locations Attention maps for differet query locations
  • 27. Outline 1. Motivation 2. Self-attention 3. Multi-head Self-Attention (MHSA) 27
  • 28. 28 Multi-Head Self-Attention (MHSA) Nikhil Sha, “Attention ? An other Perspective!”. 2020. In vanilla self-attention, a single set of projection matrices WQ , WK , WV is used.
  • 29. 29 Nikhil Sha, “Attention ? An other Perspective!”. 2020. In multi-head self-attention, multiple sets of projection matrices are used, and can provide different contextual representations for the same input token. Multi-Head Self-Attention (MHSA)
  • 30. 30 The multi-head self-attended E’i matrixes are concatenated: Figure: Jay Alammar, “The illustrated Transformer” (2018) Multi-Head Self-Attention (MHSA)
  • 31. 31 A fully connected layer on top combines everything in a new E’. Figure: Jay Alammar, “The illustrated Transformer” (2018) Multi-Head Self-Attention (MHSA)
  • 32. Multi-head Self-Attention: Visualization 32 #BertViz Vig, Jesse. "A multiscale visualization of attention in the transformer model." ACL 2019. [code] [tweet] Each colour corresponds to a head. Blue: First head only. Multi-color: Multiple heads.
  • 33. 33 Self-Attention and Convolutional Layers Cordonnier, J. B., Loukas, A., & Jaggi, M. On the relationship between self-attention and convolutional layers. ICLR 2020. [tweet] [code]
  • 34. Outline 1. Motivation 2. Self-attention 3. Multi-head Attention 4. Positional Encoding 34
  • 35. Positional Encoding 35 Given that the attention mechanism allows accessing all input (and output) tokens, we no longer need a memory through recurrent layers.
  • 36. Positional Encoding 36 Figure: Jay Alammar, “The illustrated Transformer” (2018) Where is the relative relation in the sequence encoded ?
  • 37. Positional Encoding 37 Figure: Jay Alammar, “The illustrated Transformer” (2018) Where is the relative relation in the sequence encoded ?
  • 38. Positional Encoding 38 Maria Ribalta, Pere-Pau Vàzquez, “Visualization is all you need”. UPC GCED 2020. Sinusoidal functions are typically used to provide positional encodings.
  • 39. Positional Encoding 39 Figure: Jay Alammar, “The illustrated Transformer” (2018)
  • 40. Outline 1. Motivation 2. Self-attention 3. Multi-head Attention 4. Positional Encoding 5. The Transformer 40
  • 41. The Transformer 41 #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention is all you need. NeurIPS 2017. The Transformer removed the recurrency mechanism thanks to self-attention.
  • 42. The Transformer 42 #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention is all you need. NeurIPS 2017. Positional Encoding over the output sequence. Positional Encoding over the input sequence. Auto-regressive (at test).
  • 43. The Transformer 43 #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention is all you need. NeurIPS 2017. Cross-Attention (or inter-attention) between input and output tokens Self-attention for the input tokens. Self-attention for the output tokens.
  • 44. The Transformer: Layers 44 #Transformer Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., ... & Polosukhin, I.. Attention is all you need. NeurIPS 2017. N decoder layers N encoder layers
  • 45. The Transformer: Layers 45 #BertViz Vig, Jesse. "A multiscale visualization of attention in the transformer model." ACL 2019. [code] [tweet] A birds-eye view of attention across all of the model’s layers and heads
  • 46. The Transformer: Visualization 46 Maria Ribalta, Pere-Pau Vàzquez, “Visualization is all you need”. UPC GCED 2020.
  • 47. 47 Are Transformers for Language only ? NO !! #ViT Dosovitskiy, Alexey, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani et al. "An image is worth 16x16 words: Transformers for image recognition at scale." ICLR 2021. [blog] [code]
  • 48. Outline 1. Motivation 2. Self-attention 3. Multi-head Attention 4. Positional Encoding 5. The Transformer 48
  • 49. 49 (extra) PyTorch Lab on Google Colab DL resources from UPC Telecos: ● Lectures (with Slides & Videos) ● Labs Gerard Gallego gerard.ion.gallego@upc.edu Student PhD Universitat Politecnica de Catalunya Technical University of Catalonia
  • 50. 50 Software ● Transformers in HuggingFace. ● GPT-Neo by EleutherAI ○ Similar results to GPT-3, but smaller and open source. ● Andrej Karpathy, minGPT (2020).
  • 51. 51 Learn more Ashish Vaswani, Stanford CS224N 2019.
  • 52. 52 Learn more ● Tutorials ○ Sebastian Ruder, Deep Learning for NLP Best Practices # Attention (2017). ○ Chris Olah, Shan Carter, “Attention and Augmented Recurrent Neural Networks”. distill.pub 2016. ○ Lilian Weg, The Transformer Family. Lil’Log 2020 ● Twitter threads ○ Christian Wolf (INSA Lyon) ● Scientific publications ○ #Perceiver Jaegle, Andrew, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, and Joao Carreira. "Perceiver: General perception with iterative attention." arXiv preprint arXiv:2103.03206 (2021). ○ #Longformer Beltagy, Iz, Matthew E. Peters, and Arman Cohan. "Longformer: The long-document transformer." arXiv preprint arXiv:2004.05150 (2020). ○ Katharopoulos, A., Vyas, A., Pappas, N., & Fleuret, F. Transformers are RNS: Fast autoregressive transformers with linear attention. ICML 2020. ○ Siddhant M. Jayakumar, Wojciech M. Czarnecki, Jacob Menick, Jonathan Schwarz, Jack Rae, Simon Osindero, Yee Whye Teh, Tim Harley, Razvan Pascanu, “Multiplicative Interactions and Where to Find Them”. ICLR 2020. [tweet] ○ Self-attention in language ■ Cheng, J., Dong, L., & Lapata, M. (2016). Long short-term memory-networks for machine reading. arXiv preprint arXiv:1601.06733. ○ Self-attention in images ■ Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A., & Tran, D. (2018). Image transformer. ICML 2018. ■ Wang, Xiaolong, Ross Girshick, Abhinav Gupta, and Kaiming He. "Non-local neural networks." In CVPR 2018.