Transformers for NLP
An Introduction to Attention on Steroids
What is NLP?
Natural Language Processing (NLP) is a subfield of artificial intelligence primarily aimed at programming computers to process and analyze large amounts of natural language data.
NLP techniques: a timeline
● Symbolic NLP (1950s to 1990s)
● Statistical NLP (1990s to 2010s)
● Neural NLP (2010s to present)
Deep Learning for NLP
Deep Learning Models
Neural Networks: word representations (word2vec)
● Layered structure
● True targets vs. output predictions
● Weights and loss functions
● Optimizers
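The bullets above can be condensed into one minimal training step. The sketch below is a hypothetical toy setup (a single linear layer with mean-squared-error loss and plain gradient descent), not code from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: inputs X and the "true targets" y from the bullets above.
X = rng.normal(size=(8, 4))
y = rng.normal(size=(8, 1))

# Weights of a single linear layer (a one-layer "layered structure").
W = rng.normal(size=(4, 1))

lr = 0.1  # learning rate used by the optimizer (plain gradient descent)
losses = []
for _ in range(100):
    pred = X @ W                             # output predictions
    losses.append(np.mean((pred - y) ** 2))  # loss function (MSE)
    grad = 2 * X.T @ (pred - y) / len(X)     # gradient of the loss w.r.t. W
    W -= lr * grad                           # optimizer update step
```

Deep networks stack many such layers with nonlinearities in between, but the loop of predict, measure loss, and update weights is the same.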
RNN Encoder-Decoder: Language Generation and Translation
● Joint RNNs
● Creating context and using the final context of the first RNN
● Training using parallel corpora
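A minimal sketch of the encoder-decoder idea above: the first RNN reads the source, its final hidden state becomes the context, and the second (jointly trained) RNN decodes from that context. All names and sizes below are illustrative toy choices, not from the slides:

```python
import numpy as np

def rnn_step(x, h, Wx, Wh):
    """One vanilla RNN cell: new hidden state from input x and previous state h."""
    return np.tanh(x @ Wx + h @ Wh)

rng = np.random.default_rng(0)
d = 4  # toy hidden/input size
Wx_enc, Wh_enc = rng.normal(size=(d, d)), rng.normal(size=(d, d))
Wx_dec, Wh_dec = rng.normal(size=(d, d)), rng.normal(size=(d, d))

# Encoder: run the source sequence through the first RNN.
source = rng.normal(size=(5, d))  # 5 source "tokens" as vectors
h = np.zeros(d)
for x in source:
    h = rnn_step(x, h, Wx_enc, Wh_enc)

context = h  # the *final* context of the first RNN

# Decoder: the second RNN starts from that context and generates
# the target sequence step by step.
h_dec = context
outputs = []
for _ in range(3):
    h_dec = rnn_step(np.zeros(d), h_dec, Wx_dec, Wh_dec)
    outputs.append(h_dec)
```

Squeezing the entire source sentence into one fixed-size context vector is exactly the bottleneck that attention (next slide) relaxes.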
Image source: https://www.slideshare.net/darvind/nlp-using-transformers
Deep Learning Models
Attention Models: Translation and Image Recognition
● Pay selective attention to the input
● Utilize intermediate encoder states while decoding
● Perform alignment while translating
Transformer: Translation
Transformers in Translation
This architecture aims to solve sequence-to-sequence tasks while handling long-range dependencies with ease. It relies entirely on self-attention to compute representations of its input and output, without using sequence-aligned RNNs or convolutions.
Transformers in Translation
Three types of attention
● Within the encoder (self-attention over the source)
● Encoder-decoder (the decoder attends to the encoder's output)
● Within the decoder (masked self-attention over the target)
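One way to see the three attention types is by asking where the queries and the keys/values come from. The toy sketch below (shapes and names are illustrative assumptions, not the paper's full multi-head architecture) uses one shared attention function for all three:

```python
import numpy as np

def attend(queries, keys, values, mask=None):
    """Scaled dot-product attention; mask blocks disallowed positions."""
    scores = queries @ keys.T / np.sqrt(keys.shape[-1])
    if mask is not None:
        scores = np.where(mask, scores, -1e9)  # masked positions get ~zero weight
    e = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
    weights = e / e.sum(-1, keepdims=True)
    return weights @ values

rng = np.random.default_rng(0)
src = rng.normal(size=(5, 8))  # 5 encoded source tokens
tgt = rng.normal(size=(3, 8))  # 3 decoded target tokens so far

# 1. Within the encoder: source attends to source, no mask.
enc = attend(src, src, src)

# 2. Encoder-decoder: target queries attend to source keys/values.
cross = attend(tgt, src, src)

# 3. Within the decoder: target attends to target, causally masked so
#    position i only sees positions <= i.
causal = np.tril(np.ones((3, 3), dtype=bool))
dec = attend(tgt, tgt, tgt, mask=causal)
```

The same function serves all three roles; only the sources of queries, keys, and values, and the mask, change.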
Transformers in Translation
Self-attention
● A richer representation that captures the interdependencies between words, compared to single-word embeddings
● Parallel computation across all positions
● Captures long-range dependencies
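The bullets above can be sketched in a few lines of NumPy. This is a single-head, toy version (learned projection matrices are randomly initialized here for illustration; it is not the paper's full architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))  # one embedding per input token

# Learned projections map the *same* input to queries, keys, and values.
Wq = rng.normal(size=(d, d))
Wk = rng.normal(size=(d, d))
Wv = rng.normal(size=(d, d))
Q, K, V = X @ Wq, X @ Wk, X @ Wv

# Scaled dot-product attention: every position attends to every other,
# so the whole sequence is processed in parallel (no recurrence), and
# token 0 can attend to token seq_len-1 as easily as to its neighbor.
scores = Q @ K.T / np.sqrt(d)
e = np.exp(scores - scores.max(-1, keepdims=True))  # stable softmax
weights = e / e.sum(-1, keepdims=True)
out = weights @ V  # each row mixes information from all tokens
```

Each row of `out` is a context-dependent representation of one token, which is what makes it richer than a static single-word embedding.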
Transformers in Translation
(Architecture diagram.) Image source: https://arxiv.org/pdf/1706.03762.pdf
Transformers in Translation
Challenges
● The vanilla Transformer can only attend over fixed-length text segments: the text has to be split into chunks of a fixed size before being fed to the model as input
● This chunking of text causes context fragmentation: a token near a segment boundary loses the context on the other side
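The fragmentation problem above can be illustrated with a toy segmenter (the function name and segment size are illustrative assumptions):

```python
def chunk(tokens, max_len):
    """Split a token sequence into fixed-length segments, as a vanilla
    Transformer with a fixed context window requires."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

tokens = "the quick brown fox jumps over the lazy dog".split()
segments = chunk(tokens, 4)
# Each segment is processed independently, so "fox" at the end of the
# first segment cannot attend to "jumps" at the start of the second:
# the context is fragmented at every boundary.
```

Transformer-XL (next slide) addresses exactly this by letting each segment reuse hidden states from the previous one.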
Transformers in Translation
Future directions
● Transformer-XL for language modelling
● Google’s BERT
Thank you!