This is the presentation from our AI Meet March 2017 on Attention Mechanism in Language Understanding and its Applications.
You can join the Artifacia AI Meet Bangalore Group: https://www.meetup.com/Artifacia-AI-Meet/
Attention Mechanism in Language Understanding and its Applications
1. Attention Mechanism in Language
Understanding and its Applications
by Rajarshee Mitra, Research Engineer, Artifacia
(@rajarshee_mitra)
March 25, 2017
2. AI Meet|
Agenda
1. What is Seq2Seq?
2. Challenges in vanilla Seq2Seq
3. Attention - introduction
4. Attention (contd.)
5. Attention - microscopic view
6. Visualizing attention
7. More applications of attention
8. References.
3. AI Meet|
What is Seq2Seq?
● Contains two different RNNs: an encoder and a decoder.
● The encoder encodes the source sentence to form a context vector.
● The decoder decodes that vector to generate language (see the sketch below).
● Applications: NMT, summarization, conversations, etc.
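A minimal sketch of such an encoder-decoder pair in PyTorch (an illustration only; the class name, layer sizes, and choice of GRUs are assumptions, not the speaker's code):

    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb=64, hidden=128):
            super().__init__()
            self.src_emb = nn.Embedding(src_vocab, emb)
            self.tgt_emb = nn.Embedding(tgt_vocab, emb)
            self.encoder = nn.GRU(emb, hidden, batch_first=True)
            self.decoder = nn.GRU(emb, hidden, batch_first=True)
            self.out = nn.Linear(hidden, tgt_vocab)

        def forward(self, src, tgt):
            # The encoder compresses the whole source sentence into one vector h.
            _, h = self.encoder(self.src_emb(src))
            # The decoder generates the target conditioned only on that vector.
            dec_out, _ = self.decoder(self.tgt_emb(tgt), h)
            return self.out(dec_out)  # logits over the target vocabulary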
4. AI Meet|
Challenges in vanilla Seq2Seq
● Hard for the encoder to compress the whole source sentence into a single vector.
● Performance deteriorates rapidly as the length of the source sentence increases.
● A single context vector for generating every word in the decoder does not produce the best results.
5. AI Meet|
Attention
● Does not squash the whole source sentence into a vector.
● Considers a subset of the word vectors in the source more than the
others while generating each target word.
● It is an intuitive process and comparable to how we read.
● While inferring something from a piece of text, like answering questions, we pay more attention to certain words in the text each time.
● Eventually, the machine learns where to attend more and where to attend less.
● E.g., while translating an English sentence to French, the fourth word in the French sentence can be highly correlated with the first word in the English sentence.
● Hence, it is not very useful to consider the whole English sentence every time when generating each French word (see the scoring sketch after this list).
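As a concrete illustration, the per-word weights can be computed by scoring each encoder output against the current decoder state and normalizing with a softmax (dot-product scoring in the style of Luong et al.; the function name and shapes here are assumptions):

    import torch
    import torch.nn.functional as F

    def attend(dec_state, enc_outputs):
        # dec_state: (hidden,) current decoder hidden state.
        # enc_outputs: (src_len, hidden), one vector per source word.
        scores = enc_outputs @ dec_state    # score every source position
        weights = F.softmax(scores, dim=0)  # where to attend more, where less
        context = weights @ enc_outputs     # weighted sum: the context vector
        return context, weights

Because the whole pipeline is differentiable, training teaches the model which source words matter for each target word; nothing about the alignment is hand-coded.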
6. AI Meet|
Attention (contd.)
● From a high-level point of view, the attention model differs from a traditional Seq2Seq in the sense that in vanilla Seq2Seq we simply feed h_t to the softmax, whereas with attention we feed the attentional hidden state h̃_t, computed from h_t and the context vector c_t (sketched below).
● c_t differs for every time step in the decoder.
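A sketch of that difference, following Luong et al.'s formulation h̃_t = tanh(W_c [c_t; h_t]) (the variable names and sizes here are assumptions):

    import torch
    import torch.nn as nn

    hidden, vocab = 128, 10000
    W_c = nn.Linear(2 * hidden, hidden)  # combines context and decoder state
    W_s = nn.Linear(hidden, vocab)       # projects to vocabulary logits

    def output_distribution(c_t, h_t):
        # Attentional hidden state: h~_t = tanh(W_c [c_t ; h_t]).
        h_tilde = torch.tanh(W_c(torch.cat([c_t, h_t], dim=-1)))
        # With attention, h~_t (not h_t) is what feeds the softmax.
        return torch.softmax(W_s(h_tilde), dim=-1)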
9. AI Meet|
More applications of attention
1. Neural Machine Translation.
2. Text Summarization.
3. Voice recognition.
4. Generate parse trees of sentences.
5. Chatbots.
Attentional interfaces can be used whenever one wants to
interface with a neural network that has a repeating
structure in its output.