Smart Reply - Word-level Sequence to Sequence
Problem formulation
• Topical Chat dataset
This file contains information about the following fields (a loading sketch follows these bullets):
• conversation_id
• Message
• Sentiment
• To implement a basic character-level recurrent sequence-to-sequence model.
• We apply it to generate short English sentences for smart reply, character by character.
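A minimal sketch of loading the dataset with pandas; the file name topical_chat.csv is an assumption, and the column names are taken from the fields listed above.

```python
import pandas as pd

# Assumed file name; the fields listed above (conversation_id, Message,
# Sentiment) are the columns described in the project.
df = pd.read_csv("topical_chat.csv")

print(df.shape)
print(df.columns.tolist())
print(df.head())  # the "Head of the dataset" step
```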
Summary of the algorithm
• We start with input sequences from one domain and corresponding target sequences from another domain.
• An encoder LSTM turns input sequences into 2 state vectors (we keep the last LSTM state and discard the outputs).
• A decoder LSTM is trained to turn the target sequences into the same
sequence but offset by one time step in the future, a training process
called "teacher forcing" in this context. It uses as initial state the state
vectors from the encoder. Effectively, the decoder learns to
generate targets[t+1...] given targets[...t], conditioned on the input
sequence.
• In inference mode, when we want to decode unknown input sequences, we (see the sketch after this list):
  - Encode the input sequence into state vectors.
  - Start with a target sequence of size 1 (just the start-of-sequence character).
  - Feed the state vectors and the 1-char target sequence to the decoder to produce predictions for the next character.
  - Sample the next character using these predictions (we simply use argmax).
  - Append the sampled character to the target sequence.
  - Repeat until we generate the end-of-sequence character or we hit the character limit.
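A minimal sketch of this greedy decoding loop, assuming that the inference-time encoder_model and decoder_model (built in step 8 below), the BOS/EOS markers, and the token/index dictionaries from the data-preparation step already exist; all names are illustrative.

```python
import numpy as np

def decode_sequence(input_seq, encoder_model, decoder_model,
                    target_token_index, reverse_target_token_index,
                    num_decoder_tokens, max_decoder_seq_length):
    # Encode the input sequence into the encoder's final state vectors.
    states_value = encoder_model.predict(input_seq, verbose=0)

    # Start with a length-1 target sequence containing only the BOS token.
    target_seq = np.zeros((1, 1, num_decoder_tokens))
    target_seq[0, 0, target_token_index["BOS"]] = 1.0

    decoded = []
    while True:
        # Predict the next token from the current target token and states.
        output_tokens, h, c = decoder_model.predict(
            [target_seq] + states_value, verbose=0)

        # Greedy sampling: take the argmax of the output distribution.
        sampled_index = int(np.argmax(output_tokens[0, -1, :]))
        sampled_token = reverse_target_token_index[sampled_index]

        # Stop on EOS or when the length limit is reached.
        if sampled_token == "EOS" or len(decoded) >= max_decoder_seq_length:
            break
        decoded.append(sampled_token)

        # Feed the sampled token back in (as one-hot) and carry the states forward.
        target_seq = np.zeros((1, 1, num_decoder_tokens))
        target_seq[0, 0, sampled_index] = 1.0
        states_value = [h, c]

    return " ".join(decoded)
```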
Head of the dataset
1. Setup
2. Configuration
3. Load the Dataset
4. Cleaning the data
5. Prepare the data
• Number of iterations = 186952
• Time taken to run the loop = 00:38<00:00 (tqdm elapsed<remaining)
• Progress is tracked using the tqdm library (see the sketch below).
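A minimal sketch of a tqdm-tracked cleaning loop, assuming a pandas DataFrame df with a Message column; the exact cleaning rules used in the project may differ.

```python
import re
from tqdm import tqdm

def clean_text(text):
    # Lowercase and keep only letters, digits and spaces.
    text = str(text).lower()
    return re.sub(r"[^a-z0-9\s]", "", text).strip()

cleaned = []
# tqdm wraps the iterable and prints progress such as "186952/186952 [00:38<00:00]".
for message in tqdm(df["Message"]):
    cleaned.append(clean_text(message))

df["Message"] = cleaned
```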
1. Vectorize the data.
2. Limit the length of the texts and append them to a variable for vectorization.
3. Add BOS and EOS markers to the decoder input.
4. Collect the unique words.
• Create a word-index dictionary, with the word as the key and its index as the value.
• Store the object in a 'handle' variable for both the input sequence and the target sequence.
• Initialize the encoder/decoder arrays: both the input and the output are three-dimensional tensors, one per sentence, with every word one-hot encoded (a vectorization sketch follows this list).
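A minimal sketch of this vectorization step, assuming paired lists input_texts (a message) and target_texts (its reply) extracted from the cleaned data; variable names and the pickle file name are illustrative.

```python
import pickle
import numpy as np

# Wrap each target sentence with BOS/EOS markers for the decoder.
target_texts = ["BOS " + t + " EOS" for t in target_texts]

# Collect the unique words and build word -> index dictionaries.
input_words = sorted({w for t in input_texts for w in t.split()})
target_words = sorted({w for t in target_texts for w in t.split()})
input_token_index = {w: i for i, w in enumerate(input_words)}
target_token_index = {w: i for i, w in enumerate(target_words)}

# Persist the dictionary via a file handle so inference can reuse it.
with open("input_token_index.pickle", "wb") as handle:
    pickle.dump(input_token_index, handle)

max_encoder_seq_length = max(len(t.split()) for t in input_texts)
max_decoder_seq_length = max(len(t.split()) for t in target_texts)

# Three-dimensional tensors: (sentence, time step, one-hot word).
encoder_input_data = np.zeros(
    (len(input_texts), max_encoder_seq_length, len(input_words)), dtype="float32")
decoder_input_data = np.zeros(
    (len(target_texts), max_decoder_seq_length, len(target_words)), dtype="float32")
decoder_target_data = np.zeros(
    (len(target_texts), max_decoder_seq_length, len(target_words)), dtype="float32")

for i, (inp, tgt) in enumerate(zip(input_texts, target_texts)):
    for t, word in enumerate(inp.split()):
        encoder_input_data[i, t, input_token_index[word]] = 1.0
    for t, word in enumerate(tgt.split()):
        decoder_input_data[i, t, target_token_index[word]] = 1.0
        if t > 0:
            # The decoder target is the same sequence shifted one step ahead.
            decoder_target_data[i, t - 1, target_token_index[word]] = 1.0
```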
6. Build the Model
7. Train the model
Training and validation loss metrics were observed during the training phase (a model-building and training sketch follows).
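A minimal sketch of the encoder-decoder model and the training call, following the Keras LSTM seq2seq example referenced below; latent_dim, batch size, and epoch count are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

num_encoder_tokens = len(input_token_index)   # from the vectorization step
num_decoder_tokens = len(target_token_index)
latent_dim = 256                              # assumed size of the LSTM state vectors

# Encoder: keep only the final hidden and cell states.
encoder_inputs = keras.Input(shape=(None, num_encoder_tokens))
encoder_outputs, state_h, state_c = layers.LSTM(
    latent_dim, return_state=True)(encoder_inputs)
encoder_states = [state_h, state_c]

# Decoder: return full sequences, initialised with the encoder states.
decoder_inputs = keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=encoder_states)
decoder_dense = layers.Dense(num_decoder_tokens, activation="softmax")
decoder_outputs = decoder_dense(decoder_outputs)

model = keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Teacher forcing: decoder_input_data is the target sequence one step behind
# decoder_target_data, as prepared in the vectorization step.
model.fit([encoder_input_data, decoder_input_data], decoder_target_data,
          batch_size=64, epochs=50, validation_split=0.2)
```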
8. Create Inference Model
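A minimal sketch of rebuilding separate encoder and decoder models for inference from the trained layers, again following the Keras example; these are the encoder_model and decoder_model assumed by the decoding loop shown earlier, and the layer names come from the training sketch above.

```python
from tensorflow import keras

# Encoder inference model: input sequence -> final state vectors.
encoder_model = keras.Model(encoder_inputs, encoder_states)

# Decoder inference model: previous token + states -> next-token probabilities + new states.
decoder_state_input_h = keras.Input(shape=(latent_dim,))
decoder_state_input_c = keras.Input(shape=(latent_dim,))
decoder_states_inputs = [decoder_state_input_h, decoder_state_input_c]

decoder_outputs, state_h, state_c = decoder_lstm(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs = decoder_dense(decoder_outputs)

decoder_model = keras.Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs, state_h, state_c])
```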
Output
Conclusion
• The project attempts to predict the next character or word in the sequence.
• The project generates predictions by feeding in the input text.
• The model has not reached human-level performance, however.
• To attain more accurate results from this model, we can increase the size of the dataset.
References
• https://keras.io/examples/nlp/lstm_seq2seq/
• https://alvinntnu.github.io/python-notes/nlp/seq-to-seq-machine-translation.html
• https://blog.paperspace.com/implement-seq2seq-for-text-summarization-keras/
