Deep Learning Project.pptx
1. PROJECT PRESENTATION
Paper: Bidirectional LSTM with Attention Mechanism and Convolutional Layer for Text
Classification
Reference: Liu, Gang, and Jiabao Guo. "Bidirectional LSTM with attention
mechanism and convolutional layer for text classification." Neurocomputing 337
(2019): 325-338.
2. CONTENT
• Paper
• Dataset
• Vocabulary Building
• Word2Vec
• Model Generation
• Model Summary
• Model Training
• Future Work
3. PAPER
• Objective: Sentiment classification of polarized datasets, such as reviews, questions, etc.
• CNNs are able to extract features for sentence modelling while reducing dimensionality of
the data.
• RNNs are specialized for sequential modelling. Bi-LSTM combines a forward hidden layer
and a backward hidden layer, which together can access both the preceding and succeeding
contexts, to capture the contextual information of the text.
• The attention mechanism is applied in two layers, to the preceding and succeeding
contextual features, highlighting the important information by assigning
different weights.
• Softmax layer to generate labels.
• Their model outperforms state-of-the-art classification methods in terms of
classification accuracy.
4. DATASET: IMDB – MOVIE REVIEW
• This is a dataset for binary sentiment classification containing 50,000 highly polarized reviews,
with 25k for training and 25k for testing, divided into positive reviews (labelled ‘2’) and
negative reviews (labelled ‘1’). Example reviews are shown on the slide.
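A minimal sketch of how this split can be loaded, assuming the standard aclImdb folder layout with pos/ and neg/ subdirectories (the labels follow the slide's 2/1 convention):

```python
import os

def load_imdb_split(root):
    """Load one IMDB split (train/ or test/) into (texts, labels) lists.
    Assumes the standard aclImdb layout: root/pos/*.txt, root/neg/*.txt.
    Labels follow the slide's convention: 2 = positive, 1 = negative."""
    texts, labels = [], []
    for label_dir, label in (("pos", 2), ("neg", 1)):
        folder = os.path.join(root, label_dir)
        for fname in os.listdir(folder):
            with open(os.path.join(folder, fname), encoding="utf-8") as f:
                texts.append(f.read())
            labels.append(label)
    return texts, labels

train_texts, train_labels = load_imdb_split("aclImdb/train")  # 25k reviews
```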
5. VOCABULARY BUILDING
• The sentences contain many forms of words, such as punctuation, contractions, and
simple words like ‘am’, ‘been’, ‘is’, etc., all connected together.
• These must be processed so that only meaningful words are extracted as tokens and
used to build the vocabulary, as in the sketch below.
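A minimal sketch of this tokenization and vocabulary step (the regex and the stop-word list here are illustrative assumptions, not the exact preprocessing used):

```python
import re
from collections import Counter

STOP_WORDS = {"am", "is", "been", "the", "a", "an"}  # illustrative subset

def tokenize(sentence):
    """Lowercase the text, strip punctuation, and drop simple stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_vocab(texts, min_freq=5):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, c in counts.most_common():
        if c >= min_freq:
            vocab[tok] = len(vocab)
    return vocab
```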
6. WORD2VEC
• Word embeddings are vector representations of words or tokens.
• The Word2Vec model converts one-hot encoded representations into dense
vectors that capture a word's context with respect to other similar or related
words.
• Two architectures: Continuous Bag-of-Words (CBOW) and Skip-gram; here, Skip-gram was used.
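For reference, given a corpus $w_1, \dots, w_T$ and window size $c$, skip-gram trains the vectors to predict each word's surrounding context by maximizing the average log-probability:

```latex
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p\!\left(w_{t+j} \mid w_t\right),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{|V|} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```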
7. WORD2VEC (CONTD.)
• A skip-gram word2vec model was created and initialized with an embedding size of 30, a sliding-window
size of 5, and a minimum frequency count of 5.
• The model was trained for 30 epochs for best results; the model's total parameter count is
shown in the slide image.
• Examples of word-similarity queries from testing the model are shown on the slide; a sketch of the setup follows.
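A minimal sketch of this setup using gensim's Word2Vec, one common implementation (the placeholder corpus stands in for the tokenized reviews from the vocabulary step):

```python
from gensim.models import Word2Vec

# tokenized_reviews: list of token lists; a tiny placeholder corpus is used
# here so the sketch runs standalone.
tokenized_reviews = [["great", "movie"], ["terrible", "plot"]] * 100

w2v = Word2Vec(
    sentences=tokenized_reviews,
    vector_size=30,  # embedding size from the slide
    window=5,        # sliding-window size
    min_count=5,     # minimum frequency count
    sg=1,            # 1 = skip-gram (0 would be CBOW)
    epochs=30,       # trained for 30 epochs
)

# Word-similarity queries like those shown on the slide:
print(w2v.wv.most_similar("movie", topn=2))
```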
8. WORD2VEC (CONTD.)
• t-SNE (t-distributed stochastic neighbor embedding) is a good way to visualize word vectors.
• However, it does not always produce an accurate representation, since it projects from a
high-dimensional space down to a much lower-dimensional one.
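A minimal sketch of such a visualization using scikit-learn and matplotlib (assuming w2v is the trained model from the previous sketch; the 200-word subset is an illustrative choice):

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# w2v is the trained gensim model from the previous sketch.
words = w2v.wv.index_to_key[:200]          # a subset of the vocabulary
vectors = w2v.wv[words]                    # shape: (len(words), 30)

# Project the 30-d embeddings down to 2-d; perplexity must stay below len(words).
coords = TSNE(n_components=2, perplexity=min(30, len(words) - 1),
              random_state=0).fit_transform(vectors)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=5)
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y), fontsize=8)
plt.show()
```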
9. MODEL GENERATION
• Convolutional Layer: 1-D convolutional layer with an input channel size of 300 and an output channel
size of 100, used to extract features and reduce dimensionality (a sketch of the full model follows this list).
• BiLSTM: Bidirectional LSTM layer with hidden size of 150, to extract contextual information
from past and future data.
• Since the sentence length, and thus the number of embeddings, varies for each review or data
input, each batch was padded with zeros and then packed using
pack_padded_sequence for efficient computation, before being fed to the BiLSTM.
• The forward hidden state and backward hidden state are extracted separately as the forward context
and backward context, and fed into two attention layers.
• Attention Layer: Forward attention layer of hidden size 150, and Backward attention layer of
hidden size 150; attention mechanism used is general attention.
• Softmax: A softmax layer is used at the end to generate the label with the maximum probability.
• Metrics: Accuracy
• Training: Adam optimizer for 10 epochs, with cross-entropy loss and an 80%-20% split
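A minimal PyTorch sketch of the model described above, under stated assumptions: the kernel size (3), the use of the last time step as the attention query, and the absence of padding masks inside attention are simplifications; the 300-channel input follows this slide, although the word2vec slide lists an embedding size of 30.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class GeneralAttention(nn.Module):
    """General (multiplicative) attention: score(h_t, q) = h_t^T W q."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, states):                    # states: (batch, seq, hidden)
        query = states[:, -1, :]                  # last state as query (an assumption)
        scores = torch.bmm(self.W(states), query.unsqueeze(2))  # (batch, seq, 1)
        weights = torch.softmax(scores, dim=1)    # padded positions are not masked here
        return (weights * states).sum(dim=1)      # (batch, hidden)

class AttBLSTMConv(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, conv_out=100,
                 hidden=150, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolution over the sequence; kernel_size=3 is an assumption.
        self.conv = nn.Conv1d(embed_dim, conv_out, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv_out, hidden, batch_first=True,
                              bidirectional=True)
        self.att_fwd = GeneralAttention(hidden)   # forward-context attention
        self.att_bwd = GeneralAttention(hidden)   # backward-context attention
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x, lengths):                # x: (batch, seq) padded token ids
        e = self.embed(x).transpose(1, 2)         # (batch, embed_dim, seq)
        c = torch.relu(self.conv(e)).transpose(1, 2)   # (batch, seq, conv_out)
        packed = pack_padded_sequence(c, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        out, _ = self.bilstm(packed)
        out, _ = pad_packed_sequence(out, batch_first=True)
        fwd, bwd = out.chunk(2, dim=-1)           # split the two directions
        ctx = torch.cat([self.att_fwd(fwd), self.att_bwd(bwd)], dim=1)
        return self.fc(ctx)   # logits; CrossEntropyLoss applies the softmax

model = AttBLSTMConv(vocab_size=10_000)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()   # labels mapped to {0, 1}
```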
12. FUTURE WORK
• Troubleshoot the main model training part and complete training.
• Modify attention mechanism with multi-head attention.
• Train and test model on a different dataset.