Deep Learning Project.pptx
1. PROJECT PRESENTATION
Paper: Bidirectional LSTM with Attention Mechanism and Convolutional Layer for Text
Classification
Reference: Liu, Gang, and Jiabao Guo. "Bidirectional LSTM with attention
mechanism and convolutional layer for text classification." Neurocomputing 337
(2019): 325-338.
2. CONTENT
• Paper
• Dataset
• Vocabulary Building
• Word2Vec
• Model Generation
• Model Summary
• Model Training
• Future Work
3. PAPER
• Objective: Sentiment classification of polarized datasets, such as reviews, questions, etc.
• CNNs are able to extract features for sentence modelling while reducing dimensionality of
the data.
• RNNs are specialized for sequential modelling. Bi-LSTM combines a forward hidden layer
and a backward hidden layer, which together can access both the preceding and succeeding
contexts, to capture the contextual information of the text.
• The attention mechanism is applied in two layers, to the preceding and succeeding
contextual features, highlighting the important information by assigning
different weights.
• Softmax layer to generate labels.
• Their model outperforms state-of-the-art classification methods in terms of
classification accuracy.
4. DATASET: IMDB – MOVIE REVIEW
• This is a dataset for binary sentiment classification containing 50,000 highly polarized reviews,
with 25k for training and 25k for testing, divided into positive reviews (labelled ‘2’) and
negative reviews (labelled ‘1’). Example reviews are shown on the slide.
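A minimal sketch of how this split can be loaded, assuming the standard aclImdb folder layout with pos/ and neg/ subdirectories (the labels follow the slide's 2/1 convention):

```python
import os

def load_imdb_split(root):
    """Load one IMDB split (train/ or test/) into (texts, labels) lists.
    Assumes the standard aclImdb layout: root/pos/*.txt, root/neg/*.txt.
    Labels follow the slide's convention: 2 = positive, 1 = negative."""
    texts, labels = [], []
    for label_dir, label in (("pos", 2), ("neg", 1)):
        folder = os.path.join(root, label_dir)
        for fname in os.listdir(folder):
            with open(os.path.join(folder, fname), encoding="utf-8") as f:
                texts.append(f.read())
            labels.append(label)
    return texts, labels

train_texts, train_labels = load_imdb_split("aclImdb/train")  # 25k reviews
```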
5. VOCABULARY BUILDING
• The sentences contain many forms of words, such as punctuation, contractions, and
simple words like ‘am’, ‘been’, ‘is’, etc., all connected together.
• These must be processed so that only meaningful words are extracted as tokens and
used to build the vocabulary, as in the sketch below.
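A minimal sketch of this tokenization and vocabulary step (the regex and the stop-word list here are illustrative assumptions, not the exact preprocessing used):

```python
import re
from collections import Counter

STOP_WORDS = {"am", "is", "been", "the", "a", "an"}  # illustrative subset

def tokenize(sentence):
    """Lowercase the text, strip punctuation, and drop simple stop words."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_vocab(texts, min_freq=5):
    """Map each token seen at least min_freq times to an integer id."""
    counts = Counter(tok for text in texts for tok in tokenize(text))
    vocab = {"<pad>": 0, "<unk>": 1}
    for tok, c in counts.most_common():
        if c >= min_freq:
            vocab[tok] = len(vocab)
    return vocab
```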
6. WORD2VEC
• Word embeddings are vector representations of words or tokens.
• The Word2Vec model converts one-hot encoded representations into dense
vectors that capture a word's context with respect to other similar or related
words.
• Two architectures: Continuous Bag-of-Words (CBOW) and Skip-gram; here, Skip-gram was used.
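For reference, given a corpus $w_1, \dots, w_T$ and window size $c$, skip-gram trains the vectors to predict each word's surrounding context by maximizing the average log-probability:

```latex
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p\!\left(w_{t+j} \mid w_t\right),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{|V|} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```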
7. WORD2VEC (CONTD.)
• A skip-gram word2vec model was created and initialized with an embedding size of 30, a sliding-window
size of 5, and a minimum frequency count of 5.
• The model was trained for 30 epochs for best results; the model's total parameter count is
shown in the slide image.
• Examples of word-similarity queries from testing the model are shown on the slide; a sketch of the setup follows.
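A minimal sketch of this setup using gensim's Word2Vec, one common implementation (the placeholder corpus stands in for the tokenized reviews from the vocabulary step):

```python
from gensim.models import Word2Vec

# tokenized_reviews: list of token lists; a tiny placeholder corpus is used
# here so the sketch runs standalone.
tokenized_reviews = [["great", "movie"], ["terrible", "plot"]] * 100

w2v = Word2Vec(
    sentences=tokenized_reviews,
    vector_size=30,  # embedding size from the slide
    window=5,        # sliding-window size
    min_count=5,     # minimum frequency count
    sg=1,            # 1 = skip-gram (0 would be CBOW)
    epochs=30,       # trained for 30 epochs
)

# Word-similarity queries like those shown on the slide:
print(w2v.wv.most_similar("movie", topn=2))
```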
8. WORD2VEC (CONTD.)
• t-SNE (t-distributed stochastic neighbor embedding) is a good way to visualize word vectors.
• However, it does not always produce an accurate representation, since it projects from a
high-dimensional space down to a much lower-dimensional one.
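A minimal sketch of such a visualization using scikit-learn and matplotlib (assuming w2v is the trained model from the previous sketch; the 200-word subset is an illustrative choice):

```python
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# w2v is the trained gensim model from the previous sketch.
words = w2v.wv.index_to_key[:200]          # a subset of the vocabulary
vectors = w2v.wv[words]                    # shape: (len(words), 30)

# Project the 30-d embeddings down to 2-d; perplexity must stay below len(words).
coords = TSNE(n_components=2, perplexity=min(30, len(words) - 1),
              random_state=0).fit_transform(vectors)

plt.figure(figsize=(10, 10))
plt.scatter(coords[:, 0], coords[:, 1], s=5)
for (x, y), word in zip(coords, words):
    plt.annotate(word, (x, y), fontsize=8)
plt.show()
```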
9. MODEL GENERATION
• Convolutional Layer: 1-D convolutional layer with an input channel size of 300 and an output channel
size of 100, used to extract features and reduce dimensionality (a sketch of the full model follows this list).
• BiLSTM: Bidirectional LSTM layer with hidden size of 150, to extract contextual information
from past and future data.
• Since the sentence length, and thus the number of embeddings, varies for each review or data
input, each batch was padded with zeros and then packed using
pack_padded_sequence for efficient computation, before being fed to the BiLSTM.
• The forward hidden state and backward hidden state are extracted separately as the forward context
and backward context, and fed into two attention layers.
• Attention Layer: Forward attention layer of hidden size 150, and Backward attention layer of
hidden size 150; attention mechanism used is general attention.
• Softmax: A softmax layer is used at the end to generate the label with the maximum probability.
• Metrics: Accuracy
• Training: Adam optimizer for 10 epochs, with cross-entropy loss and an 80%-20% split
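A minimal PyTorch sketch of the model described above, under stated assumptions: the kernel size (3), the use of the last time step as the attention query, and the absence of padding masks inside attention are simplifications; the 300-channel input follows this slide, although the word2vec slide lists an embedding size of 30.

```python
import torch
import torch.nn as nn
from torch.nn.utils.rnn import pack_padded_sequence, pad_packed_sequence

class GeneralAttention(nn.Module):
    """General (multiplicative) attention: score(h_t, q) = h_t^T W q."""
    def __init__(self, hidden_size):
        super().__init__()
        self.W = nn.Linear(hidden_size, hidden_size, bias=False)

    def forward(self, states):                    # states: (batch, seq, hidden)
        query = states[:, -1, :]                  # last state as query (an assumption)
        scores = torch.bmm(self.W(states), query.unsqueeze(2))  # (batch, seq, 1)
        weights = torch.softmax(scores, dim=1)    # padded positions are not masked here
        return (weights * states).sum(dim=1)      # (batch, hidden)

class AttBLSTMConv(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, conv_out=100,
                 hidden=150, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # 1-D convolution over the sequence; kernel_size=3 is an assumption.
        self.conv = nn.Conv1d(embed_dim, conv_out, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(conv_out, hidden, batch_first=True,
                              bidirectional=True)
        self.att_fwd = GeneralAttention(hidden)   # forward-context attention
        self.att_bwd = GeneralAttention(hidden)   # backward-context attention
        self.fc = nn.Linear(2 * hidden, num_classes)

    def forward(self, x, lengths):                # x: (batch, seq) padded token ids
        e = self.embed(x).transpose(1, 2)         # (batch, embed_dim, seq)
        c = torch.relu(self.conv(e)).transpose(1, 2)   # (batch, seq, conv_out)
        packed = pack_padded_sequence(c, lengths.cpu(), batch_first=True,
                                      enforce_sorted=False)
        out, _ = self.bilstm(packed)
        out, _ = pad_packed_sequence(out, batch_first=True)
        fwd, bwd = out.chunk(2, dim=-1)           # split the two directions
        ctx = torch.cat([self.att_fwd(fwd), self.att_bwd(bwd)], dim=1)
        return self.fc(ctx)   # logits; CrossEntropyLoss applies the softmax

model = AttBLSTMConv(vocab_size=10_000)
optimizer = torch.optim.Adam(model.parameters())
loss_fn = nn.CrossEntropyLoss()   # labels mapped to {0, 1}
```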
12. FUTURE WORK
• Troubleshoot the main model training part and complete training.
• Modify attention mechanism with multi-head attention.
• Train and test model on a different dataset.