SlideShare a Scribd company logo
1 of 22
DEEP LEARNING-BASED LANGUAGE
MODELS USING MULTI-TASK LEARNING
IN NATURAL LANGUAGE
UNDERSTANDING: A SYSTEMATIC
LITERATURE REVIEW AND FUTURE
DIRECTIONS
BY:
SANJAY BHARGAV MADAMANCHI
SJSU ID: 016421587
ABSTRACT
• Learning a new language is difficult for all and its even more difficult for a computer to
learn and process human language. But due to the recent techniques in Deep
Learning(DL) Natural Language Processing(NLP) tasks are enhanced significantly,
but these models cannot be entirely generated using NLP models. In order to meet
the latest trends Natural Language Understanding(NLU) a subfield of NLP is
emerged. NLU tasks include things like machine translation, text entailment, dialogue
based systems, natural language inference, sentiment analysis. The advancement in
the field of NLU can enhance the development in these models.
INTRODUCTION
• NLU is the emerging nowadays due to the increase in the GPT models , it mainly
concentrates on analyzing and extracting information from human language text.
NLU tasks include information-retrieval, summarization, language translation,
classification. NLU aims to attain the task proficiency for tasks contained in
standard benchmarking datasets like GLUE (General Language Understanding
Evaluation) and superGLUE. (Super GLUE).
METHODOLOGY
• The methodology mainly consists of key-phrases required to search results,
inclusion and exclusion criteria, selection results, quality assessments extraction
and data synthesis. The below table shows the evolution of models in NLU
BACKGROUND
• NLU includes building language models training them, testing them for accuracy.
This section contains the text classification tasks used in this paper. there are two
types of QA tasks QA extractive and generative QA, we are only considering
extractive QA in this part. NLI is used to predict whether we can predict meaning
of one text from other. neural machine translation is used as a process to
translate text by simulating human brain capabilities, the main goal of this part is
to retain the meaning and intent of the language while translating if from one to
other
MODELS CONSIDERED
FEED FORWARD NETWORK BASED MODELS
• Simple DL models for text representation include feedforward networks. Despite
this, they have a good level of accuracy on several TC benchmarks. Text is
viewed as a collection of words in these models. These models acquire a vector
representation for each word by word2vec. These are popular embeddings
models .Joulin et al. [23] introduced another classifier called fastText. It is efficient
and straightforward
RNN-BASED MODELS
• Usually, the text is treated as an order of words in RNNbased models. The basic
purpose of an RNN-based model for text categorization is to capture word
relationships between sentences and text structure. Plain RNN-based models, on
the other hand, do not perform as well as standard feed-forward neural networks.
MODELS BASED ON CNN
• CNNs are taught to identify patterns in space, while RNNs are trained to detect
patterns over time [30]. RNNs perform well in NLP tasks like RQA-POS tagging,
which need an understanding of long-range semantics, but CNN performs well in
situations where sensing local and location-independent patterns in the document
is critical.
CAPSULE NEURAL NETWORK-BASED MODELS
• CNN uses several layers of convolutions and pooling to classify pictures or text.
Pooling operations detect significant features and minimize the computation
complexity of convolution processes, but they miss spatial information and may
misclassify items depending on their orientation or proportion.
MODELS WITH MECHANISM OF ATTENTION
• The way one pays attention to distinct sections of a photograph or related words
of a single sentence motivates attention. Attention is becoming a central concept
and tool in developing DL models for NLP [45]. It can be thought of as a vector of
significant weights in a nutshell.
MODELS BASED ON GRAPH NEURAL NETWORKS
• Even though ordinary texts have a serial order, they also comprise inherent graph
structures similar to parse trees that speculate the relationships based on syntax
and semantics of the sentences.
MODELS WITH HYBRID TECHNIQUES
• Many hybrid models have been built to detect global and local documents by
combining LSTM and CNN architectures.
MODELS BASED ON TRANSFORMERS
• The sequential processing of text is one of the computational obstacles that
RNNs face. Even though RNNs are more sequential than CNNs, the computing
cost of capturing associations among words in a phrase climb with the length of
the sentence, much like CNNs.
DISCUSSIONS AND LIMITATION
• Large number of research papers are considered for this aspect in building large
language models in DL
• The above mentioned framework is proposed for creating NLU models in future, it
is mainly done considering BERT models. The models has two parts multi-tasking
and pre training, the proposed framework should work well as it combines
multiple techniques.
• Current models employ text classification tasks due scarcity of literature and
research in multi tasking DL models in NLU, and availability of large number of
models makes it difficult to find the suitable model and that meets the
requirements
CONCLUSION
• Majority of issues faced by multi taking learning are same and the findings
suggest that a hybrid model that contains strategies from Multi task learning and
active learning , still there is lot of scope to figure out how to improve accuracy
and resilient MTL models for general AI and next gen models
REFERENCES
• [23] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov,
``FastText.Zip: Compressing text classi cation models,’’ 2016, arXiv:1612.03651.
• [30] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ``Gradient-based learning
applied to document recognition,'' Proc. IEEE, vol. 86, no. 11, pp. 2278 2324,
Nov. 1998.
• [45] D. Bahdanau, K. Cho, and Y. Bengio, ``Neural machine translation by jointly
learning to align and translate,'' 2014, arXiv:1409.0473.
• https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9706456

More Related Content

Similar to short_story.pptx

A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
kevig
 

Similar to short_story.pptx (20)

Knowledge distillation deeplab
Knowledge distillation deeplabKnowledge distillation deeplab
Knowledge distillation deeplab
 
Natural Language Processing Advancements By Deep Learning - A Survey
Natural Language Processing Advancements By Deep Learning - A SurveyNatural Language Processing Advancements By Deep Learning - A Survey
Natural Language Processing Advancements By Deep Learning - A Survey
 
Unit 5f.pptx
Unit 5f.pptxUnit 5f.pptx
Unit 5f.pptx
 
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 ReviewNatural Language Generation / Stanford cs224n 2019w lecture 15 Review
Natural Language Generation / Stanford cs224n 2019w lecture 15 Review
 
subrat
 subrat subrat
subrat
 
AINL 2016: Nikolenko
AINL 2016: NikolenkoAINL 2016: Nikolenko
AINL 2016: Nikolenko
 
LLM.pdf
LLM.pdfLLM.pdf
LLM.pdf
 
Neural word embedding and language modelling
Neural word embedding and language modellingNeural word embedding and language modelling
Neural word embedding and language modelling
 
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATIONAN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
 
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATIONAN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
 
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATIONAN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
AN IMPROVED MT5 MODEL FOR CHINESE TEXT SUMMARY GENERATION
 
Challenges in transfer learning in nlp
Challenges in transfer learning in nlpChallenges in transfer learning in nlp
Challenges in transfer learning in nlp
 
[DSC MENA 24] Nada_GabAllah_-_Advancement_in_NLP_and_Text_Analytics.pptx
[DSC MENA 24] Nada_GabAllah_-_Advancement_in_NLP_and_Text_Analytics.pptx[DSC MENA 24] Nada_GabAllah_-_Advancement_in_NLP_and_Text_Analytics.pptx
[DSC MENA 24] Nada_GabAllah_-_Advancement_in_NLP_and_Text_Analytics.pptx
 
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...[Paper Reading]  Unsupervised Learning of Sentence Embeddings using Compositi...
[Paper Reading] Unsupervised Learning of Sentence Embeddings using Compositi...
 
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2 Engineering Intelligent NLP Applications Using Deep Learning – Part 2
Engineering Intelligent NLP Applications Using Deep Learning – Part 2
 
LSTM Based Sentiment Analysis
LSTM Based Sentiment AnalysisLSTM Based Sentiment Analysis
LSTM Based Sentiment Analysis
 
Tomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLPTomáš Mikolov - Distributed Representations for NLP
Tomáš Mikolov - Distributed Representations for NLP
 
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
A NOVEL APPROACH FOR NAMED ENTITY RECOGNITION ON HINDI LANGUAGE USING RESIDUA...
 
240115_Attention Is All You Need (2017 NIPS).pptx
240115_Attention Is All You Need (2017 NIPS).pptx240115_Attention Is All You Need (2017 NIPS).pptx
240115_Attention Is All You Need (2017 NIPS).pptx
 
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
Suggestion Generation for Specific Erroneous Part in a Sentence using Deep Le...
 

Recently uploaded

Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
JohnnyPlasten
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
amitlee9823
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
shambhavirathore45
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
shivangimorya083
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
shivangimorya083
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
Lars Albertsson
 

Recently uploaded (20)

VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
Chintamani Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore ...
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...Vip Model  Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
Vip Model Call Girls (Delhi) Karol Bagh 9711199171✔️Body to body massage wit...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 

short_story.pptx

  • 1. DEEP LEARNING-BASED LANGUAGE MODELS USING MULTI-TASK LEARNING IN NATURAL LANGUAGE UNDERSTANDING: A SYSTEMATIC LITERATURE REVIEW AND FUTURE DIRECTIONS BY: SANJAY BHARGAV MADAMANCHI SJSU ID: 016421587
  • 2. ABSTRACT • Learning a new language is difficult for all and its even more difficult for a computer to learn and process human language. But due to the recent techniques in Deep Learning(DL) Natural Language Processing(NLP) tasks are enhanced significantly, but these models cannot be entirely generated using NLP models. In order to meet the latest trends Natural Language Understanding(NLU) a subfield of NLP is emerged. NLU tasks include things like machine translation, text entailment, dialogue based systems, natural language inference, sentiment analysis. The advancement in the field of NLU can enhance the development in these models.
  • 3. INTRODUCTION • NLU is the emerging nowadays due to the increase in the GPT models , it mainly concentrates on analyzing and extracting information from human language text. NLU tasks include information-retrieval, summarization, language translation, classification. NLU aims to attain the task proficiency for tasks contained in standard benchmarking datasets like GLUE (General Language Understanding Evaluation) and superGLUE. (Super GLUE).
  • 4. METHODOLOGY • The methodology mainly consists of key-phrases required to search results, inclusion and exclusion criteria, selection results, quality assessments extraction and data synthesis. The below table shows the evolution of models in NLU
  • 5.
  • 6. BACKGROUND • NLU includes building language models training them, testing them for accuracy. This section contains the text classification tasks used in this paper. there are two types of QA tasks QA extractive and generative QA, we are only considering extractive QA in this part. NLI is used to predict whether we can predict meaning of one text from other. neural machine translation is used as a process to translate text by simulating human brain capabilities, the main goal of this part is to retain the meaning and intent of the language while translating if from one to other
  • 7.
  • 9. FEED FORWARD NETWORK BASED MODELS • Simple DL models for text representation include feedforward networks. Despite this, they have a good level of accuracy on several TC benchmarks. Text is viewed as a collection of words in these models. These models acquire a vector representation for each word by word2vec. These are popular embeddings models .Joulin et al. [23] introduced another classifier called fastText. It is efficient and straightforward
  • 10. RNN-BASED MODELS • Usually, the text is treated as an order of words in RNNbased models. The basic purpose of an RNN-based model for text categorization is to capture word relationships between sentences and text structure. Plain RNN-based models, on the other hand, do not perform as well as standard feed-forward neural networks.
  • 11. MODELS BASED ON CNN • CNNs are taught to identify patterns in space, while RNNs are trained to detect patterns over time [30]. RNNs perform well in NLP tasks like RQA-POS tagging, which need an understanding of long-range semantics, but CNN performs well in situations where sensing local and location-independent patterns in the document is critical.
  • 12. CAPSULE NEURAL NETWORK-BASED MODELS • CNN uses several layers of convolutions and pooling to classify pictures or text. Pooling operations detect significant features and minimize the computation complexity of convolution processes, but they miss spatial information and may misclassify items depending on their orientation or proportion.
  • 13. MODELS WITH MECHANISM OF ATTENTION • The way one pays attention to distinct sections of a photograph or related words of a single sentence motivates attention. Attention is becoming a central concept and tool in developing DL models for NLP [45]. It can be thought of as a vector of significant weights in a nutshell.
  • 14. MODELS BASED ON GRAPH NEURAL NETWORKS • Even though ordinary texts have a serial order, they also comprise inherent graph structures similar to parse trees that speculate the relationships based on syntax and semantics of the sentences.
  • 15. MODELS WITH HYBRID TECHNIQUES • Many hybrid models have been built to detect global and local documents by combining LSTM and CNN architectures.
  • 16. MODELS BASED ON TRANSFORMERS • The sequential processing of text is one of the computational obstacles that RNNs face. Even though RNNs are more sequential than CNNs, the computing cost of capturing associations among words in a phrase climb with the length of the sentence, much like CNNs.
  • 17.
  • 18. DISCUSSIONS AND LIMITATION • Large number of research papers are considered for this aspect in building large language models in DL
  • 19.
  • 20. • The above mentioned framework is proposed for creating NLU models in future, it is mainly done considering BERT models. The models has two parts multi-tasking and pre training, the proposed framework should work well as it combines multiple techniques. • Current models employ text classification tasks due scarcity of literature and research in multi tasking DL models in NLU, and availability of large number of models makes it difficult to find the suitable model and that meets the requirements
  • 21. CONCLUSION • Majority of issues faced by multi taking learning are same and the findings suggest that a hybrid model that contains strategies from Multi task learning and active learning , still there is lot of scope to figure out how to improve accuracy and resilient MTL models for general AI and next gen models
  • 22. REFERENCES • [23] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov, ``FastText.Zip: Compressing text classi cation models,’’ 2016, arXiv:1612.03651. • [30] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, ``Gradient-based learning applied to document recognition,'' Proc. IEEE, vol. 86, no. 11, pp. 2278 2324, Nov. 1998. • [45] D. Bahdanau, K. Cho, and Y. Bengio, ``Neural machine translation by jointly learning to align and translate,'' 2014, arXiv:1409.0473. • https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9706456