Slide presentation on Deep Learning-Based Language Models Using Multi-Task Learning in Natural Language Understanding: A Systematic Literature Review and Future Directions
1. DEEP LEARNING-BASED LANGUAGE
MODELS USING MULTI-TASK LEARNING
IN NATURAL LANGUAGE
UNDERSTANDING: A SYSTEMATIC
LITERATURE REVIEW AND FUTURE
DIRECTIONS
BY:
SANJAY BHARGAV MADAMANCHI
SJSU ID: 016421587
2. ABSTRACT
• Learning a new language is difficult for anyone, and it is even more difficult for a
computer to learn and process human language. Recent techniques in Deep
Learning (DL) have significantly enhanced Natural Language Processing (NLP)
tasks, but these tasks cannot be handled entirely by traditional NLP models. To
meet the latest trends, Natural Language Understanding (NLU), a subfield of NLP,
has emerged. NLU tasks include machine translation, text entailment, dialogue-based
systems, natural language inference, and sentiment analysis. Advances in
the field of NLU can drive further development of these models.
3. INTRODUCTION
• NLU is an emerging area, driven by the rise of GPT models; it mainly
concentrates on analyzing and extracting information from human-language text.
NLU tasks include information retrieval, summarization, language translation, and
classification. NLU aims to attain proficiency on the tasks contained in
standard benchmark datasets such as GLUE (General Language Understanding
Evaluation) and SuperGLUE (a loading sketch for one GLUE task follows).
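• To make the benchmarks concrete, the snippet below loads one GLUE task with the Hugging Face datasets library. This is a minimal sketch, assuming datasets is installed; "glue" and "sst2" are the standard Hub identifiers, not artifacts from the reviewed papers.
```python
# Minimal sketch: loading a GLUE task with the Hugging Face `datasets` library.
# Assumes `pip install datasets`; "glue"/"sst2" are standard Hub identifiers.
from datasets import load_dataset

sst2 = load_dataset("glue", "sst2")          # sentiment classification task
print(sst2["train"][0])                      # {'sentence': ..., 'label': 0/1, 'idx': ...}
print(sst2["train"].features["label"].names) # ['negative', 'positive']
```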
4. METHODOLOGY
• The methodology consists of the key phrases used to search for results, the
inclusion and exclusion criteria, the selection results, quality assessment, and
data extraction and synthesis. The table below shows the evolution of models in NLU.
5. [Table: evolution of models in NLU]
6. BACKGROUND
• NLU involves building language models, training them, and testing them for accuracy.
This section covers the text classification tasks used in this paper. There are two
types of question answering (QA) tasks, extractive QA and generative QA; only
extractive QA is considered in this part. Natural language inference (NLI) is used to
predict whether the meaning of one text can be inferred from another (a small NLI
example follows). Neural machine translation is a process that translates text by
simulating human brain capabilities; the main goal of this part is to retain the
meaning and intent of the language while translating it from one language to another.
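• To illustrate the NLI task, the sketch below scores a premise–hypothesis pair with a pretrained NLI model. This is a minimal sketch assuming the transformers library and the publicly available roberta-large-mnli checkpoint; it is an illustration, not a model from the reviewed papers.
```python
# Minimal NLI sketch: does the premise entail the hypothesis?
# Assumes `pip install transformers torch`; "roberta-large-mnli" is a
# public checkpoint used purely for illustration.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("roberta-large-mnli")

premise = "A soccer game with multiple males playing."
hypothesis = "Some men are playing a sport."

inputs = tokenizer(premise, hypothesis, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # [contradiction, neutral, entailment]
print(model.config.id2label[logits.argmax(-1).item()])  # expected: ENTAILMENT
```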
9. FEED FORWARD NETWORK BASED MODELS
• Feed-forward networks are among the simplest DL models for text representation.
Despite this, they achieve a good level of accuracy on several TC benchmarks. These
models view text as a bag of words and acquire a vector representation for each
word using word2vec, a popular embedding model. Joulin et al. [23] introduced
another classifier of this type, fastText, which is efficient and straightforward
(sketched below).
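• The snippet below is a minimal sketch of a fastText-style classifier in the spirit of [23]: average the word embeddings of a text and apply a single linear layer. The vocabulary size, embedding dimension, and class count are illustrative assumptions.
```python
# Minimal sketch of a fastText-style classifier: average word embeddings,
# then one linear layer. Hyperparameters are illustrative.
import torch
import torch.nn as nn

class FastTextClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, num_classes=2):
        super().__init__()
        # EmbeddingBag averages the embeddings of all tokens in a text.
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, mode="mean")
        self.fc = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids, offsets):
        return self.fc(self.embedding(token_ids, offsets))

model = FastTextClassifier()
tokens = torch.tensor([4, 17, 256, 9, 33])   # two texts, flattened
offsets = torch.tensor([0, 3])               # text 1: tokens[0:3], text 2: tokens[3:]
print(model(tokens, offsets).shape)          # torch.Size([2, 2])
```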
10. RNN-BASED MODELS
• RNN-based models usually treat text as an ordered sequence of words. The basic
purpose of an RNN-based model for text categorization is to capture word
relationships within sentences and the structure of the text. Plain RNN-based models,
on the other hand, do not perform as well as standard feed-forward neural networks
(an LSTM variant is sketched below).
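• The sketch below shows one common RNN variant, an LSTM text classifier: embed the tokens, run the LSTM, and classify from the final hidden state. The sizes and the final-state pooling are illustrative choices, not a specific model from the review.
```python
# Minimal sketch of an LSTM text classifier. Sizes are illustrative.
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                   # (batch, num_classes)

model = LSTMClassifier()
print(model(torch.randint(0, 30000, (4, 20))).shape)  # torch.Size([4, 2])
```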
11. MODELS BASED ON CNN
• CNNs are trained to identify patterns in space, while RNNs are trained to detect
patterns over time [30]. RNNs perform well in NLP tasks like QA and POS tagging,
which need an understanding of long-range semantics, but CNNs perform well in
situations where sensing local and location-independent patterns in the document
is critical (see the sketch below).
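• The snippet below is a minimal sketch of a common CNN-for-text pattern: parallel 1-D convolutions with different kernel widths, max-pooled over time and concatenated. The filter counts and kernel sizes are illustrative assumptions.
```python
# Minimal sketch of a CNN text classifier: parallel 1-D convolutions
# with different kernel widths, max-pooled and concatenated.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CNNTextClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, num_filters=64,
                 kernel_sizes=(3, 4, 5), num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, num_filters, k) for k in kernel_sizes
        )
        self.fc = nn.Linear(num_filters * len(kernel_sizes), num_classes)

    def forward(self, token_ids):                      # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)  # (batch, embed, seq)
        pooled = [F.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))

model = CNNTextClassifier()
print(model(torch.randint(0, 30000, (4, 20))).shape)   # torch.Size([4, 2])
```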
12. CAPSULE NEURAL NETWORK-BASED MODELS
• CNNs use several layers of convolution and pooling to classify pictures or text.
Pooling operations detect significant features and minimize the computational
complexity of the convolution process, but they lose spatial information and may
misclassify items depending on their orientation or proportions. Capsule networks
address this by replacing scalar feature detectors with vector "capsules" that
preserve such spatial relationships (one standard building block is sketched below).
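• The snippet below is a minimal sketch of one standard component from the capsule-network literature, the "squash" nonlinearity: it shrinks a capsule's output vector so its length lies in (0, 1) while preserving its direction. This is a single building block, not a full capsule classifier.
```python
# Capsule "squash" nonlinearity: v = (||s||^2 / (1 + ||s||^2)) * (s / ||s||)
import torch

def squash(s, dim=-1, eps=1e-8):
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / torch.sqrt(sq_norm + eps)

capsules = torch.randn(4, 10, 16)      # (batch, num_capsules, capsule_dim)
v = squash(capsules)
print(v.norm(dim=-1).max() < 1.0)      # tensor(True): all lengths below 1
```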
13. MODELS WITH MECHANISM OF ATTENTION
• Attention is motivated by the way one pays attention to distinct regions of a
photograph or to related words in a single sentence. Attention has become a central
concept and tool in developing DL models for NLP [45]. In a nutshell, it can be
thought of as a vector of importance weights (sketched below).
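• The snippet below is a minimal sketch of scaled dot-product attention, the most common modern form: the weight vector is a softmax over query–key similarities, used to mix the value vectors. The tensor sizes are illustrative.
```python
# Minimal sketch of scaled dot-product attention.
import math
import torch
import torch.nn.functional as F

def attention(q, k, v):
    # q: (batch, L_q, d), k/v: (batch, L_k, d)
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))  # (batch, L_q, L_k)
    weights = F.softmax(scores, dim=-1)    # the "importance weights"
    return weights @ v, weights

q = torch.randn(2, 5, 64)
k = v = torch.randn(2, 7, 64)
out, w = attention(q, k, v)
print(out.shape, w.shape)  # torch.Size([2, 5, 64]) torch.Size([2, 5, 7])
```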
14. MODELS BASED ON GRAPH NEURAL NETWORKS
• Even though ordinary text has a sequential order, it also contains inherent graph
structures, such as parse trees, that capture relationships based on the syntax
and semantics of the sentences (a single graph-convolution layer is sketched below).
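• The snippet below is a minimal sketch of one graph-convolution layer over a word graph: each node (word) aggregates its neighbors' features through an adjacency matrix, then applies a shared linear map. The mean-over-neighbors normalization is one simple variant, chosen for brevity.
```python
# Minimal sketch of one GCN-style layer over a word graph.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        # adj: (num_nodes, num_nodes) adjacency with self-loops
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return torch.relu(self.linear(adj @ x / deg))  # mean over neighbors

num_words, feat_dim = 6, 100
x = torch.randn(num_words, feat_dim)                   # word features
adj = torch.eye(num_words)                             # self-loops
adj[0, 1] = adj[1, 0] = 1.0                            # one syntactic edge
print(GCNLayer(feat_dim, 64)(x, adj).shape)            # torch.Size([6, 64])
```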
15. MODELS WITH HYBRID TECHNIQUES
• Many hybrid models have been built to capture both global and local features of
documents by combining LSTM and CNN architectures (sketched below).
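• The snippet below is a minimal sketch of the hybrid idea: a CNN extracts local n-gram features, then an LSTM models their global order. The sizes and layer choices are illustrative, not a specific model from the review.
```python
# Minimal sketch of a hybrid CNN + LSTM text classifier.
import torch
import torch.nn as nn

class CNNLSTMClassifier(nn.Module):
    def __init__(self, vocab_size=30000, embed_dim=100, num_filters=64,
                 hidden_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.conv = nn.Conv1d(embed_dim, num_filters, kernel_size=3, padding=1)
        self.lstm = nn.LSTM(num_filters, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, embed, seq)
        x = torch.relu(self.conv(x)).transpose(1, 2)    # (batch, seq, filters)
        _, (h_n, _) = self.lstm(x)
        return self.fc(h_n[-1])

model = CNNLSTMClassifier()
print(model(torch.randint(0, 30000, (4, 20))).shape)    # torch.Size([4, 2])
```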
16. MODELS BASED ON TRANSFORMERS
• The sequential processing of text is one of the computational obstacles that
RNNs face. Although RNNs are more sequential than CNNs, for both architectures
the computational cost of capturing associations among words in a phrase climbs
with the length of the sentence. Transformers address this by processing all
positions in parallel with self-attention (sketched below).
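• The snippet below is a minimal sketch of a Transformer-encoder classifier using PyTorch's built-in encoder layers: self-attention over all positions at once, then mean-pooling and a classification layer. It is an illustration, not a specific model from the review.
```python
# Minimal sketch of a Transformer-encoder text classifier.
import torch
import torch.nn as nn

class TransformerClassifier(nn.Module):
    def __init__(self, vocab_size=30000, d_model=128, nhead=4,
                 num_layers=2, num_classes=2, max_len=512):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)       # learned positions
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers)
        self.fc = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.embedding(token_ids) + self.pos(positions)
        x = self.encoder(x)                             # (batch, seq, d_model)
        return self.fc(x.mean(dim=1))

model = TransformerClassifier()
print(model(torch.randint(0, 30000, (4, 20))).shape)    # torch.Size([4, 2])
```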
18. DISCUSSION AND LIMITATIONS
• A large number of research papers were considered for this aspect of building
large language models with DL.
19. [Figure: proposed framework for multi-task NLU models]
20. • The framework shown above is proposed for creating future NLU models; it
is designed mainly around BERT models. The framework has two parts, multi-tasking
and pre-training, and it should work well because it combines multiple
techniques (a minimal sketch of the shared-encoder idea follows this list).
• Current models employ text classification tasks due to the scarcity of literature
and research on multi-task DL models in NLU, and the availability of a large
number of models makes it difficult to find a suitable model that meets the
requirements.
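• The snippet below is a minimal sketch of the multi-task idea only: one shared pretrained encoder with a separate head per task. The encoder is BERT via the transformers library (assumed installed); the two task heads and their label counts are hypothetical illustrations, not the paper's exact framework.
```python
# Minimal sketch: shared BERT encoder + per-task classification heads.
# The "sentiment" and "nli" heads are hypothetical examples.
import torch.nn as nn
from transformers import AutoModel

class MultiTaskModel(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)  # shared
        hidden = self.encoder.config.hidden_size
        self.heads = nn.ModuleDict({
            "sentiment": nn.Linear(hidden, 2),   # hypothetical task head
            "nli": nn.Linear(hidden, 3),         # hypothetical task head
        })

    def forward(self, task, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        cls = out.last_hidden_state[:, 0]        # [CLS] representation
        return self.heads[task](cls)

# usage (hypothetical batch):
# logits = model("sentiment", input_ids, attention_mask)
```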
21. CONCLUSION
• The majority of issues faced by multi-task learning are shared across tasks, and
the findings suggest a hybrid model that combines strategies from multi-task
learning and active learning. There is still considerable scope for improving the
accuracy and resilience of MTL models for general AI and next-generation models.
22. REFERENCES
• [23] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, and T. Mikolov,
"FastText.zip: Compressing text classification models," 2016, arXiv:1612.03651.
• [30] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning
applied to document recognition," Proc. IEEE, vol. 86, no. 11, pp. 2278–2324,
Nov. 1998.
• [45] D. Bahdanau, K. Cho, and Y. Bengio, "Neural machine translation by jointly
learning to align and translate," 2014, arXiv:1409.0473.
• https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9706456