Contextual AI - Leveraging RNNs, LSTMs, and Transformers for Predictive Text Generation
IT582 - Foundations of Machine Learning
Members:
Neel Milind Bokil - 202411027
Monson Reji Verghese - 202411039
Preet Rahul Shah - 202411053
Faculty: M. V. Joshi Sir
So... What is Predictive Text Input?
Predictive text input suggests words or phrases as you type to help you complete sentences faster.
As you begin typing a word or sentence, the system analyzes the characters and context to predict what you are likely to type next. It then offers suggestions you can choose from, helping you complete your message more quickly. A minimal sketch of this idea appears below.
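As an illustration only, here is a minimal sketch of next-word suggestion built on a pretrained GPT-2 language model via the Hugging Face transformers library. The model checkpoint and the top-k cut-off are illustrative assumptions, not part of the project specification.

```python
# Minimal next-word suggestion sketch (illustrative; model choice is an assumption).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def suggest_next_words(prefix: str, k: int = 3):
    """Return the k most likely next tokens for the text typed so far."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits              # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]                # scores for the next token
    top = torch.topk(next_token_logits, k)
    return [tokenizer.decode(idx.item()).strip() for idx in top.indices]

print(suggest_next_words("I will see you"))          # e.g. short words like 'in', 'at', 'on'
```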
Problem Statement
Predictive text generation systems face challenges in balancing performance and
efficiency, managing high model complexity, and ensuring interpretability. Our
project addresses these issues by integrating RNNs, LSTMs, and transformers to
develop a more efficient, interpretable, and high-performing text generation
system.
Balancing Performance and Efficiency
Managing Model Complexity
Enhancing Interpretability
Creating a High-Performing System
Motivation
Predictive text generation systems are integral to numerous applications, from
automated content creation to conversational agents. Despite their
advancements, these systems face significant challenges in balancing
performance, complexity, and interpretability. Traditional Recurrent Neural
Networks (RNNs) and Long Short-Term Memory (LSTM) networks, while
foundational, often fall short in capturing long-range dependencies and
complex contextual relationships within text. This limitation impedes their
effectiveness in generating coherent and contextually relevant text.
Literature Review
[1] CNO-LSTM: A Chaotic Neural Oscillatory Long Short-
Term Memory Model for Text Classification.
Model Introduction:
The CNO-LSTM model enhances traditional LSTM architecture by
integrating chaotic neural oscillation principles.
Designed specifically for text classification tasks, improving the model's ability
to capture complex patterns and dependencies in text data.
Evaluation Datasets:
20 Newsgroups: ~20,000 documents across 20 categories.
Reuters-21578: Thousands of news documents categorized into 135 topics.
IMDb: Movie reviews labeled for sentiment analysis.
IEEE Access
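For orientation, the sketch below shows a plain LSTM text classifier in PyTorch, the baseline architecture that CNO-LSTM extends. The chaotic neural oscillation component from the paper is not reproduced here, and all layer sizes are illustrative assumptions.

```python
# Plain LSTM text classifier (baseline only; CNO components NOT included).
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=256, num_classes=20):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                   # class logits, e.g. for 20 Newsgroups

logits = LSTMClassifier()(torch.randint(0, 20000, (4, 50)))
print(logits.shape)                               # torch.Size([4, 20])
```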
[2] Attention Is All You Need
Transformer: Novel architecture for sequence-to-sequence tasks that relies solely on
self-attention mechanisms.
Encoder: Consists of a stack of identical layers (6 in the original model). Each layer has
two sub-layers:
Multi-head self-attention mechanism
Feed-forward neural network
Decoder: Also consists of a stack of identical layers (6 in the original model). Each layer
has three sub-layers:
Multi-head self-attention mechanism
Encoder-decoder attention mechanism
Feed-forward neural network
Positional Encoding: Added to the input embeddings to retain the positional
information of tokens.
NIPS
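The core operation behind the multi-head self-attention sub-layers listed above is scaled dot-product attention. A minimal sketch, with illustrative tensor shapes:

```python
# Scaled dot-product attention, the building block of multi-head self-attention.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k) tensors."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)               # attention weights
    return weights @ v, weights

q = k = v = torch.randn(2, 8, 10, 64)                     # batch=2, 8 heads, seq=10, d_k=64
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)                              # (2, 8, 10, 64) (2, 8, 10, 10)
```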
[3] BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding
NAACL
Model Introduction:
BERT is a deep bidirectional transformer model designed for language understanding.
Pre-trains on unlabeled text by jointly conditioning on left and right context in all
layers.
Unlike unidirectional models (e.g., GPT), BERT uses both previous and next words for
better context.
Evaluation Datasets:
GLUE: BERT achieved a score of 80.5, which was a 7.7 point absolute improvement
over the previous state-of-the-art.
MultiNLI: BERT reached an accuracy of 86.7%, representing a 4.6% absolute
improvement.
SQuAD v1.1: BERT obtained a Test F1 score of 93.2, a 1.5 point improvement.
SQuAD v2.0: BERT achieved a Test F1 score of 83.1, improving by 5.1 points.
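BERT's bidirectional conditioning is easiest to see through masked-word prediction. A minimal sketch using the Hugging Face fill-mask pipeline with the public bert-base-uncased checkpoint (an assumed, illustrative choice):

```python
# BERT masked-word prediction: the model uses words on BOTH sides of [MASK].
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The movie was [MASK] and I would watch it again.")[:3]:
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```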
Preferred Datasets
SQuAD 2.0 (The Stanford Question Answering Dataset)
WikiText-103
Large Movie Review Dataset v1.0 (IMDb)
AG News (AG’s News Corpus)
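One way to obtain these datasets is through the Hugging Face datasets library; the hub identifiers below are assumptions about which public versions correspond to the datasets named above.

```python
# Loading the preferred datasets (hub identifiers are assumed public versions).
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")                         # SQuAD 2.0
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")  # WikiText-103
imdb     = load_dataset("imdb")                             # Large Movie Review Dataset v1.0
ag_news  = load_dataset("ag_news")                          # AG's News Corpus

print({name: ds["train"].num_rows for name, ds in
       [("squad_v2", squad_v2), ("wikitext", wikitext),
        ("imdb", imdb), ("ag_news", ag_news)]})
```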
Expected results and conclusions
Improved Accuracy: Integrating Transformers with LSTMs should enhance
predictive text generation accuracy.
Reduced Complexity and Better Task Performance: Optimization methods like pruning will likely lower model complexity while maintaining accuracy.
Increased Interpretability: Visualizing attention will make it easier to understand how the model makes decisions (see the sketch after this list).
Comparative Analysis among different models.
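As a minimal sketch of the interpretability goal, the snippet below extracts and plots attention weights from a pretrained BERT checkpoint. The checkpoint, the example sentence, and the choice of layer/head are illustrative assumptions.

```python
# Extracting attention weights for a simple interpretability heatmap (illustrative).
import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("Predictive text saves typing time.", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions       # tuple: one (1, heads, seq, seq) per layer

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_weights = attentions[-1][0, 0].numpy()       # last layer, first head

plt.imshow(head_weights, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title("BERT attention (last layer, head 0)")
plt.tight_layout()
plt.savefig("attention_heatmap.png")
```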
Tentative Timeline of the Project
Week 1: Studying research papers in detail
Week 2: Study specified models
Week 3-4: Data collection/preprocessing and Model Selection
Week 5-6: Training and Fine-Tuning
Week 7-9: Implementation
Week 10: UI Interface
Conclusion
References
[1] Shi, N., Chen, Z., Chen, L., & Lee, R. (2022). CNO-LSTM: A Chaotic Neural Oscillatory Long Short-Term Memory Model for Text Classification. IEEE Access, 10, 129564–129579. https://doi.org/10.1109/ACCESS.2022.3228600
[2] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
[3] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
