Contextual AI - Leveraging RNNs, LSTMs, and Transformers for Predictive Text Generation
IT582 - Foundations of Machine Learning
Members:
Neel Milind Bokil - 202411027
Monson Reji Verghese - 202411039
Preet Rahul Shah - 202411053
Faculty: M. V. Joshi Sir
So... What is Predictive Text Input?
Predictive text input suggests words or phrases as you type to help you complete sentences faster.
As you begin typing a word or sentence, the system analyzes the characters and context to predict what you are likely to type next. It then offers suggestions you can choose from, helping you complete your message more quickly. A minimal sketch of this idea appears below.
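As an illustration only, here is a minimal sketch of next-word suggestion built on a pretrained GPT-2 language model via the Hugging Face transformers library. The model checkpoint and the top-k cut-off are illustrative assumptions, not part of the project specification.

```python
# Minimal next-word suggestion sketch (illustrative; model choice is an assumption).
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def suggest_next_words(prefix: str, k: int = 3):
    """Return the k most likely next tokens for the text typed so far."""
    inputs = tokenizer(prefix, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits              # (1, seq_len, vocab_size)
    next_token_logits = logits[0, -1]                # scores for the next token
    top = torch.topk(next_token_logits, k)
    return [tokenizer.decode(idx.item()).strip() for idx in top.indices]

print(suggest_next_words("I will see you"))          # e.g. short words like 'in', 'at', 'on'
```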
Problem Statement
Predictive text generation systems face challenges in balancing performance and
efficiency, managing high model complexity, and ensuring interpretability. Our
project addresses these issues by integrating RNNs, LSTMs, and transformers to
develop a more efficient, interpretable, and high-performing text generation
system.
Balancing Performance and Efficiency
Managing Model Complexity
Enhancing Interpretability
Creating a High-Performing System
Motivation
Predictive text generation systems are integral to numerous applications, from
automated content creation to conversational agents. Despite their
advancements, these systems face significant challenges in balancing
performance, complexity, and interpretability. Traditional Recurrent Neural
Networks (RNNs) and Long Short-Term Memory (LSTM) networks, while
foundational, often fall short in capturing long-range dependencies and
complex contextual relationships within text. This limitation impedes their
effectiveness in generating coherent and contextually relevant text.
Literature Review
[1] CNO-LSTM: A Chaotic Neural Oscillatory Long Short-
Term Memory Model for Text Classification.
Model Introduction:
The CNO-LSTM model enhances traditional LSTM architecture by
integrating chaotic neural oscillation principles.
Designed specifically for text classification tasks, improving the model's ability
to capture complex patterns and dependencies in text data.
Evaluation Datasets:
20 Newsgroups: ~20,000 documents across 20 categories.
Reuters-21578: Thousands of news documents categorized into 135 topics.
IMDb: Movie reviews labeled for sentiment analysis.
IEEE Access
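For orientation, the sketch below shows a plain LSTM text classifier in PyTorch, the baseline architecture that CNO-LSTM extends. The chaotic neural oscillation component from the paper is not reproduced here, and all layer sizes are illustrative assumptions.

```python
# Plain LSTM text classifier (baseline only; CNO components NOT included).
import torch
import torch.nn as nn

class LSTMClassifier(nn.Module):
    def __init__(self, vocab_size=20000, embed_dim=128, hidden_dim=256, num_classes=20):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, num_classes)

    def forward(self, token_ids):                 # token_ids: (batch, seq_len)
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (h_n, _) = self.lstm(embedded)         # h_n: (1, batch, hidden_dim)
        return self.fc(h_n[-1])                   # class logits, e.g. for 20 Newsgroups

logits = LSTMClassifier()(torch.randint(0, 20000, (4, 50)))
print(logits.shape)                               # torch.Size([4, 20])
```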
[2] Attention Is All You Need
Transformer: Novel architecture for sequence-to-sequence tasks that relies solely on
self-attention mechanisms.
Encoder: Consists of a stack of identical layers (6 in the original model). Each layer has
two sub-layers:
Multi-head self-attention mechanism
Feed-forward neural network
Decoder: Also consists of a stack of identical layers (6 in the original model). Each layer
has three sub-layers:
Multi-head self-attention mechanism
Encoder-decoder attention mechanism
Feed-forward neural network
Positional Encoding: Added to the input embeddings to retain the positional
information of tokens.
NIPS
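The core operation behind the multi-head self-attention sub-layers listed above is scaled dot-product attention. A minimal sketch, with illustrative tensor shapes:

```python
# Scaled dot-product attention, the building block of multi-head self-attention.
import math
import torch

def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (batch, heads, seq_len, d_k) tensors."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)    # (batch, heads, seq, seq)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)               # attention weights
    return weights @ v, weights

q = k = v = torch.randn(2, 8, 10, 64)                     # batch=2, 8 heads, seq=10, d_k=64
out, attn = scaled_dot_product_attention(q, k, v)
print(out.shape, attn.shape)                              # (2, 8, 10, 64) (2, 8, 10, 10)
```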
[3] BERT: Pre-training of Deep Bidirectional
Transformers for Language Understanding
NAACL
Model Introduction:
BERT is a deep bidirectional transformer model designed for language understanding.
Pre-trains on unlabeled text by jointly conditioning on left and right context in all
layers.
Unlike unidirectional models (e.g., GPT), BERT uses both previous and next words for
better context.
Evaluation Datasets:
GLUE: BERT achieved a score of 80.5, which was a 7.7 point absolute improvement
over the previous state-of-the-art.
MultiNLI: BERT reached an accuracy of 86.7%, representing a 4.6% absolute
improvement.
SQuAD v1.1: BERT obtained a Test F1 score of 93.2, a 1.5 point improvement.
SQuAD v2.0: BERT achieved a Test F1 score of 83.1, improving by 5.1 points.
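BERT's bidirectional conditioning is easiest to see through masked-word prediction. A minimal sketch using the Hugging Face fill-mask pipeline with the public bert-base-uncased checkpoint (an assumed, illustrative choice):

```python
# BERT masked-word prediction: the model uses words on BOTH sides of [MASK].
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")

for prediction in fill_mask("The movie was [MASK] and I would watch it again.")[:3]:
    print(f"{prediction['token_str']:>10}  score={prediction['score']:.3f}")
```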
Preferred Datasets
SQuAD 2.0 (The Stanford Question Answering Dataset)
WikiText-103
Large Movie Review Dataset v1.0 (IMDb)
AG News (AG’s News Corpus)
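One way to obtain these datasets is through the Hugging Face datasets library; the hub identifiers below are assumptions about which public versions correspond to the datasets named above.

```python
# Loading the preferred datasets (hub identifiers are assumed public versions).
from datasets import load_dataset

squad_v2 = load_dataset("squad_v2")                         # SQuAD 2.0
wikitext = load_dataset("wikitext", "wikitext-103-raw-v1")  # WikiText-103
imdb     = load_dataset("imdb")                             # Large Movie Review Dataset v1.0
ag_news  = load_dataset("ag_news")                          # AG's News Corpus

print({name: ds["train"].num_rows for name, ds in
       [("squad_v2", squad_v2), ("wikitext", wikitext),
        ("imdb", imdb), ("ag_news", ag_news)]})
```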
Expected results and conclusions
Improved Accuracy: Integrating Transformers with LSTMs should enhance
predictive text generation accuracy.
Reduced Complexity and Better Task Performance: Optimization methods like pruning will likely lower model complexity while maintaining accuracy.
Increased Interpretability: Visualizing attention will make it easier to understand how the model makes decisions (see the sketch after this list).
Comparative Analysis among different models.
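As a minimal sketch of the interpretability goal, the snippet below extracts and plots attention weights from a pretrained BERT checkpoint. The checkpoint, the example sentence, and the choice of layer/head are illustrative assumptions.

```python
# Extracting attention weights for a simple interpretability heatmap (illustrative).
import torch
import matplotlib.pyplot as plt
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased", output_attentions=True)
model.eval()

inputs = tokenizer("Predictive text saves typing time.", return_tensors="pt")
with torch.no_grad():
    attentions = model(**inputs).attentions       # tuple: one (1, heads, seq, seq) per layer

tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
head_weights = attentions[-1][0, 0].numpy()       # last layer, first head

plt.imshow(head_weights, cmap="viridis")
plt.xticks(range(len(tokens)), tokens, rotation=90)
plt.yticks(range(len(tokens)), tokens)
plt.title("BERT attention (last layer, head 0)")
plt.tight_layout()
plt.savefig("attention_heatmap.png")
```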
Tentative Timeline of the Project
Week 1: Studying research papers in detail
Week 2: Study specified models
Week 3-4: Data collection/preprocessing and Model Selection
Week 5-6: Training and Fine-Tuning
Week 7-9: Implementation
Week 10: UI Interface
Conclusion
References
[1] Shi, N., Chen, Z., Chen, L., & Lee, R. (2022). CNO-LSTM: A Chaotic Neural Oscillatory Long Short-Term Memory Model for Text Classification. IEEE Access, 10, 129564–129579. https://doi.org/10.1109/ACCESS.2022.3228600
[2] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems, 30.
[3] Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 4171–4186, Minneapolis, Minnesota. Association for Computational Linguistics.
