Introduction to Deep Learning
for Language Processing
16/6/2025
Dr. Resmi N.G.
Senior Assistant Professor
Chinmaya Vishwa Vidyapeeth Deemed-to-be University
Contents
1. Introduction to NLP
2. Key Deep Learning Architectures for NLP
3. Evaluating Deep Learning Models
4. Future Trends in Deep Learning for NLP
Introduction to NLP
Core Concepts of Language Processing
01
https://commtelnetworks.com/exploring-the-impact-of-natural-language-processing-on-cni-operations/
History of NLP
https://blog.dataiku.com/nlp-metamorphosis
IBM and Georgetown University's Project (1954)
Onset of Natural Language Processing
The first public demonstration of machine translation: the Georgetown-IBM system, 7th January
1954 (https://open.unive.it/hitrade/books/HutchinsFirst.pdf)
ELIZA (1966): Rule-Based Models
Joseph Weizenbaum's chatbot at MIT.
Pattern matching for simulating conversations.
https://dataproducts.io/introduction-to-natural-language-processing/
SHRDLU (1970)
Terry Winograd's software at MIT.
Understanding language in a confined virtual environment.
https://dataproducts.io/introduction-to-natural-language-processing/
Hidden Markov Models (HMMs) (1971): Shift to Statistical Approaches
Probabilistic approach for identifying word roles (e.g., part-of-speech tagging).
https://medium.com/@postsanjay/hidden-markov-models-simplified-c3f58728caab
02
Key Deep Learning
Architectures for NLP
Definition and Importance
Deep learning is a subset of machine learning that uses multi-layered artificial neural networks, loosely inspired by the human brain, to learn from data.
Its importance lies in its ability to learn hierarchical features from data, enabling more sophisticated models.
History and Evolution
Deep learning has evolved from early
neural networks developed in the
1950s.
Significant breakthroughs occurred
in the 2000s with the introduction of
more robust algorithms and
increased computational power,
paving the way for its current
applications.
Overview of Deep Learning
01
Neural Network Architecture
Neural network architecture refers to the design and structure of a neural network (its layers, nodes, and connections), which enables it to learn complex patterns in data.
Word Embeddings
Word embeddings are dense vector representations of words that capture semantic meanings and relationships, allowing models to better process and understand natural language (see the code sketch after this slide).
02
Key Concepts in Deep Learning for NLP
Image created by ChatGPT
https://www.prepvector.com/blog/nlp-from-theory-to-practice
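To make the idea of word embeddings concrete, here is a minimal sketch that loads pre-trained GloVe vectors through gensim's downloader API and probes similarity and analogy relations. It assumes gensim is installed; "glove-wiki-gigaword-50" is one of gensim's downloadable pre-trained vector sets and is fetched on first run.

```python
# Minimal sketch: exploring pre-trained word embeddings with gensim.
# Assumes gensim is installed; "glove-wiki-gigaword-50" is downloaded on first run.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # KeyedVectors, 50-dimensional GloVe

# Each word maps to a dense 50-dimensional vector.
print(vectors["language"].shape)               # (50,)

# Cosine similarity reflects semantic relatedness.
print(vectors.similarity("king", "queen"))     # high
print(vectors.similarity("king", "banana"))    # low

# The classic analogy: king - man + woman is close to queen.
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```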
Traditional NLP Pipeline
https://ayselaydin.medium.com/1-text-preprocessing-techniques-for-nlp-37544483c007
E.g., https://platform.openai.com/tokenizer
Image created by Sora
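The traditional pipeline steps (lowercasing, tokenization, stop-word removal, stemming) can be sketched as below. This assumes nltk is installed and downloads its "stopwords" resource on first run; a simple regex stands in for a tokenizer to keep the sketch self-contained.

```python
# Minimal sketch of a traditional NLP preprocessing pipeline.
# Assumes nltk is installed; the 'stopwords' resource downloads on first run.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("stopwords", quiet=True)

text = "Natural Language Processing turns raw text into structured features."

tokens = re.findall(r"[a-z]+", text.lower())            # lowercase + tokenize
stop = set(stopwords.words("english"))
filtered = [t for t in tokens if t not in stop]         # stop-word removal
stemmed = [PorterStemmer().stem(t) for t in filtered]   # stemming

print(filtered)  # ['natural', 'language', 'processing', 'turns', 'raw', 'text', 'structured', 'features']
print(stemmed)   # ['natur', 'languag', 'process', 'turn', 'raw', 'text', 'structur', 'featur']
```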
Multilayer Perceptron (MLP), 1950s
https://medium.com/@rajan5787/recurrent-neural-networks-and-lstm-903862adb01
Recurrent Neural Networks (RNNs) 1986
Handling sequential dependencies.
Overcoming the short-range context limitations of fixed-window feedforward models, though long-range dependencies remain difficult due to vanishing gradients.
Learning representations by back-propagating errors (https://gwern.net/doc/ai/nn/1986-rumelhart-2.pdf) 1986
Finding Structure in Time (https://onlinelibrary.wiley.com/doi/epdf/10.1207/s15516709cog1402_1) 1990
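A minimal PyTorch sketch of an RNN reading a batch of token sequences, one hidden state per time step. It assumes PyTorch is installed; the vocabulary size, dimensions, and random batch are illustrative only.

```python
# Minimal sketch: an RNN encoder over token sequences in PyTorch.
# Vocabulary size, dimensions, and the toy batch are illustrative assumptions.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 1000, 32, 64

embedding = nn.Embedding(vocab_size, embed_dim)
rnn = nn.RNN(embed_dim, hidden_dim, batch_first=True)

token_ids = torch.randint(0, vocab_size, (4, 12))   # batch of 4 sequences, length 12
outputs, h_n = rnn(embedding(token_ids))

print(outputs.shape)  # (4, 12, 64): one hidden state per time step
print(h_n.shape)      # (1, 4, 64): final hidden state summarizing each sequence
```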
PageRank Algorithm (1996)
https://en.wikipedia.org/wiki/PageRank
https://www.analyticsvidhya.com/blog/2021/03/introduction-to-long-short-term-memory-lstm/
Long Short-Term Memory (LSTM), 1997
Long Short-Term Memory (https://www.bioinf.jku.at/publications/older/2604.pdf) 1997
Long Short-Term Memory (LSTM) networks incorporated memory cells and gates, effectively managing long-term dependencies and combating the vanishing gradient challenge in traditional RNNs.
Gated Recurrent Units (GRUs)
streamlined the LSTM architecture
by combining forget and input
gates into a single update gate,
maintaining performance while
increasing computational
efficiency.
LSTM: Addressing the Vanishing
Gradient Problem
GRU: A Simplified Alternative
Long Short-Term Memory (LSTM) & Gated Recurrent Unit (GRU)
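A minimal PyTorch sketch contrasting the two layers; all sizes and the random input batch are illustrative assumptions. Note the parameter counts: the LSTM's four gates give it roughly 4/3 the parameters of the GRU's three.

```python
# Minimal sketch contrasting LSTM and GRU layers in PyTorch.
# Dimensions and the random batch are illustrative assumptions.
import torch
import torch.nn as nn

embed_dim, hidden_dim = 32, 64
x = torch.randn(4, 20, embed_dim)        # batch of 4 sequences, 20 time steps

lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)

# LSTM keeps both a hidden state and a separate cell state (the "memory cell").
out_lstm, (h_lstm, c_lstm) = lstm(x)

# GRU folds the gating into a single hidden state: fewer parameters, similar use.
out_gru, h_gru = gru(x)

print(out_lstm.shape, out_gru.shape)              # (4, 20, 64) each
print(sum(p.numel() for p in lstm.parameters()),  # LSTM has ~4/3 the
      sum(p.numel() for p in gru.parameters()))   # parameters of the GRU
```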
Sequence to Sequence Learning with Neural Networks (https://proceedings.neurips.cc/paper_files/paper/2014/file/5a18e133cbf9f257297f410bb7eca942-Paper.pdf) (2014)
Neural Machine Translation by Jointly Learning to Align and Translate (https://arxiv.org/abs/1409.0473) 2014
https://lena-voita.github.io/nlp_course/models/convolutional.html
Convolutional Neural Networks (CNNs)
Application in text classification and image processing.
Convolutional Neural Networks for Sentence Classification (https://arxiv.org/abs/1408.5882) 2014
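A minimal sketch of a Kim (2014)-style CNN for sentence classification in PyTorch; the vocabulary size, filter widths, and class count are illustrative assumptions.

```python
# Minimal sketch of a CNN for sentence classification (Kim, 2014 style).
# Vocabulary size, dimensions, and class count are illustrative assumptions.
import torch
import torch.nn as nn

class TextCNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=128, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Parallel convolutions act as n-gram detectors of widths 3, 4, 5.
        self.convs = nn.ModuleList(
            [nn.Conv1d(embed_dim, 100, kernel_size=k) for k in (3, 4, 5)]
        )
        self.classifier = nn.Linear(3 * 100, num_classes)

    def forward(self, token_ids):                       # (batch, seq_len)
        x = self.embedding(token_ids).transpose(1, 2)   # (batch, embed_dim, seq_len)
        # Max-over-time pooling keeps the strongest n-gram feature per filter.
        pooled = [torch.relu(conv(x)).max(dim=2).values for conv in self.convs]
        return self.classifier(torch.cat(pooled, dim=1))

logits = TextCNN()(torch.randint(0, 5000, (8, 40)))     # 8 sentences, 40 tokens each
print(logits.shape)                                     # (8, 2)
```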
https://research.google/blog/zero-shot-translation-with-googles-multilingual-neural-machine-translation-system/
Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation (https://arxiv.org/abs/1406.1078) (2014)
Google Neural Machine Translation (GNMT) 2016
The Transformer Revolution (2017)
(Attention Is All You Need- https://arxiv.org/abs/1706.03762)
Parallelization and Self-Attention
• Self-Attention Mechanism: Allows each token to attend to all others, capturing context across entire sequences without recurrence (sketched in the code below).
• Positional Encoding: Injects sequence order
into the model since transformers lack
recurrence.
• Parallelization Advantage: Enables efficient
training by processing all input tokens
simultaneously, unlike RNNs.
https://www.youtube.com/watch?v=wjZofJX0v4M
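A minimal sketch of single-head scaled dot-product self-attention plus sinusoidal positional encoding, in the spirit of "Attention Is All You Need"; the tensor sizes are illustrative and the projection matrices are random, untrained weights.

```python
# Minimal sketch: scaled dot-product self-attention (single head) and
# sinusoidal positional encoding. Sizes and weights are illustrative.
import math
import torch
import torch.nn.functional as F

def self_attention(x, w_q, w_k, w_v):
    """Every token attends to every other token in the sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))   # (seq, seq)
    weights = F.softmax(scores, dim=-1)    # attention distribution per token
    return weights @ v, weights

def positional_encoding(seq_len, d_model):
    """Inject order information, since attention alone is permutation-invariant."""
    pos = torch.arange(seq_len).unsqueeze(1).float()
    div = torch.exp(torch.arange(0, d_model, 2).float() * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return pe

seq_len, d_model = 6, 16
x = torch.randn(seq_len, d_model) + positional_encoding(seq_len, d_model)
w_q, w_k, w_v = (torch.randn(d_model, d_model) for _ in range(3))

context, attn = self_attention(x, w_q, w_k, w_v)
print(context.shape, attn.shape)   # (6, 16) (6, 6): all tokens processed in parallel
```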
Encoder-Decoder Architectures
Seq2Seq and Attention Mechanisms
• Sequence-to-Sequence (Seq2Seq): Encodes an
input sequence into a fixed vector, then decodes
it into an output sequence. Suited for translation
and summarization.
• Limitations of Fixed Vectors: Single vector
bottlenecks context retention for long
sequences, degrading performance.
• Attention Mechanism: Dynamically weights
input tokens at each decoding step, improving
alignment and translation accuracy.
https://spotintelligence.com/2023/09/28/sequence-to-sequence/
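A minimal sketch of one decoding step in an attention-based encoder-decoder, using simple dot-product attention rather than the original additive (Bahdanau) scoring; all sizes and inputs are illustrative assumptions.

```python
# Minimal sketch: one decoding step of an attention-based Seq2Seq model.
# Sizes and the random inputs are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

hidden_dim, src_len = 64, 10

encoder = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
decoder_cell = nn.GRUCell(2 * hidden_dim, hidden_dim)

src_embedded = torch.randn(1, src_len, hidden_dim)   # already-embedded source tokens
enc_outputs, enc_final = encoder(src_embedded)       # (1, src_len, hidden)

# Without attention, the decoder would see only enc_final: a single fixed
# vector that must summarize the whole source sequence (the bottleneck).
dec_hidden = enc_final.squeeze(0)                    # (1, hidden)
prev_token_embedding = torch.randn(1, hidden_dim)

# Attention: score each source position against the current decoder state,
# then build a context vector as the weighted sum of encoder outputs.
scores = torch.bmm(enc_outputs, dec_hidden.unsqueeze(2)).squeeze(2)   # (1, src_len)
weights = F.softmax(scores, dim=1)
context = torch.bmm(weights.unsqueeze(1), enc_outputs).squeeze(1)     # (1, hidden)

dec_hidden = decoder_cell(torch.cat([prev_token_embedding, context], dim=1), dec_hidden)
print(weights)            # which source positions this decoding step attends to
print(dec_hidden.shape)   # (1, 64)
```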
https://machinelearningmastery.com/the-transformer-model/
Transformer Architecture
Attention Is All You Need (https://arxiv.org/abs/1706.03762) 2017
Transformer Variants
BERT, GPT, RoBERTa, T5
BERT (Bidirectional Encoder Representations from Transformers), 2018
Masked language model leveraging bidirectional context; excels in classification and QA tasks.
Pre-training and fine-tuning approach by Google.
GPT (OpenAI) (Autoregressive Decoder)
Trained to predict the next token; powerful for generative tasks like dialogue and storytelling.
T5 (Google) & RoBERTa (FAIR), 2019
T5 reframes NLP tasks as text-to-text, while RoBERTa enhances BERT with more training data and training tweaks.
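The two paradigms can be tried directly with Hugging Face pipelines. This sketch assumes the transformers library is installed and that the named checkpoints (bert-base-uncased, gpt2) download on first use.

```python
# Minimal sketch contrasting a masked (BERT-style) and an autoregressive
# (GPT-style) transformer via Hugging Face pipelines.
from transformers import pipeline

# BERT: bidirectional context, trained to fill in masked tokens.
fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for pred in fill_mask("Deep learning has transformed [MASK] language processing.")[:3]:
    print(pred["token_str"], round(pred["score"], 3))

# GPT-2: decoder-only, trained to predict the next token left-to-right.
generate = pipeline("text-generation", model="gpt2")
print(generate("Deep learning has transformed natural language processing because",
               max_new_tokens=30)[0]["generated_text"])
```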
GPT-1 (2018)
OpenAI's introduction of generative pre-training.
GPT-2 (2019)
1.5 billion parameters.
GPT-3 (2020)
175 billion parameters.
Few-shot and zero-shot learning abilities.
GPT-4 (2023)
Multimodal Model
Reportedly on the order of a trillion parameters (the exact size has not been officially disclosed).
Rise of Large Language Models
GPT-3, PaLM, Claude, LLaMA
• Scaling Laws: Larger models show emergent
abilities and improved generalization with more
parameters and training data.
• Versatile Capabilities: LLMs perform
translation, summarization, dialogue, and
reasoning in a zero/few-shot manner.
• Open vs Proprietary: Contrast between public
models (LLaMA) and closed models (GPT-3,
Claude) in accessibility and usage.
https://www.topbots.com/top-llm-research-papers-2023/
Cutting-edge LLMs (2025)
Gemini 2.5 Pro, GPT-4.5, DeepSeek R1, Meta Llama 4, Claude 3.7, Mistral
• Gemini 2.5 Pro: Google DeepMind’s flagship
model, released May 2025. Multimodal (text,
images, audio, video), 1 million-token context
window.
• GPT-4.5 (Orion) & Claude 3.7 (Sonnet): State-of-
the-art models with strong multimodal
reasoning and advanced instruction-following
abilities.
• Mistral & Open LLMs: Efficient, high-
performance open models that challenge
proprietary systems in benchmarks and
accessibility.
https://blog.gopenai.com/all-word-embedding-techniques-in-depth-768780914f6c
https://www.linkedin.com/pulse/demystifying-large-language-models-brij-kishore-pandey-6zo5e
https://digitaldata.science.blog/2022/01/18/natural-language-processing-basics-to-sota-models-part-1/
Natural Language Understanding
Natural Language Understanding
(NLU) involves teaching machines to
comprehend human language.
It includes tasks like sentiment analysis and intent recognition, crucial for conversational interfaces (see the sentiment-analysis sketch after this slide).
Machine Translation
Machine translation refers to the
automated process of translating text from
one language to another.
Deep learning has dramatically improved
translation accuracy by understanding
context, idiomatic expressions, and
nuances in language.
Text Generation
Text generation uses deep learning to
create coherent and contextually
relevant text.
This technology is employed in content creation, chatbots, and creative writing, showcasing its ability to mimic human-like writing styles.
Applications in Language Processing
https://www.nlplanet.org/
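As a concrete NLU example, sentiment analysis is a one-liner with the Hugging Face pipeline API. This sketch assumes transformers is installed and lets the pipeline pull its default sentiment model on first use.

```python
# Minimal sketch of sentiment analysis (an NLU task) with the Hugging Face
# pipeline API; the default model is downloaded on first use.
from transformers import pipeline

classifier = pipeline("sentiment-analysis")
print(classifier([
    "The new translation feature is remarkably accurate.",
    "The chatbot kept misunderstanding my question.",
]))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}, {'label': 'NEGATIVE', 'score': 0.99...}]
```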
Visual Storytelling
Infographics Summarizing Findings
Applications
https://www.softermii.com/blog/how-to-build-a-large-language-model-step-by-step-guide
https://encord.com/blog/top-multimodal-models/
DALL-E
Evaluating Deep Learning
Models
05
https://www.evidentlyai.com/llm-guide/llm-benchmarks
https://datasciencedojo.com/blog/llm-evaluation-metrics-and-applications/
BLEU (Bilingual Evaluation Understudy)
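A minimal sketch of computing sentence-level BLEU with NLTK; it assumes nltk is installed, and smoothing is applied because short sentences otherwise yield zero higher-order n-gram counts.

```python
# Minimal sketch: sentence-level BLEU (Bilingual Evaluation Understudy) with NLTK.
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "is", "on", "the", "mat"]]   # list of reference translations
candidate = ["the", "cat", "sits", "on", "the", "mat"]

score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))   # n-gram overlap with the reference, in [0, 1]
```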
Future Trends in Deep
Learning for NLP
06
Challenges and Future Trends
Multilinguality, Hallucination, On-Device NLP
• Hallucination & Robustness: LLMs may produce plausible but incorrect output; techniques like retrieval-augmented generation (RAG) and alignment help mitigate this (a retrieval sketch follows this list).
• Multilingual & Low-Resource NLP: Creating
inclusive models that generalize across
languages and dialects remains a key challenge.
• On-Device and Private NLP: Trend toward
efficient, privacy-preserving NLP via quantized
models and edge deployments.
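A minimal sketch of the retrieval half of RAG: embed a small document store, retrieve the passages most similar to the question, and build a grounded prompt for whichever LLM is used downstream. It assumes sentence-transformers is installed; the documents and the model name are illustrative.

```python
# Minimal sketch of retrieval-augmented generation (RAG) to reduce hallucination:
# retrieve supporting passages by embedding similarity, then prepend them to the prompt.
import numpy as np
from sentence_transformers import SentenceTransformer

documents = [
    "The Transformer architecture was introduced in 2017.",
    "BLEU measures n-gram overlap between candidate and reference translations.",
    "GRUs merge the forget and input gates into a single update gate.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_vecs = embedder.encode(documents, normalize_embeddings=True)

def retrieve(question, k=2):
    """Return the k passages most similar to the question (cosine similarity)."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    top = np.argsort(doc_vecs @ q_vec)[::-1][:k]
    return [documents[i] for i in top]

question = "When was the Transformer introduced?"
context = "\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)   # pass this grounded prompt to any LLM of your choice
```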
Photo by Volodymyr Hryshchenko on Unsplash
https://huggingface.co/blog/vlms
Structure of a Typical Vision Language Model
Neural Architecture Search (NAS) automates the
design of neural networks, optimizing architecture for
specific NLP tasks, potentially leading to better
performance and efficiency.
01
Neural Architecture Search
Zero-shot learning allows models to generalize to unseen tasks without direct training on them, enhancing their adaptability and utility across diverse NLP applications (see the sketch below).
02
Zero-shot Learning
Emerging Technologies
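A minimal sketch of zero-shot classification with the Hugging Face pipeline, which reuses a natural language inference model (facebook/bart-large-mnli) to score arbitrary candidate labels it was never explicitly trained on; it assumes transformers is installed.

```python
# Minimal sketch of zero-shot classification with the Hugging Face pipeline;
# the NLI-based model is downloaded on first use.
from transformers import pipeline

classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
result = classifier(
    "The model translated the contract into German flawlessly.",
    candidate_labels=["machine translation", "sentiment analysis", "code generation"],
)
print(result["labels"][0], round(result["scores"][0], 3))   # best label, no task-specific training
```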
Bias in language models reflects societal
prejudices and can lead to unfair outcomes,
necessitating robust strategies to mitigate these
biases during development and deployment.
Bias in Language Models
Data privacy is a critical concern in NLP, as models
often rely on large datasets that may include
sensitive information, requiring strict adherence
to ethical guidelines and regulations.
Data Privacy Issues
Ethical Considerations
Ethical Considerations
As deep learning models become more prevalent, ethical
considerations regarding fairness, transparency, and accountability
must be prioritized to ensure responsible use and development of
technology.
https://www.linkedin.com/pulse/generative-ai-frameworks-tools-every-developeraiml-pavan-belagatti-2nvrc
Conclusion
Evolution and Outlook of NLP
• Architectural Progression: From RNNs to LLMs,
NLP models have dramatically expanded in
capability and scale.
• Real-World Impact: Deep learning powers
language applications across industries,
transforming communication and automation.
• Emerging Horizons: Sustainable, multilingual,
and adaptive NLP will shape the next generation
of AI systems.
Photo by SOULSANA on Unsplash
References
For a brief introduction to natural language processing (stages in the NLP pipeline, challenges, ambiguities, language models, tools, frameworks, and datasets), please refer to my presentation:
● https://www.slideshare.net/slideshow/introduction-to-natural-language-processing-stages-in-nlp-pipeline-challenges-in-nlp-ambiguities-in-nlp-language-models-tools-frameworks-and-datasets/280571940
Thank You
Dr. Resmi N.G.
Senior Assistant Professor
Chinmaya Vishwa Vidyapeeth Deemed-to-be University

Deep Learning for Natural Language Processing_FDP on 16 June 2025 MITS.pptx

Editor's Notes

  • #22 The Transformer architecture, introduced in 2017, fundamentally changed how language models are built. At the core of its innovation is the self-attention mechanism, which lets every word in a sequence attend to every other word. This overcomes the limitations of sequential computation inherent in RNNs. Transformers use positional encodings to preserve the order of words, since self-attention alone doesn't capture sequence structure. With all tokens processed in parallel, transformers offer significant speedups during training and inference. The architecture quickly became the backbone of state-of-the-art NLP models, enabling unprecedented scale and performance in tasks like translation, summarization, and question answering.
  • #23 Seq2Seq architectures were a milestone in NLP, particularly for tasks like machine translation. They operate in two stages: the encoder compresses an input sequence into a fixed vector, and the decoder generates the output sequence from that vector. However, this fixed representation can be a bottleneck, especially for longer or more complex inputs. To address this, attention mechanisms were introduced. Attention allows the decoder to focus selectively on different parts of the input sequence at each time step. This dynamic weighting enables better handling of long contexts and precise alignment between input and output elements. Attention-based Seq2Seq models became a dominant paradigm before transformers reshaped the landscape.
  • #25 After the original transformer, specialized variants emerged to tackle different NLP challenges. BERT introduced masked language modeling, allowing it to understand bidirectional context. This made it ideal for classification, sentiment analysis, and question answering. GPT models, in contrast, are autoregressive. They generate text token-by-token, making them suitable for creative generation, completion, and dialogue. Their decoder-only architecture enables large-scale scaling and fast inference. T5 unified diverse NLP tasks under a text-to-text paradigm, offering a flexible framework. RoBERTa improved BERT’s performance by training longer with more data. These innovations illustrate the versatility and power of transformer-based NLP.
  • #26 Large language models (LLMs) like GPT-3, PaLM, Claude, and LLaMA have redefined what's possible in NLP. Their effectiveness is largely due to scaling laws—more data and larger models lead to emergent capabilities, including reasoning and in-context learning. LLMs can generalize across diverse tasks with minimal supervision. Few-shot and zero-shot performance on tasks like translation, summarization, and coding highlight their flexibility. A key debate in the LLM landscape is openness. Open models like LLaMA foster reproducibility and customization, while proprietary models like GPT-3 and Claude provide strong performance via APIs. Both approaches drive progress in different ways.
  • #27 As of 2025, the frontiers of language modeling are being pushed by highly capable models like GPT-4.5 and Claude 3.7. These models demonstrate unprecedented levels of comprehension, reasoning, and interaction across text and multimodal inputs. Gemini 2.5 Pro represents Google DeepMind's holistic approach, combining multimodal understanding of text, images, audio, and video with a million-token context window; it is designed to act both as a research engine and a conversational assistant. Meanwhile, Mistral and similar open models are proving that efficiency doesn't require sacrificing power. These compact yet capable models perform strongly on core benchmarks and are democratizing access to advanced NLP capabilities. Together, these models define the cutting edge of LLM performance and accessibility.
  • #46 Despite their power, LLMs face persistent challenges. One major concern is hallucination—the generation of plausible but factually incorrect output. Research in retrieval-augmented generation (RAG), model alignment, and grounded QA aims to address this. Multilinguality is another frontier. While models like mT5 and XLM-R offer strong performance across languages, low-resource and code-switched languages remain underrepresented. Building equitable language systems is crucial. Finally, there is a growing demand for on-device NLP. Quantized and distilled models like MobileBERT and Whisper-tiny enable inference on smartphones and embedded devices, bringing AI closer to users and preserving privacy. These trends shape the future of responsible and efficient NLP.
  • #53 The evolution of deep learning for language processing represents one of the most dynamic transformations in AI. Starting with RNNs and Seq2Seq models, we've progressed to transformers, variants like BERT and GPT, and now large language models that perform a wide range of tasks with minimal supervision. These models have unlocked real-world applications—from chatbots and translation to medical diagnostics and legal document processing. The technology continues to expand its reach, reshaping how we interact with information and one another. Looking ahead, the focus is shifting to efficiency, inclusiveness, and real-time capability. The next generation of NLP will emphasize sustainability, edge computing, and context-rich multimodal understanding.