Natural Language Processing - Research and Application Trends
Research Opportunities, Applications and Trends in Natural Language Processing (NLP)
Dr. Shreyas Rao
Associate Professor & Research Coordinator, Dept. of CSE,
Sahyadri College of Engineering and Management
Webinar on NLP 1
Center of Excellence in Artificial Intelligence & Machine Learning
Dept. of CSE, SCEM
Presents
2. 10-12-2021 Webinar on NLP 2
1. Basics of NLP
2. NLP Research & Models
3. NLP Applications
4. Key Research and Project Opportunities
Contents of the Webinar
Natural Language Processing
NLP stands for Natural Language Processing
A subfield of Computer Science, Linguistics (human language) and Artificial Intelligence
It is the technology that machines use to understand, analyze, manipulate and interpret human language.
Basics of NLP
Users can provide two kinds of input to a machine: text or speech
The machine must first understand the human input in order to act on it
Examples:
User types “Sahyadri College Mangalore” in Google Search
User tells Alexa, “Play me Kishore Kumar songs”
In both examples, the machine must understand the syntax, semantics, context and intent of the statement in order to provide a suitable reply.
Components of NLP
Understands the human speech, then processes and acts on it
Components of NLP
NLU (Natural Language Understanding)
Understand the text: its syntax, context, sentiment, semantics and intent
Syntactic Analysis: lemmatization, stemming, word segmentation, POS tagging etc.
Semantic Analysis: Named Entity Recognition (NER), Word Sense Disambiguation etc.
NLG (Natural Language Generation)
Produce meaningful sentences in human-understandable text
NLG Models: Markov Chain, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Transformer
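Of the NLG models listed, the Markov chain is the simplest to sketch: the next word is sampled only from words observed to follow the current word in the training text. A minimal bigram sketch; the toy corpus is invented for illustration:

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    model = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        model[current].append(nxt)
    return model

def generate(model, start, length=8, seed=0):
    """Walk the chain: repeatedly sample a successor of the current word."""
    random.seed(seed)
    out = [start]
    for _ in range(length - 1):
        successors = model.get(out[-1])
        if not successors:  # dead end: the current word was never followed by anything
            break
        out.append(random.choice(successors))
    return " ".join(out)

corpus = ("the quick brown fox jumps over the lazy dog "
          "the lazy dog sleeps while the quick fox runs")
model = build_bigram_model(corpus)
print(generate(model, "the"))
```

The output is locally plausible but has no long-range coherence, which is exactly the weakness that motivated RNNs and, later, Transformers.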
NLP Pipeline for Natural Language Understanding
1. Normalization
2. Tokenization
3. Stop Words Removal
4. Spell Check
5. Stemming / Lemmatization
6. Conversational Context
7. Named Entity Recognition
8. Parts of Speech Determination
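Steps 1-3 of the pipeline can be sketched in pure Python, without any NLP library; the stop-word set below is a tiny illustrative subset, not a real lexicon:

```python
import re

STOP_WORDS = {"the", "is", "a", "an", "in", "of", "to"}  # tiny illustrative subset

def normalize(text):
    """Step 1: lowercase and strip punctuation."""
    return re.sub(r"[^\w\s]", "", text.lower())

def tokenize(text):
    """Step 2: split into word tokens."""
    return text.split()

def remove_stop_words(tokens):
    """Step 3: drop high-frequency function words."""
    return [t for t in tokens if t not in STOP_WORDS]

sentence = "The college is located in Mangalore!"
tokens = remove_stop_words(tokenize(normalize(sentence)))
print(tokens)  # → ['college', 'located', 'mangalore']
```

Libraries such as NLTK and SpaCy provide production-grade versions of each of these steps.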
Tools for Natural Language Understanding
NLTK, the Natural Language Toolkit (Python)
Stanford CoreNLP Parser (Python)
SpaCy (Python)
Apache OpenNLP (open-source Java library)
Sample Python program to demonstrate NLU (the NLP pipeline) using NLTK
NLP - RESEARCH
Sl. No | Task | Model
1 | Word Encoding | Neural Network
2 | Sentence Encoding, Next-Word Prediction, Learning a Language | Recurrent Neural Network (RNN), LSTM (Long Short-Term Memory)
3 | Language Translation | Encoder-Decoder
4 | Text Generation, Text Summarization, Text Paraphrasing, Text Classification, Language Translation | Transformers (Transfer Learning)
1. Neural Network
Word2Vec using Neural Networks
Disadvantage:
Word embeddings capture the meaning of individual words, but what about whole sentences?
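The property that embeddings capture word meaning can be illustrated with cosine similarity over hand-made toy vectors; the 3-dimensional values below are invented for illustration, not trained Word2Vec output (real embeddings have 100-300 dimensions):

```python
import math

# Invented 3-d "embeddings": related words get nearby vectors.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine similarity: 1.0 for identical directions, near 0 for unrelated."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

print(cosine(vectors["king"], vectors["queen"]))  # close to 1: similar words
print(cosine(vectors["king"], vectors["apple"]))  # much lower: unrelated words
```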
2. Recurrent Neural Network (RNN)
Represents “seq2seq” (Sequence to Sequence) Models
To convert sequences of Type A to sequences of Type B.
• Machine Translation
• Text Summarization
• Speech Recognition
• Question-Answering System
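The recurrence at the heart of an RNN can be sketched with a single scalar cell, h_t = tanh(w_x·x_t + w_h·h_prev + b); the weights and input sequence below are arbitrary illustrative numbers, not trained values:

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrence step: h_t = tanh(w_x * x_t + w_h * h_prev + b)."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

# Feed a sequence through the cell; the final hidden state
# "summarizes" everything seen so far.
h = 0.0
for x in [1.0, 0.5, -0.3]:
    h = rnn_step(x, h)
print(h)
```

Because each step depends on the previous hidden state, the loop cannot be parallelized across time steps, which is one of the RNN disadvantages noted on the next slide.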
Recurrent Neural Network (RNN)
RNN learns sentence representations and relationships, and can predict the next word
Disadvantages:
1. Cannot be parallelized efficiently
2. Longer sentences lose context
Ex: I was born in France, I speak fluent French
3. Encoder-Decoder
• Two RNNs are jointly trained
• The first RNN creates context
• The second RNN uses the final context of the first RNN.
Encoding - Convert Text to Vector format
One-Hot Encoding
Decoding - Get Final result
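One-hot encoding and its inverse can be sketched in a few lines; the three-word vocabulary is invented for illustration:

```python
def one_hot_encode(word, vocab):
    """Return a vector with 1 at the word's index and 0 elsewhere."""
    return [1 if w == word else 0 for w in vocab]

def one_hot_decode(vector, vocab):
    """Decoding reverses it: recover the word from the index of the 1."""
    return vocab[vector.index(1)]

vocab = ["i", "like", "nlp"]
print(one_hot_encode("like", vocab))   # → [0, 1, 0]
print(one_hot_decode([0, 0, 1], vocab))  # → nlp
```

One-hot vectors grow with vocabulary size and carry no notion of similarity, which is why learned embeddings replace them in practice.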
Word-Alignment Issue
Translation-Context Issue
4. Transfer Learning
Transfer learning, in which a model is first pre-trained on a data-rich task and then fine-tuned on a downstream task, has emerged as a powerful technique in natural language processing (NLP).
Its effectiveness has given rise to a diversity of approaches, methodologies and practices.
Transfer Learning
Model developed for task 1 is reused for other tasks
Transformers
• Implement the transfer-learning technique for NLP tasks
• Represent the state-of-the-art NLP architecture since 2019
• Successful on 11 NLP tasks
• Promoted by companies such as Google, OpenAI, Facebook, Allen Institute for AI, Microsoft, Amazon, Grammarly etc.
• “Hugging Face” is the single point of contact for models, datasets and resources regarding Transformers
Transformer Architecture
Self-attention, sometimes called intra-attention, is an attention mechanism that relates different positions of a single sequence in order to compute a representation of the sequence.
Ex: “The animal didn't cross the street, because it was too wide”; self-attention lets the model relate “it” to “street”.
Encoding: convert text to vector format (One-Hot Encoding)
The softmax function is used as the activation function in the output layer of neural networks when the network's goal is to learn classification problems.
Softmax squashes each output between 0 and 1, and the sum of all outputs always adds up to 1.
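The squashing behaviour described above is easy to verify with a direct implementation; the three input scores stand in for attention scores over three sequence positions:

```python
import math

def softmax(scores):
    """Exponentiate and normalize; subtracting the max keeps exp() numerically stable."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

weights = softmax([2.0, 1.0, 0.1])
print(weights)       # each value lies strictly between 0 and 1
print(sum(weights))  # the outputs always sum to 1 (up to floating-point rounding)
```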
Transformer Models
1. BERT (Bidirectional Encoder Representations from Transformers): Google
   BERT Base: 12 encoder layers, hidden size 768; BERT Large: 24 encoder layers, hidden size 1024
   Pre-trained on unlabeled data extracted from BookCorpus (800M words) and English Wikipedia (2,500M words)
   Tasks: Machine Translation, Question Answering
2. GPT (Generative Pre-trained Transformer): OpenAI (co-founded by Elon Musk)
   GPT-2: 1.5B parameters, trained on a dataset of 8 million web pages
   GPT-3: trained on 570GB of text from books and Wikipedia
   GPT-Neo: open-source version of GPT-3
   Tasks: Text Generation / Word Prediction, Question Answering, Summarization
3. T5 (Text-To-Text Transfer Transformer): Google
   Tasks: Question Answering, Language Translation
4. RoBERTa (Robustly Optimized BERT Pretraining Approach): Facebook
   Trained on 160GB of text
   Tasks: Multi-class Text Classification
All these models support the deep-learning libraries TensorFlow (Google) and PyTorch (Facebook).
Applications of Transformers (NLP)
Question Answering
Question answering focuses on building systems that automatically answer questions asked by humans in natural language.
Ex: Virtual Assistants (Alexa, Google Mini)
Spam Detection
Used to keep unwanted e-mails out of a user's inbox.
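A classical pre-Transformer baseline for spam detection is multinomial Naive Bayes over word counts; the four-message training set below is invented for illustration:

```python
import math
from collections import Counter

# Toy training data, invented for illustration.
train = [
    ("win cash prize now", "spam"),
    ("free prize claim now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report attached", "ham"),
]

def fit(examples):
    """Collect word counts and document counts per class."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, class_counts

def predict(text, word_counts, class_counts):
    """Pick the class with the highest log-probability (add-one smoothing)."""
    vocab = set()
    for counts in word_counts.values():
        vocab.update(counts)
    best_label, best_score = None, -math.inf
    for label in class_counts:
        total = sum(word_counts[label].values())
        score = math.log(class_counts[label] / sum(class_counts.values()))
        for word in text.split():
            score += math.log((word_counts[label][word] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

wc, cc = fit(train)
print(predict("claim your free cash prize", wc, cc))  # → spam
```

Transformer classifiers replace the hand-built counts with learned contextual representations, but the task framing is the same.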
Sentiment Analysis
Also called “Opinion Mining”
This application is implemented through a combination of NLP and statistics: values (positive, negative, or neutral) are assigned to the text to identify the mood of the context (happy, sad, angry, etc.)
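A minimal version of the value-assignment idea is a lexicon lookup: count positive and negative words and compare. The tiny word sets below are illustrative, not a real sentiment lexicon:

```python
POSITIVE = {"good", "great", "happy", "excellent", "love"}
NEGATIVE = {"bad", "sad", "terrible", "angry", "hate"}

def sentiment(text):
    """Label text by comparing counts of positive vs negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("the movie was great and I love the songs"))  # → positive
print(sentiment("service was terrible"))                      # → negative
```

Statistical and Transformer-based models improve on this by handling negation, sarcasm and context, which a bare lexicon cannot.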
Machine Translation
Used to translate text or speech from one natural language to another.
Text Generation
Used to generate text automatically based on user
data and context.
Ref: https://transformer.huggingface.co/doc/gpt2-large
Text Summarization / Text Paraphrasing / GEC
Text Summarizer tools: Summarize Bot,
Resoomer, SMMRY
Text Paraphrasing tools – Quillbot, Spinbot,
Grammarly, GoParaphrase etc.
GEC (Grammatical Error Checker) - Grammarly
A chatbot is Artificial Intelligence (AI) software that can simulate a conversation (or chat) with a user in natural language (text or voice) through messaging applications, websites, mobile apps or the telephone.
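The oldest chatbot technique, pattern matching (as in AIML), can be sketched with a few regular-expression rules; the patterns and canned replies below are invented for illustration:

```python
import re

# (pattern, reply) rules, checked in order; invented for illustration.
RULES = [
    (r"\b(hi|hello|hey)\b", "Hello! How can I help you?"),
    (r"\btimings?\b", "The college is open from 9 AM to 5 PM."),
    (r"\b(bye|goodbye)\b", "Goodbye! Have a nice day."),
]

def reply(message):
    """Return the reply of the first rule whose pattern matches the message."""
    for pattern, response in RULES:
        if re.search(pattern, message.lower()):
            return response
    return "Sorry, I did not understand that."

print(reply("Hello there"))            # → Hello! How can I help you?
print(reply("What are the timings?"))  # → The college is open from 9 AM to 5 PM.
```

Modern frameworks replace the regex layer with NLU (intent matching and entity recognition), but the request-to-response flow is the same.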
Chatbot (Most popular Use Case for NLP)
Chatbot Market Size and Trends
Chatbot Framework and Tools
Popular Frameworks | Popular No-Code Tools
Chatbot Ecosystem
Multi-Channel (digital voice and text): Website Chat, FB Messenger, WhatsApp, Google Assistant, Mobile App, Car / TV, Google Home, Alexa
NLP Engine: Pattern Matching (AIML), Multinomial Naive Bayes, NLU (Intent Matching, Entity Recognition), Context Management, Dialog Workflow
Chatbot Responses: Custom Response, Normal Response, FAQs, Document
Conversation Management
Fulfillment (via Webhook Request / Response): Intent Action, Microservice, External APIs (JSON API)
Support Integration
Dhriti – Mental Health Resource Chatbot
Link - https://app.engati.com/static/standalone/bot.html?bot_key=889d005935e7437b
Dhriti means ‘Vision’, ‘Firmness’, ‘Perseverance’, ‘Patience’, ‘Determination’
Corona Second Wave has affected millions of people in India physically, mentally and
emotionally.
Dhriti Bot provides “resources related to Mental Health” that can help people cope with the pandemic and give them HOPE to move ahead with positivity and enthusiasm.
Bot caters to Dakshina Kannada and Bangalore regions.
Bot is multi-lingual and supports English, Kannada and Hindi languages.
Dhriti – Mental Health Resource Chatbot
Features of the bot
Helpline (Suicide, Alcohol De-addiction, General)
Directory of Mental Health Therapists (Counsellors, Psychologists and Psychiatrists)
Grief Counselling Support
Mental Health Apps (Anxiety, Depression, OCD, Addictions etc.)
Techniques for Mental Wellbeing (Yoga, Meditation, Pranayama)
Social Wellbeing (Recreational Apps & Support Groups)
General Covid Information (Symptoms, Prevention, Vaccination, Social Distance etc.)
Key Research Opportunities
Researchers / M.Tech Students
1. Select an NLP task to conduct research on
2. Review state-of-the-art literature in Transformers based on the task
3. Review appropriate Transformer models (T5, BERT, GPT-2)
4. Contribute datasets and/or model fine-tuning for various domains such as Healthcare, Agriculture, Banking etc.
5. Get involved in chatbot / virtual-assistant research (fine-tuning of the NLU, the NLP pipeline, models etc.)
Key Project Opportunities
B.Tech / M.Tech Students
1. Select an NLP task and preferably a domain
2. Get open-source datasets from the Kaggle / Hugging Face portals OR build your own dataset
3. Use an existing Transformer model to complete the task and get output
4. Fine-tune the model, if required
5. Provide a good UI for showcasing the results
6. Develop a chatbot based on domain requirements
Lexical ambiguity – presence of two or more possible meanings within a single word. Ex: “Bank” can refer to a river bank or a bank like ICICI or SBI
Syntactic ambiguity (structural or grammatical ambiguity) – presence of two or more possible meanings within a single sentence. Ex: “I saw the man with the telescope”
Referential ambiguity – a pronoun or referring expression can point to more than one antecedent. Ex: “Ravi met Kiran and he smiled” – “he” could be either person