Silversparro Technologies Pvt. Ltd.
Latest Trends in NLP
Deep Learning Intern
Milind Kudapa
Latest Trends in NLP
Exploring BERT
Deep Learning Intern
Sourya Dipta Das
Outline
● Finite State Automata
● Bag of Words - Naive Bayes approach
● Word2Vec - CBOW and SkipGram
● Seq2seq models
● Attention and Transformer
● ELMo and GPT
● BERT
Why Representation Learning?
● Unlike images in Computer Vision, which already arrive as numeric pixel values,
words cannot be processed by a machine as they are.
● Words must first be mapped to some numeric representation that the machine
can work with.
● Hence, word embeddings.
Finite State Automata
Examples: equal (adj1) + -al (q1) + -iz (q2) + -e (q3) = equalize
Rule Based
Bag of words and Naive Bayes
Working
● Vocabulary of known words
● Frequency of occurrence of words
Limitations
● Naive assumption: given the class, the occurrence of one word is
independent of the occurrences of all other words.
● Information on the order of words is lost
● OOV (out-of-vocabulary) words cannot be modelled
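A minimal sketch of the idea above, assuming whitespace tokenization and add-one (Laplace) smoothing; the class name `NaiveBayes` and the toy reviews are hypothetical and not from the slides. It shows the word counts, the independence assumption (per-word log-probabilities are simply summed) and how OOV words are skipped.

```python
import math
from collections import Counter


def bag_of_words(text):
    """Word-frequency counts; word order is discarded."""
    return Counter(text.lower().split())


class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (illustrative only)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.class_counts}
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(bag_of_words(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / n_docs)  # log prior
            total = sum(self.word_counts[label].values())
            for word, freq in bag_of_words(doc).items():
                if word not in self.vocab:
                    continue  # OOV word: no statistics, so it is ignored
                # Naive assumption: each word contributes independently.
                p = (self.word_counts[label][word] + 1) / (total + len(self.vocab))
                score += freq * math.log(p)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = ["good great film", "great acting", "bad boring film", "boring plot"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(docs, labels)
print(model.predict("great film"), model.predict("film great"))
```

Both calls print the same label, because the two inputs map to the same bag of words.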
Neural Models - Word2Vec
(Mikolov et al., 2013)
King - Man + Woman ≈ Queen
● Regarded here as the first revolution in NLP: neural models
were used to learn word representations.
● CBOW - Predict a word from its nearby context
words.
● SkipGram - Predict the context words given the
target word.
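The difference between the two objectives is easiest to see in the training examples each one generates. A small sketch, assuming a symmetric context window; the function names and the `window` parameter are illustrative, and no actual network is trained here.

```python
def skipgram_pairs(tokens, window=2):
    """(target, context) pairs: the target word predicts each context word."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def cbow_examples(tokens, window=2):
    """(context_words, target) examples: the context predicts the target word."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples


sentence = ["the", "king", "rules"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```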
Limitations of Word2Vec
● There is no representation for out-of-vocabulary words (OOVs).
● Some opposite word pairs are hard to separate. For example, “good” and “bad” are usually
located very close to each other in the vector space, which may limit the
performance of word vectors in NLP tasks like sentiment analysis.
● Embeddings are not context based: e.g. the word ‘crane’ can be used in different
contexts, but word2vec gives it the same representation every time, which loses
information.
Seq2seq models
● Use of GRUs and LSTMs.
● Second revolution in NLP.
● Tasks such as machine translation, question
answering and sentence classification have
been tackled using these models.
ELMo, GPT - New Age in NLP
● Two strategies for using pre-trained representations: feature-based and fine-tuning.
● ELMo (Peters et al.) is feature-based; GPT (Radford et al.) uses fine-tuning.
● Both use unidirectional language models to learn general language
representations.
● ELMo combines a left-to-right and a right-to-left LSTM, each trained as a
language model in its own direction.
● OpenAI GPT uses a left-to-right architecture built on a Transformer
decoder.
Contextualized word embeddings
Bidirectional Encoder Representations from Transformers - BERT
● Devlin et al., 2018, Google Research
● The masked language model randomly
masks some of the tokens from the input,
and the objective is to predict the original
vocabulary id of the masked word based
only on its context.
● There are two steps in the framework:
pre-training and fine-tuning.
● Pre-training is done first, on unlabeled data,
over different pre-training tasks.
● For fine-tuning, the model is first initialized
with the pre-trained parameters and is then
fine-tuned on different downstream tasks.
Model
Two models:
● BERT-Base - 12 encoder blocks - 110M
parameters
● BERT-Large - 24 encoder blocks - 340M
parameters
● The BERT encoder is pre-trained without labels
(self-supervised) on two tasks:
● Masked Language Model: 15% of the tokens in a
sequence are selected for masking (most are replaced
by [MASK]) and the model learns to predict the
original tokens.
● Next Sentence Prediction: The model is
trained to classify whether a particular
sentence follows the given sentence or not.
● For the pre-training corpus the authors used
the BooksCorpus (800M words) (Zhu et al.,
2015) and English Wikipedia (2,500M words).
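The masked language model input can be sketched as below. This is an illustration, not the authors' code: it assumes the 80% / 10% / 10% replacement rule described in the BERT paper ([MASK], random token, unchanged), and the function name, `vocab` argument and `seed` are hypothetical.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (corrupted, labels); labels[i] is None where nothing is predicted."""
    rng = random.Random(seed)
    # Select about 15% of the positions (at least one for non-empty input).
    n_select = min(len(tokens), max(1, round(len(tokens) * mask_prob)))
    selected = rng.sample(range(len(tokens)), n_select)

    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i in selected:
        labels[i] = tokens[i]  # the model must recover the original token
        r = rng.random()
        if r < 0.8:
            corrupted[i] = MASK               # 80%: replace with [MASK]
        elif r < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the token unchanged
    return corrupted, labels


tokens = "the customer asked about the price of the policy".split()
corrupted, labels = mask_tokens(tokens, vocab=tokens)
print(corrupted)
print(labels)
```

The loss is computed only at positions where `labels` is not `None`; MLM accuracy is the share of those positions predicted correctly.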
Attention is all you need (Vaswani et al.)
Working of a Transformer:
● Uses attention instead of recurrent units like
LSTM.
● Three trainable matrices are introduced,
producing Queries, Keys and Values.
● Attention alone ignores the order of words,
hence position embeddings are used.
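A single-head sketch of scaled dot-product attention in plain Python, to make the Queries / Keys / Values bullets concrete. It assumes no masking and no multi-head split; the helper names and the identity projection matrices in the usage lines are illustrative only.

```python
import math


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def attention(x, w_q, w_k, w_v):
    """x: one row per token. Returns (outputs, attention_weights)."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    # Score every query against every key, scaled by sqrt(d_k).
    scores = [[sum(a * b for a, b in zip(q_row, k_row)) / math.sqrt(d_k)
               for k_row in k] for q_row in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for trained projection matrices
x = [[1.0, 0.0], [0.0, 1.0]]         # two token vectors
out, weights = attention(x, identity, identity, identity)
out_reversed, _ = attention(x[::-1], identity, identity, identity)
print(out)
print(out_reversed)
```

Reversing the input tokens only reverses the output rows: each token's output does not depend on where it sits in the sequence, which is why position information has to be added to the inputs.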
Embedding from BERT
Our Task
● Customer-care call transcripts from a large-scale insurance aggregator.
● Hinglish (English mixed with Hindi written in the Latin script)
● Task: Binary classification of whether a given person will buy the service or not.
● Pre-trained the model on a 171M-word corpus.
● Achieved a pre-training accuracy of 75% on MLM.
XLNet, bigger is better?
● Focus on number of parameters
● How they used autoencoders (read about them)
● Latest results

More Related Content

What's hot

[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
JEE HYUN PARK
 
BERT - Part 2 Learning Notes
BERT - Part 2 Learning NotesBERT - Part 2 Learning Notes
BERT - Part 2 Learning Notes
Senthil Kumar M
 
An Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language RepresentationsAn Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language Representations
zperjaccico
 
Bert
BertBert
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMaker
Suman Debnath
 
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATIONAN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
cscpconf
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
gohyunwoong
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)
H K Yoon
 
Parts of Speect Tagging
Parts of Speect TaggingParts of Speect Tagging
Parts of Speect Tagging
theyaseen51
 
Introduction to Transformers
Introduction to TransformersIntroduction to Transformers
Introduction to Transformers
Suman Debnath
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
Alexey Grigorev
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101
Jaemin Cho
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation
RIILP
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
AbdurrahimDerric
 
Chatbot ppt
Chatbot pptChatbot ppt
Chatbot ppt
Manish Mishra
 
Word embedding
Word embedding Word embedding
Word embedding
ShivaniChoudhary74
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
seungwoo kim
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
Tae Hwan Jung
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Association for Computational Linguistics
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
Nuwan Sriyantha Bandara
 

What's hot (20)

[Paper review] BERT
[Paper review] BERT[Paper review] BERT
[Paper review] BERT
 
BERT - Part 2 Learning Notes
BERT - Part 2 Learning NotesBERT - Part 2 Learning Notes
BERT - Part 2 Learning Notes
 
An Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language RepresentationsAn Introduction to Pre-training General Language Representations
An Introduction to Pre-training General Language Representations
 
Bert
BertBert
Bert
 
Transformers and BERT with SageMaker
Transformers and BERT with SageMakerTransformers and BERT with SageMaker
Transformers and BERT with SageMaker
 
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATIONAN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
AN ADVANCED APPROACH FOR RULE BASED ENGLISH TO BENGALI MACHINE TRANSLATION
 
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language UnderstandingBERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
 
Nlp and transformer (v3s)
Nlp and transformer (v3s)Nlp and transformer (v3s)
Nlp and transformer (v3s)
 
Parts of Speect Tagging
Parts of Speect TaggingParts of Speect Tagging
Parts of Speect Tagging
 
Introduction to Transformers
Introduction to TransformersIntroduction to Transformers
Introduction to Transformers
 
Introduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga PetrovaIntroduction to Transformers for NLP - Olga Petrova
Introduction to Transformers for NLP - Olga Petrova
 
Developing Korean Chatbot 101
Developing Korean Chatbot 101Developing Korean Chatbot 101
Developing Korean Chatbot 101
 
8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation8. Qun Liu (DCU) Hybrid Solutions for Translation
8. Qun Liu (DCU) Hybrid Solutions for Translation
 
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCEDETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
DETERMINING CUSTOMER SATISFACTION IN-ECOMMERCE
 
Chatbot ppt
Chatbot pptChatbot ppt
Chatbot ppt
 
Word embedding
Word embedding Word embedding
Word embedding
 
NLP Deep Learning with Tensorflow
NLP Deep Learning with TensorflowNLP Deep Learning with Tensorflow
NLP Deep Learning with Tensorflow
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Introduction to Transformer Model
Introduction to Transformer ModelIntroduction to Transformer Model
Introduction to Transformer Model
 

Similar to Latest trends in NLP - Exploring BERT

Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Deep Learning Italia
 
Transformer Models_ BERT vs. GPT.pdf
Transformer Models_ BERT vs. GPT.pdfTransformer Models_ BERT vs. GPT.pdf
Transformer Models_ BERT vs. GPT.pdf
helloworld28847
 
Introduction to Large Language Models and the Transformer Architecture.pdf
Introduction to Large Language Models and the Transformer Architecture.pdfIntroduction to Large Language Models and the Transformer Architecture.pdf
Introduction to Large Language Models and the Transformer Architecture.pdf
sudeshnakundu10
 
Kaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solutionKaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solution
ArtsemZhyvalkouski
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
Jayavardhan Reddy Peddamail
 
GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic Parrots
Konstantin Savenkov
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET Journal
 
attention mechanism need_transformers.pptx
attention mechanism need_transformers.pptxattention mechanism need_transformers.pptx
attention mechanism need_transformers.pptx
imbasarath
 
An Efficient Approach to Produce Source Code by Interpreting Algorithm
An Efficient Approach to Produce Source Code by Interpreting AlgorithmAn Efficient Approach to Produce Source Code by Interpreting Algorithm
An Efficient Approach to Produce Source Code by Interpreting Algorithm
IRJET Journal
 
How to build a GPT model.pdf
How to build a GPT model.pdfHow to build a GPT model.pdf
How to build a GPT model.pdf
StephenAmell4
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
RahulKumar854607
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
IRJET Journal
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
Arvind Devaraj
 
Recent Trends in Translation of Programming Languages using NLP Approaches
Recent Trends in Translation of Programming Languages using NLP ApproachesRecent Trends in Translation of Programming Languages using NLP Approaches
Recent Trends in Translation of Programming Languages using NLP Approaches
IRJET Journal
 
LLM.pdf
LLM.pdfLLM.pdf
LLM.pdf
MedBelatrach
 
Exploring the Role of Transformers in NLP: From BERT to GPT-3
Exploring the Role of Transformers in NLP: From BERT to GPT-3Exploring the Role of Transformers in NLP: From BERT to GPT-3
Exploring the Role of Transformers in NLP: From BERT to GPT-3
IRJET Journal
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
Arvind Devaraj
 
Chapter 01 Introduction to Java by Tushar B Kute
Chapter 01 Introduction to Java by Tushar B KuteChapter 01 Introduction to Java by Tushar B Kute
Chapter 01 Introduction to Java by Tushar B Kute
Tushar B Kute
 
Alabot
AlabotAlabot
Alabot
Gaurav P
 
IRJET- On-Screen Translator using NLP and Text Detection
IRJET- On-Screen Translator using NLP and Text DetectionIRJET- On-Screen Translator using NLP and Text Detection
IRJET- On-Screen Translator using NLP and Text Detection
IRJET Journal
 

Similar to Latest trends in NLP - Exploring BERT (20)

Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
Transformer Seq2Sqe Models: Concepts, Trends & Limitations (DLI)
 
Transformer Models_ BERT vs. GPT.pdf
Transformer Models_ BERT vs. GPT.pdfTransformer Models_ BERT vs. GPT.pdf
Transformer Models_ BERT vs. GPT.pdf
 
Introduction to Large Language Models and the Transformer Architecture.pdf
Introduction to Large Language Models and the Transformer Architecture.pdfIntroduction to Large Language Models and the Transformer Architecture.pdf
Introduction to Large Language Models and the Transformer Architecture.pdf
 
Kaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solutionKaggle Tweet Sentiment Extraction: 1st place solution
Kaggle Tweet Sentiment Extraction: 1st place solution
 
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRFEnd-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF
 
GPT and other Text Transformers: Black Swans and Stochastic Parrots
GPT and other Text Transformers:  Black Swans and Stochastic ParrotsGPT and other Text Transformers:  Black Swans and Stochastic Parrots
GPT and other Text Transformers: Black Swans and Stochastic Parrots
 
IRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine LearningIRJET - Pseudocode to Python Translation using Machine Learning
IRJET - Pseudocode to Python Translation using Machine Learning
 
attention mechanism need_transformers.pptx
attention mechanism need_transformers.pptxattention mechanism need_transformers.pptx
attention mechanism need_transformers.pptx
 
An Efficient Approach to Produce Source Code by Interpreting Algorithm
An Efficient Approach to Produce Source Code by Interpreting AlgorithmAn Efficient Approach to Produce Source Code by Interpreting Algorithm
An Efficient Approach to Produce Source Code by Interpreting Algorithm
 
How to build a GPT model.pdf
How to build a GPT model.pdfHow to build a GPT model.pdf
How to build a GPT model.pdf
 
Transformers AI PPT.pptx
Transformers AI PPT.pptxTransformers AI PPT.pptx
Transformers AI PPT.pptx
 
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
Advancements in Hindi-English Neural Machine Translation: Leveraging LSTM wit...
 
NLP using transformers
NLP using transformers NLP using transformers
NLP using transformers
 
Recent Trends in Translation of Programming Languages using NLP Approaches
Recent Trends in Translation of Programming Languages using NLP ApproachesRecent Trends in Translation of Programming Languages using NLP Approaches
Recent Trends in Translation of Programming Languages using NLP Approaches
 
LLM.pdf
LLM.pdfLLM.pdf
LLM.pdf
 
Exploring the Role of Transformers in NLP: From BERT to GPT-3
Exploring the Role of Transformers in NLP: From BERT to GPT-3Exploring the Role of Transformers in NLP: From BERT to GPT-3
Exploring the Role of Transformers in NLP: From BERT to GPT-3
 
Deep learning for NLP and Transformer
 Deep learning for NLP  and Transformer Deep learning for NLP  and Transformer
Deep learning for NLP and Transformer
 
Chapter 01 Introduction to Java by Tushar B Kute
Chapter 01 Introduction to Java by Tushar B KuteChapter 01 Introduction to Java by Tushar B Kute
Chapter 01 Introduction to Java by Tushar B Kute
 
Alabot
AlabotAlabot
Alabot
 
IRJET- On-Screen Translator using NLP and Text Detection
IRJET- On-Screen Translator using NLP and Text DetectionIRJET- On-Screen Translator using NLP and Text Detection
IRJET- On-Screen Translator using NLP and Text Detection
 

More from Silversparro Technologies

Interpreting NLP Predictions
Interpreting NLP PredictionsInterpreting NLP Predictions
Interpreting NLP Predictions
Silversparro Technologies
 
Uncertainties in Deep Learning
Uncertainties in Deep LearningUncertainties in Deep Learning
Uncertainties in Deep Learning
Silversparro Technologies
 
Video Classification Basic
Video Classification Basic Video Classification Basic
Video Classification Basic
Silversparro Technologies
 
Video analytics in manufacturing
Video analytics in manufacturingVideo analytics in manufacturing
Video analytics in manufacturing
Silversparro Technologies
 
The unchartered territory of Deep fakes
The unchartered territory of Deep fakesThe unchartered territory of Deep fakes
The unchartered territory of Deep fakes
Silversparro Technologies
 
Building voice bots for customer service
Building voice bots for customer serviceBuilding voice bots for customer service
Building voice bots for customer service
Silversparro Technologies
 
Video analytics on the edge
Video analytics on the edgeVideo analytics on the edge
Video analytics on the edge
Silversparro Technologies
 
The promise of self supervised learning
The promise of self supervised learningThe promise of self supervised learning
The promise of self supervised learning
Silversparro Technologies
 
Temporal Action Detection
Temporal Action DetectionTemporal Action Detection
Temporal Action Detection
Silversparro Technologies
 
How to Build Recommendation Systems
How to Build Recommendation SystemsHow to Build Recommendation Systems
How to Build Recommendation Systems
Silversparro Technologies
 

More from Silversparro Technologies (10)

Interpreting NLP Predictions
Interpreting NLP PredictionsInterpreting NLP Predictions
Interpreting NLP Predictions
 
Uncertainties in Deep Learning
Uncertainties in Deep LearningUncertainties in Deep Learning
Uncertainties in Deep Learning
 
Video Classification Basic
Video Classification Basic Video Classification Basic
Video Classification Basic
 
Video analytics in manufacturing
Video analytics in manufacturingVideo analytics in manufacturing
Video analytics in manufacturing
 
The unchartered territory of Deep fakes
The unchartered territory of Deep fakesThe unchartered territory of Deep fakes
The unchartered territory of Deep fakes
 
Building voice bots for customer service
Building voice bots for customer serviceBuilding voice bots for customer service
Building voice bots for customer service
 
Video analytics on the edge
Video analytics on the edgeVideo analytics on the edge
Video analytics on the edge
 
The promise of self supervised learning
The promise of self supervised learningThe promise of self supervised learning
The promise of self supervised learning
 
Temporal Action Detection
Temporal Action DetectionTemporal Action Detection
Temporal Action Detection
 
How to Build Recommendation Systems
How to Build Recommendation SystemsHow to Build Recommendation Systems
How to Build Recommendation Systems
 

Recently uploaded

Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
Jakub Marek
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
saastr
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
panagenda
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
Brandon Minnick, MBA
 
June Patch Tuesday
June Patch TuesdayJune Patch Tuesday
June Patch Tuesday
Ivanti
 
Trusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process MiningTrusted Execution Environment for Decentralized Process Mining
Trusted Execution Environment for Decentralized Process Mining
LucaBarbaro3
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-EfficiencyFreshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
Freshworks Rethinks NoSQL for Rapid Scaling & Cost-Efficiency
ScyllaDB
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
shyamraj55
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Skybuffer AI: Advanced Conversational and Generative AI Solution on SAP Busin...
Tatiana Kojar
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
Programming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup SlidesProgramming Foundation Models with DSPy - Meetup Slides
Programming Foundation Models with DSPy - Meetup Slides
Zilliz
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3FREE A4 Cyber Security Awareness  Posters-Social Engineering part 3
FREE A4 Cyber Security Awareness Posters-Social Engineering part 3
Data Hops
 
AWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptxAWS Cloud Cost Optimization Presentation.pptx
AWS Cloud Cost Optimization Presentation.pptx
HarisZaheer8
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 

Recently uploaded (20)

Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)Main news related to the CCS TSI 2023 (2023/1695)
Main news related to the CCS TSI 2023 (2023/1695)
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
Deep Dive: AI-Powered Marketing to Get More Leads and Customers with HyperGro...
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
HCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAUHCL Notes and Domino License Cost Reduction in the World of DLAU
HCL Notes and Domino License Cost Reduction in the World of DLAU
 
Choosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptxChoosing The Best AWS Service For Your Website + API.pptx
Choosing The Best AWS Service For Your Website + API.pptx
 
June Patch Tuesday

Latest trends in NLP - Exploring BERT

  • 1. Silversparro Technologies Pvt. Ltd. Latest Trends in NLP Deep Learning Intern Milind Kudapa
  • 2. Latest Trends in NLP Exploring BERT Deep Learning Intern Sourya Dipta Das
  • 3. Outline ● Finite State Automata ● Bag of Words - Naive Bayes approach ● Word2Vec - CBOW and SkipGram ● Seq2seq models ● Attention and Transformer ● ELMo and GPT ● BERT
  • 4. Why Representation Learning? ● Unlike images in Computer Vision, which are already arrays of pixel values, words are not numeric, so machines cannot process them as they are. ● Some numerical representation of words is necessary for the machine to work with them. ● Hence, word embeddings.
  • 5. Finite State Automata Example: equal (adj1) + -al (q1) + -iz (q2) + -e (q3) = equalize. Rule based.
  • 6. Bag of words and Naive Bayes Working: ● Vocabulary of known words ● Frequency of occurrence of words. Limitations: ● Naive assumption: the occurrence of one word is independent of the occurrences of all other words. ● Information on the order of words is lost. ● OOV (out-of-vocabulary) words cannot be modelled.
  • 7. Neural Models - Word2Vec (Mikolov et al., 2013) King - Man + Woman ≈ Queen ● First revolution in NLP: neural models came into widespread use for learning word representations. ● CBOW - Predict the target word from its nearby context words. ● SkipGram - Predict the context words given the target word.
  • 8. Limitations of Word2Vec ● There is no representation for out-of-vocabulary (OOV) words. ● Some opposite word pairs are hard to separate: for example, “good” and “bad” are usually located very close to each other in the vector space, which may limit the performance of word vectors in NLP tasks like sentiment analysis. ● Embeddings are not context based: e.g. the word ‘crane’ can be used in different contexts, but word2vec gives it the same representation, leading to loss of information.
  • 9. Seq2seq models ● Use of GRUs and LSTMs. ● Second revolution in NLP. ● Tasks such as machine translation, question answering and sentence classification have been tackled using these models.
  • 10. ELMo, GPT - New Age in NLP: contextualized word embeddings ● Two strategies: feature-based and fine-tuning. ● ELMo (Peters et al.) is feature-based; GPT (Radford et al.) uses fine-tuning. ● They use unidirectional language models to learn general language representations. ● ELMo uses a bidirectional LSTM, i.e. a forward and a backward language model, each predicting the next word in its own direction. ● In OpenAI GPT, the authors use a left-to-right Transformer decoder.
  • 12. Bidirectional Encoder Representations from Transformers - BERT ● Devlin et al., 2018, Google Research ● The masked language model randomly masks some of the tokens from the input, and the objective is to predict the original vocabulary id of each masked token based only on its context. ● There are two steps in the framework: pre-training and fine-tuning. ● Pre-training is done first, on unlabeled data, over different pre-training tasks. ● For fine-tuning, the model is first initialized with the pre-trained parameters and then fine-tuned on different downstream tasks.
  • 13. Model Two models: ● BERT-Base - 12 encoder blocks - 110M parameters ● BERT-Large - 24 encoder blocks - 340M parameters ● The BERT encoder is a semi-supervised model pre-trained on two tasks: ● Masked Language Model: 15% of the tokens in the input are masked with [MASK] and the model learns to predict the masked tokens. ● Next Sentence Prediction: the model is trained to classify whether a particular sentence follows the given sentence or not. ● For the pre-training corpus the authors used the BooksCorpus (800M words) (Zhu et al., 2015) and English Wikipedia (2,500M words).
  • 14. Attention is all you need (Vaswani et al.) Working of a Transformer: ● Uses attention instead of recurrent units like LSTMs. ● Three trainable matrices are introduced, which project the input into Queries, Keys and Values. ● Attention alone loses the information about word order, hence position embeddings are used.
  • 16. Embedding from BERT
  • 17. Our Task ● Customer care call transcripts from a large-scale insurance aggregator. ● Hinglish (English + Hindi written in English script). ● Task: binary classification of whether a given person will buy the service or not. ● Pre-trained the model on a 171M-word corpus. ● Achieved a pre-training accuracy of 75% on MLM.
  • 18. XLNet, bigger is better? ● Focus on number of parameters ● How they used autoencoders (read about them) ● Latest results
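
As a rough illustration of the masked language model objective from slide 13, the sketch below selects about 15% of the tokens in a sentence and replaces them with [MASK], keeping the originals as prediction targets. The function name `mask_tokens`, the whitespace tokenization, the example sentence and the fixed seed are assumptions made for this example and are not part of the deck. The original BERT recipe also leaves some selected tokens unchanged or swaps in random tokens; that detail is omitted here.

```python
import random

MASK_TOKEN = "[MASK]"

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Return (masked_tokens, targets) for a simplified MLM example.

    targets maps each masked position to the original token that the
    model would be trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)  # fixed seed only to make the demo repeatable
    # Mask at least one token so short inputs still yield a target.
    n_to_mask = max(1, round(len(tokens) * mask_prob))
    positions = sorted(rng.sample(range(len(tokens)), n_to_mask))
    masked = list(tokens)
    targets = {}
    for pos in positions:
        targets[pos] = tokens[pos]   # what the model should recover
        masked[pos] = MASK_TOKEN     # what the model actually sees
    return masked, targets

# Hypothetical example sentence, split on whitespace for simplicity.
sentence = "the customer asked about the premium for the health insurance plan".split()
masked, targets = mask_tokens(sentence)
print(masked)
print(targets)
```

During pre-training the loss would be computed only at the positions stored in `targets`, which is why the function returns them alongside the masked sequence.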

Editor's Notes

  1. Converting all alphabet characters to lowercase, e.g. replacing “Word” with “word”. Using a predefined contractions dictionary to expand contractions, e.g. replacing “shouldn’t” with “should not”. Replacing digits with a fixed token, e.g. converting “$ 350” to “$ ###”.
  2. We use a combination of three pre-trained embeddings, GloVe, Paragram and FastText, to generate word embeddings. For each word we search, in order, for the original, lowercase, uppercase, capitalized, stemmed, lemmatized and spell-corrected versions to get the embedded vectors from these pre-trained embeddings.
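
The two notes above can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the names `CONTRACTIONS`, `normalize` and `lookup_embedding` are hypothetical, the contraction map is a tiny stand-in for a real one, and the lookup tries only the original, lowercase, uppercase and capitalized forms (stemming, lemmatization and spelling correction would need extra libraries and are left out).

```python
import re

# Tiny illustrative contraction map (assumption); a real one is far larger.
CONTRACTIONS = {
    "shouldn't": "should not",
    "can't": "cannot",
    "won't": "will not",
}

def normalize(text):
    """Lowercase, expand contractions, and replace every digit with '#'."""
    text = text.lower()
    # Treat the typographic apostrophe like the ASCII one before lookup.
    text = text.replace("\u2019", "'")
    for short, full in CONTRACTIONS.items():
        text = re.sub(r"\b" + re.escape(short) + r"\b", full, text)
    # Each digit becomes '#', so "$ 350" turns into "$ ###".
    return re.sub(r"\d", "#", text)

def lookup_embedding(word, embeddings):
    """Return the vector for the first surface form found, else None."""
    for form in (word, word.lower(), word.upper(), word.capitalize()):
        if form in embeddings:
            return embeddings[form]
    return None  # out-of-vocabulary under every form tried

# Toy embedding table standing in for GloVe / Paragram / FastText vectors.
toy_embeddings = {"insurance": [0.1, 0.2], "NLP": [0.3, 0.4]}
print(normalize("You shouldn\u2019t pay $ 350"))
print(lookup_embedding("Insurance", toy_embeddings))
print(lookup_embedding("nlp", toy_embeddings))
```

Trying several surface forms in a fixed order is a cheap way to reduce out-of-vocabulary misses before falling back to a default vector.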