Silversparro Technologies Pvt. Ltd.
Latest Trends in NLP: Exploring BERT
Milind Kudapa, Deep Learning Intern
Sourya Dipta Das, Deep Learning Intern
Outline
● Finite State Automata
● Bag of Words - Naive Bayes approach
● Word2Vec - CBOW and SkipGram
● Seq2seq models
● Attention and Transformer
● ELMo and GPT
● BERT
Why Representation Learning?
● Unlike images, whose pixel values are already numbers, words cannot be processed by machines as they are.
● A machine needs some numerical representation of each word.
● Hence, word embeddings: vectors of numbers that represent words.
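A minimal sketch of the step from words to numbers, assuming whitespace tokenization and an arbitrary embedding size of 3 (both illustrative choices). Each word gets an integer index, which can either be turned into a sparse one-hot vector or used to look up a row in a dense embedding table. Here the table is randomly initialized; in a trained model its values would be learned.

```python
import random

def build_vocab(tokens):
    """Assign each distinct token an integer index in order of first appearance."""
    vocab = {}
    for tok in tokens:
        vocab.setdefault(tok, len(vocab))
    return vocab

def one_hot(index, size):
    """Sparse representation: all zeros except a 1 at the word's index."""
    vec = [0] * size
    vec[index] = 1
    return vec

def init_embeddings(vocab_size, dim, seed=0):
    """Dense representation: one row of `dim` numbers per word (random here, learned in practice)."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]

tokens = "the cat sat on the mat".split()
vocab = build_vocab(tokens)
print(vocab)                               # {'the': 0, 'cat': 1, 'sat': 2, 'on': 3, 'mat': 4}
print(one_hot(vocab["cat"], len(vocab)))   # [0, 1, 0, 0, 0]
table = init_embeddings(len(vocab), dim=3)
cat_vector = table[vocab["cat"]]           # a 3-dimensional dense vector for "cat"
```

One-hot vectors grow with the vocabulary and treat every pair of words as equally different; dense embeddings are short and can place related words close together once trained.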
Finite State Automata
Example: equal (adj1) + -al (q1) + -iz (q2) + -e (q3) = equalize
● Rule based: states and transitions are defined by hand rather than learned from data.
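The automaton on the slide cannot be fully recovered from the text, so the following is a hypothetical toy FSA in the same spirit: it reads a word as a sequence of morphemes and accepts it only if the transitions end in an accepting state. The root list, state names and transitions are illustrative assumptions, not the slide's exact automaton.

```python
# Hypothetical toy lexicon and automaton; not the exact FSA from the slide.
ADJ_ROOTS = {"equal", "formal", "real"}

# (current state, input symbol) -> next state
TRANSITIONS = {
    ("q0", "ADJ"): "q1",    # read an adjective root
    ("q1", "iz"): "q2",     # verbalizing suffix -iz(e)
    ("q2", "e"): "q3",      # -e completes the verb: equalize
    ("q2", "ation"): "q4",  # -ation gives a noun: equalization
}
ACCEPTING = {"q1", "q3", "q4"}

def symbol(morpheme):
    """Map a morpheme to the symbol the automaton reads."""
    return "ADJ" if morpheme in ADJ_ROOTS else morpheme

def accepts(morphemes):
    """Run the FSA; reject as soon as no transition exists."""
    state = "q0"
    for m in morphemes:
        state = TRANSITIONS.get((state, symbol(m)))
        if state is None:
            return False
    return state in ACCEPTING

print(accepts(["equal", "iz", "e"]))      # True  -> "equalize"
print(accepts(["equal", "iz", "ation"]))  # True  -> "equalization"
print(accepts(["equal", "e"]))            # False
```

Every word the system can handle must be anticipated by a hand-written transition, which is the main weakness of the rule-based approach.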
Bag of words and Naive Bayes
How it works
● Build a vocabulary of known words.
● Represent each document by how often each vocabulary word occurs in it.
Limitations
● Naive assumption: the occurrence of one word is independent of the occurrences of all other words (given the class).
● Information on word order is lost.
● Out-of-vocabulary (OOV) words cannot be modelled.
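A minimal sketch of a bag-of-words multinomial Naive Bayes classifier, assuming lowercase whitespace tokenization and add-one (Laplace) smoothing; the tiny training set is invented for illustration. It makes the three limitations concrete: words are scored independently, order never enters the computation, and OOV words are simply skipped.

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class BagOfWordsNB:
    """Toy multinomial Naive Bayes over bag-of-words counts, with add-one smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocab = {w for d in docs for w in tokenize(d)}
        self.log_priors, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_priors[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in tokenize(d))
            self.totals[c] = sum(self.counts[c].values())
        return self

    def predict(self, doc):
        v = len(self.vocab)
        scores = {}
        for c in self.classes:
            score = self.log_priors[c]
            for w in tokenize(doc):
                if w not in self.vocab:
                    continue  # OOV word: the model has no statistics for it
                # Naive assumption: each word contributes independently given the class
                score += math.log((self.counts[c][w] + 1) / (self.totals[c] + v))
            scores[c] = score
        return max(scores, key=scores.get)

docs = ["good great service", "great product", "bad terrible service", "awful bad"]
labels = ["pos", "pos", "neg", "neg"]
nb = BagOfWordsNB().fit(docs, labels)
print(nb.predict("great service"))  # pos
print(nb.predict("service great"))  # pos (word order is ignored)
print(nb.predict("bad service"))    # neg
```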
Neural Models - Word2Vec
(Mikolov et al., 2013)
King - Man + Woman ≈ Queen
● Often called the first revolution in NLP, as it popularized learned neural word embeddings.
● CBOW: predict the target word from its surrounding context words.
● Skip-gram: predict the surrounding context words given the target word.
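The difference between CBOW and Skip-gram lies in how training examples are cut from the text. The sketch below only generates the (input, output) pairs for a context window of fixed size; training the network itself (for example with negative sampling) is omitted, and the window size and sentence are illustrative.

```python
def window_indices(i, n, window):
    """Indices of the context positions around position i (excluding i)."""
    return [j for j in range(max(0, i - window), min(n, i + window + 1)) if j != i]

def cbow_pairs(tokens, window=2):
    """CBOW: (context words -> target word) training examples."""
    return [([tokens[j] for j in window_indices(i, len(tokens), window)], tokens[i])
            for i in range(len(tokens))]

def skipgram_pairs(tokens, window=2):
    """Skip-gram: (target word -> one context word) training examples."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in window_indices(i, len(tokens), window)]

tokens = "the king rules the land".split()
print(cbow_pairs(tokens, window=1)[1])       # (['the', 'rules'], 'king')
print(skipgram_pairs(tokens, window=1)[:3])  # [('the', 'king'), ('king', 'the'), ('king', 'rules')]
```

CBOW produces one example per position with the whole context as input, while Skip-gram produces one example per (target, context word) pair, so it yields more training examples from the same text.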
Limitations of Word2Vec
There is no representation for out-of-vocabulary words (OOVs).
Antonyms are hard to separate. For example, “good” and “bad” appear in similar contexts, so they are usually located very close to each other in the vector space, which can limit the performance of word vectors in NLP tasks like sentiment analysis.
Embeddings are not contextual. For example, the word ‘crane’ (a bird or a machine) can be used in different contexts, but word2vec gives it a single representation, so information is lost.
Seq2seq models
● Encoder-decoder models built from recurrent units such as GRUs and LSTMs.
● Often called the second revolution in NLP.
● Used for tasks such as machine translation, question answering and sentence classification.
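A minimal sketch of the encoder half of a recurrent seq2seq model, assuming a GRU cell with biases omitted, small random weights and hypothetical sizes. The encoder reads the input one step at a time and compresses it into a fixed-size hidden state (the context vector) from which a decoder would start generating; the decoder is not shown.

```python
import math
import random

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """GRU cell without biases: update gate z, reset gate r, candidate state."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)
        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
        self.hidden_size = hidden_size
        self.W = {g: mat(hidden_size, input_size) for g in "zrh"}   # input weights
        self.U = {g: mat(hidden_size, hidden_size) for g in "zrh"}  # recurrent weights

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]
        rh = [ri * hi for ri, hi in zip(r, h)]  # reset gate scales the previous state
        cand = [math.tanh(a + b) for a, b in zip(matvec(self.W["h"], x), matvec(self.U["h"], rh))]
        # Update gate interpolates between the old state and the candidate
        return [zi * hi + (1 - zi) * ci for zi, hi, ci in zip(z, h, cand)]

def encode(cell, inputs):
    h = [0.0] * cell.hidden_size
    for x in inputs:
        h = cell.step(x, h)
    return h  # fixed-size context vector handed to the decoder

cell = GRUCell(input_size=3, hidden_size=4)
one_hot_sequence = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
context = encode(cell, one_hot_sequence)
print(len(context))  # 4: the same size whatever the input length
```

Squeezing a whole sentence into one fixed-size vector is the bottleneck that attention was later introduced to relieve.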
ELMo, GPT - New Age in NLP
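● Two strategies for applying pre-trained language representations: feature-based and fine-tuning.
● ELMo (Peters et al., 2018) is feature-based; OpenAI GPT (Radford et al., 2018) uses fine-tuning.
● Both use unidirectional language models to learn general language representations.
● ELMo concatenates a left-to-right and a right-to-left LSTM language model, each trained to predict the next word in its own direction, so it is only shallowly bidirectional.
● OpenAI GPT uses a left-to-right Transformer decoder: every token can attend only to the tokens before it.
Figure: contextualized word embeddings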
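The left-to-right restriction in GPT is usually enforced with an attention mask. A minimal sketch, assuming a boolean convention where `True` means "may attend"; the fully visible mask is included only for contrast with a bidirectional encoder such as BERT.

```python
def causal_mask(n):
    """mask[i][j] is True if position i may attend to position j (left-to-right, GPT-style)."""
    return [[j <= i for j in range(n)] for i in range(n)]

def full_mask(n):
    """Every position may attend to every other position (bidirectional, BERT-style)."""
    return [[True] * n for _ in range(n)]

for row in causal_mask(4):
    print(["x" if allowed else "." for allowed in row])
```

Each printed row shows which earlier positions one token can see; the lower-triangular shape is what makes the model unidirectional.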
Bidirectional Encoder Representations from Transformers - BERT
● Devlin et al., 2018, Google Research
● The masked language model (MLM) randomly masks some of the input tokens, and the objective is to predict the original vocabulary id of each masked word based only on its context.
● There are two steps in the framework: pre-training and fine-tuning.
● During pre-training, the model is trained on unlabeled data over different pre-training tasks.
● For fine-tuning, the model is first initialized with the pre-trained parameters, and all parameters are then fine-tuned using labeled data from the downstream tasks.
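A minimal sketch of MLM input corruption. It follows the 15% / 80-10-10 scheme described in the BERT paper (of the selected tokens, 80% become [MASK], 10% become a random token, 10% stay unchanged), but selects each token independently with probability 0.15, a common simplification; the original implementation picks a fixed number of positions and skips special tokens. The `vocab` argument used for random replacements is a hypothetical stand-in for the tokenizer's vocabulary.

```python
import random

MASK = "[MASK]"

def mask_tokens(tokens, vocab, mask_prob=0.15, seed=None):
    """Return (inputs, labels); labels[i] is the original token where a prediction is required."""
    rng = random.Random(seed)
    inputs = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue                       # not selected: no prediction at this position
        labels[i] = tok                    # the model must recover the original token
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token
    return inputs, labels

sentence = "the customer asked about the premium for the policy".split()
inputs, labels = mask_tokens(sentence, vocab=sentence, seed=1)
print(inputs)
print(labels)
```

Leaving some selected tokens unchanged or randomly replaced keeps the model from relying on [MASK] being present, since that token never appears during fine-tuning.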
Model
Two model sizes:
● BERT-Base: 12 encoder blocks, 110M parameters
● BERT-Large: 24 encoder blocks, 340M parameters
● BERT is pre-trained on unlabeled text (self-supervised) with two tasks:
● Masked Language Model: 15% of the input tokens are selected (most are replaced by [MASK]) and the model learns to predict the original tokens.
● Next Sentence Prediction: given two sentences A and B, the model classifies whether B actually follows A (in half of the training pairs it does; in the other half B is a random sentence from the corpus).
● For the pre-training corpus, the authors used BooksCorpus (800M words) (Zhu et al., 2015) and English Wikipedia (2,500M words).
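A minimal sketch of how one Next Sentence Prediction example can be assembled, assuming pre-tokenized sentences and a caller-supplied pool of sentences from other documents (both hypothetical inputs). The pair is packed as `[CLS] A [SEP] B [SEP]` with segment ids 0 and 1, and the label is chosen 50/50; in BERT the final hidden state of `[CLS]` feeds the IsNext/NotNext classifier.

```python
import random

def make_nsp_example(doc_sentences, other_sentences, index, rng):
    """Build one Next Sentence Prediction example starting at doc_sentences[index].

    doc_sentences: tokenized sentences of one document (lists of tokens).
    other_sentences: tokenized sentences drawn from other documents.
    """
    sent_a = doc_sentences[index]
    has_next = index + 1 < len(doc_sentences)
    if has_next and rng.random() < 0.5:
        sent_b, label = doc_sentences[index + 1], "IsNext"
    else:
        sent_b, label = rng.choice(other_sentences), "NotNext"
    tokens = ["[CLS]"] + sent_a + ["[SEP]"] + sent_b + ["[SEP]"]
    # Segment ids tell the model which sentence each token belongs to.
    segment_ids = [0] * (len(sent_a) + 2) + [1] * (len(sent_b) + 1)
    return tokens, segment_ids, label

doc = [["the", "cat", "sat"], ["it", "purred"]]
other = [["stocks", "fell"]]
print(make_nsp_example(doc, other, 0, random.Random(0)))
```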
Attention Is All You Need (Vaswani et al., 2017)
How a Transformer works:
● It uses attention instead of recurrent units such as LSTMs.
● Three trainable matrices project each input token into a query, a key and a value.
● Attention by itself ignores word order, so position embeddings (positional encodings) are added to the inputs.
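A minimal single-head sketch of scaled dot-product self-attention, softmax(QKᵀ/√d_k)V, in plain Python. The weight matrices `W_q`, `W_k`, `W_v` stand for the three trainable matrices on the slide; here they are passed in by the caller, and multi-head attention, masking and positional encodings are omitted.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def scaled_dot_product_attention(Q, K, V):
    d_k = len(K[0])
    # scores[i][j]: how strongly query i matches key j
    scores = [[sum(q * k for q, k in zip(q_row, k_row)) / math.sqrt(d_k) for k_row in K]
              for q_row in Q]
    weights = [softmax(row) for row in scores]  # each row sums to 1
    return matmul(weights, V), weights

def self_attention(X, W_q, W_k, W_v):
    """Project each token (row of X) to query, key and value, then attend."""
    Q, K, V = matmul(X, W_q), matmul(X, W_k), matmul(X, W_v)
    return scaled_dot_product_attention(Q, K, V)

I = [[1.0, 0.0], [0.0, 1.0]]           # identity projections, for illustration only
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = self_attention(X, I, I, I)
```

Reordering the rows of `X` only reorders the rows of the output, because nothing in the computation depends on position. This is why position information has to be added to the inputs.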
Embedding from BERT
Our Task
● Customer-care call transcripts from a large-scale insurance aggregator.
● Language: Hinglish (a mix of English and Hindi, with Hindi written in the Latin script).
● Task: binary classification of whether a given person will buy the service.
● Pre-trained the model on a 171M-word corpus.
● Reached a pre-training accuracy of 75% on the MLM task.
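The slide does not say exactly how the 75% MLM accuracy was measured. One common convention, assumed here, is to score only the positions that were selected for prediction, ignoring all unmasked tokens. A minimal sketch, where `labels[i]` is `None` for positions that carry no prediction target:

```python
def mlm_accuracy(predictions, labels):
    """Fraction of masked positions where the predicted token equals the original token."""
    scored = [(p, y) for p, y in zip(predictions, labels) if y is not None]
    if not scored:
        return 0.0
    return sum(p == y for p, y in scored) / len(scored)

# Positions 1 and 2 were masked; only they count towards accuracy.
print(mlm_accuracy(["a", "x", "c", "d"], [None, "b", "c", None]))  # 0.5
```

Counting unmasked positions as well would inflate the number, because those tokens are visible to the model.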
XLNet: is bigger better?
● Focus on the number of parameters
● How XLNet relates autoregressive pre-training to autoencoding models such as BERT
● Latest results

Editor's Notes

  1. Preprocessing: convert all alphabetic characters to lowercase, e.g. replace “Word” with “word”; expand contractions using a predefined contractions dictionary, e.g. replace “shouldn’t” with “should not”; replace digits with a fixed token, e.g. convert “$ 350” to “$ ###”.
  2. We use a combination of three models, GloVe, Paragram and FastText, to generate word embeddings. To get the embedding vectors from these pre-trained embeddings, we look up the original, lowercase, uppercase, capitalized, stemmed, lemmatized and spell-corrected versions of each word.
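A minimal sketch of the two notes above. The contractions map is a deliberately tiny, hypothetical sample, and digits are replaced one-for-one with `#` to match the “$ ###” example. The lookup tries spelling variants until one is found in a pre-trained embedding table; the order of the variants and the "first hit wins" rule are assumptions. The stemmer, lemmatizer and spell-corrector are passed in as plain functions, and `naive_stem` is a hypothetical stand-in, not a real stemmer.

```python
import re
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical, deliberately tiny contractions map; a real one is much larger.
CONTRACTIONS = {"shouldn't": "should not", "can't": "cannot", "won't": "will not"}

def preprocess(text: str) -> str:
    text = text.lower()                       # "Word" -> "word"
    text = text.replace("\u2019", "'")        # normalize curly apostrophes
    for short, full in CONTRACTIONS.items():  # "shouldn't" -> "should not"
        text = re.sub(r"\b" + re.escape(short) + r"(?!\w)", full, text)
    return re.sub(r"\d", "#", text)           # "$ 350" -> "$ ###"

def lookup_embedding(word: str,
                     embeddings: Dict[str, List[float]],
                     normalizers: Sequence[Callable[[str], str]] = ()) -> Optional[List[float]]:
    """Try spelling variants in turn; return the first vector found, else None."""
    candidates = [word, word.lower(), word.upper(), word.capitalize()]
    candidates += [normalize(word) for normalize in normalizers]  # e.g. stem, lemma, spell-fix
    for candidate in candidates:
        if candidate in embeddings:
            return embeddings[candidate]
    return None

def naive_stem(word: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips trailing 's'."""
    return word.rstrip("s")

print(preprocess("You shouldn\u2019t pay $ 350"))  # you should not pay $ ###
table = {"london": [0.1, 0.2], "Paris": [0.3, 0.4], "run": [0.5, 0.6]}
print(lookup_embedding("LONDON", table))                          # [0.1, 0.2]
print(lookup_embedding("runs", table, normalizers=[naive_stem]))  # [0.5, 0.6]
```

With several embedding models, the same lookup would simply be run once per table before the vectors are combined.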