In this session we take a brief journey through NLP computing, with a focus on deep learning. Three key building blocks - RNNs, word embeddings, and attention - are called upon to illustrate the success of Transformer models.
Agenda - Deep Learning techniques for Natural Language Processing (NLP)
• Introduction
• Use case landscape
• Shallow vs Deep
• Deep NLP - SOTA models
• BERT
• ERNIE
• REFORMER
• Implementation
• Q&A
Pic: Young Sheldon with the ELIZA chatbot
AI Trends for 2020
Process Automation
AI in health care
Voice/Chat interface
Federated learning
Ethical AI
Conversational AI - Trends
Digital Assistants for Enterprises/Solution Bots
Facilitating easy mail searches, managing meetings without hassle, assigning tasks, and accessing knowledge repositories and different applications with zero touch: these are some of the areas where a typical white-collar employee spends more than 25% of their effort. These non-value-adding activities can be performed smartly by an intelligent virtual assistant.
Augmented Reality in Conversational AI
AR in chatbots is a unique technology that can take engagement levels and usage to new heights.
No UI is the New UI
With the emergence of Conversational AI bots, you no longer have to look through multiple pages and tabs of a web/mobile app for information or task execution. You can simply query the bot, which does most of the work.
SMS 2.0: RCS messaging
One of the key channels where Conversational AI bots will be published is SMS. Rich Communication Services (RCS) is gradually replacing the conventional SMS channel.
Machine to Machine Conversations (M2M)
Conversational AI bots are used to trigger man-machine interaction, decipher information collected from IoT devices, draw insights, and make recommendations.
Basic Objectives of NLP computing models
• Understand semantics
• Lexical (word)
• Composition (Sentence)
• Discourse (long-term context)
• Understand syntax
• Understand context
• Understand intent
Leonard: Hey, Penny. How's work?
Penny: Great! I hope I'm a waitress at the Cheesecake
Factory for my whole life!
Sheldon: Was that sarcasm?
Penny: No.
Sheldon: Was that sarcasm?
Penny: Yes.
"The Financial Permeability," Season 2, The Big Bang Theory
NLP - Computing Domain
Shallow Learning
• POS tagging
• NER (Named Entity Recognition)
• Bag of Words
• TF-IDF (see the sketch after this list)
• LDA
• CRF (Conditional Random Field)
• SRL (Semantic Role Labelling)
• OCR
Deep Learning
• Word Embedding
• Sequence learning models: LSTM, RNN
• Encoder-Decoder models
• Attention
• Transformer
• Knowledge Graph
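As a quick illustration of the shallow side, here is a minimal TF-IDF sketch using scikit-learn; the toy corpus and the library choice are assumptions for illustration, not from the slides.

from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus (illustrative)
corpus = [
    "the bot answers questions",
    "the bot schedules meetings",
    "employees ask the bot questions",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)        # sparse matrix: documents x vocabulary
print(vectorizer.get_feature_names_out())   # learned vocabulary terms
print(X.toarray())                          # TF-IDF weight of each term per document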
“The Unreasonable Effectiveness of Recurrent Neural Networks”
http://karpathy.github.io/2015/05/21/rnn-effectiveness/
Figure: example RNN applications - image classification, image captioning, sentiment analysis, language translation, subtitle generation.
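To make the many-to-one case (e.g. sentiment analysis) concrete, here is a minimal RNN sketch in tf.keras; the vocabulary size, dimensions, and layer choices are illustrative assumptions.

import tensorflow as tf

vocab_size, embed_dim, hidden_dim = 10000, 64, 128   # hypothetical sizes

inputs = tf.keras.Input(shape=(None,), dtype="int32")          # variable-length token ids
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)   # ids -> dense vectors
x = tf.keras.layers.SimpleRNN(hidden_dim)(x)                   # many-to-one: final hidden state
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)    # binary sentiment score
model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])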
Word Embedding
• Vector space model
• Preserving semantics
• N-gram models
• Pre-trained models
• Building blocks for Language Model (LM)
Figure: word embedding space - related words such as Modi, Varanasi, Prime Minister, and constituency lie close together.
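A minimal word2vec training sketch using gensim; the toy corpus and parameters are illustrative assumptions (gensim 4.x uses vector_size where older versions used size).

from gensim.models import Word2Vec

# Toy tokenized corpus (illustrative)
sentences = [
    ["modi", "is", "the", "prime", "minister"],
    ["varanasi", "is", "the", "constituency", "of", "modi"],
]

model = Word2Vec(sentences, vector_size=50, window=3, min_count=1, sg=1)  # skip-gram
vec = model.wv["modi"]                   # 50-dimensional embedding vector
print(model.wv.most_similar("modi"))     # nearest neighbours in the embedding space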
Sequence to Sequence (Seq2Seq) Model
Courtesy: Analytics Vidhya
• Encoder and Decoder models
• The encoder and decoder can use any combination of RNN, LSTM, or CNN layers to realize the model, depending on performance and other requirements (a minimal sketch follows below)
• Uses an Attention mechanism for context preservation
Courtesy: Towards Data Science
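A minimal encoder-decoder (Seq2Seq) sketch in tf.keras, without attention for brevity; the token counts and latent dimension are illustrative assumptions.

import tensorflow as tf

num_encoder_tokens, num_decoder_tokens, latent_dim = 71, 93, 256  # hypothetical sizes

# Encoder: consume the source sequence, keep only the final LSTM states
encoder_inputs = tf.keras.Input(shape=(None, num_encoder_tokens))
_, state_h, state_c = tf.keras.layers.LSTM(latent_dim, return_state=True)(encoder_inputs)

# Decoder: generate the target sequence, initialized with the encoder states
decoder_inputs = tf.keras.Input(shape=(None, num_decoder_tokens))
decoder_lstm = tf.keras.layers.LSTM(latent_dim, return_sequences=True, return_state=True)
decoder_outputs, _, _ = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
decoder_outputs = tf.keras.layers.Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)

model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
model.compile(optimizer="rmsprop", loss="categorical_crossentropy")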
Transformer Based Language Models (LM)
• A Seq2Seq model with ‘Attention’ on steroids
Transformer Based LMs | Non-Transformer Based LMs
BERT (Google) | ELMo (AllenAI), ULMFiT (fast.ai), CoVe
GPT (OpenAI), GPT-2 | GloVe (Manning, Socher and others)
ERNIE (Baidu), XLNet (Google Brain & CMU) | Word2vec (Tomas Mikolov and others)
Figure: Transformer architecture - multi-head attention blocks; the decoder has an additional masked attention block.
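To make the attention block concrete, here is a minimal scaled dot-product attention sketch in numpy; the shapes and random inputs are illustrative.

import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarity, scaled
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the keys
    return weights @ V                              # attention-weighted sum of values

Q = np.random.randn(4, 8)   # 4 query positions, dimension 8
K = np.random.randn(6, 8)   # 6 key positions
V = np.random.randn(6, 8)   # one value vector per key
out = scaled_dot_product_attention(Q, K, V)   # shape (4, 8)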
BERT
• BERT stands for Bidirectional Encoder Representations from Transformers
import tensorflow as tf
import bert
from bert import tokenization
from bert import modeling
from bert import optimization
from bert import run_classifier

# Paths into the pre-trained cased BERT-Base checkpoint directory
BERT_VOCAB = 'cased_L-12_H-768_A-12/vocab.txt'
BERT_INIT_CHKPNT = 'cased_L-12_H-768_A-12/bert_model.ckpt'
BERT_CONFIG = 'cased_L-12_H-768_A-12/bert_config.json'

# WordPiece tokenizer; do_lower_case=False because this is the cased model
tokenizer = tokenization.FullTokenizer(vocab_file=BERT_VOCAB,
                                       do_lower_case=False)
bert_config = modeling.BertConfig.from_json_file(BERT_CONFIG)

# _Model is a user-defined wrapper (not part of the bert package) that builds
# the task-specific classifier from the config and tokenizer
model = _Model(bert_config, tokenizer)
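For illustration, the FullTokenizer can be exercised directly; the example sentence is made up.

tokens = tokenizer.tokenize("BERT handles rare words via WordPiece")
ids = tokenizer.convert_tokens_to_ids(tokens)
print(tokens)   # sub-word tokens; rare words are split into '##' pieces
print(ids)      # corresponding vocabulary ids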
BERT and Its Variants
• There are several variants of BERT for domain-specific use cases. A few are mentioned below
• VideoBERT: Learning Cross-Modal Temporal Representations from Unlabeled Videos
• TinyBERT, ALBERT, RoBERTa...
BERT Variant | Use case and Data set | Reference
SciBERT | Trained on papers from the corpus of semanticscholar.org | https://arxiv.org/abs/1903.10676
BioBERT | Trained on the BC5CDR and BioNLP13CG data sets | https://github.com/MeRajat/SolvingAlmostAnythingWithBert
ClinicalBERT | Trained on MIMIC-III data | https://arxiv.org/abs/1904.05342
FinBERT | TRC2-financial, Financial PhraseBank, FiQA Sentiment | https://arxiv.org/abs/1908.10063
ERNIE 2.0
• ERNIE (Enhanced Representation through kNowledge IntEgration) is a knowledge integration language representation model
• ERNIE 2.0 is built as a continual pre-training framework that continuously gains enhancement on knowledge integration through multi-task learning, enabling it to learn lexical, syntactic, and semantic information more fully from massive data
• ERNIE 2.0 can incrementally train on several new tasks in sequence and accumulate the knowledge obtained during the learning process to apply to future tasks
Highlights: Knowledge Graph incorporation; structured knowledge encoding; performs best for question-answering use cases.
Reformer: The Efficient Transformer
• A Transformer model designed to handle context windows of up to 1 million words on a single accelerator, using only 16 GB of memory
• Reformer uses locality-sensitive hashing (LSH) to reduce the complexity of attending over long sequences, and reversible residual layers to use the available memory more efficiently
• LSH handles large sequences in the attention layer by computing a hash function that groups similar vectors together, instead of searching through all possible pairs of vectors (a minimal sketch follows below)
• The second novel approach in Reformer is to recompute the input of each layer on demand during back-propagation, rather than storing it in memory. This is accomplished with reversible layers, where activations from the last layer of the network are used to recover activations from any intermediate layer, by what amounts to running the network in reverse
Figure - Reversible layers: (A) in a standard residual network, the activations from each layer are used to update the inputs into the next layer; (B) in a reversible network, two sets of activations are maintained, only one of which is updated after each layer; (C) this approach enables running the network in reverse to recover all intermediate values.
Figure - Locality-sensitive hashing: Reformer takes in an input sequence of keys, where each key is a vector representing individual words (or pixels, in the case of images) in the first layer and larger contexts in subsequent layers. LSH is applied to the sequence, after which the keys are sorted by their hash and chunked. Attention is applied only within a single chunk and its immediate neighbors.
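A minimal numpy sketch of the angular (random-rotation) LSH idea used to bucket similar key vectors; the dimensions and bucket count are illustrative assumptions.

import numpy as np

def lsh_buckets(x, n_buckets, seed=0):
    # Random rotation; [xR, -xR] covers n_buckets angular half-spaces,
    # so nearby vectors (small angle) tend to land in the same bucket
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((x.shape[-1], n_buckets // 2))
    h = np.concatenate([x @ R, -(x @ R)], axis=-1)
    return np.argmax(h, axis=-1)      # one bucket id per vector

keys = np.random.randn(16, 8)         # 16 key vectors of dimension 8
buckets = lsh_buckets(keys, n_buckets=4)
print(buckets)  # attention is then restricted to keys sharing a bucket (and neighbors)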
Multi-Lingual NLP (Multilingualism)
• The NLP community has shown interest in multilingual NLP for specific reasons (both research and business)
• Several developments in this space have come from multiple individuals and organizations
• Again, BERT has its own multilingual variant, mBERT (https://github.com/google-research/bert/blob/master/multilingual.md)
• A few more multilingual NLP network architectures are given below
• LASER (Language-Agnostic SEntence Representations) (https://github.com/facebookresearch/LASER)
• Multilingual Universal Sentence Encoder for Semantic Retrieval
• XLM (https://github.com/facebookresearch/XLM)
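As an illustration (not from the slides), one common way to load mBERT is through the Hugging Face transformers library; the example sentence is made up.

from transformers import BertTokenizer, BertModel

# Multilingual cased BERT checkpoint published by Google Research
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")

inputs = tokenizer("Namaste, duniya!", return_tensors="pt")  # any of 104 languages
outputs = model(**inputs)                 # contextual embeddings for the tokens
print(outputs.last_hidden_state.shape)    # (batch, sequence_length, hidden_size)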
Confidentiality Notice
This document and all information contained herein are the sole property of Tata Elxsi Limited and shall not be reproduced or disclosed to a third party without the express written consent of Tata Elxsi Limited.
www.tataelxsi.com
Thank You
Tata Elxsi
facebook.com/ElxsiTata twitter.com/tetataelxsi linkedin.com/company/tata-elxsi