The document provides an overview of the state of natural language processing (NLP) and Amazon's NLP offering, Amazon Comprehend. It discusses the evolution of NLP from rule-based systems to modern neural models such as the Transformer and BERT, and the increasing complexity of NLP tasks. The document also describes Amazon Comprehend's capabilities in areas such as sentiment analysis, named entity recognition, key phrase extraction, and language detection.
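As a minimal sketch of how these Comprehend capabilities are typically invoked from Python via boto3 (the region, sample text, and client setup are illustrative assumptions, not part of the original deck):

```python
import boto3

# Assumes AWS credentials are already configured; region is an arbitrary choice.
comprehend = boto3.client("comprehend", region_name="us-east-1")

text = "I love the new checkout flow, but shipping to Seattle was slow."

# Language detection
langs = comprehend.detect_dominant_language(Text=text)["Languages"]

# Sentiment analysis
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")

# Named entity recognition and key phrase extraction
entities = comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
phrases = comprehend.detect_key_phrases(Text=text, LanguageCode="en")["KeyPhrases"]

print(langs[0]["LanguageCode"], sentiment["Sentiment"])
print([e["Text"] for e in entities], [p["Text"] for p in phrases])
```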
How to Enhance your Application using Amazon Comprehend for NLP - AWS Online ... (Amazon Web Services)
Learning Objectives:
- How to use machine learning and NLP to find insights and relationships in text
- How to use new pre-built models in Amazon Comprehend
- How to create new models in Amazon SageMaker
How to fine-tune and develop your own large language model.pptx (Knoldus Inc.)
In this session, we will cover what large language models are and how to fine-tune a pre-trained LLM with our own data, including data preparation, model training, and model evaluation.
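As a rough illustration of that workflow, here is a minimal sketch of fine-tuning a pre-trained causal LM with the Hugging Face Trainer; the base model, dataset file, and hyperparameters are placeholder assumptions, not values from the session:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_name = "gpt2"  # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Data preparation: tokenize a small text corpus (placeholder file name).
raw = load_dataset("text", data_files={"train": "my_corpus.txt"})
def tok(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
train = raw["train"].map(tok, batched=True, remove_columns=["text"])

# Model training.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()

# Model evaluation (e.g. held-out perplexity) would follow the same pattern.
```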
OpenAI's GPT-3 Language Model - guest Steve Omohundro (Numenta)
In this research meeting, guest Stephen Omohundro gave a fascinating talk on GPT-3, the new massive OpenAI Natural Language Processing model. He reviewed the network architecture, training process, and results in the context of past work. There was extensive discussion on the implications for NLP and for Machine Intelligence / AGI.
Link to GPT-3 paper: https://arxiv.org/abs/2005.14165
Link to YouTube recording of Steve's talk: https://youtu.be/0ZVOmBp29E0
Vertex AI - Unified ML Platform for the entire AI workflow on Google Cloud (Márton Kodok)
The document discusses Vertex AI, Google Cloud's unified machine learning platform. It provides an overview of Vertex AI's key capabilities including gathering and labeling datasets at scale, building and training models using AutoML or custom training, deploying models with endpoints, managing models with confidence through explainability and monitoring tools, using pipelines to orchestrate the entire ML workflow, and adapting to changes in data. The conclusion emphasizes that Vertex AI offers an end-to-end platform for all stages of ML development and productionization with tools to make ML more approachable and pipelines that can solve complex tasks.
A presentation I did on the what, why, how, and benefits of centralized logging in the enterprise. This presentation was focused on implementing centralized logging in an environment that is mostly .NET/Windows.
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s... (Mihai Criveti)
Mihai is the Principal Architect for Platform Engineering and Technology Solutions at IBM, responsible for Cloud Native and AI Solutions. He is a Red Hat Certified Architect, CKA/CKS, a leader in the IBM Open Innovation community, and an advocate for open source development. Mihai is driving the development of Retrieval Augmented Generation platforms and solutions for Generative AI at IBM that leverage WatsonX, vector databases, LangChain, HuggingFace, and open source AI models.
Mihai will share lessons learned building Retrieval Augmented Generation, or "Chat with Documents", platforms and APIs that scale and deploy on Kubernetes. His talk will cover use cases for Generative AI, limitations of Large Language Models, and the use of RAG, vector databases, and fine-tuning to overcome model limitations and build solutions that connect to your data, provide content grounding, limit hallucinations, and form the basis of explainable AI. In terms of technology, he will cover LLAMA2, HuggingFace TGIS, SentenceTransformers embedding models using Python, LangChain, and the Weaviate and ChromaDB vector databases. He'll also share tips on writing code using LLMs, including building an agent for Ansible and containers.
Scaling factors for Large Language Model Architectures (a minimal caching-and-fallback sketch follows this list):
- Vector Database: consider sharding and High Availability
- Fine Tuning: collecting data to be used for fine tuning
- Governance and Model Benchmarking: how are you testing your model performance over time, with different prompts, one-shot, and various parameters
- Chain of Reasoning and Agents
- Caching embeddings and responses
- Personalization and Conversational Memory Database
- Streaming Responses and optimizing performance. A fine-tuned 13B model may perform better than a poor 70B one!
- Calling 3rd party functions or APIs for reasoning or other types of data (e.g., LLMs are terrible at reasoning and prediction, so consider calling other models)
- Fallback techniques: fall back to a different model, or default answers
- API scaling techniques, rate limiting, etc.
- Async, streaming and parallelization, multiprocessing, GPU acceleration (including embeddings), generating your API using OpenAPI, etc.
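To make two of these factors concrete, here is a minimal, library-free sketch of caching embeddings/responses and falling back to a default answer; embed_with_model and call_primary_model are hypothetical stubs standing in for real model calls:

```python
import functools
import hashlib

def embed_with_model(text):
    # Stub standing in for a real embedding-model call.
    return [float(len(text))]

def call_primary_model(prompt):
    # Stub standing in for a real LLM call that may fail or time out.
    return f"(model reply to: {prompt})"

@functools.lru_cache(maxsize=100_000)
def cached_embedding(text):
    # Repeated texts are embedded only once (embedding cache).
    return tuple(embed_with_model(text))

_response_cache = {}

def answer(prompt):
    key = hashlib.sha256(prompt.encode()).hexdigest()
    if key in _response_cache:                    # response cache hit
        return _response_cache[key]
    try:
        reply = call_primary_model(prompt)        # primary model
    except Exception:
        reply = "Sorry, I can't answer that right now."  # fallback default answer
    _response_cache[key] = reply
    return reply

print(answer("What is RAG?"))
```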
The GPT-3 model architecture is a transformer-based neural network that was trained on roughly 45 TB of text data. It is non-deterministic, in the sense that, given the same input, multiple runs of the engine may return different responses. It was trained on massive datasets drawn from a large crawl of the web, containing about 500B tokens, and has 175 billion parameters, a more than 100x increase over GPT-2, which was considered state of the art with 1.5 billion parameters.
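The non-determinism comes from sampling the next token from the model's output distribution rather than always taking the argmax; a minimal numpy illustration (the logits and temperature are made-up values, not GPT-3 internals):

```python
import numpy as np

def sample_next_token(logits, temperature=0.7, rng=np.random.default_rng()):
    # Temperature scaling followed by softmax; higher temperature = more randomness.
    scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = [2.0, 1.5, 0.3, -1.0]   # made-up scores over a tiny vocabulary
print([sample_next_token(logits) for _ in range(5)])  # output varies run to run
```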
Building, Evaluating, and Optimizing your RAG App for Production (Sri Ambati)
The document discusses optimizing question answering systems built as RAG (Retrieval-Augmented Generation) stacks. It outlines challenges with naive RAG approaches and proposes solutions like improved data representations, advanced retrieval techniques, and fine-tuning large language models. Table-stakes optimizations include tuning chunk sizes, prompt engineering, and customizing LLMs. More advanced techniques involve small-to-big retrieval, multi-document agents, embedding fine-tuning, and LLM fine-tuning.
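As one example of a table-stakes knob, here is a minimal sketch of fixed-size chunking with overlap, the kind of parameter RAG stacks typically tune; the sizes are arbitrary assumptions:

```python
def chunk_text(text, chunk_size=512, overlap=64):
    """Split text into overlapping character chunks for indexing and retrieval."""
    if chunk_size <= overlap:
        raise ValueError("chunk_size must exceed overlap")
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

# Smaller chunks tend to improve retrieval precision; larger chunks preserve context.
print(len(chunk_text("word " * 1000, chunk_size=256, overlap=32)))
```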
Langchain Framework is an innovative approach to linguistic data processing, combining the principles of language sciences, blockchain technology, and artificial intelligence. This deck introduces the groundbreaking elements of the framework, detailing how it enhances security, transparency, and decentralization in language data management. It discusses its applications in various fields, including machine learning, translation services, content creation, and more. The deck also highlights its key features, such as immutability, peer-to-peer networks, and linguistic asset ownership, that could revolutionize how we handle linguistic data in the digital age.
Regulating Generative AI - LLMOps pipelines with Transparency (Debmalya Biswas)
The growing adoption of Gen AI, esp. LLMs, has re-ignited the discussion around AI Regulations - to ensure that AI/ML systems are responsibly trained and deployed. Unfortunately, this effort is complicated by multiple governmental organizations and regulatory bodies releasing their own guidelines and policies with little to no agreement on the definition of terms.
Rather than trying to understand and regulate all types of AI, we recommend a different (and practical) approach in this talk based on AI Transparency: to transparently outline the capabilities of the AI system based on its training methodology and set realistic expectations with respect to what it can (and cannot) do.
We outline LLMOps architecture patterns and show how the proposed approach can be integrated at different stages of the LLMOps pipeline, capturing the model's capabilities. In addition, the AI system provider also specifies scenarios where (they believe that) the system can make mistakes, and recommends a "safe" approach with guardrails for those scenarios.
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers (Ivo Andreev)
Thank you for the overview of Florence and vision capabilities. Large foundational models continue advancing multimodal abilities in helpful ways when guided by principles of safety, transparency and accountability.
Holland & Barrett: Gen AI Prompt Engineering for Tech teamsDobo Radichkov
Â
Here are some key factors to consider when choosing between GPT models:
- Response quality: gpt-4/turbo will generally provide higher quality responses, though gpt-3.5 quality can be improved with techniques like few-shot learning.
- Speed: gpt-3.5 is significantly faster than gpt-4 models, processing prompts around 5x faster. This is important for real-time applications.
- Cost: gpt-3.5 is much more cost effective, around 15-30x cheaper per prompt than gpt-4.
So in summary, for applications where response quality is paramount, gpt-4 may be preferable. But for most use cases, gpt-3.5 offers a better balance of quality, speed, and cost.
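A minimal sketch of how such a trade-off might be encoded as a routing rule in application code; the model names and thresholds are illustrative assumptions, not recommendations:

```python
def pick_model(needs_high_quality, latency_budget_ms, cost_sensitive):
    """Toy router encoding the quality/speed/cost trade-off described above."""
    if needs_high_quality and latency_budget_ms > 5000 and not cost_sensitive:
        return "gpt-4-turbo"     # higher quality, slower and pricier
    return "gpt-3.5-turbo"       # faster and far cheaper for most use cases

print(pick_model(needs_high_quality=True, latency_budget_ms=2000, cost_sensitive=True))
```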
Natural language processing and transformer models (Ding Li)
The document discusses several approaches for text classification using machine learning algorithms:
1. Count the frequency of individual words in tweets and sum the counts for each tweet to create feature vectors for classification models like logistic regression. However, this loses some word context information.
2. Use Bayes' rule and calculate word probabilities conditioned on class to perform naive Bayes classification, with Laplace smoothing used to handle zero probabilities (a minimal sketch follows this list).
3. Incorporate word n-grams and context by calculating word probabilities within n-gram contexts rather than independently. This captures more linguistic information than the first two approaches.
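A minimal sketch of approach 2 with scikit-learn, where MultinomialNB's alpha parameter is the Laplace (add-one) smoothing term; the tiny dataset is made up:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

tweets = ["great game today", "terrible refereeing", "loved the match", "awful performance"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative (made-up examples)

# alpha=1.0 is Laplace smoothing for words unseen in a class.
clf = make_pipeline(CountVectorizer(), MultinomialNB(alpha=1.0))
clf.fit(tweets, labels)
print(clf.predict(["what a great performance"]))
```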
An introduction to the Transformers architecture and BERT (Suman Debnath)
The transformer is one of the most popular state-of-the-art (SOTA) deep learning architectures and is mostly used for natural language processing (NLP) tasks. Ever since the advent of the transformer, it has replaced RNNs and LSTMs for various tasks. The transformer also created a major breakthrough in the field of NLP and paved the way for revolutionary new architectures such as BERT.
The document discusses different methods for customizing large language models (LLMs) with proprietary or private data, including training a custom model, fine-tuning a general model, and prompting with expanded inputs. Fine-tuning techniques like low-rank adaptation and supervised fine-tuning allow emphasizing custom knowledge without full retraining. Prompt expansion using techniques like retrieval augmented generation can provide additional context beyond the character limit.
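A minimal sketch of low-rank adaptation using the PEFT library; the base model, rank, and target module names are assumptions (the right target modules depend on the architecture), not values from the document:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder base model

lora_cfg = LoraConfig(
    r=8,                        # low-rank dimension
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPT-2; varies by model
    task_type=TaskType.CAUSAL_LM,
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only the small adapter matrices are trained
```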
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.pdf (Po-Chuan Chen)
The document describes the RAG (Retrieval-Augmented Generation) model for knowledge-intensive NLP tasks. RAG combines a pre-trained language generator (BART) with a dense passage retriever (DPR) to retrieve and incorporate relevant knowledge from Wikipedia. RAG achieves state-of-the-art results on open-domain question answering, abstractive question answering, and fact verification by leveraging both parametric knowledge from the generator and non-parametric knowledge retrieved from Wikipedia. The retrieved knowledge can also be updated without retraining the model.
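The retrieve-then-generate pattern can be sketched without the paper's exact BART/DPR stack; here is a minimal illustration using a SentenceTransformers bi-encoder over a toy passage list (the model name and passages are assumptions, not the paper's setup):

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder dense retriever

passages = [
    "Amazon Comprehend is a natural language processing service.",
    "BART is a sequence-to-sequence model pre-trained as a denoising autoencoder.",
    "The Eiffel Tower is located in Paris.",
]
passage_emb = encoder.encode(passages, convert_to_tensor=True)

question = "Where is the Eiffel Tower?"
scores = util.cos_sim(encoder.encode(question, convert_to_tensor=True), passage_emb)[0]
top_passage = passages[int(scores.argmax())]

# A generator (e.g. BART) would then condition on the question plus top_passage.
prompt = f"context: {top_passage} question: {question}"
print(prompt)
```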
This document provides an overview of generative AI tools for project managers and includes prompts and examples for using ChatGPT to generate various project deliverables and analyses. It discusses tailoring prompts, recommended output formats, and includes examples of prompts for tasks like creating a cost-benefit analysis, business case, project charter, requirements traceability matrix, and more. The document aims to demonstrate how generative AI can assist with common project management activities.
What Is GPT-3 And Why Is It Revolutionizing Artificial Intelligence? (Bernard Marr)
GPT-3 is an AI tool created by OpenAI that can generate text in human-like ways. It has been trained on vast amounts of text from the internet. GPT-3 can answer questions, summarize text, translate languages, and generate computer code. However, it has limitations as its output can become gibberish for complex tasks and it operates as a black box system. While impressive, GPT-3 is just an early glimpse of what advanced AI may be able to accomplish.
An overview of some key concepts of chatbots, with some do's and don'ts.
We will happily present the high-resolution version of this presentation, extended with additional detailed slides, and a clear explanation at your offices. Contact us for that.
Delve into this insightful article to explore the current state of generative AI, its ethical implications, and the power of generative AI models across various industries.
Build Text Analytics Solutions with Amazon Comprehend and Amazon Translate (Amazon Web Services)
by Pratap Ramamurthy, Partner Solutions Architect, AWS
Natural language holds a wealth of information like user sentiment and conversational intent. In this session, we'll demonstrate the capabilities of Amazon Comprehend, a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. We'll show you how to build a VOC (Voice of the Customer) application and integrate it with other AWS services including AWS Lambda, Amazon S3, Amazon Athena, Amazon QuickSight, and Amazon Translate. We'll also show you additional methods for NLP available through Amazon SageMaker.
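A minimal sketch of the kind of Lambda handler such a VOC pipeline might use, translating incoming text to English and then scoring sentiment; the event shape and the surrounding S3/Athena wiring are assumptions, not the session's actual code:

```python
import boto3

translate = boto3.client("translate")
comprehend = boto3.client("comprehend")

def handler(event, context):
    """Hypothetical Lambda: translate a customer comment, then detect sentiment."""
    text = event["comment"]                      # assumed event shape
    translated = translate.translate_text(
        Text=text, SourceLanguageCode="auto", TargetLanguageCode="en"
    )["TranslatedText"]
    sentiment = comprehend.detect_sentiment(Text=translated, LanguageCode="en")
    return {"text": translated, "sentiment": sentiment["Sentiment"]}
```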
Building a Pipeline for State-of-the-Art Natural Language Processing Using Hu... (Databricks)
The document summarizes a presentation about state-of-the-art natural language processing (NLP) techniques. It discusses how transformer networks have achieved state-of-the-art results in many NLP tasks using transfer learning from large pre-trained models. It also describes how Hugging Face's Transformers library and Tokenizers library provide tools for tokenization and using pre-trained transformer models through a simple interface.
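A minimal sketch of that simple interface, using the Transformers pipeline API; the checkpoints downloaded are whatever defaults the library currently ships for each task:

```python
from transformers import pipeline

# Tokenization, model loading, and post-processing are hidden behind one call.
classifier = pipeline("sentiment-analysis")
print(classifier("Transfer learning makes state-of-the-art NLP surprisingly accessible."))

ner = pipeline("ner", aggregation_strategy="simple")
print(ner("Hugging Face is based in New York City."))
```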
As an AI language model, ChatGPT is a program consisting of a large neural network that has been trained on vast amounts of textual data. Specifically, ChatGPT is a variant of the GPT (Generative Pre-trained Transformer) family of models developed by OpenAI.
Generative AI models, such as ChatGPT and Stable Diffusion, can create new and original content like text, images, video, audio, or other data from simple prompts, as well as handle complex dialogs and reason about problems with or without images. These models are disrupting traditional technologies, from search and content creation to automation and problem solving, and are fundamentally shaping the future user interface to computing devices. Generative AI can apply broadly across industries, providing significant enhancements for utility, productivity, and entertainment. As generative AI adoption grows at record-setting speeds and computing demands increase, on-device and hybrid processing are more important than ever. Just like traditional computing evolved from mainframes to today's mix of cloud and edge devices, AI processing will be distributed between them for AI to scale and reach its full potential.
In this presentation you'll learn about:
- Why on-device AI is key
- Full-stack AI optimizations to make on-device AI possible and efficient
- Advanced techniques like quantization, distillation, and speculative decoding (a minimal quantization sketch follows this list)
- How generative AI models can be run on device and examples of some running now
- Qualcomm Technologies' role in scaling on-device generative AI
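As one concrete example of those techniques, here is a minimal sketch of post-training dynamic quantization in PyTorch; the toy model is a placeholder, and real on-device stacks typically use more elaborate, hardware-specific schemes:

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(768, 3072), nn.ReLU(), nn.Linear(3072, 768))

# Replace Linear layers with int8 dynamically quantized versions.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

x = torch.randn(1, 768)
print(quantized(x).shape)  # same interface, smaller weights, faster CPU inference
```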
Insight Asset Management in Jira and eazyBI Powered Insight Reporting (eazyBI)
Assets are all around us in our day-to-day work, whether it's IT assets, employees, customers, facilities, or something else. This talk covers how you can use Insight to help you manage assets within Jira and how eazyBI can provide flexible reports and an overview of your assets.
- Rickard Hyllenstam, Atlassian Consultant at Riada, Sweden
ChatGPT (Chat Generative Pre-trained Transformer) is OpenAI's application that performs human-like interactions. GitHub Copilot uses the OpenAI Codex to suggest code and entire functions in real time, right from your editor. The deck contains more details about ChatGPT, AI, AGI, Copilot, the OpenAI API, and use case scenarios.
Generative AI models like LLMs can be customized for specific tasks through techniques like prompt engineering, retrieval augmented generation, and fine-tuning. Prompt engineering involves providing contextual information to steer model responses, while retrieval augmented generation combines generative and retrieval models to improve performance. Fine-tuning customizes foundation models with domain-specific training data. The document discusses these techniques and their benefits, encouraging hands-on experience to learn how to best apply generative AI.
ICDM 2019 Tutorial: Speech and Language Processing: New Tools and Applications (Forward Gradient)
The document outlines an AI and NLP seminar in three parts: an introduction, natural language processing, and speech. Part II, on NLP, covers topics like word representations, sentence representations, NLP benchmarks, multilingual representations, and applications of text and graph embeddings. Part III, on speech, discusses speech recognition approaches and multimodal speech and text for emotion recognition.
This was presented to software developers with the goal of introducing them to the basic machine learning workflow, code snippets, possibilities, and the state of the art in NLP, and giving some clues on where to get started.
RESTing in the ALPS - Mike Amundsen's Presentation from QCon London 2013 (CA API Management)
The document discusses the author's realization that his previous work on H-Factors for describing protocol affordances was missing a consideration of "application affordances". This led to the idea of "vocabularies" for describing shared understanding at the application level. However, vocabularies alone do not describe how to interact with application concepts. The author proposes a new specification called ALPS that combines the description of application concepts ("what") using vocabularies, along with the description of how to interact with those concepts using hypermedia controls and protocols ("how"). ALPS aims to provide shared understanding of both the state and transitions of application domains across different media types and implementations.
Neural Text Embeddings for Information Retrieval (WSDM 2017) (Bhaskar Mitra)
The document describes a tutorial on using neural networks for information retrieval. It discusses an agenda for the tutorial that includes fundamentals of IR, word embeddings, using word embeddings for IR, deep neural networks, and applications of neural networks to IR problems. It provides context on the increasing use of neural methods in IR applications and research.
The NLP muppets revolution! @ Data Science London 2019
video: https://skillsmatter.com/skillscasts/13940-a-deep-dive-into-contextual-word-embeddings-and-understanding-what-nlp-models-learn
event: https://www.meetup.com/Data-Science-London/events/261483332/
Neural networks have a long and rich history in automatic speech recognition. In this talk, we present a brief primer on the origin of deep learning in spoken language, and then explore todayâs world of Alexa. Alexa is the AWS service that understands spoken language and powers Amazon Echo. Alexa relies heavily on machine learning and deep neural networks for speech recognition, text-to-speech, language understanding, and more. We also discuss the Alexa Skills Kit, which lets any developer teach Alexa new skills.
Bridging the gap between AI and UI - DSI Vienna - full version (Liad Magen)
This is a summary of the latest research on model interpretability, including recurrent neural networks (RNNs) for natural language processing (NLP), in terms of what's in an RNN.
In addition, it contains suggestions for improving machine-learning-based user interfaces, to engage users and encourage them to contribute data to adapt the models to them.
This document provides an overview of natural language processing (NLP) and the use of deep learning for NLP tasks. It discusses how deep learning models can learn representations and patterns from large amounts of unlabeled text data. Deep learning approaches are now achieving superior results to traditional NLP methods on many tasks, such as named entity recognition, machine translation, and question answering. However, deep learning models do not explicitly model linguistic knowledge. The document outlines common NLP tasks and how deep learning algorithms like LSTMs, CNNs, and encoder-decoder models are applied to problems involving text classification, sequence labeling, and language generation.
Devoxx - Traitement automatique du langage sur du texte en 2019 (Alexis Agahi)
This document contains a summary of a presentation on natural language processing of text given at Devoxx in April 2019. It discusses using natural language processing for contract management, data extraction, and review. The document also mentions using a machine learning pipeline to analyze documents and extract titles.
Natural language processing (NLP) involves developing systems that can process and understand human language. This document discusses NLP tools and techniques for representing text numerically so it can be analyzed by machine learning algorithms. It covers topics like tokenization, part-of-speech tagging, named entity recognition, vector space models, term frequency-inverse document frequency (TF-IDF) weighting, and word embeddings which represent words as dense vectors of numbers. Popular Python libraries for NLP and text analysis are also introduced.
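A minimal sketch of the TF-IDF representation described here, using scikit-learn; the toy corpus is made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

corpus = [
    "natural language processing turns text into numbers",
    "word embeddings represent words as dense vectors",
    "tf idf weights words by how distinctive they are",
]
vectorizer = TfidfVectorizer()           # tokenization + TF-IDF weighting
X = vectorizer.fit_transform(corpus)     # sparse document-term matrix
print(X.shape, vectorizer.get_feature_names_out()[:5])
```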
Gadgets pwn us? A pattern language for CALL (Lawrie Hunter)
The document discusses creating a pattern language for computer-assisted language learning (CALL). It explores the concept of a pattern language as defined by Christopher Alexander and proposes a framework for creating a CALL pattern language in the era of web 2.0. The paper seeks to rework concepts from other fields, like "formal learning design expression" and "task arc," and have participants brainstorm elements to include through graphical challenges. The overall goal is to establish foundational patterns for CALL work.
Understanding Names with Neural Networks - May 2020 (Basis Technology)
The document discusses name matching techniques using neural networks. It describes how earlier techniques like Hidden Markov Models (HMMs) had limitations in capturing context around character sequences in names. The researchers at Basis Technology developed a sequence-to-sequence model using long short-term memory (LSTM) neural networks to transliterate names between languages. While more accurate, the LSTM model was slower than HMMs. To address this, they explored using a convolutional neural network which provided speed improvements while maintaining accuracy gains over HMMs. The researchers concluded that name matching remains an open problem but data-driven neural approaches hold promise for continued advances.
The document provides an overview of machine learning for natural language processing (NLP) tasks. It discusses framing NLP problems as supervised learning tasks, preprocessing text, feature extraction using the FEX tool, and examples of NLP tasks like part-of-speech tagging and named entity recognition that can be solved using these techniques. It also describes the typical components of a machine learning system for NLP, including preprocessing, feature extraction, classifiers, and evaluation.
This document provides an overview of natural language processing (NLP) tools and resources that can be used to build a machine learning classifier to identify the fame of people mentioned in news articles. It describes NLP tasks like tokenization, part-of-speech tagging, chunking, named entity recognition, parsing, and coreference resolution. It also introduces libraries like the Curator for accessing NLP tools, Edison for feature extraction, and Learning Based Java for building the classifier. Finally, it demonstrates connecting all the pieces to construct a system that can label famous people as politicians, athletes, or corporate moguls.
Recurrent neural networks (RNNs) are well-suited for analyzing text data because they can model sequential and structural relationships in text. RNNs use gating mechanisms like LSTMs and GRUs to address the problem of exploding or vanishing gradients when training on long sequences. Modern RNNs trained with techniques like gradient clipping, improved initialization, and optimized training algorithms like Adam can learn meaningful representations from text even with millions of training examples. RNNs may outperform conventional bag-of-words models on large datasets but require significant computational resources. The author describes an RNN library called Passage and provides an example of sentiment analysis on movie reviews to demonstrate RNNs for text analysis.
This document provides an introduction to natural language processing (NLP) and the Natural Language Toolkit (NLTK) module for Python. It discusses how NLP aims to develop systems that can understand human language at a deep level, lists common NLP applications, and explains why NLP is difficult due to language ambiguity and complexity. It then describes how corpus-based statistical approaches are used in NLTK to tackle NLP problems by extracting features from text corpora and using statistical models. The document gives an overview of the main NLTK modules and interfaces for common NLP tasks like tagging, parsing, and classification. It provides an example of word tokenization and discusses tokens and types in NLTK.
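A minimal sketch of the word tokenization and tagging interfaces mentioned above; note that the exact resource names passed to nltk.download can vary slightly across NLTK versions:

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

sentence = "Amazon Comprehend analyzes text written in natural language."
tokens = nltk.word_tokenize(sentence)      # tokens (instances)
types = set(tokens)                        # types (distinct tokens)
print(tokens)
print(nltk.pos_tag(tokens))                # part-of-speech tags
```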
By Gianluca Maruzzella, Enrico Bertino, Marija Zdolsek and Mario Beraha (Indigo.ai https://ndg.ai/).
What is the magic behind chatbots? We explore the most advanced deep learning models that allow chatbots to facilitate communication between companies and users. Several example architectures and their applications in the chatbot world are shown; in particular, we cover BiLSTM, Seq2Seq with attention, CNNs, and VAEs, and analyze their pros and cons.
Presented at the event https://www.meetup.com/Milano-Chatbots-Meetup/events/255234805
Similar to State of NLP and Amazon Comprehend (20)
Pushing the limits of ePRTC: 100ns holdover for 100 days (Adtran)
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
AI 101: An Introduction to the Basics and Impact of Artificial Intelligence (IndexBug)
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Infrastructure Challenges in Scaling RAG with Custom AI models (Zilliz)
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We'll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we'll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Programming Foundation Models with DSPy - Meetup Slides (Zilliz)
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
GraphRAG for Life Science to increase LLM accuracy (Tomaz Bratanic)
GraphRAG for the life science domain, where you retrieve information from biomedical knowledge graphs using LLMs to increase the accuracy and performance of generated answers.
UiPath Test Automation using UiPath Test Suite series, part 5 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with DevOps.
Topics covered:
CI/CD within UiPath
End-to-end overview of CI/CD pipeline with Azure DevOps
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Essentials of Automations: The Art of Triggers and Actions in FME (Safe Software)
In this second installment of our Essentials of Automations webinar series, we'll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We'll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you're tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let's turn complexity into clarity and make your workspaces work wonders!
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdf (Paige Cruz)
Monitoring and observability aren't traditionally found in software curriculums, and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is part of our current company's observability stack.
While the dev and ops silo continues to crumble, many organizations still relegate monitoring and observability to the purview of ops, infra, and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party, and will share these foundational concepts to build on.
HCL Notes und Domino Lizenzkostenreduzierung in der Welt von DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to solve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some approaches that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new licensing model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
In the rapidly evolving landscape of technologies, XML continues to play a vital role in structuring, storing, and transporting data across diverse systems. The recent advancements in artificial intelligence (AI) present new methodologies for enhancing XML development workflows, introducing efficiency, automation, and intelligent capabilities. This presentation will outline the scope and perspective of utilizing AI in XML development. The potential benefits and the possible pitfalls will be highlighted, providing a balanced view of the subject.
We will explore the capabilities of AI in understanding XML markup languages and autonomously creating structured XML content. Additionally, we will examine the capacity of AI to enrich plain text with appropriate XML markup. Practical examples and methodological guidelines will be provided to elucidate how AI can be effectively prompted to interpret and generate accurate XML markup.
Further emphasis will be placed on the role of AI in developing XSLT, or schemas such as XSD and Schematron. We will address the techniques and strategies adopted to create prompts for generating code, explaining code, or refactoring the code, and the results achieved.
The discussion will extend to how AI can be used to transform XML content. In particular, the focus will be on the use of AI XPath extension functions in XSLT, Schematron, Schematron Quick Fixes, or for XML content refactoring.
The presentation aims to deliver a comprehensive overview of AI usage in XML development, providing attendees with the necessary knowledge to make informed decisions. Whether you're at the early stages of adopting AI or considering integrating it in advanced XML development, this presentation will cover all levels of expertise.
By highlighting the potential advantages and challenges of integrating AI with XML development tools and languages, the presentation seeks to inspire thoughtful conversation around the future of XML development. We'll not only delve into the technical aspects of AI-powered XML development but also discuss practical implications and possible future directions.
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor... (SOFTTECHHUB)
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
HCL Notes and Domino License Cost Reduction in the World of DLAU (panagenda)
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don't worry, we can help with all of this!
We'll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We'll provide examples and solutions for those as well. And naturally we'll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able to lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
UiPath Test Automation using UiPath Test Suite series, part 6 (DianaGray10)
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and Open AI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Climate Impact of Software Testing at Nordic Testing Days (Kari Kakkonen)
My slides at Nordic Testing Days 6.6.2024
Climate impact / sustainability of software testing is discussed in the talk. ICT and testing must carry their part of global responsibility to help with climate warming. We can minimize the carbon footprint, but we can also have a carbon handprint, a positive impact on the climate. Quality characteristics can be extended with sustainability, and then measured continuously. Test environments can be used less, and at smaller scale and on demand. Test techniques can be used to optimize or minimize the number of tests. Test automation can be used to speed up testing.
Building Production Ready Search Pipelines with Spark and Milvus (Zilliz)
Spark is the widely used ETL tool for processing, indexing and ingesting data to serving stack for search. Milvus is the production-ready open-source vector database. In this talk we will show how to use Spark to process unstructured data to extract vector representations, and push the vectors to Milvus vector database for search serving.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we'll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we'll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources, from PDF floorplans to web pages, using FME transformers like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it's populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We'll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features available on those devices, but many features that provide convenience and capability also sacrifice security. This best practices guide outlines steps users can take to better protect personal devices and information.
National Security Agency - NSA mobile device best practices
State of NLP and Amazon Comprehend
1. State of NLP & Amazon Comprehend
Brief Overview of the State of NLP and AWS NLP Offering
Egor Pushkin, Principal Engineer, Amazon AI
January 2020
2. Introductions
Presenter: machine learning at AWS, Amazon AI; Internet-scale services and global mobile deployments at Glympse
This Talk
Audience: how many people are familiar with the T5 model? BERT? Transformer?
Given a passage of natural text and a question, find a span in the initial passage that constitutes the answer (if one is available).
4. Dark Ages... or Once Upon a Time...
Rule-based Systems
- hardcoded rules
- regular expressions
Statistical / Shallow Neural
- corpus-based statistics
- feature engineering
- linear models
5. Stepping Stones / Driving Forces
Perceptron / Backpropagation
Automatic Differentiation
Deep Neural / CNN / LSTM / Transformer
Computational Resources / GPU
Data Availability
Attention to the Space
6. Natural Language Understanding
Language Analysis
Task → Dataset → Challenge → Model
What is "progress" in NLU?
higher accuracy on existing tasks and datasets
new (more complex) tasks and datasets
new categories of problems
"Language shapes the way we think, and determines what we can think about."
- Benjamin Lee Whorf
Categories of NLU Tasks
sequence tagging
semantic relations
question answering
classification
language generation
...
7. Journey to common sense reasoning (hopefully) via increasing task complexity
SQuAD 1.1
In 1935, in an annual birthday celebration interview, Tesla announced a method of transmitting mechanical energy with minimal loss over any terrestrial distance, a related new means of communication, and a method of accurately determining the location of underground mineral deposits.
How far did he claim the mechanical energy could be transmitted?
DROP
The median age in the city was 22.1 years. 10.1% of residents were under the age of 18; 56.2% were between the ages of 18 and 24; 16.1% were from 25 to 44; 10.5% were from 45 to 64; and 7% were 65 years of age or older. The gender makeup of the city was 64.3% male and 35.7% female.
Which age group was the second largest?
CommonSense QA
Why do people read gossip magazines?
entertained | get information | learn | improve know how | lawyer told to
...
Non Existing QA Challenge
???
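The DROP example above requires discrete reasoning over the passage rather than simple span extraction. A minimal sketch of that reasoning in Python, with the percentages copied from the passage (the dictionary and variable names are illustrative only):

```python
# Discrete reasoning behind "Which age group was the second largest?"
# Percentages are taken from the DROP passage above.
age_groups = {
    "under 18": 10.1,
    "18 to 24": 56.2,
    "25 to 44": 16.1,
    "45 to 64": 10.5,
    "65 or older": 7.0,
}

# Sort groups by share of residents, largest first, and pick the second entry.
ranked = sorted(age_groups.items(), key=lambda kv: kv[1], reverse=True)
print(ranked[1][0])  # -> "25 to 44", the second-largest group
```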
8. Natural Language Modeling
P( Summer is over | LMs are essential to NLP )
"You shall know the nature of a word by the company it keeps."
- John Rupert Firth
Fundamentals: a language model assigns a probability P(w) to a word sequence and P(w_i | w_{i-n+1:i-1}) to the next word given its history, e.g. P( time | Once upon a ).
Evolution: count-based LMs → continuous-space LMs (shallow) → recurrent/Transformer LMs (>100M parameters, >10GB of pre-training data).
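To ground the count-based end of that evolution, here is a minimal sketch of a bigram language model that estimates P(next word | previous word) from corpus counts; the toy corpus and function names are illustrative, not from the talk:

```python
# A toy count-based bigram language model: P(next | prev) from co-occurrence counts.
from collections import Counter, defaultdict

corpus = ("once upon a time there was a princess . "
          "once upon a time there was a dragon .").split()

bigram_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigram_counts[prev][nxt] += 1

def prob(nxt, prev):
    """Maximum-likelihood estimate of P(nxt | prev)."""
    counts = bigram_counts[prev]
    total = sum(counts.values())
    return counts[nxt] / total if total else 0.0

print(prob("time", "a"))     # cf. P( time | Once upon a ) on the slide
print(prob("upon", "once"))  # 1.0 in this toy corpus
```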
9. Word Embeddings
Input representation
Semantic embeddings: word2vec, GloVe, FastText (e.g. the walked : walking :: swam : swimming analogy)
Contextualized word embeddings: ELMo, BERT ("Hello? Is there anybody in there?")
Word pieces: surrealistic existentialism → surreal ##istic existent ##ial ##ism ...
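A minimal sketch of word-piece tokenization using the Hugging Face tokenizer for a standard BERT checkpoint (the checkpoint name is an assumption, and the exact splits depend on its vocabulary, so they may differ from the slide's example):

```python
# Word-piece tokenization: rare words are split into sub-word units marked with "##".
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
print(tokenizer.tokenize("surrealistic existentialism"))
# e.g. ['surreal', '##istic', 'existential', '##ism'] -- exact split depends on the vocabulary
```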
10. Sequence to Sequence
Encoder/decoder architecture
Sequence to Sequence Learning with Neural Networks (Sutskever et al., 2014)
[Diagram: the encoder reads the input sequence "A B C <EOS>" and its state becomes the internal representation; the decoder, starting from <BOS>, generates the output sequence "W X Y Z <EOS>".]
Encoder: input and state as the input sequence is fed into it.
Decoder: state and output while the output sequence is being generated.
Inference: previously generated output is added back to the network, giving it context on what has already been produced.
Training: the actual (or expected) output from the training dataset is used instead when the model is trained with teacher forcing enabled.
Applications: translation, language generation, multi-modal applications.
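A minimal PyTorch sketch of the loop described above, contrasting teacher forcing at training time with feeding generated tokens back at inference time; the vocabulary, token ids, and dimensions are toy assumptions, not the paper's setup:

```python
# Encoder/decoder with teacher forcing (training) vs. feedback of generated tokens (inference).
import torch
import torch.nn as nn

vocab_size, emb_dim, hidden = 12, 16, 32
embed = nn.Embedding(vocab_size, emb_dim)
encoder = nn.GRU(emb_dim, hidden, batch_first=True)
decoder = nn.GRU(emb_dim, hidden, batch_first=True)
project = nn.Linear(hidden, vocab_size)        # decoder state -> output token logits

BOS, EOS = 1, 2
src = torch.tensor([[3, 4, 5, EOS]])           # "A B C <EOS>"
tgt = torch.tensor([[6, 7, 8, 9, EOS]])        # "W X Y Z <EOS>"

# Encoder: its final hidden state is the "internal representation".
_, state = encoder(embed(src))

# Training with teacher forcing: the ground-truth previous token is fed in at every step.
decoder_inputs = torch.cat([torch.tensor([[BOS]]), tgt[:, :-1]], dim=1)
dec_out, _ = decoder(embed(decoder_inputs), state)
logits = project(dec_out)                                       # [1, tgt_len, vocab_size]
loss = nn.functional.cross_entropy(logits.squeeze(0), tgt.squeeze(0))

# Inference: the previously generated token is fed back into the decoder.
token, hidden_state, generated = torch.tensor([[BOS]]), state, []
for _ in range(10):
    out, hidden_state = decoder(embed(token), hidden_state)
    token = project(out).argmax(dim=-1)                         # greedy decoding
    generated.append(token.item())
    if token.item() == EOS:
        break

print(loss.item(), generated)
```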
11. Attention
Neural Machine Translation by Jointly Learning to Align and Translate (Bahdanau et al., 2014)
(Prior to attention) the context vector (encoder state) tended to forget things...
The attention mechanism was introduced to memorize long sentences.
Was born for translation.
[Diagram: the decoder state acts as a query q over the encoder states; the weighted encoder states are summed into a context vector for each output token.]
Each item is dot-producted with the query to produce a score describing how well it matches the query. The scores are fed into a softmax to create the attention distribution.
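A minimal sketch of that scoring step: each encoder state is dot-producted with the decoder query, the scores go through a softmax, and the resulting distribution weights the encoder states into a context vector (shapes and tensors here are illustrative):

```python
# Dot-product attention: score, softmax, weighted sum.
import torch

torch.manual_seed(0)
encoder_states = torch.randn(4, 8)        # one vector per source token, e.g. "A B C <EOS>"
query = torch.randn(8)                    # current decoder state q

scores = encoder_states @ query           # how well each item matches the query
weights = torch.softmax(scores, dim=0)    # attention distribution over source tokens
context = weights @ encoder_states        # context vector fed back into the decoder

print(weights, context.shape)
```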
12. Transformer
Multi-head self-attention
Attention Is All You Need (Vaswani et al., 2017)
Encoder/decoder stacks
[Diagram: self-attention derives queries (q), keys (k), and values (v) from the same sequence; the outputs of multiple attention heads are combined.]
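A minimal sketch of multi-head self-attention using PyTorch's built-in module, where queries, keys, and values all come from the same sequence; the sizes are illustrative, not the paper's configuration:

```python
# Multi-head self-attention over a single toy sequence.
import torch
import torch.nn as nn

seq_len, d_model, n_heads = 5, 16, 4
x = torch.randn(1, seq_len, d_model)      # a batch of one token sequence

mha = nn.MultiheadAttention(embed_dim=d_model, num_heads=n_heads, batch_first=True)
out, attn_weights = mha(x, x, x)          # self-attention: query = key = value = x

print(out.shape)           # (1, 5, 16): one combined output per input position
print(attn_weights.shape)  # (1, 5, 5): attention distribution (averaged over heads)
```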
13. BERT
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (Devlin et al., 2018)
Input: word-pieces, positional encoding, segment embeddings
Learning objectives: masked language modeling, next sentence prediction
Large-scale pre-training: English Wikipedia, BooksCorpus
Bidirectional, autoencoder
[Diagram: tokenized input is mapped to embeddings and passes through a transformer encoder stack (the Transformer's contribution) to produce token representations; in the pre-training setup, the LM learning objective at training time adds a fully-connected layer + GELU + norm and embeddings' + softmax over output probabilities; in the fine-tuning setup, additional layer(s) produce the output of the downstream task.]
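A minimal sketch of the masked language modeling objective at work, using a pre-trained BERT checkpoint through the Hugging Face fill-mask pipeline (the checkpoint name is an assumption for illustration):

```python
# Masked language modeling: predict the token hidden behind [MASK].
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")
for candidate in fill_mask("Once upon a [MASK], in a land far away."):
    # Each candidate carries a predicted token and its probability score.
    print(candidate["token_str"], round(candidate["score"], 3))
```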
14. Transfer Learning via Fine-tuning LM Models
"NLP's ImageNet moment has arrived"
- Sebastian Ruder
Transfer Learning in Natural Language Processing (quick ~230 slide overview)
Example tasks: Classification, NER, Question Answering, Relation Extraction, Coreference Resolution, Natural Language Inference, ...
Transfer learning taxonomy:
- Inductive Transfer Learning (different tasks; labeled data in target domain)
  - Multi-task Learning (tasks learned simultaneously)
  - Sequential Transfer Learning (tasks learned sequentially)
- Transductive Transfer Learning (same tasks; labeled data only in source domain)
  - Domain Adaptation (different domains)
  - Cross-lingual Learning (different languages)
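A minimal sketch of sequential transfer learning via fine-tuning: a pre-trained encoder gets a freshly initialized classification head, and both are trained together on the target task. The checkpoint name, label count, and example sentences are assumptions for illustration:

```python
# Fine-tuning a pre-trained encoder with a new classification head.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

batch = tokenizer(["the movie was great", "the movie was terrible"],
                  padding=True, return_tensors="pt")
labels = torch.tensor([1, 0])

# One fine-tuning step: encoder weights are updated together with the new head.
outputs = model(**batch, labels=labels)
outputs.loss.backward()
print(outputs.loss.item(), outputs.logits.shape)
```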
15. BERT for QA
"How far ... transmitted?"
Extractive, open-domain question answering based on the given context (SQuAD 1.1).
Training: fine-tune the BERT encoder layers, train the dense layer from scratch; the tokenizer vocabulary stays intact.
Inference:
[Diagram: the passage ("In 1935, in ... mineral deposits.") and the question ("How far did he claim the mechanical energy could be transmitted?") are run through the BERT Tokenizer into
[CLS] 'In' '1935' ',' 'in' ... 'mineral' 'deposits' '.' [SEP] 'how' 'far' ... 'transmitted' '?'
with a 0/1 segment mask separating the passage (p) from the question (q). BERT's token representations feed a fully-connected layer with a weight matrix of size [T*H, 2] that predicts the Start Position and End Position of the answer span "... minimal loss over any terrestrial distance, a related new means of ...".]
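A minimal sketch of the same extractive QA setup using a publicly available SQuAD-fine-tuned checkpoint via the Hugging Face pipeline; the checkpoint choice is an assumption, while the talk's own example is BERT fine-tuned as described above:

```python
# Extractive question answering over the Tesla passage from the slide.
from transformers import pipeline

qa = pipeline("question-answering", model="distilbert-base-cased-distilled-squad")
context = ("In 1935, in an annual birthday celebration interview, Tesla announced a "
           "method of transmitting mechanical energy with minimal loss over any "
           "terrestrial distance, a related new means of communication, and a method "
           "of accurately determining the location of underground mineral deposits.")
answer = qa(question="How far did he claim the mechanical energy could be transmitted?",
            context=context)
print(answer["answer"], answer["score"])  # likely span: "any terrestrial distance"
```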
18. Pace Of Innovation
Perceptron: 1957
Word Embeddings: ~1960s
CNN: 1989 (~30 years later)
LSTM: 1997 (12 years)
Continuous-space LM: 2001 (4 years)
Multi-task Learning: 2008 (7 years)
Seq to Seq | Attention: Sep 2014 (6 years)
Transformer: Jul 12, 2017 (3 years)
BERT: Oct 11, 2018 (1 year; SoTA on 11 tasks)
Transformer-XL: Jan 9, 2019 (3 months; SoTA on 4 tasks)
XLNet: Jun 19, 2019 (5 months; SoTA on 18 tasks)
RoBERTa: Jul 26, 2019 (1 month; SoTA on 6 tasks)
ERNIE 2.0: Jul 29, 2019 (3 days; SoTA on 9 tasks, Chinese)
ALBERT: Sep 25, 2019 (2 months; SoTA on 10 tasks)
T5: Oct 24, 2019 (1 month; SoTA on 16 tasks)
...
All SoTA results from the Transformer onward are attributed to transformer models.