Emily Pitler - Representations from Natural Language Data: Successes and Challenges

Representations from Natural Language Data: Successes and Challenges
Emily Pitler, Google AI

State-of-the-art in Natural Language Understanding in 2017
Custom Recurrent Architectures

Oct. 2018: One Model with Task-specific Tuning in Minutes

BERT: Bidirectional Encoder Representations from Transformers

Transformers: Attention is All You Need
Vaswani, Shazeer, Parmar, Uszkoreit, Jones, Gomez, Kaiser, Polosukhin, NIPS 2017
https://ai.googleblog.com/2017/08/transformer-novel-neural-network.html

From One-Hot Vectors to Word Embeddings & Self-Attention
word:      animal ... street ... it
one-hot:   0000…1   0001…0   1000…0
embedding: 1.4…3.7  4.9…6.4  2.5…8.0
The Annotated Transformer, The Illustrated Transformer, The Illustrated BERT
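
The step from one-hot vectors to embeddings can be sketched as a table lookup: multiplying a one-hot vector by an embedding table selects one row. The sketch below is illustrative only. The three-word vocabulary, its ordering, and the two-dimensional vectors (the first and last values shown on the slide) are assumptions, not the model's actual vocabulary or dimensions.

```python
# Toy vocabulary and embedding table. The ordering and the 2-d vectors are
# assumptions made for illustration; real models use far larger tables.
vocab = ["it", "street", "animal"]
embedding_table = [
    [2.5, 8.0],  # row for "it"
    [4.9, 6.4],  # row for "street"
    [1.4, 3.7],  # row for "animal"
]


def one_hot(word):
    """Return a vector with a single 1.0 at the word's vocabulary index."""
    return [1.0 if w == word else 0.0 for w in vocab]


def embed(word):
    """Multiply the one-hot vector by the table, which selects one row."""
    vec = one_hot(word)
    dim = len(embedding_table[0])
    return [
        sum(vec[i] * embedding_table[i][d] for i in range(len(vocab)))
        for d in range(dim)
    ]


if __name__ == "__main__":
    print(one_hot("animal"))
    print(embed("animal"))
```

In practice the multiplication is skipped and the row is read directly by index; the product form is shown only to connect the two representations.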

Self-Attention
one-hot:   0000…1   0001…0   1000…0
embedding: 1.4…3.7  4.9…6.4  2.5…8.0

Self-Attention
query, key, value
one-hot:          0000…1   0001…0   1000…0
embedding:        1.4…3.7  4.9…6.4  2.5…8.0
(self-)attention: 0.1, 0.2, 0.7
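
The query, key, and value steps above can be sketched as single-head scaled dot-product self-attention, following the formulation in the cited "Attention is All You Need" paper. This is a minimal pure-Python sketch, not the talk's implementation. The function names, the identity projection matrices, and the 2-d toy embeddings are assumptions chosen to keep the example small.

```python
import math


def softmax(scores):
    """Turn raw scores into non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(row[j] * vec[j] for j in range(len(vec))) for row in matrix]


def self_attention(embeddings, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    queries = [matvec(w_q, x) for x in embeddings]
    keys = [matvec(w_k, x) for x in embeddings]
    values = [matvec(w_v, x) for x in embeddings]
    scale = math.sqrt(len(keys[0]))
    outputs, all_weights = [], []
    for q in queries:
        # Score this token's query against every token's key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]  # stand-in for learned projections
    toy = [[1.4, 3.7], [4.9, 6.4], [2.5, 8.0]]  # toy 2-d embeddings
    outputs, weights = self_attention(toy, identity, identity, identity)
    for row in weights:
        print([round(w, 3) for w in row])
```

Each row of weights plays the role of the 0.1, 0.2, 0.7 distribution on the slide: one weight per token, summing to 1.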

OpenAI: Generative Pretraining
[Figure: two stacks of Transformer layers. Inputs: “<s> the … too” and “<s> the … tired”. Outputs: “The animal tired” and “Acceptable”.]
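
Generative pretraining of this kind is left-to-right: each position is trained to predict the next token and may attend only to earlier positions. A minimal sketch of that constraint follows. The helper names `causal_mask` and `next_token_examples` are hypothetical, introduced only for illustration.

```python
def causal_mask(n):
    """mask[i][j] is True when position i is allowed to attend to position j."""
    return [[j <= i for j in range(n)] for i in range(n)]


def next_token_examples(tokens):
    """Pair each left context (starting from <s>) with the token to predict."""
    padded = ["<s>"] + list(tokens)
    return [(padded[: i + 1], padded[i + 1]) for i in range(len(tokens))]


if __name__ == "__main__":
    for row in causal_mask(3):
        print(row)
    for context, target in next_token_examples(["the", "animal"]):
        print(context, "->", target)
```

Because the mask hides everything to the right, a representation built this way never uses later words.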

Understanding Can Need “Future” Information
How far is Jacksonville from Miami?
Jacksonville is in the First Coast region of northeast Florida and is centered on the
banks of the St. Johns River, about 25 miles (40 km) south of the Georgia state line
and about 340 miles (550 km) north of Miami.
Mark which area you want to distress. (Mark: VERB)
Mark, which area do you want to distress? (Mark: NOUN)

Naive Bidirectionality: Words Can “See Themselves”
[Figure: two panels, each with input “<s> the … too” and output “The animal tired”]

Training BERT
Masked Language Model (Fill-in-the-blank)
Deep learning (also [MASK] [MASK] deep structured learning or [MASK]
learning) is part of a broader family of machine learning methods
[MASK] on [MASK] data representations, as opposed to task-specific
algorithms.
[MASK] is allergic to peaches. Is
https://en.wikipedia.org/wiki/Deep_learning
https://en.wikipedia.org/wiki/Daniel_Tiger%27s_Neighborhood
BooksCorpus: Zhu, Kiros, Zemel, Salakhutdinov, Urtasun, Torralba, Fidler, CVPR 2015
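
The fill-in-the-blank objective can be sketched as follows: hide a random subset of tokens and keep the originals as prediction targets. This is a simplified sketch under stated assumptions. The 15% rate comes from the BERT paper rather than from this slide, the function name `mask_tokens` is hypothetical, and every selected token is replaced with [MASK] here, whereas the paper sometimes substitutes a random token or leaves the token unchanged.

```python
import random


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with [MASK].

    Returns the masked sequence and a dict mapping each masked position to
    the original token, which is what the model is trained to predict.
    """
    if not tokens:
        return [], {}
    rng = random.Random(seed)
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    labels = {}
    for pos in positions:
        labels[pos] = tokens[pos]
        masked[pos] = "[MASK]"
    return masked, labels


if __name__ == "__main__":
    sentence = (
        "deep learning is part of a broader family of machine learning methods"
    ).split()
    masked, labels = mask_tokens(sentence)
    print(" ".join(masked))
    print(labels)
```

Since the target word is hidden from the input, the model can use context on both sides without a word seeing itself.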

Results: Commonsense Reasoning and Question Answering
SWAG: Zellers, Bisk, Schwartz, Choi, EMNLP 2018
SQuAD: Rajpurkar, Zhang, Lopyrev, Liang, EMNLP 2016

Pretraining Tasks Matter...and Bigger = Better *
MRPC: Dolan and Brockett, IWP 2005

Do I Need Full BERT Models for All My Tasks?
Houlsby, Giurgiu, Jastrzebski, Morrone, de Laroussilhe, Gesmundo, Attariyan, Gelly, arXiv, Feb 2019
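
The cited Houlsby et al. paper proposes adapter modules: small bottleneck layers with a skip connection that are trained per task while the pretrained weights stay fixed. The sketch below shows one such bottleneck in pure Python. The function name, the ReLU nonlinearity, the omission of bias terms, and the toy sizes are assumptions for illustration, not the paper's exact configuration.

```python
def adapter(hidden, w_down, w_up):
    """Bottleneck adapter: project down, apply ReLU, project up, add skip."""
    # Down-projection to a small bottleneck, followed by ReLU.
    bottleneck = [
        max(0.0, sum(w * h for w, h in zip(row, hidden))) for row in w_down
    ]
    # Up-projection back to the hidden size.
    update = [sum(w * b for w, b in zip(row, bottleneck)) for row in w_up]
    # The skip connection keeps the original representation.
    return [h + u for h, u in zip(hidden, update)]


if __name__ == "__main__":
    hidden = [1.0, -2.0, 0.5, 3.0]
    w_down = [[0.1, 0.0, 0.2, 0.0], [0.0, 0.3, 0.0, 0.1]]  # 4 -> 2
    w_up = [[0.0, 0.0]] * 4  # 2 -> 4, zero-initialised
    print(adapter(hidden, w_down, w_up))
```

With a zero up-projection the module is an identity function, so adding it to a pretrained model changes nothing until the small adapter weights are trained.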

Try It Out, Get Faster Training with TPUs

Mismatches between Training and Realistic Inputs
Two Case Studies: Mixed Language Text and Identifying Commands

Multiple Languages: Frequent “In-the-Wild”, Rare in Training

“A Fast, Compact, Accurate Model for Language Identification of Codemixed Text”
Zhang, Riesa, Gillick, Bakalov, Baldridge, Weiss, EMNLP 2018

Accuracy and Speed of Token-level Language Id

Useful Preprocessing Step Across Tasks

Noun-Verb Ambiguity
“lives” / Noun → /laIvz/
“lives” / Verb → /lIvz/
flies: NOUN
Mark: VERB
Elkahky, Webster, Andor, Pitler, EMNLP 2018
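
The “lives” example shows why the part-of-speech tag matters downstream: the pronunciation depends on it. A minimal sketch of that dependency is a lookup keyed on both the word and its tag. The table and the function name `pronounce` are hypothetical, and the only entries are the two pronunciations given above.

```python
# Pronunciations keyed by (word, part-of-speech tag). Only the two entries
# from the slide are included; a real lexicon would be much larger.
PRONUNCIATIONS = {
    ("lives", "NOUN"): "/laIvz/",
    ("lives", "VERB"): "/lIvz/",
}


def pronounce(word, pos_tag):
    """Return the pronunciation for a tagged word, or None if unknown."""
    return PRONUNCIATIONS.get((word.lower(), pos_tag))


if __name__ == "__main__":
    print(pronounce("lives", "NOUN"))
    print(pronounce("lives", "VERB"))
```

A wrong tag from the upstream tagger selects the wrong entry, so tagging errors on noun-verb homographs become pronunciation errors.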

Certain insects can damage plumerias, such as mites, flies, or aphids. (flies: NOUN)
Mark which area you want to distress. (Mark: VERB)
“A Challenge Set and Methods for Noun-Verb Ambiguity”, EMNLP 2018

Accuracy on Noun-Verb Disambiguation

Pronunciation of Homographs Accuracy

Released Datasets with “In-the-Wild” Natural Challenges
Mark: VERB
Webster, Recasens, Axelrod, Baldridge, TACL 2019
Kwiatkowski, Palomaki, Redfield, Collins, Parikh, Alberti, Epstein, Polosukhin, Kelcey, Devlin, Lee, Toutanova, Jones, Chang, Dai, Uszkoreit, Le, Petrov, TACL 2019
