SlideShare a Scribd company logo
1 of 65
Download to read offline
2
3
Hosting NLP models as online services is common
Are hosted models really secure?e.g., machine translation, textual entailment, summarization
3
I drove to school
Blackbox API
(e.g., Google Translate)
Ich bin zur
Schule gefahren.
user query: model output:
4
I drove to school
Blackbox API
(e.g., Google Translate)
Ich bin zur
Schule gefahren.
user query: model output:
Key considerations as text generation models continue to improve:

1. Faster generation to minimize latency
2. Secure models to prevent theft of intellectual property
Talk outline
• Examine story generation as a concrete
example to highlight these two challenges
• Go over modifications of model architectures to
increase generation speed
• Explore security vulnerabilities with current NLP
pipelines
5
6
story generation
produce a narrative that follows from
a given input prompt or context
7
story generation
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
produce a narrative that follows from
a given input prompt or context
8
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
story: It was the color of a sunset: red tipped petals faded to
yellow. Stark contrast to the roiling forever-cloud above it. A
thick green stem climbed out of a crack in the dry earth,
thorns lining it like a ladder. It had been hidden between two
dry logs, lying flat like hollowed bones. It must have peered
out from between the dead trees only in the last few days,
and I had been the rst to stumble upon it.

It was worth more than my life.

Below me lay my village of rusted huts. They littered the
valley sporadically, as if a train had been derailed and its
carriages rolled free, far and wide. Men and women worked
around the shacks, digging and raking the land, trying to
grow weeds that could be eaten, as kids kicked tins into
makeshift goals around them.

The rose was worth more than any of their lives too. All of
them together, even. What a biodome would pay for it...

https://www.reddit.com/r/WritingPrompts/comments/dk2gdx/wp_flowers_have_become_so_rare_that_they_are_the/
9
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
story: It was the color of a sunset: red tipped petals faded to
yellow. Stark contrast to the roiling forever-cloud above it. A
thick green stem climbed out of a crack in the dry earth,
thorns lining it like a ladder. It had been hidden between two
dry logs, lying flat like hollowed bones. It must have peered
out from between the dead trees only in the last few days,
and I had been the rst to stumble upon it.

It was worth more than my life.

Below me lay my village of rusted huts. They littered the
valley sporadically, as if a train had been derailed and its
carriages rolled free, far and wide. Men and women worked
around the shacks, digging and raking the land, trying to
grow weeds that could be eaten, as kids kicked tins into
makeshift goals around them.

The rose was worth more than any of their lives too. All of
them together, even. What a biodome would pay for it...

describe the setting
10
It was the color of a sunset: red tipped petals faded to
yellow. Stark contrast to the roiling forever-cloud above it. A
thick green stem climbed out of a crack in the dry earth,
thorns lining it like a ladder. It had been hidden between two
dry logs, lying flat like hollowed bones. It must have peered
out from between the dead trees only in the last few days,
and I had been the rst to stumble upon it.

It was worth more than my life.

Below me lay my village of rusted huts. They littered the
valley sporadically, as if a train had been derailed and its
carriages rolled free, far and wide. Men and women worked
around the shacks, digging and raking the land, trying to
grow weeds that could be eaten, as kids kicked tins into
makeshift goals around them.

The rose was worth more than any of their lives too. All of
them together, even. What a biodome would pay for it...

Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
story:
describe the setting
introduce main character
11
It was the color of a sunset: red tipped petals faded to
yellow. Stark contrast to the roiling forever-cloud above it. A
thick green stem climbed out of a crack in the dry earth,
thorns lining it like a ladder. It had been hidden between two
dry logs, lying flat like hollowed bones. It must have peered
out from between the dead trees only in the last few days,
and I had been the rst to stumble upon it.

It was worth more than my life.

Below me lay my village of rusted huts. They littered the
valley sporadically, as if a train had been derailed and its
carriages rolled free, far and wide. Men and women worked
around the shacks, digging and raking the land, trying to
grow weeds that could be eaten, as kids kicked tins into
makeshift goals around them.

The rose was worth more than any of their lives too. All of
them together, even. What a biodome would pay for it...

Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
story:
describe the setting
introduce main character
more setting
12
It was the color of a sunset: red tipped petals faded to
yellow. Stark contrast to the roiling forever-cloud above it. A
thick green stem climbed out of a crack in the dry earth,
thorns lining it like a ladder. It had been hidden between two
dry logs, lying flat like hollowed bones. It must have peered
out from between the dead trees only in the last few days,
and I had been the rst to stumble upon it.

It was worth more than my life.

Below me lay my village of rusted huts. They littered the
valley sporadically, as if a train had been derailed and its
carriages rolled free, far and wide. Men and women worked
around the shacks, digging and raking the land, trying to
grow weeds that could be eaten, as kids kicked tins into
makeshift goals around them.

The rose was worth more than any of their lives too. All of
them together, even. What a biodome would pay for it...

Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
story:
describe the setting
introduce main character
more setting
introduce new concept
13
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
GPT-2 large generates:
We have to be prepared and take care of the plants.
We can provide your flowers in all kinds of colors. The most
important thing to remember when choosing flowers is the color of
the petals.
Please use these links for more information.
http://www.floralkids.com/flowers/natural-flowers/cotton/
http://www.floralkids.com/flowers/natural-flowers/flowers-for-lilies/
http://www.floralkids.com/flowers/natural-flowers/rose/
http://www.floralkids.com/flowers/natural-flowers/sweet-scented-
flowers/
14
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
GPT-2 large generates:
We have to be prepared and take care of the plants.
We can provide your flowers in all kinds of colors. The most
important thing to remember when choosing flowers is the color of
the petals.
Please use these links for more information.
http://www.floralkids.com/flowers/natural-flowers/cotton/
http://www.floralkids.com/flowers/natural-flowers/flowers-for-lilies/
http://www.floralkids.com/flowers/natural-flowers/rose/
http://www.floralkids.com/flowers/natural-flowers/sweet-scented-
flowers/
… complete garbage
15
this task is extremely hard!
• many different plausible stories per prompt,
which makes training / evaluation difcult
• do we need to model the planning process?
• stories can be really long
• stories have all sorts of complex linguistic
phenomena
16
nevertheless, there’s lots of recent
work on story generation!
• Fan et al., ACL 2018: Hierarchical Neural Story
Generation
• Yao et al., AAAI 2019: Plan-And-Write: Towards
Better Automatic Storytelling
• Tambwekar et al., IJCAI 2019: Controllable
Neural Story Plot Generation via Reward
Shaping
and many more!
machine-in-the-loop storytelling
• A text generation model provides suggestions
(e.g., of scenes) to real authors, who can then
use it as inspiration for their nal story
17
Flowers have become
so rare that…
Blackbox API
(story generation)
user query:
model output:
It was the color of a
sunset, red-tipped
petals faded to yellow…
18
STORIUM: a dataset and evaluation
platform for story generation
Shufan Wang
ongoing work! see www.storium.com
Nader Akoury Nanyun Peng
www.storium.com
19
20
stories are written by a
group of users, each of
whom write from the
perspective of a
different character
21
each character is
associated with a set of
properties (“cards”)
22
a “narrator” user creates challenges for the
characters to solve. characters write scene
entries addressing these challenges.
23
24
# games 5,743
# users 30,119
# player cards 154,586
# challenges 61,223
# scene entries 448,264
total # of tokens ~126 million
ave tokens per entry ~247
25
how do we evaluate?
• we put our models on the Storium platform!
• users can generate a suggestion using our
model and then edit it. we collect their edits as
well as some simple feedback.
• compute recall-based metrics like ROUGE, or MT
metrics like HTER (human-targeted translation edit
rate)
• evaluation by domain experts for free!
We’ve built models that look like this:
26
a simple model
Challenge:
Find something valuable
from these strangers…
Character card:
Untrusting: you are hurt
and never trust people
Character:
Edward: a rich but
extremely introverted man
Scene entry:
Believing that these
people are trying to sell
him something, Edward
sighed and…
segment
embeddings
position
embeddings
token
embeddings
+
+
Scene entry completion:
Believing that these people are trying to
sell him something, Edward sighed and
turned away. The place seemed
unfamiliar to him, old and poorly kept. “I
am going to leave”, Edward said…
ne-tuned GPT-2
27
adding more context…
Challenge:
Find something valuable
from these strangers…
Character card:
Untrusting: you are hurt
and never trust people
Character:
Edward: a rich but
extremely introverted man
Scene entry:
Believing that these
people are trying to sell
him something, Edward
sighed and…
segment
embeddings
position
embeddings
token
embeddings
+
+
Scene entry completion:
Believing that these people are trying to
sell him something, Edward sighed and
turned away. The place seemed
unfamiliar to him, old and poorly kept. “I
am going to leave”, Edward said…
ne-tuned GPT-2
Previous challenge:
Investigate the strangers
with the dogs…
Previous entry written by this character:
Edward peered leisurely at the couple
through the kitchen window. They were
talking animatedly…
Immediately previous scene entry:
Lisa kept shouting, and the dogs kept
barking. Why were we arguing so near to
the mark’s house? So stupid…
28
how do users modify the generated suggestions?
scene entry after user edited the suggestion:
Chiang Lee hacked the shop's computer before going inside. That should keep
them busy. Alarms went off, electonic locks opened, and they worried about
losing all their data. He slipped in, and crossed the "security line". In this area
of the shop there was an orange sticker across all windows. Chiang Lee was
taking photos as he went for the record; every once in awhile someone
wanted special items, so it would be good to keep a log. However, for now he
knew exactly what he wanted - he found it, and pocketed it. It was a wide-
spectrum signal blocker. It could interfere with electronic equipment, block signals
and - allegedly - interfere with laser blasts and force fields. Whether CIA, FBI or
UFO, anyone wanting to take him or slow him down had better stick to analogue
and wood, because anything scientific or technological was toast. Chiang
noticed other people taking pictures while running around the shop looking
for some secret item...he blended in perfectly.
Common usage pattern: users keep
short phrases from the suggestion
but completely change the content
29
often, users delete everything
final edited version:
Annie checked her list again; she'd already read it multiple times, but she might now,
finally, be ready for Christmas. Thanks to her new flight experience, she was actually
going to be able to help deliver some of the many presents this year. She had also
managed some private preparations of her own for afterwards. Now watching the snow
fall softly just outside the window, Annie Bell felt the familiar Christmas tingle as the
magic of the evening approached. She dressed warmly, and went to help Noelle with
the reindeer, knowing the other elves would soon start to arrive to begin packing the
sleigh. Back in her room, six brightly wrapped Christmas presents glinted on her desk
each labelled to one of her new found friends.
generated text:
*Oh, wait*, Annie thought. *This is just like in the movies.* That's where she got stuck
with a living ghost! She was surprised at how this part of Christmas went on so long -
and by its strange behaviour that it started acting up again when things weren't exactly
normal for her to be having fun. So... what did Annie do? She had a friend helping all
over town. A caring friend who passed through Kringle every year-and-a half, passing
out presents from their favourite tree as they went! And here they were going to make
Christmas special (at least until a return trip home!), and now there seemed no end in
sight before she'd need an exorcism, or possibly a demon blood sacrifice... But maybe
if these last few days hadn’t gone so very smoothly… she could reach back into her
own deep pocket one more time and get her hands free – It would give you a real
boost later. Lets hope nothing happened during the ritual, too… But enough “hurr durr”
no n-gram overlap at all!
going forward…
• We will release the full STORIUM dataset to
researchers interested in building models
• You can submit your models to us for
evaluation on the Storium platform
• All models have to hook into a simple API
• Storium users don’t know what model is used
• A leaderboard for story generation!
30
31
practical considerations for
deploying generation models
• Setting up the Storium interface has been an
enormous engineering effort
• One of the most important challenges: latency
• If real users have to wait more than 30-60 seconds
to get model outputs, they aren’t gonna use it!
• How can we maintain a reasonable wait time even
when faced with a large volume of queries?
32
Transformer LMs are slow…
100 context
tokens
200 context
tokens
300 context
tokens
Generate next
100 tokens
2.3s 2.7s 3.1s
Generate next
200 tokens
4.4s 4.6s 5.7s
Generate next
300 tokens
7.3s 8.4 9.7s
GPT-2 medium timed on a single 2080Ti
We want to have contexts of >1000
tokens, so we need to be careful!
33
Removing self-attention in Transformers
for machine translation
Simeng Sun
ongoing work
Weiqiu You
(ACL 2020)
34
Self-attention
• Do we really need it?
• Self-attention is quadratic in the input length,
which makes things slow
• Everyone uses the Transformer for everything
nowadays, but it’s not really clear what
components of the model are necessary…
35
Self-attention
53
committee awards Strickland advanced opticswho
Layer p
Q
K
V
these awesome slides
are from Emma Strubell
Nobel
36
Self-attention
54
Layer p
Q
K
V
[Vaswani et al. 2017]
committee awards Strickland advanced opticswhoNobel
these awesome slides
are from Emma Strubell
37
optics
advanced
who
Strickland
awards
committee
Nobel
Self-attention
56
Layer p
Q
K
V
A
[Vaswani et al. 2017]
committee awards Strickland advanced opticswhoNobel
these awesome slides
are from Emma Strubell
38
Self-attention
57
Layer p
Q
K
V
[Vaswani et al. 2017]
committee awards Strickland advanced opticswhoNobel
optics
advanced
who
Strickland
awards
committee
NobelA
these awesome slides
are from Emma Strubell
39
Self-attention
59
Layer p
Q
K
V
M
[Vaswani et al. 2017]
committee awards Strickland advanced opticswhoNobel
optics
advanced
who
Strickland
awards
committee
NobelA
40
what if instead of learning the
attention scores, we simply
hard-code them?
41
Hardcoded self-attention
61
committee awards Strickland advanced opticswho
Layer p
V
Nobel
42
Hardcoded self-attention
62
committee awards Strickland advanced opticswho
Layer p
V
Nobel
optics
advanced
who
Strickland
awards
committee
Nobel
0
0
0.1
0.2
0.4
0.2
0.1
43
Hardcoded self-attention
63
committee awards Strickland advanced opticswho
Layer p
V
Nobel
optics
advanced
who
Strickland
awards
committee
Nobel
0
0
0.1
0.2
0.4
0.2
0.1
These values are from a standard normal
distribution centered on the word to the
left of the current word, “Strickland”
44
Hardcoded self-attention
64
committee awards Strickland advanced opticswho
Layer p
V
Nobel
optics
advanced
who
Strickland
awards
committee
Nobel
0
0
0.1
0.2
0.4
0.2
0.1
0
0.1
0.2
0.4
0.2
0.1
0
0.1
0.2
0.4
0.2
0.1
0
0
0.6
0.3
0.1
0
0
0
0
left
center
right
rst
We can use distributions
centered at different
locations to hard-code
multiple heads
45
Hardcoded self-attention
65
committee awards Strickland advanced opticswho
Layer p
V
Nobel
optics
advanced
who
Strickland
awards
committee
Nobel
0
0
0.1
0.2
0.4
0.2
0.1
0
0.1
0.2
0.4
0.2
0.1
0
0.1
0.2
0.4
0.2
0.1
0
0
0.6
0.3
0.1
0
0
0
0
left
center
right
rst
This is equivalent to a 1-d
convolution with a xed kernel
46
encoder self
attention decoder self
attention
decoder
cross
attention
All three attention types
can be hard-coded!
47
how much do you need self-attention?
dataset
all heads
learned
IWSLT16 en-de 30.0
IWSLT16 de-en 34.4
WMT16 en-ro 33.0
WMT16 ro-en 33.1
WMT14 en-de 26.8
WMT14 en-fr 40.3
48
how much do you need self-attention?
dataset
all heads
learned
hard-code
self attn
IWSLT16 en-de 30.0 30.3
IWSLT16 de-en 34.4 34.8
WMT16 en-ro 33.0 32.4
WMT16 ro-en 33.1 32.8
WMT14 en-de 26.8 26.3
WMT14 en-fr 40.3 39.1
hard-coding self-
attention in the
encoder and
decoder does not
signicantly
impact BLEU!
49
efciency improvements
• Batch size increase: hard-coding self-attention
and reducing cross attention heads allows us to
use 27% more tokens per batch
• Decoding speed (sentences/sec) increases by
30.2% over the baseline Transformer
can we do better?
Moral of the story: if latency is important for your task,
consider drastic simplications to your neural models;
oftentimes, you can make them faster without signicant
performance loss
50
I drove to school
Blackbox API
(e.g., Google Translate)
Ich bin zur
Schule gefahren.
user query: model output:
This pipeline is vulnerable to model extraction attacks
51
Thieves on Sesame Street!
Model Extraction of BERT-based APIs
Kalpesh
Krishna1
Mohit
Iyyer1
Nicolas
Papernot2
Ankur P.
Parikh2
Gaurav S.
Tomar2
1 2
Work done during an internship at Google AI Language.
(ICLR 2020)
52
“This is a
great movie!”
Photo credits - http://jalammar.github.io/illustrated-bert/
Victim Model (Blackbox API)
(binary sentiment classification)
Positive
5
A company trains a binary sentiment classifier based on BERT
53
“This is a
great movie!” Positive
Victim Model (Blackbox API)
6
It is released as a black-box API (the “victim model”)
54
“seventeen III.
miles Vegas”
“Circle Ford had
support. wife rulers
broken Jan Family”
x N
7
A malicious user generates many queries
(in this work, random gibberish sequences of words)
55
“seventeen III.
miles Vegas”
“Circle Ford had
support. wife rulers
broken Jan Family”
x N
Victim Model (Blackbox API)
Positive
Negative
8
The attacker queries the API with the generated inputs and
collects the labels
56
“seventeen III.
miles Vegas”
“Circle Ford had
support. wife rulers
broken Jan Family”
x N
Training Data X Training Data Y
Victim Model (Blackbox API)
Positive
Negative
9
The collected data is used to train a “copy” of the model
57
“seventeen III.
miles Vegas”
“Circle Ford had
support. wife rulers
broken Jan Family”
x N
“This is a
great movie!” Positive
Victim Model (Blackbox API)
Extracted Model
Positive
Negative
10
The stolen copy (“extracted model”) works well on real data
58
Why is model extraction a problem?
Leakage of original
training data
Adversarial example
generation
Image credits - Papernot et al. 2017, Fredrikson et al. 2015
Theft of intellectual
property
11
59
These attacks are economically practical
Google Cloud Natural Language API cost <= $1.00 per 1000 API calls.
https://cloud.google.com/products/calculator/
Dataset Size Upperbound Price
SST2 (sentiment classify) 67349 sentences $62.35
Switchboard (speech) 300 hours $430.56
Translation 1 million sentences
(100 characters each)
$2000.00
12
Smart attackers can scrape APIs like Google Translate for free
60
We use two query generators - RANDOM & WIKI
19
RANDOM
(gibberish sequences of words
sampled from a fixed vocabulary)
1. cent 1977, preparation (120 remote
Program finance add broader protection
2. Mike zone fights Woods Second State
known, defined come
WIKI
(sentences from Wikipedia)
1. The unique glass chapel made public
and press viewing of the wedding easy.
2. Wrapped in Red was first released
internationally on October 25, 2013.
61
Results - attacks are effective
24
# of Queries SST2 (%) MNLI (%) SQUAD (F1)
API / Victim Model 1x 93.1 85.8 90.6
RANDOM 1x 90.1 76.3 79.1
RANDOM upto 10x 90.5 78.5 85.8
WIKI 1x 91.4 77.8 86.1
WIKI upto 10x 91.7 79.3 89.4
A BERT model trained on the real SQuAD data gets 90.6 F1
62
Results - attacks are effective
25
# of Queries SST2 (%) MNLI (%) SQUAD (F1)
API / Victim Model 1x 93.1 85.8 90.6
RANDOM 1x 90.1 76.3 79.1
RANDOM upto 10x 90.5 78.5 85.8
WIKI 1x 91.4 77.8 86.1
WIKI upto 10x 91.7 79.3 89.4
RANDOM achieves 85.8 F1 (~95% performance) without seeing a
single grammatically valid paragraph or question during training
63
Results - attacks are effective
26
# of Queries SST2 (%) MNLI (%) SQUAD (F1)
API / Victim Model 1x 93.1 85.8 90.6
RANDOM 1x 90.1 76.3 79.1
RANDOM upto 10x 90.5 78.5 85.8
WIKI 1x 91.4 77.8 86.1
WIKI upto 10x 91.7 79.3 89.4
WIKI achieves 89.4 F1 (~99% performance) without seeing a
single grammatically valid question during training
There are no effective defenses
for this kind of attack, so think
twice before allowing users to
have query access to your
models!
64
In a recent paper (“Imitation Attacks and Defenses for
Black-box Machine Translation System”), Google
Translate was successfully stolen!
65
thanks! questions?
work supported by
Thanks!

More Related Content

More from Fwdays

More from Fwdays (20)

"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y..."How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
"How Preply reduced ML model development time from 1 month to 1 day",Yevhen Y...
 
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
"GenAI Apps: Our Journey from Ideas to Production Excellence",Danil Topchii
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets"What is a RAG system and how to build it",Dmytro Spodarets
"What is a RAG system and how to build it",Dmytro Spodarets
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi"Distributed graphs and microservices in Prom.ua",  Maksym Kindritskyi
"Distributed graphs and microservices in Prom.ua", Maksym Kindritskyi
 
"Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl..."Rethinking the existing data loading and processing process as an ETL exampl...
"Rethinking the existing data loading and processing process as an ETL exampl...
 
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T..."How Ukrainian IT specialist can go on vacation abroad without crossing the T...
"How Ukrainian IT specialist can go on vacation abroad without crossing the T...
 
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ..."The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
"The Strength of Being Vulnerable: the experience from CIA, Tesla and Uber", ...
 
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu..."[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
"[QUICK TALK] Radical candor: how to achieve results faster thanks to a cultu...
 
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care..."[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
"[QUICK TALK] PDP Plan, the only one door to raise your salary and boost care...
 
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"..."4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
"4 horsemen of the apocalypse of working relationships (+ antidotes to them)"...
 
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast..."Reconnecting with Purpose: Rediscovering Job Interest after Burnout",  Anast...
"Reconnecting with Purpose: Rediscovering Job Interest after Burnout", Anast...
 
"Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others..."Mentoring 101: How to effectively invest experience in the success of others...
"Mentoring 101: How to effectively invest experience in the success of others...
 
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova"Mission (im) possible: How to get an offer in 2024?",  Oleksandra Myronova
"Mission (im) possible: How to get an offer in 2024?", Oleksandra Myronova
 
"Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv..."Why have we learned how to package products, but not how to 'package ourselv...
"Why have we learned how to package products, but not how to 'package ourselv...
 
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin..."How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
"How to tame the dragon, or leadership with imposter syndrome", Oleksandr Zin...
 

Recently uploaded

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Recently uploaded (20)

Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Mohit Iyyer "Towards fast and coherent long-form text generation"

  • 1.
  • 2. 2 3 Hosting NLP models as online services is common Are hosted models really secure?e.g., machine translation, textual entailment, summarization
  • 3. 3 I drove to school Blackbox API (e.g., Google Translate) Ich bin zur Schule gefahren. user query: model output:
  • 4. 4 I drove to school Blackbox API (e.g., Google Translate) Ich bin zur Schule gefahren. user query: model output: Key considerations as text generation models continue to improve: 1. Faster generation to minimize latency 2. Secure models to prevent theft of intellectual property
  • 5. Talk outline • Examine story generation as a concrete example to highlight these two challenges • Go over modications of model architectures to increase generation speed • Explore security vulnerabilities with current NLP pipelines 5
  • 6. 6 story generation produce a narrative that follows from a given input prompt or context
  • 7. 7 story generation Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. produce a narrative that follows from a given input prompt or context
  • 8. 8 Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: story: It was the color of a sunset: red tipped petals faded to yellow. Stark contrast to the roiling forever-cloud above it. A thick green stem climbed out of a crack in the dry earth, thorns lining it like a ladder. It had been hidden between two dry logs, lying flat like hollowed bones. It must have peered out from between the dead trees only in the last few days, and I had been the rst to stumble upon it. It was worth more than my life. Below me lay my village of rusted huts. They littered the valley sporadically, as if a train had been derailed and its carriages rolled free, far and wide. Men and women worked around the shacks, digging and raking the land, trying to grow weeds that could be eaten, as kids kicked tins into makeshift goals around them. The rose was worth more than any of their lives too. All of them together, even. What a biodome would pay for it... https://www.reddit.com/r/WritingPrompts/comments/dk2gdx/wp_flowers_have_become_so_rare_that_they_are_the/
  • 9. 9 Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: story: It was the color of a sunset: red tipped petals faded to yellow. Stark contrast to the roiling forever-cloud above it. A thick green stem climbed out of a crack in the dry earth, thorns lining it like a ladder. It had been hidden between two dry logs, lying flat like hollowed bones. It must have peered out from between the dead trees only in the last few days, and I had been the rst to stumble upon it. It was worth more than my life. Below me lay my village of rusted huts. They littered the valley sporadically, as if a train had been derailed and its carriages rolled free, far and wide. Men and women worked around the shacks, digging and raking the land, trying to grow weeds that could be eaten, as kids kicked tins into makeshift goals around them. The rose was worth more than any of their lives too. All of them together, even. What a biodome would pay for it... describe the setting
  • 10. 10 It was the color of a sunset: red tipped petals faded to yellow. Stark contrast to the roiling forever-cloud above it. A thick green stem climbed out of a crack in the dry earth, thorns lining it like a ladder. It had been hidden between two dry logs, lying flat like hollowed bones. It must have peered out from between the dead trees only in the last few days, and I had been the rst to stumble upon it. It was worth more than my life. Below me lay my village of rusted huts. They littered the valley sporadically, as if a train had been derailed and its carriages rolled free, far and wide. Men and women worked around the shacks, digging and raking the land, trying to grow weeds that could be eaten, as kids kicked tins into makeshift goals around them. The rose was worth more than any of their lives too. All of them together, even. What a biodome would pay for it... Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: story: describe the setting introduce main character
  • 11. 11 It was the color of a sunset: red tipped petals faded to yellow. Stark contrast to the roiling forever-cloud above it. A thick green stem climbed out of a crack in the dry earth, thorns lining it like a ladder. It had been hidden between two dry logs, lying flat like hollowed bones. It must have peered out from between the dead trees only in the last few days, and I had been the rst to stumble upon it. It was worth more than my life. Below me lay my village of rusted huts. They littered the valley sporadically, as if a train had been derailed and its carriages rolled free, far and wide. Men and women worked around the shacks, digging and raking the land, trying to grow weeds that could be eaten, as kids kicked tins into makeshift goals around them. The rose was worth more than any of their lives too. All of them together, even. What a biodome would pay for it... Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: story: describe the setting introduce main character more setting
  • 12. 12 It was the color of a sunset: red tipped petals faded to yellow. Stark contrast to the roiling forever-cloud above it. A thick green stem climbed out of a crack in the dry earth, thorns lining it like a ladder. It had been hidden between two dry logs, lying flat like hollowed bones. It must have peered out from between the dead trees only in the last few days, and I had been the rst to stumble upon it. It was worth more than my life. Below me lay my village of rusted huts. They littered the valley sporadically, as if a train had been derailed and its carriages rolled free, far and wide. Men and women worked around the shacks, digging and raking the land, trying to grow weeds that could be eaten, as kids kicked tins into makeshift goals around them. The rose was worth more than any of their lives too. All of them together, even. What a biodome would pay for it... Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: story: describe the setting introduce main character more setting introduce new concept
  • 13. 13 Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: GPT-2 large generates: We have to be prepared and take care of the plants. We can provide your flowers in all kinds of colors. The most important thing to remember when choosing flowers is the color of the petals. Please use these links for more information. http://www.floralkids.com/flowers/natural-flowers/cotton/ http://www.floralkids.com/flowers/natural-flowers/flowers-for-lilies/ http://www.floralkids.com/flowers/natural-flowers/rose/ http://www.floralkids.com/flowers/natural-flowers/sweet-scented- flowers/
  • 14. 14 Flowers have become so rare that they are the most sought after items in the world, sold at high prices in black markets, under guard in national museums etc. You just stumbled across a natural rose. prompt: GPT-2 large generates: We have to be prepared and take care of the plants. We can provide your flowers in all kinds of colors. The most important thing to remember when choosing flowers is the color of the petals. Please use these links for more information. http://www.floralkids.com/flowers/natural-flowers/cotton/ http://www.floralkids.com/flowers/natural-flowers/flowers-for-lilies/ http://www.floralkids.com/flowers/natural-flowers/rose/ http://www.floralkids.com/flowers/natural-flowers/sweet-scented- flowers/ … complete garbage
  • 15. 15 this task is extremely hard! • many different plausible stories per prompt, which makes training / evaluation difcult • do we need to model the planning process? • stories can be really long • stories have all sorts of complex linguistic phenomena
  • 16. 16 nevertheless, there’s lots of recent work on story generation! • Fan et al., ACL 2018: Hierarchical Neural Story Generation • Yao et al., AAAI 2019: Plan-And-Write: Towards Better Automatic Storytelling • Tambwekar et al., IJCAI 2019: Controllable Neural Story Plot Generation via Reward Shaping and many more!
  • 17. machine-in-the-loop storytelling • A text generation model provides suggestions (e.g., of scenes) to real authors, who can then use it as inspiration for their nal story 17 Flowers have become so rare that… Blackbox API (story generation) user query: model output: It was the color of a sunset, red-tipped petals faded to yellow…
  • 18. 18 STORIUM: a dataset and evaluation platform for story generation Shufan Wang ongoing work! see www.storium.com Nader Akoury Nanyun Peng www.storium.com
  • 19. 19
  • 20. 20 stories are written by a group of users, each of whom write from the perspective of a different character
  • 21. 21 each character is associated with a set of properties (“cards”)
  • 22. 22 a “narrator” user creates challenges for the characters to solve. characters write scene entries addressing these challenges.
  • 23. 23
  • 24. 24 # games 5,743 # users 30,119 # player cards 154,586 # challenges 61,223 # scene entries 448,264 total # of tokens ~126 million ave tokens per entry ~247
  • 25. 25 how do we evaluate? • we put our models on the Storium platform! • users can generate a suggestion using our model and then edit it. we collect their edits as well as some simple feedback. • compute recall-based metrics like ROUGE, or MT metrics like HTER (human-targeted translation edit rate) • evaluation by domain experts for free!
  • 26. We’ve built models that look like this: 26 a simple model Challenge: Find something valuable from these strangers… Character card: Untrusting: you are hurt and never trust people Character: Edward: a rich but extremely introverted man Scene entry: Believing that these people are trying to sell him something, Edward sighed and… segment embeddings position embeddings token embeddings + + Scene entry completion: Believing that these people are trying to sell him something, Edward sighed and turned away. The place seemed unfamiliar to him, old and poorly kept. “I am going to leave”, Edward said… ne-tuned GPT-2
  • 27. 27 adding more context… Challenge: Find something valuable from these strangers… Character card: Untrusting: you are hurt and never trust people Character: Edward: a rich but extremely introverted man Scene entry: Believing that these people are trying to sell him something, Edward sighed and… segment embeddings position embeddings token embeddings + + Scene entry completion: Believing that these people are trying to sell him something, Edward sighed and turned away. The place seemed unfamiliar to him, old and poorly kept. “I am going to leave”, Edward said… ne-tuned GPT-2 Previous challenge: Investigate the strangers with the dogs… Previous entry written by this character: Edward peered leisurely at the couple through the kitchen window. They were talking animatedly… Immediately previous scene entry: Lisa kept shouting, and the dogs kept barking. Why were we arguing so near to the mark’s house? So stupid…
  • 28. 28 how do users modify the generated suggestions? scene entry after user edited the suggestion: Chiang Lee hacked the shop's computer before going inside. That should keep them busy. Alarms went off, electonic locks opened, and they worried about losing all their data. He slipped in, and crossed the "security line". In this area of the shop there was an orange sticker across all windows. Chiang Lee was taking photos as he went for the record; every once in awhile someone wanted special items, so it would be good to keep a log. However, for now he knew exactly what he wanted - he found it, and pocketed it. It was a wide- spectrum signal blocker. It could interfere with electronic equipment, block signals and - allegedly - interfere with laser blasts and force fields. Whether CIA, FBI or UFO, anyone wanting to take him or slow him down had better stick to analogue and wood, because anything scientific or technological was toast. Chiang noticed other people taking pictures while running around the shop looking for some secret item...he blended in perfectly. Common usage pattern: users keep short phrases from the suggestion but completely change the content
  • 29. 29 often, users delete everything final edited version: Annie checked her list again; she'd already read it multiple times, but she might now, finally, be ready for Christmas. Thanks to her new flight experience, she was actually going to be able to help deliver some of the many presents this year. She had also managed some private preparations of her own for afterwards. Now watching the snow fall softly just outside the window, Annie Bell felt the familiar Christmas tingle as the magic of the evening approached. She dressed warmly, and went to help Noelle with the reindeer, knowing the other elves would soon start to arrive to begin packing the sleigh. Back in her room, six brightly wrapped Christmas presents glinted on her desk each labelled to one of her new found friends. generated text: *Oh, wait*, Annie thought. *This is just like in the movies.* That's where she got stuck with a living ghost! She was surprised at how this part of Christmas went on so long - and by its strange behaviour that it started acting up again when things weren't exactly normal for her to be having fun. So... what did Annie do? She had a friend helping all over town. A caring friend who passed through Kringle every year-and-a half, passing out presents from their favourite tree as they went! And here they were going to make Christmas special (at least until a return trip home!), and now there seemed no end in sight before she'd need an exorcism, or possibly a demon blood sacrifice... But maybe if these last few days hadn’t gone so very smoothly… she could reach back into her own deep pocket one more time and get her hands free – It would give you a real boost later. Lets hope nothing happened during the ritual, too… But enough “hurr durr” no n-gram overlap at all!
  • 30. going forward… • We will release the full STORIUM dataset to researchers interested in building models • You can submit your models to us for evaluation on the Storium platform • All models have to hook into a simple API • Storium users don’t know what model is used • A leaderboard for story generation! 30
  • 31. 31 practical considerations for deploying generation models • Setting up the Storium interface has been an enormous engineering effort • One of the most important challenges: latency • If real users have to wait more than 30-60 seconds to get model outputs, they aren’t gonna use it! • How can we maintain a reasonable wait time even when faced with a large volume of queries?
  • 32. 32 Transformer LMs are slow… 100 context tokens 200 context tokens 300 context tokens Generate next 100 tokens 2.3s 2.7s 3.1s Generate next 200 tokens 4.4s 4.6s 5.7s Generate next 300 tokens 7.3s 8.4 9.7s GPT-2 medium timed on a single 2080Ti We want to have contexts of >1000 tokens, so we need to be careful!
  • 33. 33 Removing self-attention in Transformers for machine translation Simeng Sun ongoing work Weiqiu You (ACL 2020)
  • 34. 34 Self-attention • Do we really need it? • Self-attention is quadratic in the input length, which makes things slow • Everyone uses the Transformer for everything nowadays, but it’s not really clear what components of the model are necessary…
  • 35. 35 Self-attention 53 committee awards Strickland advanced opticswho Layer p Q K V these awesome slides are from Emma Strubell Nobel
  • 36. 36 Self-attention 54 Layer p Q K V [Vaswani et al. 2017] committee awards Strickland advanced opticswhoNobel these awesome slides are from Emma Strubell
  • 37. 37 optics advanced who Strickland awards committee Nobel Self-attention 56 Layer p Q K V A [Vaswani et al. 2017] committee awards Strickland advanced opticswhoNobel these awesome slides are from Emma Strubell
  • 38. 38 Self-attention 57 Layer p Q K V [Vaswani et al. 2017] committee awards Strickland advanced opticswhoNobel optics advanced who Strickland awards committee NobelA these awesome slides are from Emma Strubell
  • 39. 39 Self-attention 59 Layer p Q K V M [Vaswani et al. 2017] committee awards Strickland advanced opticswhoNobel optics advanced who Strickland awards committee NobelA
  • 40. 40 what if instead of learning the attention scores, we simply hard-code them?
  • 41. 41 Hardcoded self-attention 61 committee awards Strickland advanced opticswho Layer p V Nobel
  • 42. 42 Hardcoded self-attention 62 committee awards Strickland advanced opticswho Layer p V Nobel optics advanced who Strickland awards committee Nobel 0 0 0.1 0.2 0.4 0.2 0.1
  • 43. 43 Hardcoded self-attention 63 committee awards Strickland advanced opticswho Layer p V Nobel optics advanced who Strickland awards committee Nobel 0 0 0.1 0.2 0.4 0.2 0.1 These values are from a standard normal distribution centered on the word to the left of the current word, “Strickland”
  • 44. 44 Hardcoded self-attention 64 committee awards Strickland advanced opticswho Layer p V Nobel optics advanced who Strickland awards committee Nobel 0 0 0.1 0.2 0.4 0.2 0.1 0 0.1 0.2 0.4 0.2 0.1 0 0.1 0.2 0.4 0.2 0.1 0 0 0.6 0.3 0.1 0 0 0 0 left center right rst We can use distributions centered at different locations to hard-code multiple heads
  • 45. 45 Hardcoded self-attention 65 committee awards Strickland advanced opticswho Layer p V Nobel optics advanced who Strickland awards committee Nobel 0 0 0.1 0.2 0.4 0.2 0.1 0 0.1 0.2 0.4 0.2 0.1 0 0.1 0.2 0.4 0.2 0.1 0 0 0.6 0.3 0.1 0 0 0 0 left center right rst This is equivalent to a 1-d convolution with a xed kernel
  • 46. 46 encoder self attention decoder self attention decoder cross attention All three attention types can be hard-coded!
  • 47. 47 how much do you need self-attention? dataset all heads learned IWSLT16 en-de 30.0 IWSLT16 de-en 34.4 WMT16 en-ro 33.0 WMT16 ro-en 33.1 WMT14 en-de 26.8 WMT14 en-fr 40.3
  • 48. 48 how much do you need self-attention? dataset all heads learned hard-code self attn IWSLT16 en-de 30.0 30.3 IWSLT16 de-en 34.4 34.8 WMT16 en-ro 33.0 32.4 WMT16 ro-en 33.1 32.8 WMT14 en-de 26.8 26.3 WMT14 en-fr 40.3 39.1 hard-coding self- attention in the encoder and decoder does not signicantly impact BLEU!
  • 49. 49 efciency improvements • Batch size increase: hard-coding self-attention and reducing cross attention heads allows us to use 27% more tokens per batch • Decoding speed (sentences/sec) increases by 30.2% over the baseline Transformer can we do better? Moral of the story: if latency is important for your task, consider drastic simplications to your neural models; oftentimes, you can make them faster without signicant performance loss
  • 50. 50 I drove to school Blackbox API (e.g., Google Translate) Ich bin zur Schule gefahren. user query: model output: This pipeline is vulnerable to model extraction attacks
  • 51. 51 Thieves on Sesame Street! Model Extraction of BERT-based APIs Kalpesh Krishna1 Mohit Iyyer1 Nicolas Papernot2 Ankur P. Parikh2 Gaurav S. Tomar2 1 2 Work done during an internship at Google AI Language. (ICLR 2020)
  • 52. 52 “This is a great movie!” Photo credits - http://jalammar.github.io/illustrated-bert/ Victim Model (Blackbox API) (binary sentiment classification) Positive 5 A company trains a binary sentiment classifier based on BERT
  • 53. 53 “This is a great movie!” Positive Victim Model (Blackbox API) 6 It is released as a black-box API (the “victim model”)
  • 54. 54 “seventeen III. miles Vegas” “Circle Ford had support. wife rulers broken Jan Family” x N 7 A malicious user generates many queries (in this work, random gibberish sequences of words)
  • 55. 55 “seventeen III. miles Vegas” “Circle Ford had support. wife rulers broken Jan Family” x N Victim Model (Blackbox API) Positive Negative 8 The attacker queries the API with the generated inputs and collects the labels
  • 56. 56 “seventeen III. miles Vegas” “Circle Ford had support. wife rulers broken Jan Family” x N Training Data X Training Data Y Victim Model (Blackbox API) Positive Negative 9 The collected data is used to train a “copy” of the model
  • 57. 57 “seventeen III. miles Vegas” “Circle Ford had support. wife rulers broken Jan Family” x N “This is a great movie!” Positive Victim Model (Blackbox API) Extracted Model Positive Negative 10 The stolen copy (“extracted model”) works well on real data
  • 58. 58 Why is model extraction a problem? Leakage of original training data Adversarial example generation Image credits - Papernot et al. 2017, Fredrikson et al. 2015 Theft of intellectual property 11
  • 59. 59 These attacks are economically practical Google Cloud Natural Language API cost <= $1.00 per 1000 API calls. https://cloud.google.com/products/calculator/ Dataset Size Upperbound Price SST2 (sentiment classify) 67349 sentences $62.35 Switchboard (speech) 300 hours $430.56 Translation 1 million sentences (100 characters each) $2000.00 12 Smart attackers can scrape APIs like Google Translate for free
  • 60. 60 We use two query generators - RANDOM & WIKI 19 RANDOM (gibberish sequences of words sampled from a fixed vocabulary) 1. cent 1977, preparation (120 remote Program finance add broader protection 2. Mike zone fights Woods Second State known, defined come WIKI (sentences from Wikipedia) 1. The unique glass chapel made public and press viewing of the wedding easy. 2. Wrapped in Red was first released internationally on October 25, 2013.
  • 61. 61 Results - attacks are effective 24 # of Queries SST2 (%) MNLI (%) SQUAD (F1) API / Victim Model 1x 93.1 85.8 90.6 RANDOM 1x 90.1 76.3 79.1 RANDOM upto 10x 90.5 78.5 85.8 WIKI 1x 91.4 77.8 86.1 WIKI upto 10x 91.7 79.3 89.4 A BERT model trained on the real SQuAD data gets 90.6 F1
  • 62. 62 Results - attacks are effective 25 # of Queries SST2 (%) MNLI (%) SQUAD (F1) API / Victim Model 1x 93.1 85.8 90.6 RANDOM 1x 90.1 76.3 79.1 RANDOM upto 10x 90.5 78.5 85.8 WIKI 1x 91.4 77.8 86.1 WIKI upto 10x 91.7 79.3 89.4 RANDOM achieves 85.8 F1 (~95% performance) without seeing a single grammatically valid paragraph or question during training
  • 63. 63 Results - attacks are effective 26 # of Queries SST2 (%) MNLI (%) SQUAD (F1) API / Victim Model 1x 93.1 85.8 90.6 RANDOM 1x 90.1 76.3 79.1 RANDOM upto 10x 90.5 78.5 85.8 WIKI 1x 91.4 77.8 86.1 WIKI upto 10x 91.7 79.3 89.4 WIKI achieves 89.4 F1 (~99% performance) without seeing a single grammatically valid question during training
  • 64. There are no effective defenses for this kind of attack, so think twice before allowing users to have query access to your models! 64 In a recent paper (“Imitation Attacks and Defenses for Black-box Machine Translation System”), Google Translate was successfully stolen!