Long-form text generation, which involves building computational models to produce lengthy output texts (e.g., summaries or translations of documents), is still in its infancy. In this talk, I focus on two key challenges with long-form text generation models: (1) the deterioration of their output quality as outputs become longer and more complex, and (2) their general memory and speed inefficiency, which causes latency issues when they are deployed in user-facing applications. I frame these issues within a case study on computational story generation, a task for which my lab has successfully built and deployed a system to thousands of users. I will discuss specifics of our model architecture, training strategies to maximize memory use, and simplified model variants to increase inference speed.
Mohit Iyyer "Towards fast and coherent long-form text generation"
2. 2
Hosting NLP models as online services is common
Are hosted models really secure? (e.g., machine translation, textual entailment, summarization)
3. 3
I drove to school
Blackbox API
(e.g., Google Translate)
Ich bin zur
Schule gefahren.
user query: model output:
4. 4
[same query/output diagram as the previous slide]
Key considerations as text generation models continue to improve:
1. Faster generation to minimize latency
2. Secure models to prevent theft of intellectual property
5. 5
Talk outline
• Examine story generation as a concrete example to highlight these two challenges
• Go over modifications of model architectures to increase generation speed
• Explore security vulnerabilities with current NLP pipelines
7. 7
story generation
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
produce a narrative that follows from
a given input prompt or context
8. 8
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
story: It was the color of a sunset: red tipped petals faded to
yellow. Stark contrast to the roiling forever-cloud above it. A
thick green stem climbed out of a crack in the dry earth,
thorns lining it like a ladder. It had been hidden between two
dry logs, lying flat like hollowed bones. It must have peered
out from between the dead trees only in the last few days,
and I had been the first to stumble upon it.
It was worth more than my life.
Below me lay my village of rusted huts. They littered the
valley sporadically, as if a train had been derailed and its
carriages rolled free, far and wide. Men and women worked
around the shacks, digging and raking the land, trying to
grow weeds that could be eaten, as kids kicked tins into
makeshift goals around them.
The rose was worth more than any of their lives too. All of
them together, even. What a biodome would pay for it...
https://www.reddit.com/r/WritingPrompts/comments/dk2gdx/wp_flowers_have_become_so_rare_that_they_are_the/
9.–12.
[The prompt and story above are repeated on these build slides, with discourse-level annotations added one per slide:]
describe the setting
introduce main character
more setting
introduce new concept
13. 13
Flowers have become so rare that they are the most sought
after items in the world, sold at high prices in black markets,
under guard in national museums etc. You just stumbled
across a natural rose.
prompt:
GPT-2 large generates:
We have to be prepared and take care of the plants.
We can provide your flowers in all kinds of colors. The most
important thing to remember when choosing flowers is the color of
the petals.
Please use these links for more information.
http://www.floralkids.com/flowers/natural-flowers/cotton/
http://www.floralkids.com/flowers/natural-flowers/flowers-for-lilies/
http://www.floralkids.com/flowers/natural-flowers/rose/
http://www.floralkids.com/flowers/natural-flowers/sweet-scented-flowers/
14. 14
[same prompt and GPT-2 output as the previous slide]
… complete garbage
15. 15
this task is extremely hard!
• many different plausible stories per prompt, which makes training / evaluation difficult
• do we need to model the planning process?
• stories can be really long
• stories have all sorts of complex linguistic phenomena
16. 16
nevertheless, there's lots of recent work on story generation!
• Fan et al., ACL 2018: Hierarchical Neural Story Generation
• Yao et al., AAAI 2019: Plan-And-Write: Towards Better Automatic Storytelling
• Tambwekar et al., IJCAI 2019: Controllable Neural Story Plot Generation via Reward Shaping
and many more!
17. 17
machine-in-the-loop storytelling
• A text generation model provides suggestions (e.g., of scenes) to real authors, who can then use them as inspiration for their final story

user query: Flowers have become so rare that…
Blackbox API (story generation)
model output: It was the color of a sunset, red-tipped petals faded to yellow…
18. 18
STORIUM: a dataset and evaluation
platform for story generation
Shufan Wang
ongoing work! see www.storium.com
Nader Akoury Nanyun Peng
24. 24
# games 5,743
# users 30,119
# player cards 154,586
# challenges 61,223
# scene entries 448,264
total # of tokens ~126 million
average tokens per entry ~247
25. 25
how do we evaluate?
• we put our models on the Storium platform!
• users can generate a suggestion using our model and then edit it. we collect their edits as well as some simple feedback.
• compute recall-based metrics like ROUGE, or MT metrics like HTER (human-targeted translation edit rate)
• evaluation by domain experts for free!
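HTER scores a suggestion by how many word-level edits the human made to fix it. A minimal sketch of the idea (simplified: true TER also counts phrase shifts; this version uses plain insert/delete/substitute edit distance, and the function name is ours):

```python
def hter(suggestion: str, edited: str) -> float:
    """Simplified HTER: word-level edit distance between the model's
    suggestion and the user's edited version, normalized by the length
    of the edited (human-targeted) reference."""
    hyp, ref = suggestion.split(), edited.split()
    # standard dynamic-programming edit distance over words
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        curr = [i]
        for j, r in enumerate(ref, 1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete hyp word
                            curr[j - 1] + 1,      # insert ref word
                            prev[j - 1] + cost))  # substitute
        prev = curr
    return prev[-1] / max(len(ref), 1)
```

A score of 0 means the user kept the suggestion verbatim; a score near 1 (as when users delete everything) means the suggestion was useless.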
26. 26
We've built models that look like this:
a simple model
Challenge:
Find something valuable from these strangers…
Character card:
Untrusting: you are hurt
and never trust people
Character:
Edward: a rich but
extremely introverted man
Scene entry:
Believing that these people are trying to sell him something, Edward sighed and…
token embeddings + segment embeddings + position embeddings
Scene entry completion:
Believing that these people are trying to sell him something, Edward sighed and turned away. The place seemed unfamiliar to him, old and poorly kept. "I am going to leave", Edward said…
fine-tuned GPT-2
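The input representation sums the three embedding tables elementwise per position, as in GPT-2/BERT-style Transformers. A toy numpy sketch (the sizes and table names here are illustrative, far smaller than the real model):

```python
import numpy as np

rng = np.random.default_rng(0)
# toy dimensions for illustration only
vocab_size, n_segments, max_len, d_model = 100, 4, 32, 16
tok_table = rng.normal(size=(vocab_size, d_model))
seg_table = rng.normal(size=(n_segments, d_model))
pos_table = rng.normal(size=(max_len, d_model))

def embed(token_ids, segment_ids):
    """Input representation: token + segment + position embeddings,
    summed elementwise at each position."""
    positions = np.arange(len(token_ids))
    return tok_table[token_ids] + seg_table[segment_ids] + pos_table[positions]

# e.g., 5 tokens: challenge text (segment 0) then scene entry (segment 1)
x = embed(np.array([7, 3, 42, 42, 9]), np.array([0, 0, 1, 1, 1]))
```

The segment IDs are what let one decoder distinguish the challenge, character card, and scene-entry portions of a single flat input sequence.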
27. 27
adding more context…
[same challenge, character card, character, scene entry, embedding sum, and completion as the previous slide, plus:]
Previous challenge:
Investigate the strangers with the dogs…
Previous entry written by this character:
Edward peered leisurely at the couple through the kitchen window. They were talking animatedly…
Immediately previous scene entry:
Lisa kept shouting, and the dogs kept barking. Why were we arguing so near to the mark's house? So stupid…
28. 28
how do users modify the generated suggestions?
scene entry after user edited the suggestion:
Chiang Lee hacked the shop's computer before going inside. That should keep
them busy. Alarms went off, electronic locks opened, and they worried about
losing all their data. He slipped in, and crossed the "security line". In this area
of the shop there was an orange sticker across all windows. Chiang Lee was
taking photos as he went for the record; every once in awhile someone
wanted special items, so it would be good to keep a log. However, for now he
knew exactly what he wanted - he found it, and pocketed it. It was a wide-
spectrum signal blocker. It could interfere with electronic equipment, block signals
and - allegedly - interfere with laser blasts and force fields. Whether CIA, FBI or
UFO, anyone wanting to take him or slow him down had better stick to analogue
and wood, because anything scientific or technological was toast. Chiang
noticed other people taking pictures while running around the shop looking
for some secret item...he blended in perfectly.
Common usage pattern: users keep
short phrases from the suggestion
but completely change the content
29. 29
often, users delete everything
final edited version:
Annie checked her list again; she'd already read it multiple times, but she might now,
finally, be ready for Christmas. Thanks to her new flight experience, she was actually
going to be able to help deliver some of the many presents this year. She had also
managed some private preparations of her own for afterwards. Now watching the snow
fall softly just outside the window, Annie Bell felt the familiar Christmas tingle as the
magic of the evening approached. She dressed warmly, and went to help Noelle with
the reindeer, knowing the other elves would soon start to arrive to begin packing the
sleigh. Back in her room, six brightly wrapped Christmas presents glinted on her desk
each labelled to one of her new found friends.
generated text:
*Oh, wait*, Annie thought. *This is just like in the movies.* That's where she got stuck
with a living ghost! She was surprised at how this part of Christmas went on so long -
and by its strange behaviour that it started acting up again when things weren't exactly
normal for her to be having fun. So... what did Annie do? She had a friend helping all
over town. A caring friend who passed through Kringle every year-and-a half, passing
out presents from their favourite tree as they went! And here they were going to make
Christmas special (at least until a return trip home!), and now there seemed no end in
sight before she'd need an exorcism, or possibly a demon blood sacrifice... But maybe
if these last few days hadn't gone so very smoothly… she could reach back into her
own deep pocket one more time and get her hands free – It would give you a real
boost later. Lets hope nothing happened during the ritual, too… But enough "hurr durr"
no n-gram overlap at all!
30. 30
going forward…
• We will release the full STORIUM dataset to researchers interested in building models
• You can submit your models to us for evaluation on the Storium platform
• All models have to hook into a simple API
• Storium users don't know what model is used
• A leaderboard for story generation!
31. 31
practical considerations for deploying generation models
• Setting up the Storium interface has been an enormous engineering effort
• One of the most important challenges: latency
• If real users have to wait more than 30-60 seconds to get model outputs, they aren't going to use it!
• How can we maintain a reasonable wait time even when faced with a large volume of queries?
32. 32
Transformer LMs are slow…

GPT-2 medium timed on a single 2080Ti:

                           100 context tokens   200 context tokens   300 context tokens
Generate next 100 tokens         2.3s                 2.7s                 3.1s
Generate next 200 tokens         4.4s                 4.6s                 5.7s
Generate next 300 tokens         7.3s                 8.4s                 9.7s

We want to have contexts of >1000 tokens, so we need to be careful!
34. 34
Self-attention
• Do we really need it?
• Self-attention is quadratic in the input length, which makes things slow
• Everyone uses the Transformer for everything nowadays, but it's not really clear what components of the model are necessary…
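The quadratic cost is visible in a minimal numpy sketch: the score matrix alone has n² entries. (Single head, no learned projections; purely illustrative, not the full Transformer layer.)

```python
import numpy as np

def self_attention(X):
    """Vanilla scaled dot-product self-attention over a sequence X of
    shape (n, d). The (n, n) score matrix is the quadratic bottleneck:
    doubling the context length quadruples its size."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)                      # (n, n)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ X                                  # (n, d)

X = np.random.default_rng(0).normal(size=(300, 64))
out = self_attention(X)   # the intermediate score matrix has 300 * 300 entries
```

With 1000+ context tokens, every layer and every head materializes a million-entry score matrix, which is exactly why long contexts hurt latency.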
38. 38
Self-attention
[diagram: queries, keys, and values (Q, K, V) at layer p attending over the sentence "Nobel committee awards Strickland who advanced optics"]
[Vaswani et al. 2017]
these awesome slides are from Emma Strubell
43. 43
Hardcoded self-attention
[diagram: fixed attention weights (0, 0, 0.1, 0.2, 0.4, 0.2, 0.1) applied to the values V at layer p, over "Nobel committee awards Strickland who advanced optics"]
These values are from a standard normal distribution centered on the word to the left of the current word, "Strickland"
44. 44
Hardcoded self-attention
[diagram: four fixed weight profiles over the same sentence, one per head: left, center, right, first]
We can use distributions centered at different locations to hard-code multiple heads
45. 45
Hardcoded self-attention
[same diagram as the previous slide]
This is equivalent to a 1-d convolution with a fixed kernel
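A sketch of one hard-coded head under the scheme above, assuming Gaussian weights over relative positions (the function name and toy values are ours; the exact parameterization in the paper may differ):

```python
import numpy as np

def hardcoded_head(V, offset=-1, std=1.0):
    """One hard-coded attention head: instead of computing Q.K^T,
    position i attends to its neighbors with a fixed Gaussian profile
    centered at i + offset (offset=-1 gives the 'left' head).
    No queries or keys are needed, only the values V."""
    n, _ = V.shape
    pos = np.arange(n)
    dist = pos[None, :] - (pos[:, None] + offset)  # key position minus center
    w = np.exp(-0.5 * (dist / std) ** 2)           # Gaussian, unnormalized
    w = w / w.sum(axis=1, keepdims=True)           # each row sums to 1
    return w @ V

V = np.eye(5)                       # toy values: one-hot per position
left = hardcoded_head(V, offset=-1)  # row i peaks at position i - 1
```

Away from sequence boundaries every row uses the same kernel shifted by one position, which is why this reduces to a 1-d convolution with a fixed kernel.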
47. 47
how much do you need self-attention?

dataset          all heads learned (BLEU)
IWSLT16 en-de    30.0
IWSLT16 de-en    34.4
WMT16 en-ro      33.0
WMT16 ro-en      33.1
WMT14 en-de      26.8
WMT14 en-fr      40.3
48. 48
how much do you need self-attention?

dataset          all heads learned   hard-coded self-attn
IWSLT16 en-de    30.0                30.3
IWSLT16 de-en    34.4                34.8
WMT16 en-ro      33.0                32.4
WMT16 ro-en      33.1                32.8
WMT14 en-de      26.8                26.3
WMT14 en-fr      40.3                39.1

hard-coding self-attention in the encoder and decoder does not significantly impact BLEU!
49. 49
efficiency improvements
• Batch size increase: hard-coding self-attention and reducing cross-attention heads allows us to use 27% more tokens per batch
• Decoding speed (sentences/sec) increases by 30.2% over the baseline Transformer

can we do better?

Moral of the story: if latency is important for your task, consider drastic simplifications to your neural models; oftentimes, you can make them faster without significant performance loss
50. 50
[same Google Translate query/output diagram as before]
This pipeline is vulnerable to model extraction attacks
51. 51
Thieves on Sesame Street! Model Extraction of BERT-based APIs
Kalpesh Krishna, Mohit Iyyer, Nicolas Papernot, Ankur P. Parikh, Gaurav S. Tomar
(ICLR 2020)
Work done during an internship at Google AI Language.
52. 52
A company trains a binary sentiment classifier based on BERT
"This is a great movie!" → Victim Model (Blackbox API, binary sentiment classification) → Positive
Photo credits: http://jalammar.github.io/illustrated-bert/
53. 53
It is released as a black-box API (the "victim model")
"This is a great movie!" → Victim Model (Blackbox API) → Positive
54. 54
A malicious user generates many queries (in this work, random gibberish sequences of words)
"seventeen III. miles Vegas"
"Circle Ford had support. wife rulers broken Jan Family"
x N
55. 55
The attacker queries the API with the generated inputs and collects the labels
"seventeen III. miles Vegas" → Victim Model (Blackbox API) → Positive
"Circle Ford had support. wife rulers broken Jan Family" → Negative
x N
56. 56
The collected data is used to train a "copy" of the model
[the queries become Training Data X; the API's labels (Positive / Negative) become Training Data Y]
57. 57
The stolen copy ("extracted model") works well on real data
"This is a great movie!" → Extracted Model → Positive
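The attack steps above can be sketched as a short loop. Everything here is a toy stand-in: the vocabulary, function names, and dummy victim are hypothetical, and a real attack would fine-tune its own BERT on the collected pairs rather than stop at step 3.

```python
import random

def make_random_queries(vocab, n, length=8, seed=0):
    """Step 1: generate gibberish queries by sampling words i.i.d.
    from a fixed vocabulary (the RANDOM strategy)."""
    rng = random.Random(seed)
    return [" ".join(rng.choices(vocab, k=length)) for _ in range(n)]

def extract(victim_api, vocab, n_queries):
    """Steps 2-3: label the queries with the black-box API, yielding
    a synthetic training set for the 'extracted' copy model."""
    X = make_random_queries(vocab, n_queries)
    Y = [victim_api(x) for x in X]
    return X, Y

# Step 4 (not shown): fine-tune a copy model on (X, Y).
# Toy stand-in for the victim: labels by whether 'great' appears.
victim = lambda text: "Positive" if "great" in text else "Negative"
X, Y = extract(victim, ["great", "miles", "Vegas", "Ford", "zone"], 100)
```

Note that the attacker never needs gradients, model weights, or even real sentences; label access alone is enough.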
58. 58
Why is model extraction a problem?
• Leakage of original training data
• Adversarial example generation
• Theft of intellectual property
Image credits: Papernot et al. 2017, Fredrikson et al. 2015
59. 59
These attacks are economically practical
Google Cloud Natural Language API costs <= $1.00 per 1000 API calls.
https://cloud.google.com/products/calculator/

Dataset                     Size                                   Upper-bound price
SST2 (sentiment classify)   67,349 sentences                       $62.35
Switchboard (speech)        300 hours                              $430.56
Translation                 1 million sentences (100 chars each)   $2,000.00

Smart attackers can scrape APIs like Google Translate for free
60. 60
We use two query generators: RANDOM & WIKI

RANDOM (gibberish sequences of words sampled from a fixed vocabulary):
1. cent 1977, preparation (120 remote Program finance add broader protection
2. Mike zone fights Woods Second State known, defined come

WIKI (sentences from Wikipedia):
1. The unique glass chapel made public and press viewing of the wedding easy.
2. Wrapped in Red was first released internationally on October 25, 2013.
61.–63.
Results: attacks are effective

                     # of Queries   SST2 (%)   MNLI (%)   SQuAD (F1)
API / Victim Model   1x             93.1       85.8       90.6
RANDOM               1x             90.1       76.3       79.1
RANDOM               up to 10x      90.5       78.5       85.8
WIKI                 1x             91.4       77.8       86.1
WIKI                 up to 10x      91.7       79.3       89.4

A BERT model trained on the real SQuAD data gets 90.6 F1.
RANDOM achieves 85.8 F1 (~95% of victim performance) without seeing a single grammatically valid paragraph or question during training.
WIKI achieves 89.4 F1 (~99% of victim performance) without seeing a single grammatically valid question during training.
64. 64
There are no effective defenses for this kind of attack, so think twice before allowing users to have query access to your models!
In a recent paper ("Imitation Attacks and Defenses for Black-box Machine Translation Systems"), Google Translate was successfully stolen!