Transformers to Learn Hierarchical Contexts in Multiparty Dialogue
Presented by Changmao Li
2
Outline
• Introduction and Background
• Approach
• Experiments
• Results
• Analysis
• Appendix
• Conclusion
• List of Contributions
• Future Work
• References
3
Introduction and Background
4
Language Model Embeddings
• Static Word Embedding
• Skip-Gram & CBOW (aka Word2Vec)
• GloVe
• fastText
• Contextualized (Dynamic) Word Embedding (LM)
• ELMo (Embeddings from Language Models)
• ULMFiT (Universal Language Model Fine-tuning)
• BERT (Bidirectional Encoder Representations from Transformers)
• GPT & GPT-2 (Generative Pre-Training)
5
Static vs. Contextual
• Static word embeddings fail to capture polysemy (illustrated in the sketch below).
• Static word embeddings can only hand off fixed vector outputs from unsupervised models to downstream tasks.
• Traditional word vectors are shallow representations (a single layer of weights, known as embeddings).
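Not from the slides: a minimal Python sketch of the polysemy point, using the HuggingFace transformers library (the model name and example sentences are arbitrary choices). A static embedding assigns "bank" one vector everywhere; a contextual encoder produces a different vector for each sentence.

```python
# Sketch: the same surface word gets different contextual vectors in BERT,
# whereas a static embedding table would return one fixed vector.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["She sat by the river bank.", "He deposited cash at the bank."]
bank_id = tokenizer.convert_tokens_to_ids("bank")

with torch.no_grad():
    for text in sentences:
        enc = tokenizer(text, return_tensors="pt")
        hidden = model(**enc).last_hidden_state[0]      # (seq_len, 768)
        idx = enc.input_ids[0].tolist().index(bank_id)  # position of "bank"
        print(text, hidden[idx][:3])  # different vector in each context
```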
6
Recent Contextual Language Modeling Approaches
From the original BERT paper
7
What is the Transformer Encoder?
From the paper: Attention Is All You Need
8
What is Multi-Head Attention?
From the paper: Attention Is All You Need
9
What is Multi-Head Attention?
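The figure on this slide is from the cited paper; for reference, the corresponding equations from Vaswani et al. (2017) are:

```latex
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V
\qquad
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \dots, \mathrm{head}_h)\, W^{O},
\quad
\mathrm{head}_i = \mathrm{Attention}(Q W_i^{Q},\, K W_i^{K},\, V W_i^{V})
```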
10
BERT-Related Transformer Language Modeling Approaches
– BERT
– RoBERTa
– ALBERT
– ...
11
BERT
From the original BERT paper
12
RoBERTa
• More data
• Longer time to train
• Dynamic masking
• No next sentence prediction task
13
ALBERT
• Factorized embedding parameterization
– Reduces the embedding parameters from O(V × H) to O(V × E + E × H) (see the worked example below)
• Cross-layer weight sharing
• Sentence order prediction
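A worked example of the factorization, assuming ALBERT-base-like sizes (V ≈ 30,000, H = 768, E = 128); exact counts depend on the configuration:

```latex
% Embedding parameters without vs. with factorization
V \times H = 30{,}000 \times 768 \approx 23.0\,\mathrm{M}
\quad\longrightarrow\quad
V \times E + E \times H = 30{,}000 \times 128 + 128 \times 768 \approx 3.9\,\mathrm{M}
```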
14
• Corpus
• Friends Dialogue Transcript
• Character Mining Project
• Tasks
• FriendsQA
• Friends Reading Comprehension
• Friends Emotion Detection
• Friends Personality Detection
15
Corpus
• 10 seasons of the Friends show
• Dialogue Example:
From Friends transcript
16
FriendsQA
Only the first 4 seasons are annotated
From the FriendsQA dataset
17
Related Question Answering Tasks
– General-domain datasets
• SQuAD 1.0
• SQuAD 2.0
• MS MARCO
• TriviaQA
• NewsQA
• NarrativeQA
– Multi-turn question answering datasets
• SQA
• QuAC
• CoQA
• CQA
– Dialogue-based question answering datasets
• DREAM
18
Approach
19
Approach
• Problems with the original transformer-based language modeling approaches for dialogue:
• They are pretrained on formal writing, not on dialogue-based corpora
• They simply concatenate all the dialogue utterances into one whole context as input
20
Approach
• Pretraining with transferred BERT/RoBERTa weights
– Token-level masked language model
– Utterance-level masked language model
– Utterance order prediction
• Fine-tuning
– Joint learning of two tasks
21
Pretraining
• Stage 1: Token-level masked language model
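The figure for this stage is omitted here; as a rough sketch, a standard BERT-style masking routine (15% of tokens, 80/10/10 replacement rule) looks like the snippet below. The exact rates and token IDs used in the paper may differ.

```python
# Rough sketch of token-level masked LM input/label construction
# (standard BERT-style 15% masking; details may differ from the paper).
import random

MASK_ID, IGNORE = 103, -100  # 103 = [MASK] in bert-base-uncased; -100 is ignored by the loss

def mask_tokens(input_ids, vocab_size=30522, mask_prob=0.15, rng=random):
    inputs, labels = list(input_ids), [IGNORE] * len(input_ids)
    for i, tok in enumerate(input_ids):
        if rng.random() < mask_prob:
            labels[i] = tok                            # predict the original token here
            r = rng.random()
            if r < 0.8:
                inputs[i] = MASK_ID                    # 80%: replace with [MASK]
            elif r < 0.9:
                inputs[i] = rng.randrange(vocab_size)  # 10%: random token
            # 10%: keep the original token unchanged
    return inputs, labels
```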
22
Pretraining
• Stage 2: Utterance-level masked language model
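A hypothetical sketch of what utterance-level masking could look like: mask every token of one randomly chosen utterance and train the model to recover it. The paper's exact selection and masking scheme may differ.

```python
# Hypothetical sketch: hide one whole utterance and recover its tokens.
import random

def mask_one_utterance(utterances, mask_token="[MASK]", rng=random):
    """utterances: list of token lists, one per dialogue utterance."""
    target = rng.randrange(len(utterances))
    masked, labels = [], []
    for i, utt in enumerate(utterances):
        if i == target:
            masked.append([mask_token] * len(utt))  # hide the whole utterance
            labels.append(list(utt))                # the model must recover these tokens
        else:
            masked.append(list(utt))
            labels.append([None] * len(utt))        # no loss on unmasked utterances
    return masked, labels
```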
23
Pretraining
• Stage 3: Utterance order prediction
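A hypothetical sketch of building an utterance-order-prediction example: keep the dialogue as-is for half of the examples and shuffle its utterances for the other half, with a binary label. The paper's exact construction may differ.

```python
# Hypothetical sketch of an utterance-order-prediction (UOP) example builder.
import random

def make_uop_example(utterances, rng=random):
    if len(utterances) < 2 or rng.random() < 0.5:
        return list(utterances), 1       # original order (positive example)
    shuffled = list(utterances)
    while shuffled == list(utterances):  # ensure the order actually changes
        rng.shuffle(shuffled)
    return shuffled, 0                   # perturbed order (negative example)
```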
24
Fine-tuning
• FriendsQA task
– The utterance ID prediction
– The token span prediction
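A sketch of how the two fine-tuning objectives could be combined into one loss; the head names, shapes, and loss weighting below are illustrative assumptions, not the paper's exact implementation.

```python
# Illustrative sketch of joint fine-tuning: a span head (start/end logits per
# token) plus an utterance-ID head on the [CLS] vector, trained jointly.
import torch
import torch.nn as nn

class JointQAHeads(nn.Module):
    def __init__(self, hidden_size=768, max_utterances=64):
        super().__init__()
        self.span_head = nn.Linear(hidden_size, 2)              # start/end logits per token
        self.uid_head = nn.Linear(hidden_size, max_utterances)  # utterance-ID logits

    def forward(self, hidden_states, start_pos, end_pos, gold_uid, uid_weight=1.0):
        start_logits, end_logits = self.span_head(hidden_states).split(1, dim=-1)
        uid_logits = self.uid_head(hidden_states[:, 0])          # use the [CLS] vector
        ce = nn.CrossEntropyLoss()
        span_loss = ce(start_logits.squeeze(-1), start_pos) + ce(end_logits.squeeze(-1), end_pos)
        uid_loss = ce(uid_logits, gold_uid)
        return span_loss + uid_weight * uid_loss                 # joint objective
```

In this reading, the utterance-ID loss (the uid_loss in the ablation table later) acts as an auxiliary signal trained alongside the span loss.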
25
Experiments
26
Experiments
• FriendsQA task
– Chronological Data Split
27
Experiments
• FriendsQA task
– Evaluation metrics:
• EM: Exact Match
– Checks whether the prediction and the gold answer are exactly the same
• SM: Span-based Match
– Each answer is treated as a bag of words
– Computes the macro-average F1 score (see the sketch after this list)
• UM: Utterance Match
– Checks whether the prediction resides within the same utterance as the gold answer span
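A minimal sketch of EM and the span-based (bag-of-words) F1 for a single prediction/gold pair; SM then macro-averages this F1 over all questions. The text normalization details are assumptions and may differ from the official scorer.

```python
# Minimal sketch of EM and bag-of-words F1 between a predicted and a gold span.
from collections import Counter

def exact_match(pred: str, gold: str) -> bool:
    return pred.strip().lower() == gold.strip().lower()

def span_f1(pred: str, gold: str) -> float:
    pred_bag, gold_bag = Counter(pred.lower().split()), Counter(gold.lower().split())
    overlap = sum((pred_bag & gold_bag).values())   # shared tokens (with multiplicity)
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_bag.values())
    recall = overlap / sum(gold_bag.values())
    return 2 * precision * recall / (precision + recall)

print(span_f1("Central Perk coffee house", "the coffee house Central Perk"))  # ~0.89
```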
28
Results
29
Results for FriendsQA
30
Analysis
31
Analysis
• Ablation Studies
Method EM SM UM
BERTpre with uid_loss 45.7(±0.8) 61.1(±0.8) 71.5(±0.5)
BERTpre without uid_loss 45.6(±0.9) 61.2(±0.7) 71.3(±0.6)
BERTpre+ulm with uid_loss 46.2(±1.1) 62.4(±1.2) 72.5(±0.8)
BERTpre+ulm without uid_loss 45.7(±0.9) 61.8(±0.9) 71.8(±0.5)
BERTpre+ulm+uop with uid_loss 46.8(±1.3) 63.1(±1.1) 73.3(±0.7)
BERTpre+ulm+uop without uid_loss 45.6(±0.9) 61.7(±0.7) 71.7(±0.6)
RoBERTapre with uid_loss 52.8(±0.9) 68.7(±0.8) 81.9(±0.5)
RoBERTapre without uid_loss 52.6(±0.7) 68.6(±0.6) 81.7(±0.7)
RoBERTapre+ulm with uid_loss 53.2(±0.6) 69.2(±0.7) 82.4(±0.5)
RoBERTapre+ulm without uid_loss 52.9(±0.8) 68.7(±1.1) 81.7(±0.6)
RoBERTapre+ulm+uop with uid_loss 53.5(±0.7) 69.6(±0.8) 82.7(±0.5)
RoBERTapre+ulm+uop without uid_loss 52.5(±0.8) 68.8(±0.5) 81.9(±0.7)
32
Analysis
• FriendsQA task
– Question-type analysis
33
34
Analysis
• Error Analysis
35
Analysis
• Error examples
36
Analysis
• Error examples
37
Analysis
• Error examples
38
Analysis
• Remaining challenges for the FriendsQA task
– Inference in the dialogue?
• The model still mainly does pattern matching.
• In some cases, the utterance ID prediction forces the model to learn the correct utterance for an answer span.
– Dealing with speakers and mentions?
• Adding the speakers to the vocabulary does not improve the results.
39
Appendix
40
Appendix: Friends Reading Comprehension
All 10 seasons are processed for the task
From FriendsRC dataset
41
Appendix: Friends Emotion Detection
From Emotion Detection Dataset
42
Appendix: Friends Personality Detection
From Personality Detection Dataset
43
Appendix
• FriendsRC task
– Chronological data split
– Evaluation metrics:
• Accuracy
– Checks whether the prediction and the gold answer are exactly the same
44
Appendix
• FriendsPD task and FriendsED task
– Random data split
– Evaluation metrics:
• Accuracy
– Checks whether the prediction and the gold answer are exactly the same
45
Appendix: Results
• FriendsRC results
Method Accuracy
BERTpre 29.9(±0.8)
BERTpre+ulm 29.8(±0.9)
BERTpre+ulm+uop 29.9(±0.7)
RoBERTapre 31.2(±1.1)
RoBERTapre+ulm 31.2(±1.0)
RoBERTapre+ulm+uop 31.1(±0.9)
46
Appendix: Results
• FriendsED results
Method Accuracy
BERTpre 33.4(±0.3)
BERTpre+ulm 33.2(±0.5)
BERTpre+ulm+uop 33.2(±0.5)
RoBERTapre 34.5(±0.8)
RoBERTapre+ulm 34.2(±0.9)
RoBERTapre+ulm+uop 34.2(±0.7)
47
Appendix: Results
• FriendsPD results
Method AGR CON EXT OPN NEU
BERTpre 58.2(±0.5) 57.7(±0.3) 59.2(±0.6) 61.2(±0.5) 59.3(±0.5)
BERTpre+ulm 58.1(±0.7) 57.5(±0.4) 59.1(±0.8) 61.2(±0.5) 59.3(±0.5)
BERTpre+ulm+uop 58.2(±0.5) 57.7(±0.6) 59.1(±0.5) 61.1(±0.5) 59.2(±0.5)
RoBERTapre 59.7(±0.7) 58.6(±0.5) 60.7(±0.7) 65.9(±0.6) 61.1(±0.5)
RoBERTapre+ulm 59.5(±0.5) 58.5(±0.8) 60.7(±0.8) 65.8(±0.9) 61.1(±0.5)
RoBERTapre+ulm+uop 59.6(±0.8) 58.6(±0.6) 60.6(±0.5) 65.8(±0.7) 61.1(±0.5)
48
Appendix: Analysis
• FriendsRC task
– The mention encoding @entxx cannot be encoded well by the model
• FriendsPD and FriendsED tasks
– Token-level information is enough for these two problems
– Incorporating utterance-level information does not yield any improvement
49
Conclusion
50
Conclusion
• A novel transformer approach that interprets hierarchical contexts in multiparty dialogue.
• Evaluated on the FriendsQA task, where it outperforms BERT and RoBERTa.
• Although the model does not help the other Character Mining tasks, it still offers promising ideas for future studies.
51
List of Contributions
52
List of Contributions
• New pre-training tasks are introduced to improve the quality of both the token-level and utterance-level embeddings generated by the transformers, making them better suited to handling dialogue contexts.
• A new multi-task learning approach is proposed to fine-tune the language model for span-based QA, taking full advantage of the hierarchical embeddings created during pre-training.
• The approach outperforms the previous state-of-the-art models based on BERT and RoBERTa on the span-based QA task that uses dialogues as evidence documents.
53
Future Work
54
Future Work
• Figure out how to represent speakers and mentions in the dialogue.
• Figure out how to perform inference in the dialogue.
• Design a new, more advanced dialogue language model that fits all tasks.
55
References
56
References
Tri Nguyen, Mir Rosenberg, Xia Song, Jianfeng Gao, Saurabh Tiwary, Rangan Majumder, and Li Deng. 2016. MS MARCO: A Human Generated Machine Reading Comprehension Dataset. In Proceedings of the Workshop on Cognitive Computation: Integrating Neural and Symbolic Approaches 2016, co-located with the 30th Annual Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, December 9, 2016.
Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing.
Pranav Rajpurkar, Robin Jia, and Percy Liang. 2018. Know What You Don't Know: Unanswerable Questions for SQuAD. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
Siva Reddy, Danqi Chen, and Christopher D. Manning. 2019. CoQA: A Conversational Question Answering Challenge. Transactions of the Association for Computational Linguistics, 7:249–266, March.
Kai Sun, Dian Yu, Jianshu Chen, Dong Yu, Yejin Choi, and Claire Cardie. 2019. DREAM: A Challenge Data Set and Models for Dialogue-Based Reading Comprehension. Transactions of the Association for Computational Linguistics, 7:217–231.
Alon Talmor and Jonathan Berant. 2018. The Web as a Knowledge-Base for Answering Complex Questions. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers).
Trieu H. Trinh and Quoc V. Le. 2018. A Simple Method for Commonsense Reasoning. arXiv, 1806.02847.
Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, and Kaheer Suleman. 2017. NewsQA: A Machine Comprehension Dataset. In Proceedings of the 2nd Workshop on Representation Learning for NLP.
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. 2017. Attention Is All You Need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS'17, pages 6000–6010, USA. Curran Associates Inc.
Zhengzhe Yang and Jinho D. Choi. 2019. FriendsQA: Open-Domain Question Answering on TV Show Transcripts. In Proceedings of the 20th Annual SIGdial Meeting on Discourse and Dialogue, pages 188–197, Stockholm, Sweden, September. Association for Computational Linguistics.
Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Russ R. Salakhutdinov, and Quoc V. Le. 2019. XLNet: Generalized Autoregressive Pretraining for Language Understanding. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 5754–5764. Curran Associates, Inc.
57
References
Eunsol Choi, He He, Mohit Iyyer, Mark Yatskar, Wen-tau Yih, Yejin Choi, Percy Liang, and Luke Zettlemoyer. 2018. QuAC: Question Answering in Context. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing.
Alexis Conneau and Guillaume Lample. 2019. Cross-lingual Language Model Pretraining. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors, Advances in Neural Information Processing Systems 32, pages 7057–7067. Curran Associates, Inc.
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL'19, pages 4171–4186.
Aaron Gokaslan and Vanya Cohen. 2019. OpenWebText Corpus.
Mohit Iyyer, Wen-tau Yih, and Ming-Wei Chang. 2017. Search-based Neural Structured Learning for Sequential Question Answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1821–1831, Vancouver, Canada, July. Association for Computational Linguistics.
Mandar Joshi, Eunsol Choi, Daniel Weld, and Luke Zettlemoyer. 2017. TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers).
Tomáš Kočiský, Jonathan Schwarz, Phil Blunsom, Chris Dyer, Karl Moritz Hermann, Gábor Melis, and Edward Grefenstette. 2018. The NarrativeQA Reading Comprehension Challenge. Transactions of the Association for Computational Linguistics, 6:317–328, December.
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, and Radu Soricut. 2019. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations.
Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv, 1907.11692.
Sebastian Nagel. 2016. News Dataset Available.
58
Thank you
