A Compare-Aggregate Model with
Latent Clustering
for Answer Selection
Seunghyun Yoon, Franck Dernoncourt, Doo Soon Kim,
Trung Bui and Kyomin Jung
Index
Research Objective
Proposed Methods
Experimental Results
• Employing Pretrained Language Model (LM)
• Latent Clustering (LC)
• Transfer Learning (TL)
• Point-wise Learning
Conclusions
*https://en.wikipedia.org/wiki/Question_answering
Question Answering (QA) is a computer science discipline within the fields of information retrieval (IR) and natural language processing (NLP), concerned with building systems that automatically answer questions posed by humans in a natural language*.
Research Objective
Question: Who is the Tang?
Split the passage into multiple sentences → focus on the relevant one
Passage: Journey to the West is one of the four classics of Chinese literature.
Written by the Ming Dynasty novelist Wu Cheng’en during the 16th century, this
beloved adventure tale combines action, humor, and spiritual lessons.
The novel takes place in the seventh century. It tells the story of one of Buddha
Sakyamuni’s disciples who was banished from the heavenly paradise for the crime of
slighting the Buddha Law. He was sent to the human world and forced to spend ten
lifetimes practicing religious self-cultivation in order to atone for his sins.
In his tenth lifetime, now during the Tang Dynasty, he reincarnates as a monk
named Xuan Zang (also known as Tang Monk and Tripitaka). The emperor wishes
this monk to travel west and bring holy Mahayana Buddhist scriptures back to
China. After being inspired by a vision from the Bodhisattva Guanyin, the monk
accepts the mission and sets off on the sacred quest.
sentence-level
Research Objective
Implement one of the best models
& enhance it
Benefit from a large corpus
Ideas
Enhance the previous model
• Compare-aggregate model with dynamic-clip attention (Comp-Clip)
• Apply the Latent Clustering method (LC)
• Switch from listwise to pointwise learning
Benefit from a large corpus
• Adopt a pre-trained Language Model (LM)
• Apply Transfer Learning (TL) using the QNLI dataset
Proposed Methods
• Compare-aggregate model with dynamic-clip attention* (Comp-Clip)
Baseline Model
*Bian et al., “A compare-aggregate model with dynamic-clip attention for answer selection”, CIKM 2017
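Dynamic-clip attention prunes small attention weights before aggregation. Below is a minimal NumPy sketch of the k-max variant, assuming toy similarity scores; the function name and values are illustrative, not the authors' code:

```python
import numpy as np

def kmax_clip_attention(scores, k=2):
    """k-max dynamic-clip attention (sketch): keep only the k largest
    attention weights per row, zero the rest, and renormalize."""
    # softmax over the last axis to obtain raw attention weights
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn = e / e.sum(axis=-1, keepdims=True)
    # zero out everything below the k-th largest weight in each row
    kth = np.sort(attn, axis=-1)[..., -k][..., None]
    clipped = np.where(attn >= kth, attn, 0.0)
    # renormalize so each row sums to 1 again
    return clipped / clipped.sum(axis=-1, keepdims=True)

# toy question-to-answer similarity scores: 2 question words x 4 answer words
scores = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.5, 0.4, 3.0, 0.2]])
attn = kmax_clip_attention(scores, k=2)
```

Clipping removes noise from irrelevant words, so each question word attends to only its k most similar answer words.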
• Employing a language model for the word-embedding layer
• Deep contextualized word representations (ELMo*)
*ELMo (Embeddings from Language Models), https://allennlp.org/elmo, https://arxiv.org/abs/1802.05365
Pretrained Language Model (LM)
• Extract latent cluster information from Question and Answer
• Auxiliary knowledge in understanding text

[Figure: the sentence representation of the Answer (or Question) is compared against latent memory vectors M_1 … M_n; the similarity scores (e.g., 0.6, 0.3, 0.1) pass through k-max-pool and softmax to weight the latent-cluster information M_LC]

s^A = avg(embedding_i)                      (sentence representation)
p^A_{1:n} = (s^A)^T · W M                   (similarity to each latent memory M_k)
α^A_{1:k} = softmax(k-max-pool(p^A_{1:n}))
M_LC = Σ_k α^A_k · M_k                      (latent-cluster information)

Latent Clustering (LC)
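The LC step can be sketched in a few lines of NumPy. This is an illustrative toy implementation, not the authors' code; the dimensions, random memory vectors, and function name are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

# assumed toy dimensions: d-dim embeddings, n latent memories, top-k pooling
d, n, k = 8, 5, 3
M = rng.normal(size=(n, d))   # latent memory M_1..M_n (learned in practice)
W = rng.normal(size=(d, d))   # similarity transform

def latent_cluster_info(embeddings, M, W, k):
    """Average a sentence's word embeddings, score the result against each
    latent memory vector, keep the k best scores, softmax them, and return
    the weighted memory M_LC."""
    s = embeddings.mean(axis=0)            # s = avg(embedding_i)
    p = M @ (W @ s)                        # p_{1:n}: one score per cluster
    topk = np.argsort(p)[-k:]              # k-max-pool
    alpha = np.exp(p[topk] - p[topk].max())
    alpha /= alpha.sum()                   # softmax over the pooled scores
    return alpha @ M[topk]                 # M_LC = sum_k alpha_k M_k

answer_embeddings = rng.normal(size=(6, d))   # a 6-word sentence
m_lc = latent_cluster_info(answer_embeddings, M, W, k)
```

The resulting d-dimensional M_LC is concatenated to the sentence features as auxiliary cluster knowledge.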
• Train the model with a large corpus, then fine-tune it on the target corpus

Dataset    Listwise pairs          Pointwise pairs          Answer candidates
           train   dev    test     train   dev    test      per question
WikiQA     873     126    143      8.6k    1.1k   2.3k      9.9
TREC-QA    1.2k    65     68       53k     1.1k   1.4k      43.5
QNLI       86k     10k    -        428k    169k   -         5.0

Transfer Learning (TL)
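The pre-train-then-fine-tune recipe can be illustrated on a toy problem. The sketch below substitutes a tiny logistic-regression "model" and synthetic data for the real QA model and the QNLI/WikiQA corpora; every name and number here is hypothetical scaffolding for the two-stage procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

def sgd_train(w, X, y, lr=0.1, epochs=200):
    """Plain full-batch logistic-regression training; stands in for
    training the QA model on a corpus."""
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-X @ w))      # sigmoid predictions
        w -= lr * X.T @ (p - y) / len(y)      # cross-entropy gradient step
    return w

# large "source" corpus (stands in for QNLI) and small "target" corpus
true_w = rng.normal(size=4)
X_src = rng.normal(size=(2000, 4)); y_src = (X_src @ true_w > 0).astype(float)
X_tgt = rng.normal(size=(40, 4));   y_tgt = (X_tgt @ true_w > 0).astype(float)

w = np.zeros(4)
w = sgd_train(w, X_src, y_src)                       # 1) pre-train on the large corpus
w = sgd_train(w, X_tgt, y_tgt, lr=0.01, epochs=50)   # 2) fine-tune on the target corpus

acc = (((X_tgt @ w) > 0) == (y_tgt > 0.5)).mean()
```

The point of the recipe: the small target corpus alone is too scarce to learn good parameters, so most of the learning happens in stage 1 and stage 2 only adapts it.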
• The objective function determines how the model is optimized
Listwise: KL-divergence over the answer list
Pointwise: cross-entropy per answer candidate
Point-wise Learning
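A minimal sketch of the two objectives for one question with a list of candidate scores and binary relevance labels; the helper names are hypothetical, not the authors' code:

```python
import numpy as np

def listwise_kl_loss(scores, labels):
    """Listwise objective (sketch): KL-divergence between the softmax over
    all candidate scores of one question and the normalized label list."""
    p = np.exp(scores - scores.max()); p /= p.sum()     # model distribution
    q = labels / labels.sum()                           # target distribution
    mask = q > 0                                        # 0 * log 0 = 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))

def pointwise_ce_loss(scores, labels):
    """Pointwise objective: independent binary cross-entropy per candidate."""
    p = 1.0 / (1.0 + np.exp(-scores))
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))

# one question with 4 candidate answers, the second one relevant
scores = np.array([0.2, 2.0, -1.0, 0.1])
labels = np.array([0.0, 1.0, 0.0, 0.0])
kl = listwise_kl_loss(scores, labels)
ce = pointwise_ce_loss(scores, labels)
```

The pointwise loss scores each (question, answer) pair independently, which also multiplies the number of training pairs (the "Pointwise pairs" column in the dataset table).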
Overall Architecture
• Benchmark dataset for sentence-level QA
• WikiQA
• Answer selection QA dataset constructed from real queries of
Bing and Wikipedia
• TREC-QA
• Answer selection QA dataset created from the TREC Question-Answering tracks
• QNLI
• Modified version of the SQuAD dataset that allows for
sentence selection QA
Datasets
• We achieve state-of-the-art performance on both datasets
[Table: results of previous models vs. ours]
LM: Language Model
LC: Latent Clustering
TL: Transfer Learning (using Squad-T)
Experimental Results
• We achieve state-of-the-art performance on both datasets
7.0% (0.714 → 0.764)
9.1% (0.764 → 0.834)
Experimental Results
• We achieve state-of-the-art performance on both datasets
3.9% (0.835 → 0.868)
0.8% (0.868 → 0.875)
Experimental Results
• The QNLI experiment shows the efficacy of the Latent Clustering method
• Point-wise learning shows higher performance than list-wise learning
Experimental Results
Conclusions
• Show that leveraging a large amount of data is crucial for capturing
the contextual representation of input text
• Show that the proposed latent clustering method with a pointwise
objective function significantly improves model performance in the
sentence-level QA task
• Achieve state-of-the-art performance on both the WikiQA and
TREC-QA datasets
Thank you
slide, paper → http://david-yoon.github.io
