BottleSum
Unsupervised and Self-supervised Sentence Summarization using the Information Bottleneck Principle
Peter West, Ari Holtzman, Jan Buys, Yejin Choi
Summary
• Methods
  • Unsupervised summarization
  • Self-supervised summarization
  • Using the Information Bottleneck principle for summarization
• Outcomes
  • Better results on automatic / human evaluation
  • Better results on domains where sentence-summary pairs are not available
What is the task?
What is the task?
• Sentence summarization (compression)
  • Given a sentence, summarize it into a shorter one
  • Assumption: a good sentence summary contains information related to the broader context while discarding less significant details
How to solve it?
How to solve it?
• Unsupervised methods
  • Why unsupervised? Sentence-summary pairs are not always available
• Current unsupervised methods use an autoencoder (AE) at their core
  • The source sentence must be accurately predictable from the summary
  • This goes against the fundamental goal of summarization, which crucially needs to forget all but the "relevant" information
• Therefore, use the Information Bottleneck (IB) to discard irrelevant information
Example
• The next sentence is about control, so the summary should only contain information about control
• However, a summary produced by an autoencoder necessarily refers to the population in order to be able to restore that information
Methods
• BottleSum^Ex is based mainly on the principle of the Information Bottleneck (IB)
• BottleSum^Ex
  • Extractive unsupervised method
  • No need to train
  • Takes two consecutive sentences as input
• BottleSum^Self
  • Abstractive self-supervised method
  • Uses the output of BottleSum^Ex as training data
  • Takes a single sentence as input
What is the Information Bottleneck? (1/2)
• Given a source S, an external (relevance) variable Y, and a summary S̃
• Learn a conditional distribution p(s̃|s) minimizing:

    I(S̃; S) − β·I(S̃; Y)    (1)

• β: coefficient that balances the two terms
• I: mutual information between two variables
  • I(X; Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
  • Given information about Y, how much does the ambiguity of X decrease?
  • See the appendix
What is the Information Bottleneck? (2/2)
• Minimizing (1):

    I(S̃; S) − β·I(S̃; Y)

• Goal: generate a good summary S̃ from the source S
• First term: the pruning term
  • If S̃ decreases the ambiguity of S, then I(S̃; S) increases
  • Ensures that irrelevant information is discarded
• Second term: the relevance term
  • If S̃ decreases the ambiguity of Y, then −I(S̃; Y) decreases
  • Ensures that S̃ and Y share information
Why is IB better?
• Suppose S contains some information Z that is irrelevant to Y
• Under IB:
  • Not including Z is better
  • If Z is included, I(S̃; S) increases while I(S̃; Y) is unaffected
• Under AE:
  • Including Z is better
  • Z carries information about S, which decreases the reconstruction loss
  • Reconstruction loss: reconstruct S′ from S̃; the difference between S′ and S
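A short formal sketch of this argument (my notation, not from the slides; it assumes Z is independent of Y, also conditionally on S̃):

```latex
% Keeping irrelevant Z hurts the IB objective:
% the pruning term can only grow, the relevance term is unchanged.
I\big((\tilde{S},Z);\,S\big) \;\ge\; I(\tilde{S};\,S),
\qquad
I\big((\tilde{S},Z);\,Y\big) \;=\; I(\tilde{S};\,Y)

% An autoencoder prefers keeping Z: extra information about S can only
% lower the expected reconstruction loss, since H(S \mid \tilde{S},Z) \le H(S \mid \tilde{S}).
\mathbb{E}\big[-\log p(S \mid \tilde{S}, Z)\big] \;\le\; \mathbb{E}\big[-\log p(S \mid \tilde{S})\big]
```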
Extractive: BottleSum^Ex
• Extractive unsupervised method
Implementing IB
• Given a sentence S, use the next sentence as the relevance variable Y
• Use a deterministic function mapping a sentence s to its summary s̃
  • Therefore p(s̃|s) = 1
• In this setting, minimizing (1) is equivalent to minimizing:

    I(S̃; S) − β·I(S̃; Y) = −log p(s̃) − β₁·p(s_next|s̃)·log p(s_next|s̃)

• Use a pretrained language model to estimate the distributions
  • GPT-2 was used
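A hedged sketch of where this pointwise form comes from (my derivation, following the deterministic-summarizer assumption above):

```latex
% Deterministic summarizer: p(\tilde{s} \mid s) = 1 for the chosen \tilde{s}, so pointwise
I(\tilde{S}; S) = \mathbb{E}\!\left[\log \tfrac{p(\tilde{s} \mid s)}{p(\tilde{s})}\right]
\;\longrightarrow\; -\log p(\tilde{s})

% Relevance term: expand over y and drop -\log p(y), which does not depend on \tilde{s}
I(\tilde{S}; Y) = \mathbb{E}_{\tilde{s}}\!\Big[\textstyle\sum_y p(y \mid \tilde{s}) \log \tfrac{p(y \mid \tilde{s})}{p(y)}\Big]
\;\longrightarrow\; p(s_{\text{next}} \mid \tilde{s}) \log p(s_{\text{next}} \mid \tilde{s}) + \text{const}
```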
Algorithm (1/8)
• Implements the extractive method
• Iteratively delete words or phrases from candidates, starting with the original sentence
• At each elimination step, only consider candidate deletions that decrease the value of the pruning term
• When expanding the candidates, choose the few candidates with the highest relevance scores, to optimize the relevance term
Algorithm (2/8)
• Input:
  • s: the sentence
  • s_next: the context (next sentence)
• Hyperparameters:
  • m: the maximum number of words to delete at each step
  • k: the number of candidates to keep in the search
Algorithm (3/8)
• E.g.
  • s = Unsupervised methods use autoencoder as core of methods
  • k = 1
  • m = 3
Algorithm (4/8)
• s′ = Unsupervised methods use autoencoder as core of methods
• List the next candidates s′′ by removing up to m words (lines 7-9):
    methods use autoencoder as core of methods
    Unsupervised use autoencoder as core of methods
    Unsupervised methods autoencoder as core of methods
    …
    Unsupervised methods use autoencoder as core of
Algorithm (5/8)
• s′ = Unsupervised methods use autoencoder as core of methods
• List the next candidates s′′ by removing up to m words (lines 7-9):
    use autoencoder as core of methods
    Unsupervised autoencoder as core of methods
    Unsupervised methods as core of methods
    …
    Unsupervised methods use autoencoder as core
Algorithm (6/8)
• s′ = Unsupervised methods use autoencoder as core of methods
• List the next candidates s′′ by removing up to m words (lines 7-9):
    autoencoder as core of methods
    Unsupervised as core of methods
    Unsupervised methods core of methods
    …
    Unsupervised methods use autoencoder as
Algorithm (7/8)
• Discard bad candidates (lines 10-11)
  • For every candidate, estimate p(s′′)
  • If p(s′) < p(s′′), add s′′ as a candidate
  • This procedure corresponds to decreasing the value of the pruning term
    methods use autoencoder as core of methods
    Unsupervised use autoencoder as core of methods
    Unsupervised methods use autoencoder as core methods
    Unsupervised autoencoder as core of methods
    Unsupervised methods as core of methods
    …
Algorithm (8/8)
• Choose the next s′ from the candidates (lines 4-5)
  • Sort the candidates by p(s_next|s′) in descending order
  • Choose the top k candidates as the next s′
  • This procedure corresponds to decreasing the value of the relevance term
    Unsupervised methods use autoencoder as core methods
    Unsupervised use autoencoder as core of methods
    methods use autoencoder as core of methods
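To make the search concrete, here is a minimal sketch of the loop above in Python, assuming the HuggingFace `transformers` GPT-2 API for scoring. It is an illustration, not the authors' reference implementation: tokenization, length handling, and the final selection rule are simplified.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def log_p(text):
    """Summed log-probability log p(text) under GPT-2 (pruning term)."""
    ids = tokenizer(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean NLL over predicted tokens
    return -loss.item() * (ids.shape[1] - 1)

def log_p_next(summary, s_next):
    """log p(s_next | summary): score only the continuation tokens (relevance term)."""
    ctx_len = tokenizer(summary, return_tensors="pt").input_ids.shape[1]
    full = tokenizer(summary + " " + s_next, return_tensors="pt").input_ids
    labels = full.clone()
    labels[:, :ctx_len] = -100  # ignore the summary prefix in the loss
    with torch.no_grad():
        loss = model(full, labels=labels).loss
    return -loss.item() * (full.shape[1] - ctx_len)

def bottlesum_ex(s, s_next, k=1, m=3):
    pool = {tuple(s.split())}  # every candidate ever accepted
    frontier = [s.split()]
    while frontier:
        # relevance step (lines 4-5): keep the k candidates most predictive of s_next
        frontier.sort(key=lambda c: log_p_next(" ".join(c), s_next), reverse=True)
        new_frontier = []
        for cand in frontier[:k]:
            base = log_p(" ".join(cand))
            # expansion step (lines 7-9): delete each contiguous span of 1..m words
            for span in range(1, m + 1):
                for i in range(len(cand) - span + 1):
                    shorter = cand[:i] + cand[i + span:]
                    # pruning step (lines 10-11): keep only deletions that raise p(s'')
                    if len(shorter) >= 2 and log_p(" ".join(shorter)) > base:
                        new_frontier.append(shorter)
                        pool.add(tuple(shorter))
        frontier = new_frontier  # every child is strictly shorter, so the loop terminates
    # final choice: the stored candidate most predictive of the next sentence
    return " ".join(max(pool, key=lambda c: log_p_next(" ".join(c), s_next)))
```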
Note
• Nothing is trained at all
• About β₁
  • The algorithm ensures that both the pruning term and the relevance term improve
  • Thus the pruning term and the relevance term are never compared directly
  • Therefore, the choice of β₁ is less important
Abstractive: BottleSum^Self (1/2)
• Abstractive summarization method
  • Train a GPT-2 model for summarization
• Self-supervised learning
  • Uses BottleSum^Ex's outputs as training data
• Aims
  • Remove the restriction to extractive summaries
  • Learn an explicit compression function that does not require a next sentence
Abstractive: BottleSum^Self (2/2)
• Fine-tune a GPT-2 model for summarization
• Training the language model
  • Input: [ sentence + "TL;DR:" + summary ]
  • E.g. Hong Kong, a bustling metropolis with a population over 7 million, was once under British Rule. TL;DR: Hong Kong was once under British Rule.
• When generating a summary
  • Input: [ sentence + "TL;DR:" ]
  • E.g. Hong Kong, a bustling metropolis with a population over 7 million, was once under British Rule. TL;DR:
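A minimal sketch of this data format and the inference prompt, assuming the HuggingFace `transformers` GPT-2 API (recent versions, for `max_new_tokens`). The fine-tuning loop itself, standard causal-LM training on the concatenated strings, is omitted.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

def training_example(sentence, ex_summary):
    # Training input: [ sentence + "TL;DR:" + summary ],
    # where the summary comes from BottleSum^Ex
    return f"{sentence} TL;DR: {ex_summary}"

def summarize(sentence, max_new_tokens=30):
    # Inference input: [ sentence + "TL;DR:" ]; the model continues the prompt
    prompt = f"{sentence} TL;DR:"
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=max_new_tokens,
                         pad_token_id=tokenizer.eos_token_id)
    return tokenizer.decode(out[0][ids.shape[1]:], skip_special_tokens=True)

print(summarize("Hong Kong, a bustling metropolis with a population over "
                "7 million, was once under British Rule."))
```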
Experiments
Unsupervised Models
• BottleSum^Ex, BottleSum^Self
  • Use k = 1, m = 3
• Recon^Ex
  • Follows BottleSum^Ex, but replaces the next sentence with the source sentence itself
  • Probes the role of the next sentence
• SEQ³
  • Trained with an autoencoding objective paired with a topic loss and a language-model prior loss
  • Had the highest prior unsupervised results on DUC
• PREFIX
  • The first 75 bytes of the source sentence
• INPUT
  • The full input sentence
Supervised Models
• ABS
  • Supervised SOTA result on the DUC-2003 dataset
• Li et al.
  • Supervised SOTA result on the DUC-2004 dataset
Dataset & Evaluation method
• Models are evaluated on three datasets
• DUC-2003 and DUC-2004
  • Provide sentence-summary pairs
  • Automatic ROUGE metrics
• CNN corpus
  • No reference summaries are available
  • Human evaluation
  • Compare two summaries from different models on 3 attributes: coherence, conciseness, and agreement with the input
DUC-2003, DUC-2004
• BottleSum^Ex achieves the highest R-1 and R-L scores among unsupervised methods on both datasets
• BottleSum^Self achieves the second-highest scores
• R-2 scores for BottleSum^Ex are lower than the baselines
  • Possibly due to a lack of fluency
CNN corpus
• Uses the CNN corpus
  • Only the sentences are available
• SEQ³ is trained on the CNN corpus
• ABS is not retrained
  • The model is used as originally trained on the Gigaword sentence dataset
• Attribute scores are the average of three annotators' judgments
  • Each comparison is scored on a scale of 1 (better), 0 (equal), and -1 (worse)
CNN corpus
• BottleSum^Ex and BottleSum^Self show stronger performance
• BottleSum^Self scores better than BottleSum^Ex
  • The combination of abstractiveness and learning a cohesive underlying notion of summarization enables summaries that humans find more favorable
Model Comparison
• ABS requires training on a large supervised training set
  • Poor out-of-domain performance
• SEQ³ is unsupervised, but still needs extensive training on a large corpus of in-domain text
• BottleSum^Ex requires neither
Conclusion
Conclusion
• Methods
  • Unsupervised summarization
  • Self-supervised summarization
  • Using the Information Bottleneck principle for summarization
• Outcomes
  • Better results on automatic / human evaluation
  • Better results on domains where sentence-summary pairs are not available
Appendix
Definition of mutual information
• Event E, probability P(E)
• Information: −log P(E) (generally, base 2)
• Entropy: H(P) = −Σ_{E∈Ω} P(E) log P(E)
• E.g. P(sunny) = 0.5, P(rain) = 0.5
  • H(P) = −P(sunny) log P(sunny) − P(rain) log P(rain) = −0.5·(−1) − 0.5·(−1) = 1
• E.g. P(sunny) = 0.9, P(rain) = 0.1
  • H(P) = −P(sunny) log P(sunny) − P(rain) log P(rain) ≈ −0.9·(−0.152) − 0.1·(−3.32) ≈ 0.47
  • The more predictable distribution has lower entropy
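A quick check of the entropy values above (base-2 logs):

```python
from math import log2

def H(ps):
    # Entropy of a discrete distribution given as a list of probabilities
    return -sum(p * log2(p) for p in ps)

print(H([0.5, 0.5]))  # 1.0
print(H([0.9, 0.1]))  # ~0.469: the skewed distribution has lower entropy
```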
Definition of mutual information
• Conditional entropy
  • H(X|y) = −Σ_{x∈X} P(x|y) log P(x|y)
  • H(X|Y) = Σ_y P(y) H(X|y) = … = −Σ_{x∈X, y∈Y} P(x,y) log P(x|y)
• Mutual information
  • I(X;Y) = H(X) − H(X|Y) = H(Y) − H(Y|X)
  • Given information about Y, how much does the ambiguity of X decrease?
  • E.g. if P(X) = P(X|Y):
    • H(X) = H(X|Y), so I(X;Y) = 0
Definition of mutual information
• E.g.
  • P(sunny) = 0.5, P(rain) = 0.5
  • P(contrails) = 0.2, P(bluesky) = 0.8
  • P(sunny|contrails) = 0.2, P(rain|contrails) = 0.8
  • P(sunny|bluesky) = 0.8, P(rain|bluesky) = 0.2

→ I(X;Y) = H(X) − H(X|Y)
         = −Σ_{x∈X} P(x) log P(x) + Σ_{x∈X, y∈Y} P(x,y) log P(x|y)
         = −0.5·(−1) − 0.5·(−1) + 0.04·(−2.32) + 0.16·(−0.32) + 0.64·(−0.32) + 0.16·(−2.32)
         = 1 − 0.72 ≈ 0.28
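A quick check of this mutual information example, using the slide's numbers (X = weather, Y = sky observation; base-2 logs):

```python
from math import log2

p_x = {"sunny": 0.5, "rain": 0.5}          # marginal used on the slide
p_y = {"contrails": 0.2, "bluesky": 0.8}
p_x_given_y = {"contrails": {"sunny": 0.2, "rain": 0.8},
               "bluesky":   {"sunny": 0.8, "rain": 0.2}}

H_x = -sum(p * log2(p) for p in p_x.values())            # H(X) = 1.0
H_x_given_y = -sum(p_y[y] * p * log2(p)                  # H(X|Y) = sum_y P(y) H(X|y)
                   for y, cond in p_x_given_y.items()
                   for p in cond.values())               # ~0.72
print(H_x - H_x_given_y)                                 # I(X;Y) ~0.28
```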
Editor's Notes
  1. I will introduce a method called BottleSum.
  2. First, an overview. This paper proposes methods for performing summarization in an unsupervised and a self-supervised way. Summarization is done using the Information Bottleneck, a theory grounded in information theory. As results, good scores were obtained on both automatic evaluation and human evaluation. In particular, good results were obtained in domains where sentence-summary pairs are not available.
  3. The task addressed in this paper is sentence summarization. Given a sentence, generate a shorter sentence that summarizes it. The methods are proposed under the assumption that a good summary contains information related to the broader context while discarding fine-grained details.
  4. As for why unsupervised: sentence-summary pairs cannot always be used. Preparing large-scale sentence-summary pairs is generally costly and difficult, so it would be valuable if summaries could be generated from raw sentences alone. As discussed later in the experiments, models trained with existing supervised methods cannot produce clean summaries when the text seen at generation time differs greatly in character from the text used in training. Also, existing methods are built around autoencoders. Autoencoder-based methods must accurately restore the original sentence from the compressed summary, so they cannot discard information that is irrelevant to the context. This paper therefore uses the Information Bottleneck to discard information with little relevance to the context.
  5. For example, consider this passage. Source is the sentence to be summarized, and the sentence on the right is the next sentence. The next sentence is about governance, so we want the summary to contain only the part about control. With an AE, however, the original sentence must be restored, so the summary also has to include the part about the population. The IB summary talks only about control, so in cases like this IB produces the more natural summary.
  6. Two methods are proposed in this paper. BottleSum Ex performs extractive summarization using the Information Bottleneck. Because it uses a pretrained language model, no training is needed. It takes a sentence and the next sentence as input and generates a summary. BottleSum Self performs abstractive summarization using a fine-tuned language model. The language model is trained on the summaries generated by BottleSum Ex as training data. This model takes a single sentence as input and generates a summary.
  7. Now the explanation of IB. The setting: given a source of information S and external information Y related to it, generate a good summary S̃. Under this setting, the conditional distribution p(S̃|S) is learned subject to this condition. Here β is a coefficient that balances the two terms, and I is a function computing how much the ambiguity of X decreases when information about Y is given. The definitions are in the appendix; for now it suffices to understand that it measures how much the ambiguity of X decreases given information about Y.
  8. Concretely, what each term corresponds to: the first (red) term is called the pruning term; combined with the second term, it guarantees that information irrelevant to the summary is not included. When S̃ decreases the ambiguity of S, the value of this term increases. The second (blue) term is called the relevance term and guarantees that S̃ and Y share information. When S̃ decreases the ambiguity of Y, the value of this term decreases.
  9. The setting: suppose S contains information Z that has nothing to do with the external information Y, and consider whether or not to include Z in S̃.
  10. First, I will explain the extractive method, BottleSumEx.
  11. IB is a theory about conditional distributions; here is how it is realized concretely. In this paper, the next sentence is used as the external variable Y. As the conditional distribution, one that uniquely determines the summary corresponding to a given sentence is computed. Under this condition, the earlier expression can be rewritten as shown. A pretrained language model is used to compute the probability values in this expression.
  12. Next, the algorithm. It generates an extractive summary by deleting some words from candidate sentences. When deleting, only deletions that improve the pruning term are chosen. Also, when expanding the candidates, only those that improve the relevance term are selected.
  13. The table on the right is the concrete algorithm, with its inputs and hyperparameters. Briefly: C is the candidate set; initially it holds the sentence with nothing deleted. From there, deletion targets are chosen while shortening the length one word at a time. The candidates are sorted in descending order by this value and the top k are chosen. Several words are then deleted from the chosen targets to create new sentences. If the resulting s'' is a good sentence, it is added to the candidate set.
  14. A measure of complexity (uncertainty).