Query and output generating words by querying distributed word representations for paraphrase generation

Query and Output: Generating Words by
Querying Distributed Word Representations
for Paraphrase Generation
Shuming Ma, Xu Sun, Wei Li, Sujian Li, Wenjie Li, Xuancheng Ren
(NAACL 2018)
Presenter: Ryoma Yoshimura (B4)

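Overview
● Proposes the Word Embedding Attention Network (WEAN), a model for paraphrase generation
● The aim is to capture word meaning by consulting the word embeddings when generating each word
● Experiments on two paraphrasing tasks
○ Text simplification
■ BLEU improved by 6.3 and 5.5 on the two datasets, respectively
○ Text summarization
■ ROUGE-2 F1 improved by 5.7
● Outperformed the state of the art on all three datasets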

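Introduction
seq2seq models have been successful at paraphrase generation, but they have two main problems
1. They memorize the words themselves and the patterns in the training data, not the meaning of the words
○ The main cause is the decoder, which does not model semantic information
○ The decoder assumes that words are independent of one another and that their scores are unrelated, so a word and its synonym receive different scores → the model learns the words themselves rather than the relationships between words
2. The word generator has a very large number of parameters (vocab size × hidden size)
○ A large number of parameters slows down training
WEAN is proposed to address these problems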

Model overview
An encoder-decoder model with an attention mechanism
Both the encoder and the decoder are LSTMs

Model: Attention layer
c_t: context vector
h_i: hidden state of the encoder
s_t: hidden state of the decoder
g(s_t, h_i): attention score
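The formula from this slide did not survive extraction, so the following is only a sketch of how a context vector c_t is usually computed from the symbols listed above: the scores g(s_t, h_i) are normalized with a softmax and used to average the encoder states. The dot-product form of g and the function names (`softmax`, `dot`, `context_vector`) are assumptions made for illustration, not something the slide specifies.

```python
import math

def softmax(scores):
    # Subtract the maximum for numerical stability before exponentiating.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def context_vector(s_t, encoder_states, score=dot):
    # g(s_t, h_i) for every encoder state (dot product is an assumed choice of g).
    scores = [score(s_t, h_i) for h_i in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    # c_t is the attention-weighted sum of the encoder hidden states.
    c_t = [sum(w * h[k] for w, h in zip(weights, encoder_states)) for k in range(dim)]
    return c_t, weights

# Toy example with three encoder states of size 2.
encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
s_t = [0.5, 0.5]
c_t, weights = context_vector(s_t, encoder_states)
```

In this toy input the third encoder state has the largest score, so it receives the largest attention weight.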

Model: Query
q_t: query
W_c: parameter matrix
s_t: hidden state of the decoder
c_t: context vector
[s_t; c_t]: concatenation of s_t and c_t
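A minimal sketch of how the query could be formed from the symbols above: the decoder state and the context vector are concatenated and projected with W_c. The tanh nonlinearity is an assumption (the slide's formula was not preserved), and the helper names and toy values are hypothetical.

```python
import math

def matvec(W, x):
    # Multiply a matrix (list of rows) by a vector.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def make_query(W_c, s_t, c_t):
    # [s_t; c_t]: concatenate decoder state and context vector.
    concat = s_t + c_t
    # Assumed form: q_t = tanh(W_c [s_t; c_t]).
    return [math.tanh(z) for z in matvec(W_c, concat)]

# Toy values: hidden size 2, so W_c maps 4 dimensions back to 2.
s_t = [0.2, -0.1]
c_t = [0.4, 0.3]
W_c = [[0.5, 0.0, 0.5, 0.0],
       [0.0, 0.5, 0.0, 0.5]]
q_t = make_query(W_c, s_t, c_t)
```

Projecting back to the hidden size keeps the query in the same space as the word embeddings it is later compared against.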

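Model: Key-value pairs
w_i: candidate word (value)
e_i: corresponding embedding (key)
n: number of candidate words
● The candidate words are the N most frequent words in the training set
● The keys and the decoder input embeddings are (probably) shared
● The word embeddings are not pretrained; they are learned from scratch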
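The key-value setup described above can be sketched as follows: take the N most frequent training words as values, and give each one a randomly initialized embedding as its key. The function names, the initialization range, and the toy corpus are all hypothetical; in a real model the embeddings would be trained jointly with the rest of the network.

```python
import random
from collections import Counter

def build_candidates(tokenized_sentences, n):
    # Values: the n most frequent words in the training set.
    counts = Counter(tok for sent in tokenized_sentences for tok in sent)
    return [word for word, _ in counts.most_common(n)]

def init_embeddings(words, dim, seed=0):
    # Keys: one embedding per candidate word, initialized randomly
    # (no pretraining), to be learned from scratch.
    rng = random.Random(seed)
    return {w: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for w in words}

train = [["the", "cat", "sat"], ["the", "dog", "sat"], ["the", "cat", "ran"]]
values = build_candidates(train, n=3)
keys = init_embeddings(values, dim=4)

# Sharing: the same table would also serve as the decoder input lookup.
decoder_input = keys[values[0]]
```

Using one table for both roles is what "shared" would mean here: the vector that scores a word at the output is the same vector fed to the decoder when that word is the previous token.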

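Model: Score function between query and key
W_a, W_q, W_e: parameter matrices
v^T: parameter vector
At test time, the candidate word w_t with the highest score is taken as the predicted word, and its embedding e_i is fed into the LSTM at the next time step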
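The scoring formulas themselves were lost with the slide image. Judging from the parameter names, two common forms fit: a bilinear score q_t^T W_a e_i and an additive score v^T tanh(W_q q_t + W_e e_i). The sketch below assumes those two forms and shows the test-time step of picking the highest-scoring candidate and returning its embedding as the next decoder input. All function names and toy values are hypothetical.

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def score_general(q_t, e_i, W_a):
    # Assumed bilinear form: q_t^T W_a e_i.
    return dot(q_t, matvec(W_a, e_i))

def score_concat(q_t, e_i, W_q, W_e, v):
    # Assumed additive form: v^T tanh(W_q q_t + W_e e_i).
    hidden = [math.tanh(a + b) for a, b in zip(matvec(W_q, q_t), matvec(W_e, e_i))]
    return dot(v, hidden)

def predict(q_t, words, embeddings, score_fn):
    # Score the query against every key and pick the best candidate word.
    scores = [score_fn(q_t, embeddings[w]) for w in words]
    best = max(range(len(words)), key=scores.__getitem__)
    word = words[best]
    # The chosen word's embedding becomes the next decoder input.
    return word, embeddings[word], scores

words = ["big", "large", "cat"]
embeddings = {"big": [1.0, 0.0], "large": [0.9, 0.1], "cat": [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
q_t = [1.0, 0.0]

word, next_input, scores = predict(
    q_t, words, embeddings, lambda q, e: score_general(q, e, identity)
)
concat_big = score_concat(q_t, embeddings["big"], identity, identity, [1.0, 1.0])
```

Because the score depends on the embedding rather than on a per-word output row, words with nearby embeddings ("big" and "large" in the toy table) receive nearby scores, which is the behavior the introduction says a plain seq2seq output layer lacks.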

Experiment 1: Text Simplification
Data sets
● Parallel Wikipedia Simplification Corpus (PWKP) (Zhu et al., 2010)
○ train: 89,042 pairs
○ dev: 205 pairs
○ test: 100 pairs
● English Wikipedia and Simple English Wikipedia (EW-SEW) (Hwang et al., 2015)
○ train: 280,000 pairs
○ dev: 2,000 pairs
○ test: 359 pairs

Evaluation Metrics
● Automatic evaluation: BLEU (Papineni et al., 2002)
○ PWKP: single reference
○ EW-SEW: multiple references
● Human evaluation (1 is very bad, 5 is very good)
○ Fluency: 1 to 5
○ Adequacy: 1 to 5
○ Simplicity: 1 to 5

Settings
● layer 2
● hidden size 256
● optimizer Adam
● batch size 64
● dropout rate 0.4
● Gradient clipping: gradients of 5 or above are clipped

Baselines
● Seq2seq
● NTS and NTS-w2v (Nisioi et al., 2017)
○ NTS is built on OpenNMT; NTS-w2v uses pretrained word embeddings
● DRESS and DRESS-LS (Zhang and Lapata, 2017)
○ DRESS is a reinforcement-learning model; DRESS-LS adds a lexical simplification model to it
● EncDecA (Zhang and Lapata, 2017)
○ An encoder-decoder model with attention

Baselines
● PBMT-R (Wubben et al., 2012)
○ A phrase-based SMT model
● Hybrid (Narayan and Gardent, 2014)
○ A hybrid of deep semantics and monolingual MT
● SBMT-SARI (Xu et al., 2016)
○ A syntax-based model

Results: Automatic evaluation (BLEU)

Results: Human evaluation
Scores higher than those of the reference are obtained
On PWKP, WEAN is the best on every criterion
On EW-SEW, WEAN is the best on average

Experiment 2: Large Scale Text Summarization
Dataset
Large Scale Chinese Social Media Short Text Summarization Dataset (LCSTS)
2,400,000 sentence pairs
● Part 1: 2,400,591 pairs (train)
● Part 2: 8,685 pairs (validation)
● Part 3: 725 pairs (test)
Parts 2 and 3 carry human-annotated scores from 1 to 5, and only the pairs scoring 3 or higher are used

Evaluation Metrics
ROUGE-1, ROUGE-2, ROUGE-L
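For readers unfamiliar with the metric, here is a simplified sketch of a ROUGE-N F1 score as clipped n-gram overlap between a candidate and a single reference. It is not the official ROUGE toolkit (no stemming, no multi-reference handling, no ROUGE-L), and the function names and example sentences are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    # Multiset of all contiguous n-grams in the token list.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n_f1(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    # Clipped overlap: each n-gram counts at most as often as it appears in both.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

reference = ["the", "cat", "sat", "down"]
candidate = ["the", "cat", "sat"]
r1 = rouge_n_f1(candidate, reference, 1)  # unigram overlap
r2 = rouge_n_f1(candidate, reference, 2)  # bigram overlap
```

With n = 1 this corresponds to ROUGE-1 and with n = 2 to ROUGE-2; the reported ROUGE-2 gain in the overview refers to the F1 variant.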
Settings
● vocab size 4000
● embedding size 512
● hidden size 512
● layers of encoder 2
● layers of decoder 1
● batch size 64
● beam size 5

Baselines
● RNN and RNN-cont (Hu et al., 2015)
○ GRU-based seq2seq models
● RNN-dist (Chen et al., 2016)
○ An attention-based seq2seq model with an added distraction mechanism
● CopyNet (Gu et al., 2016)
○ A model with a copy mechanism, useful when the output copies parts of the input text
● SRB (Ma et al., 2017)
○ A seq2seq model that improves the semantic relevance between input and output
● DRGD (Li et al., 2017) (state of the art)
○ A model combined with a variational autoencoder
● Seq2seq

Results
The proposed WEAN outperforms all of the baselines
It also outperforms DRGD, the previous state of the art

Analysis: Number of parameters (output layer)
● seq2seq
○ PWKP, EW-SEW: 50,000 (vocab) × 256 (hidden size) = 12,800,000
○ LCSTS: 4,000 (vocab) × 512 (hidden size) = 2,048,000
● WEAN: at most two matrices and one vector, regardless of the vocab size
○ PWKP, EW-SEW: 256 × 256 × 2 + 256 = 131,328
○ LCSTS: 512 × 512 × 2 + 512 = 524,800
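The arithmetic above can be checked with a short sketch. The two functions are hypothetical helpers that encode the counting rule stated on the slide: vocab × hidden size for seq2seq, and two hidden × hidden matrices plus one hidden-size vector for WEAN. Note that the stated PWKP/EW-SEW total of 12,800,000 at hidden size 256 corresponds to a vocabulary of 50,000.

```python
def seq2seq_output_params(vocab_size, hidden_size):
    # One output row of size hidden_size per vocabulary word.
    return vocab_size * hidden_size

def wean_output_params(hidden_size):
    # Two hidden x hidden matrices plus one hidden-size vector,
    # independent of the vocabulary size.
    return 2 * hidden_size * hidden_size + hidden_size

lcsts_seq2seq = seq2seq_output_params(4000, 512)
lcsts_wean = wean_output_params(512)
simplification_wean = wean_output_params(256)

# Vocabulary size implied by the stated 12,800,000 total at hidden size 256.
implied_vocab = 12_800_000 // 256

# Ratio of seq2seq to WEAN output-layer parameters on LCSTS.
lcsts_ratio = lcsts_seq2seq / lcsts_wean
```

The ratio on LCSTS is a little under 4, while on the simplification datasets the gap is much larger because the vocabulary is larger and the hidden size smaller.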

Analysis: Training speed
WEAN reaches its best score within 2 to 3 epochs
Training is much faster than with seq2seq

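Analysis: Case Study 1 (Text simplification)
The outputs of NTS, NTS-w2v, and PBMT-R miss important words
The SBMT-SARI output is fluent, but its meaning differs from the source
The WEAN output is fluent and simple, and it is identical to the reference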

Analysis: Case Study 2 (Text simplification)

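Summary
● Proposes WEAN, a model that generates words by querying word embeddings
● Using word embeddings for word prediction lets the model capture word meaning
● Experiments on text simplification and text summarization; WEAN outperformed the SOTA on three datasets
● Compared with seq2seq, the number of parameters is smaller and training is faster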
