
EMNLP Reading Group @ 2017-02-15


Incorporating Discrete Translation Lexicons into Neural Machine Translation


  1. EMNLP 2016 reading: Incorporating Discrete Translation Lexicons into Neural Machine Translation • authors: Philip Arthur, Graham Neubig, Satoshi Nakamura • presenter: Sekizawa Yuuki (Komachi Lab, M1) • 17/02/15
  2. Incorporating Discrete Translation Lexicons into Neural Machine Translation • NMT often mistranslates low-frequency content words, which can lose the sentence's meaning • proposed method: encode low-frequency words with lexicon probabilities • two methods: 1) use them as a bias, 2) linear interpolation • results (En-Ja translation on two corpora, KFTT and BTEC): improvements of 2.0-2.3 BLEU and 0.13-0.44 NIST • faster convergence
  3. NMT features • NMT systems treat each word in the vocabulary as a vector of continuous-valued numbers • this shares statistical power between similar words ("dog" and "cat") or similar contexts ("this is" and "that is") • drawback: NMT often mistranslates into words that seem natural in the context but do not reflect the content of the source sentence • PBMT/SMT systems rarely make this kind of mistake • they base their translations on discrete phrase mappings • this ensures that source words are translated into a target word that has been observed as a translation at least once in the training data
  4. NMT • source words • target words • translation probability • labels from the slide's equations (shown as images): weight matrix, bias vector, fixed-width vector (a reconstruction sketch follows below)
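The slide's equations were images and did not survive extraction; below is a minimal LaTeX reconstruction of the standard attentional-NMT output layer that the labels describe. The symbols (F, e_j, \eta_j, W_s, b_s) are assumed notation, not copied from the slide.

```latex
% Source sentence F = f_1 ... f_{|F|}, target word e_j at output position j.
% \eta_j : fixed-width hidden vector of the decoder at position j
% W_s    : weight matrix,  b_s : bias vector
p(e_j \mid e_{<j}, F) = \mathrm{softmax}\left( W_s \, \eta_j + b_s \right)
```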
  5. Integrating Lexicons into NMT • lexicon probability • figure on the slide: a lexical matrix built for the input sentence (target vocabulary × input sentence words), combined with the alignment probabilities (a small computation sketch follows below)
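A minimal NumPy sketch of how a per-sentence lexical matrix and the attention (alignment) probabilities could combine into a lexicon distribution over the target vocabulary, as the figure on this slide depicts. The names and shapes (`lex_matrix`, `attn`) are illustrative assumptions.

```python
import numpy as np

def lexicon_prob(lex_matrix: np.ndarray, attn: np.ndarray) -> np.ndarray:
    """p_lex(e | F, a_j) = sum_i a_{j,i} * p_lex(e | f_i).

    lex_matrix : (V, I) matrix; column i holds p_lex(. | f_i) for the i-th
                 input word, looked up from the lexicon for this sentence.
    attn       : (I,) attention/alignment weights a_j for output position j.
    Returns a length-V distribution over the target vocabulary.
    """
    return lex_matrix @ attn
```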
  6. Combining the lexicon probability (equations shown as images) • 1) model bias • 2) linear interpolation, whose coefficient is a learnable parameter initialized at 0.5 • a small constant (0.001) prevents zero probabilities • see the sketch after this slide
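A sketch of the two combination methods named on this slide, under the assumptions that the bias method adds the log lexicon probability to the pre-softmax scores and that linear interpolation mixes the two distributions with a single learnable coefficient initialized at 0.5; `EPSILON = 0.001` is the slide's constant that prevents zero probabilities.

```python
import numpy as np

EPSILON = 0.001  # small constant from the slide; prevents log(0) / zero probability

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

def bias_method(scores, p_lex):
    """Model bias: add the log lexicon probability before the softmax."""
    return softmax(scores + np.log(p_lex + EPSILON))

def interp_method(p_nmt, p_lex, gamma=0.5):
    """Linear interpolation; gamma is a learnable parameter, initialized at 0.5."""
    return gamma * p_lex + (1.0 - gamma) * p_nmt
```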
  7. Constructing the lexicon probability • 1) automatic learning: EM algorithm; the E-step computes expected counts over all possible translations, the M-step re-estimates the lexicon probability by normalizing over the translation set of each source word f (equations shown as images; a sketch follows below) • 2) manual: use dictionary entries as translations • 3) hybrid: combine the two
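For the automatically learned lexicon, a compact IBM-Model-1-style EM sketch: the E-step distributes expected counts over the source words of each sentence pair, and the M-step renormalizes them per source word f. This illustrates the general technique, not the authors' exact training setup.

```python
from collections import defaultdict

def em_lexicon(bitext, iterations=5):
    """bitext: list of (source_tokens, target_tokens) pairs; returns p_lex(e | f)."""
    prob = defaultdict(lambda: 1.0)          # uniform (unnormalized) initialization
    for _ in range(iterations):
        count = defaultdict(float)           # E-step: expected counts c(e, f)
        total = defaultdict(float)           # marginal counts per source word f
        for src, tgt in bitext:
            for e in tgt:
                norm = sum(prob[(e, f)] for f in src)
                for f in src:
                    c = prob[(e, f)] / norm
                    count[(e, f)] += c
                    total[f] += c
        for (e, f), c in count.items():      # M-step: p_lex(e | f) = c(e, f) / c(f)
            prob[(e, f)] = c / total[f]
    return prob
```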
  8. Experiment • dataset: BTEC and KFTT, English to Japanese • preprocessing: tokenize, lowercase, sentence length <= 50 • low-frequency words are replaced with <unk> and handled at test time following Luong et al. (2015) (threshold: BTEC less than 1, KFTT less than 3); a small preprocessing sketch follows below • evaluation: BLEU, NIST, and recall of rare words from the references (words appearing fewer than 8 times in the target training corpus or references)
     Dataset statistics:
       Data   Corpus   Sentences   Tokens (En)   Tokens (Ja)
       Train  BTEC     464K        3.60M         4.97M
              KFTT     377K        7.77M         8.04M
       Dev    BTEC     510         3.8K          5.3K
              KFTT     1,160       24.3K         26.8K
       Test   BTEC     508         3.8K          5.5K
              KFTT     1,169       26.0K         28.4K
     Vocabulary size:
       Corpus   Source   Target
       BTEC     17.8k    21.8k
       KFTT     48.2k    49.1k
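A small sketch of the rare-word preprocessing described above: words below a per-corpus frequency threshold are mapped to <unk> during training, and unknowns are later restored at test time following Luong et al. (2015). The function name and threshold argument are illustrative assumptions.

```python
from collections import Counter

def replace_rare_words(sentences, min_count):
    """sentences: list of token lists; words with training frequency below
    min_count (the per-corpus threshold from the slide) become '<unk>'."""
    freq = Counter(tok for sent in sentences for tok in sent)
    return [[tok if freq[tok] >= min_count else "<unk>" for tok in sent]
            for sent in sentences]
```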
  9. Experiment: methods and lexicons • methods • pbmt: Koehn+ (2003), using Moses • hiero (hierarchical PBMT): Chiang+ (2007), using Travatar • attn: Bahdanau+ (2015), attentional NMT • auto-bias: proposed, with the automatically learned lexicon • hyb-bias: proposed, with the hybrid lexicon • lexicons • auto: learned (separately) from the training data with GIZA++ • manual: English-Japanese dictionary (Eijiro, 104k entries) • hyb: combination of the "auto" and "manual" lexicons
  10. Comparison with related work (results table shown on the slide; †: p < 0.05, *: p < 0.10) • highlighted gains: +2.3 BLEU, +0.44 NIST, +30% rare-word recall
  11. Comparison with related work (†: p < 0.05, *: p < 0.10) • on KFTT, compared with SMT, BLEU is higher but NIST is lower • traditional SMT systems keep a small advantage in translating low-frequency words
  12. Translation examples (shown as images on the slide)
  13. Training curves (KFTT; blue: attn, orange: auto-bias, green: hyb-bias) • from the first iteration, the proposed methods' BLEU is higher than attn's • iteration time: 167 minutes (attn) vs. 275 minutes (auto-bias), the overhead coming from computing and applying the lexical probability matrix
  14. Attention matrices • the proposed (bias) model's attention is more accurate • lighter color: stronger word attention • red box: correct alignment
  15. Results of the proposed methods (table shown on the slide; the first column is NMT without a lexicon) • bias: the manual lexicon is less effective, due to its limited coverage of target-domain words • linear: shows the opposite trend to bias, and is worse than bias overall because the interpolation coefficient is constant
  16. Summary: Incorporating Discrete Translation Lexicons into Neural Machine Translation • NMT often mistranslates low-frequency content words • proposed method: encode low-frequency words with lexicon probabilities • two methods: 1) use them as a bias, 2) linear interpolation • improvements of 2.0-2.3 BLEU and 0.13-0.44 NIST • faster convergence
