
ACL reading @ 2016-10-26

Neural Machine Translation of Rare Words with Subword Units


  1. ACL 2016 reading
     Neural Machine Translation of Rare Words with Subword Units
     Authors: Rico Sennrich, Barry Haddow, Alexandra Birch
     Presented by: Sekizawa Yuuki (Komachi lab, M1), 2016/10/26
  2. Neural Machine Translation of Rare Words with Subword Units
     • NMT: fixed vocabulary; translation: an open-vocabulary problem
       → NMT has to handle out-of-vocabulary (OOV) words such as rare and unknown words
     • proposed method
       • encode OOV words as sequences of subword units
     • results (BLEU, WMT 2015, compared with the baseline)
       • En-De: +1.1, En-Ru: +1.3
     • main contributions
       • open-vocabulary NMT by encoding words via subword units
       • adapting byte pair encoding to word segmentation
  3. Word categories whose translation is transparent
     • named entities
       • copy src → trg
       • need transcription (if alphabets or syllabaries differ)
     • cognates, loanwords
       • differ at the character level
     • morphologically complex words
       • multiple morphemes, translated separately
  4. Related work
     • Durrani et al. 2014
       • copy unknown words (if the alphabet is shared)
       • transliteration is required (if alphabets differ)
     • Mikolov et al. 2012
       • investigate subword language models
       • propose to use syllables (for speech recognition)
  5. Byte pair encoding (BPE) (Gage, 1994)
     • BPE: a simple data compression technique
       • iteratively replaces the most frequent pair of bytes in a sequence with a single, unused byte
     • this paper
       • merges characters or character sequences instead of bytes
       • most frequent pair ('A', 'B') → 'AB'
       • merges do not cross word boundaries (for efficiency)
       • the attention model operates on variable-length units
     (a sketch of the merge-learning loop follows below)
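The loop operates on a vocabulary of words written as space-separated symbol sequences ending in '</w>'. A minimal Python sketch in that spirit: get_stats and merge_vocab follow the paper's published reference code closely, while learn_bpe is a wrapper added here for illustration.

    import re, collections

    def get_stats(vocab):
        """Count frequencies of adjacent symbol pairs across the vocabulary."""
        pairs = collections.defaultdict(int)
        for word, freq in vocab.items():
            symbols = word.split()
            for i in range(len(symbols) - 1):
                pairs[symbols[i], symbols[i + 1]] += freq
        return pairs

    def merge_vocab(pair, v_in):
        """Replace each occurrence of the symbol pair with the merged symbol."""
        bigram = re.escape(' '.join(pair))
        p = re.compile(r'(?<!\S)' + bigram + r'(?!\S)')
        return {p.sub(''.join(pair), word): freq for word, freq in v_in.items()}

    def learn_bpe(vocab, num_merges):
        """Run num_merges merge steps and return the merges in learned order."""
        merges = []
        for _ in range(num_merges):
            pairs = get_stats(vocab)
            if not pairs:
                break
            best = max(pairs, key=pairs.get)
            vocab = merge_vocab(best, vocab)
            merges.append(best)
        return merges

Because words never contain internal spaces and '</w>' marks the word end, a merge can never span a word boundary, which is the efficiency point made above.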
  6. BPE example
     • learning
       • word: freq = {low: 5, lowest: 2, newer: 6, wider: 3}
       • merge & count
         1. 'r' '</w>' : 9 → merge to 'r</w>'
         2. 'e' 'r</w>' : 9 → merge to 'er</w>'
         3. 'l' 'o' : 7 → merge to 'lo'
         4. 'lo' 'w' : 7 → merge to 'low'
     • the OOV word 'lower' is segmented as 'low er</w>'
     (reproduced in code below)
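The slide's example can be reproduced with the sketch above. The segment helper below is an illustrative assumption, not code from the paper: it splits a new word into characters plus '</w>' and applies the learned merges in order. Note that max() may break the two count-9 ties in the opposite order from the slide, but the final segmentation comes out the same.

    def segment(word, merges):
        """Split a word into characters plus '</w>', then apply merges in learned order."""
        symbols = list(word) + ['</w>']
        for a, b in merges:
            i = 0
            while i < len(symbols) - 1:
                if symbols[i] == a and symbols[i + 1] == b:
                    symbols[i:i + 2] = [a + b]   # merge the adjacent pair in place
                else:
                    i += 1
        return symbols

    # Vocabulary from the slide, written as space-separated symbols.
    vocab = {'l o w </w>': 5, 'l o w e s t </w>': 2,
             'n e w e r </w>': 6, 'w i d e r </w>': 3}
    merges = learn_bpe(vocab, 4)
    print(segment('lower', merges))   # ['low', 'er</w>'], i.e. 'low er</w>'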
  7. Evaluation
     • data: shared translation task of WMT 2015
       • En-De train: 4.2M sentence pairs, 100M tokens
       • En-Ru train: 2.6M sentence pairs, 50M tokens
       • dev: newstest2013, test: newstest2015
     • metrics: BLEU and chrF3 (character n-gram F3)
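chrF3 combines character n-gram precision and recall into an F-score with recall weighted beta = 3 times as heavily. A minimal sketch of the combining step only; the n-gram counting is omitted and the function name is an assumption:

    def chrf(char_ngram_precision, char_ngram_recall, beta=3.0):
        """F-beta over averaged character n-gram precision/recall; beta=3 gives chrF3."""
        if char_ngram_precision == 0.0 and char_ngram_recall == 0.0:
            return 0.0
        return ((1 + beta ** 2) * char_ngram_precision * char_ngram_recall
                / (beta ** 2 * char_ngram_precision + char_ngram_recall))

    print(round(chrf(0.5, 0.6), 4))   # 0.5882, pulled toward the recall value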
  8. Segmentation statistics (train); number of unknown tokens in newstest2013
     [table: segmentation techniques from SMT, BPE with 59,500 merge operations, BPE with 89,500 merge operations, and unsegmented words]
  9. Results (En-De): compared systems
     • WUnk: word-level model; OOV words are output as UNK
     • WDict: WUnk with a back-off dictionary for rare words (baseline)
     • C2-50k: character bigrams with 50,000 unsegmented words
     • BPE-J90k: BPE symbols learned jointly on the union of the source and target vocabularies
     • BPE-60k: BPE symbols learned separately per language
     (a sketch of the two BPE regimes follows below)
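A sketch of the two BPE training regimes, reusing learn_bpe and the imports from the earlier sketch. The toy vocabularies are stand-in assumptions, and the merge counts mirror the system names; learn_bpe stops early once no pairs remain, so the large counts are safe on toy data.

    # Toy stand-ins for the real training vocabularies.
    src_vocab = {'l o w </w>': 5, 'l o w e r </w>': 2, 'n e w </w>': 6}
    trg_vocab = {'n i e d r i g </w>': 5, 'n e u </w>': 6}

    # BPE-60k: learn merge operations for each language separately.
    src_merges = learn_bpe(src_vocab, 60000)
    trg_merges = learn_bpe(trg_vocab, 60000)

    # BPE-J90k: learn one merge table on the union of both vocabularies,
    # so strings shared across the two languages segment consistently.
    joint_vocab = collections.Counter()
    joint_vocab.update(src_vocab)
    joint_vocab.update(trg_vocab)
    joint_merges = learn_bpe(dict(joint_vocab), 90000)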
  10. Results (En-De)
      • words: 44,085
      • not in the top 50,000 words: 2,900
      • OOV: 1,168
  11. Results (En-De)
      • words: 55,654
      • not in the top 50,000 words: 5,442
      • OOV: 851
  12. Translation examples (En-De, En-Ru)
  13. Neural Machine Translation of Rare Words with Subword Units
      • main contributions
        • enables open-vocabulary NMT
        • represents OOV words as sequences of subword units, using byte pair encoding
        • simpler and more effective than the back-off model
      • future work
        • learn the optimal vocabulary size for a translation task
          • e.g., per language pair, amount of training data, …
