6. Khalil Sima'an (UVA) Statistical Machine Translation


  1. 1. Tutorial to Statistical Machine Translation. Part I: Khalil Sima’an (Universiteit van Amsterdam), Data and Models. Part II: Trevor Cohn (University of Sheffield), Decoding and efficiency. Outline: Word-Based Models; Alignment Symmetrization; Phrase-Based Models; Limitations of PB Models; Syntax.
  2. 2. Statistical Machine Translation: PART I. Dr. Khalil Sima’an, Statistical Language Processing and Learning, Institute for Logic, Language and Computation, Universiteit van Amsterdam. Some slides use figures from Philipp Koehn, Barry Haddow and Sophie Arnoult.
  3. 3. Data and Models: Structure of lecture. General statistical framework. Word-based models: word alignments. Phrase-based models: phrase alignments. Tree-based models: tree alignments.
  4. 4. Statistical Approach: Parallel Corpora. Task: translate a source sentence f into a target sentence e. Data: parallel corpus (source-target sentence pairs). [Figure: the United Nations home page in its Arabic and English versions, shown as an example of parallel text.] Source-Channel Approach: IBM Models (1990s).
  5. 5. Parallel Corpus Example. Parallel corpus C = a collection of text chunks and their translations. Parallel corpora are the by-product of human translation. Every source chunk is paired with a target chunk. Example (Dutch / English): "De prijs van het huis is gestegen." / "The price of the house has risen."; "Het huis kan worden verkocht." / "The house can be sold."; "Als de marktprijs daalt zullen sommige gezinnen een zware tijd doormaken." / "If the market price goes down, some families will go through difficult times."; ... Sources: Hansards, Canadian Parliament Proceedings (English-French); European Parliament Proceedings (23 languages); United Nations documents; newspapers (Chinese-English, Arabic-English, Urdu-English); TAUS corpora.
  6. 6. Generative Source-Channel Framework. Given source sentence f, select target sentence e: $\arg\max_{e \in E(f)} P(e \mid f) = \arg\max_{e \in E(f)} P(e) \times P(f \mid e)$, where $P(e)$ is the language model (L.M.) and $P(f \mid e)$ is the translation model (T.M.). The set $E(f)$ is the set of hypothesized translations of f. $P(f \mid e)$ accounts for divergence in word order, morphology, syntactic relations, idiomatic ways of expression, ... How to estimate $P(e \mid f)$? Sparse-data problem!
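The source-channel decision rule on this slide can be sketched in a few lines. This is only an illustration: the candidate set, the probability values and the function name `noisy_channel_best` are made up for the example and do not come from the slides.

```python
import math

def noisy_channel_best(candidates, lm_prob, tm_prob, f):
    """Return the candidate e that maximizes P(e) * P(f | e), scored in log space."""
    def score(e):
        return math.log(lm_prob[e]) + math.log(tm_prob[(f, e)])
    return max(candidates, key=score)

# Toy, hypothetical numbers: both candidates explain the source words equally
# well, so the language model decides in favour of the fluent word order.
f = "het huis"
candidates = ["the house", "house the"]
lm_prob = {"the house": 0.02, "house the": 0.0001}                 # P(e)
tm_prob = {(f, "the house"): 0.3, (f, "house the"): 0.3}           # P(f | e)
best = noisy_channel_best(candidates, lm_prob, tm_prob, f)
print(best)
```

Working in log space avoids numerical underflow once the probabilities become products over many words.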
  7. 7. Inducing The Structure of Translation Data. e = Mary did not slap the green witch . f = Maria no dio una bofetada a la bruja verde . How are e and f related? The latent structure of translation equivalence: graphical representations $\Delta_f$ and $\Delta_e$ for f and e, and a relation a between $\Delta_f$ and $\Delta_e$: $\arg\max_{e \in E(f)} P(e \mid f) = \arg\max_{e \in E(f)} \sum_{\Delta_f, a, \Delta_e} P(e, \Delta_f, \Delta_e, a \mid f)$. The difficult question: which $\Delta_f$, $\Delta_e$ and a fit the data best?
  8. 8. Structure in current models. $\Delta_f \xrightarrow{a} \Delta_e$. In most current models the structure is a structure of reordering: $\Delta_f$ and $\Delta_e$ are structures over word positions, and a is an alignment between groups of word positions in $\Delta_f$ and $\Delta_e$. Challenge: the number of permutations of n words is n!. Structure shows translation units composing together: What are the atomic translation units? How do these compose together efficiently? How to put probabilities on these structures? Structure helps combat sparsity and complexity.
  9. 9. Structure in Existing Models: Sketch. [Figure: three sketches (word-based, phrase-based, tree-based), each linking target positions $e_1 \ldots e_l$ (language model on e) to source positions $f_1 \ldots f_m$ through the alignment $a_j$ and a "translation dictionary".] Problem: no sufficient statistics to estimate $P(e \mid f)$ from data.
  10. 10. Word-Based Models: Word Alignments
  11. 11. Some History and References. Statistical models with word alignments: Brown, Cocke, Della Pietra, Della Pietra, Jelinek, Lafferty, Mercer and Roossin. A statistical approach to machine translation. Computational Linguistics, 1990. Brown, Della Pietra, Della Pietra and Mercer. The mathematics of statistical machine translation: parameter estimation. Computational Linguistics, 1993. Och and Ney. A systematic comparison of various statistical alignment models. Computational Linguistics, 2003.
  12. 12. Word-Based Models and Word-Alignment. $\Delta_f$ and $\Delta_e$ are sequences of word positions, and a is a mapping between word positions. $e = e_1^l = e_1 \ldots e_l$ and $f = f_1^m = f_1 \ldots f_m$. A hidden word alignment a: $P(f \mid e) = \sum_a P(a, f \mid e)$. Assume that a target word position $e_i$ translates into zero or more source word positions: $a : \{pos_f\} \to (\{pos_e\} \cup \{0\})$. $a_i$ or $a(i)$ is the word position in e with which $f_i$ is aligned.
  13. 13. Word Alignment Example
  14. 14. Word Alignment Example (Le reste appartenait aux autochtones |
  15. 15. Word Alignment Example: Not covered in this setting
  16. 16. Word Alignment Matrix Example
  17. 17. Translation model with word alignment. $\arg\max_e P(e \mid f) = \arg\max_e P(e) \times P(f \mid e)$, with $P(f \mid e) = \sum_a P(a, f \mid e) = \sum_a P(a \mid e) \times P(f \mid a, e)$. Questions: How to parametrize the model? How are e, f and a composed from basic units? How to train the model? How to acquire word alignments? How to translate with this model? Decoding and computational issues (for the second part).
  18. 18. Word-Alignment As Hidden Structure. [Figure: target words $e_1 \ldots e_l$ (language model on e) linked to source words $f_1 \ldots f_m$ through the alignment $a_j$ and a "translation dictionary".] We need to decompose: the alignment a and the length m, $P(a \mid e)$; the “translation dictionary”, $P(f \mid e, a)$.
  19. 19. Word Alignment Models: General Scheme. Alignment of positions in f with positions in e: $a = a_1^m = a_1 \ldots a_m$. Markov process over a: $P(a_1^m, f_1^m \mid e_1^l) = P(m \mid e) \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times P(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$. In words, to generate alignment a and foreign sentence f: (1) choose a length m for f; (2) generate alignment $a_j$ given the preceding alignments, the preceding words in f, m, and e; (3) generate word $f_j$ conditioned on the structure so far and e. The IBM models are obtained by simplifications of this formula.
  20. 20. IBM Model I. General scheme: $P(a_1^m, f_1^m \mid e_1 \ldots e_l) = P(m \mid e) \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times P(f_j \mid a_1^{j}, f_1^{j-1}, m, e)$. IBM Model I simplifications: Length: $P(m \mid e) \approx P(m \mid l) \approx \epsilon$, a fixed probability. Alignment: align j with uniform probability to any $a_j$ in $e_1^l$ or NULL: $P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \approx (l + 1)^{-1}$. Note that $a_j$ can be linked with l positions in e or with NULL. Lexicon: lexicon parameters $\pi_t(f \mid e)$: $P(f_j \mid a_1^{j}, f_1^{j-1}, m, e) \approx P(f_j \mid e_{a_j}) = \pi_t(f_j \mid e_{a_j})$. Parameters: $\epsilon$ and $\{\pi_t(f \mid e) \mid \langle f, e \rangle \in C\}$.
  21. 21. Sketch IBM Model I. [Figure: language model on $e_1 \ldots e_l$; choose length m given l; label every f position with an e position $a_j$; "translation dictionary" between $e_{a_j}$ and $f_j$.]
  22. 22. IBM Model I Parameters and Data Likelihood. Data likelihood: $P(f \mid e) = \sum_{a_1^m} P(a_1^m, f_1^m \mid e_1 \ldots e_l) = \frac{\epsilon}{(l + 1)^m} \times \sum_{a_1 = 0}^{l} \cdots \sum_{a_m = 0}^{l} \prod_{j=1}^{m} \pi_t(f_j \mid e_{a_j})$. Parameters: $\epsilon$ and $\{\pi_t(f \mid e) \mid \langle f, e \rangle \in C\}$. Fix $\epsilon$, i.e., in practice put a uniform probability over a range [1..M] of lengths, for some natural number M. Dilemma: to estimate these parameters we need the word alignment; to get the word alignment we need these parameters.
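The sum over all $(l+1)^m$ alignments in the Model I likelihood can be rearranged into a product of per-word sums, $\sum_{a_1^m} \prod_j \pi_t(f_j \mid e_{a_j}) = \prod_j \sum_{i=0}^{l} \pi_t(f_j \mid e_i)$, a standard identity from Brown et al. (1993). The sketch below checks this numerically on a toy sentence pair. The lexicon values in `t`, the default `epsilon` and the function names are illustrative assumptions, not taken from the slides.

```python
from itertools import product

def model1_likelihood_bruteforce(f_words, e_words, t, epsilon=1.0):
    """Sum P(a, f | e) over every alignment a explicitly: (l+1)^m terms."""
    e_null = ["NULL"] + e_words            # position 0 is the NULL word
    l, m = len(e_words), len(f_words)
    total = 0.0
    for a in product(range(l + 1), repeat=m):
        p = 1.0
        for j, i in enumerate(a):
            p *= t.get((f_words[j], e_null[i]), 0.0)
        total += p
    return epsilon / (l + 1) ** m * total

def model1_likelihood_fast(f_words, e_words, t, epsilon=1.0):
    """Same quantity as a product of per-word sums: m * (l+1) terms."""
    e_null = ["NULL"] + e_words
    l, m = len(e_words), len(f_words)
    result = epsilon / (l + 1) ** m
    for fw in f_words:
        result *= sum(t.get((fw, ew), 0.0) for ew in e_null)
    return result

# Hypothetical lexicon entries t[(f, e)] standing in for pi_t(f | e).
t = {("la", "NULL"): 0.2, ("la", "the"): 0.7, ("la", "house"): 0.1,
     ("maison", "NULL"): 0.1, ("maison", "the"): 0.1, ("maison", "house"): 0.8}
f_words, e_words = ["la", "maison"], ["the", "house"]
brute = model1_likelihood_bruteforce(f_words, e_words, t)
fast = model1_likelihood_fast(f_words, e_words, t)
print(brute, fast)
```

The factorized form is what makes EM training of Model I tractable: the expectations never require enumerating alignments.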
  23. 23. IBM Model II. Extends IBM Model I at the alignment probabilities: $P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \epsilon \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times \pi_t(f_j \mid e_{a_j})$. IBM Model II changes only one element of IBM Model I. Model I does not take into account the positions of words in the two strings; Model II sets $P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) = P(a_j \mid j, l, m) := \pi_A(a_j \mid j, l, m)$, where $\pi_A(\cdot \mid \cdot)$ are parameters to be learned from data. IBM Models III, IV and V concentrate on more complex alignments allowing, e.g., 1-to-n (fertility).
  24. 24. IBM Model II Parameters. $P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \epsilon \times \prod_{j=1}^{m} \pi_A(a_j \mid j, l, m) \times \pi_t(f_j \mid e_{a_j})$. Parameters: $\{\pi_A(a_j \mid j, l, m)\}$ and $\{\pi_t(f_j \mid e_{a_j})\}$. Dilemma: to estimate these parameters we need the word alignment; to get the word alignment we need these parameters.
  25. 25. Estimating Model Parameters. Maximum-likelihood estimation of model M on parallel corpus C: $\arg\max_{m \in M} P(C \mid m) = \arg\max_{m \in M} \prod_{\langle e, f \rangle \in C} P_m(e \mid f)$. Example, IBM Model I: model M is defined by the model parameters. Data is incomplete: no closed-form solution. Expectation-Maximization (EM) sketch: Init: set the parameters at some $m_0$ and let i = 0. Repeat until convergence (in perplexity): $EM_i(C)$ = C completed using estimate $m_i$, i.e., $EM_i(C)$ contains $m_i$-expectations over $\langle e, f, a \rangle$: $P(a \mid f, e)$; then $m_{i+1}$ = relative frequency estimates from $EM_i(C)$.
  26. 26. EM for Lexicon and Word Alignment Probs. ... la maison ... la maison bleu ... la fleur ... / ... the house ... the blue house ... the flower ...
  27. 27. EM for Lexicon and Word Alignment Probs. ... la maison ... la maison bleu ... la fleur ... / ... the house ... the blue house ... the flower ...
  28. 28. EM for Lexicon and Word Alignment Probs. ... la maison ... la maison bleu ... la fleur ... / ... the house ... the blue house ... the flower ...
  29. 29. EM for Lexicon and Word Alignment Probs. ... la maison ... la maison bleu ... la fleur ... / ... the house ... the blue house ... the flower ...
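The EM loop for the IBM Model I lexicon can be sketched on the three sentence pairs used in these slides. This is a minimal illustration under stated assumptions: uniform initialization, a fixed number of iterations instead of a perplexity-based stopping test, and a NULL word at target position 0. The function name `train_ibm1` and the data layout are hypothetical.

```python
from collections import defaultdict

def train_ibm1(corpus, iterations=10):
    """corpus: list of (f_words, e_words). Returns t[(f, e)], an estimate of pi_t(f | e)."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))      # uniform initial lexicon
    for _ in range(iterations):
        count = defaultdict(float)                   # expected count of (f, e) links
        total = defaultdict(float)                   # expected count of e being linked
        for fs, es in corpus:
            es = ["NULL"] + es
            for f in fs:
                # E-step: posterior that f is aligned to each e in the sentence.
                z = sum(t[(f, e)] for e in es)
                for e in es:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: relative frequency estimates from the expected counts.
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

corpus = [("la maison".split(), "the house".split()),
          ("la maison bleu".split(), "the blue house".split()),
          ("la fleur".split(), "the flower".split())]
t = train_ibm1(corpus)
for pair in [("la", "the"), ("maison", "house"), ("la", "house")]:
    print(pair, round(t[pair], 3))
```

Because "la" co-occurs with "the" in every pair, its probability mass moves away from "house", which in turn lets "maison" claim "house"; this is the effect the four slides above illustrate step by step.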
  30. 30. Translation Using EM Estimates. Lexicon probability estimates: $\{\hat\pi_t(f_j \mid e_{a_j})\}$. Alignment probability estimates: $\{\hat\pi_A(a_j \mid j, l, m)\}$. Translation Model + Language Model + Decoder: $\arg\max_e P(e \mid f) = \arg\max_e P(e) \times \sum_a P(a, f \mid e)$.
  31. 31. Viterbi Word-Alignment using EM estimates. After EM has stabilized on estimates $\{\hat\pi_t(f_j \mid e_{a_j})\}$ and $\{\hat\pi_A(a_j \mid j, l, m)\}$, for every $\langle f, e \rangle$ in C apply the following: $\arg\max_{a_1^m} P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \arg\max_{a_1^m} \epsilon \times \prod_{j=1}^{m} \hat\pi_A(a_j \mid j, l, m) \times \hat\pi_t(f_j \mid e_{a_j})$.
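Because the Model I/II probability is a product of independent per-position factors, the best alignment can be found by picking the best target position for each source word separately. The sketch below assumes a lexicon dictionary `t` and an optional callable `align_prob(i, j, l, m)` standing in for $\hat\pi_A(a_j = i \mid j, l, m)$; both names and the toy values are hypothetical.

```python
def viterbi_alignment(f_words, e_words, t, align_prob=None):
    """Return [a_1, ..., a_m]; a_j = 0 means f_j is aligned to NULL."""
    e_null = ["NULL"] + e_words
    l, m = len(e_words), len(f_words)
    alignment = []
    for j, fw in enumerate(f_words, start=1):
        def score(i):
            # Model II uses align_prob; without it, fall back to Model I's uniform 1/(l+1).
            a = align_prob(i, j, l, m) if align_prob else 1.0 / (l + 1)
            return a * t.get((fw, e_null[i]), 0.0)
        alignment.append(max(range(l + 1), key=score))
    return alignment

# Hypothetical lexicon estimates after EM.
t = {("la", "the"): 0.9, ("la", "house"): 0.05,
     ("maison", "the"): 0.1, ("maison", "house"): 0.8}
alignment = viterbi_alignment(["la", "maison"], ["the", "house"], t)
print(alignment)
```

The constant $\epsilon$ is omitted because it does not affect the argmax.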
  32. 32. HMM Alignment Model: General Form. $P(a_1^m, f_1^m \mid e_1 \ldots e_l) \approx \epsilon \times \prod_{j=1}^{m} P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \times \pi_t(f_j \mid e_{a_j})$. Words do not move independently of each other: condition word movement on previous word movement, $P(a_j \mid a_1^{j-1}, f_1^{j-1}, m, e) \approx P(a_j \mid a_{j-1}, m)$.
  33. 33. IBM Model III (and IV): Example. A hidden word alignment a: $P(f \mid e) = \sum_a P(a, f \mid e)$. Estimate alignment + lexicon + reordering + fertility parameters.
  34. 34. Word-based Models (Och and Ney 2003)
  35. 35. Word-Alignment As Hidden Structure: Sufficient? [Figure: target words $e_1 \ldots e_l$ (language model on e) linked to source words $f_1 \ldots f_m$ through the alignment $a_j$ and a translation dictionary.] We assumed an alignment between words and a dictionary: alignment a and the length m, $P(a \mid e)$; dictionary, $P(f \mid e, a)$.
  36. 36. Limitations of Word-based Models. Limitations of word-based translation: Many-to-one and many-to-many alignments are common: “makes more difficult” / “bemoeilijkt”; “Dat richtte (hen) ten gronde” / “That destroyed (them)”. Reordering (often) takes place by whole blocks, and reordering individual words increases ambiguity: “the (big heavy) cow” / “la vaca (pesada grande)”. Translation works by “fixed expressions” (idiomatic), and concatenating word translations increases ambiguity. Estimates of $P(f \mid e)$ by word-based models are inaccurate. Instead of words as basic events: multi-word events in the corpus.
  37. 37. Obtaining Symmetrized Word Alignments
  38. 38. Asymmetric Alignments: Limitations. The word-based models presented so far are based on asymmetric word alignment: each position i in f is aligned with at most one position in e, namely $a_i$. What about word alignments such as the one in the figure? Or cases where a word in f is translated into two or more words in e?
  39. 39. Symmetrization Heuristics. Obtain $A_{f \to e}$ and $A_{e \to f}$. From the intersection $A_{f \to e} \cap A_{e \to f}$ to the union $A_{f \to e} \cup A_{e \to f}$.
  40. 40. Symmetrization Heuristics Algorithm. Obtain $A_{f \to e}$ and $A_{e \to f}$. From the intersection $A_{f \to e} \cap A_{e \to f}$ to the union $A_{f \to e} \cup A_{e \to f}$.
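The move "from intersection to union" can be sketched as a growing procedure: start from the links both directions agree on, then add union links that neighbour an accepted link and cover a word that is still unaligned. The transcript does not preserve the algorithm shown on the slide, so the code below is a simplified heuristic in the spirit of the commonly used grow-diag family, not a reproduction of it; the function name and the toy alignments are hypothetical.

```python
def symmetrize(a_f2e, a_e2f):
    """Both inputs are sets of (j, i) links: source position j, target position i."""
    inter = a_f2e & a_e2f
    union = a_f2e | a_e2f
    neighbours = [(-1, 0), (1, 0), (0, -1), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]
    grown = set(inter)
    added = True
    while added:                                   # repeat until no link can be added
        added = False
        for (j, i) in sorted(grown):
            for dj, di in neighbours:
                cand = (j + dj, i + di)
                if cand in union and cand not in grown:
                    j_free = all(cj != cand[0] for cj, _ in grown)
                    i_free = all(ci != cand[1] for _, ci in grown)
                    if j_free or i_free:           # covers a still-unaligned word
                        grown.add(cand)
                        added = True
    return inter, grown, union

a_f2e = {(1, 1), (2, 2), (3, 2), (5, 1)}           # toy directional alignments
a_e2f = {(1, 1), (2, 2), (2, 3)}
inter, grown, union = symmetrize(a_f2e, a_e2f)
print(sorted(inter), sorted(grown), sorted(union))
```

The result always lies between the two extremes: the intersection is precise but sparse, the union is dense but noisy, and the grown set keeps isolated union links such as `(5, 1)` out.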
  41. 41. Phrase-based Models: Alignment between Phrases
  42. 42. Statistical “Memory-based” Translation. Store arbitrary-length source-target translation units from the training parallel corpus. Translate new input by “covering” it with translation units replayed from memory. Idiomatic = Tiling: Phrase-Based SMT. Assume word alignment a is given in the parallel corpus. Phrase pair = contiguous source-target $\langle n, m \rangle$-grams that are translational equivalents under a. Estimate phrase-pair probabilities $\Theta(\bar f_i \mid \bar e_i)$. Translate f by “tiling it with phrases with order permutation”.
  43. 43. PBSMT: some references. Alignment-template Approach to Stat. Machine Translation (RWTH Aachen 1999). Phrase-based statistical machine translation (Zens, Och and Ney 2002). Phrase based SMT (Koehn, Och and Marcu 2003). Joint Phrase-based SMT (Wang and Marcu 2005). Statistical Machine Translation (Ph. Koehn, Cambridge University Press 2010). Relation to EBMT (1984); Data-Oriented Translation (2000).
  44. 44. Phrase-Based Models: Conceptual. [Figure: the sentence "The president meets Saudi economic officials" segmented into phrases.] Segment foreign sentence f into I phrases $\bar f_1^I$. $\arg\max_e P(e \mid f) = \arg\max_e P(e) \times P(f \mid e)$, with $P(f \mid e) = \sum_{\bar f_1^I, \bar e_1^I} P(\bar f_1^I, \bar e_1^I \mid e) \approx \sum_{\bar f_1^I, \bar e_1^I} \prod_{i=1}^{I} P(\bar f_i \mid \bar e_i) \times d(start_i - end_{i-1} - 1)$, where the first factor is the phrase table and the second is distance-based reordering. $\arg\max_e P(f \mid e) \approx \arg\max_{\bar f_1^I, \bar e_1^I} \prod_{i=1}^{I} \Theta(\bar f_i \mid \bar e_i) \times d(start_i - end_{i-1} - 1)$. $start_i$ / $end_i$ are the positions of the first/last words of $\bar f_i$ (translating to $\bar e_i$). $d(x) = \alpha^{|x|}$ decays exponentially in the number of words skipped ($\alpha \in (0, 1]$).
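The distance-based reordering factor can be illustrated with a short sketch that multiplies $d(start_i - end_{i-1} - 1)$ over the phrases in target order. It assumes $d(x) = \alpha^{|x|}$ with 1-based inclusive source spans and $end_0 = 0$; the function names, the span encoding and the value of `alpha` are choices made for the example.

```python
def distortion(x, alpha=0.5):
    """d(x) = alpha ** |x|: penalty grows with the number of source words skipped."""
    return alpha ** abs(x)

def reordering_score(spans, alpha=0.5):
    """spans: source (start, end) positions of the phrases, listed in target order."""
    score = 1.0
    prev_end = 0                       # end_0 = 0, so a phrase starting at 1 costs d(0)
    for start, end in spans:
        score *= distortion(start - prev_end - 1, alpha)
        prev_end = end
    return score

monotone = reordering_score([(1, 2), (3, 3), (4, 6)])    # phrases kept in source order
swapped = reordering_score([(1, 2), (4, 6), (3, 3)])     # last two phrases exchanged
print(monotone, swapped)
```

A monotone translation pays no penalty, while the swap pays for one forward skip of 1 word and one backward jump of 4, giving $\alpha^5$.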
  45. 45. Phrase-Based Models: Log-linear interpolation. [Figure: the sentence "The president meets Saudi economic officials" segmented into phrases.] Segment foreign sentence f into I phrases $\bar f_1^I$. Log-linear interpolation of factors: $score(e \mid f) = \sum_{f \in F} \lambda_f \times \log H_f(e, f)$, where the feature set F consists of: translation: bag of phrases, $\prod_{i=1}^{I} \Theta(\bar f_i \mid \bar e_i)$; d(): phrases reordered with reordering model $d(\cdot)$; lm: language model (5-grams or even 7-grams); other: smoothing + length penalty terms.
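The log-linear score is a weighted sum of log feature values, which a few lines make concrete. The feature names, weights and values below are invented for illustration; in practice the weights $\lambda$ would be tuned, for instance by minimum error-rate training.

```python
import math

def loglinear_score(features, weights):
    """score = sum over features of lambda_name * log H_name(e, f)."""
    return sum(weights[name] * math.log(value) for name, value in features.items())

weights = {"translation": 1.0, "distortion": 1.0, "lm": 1.0}       # hypothetical lambdas
hyp_a = {"translation": 0.2, "distortion": 1.0, "lm": 0.01}        # monotone, fluent
hyp_b = {"translation": 0.3, "distortion": 0.25, "lm": 0.001}      # reordered, less fluent
score_a = loglinear_score(hyp_a, weights)
score_b = loglinear_score(hyp_b, weights)
print(score_a, score_b)
```

With all weights equal to 1 the score reduces to the log of the plain product of factors; unequal weights let a feature such as the language model count for more or less than the others.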
  46. 46. Topics to discuss. Phrase table extraction. Estimating $\{\Theta(\bar f_i \mid \bar e_i)\}$ and $\{\Theta(\bar e_i \mid \bar f_i)\}$. Lexicalized and hierarchical phrase reordering models. Other: phrase penalty, length penalty, ... Log-linear interpolation and minimum error-rate training. Decoding and optimization.
  47. 47. Extracting phrase pairs. A phrase pair $\langle \bar f, \bar e \rangle$ is consistent with alignment a iff: (1) non-empty: at least one alignment pair from a is in $\langle \bar f, \bar e \rangle$; (2) no foreign positions inside $\langle \bar f, \bar e \rangle$ are aligned to positions outside it; (3) no English positions inside $\langle \bar f, \bar e \rangle$ are aligned to positions outside it.
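The three consistency conditions translate directly into a check over the alignment links, and enumerating all span pairs that pass the check gives a brute-force phrase extractor. The sketch below uses 1-based inclusive spans and a hand-written alignment for "la maison bleu" / "the blue house"; the alignment, the `max_len` limit and the function names are assumptions made for the example.

```python
def is_consistent(alignment, f_span, e_span):
    """alignment: set of (j, i) links; spans are inclusive (start, end) ranges."""
    f_lo, f_hi = f_span
    e_lo, e_hi = e_span
    inside = False
    for j, i in alignment:
        f_in = f_lo <= j <= f_hi
        e_in = e_lo <= i <= e_hi
        if f_in != e_in:               # a link crosses the phrase-pair boundary
            return False
        if f_in and e_in:
            inside = True              # at least one link lies inside the pair
    return inside

def extract_phrase_pairs(alignment, m, l, max_len=4):
    """Enumerate every consistent (f_span, e_span) up to max_len words per side."""
    pairs = []
    for f_lo in range(1, m + 1):
        for f_hi in range(f_lo, min(m, f_lo + max_len - 1) + 1):
            for e_lo in range(1, l + 1):
                for e_hi in range(e_lo, min(l, e_lo + max_len - 1) + 1):
                    if is_consistent(alignment, (f_lo, f_hi), (e_lo, e_hi)):
                        pairs.append(((f_lo, f_hi), (e_lo, e_hi)))
    return pairs

# la(1)-the(1), maison(2)-house(3), bleu(3)-blue(2): a hypothetical alignment.
alignment = {(1, 1), (2, 3), (3, 2)}
pairs = extract_phrase_pairs(alignment, m=3, l=3)
print(pairs)
```

Note that "la maison" is not extracted: its links reach "the" and "house", and any English span covering both also covers "blue", which is aligned to "bleu" outside the source span.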
  48. 48. Extracting phrase pairs
  49. 49. Extracting phrase pairs
  50. 50. Extracting phrase pairs
  51. 51. Extracting phrases from permutations. [Figure: the permutation 1 3 4 2 5 aligned with the order 1 2 3 4 5, repeated across several panels.]
  52. 52. Phrase pair weights. Extract phrase pairs from the corpus into a multiset $Tab = \{\langle \bar f, \bar e, freq(\bar f, \bar e) \rangle\}$. Weights for $\langle \bar f, \bar e \rangle$: $\Theta(\bar f \mid \bar e) = \frac{freq(\bar f, \bar e)}{\sum_{\langle \bar f', \bar e \rangle \in Tab} freq(\bar f', \bar e)}$ and $\Theta(\bar e \mid \bar f) = \frac{freq(\bar f, \bar e)}{\sum_{\langle \bar f, \bar e' \rangle \in Tab} freq(\bar f, \bar e')}$. Smoothing with lexical word alignment estimates from IBM models.
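The two phrase weights are relative frequencies computed from the same table of counts, normalized once per target phrase and once per source phrase. The counts below are invented, and the function name `phrase_weights` is hypothetical; the lexical smoothing mentioned on the slide is left out.

```python
from collections import Counter

def phrase_weights(phrase_pair_counts):
    """phrase_pair_counts: {(f_phrase, e_phrase): freq}. Returns both conditionals."""
    f_totals, e_totals = Counter(), Counter()
    for (f, e), c in phrase_pair_counts.items():
        f_totals[f] += c
        e_totals[e] += c
    theta_f_given_e = {(f, e): c / e_totals[e] for (f, e), c in phrase_pair_counts.items()}
    theta_e_given_f = {(f, e): c / f_totals[f] for (f, e), c in phrase_pair_counts.items()}
    return theta_f_given_e, theta_e_given_f

# Hypothetical extraction counts.
tab = {("het huis", "the house"): 3,
       ("het huis", "the home"): 1,
       ("de woning", "the house"): 1}
theta_f_given_e, theta_e_given_f = phrase_weights(tab)
print(theta_f_given_e[("het huis", "the house")], theta_e_given_f[("het huis", "the home")])
```

A pair seen only once, such as ("het huis", "the home"), gets $\Theta(\bar f \mid \bar e) = 1$ because "the home" never occurs with anything else, which is one reason the raw estimates are smoothed.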
  53. 53. Distance-Based Reordering Sketch
  54. 54. Lexicalized Reordering Sketch (Tillmann 2004)
  55. 55. Limited generalization over parallel data (1). Non-productive phrase table: phrase variants? Morphological, e.g., changing inflection or agreement: "Al$arekat Alhindiyya" (the-companies the-Indian, "the Indian companies") vs. "$areka hindiyya" (company Indian, "(an) Indian company"). Syntactic, e.g., adding adjectives, prepositions or adverbials: "the fish in the deep sea swims" vs. "the fish swims". Reordering: minor reordering of the same words is not allowed; in Arabic, both V-S-O and S-V-O are allowed. Semantic, e.g., synonyms, paraphrases. Non-productive phrase table = data sparseness.
  56. 56. Limited generalization over parallel data (2). Reordering: local, monotone, almost non-lexicalized reordering. What about long-range reordering? [Figure: an example in which five phrases need to be reversed; see Chiang 2007 (J. CL).] Reordering target phrases with a coarse “source road map”?
  57. 57. Limitations: Data-Sparseness. Non-productive phrase table + local, uncharted reordering ⇒ data sparseness: shorter phrases will apply, down to word level ⇒ shorter phrases are combined assuming independence ⇒ target phrase selection is hard due to the large hypothesis lattice. Target language model = only “GLUE” over target phrases. The shorter the phrases, the greater the risk.
  58. 58. Idiomatic Approach: GOOD, BAD and UGLY. Phrases as atomic units. Good: less ambiguity in lexical choice and reordering; exact match-and-retrieve is largely safe. Bad: weak generalization over data; no phrase variants, weak reordering. Ugly: fall-back on shorter phrases, down to word-to-word; LM as “glue” is insufficient. The idiomatic approach does not alleviate data sparseness. How should we translate novel phrases?
  59. 59. Towards the land of bi-trees: Alignments between Tree Pairs, ITG, Hierarchical Models and Syntax
  60. 60. Hidden Structure of Translation: Tree Pairs. [Figure: sketch linking target words $e_1 \ldots e_l$ (language model on e) to source words $f_1 \ldots f_m$ through $a_j$.]
  61. 61. Reordering n words. Permutations of n words: n!. Surely not all permutations are needed! (Wu 1995) Use trees and allow permutations on the nodes? There is an exponential number of trees in n. ITG hypothesis (Wu 1995): assume binary trees with two operations at the nodes, monotone [ ] and inverted ⟨ ⟩. [Figure: a binary tree deriving the permutation 1 3 4 2 5 from 1 2 3 4 5.] Phrase-based forms of ITG (Chiang 2005; 2007): Hiero.
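How much of the n! space binary monotone/inverted trees can reach can be checked by brute force for small n. The sketch below tests whether a permutation can be split recursively into two contiguous blocks that are either in order or swapped, which is one way to state the ITG constraint; the function name `itg_parsable` is hypothetical, and the permutation 1 3 4 2 5 is the one that appears in the slide figure.

```python
from functools import lru_cache
from itertools import permutations
from math import factorial

@lru_cache(maxsize=None)
def itg_parsable(perm):
    """True if the tuple `perm` can be built by binary monotone/inverted nodes."""
    if len(perm) == 1:
        return True
    for k in range(1, len(perm)):
        left, right = perm[:k], perm[k:]
        monotone = max(left) < min(right)      # left block entirely precedes right block
        inverted = min(left) > max(right)      # the two blocks are swapped
        if (monotone or inverted) and itg_parsable(left) and itg_parsable(right):
            return True
    return False

slide_example = itg_parsable((1, 3, 4, 2, 5))
counts = {n: sum(itg_parsable(p) for p in permutations(range(1, n + 1)))
          for n in range(1, 6)}
print(slide_example, counts, factorial(5))
```

For n = 4 exactly two permutations, 2 4 1 3 and 3 1 4 2, are unreachable, and the gap to n! widens quickly as n grows, which is the sense in which "not all permutations are needed".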
  62. 62. Syntax-Driven Phrase Translation. Syntax-driven reordering: hierarchical (ITG) vs. linguistic syntax. [Figure: trees with XP nodes over the permutation 1 3 4 2 5 of 1 2 3 4 5.] Is translation syntactically cohesive? Reordering == moving children in the parse tree? Binary: monotone or inverted order at every node. Lexical elements can be phrase pairs. Covers word alignments in parallel corpora?
  63. 63. Word order difference and syntax: Impression
  64. 64. Phrase-based Hierarchical Model: Hiero (Chiang 2005; 2007). Extracting phrase pairs with gaps (hierarchical trees): X → X1 dar una bofetada a X2 / X1 slap X2; X → Maria no X la bruja verde / Mary did not X the green witch.
  65. 65. ITG with syntactic labels
  66. 66. Part II: Trevor Cohn, Decoding algorithms and efficiency
