A (10-slide) introduction to Statistical Machine Translation (SMT)

Cuong Huy TO
Machine Translation

MT (first demonstrated by IBM in 1954) is:
   commercially interesting (the EU spends 1,000,000,000 €/year)
   academically interesting (NLP technologies)
What makes MT hard?
   Word order: monde entier → whole world
   Word sense: bank → rivière / banque
   Idioms: to kick the bucket → mourir ("to die")
Various approaches:
   Rule-based MT
   Example-based MT (1984): reuses translation fragments stored in memory
   Statistical MT (1993)

Source: Wikipedia, John Hutchins, P. Koehn
Statistical Machine Translation

Uses a parallel corpus:
   Europarl (European Parliament)
   Hansard (Canadian Parliament)
   Verbmobil (German-English)
   United Nations (used by Google)
Learns, from aligned sentence pairs such as
   "Le monde entier ne parle pas du problème" ↔ "The whole world is not talking about the problem":
   a lexicon
   an alignment
   well-formedness of the output
Advantages (over rule-based and example-based MT):
   good performance
   quick implementation
   can deal with noisy text
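The "learns a lexicon" step can be illustrated with a toy co-occurrence estimate. This is only a sketch: the two-sentence corpus and the simple ratio estimator are illustrative stand-ins, not the EM training actually used on corpora like Europarl.

```python
from collections import Counter

# A hypothetical two-sentence parallel corpus standing in for Europarl/Hansard.
corpus = [
    ("le monde entier".split(), "the whole world".split()),
    ("le problème".split(), "the problem".split()),
]

# Crude lexicon estimate: count how often a source word co-occurs
# with a target word inside aligned sentence pairs.
cooc = Counter()
src_count = Counter()
for src, tgt in corpus:
    for f in src:
        src_count[f] += len(tgt)
        for e in tgt:
            cooc[(f, e)] += 1

def t(e, f):
    """Co-occurrence estimate of P(e | f)."""
    return cooc[(f, e)] / src_count[f]

print(round(t("the", "le"), 3))  # "le" co-occurs with "the" in both pairs
```

Even this crude count already prefers "the" as a translation of "le"; real systems replace the ratio with iteratively re-estimated alignment probabilities.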
Translation units & alignment?

Unit = WORD
   seems better suited to variability,
   BUT more parameters
Unit = PHRASE
   seems less well suited to variability,
   BUT fewer parameters
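The contrast between the two unit choices can be made concrete on one sentence pair. The sentences and alignments below are invented for illustration; they only show how the data structures differ, not how alignments are learned.

```python
# The same sentence pair viewed with the two unit choices (toy example).
src = "le monde entier".split()
tgt = "the whole world".split()

# Word units: one link per word; the reordering of "monde entier" into
# "whole world" shows up as crossing links.
word_alignment = [(0, 0), (1, 2), (2, 1)]   # (src index, tgt index)

# Phrase units: the reordering is memorized inside one phrase pair,
# at the cost of a much larger inventory of units to estimate.
phrase_alignment = [(("le",), ("the",)),
                    (("monde", "entier"), ("whole", "world"))]

for f, e in phrase_alignment:
    print(" ".join(f), "->", " ".join(e))
```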
Two approaches to SMT

Source $s_1^J = s_1, \ldots, s_j, \ldots, s_J$ ⇒ target $t_1^I = t_1, \ldots, t_i, \ldots, t_I$. Find

$$\hat{t}_1^I = \operatorname*{argmax}_{t_1^I} \left\{ \Pr(t_1^I \mid s_1^J) \right\}$$

Source-channel translation:

$$\hat{t}_1^I = \operatorname*{argmax}_{t_1^I} \left\{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \right\}$$

Direct maximum-entropy translation:

$$\Pr(t_1^I \mid s_1^J) = p_{\lambda_1^M}(t_1^I \mid s_1^J)
  = \frac{\exp\left[\sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J)\right]}
         {\sum_{t'^{I}_{1}} \exp\left[\sum_{m=1}^{M} \lambda_m h_m(t'^{I}_{1}, s_1^J)\right]}$$

$$\hat{t}_1^I = \operatorname*{argmax}_{t_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \right\}$$
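The maximum-entropy decision rule can be sketched in a few lines. The feature functions, candidate translations and weights below are all assumed for illustration; the point is only that the normalization denominator is constant in t and can be dropped at search time.

```python
# Two hypothetical feature functions h_m(t, s): a language-model score
# and a translation-model score, both given here as fixed log-probabilities.
def h_lm(t, s):
    return {"the whole world": -1.0, "world whole the": -4.0}[t]

def h_tm(t, s):
    return {"the whole world": -2.0, "world whole the": -1.5}[t]

features = [h_lm, h_tm]
lam = [0.6, 0.4]            # scaling factors λ_m (assumed; normally trained)
candidates = ["the whole world", "world whole the"]
s = "le monde entier"

# Decision rule: t̂ = argmax_t Σ_m λ_m h_m(t, s).
def score(t):
    return sum(l * h(t, s) for l, h in zip(lam, features))

best = max(candidates, key=score)
print(best)  # the fluent order wins despite the worse TM score
```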
Source-channel vs. Maximum-Entropy

Both architectures share the same pipeline: source language text → preprocessing → global search → target language text. They differ in the models that drive the global search:

Source-channel:
$$\hat{t}_1^I = \operatorname*{argmax}_{t_1^I} \left\{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \right\}$$
driven by a language model $\Pr(t_1^I)$ and a translation model $\Pr(s_1^J \mid t_1^I)$.

Maximum-entropy:
$$\hat{t}_1^I = \operatorname*{argmax}_{t_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \right\}$$
driven by weighted feature functions $\lambda_1 h_1(t_1^I, s_1^J), \ldots, \lambda_M h_M(t_1^I, s_1^J)$.

   The SC approach is a special case of the ME one
   We will first train the SC model, then integrate it into the ME framework
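The "special case" claim can be made concrete: choosing only two feature functions, the two log-probabilities of the source-channel model with unit weights, recovers the SC decision rule (a standard derivation, spelled out here for clarity).

```latex
% SC as a special case of ME: take M = 2 with
%   h_1(t_1^I, s_1^J) = \log \Pr(t_1^I),
%   h_2(t_1^I, s_1^J) = \log \Pr(s_1^J \mid t_1^I),
%   \lambda_1 = \lambda_2 = 1.
\hat{t}_1^I
  = \operatorname*{argmax}_{t_1^I}
      \left\{ \log \Pr(t_1^I) + \log \Pr(s_1^J \mid t_1^I) \right\}
  = \operatorname*{argmax}_{t_1^I}
      \left\{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \right\}
```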
Alignment in the source-channel approach

$$\Pr(s_1^J \mid t_1^I) = \sum_{a} \Pr(s_1^J, a \mid t_1^I)$$

a: alignment between translation units
P(translation) = P(alignment) × P(lexicon)

Word-to-word alignment: $I^J$ possible alignments
   Source-to-target mapping: Model-1 (can be efficiently trained)
   Target-to-source fertility: Model-2 (cannot be efficiently trained)
   Training (EM): train Model-1, and use its parameters to initialize the parameters of Model-2

Phrase-to-phrase alignment:
   Get the Viterbi alignments from the word-to-word training
   Build consistent pairs of phrase-to-phrase alignments
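The EM training of a Model-1-style lexicon fits in a short sketch. This is a minimal version under simplifying assumptions (toy two-sentence corpus, uniform initialization, no NULL word), not the full training pipeline:

```python
from collections import defaultdict

# Toy parallel corpus (assumed for illustration).
corpus = [
    ("le monde".split(), "the world".split()),
    ("le problème".split(), "the problem".split()),
]

e_vocab = {e for _, tgt in corpus for e in tgt}
f_vocab = {f for src, _ in corpus for f in src}
# Uniform initialization of the lexicon t(e | f).
t = {(e, f): 1.0 / len(e_vocab) for e in e_vocab for f in f_vocab}

for _ in range(10):                       # EM iterations
    count = defaultdict(float)
    total = defaultdict(float)
    for src, tgt in corpus:
        for e in tgt:                     # E-step: fractional counts
            z = sum(t[(e, f)] for f in src)
            for f in src:
                c = t[(e, f)] / z
                count[(e, f)] += c
                total[f] += c
    for (e, f) in t:                      # M-step: renormalize per f
        t[(e, f)] = count[(e, f)] / total[f]

print(round(t[("the", "le")], 2))         # converges toward 1
```

Because "le" co-occurs with "the" in every pair while "monde" and "problème" each explain the remaining word, EM concentrates t("the" | "le") near 1, which is exactly the efficient-training property the slide attributes to Model-1.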
The state-of-the-art SMT system

Start with a source-channel system:
   Train it and find the word-to-word alignments
   Build phrase-to-phrase alignments and a phrase lexicon
Then include the phrase-to-phrase model in a Maximum Entropy framework:
   Train the scaling factors λ_m using GIS (Generalized Iterative Scaling)
   Add more feature functions:
      Many language models (trillions of words)
      Both P(s|t) and P(t|s) can be used (a more symmetrical translation)
      P(I) (word penalty)
      Many more features can be added (conventional dictionary, …)
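The phrase-extraction step, building consistent phrase pairs from a word alignment, can be sketched with the usual consistency criterion: a phrase pair is kept if it contains at least one alignment link and no link crosses its boundary. The sentence pair and alignment are assumed for illustration, and real extractors also handle unaligned words.

```python
# Toy Viterbi word alignment (src index, tgt index), assumed given.
src = "le monde entier".split()
tgt = "the whole world".split()
alignment = {(0, 0), (1, 2), (2, 1)}

def consistent(i1, i2, j1, j2):
    """True if the box [i1..i2] x [j1..j2] contains at least one link
    and no link has exactly one endpoint inside the box."""
    inside = [(i, j) for (i, j) in alignment
              if i1 <= i <= i2 and j1 <= j <= j2]
    crossing = [(i, j) for (i, j) in alignment
                if (i1 <= i <= i2) != (j1 <= j <= j2)]
    return bool(inside) and not crossing

pairs = [(" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1]))
         for i1 in range(len(src)) for i2 in range(i1, len(src))
         for j1 in range(len(tgt)) for j2 in range(j1, len(tgt))
         if consistent(i1, i2, j1, j2)]
print(pairs)
```

On this toy alignment the extractor keeps "monde entier" ↔ "whole world" as one phrase pair, capturing the reordering inside the phrase.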
Search in SMT is inexact

Problem: search is NP-hard even with Model-1, mainly because the target words must be re-ordered → we need to approximate the search
Solutions:
   A* search / integer programming: not efficient for long sentences
   Greedy search: severe search errors
   Beam search with pruning and a heuristic function:
      Decision = Q(n) + H(n), where Q scores the past (the partial hypothesis) and H estimates the future (the rest of the sentence)
      A good heuristic function leads to an efficient quality/speed trade-off
Conclusion: search is still far from good
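The Q(n) + H(n) pruning idea can be sketched with a toy beam search. The per-position word options, their log-probability scores and the optimistic future estimate H are all invented for illustration (a real decoder also reorders and covers source positions):

```python
import heapq

# Hypothetical translation options per target position: (word, log-prob).
options = {0: [("the", -0.1), ("a", -0.5)],
           1: [("whole", -0.2), ("entire", -0.4)],
           2: [("world", -0.1), ("globe", -0.9)]}
BEAM = 2

def H(pos):
    """Optimistic future score: best option at each remaining position."""
    return sum(max(s for _, s in options[p]) for p in range(pos, len(options)))

beam = [((), 0.0)]                       # (partial hypothesis, Q = past score)
for pos in range(len(options)):
    expanded = [(hyp + (w,), q + s) for hyp, q in beam
                for w, s in options[pos]]
    # Prune on Q + H, keeping the BEAM best hypotheses.
    beam = heapq.nlargest(BEAM, expanded, key=lambda x: x[1] + H(pos + 1))

best, score = max(beam, key=lambda x: x[1])
print(" ".join(best), round(score, 2))
```

With an admissible (optimistic) H the pruning rarely discards the true best hypothesis; a poor H is what causes the "severe search errors" the slide warns about.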
Translation evaluation: many metrics

Objective automatic scores: most count word matches against a set of references
   WER / mWER / PER / mPER / BLEU / NIST
Subjective scores (judged by humans):
   SSER / IER / ISER: meaning, syntax
   Adequacy-Fluency
We need automatic scoring to speed up research, but no single metric is persuasive enough:
   we must use many metrics at the same time
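The simplest of these metrics, WER, is just a word-level edit distance normalized by the reference length; a minimal sketch (single reference; mWER would take the minimum over several references):

```python
def wer(hyp, ref):
    """Word error rate: Levenshtein distance over words / reference length."""
    h, r = hyp.split(), ref.split()
    d = [[0] * (len(r) + 1) for _ in range(len(h) + 1)]
    for i in range(len(h) + 1):          # deleting all hypothesis words
        d[i][0] = i
    for j in range(len(r) + 1):          # inserting all reference words
        d[0][j] = j
    for i in range(1, len(h) + 1):
        for j in range(1, len(r) + 1):
            sub = d[i - 1][j - 1] + (h[i - 1] != r[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(h)][len(r)] / len(r)

print(wer("the world whole", "the whole world"))
```

Note how a mere reordering is charged two errors, one reason WER alone is not persuasive and several metrics are used together.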
Issues in state-of-the-art SMT techniques

Too much approximation in training and decoding
Implementing decoding for a new model is expensive, since search is heavily dependent on the model
Phrase segmentation is still not powerful
Phrase reordering is still not powerful
Objective metrics do not correlate highly with adequacy and fluency
Real challenges for computation:
   10^12 words for the language model
   10^8 words for the translation model
   10^6 feature functions
