A (10-slide) introduction to
Statistical Machine Translation
(SMT)
             Cuong Huy TO
Machine Translation
 MT (first demo by IBM in 1954) is:
    Commercially interesting (EU spends 1,000,000,000 €/year)
    Academically interesting (NLP technologies)
 What makes MT hard?
    Word order: Monde entier → whole world
    Word sense: bank → rivière / banque
    Idioms: to kick the bucket → mourir
 Various approaches:
    Rule-based MT
    Example-based MT (1984): merges what is in memory
    Statistical MT (1993)


            Source: Wikipedia, John Hutchins, P. Koehn
Statistical Machine Translation
Uses a parallel corpus
  Europarl (European parliament)
  Hansard (Canadian parliament)
  Verbmobil (German-English)
  United Nations (used by Google)
Learns
  Lexicon
  Alignment
  Well-formedness
  Example sentence pair: "Le monde entier ne parle pas du problème" / "The whole world is not talking about the problem"

 Advantages (over rule-based, example-based):
   Good performance
   Quick implementation
   Can deal with noisy text
Translation units & alignment?
 Unit = WORD
   Seems to cope better with variability
   BUT more parameters

 Unit = PHRASE
   Seems to cope less well with variability
   BUT fewer parameters




Two approaches to SMT
Source $s_1^J = s_1, \dots, s_j, \dots, s_J$ ⇒ target $t_1^I = t_1, \dots, t_i, \dots, t_I$. Find $\hat{t}_1^I = \arg\max_{t_1^I} \{ \Pr(t_1^I \mid s_1^J) \}$
    Source-channel translation:
    $\hat{t}_1^I = \arg\max_{t_1^I} \{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \}$
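The source-channel decision rule can be sketched in a few lines of Python. This is only an illustration: the candidate list, the two probability tables `LM` and `TM`, and every number in them are made up, and a real system searches a huge space of targets rather than three strings.

```python
import math

# Hypothetical toy probabilities for ONE fixed source sentence
# s = "le monde entier". All numbers are invented for illustration.
LM = {  # language model Pr(t)
    "the whole world": 0.60,
    "world whole the": 0.05,
    "the entire bank": 0.35,
}
TM = {  # translation model Pr(s | t)
    "the whole world": 0.50,
    "world whole the": 0.50,
    "the entire bank": 0.01,
}

def source_channel_score(t):
    """log Pr(t) + log Pr(s | t); logs avoid underflow on long sentences."""
    return math.log(LM[t]) + math.log(TM[t])

def decode(candidates):
    """Return the candidate target that maximizes Pr(t) * Pr(s | t)."""
    return max(candidates, key=source_channel_score)

best = decode(list(LM))
print(best)
```

Note how the language model breaks the tie between the two candidates that the translation model scores equally.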

    Direct maximum entropy translation:
    $\Pr(t_1^I \mid s_1^J) = p_{\lambda_1^M}(t_1^I \mid s_1^J) = \frac{\exp[\sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J)]}{\sum_{t'^I_1} \exp[\sum_{m=1}^{M} \lambda_m h_m(t'^I_1, s_1^J)]}$

    $\hat{t}_1^I = \arg\max_{t_1^I} \{ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \}$
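A minimal Python sketch of the maximum entropy (log-linear) model above, assuming a small explicit candidate set so that the normalizing sum can be computed. The feature values, the weights, and the function names are hypothetical. The sketch also shows why the decision rule can drop the denominator: it is the same for every candidate.

```python
import math

def loglinear_score(features, weights):
    """Unnormalized score: sum over m of lambda_m * h_m(t, s)."""
    return sum(w * h for w, h in zip(weights, features))

def posterior(candidates, weights):
    """candidates maps each target string to its feature vector h_1..h_M."""
    scores = {t: loglinear_score(h, weights) for t, h in candidates.items()}
    z = sum(math.exp(s) for s in scores.values())  # sum over all t'
    return {t: math.exp(s) / z for t, s in scores.items()}

def decode(candidates, weights):
    """argmax of the unnormalized score; z does not change the winner."""
    return max(candidates, key=lambda t: loglinear_score(candidates[t], weights))

# Invented features: log LM probability, log TM probability, length penalty.
candidates = {
    "the whole world": [math.log(0.60), math.log(0.5), -3.0],
    "world whole the": [math.log(0.05), math.log(0.5), -3.0],
}
weights = [1.0, 1.0, 0.1]

probs = posterior(candidates, weights)
best = decode(candidates, weights)
print(best, probs[best])
```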


Source-channel vs. Maximum-Entropy
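[Figure: two decoding pipelines side by side. Each runs Source language text → Preprocessing → Global search → Target language text. Left (source-channel): the global search computes $\hat{t}_1^I = \arg\max_{t_1^I} \{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \}$, fed by the language model $\Pr(t_1^I)$ and the translation model $\Pr(s_1^J \mid t_1^I)$. Right (maximum entropy): the global search computes $\hat{t}_1^I = \arg\max_{t_1^I} \{ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \}$, fed by the weighted feature functions $\lambda_1 h_1(t_1^I, s_1^J), \dots, \lambda_M h_M(t_1^I, s_1^J)$.]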

        The SC approach is a special case of the ME approach
        We will first train the SC system, then integrate it into the
        ME framework
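Why the source-channel rule is a special case can be seen with a short worked derivation. The choice of two features and unit weights below is the standard way to make the reduction explicit; it is not spelled out on the slide.

```latex
% Take M = 2 with
%   h_1(t_1^I, s_1^J) = \log \Pr(t_1^I),
%   h_2(t_1^I, s_1^J) = \log \Pr(s_1^J \mid t_1^I),
%   \lambda_1 = \lambda_2 = 1.
\begin{align*}
\hat{t}_1^I
  &= \arg\max_{t_1^I} \Big\{ \sum_{m=1}^{2} \lambda_m h_m(t_1^I, s_1^J) \Big\} \\
  &= \arg\max_{t_1^I} \big\{ \log \Pr(t_1^I) + \log \Pr(s_1^J \mid t_1^I) \big\} \\
  &= \arg\max_{t_1^I} \big\{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \big\}
\end{align*}
% The last step holds because \log is strictly increasing.
```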
Alignment in source-channel approach
      $\Pr(s_1^J \mid t_1^I) = \sum_{a} \Pr(s_1^J, a \mid t_1^I)$

a : alignment between translation units
P(translation)=P(alignment) x P(lexicon)


     Word-to-Word alignment: $I^J$ possible alignments
          Source-to-target mapping: Model-1 (can be efficiently trained)
          Target-to-source fertility: Model-2 (cannot be efficiently trained)
          Training (EM): train Model-1, and use its parameters to initialize the
          parameters of Model-2
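The EM training of a word-to-word lexicon can be sketched as follows, assuming that the slide's "Model-1" behaves like the classic IBM Model 1 (lexical probabilities only, uniform alignment prior). That correspondence, the function name `train_model1`, and the toy bitext are assumptions; the NULL word is left out to keep the sketch short.

```python
from collections import defaultdict

def train_model1(bitext, iterations=10):
    """
    bitext: list of (source_words, target_words) sentence pairs.
    Returns a dict p with p[(s, t)] = Pr(s | t), estimated by EM.
    """
    src_vocab = {s for src, _ in bitext for s in src}
    # Start from a uniform lexicon.
    p = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (s, t) links
        total = defaultdict(float)  # expected count of links to t
        # E-step: spread each source word over the target words of its
        # sentence in proportion to the current lexicon probabilities.
        for src, tgt in bitext:
            for s in src:
                norm = sum(p[(s, t)] for t in tgt)
                for t in tgt:
                    frac = p[(s, t)] / norm
                    count[(s, t)] += frac
                    total[t] += frac
        # M-step: renormalize the expected counts per target word.
        p = defaultdict(float, {(s, t): c / total[t] for (s, t), c in count.items()})
    return p

bitext = [
    ("la maison".split(), "the house".split()),
    ("la fleur".split(), "the flower".split()),
]
lexicon = train_model1(bitext)
print(round(lexicon[("la", "the")], 3), round(lexicon[("maison", "house")], 3))
```

Because "la" co-occurs with "the" in both pairs, EM gradually moves probability mass toward that pairing, which in turn frees "maison" to align with "house".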

     Phrase-to-Phrase alignment:
          Get the Viterbi alignments from Word-to-Word training.
          Build consistent pairs of phrase-to-phrase alignment
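Building consistent phrase pairs from a word alignment can be sketched as below. This is a simplified version of the usual extraction heuristic: a phrase pair is kept only if no alignment link leaves the box it spans, and the common extension over unaligned boundary words is omitted. The function name, the `max_len` parameter, and the example alignment are hypothetical.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """
    src, tgt: lists of words.
    alignment: set of (i, j) links, i indexing src and j indexing tgt.
    Returns a set of (source phrase, target phrase) string pairs.
    """
    pairs = set()
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # Target positions linked to the source span [i1, i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Consistency: no target word inside [j1, j2] may be linked
            # to a source word outside [i1, i2].
            if any(j1 <= j <= j2 and not (i1 <= i <= i2) for (i, j) in alignment):
                continue
            pairs.add((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

src = "le monde entier".split()
tgt = "the whole world".split()
alignment = {(0, 0), (1, 2), (2, 1)}  # monde-world, entier-whole
pairs = extract_phrases(src, tgt, alignment)
for pair in sorted(pairs):
    print(pair)
```

The span "le monde" is rejected because its target box would have to include "whole", which is linked to "entier" outside the span.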

The state-of-the-art SMT system
 Start with a source-channel system
   Train and find the word-to-word alignments
   Build phrase-to-phrase alignments and lexicon
 Then include the phrase-to-phrase model into a
 Maximum Entropy framework
   Train the scaling factors λ using GIS (Generalized Iterative
   Scaling)
   Add more feature functions
      Many language models (trillions of words)
      P(s|t) and P(t|s) can both be used (more symmetrical translation)
      P(I) (word penalty)
      We can add many more features (conventional dictionary, …)


Search in SMT is inexact
 Problem: Search is NP-hard even with
 Model-1, mainly due to the need to re-order the
 target words → we need to approximate the search
 Solutions:
   A* search/integer programming: not efficient for long
   sentences
   Greedy search: severe search errors
   Beam search with pruning and a heuristic function
     Decision = Q(n)+H(n) where Q = past cost, H = estimated future cost
     A good heuristic function leads to a good quality/speed trade-off
 Conclusion: search is still far from good
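The beam-search idea (rank partial hypotheses by Q + H, keep only the best few) can be sketched with a toy word-based decoder. Everything here is an assumption made for illustration: the translation options, the bigram table, the floor probability 0.01, and the choice of H as the best translation log-probability of each uncovered source word. Real decoders also recombine hypotheses and limit reordering.

```python
import math

def beam_search(src, options, lm, beam_size=3):
    """
    src: list of source words.
    options: dict source word -> list of (target word, log prob).
    lm: function (previous word, word) -> bigram log prob.
    Returns (Q, target words) for the best complete hypothesis found.
    """
    best_opt = [max(lp for _, lp in options[s]) for s in src]

    def future(covered):
        # H(n): optimistic estimate for the source words still uncovered.
        return sum(best_opt[i] for i in range(len(src)) if i not in covered)

    # A hypothesis is (Q, covered source positions, output words).
    beam = [(0.0, frozenset(), ("<s>",))]
    for _ in range(len(src)):
        expanded = []
        for q, covered, out in beam:
            for i, s in enumerate(src):
                if i in covered:
                    continue  # each source word is translated once
                for word, lp in options[s]:
                    q2 = q + lp + lm(out[-1], word)  # Q(n): past cost
                    expanded.append((q2, covered | {i}, out + (word,)))
        # Pruning: keep the beam_size hypotheses with the best Q + H.
        expanded.sort(key=lambda h: h[0] + future(h[1]), reverse=True)
        beam = expanded[:beam_size]
    q, _, out = max(beam, key=lambda h: h[0])
    return q, list(out[1:])

options = {
    "monde": [("world", math.log(0.9)), ("people", math.log(0.1))],
    "entier": [("whole", math.log(0.7)), ("entire", math.log(0.3))],
}
bigrams = {("<s>", "whole"): 0.3, ("whole", "world"): 0.5,
           ("<s>", "world"): 0.2, ("<s>", "entire"): 0.2,
           ("entire", "world"): 0.3}
lm = lambda prev, word: math.log(bigrams.get((prev, word), 0.01))

score, words = beam_search(["monde", "entier"], options, lm)
print(words)
```

With `beam_size=1` the same function behaves like the greedy search mentioned above, which is one way to observe search errors on larger inputs.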


Translation evaluation: many metrics

 Objective automatic scores: most count word
 matches against a set of references
   WER / mWER / PER / mPER / BLEU / NIST
 Subjective scores (judged by humans):
   SSER/IER/ISER: meaning, syntax
   Adequacy / Fluency
 We need automatic scoring to speed up
 research, but no metric is persuasive enough
    → Must use many metrics at the same time
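Two of the automatic scores listed above can be sketched in Python for a single reference. WER is the word-level edit distance divided by the reference length; PER ignores word order. The PER formula below is one common formulation, and the function names and the example hypothesis are assumptions. The multi-reference variants (mWER, mPER) would take the minimum over several references.

```python
from collections import Counter

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,              # delete h
                           cur[j - 1] + 1,           # insert r
                           prev[j - 1] + (h != r)))  # substitute or match
        prev = cur
    return prev[-1]

def wer(hyp, ref):
    """Word error rate against one reference."""
    return edit_distance(hyp, ref) / len(ref)

def per(hyp, ref):
    """Position-independent error rate: bag-of-words mismatches."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)

ref = "the whole world is not talking about the problem".split()
hyp = "the world whole is not talking about the problem".split()
print(wer(hyp, ref), per(hyp, ref))
```

The swapped word pair costs two edits under WER but nothing under PER, which is a small example of why a single metric can mislead.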

Issues in state-of-the-art SMT techniques
 Too much approximation in training and decoding
 Decoding implementation for a new model is
 expensive since search is heavily dependent on the
 model
 Phrase segmentation is still not powerful
 Phrase reordering is still not powerful
 Objective metrics are not highly correlated to
 adequacy and fluency
 Real challenge for computation:
   10^12 words for language model
   10^8 words for translation model
   10^6 feature functions



A short introduction to Statistical Machine Translation

  • 1. A (10-slide) introduction to Statistical Machine Translation (SMT) Cuong Huy TO
  • 2. Machine Translation
      • MT (first demonstrated by IBM in 1954) is:
          • Commercially interesting (the EU spends 1,000,000,000 €/year)
          • Academically interesting (NLP technologies)
      • What makes MT hard?
          • Word order: monde entier → whole world
          • Word sense: bank → rivière / banque
          • Idioms: to kick the bucket → mourir
      • Various approaches:
          • Rule-based MT
          • Example-based MT (1984): merges what is in memory
          • Statistical MT (1993)
      • Source: Wikipedia, John Hutchins, P. Koehn
  • 3. Statistical Machine Translation
      • Uses a parallel corpus:
          • Europarl (European Parliament)
          • Hansard (Canadian Parliament)
          • Verbmobil (German-English)
          • United Nations (used by Google)
      • Learns the lexicon, the alignment, and well-formedness from sentence pairs such as:
          • Le monde entier ne parle pas du problème
          • The whole world is not talking about the problem
      • Advantages (over rule-based and example-based MT):
          • Good performance
          • Quick implementation
          • Can deal with noisy text
  • 4. Translation units & alignment?
      • Unit = WORD
          • Seems to cope better with variability
          • BUT more parameters
      • Unit = PHRASE
          • Seems to cope less well with variability
          • BUT fewer parameters
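  • 5. Two approaches to SMT
      • Source $s_1^J = s_1, \ldots, s_j, \ldots, s_J$ ⇒ target $t_1^I = t_1, \ldots, t_i, \ldots, t_I$. Find
        $$\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \Pr(t_1^I \mid s_1^J) \right\}$$
      • Source-channel translation:
        $$\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \right\}$$
      • Direct maximum entropy translation:
        $$\Pr(t_1^I \mid s_1^J) = p_{\lambda_1^M}(t_1^I \mid s_1^J) = \frac{\exp\left[ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \right]}{\sum_{t'^I_1} \exp\left[ \sum_{m=1}^{M} \lambda_m h_m(t'^I_1, s_1^J) \right]}$$
        $$\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \right\}$$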
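The direct maximum entropy model on slide 5 can be illustrated with a minimal Python sketch. It assumes a small, explicitly enumerated candidate set (a real decoder searches a far larger space), and the candidate strings, probabilities, and function names are hypothetical. With two features $h_1 = \log \Pr(t_1^I)$ and $h_2 = \log \Pr(s_1^J \mid t_1^I)$ and $\lambda_1 = \lambda_2 = 1$, the decision rule coincides with the source-channel rule.

```python
import math

def score(weights, feats):
    """Weighted feature sum: sum_m lambda_m * h_m(t, s)."""
    return sum(w * h for w, h in zip(weights, feats))

def posterior(features, weights):
    """Pr(t | s) for every candidate t: exp(score) normalized over all candidates."""
    scores = {t: score(weights, h) for t, h in features.items()}
    top = max(scores.values())                      # subtract the max for numerical stability
    exps = {t: math.exp(v - top) for t, v in scores.items()}
    z = sum(exps.values())
    return {t: e / z for t, e in exps.items()}

def decode(features, weights):
    """Decision rule: the normalization is constant in t, so only the weighted sum matters."""
    return max(features, key=lambda t: score(weights, features[t]))

# Hypothetical candidates with features (log Pr(t), log Pr(s | t)); the numbers are invented.
candidates = {
    "the whole world": (math.log(0.02), math.log(0.3)),
    "the world whole": (math.log(0.001), math.log(0.4)),
}

if __name__ == "__main__":
    weights = (1.0, 1.0)                            # lambda_1 = lambda_2 = 1: source-channel case
    print(decode(candidates, weights))
    print(posterior(candidates, weights))
```

Setting the first weight to 0 in this sketch removes the language-model feature, and the choice then depends on $\Pr(s_1^J \mid t_1^I)$ alone.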
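  • 6. Source-channel vs. Maximum-Entropy
      • Source-channel (SC) pipeline: source language text → preprocessing → global search → target language text. The global search combines $\Pr(t_1^I)$ and $\Pr(s_1^J \mid t_1^I)$:
        $$\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \Pr(t_1^I) \cdot \Pr(s_1^J \mid t_1^I) \right\}$$
      • Maximum-entropy (ME) pipeline: source language text → preprocessing → global search → target language text. The global search combines the weighted feature functions $\lambda_1 h_1(t_1^I, s_1^J), \ldots, \lambda_M h_M(t_1^I, s_1^J)$:
        $$\hat{t}_1^I = \arg\max_{t_1^I} \left\{ \sum_{m=1}^{M} \lambda_m h_m(t_1^I, s_1^J) \right\}$$
      • The SC approach is a special case of the ME approach
      • We will first train the SC system, then integrate it into the ME framework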
  • 7. Alignment in source-channel approach
      • $$\Pr(s_1^J \mid t_1^I) = \sum_{a} \Pr(s_1^J, a \mid t_1^I)$$
        where $a$ is the alignment between translation units
      • P(translation) = P(alignment) × P(lexicon)
      • Word-to-word alignment: $I^J$ possible alignments
          • Source-to-target mapping: Model-1 (can be efficiently trained)
          • Target-to-source fertility: Model-2 (cannot be efficiently trained)
          • Training (EM): train Model-1, then use its parameters to initialize the parameters of Model-2
      • Phrase-to-phrase alignment:
          • Get the Viterbi alignments from the word-to-word training
          • Build consistent pairs of phrase-to-phrase alignments
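The EM training of Model-1 and the Viterbi alignment step on slide 7 can be sketched as follows. This is a simplified illustration that assumes "Model-1" refers to the standard IBM Model 1 lexicon estimation; it omits the NULL word and any smoothing, and the three-sentence corpus and function names are hypothetical.

```python
from collections import defaultdict

def train_model1(corpus, iterations=10):
    """EM estimation of lexicon probabilities t_prob[(s, t)] = Pr(s | t).

    corpus: list of (source_words, target_words) sentence pairs.
    Simplification (an assumption of this sketch): no NULL target word.
    """
    src_vocab = {s for src, _ in corpus for s in src}
    # Uniform initialization over all co-occurring word pairs.
    t_prob = {(s, t): 1.0 / len(src_vocab)
              for src, tgt in corpus for s in src for t in tgt}
    for _ in range(iterations):
        count = defaultdict(float)   # expected count of s being generated by t
        total = defaultdict(float)   # expected count of t generating anything
        # E-step: split each source word fractionally over the target words.
        for src, tgt in corpus:
            for s in src:
                norm = sum(t_prob[(s, t)] for t in tgt)
                for t in tgt:
                    frac = t_prob[(s, t)] / norm
                    count[(s, t)] += frac
                    total[t] += frac
        # M-step: renormalize the expected counts per target word.
        for (s, t), c in count.items():
            t_prob[(s, t)] = c / total[t]
    return t_prob

def viterbi_alignment(src, tgt, t_prob):
    """For each source word, the index of its most probable target word."""
    return [max(range(len(tgt)), key=lambda i: t_prob.get((s, tgt[i]), 0.0))
            for s in src]

# Hypothetical toy parallel corpus (French source, English target).
corpus = [
    (["le", "monde"], ["the", "world"]),
    (["le", "problème"], ["the", "problem"]),
    (["un", "problème"], ["a", "problem"]),
]

if __name__ == "__main__":
    t_prob = train_model1(corpus)
    print(viterbi_alignment(["le", "monde"], ["the", "world"], t_prob))
```

In Model 1 each source word is aligned independently of the others, which is why the Viterbi alignment reduces to a per-word argmax instead of a search over all $I^J$ alignments.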
  • 8. The state-of-the-art SMT system
      • Start with a source-channel system:
          • Train and find the word-to-word alignments
          • Build phrase-to-phrase alignments and lexicon
      • Then include the phrase-to-phrase model in a Maximum Entropy framework:
          • Train the scaling factors $\lambda$ using GIS (Generalized Iterative Scaling)
      • Add more feature functions:
          • Many language models (trillions of words)
          • P(s|t) and P(t|s) can both be used (more symmetrical translation)
          • P(I) (word penalty)
          • We can add many more features (conventional dictionary, …)
  • 9. Search in SMT is inexact
      • Problem: search is NP-hard even with Model-1, mainly due to the need to re-order the target words, so we need to approximate the search
      • Solutions:
          • A* search / integer programming: not efficient for long sentences
          • Greedy search: severe search errors
          • Beam search with pruning and a heuristic function
              • Decision = Q(n) + H(n), where Q = past, H = future
              • A good heuristic function leads to a good quality/speed trade-off
      • Conclusion: search is still far from good
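The beam search with pruning on Q(n) + H(n) described on slide 9 can be sketched for a toy word-based decoder. Everything here is an assumption made for illustration: the one-word translation options, the bigram language model, the probabilities, and a heuristic H that looks only at the best translation probability of each uncovered source word and ignores the language model.

```python
import heapq
import math

def beam_search(src, options, bigram_logp, beam_size=5):
    """Toy beam-search decoder with free reordering of source positions.

    options[s]      : list of (target_word, log_prob) pairs for source word s
    bigram_logp(p,w): log-probability of word w following word p
    Hypotheses with the same number of covered source words compete;
    pruning keeps the beam_size best by Q (score so far) + H (future estimate).
    """
    best = [max(lp for _, lp in options[s]) for s in src]

    def future(covered):
        # H(n): optimistic estimate for the source positions not yet translated.
        return sum(best[j] for j in range(len(src)) if j not in covered)

    beam = [(frozenset(), ("<s>",), 0.0)]           # (covered positions, output, Q)
    for _ in range(len(src)):
        expanded = []
        for covered, out, q in beam:
            for j, s in enumerate(src):
                if j in covered:
                    continue
                for word, lp in options[s]:
                    new_q = q + lp + bigram_logp(out[-1], word)
                    expanded.append((covered | {j}, out + (word,), new_q))
        beam = heapq.nlargest(beam_size, expanded,
                              key=lambda h: h[2] + future(h[0]))
    _, out, q = max(beam, key=lambda h: h[2])
    return list(out[1:]), q

# Hypothetical models; all numbers are invented for the example.
OPTIONS = {
    "monde": [("world", math.log(0.9))],
    "entier": [("whole", math.log(0.8)), ("entire", math.log(0.2))],
}
BIGRAMS = {("<s>", "whole"): 0.4, ("whole", "world"): 0.5,
           ("<s>", "world"): 0.3, ("world", "whole"): 0.01}

def toy_bigram_logp(prev, word):
    return math.log(BIGRAMS.get((prev, word), 0.05))  # 0.05 for unseen bigrams

if __name__ == "__main__":
    print(beam_search(["monde", "entier"], OPTIONS, toy_bigram_logp))
```

A smaller `beam_size` makes the search faster but increases the risk of pruning the hypothesis that would have led to the best complete translation, which is the search error the slide refers to.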
  • 10. Translation evaluation: many metrics
      • Objective automatic scores: most count word matches against a set of references
          • WER / mWER / PER / mPER / BLEU / NIST
      • Subjective scores (judged by humans):
          • SSER / IER / ISER: meaning, syntax
          • Adequacy / Fluency
      • We need automatic scoring to speed up research, but no single metric is persuasive enough
      • Must use many metrics at the same time
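Two of the automatic scores listed on slide 10 can be sketched for a single reference. WER is computed here as the word-level edit distance divided by the reference length, and PER uses one common position-independent formulation based on bag-of-words matches; both definitions are assumptions about what the slide intends, and the hypothesis sentence is invented.

```python
from collections import Counter

def edit_distance(hyp, ref):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, 1):
        cur = [i]
        for j, r in enumerate(ref, 1):
            cur.append(min(prev[j] + 1,               # drop the hypothesis word
                           cur[j - 1] + 1,            # add the reference word
                           prev[j - 1] + (h != r)))   # match or substitute
        prev = cur
    return prev[-1]

def wer(hyp, ref):
    """Word error rate against a single reference."""
    return edit_distance(hyp, ref) / len(ref)

def per(hyp, ref):
    """Position-independent error rate: word order is ignored."""
    matches = sum((Counter(hyp) & Counter(ref)).values())
    return (max(len(hyp), len(ref)) - matches) / len(ref)

if __name__ == "__main__":
    ref = "the whole world is not talking about the problem".split()
    hyp = "the world whole is not talking about problem".split()  # hypothetical system output
    print(wer(hyp, ref), per(hyp, ref))
```

Because PER ignores word order, it does not penalize the swapped "world whole" in this example, while WER does; this difference is one reason to report several metrics together.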
  • 11. Issues in state-of-the-art SMT techniques
      • Too much approximation in training and decoding
      • Implementing decoding for a new model is expensive, since search is heavily dependent on the model
      • Phrase segmentation is still not powerful
      • Phrase reordering is still not powerful
      • Objective metrics are not highly correlated with adequacy and fluency
      • Real challenge for computation:
          • $10^{12}$ words for the language model
          • $10^{8}$ words for the translation model
          • $10^{6}$ feature functions