STATISTICAL MACHINE
TRANSLATION
Presented by Asim Ullah Jan
Agenda
• What is Statistical Machine Translation?
• How to build an SMT system?
• Sentence alignment
• Word alignment
• What is Probability?
• History of Probability
• Conditional Probability
Agenda (Cont.)
• Language Model
• Translation Model
• SMT Basic Architecture
• Bayes' Theorem
• Chain Rule
SMT Model
• Fundamental Equation of Machine Translation:
• Ê = argmax_E P(F|E) · P(E)
• Translation model: find the best target-language word or phrase for each input unit.
Result: a bag of words/phrases.
• P(F|E): probability that the English candidate E would produce the French input F
• Language model: pick one word/phrase at a time from the bag of words/phrases
and build up a sentence.
• P(E): probability governing how the English words are properly composed into a fluent sentence
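The decision rule above can be sketched as a tiny search over candidate English sentences. All probability values below are invented for illustration; a real system would score millions of hypotheses with learned models.

```python
# Toy translation-model scores P(F|E): how likely the French input is,
# given each candidate English sentence (illustrative values only).
p_f_given_e = {
    "the house is small": 0.20,
    "the home is little": 0.25,
    "small is the house": 0.22,
}

# Toy language-model scores P(E): how fluent each English candidate is.
p_e = {
    "the house is small": 0.30,
    "the home is little": 0.10,
    "small is the house": 0.02,
}

def decode(candidates):
    """Pick E maximizing P(F|E) * P(E): the fundamental equation of SMT."""
    return max(candidates, key=lambda e: p_f_given_e[e] * p_e[e])

best = decode(p_f_given_e)
print(best)  # "the house is small": 0.20 * 0.30 beats the other products
```

Note how the language model rescues the output: "the home is little" has the highest translation score, but its low fluency score P(E) pushes the product below the more natural candidate.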
Basic SMT Architecture
Bayes' Rule
P(A,B) = P(B) · P(A|B)
P(A,B) = P(A) · P(B|A)
P(B) · P(A|B) = P(A) · P(B|A)
P(A|B) = P(A) · P(B|A) / P(B)
The probability of A given B equals the probability of A, multiplied by the probability of B given
A, divided by the probability of B.
Thomas Bayes
1701 – 1761
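The derivation above can be checked numerically. The three input probabilities here are arbitrary illustrative numbers, not data from the slides.

```python
# Arbitrary illustrative probabilities (not from any real model).
p_a = 0.3          # P(A)
p_b_given_a = 0.5  # P(B|A)
p_b = 0.4          # P(B)

# Bayes' rule: P(A|B) = P(A) * P(B|A) / P(B)
p_a_given_b = p_a * p_b_given_a / p_b

# Both factorizations of the joint probability P(A,B) must agree.
joint_1 = p_b * p_a_given_b   # P(B) * P(A|B)
joint_2 = p_a * p_b_given_a   # P(A) * P(B|A)
print(p_a_given_b, joint_1, joint_2)  # 0.375 0.15 0.15
```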
Probabilistic Language Modeling
Goal: compute the probability of a sentence or sequence of words:
P(W) = P(w1,w2,w3,w4,w5…wn)
• Related task: probability of an upcoming word:
P(w5|w1,w2,w3,w4)
• A model that computes either of these:
P(W) or P(wn|w1,w2…wn-1) is called a language model.
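A language model in this sense is just a function that scores word sequences. A minimal sketch with a hand-built bigram table (the conditional probabilities are invented for illustration) shows both tasks: predicting an upcoming word and scoring a whole sentence.

```python
# Minimal bigram language model; "<s>" marks the sentence start.
# The conditional probabilities below are invented for illustration.
bigram = {
    ("<s>", "its"): 0.5,
    ("its", "water"): 0.4,
    ("water", "is"): 0.6,
    ("is", "so"): 0.2,
    ("so", "transparent"): 0.1,
}

def p_next(history, word):
    """P(word | history), approximated here by the last word only."""
    return bigram.get((history[-1], word), 0.0)

def p_sentence(words):
    """P(W) via the chain rule, under the bigram approximation."""
    p, history = 1.0, ["<s>"]
    for w in words:
        p *= p_next(history, w)
        history.append(w)
    return p

print(p_sentence(["its", "water", "is", "so", "transparent"]))
# 0.5 * 0.4 * 0.6 * 0.2 * 0.1 = 0.0024
```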
How to compute P(W)
• How to compute this joint probability:
• P(its, water, is, so, transparent, that)
• Intuition: let’s rely on the Chain Rule of Probability
The Chain Rule
• Recall the definition of conditional probabilities
More variables:
P(A,B,C,D) = P(A)P(B|A)P(C|A,B)P(D|A,B,C)
• The Chain Rule in General
P(x1,x2,x3,…,xn) = P(x1)P(x2|x1)P(x3|x1,x2)…P(xn|x1,…,xn-1)
The Chain Rule applied to compute joint probability
of words in sentence
P(“its water is so transparent”) =
P(its) × P(water|its) × P(is|its water)
× P(so|its water is) × P(transparent|its water is so)
P(w1,w2,…,wn) = ∏i P(wi | w1,w2,…,wi-1)
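The full chain-rule decomposition can be computed directly by counting prefixes in a corpus. The three-sentence toy corpus below is invented for illustration; with full histories this maximum-likelihood estimate only works on tiny data, which is why real language models back off to short n-gram histories.

```python
# Chain-rule decomposition with full histories, estimated by counting
# prefix occurrences in a tiny toy corpus (invented for illustration).
corpus = [
    "its water is so transparent".split(),
    "its water is so clear".split(),
    "its color is nice".split(),
]

def p_cond(word, history):
    """MLE estimate of P(word | history) from full-prefix counts."""
    prefix_count = sum(1 for s in corpus if s[:len(history)] == history)
    full_count = sum(1 for s in corpus if s[:len(history) + 1] == history + [word])
    return full_count / prefix_count if prefix_count else 0.0

def p_sentence(words):
    """P(w1,...,wn) = product over i of P(wi | w1,...,wi-1), as on the slide."""
    p = 1.0
    for i, w in enumerate(words):
        p *= p_cond(w, words[:i])
    return p

# P(its)=3/3, P(water|its)=2/3, P(is|its water)=1, P(so|...)=1,
# P(transparent|its water is so)=1/2, so the product is 1/3.
print(p_sentence("its water is so transparent".split()))
```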