1.
A Statistical Approach to
Machine Translation
Peter F. Brown, John Cocke, Stephen A. Della Pietra,
Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty,
Robert L. Mercer, and Paul S. Roossin
Published in:
Computational Linguistics (journal)
Volume 16 Issue 2, June 1990
Pages 79-85
MIT Press Cambridge, MA, USA
2.
Introduction
• (S, T): every pair of sentences, S in the source language and T in the target language
• Pr(T|S): probability that a translator produces T when presented with S
• Given a sentence T in the target language, seek
the sentence S from which the translator
produced T.
• Chance of error is minimized by choosing the
most probable sentence S given T
– In other words, to maximize Pr(S|T)
– Pr(S|T) = Pr(S) Pr(T|S) / Pr(T) : Bayes’ theorem
– Pr(S): language model probability of S
– Pr(T|S): translation probability of T given S
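The step from maximizing Pr(S|T) to maximizing Pr(S)Pr(T|S) can be written out as below. Pr(T) does not depend on S, so it drops out of the maximization. The symbol \hat{S} for the chosen sentence is notation introduced here, not taken from the slides.

```latex
\begin{align*}
\hat{S} &= \arg\max_{S} \Pr(S \mid T) \\
        &= \arg\max_{S} \frac{\Pr(S)\,\Pr(T \mid S)}{\Pr(T)} \\
        &= \arg\max_{S} \Pr(S)\,\Pr(T \mid S)
\end{align*}
```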
3.
• Statistical Translation System requires:
– A method for computing language model
probabilities
– A method for computing translation probabilities
– A method for searching among possible source
sentences S for the one that gives the greatest
value for Pr(S)Pr(T|S)
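As a rough illustration of how the three components fit together, the sketch below scores a small, fixed list of candidate source sentences. The function name `decode`, the candidate list, and all probability values are invented for illustration; a real system searches a far larger space and estimates both models from data.

```python
from typing import Callable, Iterable, Optional, Tuple

def decode(target: str,
           candidates: Iterable[str],
           lm_prob: Callable[[str], float],
           tm_prob: Callable[[str, str], float]) -> Tuple[Optional[str], float]:
    """Return the candidate S that maximizes Pr(S) * Pr(T|S), with its score."""
    best, best_score = None, -1.0
    for s in candidates:
        score = lm_prob(s) * tm_prob(target, s)
        if score > best_score:
            best, best_score = s, score
    return best, best_score

# Invented toy values standing in for a language model Pr(S) ...
LM = {"John loves Mary": 0.02,
      "Mary loves John": 0.02,
      "John Mary loves": 0.0001}
# ... and for a translation model Pr(T|S).
TM = {("Jean aime Marie", "John loves Mary"): 0.3,
      ("Jean aime Marie", "Mary loves John"): 0.01,
      ("Jean aime Marie", "John Mary loves"): 0.3}

best, score = decode("Jean aime Marie", list(LM),
                     lambda s: LM[s],
                     lambda t, s: TM.get((t, s), 0.0))
print(best, score)
```

In this toy setup the language model rules out the badly ordered candidate and the translation model rules out the one with the wrong meaning, which is the division of labor the product Pr(S)Pr(T|S) is meant to capture.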
4.
The Language Model
• a word string S1, S2, … Sn
• Pr(S1, S2, … Sn)
= Pr(S1) Pr(S2|S1)…Pr(Sn|S1S2…Sn-1)
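• Given a history S1S2…Sj-1, we need the
probability of the object word Sj
• There are too many possible histories to treat each
probability as a separate parameter
• To reduce the number of parameters:
– Place each history into an equivalence class
– Let the probability of the object word depend on the history only through its class
– e.g. n-gram model: two histories are equivalent if their final n-1 words agree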
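One simple way to group histories is to keep only the previous word, which gives a bigram model. The sketch below is a minimal, unsmoothed version of that idea; the function names, the `<s>` start marker, and the three-sentence corpus are assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, List

def train_bigram(sentences: List[List[str]]) -> Callable[[str, str], float]:
    """Relative-frequency bigram model: the history class is the previous word."""
    history_counts: Counter = Counter()
    bigram_counts: Counter = Counter()
    for words in sentences:
        padded = ["<s>"] + words  # "<s>" marks the start of a sentence
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1

    def prob(cur: str, prev: str) -> float:
        # No smoothing: unseen histories and unseen bigrams get probability 0.
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, cur)] / history_counts[prev]

    return prob

def sentence_prob(words: List[str], prob: Callable[[str, str], float]) -> float:
    """Chain rule with each history reduced to its last word."""
    p = 1.0
    padded = ["<s>"] + words
    for prev, cur in zip(padded, padded[1:]):
        p *= prob(cur, prev)
    return p

corpus = [["john", "loves", "mary"],
          ["john", "loves", "the", "dog"],
          ["mary", "loves", "john"]]
bigram = train_bigram(corpus)
p = sentence_prob(["john", "loves", "mary"], bigram)
print(p)  # (2/3) * (2/2) * (1/3)
```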
5.
The Translation Model
• A word can be translated into more than one word
– Fertility: number of T words that an S word produces
• Notation for alignment:
– (Jean aime Marie | John(1) loves(2) Mary(3) )
• Written as ( T | S )
• The numbers after each S word are the positions of the T words it produces
• Computing the probability of the alignment:
– (Le chien est battu par Jean | John(6) does beat(3, 4)
the(1) dog(2))
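The alignment notation can be read mechanically. The sketch below stores the second example as a list of (source word, target positions) pairs and reads off each word's fertility and any target word that no source word produced. The data layout and function names are assumptions for illustration, and the sketch stops short of computing the alignment probability itself.

```python
from typing import List, Tuple

# (source word, 1-based positions of the target words it produces)
Alignment = List[Tuple[str, List[int]]]

def fertilities(alignment: Alignment) -> List[Tuple[str, int]]:
    """Fertility of each source word = number of target positions attached to it."""
    return [(word, len(positions)) for word, positions in alignment]

def unaligned_targets(target: List[str], alignment: Alignment) -> List[str]:
    """Target words that no source word in the alignment accounts for."""
    used = {i for _, positions in alignment for i in positions}
    return [w for i, w in enumerate(target, start=1) if i not in used]

target = "Le chien est battu par Jean".split()
alignment = [("John", [6]), ("does", []), ("beat", [3, 4]),
             ("the", [1]), ("dog", [2])]

fert = fertilities(alignment)
left = unaligned_targets(target, alignment)
print(fert)
print(left)
```

Here "does" has fertility 0, "beat" has fertility 2, and "par" is not attached to any English word.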
6.
The Translation Model
• In English, adjectives precede nouns; in French,
adjectives often follow nouns
– Distortion: a T word appears far from the S word that produced it in the
alignment
• Distortion probability:
– Pr(i|j, l)
• i: a target position
• j: a source position
• l: the target length
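To make the conditioning in Pr(i|j, l) concrete, the sketch below estimates it by relative frequency from a handful of (i, j, l) observations. It assumes the aligned positions are already known, which is a simplification since alignments are not directly observed in parallel text. The function name and the observations are invented for illustration.

```python
from collections import Counter
from typing import Callable, Iterable, Tuple

def estimate_distortion(
        observations: Iterable[Tuple[int, int, int]]
) -> Callable[[int, int, int], float]:
    """Relative-frequency estimate of Pr(i | j, l) from (i, j, l) triples."""
    joint: Counter = Counter()    # counts of (i, j, l)
    context: Counter = Counter()  # counts of (j, l)
    for i, j, l in observations:
        joint[(i, j, l)] += 1
        context[(j, l)] += 1

    def prob(i: int, j: int, l: int) -> float:
        if context[(j, l)] == 0:
            return 0.0  # context never observed
        return joint[(i, j, l)] / context[(j, l)]

    return prob

# Invented observations: (target position i, source position j, target length l)
obs = [(1, 1, 3), (1, 1, 3), (2, 1, 3), (3, 2, 3)]
d = estimate_distortion(obs)
print(d(1, 1, 3), d(2, 1, 3), d(3, 2, 3))
```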
7.
Searching
• Searching for the sentence S to maximize
Pr(S)Pr(T|S)
• Uses stack search
– Maintains a list of partial alignment hypotheses
– Starts from (Jean aime Marie | * )
• *: a placeholder for an unknown sequence of S words
– In each iteration, extends the most promising entries and adds
the extensions to the list
– Ends when a complete alignment is significantly more promising
than any incomplete one
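The search loop can be sketched as a best-first search over a priority queue of partial hypotheses. The version below is heavily simplified and is not the search described in the paper: it assumes one source word per target word in the same order (no fertility or distortion), and it stops as soon as a complete hypothesis is the most promising entry rather than requiring a significant margin. The function name, table layouts, and probability values are invented for illustration.

```python
import heapq
import math
from typing import Dict, List, Optional, Tuple

def stack_search(target: List[str],
                 tm: Dict[str, Dict[str, float]],
                 lm: Dict[Tuple[str, str], float]
                 ) -> Tuple[Optional[List[str]], float]:
    """Best-first search; tm[t][s] = Pr(t|s), lm[(prev, s)] = Pr(s|prev)."""
    # Heap entries are (cost, source words so far), cost = -log probability.
    heap: List[Tuple[float, Tuple[str, ...]]] = [(0.0, ())]
    while heap:
        cost, source = heapq.heappop(heap)  # most promising hypothesis
        if len(source) == len(target):
            return list(source), math.exp(-cost)  # complete and on top
        t_word = target[len(source)]
        prev = source[-1] if source else "<s>"
        # Extend the hypothesis with every source word that can produce t_word.
        for s_word, p_t in tm.get(t_word, {}).items():
            p = p_t * lm.get((prev, s_word), 0.0)
            if p > 0.0:
                heapq.heappush(heap, (cost - math.log(p), source + (s_word,)))
    return None, 0.0  # no complete hypothesis has nonzero probability

# Invented toy tables
tm = {"Jean": {"John": 0.9, "Jean": 0.1},
      "aime": {"loves": 0.7, "likes": 0.3},
      "Marie": {"Mary": 0.9, "Marie": 0.1}}
lm = {("<s>", "John"): 0.5, ("<s>", "Jean"): 0.1,
      ("John", "loves"): 0.4, ("John", "likes"): 0.3,
      ("Jean", "loves"): 0.2,
      ("loves", "Mary"): 0.5, ("likes", "Mary"): 0.5,
      ("loves", "Marie"): 0.05}

best, prob = stack_search(["Jean", "aime", "Marie"], tm, lm)
print(best, prob)
```

Because every extension multiplies in a probability no greater than 1, a hypothesis never becomes more promising as it grows, so the first complete hypothesis popped is the best one under these toy models.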
8.
Parameter Estimation
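• Both the language model (LM) and the translation model (TM) have many parameters
• Estimating them requires a large set of sentence pairs that are translations of each other
• For this experiment, they used the proceedings of the Canadian
parliament (the Hansards), which are kept in
both English and French
– The LM and TM parameters can be estimated from this data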
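The slides do not say how the estimation is carried out. One common approach when alignments are not observed is expectation-maximization (EM). The sketch below applies EM to word translation probabilities only, ignoring fertility and distortion, so it is a simplified stand-in rather than the model described above. The function name and the three sentence pairs are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Pair = Tuple[List[str], List[str]]  # (source words S, target words T)

def em_translation_probs(pairs: List[Pair],
                         iterations: int = 5) -> Dict[Tuple[str, str], float]:
    """Estimate t[(t_word, s_word)] = Pr(t_word | s_word) by EM."""
    source_vocab = {s for src, _ in pairs for s in src}
    target_vocab = {t for _, tgt in pairs for t in tgt}
    # Start from a uniform distribution over target words.
    t = {(tw, sw): 1.0 / len(target_vocab)
         for tw in target_vocab for sw in source_vocab}
    for _ in range(iterations):
        count: Dict[Tuple[str, str], float] = defaultdict(float)
        total: Dict[str, float] = defaultdict(float)
        # E-step: share each target word among the source words of its pair.
        for src, tgt in pairs:
            for tw in tgt:
                norm = sum(t[(tw, sw)] for sw in src)
                for sw in src:
                    frac = t[(tw, sw)] / norm
                    count[(tw, sw)] += frac
                    total[sw] += frac
        # M-step: renormalize the expected counts per source word.
        t = {(tw, sw): count[(tw, sw)] / total[sw]
             for tw in target_vocab for sw in source_vocab}
    return t

pairs = [(["the", "house"], ["la", "maison"]),
         (["the", "book"], ["le", "livre"]),
         (["a", "book"], ["un", "livre"])]
t = em_translation_probs(pairs)
print(t[("livre", "book")], t[("le", "book")])
```

Even on three sentence pairs, "livre" comes out as the most probable translation of "book" because it is the only French word that appears in both sentences containing "book".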
9.
Two Pilot Experiments
• First experiment: estimate the parameters of the TM
• Vocabulary: the 9,000 most common words in each of English and French,
from the Hansard data
10.
Two Pilot Experiments
• Second experiment:
– Translation from French to English
– English vocabulary: the 1,000 most frequently used English words
– French vocabulary: the 1,700 most frequently used French words (in translations of sentences covered by the 1,000
English words)
– Estimated the 17 million parameters of the translation model from
117,000 pairs of sentences
– Estimated the bigram language model from 570,000 sentences from the
English part of the Hansard data
– Evaluation:
• Exact: the decoded sentence was exactly the same as the actual translation
• Alternate: same meaning, slightly different words
• Different: does not convey the same meaning as the actual translation
• Wrong: makes sense but cannot be interpreted as a translation of the French
• Ungrammatical: grammatically deficient