# A statistical approach to machine translation

P.F. Brown, J. Cocke, S.A. Della Pietra

### A statistical approach to machine translation

1. A Statistical Approach to Machine Translation. Peter F. Brown, John Cocke, Stephen A. Della Pietra, Vincent J. Della Pietra, Fredrick Jelinek, John D. Lafferty, Robert L. Mercer, and Paul S. Roossin. Published in: Computational Linguistics, Volume 16, Issue 2, June 1990, pages 79-85. MIT Press, Cambridge, MA, USA.
2. Introduction
    - Every pair of sentences (S, T) is assigned a probability Pr(T|S): the probability that a translator produces T when presented with S.
    - Given a sentence T in the target language, seek the sentence S from which the translator produced T.
    - The chance of error is minimized by choosing the most probable sentence S given T, that is, the S that maximizes Pr(S|T).
        - Bayes' theorem: Pr(S|T) = Pr(S) Pr(T|S) / Pr(T)
        - Pr(S): language model probability of S
        - Pr(T|S): translation probability of T given S
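
The step from maximizing Pr(S|T) to maximizing the product Pr(S) Pr(T|S) can be written out as follows. The hat notation for the chosen sentence is an addition made here for illustration and does not come from the slides.

```latex
\begin{align*}
\hat{S} &= \operatorname*{arg\,max}_{S} \Pr(S \mid T) \\
        &= \operatorname*{arg\,max}_{S} \frac{\Pr(S)\,\Pr(T \mid S)}{\Pr(T)} \\
        &= \operatorname*{arg\,max}_{S} \Pr(S)\,\Pr(T \mid S)
\end{align*}
```

The denominator Pr(T) is the same for every candidate S, so dropping it does not change which S is selected.
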
3. A statistical translation system requires:
    - A method for computing language model probabilities
    - A method for computing translation probabilities
    - A method for searching among possible source sentences S for the one that gives the greatest value of Pr(S) Pr(T|S)
4. The Language Model
    - A word string S1, S2, …, Sn
    - Pr(S1, S2, …, Sn) = Pr(S1) Pr(S2|S1) … Pr(Sn|S1 S2 … Sn-1)
    - Given a history S1 S2 … Sj-1, the model must supply the probability of the object word Sj.
    - There are too many possible histories to treat each of these probabilities as a separate parameter.
    - To reduce the number of parameters:
        - Group histories into equivalence classes.
        - Let the probability of the object word depend on the history only through its class.
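
As a minimal sketch of the class-based reduction, assume histories are grouped by their final word only, which gives a bigram model (the kind of language model the second pilot experiment reports using). The toy corpus, the `<s>` start marker, and the function names are illustrative assumptions. Probabilities are plain relative frequencies with no smoothing, so unseen bigrams get probability 0.

```python
from collections import Counter

def train_bigram(sentences):
    """Count one-word histories and bigrams; each sentence is a list of words."""
    history_counts = Counter()
    bigram_counts = Counter()
    for words in sentences:
        padded = ["<s>"] + words  # "<s>" is an assumed start-of-sentence marker
        for prev, cur in zip(padded, padded[1:]):
            history_counts[prev] += 1
            bigram_counts[(prev, cur)] += 1
    return history_counts, bigram_counts

def sentence_probability(words, history_counts, bigram_counts):
    """Pr(S1 ... Sn) as a product of Pr(Sj | Sj-1), using relative frequencies."""
    prob = 1.0
    padded = ["<s>"] + words
    for prev, cur in zip(padded, padded[1:]):
        if history_counts[prev] == 0:
            return 0.0  # history never seen in training
        prob *= bigram_counts[(prev, cur)] / history_counts[prev]
    return prob

# Toy corpus, invented for illustration.
corpus = [
    ["john", "loves", "mary"],
    ["john", "loves", "the", "dog"],
    ["mary", "loves", "john"],
]
hist, bi = train_bigram(corpus)
p = sentence_probability(["john", "loves", "mary"], hist, bi)
```

With this grouping, the number of parameters grows with the number of word pairs rather than with the number of distinct full histories.
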
5. The Translation Model
    - A word can be translated into more than one word.
        - Fertility: the number of T words that an S word produces
    - Notation for an alignment:
        - (Jean aime Marie | John(1) loves(2) Mary(3))
        - The form is (T | S).
        - The numbers after each S word are the positions in T of the words it produces.
    - Computing the probability of an alignment, for example:
        - (Le chien est battu par Jean | John(6) does beat(3, 4) the(1) dog(2))
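
The alignment notation maps naturally onto a small data structure. The sketch below assumes an alignment is stored as a list of (source word, target positions) pairs. The variable and function names are illustrative assumptions, not taken from the paper.

```python
target = ["Le", "chien", "est", "battu", "par", "Jean"]

# (Le chien est battu par Jean | John(6) does beat(3, 4) the(1) dog(2))
# Each source word is paired with the 1-based target positions it produces.
alignment = [
    ("John", [6]),
    ("does", []),
    ("beat", [3, 4]),
    ("the", [1]),
    ("dog", [2]),
]

def fertilities(alignment):
    """Fertility = number of target words each source word produces."""
    return [(word, len(positions)) for word, positions in alignment]

def produced_words(alignment, target):
    """Map each source word to the target words at its aligned positions."""
    return [(word, [target[p - 1] for p in positions])
            for word, positions in alignment]

def unaligned_positions(alignment, target):
    """Target positions that no source word in the alignment accounts for."""
    covered = {p for _, positions in alignment for p in positions}
    return [p for p in range(1, len(target) + 1) if p not in covered]
```

In this example `does` has fertility 0 and `beat` has fertility 2, and target position 5 (`par`) is not attached to any source word in the alignment as written.
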
6. The Translation Model
    - In English, adjectives typically precede nouns; in French, they usually follow them.
        - Distortion: a T word appears far from the S word that produced it in the alignment
    - Distortion probability: Pr(i | j, l)
        - i: a target position
        - j: a source position
        - l: the target length
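
One way to see how fertility, word translation, and distortion fit together is to score an alignment as a product of the three kinds of parameters. The sketch below assumes that simple product form and ignores any normalization or combinatorial factors a full model may need. It uses the Jean aime Marie example. All probability values are invented for illustration, and the table and function names are hypothetical.

```python
# Toy parameter tables (all values are made up for illustration).
# fertility_prob[(source_word, n)] = Pr(fertility = n | source_word)
fertility_prob = {("John", 1): 0.9, ("loves", 1): 0.8, ("Mary", 1): 0.9}
# translation_prob[(target_word, source_word)] = Pr(target_word | source_word)
translation_prob = {("Jean", "John"): 0.7, ("aime", "loves"): 0.6,
                    ("Marie", "Mary"): 0.8}
# distortion_prob[(i, j, l)] = Pr(i | j, l): target position i given
# source position j and target length l.
distortion_prob = {(1, 1, 3): 0.8, (2, 2, 3): 0.7, (3, 3, 3): 0.8}

def alignment_probability(target, alignment):
    """Score an alignment as a product of fertility, translation, and
    distortion probabilities. Missing table entries count as 0."""
    l = len(target)
    prob = 1.0
    for j, (src_word, positions) in enumerate(alignment, start=1):
        prob *= fertility_prob.get((src_word, len(positions)), 0.0)
        for i in positions:
            prob *= translation_prob.get((target[i - 1], src_word), 0.0)
            prob *= distortion_prob.get((i, j, l), 0.0)
    return prob

target = ["Jean", "aime", "Marie"]
# (Jean aime Marie | John(1) loves(2) Mary(3))
alignment = [("John", [1]), ("loves", [2]), ("Mary", [3])]
score = alignment_probability(target, alignment)
```

Each source word contributes one fertility factor, and each target word it produces contributes one translation factor and one distortion factor.
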
7. Searching
    - Search for the sentence S that maximizes Pr(S) Pr(T|S).
    - Uses a stack search:
        - Maintain a list of partial alignment hypotheses.
        - Starting hypothesis: (Jean aime Marie | *)
            - *: a placeholder for an unknown sequence of S words
        - At each iteration, extend the most promising entries and add the extensions to the list.
        - The search ends when a complete alignment on the list is significantly more promising than any incomplete one.
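
The search loop can be illustrated with a heavily simplified best-first sketch. It assumes every source word has fertility 1 and the alignment is monotone, so a partial hypothesis is just a prefix of S, and it stops at the first complete hypothesis that reaches the top of the list rather than requiring it to be "significantly" better. The toy probability tables, the `<s>` start marker, and the function name are all assumptions made for illustration, not the decoder from the paper.

```python
import heapq
import math

# Toy models (all values are invented for illustration).
# bigram_lm[(prev, cur)] = Pr(cur | prev); "<s>" is an assumed start marker.
bigram_lm = {
    ("<s>", "John"): 0.5, ("<s>", "Jack"): 0.5,
    ("John", "loves"): 0.6, ("John", "likes"): 0.4,
    ("Jack", "loves"): 0.3, ("Jack", "likes"): 0.7,
    ("loves", "Mary"): 0.9, ("likes", "Mary"): 0.5,
}
# translations[target_word] = {source_word: Pr(target_word | source_word)}
translations = {
    "Jean": {"John": 0.8, "Jack": 0.1},
    "aime": {"loves": 0.7, "likes": 0.3},
    "Marie": {"Mary": 0.9},
}

def stack_search(target):
    """Best-first search over partial hypotheses, most promising first."""
    # Heap entries: (cost, source_prefix), where cost = -log probability,
    # so the smallest cost is the most probable hypothesis.
    heap = [(0.0, ())]
    while heap:
        cost, prefix = heapq.heappop(heap)
        if len(prefix) == len(target):
            # A complete hypothesis is now the most promising entry: stop.
            return list(prefix), math.exp(-cost)
        t_word = target[len(prefix)]
        prev = prefix[-1] if prefix else "<s>"
        # Extend the hypothesis by one source word for the next target word.
        for s_word, t_prob in translations.get(t_word, {}).items():
            lm_prob = bigram_lm.get((prev, s_word), 0.0)
            if lm_prob > 0.0 and t_prob > 0.0:
                step = -math.log(lm_prob * t_prob)
                heapq.heappush(heap, (cost + step, prefix + (s_word,)))
    return None, 0.0  # no complete hypothesis could be built

best, prob = stack_search(["Jean", "aime", "Marie"])
```

Because every extension can only lower a hypothesis's probability, the first complete hypothesis popped from the heap has the highest Pr(S) Pr(T|S) among those this restricted search can build.
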
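8. Parameter Estimation
    - Both the language model (LM) and the translation model (TM) have many parameters.
    - Estimating them requires a large collection of sentence pairs that are translations of each other.
    - For these experiments, the authors used the records of the Canadian parliament (the Hansard data), which are available in both English and French.
        - The LM and TM parameters can be estimated from this data.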
9. Two Pilot Experiments
    - First experiment: estimate parameters for the translation model
    - Vocabulary: the 9,000 most common words in each of the English and French parts of the Hansard data
10. Two Pilot Experiments
    - Second experiment: translation from French to English
        - 1,000 most frequently used English words
        - 1,700 most frequently used French words (in translations of sentences covered by the 1,000 English words)
        - Estimated 17 million translation model parameters from 117,000 pairs of sentences
        - Estimated a bigram language model from 570,000 sentences from the English part of the Hansard data
        - Evaluation categories:
            - Exact: the decoded sentence is exactly the same as the actual translation
            - Alternate: same meaning, slightly different words
            - Different: does not convey the same meaning as the actual translation
            - Wrong: makes sense but cannot be interpreted as a translation of the French
            - Ungrammatical: grammatically deficient
11. Translation Example/Result