
# Bayesian word alignment for statistical machine translation

Posted on Nov 03, 2011


These are our reading-group slides for the ACL 2011 poster paper "Bayesian Word Alignment for Statistical Machine Translation".


## Presentation Transcript

• Bayesian Word Alignment for Statistical Machine Translation
• Authors: Coskun Mermer, Murat Saraclar
• Presented by Jun Lang, 2011-10-13, I2R SMT Reading Group
• Paper info
• Bayesian Word Alignment for Statistical Machine Translation
• ACL 2011 Short Paper
• With source code in Perl (379 lines)
• Authors
• Coskun Mermer
• Murat Saraclar
• Core Idea
• Propose a Gibbs Sampler for Fully Bayesian Inference in IBM Model 1
• Result
• Outperforms classical EM by up to 2.99 BLEU points
• Effectively addresses the rare-word problem
• Produces a much smaller phrase table than EM
• Mathematics
• (E, F): the parallel corpus
• e_i, f_j: the i-th (j-th) word of source sentence e (target sentence f), which contains I (J) words; sentences are drawn from corpus E (F)
• e_0: each sentence in E is augmented with a "null" word
• V_E (V_F): size of the source (target) vocabulary
• a (A): the alignment of a sentence (of the corpus)
• a_j: target word f_j is aligned to source word e_{a_j}
• T: the parameter table, of size V_E × V_F
• t_{e,f} = P(f|e): the word translation probability
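Putting the notation together, IBM Model 1 factorizes the sentence likelihood over target positions. This is a reconstruction from the definitions above (the uniform alignment term 1/(I+1) is standard Model 1, not quoted from the slides):

```latex
P(\mathbf{f}, \mathbf{a} \mid \mathbf{e}; T)
  \;=\; \prod_{j=1}^{J} \frac{1}{I+1}\, t_{e_{a_j}, f_j}
```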
• IBM Model 1: treat the translation table T as a random variable
• Dirichlet distribution
• The likelihood of T = {t_{e,f}} is an exponential-family distribution
• Specifically, each row t_e is a multinomial distribution
• We choose the conjugate prior
• In this case the Dirichlet distribution, for computational convenience
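What conjugacy buys here: with a Dirichlet(θ) prior on each row t_e and multinomial observations, the posterior is again a Dirichlet whose parameters are the prior plus the alignment counts. A sketch from the slide's setup, where n(e, f) denotes how often target word f is aligned to source word e under the current alignment:

```latex
t_e \sim \mathrm{Dir}(\theta,\dots,\theta)
\quad\Rightarrow\quad
t_e \mid A, E, F \;\sim\; \mathrm{Dir}\big(\theta + n(e,1),\;\dots,\;\theta + n(e,V_F)\big)
```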
• Each source word type e has a distribution t_e over the target vocabulary, modeled as a Dirichlet distribution; a sparse prior keeps rare words from acting as "garbage collectors"
• Gibbs sampling: sample the unknowns A and T in turn; ¬j denotes the exclusion of the current value of a_j from the counts
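The sampler described above, with T integrated out and each a_j resampled from the "¬j" counts, can be sketched as follows. This is an illustrative Python reimplementation under my own naming (`gibbs_sample`, `theta`), not the authors' Perl code:

```python
import random
from collections import defaultdict

def gibbs_sample(corpus, V_F, theta=0.0001, iterations=10, seed=0):
    """Collapsed Gibbs sampling of IBM Model 1 alignments with a
    symmetric Dirichlet(theta) prior on each row t_e of T.
    corpus: list of (e_words, f_words) pairs; e_words[0] is the null word.
    Returns one alignment (list of source positions) per sentence pair."""
    rng = random.Random(seed)
    # Initialize alignments at random (the slides note EM output works better).
    alignments = [[rng.randrange(len(e)) for _ in f] for e, f in corpus]
    # Co-occurrence counts under the current alignment state.
    n_ef = defaultdict(int)   # n(e, f): how often f is aligned to e
    n_e = defaultdict(int)    # n(e): total target words aligned to e
    for (e, f), a in zip(corpus, alignments):
        for j, aj in enumerate(a):
            n_ef[(e[aj], f[j])] += 1
            n_e[e[aj]] += 1
    for _ in range(iterations):
        for (e, f), a in zip(corpus, alignments):
            for j, fj in enumerate(f):
                # Remove the current a_j from the counts (the "¬j" statistics).
                n_ef[(e[a[j]], fj)] -= 1
                n_e[e[a[j]]] -= 1
                # Posterior predictive for each candidate source position:
                #   P(a_j = i | rest) ∝ (n(e_i, f_j) + theta) / (n(e_i) + theta * V_F)
                weights = [(n_ef[(ei, fj)] + theta) / (n_e[ei] + theta * V_F)
                           for ei in e]
                a[j] = rng.choices(range(len(e)), weights=weights)[0]
                # Add the new a_j back into the counts.
                n_ef[(e[a[j]], fj)] += 1
                n_e[e[a[j]]] += 1
    return alignments
```

A small theta makes the rows of T sparse, which is how the prior discourages rare source words from collecting many unrelated target words.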
• Algorithm: the initial alignment A can be arbitrary, but initializing from normal EM output works better
• Results
• Code walkthrough: bayesalign.pl
• Conclusions
• Outperforms classical EM by up to 2.99 BLEU points
• Effectively addresses the rare-word problem
• Produces a much smaller phrase table than EM
• Shortcomings
• Too slow: 100 sentence pairs take 18 minutes
• Could perhaps be sped up by parallel computing