Bayesian word alignment for statistical machine translation
 

These are our reading group slides for the ACL 2011 poster paper: Bayesian Word Alignment for Statistical Machine Translation.



Presentation Transcript

    • Bayesian Word Alignment for Statistical Machine Translation. Authors: Coskun Mermer, Murat Saraclar. Presented by Jun Lang, 2011-10-13, I2R SMT-Reading Group
    • Paper info
      • Bayesian Word Alignment for Statistical Machine Translation
      • ACL 2011 Short Paper
      • With source code in Perl (379 lines)
      • Authors
        • Coskun Mermer
        • Murat Saraclar
    • Core Idea
      • Propose a Gibbs Sampler for Fully Bayesian Inference in IBM Model 1
      • Result
        • Outperforms classical EM by up to 2.99 BLEU points
        • Effectively addresses the rare-word problem
        • Produces a much smaller phrase table than EM
    • Mathematics
      • (E, F): parallel corpus
      • e_i, f_j: the i-th (j-th) source (target) word in sentence e (f), which contains I (J) words and belongs to corpus E (F)
      • e_0: each source sentence e contains a “null” word
      • V_E (V_F): size of the source (target) vocabulary
      • a (A): alignment of a sentence (of the corpus)
      • a_j: target word f_j is aligned to source word e_{a_j}
      • T: table of translation parameters, of size V_E x V_F
      • t_{e,f} = P(f | e): word translation probability
    • IBM Model 1: treat T as a random variable
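    For reference, the Model 1 likelihood in the notation above, and the Bayesian move of placing a prior p(T) on the parameter table (my own rendering of the standard formulas; \epsilon is the usual sentence-length constant):

      P(f, a \mid e, T) = \frac{\epsilon}{(I+1)^J} \prod_{j=1}^{J} t_{e_{a_j}, f_j}

      P(F, A, T \mid E) \propto p(T) \prod_{(e, f, a)} P(f, a \mid e, T)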
    • Dirichlet Distribution
      • T = { t_{e,f} } defines an exponential-family distribution
      • Specifically, each t_e is a multinomial distribution
      • We choose the conjugate prior for computational convenience
      • For the multinomial, the conjugate prior is the Dirichlet distribution
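    A sketch of the conjugacy being invoked (my rendering; the symmetric hyperparameter \theta is an assumption of this sketch): each row t_e of T gets a Dirichlet prior over the V_F target words, and its posterior given the alignment counts is again a Dirichlet:

      t_e \sim \mathrm{Dir}(\theta, \ldots, \theta)

      t_e \mid A, E, F \sim \mathrm{Dir}(\theta + n_{e,1}, \ldots, \theta + n_{e,V_F})

    where n_{e,f} is the number of times target word f is linked to source word e under the current alignment A.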
    • Dirichlet Distribution: each source word type e defines a distribution t_e over the target vocabulary, which is modeled as a Dirichlet distribution; this helps avoid rare words acting as “garbage collectors”
    • Dirichlet Distribution: sample the unknowns A and T in turn; ¬j denotes the exclusion of the current value of a_j (update equation sketched below)
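    One concrete way to realize this with the ¬j bookkeeping (a hedged reconstruction, not copied from the paper; \theta as above) is to draw each a_j with T integrated out, and then redraw T from its Dirichlet posterior:

      P(a_j = i \mid E, F, A_{\neg j}) \propto \frac{n^{\neg j}_{e_i, f_j} + \theta}{\sum_{f} n^{\neg j}_{e_i, f} + V_F \theta}

    where n^{\neg j}_{e,f} counts the e-f links in the current alignment with position j excluded.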
    • Algorithm: the initial alignment A can be arbitrary, but initializing from the standard EM output works better (a sketch of such a sampler follows below)
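    Below is a minimal Python sketch of such a sampler, assuming the collapsed a_j update above; it is an illustrative reconstruction, not the authors' bayesalign.pl, and the names gibbs_align, THETA, and v_f_size are hypothetical:

      # Minimal sketch of a Gibbs sampler for Bayesian IBM Model 1 alignment.
      # Illustrative reconstruction only, not the authors' bayesalign.pl;
      # THETA (symmetric Dirichlet hyperparameter) and the data format are assumptions.
      import random
      from collections import defaultdict

      THETA = 0.01        # assumed symmetric Dirichlet hyperparameter
      NULL = "<null>"     # e_0: the null source word

      def gibbs_align(corpus, v_f_size, iterations=100, init_align=None):
          """corpus: list of (source_words, target_words) pairs.
          init_align, if given, can come from EM-trained Model 1 (the slide
          notes that EM initialization works better than an arbitrary start)."""
          corpus = [([NULL] + list(e), list(f)) for e, f in corpus]
          # Initialize A and the link counts n(e, f) and n(e) = sum_f n(e, f).
          A = init_align or [[random.randrange(len(e)) for _ in f] for e, f in corpus]
          n_ef = defaultdict(int)
          n_e = defaultdict(int)
          for (e, f), a in zip(corpus, A):
              for j, i in enumerate(a):
                  n_ef[(e[i], f[j])] += 1
                  n_e[e[i]] += 1
          for _ in range(iterations):
              for (e, f), a in zip(corpus, A):
                  for j, fj in enumerate(f):
                      # Remove the current link: the "not j" counts.
                      i_old = a[j]
                      n_ef[(e[i_old], fj)] -= 1
                      n_e[e[i_old]] -= 1
                      # Collapsed posterior over a_j with T integrated out.
                      weights = [(n_ef[(ei, fj)] + THETA) /
                                 (n_e[ei] + v_f_size * THETA) for ei in e]
                      i_new = random.choices(range(len(e)), weights=weights)[0]
                      a[j] = i_new
                      n_ef[(e[i_new], fj)] += 1
                      n_e[e[i_new]] += 1
          return A

    For example, gibbs_align([(["the", "house"], ["la", "maison"])], v_f_size=2, iterations=10) samples alignments for a toy sentence pair; a real run would pass init_align from EM output, as the slide suggests.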
    • Results
    • Code: view bayesalign.pl
    • Conclusions
      • Outperforms classical EM by up to 2.99 BLEU points
      • Effectively addresses the rare-word problem
      • Produces a much smaller phrase table than EM
      • Shortcomings
        • Too slow: 100 sentence pairs take 18 minutes
        • Could potentially be sped up with parallel computing