This is the paper presentation I gave on HMM-based alignment at IIT Bombay as part of the Topics in NLP course.
The paper treats alignment as an HMM problem, a different approach from the predominantly used IBM models.
3. Roadmap: We Are Here
● Review of Alignment
● HMM-based Alignment
● Results and Examples
4. Review of Alignment
● To translate a French sentence F into an English sentence E, the following expression can be used:
E* = argmax_E P(E|F) = argmax_E P(E) · P(F|E)
● To learn P(F|E), the concept of alignments is used.
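The argmax above can be made concrete with a toy noisy-channel decoder: given a fixed F, score each candidate E by P(E) · P(F|E) and keep the best. All probabilities below are invented purely for illustration.

```python
# Toy noisy-channel decoder: pick the English sentence E maximizing
# P(E) * P(F|E) for one fixed French sentence F.
# The candidate set and all probabilities are invented for illustration.
candidates = {
    # E: (language-model prob P(E), translation prob P(F|E))
    "Peter slept early": (0.020, 0.30),
    "Peter early slept": (0.002, 0.35),
    "early Peter slept": (0.001, 0.25),
}

best = max(candidates, key=lambda e: candidates[e][0] * candidates[e][1])
print(best)  # "Peter slept early": 0.020*0.30 = 0.006 beats the others
```

Note that the fluent word order wins even though its translation probability P(F|E) is not the highest; the language model P(E) tips the balance.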
5. Review of Alignment
● An alignment is a correspondence between E and F indicating which word of F is the translation of which word of E.
● For example: Peter slept early ↔ पीटर जल्दी सोया, with alignment (1, 3, 2): Peter → पीटर (1), slept → सोया (3), early → जल्दी (2).
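The alignment in the example can be stored as a position vector: for each target word at index j we record the source position a_j it aligns to. A minimal sketch:

```python
# The alignment from the slide as a position vector a = (1, 3, 2):
# target word j aligns to source position a_j (1-based, as on the slide).
source = ["पीटर", "जल्दी", "सोया"]      # Hindi, in source order
target = ["Peter", "slept", "early"]     # English
alignment = [1, 3, 2]                    # a_1=1, a_2=3, a_3=2

pairs = [(target[j], source[a - 1]) for j, a in enumerate(alignment)]
print(pairs)  # [('Peter', 'पीटर'), ('slept', 'सोया'), ('early', 'जल्दी')]
```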
6. Alignment Models
Depending on the assumptions made, there are
several possible alignment models:
● IBM Models (1 to 5)
● HMM-based Alignment Models
8. IBM Model 1
● Assumes all alignments are equally likely
● Assumes each source word depends only on
the target word it is aligned to
IBM Model 2
● Assumes alignments are more likely to “lie
along the diagonal”
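IBM Model 2's diagonal preference can be sketched with a distortion term that scores a target position higher the closer it sits to the diagonal of the (source, target) position grid. The exponential form and scale below are illustrative, not the model's exact parameterisation.

```python
import math

# Sketch of an IBM-2-style distortion preference: positions near the
# diagonal of the alignment grid score higher. The exponential form
# and scale are illustrative assumptions, not the paper's exact model.
def distortion(i, j, src_len, tgt_len, scale=1.0):
    """Unnormalised preference for target position j aligning to source position i."""
    return math.exp(-scale * abs(i / src_len - j / tgt_len))

# A point on the diagonal scores higher than one far off it:
on_diag = distortion(2, 2, 4, 4)    # exp(0) = 1.0
off_diag = distortion(1, 4, 4, 4)   # exp(-0.75) ≈ 0.47
print(on_diag > off_diag)  # True
```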
9. Roadmap: We Are Here
● Review of Alignment
● HMM-based Alignment
● Results and Examples
11. HMM-based Alignment
● Assumes the alignment depends only on
○ The previous alignment (not all previous ones)
○ The jump width
● Thus, in this model alignments are relative
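The "relative alignments" idea can be sketched as a transition probability that depends only on the jump width i − i′ between consecutive alignments, not on the absolute positions. The jump counts below are invented for illustration.

```python
from collections import Counter

# Sketch of the HMM alignment assumption: the probability of moving the
# alignment to source position i depends only on the jump width i - i_prev.
# These counts are invented; a real model estimates them from data.
jump_counts = Counter({-1: 5, 0: 20, 1: 60, 2: 10, 3: 5})
total = sum(jump_counts.values())

def p_jump(i, i_prev):
    return jump_counts[i - i_prev] / total

# Same jump width => same probability, regardless of absolute position:
print(p_jump(3, 2) == p_jump(7, 6))  # True (both are jumps of +1)
```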
13. Roadmap: We Are Here
● Review of Alignment
● HMM-based Alignment
● Results and Examples
14. Statistical Results:
Basic Framework
● Models compared:
○ IBM 1
○ IBM 2
○ HMM
● Corpora Used (German to French)
○ Avalanche Bulletins Corpus (News)
○ Verbmobil Corpus (Spoken Dialog)
○ EuTrans Corpus (Travel & Tourism)
15. Statistical Results:
Basic Framework
● Training Process:
○ IBM 1: 10 iterations of EM
○ IBM 2: 5 iterations of Maximum Approximation
○ HMM: 5 iterations of Maximum Approximation
● Metric Used
○ Perplexity (Wikipedia: “a measurement of how well a probability model predicts a sample”)
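Perplexity can be computed as the geometric mean of the inverse probabilities the model assigns to a held-out sample. The sketch below normalises per sentence and uses invented probabilities; the exact normalisation (per word vs. per sentence) varies by setup.

```python
import math

# Perplexity: geometric mean of inverse model probabilities over a sample.
# Normalised per sentence here; probabilities are invented for illustration.
def perplexity(sentence_probs):
    log_sum = sum(math.log(p) for p in sentence_probs)
    return math.exp(-log_sum / len(sentence_probs))

print(round(perplexity([0.1, 0.1, 0.1]), 6))  # 10.0
```

Lower perplexity means the model was less "surprised" by the sample, which is why it serves as the comparison metric across IBM 1, IBM 2, and the HMM.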
20. Intuitive Example:
पीटर घर लौटने पर जल्दी सोया
● IBM 2 stresses diagonal alignments, so it
will find the correct alignment difficult, as
the alignments lie nearly on the inverse
diagonal
● The HMM looks only at previous
alignments and overall jump lengths, and
the correct alignment minimizes the total
jump length
22. Intuitive Example:
पीटर बहुत ही जल्दी सोया
● The HMM model assumes that every source
word has a corresponding target word
● Moreover, empty-word alignments are not
incorporated in the basic HMM model
● To model empty words, an HMM of order 2 is
required
24. Intuitive Example:
पीटर आजकल जल्दी सोता है
● सोता है ↔ sleeps can be handled by the HMM
● आजकल ↔ these days requires multi-word
handling to avoid a translation like “today
tomorrow”
25. References
● HMM-Based Word Alignment in Statistical
Translation (1996) by Stephan Vogel,
Hermann Ney, Christoph Tillmann; COLING
’96, Copenhagen
● The Mathematics of Statistical Machine Translation:
Parameter Estimation (1993) by Peter F. Brown, Stephen A.
Della Pietra, Vincent J. Della Pietra, Robert L. Mercer;
Computational Linguistics