POS Integration into Moses

Background
Phrase table
English     Tamil            φ(e|f)
I           நான்             0.66
went        போனேன்           0.54
went        வந்தேன்          0.13
to stall    கடைக்கு          0.47
to stall    அங்காடிக்கு      0.21
I went      நான் போனேன்      0.42
Language model (trigram)
w3         w1 w2               score
கடைக்கு    <s> நான்            -1.400199
சாப்பிட    நான் கடைக்கு        -1.855783
போனேன்     கடைக்கு சாப்பிட     -0.4191293
Translation options: நான் போனேன்; நான்; போனேன், வந்தேன்; கடைக்கு, அங்காடிக்கு
Decoding: choose the combination of options with the maximum probability
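
A minimal sketch of "getting the maximum probability" from the phrase table above. It assumes a monotone decoder (no reordering) that multiplies phrase translation probabilities only; the language model and reordering model that Moses also uses are left out. The function name `best_monotone_translation` is hypothetical, and the Tamil strings are the phrase table entries as read from the slide.

```python
from functools import lru_cache

# Phrase table from the Background slide: source phrase -> [(target, phi)]
PHRASE_TABLE = {
    ("I",): [("நான்", 0.66)],
    ("went",): [("போனேன்", 0.54), ("வந்தேன்", 0.13)],
    ("to", "stall"): [("கடைக்கு", 0.47), ("அங்காடிக்கு", 0.21)],
    ("I", "went"): [("நான் போனேன்", 0.42)],
}


def best_monotone_translation(words):
    """Return (score, target_phrases) for the best left-to-right segmentation.

    Assumption: the score is the product of phrase probabilities only.
    """
    words = tuple(words)

    @lru_cache(maxsize=None)
    def best_from(i):
        if i == len(words):
            return (1.0, ())
        best = (0.0, None)  # None means "no translation covers the rest"
        for j in range(i + 1, len(words) + 1):
            for target, prob in PHRASE_TABLE.get(words[i:j], []):
                rest_score, rest = best_from(j)
                if rest is None:
                    continue
                score = prob * rest_score
                if score > best[0]:
                    best = (score, (target,) + rest)
        return best

    return best_from(0)


score, phrases = best_monotone_translation("I went to stall".split())
print(phrases, round(score, 4))
```

With these numbers the two-phrase segmentation "I went" + "to stall" (0.42 × 0.47) beats the three-phrase one (0.66 × 0.54 × 0.47), which shows why longer phrase pairs can win during decoding.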

Key Challenges
• Challenge 1: word reordering
• Challenge 2: unknown words
• A factored model is used to address both

Word Reordering
• Example
I will arrive tomorrow afternoon
நாளை மாலை நான் வருவேன்
(Tamil word order: tomorrow, afternoon, I, will arrive)

Unknown words
• Example
A traditional model treats the word "house" completely independently of the
word "houses".
• Training data containing "house" does not add any knowledge
about the translation of "houses".

What is a Factored Model
• Redefining a word from a single symbol to a
vector of factors
(Figure: traditional vs. factored representation of a word)

Factored Model Example
Word: went
Lemma: go
POS: verb
Case marker: past tense
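
A short sketch of a word as a vector of factors, using the "|" separator that appears in the Decoding slide (boys|boy|NN|plural). The class name `FactoredWord`, its field names, and the four-factor layout are illustrative assumptions, not a fixed Moses schema.

```python
from typing import NamedTuple


class FactoredWord(NamedTuple):
    """One token as a vector of factors (field names are hypothetical)."""
    surface: str
    lemma: str
    pos: str
    morph: str


def parse_factored(token: str, sep: str = "|") -> FactoredWord:
    """Split a token such as 'went|go|V|past' into its four factors."""
    parts = token.split(sep)
    if len(parts) != 4:
        raise ValueError(f"expected 4 factors, got {len(parts)}: {token!r}")
    return FactoredWord(*parts)


word = parse_factored("went|go|V|past")
print(word.lemma, word.pos)
```

Keeping the lemma as its own factor is what lets "house" and "houses" share translation knowledge.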

Factored Translation
• Components of factored translation models: language model,
translation model, reordering model, translation steps, generation steps
• Each component defines one or more feature
functions that are combined in a log-linear model.
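
The log-linear combination of feature functions has the following standard form. The symbols are assumptions of this note, since the slide's own formula did not survive: e is the target sentence, f the source sentence, h_i the feature functions contributed by the components, λ_i their weights, and Z a normalization constant.

```latex
p(\mathbf{e} \mid \mathbf{f}) = \frac{1}{Z} \exp \sum_{i=1}^{n} \lambda_i \, h_i(\mathbf{e}, \mathbf{f})
```

Since Z does not depend on the chosen translation, decoding only needs to maximize the weighted sum of feature values.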

Methodology
• Parallel corpus comparison: traditional vs. factored

Methodology
• LM comparison
• No changes; same as the traditional method.
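
Since the language model is unchanged, scoring works as in the traditional setup. The sketch below sums the three trigram scores from the Background slide for one sentence. It assumes the scores are log-probabilities, does no backoff or smoothing, and ignores the end-of-sentence token; `sentence_logprob` is a hypothetical helper, not a Moses or LM-toolkit function.

```python
# Trigram scores from the Background slide: (w1, w2, w3) -> score
TRIGRAM_LOGPROB = {
    ("<s>", "நான்", "கடைக்கு"): -1.400199,
    ("நான்", "கடைக்கு", "சாப்பிட"): -1.855783,
    ("கடைக்கு", "சாப்பிட", "போனேன்"): -0.4191293,
}


def sentence_logprob(words, table):
    """Sum trigram log-probabilities over a sentence padded with <s>.

    Assumption: every trigram is in the table (no backoff is implemented).
    """
    padded = ["<s>"] + list(words)
    total = 0.0
    for i in range(2, len(padded)):
        total += table[tuple(padded[i - 2:i + 1])]
    return total


sentence = ["நான்", "கடைக்கு", "சாப்பிட", "போனேன்"]
total = sentence_logprob(sentence, TRIGRAM_LOGPROB)
print(round(total, 7))
```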

Methodology
• Translation model
• Prepare the training data: run a POS tagger on the corpus to tag
the data
• Establish word alignments and POS-tag alignments
using GIZA++
(Figure: alignment of "I went to shop", tagged PRP V PREP NN, with the Tamil words நான், போனேன் and the tags PRP, NN, V)
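
A small sketch of the data preparation step: joining each word with its POS tag to produce one line of a factored corpus. The tags come from the example on this slide; the function name `to_factored` is hypothetical, and in practice the tags would come from a POS tagger rather than a hard-coded list.

```python
def to_factored(words, tags, sep="|"):
    """Join each word with its tag, e.g. 'went' + 'V' -> 'went|V'."""
    if len(words) != len(tags):
        raise ValueError("words and tags must have the same length")
    return " ".join(f"{word}{sep}{tag}" for word, tag in zip(words, tags))


line = to_factored("I went to shop".split(), "PRP V PREP NN".split())
print(line)  # I|PRP went|V to|PREP shop|NN
```

Both sides of the parallel corpus would be converted this way before alignment, so that the word factor and the tag factor can each be aligned.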

Methodology
• The source sentence is reordered according to the
word and tag alignment
• Extract phrase pairs that are consistent with
the word alignment
• Estimate scoring functions (conditional
phrase translation probabilities or lexical
translation probabilities)
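
The extraction and scoring steps above can be sketched as follows. This is a simplified version: it takes only the minimal target span for each source span (it does not extend over unaligned target words) and estimates the phrase translation probability by relative frequency. The Tamil sentence and the alignment points are hypothetical examples, and both function names are made up for illustration.

```python
from collections import Counter


def extract_phrase_pairs(src, tgt, alignment, max_len=7):
    """Extract phrase pairs consistent with a word alignment.

    alignment is a set of (source_index, target_index) points.
    """
    pairs = []
    for s1 in range(len(src)):
        for s2 in range(s1, min(s1 + max_len, len(src))):
            linked = [t for (s, t) in alignment if s1 <= s <= s2]
            if not linked:
                continue
            t1, t2 = min(linked), max(linked)
            if t2 - t1 + 1 > max_len:
                continue
            # Consistency: no target word in [t1, t2] aligns outside [s1, s2].
            if any(t1 <= t <= t2 and not (s1 <= s <= s2)
                   for (s, t) in alignment):
                continue
            pairs.append((" ".join(src[s1:s2 + 1]),
                          " ".join(tgt[t1:t2 + 1])))
    return pairs


def estimate_phi(pairs):
    """Relative frequency: count(source, target) / count(source)."""
    pair_counts = Counter(pairs)
    src_counts = Counter(source for source, _ in pairs)
    return {pair: count / src_counts[pair[0]]
            for pair, count in pair_counts.items()}


src = "I went to shop".split()
tgt = "நான் கடைக்கு போனேன்".split()        # hypothetical target sentence
alignment = {(0, 0), (1, 2), (2, 1), (3, 1)}  # hypothetical alignment points
pairs = extract_phrase_pairs(src, tgt, alignment)
phi = estimate_phi(pairs)
```

Here "to" alone is not extracted, because its target word கடைக்கு is also aligned to "shop"; only the span "to shop" is consistent.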

Methodology
Phrase table comparison
• Traditional
• Factored

Decoding
Source phrase: boys|boy|NN|plural
• Translation: mapping lemmas
boy → ஆண், யுவன், etc.
• Translation: mapping morphology
NN|plural → NN|-e, NN|-o, etc.
• Generation: generating surface forms
ஆண்|NN|-s → ஆண்
ஆண்|NN|-p → ஆண்கள்
யுவன்|NN|-s → யுவன்
யுவன்|NN|-p → யுவன்கள்
• Translation options:
ஆண்|NN|-s → ஆண்
ஆண்|NN|-p → ஆண்கள்
யுவன்|NN|-s → யுவன்
யுவன்|NN|-p → யுவன்கள்
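
The three mapping steps can be chained into translation options roughly as sketched below. This is not the Moses decoder itself. The sketch assumes that the morphology mapping produces the same tags the generation step consumes (NN|-s and NN|-p); the table names, the `expand` function and the output format surface|lemma|POS|morphology are illustrative.

```python
from itertools import product

# Translation step 1: source lemma -> target lemmas
LEMMA_MAP = {"boy": ["ஆண்", "யுவன்"]}
# Translation step 2: source (POS, morphology) -> target (POS, morphology)
# Assumption: target tags are the ones used by the generation step.
MORPH_MAP = {("NN", "plural"): [("NN", "-s"), ("NN", "-p")]}
# Generation step: (target lemma, POS, morphology) -> surface form
GENERATION = {
    ("ஆண்", "NN", "-s"): "ஆண்",
    ("ஆண்", "NN", "-p"): "ஆண்கள்",
    ("யுவன்", "NN", "-s"): "யுவன்",
    ("யுவன்", "NN", "-p"): "யுவன்கள்",
}


def expand(factored_source):
    """Expand one source token 'surface|lemma|POS|morph' into options."""
    _surface, lemma, pos, morph = factored_source.split("|")
    options = []
    candidates = product(LEMMA_MAP.get(lemma, []),
                         MORPH_MAP.get((pos, morph), []))
    for tgt_lemma, (tgt_pos, tgt_morph) in candidates:
        form = GENERATION.get((tgt_lemma, tgt_pos, tgt_morph))
        if form is not None:
            options.append(f"{form}|{tgt_lemma}|{tgt_pos}|{tgt_morph}")
    return options


options = expand("boys|boy|NN|plural")
for option in options:
    print(option)
```

Because the lemma and the morphology are translated separately, a surface form never seen in training can still be produced as long as its lemma and tag were seen.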
