2. RECAP
“Simple Syntactic and Morphological Processing
Can Help English-Hindi Statistical Machine
Translation” by Bhattacharya et al.; ACL 2014.
“Statistical Machine Translation into a
Morphologically Complex Language” by Oflazer et
al.; CICLing 2008.
“Combining Morpheme-based Machine Translation
with Post-processing Morpheme Prediction” by
Sarkar et al.; ACL 2011.
3. MORPHOLOGICAL DE-SEGMENTATION
De-segmentation is the process of converting
segmented words into their original surface form.
De-segmentation methods: concatenation, rules, or
table look-up
Segmentation is a sparsity-reduction technique
eating → eat+ing
dinners → dinner+s
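The three de-segmentation strategies named above can be sketched as follows. This is a minimal illustration, not the method of any cited paper: the table entries, the boundary rules and the function name `desegment` are all hypothetical, and `+` is assumed to mark a morpheme boundary as in the examples above.

```python
import re

# Hypothetical look-up table: segmented form -> observed surface form.
DESEG_TABLE = {"run+ing": "running", "city+s": "cities"}

# Hypothetical boundary rules: (pattern at the morpheme boundary, replacement).
BOUNDARY_RULES = [
    (re.compile(r"e\+ing$"), "ing"),  # make+ing -> making
    (re.compile(r"y\+s$"), "ies"),    # party+s -> parties
]

def desegment(segmented: str) -> str:
    """Restore a surface form: table look-up, then rules, then concatenation."""
    # 1. Table look-up: trust a form that was seen before.
    if segmented in DESEG_TABLE:
        return DESEG_TABLE[segmented]
    # 2. Rules: repair spelling changes at the morpheme boundary.
    for pattern, replacement in BOUNDARY_RULES:
        if pattern.search(segmented):
            return pattern.sub(replacement, segmented).replace("+", "")
    # 3. Concatenation: drop the boundary markers.
    return segmented.replace("+", "")

for form in ["eat+ing", "dinner+s", "run+ing", "make+ing"]:
    print(form, "->", desegment(form))
```

Trying the table first and falling back to plain concatenation last is one possible ordering; it is chosen here only because the table is the most specific source of evidence.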
4. LATTICE DE-SEGMENTATION FOR STATISTICAL
MACHINE TRANSLATION
By Mohammad Salameh, Colin Cherry, Grzegorz
Kondrak
Published in Proceedings of the 52nd Annual
Meeting of the Association for Computational
Linguistics 2014
English-to-Arabic and English-to-Finnish translation
5. LATTICE
A word lattice G = (V,E) is a directed acyclic graph
that is formally a weighted finite-state automaton
(FSA)
Exactly one node has no outgoing edges; it is
called the ‘end node’.
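The definition above can be made concrete with a small sketch. The toy edges, the helper names and the reading of edge weights as probabilities are assumptions made for illustration only.

```python
# Toy lattice as a list of edges: (from_node, to_node, label, weight).
# Treating the weights as probabilities is an assumption of this sketch.
EDGES = [
    (0, 1, "eat", 0.4),
    (0, 2, "eating", 0.6),
    (1, 2, "+ing", 1.0),
    (2, 3, "dinner", 1.0),
]

def nodes_of(edges):
    """All nodes mentioned by any edge."""
    return {n for u, v, _, _ in edges for n in (u, v)}

def end_nodes(edges):
    """Nodes with no outgoing edges; a word lattice has exactly one."""
    sources = {u for u, _, _, _ in edges}
    return sorted(nodes_of(edges) - sources)

def is_acyclic(edges):
    """Kahn's algorithm: a graph is acyclic if every node can be removed."""
    indegree = {n: 0 for n in nodes_of(edges)}
    for _, v, _, _ in edges:
        indegree[v] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        u = ready.pop()
        removed += 1
        for a, b, _, _ in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    ready.append(b)
    return removed == len(indegree)

def paths(edges, node, end):
    """Yield (labels, weight product) for every path from node to end."""
    if node == end:
        yield [], 1.0
        return
    for u, v, label, weight in edges:
        if u == node:
            for labels, score in paths(edges, v, end):
                yield [label] + labels, weight * score

assert is_acyclic(EDGES) and end_nodes(EDGES) == [3]
for labels, score in paths(EDGES, 0, 3):
    print(" ".join(labels), score)
```

Each path from the start node to the end node is one reading of the input, so the lattice stores both the segmented and the unsegmented alternative in a single structure.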
7. GENERALIZING WORD LATTICE TRANSLATION
By Christopher Dyer, Smaranda Muresan, Philip
Resnik
In Proceedings of the Association for Computational
Linguistics (ACL) 2008
Chinese-to-English and Arabic-to-English
translation.
9. WORD LATTICE DECODING
Two classes of translation models for lattice
translation:
Finite-state transducers (FSTs) with phrase-based
models.
Synchronous CFG-based decoder with hierarchical
phrase-based models
10. LATTICE TRANSLATION WITH FST-BASED
PHRASE-BASED MODELS
Phrase-based models
Splitting the sentence and creating phrases
Choosing the path through the lattice
Moses phrase-based decoder used to translate word
lattices
Left-to-right processing of the lattice
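Choosing a path through the lattice in a left-to-right pass can be sketched as a best-path search. This is a simplified stand-in, not the Moses decoder: it only picks the highest-probability input path and does no translation. The function name `best_path`, the toy edges and the assumption that nodes are numbered in topological order are all illustrative.

```python
import math

def best_path(edges, start, end):
    """Highest-probability path in a lattice, processed left to right.

    edges: (from_node, to_node, label, probability) tuples. Nodes are
    assumed to be numbered in topological order, so from_node < to_node.
    Returns (log probability, labels), or None if end is unreachable.
    """
    best = {start: (0.0, [])}  # node -> (best log probability, labels so far)
    for u, v, label, prob in sorted(edges, key=lambda e: e[0]):
        if u not in best:
            continue  # node u cannot be reached from the start node
        candidate = best[u][0] + math.log(prob)
        if v not in best or candidate > best[v][0]:
            best[v] = (candidate, best[u][1] + [label])
    return best.get(end)

# Toy lattice with two alternative readings of the same input.
EDGES = [
    (0, 2, "eating", 0.3),
    (0, 1, "eat", 0.7),
    (1, 2, "+ing", 0.9),
    (2, 3, "dinner", 1.0),
]

log_prob, labels = best_path(EDGES, 0, 3)
print(labels, math.exp(log_prob))
```

Because every edge into a node is processed before any edge leaving it, one pass over the sorted edges is enough.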
11. SYNCHRONOUS CONTEXT FREE GRAMMAR
Source-Target synchronous rules
Parse the input using source language grammar
Simultaneously build a tree in the target language
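How a synchronous rule ties the two sides together can be sketched with a toy grammar. The rules, the romanised Hindi words and the function name `realise` are illustrative assumptions, and the derivation is given by hand rather than found by parsing the input.

```python
# Each rule: (left-hand side, source right-hand side, target right-hand side).
# Integers are co-indexed nonterminal slots (positions in the child list);
# strings are terminals. The grammar is a hypothetical toy example.
RULES = {
    "S": ("S", [0, 1], [0, 1]),
    "VP": ("VP", [0, 1], [1, 0]),  # verb-object source, object-verb target
    "ram": ("NP", ["Ram"], ["raam"]),
    "rice": ("NP", ["rice"], ["chaawal"]),
    "eats": ("V", ["eats"], ["khaataa", "hai"]),
}

def realise(tree):
    """Expand one derivation into (source tokens, target tokens)."""
    rule_id, children = tree
    _, source_rhs, target_rhs = RULES[rule_id]
    realised = [realise(child) for child in children]

    def expand(rhs, side):
        tokens = []
        for symbol in rhs:
            if isinstance(symbol, int):
                tokens.extend(realised[symbol][side])  # linked nonterminal
            else:
                tokens.append(symbol)  # terminal word
        return tokens

    return expand(source_rhs, 0), expand(target_rhs, 1)

DERIVATION = ("S", [("ram", []), ("VP", [("eats", []), ("rice", [])])])
source, target = realise(DERIVATION)
print(" ".join(source))
print(" ".join(target))
```

The same derivation yields both strings, so once the source side has been parsed the target tree comes with it; the `VP` rule shows how reordering is expressed by permuting the linked slots.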
12. EFFECT OF WORD LATTICES
Improvement in BLEU score
Decrease in OOV words
Poor Coverage of Named Entities
13. LATTICE DE-SEGMENTATION FOR STATISTICAL
MACHINE TRANSLATION
14. GOAL
De-segment the decoder’s output lattice
Gain access to a compact, de-segmented view of a
large portion of the translation search space
Morphemes → De-segmenting transducer →
De-segmented words
Lattice-specific table → Finite-state transducer
15. APPROACH:
Baseline (without segmentation)
1-best de-segmentation: segmentation at training and
de-segmentation after decoding
N-best de-segmentation: de-segments, augments
and re-ranks the decoder’s 1000-best list.
Lattice de-segmentation: de-segments a lattice that
covers an exponential number of hypotheses
The search graph of a phrase-based decoder can be
interpreted as a lattice.
De-segmenting Transducer
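The n-best variant listed above can be sketched in a few lines. This is a simplified illustration, not the implementation from the paper: the `+` suffix marker, the toy word-level language model, the single added feature and its weight are all assumptions.

```python
def desegment(tokens):
    """Attach tokens that start with '+' to the preceding word."""
    words = []
    for token in tokens:
        if token.startswith("+") and words:
            words[-1] += token[1:]
        else:
            words.append(token)
    return words

def rerank(nbest, word_lm, lm_weight=1.0, unknown=-10.0):
    """De-segment each hypothesis, add a word-level score, and re-rank.

    nbest: (segmented tokens, decoder score) pairs, higher is better.
    word_lm: hypothetical log scores for whole words.
    """
    best = {}
    for tokens, score in nbest:
        words = tuple(desegment(tokens))
        lm_score = sum(word_lm.get(word, unknown) for word in words)
        total = score + lm_weight * lm_score
        # Different segmentations may collapse to one surface string;
        # keep the best-scoring copy.
        if words not in best or total > best[words]:
            best[words] = total
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

NBEST = [
    (["eat", "+ing", "dinner", "+s"], -2.0),
    (["eat", "+s", "dinner"], -1.5),
    (["eat", "+ing", "dinners"], -2.5),
]
WORD_LM = {"eating": -1.0, "dinners": -1.0, "eats": -2.0, "dinner": -1.5}

for words, total in rerank(NBEST, WORD_LM):
    print(" ".join(words), total)
```

The word-level score is only available after de-segmentation, which is why it can change the ranking that the decoder produced over morphemes. A lattice-based approach applies the same idea to far more hypotheses than an n-best list can hold.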
18. PROPOSED APPROACH TO IMPROVE
TRANSLATION QUALITY
Translation from Multi-parallel sources
English, Hindi, Konkani & Marathi
Morphological segmentation to reduce data
sparsity
Morfessor / Morph Analyser
Morphological De-Segmentation
Named Entity Tagger
Cognates
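The sparsity argument for morphological segmentation can be illustrated with a small out-of-vocabulary (OOV) count. The suffix list and the `segment` function are a hypothetical toy segmenter, not Morfessor or a morphological analyser, and the word lists are made up.

```python
SUFFIXES = ("ing", "s")  # hypothetical suffix inventory for this sketch

def segment(word):
    """Split off one known suffix if the remaining stem has 3+ letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]

def segment_all(words):
    return [morph for word in words for morph in segment(word)]

def oov_rate(train_tokens, test_tokens):
    """Fraction of test tokens that never occur in the training tokens."""
    vocabulary = set(train_tokens)
    unseen = sum(token not in vocabulary for token in test_tokens)
    return unseen / len(test_tokens)

TRAIN = ["eat", "dinner", "walking"]
TEST = ["eating", "dinners", "walk"]

print("words:    ", oov_rate(TRAIN, TEST))
print("morphemes:", oov_rate(segment_all(TRAIN), segment_all(TEST)))
```

On whole words every test token is unseen, while on morphemes only `+s` is. The effect on real data depends on the segmenter and the corpus.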
19. PROPOSED WORK
To study and experiment with the effect of morphological
segmentation and de-segmentation on phrase-based
statistical machine translation
Before evaluation
Before decoding
Before phrase extraction
Implement on English to Konkani and Hindi to
Konkani translation systems.
Evaluate with BLEU and METEOR
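For the BLEU part of the evaluation, a bare-bones corpus-level score can be sketched as below. It assumes one reference per sentence, whitespace-tokenised input and no smoothing, so its numbers will not match an established scoring tool exactly; the function names are illustrative. METEOR is not covered.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Unsmoothed corpus BLEU (0-100) with one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
            # Clipped counts: an n-gram is credited at most as often as it
            # occurs in the reference.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
            totals[n - 1] += sum(hyp_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)

reference = "the children are eating dinner".split()
print(corpus_bleu([reference], [reference]))
print(corpus_bleu(["the children eat".split()], [reference]))
```

Scores are only comparable across systems when they are computed on de-segmented words, because a morpheme-level output has different token counts and n-gram statistics.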
20. CURRENT STATUS
Became familiar with the basics of Moses
Built a baseline system with the sample corpus, as
suggested on the Moses website
Built a basic English-Hindi translation system
using parallel data available online; BLEU score of
only 5.31.
Built a Hindi-to-Konkani translation system on 3k
sentences from ILCI; BLEU score of 27.3
21. NEXT..
Obtain the parallel data that exists only as text
files in a non-Unicode encoding.
Align the data.
Identify the Named Entities.
Morphological Segmentation for Konkani.
Morphological De-segmentation for Konkani
Test the improvement in BLEU score.
22. REFERENCES
“Lattice De-segmentation for Statistical Machine
Translation” by Mohammad Salameh, Colin Cherry,
Grzegorz Kondrak; ACL 2014.
“Generalizing Word Lattice Translation” by
Christopher Dyer, Smaranda Muresan, Philip
Resnik; ACL 2008.