This slideshow presents my M.Tech thesis research in Machine Translation. I developed an English-Konkani Machine Translation system that uses several preprocessing and postprocessing steps to improve translation quality.
4. Overview of Machine Translation
What is Machine Translation?
Types of Machine Translation
Rule-Based Machine Translation (RBMT)
Statistical Machine Translation (SMT)
Existing Machine Translation Tools: Anglabharati, Anubharati,
Anusaaraka, Mantra, MaTra, Shiva and Shakti, Anuvaadak, Sampark
etc.
What is Phrase Based Statistical Machine Translation?
Sunayana Gawde Machine Translation June 29, 2016 4 / 23
5. Challenges faced by English-IL (Indian Language) Machine Translation
Word order mismatch
Richer morphology in Indian languages
Limited parallel corpora
7. Source Side Reordering
English: Subject-Verb-Object
Konkani: Subject-Object-Verb
The English sentence is reordered into Subject-Object-Verb order.
An English parse tree is built using a dependency parser; after the
transformations are applied, the leaves are read off to form the
reordered English sentence.
Source reordering in Indic NLP Library
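As a rough illustration of the reordering idea above, the following sketch reads the words off a hand-built dependency tree in verb-final order. The `Node` class, the `nsubj` relation label, and the single rule used here (subject first, other dependents next, verb last) are assumptions made for this example; they are not the rules used in the thesis or in the Indic NLP Library.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    word: str
    pos: str          # coarse part of speech, e.g. "VERB" or "NOUN"
    rel: str          # dependency relation to the head, e.g. "nsubj"
    idx: int          # position of the word in the original sentence
    children: List["Node"] = field(default_factory=list)


def linearize_sov(node: Node) -> List[str]:
    """Read off the words of a subtree, placing each verb after its dependents."""
    words: List[str] = []
    if node.pos == "VERB":
        # Toy rule: subjects first, then all other dependents, then the verb.
        subjects = sorted((c for c in node.children if c.rel == "nsubj"), key=lambda c: c.idx)
        others = sorted((c for c in node.children if c.rel != "nsubj"), key=lambda c: c.idx)
        for child in subjects + others:
            words.extend(linearize_sov(child))
        words.append(node.word)
        return words
    # Non-verbal heads keep the original English order.
    for item in sorted(node.children + [node], key=lambda n: n.idx):
        if item is node:
            words.append(node.word)
        else:
            words.extend(linearize_sov(item))
    return words


# "Ram eats a mango" (Subject-Verb-Object)
mango = Node("mango", "NOUN", "obj", 3, [Node("a", "DET", "det", 2)])
root = Node("eats", "VERB", "root", 1, [Node("Ram", "NOUN", "nsubj", 0), mango])
print(" ".join(linearize_sov(root)))
```

A real system would obtain the tree from a dependency parser and apply a larger set of transformation rules; the single rule here only shows how reading off the leaves after a transformation yields the reordered sentence.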
8. Morphological Segmentation
Morphological segmentation is the process of splitting words into
their constituent morphemes.
A sparsity reduction technique for morphologically rich languages
Morphemes are the smallest units of language that carry meaning.
flower+s, run+ing, person+s, clean+li+ness
Source/Target side Morphological Segmentation
Morfessor
Word Segmentation in Indic NLP Library
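To make the idea of segmentation concrete, here is a minimal sketch of a greedy suffix splitter. It is not Morfessor, which learns segmentations from data without supervision; the suffix list `SUFFIXES` and the `min_stem` threshold are hypothetical and were chosen only to cover the English examples on this slide.

```python
from typing import List

# Hypothetical suffix inventory, chosen only to cover the slide's examples.
SUFFIXES = ("ness", "ing", "li", "s")


def segment(word: str, min_stem: int = 3) -> List[str]:
    """Repeatedly strip the longest matching suffix, keeping a stem of at least min_stem letters."""
    stem = word
    suffixes: List[str] = []
    changed = True
    while changed:
        changed = False
        for suffix in sorted(SUFFIXES, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= min_stem:
                suffixes.insert(0, suffix)       # keep suffixes in surface order
                stem = stem[: -len(suffix)]
                changed = True
                break
    return [stem] + suffixes


for w in ("flowers", "persons", "cleanliness"):
    print("+".join(segment(w)))
```

Splitting words this way means that a stem and its suffixes are counted separately, so rare inflected forms share statistics with their more frequent stems, which is the sparsity reduction the slide refers to.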
9. Transliteration
Transliteration is the conversion of text from one script to another.
Script conversion for out-of-vocabulary (OOV) words.
BrahmiNet for 18 languages (13 Indo-Aryan, 4 Dravidian and English)
Konkanverter for script conversion among Konkani scripts
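One simple form of script conversion between Brahmi-derived scripts relies on the fact that Unicode lays out their blocks in parallel, so a character can often be mapped by shifting its code point. The sketch below converts Devanagari to Kannada this way. It is only an illustration of the idea and does not describe how BrahmiNet or Konkanverter work internally; the function name is hypothetical, and some Devanagari code points have no assigned Kannada counterpart.

```python
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F
KANNADA_START = 0x0C80
# Danda and double danda are shared across Indic scripts, so they are left as-is.
SHARED_PUNCTUATION = {0x0964, 0x0965}


def devanagari_to_kannada(text: str) -> str:
    """Shift Devanagari code points into the Kannada block; pass everything else through."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp <= DEVANAGARI_END and cp not in SHARED_PUNCTUATION:
            out.append(chr(cp - DEVANAGARI_START + KANNADA_START))
        else:
            out.append(ch)
    return "".join(out)


# U+0915 U+092E U+0932 (Devanagari ka, ma, la)
print(devanagari_to_kannada("\u0915\u092e\u0932"))
```

In a translation pipeline, a conversion like this would be applied only to words the decoder left untranslated, so that they at least appear in the target script.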
10. Pivoting
Pivoting takes advantage of a third language and its available resources
to train the SMT system, which improves performance.
Transfer Method or Sentence Translation
Corpus Synthesis
Table Induction or Phrase table Triangulation
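Phrase table triangulation is commonly formulated as marginalizing over pivot phrases: the source-target probability is the sum, over all pivot phrases, of the source-pivot probability times the pivot-target probability. The sketch below shows that computation on nested dictionaries. The `PhraseTable` layout and the placeholder phrases (`s1`, `p1`, `t1`, ...) are assumptions for illustration, not the format of an actual Moses phrase table.

```python
from collections import defaultdict
from typing import Dict

# table[x][y] holds p(y | x)
PhraseTable = Dict[str, Dict[str, float]]


def triangulate(src_pivot: PhraseTable, pivot_tgt: PhraseTable) -> PhraseTable:
    """Induce p(tgt | src) = sum over pivot of p(pivot | src) * p(tgt | pivot)."""
    result = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_tgt.get(pivot, {}).items():
                result[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in result.items()}


# Placeholder phrases: s* = English, p* = pivot (e.g. Hindi), t* = Konkani.
english_pivot = {"s1": {"p1": 0.8, "p2": 0.2}}
pivot_konkani = {"p1": {"t1": 0.5, "t2": 0.5}, "p2": {"t1": 1.0}}
print(triangulate(english_pivot, pivot_konkani))
```

Only phrase pairs that share a pivot phrase survive, so the induced table is typically combined with a direct source-target table rather than used alone.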
11. System Combination Techniques
Phrase table Triangulation
Linear Interpolation
Fill-up Interpolation
Ensemble Encoding
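The two interpolation schemes listed above can be contrasted with a small sketch. Linear interpolation takes a weighted sum of the probabilities from each table, whereas fill-up keeps the entry from the highest-priority table and uses later tables only for phrase pairs that are missing. The `Table` type, the weights, and the placeholder phrase pairs are assumptions for illustration; a full fill-up implementation would also add a feature marking which table each entry came from, which is omitted here.

```python
from typing import Dict, List, Tuple

# Maps (source phrase, target phrase) to a translation probability.
Table = Dict[Tuple[str, str], float]


def linear_interpolation(tables: List[Table], weights: List[float]) -> Table:
    """Weighted sum of probabilities; a pair missing from a table counts as 0 there."""
    if len(tables) != len(weights):
        raise ValueError("one weight is needed per table")
    combined: Table = {}
    for table, weight in zip(tables, weights):
        for pair, prob in table.items():
            combined[pair] = combined.get(pair, 0.0) + weight * prob
    return combined


def fill_up(tables: List[Table]) -> Table:
    """Tables are given in priority order; later tables only fill in missing pairs."""
    combined: Table = {}
    for table in tables:
        for pair, prob in table.items():
            combined.setdefault(pair, prob)
    return combined


direct = {("s", "t1"): 0.6}
triangulated = {("s", "t1"): 0.2, ("s", "t2"): 0.8}
print(linear_interpolation([direct, triangulated], [0.7, 0.3]))
print(fill_up([direct, triangulated]))
```

With linear interpolation the weights decide how much each table is trusted; with fill-up the ordering of the tables plays that role.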
12. Motivation
Relevant work on Konkani MT or the above techniques:
Sata-Anuvaadak: Tackling Multiway Translation of Indian Languages by
Kunchukuttan et al., LREC 2014
Source Side Reordering and Transliteration
BLEU = 13.01
IIT Bombay SMT system for ICON 2014 tool contest by Kunchukuttan
et al.
Source side Reordering and transliteration
Source side word segmentation for IL-Hin (Not for Konkani)
No single system combines source-side reordering, transliteration
and morphological segmentation with pivoting.
13. Proposed Approach
Source Side Reordering for English
Morphological Segmentation for languages which are morphologically
rich
Pivoting with Hindi and Marathi as pivot languages
Transliteration as a post-processing step
An ensemble encoding technique is used to combine the systems:
for each input, the translation with the highest probability among
the systems' outputs is chosen.
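The selection step described in the last bullet can be sketched as a simple argmax over the candidate translations. This mirrors only the description on the slide, not the internals of any decoder; the function name, the system names, and the scores are hypothetical, and the scores are assumed to be log-probabilities that are comparable across systems.

```python
from typing import Dict, Tuple


def pick_best(candidates: Dict[str, Tuple[str, float]]) -> Tuple[str, str, float]:
    """candidates maps a system name to (translation, log-probability)."""
    best_system = max(candidates, key=lambda name: candidates[name][1])
    translation, score = candidates[best_system]
    return best_system, translation, score


# Hypothetical outputs for one input sentence.
outputs = {
    "direct": ("translation A", -12.4),
    "hindi_pivot": ("translation B", -10.9),
    "marathi_pivot": ("translation C", -11.7),
}
print(pick_best(outputs))
```

Scores from separately trained systems are not automatically on the same scale, so some form of normalization is usually needed before comparing them.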
15. Experimental Setup
Linear Interpolation:
  Direct English to Konkani Baseline system
  Source Reordered English to Konkani system
  Hindi Triangulated System
    Source Reordered English-Hindi System
    Hindi-Konkani Baseline System
  Marathi Triangulated System
    Source Reordered English-Marathi System
    Marathi-Konkani Baseline System
Transliteration using Brahmi-Net
21. Conclusion and Future Scope
By applying phrase table triangulation to source-reordered models,
followed by transliteration, using parallel corpora of English,
Konkani, Hindi and Marathi, we obtain an improved BLEU score of 17.57.
Developing a word sense disambiguation (WSD) engine for Konkani would
help English-Konkani Machine Translation.
Developing a domain-specific Machine Translation system
22. References
1 R. Dabre, F. Cromieres, S. Kurohashi, and P. Bhattacharyya,
"Leveraging Small Multilingual Corpora for SMT Using Many Pivot
Languages," NAACL 2014, 2014.
2 A. Vasiļjevs, R. Kalniņš, M. Pinnis, and R. Skadiņš, "Machine
translation for e-Government: the Baltic case."
3 A. Lopez, "Statistical machine translation," ACM Computing Surveys,
vol. 40, no. 3, pp. 1-49, Aug. 2008.
4 A. Kunchukuttan and P. Bhattacharyya, "Tackling Multiway
Translation of Indian Languages."