Hybrid Approach to English-Konkani Machine Translation

Sunayana Gawde
M.Tech, Dept. of Computer Science and Technology
Goa University
sunayanagawde17@gmail.com
June 29, 2016

Overview
1 Overview of Machine Translation
2 Challenges faced by English-IL Machine Translation
3 Quality Enhancement Techniques
4 System Combination Techniques
5 Proposed Approach
6 Experimental Setup
7 Results
8 Conclusion

Overview of Machine Translation
What is Machine Translation?
Types of Machine Translation
RBMT
SMT
Existing Machine Translation Tools: Anglabharati, Anubharati,
Anusaaraka, Mantra, MaTra, Shiva and Shakti, Anuvaadak, Sampark
etc.
What is Phrase Based Statistical Machine Translation?

Challenges faced by English-IL Machine Translation
Word order mismatch
Richer Morphology in IL
Limited availability of parallel corpora

Quality Enhancement Techniques
Pre-processing steps
Source Side Reordering
Morphological Segmentation
Post-processing step
Transliteration

Source Side Reordering
English: Subject-Verb-Object
Konkani: Subject-Object-Verb
The English sentence is reordered into Subject-Object-Verb order
An English parse tree is built using a dependency parser, and the
leaves are read off after performing transformations to form a
reordered English sentence.
Source reordering in Indic NLP Library
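
The reordering step can be illustrated with a minimal sketch. It assumes a hand-built dependency tree and a single toy rule (a verb is emitted after all of its dependents); the `Node` class and `read_off` function are hypothetical names for illustration, not the Indic NLP Library API, and real rule sets also handle prepositions, auxiliaries and clauses.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One word of a dependency tree (hypothetical toy structure)."""
    word: str
    is_verb: bool = False
    left: List["Node"] = field(default_factory=list)   # dependents before the head in English
    right: List["Node"] = field(default_factory=list)  # dependents after the head in English


def read_off(node: Node) -> List[str]:
    """Read the leaves off the tree, moving each verb behind its dependents."""
    words: List[str] = []
    for child in node.left:
        words += read_off(child)
    if node.is_verb:
        # Toy SVO -> SOV transformation: right dependents first, then the verb.
        for child in node.right:
            words += read_off(child)
        words.append(node.word)
    else:
        # Non-verb heads keep their English order.
        words.append(node.word)
        for child in node.right:
            words += read_off(child)
    return words


# "John eats an apple": eats is the root, John its subject, apple its object.
tree = Node("eats", is_verb=True,
            left=[Node("John")],
            right=[Node("apple", left=[Node("an")])])
reordered = " ".join(read_off(tree))
print(reordered)
```

The reordered English sentence, not the original one, is then used as the source side of the training and test data.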

Morphological Segmentation
Morphological Segmentation is the process of splitting words into
their constituent morphemes.
Sparsity Reduction Technique for morphologically rich languages
Morphemes are the smallest units of language that carry meaning.
flower+s, run+ing, person+s, clean+li+ness
Source/Target side Morphological Segmentation
Morfessor
Word Segmentation in Indic NLP Library
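
As a rough illustration of what segmentation produces, the sketch below greedily strips suffixes from a small hand-written list. The suffix list, the `segment` function and the `min_stem` threshold are assumptions made for this example; Morfessor instead learns segmentations from data without supervision, so this is not how it works internally.

```python
# Hypothetical toy suffix list, chosen only to cover the examples on this slide.
SUFFIXES = ["ness", "ing", "li", "s"]


def segment(word, suffixes=SUFFIXES, min_stem=3):
    """Greedily peel known suffixes off the end of a word.

    Returns the stem followed by the suffixes in surface order.
    """
    ordered = sorted(suffixes, key=len, reverse=True)  # try longer suffixes first
    morphs = []
    stem = word
    changed = True
    while changed:
        changed = False
        for suf in ordered:
            # Keep at least `min_stem` characters so short words stay whole.
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                morphs.insert(0, suf)
                stem = stem[:-len(suf)]
                changed = True
                break
    return [stem] + morphs


for w in ["flowers", "persons", "cleanliness"]:
    print("+".join(segment(w)))
```

A rule list like this over-segments easily (for example, any word ending in "s"), which is one reason data-driven segmenters are preferred in practice.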

Transliteration
Transliteration is a transformation of text from one script to another
Script conversion for OOV words.
BrahmiNet for 18 languages (13 Indo-Aryan, 4 Dravidian and English)
Konkanverter for script conversion among Konkani scripts
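
For Brahmi-derived scripts, a simple form of script conversion is possible because their Unicode blocks share a parallel layout. The sketch below shifts code points between the Devanagari and Kannada blocks, two of the scripts used for Konkani. It is a rule-based illustration only: `convert_script` is a hypothetical helper, not the BrahmiNet or Konkanverter interface, and some code points have no counterpart in the target block.

```python
DEVANAGARI_START = 0x0900  # first code point of the Devanagari Unicode block
KANNADA_START = 0x0C80     # first code point of the Kannada Unicode block
BLOCK_SIZE = 0x80          # each of these blocks spans 128 code points


def convert_script(text, src_start, tgt_start):
    """Map each character to the same offset in the target script's block.

    Characters outside the source block (spaces, Latin letters, digits)
    are copied unchanged.
    """
    out = []
    for ch in text:
        offset = ord(ch) - src_start
        if 0 <= offset < BLOCK_SIZE:
            out.append(chr(tgt_start + offset))
        else:
            out.append(ch)
    return "".join(out)


word_dev = "\u0917\u094b\u0935\u093e"  # Devanagari spelling of "Goa"
word_kan = convert_script(word_dev, DEVANAGARI_START, KANNADA_START)
print(word_dev, "->", word_kan)
```

Converting to and from the Roman script cannot be done by offset shifting and needs a mapping table or a trained transliteration model.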

Pivoting
Pivoting takes advantage of a third language and its available
resources to train the SMT system, which can improve performance
when direct parallel data is scarce.
Transfer Method or Sentence Translation
Corpus Synthesis
Table Induction or Phrase table Triangulation
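
Phrase table triangulation is commonly formulated as p(k|e) = sum over pivot phrases h of p(h|e) * p(k|h), assuming the target phrase depends on the source phrase only through the pivot. The sketch below applies that formula to two toy tables; the phrase labels (`e1`, `h1`, `k1`, ...) and probabilities are made up for illustration, and real phrase tables carry several more features per entry.

```python
from collections import defaultdict


def triangulate(src_to_pivot, pivot_to_tgt):
    """Induce a source-target table by marginalising over pivot phrases.

    Both inputs map a phrase to {translation: probability}.
    """
    table = defaultdict(lambda: defaultdict(float))
    for src, pivots in src_to_pivot.items():
        for pivot, p_pivot_given_src in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for tgt, p_tgt_given_pivot in pivot_to_tgt.get(pivot, {}).items():
                table[src][tgt] += p_pivot_given_src * p_tgt_given_pivot
    return {src: dict(tgts) for src, tgts in table.items()}


# Made-up English-Hindi and Hindi-Konkani entries (placeholders, not real data).
en_hi = {"e1": {"h1": 0.5, "h2": 0.5}}
hi_kok = {"h1": {"k1": 1.0}, "h2": {"k1": 0.5, "k2": 0.5}}

en_kok = triangulate(en_hi, hi_kok)
print(en_kok)
```

Here `k1` is reachable through both pivot phrases, so its probability accumulates the two paths.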

System Combination Techniques
Phrase table Triangulation
Linear Interpolation
Fill-up Interpolation
Ensemble Encoding
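
The two interpolation schemes differ in how they treat phrase pairs that appear in more than one table. The sketch below shows both under simplifying assumptions: each table is a plain dictionary from a (source, target) pair to one probability, and the function names, weights and entries are illustrative rather than taken from any toolkit.

```python
def linear_interpolate(tables, weights):
    """Weighted sum of the probabilities each table assigns to a phrase pair.

    A pair missing from a table contributes zero for that table.
    """
    if len(tables) != len(weights):
        raise ValueError("need exactly one weight per table")
    merged = {}
    for table, weight in zip(tables, weights):
        for pair, prob in table.items():
            merged[pair] = merged.get(pair, 0.0) + weight * prob
    return merged


def fill_up(primary, *backoffs):
    """Keep every primary entry; add a backoff entry only if the pair is new."""
    merged = dict(primary)
    for table in backoffs:
        for pair, prob in table.items():
            merged.setdefault(pair, prob)
    return merged


# Made-up entries: a direct table and a triangulated one (placeholders).
direct = {("e1", "k1"): 1.0}
triangulated = {("e1", "k1"): 0.5, ("e1", "k2"): 0.5}

print(linear_interpolate([direct, triangulated], [0.5, 0.5]))
print(fill_up(direct, triangulated))
```

Linear interpolation blends the scores of shared pairs, while fill-up trusts the primary table and uses the others only to add coverage, so its output is not renormalised here.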

Motivation
Relevant work on Konkani MT or the above techniques:
Sata-Anuvaadak: Tackling Multiway Translation of Indian Languages,
Kunchukuttan et al., LREC 2014
Source Side Reordering and Transliteration
BLEU = 13.01
IIT Bombay SMT system for ICON 2014 tool contest by Kunchukuttan
et al.
Source side Reordering and transliteration
Source side word segmentation for IL-Hin (Not for Konkani)
No single existing system combines Source Side Reordering,
Transliteration and Morphological Segmentation with Pivoting.

Proposed Approach
Source Side Reordering for English
Morphological Segmentation for languages which are morphologically
rich
Pivoting with Hindi and Marathi as pivot languages
Transliteration as post-processing step
The ensemble encoding technique is used to combine the various
systems: for each input, the translation with the highest
probability among the systems is chosen.
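
The selection step described above can be sketched as follows. This is a minimal illustration that assumes each system returns one hypothesis with a log-probability; the `Hypothesis` type, system names and scores are hypothetical, and scores from separately trained systems may need calibration before they are comparable.

```python
from typing import List, NamedTuple


class Hypothesis(NamedTuple):
    system: str       # which system produced the translation
    translation: str  # the output sentence
    log_prob: float   # model score; higher (closer to zero) is better


def pick_best(hypotheses: List[Hypothesis]) -> Hypothesis:
    """Return the hypothesis with the highest score across all systems."""
    if not hypotheses:
        raise ValueError("no hypotheses to choose from")
    return max(hypotheses, key=lambda h: h.log_prob)


# Placeholder outputs for one input sentence (made-up strings and scores).
candidates = [
    Hypothesis("direct", "hyp-A", -4.2),
    Hypothesis("hindi_pivot", "hyp-B", -3.1),
    Hypothesis("marathi_pivot", "hyp-C", -3.8),
]
best = pick_best(candidates)
print(best.system, best.translation)
```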

System Architecture

Experimental Setup
Linear Interpolation:
Direct English to Konkani Baseline system
Source Reordered English to Konkani system
Hindi Triangulated System
Source Reordered English-Hindi System
Hindi-Konkani Baseline System
Marathi Triangulated System
Source Reordered English-Marathi System
Marathi-Konkani Baseline System
Transliteration using Brahmi-Net

Results (1/5)

Results (2/5)

Results (3/5)

Results (4/5)

Results (5/5)

Conclusion and Future Scope
By applying Phrase Table Triangulation to Source Reordered models,
together with Transliteration, on parallel corpora of English,
Konkani, Hindi and Marathi, we obtain an improved BLEU score of
17.57.
Developing a WSD engine for Konkani will help English-Konkani
Machine Translation.
Developing a domain-specific Machine Translation System

References
1 R. Dabre, F. Cromieres, S. Kurohashi, and P. Bhattacharyya,
"Leveraging Small Multilingual Corpora for SMT Using Many Pivot
Languages," NAACL 2014.
2 A. Vasiļjevs, R. Kalniņš, M. Pinnis, and R. Skadiņš, "Machine
translation for e-Government - the Baltic case."
3 A. Lopez, "Statistical machine translation," ACM Computing Surveys,
vol. 40, no. 3, pp. 1-49, Aug. 2008.
4 A. Kunchukuttan and P. Bhattacharyya, "Tackling Multiway
Translation of Indian Languages."

Thank You
