Hybrid Approach to English-Konkani Machine Translation
Sunayana Gawde
M.Tech, Dept. of Computer Science and Technology
Goa University
sunayanagawde17@gmail.com
June 29, 2016
Sunayana Gawde Machine Translation June 29, 2016 1 / 23
Overview
1 Overview of Machine Translation
2 Challenges faced by English-IL Machine Translation
3 Quality Enhancement Techniques
4 System Combination Techniques
5 Proposed Approach
6 Experimental Setup
7 Results
8 Conclusion
Overview of Machine Translation
What is Machine Translation?
Types of Machine Translation
RBMT: Rule-Based Machine Translation
SMT: Statistical Machine Translation
Existing Machine Translation Tools: Anglabharati, Anubharati,
Anusaaraka, Mantra, MaTra, Shiva and Shakti, Anuvaadak, Sampark
etc.
What is Phrase-Based Statistical Machine Translation?
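To make "phrase-based SMT" concrete, the sketch below segments a source sentence into phrases found in a phrase table and keeps the segmentation with the highest summed log-probability, using dynamic programming. It is a deliberately reduced, hypothetical decoder: monotone (no reordering), no language model, one score per phrase pair, and the table TOY_TABLE and function monotone_decode are invented for illustration; the target strings are placeholders, not Konkani.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical toy phrase table: source phrase (tuple of words) -> list of
# (target phrase, log P(target | source)). Targets are placeholders, not Konkani.
PhraseTable = Dict[Tuple[str, ...], List[Tuple[str, float]]]

TOY_TABLE: PhraseTable = {
    ("the",): [("T_the", math.log(0.6))],
    ("big",): [("T_big", math.log(0.5))],
    ("house",): [("T_house", math.log(0.7))],
    ("big", "house"): [("T_big_house", math.log(0.6))],
}


def monotone_decode(words: List[str], table: PhraseTable) -> Tuple[str, float]:
    """Find the best left-to-right phrase segmentation by dynamic programming.

    Only translation-model log-probabilities are summed; there is no language
    model and no reordering, so target phrases keep the source order.
    """
    n = len(words)
    best = [-math.inf] * (n + 1)  # best[i]: best score covering words[:i]
    back: List[Tuple[int, str]] = [(-1, "")] * (n + 1)
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):
            if best[j] == -math.inf:
                continue
            for target, logp in table.get(tuple(words[j:i]), []):
                if best[j] + logp > best[i]:
                    best[i] = best[j] + logp
                    back[i] = (j, target)
    if best[n] == -math.inf:
        raise ValueError("sentence cannot be covered by the phrase table")
    # Follow back-pointers from the end to recover the chosen target phrases.
    pieces: List[str] = []
    i = n
    while i > 0:
        j, target = back[i]
        pieces.append(target)
        i = j
    return " ".join(reversed(pieces)), best[n]


print(monotone_decode(["the", "big", "house"], TOY_TABLE)[0])  # T_the T_big_house
```

The two-word phrase wins here because 0.6 × 0.6 = 0.36 exceeds 0.6 × 0.5 × 0.7 = 0.21. Because this search is monotone, SVO English input would yield target phrases in SVO order, which is one reason source-side reordering matters for an SOV language such as Konkani.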
Challenges faced by English-IL Machine Translation
Word order mismatch
Richer morphology in Indian languages (IL)
Scarcity of parallel corpora
Quality Enhancement Techniques
Pre-processing steps
Source Side Reordering
Morphological Segmentation
Post-processing step
Transliteration
Source Side Reordering
English: Subject-Verb-Object
Konkani: Subject-Object-Verb
The English sentence is reordered into Subject-Object-Verb order
An English parse tree is built using a dependency parser; after
transformations are applied to the tree, its leaves are read off to
form the reordered English sentence.
Source reordering with the Indic NLP Library
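The tree-transform-then-read-leaves idea can be sketched on a hand-built dependency tree. This is an illustrative assumption, not the Indic NLP Library's rule set: the Node class, the reorder function and the single rule "move an object dependent in front of its head" are hypothetical, and no parser is called (the tree for "Ram eats an apple" is written by hand).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A word in a (hand-built) dependency tree."""
    word: str
    index: int                 # position in the original English sentence
    deprel: str = "root"       # label of the arc to the head, e.g. "nsubj", "obj"
    children: List["Node"] = field(default_factory=list)


def reorder(node: Node) -> List[str]:
    """Linearise the subtree, moving object dependents in front of their head.

    Hypothetical single rule (V O -> O V); a real reordering system applies
    many more transformations (e.g. prepositions -> postpositions).
    """
    left = [c for c in node.children if c.index < node.index]
    right = [c for c in node.children if c.index > node.index]
    moved = [c for c in right if c.deprel == "obj"]
    right = [c for c in right if c.deprel != "obj"]
    words: List[str] = []
    for child in sorted(left, key=lambda c: c.index) + sorted(moved, key=lambda c: c.index):
        words.extend(reorder(child))
    words.append(node.word)
    for child in sorted(right, key=lambda c: c.index):
        words.extend(reorder(child))
    return words


# "Ram eats an apple" (SVO), with the tree written by hand instead of parsed.
apple = Node("apple", 3, "obj", [Node("an", 2, "det")])
root = Node("eats", 1, "root", [Node("Ram", 0, "nsubj"), apple])
print(" ".join(reorder(root)))  # Ram an apple eats
```

Reading the leaves after the transformation gives the SOV-ordered English "Ram an apple eats", which is closer to Konkani word order before training the SMT system.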
Morphological Segmentation
Morphological segmentation is the process of splitting words into
their constituent morphemes.
A sparsity reduction technique for morphologically rich languages
Morphemes are the smallest meaningful units of language.
flower+s, run+ing, person+s, clean+li+ness
Source/Target side Morphological Segmentation
Morfessor
Word Segmentation in Indic NLP Library
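Once a lexicon of morphs with probabilities is available, a word can be segmented by choosing the split with the highest total log-probability (Viterbi search). The sketch below shows only that decoding step under stated assumptions: the morph probabilities in MORPH_LOGPROB are invented, and Morfessor's unsupervised training, which would learn such a lexicon from raw text, is not reproduced. The function name viterbi_segment is hypothetical.

```python
import math
from typing import Dict, List

# Hypothetical morph log-probabilities; Morfessor would learn a lexicon like this from raw text.
MORPH_LOGPROB: Dict[str, float] = {
    "clean": math.log(0.05), "li": math.log(0.02), "ness": math.log(0.03),
    "flower": math.log(0.01), "person": math.log(0.01), "s": math.log(0.10),
}


def viterbi_segment(word: str, morph_logprob: Dict[str, float]) -> List[str]:
    """Split `word` into known morphs, maximising the summed log-probability."""
    n = len(word)
    best = [-math.inf] * (n + 1)  # best[i]: best score for the prefix word[:i]
    start = [0] * (n + 1)         # start[i]: start of the last morph in that prefix
    best[0] = 0.0
    for i in range(1, n + 1):
        for j in range(i):
            morph = word[j:i]
            if best[j] > -math.inf and morph in morph_logprob:
                score = best[j] + morph_logprob[morph]
                if score > best[i]:
                    best[i], start[i] = score, j
    if best[n] == -math.inf:
        return [word]  # cannot be built from known morphs: leave it unsegmented
    morphs: List[str] = []
    i = n
    while i > 0:
        morphs.append(word[start[i]:i])
        i = start[i]
    return morphs[::-1]


print("+".join(viterbi_segment("cleanliness", MORPH_LOGPROB)))  # clean+li+ness
```

Feeding morphs instead of full words to the SMT system lets inflected forms share statistics (e.g. "flowers" and "flower" share the morph "flower"), which is the sparsity reduction the slide refers to.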
Transliteration
Transliteration is the conversion of text from one script to another
Used for script conversion of out-of-vocabulary (OOV) words
Brahmi-Net for 18 languages (13 Indo-Aryan, 4 Dravidian and English)
Konkanverter for script conversion among Konkani scripts
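Konkani is written in several scripts, including Devanagari and Kannada. Because Unicode lays out the major Indic script blocks in parallel, a rough Devanagari-to-Kannada conversion can be done by shifting each code point by the difference between the block starts (U+0900 and U+0C80). This is a simple rule-based sketch, assumed for illustration only: it is not how Konkanverter or Brahmi-Net work (Brahmi-Net's own method is not reproduced here), it ignores letters with no counterpart in the target script, and the function name devanagari_to_kannada is hypothetical.

```python
DEVANAGARI_START = 0x0900  # first code point of the Unicode Devanagari block
KANNADA_START = 0x0C80     # first code point of the Unicode Kannada block
BLOCK_SIZE = 0x80
# Danda and double danda are punctuation shared across Indic scripts; keep them as-is.
SHARED_PUNCTUATION = {0x0964, 0x0965}


def devanagari_to_kannada(text: str) -> str:
    """Map each Devanagari code point to the Kannada code point at the same block offset."""
    out = []
    for ch in text:
        cp = ord(ch)
        if DEVANAGARI_START <= cp < DEVANAGARI_START + BLOCK_SIZE and cp not in SHARED_PUNCTUATION:
            out.append(chr(cp - DEVANAGARI_START + KANNADA_START))
        else:
            out.append(ch)  # non-Devanagari characters (Latin, digits, spaces) pass through
    return "".join(out)


# Devanagari spelling of "Konkani" (KA, vowel sign O, anusvara, KA, NNA, vowel sign II).
print(devanagari_to_kannada("\u0915\u094b\u0902\u0915\u0923\u0940"))
```

A learned transliterator is still needed for OOV English words, since Latin script has no parallel Unicode layout with the Indic blocks.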
Pivoting
Pivoting uses a third (pivot) language and its resources to train the
SMT system, which can improve translation quality when direct
source-target parallel corpora are scarce.
Transfer Method or Sentence Translation
Corpus Synthesis
Table Induction or Phrase table Triangulation
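Phrase-table triangulation induces a source-target table from a source-pivot table and a pivot-target table by marginalising over pivot phrases: p(t|s) = Σ_p p(t|p) · p(p|s). The sketch below applies that formula to a single translation probability per phrase pair; real phrase tables carry several scores (both translation directions and lexical weights), which are left out here. The function name triangulate and the placeholder phrases (P1, P2 for pivot, K1 to K3 for target) are hypothetical.

```python
from collections import defaultdict
from typing import Dict

Table = Dict[str, Dict[str, float]]  # source phrase -> {target phrase: probability}


def triangulate(src_piv: Table, piv_tgt: Table) -> Table:
    """Induce a source-target table via p(t|s) = sum over pivots p of p(t|p) * p(p|s)."""
    src_tgt: Table = {}
    for s, pivots in src_piv.items():
        scores: Dict[str, float] = defaultdict(float)
        for p, p_given_s in pivots.items():
            # Pivot phrases missing from the pivot-target table contribute nothing.
            for t, t_given_p in piv_tgt.get(p, {}).items():
                scores[t] += t_given_p * p_given_s
        if scores:
            src_tgt[s] = dict(scores)
    return src_tgt


# Placeholder tables: English -> pivot (e.g. Hindi) and pivot -> Konkani.
EN_PIV: Table = {"house": {"P1": 0.7, "P2": 0.3}}
PIV_KO: Table = {"P1": {"K1": 0.8, "K2": 0.2}, "P2": {"K1": 0.5, "K3": 0.5}}
print(triangulate(EN_PIV, PIV_KO))
```

Here K1 receives 0.7 × 0.8 + 0.3 × 0.5 = 0.71, K2 receives 0.14 and K3 receives 0.15; the probabilities still sum to 1 because each input distribution does.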
System Combination Techniques
Phrase table Triangulation
Linear Interpolation
Fill-up Interpolation
Ensemble Encoding
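Fill-up interpolation, in the form assumed here, keeps every entry of a primary phrase table and adds entries from a backup table only for phrase pairs the primary table lacks, tagging each entry with a binary provenance feature so the decoder can learn to prefer or penalise filled-up entries. Treating the direct English-Konkani table as primary and a triangulated table as backup is an assumption for illustration; the function name fill_up and the 0.0/1.0 flag encoding are also hypothetical.

```python
from typing import Dict, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)
Scores = Tuple[float, ...]    # feature scores attached to a phrase pair


def fill_up(primary: Dict[PhrasePair, Scores],
            backup: Dict[PhrasePair, Scores]) -> Dict[PhrasePair, Scores]:
    """Keep all primary entries; add backup entries only for pairs the primary lacks.

    A provenance flag is appended to every entry: 0.0 for primary, 1.0 for filled-up.
    """
    merged = {pair: scores + (0.0,) for pair, scores in primary.items()}
    for pair, scores in backup.items():
        if pair not in merged:
            merged[pair] = scores + (1.0,)
    return merged


# Placeholder entries: direct table (primary) and triangulated table (backup).
direct = {("house", "K1"): (0.6,)}
triangulated = {("house", "K1"): (0.9,), ("house", "K2"): (0.3,)}
print(fill_up(direct, triangulated))
```

Unlike linear interpolation, fill-up never changes the primary table's scores; the backup only contributes coverage for phrase pairs the primary table has never seen.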
Motivation
Relevant work on Konkani MT and the above techniques:
Shata-Anuvadak: Tackling Multiway Translation of Indian Languages,
Kunchukuttan et al., LREC 2014
Source-side reordering and transliteration
BLEU = 13.01
IIT Bombay SMT system for the ICON 2014 tool contest, Kunchukuttan
et al.
Source-side reordering and transliteration
Source-side word segmentation for IL-Hindi (not for Konkani)
No single existing system combines source-side reordering,
transliteration and morphological segmentation with pivoting.
Proposed Approach
Source Side Reordering for English
Morphological Segmentation for languages which are morphologically
rich
Pivoting with Hindi and Marathi as pivot languages
Transliteration as a post-processing step
Ensemble encoding combines the individual systems: for each sentence,
the translation with the highest probability among the systems'
outputs is chosen.
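The combination step described above, choosing per sentence the translation with the highest probability across systems, can be sketched as follows. It follows the slide's sentence-level description and is not a phrase-level ensemble decoder; the function name select_best, the system names and the scores are hypothetical. Note that it assumes the systems' scores are comparable, which holds only if they are normalised consistently.

```python
from typing import Dict, List, Tuple

# Per system: one (translation, log-probability) pair per input sentence.
SystemOutputs = Dict[str, List[Tuple[str, float]]]


def select_best(outputs: SystemOutputs) -> List[Tuple[str, str]]:
    """For each sentence, return (system name, translation) with the highest log-probability."""
    systems = list(outputs)
    n = len(outputs[systems[0]])
    if any(len(outputs[s]) != n for s in systems):
        raise ValueError("all systems must translate the same number of sentences")
    chosen = []
    for i in range(n):
        # Ties go to the system listed first.
        name = max(systems, key=lambda s: outputs[s][i][1])
        chosen.append((name, outputs[name][i][0]))
    return chosen


# Placeholder outputs from two systems for a two-sentence test set.
outputs = {
    "direct": [("a1", -5.0), ("a2", -2.0)],
    "hindi_triangulated": [("b1", -3.0), ("b2", -4.0)],
}
print(select_best(outputs))
```

Each sentence can thus come from a different system, so the combined output may outperform every individual system even if none is best on all sentences.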
System Architecture
Experimental Setup
Linear Interpolation of:
  Direct English-Konkani baseline system
  Source-reordered English-Konkani system
  Hindi triangulated system, built from:
    Source-reordered English-Hindi system
    Hindi-Konkani baseline system
  Marathi triangulated system, built from:
    Source-reordered English-Marathi system
    Marathi-Konkani baseline system
Transliteration using Brahmi-Net
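Linear interpolation combines phrase tables as a weighted sum, p(t|s) = Σ_i λ_i · p_i(t|s) with Σ_i λ_i = 1, where a phrase pair absent from table i contributes zero from that table. The sketch below assumes one probability per phrase pair; the function name linear_interpolate, the placeholder entries and the weights are hypothetical (in practice the weights would be tuned, e.g. on a development set, rather than fixed).

```python
from collections import defaultdict
from typing import Dict, List, Tuple

PhrasePair = Tuple[str, str]  # (source phrase, target phrase)


def linear_interpolate(tables: List[Dict[PhrasePair, float]],
                       weights: List[float]) -> Dict[PhrasePair, float]:
    """p(t|s) = sum_i w_i * p_i(t|s); pairs missing from a table contribute 0 from it."""
    if len(tables) != len(weights):
        raise ValueError("need one weight per table")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    combined: Dict[PhrasePair, float] = defaultdict(float)
    for table, w in zip(tables, weights):
        for pair, prob in table.items():
            combined[pair] += w * prob
    return dict(combined)


# Placeholder entries for two of the systems (e.g. direct and Hindi triangulated).
direct_table = {("house", "K1"): 0.8, ("house", "K2"): 0.2}
hindi_tri_table = {("house", "K1"): 0.4, ("house", "K3"): 0.6}
print(linear_interpolate([direct_table, hindi_tri_table], [0.5, 0.5]))
```

With four systems, as in the setup listed above, the same function takes four tables and four weights.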
Results(1/5)
Results(2/5)
Results(3/5)
Results(4/5)
Results(5/5)
Conclusion and Future Scope
By applying phrase-table triangulation to the source-reordered models,
together with transliteration, on the parallel corpora of English,
Konkani, Hindi and Marathi, we obtain an improved BLEU score of 17.57.
Developing a word sense disambiguation (WSD) engine for Konkani would
help English-Konkani machine translation.
Developing a domain-specific machine translation system
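BLEU, the metric behind the scores above, is the geometric mean of clipped n-gram precisions (n = 1 to 4) multiplied by a brevity penalty. The sketch below is a single-sentence, single-reference variant on a 0-1 scale with no smoothing; reported scores such as 13.01 and 17.57 are conventionally on a 0-100 scale and computed at corpus level, by summing clipped matches, n-gram totals and lengths over all test sentences before combining. The function names are hypothetical, and the exact BLEU tool used for the reported numbers is not stated here.

```python
import math
from collections import Counter
from typing import List


def ngram_counts(tokens: List[str], n: int) -> Counter:
    """Count the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hyp: List[str], ref: List[str], max_n: int = 4) -> float:
    """Single-reference BLEU in [0, 1]: geometric mean of clipped n-gram
    precisions (uniform weights) times a brevity penalty. No smoothing, so any
    n-gram order with zero matches gives 0.0."""
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each hypothesis n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalise hypotheses shorter than the reference.
    c, r = len(hyp), len(ref)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("a b c d".split(), "a b c d e".split()))
```

In that example every n-gram matches, so the score is just the brevity penalty exp(1 - 5/4), showing how a too-short translation is penalised even when all its words are correct.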
References
1 R. Dabre, F. Cromieres, S. Kurohashi, and P. Bhattacharyya,
"Leveraging Small Multilingual Corpora for SMT Using Many Pivot
Languages," NAACL 2015.
2 A. Vasiļjevs, R. Kalniņš, M. Pinnis, and R. Skadiņš, "Machine
Translation for e-Government: the Baltic Case."
3 A. Lopez, "Statistical Machine Translation," ACM Computing Surveys,
vol. 40, no. 3, pp. 1-49, Aug. 2008.
4 A. Kunchukuttan and P. Bhattacharyya, "Tackling Multiway
Translation of Indian Languages," LREC 2014.
Thank You