EFFECT OF MORPHOLOGICAL SEGMENTATION & DE-SEGMENTATION ON MACHINE TRANSLATION
Sunayana R. Gawde
14109, M.Tech Part II
RECAP
• "Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation" by Bhattacharya et al.; ACL 2014.
• "Statistical Machine Translation into a Morphologically Complex Language" by Oflazer et al.; CICLing 2008.
• "Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction" by Sarkar et al.; ACL 2011.
MORPHOLOGICAL DE-SEGMENTATION
• De-segmentation is the process of converting segmented words (sequences of morphemes) back into their original surface form.
• Methods: concatenation, rules, or table look-up
• Segmentation is a data-sparsity reduction technique
• Examples of segmented forms: eat+ing (eating), dinner+s (dinners)
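The following toy example uses the slide's eat+ing / dinner+s forms. It shows segmentation followed by de-segmentation through concatenation. The two-entry suffix list and the names `segment` and `desegment` are hypothetical stand-ins for a real segmenter such as Morfessor or a morphological analyser. The example makes only two points: segmentation shrinks the vocabulary, and concatenation restores the surface forms.

```python
# Hypothetical toy suffix list; a real system would use Morfessor or a
# morphological analyser instead.
SUFFIXES = ("ing", "s")

def segment(word):
    """Split off one known suffix, marking it with a leading '+' (eat+ing)."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]

def desegment(morphemes):
    """De-segment by concatenation: glue each '+'-marked suffix onto the previous token."""
    words = []
    for m in morphemes:
        if m.startswith("+") and words:
            words[-1] += m[1:]
        else:
            words.append(m)
    return words

corpus = "eat eats eating dinner dinners".split()
morphemes = [m for w in corpus for m in segment(w)]
print(len(set(corpus)), len(set(morphemes)))  # 5 4  (word types vs. morpheme types)
print(desegment(morphemes) == corpus)         # True
```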
LATTICE DE-SEGMENTATION FOR STATISTICAL MACHINE TRANSLATION
• By Mohammad Salameh, Colin Cherry, Grzegorz Kondrak
• Published in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL 2014)
• English-to-Arabic and English-to-Finnish translation
LATTICE
• A word lattice G = (V, E) is a directed acyclic graph that is, formally, a weighted finite-state automaton (FSA).
• Exactly one node has no outgoing edges; it is called the end node.
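The two lattice properties can be checked directly. This is a minimal sketch that stores the lattice as a hypothetical dict mapping each node to its outgoing (next node, label, weight) edges. It finds the unique end node and counts start-to-end paths with per-node memoization. The memoization shows how a small graph can encode a number of paths that grows exponentially with its size. The names `LATTICE`, `end_node` and `count_paths` are illustrative only.

```python
# Hypothetical toy lattice: node -> list of (next_node, label, weight).
LATTICE = {
    0: [(1, "eat", 0.6), (2, "eating", 0.4)],
    1: [(2, "+ing", 1.0)],
    2: [(3, "dinner", 0.5), (4, "dinners", 0.5)],
    3: [(4, "+s", 1.0)],
    4: [],
}

def end_node(lattice):
    """Return the unique node with no outgoing edges."""
    sinks = [v for v, out in lattice.items() if not out]
    if len(sinks) != 1:
        raise ValueError(f"expected exactly one end node, found {sinks}")
    return sinks[0]

def count_paths(lattice, start):
    """Count start-to-end paths, memoizing per node and rejecting cycles."""
    end = end_node(lattice)
    memo, on_stack = {}, set()

    def paths_from(v):
        if v in on_stack:
            raise ValueError("cycle found: a word lattice must be acyclic")
        if v not in memo:
            on_stack.add(v)
            memo[v] = 1 if v == end else sum(paths_from(w) for w, _, _ in lattice[v])
            on_stack.discard(v)
        return memo[v]

    return paths_from(start)

print(count_paths(LATTICE, 0))  # 4
```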
EXAMPLES:
GENERALISING WORD LATTICE TRANSLATION
• By Christopher Dyer, Smaranda Muresan, Philip Resnik
• In Proceedings of the Association for Computational Linguistics (ACL 2008)
• Chinese-to-English and Arabic-to-English translation
THE CHART-REPRESENTATION OF THE GRAPH
WORD LATTICE DECODING
• Two classes of translation models for lattice translation:
  • Phrase-based models implemented as finite-state transducers (FSTs)
  • Hierarchical phrase-based models with a synchronous CFG (SCFG) based decoder
LATTICE TRANSLATION WITH FST-BASED PHRASE-BASED MODELS
• Phrase-based models
• Split the sentence into phrases and translate each phrase
• Choose a path through the lattice
• The Moses phrase-based decoder can translate word lattices
• The lattice is processed left to right
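Left-to-right lattice input needs the nodes in topological order. The following sketch produces that order with Kahn's algorithm and serialises a toy lattice in Moses's Python Lattice Format (PLF). The PLF layout follows our reading of the Moses word-lattice documentation, which selects lattice input with `-inputtype 2`; check that documentation before relying on it. In that layout, each node's outgoing arcs are written as ('word', score, jump), where jump is how many columns ahead the target node lies. The function names and example scores are hypothetical, and quoting of words that contain quotes is deliberately not handled.

```python
from collections import deque

def topological_order(edges):
    """Kahn's algorithm: list nodes so that every edge points left to right."""
    nodes = {}
    for u, out in edges.items():
        nodes.setdefault(u, None)
        for v, _word, _prob in out:
            nodes.setdefault(v, None)
    indegree = {n: 0 for n in nodes}
    for out in edges.values():
        for v, _word, _prob in out:
            indegree[v] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v, _word, _prob in edges.get(u, []):
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)
    if len(order) != len(nodes):
        raise ValueError("lattice has a cycle")
    return order

def to_plf(edges):
    """Serialize a single-end-node lattice (node -> [(next, word, prob)]) as PLF text."""
    order = topological_order(edges)  # the unique end node necessarily comes last
    column = {node: i for i, node in enumerate(order)}
    parts = []
    for node in order[:-1]:  # the end node has no outgoing arcs, so no column
        arcs = []
        for nxt, word, prob in edges.get(node, []):
            if "'" in word or "\\" in word:
                raise ValueError(f"quoting not handled in this sketch: {word!r}")
            arcs.append(f"('{word}',{prob},{column[nxt] - column[node]}),")
        parts.append("(" + "".join(arcs) + "),")
    return "(" + "".join(parts) + ")"

# Hypothetical source lattice offering "eating" or the segmented "eat +ing".
lattice = {
    0: [(1, "they", 1.0)],
    1: [(3, "eating", 0.5), (2, "eat", 0.5)],
    2: [(3, "+ing", 1.0)],
    3: [(4, "dinners", 1.0)],
}
print(to_plf(lattice))
```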
SYNCHRONOUS CONTEXT-FREE GRAMMAR
• Synchronous rules pair a source-side and a target-side right-hand side
• Parse the input using the source-language side of the grammar
• Simultaneously build a tree in the target language
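The synchronous-rule idea can be sketched with a toy Hiero-style grammar over a single nonterminal X. The sketch parses the source sentence with the source sides of the rules. It then reads off the source and target yields of each derivation together. The grammar, the romanized Chinese sentence, and the names `Rule`, `Node`, `parse` and `yields` are hypothetical. The parser is a brute-force string parser rather than Dyer et al.'s lattice decoder.

```python
from dataclasses import dataclass
from functools import lru_cache
from itertools import product

@dataclass
class Rule:
    src: list  # source side: terminals plus co-indexed nonterminals X1, X2, ...
    tgt: list  # target side: the same nonterminals, possibly reordered

@dataclass
class Node:
    rule: Rule
    children: list  # children[k] is the derivation filling nonterminal X{k+1}

def is_nt(sym):
    return sym[0] == "X" and sym[1:].isdigit()

def parse(tokens, rules):
    """Return every derivation of X over tokens, using only the source sides.

    Brute force; assumes no rule's source side is a lone nonterminal,
    which would make the recursion loop forever.
    """
    tokens = tuple(tokens)

    def match(pattern, k, i, j):
        # Yield [(nt_index, (start, end)), ...] for each way pattern[k:] covers tokens[i:j].
        if k == len(pattern):
            if i == j:
                yield []
            return
        sym = pattern[k]
        if is_nt(sym):
            for m in range(i + 1, j + 1):  # a nonterminal covers at least one token
                for rest in match(pattern, k + 1, m, j):
                    yield [(int(sym[1:]), (i, m))] + rest
        elif i < j and tokens[i] == sym:
            yield from match(pattern, k + 1, i + 1, j)

    @lru_cache(maxsize=None)
    def derivations(i, j):
        found = []
        for rule in rules:
            for spans in match(rule.src, 0, i, j):
                spans.sort()  # order child spans by nonterminal index
                options = [derivations(a, b) for _, (a, b) in spans]
                for kids in product(*options):
                    found.append(Node(rule, list(kids)))
        return tuple(found)

    return derivations(0, len(tokens))

def yields(node):
    """Build the source and target yields of one derivation at the same time."""
    kids = [yields(child) for child in node.children]

    def fill(side, which):
        out = []
        for sym in side:
            out.extend(kids[int(sym[1:]) - 1][which] if is_nt(sym) else [sym])
        return out

    return fill(node.rule.src, 0), fill(node.rule.tgt, 1)

# Toy grammar (hypothetical): "X1 de X2" is reordered to "the X2 of X1".
rules = [
    Rule(["X1", "de", "X2"], ["the", "X2", "of", "X1"]),
    Rule(["Aozhou"], ["Australia"]),
    Rule(["shoudu"], ["capital"]),
]
for d in parse("Aozhou de shoudu".split(), rules):
    print(" ".join(yields(d)[1]))  # the capital of Australia
```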
EFFECT OF WORD LATTICES
• Improvement in BLEU score
• Decrease in out-of-vocabulary (OOV) words
• Poor coverage of named entities
LATTICE DE-SEGMENTATION FOR STATISTICAL MACHINE TRANSLATION
GOAL
• De-segment the decoder's output lattice
• Gain access to a compact, de-segmented view of a large portion of the translation search space
• Morphemes → de-segmenting transducer → de-segmented words
• Lattice-specific table → finite-state transducer
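Here is a minimal sketch of lattice de-segmentation. It assumes a hypothetical '+'-marking convention: a trailing '+' marks a prefix and a leading '+' marks a suffix. It also assumes Finnish-style de-segmentation by concatenation. The slide composes the lattice with a finite-state transducer built from a lattice-specific table. This sketch does not do that. It walks the morpheme lattice directly and emits word edges between word-boundary nodes of the original lattice. The result stays a compact graph instead of a list of every path. All names, labels and costs are illustrative.

```python
from collections import defaultdict

# Hypothetical marking convention: "w+" is a prefix, "+ing" is a suffix.
def is_prefix(m):
    return m.endswith("+")

def is_suffix(m):
    return m.startswith("+")

def join(morphemes):
    """De-segment one word by concatenation after dropping the '+' markers."""
    return "".join(m.strip("+") for m in morphemes)

def desegment_lattice(edges, start, end):
    """Turn a morpheme lattice (node -> [(next, morpheme, cost)]) into a word lattice.

    Word edges connect word-boundary nodes; when several morpheme paths spell
    the same word between two nodes, the lowest cost is kept.
    """
    best = {}  # (u, v, word) -> cost
    boundaries, todo = {start}, [start]
    while todo:
        u = todo.pop()
        # Grow every morpheme path that can begin a word at boundary u.
        frontier = [(v, [m], c) for v, m, c in edges.get(u, []) if not is_suffix(m)]
        while frontier:
            v, buf, cost = frontier.pop()
            if not is_prefix(buf[-1]):  # the word may end at v
                key = (u, v, join(buf))
                best[key] = min(cost, best.get(key, float("inf")))
                if v not in boundaries:
                    boundaries.add(v)
                    todo.append(v)
            for w, m, c in edges.get(v, []):  # or the word may continue past v
                if is_prefix(buf[-1]) or is_suffix(m):
                    frontier.append((w, buf + [m], cost + c))
    # Drop word edges that cannot reach the end node (e.g. "eat" before "+ing").
    alive, changed = {end}, True
    while changed:
        changed = False
        for u, v, _ in best:
            if v in alive and u not in alive:
                alive.add(u)
                changed = True
    words = defaultdict(list)
    for (u, v, word), cost in best.items():
        if v in alive:
            words[u].append((v, word, cost))
    return dict(words)

def best_path(word_edges, start, end):
    """Cheapest word sequence through the acyclic word lattice."""
    memo = {end: (0.0, [])}

    def go(u):
        if u not in memo:
            best = (float("inf"), [])
            for v, word, cost in word_edges.get(u, []):
                rest_cost, rest_words = go(v)
                if cost + rest_cost < best[0]:
                    best = (cost + rest_cost, [word] + rest_words)
            memo[u] = best
        return memo[u]

    return go(start)

# Toy morpheme lattice standing in for a decoder search graph (costs invented).
lattice = {
    0: [(1, "eat", 1.0)],
    1: [(2, "+ing", 0.5), (2, "+s", 0.7)],
    2: [(3, "dinner", 1.0), (4, "lunch", 1.5)],
    3: [(4, "+s", 0.2)],
}
words = desegment_lattice(lattice, 0, 4)
cost, best_words = best_path(words, 0, 4)
print(best_words, round(cost, 2))  # ['eating', 'dinners'] 2.7
```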
APPROACH:
• Baseline (without segmentation)
• 1-best de-segmentation: segment the training data and de-segment the decoder's 1-best output after decoding
• N-best de-segmentation: de-segment, augment, and re-rank the decoder's 1000-best list
• Lattice de-segmentation: covers an exponential number of hypotheses
  • The search graph of a phrase-based decoder can be interpreted as a lattice.
  • De-segmenting transducer
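The n-best step can be sketched in three moves: de-segment each hypothesis, add a word-level feature, and re-rank by a weighted sum of features. The in-vocabulary count used here is a hypothetical stand-in for the word-level features one would actually use, such as a language model over de-segmented words. The weights, scores and names are invented for illustration.

```python
def desegment(morphemes):
    """Concatenate each '+'-marked suffix onto the preceding token."""
    words = []
    for m in morphemes:
        if m.startswith("+") and words:
            words[-1] += m[1:]
        else:
            words.append(m)
    return words

def rerank(nbest, word_feature, weights):
    """De-segment, augment with a word-level feature, and re-rank an n-best list.

    nbest: list of (morpheme tokens, decoder model score); higher is better.
    """
    rescored = []
    for morphemes, decoder_score in nbest:
        words = desegment(morphemes)
        features = {"decoder": decoder_score, "word": word_feature(words)}
        total = sum(weights[name] * value for name, value in features.items())
        rescored.append((total, " ".join(words)))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored

# Hypothetical word-level feature: how many de-segmented words are in a vocabulary.
VOCAB = {"eating", "eats", "dinner", "dinners"}
nbest = [
    (["eat", "+s", "dinner", "+ing"], -1.0),
    (["eat", "+ing", "dinner", "+s"], -1.2),
]
ranked = rerank(nbest, lambda ws: sum(w in VOCAB for w in ws), {"decoder": 1.0, "word": 0.5})
print(ranked[0][1])  # eating dinners
```

The decoder alone prefers the first hypothesis. The word-level feature penalises the non-word "dinnering", so the second hypothesis ends up first.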
ENGLISH TO ARABIC
• Table-based de-segmenter
ENGLISH TO FINNISH
• Simple concatenation
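The slides do not say how the Arabic table is built. This sketch assumes a common approach: map each segmented form to its most frequent surface form in the training data, and fall back to plain concatenation (the Finnish-style method) for unseen forms. The Buckwalter-style training pairs are illustrative. They show why a table is useful: l+ followed by Al+ surfaces as "ll", not the "lAl" that concatenation would give. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def concatenate(morphemes):
    """Finnish-style de-segmentation: drop the '+' markers and glue morphemes."""
    return "".join(m.strip("+") for m in morphemes)

def build_table(pairs):
    """Map each segmented word to its most frequent surface form in training data."""
    counts = defaultdict(Counter)
    for segmented, surface in pairs:
        counts[tuple(segmented.split())][surface] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def desegment_word(morphemes, table):
    """Table look-up first; fall back to concatenation for unseen segmentations."""
    return table.get(tuple(morphemes), concatenate(morphemes))

# Illustrative (segmented, surface) pairs in Buckwalter-style transliteration.
table = build_table([("l+ Al+ ktAb", "llktAb"), ("w+ Al+ ktAb", "wAlktAb")])
print(desegment_word(["l+", "Al+", "ktAb"], table))  # llktAb (from the table)
print(desegment_word(["b+", "ktAb"], table))         # bktAb (concatenation fallback)
```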
PROPOSED APPROACH TO IMPROVE TRANSLATION QUALITY
• Translation from multi-parallel sources
  • English, Hindi, Konkani & Marathi
• Morphological segmentation to reduce data sparsity
  • Morfessor / morphological analyser
• Morphological de-segmentation
• Named entity tagger
• Cognates
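The slide lists cognates without saying how they would be found. Hindi, Konkani and Marathi are all written in Devanagari, so one common option is to compare words character by character with the Longest Common Subsequence Ratio (LCSR). That choice is an assumption here. The 0.5 threshold and the function names are hypothetical. The example pair is Hindi पानी and Marathi पाणी, both meaning "water".

```python
def lcs_length(a, b):
    """Length of the longest common subsequence (classic dynamic programme)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def lcsr(a, b):
    """Longest Common Subsequence Ratio: LCS length over the longer word's length."""
    return lcs_length(a, b) / max(len(a), len(b)) if a and b else 0.0

def likely_cognates(a, b, threshold=0.5):
    """Flag a pair as candidate cognates (hypothetical threshold)."""
    return lcsr(a, b) >= threshold

print(lcsr("पानी", "पाणी"))  # 0.75 (3 of 4 code points shared in order)
```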
PROPOSED WORK
• Study and experiment with the effect of morphological segmentation & de-segmentation on phrase-based statistical machine translation, at different points in the pipeline:
  • before evaluation
  • before decoding
  • before phrase extraction
• Implement on English-to-Konkani and Hindi-to-Konkani translation systems
• Evaluate with BLEU and METEOR
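The planned BLEU evaluation can be made concrete with a minimal corpus-level BLEU sketch. It assumes one reference per sentence, uniform weights over 1- to 4-gram precisions, a brevity penalty, and no smoothing. It returns a value in [0, 1], so multiply by 100 to compare with scores such as 27.3. Scores computed over morpheme tokens are not comparable with word-level scores, which is one reason the point of de-segmentation matters. Reported numbers should come from an established scorer such as sacreBLEU, not from this sketch.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4):
    """Corpus BLEU: clipped n-gram precisions pooled over all sentences."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum(min(count, r[g]) for g, count in h.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match; unsmoothed BLEU is 0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity * math.exp(log_precision)

hyp = "the cat sat on the mat".split()
ref = "the cat sat on a mat".split()
print(round(100 * corpus_bleu([hyp], [ref]), 2))
```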
CURRENT STATUS
• Became familiar with the basics of Moses
• Developed a baseline system following the Moses website, using its corpus
• Developed a basic English-Hindi translation system using parallel data available online; BLEU score only 5.31
• Built a Hindi-to-Konkani translation system on 3k sentences of the ILCI corpus; BLEU score 27.3
NEXT
• Get the parallel data, which is not in Unicode format, into text files
• Align the data
• Identify the named entities
• Morphological segmentation for Konkani
• Morphological de-segmentation for Konkani
• Test the improvement in BLEU score
REFERENCES
• "Lattice De-segmentation for Statistical Machine Translation" by Mohammad Salameh, Colin Cherry, Grzegorz Kondrak; ACL 2014.
• "Generalising Word Lattice Translation" by Christopher Dyer, Smaranda Muresan, Philip Resnik; ACL 2008.
THANK YOU
