EFFECT OF MORPHOLOGICAL
SEGMENTATION & DE-SEGMENTATION ON
MACHINE TRANSLATION
Sunayana R. Gawde
14109, M.Tech Part II
RECAP
 Concept Definition
 Overview of existing tools like Mantra, Anuvaadak,
MAT etc.
 Machine Translation for E-GOV - Baltic Case
 Machine Translation approaches
 Classification of Government Documents
 Proposed an idea of Structured Translation.
ISSUES IN MACHINE TRANSLATION
 Disambiguation (WSD)
 Non-Standard Speech
 Named Entities
 Translate
 Transliterate
IMPROVING TRANSLATION QUALITY
 Translation from Multi-parallel sources
 Morphological Segmentation
 Morphological De-Segmentation
 Ontologies in MT
MORPHEMES
 Smallest unit of language which has meaning.
 Any word form can be expressed as a combination
of morphemes.
 affect+ion+ate
 dinner+s
 eat+ing
 king+'s
 open+mind+ed+ness.
MORPHOLOGICAL SEGMENTATION
 Morphological segmentation transforms the
sentence by segmenting relevant morphemes,
which are then handled as regular tokens during
alignment and translation.
 Purpose: to reduce data sparsity and to improve
correspondence with the source language (usually
English).
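A minimal sketch of the segmentation idea above, assuming a hand-written toy suffix list and a "+" marker on split-off suffixes. Both are illustrative assumptions, not taken from any of the systems surveyed here; a real pipeline would use a morphological analyser or an unsupervised segmenter.

```python
# Toy suffix inventory (assumption for illustration only).
SUFFIXES = ("ing", "'s", "s")

def segment_word(word):
    """Split off at most one known suffix, marking it with a leading '+'."""
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        # Require a stem of at least 3 characters so "king" is not split into "k +ing".
        if word.endswith(suffix) and len(stem) >= 3:
            return [stem, "+" + suffix]
    return [word]

def segment_sentence(sentence):
    """Segment every word; the morphemes become ordinary tokens."""
    tokens = []
    for word in sentence.split():
        tokens.extend(segment_word(word))
    return " ".join(tokens)

print(segment_sentence("the king's dinners eating"))
# the king +'s dinner +s eat +ing

# Sparsity: six distinct word forms collapse to four distinct tokens.
corpus = "eat eats eating walk walks walking"
print(len(set(corpus.split())), len(set(segment_sentence(corpus).split())))
# 6 4
```

The vocabulary shrinks because inflected forms share a stem token, which is the data-sparsity effect the slide refers to.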
MORPHOLOGICAL DE-SEGMENTATION
 De-segmentation is the process of converting
segmented words into their original surface form.
 For many segmentations, especially unsupervised
ones, this amounts to simple concatenation.
 Two schemes proposed by Badr et al. (2008)
 table-based and
 rule-based
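The two simplest options above can be sketched as follows: plain concatenation, and a table lookup that falls back to concatenation. The "+" marker convention and the single table entry are assumptions made for this example; they are not the actual data or implementation of Badr et al. (2008).

```python
def group_morphemes(tokens):
    """Group each stem with the '+'-marked suffix tokens that follow it."""
    groups = []
    for token in tokens:
        if token.startswith("+") and groups:
            groups[-1].append(token)
        else:
            groups.append([token])
    return groups

def desegment(tokens, table=None):
    """Restore surface words: table lookup first, simple concatenation otherwise."""
    table = table or {}
    words = []
    for group in group_morphemes(tokens):
        key = " ".join(group)
        concatenated = "".join(m.lstrip("+") for m in group)
        words.append(table.get(key, concatenated))
    return " ".join(words)

# Hypothetical table entry, as might be collected from segmented training data.
TABLE = {"happy +ness": "happiness"}

tokens = "the king +'s happy +ness".split()
print(desegment(tokens))         # the king's happyness
print(desegment(tokens, TABLE))  # the king's happiness
```

The second call shows why concatenation is not always enough: when segmentation changes the spelling of the stem, a table (or a rule) is needed to recover the original surface form.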
SIMPLE SYNTACTIC AND MORPHOLOGICAL
PROCESSING CAN HELP ENGLISH-HINDI
STATISTICAL MACHINE TRANSLATION
 By Ananthakrishnan Ramanathan, Pushpak
Bhattacharyya, Jayprasad Hegde, Ritesh M. Shah,
Sasikumar M.
 IJCNLP 2008
 Re-ordering (3.8)
 Transliteration (4.8)
 Using suffixes of Hindi (Morfessor 2.0)
 BLEU from 12.10 to 15.88
STATISTICAL MACHINE TRANSLATION INTO A
MORPHOLOGICALLY COMPLEX LANGUAGE
 Kemal Oflazer
 CICLing 2008 (Conference on Intelligent Text
Processing & Computational Linguistics)
 Phrase-based SMT from English into Turkish
 Improved BLEU score by 7.10 points i.e. from
19.77 to 26.87
 Moses toolkit
 SRILM language modelling toolkit.
APPROACH
 Representing both English and Turkish at the
morpheme-level but with some selective
morpheme-grouping on the Turkish side of the
training data
 Re-ranking the n-best morpheme-sequence outputs
of the decoder with a word-based language model
 “Repairing” translated words with incorrect
morphological structure and words which are out-
of-vocabulary relative to the training and the
language model corpus
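The re-ranking step can be illustrated with a small sketch. It assumes an add-one smoothed unigram model as a toy stand-in for a real word-based language model, a "+" marker on suffix morphemes, and invented decoder scores; none of these details come from the paper.

```python
import math
from collections import Counter

def train_unigram_lm(sentences):
    """Add-one smoothed unigram model over surface words (toy stand-in for an n-gram LM)."""
    counts = Counter(w for s in sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words

    def logprob(sentence):
        return sum(math.log((counts[w] + 1) / (total + vocab)) for w in sentence.split())

    return logprob

def join_morphemes(hypothesis):
    """Glue '+'-marked suffix tokens onto the preceding token."""
    words = []
    for token in hypothesis.split():
        if token.startswith("+") and words:
            words[-1] += token[1:]
        else:
            words.append(token)
    return " ".join(words)

def rerank(nbest, word_lm, lm_weight=1.0):
    """nbest: list of (morpheme_sequence, decoder_score); returns the best surface string."""
    scored = []
    for hypothesis, decoder_score in nbest:
        surface = join_morphemes(hypothesis)
        scored.append((decoder_score + lm_weight * word_lm(surface), surface))
    return max(scored)[1]

lm = train_unigram_lm(["the houses are old", "the house is old"])
nbest = [("the house +es are old", -1.0), ("the house +s are old", -1.2)]
print(rerank(nbest, lm))  # the houses are old
```

The decoder prefers the first hypothesis, but its joined form "housees" is unknown to the word-level model, so the second hypothesis wins after re-ranking.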
LATTICE DE-SEGMENTATION FOR STATISTICAL
MACHINE TRANSLATION
 By Mohammad Salameh, Colin Cherry, Grzegorz
Kondrak
 Published in Proceedings of the 52nd Annual
Meeting of the Association for Computational
Linguistics 2014
 English-to-Arabic and English-to-Finnish translation
APPROACH:
 Baseline (Without Segmentation)
 1-best De-segmentation: Segmentation at training
& de-segmentation after decoding
 N-best De-segmentation: De-segments, augments
and re-ranks the decoder’s 1000-best list.
 Lattice De-segmentation: Considers an exponential
number of hypotheses
 The search graph of a phrase-based decoder can be
interpreted as a lattice.
 De-segmenting Transducer
ENGLISH TO ARABIC
 Table based De-segmentor
ENGLISH TO FINNISH
 Simple concatenation
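To see why a lattice covers far more hypotheses than an n-best list, consider the toy lattice below. The node numbering, the Finnish tokens and the "+" marker are assumptions chosen for illustration. The sketch enumerates paths by brute force and de-segments them by simple concatenation; it does not implement the de-segmenting transducer mentioned on the slide.

```python
# Toy lattice: node -> list of (next_node, token). Node 3 is the final node.
LATTICE = {
    0: [(1, "talo"), (1, "auto")],
    1: [(2, "+ssa"), (2, "+sta")],
    2: [(3, "on"), (3, "oli")],
    3: [],
}

def count_paths(lattice, node=0):
    """Number of complete paths from `node` to a final node."""
    edges = lattice[node]
    if not edges:
        return 1
    return sum(count_paths(lattice, nxt) for nxt, _ in edges)

def desegmented_paths(lattice, node=0, prefix=()):
    """Yield every path as a surface string, joining '+' suffixes to the previous word."""
    edges = lattice[node]
    if not edges:
        yield " ".join(prefix)
        return
    for nxt, token in edges:
        if token.startswith("+") and prefix:
            new_prefix = prefix[:-1] + (prefix[-1] + token[1:],)
        else:
            new_prefix = prefix + (token,)
        yield from desegmented_paths(lattice, nxt, new_prefix)

print(count_paths(LATTICE))                  # 8
print(next(desegmented_paths(LATTICE)))      # talossa on
```

With two alternatives at each of three positions the lattice already encodes 2 x 2 x 2 = 8 hypotheses in six edges; each extra position doubles the count, which is why enumeration does not scale and a compact lattice operation is needed instead.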
COMBINING MORPHEME-BASED MACHINE
TRANSLATION WITH POST-PROCESSING
MORPHEME PREDICTION
 By Ann Clifton and Anoop Sarkar
 Proceedings of the 49th Annual Meeting of the
Association for Computational Linguistics 2011
 English-Finnish
 BLEU improves from 14.82 to 15.09.
APPROACH
 Idea of segmented translation where they explicitly
allow phrase pairs that can end with a dangling
morpheme, which can connect with other
morphemes as part of the translation process
 Use of a fully segmented translation model in
combination with a post-processing morpheme
prediction system, using unsupervised morphology
induction.
 Baselines:
 Word Based
 Factored (Unsupervised)
 Segmented Models (Supervised)
 Segmentation using Morfessor (Unsupervised)
PROPOSED WORK
 To study and experiment with the effect of Morphological
Segmentation & De-segmentation on Phrase Based
Statistical Machine Translation
 Before evaluation
 Before decoding
 Before phrase extraction
 Implement on English to Konkani and Hindi to
Konkani translation systems.
 Evaluate with BLEU and METEOR
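For reference, BLEU can be sketched in a few lines. This is a simplified corpus-level version assuming one reference per sentence, whitespace tokenisation and no smoothing; scores reported from standard evaluation scripts may differ slightly. METEOR is not covered here.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100), single reference per sentence, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)

print(corpus_bleu(["the cat sat on the mat"], ["the cat sat on the mat"]))  # 100.0
```

Because the n-grams are counted over tokens, scores computed on segmented output are not comparable with a word-level baseline; this is one reason to de-segment before evaluation.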
CURRENT STATUS
 Got familiar with basics of Moses
 Developed a Baseline System as suggested on
Moses website with their corpus
 Developed a basic English-Hindi translation system
using parallel data available online, with a BLEU score
of only 5.31.
NEXT STEPS
 Obtain the parallel data, which is currently not in
Unicode format, as text files.
 Align the data.
 Identify the Named Entities.
 Morphological Segmentation for Konkani.
 Test the improvement in BLEU score.
REFERENCES
 “The IIT Bombay SMT System for ICON 2014 Tools
Contest” by Anoop Kunchukuttan, Ratish Puduppully,
Rajen Chatterjee, Abhijit Mishra, Pushpak
Bhattacharyya.
 “Statistical Machine Translation into a Morphologically
Complex Language” by Kemal Oflazer in CICLing 2008.
 “Lattice De-segmentation for Statistical Machine
Translation” by Mohammad Salameh, Colin Cherry,
Grzegorz Kondrak in ACL 2014.
 “Combining Morpheme-based Machine Translation with
Post-processing Morpheme Prediction” by Ann Clifton
and Anoop Sarkar in ACL 2011.
THANK YOU

More Related Content

What's hot

Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translationHrishikesh Nair
 
Evaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsEvaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsSajeed Mahaboob
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationeSAT Publishing House
 
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...IJERA Editor
 
Building of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemBuilding of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemWaqas Tariq
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESijnlc
 
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...IJERA Editor
 
Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...IJECEIAES
 
Experiments with Different Models of Statistcial Machine Translation
Experiments with Different Models of Statistcial Machine TranslationExperiments with Different Models of Statistcial Machine Translation
Experiments with Different Models of Statistcial Machine Translationkhyati gupta
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONijnlc
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisIJERA Editor
 
THESIS PROPOSAL
THESIS PROPOSAL THESIS PROPOSAL
THESIS PROPOSAL Hasan Aid
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELSijnlc
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for TranslationRIILP
 

What's hot (18)

Statistical machine translation
Statistical machine translationStatistical machine translation
Statistical machine translation
 
Evaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutionsEvaluation of hindi english mt systems, challenges and solutions
Evaluation of hindi english mt systems, challenges and solutions
 
Isolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantizationIsolated word recognition using lpc & vector quantization
Isolated word recognition using lpc & vector quantization
 
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...
Approach To Build A Marathi Text-To-Speech System Using Concatenative Synthes...
 
Building of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert SystemBuilding of Database for English-Azerbaijani Machine Translation Expert System
Building of Database for English-Azerbaijani Machine Translation Expert System
 
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURESGENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
GENERATING SUMMARIES USING SENTENCE COMPRESSION AND STATISTICAL MEASURES
 
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
Implementation of Text To Speech for Marathi Language Using Transcriptions Co...
 
SMT3
SMT3SMT3
SMT3
 
Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...Improving the role of language model in statistical machine translation (Indo...
Improving the role of language model in statistical machine translation (Indo...
 
Experiments with Different Models of Statistcial Machine Translation
Experiments with Different Models of Statistcial Machine TranslationExperiments with Different Models of Statistcial Machine Translation
Experiments with Different Models of Statistcial Machine Translation
 
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATIONSEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
SEMI-AUTOMATIC SIMULTANEOUS INTERPRETING QUALITY EVALUATION
 
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech SynthesisGrapheme-To-Phoneme Tools for the Marathi Speech Synthesis
Grapheme-To-Phoneme Tools for the Marathi Speech Synthesis
 
THESIS PROPOSAL
THESIS PROPOSAL THESIS PROPOSAL
THESIS PROPOSAL
 
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
[IJET-V2I1P13] Authors:Shilpa More, Gagandeep .S. Dhir , Deepak Daiwadney and...
 
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
S URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELSS URVEY  O N M ACHINE  T RANSLITERATION A ND  M ACHINE L EARNING M ODELS
S URVEY O N M ACHINE T RANSLITERATION A ND M ACHINE L EARNING M ODELS
 
13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation13. Constantin Orasan (UoW) Natural Language Processing for Translation
13. Constantin Orasan (UoW) Natural Language Processing for Translation
 
arttt.pdf
arttt.pdfarttt.pdf
arttt.pdf
 
Presentation1
Presentation1Presentation1
Presentation1
 

Similar to Effectof morphologicalsegmentation&de segmentationonmachinetranslation

Effect of morphological segmentation & de-segmentation on machine translation...
Effect of morphological segmentation & de-segmentation on machine translation...Effect of morphological segmentation & de-segmentation on machine translation...
Effect of morphological segmentation & de-segmentation on machine translation...Sunayana Gawde
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Reviewinscit2006
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) inventionjournals
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShashank Shisodia
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachIJERA Editor
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languageiosrjce
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translationbehzad66
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agendaTAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agendaTAUS - The Language Data Network
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...iosrjce
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...IJITE
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...ijrap
 

Similar to Effectof morphologicalsegmentation&de segmentationonmachinetranslation (20)

Effect of morphological segmentation & de-segmentation on machine translation...
Effect of morphological segmentation & de-segmentation on machine translation...Effect of morphological segmentation & de-segmentation on machine translation...
Effect of morphological segmentation & de-segmentation on machine translation...
 
Improvement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A ReviewImprovement in Quality of Speech associated with Braille codes - A Review
Improvement in Quality of Speech associated with Braille codes - A Review
 
Translationusing moses1
Translationusing moses1Translationusing moses1
Translationusing moses1
 
**JUNK** (no subject)
**JUNK** (no subject)**JUNK** (no subject)
**JUNK** (no subject)
 
Make it simple with paraphrases: Automated paraphrasing for authoring aids an...
Make it simple with paraphrases: Automated paraphrasing for authoring aids an...Make it simple with paraphrases: Automated paraphrasing for authoring aids an...
Make it simple with paraphrases: Automated paraphrasing for authoring aids an...
 
Jq3616701679
Jq3616701679Jq3616701679
Jq3616701679
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI) International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Shallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliteratorShallow parser for hindi language with an input from a transliterator
Shallow parser for hindi language with an input from a transliterator
 
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid ApproachPunjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
Punjabi to Hindi Transliteration System for Proper Nouns Using Hybrid Approach
 
Performance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi languagePerformance Calculation of Speech Synthesis Methods for Hindi language
Performance Calculation of Speech Synthesis Methods for Hindi language
 
Personalising speech to-speech translation
Personalising speech to-speech translationPersonalising speech to-speech translation
Personalising speech to-speech translation
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
srinu.pptx
srinu.pptxsrinu.pptx
srinu.pptx
 
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agendaTAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
 
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
Artificially Generatedof Concatenative Syllable based Text to Speech Synthesi...
 
D2 anandkumar
D2 anandkumarD2 anandkumar
D2 anandkumar
 
Arabic MT Project
Arabic MT ProjectArabic MT Project
Arabic MT Project
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
 
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
Learning to Pronounce as Measuring Cross Lingual Joint Orthography Phonology ...
 

More from Sunayana Gawde

Hybrid Approach to English-Konkani Machine Translation
Hybrid Approach to English-Konkani Machine TranslationHybrid Approach to English-Konkani Machine Translation
Hybrid Approach to English-Konkani Machine TranslationSunayana Gawde
 
Machine translation-system-for-administrative-domain
Machine translation-system-for-administrative-domainMachine translation-system-for-administrative-domain
Machine translation-system-for-administrative-domainSunayana Gawde
 
Mind mapping and Its Applications, Introduction to Context Trees
Mind mapping and Its Applications, Introduction to Context TreesMind mapping and Its Applications, Introduction to Context Trees
Mind mapping and Its Applications, Introduction to Context TreesSunayana Gawde
 
A MIND MAP QUERY IN INFORMATION RETRIEVAL
A MIND MAP QUERY IN INFORMATION RETRIEVALA MIND MAP QUERY IN INFORMATION RETRIEVAL
A MIND MAP QUERY IN INFORMATION RETRIEVALSunayana Gawde
 
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEM
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEMMIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEM
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEMSunayana Gawde
 
My 1st semester seminar of M. Tech Part I
My 1st semester seminar of M. Tech Part IMy 1st semester seminar of M. Tech Part I
My 1st semester seminar of M. Tech Part ISunayana Gawde
 

More from Sunayana Gawde (7)

Hybrid Approach to English-Konkani Machine Translation
Hybrid Approach to English-Konkani Machine TranslationHybrid Approach to English-Konkani Machine Translation
Hybrid Approach to English-Konkani Machine Translation
 
Machine translation-system-for-administrative-domain
Machine translation-system-for-administrative-domainMachine translation-system-for-administrative-domain
Machine translation-system-for-administrative-domain
 
Mind mapping and Its Applications, Introduction to Context Trees
Mind mapping and Its Applications, Introduction to Context TreesMind mapping and Its Applications, Introduction to Context Trees
Mind mapping and Its Applications, Introduction to Context Trees
 
A MIND MAP QUERY IN INFORMATION RETRIEVAL
A MIND MAP QUERY IN INFORMATION RETRIEVALA MIND MAP QUERY IN INFORMATION RETRIEVAL
A MIND MAP QUERY IN INFORMATION RETRIEVAL
 
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEM
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEMMIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEM
MIND MAP BASED USER MODELLING AND RECOMMENDER SYSTEM
 
My 1st semester seminar of M. Tech Part I
My 1st semester seminar of M. Tech Part IMy 1st semester seminar of M. Tech Part I
My 1st semester seminar of M. Tech Part I
 
My NLP seminars
My NLP seminarsMy NLP seminars
My NLP seminars
 

Recently uploaded

Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyFrank van der Linden
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningVitsRangannavar
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEBATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASE
BATTLEFIELD ORM: TIPS, TACTICS AND STRATEGIES FOR CONQUERING YOUR DATABASEOrtus Solutions, Corp
 
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样
办理学位证(UQ文凭证书)昆士兰大学毕业证成绩单原版一模一样umasea
 
Intelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmIntelligent Home Wi-Fi Solutions | ThinkPalm
Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
Asset Management Software - Infographic
Asset Management Software - InfographicAsset Management Software - Infographic
Asset Management Software - InfographicHr365.us smith
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
buds n tech IT solutions
buds n  tech IT                solutionsbuds n  tech IT                solutions
buds n tech IT solutionsmonugehlot87
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 

Recently uploaded (20)

Engage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The UglyEngage Usergroup 2024 - The Good The Bad_The Ugly
Engage Usergroup 2024 - The Good The Bad_The Ugly
 
cybersecurity notes for mca students for learning
cybersecurity notes for mca students for learningcybersecurity notes for mca students for learning
cybersecurity notes for mca students for learning
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 

Effect of morphological segmentation & de-segmentation on machine translation

  • 1. EFFECT OF MORPHOLOGICAL SEGMENTATION & DE-SEGMENTATION ON MACHINE TRANSLATION. Sunayana R. Gawde, 14109, M.Tech Part II
  • 2. RECAP: Concept Definition; Overview of existing tools like Mantra, Anuvaadak, MAT etc.; Machine Translation for E-GOV - Baltic Case; Machine Translation approaches; Classification of Government Documents; Proposed an idea of Structured Translation.
  • 3. ISSUES IN MACHINE TRANSLATION: Disambiguation (WSD); Non-Standard Speech; Named Entities (translate vs. transliterate)
  • 4. IMPROVING TRANSLATION QUALITY: Translation from Multi-parallel sources; Morphological Segmentation; Morphological De-Segmentation; Ontologies in MT
  • 5. MORPHEMES: Smallest unit of language which has meaning. Any word form can be expressed as a combination of morphemes, e.g. affect+ion+ate, dinner+s, eat+ing, king+'s, open+mind+ed+ness.
  • 6. MORPHOLOGICAL SEGMENTATION: Morphological segmentation transforms the sentence by segmenting relevant morphemes, which are then handled as regular tokens during alignment and translation. Its purpose is to reduce data sparsity and to improve correspondence with the source language (usually English).
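To make the idea of segmentation on slide 6 concrete, here is a minimal sketch in Python. It is not the method of any paper cited in these slides: the suffix list `SUFFIXES`, the `+` marker convention, and the `min_stem` threshold are all assumptions made for illustration. A real system would normally obtain segmentations from a morphological analyser or an unsupervised tool instead of a hand-written list.

```python
# Hypothetical suffix inventory, for illustration only.
SUFFIXES = ("ness", "ing", "ed", "s")


def segment_word(word, suffixes=SUFFIXES, min_stem=3):
    """Peel known suffixes off the end of a word, longest match first.

    Each suffix becomes its own token, prefixed with '+' so that it can
    later be re-attached to the stem.
    """
    morphs = []
    stem = word
    changed = True
    while changed:
        changed = False
        for suf in sorted(suffixes, key=len, reverse=True):
            # Keep at least min_stem characters so short words stay whole.
            if stem.endswith(suf) and len(stem) - len(suf) >= min_stem:
                morphs.insert(0, "+" + suf)
                stem = stem[: -len(suf)]
                changed = True
                break
    return [stem] + morphs


def segment_sentence(sentence):
    """Segment every whitespace-separated word of a sentence."""
    tokens = []
    for word in sentence.split():
        tokens.extend(segment_word(word))
    return tokens
```

After this step, `eat` and `+ing` are separate tokens, so the aligner can collect statistics for the stem and the suffix independently, which is where the reduction in data sparsity comes from.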
  • 7. MORPHOLOGICAL DE-SEGMENTATION: De-segmentation is the process of converting segmented words into their original surface form. For many segmentations, especially unsupervised ones, this amounts to simple concatenation. Two schemes proposed by Badr et al. (2008): table-based and rule-based.
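The two simplest strategies on slide 7 can be sketched as follows. This is an illustrative Python sketch, not the implementation of Badr et al. (2008): the `+` prefix marking a suffix token and the shape of the lookup `table` (segmented form mapped to surface form) are assumptions. Words found in the table use the stored surface form, and everything else falls back to plain concatenation.

```python
def desegment(tokens, table=None):
    """Rejoin '+'-prefixed morphemes with the token that precedes them.

    table optionally maps a segmented form such as 'happy +ness' to its
    surface form ('happiness'); unseen forms are simply concatenated.
    """
    table = table or {}

    # Group each stem with the suffix tokens that follow it.
    groups = []
    for tok in tokens:
        if tok.startswith("+") and len(tok) > 1 and groups:
            groups[-1].append(tok)
        else:
            groups.append([tok])

    words = []
    for group in groups:
        key = " ".join(group)
        if key in table:
            # Table-based: use the surface form recorded for this sequence.
            words.append(table[key])
        else:
            # Fallback: concatenate, dropping the '+' markers.
            words.append(group[0] + "".join(m[1:] for m in group[1:]))
    return " ".join(words)
```

The table matters whenever attaching a morpheme changes the spelling: `desegment(["happy", "+ness"])` yields `happyness` by concatenation, whereas passing `{"happy +ness": "happiness"}` as the table recovers the correct surface form.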
  • 8. SIMPLE SYNTACTIC AND MORPHOLOGICAL PROCESSING CAN HELP ENGLISH-HINDI STATISTICAL MACHINE TRANSLATION: By Ananthakrishnan Ramanathan, Pushpak Bhattacharyya, Jayprasad Hegde, Ritesh M. Shah, Sasikumar M.; ACL 2014; Re-ordering (3.8); Transliteration (4.8); Using suffixes of Hindi (Morfessor 2.0); BLEU from 12.10 to 15.88
  • 9. STATISTICAL MACHINE TRANSLATION INTO A MORPHOLOGICALLY COMPLEX LANGUAGE: Kemal Oflazer; CICLing 2008 (Conference on Intelligent Text Processing & Computational Linguistics); Phrase-based SMT from English into Turkish; Improved BLEU score by 7.10 points, i.e. from 19.77 to 26.87; Moses toolkit; SRILM language modelling toolkit.
  • 10. APPROACH: (1) Representing both English and Turkish at the morpheme level, but with some selective morpheme grouping on the Turkish side of the training data. (2) Re-ranking the n-best morpheme-sequence outputs of the decoder with a word-based language model. (3) “Repairing” translated words with incorrect morphological structure and words which are out-of-vocabulary relative to the training and the language model corpus.
  • 11. LATTICE DE-SEGMENTATION FOR STATISTICAL MACHINE TRANSLATION: By Mohammad Salameh, Colin Cherry, Grzegorz Kondrak; Published in Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014; English-to-Arabic and English-to-Finnish translation
  • 12. APPROACH: Baseline (without segmentation). 1-best de-segmentation: segmentation at training and de-segmentation after decoding. N-best de-segmentation: de-segments, augments and re-ranks the decoder’s 1000-best list. Lattice de-segmentation: covers an exponential number of hypotheses; the search graph of a phrase-based decoder can be interpreted as a lattice, which is processed by a de-segmenting transducer.
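The n-best strategy on slide 12 (de-segment each hypothesis, add a word-level score, re-rank) can be sketched as below. This is a simplified Python illustration under stated assumptions, not the system of Salameh et al.: a unigram table of log-probabilities stands in for a real word-level n-gram language model, the `+` marker convention is hypothetical, and `lm_weight` and `oov_logprob` are made-up parameters. A real n-best list from a decoder would carry many more features.

```python
def concat_desegment(tokens):
    """De-segment by concatenation: attach '+'-prefixed tokens to the left."""
    words = []
    for tok in tokens:
        if tok.startswith("+") and words:
            words[-1] += tok[1:]
        else:
            words.append(tok)
    return words


def word_lm_logprob(words, unigram_logprob, oov_logprob=-10.0):
    """Toy word-level LM score: sum of unigram log-probabilities.

    Unknown words receive a fixed penalty, so hypotheses whose morphemes
    combine into a non-word are pushed down the list.
    """
    return sum(unigram_logprob.get(w, oov_logprob) for w in words)


def rerank_nbest(nbest, unigram_logprob, lm_weight=1.0):
    """Re-rank (tokens, decoder_score) pairs after de-segmentation.

    Returns (total_score, surface_sentence) pairs, best first.
    """
    rescored = []
    for tokens, decoder_score in nbest:
        words = concat_desegment(tokens)
        lm_score = word_lm_logprob(words, unigram_logprob)
        rescored.append((decoder_score + lm_weight * lm_score, " ".join(words)))
    rescored.sort(key=lambda item: item[0], reverse=True)
    return rescored
```

For example, if the decoder slightly prefers `eat +ed` over `eat +ing`, but only `eating` is known to the word-level model, the re-ranked list puts `eating` first. The word-level score is only available after de-segmentation, which is why it cannot be applied directly to the morpheme sequence.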
  • 13. ENGLISH TO ARABIC: Table-based de-segmenter
  • 14. ENGLISH TO FINNISH: Simple concatenation
  • 15. COMBINING MORPHEME-BASED MACHINE TRANSLATION WITH POST-PROCESSING MORPHEME PREDICTION: By Ann Clifton and Anoop Sarkar; Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics, 2011; English-Finnish; BLEU improves from 14.82 to 15.09.
  • 16. APPROACH: (1) Segmented translation that explicitly allows phrase pairs ending with a dangling morpheme, which can connect with other morphemes as part of the translation process. (2) Use of a fully segmented translation model in combination with a post-processing morpheme prediction system, using unsupervised morphology induction.
  • 17. Baselines: Word Based; Factored (Unsupervised); Segmented Models (Supervised); Segmentation using Morfessor (Unsupervised)
  • 18. PROPOSED WORK: To study and experiment with the effect of Morphological Segmentation & De-segmentation on Phrase-Based Statistical Machine Translation at three points: before evaluation, before decoding, and before phrase extraction. Implement on English-to-Konkani and Hindi-to-Konkani translation systems. Evaluate with BLEU and METEOR.
  • 19. CURRENT STATUS: Got familiar with the basics of Moses. Developed a baseline system as suggested on the Moses website with their corpus. Developed a basic English-Hindi translation system using parallel data available online, with a BLEU score of only 5.31.
  • 20. NEXT: Get the parallel data, which is in text files in a non-Unicode format. Align the data. Identify the Named Entities. Morphological Segmentation for Konkani. Test the improvement in BLEU score.
  • 21. REFERENCES: “The IIT Bombay SMT System for ICON 2014 Tools Contest” by Anoop Kunchukuttan, Ratish Puduppully, Rajen Chatterjee, Abhijit Mishra, Pushpak Bhattacharyya. “Statistical Machine Translation into a Morphologically Complex Language” by Kemal Oflazer, in CICLing 2008. “Lattice De-segmentation for Statistical Machine Translation” by Mohammad Salameh, Colin Cherry, Grzegorz Kondrak, in ACL 2014. “Combining Morpheme-based Machine Translation with Post-processing Morpheme Prediction” by Ann Clifton and Anoop Sarkar, in ACL 2011.