BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE
PILANI (RAJASTHAN)
April, 2018
Structuring of Translation Memory
By
Ashutosh Kumar
2015HT13439
BITS ZG628T: Dissertation
Introduction
What is Translation Memory (TM)?
Translation Memory (TM) is an archive of previously translated segments that stores
each source-language segment together with its translation into the target language. Here,
a segment refers to a single sentence or a single paragraph.
Uses of TM
When a translator uses a TM tool to translate a new segment, the tool identifies
similarities between the segment in the Query data (the segment to be translated) and
the segments stored in the TM database. The translator may then choose one of the matches
to insert as is, or make slight changes to it to fit the given segment.
Benefits of TM
TM helps human translators increase their productivity.
It helps ensure that the same term is used consistently across translations.
It helps ensure a uniform style of translation across a large document.
Problem statement
Sentence-level TM
Most TM tools store sentence-level segments in the TM database. Hence the benefits of TM
are realized only for identical or similar sentences, which may occur rarely because sentences
are usually complex, whereas sentence fragments (clauses) may match more often.
In a TM comprising sentence-level segments, it may happen that an input sentence
contains a sub-segment (clause) whose translation is available in the TM. But the
search-and-retrieval function will not return any result, because the matching percentage (36%)
is lower than the threshold defined for the TM (75%).
Query data:
1. Earphone is the best option available for them as it doesn't disturb others' sleep
2. It doesn't disturb others' sleep
TM data:
1. Earphone is the best option available for them as it doesn't disturb others' sleep
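The threshold behaviour described above can be sketched in a few lines of Python. This is only an illustration: the slides do not say which similarity measure the TM tool uses, so the word-level `difflib.SequenceMatcher` ratio below is an assumption, and its scores will not reproduce the 36% figure quoted above. The names `match_percent` and `lookup` are hypothetical.

```python
from difflib import SequenceMatcher

THRESHOLD = 75.0  # percent; the threshold value quoted in the text


def match_percent(query: str, tm_segment: str) -> float:
    """Word-level similarity of two segments as a percentage (assumed metric)."""
    q = query.lower().split()
    t = tm_segment.lower().split()
    return 100.0 * SequenceMatcher(None, q, t).ratio()


def lookup(query: str, tm: list[str], threshold: float = THRESHOLD) -> list[tuple[str, float]]:
    """Return the TM segments whose score reaches the threshold, best first."""
    scored = [(seg, match_percent(query, seg)) for seg in tm]
    hits = [(seg, score) for seg, score in scored if score >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


sentence = "Earphone is the best option available for them as it doesn't disturb others' sleep"
clause = "It doesn't disturb others' sleep"

# Sentence-level TM: the clause scores below the threshold, so nothing is returned.
print(lookup(clause, [sentence]))

# Once the clause itself is stored in the TM, the same query is retrieved.
print(lookup(clause, [sentence, clause]))
```

The second call hints at the idea developed in the rest of the presentation: storing clauses next to full sentences lets short queries clear the threshold.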
Structuring of TM (clause-level)
In our approach we use clause splitting to define a clause-level structure for the TM. We split
each sentence into clauses and store the clauses along with the full sentence. So in the
clause-level structure, the TM contains the given sentences along with their clauses.
Retrieving clauses is desirable because a match is more likely to be found for a
clause than for a complex sentence (one that contains more than one clause). A clause expresses
a complete thought because it comprises a subject and a verb. Hence, even if a translator
does not find a match for the entire sentence, he or she may still get matches for its clauses
and benefit from them.
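One way to picture the clause-level structure is as a list of entries in which each clause keeps a pointer to the sentence it came from. The sketch below is an assumption about layout, not the dissertation's actual storage format: `TMEntry`, `build_clause_level_tm` and `toy_splitter` are hypothetical names, and target-language translations are left out because the slides do not describe how clause translations are stored.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class TMEntry:
    source: str                   # source-language segment
    kind: str                     # "S" for a full sentence, "C" for a clause
    parent: Optional[str] = None  # full sentence a clause was taken from


def build_clause_level_tm(sentences, split_clauses: Callable[[str], list[str]]):
    """Store every sentence, plus its clauses when it has more than one."""
    entries = []
    for sentence in sentences:
        entries.append(TMEntry(sentence, "S"))
        clauses = split_clauses(sentence)
        if len(clauses) > 1:  # simple sentences add no extra entries
            entries.extend(TMEntry(c, "C", parent=sentence) for c in clauses)
    return entries


def toy_splitter(sentence: str) -> list[str]:
    """Hypothetical stand-in for a real clause splitter: splits before ' as ' only."""
    head, sep, tail = sentence.partition(" as ")
    return [head, "as " + tail] if sep else [sentence]


tm = build_clause_level_tm(
    ["Earphone is the best option available for them as it doesn't disturb others' sleep"],
    toy_splitter,
)
for entry in tm:
    print(entry.kind, "|", entry.source)
```

Keeping the parent sentence on each clause entry lets a tool show the translator the full context of a clause match.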
Experiments and their Results
Clause Extraction
For the clause extraction task we used the parsing module of OpenNLP. OpenNLP provides machine
learning models that are trained on a “Penn Treebank POS” tagged corpus. OpenNLP[6] creates a
“constituency parse tree”, also known as a “phrase parse tree”.
Example: for the sentence
“Earphone is the best option available for them as it doesn't disturb others' sleep”
the generated bracket-notation tree is given below:
[TOP [S [NP [NN Earphone]] [VP [VBZ is] [NP [NP [DT the] [JJS best] [NN option]] [ADJP
[ADJP [JJ available] [PP [IN for] [NP [PRP them]]]] [SBAR [IN as] [S [NP [PRP it]] [VP
[VBD doesn't] [NP [ADJP [NN disturb] [JJ others']] [NN sleep]]]]]]]] [. .] ]]
From the pictorial representation of the parse tree we can see that there are two clauses:
Clause 1: Earphone is the best option available for them
Clause 2: as it doesn't disturb others' sleep
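The step from the bracket-notation tree to the two clauses can be illustrated with a small parser. This is a simplified sketch, not the clause splitter used in the dissertation and not the OpenNLP API: it assumes that every `SBAR` subtree marks a subordinate clause and that everything else belongs to the main clause. The function names `parse` and `split_at_sbar` are hypothetical.

```python
import re

# Bracket-notation tree from the example above.
TREE = (
    "[TOP [S [NP [NN Earphone]] [VP [VBZ is] [NP [NP [DT the] [JJS best] [NN option]] "
    "[ADJP [ADJP [JJ available] [PP [IN for] [NP [PRP them]]]] [SBAR [IN as] [S [NP [PRP it]] "
    "[VP [VBD doesn't] [NP [ADJP [NN disturb] [JJ others']] [NN sleep]]]]]]]] [. .] ]]"
)


def parse(text):
    """Parse bracket notation into nested (label, children) tuples."""
    tokens = re.findall(r"\[|\]|[^\s\[\]]+", text)
    pos = 0

    def node():
        nonlocal pos
        pos += 1                      # skip "["
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != "]":
            if tokens[pos] == "[":
                children.append(node())
            else:
                children.append(tokens[pos])   # a word (leaf)
                pos += 1
        pos += 1                      # skip "]"
        return (label, children)

    return node()


def split_at_sbar(tree):
    """Return the main clause followed by one string per SBAR subtree."""
    main, subordinate = [], []

    def walk(n, out):
        label, children = n
        if label == "SBAR":
            words = []
            subordinate.append(words)
            out = words               # words under SBAR go to a new clause
        for child in children:
            if isinstance(child, tuple):
                walk(child, out)
            elif label not in {".", ",", ":"}:   # drop punctuation leaves
                out.append(child)

    walk(tree, main)
    return [" ".join(main)] + [" ".join(words) for words in subordinate]


clauses = split_at_sbar(parse(TREE))
for number, clause in enumerate(clauses, start=1):
    print(f"Clause {number}: {clause}")
```

Real sentences need more care (relative clauses, coordinated clauses, parser errors), so this heuristic should be read as a starting point only.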
Experimental Data
For our experiments we created three test data sets for the English language: Set-A
contains 100 input sentences, Set-B contains 200 input sentences and Set-C contains 300 input
sentences. For the TM we have a data set of 3,500 sentences, which contains both simple
and complex sentences. A complex sentence contains more than one clause.
TM Configuration
In this dissertation we performed experiments with three TM configurations.
TM configuration 1
In TM configuration 1, the Query data contains full sentences and the TM data also contains full sentences.
There is no clause-level splitting in either the Query data or the TM data.
TM configuration 2
In TM configuration 2, the Query data contains full sentences. The TM data contains full sentences as well as
the clauses of those sentences. We used the clause splitter to structure the TM data in this configuration.
TM configuration 3
In TM configuration 3, the Query data contains each original sentence and, for complex sentences,
its clauses. So there is not just one query but a set of queries when looking into the TM database.
Similarly, the TM data contains the original sentences as well as the clauses of those sentences. In this
configuration the clause splitter is also used to structure the Query data.
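The three configurations differ only in which side is expanded with clauses, which can be sketched as below. The helper names `expand`, `configure` and `toy_splitter` are hypothetical, and the splitter is a toy stand-in for a real clause splitter; the second query sentence is invented purely for illustration.

```python
def toy_splitter(sentence):
    """Hypothetical stand-in for a real clause splitter: splits before ' as ' only."""
    head, sep, tail = sentence.partition(" as ")
    return [head, "as " + tail] if sep else [sentence]


def expand(segments, split_clauses):
    """Return each sentence followed by its clauses (complex sentences only)."""
    out = []
    for s in segments:
        out.append(s)
        clauses = split_clauses(s)
        if len(clauses) > 1:
            out.extend(clauses)
    return out


def configure(config, queries, tm, split_clauses):
    """Build (query set, TM set) for TM configuration 1, 2 or 3."""
    if config == 1:   # sentences on both sides
        return list(queries), list(tm)
    if config == 2:   # clauses added to the TM only
        return list(queries), expand(tm, split_clauses)
    if config == 3:   # clauses added to both the queries and the TM
        return expand(queries, split_clauses), expand(tm, split_clauses)
    raise ValueError(f"unknown TM configuration: {config}")


queries = [
    "It doesn't disturb others' sleep",
    "He stayed home as it was raining",   # invented example sentence
]
tm = ["Earphone is the best option available for them as it doesn't disturb others' sleep"]

for config in (1, 2, 3):
    q, t = configure(config, queries, tm, toy_splitter)
    print(f"TM configuration {config}: {len(q)} queries, {len(t)} TM segments")
```

Each query in the resulting set would then be looked up in the resulting TM set with whatever similarity threshold the TM tool applies.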
‘S’ denotes the sentences, ‘C’ denotes the clauses of these sentences, and ‘S, C’ denotes the sentences
together with their clauses.
Table 1: Result on Set-A
Table 2: Result on Set-B
Table 3: Result on Set-C
Conclusion 1
Table 1, Table 2 and Table 3 show that whenever there is clause splitting, either in the
Query data or in the TM data, there is an increase in % Match from TM.
Conclusion 2
The Query data set is different in Table 1, Table 2 and Table 3. Despite the
different data sets (Set-A, Set-B, Set-C), we observe an increasing trend in % Match
from TM for each of them. Hence we conclude that clause splitting
consistently increases the % Match over sentence-level data.
Conclusion 3
In Table 1, Table 2 and Table 3, the increase in % Match is not uniform. This is
because the % Match for a given data set depends on two factors:
the TM data and the Query data.
Summary
We have seen that when clause-level structuring is used in the TM, relevant matches for the
Query data that were earlier dropped because of a low sentence-level match percentage
are also retrieved in the result set.
So we get more relevant matches for the Query data from the TM database.
This study uses different TM configurations (TM configuration 1, TM configuration 2,
TM configuration 3) on different test data sets to support the above claim. A translator
might not get a match for a complete sentence but will still get a match for a
clause, which helps the translator perform the translation task better, thereby increasing
productivity (translated words per hour).
Acknowledgments
Firstly, I would like to express my sincere gratitude to my Supervisor, Dr. Pawan Kumar, for
his continuous support of my M.Tech. study and related research, and for his patience,
motivation, and immense knowledge. His guidance helped me throughout the research and
the writing of this dissertation.
I would also like to thank Dr. Mukul Kumar Sinha, not only for his insightful comments and
encouragement, but also for asking hard questions that helped me widen my research from
various perspectives.
I would also like to thank my colleague, Ms. Himanshi Thapliyal, for editing and
proofreading the dissertation.
Last but not least, I would like to express my love and gratitude to my beloved family for
their understanding and motivation throughout the duration of this project.
References
[1] Reinke, U. (2013). State of the Art in Translation Memory Technology. Proceedings of the Workshop on Natural Language Processing for Translation Memories (NLP4TM), pages 17–23.
[2] Grönroos, Mickel, and Becks, Ari (2005). Bringing Intelligence to Translation Memory Technology. Translating and the Computer 27, November 2005. London: Aslib.
[3] Timonera, Katerina, and Mitkov, Ruslan (Sept 2015). Improving Translation Memory Matching through Clause Splitting. Proceedings of the Workshop on Natural Language Processing for Translation Memories (NLP4TM), pages 17–23, Hissar, Bulgaria.
[4] Sharma, Sanjeev Kumar (2016). Clause Boundary Identification for Different Languages: A Survey. International Journal of Computer Applications & Information Technology, Vol. 8, Issue II, 2016 (ISSN: 2278-7720).
[5] https://stanfordnlp.github.io/CoreNLP/
[6] https://opennlp.apache.org/
[7] https://www.ibm.com/developerworks/library/x-localis3/
[8] LeBlanc, Matthieu. Translators on translation memory (TM). Results of an ethnographic study in three translation services and agencies. Université de Moncton.
[9] Christensen, Tina Paulsen, and Schjoldager, Anne (2011). The Impact of Translation-Memory (TM) Technology on Cognitive Processes. NLPSC 2011.
[10] Zerfass, A. (2002). Evaluating Translation Memory Systems. Proceedings of the LREC 2002 Workshop, Las Palmas, Canary Islands, Spain.
[11] Baldwin, Timothy, and Tanaka, Hozumi (2001). Balancing up Efficiency and Accuracy in Translation Retrieval. Journal of Natural Language Processing, vol. 8.
THANK YOU

Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 

2015ht13439 final presentation

  • 1. BIRLA INSTITUTE OF TECHNOLOGY & SCIENCE PILANI (RAJASTHAN) April, 2018 Structuring of Translation Memory By Ashutosh Kumar 2015HT13439 BITS ZG628T: Dissertation
  • 2. Introduction What is Translation Memory (TM)? A Translation Memory (TM) is an archive of previously translated segments that stores each source-language segment together with its translation into the target language. Here, a segment refers to a single sentence or a single paragraph. Uses of TM: When a translator uses a TM tool to translate a new segment, the tool identifies similarities between the query segment (the segment that has to be translated) and the segments stored in the TM database. The translator may then choose one of the retrieved translations to insert as it is, or adapt it slightly to fit the given segment. Benefits of TM: TM helps human translators increase their productivity. It helps ensure that the same term is used consistently across a translation. It helps ensure a uniform style of translation across a large document.
  • 3. Problem statement Sentence-level TM: Most TM tools store sentence-level segments in the TM database. Hence the benefits of TM are realized only for identical or similar sentences, which occur rarely because sentences are usually complex, whereas sentence fragments (clauses) may match more often. In a TM comprising sentence-level segments, it may happen that an input sentence consists of a sub-segment (clause) whose translation is already available in the TM as part of a longer sentence. The search-and-retrieval function will still show no result, because the matching percentage (36%) is lower than the threshold defined for the TM (75%). Query data: (1) Earphone is the best option available for them as it doesn't disturb others' sleep; (2) It doesn't disturb others' sleep. TM data: (1) Earphone is the best option available for them as it doesn't disturb others' sleep.
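The threshold check described on this slide can be illustrated with a small sketch. The slides do not say which similarity measure the TM tool uses; the code below assumes a word-level edit distance normalised by the longer segment, which is one common fuzzy-match score. The function names (`word_edit_distance`, `match_percent`, `is_match`) are hypothetical and chosen only for this illustration.

```python
THRESHOLD = 75.0  # match threshold (in percent) quoted on the slide


def word_edit_distance(a, b):
    """Levenshtein distance between two lists of words."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete a word
                            curr[j - 1] + 1,      # insert a word
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def match_percent(query, segment):
    """Assumed metric: 100 * (1 - distance / length of the longer segment)."""
    q, s = query.lower().split(), segment.lower().split()
    if not q and not s:
        return 100.0
    return 100.0 * (1 - word_edit_distance(q, s) / max(len(q), len(s)))


def is_match(query, segment, threshold=THRESHOLD):
    return match_percent(query, segment) >= threshold


TM_SENTENCE = ("Earphone is the best option available for them "
               "as it doesn't disturb others' sleep")
QUERY = "It doesn't disturb others' sleep"

print(round(match_percent(QUERY, TM_SENTENCE)))  # 36
print(is_match(QUERY, TM_SENTENCE))              # False
```

Under this assumed metric the clause-only query scores about 36% against the stored sentence (9 of its 14 words have to be deleted), so it falls below the 75% threshold even though its translation is contained in the TM entry.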
  • 4. Introduction Structuring of TM (clause-level): In our approach we use clause splitting to define a clause-level structure for the TM. We split each sentence into clauses and store the clauses alongside the full sentence, so the TM contains the given sentences together with their clauses. Retrieving clauses is desirable because a match is more likely to be found for a clause than for a complex sentence (one containing more than one clause). A clause expresses a complete thought because it comprises a subject and a verb. Hence, even if a translator does not find a match for the entire sentence, he or she may still get matches for its clauses and thus benefit from the TM.
  • 5. Experiments and their Results Clause Extraction: For the clause extraction task we used the parsing module of OpenNLP[6]. OpenNLP provides machine learning models which are trained on the “Penn Treebank POS” tagged corpus. It creates a “constituency parse tree”, also known as a “phrase parse tree”. Example: for the sentence “Earphone is the best option available for them as it doesn't disturb others' sleep”, the generated bracket-notation tree is given below: [TOP [S [NP [NN Earphone]] [VP [VBZ is] [NP [NP [DT the] [JJS best] [NN option]] [ADJP [ADJP [JJ available] [PP [IN for] [NP [PRP them]]]] [SBAR [IN as] [S [NP [PRP it]] [VP [VBD doesn't] [NP [ADJP [NN disturb] [JJ others']] [NN sleep]]]]]]]] [. .] ]]
  • 6. Experiments and their Results From the pictorial representation of the parse tree we can see that there are two clauses. Clause 1: Earphone is the best option available for them; Clause 2: as it doesn't disturb others' sleep
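As a rough illustration of how the two clauses can be read off the bracket-notation tree, the sketch below parses the bracket string and treats every SBAR subtree as a subordinate clause, with the remaining words of the sentence forming the main clause. This SBAR rule is an assumption made for the example; the slides do not state the exact splitting rule used in the dissertation, and the code does not call OpenNLP itself. All function names are hypothetical.

```python
TREE = (
    "[TOP [S [NP [NN Earphone]] "
    "[VP [VBZ is] [NP [NP [DT the] [JJS best] [NN option]] "
    "[ADJP [ADJP [JJ available] [PP [IN for] [NP [PRP them]]]] "
    "[SBAR [IN as] [S [NP [PRP it]] "
    "[VP [VBD doesn't] [NP [ADJP [NN disturb] [JJ others']] [NN sleep]]]]]]]] "
    "[. .] ]]"
)


def tokenize(text):
    """Split a bracketed tree into '[', ']' and label/word tokens."""
    return text.replace("[", " [ ").replace("]", " ] ").split()


def parse(tokens, pos=0):
    """Parse the node starting at tokens[pos] == '['; return (node, next_pos)."""
    label = tokens[pos + 1]
    pos += 2
    children = []
    while tokens[pos] != "]":
        if tokens[pos] == "[":
            child, pos = parse(tokens, pos)
            children.append(child)
        else:
            children.append(tokens[pos])  # a leaf word
            pos += 1
    return (label, children), pos + 1


def leaves(node, skip):
    """Collect the words under a node, ignoring child subtrees labelled in `skip`."""
    words = []
    for child in node[1]:
        if isinstance(child, str):
            words.append(child)
        elif child[0] not in skip:
            words.extend(leaves(child, skip))
    return words


def find(node, label):
    """Return every subtree with the given label, in left-to-right order."""
    found = [node] if node[0] == label else []
    for child in node[1]:
        if not isinstance(child, str):
            found.extend(find(child, label))
    return found


def split_clauses(tree_text):
    """Assumed rule: main clause = sentence minus SBAR subtrees; each SBAR = one clause."""
    tree, _ = parse(tokenize(tree_text))
    skip = {"SBAR", "."}  # leave out nested subordinate clauses and final punctuation
    clauses = [" ".join(leaves(tree, skip))]
    clauses += [" ".join(leaves(sbar, skip)) for sbar in find(tree, "SBAR")]
    return clauses


for number, clause in enumerate(split_clauses(TREE), start=1):
    print(f"Clause {number}: {clause}")
```

For the example tree this yields the same two clauses listed on the slide. Other clause types (for instance coordinated S nodes) would need additional rules, which are outside this sketch.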
  • 7. Experiments and their Results Experimental Data: For our experiments we created three test data sets for the English language. Set-A contains 100 input sentences, Set-B contains 200 input sentences and Set-C contains 300 input sentences. For the TM we have a data set of 3,500 sentences, comprising both simple and complex sentences; a complex sentence contains more than one clause. TM Configuration: In this dissertation we performed experiments with three kinds of TM configuration. TM configuration 1: Query data contains full sentences, and TM data also contains full sentences. There is no clause-level splitting in either the Query data or the TM data. TM configuration 2: Query data contains full sentences. TM data contains full sentences as well as the clauses of those sentences. The clause splitter is used to structure the TM data in this configuration. TM configuration 3: Query data contains the original sentence and, for complex sentences, its clauses, so a set of queries rather than a single query is looked up in the TM database. Similarly, TM data contains the original sentences as well as the clauses of those sentences. The clause splitter is used to structure the Query data as well in this configuration.
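The three configurations can be sketched on toy data as follows. Several things here are assumptions made for illustration and are not taken from the dissertation: the similarity score (Python's `difflib.SequenceMatcher` over word lists), the definition of % Match (share of queries with at least one TM entry at or above the threshold), the lookup table standing in for a real clause splitter, and the second query sentence. The printed figures describe only this toy data, not the reported experiments.

```python
from difflib import SequenceMatcher

THRESHOLD = 0.75  # 75% threshold from the problem statement

TM_SENTENCES = [
    "Earphone is the best option available for them as it doesn't disturb others' sleep",
]
QUERIES = [
    "It doesn't disturb others' sleep",
    "She wears an earphone at night as it doesn't disturb others' sleep",  # invented example
]

# Hypothetical stand-in for a clause splitter: a table of known splits.
CLAUSES = {
    TM_SENTENCES[0]: [
        "Earphone is the best option available for them",
        "as it doesn't disturb others' sleep",
    ],
    QUERIES[1]: [
        "She wears an earphone at night",
        "as it doesn't disturb others' sleep",
    ],
}


def split_clauses(sentence):
    return CLAUSES.get(sentence, [])


def similarity(a, b):
    """Assumed word-level similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def expand(sentences):
    """Return each sentence followed by its clauses (the 'S, C' structure)."""
    expanded = []
    for sentence in sentences:
        expanded.append(sentence)
        expanded.extend(split_clauses(sentence))
    return expanded


def percent_match(queries, tm_sentences, config):
    """Share of queries (in percent) that retrieve at least one TM entry."""
    tm = tm_sentences if config == 1 else expand(tm_sentences)
    hits = 0
    for query in queries:
        sub_queries = expand([query]) if config == 3 else [query]
        if any(similarity(sq, seg) >= THRESHOLD for sq in sub_queries for seg in tm):
            hits += 1
    return 100.0 * hits / len(queries)


for config in (1, 2, 3):
    print(f"TM configuration {config}: {percent_match(QUERIES, TM_SENTENCES, config):.0f}% match")
```

On this toy data the sketch reports 0%, 50% and 100% for configurations 1, 2 and 3: the first query only matches once the TM holds clauses, and the second only matches once the query itself is split into clauses.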
  • 8. Experiment and Results ‘S’ denotes the sentences, ‘C’ denotes the clauses of these sentences, and ‘S, C’ denotes the sentences together with their clauses. Result on Set-A: Table 1
  • 9. Experiment and Results Result on Set-B: Table 2. Result on Set-C: Table 3
  • 10. Experiment and Results Conclusion 1: Tables 1, 2 and 3 show that whenever clause splitting is applied, either to the Query data or to the TM data, the % Match from the TM increases. Conclusion 2: The Query data set is different for each of Tables 1, 2 and 3. Despite the different data sets (Set-A, Set-B, Set-C), we observe an increasing trend in % Match from the TM for each of them. Hence we conclude that, across all three data sets, clause splitting increases the % Match over sentence-level data. Conclusion 3: In Tables 1, 2 and 3, the increase in % Match is not uniform across cases. This is to be expected, because the % Match for a given data set depends on two factors: the TM data and the Query data.
  • 11. Summary We have seen that when clause-level structuring is used in the TM, relevant matches for the Query data that were previously dropped because of a low sentence-level match percentage are also retrieved in the result set. So we get more relevant matches for the Query data from the TM database. This study uses different TM configurations (TM configuration 1, TM configuration 2, TM configuration 3) to support this claim on different test data sets. A translator might not get a match for a complete sentence but will still get a match for a clause, which helps the translator perform the translation task better and thereby increases productivity (translated words per hour).
  • 12. Acknowledgments Firstly, I would like to express my sincere gratitude to my supervisor, Dr. Pawan Kumar, for his continuous support of my M.Tech. study and related research, and for his patience, motivation, and immense knowledge. His guidance helped me throughout the research and the writing of this dissertation. I would also like to thank Dr. Mukul Kumar Sinha for his insightful comments and encouragement, and for asking the hard questions that helped me widen my research from various perspectives. I would also like to thank my colleague, Ms. Himanshi Thapliyal, for editing and proofreading the dissertation. Last but not least, I would like to express my love and gratitude to my beloved family for their understanding and motivation throughout the duration of this project.
  • 13. References [1] Reinke, U. (2013). State of the Art in Translation Memory Technology. Proceedings of the Workshop on Natural Language Processing for Translation Memories (NLP4TM), pages 17–23. [2] Grönroos, Mickel, and Becks, Ari (2005). Bringing Intelligence to Translation Memory Technology. Translating and the Computer 27, November 2005. London: Aslib. [3] Timonera, Katerina, and Mitkov, Ruslan (Sept 2015). Improving Translation Memory Matching through Clause Splitting. Proceedings of the Workshop on Natural Language Processing for Translation Memories (NLP4TM), pages 17–23, Hissar, Bulgaria. [4] Sharma, Sanjeev Kumar (2016). Clause Boundary Identification for Different Languages: A Survey. International Journal of Computer Applications & Information Technology, Vol. 8, Issue II, 2016 (ISSN: 2278-7720). [5] https://stanfordnlp.github.io/CoreNLP/ [6] https://opennlp.apache.org/ [7] https://www.ibm.com/developerworks/library/x-localis3/ [8] LeBlanc, Matthieu. Translators on translation memory (TM): Results of an ethnographic study in three translation services and agencies. Université de Moncton. [9] Christensen, Tina Paulsen, and Schjoldager, Anne (2011). The Impact of Translation-Memory (TM) Technology on Cognitive Processes. NLPSC 2011. [10] Zerfass, A. (2002). Evaluating Translation Memory Systems. Proceedings of the LREC 2002 Workshop, Las Palmas, Canary Islands, Spain. [11] Baldwin, Timothy, and Tanaka, Hozumi (2001). Balancing up Efficiency and Accuracy in Translation Retrieval. Journal of Natural Language Processing, vol. 8.
