Combining machine translated sentence chunks
from multiple MT systems
Matīss Rikters and Inguna Skadiņa
17th International Conference on Intelligent Text Processing and Computational Linguistics
Konya, Turkey
April 5, 2016
Contents
 Hybrid Machine Translation
 Multi-System Hybrid MT
 Simple combining of translations
 Combining full whole translations
 Combining translations of sentence chunks
 Combining translations of linguistically motivated chunks
 Other work
 Future plans
Hybrid Machine Translation
 Statistical rule generation
 Rules for RBMT systems are generated from training corpora
 Multi-pass
 Process data through RBMT first, and then through SMT
 Multi-System hybrid MT
 Multiple MT systems run in parallel
Multi-System Hybrid MT
Related work:
 SMT + RBMT (Ahsan and Kolachina, 2010)
 Confusion Networks (Barrault, 2010)
 + Neural Network Model (Freitag et al., 2015)
 SMT + EBMT + TM + NE (Pal et al., 2014)
 Recursive sentence decomposition (Mellebeek et al., 2006)
Combining Translations
 Combining full whole translations
 Translate the full input sentence with multiple MT systems
 Choose the best translation as the output
 Combining translations of sentence chunks
 Split the sentence into smaller chunks
 The chunks are the top level subtrees of the syntax tree of the sentence
 Translate each chunk with multiple MT systems
 Choose the best translated chunks and combine them
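The chunking step can be sketched as follows. This is a minimal illustration, assuming the parser emits a Penn-Treebank-style bracketed tree; the function names `parse_bracketed`, `leaves` and `top_level_chunks` are hypothetical, and the example tree is hand-written rather than real parser output.

```python
from typing import List, Union

Tree = Union[str, list]  # a leaf token, or [label, child, child, ...]


def parse_bracketed(text: str) -> list:
    """Parse a Penn-Treebank-style bracketed tree into nested lists."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> list:
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if tokens[pos] not in ("(", ")"):  # the root label may be empty
            label = tokens[pos]
            pos += 1
        node: list = [label]
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.append(read())
            else:
                node.append(tokens[pos])  # leaf word
                pos += 1
        pos += 1  # consume ")"
        return node

    return read()


def leaves(node: Tree) -> List[str]:
    """Collect the words under a node, left to right."""
    if isinstance(node, str):
        return [node]
    words: List[str] = []
    for child in node[1:]:
        words.extend(leaves(child))
    return words


def top_level_chunks(tree: list) -> List[str]:
    """Return one chunk per top-level subtree of the sentence node."""
    # Skip wrapper nodes that have a single subtree (e.g. ROOT over S).
    while len(tree) == 2 and isinstance(tree[1], list):
        tree = tree[1]
    return [" ".join(leaves(child)) for child in tree[1:]]


# Hand-written example tree (an assumption, not actual parser output).
example = ("(ROOT (S (ADVP (RB Recently)) (NP (EX there)) "
           "(VP (VBZ has) (VP (VBN been) (NP (DT an) (JJ increased) (NN interest)))) (. .)))")
print(top_level_chunks(parse_bracketed(example)))
```

Each chunk would then be sent to every MT system separately, and the translated chunks joined back in the original order.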
Combining full whole translations
Workflow: Sentence tokenization → Translation with the online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best translation → Output
Combining full whole translations
Choosing the best translation:
KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:

$$2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$

where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
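As a worked illustration of the perplexity definition, the sketch below computes it from per-token log probabilities. It assumes the scores are base-10 logarithms (the base does not change the result as long as it is used consistently); the function name `perplexity` is hypothetical.

```python
import math
from typing import Sequence


def perplexity(log10_probs: Sequence[float]) -> float:
    """Perplexity of a sample from the model's per-token log10 probabilities."""
    if not log10_probs:
        raise ValueError("need at least one token probability")
    mean_log = sum(log10_probs) / len(log10_probs)
    return 10.0 ** (-mean_log)


# A model that assigns probability 0.25 to each of four tokens
# has a perplexity close to 4 on that sample.
sample = [math.log10(0.25)] * 4
print(perplexity(sample))
```

A lower perplexity means the language model finds the sentence more predictable, which is the basis for preferring one candidate translation over another.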
Combining full whole translations
Choosing the best translation:
 A 5-gram language model was trained with
 KenLM
 JRC-Acquis corpus v. 3.0 (Steinberger, 2006) - 1.4 million Latvian legal
domain sentences
 Sentences are scored with the query program that comes with KenLM
 Test data
 1581 random sentences from the JRC-Acquis corpus
 Also tested on the ACCURAT balanced evaluation corpus (512 general-domain sentences; Skadiņš et al., 2010), but the results were not as good
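The selection step can be sketched as below. This is a minimal illustration, not the authors' implementation: `select_best` and its `score` parameter are hypothetical names, `score` is assumed to return a perplexity (lower is better), and the label "Equal" is assumed to mean that the top candidates received the same score.

```python
from typing import Callable, Dict, Tuple


def select_best(candidates: Dict[str, str],
                score: Callable[[str], float]) -> Tuple[str, str]:
    """Return (system, translation) with the lowest score.

    The system label is "Equal" when several candidates share the best score.
    """
    scored = {name: score(text) for name, text in candidates.items()}
    best = min(scored.values())
    winners = [name for name, value in scored.items() if value == best]
    label = winners[0] if len(winners) == 1 else "Equal"
    return label, candidates[winners[0]]


# Toy perplexities standing in for a real language model (made-up values).
toy_ppl = {"a b c": 120.0, "a c b": 95.5, "c b a": 95.5}
print(select_best({"Google": "a b c", "Bing": "a c b"}, lambda s: toy_ppl[s]))
```

With the `kenlm` Python bindings installed, `kenlm.Model(path).perplexity` could be passed as `score`; the slides themselves use the `query` command-line program.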
Combining full whole translations
Columns after BLEU show the hybrid selection per system.

| System | BLEU | Google | Bing | LetsMT | Equal |
|---|---|---|---|---|---|
| Google Translate | 16.92 | 100 % | - | - | - |
| Bing Translator | 17.16 | - | 100 % | - | - |
| LetsMT | 28.27 | - | - | 100 % | - |
| Hybrid Google + Bing | 17.28 | 50.09 % | 45.03 % | - | 4.88 % |
| Hybrid Google + LetsMT | 22.89 | 46.17 % | - | 48.39 % | 5.44 % |
| Hybrid LetsMT + Bing | 22.83 | - | 45.35 % | 49.84 % | 4.81 % |
| Hybrid Google + Bing + LetsMT | 21.08 | 28.93 % | 34.31 % | 33.98 % | 2.78 % |

May 2015 (Rikters 2015)
Combining translated chunks of sentences
Workflow: Sentence tokenization → Syntactic analysis → Sentence chunking → Translation with the online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best chunks → Sentence recomposition → Output
 Syntactic analysis:
 Berkeley Parser (Petrov et al., 2006)
 Sentences are split into chunks from the top level subtrees
of the syntax tree
 Selection of the best chunk:
 5-gram LM trained with KenLM and the JRC-Acquis corpus
 Sentences are scored with the query program that comes with KenLM
 Test data
 1581 random sentences from the JRC-Acquis corpus
 Tested with the ACCURAT balanced evaluation corpus,
but the results were not as good
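Chunk selection and sentence recomposition can be sketched as follows. This is a simplified illustration under assumptions: `combine_chunks` and `score` are hypothetical names, `score` returns a perplexity-like value where lower is better, and ties go to the system listed first.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def combine_chunks(chunk_translations: List[Dict[str, str]],
                   score: Callable[[str], float]) -> Tuple[str, Counter]:
    """Pick the lowest-scoring translation of each chunk and join them in order.

    Also counts how many chunks were taken from each system.
    """
    chosen: List[str] = []
    picks: Counter = Counter()
    for options in chunk_translations:
        system = min(options, key=lambda name: score(options[name]))
        chosen.append(options[system])
        picks[system] += 1
    return " ".join(chosen), picks


# Toy scores standing in for language model perplexities (made-up values).
toy = {"A1": 3.0, "B1": 2.0, "A2": 1.0, "B2": 5.0}
sentence, picks = combine_chunks(
    [{"Google": "A1", "LetsMT": "B1"}, {"Google": "A2", "LetsMT": "B2"}],
    lambda text: toy[text],
)
print(sentence, dict(picks))
```

Dividing each count in `picks` by the total number of chunks gives per-system selection shares of the kind reported in the hybrid selection columns.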
Combining translated chunks of sentences
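Columns after BLEU show the hybrid selection per system. Single-system rows report one BLEU score.

| System | BLEU (MSMT / SyMHyT) | Google | Bing | LetsMT |
|---|---|---|---|---|
| Google Translate | 18.09 | 100% | - | - |
| Bing Translator | 18.87 | - | 100% | - |
| LetsMT | 30.28 | - | - | 100% |
| Hybrid Google + Bing | 18.73 / 21.27 | 74% | 26% | - |
| Hybrid Google + LetsMT | 24.50 / 26.24 | 25% | - | 75% |
| Hybrid LetsMT + Bing | 24.66 / 26.63 | - | 24% | 76% |
| Hybrid Google + Bing + LetsMT | 22.69 / 24.72 | 17% | 18% | 65% |

September 2015 (Rikters and Skadiņa 2016)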
Combining translations of
linguistically motivated chunks
 An advanced approach to chunking
 Traverse the syntax tree bottom-up, from right to left
 Add a word to the current chunk if
 The current chunk is not too long (sentence word count / 4)
 The word is non-alphabetic or only one character long
 The word begins a genitive phrase («of »)
 Otherwise, start a new chunk with the word
 If chunking results in too many chunks, repeat the process, allowing more than (sentence word count / 4) words in a chunk
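The chunking rules above can be sketched roughly as below. This is a simplification under several loudly stated assumptions, not the ChunkMT implementation: it walks a flat token list from right to left instead of a syntax tree, it treats the three conditions as alternatives (any one is enough), and the `max_chunks` threshold for "too many chunks" is a hypothetical parameter, since the slides do not give a value.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_chunks: int = 4) -> List[List[str]]:
    """Group tokens into chunks, scanning right to left (flat-list simplification)."""
    if max_chunks < 1:
        raise ValueError("max_chunks must be at least 1")
    if not tokens:
        return []
    limit = max(1, len(tokens) // 4)  # initial chunk length limit
    while True:
        chunks: List[List[str]] = []
        for word in reversed(tokens):
            attach = bool(chunks) and (
                len(chunks[0]) < limit      # current chunk is not too long
                or not word.isalpha()       # non-alphabetic token
                or len(word) == 1           # single-character token
                or word.lower() == "of"     # assumed genitive marker
            )
            if attach:
                chunks[0].insert(0, word)   # extend the current (leftmost) chunk
            else:
                chunks.insert(0, [word])    # start a new chunk
        if len(chunks) <= max_chunks:
            return chunks
        limit += 1  # too many chunks: allow longer chunks and retry


sentence = "Recently there has been an increased interest in the automated discovery of equivalent expressions in different languages ."
for chunk in chunk_tokens(sentence.split()):
    print(" ".join(chunk))
```

Because it ignores the syntax tree, this sketch will not in general produce the same chunk boundaries as a tree-based chunker; it only illustrates the length limit, the attachment conditions and the retry loop.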
 Changes in the MT API systems
 LetsMT API temporarily replaced with Hugo.lv API
 Added Yandex API
Combining translations of
linguistically motivated chunks
Selection of the best translation:
 6-gram and 12-gram LMs trained with
 KenLM
 JRC-Acquis corpus v. 3.0
 DGT-Translation Memory corpus (Steinberger et al., 2013) – 3.1 million Latvian legal domain sentences
 Sentences scored with the query program from KenLM
 Test data
 1581 random sentences from the JRC-Acquis corpus
 ACCURAT balanced evaluation corpus
Combining translations of
linguistically motivated chunks
Sentence chunks with SyMHyT:
• Recently
• there
• has been an increased interest in the automated discovery of equivalent expressions in different languages
• .
Sentence chunks with ChunkMT:
• Recently there has been an increased interest
• in the automated discovery of equivalent expressions
• in different languages .
Combining translations of
linguistically motivated chunks
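Columns after BLEU show the hybrid selection per system.

| System | BLEU | Equal | Bing | Google | Hugo | Yandex |
|---|---|---|---|---|---|---|
| Single-system BLEU | - | - | 17.43 | 17.73 | 17.14 | 16.04 |
| MSMT - Google + Bing | 17.70 | 7.25% | 43.85% | 48.90% | - | - |
| MSMT - Google + Bing + LetsMT | 17.63 | 3.55% | 33.71% | 30.76% | 31.98% | - |
| SyMHyT - Google + Bing | 17.95 | 4.11% | 19.46% | 76.43% | - | - |
| SyMHyT - Google + Bing + LetsMT | 17.30 | 3.88% | 15.23% | 19.48% | 61.41% | - |
| ChunkMT - Google + Bing | 18.29 | 22.75% | 39.10% | 38.15% | - | - |
| ChunkMT - all four | 19.21 | 7.36% | 30.01% | 19.47% | 32.25% | 10.91% |

January 2016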
Related publications
• Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian". ACL-IJCNLP 2015
• Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation". LREC 2016
Work in progress
K-translate - interactive multi-system machine translation
 About the same as ChunkMT, but with a nice user interface
 Draws a syntax tree with chunks highlighted
 Shows which chunks were chosen from which system
 Provides a confidence score for the choices
 Allows using online APIs or user-provided machine translations
 Comes with resources for translating between English, French, German and Latvian
 Can be used in a web browser
[Diagram: K-translate page flow. Pages shown: Start page; Translate with online systems; Input translations to combine; Input translated chunks; Settings; Input source sentence; Translation results]
Code on GitHub
http://ej.uz/ChunkMT
http://ej.uz/SyMHyT
http://ej.uz/MSMT
http://ej.uz/chunker
Future work
 More enhancements for the chunking step
 Add special processing of multi-word expressions (MWEs)
 Try out other types of LMs
 POS tag + lemma
 Recurrent Neural Network Language Model
(Mikolov et al., 2010)
 Continuous Space Language Model
(Schwenk et al., 2006)
 Character-Aware Neural Language Model
(Kim et al., 2015)
 Choose the best translation candidate with MT quality estimation
 QuEst++ (Specia et al., 2015)
 SHEF-NN (Shah et al., 2015)
Future ideas
References
 Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA - The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado (2010).
 Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
 Pal, Santanu, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
 Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
 Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
 Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
 Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
 Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
 Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192, 125-132 (2010).
 Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
 Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006.
 Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
 Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
 Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
Thank you!
Questions?

Climate Impact of Software Testing at Nordic Testing Days
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
 
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptxSecstrike : Reverse Engineering & Pwnable tools for CTF.pptx
Secstrike : Reverse Engineering & Pwnable tools for CTF.pptx
 
Generative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to ProductionGenerative AI Deep Dive: Advancing from Proof of Concept to Production
Generative AI Deep Dive: Advancing from Proof of Concept to Production
 
GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...GridMate - End to end testing is a critical piece to ensure quality and avoid...
GridMate - End to end testing is a critical piece to ensure quality and avoid...
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdfSmart TV Buyer Insights Survey 2024 by 91mobiles.pdf
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf
 
PCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase TeamPCI PIN Basics Webinar from the Controlcase Team
PCI PIN Basics Webinar from the Controlcase Team
 
By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024By Design, not by Accident - Agile Venture Bolzano 2024
By Design, not by Accident - Agile Venture Bolzano 2024
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
 

Combining machine translated sentence chunks from multiple MT systems

  • 1. Combining machine translated sentence chunks from multiple MT systems. Matīss Rikters and Inguna Skadiņa. 17th International Conference on Intelligent Text Processing and Computational Linguistics, Konya, Turkey, April 5, 2016.
  • 2. Contents: Hybrid Machine Translation; Multi-System Hybrid MT; Simple combining of translations; Combining full whole translations; Combining translations of sentence chunks; Combining translations of linguistically motivated chunks; Other work; Future plans.
  • 3. Hybrid Machine Translation. Statistical rule generation: rules for RBMT systems are generated from training corpora. Multi-pass: process data through RBMT first, and then through SMT. Multi-system hybrid MT: multiple MT systems run in parallel.
  • 4. Multi-System Hybrid MT. Related work: SMT + RBMT (Ahsan and Kolachina, 2010); Confusion Networks (Barrault, 2010), + Neural Network Model (Freitag et al., 2015); SMT + EBMT + TM + NE (Santanu et al., 2014); Recursive sentence decomposition (Mellebeek et al., 2006).
  • 6. Combining Translations. Combining full whole translations: translate the full input sentence with multiple MT systems, then choose the best translation as the output. Combining translations of sentence chunks: split the sentence into smaller chunks (the chunks are the top-level subtrees of the syntax tree of the sentence), translate each chunk with multiple MT systems, then choose the best translated chunks and combine them.
  • 7. Combining full whole translations (workflow diagram): Sentence tokenization; Translation with the online MT APIs (Google Translate, Bing Translator, LetsMT); Selection of the best translation; Output.
  • 8. Combining full whole translations. Choosing the best translation: KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:
      $$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
      where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:
      $$2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$
      where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by determining how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
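The backoff lookup and the perplexity formula on this slide can be illustrated with a small, self-contained sketch. This is not KenLM's implementation or API: the function names, the dictionary-based model layout, and the toy log10 values are all assumptions made for illustration, loosely following the ARPA convention of storing log10 probabilities and backoff weights.

```python
from typing import Dict, Sequence, Tuple

NGram = Tuple[str, ...]


def backoff_log10_prob(ngram: NGram,
                       log_probs: Dict[NGram, float],
                       log_backoffs: Dict[NGram, float]) -> float:
    """Return log10 p(w_n | w_1^{n-1}) using the longest n-gram found in the model.

    Each time a longer n-gram is missing, the backoff weight of the context
    being abandoned is added (in log space), mirroring the product of b(.) terms.
    """
    penalty = 0.0
    for start in range(len(ngram)):
        candidate = ngram[start:]
        if candidate in log_probs:
            return log_probs[candidate] + penalty
        # A context without a stored backoff weight contributes log10(1) = 0.
        penalty += log_backoffs.get(candidate[:-1], 0.0)
    # Simplification: a real model would map unknown words to an <unk> entry.
    raise KeyError(f"word not in toy model: {ngram[-1]!r}")


def perplexity(log10_word_probs: Sequence[float]) -> float:
    """Perplexity of a sample from its per-word log10 probabilities."""
    average = sum(log10_word_probs) / len(log10_word_probs)
    return 10.0 ** (-average)


# Hypothetical toy model (values are made up).
toy_probs = {("the",): -1.0, ("cat",): -2.0, ("the", "cat"): -0.5}
toy_backoffs = {("the",): -0.3, ("a",): -0.4}

seen = backoff_log10_prob(("the", "cat"), toy_probs, toy_backoffs)
backed_off = backoff_log10_prob(("a", "cat"), toy_probs, toy_backoffs)
print(seen, backed_off, perplexity([seen, backed_off]))
```

Perplexity does not depend on the logarithm base, so computing it with base 10 here agrees with the base-2 form of the formula.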
  • 10. Combining full whole translations. Choosing the best translation: a 5-gram language model was trained with KenLM on the JRC-Acquis corpus v. 3.0 (Steinberger, 2006), which contains 1.4 million Latvian legal domain sentences; sentences are scored with the query program that comes with KenLM. Test data: 1581 random sentences from the JRC-Acquis corpus. The system was also tested with the ACCURAT balanced evaluation corpus of 512 general domain sentences (Skadiņš et al., 2010), but the results were not as good.
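The selection step can be sketched as follows, assuming that the candidate with the lowest language-model perplexity is treated as the best translation. The function name, the `perplexity` callback, and the placeholder strings are hypothetical; in practice the callback would wrap whatever scorer is available (for example, scores parsed from the KenLM query program's output).

```python
from typing import Callable, Dict, Tuple


def select_best_translation(candidates: Dict[str, str],
                            perplexity: Callable[[str], float]) -> Tuple[str, str]:
    """Return (system name, translation) for the lowest-perplexity candidate.

    `candidates` maps an MT system name to its translation of one sentence.
    Ties are resolved in favour of the system listed first.
    """
    if not candidates:
        raise ValueError("no candidate translations supplied")
    best_system = min(candidates, key=lambda name: perplexity(candidates[name]))
    return best_system, candidates[best_system]


# Placeholder translations and made-up perplexities, for illustration only.
toy_scores = {"translation A": 310.0, "translation B": 120.5, "translation C": 198.2}
toy_candidates = {
    "Google": "translation A",
    "Bing": "translation B",
    "LetsMT": "translation C",
}

system, text = select_best_translation(toy_candidates, toy_scores.__getitem__)
print(system, text)
```

Passing the scorer as a callback keeps the selection logic independent of how the language model is queried.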
  • 11. Combining full whole translations. Results, May 2015 (Rikters 2015). The last four columns show the hybrid selection, i.e. the share of output taken from each system or equal across systems:
      System | BLEU | Google | Bing | LetsMT | Equal
      Google Translate | 16.92 | 100 % | - | - | -
      Bing Translator | 17.16 | - | 100 % | - | -
      LetsMT | 28.27 | - | - | 100 % | -
      Hybrid Google + Bing | 17.28 | 50.09 % | 45.03 % | - | 4.88 %
      Hybrid Google + LetsMT | 22.89 | 46.17 % | - | 48.39 % | 5.44 %
      Hybrid LetsMT + Bing | 22.83 | - | 45.35 % | 49.84 % | 4.81 %
      Hybrid Google + Bing + LetsMT | 21.08 | 28.93 % | 34.31 % | 33.98 % | 2.78 %
  • 12. Combining translated chunks of sentences (workflow diagram): Sentence tokenization; Syntactic analysis; Sentence chunking; Translation with the online MT APIs (Google Translate, Bing Translator, LetsMT); Selection of the best chunks; Sentence recomposition; Output.
  • 15. Combining translated chunks of sentences. Syntactic analysis: Berkeley Parser (Petrov et al., 2006); sentences are split into chunks from the top-level subtrees of the syntax tree. Selection of the best chunk: 5-gram LM trained with KenLM and the JRC-Acquis corpus; sentences are scored with the query program that comes with KenLM. Test data: 1581 random sentences from the JRC-Acquis corpus. The system was also tested with the ACCURAT balanced evaluation corpus, but the results were not as good.
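Splitting a sentence into the top-level subtrees of its syntax tree can be sketched as below. The sketch assumes a Penn Treebank style bracketed parse with a single labelled root, such as `(S (NP ...) (VP ...))`; the function name and the example parse are hypothetical, and a parser's actual output (for instance an extra unlabelled outer bracket) may need to be unwrapped first.

```python
from typing import List


def top_level_chunks(parse: str) -> List[str]:
    """Return the words under each direct child of the root, one string per child."""
    tokens = parse.replace("(", " ( ").replace(")", " ) ").split()
    chunks: List[str] = []
    current: List[str] = []
    depth = 0
    previous = None
    for token in tokens:
        if token == "(":
            depth += 1
        elif token == ")":
            depth -= 1
            # Back at the root level: one top-level subtree has just closed.
            if depth == 1 and current:
                chunks.append(" ".join(current))
                current = []
        elif previous != "(":
            # A token right after "(" is a node label; anything else is a word.
            current.append(token)
        previous = token
    return chunks


example_parse = (
    "(S (ADVP (RB Recently)) (NP (EX there)) "
    "(VP (VBZ has) (VP (VBN been) (NP (DT an) (JJ increased) (NN interest)))) (. .))"
)
print(top_level_chunks(example_parse))
```

Because every direct child of the root becomes its own chunk, single words and punctuation marks can end up as separate chunks.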
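  • 16. Combining translated chunks of sentences. Results, September 2015 (Rikters and Skadiņa 2016). Single systems have one BLEU score; hybrid rows give BLEU for MSMT and for SyMHyT. The last three columns show the hybrid selection:
      System | BLEU (single system / MSMT) | BLEU (SyMHyT) | Google | Bing | LetsMT
      Google Translate | 18.09 | | 100% | - | -
      Bing Translator | 18.87 | | - | 100% | -
      LetsMT | 30.28 | | - | - | 100%
      Hybrid Google + Bing | 18.73 | 21.27 | 74% | 26% | -
      Hybrid Google + LetsMT | 24.50 | 26.24 | 25% | - | 75%
      Hybrid LetsMT + Bing | 24.66 | 26.63 | - | 24% | 76%
      Hybrid Google + Bing + LetsMT | 22.69 | 24.72 | 17% | 18% | 65%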
  • 18. Combining translations of linguistically motivated chunks. An advanced approach to chunking: traverse the syntax tree bottom up, from right to left. Add a word to the current chunk if: the current chunk is not too long (sentence word count / 4); the word is non-alphabetic or only one symbol long; the word begins with a genitive phrase («of »). Otherwise, initialize a new chunk with the word. If chunking results in too many chunks, repeat the process, allowing more than (sentence word count / 4) words in a chunk. Changes in the MT API systems: the LetsMT API was temporarily replaced with the Hugo.lv API, and the Yandex API was added.
  • 21. Combining translations of linguistically motivated chunks. Selection of the best translation: 6-gram and 12-gram LMs trained with KenLM on the JRC-Acquis corpus v. 3.0 and the DGT-Translation Memory corpus (Steinberger, 2011; 3.1 million Latvian legal domain sentences); sentences scored with the query program from KenLM. Test data: 1581 random sentences from the JRC-Acquis corpus; ACCURAT balanced evaluation corpus.
  • 22. Combining translations of linguistically motivated chunks. Chunking of the same example sentence by the two methods:
      Sentence chunks with SyMHyT: "Recently" | "there" | "has been an increased interest in the automated discovery of equivalent expressions in different languages" | "."
      Sentence chunks with ChunkMT: "Recently there has been an increased interest" | "in the automated discovery of equivalent expressions" | "in different languages ."
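Once a sentence has been chunked and every chunk translated by several systems, choosing the best chunk translations and recomposing the sentence can be sketched as follows. The sketch assumes that the lowest-perplexity candidate wins for each chunk and that chunks are rejoined in source order with single spaces; the function name, the `perplexity` callback, and the placeholder strings are hypothetical.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple


def combine_best_chunks(chunk_candidates: List[Dict[str, str]],
                        perplexity: Callable[[str], float]) -> Tuple[str, Counter]:
    """Pick the lowest-perplexity translation per chunk and join the winners.

    `chunk_candidates` holds one dict per source chunk, in sentence order,
    mapping an MT system name to that system's translation of the chunk.
    Returns the recomposed sentence and a count of chunks taken per system.
    """
    chosen_texts: List[str] = []
    selection: Counter = Counter()
    for candidates in chunk_candidates:
        best = min(candidates, key=lambda name: perplexity(candidates[name]))
        chosen_texts.append(candidates[best])
        selection[best] += 1
    return " ".join(chosen_texts), selection


# Placeholder chunk translations with made-up perplexities.
toy_scores = {"g1": 90.0, "b1": 75.0, "g2": 40.0, "b2": 55.0, "g3": 61.0, "b3": 60.0}
toy_chunks = [
    {"Google": "g1", "Bing": "b1"},
    {"Google": "g2", "Bing": "b2"},
    {"Google": "g3", "Bing": "b3"},
]

sentence, selection = combine_best_chunks(toy_chunks, toy_scores.__getitem__)
print(sentence, dict(selection))
```

The per-system counts give the kind of selection shares reported as percentages in the result tables, though how the original system computed those shares is not specified here.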
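  • 25. Combining translations of linguistically motivated chunks. Results, January 2016. The first row gives the BLEU score of each single system; the remaining rows give the hybrid BLEU score and the selection shares:
      System | BLEU | Equal | Bing | Google | Hugo | Yandex
      BLEU (single system) | - | - | 17.43 | 17.73 | 17.14 | 16.04
      MSMT - Google + Bing | 17.70 | 7.25% | 43.85% | 48.90% | - | -
      MSMT - Google + Bing + LetsMT | 17.63 | 3.55% | 33.71% | 30.76% | 31.98% | -
      SyMHyT - Google + Bing | 17.95 | 4.11% | 19.46% | 76.43% | - | -
      SyMHyT - Google + Bing + LetsMT | 17.30 | 3.88% | 15.23% | 19.48% | 61.41% | -
      ChunkMT - Google + Bing | 18.29 | 22.75% | 39.10% | 38.15% | - | -
      ChunkMT - all four | 19.21 | 7.36% | 30.01% | 19.47% | 32.25% | 10.91%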
  • 26. Related publications: Matīss Rikters, "Multi-system machine translation using online APIs for English-Latvian", ACL-IJCNLP 2015; Matīss Rikters and Inguna Skadiņa, "Syntax-based multi-system machine translation", LREC 2016.
  • 27. Work in progress. K-translate - interactive multi-system machine translation: about the same as ChunkMT but with a nice user interface; draws a syntax tree with chunks highlighted; designates which chunks were chosen from which system; provides a confidence score for the choices; allows using online APIs or user-provided machine translations; comes with resources for translating between English, French, German and Latvian; can be used in a web browser.
  • 28. Work in progress. K-translate - interactive multi-system machine translation (navigation diagram): Start page; Translate with online systems; Input translations to combine; Input translated chunks; Settings; Translation results; Input source sentence (two boxes).
  • 30. Future ideas. Future work: more enhancements for the chunking step; add special processing of multi-word expressions (MWEs); try out other types of LMs: POS tag + lemma, Recurrent Neural Network Language Model (Mikolov et al., 2010), Continuous Space Language Model (Schwenk et al., 2006), Character-Aware Neural Language Model (Kim et al., 2015); choose the best translation candidate with MT quality estimation: QuEst++ (Specia et al., 2015), SHEF-NN (Shah et al., 2015).
  • 31. References
      - Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas, Denver, Colorado (2010).
      - Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
      - Santanu, Pal, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
      - Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
      - Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
      - Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
      - Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
      - Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
      - Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192, 125-132 (2010).
      - Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
      - Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006.
      - Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
      - Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
      - Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.