Searching for the Best Translation Combination
Matīss Rikters, University of Latvia
The 7th International Conference
Human Language Technologies - the Baltic Perspective
Riga, Latvia
October 6, 2016
Contents
Hybrid Machine Translation
Multi-System Hybrid MT
Simple combining of translations
– Combining whole translations
– Combining translations of sentence chunks
Combining translations of linguistically motivated chunks
Searching for the best translation combination
Other work
Future plans
Hybrid Machine Translation
Statistical rule generation
– Rules for RBMT systems are generated from training corpora
Multi-pass
– Process data through RBMT first, and then through SMT
Multi-System hybrid MT
– Multiple MT systems run in parallel
Multi-System Hybrid MT
Related work:
SMT + RBMT (Ahsan and Kolachina, 2010)
Confusion Networks (Barrault, 2010)
– + Neural Network Model (Freitag et al., 2015)
SMT + EBMT + TM + NE (Santanu et al., 2014)
Recursive sentence decomposition (Mellebeek et al., 2006)
Combining whole translations
– Translate the full input sentence with multiple MT systems
– Choose the best translation as the output
Combining translations of sentence chunks
– Split the sentence into smaller chunks
• The chunks are the top level subtrees of the syntax tree of the sentence
– Translate each chunk with multiple MT systems
– Choose the best translated chunks and combine them
Combining Translations
Choose the best candidate
KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:
\[ p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1}) \]
where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:
\[ 2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)} \]
where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by determining how well it predicts a separate test sample $x_1, x_2, \dots, x_N$ drawn from $p$.
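The backoff lookup and the perplexity computation can be sketched as follows. This is a minimal sketch, assuming a toy model held in two plain dictionaries (`PROBS` and `BACKOFFS`, both hypothetical, with made-up log10 values) and an assumed `UNK_LOGPROB` for unseen words; it illustrates the formula and is not KenLM's data structure or API. The logarithm base does not change the perplexity value, so base 10 is used here.

```python
# Toy log10 probabilities and backoff weights keyed by n-gram tuples.
# All values are made up for illustration.
PROBS = {
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.5,
    ("the", "cat"): -0.5,
}
BACKOFFS = {
    ("the",): -0.3,
    ("cat",): -0.2,
    ("the", "cat"): -0.1,
}
UNK_LOGPROB = -5.0  # assumed score for words the toy model has never seen


def log10_prob(word, history):
    """Return log10 p(word | history) from the longest matching entry."""
    history = tuple(history)
    penalty = 0.0
    # i = 0 tries the full history; each further step drops its oldest word.
    for i in range(len(history) + 1):
        context = history[i:]
        ngram = context + (word,)
        if ngram in PROBS:
            return penalty + PROBS[ngram]
        # Entry not observed: charge the backoff penalty of this context.
        penalty += BACKOFFS.get(context, 0.0)
    return penalty + UNK_LOGPROB


def perplexity(words, order=3):
    """Perplexity 10 ** (-(1/N) * sum of log10 q(x_i)) over the words."""
    total = 0.0
    for i, word in enumerate(words):
        history = words[max(0, i - order + 1):i]
        total += log10_prob(word, history)
    return 10 ** (-total / len(words))
```

For "sat" after "the cat", the trigram and the bigram are both missing, so the sketch adds the two backoff penalties to the unigram score: -0.1 + -0.2 + -2.5 = -2.8.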
Whole translations
Choosing the best candidate:
A 5-gram language model trained with
– KenLM
– JRC-Acquis corpus v. 3.0 (Steinberger, 2006): 1.4 million Latvian legal domain sentences
– Sentences are scored with the query program that comes with KenLM
Test data
– 1581 random sentences from the JRC-Acquis corpus
– Tested with the ACCURAT balanced evaluation corpus, 512 general domain sentences (Skadiņš et al., 2010), but the results were not as good
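Choosing among whole translations then reduces to taking the candidate with the lowest language-model perplexity. A minimal sketch, assuming a hypothetical `select_best` helper and a pluggable `perplexity_fn`; the toy scorer and the candidate sentences below are invented stand-ins for a real KenLM model and real API output.

```python
from typing import Callable, Dict, Tuple


def select_best(candidates: Dict[str, str],
                perplexity_fn: Callable[[str], float]) -> Tuple[str, str]:
    """Return (system name, translation) with the lowest perplexity."""
    scored = {name: perplexity_fn(text) for name, text in candidates.items()}
    # On a tie, min() keeps the first system in insertion order.
    best = min(scored, key=scored.get)
    return best, candidates[best]


# Stand-in scorer for demonstration only: it treats translations with
# fewer unknown words as more fluent.
KNOWN_WORDS = {"šis", "ir", "labs", "tulkojums"}


def toy_perplexity(sentence: str) -> float:
    words = sentence.lower().split()
    unknown = sum(1 for w in words if w not in KNOWN_WORDS)
    return 1.0 + unknown / max(len(words), 1)


candidates = {
    "Google": "šis ir labs tulkojums",
    "Bing": "šis ir laba tulkošana",
    "LetsMT": "tas ir labs tulkojums",
}
best_system, best_text = select_best(candidates, toy_perplexity)
```

The slides score sentences with KenLM's `query` program; if the `kenlm` Python bindings are available, a loaded model's `perplexity` method could presumably be passed as `perplexity_fn` instead of the toy scorer.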
Simple:
– Berkeley Parser (Petrov et al., 2006)
– Sentences are split into chunks from the top level subtrees of the syntax tree
Linguistically motivated:
– Traverse the syntax tree bottom up, from right to left
– Add a word to the current chunk if
• The current chunk is not too long (sentence word count / 4)
• The word is non-alphabetic or only one symbol long
• The word begins with a genitive phrase («of »)
– Otherwise, initialize a new chunk with the word
– If chunking results in too many chunks, repeat the process, allowing more than (sentence word count / 4) words in a chunk
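The linguistically motivated rules can be approximated on a flat token list. This is a rough sketch under several assumptions, not the original chunker: it scans tokens right to left instead of traversing a syntax tree, it treats the three conditions as alternatives (any one is enough to extend the chunk), and `max_chunks` is a hypothetical parameter standing in for the unspecified "too many chunks" threshold.

```python
def chunk_tokens(tokens, divisor=4, max_chunks=None):
    """Group tokens into chunks, scanning from right to left."""
    limit = max(1, len(tokens) // divisor)  # sentence word count / 4
    while True:
        chunks = [[]]
        for word in reversed(tokens):
            current = chunks[-1]
            short_enough = len(current) < limit
            symbol = not word.isalpha() or len(word) == 1
            genitive = word.lower() == "of"
            if not current or short_enough or symbol or genitive:
                current.append(word)
            else:
                chunks.append([word])
        # Restore left-to-right order of both the chunks and their words.
        chunks = [c[::-1] for c in reversed(chunks) if c]
        done = max_chunks is None or len(chunks) <= max_chunks
        if done or limit >= len(tokens):
            return chunks
        limit += 1  # too many chunks: allow longer chunks and repeat


tokens = ("Recently there has been an increased interest in the automated "
          "discovery of equivalent expressions in different languages .").split()
chunks = chunk_tokens(tokens, max_chunks=3)
```

Because the sketch ignores constituent boundaries, its chunks will generally differ from those produced by the tree-based method.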
Changes in the MT API systems:
– LetsMT API swapped with Hugo.lv API
– Added Yandex API
12-gram LM trained with
– DGT-Translation Memory corpus (Steinberger, 2011): 3.1 million Latvian legal domain sentences
Chunks
Whole translations (workflow diagram)
Sentence tokenization → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best translation → Output
Chunks (workflow diagram)
Sentence tokenization → Syntactic analysis → Sentence chunking → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best chunks → Sentence recomposition → Output
Sentence chunking
Example sentence:
Recently there has been an increased interest in the automated discovery of equivalent expressions in different languages .
Simple chunks:
• Recently
• there
• has been an increased interest in the automated discovery of equivalent expressions in different languages
• .
Linguistically motivated chunks:
• Recently there has been an increased interest
• in the automated discovery of equivalent expressions
• in different languages .
Whole translations: May 2015 results (Rikters 2015)
                                      Hybrid selection
System                         BLEU   Google   Bing     LetsMT   Equal
Google Translate               16.92  100%     -        -        -
Bing Translator                17.16  -        100%     -        -
LetsMT                         28.27  -        -        100%     -
Hybrid Google + Bing           17.28  50.09%   45.03%   -        4.88%
Hybrid Google + LetsMT         22.89  46.17%   -        48.39%   5.44%
Hybrid LetsMT + Bing           22.83  -        45.35%   49.84%   4.81%
Hybrid Google + Bing + LetsMT  21.08  28.93%   34.31%   33.98%   2.78%
Simple chunks: September 2015 (Rikters and Skadiņa 2016(1))
                         BLEU                               Hybrid selection
System                   Whole translations  Simple chunks  Google  Bing  LetsMT
Google Translate         18.09               -              100%    -     -
Bing Translator          18.87               -              -       100%  -
LetsMT                   30.28               -              -       -     100%
Simple Chunks G + B      18.73               21.27          74%     26%   -
Simple Chunks G + L      24.50               26.24          25%     -     75%
Simple Chunks L + B      24.66               26.63          -       24%   76%
Simple Chunks G + B + L  22.69               24.72          17%     18%   65%
Linguistically motivated chunks: January 2016 (Rikters and Skadiņa 2016(2))
System                           BLEU   Equal   Bing    Google  Hugo    Yandex
BLEU (single system)             -      -       17.43   17.73   17.14   16.04
Whole translations G + B         17.70  7.25%   43.85%  48.90%  -       -
Whole translations G + B + H     17.63  3.55%   33.71%  30.76%  31.98%  -
Simple Chunks G + B              17.95  4.11%   19.46%  76.43%  -       -
Simple Chunks G + B + H          17.30  3.88%   15.23%  19.48%  61.41%  -
Linguistic Chunks G + B          18.29  22.75%  39.10%  38.15%  -       -
Linguistic Chunks G + B + H + Y  19.21  7.36%   30.01%  19.47%  32.25%  10.91%
Searching for the best
The main differences:
• the manner of scoring chunks with the LM and selecting the best translation
• utilisation of multi-threaded computing, which allows the process to run on all available CPU cores in parallel
• still very slow
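An exhaustive search over chunk translations can be sketched as below. This is a minimal sketch, assuming a hypothetical `full_search` helper, a pluggable `perplexity_fn`, and invented placeholder chunk strings; it is not the original implementation. `ThreadPoolExecutor` keeps the sketch self-contained; CPU-bound scoring in Python would more likely need a process pool to use all cores.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def full_search(chunk_candidates, perplexity_fn, workers=4):
    """Score every combination of chunk translations; return the best.

    chunk_candidates holds one list per chunk, each with one candidate
    translation per MT system.
    """
    combos = [" ".join(parts)
              for parts in itertools.product(*chunk_candidates)]
    # Score all combinations in parallel, keeping the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(perplexity_fn, combos))
    best_index = min(range(len(combos)), key=scores.__getitem__)
    return combos[best_index], len(combos)


# Stand-in scorer for demonstration only: it prefers one fixed choice
# per chunk so that the expected winner is unambiguous.
PREFERRED = {"g1", "b2", "h3"}


def toy_perplexity(sentence):
    return 1.0 + sum(1 for w in sentence.split() if w not in PREFERRED)


# Three chunks, four systems each (placeholder strings, not real output).
chunk_candidates = [
    ["g1", "b1", "h1", "y1"],
    ["g2", "b2", "h2", "y2"],
    ["g3", "b3", "h3", "y3"],
]
best, combination_count = full_search(chunk_candidates, toy_perplexity)
```

The number of combinations grows as systems to the power of chunks, which is one plausible reason the search stays slow: with four systems, nine chunks already give 4^9 = 262144 sentences to score.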
Searching for the best
                      Count            Percentage
Chunks  Combinations  Legal  General   Legal    General
1       4             210    16        13.28%   3.13%
2       16            178    78        11.26%   15.23%
3       64            262    131       16.57%   25.59%
4       256           273    127       17.27%   24.80%
5       1024          275    94        17.39%   18.36%
6       4096          201    47        12.71%   9.18%
7       16384         96     11        6.07%    2.15%
8       65536         49     6         3.10%    1.17%
9       262144        37     2         2.34%    0.39%
Searching for the best (charts not preserved): Legal domain, General domain
Searching for the best: May 2016
                   BLEU
System             Legal   General
Full-search        23.61   14.40
Linguistic chunks  20.00   17.27
Bing               16.99   17.43
Google             16.19   17.72
Hugo               20.27   17.13
Yandex             19.75   16.03
Neural Network LM (chart not preserved): Perplexity and BLEU-HY by epoch (0.11 to 1.77), with a linear trend line for BLEU-HY
Neural Network LM (chart not preserved): Perplexity and BLEU by epoch (0.11 to 1.77), with a linear trend line for BLEU
Future work
More enhancements for the chunking step
Add special processing of multi-word expressions (MWEs)
Try out other types of LMs
– POS tag + lemma
– Recurrent Neural Network Language Model (Mikolov et al., 2010)
– Continuous Space Language Model (Schwenk et al., 2006)
– Character-Aware Neural Language Model (Kim et al., 2015)
Choose the best translation candidate with MT quality estimation
– QuEst++ (Specia et al., 2015)
– SHEF-NN (Shah et al., 2015)
Handling MWEs in neural machine translation systems
Related publications
• Matīss Rikters. "Multi-system machine translation using online APIs for English-Latvian". ACL-IJCNLP 2015
• Matīss Rikters and Inguna Skadiņa. "Syntax-based multi-system machine translation". LREC 2016
• Matīss Rikters and Inguna Skadiņa. "Combining machine translated sentence chunks from multiple MT systems". CICLing 2016
• Matīss Rikters. "K-translate – interactive multi-system machine translation". Baltic DB&IS 2016
Code on GitHub
http://ej.uz/ChunkMT
http://ej.uz/SyMHyT
http://ej.uz/MSMT
http://ej.uz/chunker
References
• Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado (2010).
• Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
• Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
• Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
• Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
• Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
• Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
• Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192, 125-132. 2010.
• Rikters, M., and I. Skadiņa. "Syntax-based multi-system machine translation." LREC 2016. (2016).
• Rikters, M., and I. Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016. (2016).
• Santanu, Pal, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing, 2014.
• Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006.
• Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
• Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
• Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
• Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).
Thank you!
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo DayH2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
H2O.ai CEO/Founder: Sri Ambati Keynote at Wells Fargo Day
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.How AI, OpenAI, and ChatGPT impact business and software.
How AI, OpenAI, and ChatGPT impact business and software.
 

Searching for the best translation combination

  • 1. Searching for the Best Translation Combination Matīss Rikters, University of Latvia The 7th International Conference Human Language Technologies - the Baltic Perspective Riga, Latvia October 6, 2016
  • 2. Contents
    Hybrid Machine Translation
    Multi-System Hybrid MT
    Simple combining of translations
    – Combining whole translations
    – Combining translations of sentence chunks
    Combining translations of linguistically motivated chunks
    Searching for the best translation combination
    Other work
    Future plans
  • 3. Hybrid Machine Translation
    Statistical rule generation
    – Rules for RBMT systems are generated from training corpora
    Multi-pass
    – Process data through RBMT first, and then through SMT
    Multi-System hybrid MT
    – Multiple MT systems run in parallel
  • 4. Multi-System Hybrid MT
    Related work:
    SMT + RBMT (Ahsan and Kolachina, 2010)
    Confusion Networks (Barrault, 2010)
    – + Neural Network Model (Freitag et al., 2015)
    SMT + EBMT + TM + NE (Santanu et al., 2014)
    Recursive sentence decomposition (Mellebeek et al., 2006)
  • 5. Combining Translations
    Combining whole translations
    – Translate the full input sentence with multiple MT systems
    – Choose the best translation as the output
    Combining translations of sentence chunks
    – Split the sentence into smaller chunks
      • The chunks are the top level subtrees of the syntax tree of the sentence
    – Translate each chunk with multiple MT systems
    – Choose the best translated chunks and combine them
  • 6. Choose the best candidate
    KenLM (Heafield, 2011) calculates probabilities based on the observed entry with the longest matching history $w_f^n$:
    $$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
    where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:
    $$2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$
    where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by determining how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
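The backoff rule and the perplexity score on this slide can be illustrated with a small sketch. It is not KenLM itself: the toy tables `TOY_PROBS` and `TOY_BACKOFFS` and both function names are made up for illustration, and values are stored as base-10 logarithms, the convention used in ARPA-format language model files. Perplexity does not depend on the logarithm base as long as the same base is used for the exponent.

```python
def backoff_logprob(probs, backoffs, context, word):
    """Return log10 p(word | context) using the longest matching history.

    Each context that fails to match contributes its backoff penalty,
    then the context is shortened by dropping its oldest word.
    """
    penalty = 0.0
    ctx = tuple(context)
    while True:
        ngram = ctx + (word,)
        if ngram in probs:
            return probs[ngram] + penalty
        if not ctx:
            raise KeyError(f"word not in the toy model: {word!r}")
        penalty += backoffs.get(ctx, 0.0)  # log10 b(ctx); 0.0 means no penalty
        ctx = ctx[1:]


def perplexity(logprobs):
    """Perplexity from a list of per-word log10 probabilities."""
    if not logprobs:
        raise ValueError("need at least one word probability")
    return 10 ** (-sum(logprobs) / len(logprobs))


# Hypothetical toy model (log10 values), for illustration only.
TOY_PROBS = {
    ("the",): -1.0,
    ("cat",): -2.0,
    ("sat",): -2.0,
    ("the", "cat"): -0.5,
}
TOY_BACKOFFS = {("the",): -0.3, ("cat",): -0.4, ("the", "cat"): -0.2}

sentence_logprobs = [
    backoff_logprob(TOY_PROBS, TOY_BACKOFFS, (), "the"),
    backoff_logprob(TOY_PROBS, TOY_BACKOFFS, ("the",), "cat"),
    backoff_logprob(TOY_PROBS, TOY_BACKOFFS, ("the", "cat"), "sat"),
]
print(perplexity(sentence_logprobs))
```

In the last call, neither "the cat sat" nor "cat sat" is in the toy table, so the unigram probability of "sat" is used and the backoff penalties of the two failed contexts are added to it.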
  • 7. Whole translations
    Choosing the best candidate: a 5-gram language model trained with
    – KenLM
    – JRC-Acquis corpus v. 3.0 (Steinberger, 2006): 1.4 million Latvian legal domain sentences
    – Sentences are scored with the query program that comes with KenLM
    Test data
    – 1581 random sentences from the JRC-Acquis corpus
    – Also tested with the ACCURAT balanced evaluation corpus (512 general domain sentences; Skadiņš et al., 2010), but the results were not as good
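The selection step for whole translations can be sketched as picking the candidate with the lowest language model perplexity. The slide scores sentences with a 5-gram KenLM model. To stay self-contained, this sketch swaps in a toy add-one-smoothed unigram model, and all names (`train_unigram`, `select_best`, the system labels and sentences) are hypothetical.

```python
import math
from collections import Counter


def train_unigram(corpus_sentences):
    """Build a toy add-one-smoothed unigram model; returns a log10 scorer."""
    counts = Counter(w for s in corpus_sentences for w in s.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen words

    def logprob(word):
        return math.log10((counts[word] + 1) / (total + vocab))

    return logprob


def perplexity(sentence, logprob):
    """Per-word perplexity of a whitespace-tokenised sentence."""
    words = sentence.split()
    if not words:
        return float("inf")
    return 10 ** (-sum(logprob(w) for w in words) / len(words))


def select_best(candidates, logprob):
    """Return the (system, translation) pair with the lowest perplexity."""
    return min(candidates.items(), key=lambda kv: perplexity(kv[1], logprob))


# Hypothetical training data and candidate translations.
LOGPROB = train_unigram(["the cat sat on the mat", "the dog sat"])
CANDIDATES = {"system_a": "the cat sat", "system_b": "cat the zzz"}

print(select_best(CANDIDATES, LOGPROB))
```

With a real model, only `perplexity` would change. `min` returns the first candidate on a tie, so the "Equal" case reported in the result tables would need separate handling.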
  • 8. Chunks
    Simple:
    – Berkeley Parser (Petrov et al., 2006)
    – Sentences are split into chunks from the top level subtrees of the syntax tree
    Linguistically motivated:
    – Traverse the syntax tree bottom up, from right to left
    – Add a word to the current chunk if
      • The current chunk is not too long (sentence word count / 4)
      • The word is non-alphabetic or only one symbol long
      • The word begins with a genitive phrase («of »)
    – Otherwise, initialize a new chunk with the word
    – If chunking results in too many chunks, repeat the process, allowing more than (sentence word count / 4) words in a chunk
    Changes in the MT API systems:
    – LetsMT API swapped with Hugo.lv API
    – Added Yandex API
    12-gram LM trained with
    – DGT-Translation Memory corpus (Steinberger, 2011)
    – 3.1 million Latvian legal domain sentences
  • 9. Whole translations (workflow)
    Sentence tokenization → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best translation → Output
  • 10. Chunks (workflow)
    Sentence tokenization → Syntactic analysis → Sentence chunking → Translation with online MT APIs (Google Translate, Bing Translator, LetsMT) → Selection of the best chunks → Sentence recomposition → Output
  • 12. Example sentence
    Simple chunks:
      • Recently
      • there
      • has been an increased interest in the automated discovery of equivalent expressions in different languages
      • .
    Linguistically motivated chunks:
      • Recently there has been an increased interest
      • in the automated discovery of equivalent expressions
      • in different languages .
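The simple chunking shown for the example sentence, one chunk per top-level subtree of the syntax tree, can be sketched as follows. The parse below is written by hand for illustration and is not actual Berkeley Parser output. The nested-tuple tree format and the names `leaves`, `simple_chunks` and `EXAMPLE_PARSE` are assumptions of this sketch.

```python
def leaves(node):
    """Return the words under a node, left to right.

    A node is either a word (str) or a tuple (label, child, child, ...).
    """
    if isinstance(node, str):
        return [node]
    _label, *children = node
    words = []
    for child in children:
        words.extend(leaves(child))
    return words


def simple_chunks(tree):
    """Split a sentence into one chunk per top-level subtree of its parse."""
    _label, *children = tree
    return [" ".join(leaves(child)) for child in children]


# Hand-written, hypothetical parse of the example sentence.
EXAMPLE_PARSE = (
    "S",
    ("ADVP", "Recently"),
    ("NP", "there"),
    ("VP", "has",
        ("VP", "been",
            ("NP",
                ("NP", "an", "increased", "interest"),
                ("PP", "in",
                    ("NP",
                        ("NP", "the", "automated", "discovery"),
                        ("PP", "of", ("NP", "equivalent", "expressions")),
                        ("PP", "in", ("NP", "different", "languages")),
                    ),
                ),
            ),
        ),
    ),
    (".", "."),
)

for chunk in simple_chunks(EXAMPLE_PARSE):
    print(chunk)
```

The output shows why simple chunks can be very uneven: the whole verb phrase ends up in a single chunk, which is what the linguistically motivated chunking tries to avoid.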
  • 13. Example sentence Recently there has been an increased interest in the automated discovery of equivalent expressions in different languages .
  • 15. Whole translations
    May 2015 results (Rikters 2015). The columns Google, Bing, LetsMT and Equal show the hybrid selection.
    | System | BLEU | Google | Bing | LetsMT | Equal |
    |---|---|---|---|---|---|
    | Google Translate | 16.92 | 100% | - | - | - |
    | Bing Translator | 17.16 | - | 100% | - | - |
    | LetsMT | 28.27 | - | - | 100% | - |
    | Hybrid Google + Bing | 17.28 | 50.09% | 45.03% | - | 4.88% |
    | Hybrid Google + LetsMT | 22.89 | 46.17% | - | 48.39% | 5.44% |
    | Hybrid LetsMT + Bing | 22.83 | - | 45.35% | 49.84% | 4.81% |
    | Hybrid Google + Bing + LetsMT | 21.08 | 28.93% | 34.31% | 33.98% | 2.78% |
  • 16. Simple chunks
    September 2015 (Rikters and Skadiņa 2016(1)). The columns Google, Bing and LetsMT show the hybrid selection.
    | System | BLEU (whole translations) | BLEU (simple chunks) | Google | Bing | LetsMT |
    |---|---|---|---|---|---|
    | Google Translate | 18.09 | | 100% | - | - |
    | Bing Translator | 18.87 | | - | 100% | - |
    | LetsMT | 30.28 | | - | - | 100% |
    | Simple Chunks G + B | 18.73 | 21.27 | 74% | 26% | - |
    | Simple Chunks G + L | 24.50 | 26.24 | 25% | - | 75% |
    | Simple Chunks L + B | 24.66 | 26.63 | - | 24% | 76% |
    | Simple Chunks G + B + L | 22.69 | 24.72 | 17% | 18% | 65% |
  • 17. Linguistically motivated chunks
    January 2016 (Rikters and Skadiņa 2016(2))
    | System | BLEU | Equal | Bing | Google | Hugo | Yandex |
    |---|---|---|---|---|---|---|
    | BLEU | - | - | 17.43 | 17.73 | 17.14 | 16.04 |
    | Whole translations G + B | 17.70 | 7.25% | 43.85% | 48.90% | - | - |
    | Whole translations G + B + H | 17.63 | 3.55% | 33.71% | 30.76% | 31.98% | - |
    | Simple Chunks G + B | 17.95 | 4.11% | 19.46% | 76.43% | - | - |
    | Simple Chunks G + B + H | 17.30 | 3.88% | 15.23% | 19.48% | 61.41% | - |
    | Linguistic Chunks G + B | 18.29 | 22.75% | 39.10% | 38.15% | - | - |
    | Linguistic Chunks G + B + H + Y | 19.21 | 7.36% | 30.01% | 19.47% | 32.25% | 10.91% |
  • 18. Searching for the best
    The main differences:
    • the manner of scoring chunks with the LM and selecting the best translation
    • utilisation of multi-threaded computing, which allows the process to run on all available CPU cores in parallel
    • still very slow
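  • 19. Searching for the best
    | Chunks | Combinations | Count (Legal) | Count (General) | Percentage (Legal) | Percentage (General) |
    |---|---|---|---|---|---|
    | 1 | 4 | 210 | 16 | 13.28% | 3.13% |
    | 2 | 16 | 178 | 78 | 11.26% | 15.23% |
    | 3 | 64 | 262 | 131 | 16.57% | 25.59% |
    | 4 | 256 | 273 | 127 | 17.27% | 24.80% |
    | 5 | 1024 | 275 | 94 | 17.39% | 18.36% |
    | 6 | 4096 | 201 | 47 | 12.71% | 9.18% |
    | 7 | 16384 | 96 | 11 | 6.07% | 2.15% |
    | 8 | 65536 | 49 | 6 | 3.10% | 1.17% |
    | 9 | 262144 | 37 | 2 | 2.34% | 0.39% |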
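The combination counts in this table grow as 4 to the power of the number of chunks (four MT systems, one candidate per chunk from each), which is why the exhaustive search is slow. A minimal sketch of such a search follows. The function names and the toy scoring function are hypothetical, and a thread pool merely stands in for the multi-threaded scoring mentioned in the slides. For CPU-bound pure-Python scoring, a process pool would be needed to use several cores.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def count_combinations(n_chunks, n_systems=4):
    """Number of full-sentence candidates when every chunk has n_systems options."""
    return n_systems ** n_chunks


def best_combination(chunk_candidates, score, max_workers=None):
    """Score every combination of chunk translations and return the best one.

    chunk_candidates: one list of candidate translations per chunk.
    score: function from a full sentence to a number, lower is better
           (for example, language model perplexity).
    """
    combos = [" ".join(parts) for parts in product(*chunk_candidates)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        scores = list(pool.map(score, combos))
    best = min(range(len(combos)), key=scores.__getitem__)
    return combos[best], scores[best]


# Toy usage: two chunks, two candidates each, sentence length as a stand-in score.
print(count_combinations(9))
print(best_combination([["a", "bbb"], ["cc", "d"]], score=len))
```

Materialising all combinations in a list is fine for the chunk counts in the table, but memory use grows just as fast as the count, so a streaming variant would be preferable for longer sentences.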
  • 20. Searching for the best [Figure with two panels: Legal domain, General domain]
  • 21. Searching for the best
    May 2016
    | System | BLEU (Legal) | BLEU (General) |
    |---|---|---|
    | Full-search | 23.61 | 14.40 |
    | Linguistic chunks | 20.00 | 17.27 |
    | Bing | 16.99 | 17.43 |
    | Google | 16.19 | 17.72 |
    | Hugo | 20.27 | 17.13 |
    | Yandex | 19.75 | 16.03 |
  • 22. Neural Network LM [Figure: Perplexity and BLEU-HY, with a linear trend line for BLEU-HY, plotted against Epoch (0.11 to 1.77)]
  • 23. Neural Network LM [Figure: Perplexity and BLEU, with a linear trend line for BLEU, plotted against Epoch (0.11 to 1.77)]
  • 24. Future work
    More enhancements for the chunking step
    – Add special processing of multi-word expressions (MWEs)
    Try out other types of LMs
    – POS tag + lemma
    – Recurrent Neural Network Language Model (Mikolov et al., 2010)
    – Continuous Space Language Model (Schwenk et al., 2006)
    – Character-Aware Neural Language Model (Kim et al., 2015)
    Choose the best translation candidate with MT quality estimation
    – QuEst++ (Specia et al., 2015)
    – SHEF-NN (Shah et al., 2015)
    Handling MWEs in neural machine translation systems
  • 25. Related publications
    • Matīss Rikters "Multi-system machine translation using online APIs for English-Latvian" ACL-IJCNLP 2015
    • Matīss Rikters and Inguna Skadiņa "Syntax-based multi-system machine translation" LREC 2016
    • Matīss Rikters and Inguna Skadiņa "Combining machine translated sentence chunks from multiple MT systems" CICLing 2016
    • Matīss Rikters "K-translate – interactive multi-system machine translation" Baltic DB&IS 2016
  • 27. References
    • Ahsan, A., and P. Kolachina. "Coupling Statistical Machine Translation with Rule-based Transfer and Generation." AMTA: The Ninth Conference of the Association for Machine Translation in the Americas. Denver, Colorado (2010).
    • Barrault, Loïc. "MANY: Open source machine translation system combination." The Prague Bulletin of Mathematical Linguistics 93 (2010): 147-155.
    • Heafield, Kenneth. "KenLM: Faster and smaller language model queries." Proceedings of the Sixth Workshop on Statistical Machine Translation. Association for Computational Linguistics, 2011.
    • Kim, Yoon, et al. "Character-aware neural language models." arXiv preprint arXiv:1508.06615 (2015).
    • Mellebeek, Bart, et al. "Multi-engine machine translation by recursive sentence decomposition." (2006).
    • Mikolov, Tomas, et al. "Recurrent neural network based language model." INTERSPEECH. Vol. 2. 2010.
    • Petrov, Slav, et al. "Learning accurate, compact, and interpretable tree annotation." Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2006.
    • Skadiņš, Raivis, Kārlis Goba, and Valters Šics. "Improving SMT for Baltic Languages with Factored Models." Proceedings of the Fourth International Conference Baltic HLT 2010, Frontiers in Artificial Intelligence and Applications, Vol. 2192, 125-132. 2010.
    • Rikters, M., and I. Skadiņa. "Syntax-based multi-system machine translation." LREC 2016. (2016).
    • Rikters, M., and I. Skadiņa. "Combining machine translated sentence chunks from multiple MT systems." CICLing 2016. (2016).
    • Santanu, Pal, et al. "USAAR-DCU Hybrid Machine Translation System for ICON 2014." The Eleventh International Conference on Natural Language Processing. 2014.
    • Schwenk, Holger, Daniel Déchelotte, and Jean-Luc Gauvain. "Continuous space language models for statistical machine translation." Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006.
    • Shah, Kashif, et al. "SHEF-NN: Translation Quality Estimation with Neural Networks." Proceedings of the Tenth Workshop on Statistical Machine Translation. 2015.
    • Specia, Lucia, G. Paetzold, and Carolina Scarton. "Multi-level Translation Quality Prediction with QuEst++." 53rd Annual Meeting of the Association for Computational Linguistics and Seventh International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing: System Demonstrations. 2015.
    • Steinberger, Ralf, et al. "DGT-TM: A freely available translation memory in 22 languages." arXiv preprint arXiv:1309.5226 (2013).
    • Steinberger, Ralf, et al. "The JRC-Acquis: A multilingual aligned parallel corpus with 20+ languages." arXiv preprint cs/0609058 (2006).