This paper describes a hybrid machine translation (HMT) system that employs several online MT system application program interfaces (APIs), forming a Multi-System Machine Translation (MSMT) approach. The goal is to improve the automated translation of English–Latvian texts over each of the individual MT APIs. The best hypothesis translation is selected by calculating the perplexity of each hypothesis. Experiment results show a slight improvement in BLEU score and WER (word error rate).
1. Multi-system machine translation
using online APIs for English-Latvian
Matīss Rikters
University of Latvia
ACL 2015 Fourth Workshop on
Hybrid Approaches to Translation
Beijing, 31.07.2015
2. Introduction
Motivation:
Doctoral studies at the University of Latvia
A hybrid machine translation method, combining results of various machine translation systems
Literature review
Recent trends in Multi-System Machine Translation
Nothing similar publicly available was found
4. Related work
"Coupling Statistical Machine Translation with Rule-based Transfer and Generation",
A. Ahsan, and P. Kolachina.
"Using language and translation models to select the best among outputs from
multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.
"MANY: Open source machine translation system combination", L. Barrault.
"A program for automatically selecting the best output from multiple machine
translation engines", C. Callison-Burch and R. S. Flournoy.
5. Initial plan
Use systems that support English – Latvian translation
Found five such systems:
8. Selection of the best translation
Probabilities are calculated based on the observed entry with the longest matching history $w_f^n$:

$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$

where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:

$$PP(q) = 2^{-\frac{1}{N}\sum_{i=1}^{N}\log_2 q(x_i)}$$

where, given an unknown probability distribution $p$ and a proposed probability model $q$, the model is evaluated by how well it predicts a separate test sample $x_1, x_2, \ldots, x_N$ drawn from $p$.
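As a concrete illustration, here is a minimal Python sketch of the longest-matching-history backoff lookup and the perplexity computation. The bigram/unigram probabilities and backoff penalties below are invented toy numbers; the actual system queries an already-estimated language model (e.g. KenLM) instead.

```python
import math

def perplexity(logprobs_base2):
    """PP = 2^(-(1/N) * sum of log2 probabilities of the test sample)."""
    n = len(logprobs_base2)
    return 2 ** (-sum(logprobs_base2) / n)

# Toy backoff model (all numbers invented for illustration).
bigram = {("the", "cat"): 0.4}                  # p(cat | the)
unigram = {"the": 0.3, "cat": 0.1, "sat": 0.05}
backoff = {("the",): 0.5, ("cat",): 0.7}        # b(history)

def prob(word, history):
    """Use the longest observed history: take the bigram entry if it
    exists, otherwise back off to the unigram, multiplying by b(history)."""
    if (history, word) in bigram:
        return bigram[(history, word)]
    return backoff.get((history,), 1.0) * unigram.get(word, 1e-6)

# Score a test sentence and compute its perplexity.
sentence = ["the", "cat", "sat"]
logps = [math.log2(unigram["the"])] + [
    math.log2(prob(w, h)) for h, w in zip(sentence, sentence[1:])
]
pp = perplexity(logps)
```

A lower perplexity means the model finds the sentence more predictable, which is what the hybrid system uses to rank candidate translations.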
9. System usage
Get the code - https://github.com/M4t1ss/Multi-System-Hybrid-Translator
Get API access
Google - https://cloud.google.com/translate/
Bing - http://www.bing.com/dev/en-us/translator
LetsMT - https://www.letsmt.eu/Integration.aspx
Add API keys to the configuration
Prepare a language model
You can use KenLM – https://kheafield.com/code/kenlm/
Prepare input data
Run
php MSHT.php languageModel.binary inputSentences.txt
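The selection step the tool performs can be sketched as follows. This is a simplified Python illustration, not the PHP implementation: the `toy_logprobs` scorer and the hypothesis strings are invented stand-ins for a real KenLM model and real API outputs.

```python
def select_best(hypotheses, logprob_fn):
    """Return the name of the system whose translation has the lowest
    language-model perplexity.

    hypotheses: dict mapping system name -> candidate translation
    logprob_fn: returns per-token log2 probabilities for a sentence
    """
    def pp(sentence):
        lps = logprob_fn(sentence)
        return 2 ** (-sum(lps) / len(lps))
    return min(hypotheses, key=lambda name: pp(hypotheses[name]))

# Toy scorer standing in for a real language model (invented log probs).
def toy_logprobs(sentence):
    lm = {"kaķis": -2.0, "sēž": -3.0, "uz": -1.5, "paklāja": -4.0}
    return [lm.get(word, -10.0) for word in sentence.split()]

hyps = {
    "SystemA": "kaķis sēž uz paklāja",   # fluent candidate
    "SystemB": "kaķis sēdēt paklājs",    # disfluent candidate
}
best = select_best(hyps, toy_logprobs)   # picks the lower-perplexity output
```

In the real pipeline the hypotheses come from the configured MT APIs and the scorer is the binary language model passed on the command line.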
10. Experiments
MT System APIs
Google Translate
Bing Translator
TB2013 EN-LV v03 from LetsMT
Language model
JRC Acquis corpus version 2.2
Input sentences
JRC Acquis corpus version 2.2
ACCURAT balanced test corpus for under resourced languages
12. Experiment results – ACCURAT balanced
System                          BLEU
Google Translate                24.73
Bing Translator                 22.07
LetsMT                          32.01
Hybrid Google + Bing            23.75
Hybrid Google + LetsMT          28.94
Hybrid LetsMT + Bing            27.44
Hybrid Google + Bing + LetsMT   26.74
13. Human evaluation
5 native Latvian speakers were each given a random 2% sample (32 sentences)
They were asked to mark which of the three MT outputs was the best, which was the worst and which was OK
They had the option to select multiple answers for best, worst or OK
14. Human results
System   User 1   User 2   User 3   User 4   User 5   AVG user   Hybrid   BLEU
Bing     21.88%   53.13%   28.13%   25.00%   31.25%   31.88%     28.93%   16.92
Google   28.13%   25.00%   25.00%   28.13%   46.88%   30.63%     34.31%   17.16
LetsMT   50.00%   21.88%   46.88%   46.88%   21.88%   37.50%     33.98%   28.27
15. Conclusion
Simple to:
- Build
- Use
- Add new MT APIs
Works:
- Well when used on systems of similar quality
- Poorly when one system is much superior
Needs:
- Improvements for translation selection
- More configuration options
16. Future work
Use a bigger & better language model?
Tried it… about the same results
Confusion networks?
Too confusing for now
Use MT quality estimation for selecting the best candidates
QuEst or QuEst++
Other quality estimation
Chunk sentences into smaller parts, translate them separately & recombine