Multi-system machine translation using online APIs for English-Latvian
Matīss Rikters
University of Latvia
ACL 2015 Fourth Workshop on Hybrid Approaches to Translation
Beijing, 31.07.2015
Introduction
 Motivation:
 Doctoral studies at the University of Latvia
 A hybrid machine translation method, combining results of various machine translation systems
 Literature review
 Recent trends in Multi-System Machine Translation
 Nothing similar was found to be publicly available
Introduction
 Goals:
 Combine output from multiple online MT APIs
 Keep it simple
 Make it work fast
Related work
 "Coupling Statistical Machine Translation with Rule-based Transfer and Generation", A. Ahsan and P. Kolachina.
 "Using language and translation models to select the best among outputs from multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.
 "MANY: Open source machine translation system combination", L. Barrault.
 "A program for automatically selecting the best output from multiple machine translation engines", C. Callison-Burch and R. S. Flournoy.
Initial plan
 Use systems that support English – Latvian translation
 Found five such systems:
What worked
 Couldn't get the APIs of two of them to work
 Used the remaining three: Google Translate, Bing Translator, LetsMT
System description
Sentence tokenization
Translation with APIs: Google Translate, Bing Translator, LetsMT
Selection of the best translation
Output
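The four stages on this slide can be sketched roughly as below. This is only an illustration, not the MSHT code (which is a PHP script): the translator callables, the `score` function, and the naive regex sentence splitter are hypothetical stand-ins, and a lower score is assumed to mean a better candidate, as it would with perplexity.

```python
import re
from typing import Callable, Dict, List

Translator = Callable[[str], str]


def split_sentences(text: str) -> List[str]:
    # Naive splitter (an assumption): break after '.', '!' or '?' followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def translate_hybrid(text: str,
                     translators: Dict[str, Translator],
                     score: Callable[[str], float]) -> List[str]:
    """Translate each sentence with every system and keep the lowest-scoring candidate."""
    output = []
    for sentence in split_sentences(text):
        candidates = {name: t(sentence) for name, t in translators.items()}
        best = min(candidates, key=lambda name: score(candidates[name]))
        output.append(candidates[best])
    return output


# Stand-in translators; real ones would call the online MT APIs.
demo_translators = {
    "system_a": lambda s: s.upper(),
    "system_b": lambda s: s.lower() + " (b)",
}
# len() is a placeholder score; a language-model perplexity would go here.
demo = translate_hybrid("Hello there. How are you?", demo_translators, score=len)
```

Keeping the translators behind a plain callable interface is one way to make adding a further MT API a one-line change.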
Selection of the best translation
Probabilities are calculated based on the observed entry with the longest matching history $w_f^n$:
$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability:
$$2^{-\frac{1}{N} \sum_{i=1}^{N} \log_2 q(x_i)}$$
where, given an unknown probability distribution p and a proposed probability model q, the model is evaluated by how well it predicts a separate test sample x1, x2, ..., xN drawn from p.
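A toy sketch of the backoff lookup and the perplexity computation described above may help. The model below is a hand-made bigram table with invented probabilities and backoff weights, not a real KenLM model, and the assumption is that the candidate translation with the lowest perplexity would be the one selected.

```python
import math

# Toy backoff model (invented numbers): n-gram -> (probability, backoff weight).
TOY_LM = {
    ("the",): (0.4, 0.5),
    ("cat",): (0.2, 0.5),
    ("sat",): (0.2, 1.0),
    ("<unk>",): (0.2, 1.0),
    ("the", "cat"): (0.6, 1.0),
}


def backoff_prob(lm, history, word):
    """p(word | history): take the longest matching n-gram and multiply in the
    backoff weight b(.) of every longer history that had to be dropped."""
    penalty = 1.0
    context = tuple(history)
    while True:
        ngram = context + (word,)
        if ngram in lm:
            return penalty * lm[ngram][0]
        if not context:
            # Word unseen even as a unigram: fall back to the <unk> entry.
            return penalty * lm[("<unk>",)][0]
        # Apply the backoff weight of the dropped history (1.0 if it is unseen).
        penalty *= lm.get(context, (0.0, 1.0))[1]
        context = context[1:]


def perplexity(lm, words, order=2):
    """2 ** (-(1/N) * sum(log2 q(x_i))) over the words of one sentence."""
    log_sum = 0.0
    for i, word in enumerate(words):
        history = words[max(0, i - order + 1):i]
        log_sum += math.log2(backoff_prob(lm, history, word))
    return 2 ** (-log_sum / len(words))


example = perplexity(TOY_LM, ["the", "cat", "sat"])
```

Here "sat" has no bigram entry after "cat", so its probability is the unigram probability of "sat" scaled by the backoff weight of "cat".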
System usage
 Get the code - https://github.com/M4t1ss/Multi-System-Hybrid-Translator
 Get API access
 Google - https://cloud.google.com/translate/
 Bing - http://www.bing.com/dev/en-us/translator
 LetsMT - https://www.letsmt.eu/Integration.aspx
 Add API keys to the configuration
 Prepare a language model
 You can use KenLM – https://kheafield.com/code/kenlm/
 Prepare input data
 Run
 php MSHT.php languageModel.binary inputSentances.txt
Experiments
 MT System APIs
 Google Translate
 Bing Translator
 TB2013 EN-LV v03 from LetsMT
 Language model
 JRC Acquis corpus version 2.2
 Input sentences
 JRC Acquis corpus version 2.2
 ACCURAT balanced test corpus for under-resourced languages
Experiment results – JRC Acquis
System                          BLEU    TER     WER     Translations selected
                                                        Google    Bing      LetsMT    Equal
Google Translate                16.92   47.68   58.55   100 %     -         -         -
Bing Translator                 17.16   49.66   58.40   -         100 %     -         -
LetsMT                          28.27   36.19   42.89   -         -         100 %     -
Hybrid Google + Bing            17.28   48.30   58.15   50.09 %   45.03 %   -         4.88 %
Hybrid Google + LetsMT          22.89   41.38   50.31   46.17 %   -         48.39 %   5.44 %
Hybrid LetsMT + Bing            22.83   42.92   50.62   -         45.35 %   49.84 %   4.81 %
Hybrid Google + Bing + LetsMT   21.08   44.12   52.99   28.93 %   34.31 %   33.98 %   2.78 %
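The "Translations selected" columns give the share of sentences taken from each system. A minimal sketch of such a tally follows, assuming per-sentence scores where lower is better and assuming "Equal" means that the best score was shared by more than one system; the scores are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def selection_shares(scores_per_sentence: List[Dict[str, float]]) -> Dict[str, float]:
    """Percentage of sentences won by each system; ties are counted as 'Equal'."""
    counts = Counter()
    for scores in scores_per_sentence:
        best = min(scores.values())
        winners = [name for name, s in scores.items() if s == best]
        counts[winners[0] if len(winners) == 1 else "Equal"] += 1
    total = len(scores_per_sentence)
    return {name: round(100.0 * n / total, 2) for name, n in counts.items()}


# Invented per-sentence scores for two systems.
toy_scores = [
    {"Google": 120.0, "LetsMT": 95.5},
    {"Google": 80.0, "LetsMT": 80.0},
    {"Google": 60.2, "LetsMT": 75.0},
    {"Google": 150.0, "LetsMT": 110.0},
]
shares = selection_shares(toy_scores)
```

Under this reading the shares in each row sum to 100 %, which matches the hybrid rows of the table.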
Experiment results – ACCURAT balanced
System                          BLEU
Google Translate                24.73
Bing Translator                 22.07
LetsMT                          32.01
Hybrid Google + Bing            23.75
Hybrid Google + LetsMT          28.94
Hybrid LetsMT + Bing            27.44
Hybrid Google + Bing + LetsMT   26.74
Human evaluation
 Five native Latvian speakers were each given a random 2% sample (32 sentences)
 They were asked to mark which of the three MT outputs was the best, which was the worst, and which was OK
 They could select multiple outputs for best, worst, or OK
Human results
System    User 1    User 2    User 3    User 4    User 5    AVG user  Hybrid    BLEU
Bing      21.88%    53.13%    28.13%    25.00%    31.25%    31.88%    28.93%    16.92
Google    28.13%    25.00%    25.00%    28.13%    46.88%    30.63%    34.31%    17.16
LetsMT    50.00%    21.88%    46.88%    46.88%    21.88%    37.50%    33.98%    28.27
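The "AVG user" column appears to be the plain mean of the five per-user shares. With 32 sentences per evaluator, each share is a multiple of 1/32; for example, 7/32 = 21.875 %, which the table rounds to 21.88. The arithmetic can be checked with the values copied from the table:

```python
# Per-user shares (in percent) copied from the human results table.
user_shares = {
    "Bing":   [21.88, 53.13, 28.13, 25.00, 31.25],
    "Google": [28.13, 25.00, 25.00, 28.13, 46.88],
    "LetsMT": [50.00, 21.88, 46.88, 46.88, 21.88],
}

# Mean over the five evaluators, rounded to two decimals as in the table.
averages = {name: round(sum(v) / len(v), 2) for name, v in user_shares.items()}
```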
Conclusion
 Simple to build, use, and extend with new MT APIs
 Works well when the combined systems are of similar quality
 Performs poorly when one system is much better than the others
 Needs better translation selection and more configuration options
Future work
 Use a bigger and better language model? Tried it; the results were about the same
 Confusion networks? Too confusing for now
 Use MT quality estimation to select the best candidates: QuEst, QuEst++, or another quality estimation tool
 Chunk sentences into smaller parts, translate, and recombine
Thank you!
http://ej.uz/MSHT-GITHUB
http://ej.uz/MSMT-EN-LV
