Multi-system machine translation
using online APIs for English-Latvian
Matīss Rikters
University of Latvia
ACL 2015 Fourth Workshop on
Hybrid Approaches to Translation
Beijing, 31.07.2015
Introduction
Motivation:
- Doctoral studies at the University of Latvia
  - A hybrid machine translation method, combining results of various machine translation systems
- Literature review
  - Recent trends in Multi-System Machine Translation
  - Nothing similar publicly available was found
Introduction
Goals:
- Combine output from multiple online MT APIs
- Keep it simple
- Make it work fast
Related work
- "Coupling Statistical Machine Translation with Rule-based Transfer and Generation", A. Ahsan and P. Kolachina.
- "Using language and translation models to select the best among outputs from multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.
- "MANY: Open source machine translation system combination", L. Barrault.
- "A program for automatically selecting the best output from multiple machine translation engines", C. Callison-Burch and R. S. Flournoy.
Initial plan
- Use systems that support English – Latvian translation
- Found five such systems
What worked
- Couldn't get the APIs of two of them to work
- Used the remaining three: Google Translate, Bing Translator, and LetsMT
System description
1. Sentence tokenization
2. Translation with APIs (Google Translate, Bing Translator, LetsMT)
3. Selection of the best translation
4. Output
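The four steps above can be sketched in a few lines of Python. This is only an illustration of the data flow, not the code in MSHT.php: the names `split_sentences` and `translate_hybrid`, the regex-based sentence splitter, and the stub translators are all assumptions made for the example, and the real system calls the online APIs and scores candidates with a language model.

```python
import re
from typing import Callable, Dict, List

# Hypothetical types: a translator maps a source sentence to a translation,
# a scorer maps a candidate translation to a perplexity (lower is better).
Translator = Callable[[str], str]
Scorer = Callable[[str], float]


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def translate_hybrid(text: str, translators: Dict[str, Translator], score: Scorer) -> List[dict]:
    """Translate each sentence with every system and keep the lowest-scoring candidate."""
    results = []
    for sentence in split_sentences(text):
        candidates = {name: fn(sentence) for name, fn in translators.items()}
        # On ties, min() keeps the first system in dictionary order.
        best = min(candidates, key=lambda name: score(candidates[name]))
        results.append({"source": sentence, "system": best, "translation": candidates[best]})
    return results


# Stub translators and a toy scorer stand in for the online APIs and the language model.
stub_translators = {
    "system_a": lambda s: s + " (candidate from A)",
    "system_b": lambda s: s,
}
for row in translate_hybrid("First sentence. Second one!", stub_translators, score=len):
    print(row["system"], "->", row["translation"])
```

Passing the translators and the scorer in as plain callables keeps the selection step independent of any particular API, which is in line with the stated goal of making it easy to add new MT APIs.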
Selection of the best translation
Probabilities are calculated based on the observed entry with the longest matching history $w_f^n$:
$$p(w_n \mid w_1^{n-1}) = p(w_n \mid w_f^{n-1}) \prod_{i=1}^{f-1} b(w_i^{n-1})$$
where the probability $p(w_n \mid w_f^{n-1})$ and the backoff penalties $b(w_i^{n-1})$ are given by an already-estimated language model. Perplexity is then calculated using this probability. Given an unknown probability distribution p and a proposed probability model q, the model q is evaluated by determining how well it predicts a separate test sample x1, x2, ..., xN drawn from p.
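To make the backoff lookup and the perplexity concrete, here is a minimal Python sketch over a toy bigram model. Everything in it is an assumption for illustration: the n-gram table and its log10 values are invented, and the perplexity is computed with the standard definition $b^{-\frac{1}{N}\sum_{i=1}^{N}\log_b q(x_i)}$ (here with b = 10), which matches the description above but is not copied from the slide.

```python
from typing import Dict, List, Tuple

ORDER = 2  # toy bigram model

# n-gram -> (log10 probability, log10 backoff penalty); all values are invented.
NGRAMS: Dict[Tuple[str, ...], Tuple[float, float]] = {
    ("<s>",): (-99.0, -0.30),
    ("a",): (-0.70, -0.30),
    ("b",): (-0.52, -0.30),
    ("</s>",): (-0.52, 0.0),
    ("<unk>",): (-1.00, 0.0),
    ("<s>", "a"): (-0.30, 0.0),
    ("a", "b"): (-0.22, 0.0),
}


def log10_prob(word: str, history: List[str]) -> float:
    """log10 p(word | history): use the longest matching n-gram and add the
    backoff penalty of every context that had to be shortened on the way."""
    if (word,) not in NGRAMS:
        word = "<unk>"
    context = tuple(history[-(ORDER - 1):]) if ORDER > 1 else ()
    penalty = 0.0
    while True:
        entry = NGRAMS.get(context + (word,))
        if entry is not None:
            return entry[0] + penalty
        # Contexts missing from the model carry no penalty (log10 of 1 is 0).
        penalty += NGRAMS.get(context, (0.0, 0.0))[1]
        context = context[1:]


def perplexity(sentence: str) -> float:
    """Per-word perplexity of a whitespace-tokenized sentence, end marker included."""
    words = sentence.split() + ["</s>"]
    history = ["<s>"]
    total = 0.0
    for word in words:
        total += log10_prob(word, history)
        history.append(word)
    return 10 ** (-total / len(words))


print(perplexity("a b"), perplexity("b a"))
```

In this toy model "a b" hits two stored bigrams while "b a" backs off at every position, so "a b" gets the lower perplexity and would be the candidate selected.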
System usage
- Get the code: https://github.com/M4t1ss/Multi-System-Hybrid-Translator
- Get API access
  - Google: https://cloud.google.com/translate/
  - Bing: http://www.bing.com/dev/en-us/translator
  - LetsMT: https://www.letsmt.eu/Integration.aspx
- Add API keys to the configuration
- Prepare a language model
  - You can use KenLM: https://kheafield.com/code/kenlm/
- Prepare input data
- Run
  - php MSHT.php languageModel.binary inputSentances.txt
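For readers who want to inspect how a prepared language model ranks candidates, the sketch below uses KenLM's Python module, whose `kenlm.Model` objects provide a `perplexity(sentence)` method. It is a side illustration and an assumption about a convenient workflow, not a description of how MSHT.php calls KenLM; `load_kenlm`, `rank_candidates`, and `FakeModel` are hypothetical names. The binary model itself can be built with KenLM's `lmplz` and `build_binary` tools.

```python
from typing import Iterable, List, Protocol, Tuple


class PerplexityModel(Protocol):
    """Anything exposing perplexity(sentence); kenlm.Model is one such object."""

    def perplexity(self, sentence: str) -> float: ...


def load_kenlm(path: str) -> PerplexityModel:
    """Load a KenLM model file. Needs the optional `kenlm` Python package."""
    import kenlm  # lazy import so the rest of the sketch runs without KenLM

    return kenlm.Model(path)


def rank_candidates(model: PerplexityModel, candidates: Iterable[str]) -> List[Tuple[float, str]]:
    """Return (perplexity, candidate) pairs, lowest perplexity first."""
    return sorted((model.perplexity(c), c) for c in candidates)


class FakeModel:
    """Stand-in for demonstration only: pretends longer sentences are less fluent."""

    def perplexity(self, sentence: str) -> float:
        return float(len(sentence.split()))


# With KenLM installed, replace FakeModel() with load_kenlm("languageModel.binary").
ranked = rank_candidates(FakeModel(), ["viens divi trīs", "viens divi"])
print(ranked[0][1])
```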
Experiments
- MT system APIs
  - Google Translate
  - Bing Translator
  - TB2013 EN-LV v03 from LetsMT
- Language model
  - JRC Acquis corpus version 2.2
- Input sentences
  - JRC Acquis corpus version 2.2
  - ACCURAT balanced test corpus for under-resourced languages
Experiment results – JRC Acquis
                                                        Translations selected
System                          BLEU    TER     WER     Google    Bing      LetsMT    Equal
Google Translate                16.92   47.68   58.55   100 %     -         -         -
Bing Translator                 17.16   49.66   58.40   -         100 %     -         -
LetsMT                          28.27   36.19   42.89   -         -         100 %     -
Hybrid Google + Bing            17.28   48.30   58.15   50.09 %   45.03 %   -         4.88 %
Hybrid Google + LetsMT          22.89   41.38   50.31   46.17 %   -         48.39 %   5.44 %
Hybrid LetsMT + Bing            22.83   42.92   50.62   -         45.35 %   49.84 %   4.81 %
Hybrid Google + Bing + LetsMT   21.08   44.12   52.99   28.93 %   34.31 %   33.98 %   2.78 %
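As a reminder of what the WER column measures, the sketch below computes word error rate as the word-level edit distance between a hypothesis and a reference, divided by the reference length and expressed as a percentage. It assumes plain whitespace tokenization and a single reference; the slides do not say which scoring tool or tokenization produced the reported figures, so the function `word_error_rate` is illustrative only.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level Levenshtein distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cur.append(min(
                prev[j] + 1,                          # reference word missing from hypothesis
                cur[j - 1] + 1,                       # extra word in hypothesis
                prev[j - 1] + (ref_word != hyp_word)  # match or substitution
            ))
        prev = cur
    return 100.0 * prev[-1] / len(ref)


# One substitution and one missing word against a six-word reference.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Lower is better for WER and TER, higher is better for BLEU, which is why LetsMT leads on all three columns in the table.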
Experiment results – ACCURAT balanced
System                          BLEU
Google Translate                24.73
Bing Translator                 22.07
LetsMT                          32.01
Hybrid Google + Bing            23.75
Hybrid Google + LetsMT          28.94
Hybrid LetsMT + Bing            27.44
Hybrid Google + Bing + LetsMT   26.74
Human evaluation
- Five native Latvian speakers were given a random 2% sample (32 sentences)
- They were asked to mark which of the three MT outputs was the best, which was the worst, and which was OK
- They could select multiple outputs as best, worst, or OK
Human results
System   User 1    User 2    User 3    User 4    User 5    AVG user   Hybrid    BLEU
Bing     21.88%    53.13%    28.13%    25.00%    31.25%    31.88%     28.93%    16.92
Google   28.13%    25.00%    25.00%    28.13%    46.88%    30.63%     34.31%    17.16
LetsMT   50.00%    21.88%    46.88%    46.88%    21.88%    37.50%     33.98%    28.27
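The "AVG user" column can be reproduced directly from the five per-user percentages. The short Python check below uses only the numbers in the table; the dictionary name `votes` is just a label for the example. With 32 sentences, each vote moves a user's percentage by 100 / 32 = 3.125 points, which is why values such as 21.88% and 28.13% recur.

```python
# Per-user share of sentences where each system was marked best, as reported in the table.
votes = {
    "Bing": [21.88, 53.13, 28.13, 25.00, 31.25],
    "Google": [28.13, 25.00, 25.00, 28.13, 46.88],
    "LetsMT": [50.00, 21.88, 46.88, 46.88, 21.88],
}

# Plain arithmetic mean over the five evaluators.
averages = {system: sum(shares) / len(shares) for system, shares in votes.items()}

for system, avg in averages.items():
    print(f"{system}: {avg:.2f}%")
# Bing: 31.88%
# Google: 30.63%
# LetsMT: 37.50%
```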
Conclusion
- Simple to
  - Build
  - Use
  - Add new MT APIs
- Works
  - When used on systems of similar quality
  - Poorly when one system is much better than the others
- Needs
  - Improvements for translation selection
  - More configuration options
Future work
- Use a bigger & better language model?
  - Tried it… about the same results
- Confusion networks?
  - Too confusing for now
- Use MT quality estimation for selecting the best candidates
  - QuEst or QuEst++
  - Other quality estimation
- Chunk sentences into smaller parts, translate & recombine
Thank you!
http://ej.uz/MSHT-GITHUB
http://ej.uz/MSMT-EN-LV
