SlideShare a Scribd company logo
Multi-system machine translation
using online APIs for English-Latvian
Matīss Rikters
University of Latvia
ACL 2015 Fourth Workshop on
Hybrid Approaches to Translation
Beijing, 31.07.2015
Introduction
 Motivation:
 Doctoral studies at the University of Latvia
 A hybrid machine translation method, combining results of various machine translation systems
 Literature review
 Recent trends in Multi-System Machine Translation
 Nothing similar publically available was found
Introduction
 Goals:
 Combine output from multiple online MT APIs
 Keep it simple
 Make it work fast
Related work
 "Coupling Statistical Machine Translation with Rule-based Transfer and Generation",
A. Ahsan, and P. Kolachina.
 "Using language and translation models to select the best among outputs from
multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.
 "MANY: Open source machine translation system combination", L. Barrault.
 "A program for automatically selecting the best output from multiple machine
translation engines", C. Callison-Burch and R. S. Flournoy.
Initial plan
 Use systems that support English – Latvian translation
 Found five such systems:
What worked
 Couldn`t get APIs of two of them to work
 Used the remaining three:
System description
Sentence tokenization
Translation with APIs
Google Translate Bing Translator LetsMT
Selection of the best
translation
Output
Selection of the best translation
Probabilities are calculated based on the observed entry with longest matching history 𝑤𝑓
𝑛
:
𝑝 𝑤 𝑛 𝑤1
𝑛−1
= 𝑝 𝑤 𝑛 𝑤𝑓
𝑛−1
𝑖=1
𝑓−1
𝑏(𝑤𝑖
𝑛−1
)
where the probability 𝑝 𝑤 𝑛 𝑤𝑓
𝑛−1
and backoff penalties 𝑏(𝑤𝑖
𝑛−1
) are given by an already-
estimated language model. Perplexity is then calculated using this probability:
where given an unknown probability distribution p and a proposed probability model q, it
is evaluated by determining how well it predicts a separate test sample x1, x2... xN drawn
from p.
System usage
 Get the code - https://github.com/M4t1ss/Multi-System-Hybrid-Translator
 Get API access
 Google - https://cloud.google.com/translate/
 Bing - http://www.bing.com/dev/en-us/translator
 LetsMT - https://www.letsmt.eu/Integration.aspx
 Add API keys to the configuration
 Prepare a language model
 You can use KenLM – https://kheafield.com/code/kenlm/
 Prepare input data
 Run
 php MSHT.php languageModel.binary inputSentances.txt
Experiments
 MT System APIs
 Google Translate
 Bing Translator
 TB2013 EN-LV v03 from LetsMT
 Language model
 JRC Acquis corpus version 2.2
 Input sentences
 JRC Acquis corpus version 2.2
 ACCURAT balanced test corpus for under resourced languages
Experiment results – JRC Acquis
System BLEU TER WER
Translations selected
Google Bing LetsMT Equal
Google Translate 16.92 47.68 58.55 100 % - - -
Bing Translator 17.16 49.66 58.40 - 100 % - -
LetsMT 28.27 36.19 42.89 - - 100 % -
Hybrid Google + Bing 17.28 48.30 58.15 50.09 % 45.03 % - 4.88 %
Hybrid Google + LetsMT 22.89 41.38 50.31 46.17 % - 48.39 % 5.44 %
Hybrid LetsMT + Bing 22.83 42.92 50.62 - 45.35 % 49.84 % 4.81 %
Hybrid Google + Bing + LetsMT 21.08 44.12 52.99 28.93 % 34.31 % 33.98 % 2.78 %
Experiment results – ACCURAT balanced
System BLEU
Google Translate 24.73
Bing Translator 22.07
LetsMT 32.01
Hybrid Google + Bing 23.75
Hybrid Google + LetsMT 28.94
Hybrid LetsMT + Bing 27.44
Hybrid Google + Bing + LetsMT 26.74
Human evaluation
 5 native Latvian speakers were given a random 2% - 32 sentences
 They were told to mark which of the three MT outputs is the best, worst and OK
 Having the option to select multiple answers for best, worst or OK
Human results
System User 1 User 2 User 3 User 4 User 5 AVG user Hybrid BLEU
Bing 21,88% 53,13% 28,13% 25,00% 31,25% 31,88% 28,93% 16.92
Google 28,13% 25,00% 25,00% 28,13% 46,88% 30,63% 34,31% 17.16
LetsMT 50,00% 21,88% 46,88% 46,88% 21,88% 37,50% 33,98% 28.27
Conclusion
 Simple to
 Build
 Use
 Add new MT APIs
 Works
 When used on similar systems
 Poor with one much superior system
 Needs
 Improvements for translation selection
 More configuration options
Future work
 Use a bigger & better language model?
 Tried it… about the same results
 Confusion networks?
 Too confusing for now
 Use MT quality estimation for selecting the best candidates
 QuEst or QuEst++
 Other quality estimation
 Chunk sentences in smaller parts, translate & recombine
Thank you!
http://ej.uz/MSHT-GITHUB
http://ej.uz/MSMT-EN-LV

More Related Content

Viewers also liked

Viewers also liked (11)

C4.5, C5.0 un SVM klasifikācijas algoritmu izpēte un salīdzināšana datorlingv...
C4.5, C5.0 un SVM klasifikācijas algoritmu izpēte un salīdzināšana datorlingv...C4.5, C5.0 un SVM klasifikācijas algoritmu izpēte un salīdzināšana datorlingv...
C4.5, C5.0 un SVM klasifikācijas algoritmu izpēte un salīdzināšana datorlingv...
 
Powerpoint Template
Powerpoint TemplatePowerpoint Template
Powerpoint Template
 
Modaclub v3
Modaclub v3Modaclub v3
Modaclub v3
 
Makalah asertifitas
Makalah asertifitasMakalah asertifitas
Makalah asertifitas
 
Makalah asa nukleat
Makalah asa nukleatMakalah asa nukleat
Makalah asa nukleat
 
Unidad 3
Unidad 3Unidad 3
Unidad 3
 
Makalah api klpk 1 kls a3 kep
Makalah api klpk 1 kls a3 kepMakalah api klpk 1 kls a3 kep
Makalah api klpk 1 kls a3 kep
 
Filming schedule
Filming scheduleFilming schedule
Filming schedule
 
Google drive d.şahi̇n
Google drive d.şahi̇nGoogle drive d.şahi̇n
Google drive d.şahi̇n
 
Transform: One World
Transform: One WorldTransform: One World
Transform: One World
 
Lição 36 as limitações dos discípulos
Lição 36   as limitações dos discípulosLição 36   as limitações dos discípulos
Lição 36 as limitações dos discípulos
 

Similar to Multi-system machine translation using online APIs for English-Latvian

Combining machine translated sentence chunks from multiple MT systems
Combining machine translated sentence chunks from multiple MT systemsCombining machine translated sentence chunks from multiple MT systems
Combining machine translated sentence chunks from multiple MT systemsMatīss ‎‎‎‎‎‎‎  
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...Lifeng (Aaron) Han
 
New Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTechNew Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTechTAUS - The Language Data Network
 
Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...PVS-Studio
 
Introduction to genetic programming
Introduction to genetic programmingIntroduction to genetic programming
Introduction to genetic programmingabhishek singh
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approachGarima Nanda
 
White Paper: Continuous Change-Driven Build Verification
White Paper: Continuous Change-Driven Build VerificationWhite Paper: Continuous Change-Driven Build Verification
White Paper: Continuous Change-Driven Build VerificationPerforce
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_saRobert Martin
 
Eclipse Meets Systems Biology
Eclipse Meets Systems BiologyEclipse Meets Systems Biology
Eclipse Meets Systems BiologyRichard Adams
 
Summarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and TestingSummarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and TestingSebastiano Panichella
 
Software testing using genetic algorithms
Software testing using genetic algorithmsSoftware testing using genetic algorithms
Software testing using genetic algorithmsNurhussen Menza
 
Efficient failure detection and consensus at extreme-scale systems
Efficient failure detection and consensus at extreme-scale  systemsEfficient failure detection and consensus at extreme-scale  systems
Efficient failure detection and consensus at extreme-scale systemsIJECEIAES
 

Similar to Multi-system machine translation using online APIs for English-Latvian (20)

K translate - Baltic DBIS2016
K translate - Baltic DBIS2016K translate - Baltic DBIS2016
K translate - Baltic DBIS2016
 
Combining machine translated sentence chunks from multiple MT systems
Combining machine translated sentence chunks from multiple MT systemsCombining machine translated sentence chunks from multiple MT systems
Combining machine translated sentence chunks from multiple MT systems
 
Searching for the best translation combination
Searching for the best translation combinationSearching for the best translation combination
Searching for the best translation combination
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
 
Doktorantūras semināra 3. prezentācija
Doktorantūras semināra 3. prezentācijaDoktorantūras semināra 3. prezentācija
Doktorantūras semināra 3. prezentācija
 
New Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTechNew Development in MT Technology and Services, by Anthony Wong, CCID TransTech
New Development in MT Technology and Services, by Anthony Wong, CCID TransTech
 
Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...Adaptation of the technology of the static code analyzer for developing paral...
Adaptation of the technology of the static code analyzer for developing paral...
 
C2-4-Putchala
C2-4-PutchalaC2-4-Putchala
C2-4-Putchala
 
Introduction to genetic programming
Introduction to genetic programmingIntroduction to genetic programming
Introduction to genetic programming
 
Question Answering System using machine learning approach
Question Answering System using machine learning approachQuestion Answering System using machine learning approach
Question Answering System using machine learning approach
 
team10.ppt.pptx
team10.ppt.pptxteam10.ppt.pptx
team10.ppt.pptx
 
White Paper: Continuous Change-Driven Build Verification
White Paper: Continuous Change-Driven Build VerificationWhite Paper: Continuous Change-Driven Build Verification
White Paper: Continuous Change-Driven Build Verification
 
Final
FinalFinal
Final
 
Tech capabilities with_sa
Tech capabilities with_saTech capabilities with_sa
Tech capabilities with_sa
 
Eclipse Meets Systems Biology
Eclipse Meets Systems BiologyEclipse Meets Systems Biology
Eclipse Meets Systems Biology
 
Poster (1)
Poster (1)Poster (1)
Poster (1)
 
Summarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and TestingSummarization Techniques for Code, Changes, and Testing
Summarization Techniques for Code, Changes, and Testing
 
Software testing using genetic algorithms
Software testing using genetic algorithmsSoftware testing using genetic algorithms
Software testing using genetic algorithms
 
Proposal with sdlc
Proposal with sdlcProposal with sdlc
Proposal with sdlc
 
Efficient failure detection and consensus at extreme-scale systems
Efficient failure detection and consensus at extreme-scale  systemsEfficient failure detection and consensus at extreme-scale  systems
Efficient failure detection and consensus at extreme-scale systems
 

More from Matīss ‎‎‎‎‎‎‎  

Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsMatīss ‎‎‎‎‎‎‎  
 
Effective online learning implementation for statistical machine translation
Effective online learning implementation for statistical machine translationEffective online learning implementation for statistical machine translation
Effective online learning implementation for statistical machine translationMatīss ‎‎‎‎‎‎‎  
 
Hybrid machine translation by combining multiple machine translation systems
Hybrid machine translation by combining multiple machine translation systemsHybrid machine translation by combining multiple machine translation systems
Hybrid machine translation by combining multiple machine translation systemsMatīss ‎‎‎‎‎‎‎  
 

More from Matīss ‎‎‎‎‎‎‎   (20)

日本のお風呂
日本のお風呂日本のお風呂
日本のお風呂
 
Thrifty Food Tweets on a Rainy Day
Thrifty Food Tweets on a Rainy DayThrifty Food Tweets on a Rainy Day
Thrifty Food Tweets on a Rainy Day
 
私の趣味
私の趣味私の趣味
私の趣味
 
How Masterly Are People at Playing with Their Vocabulary?
How Masterly Are People at Playing with Their Vocabulary?How Masterly Are People at Playing with Their Vocabulary?
How Masterly Are People at Playing with Their Vocabulary?
 
私の町リガ
私の町リガ私の町リガ
私の町リガ
 
大学への交通手段
大学への交通手段大学への交通手段
大学への交通手段
 
小学生に 携帯電話
小学生に 携帯電話小学生に 携帯電話
小学生に 携帯電話
 
Tracing multisensory food experience on twitter
Tracing multisensory food experience on twitterTracing multisensory food experience on twitter
Tracing multisensory food experience on twitter
 
ラトビア大学
ラトビア大学ラトビア大学
ラトビア大学
 
私の趣味
私の趣味私の趣味
私の趣味
 
富士山りょこう
富士山りょこう富士山りょこう
富士山りょこう
 
Tips and Tools for NMT
Tips and Tools for NMTTips and Tools for NMT
Tips and Tools for NMT
 
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation SystemsHybrid Machine Translation by Combining Multiple Machine Translation Systems
Hybrid Machine Translation by Combining Multiple Machine Translation Systems
 
The Impact of Corpora Qulality on Neural Machine Translation
The Impact of Corpora Qulality on Neural Machine TranslationThe Impact of Corpora Qulality on Neural Machine Translation
The Impact of Corpora Qulality on Neural Machine Translation
 
Advancing Estonian Machine Translation
Advancing Estonian Machine TranslationAdvancing Estonian Machine Translation
Advancing Estonian Machine Translation
 
Debugging neural machine translations
Debugging neural machine translationsDebugging neural machine translations
Debugging neural machine translations
 
Effective online learning implementation for statistical machine translation
Effective online learning implementation for statistical machine translationEffective online learning implementation for statistical machine translation
Effective online learning implementation for statistical machine translation
 
Neirontulkojumu atkļūdošana
Neirontulkojumu atkļūdošanaNeirontulkojumu atkļūdošana
Neirontulkojumu atkļūdošana
 
Hybrid machine translation by combining multiple machine translation systems
Hybrid machine translation by combining multiple machine translation systemsHybrid machine translation by combining multiple machine translation systems
Hybrid machine translation by combining multiple machine translation systems
 
Paying attention to MWEs in NMT
Paying attention to MWEs in NMTPaying attention to MWEs in NMT
Paying attention to MWEs in NMT
 

Recently uploaded

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaRTTS
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...Elena Simperl
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀DianaGray10
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor TurskyiFwdays
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Tobias Schneck
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCzechDreamin
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsPaul Groth
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...Product School
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlPeter Udo Diehl
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 

Recently uploaded (20)

Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
Exploring UiPath Orchestrator API: updates and limits in 2024 🚀
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
 
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya HalderCustom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
Custom Approval Process: A New Perspective, Pavel Hrbacek & Anindya Halder
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Multi-system machine translation using online APIs for English-Latvian

  • 1. Multi-system machine translation using online APIs for English-Latvian Matīss Rikters University of Latvia ACL 2015 Fourth Workshop on Hybrid Approaches to Translation Beijing, 31.07.2015
  • 2. Introduction  Motivation:  Doctoral studies at the University of Latvia  A hybrid machine translation method, combining results of various machine translation systems  Literature review  Recent trends in Multi-System Machine Translation  Nothing similar publically available was found
  • 3. Introduction  Goals:  Combine output from multiple online MT APIs  Keep it simple  Make it work fast
  • 4. Related work  "Coupling Statistical Machine Translation with Rule-based Transfer and Generation", A. Ahsan, and P. Kolachina.  "Using language and translation models to select the best among outputs from multiple MT systems", Y. Akiba, T. Watanabe, and E. Sumita.  "MANY: Open source machine translation system combination", L. Barrault.  "A program for automatically selecting the best output from multiple machine translation engines", C. Callison-Burch and R. S. Flournoy.
  • 5. Initial plan  Use systems that support English – Latvian translation  Found five such systems:
  • 6. What worked  Couldn`t get APIs of two of them to work  Used the remaining three:
  • 7. System description Sentence tokenization Translation with APIs Google Translate Bing Translator LetsMT Selection of the best translation Output
  • 8. Selection of the best translation Probabilities are calculated based on the observed entry with longest matching history 𝑤𝑓 𝑛 : 𝑝 𝑤 𝑛 𝑤1 𝑛−1 = 𝑝 𝑤 𝑛 𝑤𝑓 𝑛−1 𝑖=1 𝑓−1 𝑏(𝑤𝑖 𝑛−1 ) where the probability 𝑝 𝑤 𝑛 𝑤𝑓 𝑛−1 and backoff penalties 𝑏(𝑤𝑖 𝑛−1 ) are given by an already- estimated language model. Perplexity is then calculated using this probability: where given an unknown probability distribution p and a proposed probability model q, it is evaluated by determining how well it predicts a separate test sample x1, x2... xN drawn from p.
  • 9. System usage  Get the code - https://github.com/M4t1ss/Multi-System-Hybrid-Translator  Get API access  Google - https://cloud.google.com/translate/  Bing - http://www.bing.com/dev/en-us/translator  LetsMT - https://www.letsmt.eu/Integration.aspx  Add API keys to the configuration  Prepare a language model  You can use KenLM – https://kheafield.com/code/kenlm/  Prepare input data  Run  php MSHT.php languageModel.binary inputSentances.txt
  • 10. Experiments  MT System APIs  Google Translate  Bing Translator  TB2013 EN-LV v03 from LetsMT  Language model  JRC Acquis corpus version 2.2  Input sentences  JRC Acquis corpus version 2.2  ACCURAT balanced test corpus for under resourced languages
  • 11. Experiment results – JRC Acquis System BLEU TER WER Translations selected Google Bing LetsMT Equal Google Translate 16.92 47.68 58.55 100 % - - - Bing Translator 17.16 49.66 58.40 - 100 % - - LetsMT 28.27 36.19 42.89 - - 100 % - Hybrid Google + Bing 17.28 48.30 58.15 50.09 % 45.03 % - 4.88 % Hybrid Google + LetsMT 22.89 41.38 50.31 46.17 % - 48.39 % 5.44 % Hybrid LetsMT + Bing 22.83 42.92 50.62 - 45.35 % 49.84 % 4.81 % Hybrid Google + Bing + LetsMT 21.08 44.12 52.99 28.93 % 34.31 % 33.98 % 2.78 %
  • 12. Experiment results – ACCURAT balanced System BLEU Google Translate 24.73 Bing Translator 22.07 LetsMT 32.01 Hybrid Google + Bing 23.75 Hybrid Google + LetsMT 28.94 Hybrid LetsMT + Bing 27.44 Hybrid Google + Bing + LetsMT 26.74
  • 13. Human evaluation  5 native Latvian speakers were given a random 2% - 32 sentences  They were told to mark which of the three MT outputs is the best, worst and OK  Having the option to select multiple answers for best, worst or OK
  • 14. Human results System User 1 User 2 User 3 User 4 User 5 AVG user Hybrid BLEU Bing 21,88% 53,13% 28,13% 25,00% 31,25% 31,88% 28,93% 16.92 Google 28,13% 25,00% 25,00% 28,13% 46,88% 30,63% 34,31% 17.16 LetsMT 50,00% 21,88% 46,88% 46,88% 21,88% 37,50% 33,98% 28.27
  • 15. Conclusion  Simple to  Build  Use  Add new MT APIs  Works  When used on similar systems  Poor with one much superior system  Needs  Improvements for translation selection  More configuration options
  • 16. Future work  Use a bigger & better language model?  Tried it… about the same results  Confusion networks?  Too confusing for now  Use MT quality estimation for selecting the best candidates  QuEst or QuEst++  Other quality estimation  Chunk sentences in smaller parts, translate & recombine