This document presents research on effective online learning for statistical machine translation: improving machine translation quality at runtime by learning from corrected translations sent back from computer-assisted translation tools. The researchers implemented three translation scenarios: a baseline without online learning, online learning without feedback, and online learning with feedback. Their two-step tuning method improved over the baseline by up to 12.17 BLEU points, depending on the language pair, domain, and the size of the training data used for tuning.
1. Effective Online Learning for Statistical Machine Translation
Toms Miks, Mārcis Pinnis, Matīss Rikters and Rihards Krišlauks
13th International Baltic Conference on Databases and Information Systems
July 3, 2018
Trakai, Lithuania
2. MT, SMT, OL?
• Machine translation (MT) is a sub-field of natural language
processing that investigates the use of computers to translate text
from one language into another
• Statistical MT (SMT) consists of subcomponents that are separately
engineered to learn how to translate from vast amounts of human-
translated texts
• Online learning (OL) makes it possible to improve MT quality at
runtime by learning from corrected translations that computer-assisted
translation (CAT) tools send back to the MT system after translators
have approved a post-edited translation
3. Automatic Evaluation of MT: BLEU
• One of the first (and still the most widely used) metrics to report high
correlation with human judgments
• The closer an MT hypothesis is to a professional human translation, the better it is
• Scores MT hypotheses on a scale of 0 to 100
Papineni et al. 2002
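The core of BLEU can be sketched in a few lines. The version below is a simplified, unsmoothed sentence-level illustration: modified (clipped) n-gram precision combined with a brevity penalty, as described by Papineni et al. (2002). Production toolkits add smoothing, tokenization, and corpus-level aggregation.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(hypothesis, reference, max_n=4):
    """Toy sentence-level BLEU on a 0-100 scale (single reference,
    no smoothing). Real toolkits such as sacreBLEU should be used
    for reportable scores."""
    hyp, ref = hypothesis.split(), reference.split()
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = Counter(ngrams(hyp, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clipped matches: each hypothesis n-gram counts at most as
        # often as it appears in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        total = max(len(hyp) - n + 1, 0)
        if total == 0 or matches == 0:
            return 0.0  # unsmoothed BLEU is 0 if any precision is 0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty punishes hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return 100 * bp * math.exp(sum(log_precisions) / max_n)
```

A hypothesis identical to the reference scores 100; one sharing no n-grams with it scores 0.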
4. Data

Training data

                                            IT domain          Medical domain
                                            English-Estonian   English-Latvian   English-Latvian
Unique parallel sentence pairs              9,059,100          4,029,063         325,332
Unique in-domain monolingual sentences      34,392,322         1,950,266         332,652
Unique broad-domain monolingual sentences   -                  2,369,308         -
Tuning data                                 1,990              1,837             2,000

Post-edited data

                                            IT domain          Medical domain
                                            English-Estonian   English-Latvian   English-Latvian
Segments                                    60,630             27,122            20,286
Tokens                                      475,295            166,350           374,914
6. Three types of translation scenarios
• Baseline
• standard SMT without dynamic models
• OL-
• SMT with dynamic models;
• post-edits are not sent back to the system for online learning
• OL+
• SMT with dynamic models;
• post-edits are sent back to the SMT system for online learning
7. Two-step Tuning
1. Static model weights are tuned using MIRA in a standard translation
scenario (without online learning)
2. Dynamic model weights are tuned using MERT in an online learning
scenario using the pre-trained static model weights
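The two-step idea can be illustrated with a toy model (a sketch, not the actual Moses MIRA/MERT implementations): candidates are scored as a weighted sum of static and dynamic feature values, step 1 tunes the static weights with the dynamic model disabled, and step 2 freezes them and tunes only the dynamic weight. The feature values and quality scores below are made up, and grid search stands in for the real optimizers.

```python
from itertools import product

# Hypothetical candidate translations for one source sentence:
# static features (tm = translation model, lm = language model),
# one dynamic-model feature, and each candidate's quality (e.g. BLEU).
candidates = [
    {"tm": 0.9, "lm": 0.2, "dyn": 0.0, "bleu": 35.0},
    {"tm": 0.4, "lm": 0.9, "dyn": 0.0, "bleu": 50.0},
    {"tm": 0.3, "lm": 0.5, "dyn": 0.9, "bleu": 60.0},
]

def picked_quality(w_tm, w_lm, w_dyn):
    """Quality of the candidate the decoder would pick with these weights."""
    score = lambda c: w_tm * c["tm"] + w_lm * c["lm"] + w_dyn * c["dyn"]
    return max(candidates, key=score)["bleu"]

grid = [i / 10 for i in range(11)]

# Step 1: tune static weights in a standard scenario, i.e. with the
# dynamic weight fixed at 0 (grid-search stand-in for MIRA).
w_tm, w_lm = max(product(grid, grid),
                 key=lambda w: picked_quality(w[0], w[1], 0.0))

# Step 2: freeze the static weights and tune only the dynamic weight
# (grid-search stand-in for MERT).
w_dyn = max(grid, key=lambda w: picked_quality(w_tm, w_lm, w))
```

In this toy setup step 1 alone cannot select the best candidate (its dynamic feature is invisible), while step 2, with the static weights frozen, can.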
8. Results

Baseline system results

System     IT domain          Medical domain
           English-Estonian   English-Latvian   English-Latvian
Baseline   26.80 ±0.17        26.42 ±0.23       76.78 ±0.17
OL-        -                  19.91 ±0.20       70.27 ±0.20
OL+        -                  32.42 ±0.30       69.53 ±0.22

Two-step tuning results

System     IT domain          Medical domain
           English-Estonian   English-Latvian   English-Latvian
Baseline   26.80 ±0.17        26.42 ±0.23       76.78 ±0.17
OL-        26.80 ±0.17        26.42 ±0.23       76.78 ±0.17
OL+        31.45 ±0.20        38.59 ±0.31       76.23 ±0.19
9. Different text repetitiveness levels in tuning data

                                     Medical domain                       IT domain
Experiment                           RR1    RR     BLEU                   RR1    RR     BLEU
Evaluation data                      0.13   0.11   -                      0.31   0.2    -
Tuning data - 100% repetitiveness    0.51   0.85   75.98 (75.61-76.33)    0.51   0.85   36.01 (35.42-36.63)
Tuning data - 25% repetitiveness     0.27   0.31   76.17 (75.81-76.54)    0.26   0.3    38.59 (38.01-39.21)
Tuning data - 12.5% repetitiveness   0.19   0.21   76.18 (75.83-76.54)    0.19   0.2    38.38 (37.77-38.95)
Tuning data - 6.25% repetitiveness   0.14   0.16   76.23 (75.88-76.60)    0.14   0.15   38.38 (37.79-39.02)
12. Conclusions
• The baseline implementation did not improve SMT system quality due to
sub-optimal tuning performance
• We devised a two-step tuning method to address this issue
• The improved OL method increased IT domain SMT quality by +4.65 (En-Et)
up to +12.17 (En-Lv) BLEU points
• Although the method did not improve the high-quality (> 75 BLEU) medical
domain baseline, the loss was minimal (0.55 BLEU)
1. The SMT system receives a translation request to translate a sentence.
2. The sentence is translated by the SMT system and the translation is sent to a CAT tool.
3. The translation is post-edited by a translator in the CAT tool.
4. The post-edited translation together with the source sentence is sent back to the SMT system to perform online learning.
5. The SMT system performs word alignment between the source sentence and the post-edited sentence using fast-align. For this, we use the fast-align model acquired during training of the SMT system.
6. The SMT system extracts parallel phrases consisting of 1-7 tokens using the Moses phrase extraction method [16] that is implemented in the Moses toolkit.
7. The extracted phrases are added to the dynamic translation and language models so that, when translating the next sentence, the system would benefit from the newly learned phrases. Phrases that are added to the dynamic models are weighted according to their age (newer phrases have a higher weight) using the hyperbola-based penalty function [3]. A maximum of 10,000 phrases is kept in the dynamic models.
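Steps 5-7 can be sketched as a small dynamic phrase store. This is an illustrative design, not the authors' Moses implementation: the hyperbola-based age penalty is assumed here to be 1/age (newest phrase gets weight 1.0), and capacity is capped at 10,000 phrases as in step 7.

```python
from collections import deque

class DynamicPhraseTable:
    """Sketch of an online-updated phrase store: phrases extracted from
    post-edits are appended, weighted by recency, and capped in number."""

    def __init__(self, max_phrases=10_000):
        # deque with maxlen silently evicts the oldest entry when full
        self.phrases = deque(maxlen=max_phrases)

    def add(self, source_phrase, target_phrase):
        """Add a phrase pair extracted from a post-edited translation."""
        self.phrases.append((source_phrase, target_phrase))

    def weight(self, index):
        # Hyperbola-shaped age penalty (assumed form): the newest phrase
        # has age 1 and weight 1.0; older phrases decay as 1 / age.
        age = len(self.phrases) - index
        return 1.0 / age

    def lookup(self, source_phrase):
        """Return candidate targets with age-based weights, newest first."""
        hits = [(tgt, self.weight(i))
                for i, (src, tgt) in enumerate(self.phrases)
                if src == source_phrase]
        return sorted(hits, key=lambda h: -h[1])
```

After each post-edit, the SMT system would call `add` for every extracted phrase pair, and the decoder would consult `lookup` when translating the next sentence.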
1. The baseline scenario uses a standard SMT system with no dynamic models.
2. The OL- scenario uses an SMT system with dynamic models, but post-edited translations are not sent back to the SMT system for online learning, so the dynamic models always stay empty. The goal of this scenario is to validate whether SMT systems with dynamic models can reach baseline translation quality in situations where a CAT tool cannot, or does not, provide functionality for returning post-edited translations to the SMT system.
3. The OL+ scenario uses an SMT system with dynamic models and after translation of each sentence, the post-edited translation is sent back to the SMT system for online learning.