An Empirical Study of Segment Prioritization for Incrementally Retrained Post-Editing-Based SMT
Jinhua Du, Ankit Srivastava, Andy Way
ADAPT Centre,
Dublin City University, Ireland
Alfredo Maldonado-Guerra, David Lewis
ADAPT Centre,
Trinity College Dublin, Ireland
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
MT Summit XV, 2015-11-02, Miami, FL
www.adaptcentre.ie
E-Mail: jdu@computing.dcu.ie
Motivation - Question
Post-editing-based SMT has been widely deployed in the translator's workflow.
The productivity of translators/post-editors can be increased by the use of SMT systems.
However, current SMT systems cannot consistently generate high-quality translations, so human effort is still required.
Our question: how can we significantly reduce PE time?
Two objective ways:
Inbound: improving SMT system performance
Outbound: post-editing settings, circumstances, etc.
Segment Prioritization plays an important role in reducing the
overall PE-time.
Motivation - Related work
Incrementally retrained PE-SMT is essentially active learning (AL)-based SMT, comprising two key parts:
Active learning framework
Incremental retraining methods for model update
In 2009, Haffari et al. first proposed a practical active learning framework for low-resource SMT in which high-quality parallel data are acquired from large-scale monolingual data.
In 2012, Gonzalez-Rubio et al. applied AL to interactive MT to select informative sentences, aiming to reduce human effort rather than PE time.
The most relevant previous work: Dara et al. (2014) propose a Cross Entropy Difference (CED) criterion to prioritize input segments in an AL framework for PE-based incremental MT update applications.
Motivation - Related work
Ortiz-Martinez et al. (2010) incrementally update the feature values
of the phrase table based on the pre-stored statistics related to the
feature scores.
Hardt and Elming (2010) propose a sentence-level retraining
scheme in which a local phrase table is created and incrementally
updated as a file is translated and post-edited.
Denkowski et al. (2014) propose an online model adaptation for
PE-SMT in which three methods are used for incremental model
adaptation.
Germann (2014) proposes a dynamic phrase table strategy for an
interactive PE-SMT that computes phrase table entries on demand
by sampling a suffix array-indexed bitext.
Incrementally Retrained PE-SMT
Significant Features:
Architecture:
A combination of Moses and CDEC
A combination of online and offline retraining
Speed: about 20 minutes per iteration to update the models at the scale of 4 million sentence pairs, whereas the Moses incremental retraining scheme takes more than 2 hours on the same-scale data.
Performance: no significant difference between the incrementally retrained system and the offline retrained system.
Segment Prioritization Algorithms
Mathematical Formalization
To prioritize the input segments, an importance score or uncertainty score for the sentence s must be calculated under a scoring metric as:
φ(s) := F(s, U, L)
where:
L := {(f_i, e_i)}: the initial parallel training corpus
U := {f_j}: a monolingual corpus (the translation job)
F: the scoring metric, i.e. the segment prioritization algorithm
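The formalization above amounts to a simple rank-and-batch loop. A minimal sketch, where `prioritize` and `score_fn` are illustrative names (not from the authors' implementation) and `score_fn` stands in for any concrete metric F:

```python
# Sketch of the generic prioritization step: each source segment in U is
# scored by a metric F(s, U, L), then segments are ranked so the
# highest-scoring (most informative) ones are post-edited first.

def prioritize(segments, score_fn, batch_size=500):
    """Rank segments by descending score and split them into batches."""
    ranked = sorted(segments, key=score_fn, reverse=True)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

# Toy metric: longer sentences get higher scores.
batches = prioritize(["a b c", "a", "a b"], lambda s: len(s.split()), batch_size=2)
print(batches)  # [['a b c', 'a b'], ['a']]
```

Any of the concrete metrics below (Confidence, Geom n-gram, PPL, CED) can be plugged in as `score_fn`.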
Segment Prioritization Algorithms
Confidence of Translations (Confidence)
Definition: based on the probability p(ê|f) of the translation output ê as calculated by the decoder:
φ(s) := |log p(ê|f)|
Geometric n-gram (Geom n-gram)
The Geom n-gram algorithm is defined as:
φ(s) := (1/N) Σ_{n=1..N} (1/|X_s^n|) Σ_{x ∈ X_s^n} log [ P(x | U, n) / P(x | L, n) ]
where X_s^n is the set of n-grams of order n in sentence s, and P(x | U, n) and P(x | L, n) are the n-gram probabilities estimated on U and L, respectively.
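As a concrete illustration, the Geom n-gram score can be sketched with toy maximum-likelihood n-gram estimates; a real implementation would use smoothed language models, and the `floor` constant for unseen n-grams is an assumption of this sketch:

```python
from collections import Counter
from math import log

def ngrams(tokens, n):
    """All n-grams of order n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def geom_ngram_score(sentence, U, L, N=3, floor=1e-9):
    """phi(s): mean log-ratio of n-gram probabilities under U versus L,
    averaged over n-gram orders 1..N (toy MLE estimates)."""
    tokens = sentence.split()
    total, orders = 0.0, 0
    for n in range(1, N + 1):
        grams = ngrams(tokens, n)
        if not grams:
            continue
        cu = Counter(g for s in U for g in ngrams(s.split(), n))
        cl = Counter(g for s in L for g in ngrams(s.split(), n))
        zu, zl = max(sum(cu.values()), 1), max(sum(cl.values()), 1)
        total += sum(log(cu[g] / zu or floor) - log(cl[g] / zl or floor)
                     for g in grams) / len(grams)
        orders += 1
    return total / max(orders, 1)
```

A segment whose n-grams are frequent in the untranslated data U but unseen in the training data L gets a high score and is ranked early.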
Segment Prioritization Algorithms
Perplexity of Sentences (PPL)
The PPL algorithm is defined as:
φ(s) := 10^(-log p(s) / (|s| - N_OOVs))
where p(s) is the language model probability of s, |s| its length, and N_OOVs the number of OOV words in s.
Cross Entropy Difference (CED)
The CED algorithm is defined as:
φ(s) := H_U(s) - H_L(s)
where H_U(s) and H_L(s) are the cross-entropies of s under language models trained on U and L, respectively.
Ranking: the higher the score, the higher the rank.
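A minimal sketch of the CED score, with add-alpha-smoothed unigram models standing in for the real language models trained on U and L (a simplifying assumption of this example):

```python
from collections import Counter
from math import log2

def unigram_lm(corpus):
    """Unigram counts and total token count from a list of sentences."""
    counts = Counter(t for s in corpus for t in s.split())
    return counts, sum(counts.values())

def cross_entropy(sentence, counts, total, vocab, alpha=1.0):
    """Per-word cross-entropy under an add-alpha smoothed unigram LM."""
    tokens = sentence.split()
    return -sum(log2((counts[t] + alpha) / (total + alpha * vocab))
                for t in tokens) / len(tokens)

def ced_score(sentence, U, L):
    """phi(s) := H_U(s) - H_L(s), toy unigram LMs as stand-ins."""
    cu, tu = unigram_lm(U)
    cl, tl = unigram_lm(L)
    vocab = len(set(cu) | set(cl)) + 1  # +1 slot for unseen words
    return (cross_entropy(sentence, cu, tu, vocab)
            - cross_entropy(sentence, cl, tl, vocab))
```

A sentence that is cheap under the L model but expensive under the U model gets a high CED score.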
Experiments and Analysis
Settings for the incrementally retrained SMT system:
Europarl setting:
•Corpora: Europarl
•Language pairs: EN-DE, EN-ES, ES-EN, DE-EN
•Training set: 50K
•Incremental set: 10K
•Devset: Newswire 2012 test set
DGT setting:
•Corpora: DGT
•Language pairs: EN-DE, EN-ES, EN-FR, EN-PL
•Training set: 50K
•Incremental set: 10K
•Devset: 2K (extracted)
Evaluation metric: TER, used to simulate PE time;
Batch mode: 500 sentences per batch;
Simulated workflow: we use the references of the translated segments instead of post-edited MT output per se.
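The simulated batch-mode workflow can be sketched as a loop. Here `retrain` stands in for the incremental model update (not shown), and re-scoring the remaining segments at every iteration is an assumption of this sketch:

```python
def simulate(segments, references, score_fn, retrain, batch_size=500):
    """Batch-mode simulation: rank the remaining segments, take the top
    batch, substitute the reference for post-edited MT output (as in the
    simulated workflow), and fold the new pairs into the models."""
    remaining = list(segments)
    processed = []
    while remaining:
        remaining.sort(key=score_fn, reverse=True)   # prioritize
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        new_pairs = [(src, references[src]) for src in batch]
        retrain(new_pairs)          # incremental model update (stub)
        processed.extend(batch)
    return processed

# Usage with a stub retrainer that just records what it was given.
seen = []
order = simulate(["bb cc", "aa", "dd ee ff"],
                 {"bb cc": "BB CC", "aa": "AA", "dd ee ff": "DD EE FF"},
                 score_fn=lambda s: len(s.split()), retrain=seen.append,
                 batch_size=2)
print(order)  # ['dd ee ff', 'bb cc', 'aa']
```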
Experiments and Analysis
Iterative curves (Europarl, TER score):
We can see that there is no obvious decrease (i.e. improvement) for the baseline;
the other four prioritization criteria show a decreasing trend in TER score;
these decreasing trends show the effectiveness of these methods for prioritizing the input segments.
Experiments and Analysis
Sentence Length Distribution
The sentence length distribution of the “Confidence” criterion fluctuates strongly per iteration;
the fluctuations of the “Confidence” length distribution consistently correspond to the TER score trend;
the length distributions of the “Geom n-gram”, “PPL” and “CED” methods are smoother, but start with shorter sentences and end with longer ones.
Experiments and Analysis
It is inferred that the prioritization score φ(s) of “Confidence” may correlate with the sentence length of the input segment.
Pearson correlation between sentence length and sentence score (EN-DE as an example):

EN-DE    | Geom n-gram | PPL   | CED  | Confidence
DGT      | 0.21        | 0.009 | 0.02 | 0.29
Europarl | 0.14        | 0.06  | 0.10 | 0.48

Results show that the longer the sentence, the more difficult it is to translate;
Question: should we normalize the score by sentence length in the segment prioritization task?
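The correlation itself is plain Pearson's r between segment lengths and prioritization scores; a self-contained sketch (with toy numbers, not the paper's data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: sentence lengths vs. prioritization scores.
lengths = [3, 7, 12, 20]
scores = [0.2, 0.5, 0.9, 1.4]
print(round(pearson(lengths, scores), 3))  # 0.998
```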
Experiments and Analysis
OOVs: correlation with the sentence score
In Moses, when an OOV occurs, the probability p(e|f) decreases significantly, i.e. the confidence of the translation becomes lower.
Pearson correlation between the number of OOVs and the sentence score (EN-DE as an example):

  | DGT    | Europarl
ρ | 0.9993 | 0.9997

As expected, the more OOVs a sentence contains, the lower its confidence, and the greater the possibility that it is ranked at the top.
Experiments and Analysis
Confidence: Pros and Cons
Confidence performed best among all methods. HOWEVER:
At each iteration, all the incremental segments need to be translated beforehand, which may be a problem for sentence-level incremental PE-SMT.
Sentences in the first several iterations are much longer, which may significantly increase the amount of post-editing and, in turn, adversely affect how translators/post-editors perceive the utility of this approach.
Implication from the analysis: develop a method that considers the OOVs and the sentence length distribution of the training and incremental data, rather than translating the segments beforehand.
Conclusions and Future Work
Conclusions
1. We conducted an empirical study of four segment prioritization algorithms, namely the Geom n-gram, PPL, CED and Confidence methods, compared against a Sequential baseline, for incrementally retrained PE-SMT.
2. The “Confidence” method achieved the best results in all tasks, reducing the TER score by 1.72-4.55 absolute points.
3. An investigation was carried out to examine the crucial factors that make “Confidence” effective.
Conclusions and Future Work
Future Work
1. The context problem: the sorted input segments lose the sequential context that is helpful to translators. We plan to consider this context when ranking the segments.
2. Carrying out actual PE experiments using our different segment prioritization algorithms.
This research is supported by Science Foundation
Ireland through the ADAPT Centre (Grant
13/RC/2106) (www.adaptcentre.ie) at Dublin City
University and Trinity College Dublin, and by Grant
610879 for the Falcon project funded by the
European Commission.
Editor's Notes
Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality
translation has become an increasingly common application of SMT, which henceforth we refer
to as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally
retrained system that can learn knowledge from human post-editing outputs as early as possible
to augment the SMT models to reduce PE time.
In a nutshell, the motivation of this paper is: under a batch-mode incremental retraining SMT framework, with the goal of reducing overall post-editing time, we apply different source-side segment prioritization methods, perform an empirical study, and analyze their advantages and disadvantages.