An Empirical Study of Segment Prioritization for Incrementally Retrained Post-Editing-Based SMT
Jinhua Du, Ankit Srivastava, Andy Way
ADAPT Centre,
Dublin City University, Ireland
Alfredo Maldonado-Guerra, David Lewis
ADAPT Centre,
Trinity College Dublin, Ireland
The ADAPT Centre is funded under the SFI Research Centres Programme (Grant 13/RC/2106) and is co-funded under the European Regional Development Fund.
MT Summit XV, 2015-11-02, Miami, FL
www.adaptcentre.ie
E-Mail: jdu@computing.dcu.ie
Motivation - Question
Post-editing-based SMT has been widely deployed in the translator's workflow.
The productivity of translators/post-editors can be increased by the use of SMT systems.
However, current SMT systems cannot consistently generate high-quality translations, so human effort is still required.
Our question: how can we significantly reduce PE time?
Two objective ways:
Inbound: improving SMT system performance
Outbound: post-editing settings, circumstances, etc.
Segment Prioritization plays an important role in reducing the
overall PE-time.
Motivation - Related work
Incrementally retrained PE-SMT is essentially active learning (AL)-based SMT, comprising two key parts:
Active learning framework
Incremental retraining methods for model update
In 2009, Haffari et al. first proposed a practical active learning framework for low-resource SMT in which high-quality parallel data are acquired from large-scale monolingual data.
In 2012, Gonzalez-Rubio et al. applied AL to interactive MT to select informative sentences, aiming to reduce human effort rather than PE time.
The most relevant previous work: Dara et al. (2014) propose a Cross Entropy Difference (CED) criterion to prioritize input segments in an AL framework for PE-based incremental MT update applications.
Motivation - Related work
Ortiz-Martinez et al. (2010) incrementally update the feature values
of the phrase table based on the pre-stored statistics related to the
feature scores.
Hardt and Elming (2010) propose a sentence-level retraining
scheme in which a local phrase table is created and incrementally
updated as a file is translated and post-edited.
Denkowski et al. (2014) propose an online model adaptation for
PE-SMT in which three methods are used for incremental model
adaptation.
Germann (2014) proposes a dynamic phrase table strategy for an
interactive PE-SMT that computes phrase table entries on demand
by sampling a suffix array-indexed bitext.
Incrementally Retrained PE-SMT
Significant Features:
Architecture:
A combination of Moses and CDEC
A combination of online and offline retraining
Speed: about 20 minutes per iteration to update the models at the scale of 4 million sentence pairs, whereas the Moses incremental retraining scheme takes more than 2 hours on the same-scale data.
Performance: no significant difference between the incrementally retrained system and the offline retrained system.
Segment Prioritization Algorithms
Mathematical Formalization
To prioritize the input segments, an importance score or uncertainty score for the sentence s must be calculated under a scoring metric as:
φ(s) := F(s, U, L)
where:
L := {(f_i, e_i)}: the initial parallel training corpus
U := {f_j}: a monolingual corpus (the translation job)
F: the scoring metric, i.e. the segment prioritization algorithm
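The formalization above amounts to a simple rank-and-batch loop. A minimal sketch, where `prioritize` and `score_fn` are illustrative names (not from the authors' implementation) and `score_fn` stands in for any concrete metric F:

```python
# Sketch of the generic prioritization step: each source segment in U is
# scored by a metric F(s, U, L), then segments are ranked so the
# highest-scoring (most informative) ones are post-edited first.

def prioritize(segments, score_fn, batch_size=500):
    """Rank segments by descending score and split them into batches."""
    ranked = sorted(segments, key=score_fn, reverse=True)
    return [ranked[i:i + batch_size] for i in range(0, len(ranked), batch_size)]

# Toy metric: longer sentences get higher scores.
batches = prioritize(["a b c", "a", "a b"], lambda s: len(s.split()), batch_size=2)
print(batches)  # [['a b c', 'a b'], ['a']]
```

Any of the concrete metrics below (Confidence, Geom n-gram, PPL, CED) can be plugged in as `score_fn`.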
Segment Prioritization Algorithms
Confidence of Translations (Confidence)
Definition: based on the probability p(ê|f) of the translation output ê as calculated by the decoder:
φ(s) := |log p(ê|f)|
Geometric n-gram (Geom n-gram)
The Geom n-gram algorithm is defined as:
φ(s) := (1/N) Σ_{n=1..N} (1/|X_s^n|) Σ_{x ∈ X_s^n} log [ P(x | U, n) / P(x | L, n) ]
where X_s^n is the set of n-grams of order n in sentence s, and P(x | U, n) and P(x | L, n) are the n-gram probabilities estimated on U and L, respectively.
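As a concrete illustration, the Geom n-gram score can be sketched with toy maximum-likelihood n-gram estimates; a real implementation would use smoothed language models, and the `floor` constant for unseen n-grams is an assumption of this sketch:

```python
from collections import Counter
from math import log

def ngrams(tokens, n):
    """All n-grams of order n in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def geom_ngram_score(sentence, U, L, N=3, floor=1e-9):
    """phi(s): mean log-ratio of n-gram probabilities under U versus L,
    averaged over n-gram orders 1..N (toy MLE estimates)."""
    tokens = sentence.split()
    total, orders = 0.0, 0
    for n in range(1, N + 1):
        grams = ngrams(tokens, n)
        if not grams:
            continue
        cu = Counter(g for s in U for g in ngrams(s.split(), n))
        cl = Counter(g for s in L for g in ngrams(s.split(), n))
        zu, zl = max(sum(cu.values()), 1), max(sum(cl.values()), 1)
        total += sum(log(cu[g] / zu or floor) - log(cl[g] / zl or floor)
                     for g in grams) / len(grams)
        orders += 1
    return total / max(orders, 1)
```

A segment whose n-grams are frequent in the untranslated data U but unseen in the training data L gets a high score and is ranked early.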
Segment Prioritization Algorithms
Perplexity of Sentences (PPL)
The PPL algorithm is defined as:
φ(s) := 10^(-log p(s) / (|s| - N_OOVs))
where p(s) is the language model probability of s, |s| its length, and N_OOVs the number of OOV words in s.
Cross Entropy Difference (CED)
The CED algorithm is defined as:
φ(s) := H_U(s) - H_L(s)
where H_U(s) and H_L(s) are the cross-entropies of s under language models trained on U and L, respectively.
Ranking: the higher the score, the higher the rank.
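A minimal sketch of the CED score, with add-alpha-smoothed unigram models standing in for the real language models trained on U and L (a simplifying assumption of this example):

```python
from collections import Counter
from math import log2

def unigram_lm(corpus):
    """Unigram counts and total token count from a list of sentences."""
    counts = Counter(t for s in corpus for t in s.split())
    return counts, sum(counts.values())

def cross_entropy(sentence, counts, total, vocab, alpha=1.0):
    """Per-word cross-entropy under an add-alpha smoothed unigram LM."""
    tokens = sentence.split()
    return -sum(log2((counts[t] + alpha) / (total + alpha * vocab))
                for t in tokens) / len(tokens)

def ced_score(sentence, U, L):
    """phi(s) := H_U(s) - H_L(s), toy unigram LMs as stand-ins."""
    cu, tu = unigram_lm(U)
    cl, tl = unigram_lm(L)
    vocab = len(set(cu) | set(cl)) + 1  # +1 slot for unseen words
    return (cross_entropy(sentence, cu, tu, vocab)
            - cross_entropy(sentence, cl, tl, vocab))
```

A sentence that is cheap under the L model but expensive under the U model gets a high CED score.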
Experiments and Analysis
Settings for the incrementally retrained SMT system:
Europarl setting:
•Corpora: Europarl
•Language pairs: EN-DE, EN-ES, ES-EN, DE-EN
•Training set: 50K
•Incremental set: 10K
•Devset: Newswire 2012 test set
DGT setting:
•Corpora: DGT
•Language pairs: EN-DE, EN-ES, EN-FR, EN-PL
•Training set: 50K
•Incremental set: 10K
•Devset: 2K (extracted)
Evaluation metric: TER, used to simulate PE time;
Batch mode: 500 sentences per batch;
Simulated workflow: we use the references of the translated segments instead of post-edited MT output per se.
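The simulated batch-mode workflow can be sketched as a loop. Here `retrain` stands in for the incremental model update (not shown), and re-scoring the remaining segments at every iteration is an assumption of this sketch:

```python
def simulate(segments, references, score_fn, retrain, batch_size=500):
    """Batch-mode simulation: rank the remaining segments, take the top
    batch, substitute the reference for post-edited MT output (as in the
    simulated workflow), and fold the new pairs into the models."""
    remaining = list(segments)
    processed = []
    while remaining:
        remaining.sort(key=score_fn, reverse=True)   # prioritize
        batch, remaining = remaining[:batch_size], remaining[batch_size:]
        new_pairs = [(src, references[src]) for src in batch]
        retrain(new_pairs)          # incremental model update (stub)
        processed.extend(batch)
    return processed

# Usage with a stub retrainer that just records what it was given.
seen = []
order = simulate(["bb cc", "aa", "dd ee ff"],
                 {"bb cc": "BB CC", "aa": "AA", "dd ee ff": "DD EE FF"},
                 score_fn=lambda s: len(s.split()), retrain=seen.append,
                 batch_size=2)
print(order)  # ['dd ee ff', 'bb cc', 'aa']
```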
Experiments and Analysis
Iterative curves (Europarl, TER score):
We can see that there is no obvious decrease (i.e. improvement) for the baseline;
the other four prioritization criteria show a decreasing trend in TER score;
these decreasing trends show the effectiveness of these methods for prioritizing the input segments.
Experiments and Analysis
Sentence Length Distribution
The sentence length distribution of the “Confidence” criterion fluctuates strongly per iteration;
the fluctuations of the “Confidence” length distribution consistently correspond to the TER score trend;
the length distributions of the “Geom n-gram”, “PPL” and “CED” methods are smoother, but start with shorter sentences and end with longer ones.
Experiments and Analysis
It is inferred that the prioritization score φ(s) of “Confidence” may correlate with the sentence length of the input segment.
Pearson correlation between sentence length and sentence score (EN-DE as an example):

EN-DE    | Geom n-gram | PPL   | CED  | Confidence
DGT      | 0.21        | 0.009 | 0.02 | 0.29
Europarl | 0.14        | 0.06  | 0.10 | 0.48

Results show that the longer the sentence, the more difficult it is to translate;
Question: should we normalize the score by sentence length in the segment prioritization task?
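The correlation itself is plain Pearson's r between segment lengths and prioritization scores; a self-contained sketch (with toy numbers, not the paper's data):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy example: sentence lengths vs. prioritization scores.
lengths = [3, 7, 12, 20]
scores = [0.2, 0.5, 0.9, 1.4]
print(round(pearson(lengths, scores), 3))  # 0.998
```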
Experiments and Analysis
OOVs: correlation with the sentence score
In Moses, when an OOV occurs, the probability p(e|f) decreases significantly, i.e. the confidence of the translation becomes lower.
Pearson correlation between the number of OOVs and the sentence score (EN-DE as an example):

  | DGT    | Europarl
ρ | 0.9993 | 0.9997

As expected, the more OOVs a sentence contains, the lower its confidence, and the greater the possibility that it is ranked at the top.
Experiments and Analysis
Confidence: Pros and Cons
Confidence performed best among all methods. HOWEVER:
At each iteration, all the incremental segments need to be translated beforehand, which may be a problem for sentence-level incremental PE-SMT.
Sentences in the first several iterations are much longer, which may significantly increase the amount of post-editing and, in turn, adversely affect how translators/post-editors perceive the utility of this approach.
Implication from the analysis: develop a method that considers the OOVs and the sentence length distribution of the training and incremental data, rather than translating the segments beforehand.
Conclusions and Future Work
Conclusions
1. We conducted an empirical study of four segment prioritization algorithms, namely the Geom n-gram, PPL, CED and Confidence methods, compared against a Sequential baseline, for incrementally retrained PE-SMT.
2. The “Confidence” method achieved the best results in all tasks, reducing the TER score by 1.72-4.55 absolute points.
3. An investigation was carried out to examine the crucial factors that make “Confidence” effective.
Conclusions and Future Work
Future Work
1. The context problem: the sorted input segments lose the sequential context that is helpful to translators. We plan to consider this context when ranking the segments.
2. Carrying out actual PE experiments using our different segment prioritization algorithms.
This research is supported by Science Foundation
Ireland through the ADAPT Centre (Grant
13/RC/2106) (www.adaptcentre.ie) at Dublin City
University and Trinity College Dublin, and by Grant
610879 for the Falcon project funded by the
European Commission.
Editor's Notes
Post-editing the output of a statistical machine translation (SMT) system to obtain high-quality
translation has become an increasingly common application of SMT, which henceforth we refer
to as post-editing-based SMT (PE-SMT). PE-SMT is often deployed as an incrementally
retrained system that can learn knowledge from human post-editing outputs as early as possible
to augment the SMT models to reduce PE time.
In a nutshell, the motivation of this paper is: under a batch-mode incremental retraining SMT framework, with the goal of reducing overall post-editing time, we apply different source-side segment prioritization methods, perform an empirical study, and analyze their advantages and disadvantages.