1. Translation Memory Retrieval Methods
[Bloodgood and Strauss, 2014], in Proc. of the 14th EACL
Koichi Akabe and Philip Arthur
NAIST MT Study
2014-07-03
2014-07-03 Koichi Akabe and Philip Arthur (MT Study) 1 / 27
7. Translation Memory (TM)
▶ Most widely used computer-assisted translation (CAT) tool
▶ Suggest translations using other translations
En The dog opened the door.
Ja 犬がドアを開けた。
En I saw a girl with a telescope.
Ja 僕は望遠鏡で少女を見た。
En John opened the door.
Ja 犬がドアを開けた。 (fuzzy)
1. Find the nearest source sentence
2. Suggest a translation
9. Translation Memory (TM)
▶ Most widely used computer-assisted translation (CAT) tool
▶ Suggest translations using other translations
En The dog opened the door.
Ja 犬がドアを開けた。
En I saw a girl with a telescope.
Ja 僕は望遠鏡で少女を見た。
En John opened the door.
Ja ジョンがドアを開けた。
1. Find the nearest source sentence
2. Suggest a translation
3. Post-editing
13. How to find the nearest source sentence?
TM finds the nearest source sentence using a similarity metric
▶ Edit distance (Levenshtein distance)
→ Widely used metric
▶ MT evaluation metrics [Simard and Fujita, 2012]
→ WER, BLEU, NIST, VMeteor, Meteor as TM metrics
▶ This paper
16. Threshold of helpfulness
The matching algorithm always returns the nearest sentence
However, low-scoring suggestions should not be shown
TM software sets the threshold at 70% in practice → Why?
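The retrieval step with a helpfulness threshold can be sketched as follows. This is a minimal illustration, not the method of any particular TM tool: the function names `percent_match` and `suggest`, the list-of-pairs memory layout, and the use of a simple unigram overlap as the similarity metric are all assumptions made for the example.

```python
from collections import Counter

def percent_match(m_tokens, c_tokens):
    """Share of M's unigrams also found in C (clipped multiset overlap)."""
    overlap = Counter(m_tokens) & Counter(c_tokens)
    return sum(overlap.values()) / len(m_tokens) if m_tokens else 0.0

def suggest(workload, memory, metric=percent_match, threshold=0.70):
    """Return (target, score) for the nearest source sentence, or None.

    `memory` is a list of (source, target) pairs (hypothetical layout).
    Suggestions scoring below `threshold` are suppressed.
    """
    m_tokens = workload.split()
    scored = [(metric(m_tokens, src.split()), tgt) for src, tgt in memory]
    if not scored:
        return None
    score, target = max(scored, key=lambda pair: pair[0])
    return (target, score) if score >= threshold else None

memory = [
    ("The dog opened the door .", "犬がドアを開けた。"),
    ("I saw a girl with a telescope .", "僕は望遠鏡で少女を見た。"),
]
print(suggest("John opened the door .", memory))
print(suggest("John opened the door .", memory, threshold=0.90))
```

With the default 70% threshold the dog sentence is suggested as a fuzzy match; raising the threshold to 90% suppresses it and nothing is shown.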
19. Definitions
TM Similarity Metrics compare M and C.
M: workload sentence
C: source language side of a candidate pre-existing translation
En The dog opened the door .
Ja 犬がドアを開けた。
En I saw a girl with a telescope .
Ja 僕は望遠鏡で少女を見た。
En John opened the door .
Ja 犬がドアを開けた。 (fuzzy)
M = John opened the door .
C1 = The dog opened the door .
C2 = I saw a girl with a telescope .
...
20. Translation Memory Similarity Metrics
Compare the following metrics:
▶ Percent Match
▶ Weighted Percent Match
▶ Edit Distance
▶ N-gram Precision
▶ Weighted N-gram Precision
▶ Modified Weighted N-gram Precision
24. Percent Match (PM)
The simplest metric
$$\mathrm{PM}(M, C) = \frac{|M_{\text{unigrams}} \cap C_{\text{unigrams}}|}{|M_{\text{unigrams}}|}$$
e.g.
M = John opened the door .
C = The dog opened the door .
$$\mathrm{PM}(M, C) = \frac{4}{5} = 0.80$$
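A minimal Python sketch of PM on whitespace-tokenized sentences. Treating the unigrams as multisets with clipped counts (rather than plain sets) is an assumption; the slide does not say which is intended, and both readings give the same value for this example.

```python
from collections import Counter

def percent_match(m_tokens, c_tokens):
    """PM(M, C): fraction of M's unigrams that also occur in C."""
    if not m_tokens:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared token.
    overlap = Counter(m_tokens) & Counter(c_tokens)
    return sum(overlap.values()) / len(m_tokens)

M = "John opened the door .".split()
C = "The dog opened the door .".split()
print(percent_match(M, C))  # 0.8
```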
27. Weighted Percent Match (WPM)
We want to know the translations of rare words
PM with IDF weighting
$$\mathrm{WPM}(M, C) = \frac{\sum_{u \in M_{\text{unigrams}} \cap C_{\text{unigrams}}} \mathrm{idf}(u, D)}{\sum_{u \in M_{\text{unigrams}}} \mathrm{idf}(u, D)}$$
where D is the set of all source sentences in the parallel corpus
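A sketch of WPM in Python. The slides do not define idf(u, D), so the smoothed form log((|D| + 1) / (df(u) + 1)) used here is an assumption chosen so that words unseen in D still get a finite weight; `make_idf` and `weighted_percent_match` are hypothetical helper names.

```python
import math
from collections import Counter

def make_idf(corpus):
    """Build idf(u, D) from tokenized source sentences D (smoothed; an assumption)."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for sentence in corpus:
        doc_freq.update(set(sentence))  # count each word once per sentence

    def idf(word):
        return math.log((n_docs + 1) / (doc_freq[word] + 1))

    return idf

def weighted_percent_match(m_tokens, c_tokens, idf):
    """WPM(M, C): IDF mass of M's unigrams covered by C, over M's total IDF mass."""
    overlap = Counter(m_tokens) & Counter(c_tokens)
    matched = sum(idf(w) * count for w, count in overlap.items())
    total = sum(idf(w) for w in m_tokens)
    return matched / total if total > 0 else 0.0

D = [
    "The dog opened the door .".split(),
    "I saw a girl with a telescope .".split(),
]
idf = make_idf(D)
M = "John opened the door .".split()
C = "The dog opened the door .".split()
print(weighted_percent_match(M, C, idf))
```

On this toy corpus the unmatched word "John" is rare and therefore heavily weighted, so WPM comes out lower than the unweighted 0.80 from PM.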
30. Problem of PM and WPM
PM and WPM only consider word coverage
→ They cannot see any context
The following slides show metrics that take context into account
36. Edit Distance (ED)
Widely used metric
$$\mathrm{ED}(M, C) = \max\left(1 - \frac{\text{edit-dist}(M, C)}{|M_{\text{unigrams}}|},\ 0\right)$$
where edit-dist(M, C) is the number of word insertions, deletions,
and substitutions required to transform M into C
e.g.
M = John opened the door .
C = The dog opened the door .
substitution: 1
insertion: 1
$$\mathrm{ED}(M, C) = 1 - \frac{2}{5} = 0.60$$
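The ED score can be sketched in Python with a standard word-level Levenshtein dynamic program. The function names are hypothetical, and whitespace tokenization is assumed.

```python
def word_edit_distance(m_tokens, c_tokens):
    """Word-level Levenshtein distance: edits needed to turn M into C."""
    prev = list(range(len(c_tokens) + 1))  # distance from empty M prefix
    for i, m_tok in enumerate(m_tokens, start=1):
        curr = [i]  # deleting the first i words of M gives an empty C prefix
        for j, c_tok in enumerate(c_tokens, start=1):
            cost = 0 if m_tok == c_tok else 1
            curr.append(min(prev[j] + 1,          # delete m_tok
                            curr[j - 1] + 1,      # insert c_tok
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]

def edit_distance_score(m_tokens, c_tokens):
    """ED(M, C) = max(1 - edit-dist(M, C) / |M|, 0)."""
    if not m_tokens:
        return 0.0
    return max(1 - word_edit_distance(m_tokens, c_tokens) / len(m_tokens), 0.0)

M = "John opened the door .".split()
C = "The dog opened the door .".split()
print(word_edit_distance(M, C))   # 2 (one substitution, one insertion)
print(edit_distance_score(M, C))
```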
40. N-gram Precision (NGP)
Mean of N-gram precisions (like the BLEU metric)
However, BLEU → 0 when the precision of longer N-grams is 0
This work uses the arithmetic mean instead of the geometric mean
$$\mathrm{NGP} = \frac{1}{N} \sum_{n=1}^{N} p_n$$
$$p_n = \frac{|M_{n\text{-grams}} \cap C_{n\text{-grams}}|}{Z \cdot |M_{n\text{-grams}}| + (1 - Z) \cdot |C_{n\text{-grams}}|}$$
where Z is a parameter that controls normalization,
and N is the maximum N-gram length
N = 4 and Z = 0.75 in the main experiments (discussed later)
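A Python sketch of NGP with the defaults N = 4 and Z = 0.75. Counting n-gram overlap with clipped multiset counts (as BLEU does) is an assumption, and the helper names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """Multiset of n-grams in a token list (empty if the list is too short)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def ngram_precision(m_tokens, c_tokens, max_n=4, z=0.75):
    """NGP: arithmetic mean of p_1..p_N with Z-controlled normalization."""
    total = 0.0
    for n in range(1, max_n + 1):
        m_ngrams, c_ngrams = ngrams(m_tokens, n), ngrams(c_tokens, n)
        overlap = sum((m_ngrams & c_ngrams).values())
        denom = z * sum(m_ngrams.values()) + (1 - z) * sum(c_ngrams.values())
        total += overlap / denom if denom > 0 else 0.0
    return total / max_n

M = "John opened the door .".split()
C = "The dog opened the door .".split()
print(round(ngram_precision(M, C), 4))
```

Because the mean is arithmetic, a candidate with no matching 4-grams still receives credit for its shorter matches instead of collapsing to 0.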
50. Experiment
Two technical domains with two language pairs
(Fr-En, Zn-En).
▶ Zn-En: OpenOffice3
▶ Fr-En: EMEA
Preprocessing is applied to the source side of both corpora to produce valid
segments.
Sentences are randomly sampled from each corpus as M and C.
▶ Zn-En: 400 M and 10,000 C.
▶ Fr-En: 300 M and 10,000 C.
55. Evaluation
Evaluation is done by human raters on Amazon
Mechanical Turk.
Scores range from 1 (Not Helpful) to 5 (Extremely
Helpful).
Each segment M is rated by 5 Turkers, and we track which
metric performs best (ties are allowed).
The scores for each M are averaged into a Mean Opinion Score
(MOS).
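The averaging and tie handling can be sketched as below. The ratings are invented for illustration only, and deciding the "best" metric by comparing the MOS of each metric's suggestion is an assumption about the bookkeeping, not a detail confirmed by the slides.

```python
def mean_opinion_score(ratings):
    """Average of the 1-to-5 helpfulness ratings given by the Turkers."""
    return sum(ratings) / len(ratings)

def best_metrics(ratings_by_metric):
    """Return per-metric MOS and every metric tied for the highest MOS."""
    mos = {name: mean_opinion_score(r) for name, r in ratings_by_metric.items()}
    top = max(mos.values())
    return mos, [name for name, score in mos.items() if score == top]

# Hypothetical ratings for the suggestions retrieved for one segment M.
ratings_for_one_m = {
    "PM": [2, 3, 2, 3, 2],
    "ED": [4, 4, 3, 5, 4],
    "MWNGP": [4, 4, 3, 5, 4],
}
mos, winners = best_metrics(ratings_for_one_m)
print(mos)
print(winners)  # ED and MWNGP tie, so both count as "found best"
```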
60. Result: Which metric performs best?
Table: OO3 Zn-En
Metric   Found Best   Total C
PM       178          400
WPM      200          400
ED       193          400
NGP      251          400
WNGP     271          400
MWNGP    282          400
Table: EMEA Fr-En
Metric   Found Best   Total C
PM       166          300
WPM      184          300
ED       148          300
NGP      188          300
WNGP     198          300
MWNGP    201          300
Modified Weighted N-gram Precision (MWNGP) achieved the
best result of all the metrics.
There is only a slight difference between WNGP and MWNGP.
61. Scatterplot: OO3 Percent Match
[Scatterplot: MetricValue (y-axis, 0.0 to 1.0) against MOS (x-axis, 1.0 to 5.0)]
62. Scatterplot: OO3 Edit Distance
[Scatterplot: MetricValue (y-axis, 0.0 to 1.0) against MOS (x-axis, 1.0 to 5.0)]
63. Scatterplot: OO3 Modified N-Gram Precision
[Scatterplot: MetricValue (y-axis, 0.0 to 1.0) against MOS (x-axis, 1.0 to 5.0)]
68. The effect of Z: Adjusting for length preferences
Many of the metrics take Z as a parameter.
The Z parameter can be used to control length preference.
Table: EMEA Fr-En
Z Value   Avg Length
0.00      9.9298
0.25      13.204
0.50      16.0134
0.75      19.6355
1.00      27.8829
Table: OO3 Zn-En
Z Value   Avg Length
0.00      7.2475
0.25      9.5600
0.50      11.1250
0.75      14.1825
1.00      25.0875
Smaller Z prefers shorter matches that are more precise,
increasing precision.
Larger Z prefers longer matches that contain more correct
translations, increasing recall.
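The length effect of Z can be seen on the unigram term alone. The sketch below uses two invented candidates, one short and one long, and a hypothetical helper `unigram_precision` that applies the normalization Z·|M| + (1 − Z)·|C| with clipped counts (an assumption).

```python
from collections import Counter

def unigram_precision(m_tokens, c_tokens, z):
    """p_1 with Z-controlled normalization: overlap / (Z*|M| + (1-Z)*|C|)."""
    overlap = sum((Counter(m_tokens) & Counter(c_tokens)).values())
    denom = z * len(m_tokens) + (1 - z) * len(c_tokens)
    return overlap / denom if denom > 0 else 0.0

M = "John opened the door .".split()
short_c = "the door".split()                            # fully covered by M
long_c = "The dog and John opened the door .".split()   # covers all of M

for z in (0.0, 0.5, 1.0):
    print(z, unigram_precision(M, short_c, z), unigram_precision(M, long_c, z))
```

At Z = 0 the score is normalized by the candidate length, so the short candidate scores 1.0 against 0.625 for the long one; at Z = 1 it is normalized by the length of M, so the long candidate scores 1.0 against 0.4.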
74. Conclusion
▶ This paper compares TM similarity metrics.
▶ The best method is Modified Weighted N-gram Precision (MWNGP).
▶ All the metrics discussed use only the source side in the
calculation.
▶ The Z parameter adjusts the length preference of the
retrieved TM matches.
75. Thank you for your attention!