ESR4 Rohit Gupta - EXPERT Summer School - Malaga 2015

Use of Paraphrasing to Improve Matching and Retrieval in Translation Memory
Rohit Gupta, University of Wolverhampton
Supervisors:
Dr Constantin Orasan, University of Wolverhampton
Prof Josef van Genabith, Saarland University and DFKI
Prof Ruslan Mitkov, University of Wolverhampton

Outline
S  Objective
S  Translation Memory
S  Incorporating Paraphrasing
S  Human Evaluation
S  Conclusion

Objective
S  Improving matching and retrieval in Translation Memory
with the help of advanced language technology. This is
achieved by:
S  using paraphrases
S  using semantic information

Limitations of current TMs
S  Surface form comparison
S  No or very limited linguistic information

S  Surface form comparison
S  No or very limited linguistic information
S  Paraphrased segments either not retrieved or ranked
incorrectly among the retrieved segments

S  Fuzzy scores are really fuzzy
S  Input_1: the period laid down in article 4(3)
S  Input_2: the responsible person defined in article 4(3)
S  TM: the duration set forth in article 4(3)
Both input sentences get a 57% fuzzy score under word-based edit distance, although only Input_1 paraphrases the TM segment
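
The 57% figure can be reproduced with a minimal Python sketch. It assumes whitespace tokenisation and the common normalisation 1 - distance / length of the longer segment, neither of which is stated on the slide; `word_edit_distance` and `fuzzy_score` are illustrative names.

```python
def word_edit_distance(a, b):
    """Word-level Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        cur = [i]
        for j, wb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete wa
                           cur[j - 1] + 1,              # insert wb
                           prev[j - 1] + (wa != wb)))   # substitute or match
        prev = cur
    return prev[-1]


def fuzzy_score(segment, tm_segment):
    """Assumed normalisation: 1 - distance / length of the longer segment."""
    a, b = segment.split(), tm_segment.split()
    return 1 - word_edit_distance(a, b) / max(len(a), len(b))


tm = "the duration set forth in article 4(3)"
for text in ("the period laid down in article 4(3)",
             "the responsible person defined in article 4(3)"):
    print(f"{fuzzy_score(text, tm):.0%}")  # 57% for both inputs
```

Both inputs differ from the TM segment in three of seven words, so a surface-form score cannot tell the paraphrase from the unrelated segment.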

Paraphrasing in TM Matching and Retrieval

Paraphrases
S  PPDB: The paraphrase database (Ganitkevitch et al., 2013)
S  Phrasal and lexical paraphrases
S  L size (2 million)

Concept behind paraphrases
Figure from Ganitkevitch et al., 2013

Trivial Approach
S  Generate additional segments based on paraphrases
available

Complexity of Trivial Approach
A 25-word segment split into five phrases, each with five available paraphrases:
W1 W2 W3 W4 W5 | W6 W7 W8 W9 W10 | W11 W12 W13 W14 W15 | W16 W17 W18 W19 W20 | W21 W22 W23 W24 W25
5 | 5 | 5 | 5 | 5
(5+1)^5 - 1 = 7775 more segments
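
The count follows from treating each phrase as an independent choice between the original wording and one of its paraphrases. A small Python sketch of that arithmetic (the function name `extra_segments` is illustrative):

```python
def extra_segments(paraphrases_per_phrase):
    """Number of additional segments generated by the trivial approach."""
    total = 1
    for k in paraphrases_per_phrase:
        total *= k + 1   # keep the original phrase or pick one of its k paraphrases
    return total - 1     # exclude the unchanged segment itself


print(extra_segments([5, 5, 5, 5, 5]))  # 7775
```

The growth is exponential in the number of paraphrasable phrases, which is why generating every variant does not scale to a large TM.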

Our Approach
1.  Dynamic programming and Greedy approximation
2.  Classification of paraphrases
3.  Handling each type of paraphrase differently
4.  Filtering

Classification of Paraphrases: 4 Types
i.  One-word paraphrases
S  “period” => “duration”
ii.  Multiple words, differing in one word
S  “in the period” => “during the period”
iii.  Differing in multiple words, with the same number of words
S  “laid down in article” => “set forth in article”
iv.  Differing in multiple words, with a different number of words
S  “a reasonable period of time to” => “a reasonable period to”

Example
The period laid down in article 4(3) of decision 468 …
Paraphrases available for this segment:
S  “period” => “duration”, “time”
S  “laid down in article” => “referred to in article”, “provided for by article”
Trailing words shared by both sides are dropped and the source length is kept:
S  “laid down” => “referred to” (source length 2)
S  “laid down in” => “provided for by” (source length 3)

General Edit-distance
Implementation
Insertion cost = Deletion cost = Substitution cost =1

Edit-distance Calculation
Columns: TM segment “the period laid down in” with its paraphrase alternatives. Rows: input segment “the period referred to in”.
Column 2 also accepts “duration” and “time”. Columns 31 and 41 hold “referred to”; columns 32, 42 and 52 hold “provided for by”; the last column 5 is “in” following “referred to”.

              0   1    2       3     4     5   31        41  32        42   52  5
              #   the  period  laid  down  in  referred  to  provided  for  by  in
0 #           0   1    2       3     4     5   3         4   3         4    5   5
1 the         1   0    1       2     3     4   2         3   2         3    4   4
2 period      2   1    0       1     2     3   1         2   1         2    3   3
3 referred    3   2    1       1     2     3   0         1   1         2    3   2
4 to          4   3    2       2     2     3   1         0   2         2    3   1
5 in          5   4    3       3     3     2   2         1   3         3    3   0

Without paraphrases the distance is 2 (first column 5, last row); with the “referred to” alternative it is 0 (last column 5, last row).
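
The table can be reproduced with a short Python sketch. This is not the authors' implementation: it runs an exact dynamic program over the paraphrase alternatives rather than the greedy approximation mentioned earlier, and the dictionary formats `lexical` and `phrasal` are hypothetical.

```python
def next_column(prev, alternatives, inp):
    """Advance the DP by one TM position that accepts any word in `alternatives`."""
    col = [prev[0] + 1]
    for i, word in enumerate(inp, start=1):
        col.append(min(prev[i - 1] + (word not in alternatives),  # match or substitute
                       prev[i] + 1,                               # TM word left unmatched
                       col[i - 1] + 1))                           # input word left unmatched
    return col


def paraphrase_edit_distance(tm, inp, lexical, phrasal):
    """Edit distance between input and the closest paraphrased form of the TM segment.

    lexical: word -> set of one-word paraphrases (assumed format)
    phrasal: tuple of source words -> list of tuples of target words (assumed format)
    """
    cols = [list(range(len(inp) + 1))]
    for j, word in enumerate(tm, start=1):
        best = next_column(cols[j - 1], {word} | lexical.get(word, set()), inp)
        for src, targets in phrasal.items():
            start = j - len(src)
            if start < 0 or tuple(tm[start:j]) != src:
                continue                  # this paraphrase does not end at position j
            for tgt in targets:
                col = cols[start]         # restart from the column before the source span
                for t in tgt:
                    col = next_column(col, {t}, inp)
                best = [min(a, b) for a, b in zip(best, col)]
        cols.append(best)
    return cols[-1][-1]


tm = "the period laid down in".split()
inp = "the period referred to in".split()
lexical = {"period": {"duration", "time"}}
phrasal = {("laid", "down"): [("referred", "to")],
           ("laid", "down", "in"): [("provided", "for", "by")]}
print(paraphrase_edit_distance(tm, inp, {}, {}))            # 2
print(paraphrase_edit_distance(tm, inp, lexical, phrasal))  # 0
```

One-word paraphrases only widen the set of words accepted at a position, so they add no extra columns; multi-word paraphrases add one column per target word, which matches the extra columns in the table.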

Computational Complexity
S  Only type (i) and type (ii) paraphrases:
S  O(mnlog(p)) , p: paraphrases of types (i) and (ii)
S  All paraphrases:
S  O(lmn(log(p) + q)) , q: paraphrases of types (iii) and (iv),
l: length of paraphrase

Filtering
1.  Filter out the segments based on length (39%)
2.  Filter out the candidates based on baseline edit-distance similarity (39%)
3.  Pick the top 100 segments
4.  Segments within a certain range of similarity with the most similar segment are selected for paraphrasing (35%)
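
The four filtering steps could be chained as in the Python sketch below. The slides do not say how the percentages are applied, so the reading here is an assumption: 39% is treated as a minimum length ratio and a minimum baseline similarity, and 35% as the largest allowed gap to the best score. `select_for_paraphrasing` and the toy `overlap` similarity are hypothetical stand-ins.

```python
def select_for_paraphrasing(inp, tm_segments, similarity,
                            min_sim=0.39, top_k=100, window=0.35):
    """Return the TM segments worth paraphrasing for this input (assumed reading)."""
    # Step 1: cheap length filter. With similarity = 1 - ED / max_len, the score
    # cannot exceed min_len / max_len, so hopeless candidates are skipped early.
    survivors = [s for s in tm_segments
                 if min(len(s), len(inp)) / max(len(s), len(inp)) >= min_sim]
    # Step 2: baseline similarity filter.
    scored = [(similarity(inp, s), s) for s in survivors]
    scored = [(score, s) for score, s in scored if score >= min_sim]
    # Step 3: keep the top_k best candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    scored = scored[:top_k]
    if not scored:
        return []
    # Step 4: keep candidates close enough to the best one.
    best = scored[0][0]
    return [s for score, s in scored if best - score <= window]


def overlap(a, b):
    """Toy stand-in for the baseline edit-distance similarity."""
    return sum(x == y for x, y in zip(a, b)) / max(len(a), len(b))


inp = "a b c d".split()
tm = [s.split() for s in ("a b c d", "a b c x", "a x y z", "a")]
print(select_for_paraphrasing(inp, tm, overlap))
```

In this toy run the one-word segment is dropped by the length filter and “a x y z” by the similarity filter, leaving two candidates for the more expensive paraphrase-aware matching.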

Experiments
S  Corpus Used:
S  Europarl V7.0
S  English-German pairs
More results on DGT-TM (English-French) in:
Rohit Gupta and Constantin Orasan. 2014. Incorporating Paraphrasing in Translation Memory Matching and Retrieval. In Proceedings of EAMT-2014, Dubrovnik, Croatia.

Corpus statistics: Europarl
                        TM           Test
Segments                1,565,194    9,981
Source words            37,824,634   240,916
Target words            36,267,909   230,620
Source average length   24.16        24.13
Target average length   23.17        23.10

Results: Europarl dataset
TH                 100     95      90      85      80      75      70
Edit Retrieved     117     127     163     215     257     337     440
+Para Retrieved    16      16      22      33      49      79      102
% Improve          13.68   12.6    13.5    15.35   19.07   23.44   23.18
Rank Change (RC)   9       19      16      25      36      65      97
METEOR-Edit-RC     45.48   46.48   45.59   39.24   37.32   34.02   31.10
METEOR-Para-RC     68.08   67.03   61.09   50.07   44.16   38.35   33.19
BLEU-Edit-RC       31.88   32.37   27.70   21.71   19.32   14.98   12.25
BLEU-Para-RC       52.00   47.92   43.90   31.76   25.24   19.75   15.28
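
The % Improve row appears to be the number of additional segments retrieved with paraphrasing divided by the number retrieved with plain edit distance. The slides do not state the formula, so the Python check below is an inference from the figures:

```python
thresholds = [100, 95, 90, 85, 80, 75, 70]
edit_retrieved = [117, 127, 163, 215, 257, 337, 440]
para_retrieved = [16, 16, 22, 33, 49, 79, 102]


def percent_improve(extra, baseline):
    """Additional retrieved segments as a percentage of the baseline count."""
    return 100 * extra / baseline


for th, edit, para in zip(thresholds, edit_retrieved, para_retrieved):
    print(th, f"{percent_improve(para, edit):.2f}")
```

Rounded to two decimals, the computed values agree with the % Improve row at every threshold.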

TH                 100     [85, 100)   [70, 85)
Edit Retrieved     117     98          225
+Para Retrieved    16      30          98
% Improve          13.67   30.61       43.55
Rank Change (RC)   9       14          55
METEOR-Edit-RC     45.48   34.37       25.76
METEOR-Para-RC     68.08   40.00       25.82
BLEU-Edit-RC       31.88   13.18       6.85
BLEU-Para-RC       52.00   17.10       8.37

Dataset: Human Evaluation
TH      100   [85, 100)   [70, 85)   Total
Set1    2     6           6          14
Set2    5     4           7          16
Total   7     10          13         30

Evaluations
S  Post-Editing time
S  Keystrokes
S  Subjective Evaluation 2 Options
S  A is better
S  B is better
S  Subjective Evaluation 3 Options, Added One more
S  Both are equal

Experimental Settings: Post-editing Time and Keystrokes
S  Each file contains segments of both types, edit-distance (ED) and paraphrasing (PP)
S  Each file is post-edited by 5 translation students
S  German: native
S  English: C1

Results: Keystrokes
[Stacked bar chart: number of keystrokes for Edit-Distance and Paraphrasing, stacked by Set1 and Set2. Bar labels: Edit-Distance 532.6 and 570.6; Paraphrasing 356.2 and 468.59.]
25.23% fewer keystrokes

Results: Post-Editing Time
[Stacked bar chart: post-editing time in seconds for Edit-Distance and Paraphrasing, stacked by Set1 and Set2. Bar labels: Edit-Distance 520.02 and 657.75; Paraphrasing 466.44 and 603.17.]
9.18% time saved

Results: Subjective Evaluation
(Two Options, 17 Translators)
[Stacked bar chart: number of replies. Set1: Edit-Distance is better 66, Paraphrasing is better 172. Set2: Edit-Distance is better 110, Paraphrasing is better 162.]

Results: Subjective Evaluation
(Three Options, Seven Translators)
[Stacked bar chart: number of replies. Set1: Edit-Distance is better 12, Paraphrasing is better 46, Both are equal 40. Set2: Edit-Distance is better 26, Paraphrasing is better 53, Both are equal 33.]

H-TER and H-METEOR
             Set1                           Set2
             Edit Distance   Paraphrasing   Edit Distance   Paraphrasing
HMETEOR5     59.82           81.44          69.81           80.60
HTER5        39.72           17.63          27.81           18.71
HMETEOR10    59.82           81.44          69.81           80.61
HTER10       36.93           18.46          27.26           18.40

Segment-wise analysis
S  Statistical significance testing per segment
S  Welch's t-test (one-tailed, p < 0.05)
S  Paraphrasing (Keystrokes/Post-Editing Time):
S  Twelve segments are significantly better
S  For ten segments, all other evaluations also show them to be better
S  Edit-Distance (Keystrokes/Post-Editing Time):
S  Three segments are significantly better
S  Not all evaluations show them to be better

Conclusion
S  Presented approach to include paraphrasing and machine
and retrieval
S  Presented human evaluations
S  In future, we will use deep learning for TM matching and
retrieval

Related Publications
S  Rohit Gupta and Constantin Orasan. 2014. Incorporating Paraphrasing in Translation
Memory Matching and Retrieval. In Proceeding of EAMT-2014, Dubrovnik Croatia.
S  Rohit Gupta, Constantin Orasan, Marcos Zampieri, Mihaela Vela and Josef van
Genabith. 2015. Can Transfer Memories afford not to use paraphrasing? In Proceeding of
EAMT-2015, Antalya Turkey.
S  Rohit Gupta, Hanna Bechara, Ismail El Maarouf, and Constantin Orasan. 2014a. UoW:
NLP techniques developed at the University of Wolverhampton for Semantic Similarity
and Textual Entailment. In Proceedings of the 8th International Workshop on Semantic
Evaluation (SemEval-2014), COLING-2014 Dublin Ireland.
S  Rohit Gupta, Hanna Bechara, and Constantin Orasan. 2014b. Intelligent Translation
Memory Matching and Retrieval Metric Exploiting Linguistic Technology. In Proceedings
of the thirty sixth Conference on Translating and Computer, London, UK.

References
S  Jane Bradbury and Ismaıl El Maarouf. 2013. An empirical classification of verbs based on
Semantic Types: the case of the ’poison’ verbs. In Proceedings of the Joint Symposium on Semantic
Processing. Textual Inference and Structures in Corpora, pages 70–74.
S  Juri Ganitkevitch, Van Durme Benjamin, and Chris Callison-Burch. 2013. Ppdb: The
paraphrase database. In Proceedings of NAACL-HLT, pages 758–764, Atlanta, Georgia.
Association for Computational Linguistics.
S  Marco Marelli, Luisa Bentivogli, Marco Baroni, Raffaella Bernardi, Stefano Menini, and
Roberto Zamparelli. 2014a. Semeval-2014 task 1: Evaluation of compositional distributional
semantic models on full sentences through semantic relatedness and textual entailment. In
Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval-2014).
S  Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, and
Roberto Zamparelli. 2014b. A sick cure for the evaluation of compositional distributional
semantic models. In Proceedings of LREC 2014.
S  Steinberger, Ralf, Andreas Eisele, Szymon Klocek, Spyridon Pilos, and Patrick Schluter. 2012.
DGT- TM: A freely available Translation Memory in 22 languages. LREC, pages 454–459.
