Applicative evaluation of bilingual terminologies

Estelle Delpech
NODALIDA
12th May 2011


Outline
1. Context and scope of work
2. Comparable corpora and terminology
evaluation
3. Applicative evaluation protocol
4. Experimentation and results
5. Future improvements

Context of the work
• Bilingual terminology mining from
comparable corpora
• Application to:
– computer-aided translation
– computer-aided terminology


Scope of the work
• Show the added value of the acquired
terminology when it is used for
technical translation
– do translators translate better and/or faster?

• Design and test an "applicative"
evaluation protocol for
bilingual terminologies


Outline
2. Comparable corpora and
terminology evaluation

Comparable corpora
English texts on breast cancer:
– "It has been suggested that breast magnetic
resonance imaging (MRI) is more accurate in the
diagnosis of breast cancer..."
– "Histological evaluation revealed the presence
of DCIS..." (DCIS: ductal carcinoma in situ)

French texts on breast cancer:
– "L'imagerie par résonance magnétique avec
injection de gadolinium (IRM) est une technique
indépendante de la densité mammaire..."
[Gadolinium-enhanced magnetic resonance imaging
(MRI) is a technique that does not depend on
breast density...]
– "Un diagnostic histologique est nécessaire..."
[A histological diagnosis is necessary...]

Advantages of comparable
corpora
• More widely available
– new domains
– unprecedented language pairs

• Quality
– spontaneous language
– not influenced by source texts


Reference evaluation of
bilingual terminologies
• Reference evaluation:
– output of the program is compared with a list
of reference translations

• Precision:
– percentage of output translations which are in
the reference

$$\text{precision} = \frac{|\text{output} \cap \text{reference}|}{|\text{output}|}$$
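
The precision defined above can be sketched in a few lines. This is a minimal illustration, assuming the program output and the reference are both sets of (source term, translation) pairs; the function name and the sample pairs are hypothetical and do not come from the evaluated system.

```python
def precision(output, reference):
    """Share of output translation pairs that also appear in the reference."""
    output = set(output)
    if not output:
        return 0.0  # convention chosen here for an empty output
    return len(output & set(reference)) / len(output)


# Hypothetical (source term, translation) pairs, for illustration only.
output = {("histological", "histologique"), ("histological", "diagnostic")}
reference = {("histological", "histologique"), ("breast", "sein")}
print(precision(output, reference))  # 0.5
```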

Reference evaluation with
comparable corpora
• Output:
– source term → ordered list of candidate
translations

• Example:
– histological → 1. diagnostic, 2. histologie,
3. histologique, …, n. nécessaire


Reference evaluation with
comparable corpora
• Precision:
– percentage of source terms for which a
reference translation appears among the
Top 10 or Top 20 candidate
translations
• State of the art:
– between 42% and 80% on the Top 20,
depending on corpus size, corpus type and
nature of the translated elements [Morin and
Daille, 2009]
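
A Top N precision of this kind could be computed as follows. This is only a sketch, assuming precision is counted per source term: a term is a hit when at least one reference translation appears among its first N candidates. The function name, the dictionary layout and the sample data are assumptions made for illustration (the first ranked list reuses the example given earlier in the slides).

```python
def top_n_precision(candidates, reference, n=20):
    """Share of source terms with a reference translation among their top-n candidates.

    candidates: dict mapping a source term to its ranked list of candidate translations.
    reference:  dict mapping a source term to the set of accepted translations.
    """
    # Only source terms that have a reference entry can be evaluated.
    evaluated = [term for term in candidates if term in reference]
    if not evaluated:
        return 0.0
    hits = sum(
        1 for term in evaluated
        if set(candidates[term][:n]) & set(reference[term])
    )
    return hits / len(evaluated)


# Illustrative data only.
candidates = {
    "histological": ["diagnostic", "histologie", "histologique", "nécessaire"],
    "breast": ["cancer", "mammaire"],
}
reference = {"histological": {"histologique"}, "breast": {"sein"}}
print(top_n_precision(candidates, reference, n=2))   # 0.0
print(top_n_precision(candidates, reference, n=20))  # 0.5
```

Widening the cut-off from 2 to 20 lets the third-ranked candidate count, which is why scores are always reported together with the N they were measured at.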

Reference vs. Applicative
evaluation
• Reference evaluation:
– suited to testing and developing the alignment
program
– fast, cheap, reproducible, objective

• Applicative evaluation:
– how much does the alignment program help
the end users?
– can the terminologies improve translation
quality?

Applicative evaluation scenario


Questions raised
1) How do you assess translation quality?
2) Evaluate the whole translation or
technical terms only?


1) How do you assess translation
quality?
• Evaluation grids from translation studies:
– SICAL, SAE J2450
– too complex, poorly documented

• Objective machine translation metrics:
– BLEU, METEOR
– not suited to human translation
– reproducibility is not an advantage in our case

1) How do you assess translation
quality?
• Subjective evaluation, as used in machine
translation
– translations evaluated by humans:
• quality judgement: adequacy, fluency...
• ranking

– use an inter-annotator agreement measure to
check that the judges agree sufficiently
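
As an illustration of such an agreement measure, here is a minimal sketch of Cohen's kappa for two judges. The slides do not name the coefficient that was used, so the choice of Cohen's kappa is an assumption, as are the function name and the sample labels.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two judges who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both judges must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items given the same label by both judges.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each judge's own label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both judges used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical judgements, for illustration only.
judge_1 = ["correct", "correct", "wrong", "acceptable"]
judge_2 = ["correct", "acceptable", "wrong", "acceptable"]
print(round(cohen_kappa(judge_1, judge_2), 3))  # 0.636
```

Kappa corrects raw agreement (0.75 in this toy case) for the agreement two judges would reach by chance, which matters when one label dominates.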


2) Evaluate the whole text or
just some terms?
• Quality of a text translation = complex
interaction of several parameters
• Focus on the elements for which the
translator felt the need for a linguistic
resource:
– evaluates only the part of the translation on
which the terminology has an impact
– easier and faster

Applicative evaluation protocol
• Compare 3 different "situations of
translation"
– one situation = one type of resource

• Translators do the translation and note down
the terms they had to look up
• The quality of the translations of these terms is
assessed by human judges


Situations of translation

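– Situation 0 (baseline): general language resources
– Situation 1: general language resources +
terminology mined from comparable corpora
– Situation 2: general language resources + Web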

Translations' assessment
1. Quality judgement:
– correct: standard term or expression
– acceptable: meaning is retained
– wrong: no meaning is retained

2. Ranking:
– from best to worst
– ties allowed
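
The two kinds of assessment above could be summarised along the following lines. This is a sketch under stated assumptions, not the procedure used in the study: judgements are assumed to be stored as lists of labels, ranks as integers where 1 is best and equal ranks are ties, and all function names and sample values are hypothetical.

```python
from collections import Counter

LABELS = ("correct", "acceptable", "wrong")


def judgement_shares(labels):
    """Proportion of each quality label among the judged translations."""
    if not labels:
        raise ValueError("no judgements to summarise")
    counts = Counter(labels)
    return {label: counts[label] / len(labels) for label in LABELS}


def pairwise_outcomes(ranks_a, ranks_b):
    """Compare two situations term by term from their ranks (1 = best, ties allowed)."""
    if len(ranks_a) != len(ranks_b) or not ranks_a:
        raise ValueError("both situations need ranks for the same non-empty term list")
    outcomes = Counter()
    for rank_a, rank_b in zip(ranks_a, ranks_b):
        if rank_a < rank_b:
            outcomes["a_better"] += 1
        elif rank_a > rank_b:
            outcomes["b_better"] += 1
        else:
            outcomes["tie"] += 1
    return {key: outcomes[key] / len(ranks_a) for key in ("a_better", "tie", "b_better")}


# Hypothetical judgements and ranks, for illustration only.
print(judgement_shares(["correct", "acceptable", "correct", "wrong"]))
print(pairwise_outcomes([1, 2, 1, 1], [2, 1, 1, 3]))
```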


Data
• Comparable corpora:
– breast cancer: 400k words/language
– water science: 2M words/language

• Texts to translate:
– research paper abstracts: ~500 words/domain
– popular science texts: ~500 words/domain


Translators' feedback
"Globally, 75% of technical words aren't in the
glossary, and for the other 25%, 99% have between
10 and 20 candidate translations and none has
been validated. So most of the time, you are just
partly sure, but you are never totally sure of your
translation. And in the worst cases, you translate
instinctively."

→ Translators were not prepared to use a bilingual
terminology with many candidate translations

→ The terminology only partially covered the
vocabulary of the texts to translate


Terminology coverage of texts to
translate
• Breast Cancer
– 94% of the vocabulary of the texts is in the
terminology
– fine-grained topic

• Water Science
– 14% of the vocabulary of the texts is in the
terminology
– topic is too general


Quality judgement / Breast Cancer
• Equivalent proportion
of incorrect
translations in the three situations
• The Internet gives the
most correct
translations, followed by the
comparable corpora

BREAST CANCER (K = 0.25)
                       wrong   acceptable   correct
SIT. 0 / GEN. LANG.    20%     42%          38%
SIT. 1 / CC            19%     38%          43%
SIT. 2 / WEB           18%     35%          47%

Quality judgement / Water Science
• Translations are
much better with
the Internet
• The comparable corpora
produce worse
translations than the
general resources

WATER SCIENCE (K = 0.42)
[Stacked bar chart: quality judgements for SIT. 0 / GEN. LANG., SIT. 1 / CC and SIT. 2 / WEB. Data labels: 77%, 59%, 56%; 18%, 21%, 23%, 23%, 7%, 16%]

Results seem inconsistent
• Translations produced
in situation 1 are
worse than
translations produced
in situation 2
• But they share the
same "general
language resource"
basis

[Diagram]
BASELINE: general language resources
Situation 1: general language resources + terminology mined from comparable corpora

Possible explanation
                             BASELINE   SITUATION 1            SITUATION 2
                                        (Comparable corpora)   (Web)
General language resource    43%        14%                    3%
Specialized resource         -          25%                    56%
Intuition                    79%        77%                    44%

When translators have a specialized
resource they tend to ignore the general
language resource

Possible explanation
If translators in situation 1 had always looked
up the general resource first, translations in
situation 1 would have been at least as good
as translations in situation 0

Ranking / Breast Cancer
BREAST CANCER (K = 0.69)
[Bar chart: ranking results for the comparisons CC vs. GEN. LANG. and CC vs. WEB. Data labels: 42%, 47%, 32%, 28%, 26%, 26%]

Ranking / Water Science
WATER SCIENCE (K = 0.63)
[Bar chart: ranking results for the comparisons CC vs. GEN. LANG. and CC vs. WEB. Data labels: 49%, 41%, 43%, 33%, 18%, 16%]


Improvement 1: terminology
coverage
• Dependency between:
– the added value of the bilingual terminology
– its coverage of the texts to translate

• Any added-value measure should also
indicate to what extent the terminology
contains the vocabulary of the translated
texts


Improvement 1: terminology
coverage
• Perspectives:
– create a "coverage" measure
– find out the minimum coverage a
terminology needs to be "useful" for translating
a given text
– gather smaller but finer-grained corpora
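
One possible shape for such a "coverage" measure is sketched below. It is a hypothetical definition, not the one used for the percentages reported earlier: coverage is taken to be the share of distinct word forms of a text that appear as single-word terminology entries, with simple lowercasing and no lemmatisation, stop-word filtering or multi-word term matching.

```python
import re


def coverage(text, terminology):
    """Share of distinct word forms in `text` that appear as terminology entries."""
    vocabulary = set(re.findall(r"\w+", text.lower()))
    if not vocabulary:
        return 0.0
    entries = {entry.lower() for entry in terminology}
    return len(vocabulary & entries) / len(vocabulary)


# Illustrative text and terminology entries only.
text = "Histological evaluation revealed the presence of DCIS"
terminology = ["histological", "evaluation", "DCIS", "breast"]
print(round(coverage(text, terminology), 2))  # 0.43
```

Restricting the vocabulary to technical words, or to the terms translators actually looked up, would give a stricter variant of the same measure.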


Improvement 2: situations of
translation
• When translators have several resources
at their disposal, they tend to ignore the
general language resource
• Consequence: the same resource is used
differently depending on the situation
• This seems to be the cause of the inconsistent
results


Improvement 2: situations of
translation
• Perspective: use 0 or 1 resource per
situation of translation
– Situation 0
– Situation 1: terminology mined from comparable corpora
– Situation 2: Web

Improvement 3: train translators
• Prepare translators to use "ambiguous",
unvalidated terminologies
• Run a first trial evaluation to:
– train the translators
– train the judges → results in higher
agreement

Acknowledgements
This work was funded by:
– French National Research Agency, grant
no. ANR-08-CORD-009
– Lingua et Machina, www.lingua-et-machina.com

Annotators:
– Clémence De Baudus
– Mathieu Delage
