Applicative evaluation of bilingual terminologies

Material presented at the 18th Nordic Conference of Computational Linguistics (NODALIDA 2011), Riga, Latvia.
Download paper: http://hal.archives-ouvertes.fr/hal-00585187
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina

Transcript of "Applicative evaluation of bilingual terminologies"

  1. Applicative evaluation of bilingual terminologies
     Estelle Delpech, NODALIDA, 12th May 2011
  2. Outline
     1. Context and scope of work
     2. Comparable corpora and terminology evaluation
     3. Applicative evaluation protocol
     4. Experimentation and results
     5. Future improvements
  3. Outline [repeat of slide 2]
  4. Context of the work
     • Bilingual terminology mining from comparable corpora
     • Application to:
       – computer-aided translation
       – computer-aided terminology
  5. Scope of the work
     • Find a way to show the "added value" of the acquired terminology
       when it is used for technical translation:
       – do translators translate better and/or faster?
     • Design and experimentation of an "applicative" evaluation protocol
       for bilingual terminologies
  6. Outline [repeat of slide 2]
  7. Comparable corpora
     English texts on breast cancer:
       "It has been suggested that breast magnetic resonance imaging (MRI)
       is more accurate in the diagnosis of breast cancer..."
       "Histological evaluation revealed the presence of DCIS..."
     French texts on breast cancer:
       "L'imagerie par résonance magnétique avec injection de gadolinium
       (IRM) est une technique indépendante de la densité mammaire..."
       "Un diagnostic histologique est nécessaire..."
  8. Comparable corpora [build of slide 7; same example texts]
  9. Comparable corpora [build of slide 7, with "DCIS" spelled out as
     "ductal carcinoma in situ"]
  10. Advantages of comparable corpora
     • More available:
       – new domains
       – unprecedented language pairs
     • Quality:
       – spontaneous language
       – not influenced by source texts
  11. Reference evaluation of bilingual terminologies
     • Reference evaluation:
       – the output of the program is compared with a list of reference
         translations
     • Precision:
       – the percentage of output translations that are in the reference:
         precision = |output ∩ reference| / |output|
  12. Reference evaluation with comparable corpora
     • Output:
       – source term → ordered list of candidate translations
     • Example:
       – histological → diagnostic (rank 1), histologie (rank 2),
         histologique (rank 3), ..., nécessaire (rank n)
  13. Reference evaluation with comparable corpora
     • Precision:
       – the percentage of output translations that are in the reference
         when the Top 10 or Top 20 candidate translations are taken into
         account
     • State of the art:
       – between 42% and 80% on the Top 20, depending on corpus size,
         corpus type and the nature of the translated elements
         [Morin and Daille, 2009]
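
A minimal sketch of how this Top-N precision could be computed, assuming toy data (the function name and the output/reference lists are illustrative, not taken from the paper):

    def top_n_precision(output, reference, n=20):
        # Fraction of source terms whose reference translation appears
        # among the top-n candidates proposed by the alignment program.
        found = 0
        for term, candidates in output.items():
            refs = reference.get(term, set())
            if any(cand in refs for cand in candidates[:n]):
                found += 1
        return found / len(output)

    # Toy example: one source term with its ranked candidate list.
    output = {"histological": ["diagnostic", "histologie", "histologique"]}
    reference = {"histological": {"histologique"}}
    print(top_n_precision(output, reference))  # 1.0 (match at rank 3)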
  14. Reference vs. applicative evaluation
     • Reference evaluation:
       – fine for testing and developing the alignment program
       – fast, cheap, reproducible, objective
     • Applicative evaluation:
       – how much does the alignment program help the end users?
       – can the terminologies improve translation quality?
  15. Outline [repeat of slide 2]
  16. Applicative evaluation scenario [diagram only]
  17. Applicative evaluation scenario [diagram only]
  18. Applicative evaluation scenario [diagram only]
  19. Applicative evaluation scenario [diagram only]
  20. Questions raised
     1) How do you assess translation quality?
     2) Evaluate the whole of the translations, or the technical terms only?
  21. 1) How do you assess translation quality?
     • Translation studies evaluation grids:
       – SICAL, SAE J 2450
       – too complex, poorly documented
     • Machine translation objective metrics:
       – BLEU, METEOR
       – not adapted to human translation
       – reproducibility is not an advantage in our case
  22. 1) How do you assess translation quality?
     • Machine translation subjective evaluation:
       – translations evaluated by humans:
         • quality judgement: adequacy, fluency...
         • ranking
       – use an annotator agreement measure to ensure that agreement
         between judges is sufficient
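
The results slides report agreement as K values (K = 0.25, K = 0.42, K = 0.63, K = 0.69), which suggests a kappa-style coefficient. Below is a minimal sketch of Cohen's kappa for two judges with illustrative labels; the slides do not spell out which coefficient was actually used:

    from collections import Counter

    def cohen_kappa(labels_a, labels_b):
        # Cohen's kappa: agreement between two annotators on the same
        # items, corrected for the agreement expected by chance.
        n = len(labels_a)
        observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
        freq_a, freq_b = Counter(labels_a), Counter(labels_b)
        expected = sum(freq_a[lab] * freq_b[lab] for lab in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)

    # Toy judgements on six translated terms.
    judge1 = ["correct", "correct", "wrong", "acceptable", "correct", "wrong"]
    judge2 = ["correct", "acceptable", "wrong", "acceptable", "correct", "correct"]
    print(round(cohen_kappa(judge1, judge2), 2))  # 0.48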
  23. 2) Evaluate the whole text or just some terms?
     • The quality of a text translation is a complex interaction of
       several parameters
     • Focus on the elements for which the translator felt he/she needed
       a linguistic resource:
       – evaluates only the part of the translation on which the
         terminology has an impact
       – easier and faster
  24. Applicative evaluation protocol
     • Compare 3 different "situations of translation":
       – one situation = one type of resource
     • Translators do the translation and note down the terms they had
       to look up
     • The quality of the terms' translations is assessed by human judges
  25. Situations of translation [diagram only]
  26. Situations of translation [diagram only]
  27. Situations of translation [diagram only]
  28. Situations of translation [diagram only]
  29. Translations' assessment
     1. Quality judgement:
        – correct: standard term or expression
        – acceptable: the meaning is retained
        – wrong: no meaning is retained
     2. Ranking:
        – from best to worst
        – ties allowed
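
A sketch of how the pairwise rankings could then be aggregated into the win/tie shares reported on the ranking slides (39 and 40); the data and function name are illustrative:

    def ranking_outcomes(pairs):
        # Aggregate pairwise rankings of two translations of the same term
        # into win / tie / loss shares for the first system.
        # Each pair is (rank_first, rank_second); a lower rank is better.
        n = len(pairs)
        wins = sum(a < b for a, b in pairs) / n
        ties = sum(a == b for a, b in pairs) / n
        losses = sum(a > b for a, b in pairs) / n
        return wins, ties, losses

    # Toy data: CC translation vs. general-language translation per term.
    pairs = [(1, 2), (1, 1), (2, 1), (1, 2)]
    print(ranking_outcomes(pairs))  # (0.5, 0.25, 0.25)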
  30. Outline [repeat of slide 2]
  31. Data
     • Comparable corpora:
       – breast cancer: 400k words/language
       – water science: 2M words/language
     • Texts to translate:
       – research paper abstracts: ~500 words/domain
       – lay science texts: ~500 words/domain
  32. Translators' feedback
     "Globally, 75% of technical words aren't in the glossary, and for
     the other 25%, 99% have between 10 and 20 candidate translations and
     none has been validated. So most of the time, you are just partly
     sure, but you are never totally sure of your translation. And in the
     worst cases, you translate instinctively."
     → Translators were not prepared to use a bilingual terminology with
       many candidate translations
     → The terminology only partially covered the vocabulary of the texts
       to translate
  33. Terminology coverage of texts to translate
     • Breast cancer:
       – 94% of the vocabulary of the texts is in the terminology
       – fine-grained topic
     • Water science:
       – 14% of the vocabulary of the texts is in the terminology
       – topic is too general
  34. Quality judgement / Breast cancer
     • equivalent proportions of incorrect translations across situations
     • the Web gives the most correct translations, followed by the
       comparable corpora
     [Stacked bar chart, K = 0.25: distribution of judgements per
      situation (SIT. 1 / CC, SIT. 0 / GEN. LANG., SIT. 2 / WEB); segment
      labels: 20%, 19%, 18%; 42%, 38%, 35%; 38%, 43%, 47%]
  35. Quality judgement / Water science
     • translations are much better with the Web
     • the comparable corpora produce worse translations than the
       general-language resources
     [Stacked bar chart, K = 0.42: distribution of judgements per
      situation (SIT. 1 / CC, SIT. 0 / GEN. LANG., SIT. 2 / WEB); segment
      labels: 18%, 21%, 23%, 23%; 7%, 16%; 77%, 59%, 56%]
  36. Results seem incoherent
     • translations produced in situation 1 are worse than translations
       produced in situation 0
     • but the two situations share the same "general language resource"
       basis
     [Diagram: BASELINE = general language resources only; SITUATION 1 =
      general language resources + terminology mined from COMPARABLE
      CORPORA]
  37. Possible explanation

     Resource consulted           BASELINE   SIT. 1 (Comp. corpora)   SIT. 2 (Web)
     General language resource      43%              14%                   3%
     Specialized resource            -               25%                  56%
     Intuition                      79%              77%                  44%

     → When translators have a specialized resource, they tend to ignore
       the general language resource
  38. Possible explanation
     [Same resource-usage table as slide 37]
     → If the translators in situation 1 had always looked up the general
       resource first, the translations of situation 1 would have been at
       least as good as the translations of situation 0
  39. Ranking / Breast cancer
     [Bar chart, K = 0.69: pairwise ranking results for CC vs. GEN. LANG.
      and CC vs. WEB; bar labels: 42%, 47%, 32%, 28%, 26%, 26%]
  40. Ranking / Water science
     [Bar chart, K = 0.63: pairwise ranking results for CC vs. GEN. LANG.
      and CC vs. WEB; bar labels: 49%, 41%, 43%, 33%, 18%, 16%]
  41. Outline [repeat of slide 2]
  42. Improvements: terminology coverage
     • There is a dependency between:
       – the added value of the bilingual terminology
       – its coverage of the texts to translate
     • Any added-value measure should also indicate to what extent the
       terminology contains the vocabulary of the translated texts
  43. Improvement 1: terminology coverage
     • Perspectives:
       – create a "coverage" measure (see the sketch below)
       – find the minimum coverage a terminology needs in order to be
         "useful" for translating a given text
       – gather smaller but finer-grained corpora
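
One simple way such a coverage measure could be defined, in line with the vocabulary figures on slide 33 (94% for breast cancer, 14% for water science); the names and data are illustrative:

    def coverage(text_tokens, terminology_terms):
        # Share of the text's vocabulary that the terminology covers:
        # |vocab(text) ∩ terms| / |vocab(text)|.
        vocab = set(text_tokens)
        return len(vocab & set(terminology_terms)) / len(vocab)

    # Toy example: three-word vocabulary, two words covered.
    text = ["breast", "cancer", "mri", "breast"]
    terms = {"breast", "cancer", "carcinoma"}
    print(coverage(text, terms))  # ≈ 0.67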
  44. Improvement 2: situations of translation
     • When translators have several resources at their disposal, they
       tend to ignore the general language resource
     • Consequence: the same resource is used differently depending on
       the situation
     • This seems to be the cause of the incoherent results
  45. Improvement 2: situations of translation
     • Perspective: use 0 or 1 resource per situation of translation
     [Diagram: Situation 0 = no specialized resource; Situation 1 =
      terminology mined from comparable corpora; Situation 2 = Web]
  46. Improvement 3: train translators
     • Prepare translators to use "ambiguous", unvalidated terminologies
     • Do a first blank evaluation to:
       – train the translators
       – train the judges → results in higher agreement
  47. Acknowledgements
     This work was funded by:
       – the French National Research Agency, grant n° ANR-08-CORD-009
       – Lingua et Machina, www.lingua-et-machina.com
     Annotators:
       – Clémence De Baudus
       – Mathieu Delage