Material presented at the Tenth Biennial Conference of the
Association for Machine Translation in the Americas (AMTA 2012), San Diego, CA.
Download paper at http://hal.archives-ouvertes.fr/hal-00730325.
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, GREMUTS
Identification of Fertile Translations in Comparable Corpora Using Morpho-Compositional Approach
1. Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach
Estelle Delpech1, Béatrice Daille1, Emmanuel Morin1, Claire Lemaire2,3
1 LINA, Université de Nantes
2 GREMUTS, Université de Grenoble
3 Lingua et Machina
AMTA’12
10/31/12
San Diego, CA
2. Outline
  1. Context and original problem
  2. Compositional translation framework
  3. Detailed translation method
  4. Experiments and results
  5. Future work
7. Context
Research partly funded by a Computer-Aided Translation company
Goal: generate domain-specific bilingual lexicons when no parallel data is available
Available data:
  general language bilingual dictionary
  domain-specific comparable corpora
13. Comparable corpora
Definition of comparable corpora
  A set of texts in languages L1 and L2 that are not translations of each other but deal with the same subject matter, so that translation pairs can still be extracted
Some difficulties:
  language in target texts is not influenced by source texts
  mixed text types: technical, scientific, lay science...
  ⇒ do not expect parallelism in source ↔ target structures
  ⇒ need to deal with variation in translation
20. Variation in translation
Morphological variation:
  anticancer (Noun) → anticancéreux (Adj) 'anticancerous'
  ⇒ use of morphological derivation rules / lexicons
Lexical variation:
  radiosensitivity → Radiotoleranz 'radiotolerance'
  sensitivity ≈ tolerance
  ⇒ use of synonyms, thesaurus
Fertility:
  bi-dimensional → deux dimensions 'two dimensions'
  ⇒ scarcely addressed
24. Fertility
Definition: the target term has more words than the source term
semantic fertility: the target term has more morphemes than the source term
  voie de glace 'route of ice' → ice climbing route
  aquarelle (not decomposable) → water color
surface fertility: target and source terms have the same number of morphemes
  bi-dimensional → deux dimensions 'two dimensions'
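The two kinds of fertility can be told apart from word and morpheme counts alone. The sketch below is only an illustration of the definitions: the function name is hypothetical, and the counts passed in are assumed to come from some prior morphological analysis that the slides do not describe.

```python
def fertility_type(src_words, tgt_words, src_morphemes, tgt_morphemes):
    """Classify a translation pair from word and morpheme counts.

    Counts are supplied by the caller; how morphemes are counted is an
    assumption of this sketch, not something the slides specify.
    """
    if tgt_words <= src_words:
        return "not fertile"          # target has no more words than source
    if tgt_morphemes > src_morphemes:
        return "semantic fertility"   # more words and more morphemes
    if tgt_morphemes == src_morphemes:
        return "surface fertility"    # more words, same number of morphemes
    return "unclassified"             # case not covered by the definitions


# bi-dimensional (1 word, 2 morphemes) -> deux dimensions (2 words, 2 morphemes)
print(fertility_type(1, 2, 2, 2))
# aquarelle (1 word, 1 morpheme) -> water color (2 words, 2 morphemes)
print(fertility_type(1, 2, 1, 2))
```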
27. Compositional translation
Principle of compositionality
  “the meaning of the whole is a function of the meaning of the parts” [Keenan and Faltz, 1985, 24-25]
Definition of compositional translation
  The translation of the whole is a function of the translation of the parts
32. Compositional translation process
  1. Decomposition: “cytotoxic” → {cyto, toxic}
  2. Translation: {cyto, toxic} → {cyto, toxique}
  3. Recomposition: {cyto, toxique} → {cytotoxique, toxiquecyto}
  4. Selection: {cytotoxique, toxiquecyto} → “cytotoxique”
35. Relevance of compositional translation
  More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]
  Outperforms the distributional approach for the translation of terms with compositional meaning [Morin and Daille, 2009]
46. Related work: single-word terms translation
[Cartoni, 2009]
  Prefixed word → prefixed word
  ri+organizzare → ré+organiser 'reorganize'
[Harastani et al., 2012]
  Neoclassical compound → neoclassical compound
  Kalori+metrie → calori+métrie 'calorimetry'
[Weller et al., 2011]
  Noun compound → noun phrase
  Elektronen+mikroskop → electron microscope
⇒ Restricted to a small set of source-to-target structures
⇒ Fertility handled in the specific case of noun compounds
50. Contribution I
Addressing fertility by allowing translation equivalences from bound morpheme to autonomous lexical item:
  cyto → cellule 'cell'
  cytotoxic → toxique (pour les) cellules 'toxic to the cells'
54. Contribution II
Larger variety of input/output structures:
  SOURCE                       TARGET
  prefixed word                prefixed word
  neoclassical compound        neoclassical compound
  suffixed word           ⇒    suffixed word
  compound                     compound
  any combination              any combination
                               phrase
60. Overview
  1. Decomposition: lexicons + heuristic rules
  2. Translation: dictionary look-up
  3. Recomposition: permutations
  4. Selection: search occurrences in target texts
66. Decomposition - step 1
Split source term into minimal components with heuristic rules:
  split on hyphens
  match substrings of the source term with:
    a list of morphemes (prefixes, confixes, suffixes)
    a list of lexical items
  respect some length constraints on the substrings
non-cytotoxic → {non, cyto, toxic}
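A minimal sketch of this first decomposition step, assuming tiny hand-made resources. The morpheme list, the lexical item list, the minimum substring length of 3 and the preference for the split with the most pieces are all assumptions made for illustration; the slides do not give the actual lexicons or length constraints.

```python
from functools import lru_cache

# Hypothetical resources; the slides do not list the actual lexicons.
MORPHEMES = {"non", "cyto", "anti"}
LEXICAL_ITEMS = {"toxic", "cancer"}
MIN_LENGTH = 3  # assumed length constraint on substrings


def segment(word, known, min_length=MIN_LENGTH):
    """Return the finest split of `word` into known substrings, or [word]."""

    @lru_cache(maxsize=None)
    def best(start):
        if start == len(word):
            return ()
        candidates = []
        for end in range(start + min_length, len(word) + 1):
            piece = word[start:end]
            if piece in known:
                rest = best(end)
                if rest is not None:
                    candidates.append((piece,) + rest)
        # Assumption: prefer the split with the most pieces (minimal components).
        return max(candidates, key=len) if candidates else None

    result = best(0)
    return list(result) if result else [word]


def decompose(term):
    """Split on hyphens, then match substrings against the known lists."""
    known = MORPHEMES | LEXICAL_ITEMS
    components = []
    for part in term.split("-"):
        if part:
            components.extend(segment(part, known))
    return components


print(decompose("non-cytotoxic"))
```

A term with no known substring is returned whole, so it can still be looked up directly in the dictionary.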
70. Decomposition - step 2
Generate all possible concatenations of the minimal components:
  {non, cyto, toxic} → {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}
⇒ Increases the chances of matching the components with entries of the dictionaries
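One way to enumerate these concatenations is to decide, for each boundary between two adjacent components, whether to join them or keep them apart. The sketch below is an illustration under that reading of the slide; the function name is hypothetical.

```python
from itertools import product


def concatenations(components):
    """Enumerate every way of joining adjacent components.

    For n components there are n - 1 boundaries, hence 2 ** (n - 1) results.
    """
    if not components:
        return []
    results = []
    for joins in product([False, True], repeat=len(components) - 1):
        groups = [components[0]]
        for component, join in zip(components[1:], joins):
            if join:
                groups[-1] += component   # glue to the previous group
            else:
                groups.append(component)  # start a new group
        results.append(groups)
    return results


for decomposition in concatenations(["non", "cyto", "toxic"]):
    print(decomposition)
```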
74. Translation through direct dictionary look-up
Bilingual dictionary for lexical items:
  toxic → toxique
Morpheme translation table for bound morphemes:
  -cyto- → -cyto-, cellule
{-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}
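The look-up can be sketched as a Cartesian product over the translations of each component. The two tables below contain only the entries shown on the slide, and dropping a decomposition when one component has no translation is an assumption of this sketch.

```python
from itertools import product

# Entries taken from the slide; real resources would be much larger.
DICTIONARY = {"toxic": ["toxique"]}
MORPHEME_TABLE = {"-cyto-": ["-cyto-", "cellule"]}


def translate(components):
    """Return every combination of component translations."""
    options = []
    for component in components:
        translations = MORPHEME_TABLE.get(component) or DICTIONARY.get(component)
        if not translations:
            # Assumption: an untranslatable component rules out this decomposition.
            return []
        options.append(translations)
    return [list(combination) for combination in product(*options)]


print(translate(["-cyto-", "toxic"]))
```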
78. Translation with variation
Morphological lexicon
  toxique → toxicité 'toxicity'
Synonyms
  toxique → vénéneux 'poisonous'
{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
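Variation can be layered on top of the direct translations by substituting each target component with its morphological relatives and synonyms. In this sketch the two lexicons hold only the slide's examples, and keeping the direct translation alongside its variants is an assumption.

```python
from itertools import product

# Entries taken from the slide; the full lexicons are not given.
MORPHOLOGICAL_LEXICON = {"toxique": ["toxicité"]}
SYNONYMS = {"toxique": ["vénéneux"]}


def variants(word):
    """Morphological relatives and synonyms of a target word."""
    return MORPHOLOGICAL_LEXICON.get(word, []) + SYNONYMS.get(word, [])


def expand(translation):
    """Return the translation plus every combination of component variants."""
    options = [[word] + variants(word) for word in translation]
    return [list(combination) for combination in product(*options)]


for candidate in expand(["cellule", "toxique"]):
    print(candidate)
```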
83. Recomposition - step 1
Permute the target components:
  {-cyto-, toxique} → {-cyto-, toxique}, {toxique, -cyto-}
Recreate target words by generating all possible concatenations of the components:
  {-cyto-, toxique} → {cytotoxique}, {cyto toxique}
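A sketch of this recomposition step, assuming that each boundary between two components is either glued or written as a space, and that the hyphens in "-cyto-" are only a notation for bound morphemes. The function name is hypothetical.

```python
from itertools import permutations, product


def recompose(components):
    """Return every ordering of the components, glued or space-separated."""
    forms = [component.strip("-") for component in components]
    if not forms:
        return set()
    candidates = set()
    for order in permutations(forms):
        for separators in product(["", " "], repeat=len(order) - 1):
            text = order[0]
            for separator, form in zip(separators, order[1:]):
                text += separator + form
            candidates.add(text)
    return candidates


print(sorted(recompose(["-cyto-", "toxique"])))
```

The number of candidates grows factorially with the number of components, which stays manageable here because terms have only a handful of components.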
86. Recomposition - step 2
Filter out impossible target terms
  e.g.: “cyto” is a bound morpheme and cannot occur as an autonomous item
  {cyto toxique}, {cytotoxique} → {cytotoxique}
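The filter can be sketched as a check that no space-delimited word of a candidate is a bound morpheme. The set of bound morphemes below is a one-entry stand-in for the real morpheme list.

```python
# Hypothetical list of bound morphemes; only "cyto" comes from the slide.
BOUND_MORPHEMES = {"cyto"}


def is_possible(candidate):
    """A candidate is impossible if a bound morpheme stands alone as a word."""
    return not any(word in BOUND_MORPHEMES for word in candidate.split())


def filter_candidates(candidates):
    """Keep only the candidates that pass the bound-morpheme check."""
    return [candidate for candidate in candidates if is_possible(candidate)]


print(filter_candidates(["cyto toxique", "cytotoxique"]))
```

Ill-formed glued forms such as "toxiquecyto" pass this filter; they are expected to be discarded at the selection step because they do not occur in the target corpus.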
90. Selection
Match target terms with the words of the target corpus
Allow at most 3 stop words between two words
{toxique cellule} → “toxique pour les cellules” ’toxic to the cells’
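The matching rule can be sketched as below. This is an illustration under stated assumptions: the corpus is available as parallel lists of surface forms and lemmas (the example matches “cellule” against “cellules”, which suggests lemma-level comparison), and the stop-word list is a toy one. The function name `find_matches` is hypothetical.

```python
def find_matches(components, surface, lemmas, stop_words, max_gap=3):
    """Return corpus spans in which the components occur in order, with at
    most max_gap stop words (and nothing else) between consecutive ones."""
    matches = []
    for start in range(len(lemmas)):
        if lemmas[start] != components[0]:
            continue
        pos = start
        matched = True
        for comp in components[1:]:
            pos += 1
            gap = 0
            # Skip up to max_gap stop words before the next component.
            while (pos < len(lemmas) and lemmas[pos] != comp
                   and lemmas[pos] in stop_words and gap < max_gap):
                pos += 1
                gap += 1
            if pos >= len(lemmas) or lemmas[pos] != comp:
                matched = False
                break
        if matched:
            matches.append(" ".join(surface[start:pos + 1]))
    return matches


# Toy corpus and stop-word list, made up for the example.
surface = "un agent toxique pour les cellules".split()
lemmas = "un agent toxique pour le cellule".split()
stop_words = {"un", "pour", "le", "de"}
print(find_matches(["toxique", "cellule"], surface, lemmas, stop_words))
# ['toxique pour les cellules']
```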
100. Corpora
English, French, German
breast cancer
approx. 400k words per language
1/2 scientific papers + 1/2 lay science
POS-tagged with the Xelda software (http://www.temis.com)
Comparability [Bo and Gaussier, 2010]: unrelated 0 ⇔ 1 perfectly comparable
English-French: 0.71
English-German: 0.45
103. Source terms
Morphologically constructed words collected from the English texts
None of them has a translation in the general language dictionary that is attested in the target texts
English to French: 1839 source terms
English to German: 1824 source terms
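The selection criterion for source terms can be read as a simple filter. The sketch below assumes the dictionary is a mapping from a source word to its list of translations and that the target texts are reduced to a vocabulary set; the names and the toy data are illustrative only.

```python
def select_source_terms(candidates, dictionary, target_vocabulary):
    """Keep the candidates for which no dictionary translation is attested
    in the target texts (including candidates absent from the dictionary)."""
    selected = []
    for term in candidates:
        translations = dictionary.get(term, [])
        if not any(t in target_vocabulary for t in translations):
            selected.append(term)
    return selected


# Toy data, made up for the example.
dictionary = {"treatment": ["traitement"], "breakthrough": ["percée"]}
target_vocabulary = {"traitement", "cytotoxique"}
print(select_source_terms(["treatment", "breakthrough", "cytotoxic"],
                          dictionary, target_vocabulary))
# ['breakthrough', 'cytotoxic']
```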
109. Resources for translation
General language dictionary (Xelda)
Domain-specific dictionary: cognates extracted from the corpus [Hauer and Kondrak, 2011]
Morpheme translation table (hand-crafted)
Synonyms (Xelda)
Morphological families [Porter, 1980]
110. Evaluation measures I
Coverage
\[ C = \frac{\sum_{i=1}^{|ST|} \sigma(ST_i)}{|ST|} \qquad \sigma(ST_i) = \begin{cases} 1 & \text{if } |Trans(ST_i)| \geq 1 \\ 0 & \text{else} \end{cases} \]
⇒ % of source terms with at least 1 translation (regardless of its accuracy)
111. Evaluation measures II
Precision
\[ P = \frac{|Exact|}{|Trans|} \]
⇒ % of generated translations which are exact translations
Overall quality
\[ OQ = C \times P \]
⇒ trade-off between precision and coverage
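The three measures can be computed as in the sketch below. The data layout is an assumption: `generated` maps each source term to its list of generated translations and `exact` maps a source term to the set of translations judged exact. The translation pairs are borrowed from the example slides of this deck, while "untranslated-term" is a made-up placeholder for a term with no output.

```python
def evaluate(generated, exact):
    """Return (coverage, precision, overall quality) as defined above."""
    n_terms = len(generated)
    # sigma(ST_i) = 1 when the term has at least one generated translation.
    covered = sum(1 for trans in generated.values() if len(trans) >= 1)
    n_trans = sum(len(trans) for trans in generated.values())
    n_exact = sum(
        1
        for term, trans in generated.items()
        for t in trans
        if t in exact.get(term, set())
    )
    coverage = covered / n_terms if n_terms else 0.0
    precision = n_exact / n_trans if n_trans else 0.0
    return coverage, precision, coverage * precision


generated = {
    "pathophysiological": ["physiopathologique"],
    "cardiotoxicity": ["toxicité cardiaque"],
    "in-patient": ["pas malade"],
    "immunoscore": ["immunomarquer"],
    "untranslated-term": [],  # hypothetical term with no translation
}
exact = {
    "pathophysiological": {"physiopathologique"},
    "cardiotoxicity": {"toxicité cardiaque"},
}
c, p, oq = evaluate(generated, exact)
print(f"C={c:.2f} P={p:.2f} OQ={oq:.2f}")  # C=0.80 P=0.50 OQ=0.40
```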
114. Experiments
Combination of linguistic resources
Quality of the lexicon with and without the fertile translations
115. Results: English → French
                      C            P            OQ
                   -f   +f      -f   +f      -f   +f
Gen.+Morph.       .04  .12     .81  .57     .03  .07
Gen.+Morph.+S     .05  .15     .69  .50     .03  .08
Gen.+Morph.+M     .11  .23     .20  .28     .02  .06
Gen.+Morph.+D     .16  .26     .70  .60     .11  .16
Gen.+Morph.+SMD   .24  .39     .31  .33     .07  .13
avg. gain           +11          -8.6         +4.8
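The "avg. gain" row appears to be the mean difference between the +f and -f columns, expressed in percentage points. The sketch below recomputes it from the English → French figures; the row labels reflect one reading of the flattened table (the five resource combinations in the order listed), which is an assumption.

```python
# (-f, +f) pairs per measure, copied from the English -> French table.
rows = {
    "Gen.+Morph.":     {"C": (0.04, 0.12), "P": (0.81, 0.57), "OQ": (0.03, 0.07)},
    "Gen.+Morph.+S":   {"C": (0.05, 0.15), "P": (0.69, 0.50), "OQ": (0.03, 0.08)},
    "Gen.+Morph.+M":   {"C": (0.11, 0.23), "P": (0.20, 0.28), "OQ": (0.02, 0.06)},
    "Gen.+Morph.+D":   {"C": (0.16, 0.26), "P": (0.70, 0.60), "OQ": (0.11, 0.16)},
    "Gen.+Morph.+SMD": {"C": (0.24, 0.39), "P": (0.31, 0.33), "OQ": (0.07, 0.13)},
}


def average_gain(rows, measure):
    """Mean of (+f minus -f) over all rows, in percentage points."""
    gains = [plus_f - minus_f
             for minus_f, plus_f in (row[measure] for row in rows.values())]
    return 100 * sum(gains) / len(gains)


for measure in ("C", "P", "OQ"):
    print(measure, round(average_gain(rows, measure), 1))
# C 11.0
# P -8.6
# OQ 4.8
```

The recomputed values agree with the reported +11, -8.6 and +4.8, which supports this reading of the table layout.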
116. Results: English → German
                      C            P            OQ
                   -f   +f      -f   +f      -f   +f
Gen.+Morph.       .06  .13     .80  .35     .05  .05
Gen.+Morph.+S     .08  .16     .69  .31     .05  .05
Gen.+Morph.+M     .12  .22     .40  .23     .05  .05
Gen.+Morph.+D     .17  .26     .65  .39     .11  .10
Gen.+Morph.+SMD   .24  .36     .43  .27     .10  .10
avg. gain           +9.2         -28.4        -0.2
121. Discussion: English-French vs. English-German results
English-German corpus is much less comparable (0.45 vs. 0.71)
Morphological types:
German, a Germanic language: tendency to agglutination
oestrogen-independent → Östrogen-unabhängige
French, a Romance language: creates phrases more easily
oestrogen-independent → indépendant des œstrogènes
126. Error analysis
Problems in word reordering
self-examination → untersuchung selbst ’examination self’
Wrong or inappropriate translations
in-patient → pas malade ’not ill’
in → “inside” → inside patient
in → “inverse” → not a patient
133. Future work
Improve quality of linguistic resources
morphological derivation rules instead of stemming
use of a thesaurus
Try translation patterns instead of permutations
Rank translations
134. Thank you for your attention.
estelle.delpech@univ-nantes.fr
beatrice.daille@univ-nantes.fr
emmanuel.morin@univ-nantes.fr
cl@lingua-et-machina.com
136. Exact translations
Non-fertile:
pathophysiological → physiopathologique
overactive → überaktiv
Fertile:
cardiotoxicity → toxicité cardiaque ’cardiac toxicity’
mastectomy → ablation der brust ’ablation of the breast’
137. Morphological variants
Non-fertile:
dosimetry → dosimétrique ’dosimetric’
radiosensitivity → strahlenempfindlich ’radiosensitive’
Fertile:
milk-producing → production de lait ’production of milk’
self-examination → selbst untersuchen ’self examine’
138. Inexact but semantically related
Non-fertile:
oncogene → oncogénèse ’oncogenesis’
breakthrough → durchbrechen ’break’
Fertile:
chemoradiotherapy → chemotherapie oder strahlen ’chemotherapy or radiation’
treatable → pouvoir le traiter ’can treat it’
139. Wrong translations
Non-fertile:
immunoscore → immunomarquer ’immunostain’
check-in → unkontrollieren ’uncontrolled’
Fertile:
bloodstream → fliessen mehr blut ’more blood flow’
risk-reducing → risque de réduire ’risk of reducing’
140. References I
Bo, L. and Gaussier, E. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009). Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Harastani, R., Daille, B., and Morin, E. (2012). Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985). Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
Morin, E. and Daille, B. (2009). Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands edition.
Namer, F. and Baud, R. (2007). Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.
141. References II
Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.
Weller, M., Gojun, A., Heid, U., Daille, B., and Harastani, R. (2011). Simple methods for dealing with term variation and term alignment. In Proceedings of the 9th International Conference on Terminology and Artificial Intelligence, pages 87–93, Paris, France.