Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Material presented at the Tenth Biennial Conference of the
Association for Machine Translation in the Americas (AMTA 2012), San Diego, CA.
Download paper at http://hal.archives-ouvertes.fr/hal-00730325.
Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, Gremuts

  1. 1. Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach. Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2,3). (1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina. AMTA'12, 10/31/12, San Diego, CA.
  2. 2. Outline: (1) Context and original problem; (2) Compositional translation framework; (3) Detailed translation method; (4) Experiments and results; (5) Future work.
  3. 3. Section 1: Context and original problem
  7. 7. Context. Research partly funded by a Computer-Aided Translation company. Goal: generate domain-specific bilingual lexicons when no parallel data is available. Available data: a general-language bilingual dictionary and domain-specific comparable corpora.
  13. 13. Comparable corpora. Definition: a set of texts in languages L1 and L2 which are not translations of each other but deal with the same subject matter, so that translation pairs can still be extracted. Some difficulties: language in target texts is not influenced by source texts; mixed text types (technical, scientific, lay science...). ⇒ Do not expect parallelism in source ↔ target structures. ⇒ Need to deal with variation in translation.
  20. 20. Variation in translation. Morphological variation: anticancer (Noun) → anticancéreux (Adj) 'anticancerous' ⇒ use of morphological derivation rules / lexicons. Lexical variation: radiosensitivity → Radiotoleranz 'radiotolerance' (sensitivity ≈ tolerance) ⇒ use of synonyms, thesaurus. Fertility: bi-dimensional → deux dimensions 'two dimensions' ⇒ scarcely addressed.
  24. 24. Fertility. Definition: the target term has more words than the source term. Semantic fertility: the target term has more morphemes than the source term, e.g. voie de glace 'route of ice' → ice climbing route; aquarelle (not decomposable) → water color. Surface fertility: target and source terms have the same number of morphemes, e.g. bi-dimensional → deux dimensions 'two dimensions'.
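The word-count definition of fertility can be checked mechanically. The following is a minimal sketch, assuming that terms are tokenized on whitespace and that a hyphenated form counts as a single word; both choices are assumptions, and `word_count` and `is_fertile` are hypothetical helper names, not part of the presented system.

```python
def word_count(term):
    """Count whitespace-separated words (assumption: hyphens do not split words)."""
    return len(term.split())


def is_fertile(source_term, target_term):
    """A translation is fertile when the target term has more words than the source term."""
    return word_count(target_term) > word_count(source_term)


print(is_fertile("bi-dimensional", "deux dimensions"))
print(is_fertile("cytotoxic", "cytotoxique"))
```

Counting morphemes rather than words, as needed to separate semantic from surface fertility, would require a morphological analyzer and is not attempted here.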
  25. 25. Section 2: Compositional translation framework
  27. 27. Compositional translation. Principle of compositionality: “the meaning of the whole is a function of the meaning of the parts” [Keenan and Faltz, 1985, 24-25]. Definition of compositional translation: the translation of the whole is a function of the translation of the parts.
  32. 32. Compositional translation process. (1) Decomposition: “cytotoxic” → {cyto, toxic}. (2) Translation: {cyto, toxic} → {cyto, toxique}. (3) Recomposition: {cyto, toxique} → {cytotoxique, toxiquecyto}. (4) Selection: {cytotoxique, toxiquecyto} → “cytotoxique”.
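The four steps can be chained on the slide's own example. This is a toy sketch under stated assumptions: the morpheme list, the translation table and the target vocabulary are invented for illustration, and every function name is hypothetical. It only reproduces the cytotoxic example and is not the authors' implementation.

```python
from itertools import permutations

# Toy resources, invented for illustration only.
MORPHEMES = ["cyto"]                                   # known bound morphemes
TRANSLATIONS = {"cyto": ["cyto"], "toxic": ["toxique"]}
TARGET_VOCABULARY = {"cytotoxique"}                    # words attested in the target corpus


def decompose(term):
    """Step 1: split off a known morpheme found at the start of the term."""
    for morpheme in MORPHEMES:
        if term.startswith(morpheme) and len(term) > len(morpheme):
            return [morpheme, term[len(morpheme):]]
    return [term]


def translate(components):
    """Step 2: translate each component (only the first translation is kept here)."""
    return [TRANSLATIONS[component][0] for component in components]


def recompose(components):
    """Step 3: concatenate the translated components in every possible order."""
    return ["".join(ordering) for ordering in permutations(components)]


def select(candidates):
    """Step 4: keep only the candidates attested in the target vocabulary."""
    return [candidate for candidate in candidates if candidate in TARGET_VOCABULARY]


def compositional_translation(term):
    return select(recompose(translate(decompose(term))))


print(compositional_translation("cytotoxic"))
```

Keeping each step as a separate function mirrors the slide's decomposition and makes it easy to swap in richer resources for any single step.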
  35. 35. Relevance of compositional translation. More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]. Outperforms the distributional approach for the translation of terms with compositional meaning [Morin and Daille, 2009].
  46. 46. Related work: single-word terms translation. [Cartoni, 2009]: prefixed word → prefixed word, e.g. ri+organizzare → ré+organiser 'reorganize'. [Harastani et al., 2012]: neoclassical compound → neoclassical compound, e.g. Kalori+metrie → calori+métrie 'calorimetry'. [Weller et al., 2011]: noun compound → noun phrase, e.g. Elektronen+mikroskop → electron microscope. ⇒ Restricted to a small set of source-to-target structures. ⇒ Fertility handled in the specific case of noun compounds.
  50. 50. Contribution I. Addressing fertility by allowing translation equivalences from bound morpheme to autonomous lexical item: cyto → cellule 'cell'; cytotoxic → toxique (pour les) cellules 'toxic to the cells'.
  54. 54. Contribution II. Larger variety of input/output structures. SOURCE: prefixed word, neoclassical compound, suffixed word, compound, any combination ⇒ TARGET: prefixed word, neoclassical compound, suffixed word, compound, any combination, phrase.
  55. 55. Section 3: Detailed translation method
  60. 60. Overview. (1) Decomposition: lexicons + heuristic rules. (2) Translation: dictionary look-up. (3) Recomposition: permutations. (4) Selection: search occurrences in target texts.
  66. 66. Decomposition - step 1. Split the source term into minimal components with heuristic rules: split on hyphens; match substrings of the source term with a list of morphemes (prefixes, confixes, suffixes) and a list of lexical items; respect some length constraints on the substrings. Example: non-cytotoxic → {non, cyto, toxic}.
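The splitting heuristics can be sketched as follows. This is an illustration under assumptions: the two word lists are toy data, the minimum length of 3 characters stands in for the unspecified length constraints, and the recursive left-to-right matching is one possible strategy among others. The names `segment` and `minimal_components` are hypothetical.

```python
# Toy resources and threshold, invented for illustration only.
MORPHEMES = {"non", "cyto"}       # prefixes, confixes, suffixes
LEXICAL_ITEMS = {"toxic"}         # autonomous words
KNOWN_UNITS = MORPHEMES | LEXICAL_ITEMS
MIN_LENGTH = 3                    # assumed length constraint on substrings


def segment(part):
    """Return one segmentation of `part` into known units, or None if there is none."""
    if not part:
        return []
    for end in range(MIN_LENGTH, len(part) + 1):
        head = part[:end]
        if head in KNOWN_UNITS:
            rest = segment(part[end:])
            if rest is not None:
                return [head] + rest
    return None


def minimal_components(term):
    """Split on hyphens, then segment each part; unknown parts are kept whole."""
    components = []
    for part in term.split("-"):
        if not part:
            continue
        components.extend(segment(part) or [part])
    return components


print(minimal_components("non-cytotoxic"))
```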
  70. 70. Decomposition - step 2. Generate all possible concatenations of the minimal components: {non, cyto, toxic} → {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}. ⇒ Increases the chances of matching the components with entries of the dictionaries.
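Generating every concatenation amounts to deciding, at each boundary between adjacent components, whether to merge or to keep the split, which gives 2^(n-1) groupings for n components. A minimal sketch of that enumeration (the function name `concatenations` is hypothetical):

```python
from itertools import product


def concatenations(components):
    """Enumerate every way of merging adjacent components, preserving their order."""
    if not components:
        return []
    groupings = []
    # One boolean per boundary: True merges the next component into the previous one.
    for merges in product([False, True], repeat=len(components) - 1):
        grouping = [components[0]]
        for component, merge in zip(components[1:], merges):
            if merge:
                grouping[-1] += component
            else:
                grouping.append(component)
        groupings.append(grouping)
    return groupings


for grouping in concatenations(["non", "cyto", "toxic"]):
    print(grouping)
```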
  74. 74. Translation through direct dictionary look-up. Bilingual dictionary for lexical items: toxic → toxique. Morpheme translation table for bound morphemes: -cyto- → -cyto-, cellule. Result: {-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}.
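The look-up step can be read as a Cartesian product over the translations of each component. The sketch below uses toy tables containing only the slide's entries; consulting the morpheme table before the bilingual dictionary, and dropping a decomposition as soon as one component has no translation, are assumptions. `translate_components` is a hypothetical name.

```python
from itertools import product

# Toy resources containing only the entries shown on the slide.
BILINGUAL_DICTIONARY = {"toxic": ["toxique"]}
MORPHEME_TABLE = {"-cyto-": ["-cyto-", "cellule"]}


def translate_components(components):
    """Return every combination of component translations, or [] if one is missing."""
    options = []
    for component in components:
        translations = MORPHEME_TABLE.get(component) or BILINGUAL_DICTIONARY.get(component)
        if not translations:
            return []  # assumption: an untranslatable component discards the decomposition
        options.append(translations)
    return [list(choice) for choice in product(*options)]


print(translate_components(["-cyto-", "toxic"]))
```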
  78. 78. Translation with variation. Morphological lexicon: toxique → toxicité 'toxicity'. Synonyms: toxique → vénéneux 'poisonous'. Result: {-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}.
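Variation can be layered on top of the direct translations by substituting each translated component with its morphological relatives and synonyms. A minimal sketch with toy lexicons holding only the slide's entries; whether the system keeps the direct translations alongside the variants is not stated in the slides, so this function returns the variant sets only. All names are hypothetical.

```python
from itertools import product

# Toy resources containing only the entries shown on the slide.
MORPHOLOGICAL_LEXICON = {"toxique": ["toxicité"]}
SYNONYMS = {"toxique": ["vénéneux"]}


def variants(word):
    """Morphologically related words followed by synonyms of `word`."""
    return MORPHOLOGICAL_LEXICON.get(word, []) + SYNONYMS.get(word, [])


def expand_with_variation(translated_sets):
    """Replace each component by its variants; components without variants stay as-is."""
    expanded = []
    for components in translated_sets:
        options = [variants(word) or [word] for word in components]
        for choice in product(*options):
            candidate = list(choice)
            # Skip sets identical to the input (no component had a variant).
            if candidate != components and candidate not in expanded:
                expanded.append(candidate)
    return expanded


direct = [["-cyto-", "toxique"], ["cellule", "toxique"]]
for candidate in expand_with_variation(direct):
    print(candidate)
```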
  83. 83. Recomposition - step 1. Permute the target components: {-cyto-, toxique} → {-cyto-, toxique}, {toxique, -cyto-}. Recreate target words by generating all possible concatenations of the components: {-cyto-, toxique} → {cytotoxique}, {cyto toxique}.
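Both operations of this step can be combined in one enumeration: every ordering of the components, and for every ordering each choice of gluing adjacent components together or separating them with a space. A minimal sketch, assuming the hyphens around bound morphemes are notation only and are stripped before joining; `recompose` is a hypothetical name.

```python
from itertools import permutations, product


def recompose(components):
    """Generate candidate target terms from every ordering and every joining choice."""
    if not components:
        return []
    candidates = []
    for ordering in permutations(components):
        # Assumption: hyphens mark bound morphemes and are not part of the surface form.
        parts = [component.strip("-") for component in ordering]
        for separators in product(["", " "], repeat=len(parts) - 1):
            candidate = parts[0]
            for separator, part in zip(separators, parts[1:]):
                candidate += separator + part
            if candidate not in candidates:
                candidates.append(candidate)
    return candidates


print(recompose(["-cyto-", "toxique"]))
```

The number of candidates grows factorially with the number of components, which is acceptable for terms made of two or three components but would need pruning for longer ones.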
  86. 86. Recomposition - step 2. Filter out impossible target terms, e.g. “cyto” is a bound morpheme and cannot occur as an autonomous item: {cyto toxique}, {cytotoxique} → {cytotoxique}.
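The filter can be expressed as a check that no space-separated word of a candidate is a bound morpheme. A minimal sketch with a one-entry toy list; the function names are hypothetical.

```python
# Toy list of bound morphemes, invented for illustration only.
BOUND_MORPHEMES = {"cyto"}


def is_possible(candidate):
    """A candidate is impossible if a bound morpheme stands alone as a word."""
    return not any(word in BOUND_MORPHEMES for word in candidate.split())


def filter_candidates(candidates):
    """Keep only the candidates in which every word can occur autonomously."""
    return [candidate for candidate in candidates if is_possible(candidate)]


print(filter_candidates(["cyto toxique", "cytotoxique"]))
```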
  90. Selection
      Match each target term against the words of the target corpus
      Allow at most 3 stop words between two words
        {toxique cellule} → "toxique pour les cellules" 'toxic to the cells'
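A minimal sketch of the selection step, under several assumptions: the corpus is available as (surface form, lemma) pairs, candidate components are compared to lemmas (which would explain why "cellule" matches "cellules"), and the stop-word list is a toy one. None of these names come from the paper.

```python
# Toy stop-word list (assumption, for illustration only).
STOP_WORDS = {"pour", "les", "le", "la", "de", "des"}

def find_in_corpus(candidate, corpus, max_gap=3):
    """Return the first corpus span matching the candidate, or None.

    candidate: list of lemmas, e.g. ["toxique", "cellule"]
    corpus:    list of (surface, lemma) pairs
    max_gap:   maximum number of stop words allowed between two words
    """
    if not candidate:
        return None
    for start in range(len(corpus)):
        if corpus[start][1] != candidate[0]:
            continue
        pos = start
        matched = True
        for target in candidate[1:]:
            pos += 1
            gap = 0
            # Skip up to max_gap stop words before the next component.
            while (pos < len(corpus) and gap < max_gap
                   and corpus[pos][0].lower() in STOP_WORDS):
                pos += 1
                gap += 1
            if pos >= len(corpus) or corpus[pos][1] != target:
                matched = False
                break
        if matched:
            return " ".join(surface for surface, _ in corpus[start:pos + 1])
    return None

corpus = [("toxique", "toxique"), ("pour", "pour"),
          ("les", "le"), ("cellules", "cellule")]
print(find_in_corpus(["toxique", "cellule"], corpus))
# toxique pour les cellules
```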
  100. Corpora
       Languages: English, French, German
       Domain: breast cancer
       Size: approx. 400k words per language
       Composition: 1/2 scientific papers + 1/2 lay science
       POS-tagged with the Xelda software (http://www.temis.com)
       Comparability [Bo and Gaussier, 2010], from 0 (unrelated) to 1 (perfectly comparable):
         English-French: 0.71
         English-German: 0.45
  103. Source terms
       Morphologically constructed words collected from the English texts
       None of them has a translation in the general-language dictionary that is attested in the target texts
       English to French: 1839 source terms
       English to German: 1824 source terms
  109. Resources for translation
       General language dictionary (Xelda)
       Domain-specific dictionary: cognates extracted from the corpus [Hauer and Kondrak, 2011]
       Morpheme translation table (hand-crafted)
       Synonyms (Xelda)
       Morphological families [Porter, 1980]
  110. Evaluation measures I
       Coverage:
       $$C = \frac{\sum_{i=1}^{|ST|} \sigma(ST_i)}{|ST|}, \qquad \sigma(ST_i) = \begin{cases} 1 & \text{if } |Trans(ST_i)| \geq 1 \\ 0 & \text{else} \end{cases}$$
       ⇒ % of source terms with at least 1 translation (regardless of its accuracy)
  111. Evaluation measures II
       Precision:
       $$P = \frac{|Exact|}{|Trans|}$$
       ⇒ % of generated translations which are exact translations
       Overall quality:
       $$OQ = C \times P$$
       ⇒ trade-off between precision and coverage
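The three measures can be sketched on toy data as follows. The data structures are assumptions: `translations` maps each source term to its list of generated translations, and `exact` is the set of (source, translation) pairs judged exact. Precision is read here as the share of all generated translations that are exact; the example entries are illustrative, not results from the paper.

```python
def coverage(translations):
    """C: share of source terms with at least one generated translation."""
    if not translations:
        return 0.0
    covered = sum(1 for ts in translations.values() if len(ts) >= 1)
    return covered / len(translations)

def precision(translations, exact):
    """P: share of generated translations that were judged exact."""
    generated = [(s, t) for s, ts in translations.items() for t in ts]
    if not generated:
        return 0.0
    return sum(1 for pair in generated if pair in exact) / len(generated)

def overall_quality(translations, exact):
    """OQ = C x P."""
    return coverage(translations) * precision(translations, exact)

# Toy data (hypothetical): 4 source terms, 4 generated translations.
translations = {
    "cytotoxic": ["cytotoxique"],
    "cardiotoxicity": ["toxicité cardiaque", "cardiaque toxicité"],
    "immunoscore": [],
    "in-patient": ["pas malade"],
}
exact = {("cytotoxic", "cytotoxique"),
         ("cardiotoxicity", "toxicité cardiaque")}

print(coverage(translations))                # 0.75
print(precision(translations, exact))        # 0.5
print(overall_quality(translations, exact))  # 0.375
```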
  114. Experiments
       Combination of linguistic resources
       Quality of the lexicon with and without the fertile translations
  115. Results: English → French

                              C            P            OQ
                           -f   +f      -f   +f      -f   +f
       Gen.+Morph.        .04  .12     .81  .57     .03  .07
       Gen.+Morph.+S      .05  .15     .69  .50     .03  .08
       Gen.+Morph.+M      .11  .23     .20  .28     .02  .06
       Gen.+Morph.+D      .16  .26     .70  .60     .11  .16
       Gen.+Morph.+SMD    .24  .39     .31  .33     .07  .13
       avg. gain             +11         -8.6         +4.8
  116. Results: English → German

                              C            P            OQ
                           -f   +f      -f   +f      -f   +f
       Gen.+Morph.        .06  .13     .80  .35     .05  .05
       Gen.+Morph.+S      .08  .16     .69  .31     .05  .05
       Gen.+Morph.+M      .12  .22     .40  .23     .05  .05
       Gen.+Morph.+D      .17  .26     .65  .39     .11  .10
       Gen.+Morph.+SMD    .24  .36     .43  .27     .10  .10
       avg. gain             +9.2        -28.4        -0.2
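The "avg. gain" rows appear to be the mean difference between the +f and -f columns over the five resource configurations, expressed in points (hundredths). This reading is an assumption, but a short check on the figures reported in the two results slides reproduces every value. Scores are stored as integers in hundredths to avoid floating-point noise.

```python
def avg_gain(pairs):
    """Mean of (+f minus -f) over all configurations, in points."""
    diffs = [with_f - without_f for without_f, with_f in pairs]
    return sum(diffs) / len(diffs)

# (-f, +f) pairs in hundredths, copied from the results slides.
en_fr = {
    "C":  [(4, 12), (5, 15), (11, 23), (16, 26), (24, 39)],
    "P":  [(81, 57), (69, 50), (20, 28), (70, 60), (31, 33)],
    "OQ": [(3, 7), (3, 8), (2, 6), (11, 16), (7, 13)],
}
en_de = {
    "C":  [(6, 13), (8, 16), (12, 22), (17, 26), (24, 36)],
    "P":  [(80, 35), (69, 31), (40, 23), (65, 39), (43, 27)],
    "OQ": [(5, 5), (5, 5), (5, 5), (11, 10), (10, 10)],
}

for name, table in (("English-French", en_fr), ("English-German", en_de)):
    gains = {measure: avg_gain(pairs) for measure, pairs in table.items()}
    print(name, gains)
# English-French {'C': 11.0, 'P': -8.6, 'OQ': 4.8}
# English-German {'C': 9.2, 'P': -28.4, 'OQ': -0.2}
```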
  121. Discussion: English-French vs. English-German results
       The English-German corpus is much less comparable (0.45 vs. 0.71)
       Morphological types:
         German, a Germanic language: tendency toward agglutination
           oestrogen-independent → Östrogen-unabhängige
         French, a Romance language: creates phrases more easily
           oestrogen-independent → indépendant des œstrogènes
  126. Error analysis
       Problems in word reordering
         self-examination → untersuchung selbst 'examination self'
       Wrong or inappropriate translations
         in-patient → pas malade 'not ill'
         in → "inside" → inside patient
         in → "inverse" → not a patient
  133. Future work
       Improve the quality of linguistic resources
         morphological derivation rules instead of stemming
         use of a thesaurus
       Try translation patterns instead of permutations
       Rank translations
  134. Thank you for your attention.
       estelle.delpech@univ-nantes.fr
       beatrice.daille@univ-nantes.fr
       emmanuel.morin@univ-nantes.fr
       cl@lingua-et-machina.com
  135. ADDITIONAL SLIDES
  136. Exact translations
       Non-fertile:
         pathophysiological → physiopathologique
         overactive → überaktiv
       Fertile:
         cardiotoxicity → toxicité cardiaque 'cardiac toxicity'
         mastectomy → ablation der brust 'ablation of the breast'
  137. Morphological variants
       Non-fertile:
         dosimetry → dosimétrique 'dosimetric'
         radiosensitivity → strahlenempfindlich 'radiosensitive'
       Fertile:
         milk-producing → production de lait 'production of milk'
         selfexamination → selbst untersuchen 'self examine'
  138. Inexact but semantically related
       Non-fertile:
         oncogene → oncogénèse 'oncogenesis'
         breakthrough → durchbrechen 'break'
       Fertile:
         chemoradiotherapy → chemotherapie oder strahlen 'chemotherapy or radiation'
         treatable → pouvoir le traiter 'can treat it'
  139. Wrong translations
       Non-fertile:
         immunoscore → immunomarquer 'immunostain'
         check-in → unkontrollieren 'uncontrolled'
       Fertile:
         bloodstream → fliessen mehr blut 'more blood flow'
         risk-reducing → risque de réduire 'risk of reducing'
  140. References I
       Bo, L. and Gaussier, E. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
       Cartoni, B. (2009). Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
       Harastani, R., Daille, B., and Morin, E. (2012). Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
       Hauer, B. and Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
       Keenan, E. L. and Faltz, L. M. (1985). Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
       Morin, E. and Daille, B. (2009). Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands edition.
       Namer, F. and Baud, R. (2007). Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.
  141. References II
       Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.
       Weller, M., Gojun, A., Heid, U., Daille, B., and Harastani, R. (2011). Simple methods for dealing with term variation and term alignment. In Proceedings of the 9th International Conference on Terminology and Artificial Intelligence, pages 87–93, Paris, France.
