Identification of Fertile Translations in Comparable Corpora
a Morpho-Compositional Approach

Estelle Delpech¹, Béatrice Daille¹, Emmanuel Morin¹, Claire Lemaire²,³
¹ LINA, Université de Nantes
² GREMUTS, Université de Grenoble
³ Lingua et Machina

AMTA’12
10/31/12
San Diego, CA
Outline
1 Context and original problem
2 Compositional translation framework
3 Detailed translation method
4 Experiments and results
5 Future work

1 Context and original problem
Context

Research partly funded by a computer-aided translation company
Goal: generate domain-specific bilingual lexicons when no
parallel data is available
Available data:
general language bilingual dictionary
domain-specific comparable corpora

1 / 28
Comparable corpora
Definition of comparable corpora
A set of texts in languages L1 and L2 that are not translations of each other
but deal with the same subject matter, so that translation pairs can still be
extracted
Some difficulties:
language in target texts is not influenced by source texts
mixed text types: technical, scientific, lay science...

⇒ do not expect parallelism in source ↔ target structures
⇒ need to deal with variation in translation

2 / 28
Variation in translation

Morphological variation:
anticancer (Noun) → anticancéreux (Adj) ’anticancerous’
⇒ use of morphological derivation rules / lexicons

Lexical variation:
radiosensitivity → Radiotoleranz ’radiotolerance’
sensitivity ≈ tolerance
⇒ use of synonyms, thesaurus

Fertility:
bi-dimensional → deux dimensions ’two dimensions’
⇒ scarcely addressed

3 / 28
Fertility

Definition: the target term has more words than the source term
Semantic fertility: the target term has more morphemes than the source term
voie de glace ’route of ice’ → ice climbing route
aquarelle (not decomposable) → water color
Surface fertility: the target and source terms have the same number of morphemes
bi-dimensional → deux dimensions ’two dimensions’

4 / 28
2 Compositional translation framework
Compositional translation

Principle of compositionality
“the meaning of the whole is a function of the meaning of the
parts” [Keenan and Faltz, 1985, 24-25]
Definition of compositional translation
The translation of the whole is a function of the translation of the
parts

5 / 28
Compositional translation process

1 Decomposition: “cytotoxic” → {cyto, toxic}
2 Translation: {cyto, toxic} → {cyto, toxique}
3 Recomposition: {cyto, toxique} → {cytotoxique, toxiquecyto}
4 Selection: {cytotoxique, toxiquecyto} → “cytotoxique”

6 / 28
Relevance of compositional translation

More than 60% of terms in technical and scientific domains
are morphologically complex [Namer and Baud, 2007]
Outperforms the distributional approach for translating terms with
compositional meaning [Morin and Daille, 2009]

7 / 28
Related work: single-word terms translation
[Cartoni, 2009]
Prefixed word → prefixed word
ri+organizzare → ré+organiser ’reorganize’

[Harastani et al., 2012]
Neoclassical compound → neoclassical compound
Kalori+metrie → calori+métrie ’calorimetry’

[Weller et al., 2011]
Noun compound → noun phrase
Elektronen+mikroskop → electron microscope

⇒ Restricted to a small set of source-to-target structures
⇒ Fertility handled in the specific case of noun compounds
8 / 28
Contribution I

Addressing fertility by allowing translation equivalences from
bound morpheme to autonomous lexical item:
cyto → cellule ’cell’
cytotoxic → toxique (pour les) cellules ’toxic to the cells’

9 / 28
Contribution II

Larger variety of input/output structures:
SOURCE                        TARGET
prefixed word                 prefixed word
neoclassical compound         neoclassical compound
suffixed word         ⇒       suffixed word
compound                      compound
any combination               any combination
                              phrase

10 / 28
3 Detailed translation method
Overview

1 Decomposition: lexicons + heuristic rules
2 Translation: dictionary look-up
3 Recomposition: permutations
4 Selection: search for occurrences in the target texts

11 / 28
Decomposition - step 1

Split the source term into minimal components using heuristic rules:
split on hyphens
match substrings of the source term against:
a list of morphemes (prefixes, confixes, suffixes)
a list of lexical items
respect length constraints on the substrings
non-cytotoxic → {non, cyto, toxic}

12 / 28
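A minimal sketch of this first decomposition step in Python. The morpheme list (`MORPHEMES`), the lexicon (`LEXICON`) and the minimum substring length (`MIN_LEN`) are hypothetical stand-ins, since the slides give neither the actual resources nor the exact length constraints. Among the complete segmentations found, the sketch keeps the one with the most components, which is one possible reading of “minimal components”.

```python
from functools import lru_cache

# Hypothetical resources: the actual morpheme and lexical-item lists are not given.
MORPHEMES = {"anti", "bi", "cyto", "non", "radio"}        # prefixes / confixes
LEXICON = {"cancer", "dimensional", "sensitivity", "toxic"}
MIN_LEN = 2  # hypothetical length constraint on every matched substring


def segmentations(word: str) -> list[list[str]]:
    """Every way to cover `word` exactly with known morphemes or lexical items."""
    units = MORPHEMES | LEXICON

    @lru_cache(maxsize=None)
    def from_pos(i: int) -> tuple[tuple[str, ...], ...]:
        if i == len(word):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        for j in range(i + MIN_LEN, len(word) + 1):
            piece = word[i:j]
            if piece in units:
                results.extend((piece,) + rest for rest in from_pos(j))
        return tuple(results)

    return [list(seg) for seg in from_pos(0)]


def decompose(term: str) -> list[str]:
    """Split on hyphens, then keep the finest segmentation of each part."""
    components: list[str] = []
    for part in term.lower().split("-"):
        options = segmentations(part)
        if options:
            components.extend(max(options, key=len))
        else:
            components.append(part)  # unknown part: kept whole
    return components


print(decompose("non-cytotoxic"))  # ['non', 'cyto', 'toxic']
```

Memoising `from_pos` keeps the search linear in the number of distinct start positions, which matters once the morpheme list grows and many overlapping matches become possible.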
Decomposition - step 2

Generate all possible concatenations of adjacent minimal components:
{non, cyto, toxic} → {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic},
{noncytotoxic}
⇒ Increases the chances of matching the components with dictionary entries

13 / 28
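This step can be sketched as deciding, for each gap between two adjacent components, whether to glue them or keep the boundary, which yields 2^(n-1) groupings for n components. The function name `concatenations` is illustrative and not taken from the authors' system.

```python
from itertools import product


def concatenations(components: list[str]) -> list[list[str]]:
    """All groupings obtained by gluing adjacent components (2**(n-1) for n components)."""
    if not components:
        return [[]]
    groupings = []
    # One boolean per gap between neighbours: True = glue, False = keep the boundary.
    for glue in product([False, True], repeat=len(components) - 1):
        groups = [components[0]]
        for comp, joined in zip(components[1:], glue):
            if joined:
                groups[-1] += comp
            else:
                groups.append(comp)
        groupings.append(groups)
    return groupings


for grouping in concatenations(["non", "cyto", "toxic"]):
    print(grouping)
```

Run on {non, cyto, toxic}, this prints the four groupings listed on the slide; only adjacent components are merged, so a form such as “nontoxic” is never produced.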
Translation through direct dictionary look-up

Bilingual dictionary for lexical items:
toxic → toxique

Morpheme translation table for bound morphemes:
-cyto- → -cyto-, cellule

{-cyto-, toxic} → {-cyto-, toxique},
{cellule, toxique}

14 / 28
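Direct look-up can be sketched as a Cartesian product over the candidate translations of each component. `BILINGUAL_DICT` and `MORPHEME_TABLE` are hypothetical toy resources holding only the slide's entries, and discarding a term when one of its components has no translation is an assumed policy.

```python
from itertools import product

# Hypothetical toy resources mirroring the slide's examples.
BILINGUAL_DICT = {"toxic": ["toxique"]}              # general-language dictionary
MORPHEME_TABLE = {"-cyto-": ["-cyto-", "cellule"]}   # bound morpheme -> morpheme or word


def translate_components(components):
    """Look each component up and return every combination of its translations."""
    options = []
    for comp in components:
        candidates = MORPHEME_TABLE.get(comp) or BILINGUAL_DICT.get(comp)
        if not candidates:
            return []  # assumption: one untranslatable component blocks the whole term
        options.append(candidates)
    return [list(combo) for combo in product(*options)]


print(translate_components(["-cyto-", "toxic"]))
# [['-cyto-', 'toxique'], ['cellule', 'toxique']]
```

Allowing the morpheme table to map a bound morpheme to an autonomous word (“cellule”) is what lets later steps produce fertile, multi-word translations.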
Translation with variation

Morphological lexicon
toxique → toxicité ’toxicity’

Synonyms
toxique → vénéneux ’poisonous’

{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité},
{cellule, vénéneux}

15 / 28
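Variation can be sketched by expanding each direct translation with its morphological variants and synonyms before taking the product. All four dictionaries below are hypothetical toy data limited to the slide's examples; unlike the slide's list, the output also keeps the unexpanded translations.

```python
from itertools import product

# Hypothetical toy resources: the slides name the resource types, not their contents.
BILINGUAL_DICT = {"toxic": ["toxique"]}               # lexical items
MORPHEME_TABLE = {"-cyto-": ["-cyto-", "cellule"]}    # bound morphemes
MORPH_LEXICON = {"toxique": ["toxicité"]}             # morphological variants
SYNONYMS = {"toxique": ["vénéneux"]}                  # synonyms


def variants(word):
    """The word itself, followed by its morphological variants and synonyms."""
    return [word] + MORPH_LEXICON.get(word, []) + SYNONYMS.get(word, [])


def translate_with_variation(components):
    options = []
    for comp in components:
        direct = MORPHEME_TABLE.get(comp) or BILINGUAL_DICT.get(comp, [])
        expanded = []
        for word in direct:
            for v in variants(word):
                if v not in expanded:  # keep order, drop duplicates
                    expanded.append(v)
        if not expanded:
            return []  # assumption: an untranslatable component blocks the term
        options.append(expanded)
    return [list(combo) for combo in product(*options)]


for candidate in translate_with_variation(["-cyto-", "toxic"]):
    print(candidate)
```

The number of candidates grows multiplicatively with each variant source, which is why a corpus-based selection step is needed afterwards.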
Recomposition - step 1

Permute the target components:
{-cyto-, toxique} → {-cyto-, toxique}, {toxique, -cyto-}
Recreate target terms by generating all possible concatenations of the
components:
{-cyto-, toxique} → {cyto toxique}, {cytotoxique}

16 / 28
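A sketch of recomposition step 1, assuming that adjacent components are either glued together or separated by a single space, and that the hyphens marking bound morphemes are simply stripped; both conventions are assumptions, not documented behaviour of the authors' system.

```python
from itertools import permutations, product


def recompose(components):
    """Permute the target components, then join neighbours with either "" or " "."""
    if not components:
        return []
    # Drop the hyphens that mark bound morphemes ("-cyto-" -> "cyto").
    bare = [c.strip("-") for c in components]
    candidates = []
    for order in permutations(bare):
        for separators in product(["", " "], repeat=len(order) - 1):
            term = order[0]
            for sep, comp in zip(separators, order[1:]):
                term += sep + comp
            if term not in candidates:
                candidates.append(term)
    return candidates


print(recompose(["-cyto-", "toxique"]))
# ['cytotoxique', 'cyto toxique', 'toxiquecyto', 'toxique cyto']
```

With n components this produces up to n! · 2^(n-1) strings, so the approach stays practical only because terms rarely have more than a handful of components.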
Recomposition - step 2

Filter out impossible target terms
e.g., “cyto” is a bound morpheme and cannot occur as an autonomous item

{cyto toxique}, {cytotoxique} → {cytotoxique}

17 / 28
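The filter can be sketched as rejecting any candidate in which a bound morpheme appears as a separate token. `BOUND_MORPHEMES` is a hypothetical list; in practice it would need to cover the bound morphemes that the decomposition step can produce.

```python
# Hypothetical list of bound morphemes that never occur as autonomous words.
BOUND_MORPHEMES = {"anti", "bi", "cyto", "non", "radio"}


def is_possible(candidate: str) -> bool:
    """A candidate is rejected if any bound morpheme stands alone as a word."""
    return not any(token in BOUND_MORPHEMES for token in candidate.split())


def filter_candidates(candidates):
    return [c for c in candidates if is_possible(c)]


print(filter_candidates(["cyto toxique", "cytotoxique"]))  # ['cytotoxique']
```

This check only looks at word boundaries; an ill-ordered form such as “toxiquecyto” passes it and is left for the corpus-based selection step to discard.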
Selection

Match target terms with the words of the target corpus
Allow at most 3 stop words between two words
{toxique cellule} → "toxique pour les cellules" 'toxic to the cells'

18 / 28
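Here is one possible sketch of the matching step. It assumes the target corpus is available as a list of lemmas, so that "cellules" matches the lemma "cellule". It also assumes that the candidate's words must appear in the given order. `find_matches` and the stop-word list are illustrative assumptions, not the authors' implementation.

```python
def find_matches(term: list[str], lemmas: list[str], stop_words: set[str],
                 max_gap: int = 3) -> list[tuple[int, int]]:
    """Return (start, end) token spans where the words of `term` occur in
    order, with at most `max_gap` stop words between two consecutive words."""
    if not term:
        return []
    matches = []
    for start, lemma in enumerate(lemmas):
        if lemma != term[0]:
            continue
        pos = start
        for word in term[1:]:
            nxt = pos + 1
            # Skip stop words, but no more than max_gap of them.
            while (nxt < len(lemmas) and nxt - pos - 1 < max_gap
                   and lemmas[nxt] != word and lemmas[nxt] in stop_words):
                nxt += 1
            if nxt >= len(lemmas) or lemmas[nxt] != word:
                break  # gap too long or wrong word: no match from this start
            pos = nxt
        else:
            matches.append((start, pos + 1))
    return matches


# Toy lemmatised corpus for "ce produit est toxique pour les cellules ."
lemmas = ["ce", "produit", "être", "toxique", "pour", "le", "cellule", "."]
stop_words = {"pour", "le", "la", "de"}
for start, end in find_matches(["toxique", "cellule"], lemmas, stop_words):
    print(" ".join(lemmas[start:end]))  # toxique pour le cellule
```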
Outline
1 Context and original problem
2 Compositional translation framework
3 Detailed translation method
4 Experiments and results
5 Future work
Corpora
English, French, German
breast cancer
approx. 400k words per language
½ scientific papers + ½ lay science
POS-tagged with the Xelda software¹
Comparability [Bo and Gaussier, 2010]: 0 = unrelated ⇔ 1 = perfectly comparable
English-French: 0.71
English-German: 0.45

¹ http://www.temis.com
19 / 28
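The next sketch makes the comparability score concrete. It implements a dictionary-based measure in the spirit of [Bo and Gaussier, 2010]. The measure is the share of dictionary-covered words, counted in both directions, whose translation appears in the other corpus. This is our reading of the measure, and the exact formulation in the paper may differ. The vocabularies and dictionary below are toy data, and the function name is hypothetical.

```python
def comparability(src_vocab: set[str], tgt_vocab: set[str],
                  dictionary: dict[str, set[str]]) -> float:
    """Share of dictionary-covered words (both directions) whose translation
    occurs in the other corpus: 0 = unrelated, 1 = perfectly comparable."""
    # Reverse the dictionary for the target -> source direction.
    reverse: dict[str, set[str]] = {}
    for src, tgts in dictionary.items():
        for tgt in tgts:
            reverse.setdefault(tgt, set()).add(src)

    src_covered = [w for w in src_vocab if w in dictionary]
    tgt_covered = [w for w in tgt_vocab if w in reverse]
    hits = sum(1 for w in src_covered if dictionary[w] & tgt_vocab)
    hits += sum(1 for w in tgt_covered if reverse[w] & src_vocab)
    total = len(src_covered) + len(tgt_covered)
    return hits / total if total else 0.0


src_vocab = {"cancer", "breast", "tumour", "guitar"}
tgt_vocab = {"cancer", "sein", "tumeur", "voiture"}
dictionary = {"cancer": {"cancer"}, "breast": {"sein"}, "tumour": {"tumeur"},
              "guitar": {"guitare"}, "car": {"voiture"}}
print(comparability(src_vocab, tgt_vocab, dictionary))  # 0.75
```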
Source terms

Morphologically constructed words collected from the English texts
None of them has a translation in the general language dictionary that is attested in the target texts
English to French: 1839 source terms
English to German: 1824 source terms

20 / 28
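The selection criterion for source terms can be sketched as a simple filter. The sketch assumes the morphologically constructed words have already been identified. It also assumes that the general dictionary maps a word to a set of translations. `select_source_terms` and the toy entries are hypothetical.

```python
def select_source_terms(constructed_words: list[str],
                        general_dict: dict[str, set[str]],
                        target_vocab: set[str]) -> list[str]:
    """Keep constructed source words that have no general-dictionary
    translation attested in the target corpus."""
    return [
        word
        for word in constructed_words
        if not (general_dict.get(word, set()) & target_vocab)
    ]


constructed = ["cardiotoxicity", "chemotherapy", "mastectomy"]
general_dict = {"chemotherapy": {"chimiothérapie"}, "mastectomy": {"mastectomie"}}
target_vocab = {"chimiothérapie", "toxicité", "cardiaque"}
print(select_source_terms(constructed, general_dict, target_vocab))
# ['cardiotoxicity', 'mastectomy']
```

"chemotherapy" is dropped because its dictionary translation is attested in the target texts. "mastectomy" is kept because its translation is not.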
Resources for translation

General language dictionary (Xelda)
Domain-specific dictionary: cognates extracted from the corpus [Hauer and Kondrak, 2011]
Morpheme translation table (hand-crafted)
Synonyms (Xelda)
Morphological families [Porter, 1980]

21 / 28
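Grouping words into morphological families amounts to collecting words that share a stem. The sketch below uses a deliberately crude suffix stripper as a stand-in, and this is not Porter's algorithm. With NLTK installed, `nltk.stem.PorterStemmer().stem` could be passed as the `stem` argument instead. The function names and the suffix list are assumptions.

```python
from collections import defaultdict
from typing import Callable

# Crude suffix list: a stand-in for the Porter stemmer, NOT Porter's algorithm.
SUFFIXES = ("ations", "ation", "ities", "ity", "ical", "ic", "ers", "er", "es", "s")


def crude_stem(word: str) -> str:
    """Strip known suffixes repeatedly, keeping a stem of at least 3 letters."""
    changed = True
    while changed:
        changed = False
        for suffix in SUFFIXES:
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                word = word[: -len(suffix)]
                changed = True
                break
    return word


def morphological_families(words: list[str],
                           stem: Callable[[str], str] = crude_stem) -> dict[str, set[str]]:
    """Group words sharing a stem; keep only groups with 2+ members."""
    families: dict[str, set[str]] = defaultdict(set)
    for word in words:
        families[stem(word)].add(word)
    return {s: ws for s, ws in families.items() if len(ws) > 1}


words = ["toxic", "toxicity", "toxicities", "cell", "cells", "breast"]
print({s: sorted(ws) for s, ws in morphological_families(words).items()})
# {'tox': ['toxic', 'toxicities', 'toxicity'], 'cell': ['cell', 'cells']}
```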
Evaluation measures I

Coverage
$$C = \frac{\sum_{i=1}^{|ST|} \sigma(ST_i)}{|ST|}
\qquad
\sigma(ST_i) = \begin{cases} 1 & \text{if } |Trans(ST_i)| \geq 1 \\ 0 & \text{otherwise} \end{cases}$$

⇒ % of source terms with at least 1 translation (regardless of its accuracy)

22 / 28
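The coverage formula translates directly into code. This sketch assumes the system output is a mapping from each source term to its list of candidate translations. Under that assumption, σ(ST_i) is simply the test `len(cands) >= 1`. The data is a toy example, not output from the paper.

```python
def coverage(translations: dict[str, list[str]]) -> float:
    """C: share of source terms ST_i with |Trans(ST_i)| >= 1."""
    if not translations:
        return 0.0
    covered = sum(1 for cands in translations.values() if len(cands) >= 1)
    return covered / len(translations)


# Toy output: source term -> generated candidate translations.
translations = {
    "cardiotoxicity": ["toxicité cardiaque"],
    "milk-producing": ["production de lait"],
    "treatable": ["pouvoir le traiter"],
    "radiosensitivity": [],
}
print(coverage(translations))  # 0.75
```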
Evaluation measures II
Precision
$$P = \frac{|Exact|}{|Trans|}$$
⇒ % of generated translations which are exact translations
Overall quality
$$OQ = C \times P$$
⇒ trade-off between precision and coverage

23 / 28
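Precision and overall quality can be sketched the same way. The sketch assumes translations are (source, target) pairs and that a reference judgement identifies the exact ones. The example pairs and their exact or non-exact status follow the additional slides and the error analysis, and the function names are hypothetical. The last call takes C = .24 and P = .31 from the Gen.+Morph.+SMD, -f row of the English → French results and gives OQ ≈ .07, as in that table.

```python
def precision(generated: list[tuple[str, str]],
              exact: set[tuple[str, str]]) -> float:
    """P = |Exact| / |Trans|: share of generated (source, target) pairs
    that are exact translations."""
    if not generated:
        return 0.0
    return sum(1 for pair in generated if pair in exact) / len(generated)


def overall_quality(coverage: float, precision_value: float) -> float:
    """OQ = C x P."""
    return coverage * precision_value


generated = [
    ("cardiotoxicity", "toxicité cardiaque"),
    ("risk-reducing", "risque de réduire"),
    ("treatable", "pouvoir le traiter"),
    ("in-patient", "pas malade"),
    ("pathophysiological", "physiopathologique"),
]
# Reference judgement: which generated pairs are exact translations.
exact = {("cardiotoxicity", "toxicité cardiaque"),
         ("pathophysiological", "physiopathologique")}
print(precision(generated, exact))              # 0.4
print(round(overall_quality(0.24, 0.31), 2))    # 0.07
```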
Experiments

combination of linguistic resources
quality of the lexicon with and without the fertile translations

24 / 28
Results: English → French

                   C            P            OQ
                   -f    +f     -f    +f     -f    +f
Gen.+Morph.        .04   .12    .81   .57    .03   .07
Gen.+Morph.+S      .05   .15    .69   .50    .03   .08
Gen.+Morph.+M      .11   .23    .20   .28    .02   .06
Gen.+Morph.+D      .16   .26    .70   .60    .11   .16
Gen.+Morph.+SMD    .24   .39    .31   .33    .07   .13
avg. gain          +11          -8.6         +4.8

(-f / +f: without / with fertile translations; avg. gain in percentage points)

25 / 28
Results: English → German

                   C            P            OQ
                   -f    +f     -f    +f     -f    +f
Gen.+Morph.        .06   .13    .80   .35    .05   .05
Gen.+Morph.+S      .08   .16    .69   .31    .05   .05
Gen.+Morph.+M      .12   .22    .40   .23    .05   .05
Gen.+Morph.+D      .17   .26    .65   .39    .11   .10
Gen.+Morph.+SMD    .24   .36    .43   .27    .10   .10
avg. gain          +9.2         -28.4        -0.2

(-f / +f: without / with fertile translations; avg. gain in percentage points)

26 / 28
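As a sanity check on the avg. gain rows, the sketch below recomputes them from the two results tables. It assumes each gain is the plain mean of the per-row differences (+f minus -f), expressed in percentage points. The values are copied from the tables, while `RESULTS` and `avg_gain` are illustrative names.

```python
# Values copied from the two results tables: one (-f, +f) pair per row,
# rows in table order (Gen.+Morph., +S, +M, +D, +SMD).
RESULTS = {
    "en-fr": {
        "C":  [(.04, .12), (.05, .15), (.11, .23), (.16, .26), (.24, .39)],
        "P":  [(.81, .57), (.69, .50), (.20, .28), (.70, .60), (.31, .33)],
        "OQ": [(.03, .07), (.03, .08), (.02, .06), (.11, .16), (.07, .13)],
    },
    "en-de": {
        "C":  [(.06, .13), (.08, .16), (.12, .22), (.17, .26), (.24, .36)],
        "P":  [(.80, .35), (.69, .31), (.40, .23), (.65, .39), (.43, .27)],
        "OQ": [(.05, .05), (.05, .05), (.05, .05), (.11, .10), (.10, .10)],
    },
}


def avg_gain(pairs: list[tuple[float, float]]) -> float:
    """Mean of (+f minus -f) over the rows, in percentage points."""
    return 100 * sum(plus - minus for minus, plus in pairs) / len(pairs)


for lang_pair, measures in RESULTS.items():
    gains = {name: round(avg_gain(rows), 1) for name, rows in measures.items()}
    print(lang_pair, gains)
```

Under that assumption the script prints gains of 11.0, -8.6 and 4.8 for English → French. For English → German it prints 9.2, -28.4 and -0.2. Both sets match the avg. gain rows of the tables.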
Discussion: English-French vs. English-German results

English-German corpus is much less comparable (0.45 vs. 0.71)
Morphological types:
German, a Germanic language: tendency to agglutination
oestrogen-independent → Östrogen-unabhängige
French, a Romance language: creates phrases more easily
oestrogen-independent → indépendant des œstrogènes

27 / 28
Error analysis

Problems in word reordering
self-examination → untersuchung selbst 'examination self'

Wrong or inappropriate translations
in-patient → pas malade 'not ill'
in → “inside” → inside patient
in → “inverse” → not a patient

28 / 28
Outline
1 Context and original problem
2 Compositional translation framework
3 Detailed translation method
4 Experiments and results
5 Future work
Future work

Improve quality of linguistic resources
morphological derivation rules instead of stemming
use of a thesaurus

Try translation patterns instead of permutations
Rank translations

29 / 28
Thank you for your attention.

estelle.delpech@univ-nantes.fr
beatrice.daille@univ-nantes.fr
emmanuel.morin@univ-nantes.fr
cl@lingua-et-machina.com
ADDITIONAL SLIDES
Exact translations

Non-fertile:
pathophysiological → physiopathologique
overactive → überaktiv

Fertile:
cardiotoxicity → toxicité cardiaque 'cardiac toxicity'
mastectomy → ablation der brust 'ablation of the breast'
Morphological variants

Non-fertile:
dosimetry → dosimétrique 'dosimetric'
radiosensitivity → strahlenempfindlich 'radiosensitive'

Fertile:
milk-producing → production de lait 'production of milk'
self-examination → selbst untersuchen 'self examine'
Inexact but semantically related

Non-fertile:
oncogene → oncogénèse 'oncogenesis'
breakthrough → durchbrechen 'break'

Fertile:
chemoradiotherapy → chemotherapie oder strahlen 'chemotherapy or radiation'
treatable → pouvoir le traiter 'can treat it'
Wrong translations

Non-fertile:
immunoscore → immunomarquer 'immunostain'
check-in → unkontrollieren 'uncontrolled'

Fertile:
bloodstream → fliessen mehr blut 'more blood flow'
risk-reducing → risque de réduire 'risk of reducing'
References I
Bo, L. and Gaussier, E. (2010).
Improving corpus comparability for bilingual lexicon extraction from comparable corpora.
In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
Cartoni, B. (2009).
Lexical morphology in machine translation: A feasibility study.
In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
Harastani, R., Daille, B., and Morin, E. (2012).
Neoclassical compound alignments from comparable corpora.
In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text
Processing, volume 2, pages 72–82, New Delhi, India.
Hauer, B. and Kondrak, G. (2011).
Clustering semantically equivalent words into cognate sets in multilingual lists.
In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873,
Chiang Mai, Thailand.
Keenan, E. L. and Faltz, L. M. (1985).
Boolean semantics for natural language.
D. Reidel, Dordrecht, Holland.
Morin, E. and Daille, B. (2009).
Compositionality and lexical alignment of multi-word terms.
In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain
sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands edition.
Namer, F. and Baud, R. (2007).
Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system.
International Journal of Medical Informatics, 76(2-3):226–33.
References II

Porter, M. F. (1980).
An algorithm for suffix stripping.
Program, 14(3):130–137.
Weller, M., Gojun, A., Heid, U., Daille, B., and Harastani, R. (2011).
Simple methods for dealing with term variation and term alignment.
In Proceedings of the 9th International Conference on Terminology and Artificial Intelligence, pages 87–93,
Paris, France.

 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to HindiRajat Jain
 
The Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine TranslationThe Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine TranslationGennadi Lembersky
 
Techniques for automatically correcting words in text
Techniques for automatically correcting words in textTechniques for automatically correcting words in text
Techniques for automatically correcting words in textunyil96
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free GrammarsMarina Santini
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...iyo
 
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONTRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONijnlc
 
Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12Lawrie Hunter
 
Noun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsNoun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsTomoyuki Kajiwara
 

Similar to Identification of Fertile Translations in Comparable Corpora Using Morpho-Compositional Approach (20)

PDFTextProcessing
PDFTextProcessingPDFTextProcessing
PDFTextProcessing
 
Two Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query TranslationTwo Level Disambiguation Model for Query Translation
Two Level Disambiguation Model for Query Translation
 
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
Unsupervised Software-Specific Morphological Forms Inference from Informal Di...
 
Neural machine translation of rare words with subword units
Neural machine translation of rare words with subword unitsNeural machine translation of rare words with subword units
Neural machine translation of rare words with subword units
 
Parafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdfParafraseo-Chenggang.pdf
Parafraseo-Chenggang.pdf
 
Barreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentationBarreiro-Batista-LR4NLP@Coling2018-presentation
Barreiro-Batista-LR4NLP@Coling2018-presentation
 
Word2vec on the italian language: first experiments
Word2vec on the italian language: first experimentsWord2vec on the italian language: first experiments
Word2vec on the italian language: first experiments
 
Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...Tamil-English Document Translation Using Statistical Machine Translation Appr...
Tamil-English Document Translation Using Statistical Machine Translation Appr...
 
More on Indexing Text Operations (1).pptx
More on Indexing  Text Operations (1).pptxMore on Indexing  Text Operations (1).pptx
More on Indexing Text Operations (1).pptx
 
Machine Transalation.pdf
Machine Transalation.pdfMachine Transalation.pdf
Machine Transalation.pdf
 
Psychological test adaptation
Psychological test adaptationPsychological test adaptation
Psychological test adaptation
 
Machine translation from English to Hindi
Machine translation from English to HindiMachine translation from English to Hindi
Machine translation from English to Hindi
 
The Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine TranslationThe Effect of Translationese on Statistical Machine Translation
The Effect of Translationese on Statistical Machine Translation
 
Techniques for automatically correcting words in text
Techniques for automatically correcting words in textTechniques for automatically correcting words in text
Techniques for automatically correcting words in text
 
Lecture: Context-Free Grammars
Lecture: Context-Free GrammarsLecture: Context-Free Grammars
Lecture: Context-Free Grammars
 
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
A Study Of Statistical Models For Query Translation :Finding A Good Unit Of T...
 
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONTRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
 
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONTRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
 
Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12Academic writing: pointers for G-cube PhD students 23-01.12
Academic writing: pointers for G-cube PhD students 23-01.12
 
Noun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of ContextsNoun Paraphrasing Based on a Variety of Contexts
Noun Paraphrasing Based on a Variety of Contexts
 

More from Estelle Delpech

Génération automatique de texte
Génération automatique de texteGénération automatique de texte
Génération automatique de texteEstelle Delpech
 
Identification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxIdentification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxEstelle Delpech
 
Découverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des LanguesDécouverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des LanguesEstelle Delpech
 
Invited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis awardInvited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis awardEstelle Delpech
 
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...Estelle Delpech
 
Identification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieuxIdentification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieuxEstelle Delpech
 
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Estelle Delpech
 
Nomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchNomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchEstelle Delpech
 
Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Estelle Delpech
 
Nomao: local search and recommendation engine
Nomao: local search and recommendation engineNomao: local search and recommendation engine
Nomao: local search and recommendation engineEstelle Delpech
 
Évaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeÉvaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeEstelle Delpech
 
Robust rule-based parsing
Robust rule-based parsingRobust rule-based parsing
Robust rule-based parsingEstelle Delpech
 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmEstelle Delpech
 
Text Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringText Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringEstelle Delpech
 

More from Estelle Delpech (15)

Génération automatique de texte
Génération automatique de texteGénération automatique de texte
Génération automatique de texte
 
Identification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieuxIdentification de compatibilités entre tages descriptifs de lieux
Identification de compatibilités entre tages descriptifs de lieux
 
Découverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des LanguesDécouverte du Traitement Automatique des Langues
Découverte du Traitement Automatique des Langues
 
Invited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis awardInvited speaker, ATALA 2014 Ph. D. Thesis award
Invited speaker, ATALA 2014 Ph. D. Thesis award
 
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...Corpus comparables et traduction assistée par ordinateur, contributions à la ...
Corpus comparables et traduction assistée par ordinateur, contributions à la ...
 
Identification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieuxIdentification de compatibilites sémantiques entre descripteurs de lieux
Identification de compatibilites sémantiques entre descripteurs de lieux
 
Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...Usage du TAL dans des applications industrielles : gestion des contenus multi...
Usage du TAL dans des applications industrielles : gestion des contenus multi...
 
Nomao: data analysis for personalized local search
Nomao: data analysis for personalized local searchNomao: data analysis for personalized local search
Nomao: data analysis for personalized local search
 
Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)Nomao: carnet de bonnes adresses (entre amis)
Nomao: carnet de bonnes adresses (entre amis)
 
Nomao: local search and recommendation engine
Nomao: local search and recommendation engineNomao: local search and recommendation engine
Nomao: local search and recommendation engine
 
Évaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialiséeÉvaluation applicative des terminologies destinées à la traduction spécialisée
Évaluation applicative des terminologies destinées à la traduction spécialisée
 
R&D Lingua et Machina
R&D Lingua et MachinaR&D Lingua et Machina
R&D Lingua et Machina
 
Robust rule-based parsing
Robust rule-based parsingRobust rule-based parsing
Robust rule-based parsing
 
Experimenting the TextTiling Algorithm
Experimenting the TextTiling AlgorithmExperimenting the TextTiling Algorithm
Experimenting the TextTiling Algorithm
 
Text Processing for Procedural Question Answering
Text Processing for Procedural Question AnsweringText Processing for Procedural Question Answering
Text Processing for Procedural Question Answering
 

Recently uploaded

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Scott Keck-Warren
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDGMarianaLemus7
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentationphoebematthew05
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 

Recently uploaded (20)

Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024Advanced Test Driven-Development @ php[tek] 2024
Advanced Test Driven-Development @ php[tek] 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
APIForce Zurich 5 April Automation LPDG
APIForce Zurich 5 April  Automation LPDGAPIForce Zurich 5 April  Automation LPDG
APIForce Zurich 5 April Automation LPDG
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
costume and set research powerpoint presentation
costume and set research powerpoint presentationcostume and set research powerpoint presentation
costume and set research powerpoint presentation
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 

Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

  • 1. Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach. Estelle Delpech¹, Béatrice Daille¹, Emmanuel Morin¹, Claire Lemaire²,³ (¹ LINA, Université de Nantes; ² GREMUTS, Université de Grenoble; ³ Lingua et Machina). AMTA'12, San Diego, CA, 10/31/12
  • 2. Outline: 1. Context and original problem; 2. Compositional translation framework; 3. Detailed translation method; 4. Experiments and results; 5. Future work
  • 7. Context. Research partly funded by a Computer-Aided Translation company. Goal: generate domain-specific bilingual lexicons when no parallel data is available. Available data: a general-language bilingual dictionary and domain-specific comparable corpora.
  • 13. Comparable corpora. Definition: a set of texts in languages L1 and L2 that are not translations of each other but deal with the same subject matter, so that translation pairs can still be extracted. Some difficulties: the language of the target texts is not influenced by the source texts; text types are mixed (technical, scientific, popular science...). ⇒ Do not expect parallelism between source ↔ target structures. ⇒ Need to deal with variation in translation.
  • 20. Variation in translation. Morphological variation: anticancer (Noun) → anticancéreux (Adj) 'anticancerous' ⇒ use morphological derivation rules / lexicons. Lexical variation: radiosensitivity → Radiotoleranz 'radiotolerance' (sensitivity ≈ tolerance) ⇒ use synonyms, thesaurus. Fertility: bi-dimensional → deux dimensions 'two dimensions' ⇒ scarcely addressed.
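The two remedies named on the slide (derivation rules for morphological variation, a thesaurus for lexical variation) can be sketched as candidate expansion followed by corpus filtering. This is a minimal illustration, not the authors' resources: `DERIVATION_RULES`, `SYNONYMS` and all function names are hypothetical, and the single suffix rule is a toy that overgenerates on purpose, relying on the corpus filter to discard unattested forms.

```python
import itertools
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical toy resources, for illustration only.
# Each rule rewrites a word-final suffix of a French candidate: (old, new).
DERIVATION_RULES: List[Tuple[str, str]] = [
    ("er", "éreux"),  # anticancer (Noun) -> anticancéreux (Adj)
]
# Hypothetical thesaurus: word part -> set of near-synonyms.
SYNONYMS: Dict[str, Set[str]] = {
    "sensitivity": {"tolerance"},
}


def morphological_variants(word: str, rules: Iterable[Tuple[str, str]]) -> Set[str]:
    """Return the word itself plus every form produced by a matching suffix rule."""
    variants = {word}
    for old, new in rules:
        if old and word.endswith(old):  # skip empty suffixes (word[:-0] would be "")
            variants.add(word[: -len(old)] + new)
    return variants


def lexical_variants(parts: List[str], synonyms: Dict[str, Set[str]]) -> Set[Tuple[str, ...]]:
    """Replace each part by itself or by any of its listed synonyms."""
    options = [[part, *sorted(synonyms.get(part, set()))] for part in parts]
    return set(itertools.product(*options))


def attested(candidates: Iterable[str], corpus_vocabulary: Set[str]) -> Set[str]:
    """Keep only candidates that actually occur in the target-language corpus."""
    return {c for c in candidates if c in corpus_vocabulary}


if __name__ == "__main__":
    print(sorted(morphological_variants("anticancer", DERIVATION_RULES)))
    print(sorted(lexical_variants(["radio", "sensitivity"], SYNONYMS)))
```

Filtering against the corpus vocabulary is what keeps naive rules usable: generation is cheap and noisy, and only forms that the comparable corpus attests survive.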
  • 24. Fertility. Definition: the target term has more words than the source term. Semantic fertility: the target term has more morphemes than the source term, e.g. voie de glace 'route of ice' → ice climbing route; aquarelle (not decomposable) → water color. Surface fertility: the target and source terms have the same number of morphemes, e.g. bi-dimensional → deux dimensions 'two dimensions'.
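The fertility definitions above can be expressed as a small classifier. This is a minimal sketch, not the authors' code: it assumes a "word" is a whitespace-separated token (so a hyphenated form such as bi-dimensional counts as one word) and that morpheme counts are supplied by the caller. The function names `word_count` and `classify_fertility` are hypothetical.

```python
def word_count(term: str) -> int:
    # Assumption: a "word" is a whitespace-separated token.
    return len(term.split())


def classify_fertility(source: str, target: str,
                       source_morphemes: int, target_morphemes: int) -> str:
    """Label a source/target pair with the fertility types defined on the slide.

    Morpheme counts are given by the caller; this sketch does not segment words.
    """
    if word_count(target) <= word_count(source):
        return "not fertile"
    if target_morphemes > source_morphemes:
        return "semantic fertility"
    if target_morphemes == source_morphemes:
        return "surface fertility"
    # Case not covered by the slide's definitions.
    return "fertile, other"


print(classify_fertility("aquarelle", "water color", 1, 2))             # semantic fertility
print(classify_fertility("bi-dimensional", "deux dimensions", 2, 2))    # surface fertility
```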
  • 27. Compositional translation. Principle of compositionality: “the meaning of the whole is a function of the meaning of the parts” [Keenan and Faltz, 1985, 24-25]. Definition of compositional translation: the translation of the whole is a function of the translation of the parts.
  • 32. Compositional translation process. (1) Decomposition: “cytotoxic” → {cyto, toxic}; (2) Translation: {cyto, toxic} → {cyto, toxique}; (3) Recomposition: {cyto, toxique} → {cytotoxique, toxiquecyto}; (4) Selection: {cytotoxique, toxiquecyto} → “cytotoxique”.
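The four steps of the process can be sketched end to end on the slide's own example. This is a toy illustration under strong assumptions: the decomposition, the translations and the target vocabulary are hypothetical hard-coded dictionaries, and "selection" is reduced to a membership test against the target vocabulary.

```python
from itertools import permutations, product

# Toy resources (hypothetical, hard-coded for this example only).
DECOMPOSITIONS = {"cytotoxic": ["cyto", "toxic"]}
TRANSLATIONS = {"cyto": ["cyto"], "toxic": ["toxique"]}
TARGET_VOCABULARY = {"cytotoxique", "cellule", "toxique"}


def translate_compositionally(term: str) -> list[str]:
    # 1. Decomposition: look up the parts (unknown terms stay whole).
    parts = DECOMPOSITIONS.get(term, [term])
    # 2. Translation: list the candidate translations of each part.
    options = [TRANSLATIONS.get(part, []) for part in parts]
    # 3. Recomposition: every translation choice, in every order, glued together.
    candidates = {"".join(order)
                  for choice in product(*options)
                  for order in permutations(choice)}
    # 4. Selection: keep only candidates attested in the target texts.
    return sorted(c for c in candidates if c in TARGET_VOCABULARY)


print(translate_compositionally("cytotoxic"))  # ['cytotoxique']
```

Here recomposition produces both cytotoxique and toxiquecyto, and only the attested form survives selection, as on the slide.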
  • 35. Relevance of compositional translation. More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]. Compositional translation outperforms the distributional approach for the translation of terms with compositional meaning [Morin and Daille, 2009].
  • 46. Related work: single-word term translation. [Cartoni, 2009]: prefixed word → prefixed word, e.g. ri+organizzare → ré+organiser 'reorganize'. [Harastani et al., 2012]: neoclassical compound → neoclassical compound, e.g. Kalori+metrie → calori+métrie 'calorimetry'. [Weller et al., 2011]: noun compound → noun phrase, e.g. Elektronen+mikroskop → electron microscope. ⇒ Restricted to a small set of source-to-target structures. ⇒ Fertility handled only in the specific case of noun compounds.
  • 50. Contribution I. Addressing fertility by allowing translation equivalences from a bound morpheme to an autonomous lexical item: cyto → cellule 'cell'; cytotoxic → toxique (pour les) cellules 'toxic to the cells'.
  • 54. Contribution II. A larger variety of input/output structures: SOURCE {prefixed word, neoclassical compound, suffixed word, compound, any combination} ⇒ TARGET {prefixed word, neoclassical compound, suffixed word, compound, any combination, phrase}.
  • 60. Overview. (1) Decomposition: lexicons + heuristic rules; (2) Translation: dictionary look-up; (3) Recomposition: permutations; (4) Selection: search for occurrences in the target texts.
  • 66. Decomposition, step 1. Split the source term into minimal components with heuristic rules: split on hyphens; match substrings of the source term with a list of morphemes (prefixes, confixes, suffixes) and a list of lexical items; respect some length constraints on the substrings. Example: non-cytotoxic → {non, cyto, toxic}.
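The three heuristic rules of step 1 can be sketched as a hyphen split followed by a memoized search over substrings. Everything concrete here is an assumption: the morpheme and lexical-item lists are tiny hypothetical excerpts, the length constraint is modelled as a single minimum length `MIN_LEN = 3`, and the search tries the shortest matching substring first to favour minimal components. The slides do not specify the actual search order or thresholds.

```python
from functools import lru_cache
from typing import List, Optional, Tuple

# Hypothetical excerpts of the two lists; the real resources are not shown here.
MORPHEMES = {"non", "anti", "cyto", "bio"}
LEXICAL_ITEMS = {"toxic", "cancer", "cell"}
MIN_LEN = 3  # assumed minimum substring length


def split_chunk(chunk: str) -> Optional[List[str]]:
    """Segment a hyphen-free string into known units, shortest units first."""
    known = MORPHEMES | LEXICAL_ITEMS

    @lru_cache(maxsize=None)
    def segment(start: int) -> Optional[Tuple[str, ...]]:
        if start == len(chunk):
            return ()
        for end in range(start + MIN_LEN, len(chunk) + 1):
            piece = chunk[start:end]
            if piece in known:
                rest = segment(end)
                if rest is not None:
                    return (piece,) + rest
        return None  # no segmentation satisfies the constraints

    result = segment(0)
    return list(result) if result is not None else None


def decompose(term: str) -> List[str]:
    components: List[str] = []
    for chunk in term.lower().split("-"):   # rule 1: split on hyphens
        if chunk:
            parts = split_chunk(chunk)       # rules 2-3: lists + length limit
            # Assumption: an unsegmentable chunk is kept whole.
            components.extend(parts if parts is not None else [chunk])
    return components


print(decompose("non-cytotoxic"))  # ['non', 'cyto', 'toxic']
```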
  • 70. Decomposition, step 2. Generate all possible concatenations of the minimal components: {non, cyto, toxic} → {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}. ⇒ Increases the chances of matching the components with dictionary entries.
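Step 2 amounts to choosing, for each gap between two adjacent components, whether to keep it or close it, which gives 2^(n-1) groupings for n components (four for {non, cyto, toxic}). A minimal sketch; the function name `concatenations` is hypothetical:

```python
from itertools import product
from typing import List


def concatenations(components: List[str]) -> List[List[str]]:
    """All ways of merging adjacent components: 2 ** (n - 1) groupings."""
    if not components:
        return []
    groupings = []
    # Each gap between neighbours is either kept (True) or closed (False).
    for keep_gap in product([True, False], repeat=len(components) - 1):
        groups = [components[0]]
        for component, keep in zip(components[1:], keep_gap):
            if keep:
                groups.append(component)
            else:
                groups[-1] += component
        groupings.append(groups)
    return groupings


for grouping in concatenations(["non", "cyto", "toxic"]):
    print(grouping)
```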
  • 74. Translation through direct dictionary look-up. Bilingual dictionary for lexical items: toxic → toxique. Morpheme translation table for bound morphemes: -cyto- → -cyto-, cellule. Result: {-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}.
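The look-up can be sketched as a Cartesian product over the translations of each component: lexical items are looked up in the bilingual dictionary, bound morphemes in the morpheme table. The two dictionaries below are hypothetical excerpts holding only the slide's examples, and the rule that one untranslatable component discards the whole split is an assumption of this sketch.

```python
from itertools import product
from typing import List

# Hypothetical excerpts of the resources named on the slide.
BILINGUAL_DICTIONARY = {"toxic": ["toxique"]}
MORPHEME_TABLE = {"-cyto-": ["-cyto-", "cellule"]}


def translate_components(components: List[str]) -> List[List[str]]:
    options = []
    for component in components:
        # Lexical items go to the dictionary, bound morphemes to the table.
        translations = (BILINGUAL_DICTIONARY.get(component)
                        or MORPHEME_TABLE.get(component))
        if not translations:
            return []  # one untranslatable component rules out this split
        options.append(translations)
    # One candidate per combination of component translations.
    return [list(choice) for choice in product(*options)]


print(translate_components(["-cyto-", "toxic"]))
# [['-cyto-', 'toxique'], ['cellule', 'toxique']]
```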
  • 78. Translation with variation. Morphological lexicon: toxique → toxicité 'toxicity'. Synonyms: toxique → vénéneux 'poisonous'. Result: {-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}.
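Variation can be added by replacing each translated word with the list of its variants before taking the product again. In this sketch the two lexicons are hypothetical excerpts containing only the slide's examples, and the direct translation is kept alongside its variants (the slide lists only the new candidates).

```python
from itertools import product
from typing import List

# Hypothetical excerpts: only the slide's own examples.
MORPHOLOGICAL_LEXICON = {"toxique": ["toxicité"]}
SYNONYMS = {"toxique": ["vénéneux"]}


def variants(word: str) -> List[str]:
    # The word itself, then its morphological relatives, then its synonyms.
    return [word] + MORPHOLOGICAL_LEXICON.get(word, []) + SYNONYMS.get(word, [])


def expand(translation: List[str]) -> List[List[str]]:
    return [list(choice) for choice in product(*(variants(w) for w in translation))]


for candidate in expand(["cellule", "toxique"]):
    print(candidate)
```

Each extra resource multiplies the number of candidates, which is why the selection step against the target texts matters.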
  • 83. Recomposition, step 1. Permute the target components: {-cyto-, toxique} → {-cyto-, toxique}, {toxique, -cyto-}. Recreate target words by generating all possible concatenations of the components: {-cyto-, toxique} → {cytotoxique}, {cyto toxique}.
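A minimal sketch of step 1, assuming that "concatenation" means joining each pair of neighbouring components either directly or with a space, and that the hyphens marking bound morphemes are simply dropped. The function name `recompose` is hypothetical. For n components this produces up to n! × 2^(n-1) strings, so exhaustive generation is only practical for short terms.

```python
from itertools import permutations, product
from typing import List


def recompose(components: List[str]) -> List[str]:
    """Permute target components, then join each gap with '' or a space."""
    words = [c.strip("-") for c in components]  # drop bound-morpheme markers
    if not words:
        return []
    candidates = set()
    for order in permutations(words):
        for separators in product(["", " "], repeat=len(order) - 1):
            term = order[0]
            for separator, word in zip(separators, order[1:]):
                term += separator + word
            candidates.add(term)
    return sorted(candidates)


print(recompose(["-cyto-", "toxique"]))
# ['cyto toxique', 'cytotoxique', 'toxique cyto', 'toxiquecyto']
```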
  • 86. Recomposition, step 2. Filter out impossible target terms, e.g. “cyto” is a bound morpheme and cannot occur as an autonomous item: {cyto toxique}, {cytotoxique} → {cytotoxique}.
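The filter can be sketched as rejecting any candidate in which a bound morpheme appears as a separate word. The bound-morpheme list here is a hypothetical excerpt, and the sketch only checks free-standing words; it does not judge whether a morpheme is glued on in a plausible position (e.g. toxiquecyto), which is left to selection against the target texts.

```python
from typing import AbstractSet, Iterable, List

# Hypothetical list of bound morphemes (they cannot stand alone as words).
BOUND_MORPHEMES = frozenset({"cyto", "non", "anti"})


def filter_candidates(candidates: Iterable[str],
                      bound: AbstractSet[str] = BOUND_MORPHEMES) -> List[str]:
    # Reject a candidate if any of its space-separated words is a bound morpheme.
    return [c for c in candidates
            if not any(word in bound for word in c.split())]


print(filter_candidates(["cyto toxique", "cytotoxique"]))  # ['cytotoxique']
```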
  • 90. Selection. Match the target term with the words of the target corpus, allowing at most 3 stop words between two words: {toxique cellule} → “toxique pour les cellules” 'toxic to the cells'.
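Because the slide's example matches cellule against the plural cellules, the sketch below assumes the target corpus is available as (surface form, lemma) pairs, as a POS tagger would provide; the exact matching procedure is not given on the slides. The stop-word list is a hypothetical excerpt, and `find_occurrences` is a hypothetical name. The words of the term must appear in order, separated by at most `MAX_GAP` stop words.

```python
from typing import List, Tuple

# Hypothetical French stop-word list; the lemmatized corpus format is assumed too.
STOP_WORDS = {"pour", "le", "de", "à", "du", "en"}
MAX_GAP = 3  # at most 3 stop words between two words of the term


def find_occurrences(term: List[str], corpus: List[Tuple[str, str]]) -> List[str]:
    """term: lemmas in order; corpus: (surface form, lemma) pairs."""
    hits = []
    for start, (_, lemma) in enumerate(corpus):
        if lemma != term[0]:
            continue
        pos, matched = start, True
        for wanted in term[1:]:
            pos += 1
            gap = 0
            # Skip stop words that sit between two words of the term.
            while (pos < len(corpus) and corpus[pos][1] != wanted
                   and corpus[pos][1] in STOP_WORDS):
                gap += 1
                pos += 1
            if gap > MAX_GAP or pos >= len(corpus) or corpus[pos][1] != wanted:
                matched = False
                break
        if matched:
            hits.append(" ".join(surface for surface, _ in corpus[start:pos + 1]))
    return hits


corpus = [("un", "un"), ("effet", "effet"), ("toxique", "toxique"),
          ("pour", "pour"), ("les", "le"), ("cellules", "cellule")]
print(find_occurrences(["toxique", "cellule"], corpus))
# ['toxique pour les cellules']
```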
  • 100. Corpora. English, French, German; domain: breast cancer; approx. 400k words per language; ½ scientific papers + ½ lay science; POS-tagged with the Xelda software¹. Comparability [Bo and Gaussier, 2010], from 0 (unrelated) to 1 (perfectly comparable): English-French: 0.71; English-German: 0.45. ¹ http://www.temis.com
  • 103. Source terms. Morphologically constructed words collected from the English texts. None of them has a translation in the general language dictionary that is attested in the target texts. English to French: 1839 source terms; English to German: 1824 source terms.
  • 109. Resources for translation. General language dictionary (Xelda); domain-specific dictionary: cognates extracted from the corpus [Hauer and Kondrak, 2011]; morpheme translation table (hand-crafted); synonyms (Xelda); morphological families [Porter, 1980].
  • 110. Evaluation measures I. Coverage: $C = \frac{\sum_{i=1}^{|ST|} \sigma(ST_i)}{|ST|}$, where $\sigma(ST_i) = 1$ if $|Trans(ST_i)| \geq 1$ and $\sigma(ST_i) = 0$ otherwise. ⇒ % of source terms with at least 1 translation (regardless of its accuracy).
  • 111. Evaluation measures II. Precision: $P = \frac{|Exact|}{|Trans|}$ ⇒ % of generated translations that are exact translations. Overall quality: $OQ = C \times P$ ⇒ trade-off between precision and coverage.
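The three measures can be computed together. This sketch assumes the generated lexicon is a mapping from every source term (including those that received no translation) to its list of candidates, and that a reference set of exact translations is available for each term; the example data is hypothetical.

```python
from typing import Dict, List, Set, Tuple


def evaluate(translations: Dict[str, List[str]],
             reference: Dict[str, Set[str]]) -> Tuple[float, float, float]:
    """Return (C, P, OQ) following the definitions on the slides.

    translations maps every source term (ST) to its generated translations,
    possibly none; reference holds the exact translations (assumed available).
    """
    n_terms = len(translations)
    covered = sum(1 for cands in translations.values() if len(cands) >= 1)
    generated = [(st, t) for st, cands in translations.items() for t in cands]
    exact = sum(1 for st, t in generated if t in reference.get(st, set()))
    coverage = covered / n_terms if n_terms else 0.0
    precision = exact / len(generated) if generated else 0.0
    return coverage, precision, coverage * precision


# Hypothetical example: 2 of 3 terms covered, 2 of 3 candidates exact.
translations = {"cytotoxic": ["cytotoxique"],
                "non-cytotoxic": ["non cytotoxique", "noncytotoxique"],
                "radiosensitivity": []}
reference = {"cytotoxic": {"cytotoxique"}, "non-cytotoxic": {"non cytotoxique"}}
print([round(x, 2) for x in evaluate(translations, reference)])  # [0.67, 0.67, 0.44]
```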
  • 114. Experiments. Combinations of linguistic resources; quality of the lexicon with and without the fertile translations.
  • 115. Results: English → French (-f: without fertile translations; +f: with fertile translations)
      Resources            C -f   C +f   P -f   P +f   OQ -f   OQ +f
      Gen.+Morph.          .04    .12    .81    .57    .03     .07
      Gen.+Morph.+S        .05    .15    .69    .50    .03     .08
      Gen.+Morph.+M        .11    .23    .20    .28    .02     .06
      Gen.+Morph.+D        .16    .26    .70    .60    .11     .16
      Gen.+Morph.+SMD      .24    .39    .31    .33    .07     .13
      avg. gain (points)   +11           -8.6           +4.8
  • 116. Results: English → German (-f: without fertile translations; +f: with fertile translations)
      Resources            C -f   C +f   P -f   P +f   OQ -f   OQ +f
      Gen.+Morph.          .06    .13    .80    .35    .05     .05
      Gen.+Morph.+S        .08    .16    .69    .31    .05     .05
      Gen.+Morph.+M        .12    .22    .40    .23    .05     .05
      Gen.+Morph.+D        .17    .26    .65    .39    .11     .10
      Gen.+Morph.+SMD      .24    .36    .43    .27    .10     .10
      avg. gain (points)   +9.2          -28.4          -0.2
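The avg. gain row of both results slides can be reproduced from the table cells: it matches the mean, over the five resource configurations, of the +f value minus the -f value, expressed in percentage points. A short check using only the values shown on the two slides (the dictionary layout and the `avg_gain` name are choices of this sketch):

```python
# (-f, +f) pairs per measure, in the order Gen.+Morph., +S, +M, +D, +SMD.
EN_FR = {
    "C": [(.04, .12), (.05, .15), (.11, .23), (.16, .26), (.24, .39)],
    "P": [(.81, .57), (.69, .50), (.20, .28), (.70, .60), (.31, .33)],
    "OQ": [(.03, .07), (.03, .08), (.02, .06), (.11, .16), (.07, .13)],
}
EN_DE = {
    "C": [(.06, .13), (.08, .16), (.12, .22), (.17, .26), (.24, .36)],
    "P": [(.80, .35), (.69, .31), (.40, .23), (.65, .39), (.43, .27)],
    "OQ": [(.05, .05), (.05, .05), (.05, .05), (.11, .10), (.10, .10)],
}


def avg_gain(pairs):
    """Mean difference (+f minus -f) across configurations, in points."""
    return 100 * sum(plus - minus for minus, plus in pairs) / len(pairs)


for name, table in (("EN-FR", EN_FR), ("EN-DE", EN_DE)):
    print(name, {m: round(avg_gain(p), 1) for m, p in table.items()})
# EN-FR {'C': 11.0, 'P': -8.6, 'OQ': 4.8}
# EN-DE {'C': 9.2, 'P': -28.4, 'OQ': -0.2}
```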
  • 121. Discussion: English-French vs. English-German results
    - The English-German corpus is much less comparable (0.45 vs. 0.71)
    - Morphological types:
      - German, a Germanic language, tends toward agglutination: oestrogen-independent → Östrogen-unabhängige
      - French, a Romance language, creates phrases more easily: oestrogen-independent → indépendant des œstrogènes
  • 126. Error analysis
    - Problems in word reordering: self-examination → untersuchung selbst ‘examination self’
    - Wrong or inappropriate translations: in-patient → pas malade ‘not ill’. The prefix in- can mean “inside” (→ inside patient, the intended reading) or have an “inverse”, negating sense (→ not a patient), and the negating sense was used.
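Both error types can be reproduced with a toy version of permutation-based generation. This is a hedged sketch, not the authors' implementation. It assumes that each source component is translated through a dictionary, that every ordering of the translated components is generated, and that only candidates attested in the target corpus are kept. The dictionaries, the "corpora" and the `compose` function are all hypothetical.

```python
from itertools import permutations, product

# Hypothetical component dictionaries (source component -> target candidates).
DE_COMPONENTS = {"self": ["selbst"], "examination": ["untersuchung"]}
FR_COMPONENTS = {"in": ["intérieur", "pas"], "patient": ["malade"]}  # ambiguous in-

# Hypothetical sets of word sequences attested in the target corpora.
DE_CORPUS = {"untersuchung selbst"}
FR_CORPUS = {"pas malade"}


def compose(parts, dictionary, corpus):
    """Translate each part, try every word order, keep corpus-attested candidates."""
    candidates = set()
    for choice in product(*(dictionary[part] for part in parts)):
        for order in permutations(choice):
            phrase = " ".join(order)
            if phrase in corpus:
                candidates.add(phrase)
    return candidates


print(compose(["self", "examination"], DE_COMPONENTS, DE_CORPUS))
print(compose(["in", "patient"], FR_COMPONENTS, FR_CORPUS))
```

In this toy setting, German text can contain "untersuchung selbst" ("the examination itself"), so a corpus filter alone cannot reject the wrong order. Likewise, once the negating sense of in- is in the dictionary, "pas malade" is a fluent phrase that passes the filter.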
  • 127. Outline 1 Context and original problem 2 Compositional translation framework 3 Detailed translation method 4 Experiments and results 5 Future work
  • 133. Future work
    - Improve the quality of the linguistic resources:
      - morphological derivation rules instead of stemming
      - use of a thesaurus
    - Try translation patterns instead of permutations
    - Rank translations
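One possible reading of "translation patterns" is a per-language template that fixes the target order of a (modifier, head) pair instead of trying every permutation. This is an assumption, because the slides do not define the term. The hypothetical sketch below encodes the German compounding and French phrase-building tendencies noted in the discussion slide. `PATTERNS` and `apply_patterns` are illustrative names only.

```python
from typing import Dict, List

# Hypothetical templates for a two-component source term (modifier, head).
PATTERNS: Dict[str, List[str]] = {
    "de": ["{mod}{head}"],       # German: closed compound (agglutination)
    "fr": ["{head} de {mod}"],   # French: head noun + "de" + modifier
}


def apply_patterns(mod: str, head: str, lang: str) -> List[str]:
    """Instantiate every template for `lang`; word order is fixed, not permuted."""
    return [pattern.format(mod=mod, head=head) for pattern in PATTERNS[lang]]


print(apply_patterns("selbst", "untersuchung", "de"))   # ['selbstuntersuchung']
print(apply_patterns("lait", "production", "fr"))       # ['production de lait']
```

A template produces fewer candidates than permutations, so the misordered "untersuchung selbst" would not be generated. The cost is that templates must be written or learned for each language pair.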
  • 134. Thank you for your attention.
    estelle.delpech@univ-nantes.fr, beatrice.daille@univ-nantes.fr, emmanuel.morin@univ-nantes.fr, cl@lingua-et-machina.com
  • 136. Exact translations
    Non-fertile: pathophysiological → physiopathologique; overactive → überaktiv
    Fertile: cardiotoxicity → toxicité cardiaque ‘cardiac toxicity’; mastectomy → ablation der brust ‘ablation of the breast’
  • 137. Morphological variants
    Non-fertile: dosimetry → dosimétrique ‘dosimetric’; radiosensitivity → strahlenempfindlich ‘radiosensitive’
    Fertile: milk-producing → production de lait ‘production of milk’; self-examination → selbst untersuchen ‘self examine’
  • 138. Inexact but semantically related
    Non-fertile: oncogene → oncogénèse ‘oncogenesis’; breakthrough → durchbrechen ‘break’
    Fertile: chemoradiotherapy → chemotherapie oder strahlen ‘chemotherapy or radiation’; treatable → pouvoir le traiter ‘can treat it’
  • 139. Wrong translations
    Non-fertile: immunoscore → immunomarquer ‘immunostain’; check-in → unkontrollieren ‘uncontrolled’
    Fertile: bloodstream → fliessen mehr blut ‘more blood flow’; risk-reducing → risque de réduire ‘risk of reducing’
  • 140. References I
    Bo, L. and Gaussier, E. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.
    Cartoni, B. (2009). Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.
    Harastani, R., Daille, B., and Morin, E. (2012). Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.
    Hauer, B. and Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.
    Keenan, E. L. and Faltz, L. M. (1985). Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.
    Morin, E. and Daille, B. (2009). Compositionality and lexical alignment of multi-word terms. In P. Rayson, S. Piao, S. Sharoff, S. Evert, and B. Villada Moirón (eds.), Multiword expression: hard going or plain sailing, Language Resources and Evaluation (LRE), volume 44, pages 79–95. Springer Netherlands.
    Namer, F. and Baud, R. (2007). Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–233.
  • 141. References II
    Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.
    Weller, M., Gojun, A., Heid, U., Daille, B., and Harastani, R. (2011). Simple methods for dealing with term variation and term alignment. In Proceedings of the 9th International Conference on Terminology and Artificial Intelligence, pages 87–93, Paris, France.