R&D Lingua et Machina

Material of the 4th Intensive Summer school and collaborative workshop on Natural Language Processing (NAIST Franco-Thai Workshop 2010).
Bangkok, Thailand.
Institution: Institut de Recherche en Informatique de Toulouse (IRIT), Lingua et Machina

  1. Franco-Thai Workshop 2010: Lingua et Machina, Research & Development
  2. About me
     ● Estelle Delpech
     ● Research engineer at Lingua et Machina, France
       ● CAT tools provider
       ● ed(at)lingua-et-machina(dot)com
       ● www.lingua-et-machina.com
     ● Ph.D. candidate at LINA, France
       ● TALN team: specialises in NLP
       ● estelle.delpech(at)univ-nantes(dot)fr
  3. LINGUA ET MACHINA
     ● French company
     ● Founded by Dr E. Planas
     ● Led by Dr F. De Colstoun
     ● Small but innovative
       ● 8 people
       ● 2 R&D engineers / Ph.D. candidates
         ● NLP
         ● Computational linguistics
         ● Translation studies
  4. LINGUA ET MACHINA
     ● 2002: SIMILIS
       ● Second-generation translation memories
       ● Based on Ph.D. work
     ● 2007: LIBELLEX
       ● Access to TM for non-professionals
       ● Translation and terminology management platform
  5. They trust us
  6. Partners
  7. SIMILIS
     ● Computer-aided translation
       ● Freelance translators
       ● Translation agencies
     ● Translation memories
       ● Pre-translations
     ● Terminology extraction
     ● 7 languages: FR, EN, IT, ES, PT, DE, NL → rule-based
  8. Similis
  9. SIMILIS technology
     ● Based on the Ph.D. work of E. Planas
     ● First-generation translation memory
       ● Works with segments (sentences)
     ● Second-generation translation memory
       ● Works with chunks
       ● [the driver] [steps] [on the gas pedal]
     ● Chunking
       ● Rules written by linguists
     ● Fuzzy matching
       ● Modified edit distance
       ● Several linguistic levels
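The fuzzy matching on chunks mentioned in slide 9 can be illustrated with a plain edit distance computed over chunk sequences instead of characters. This is only a minimal sketch: the slide does not say how the SIMILIS edit distance is modified or how the linguistic levels are combined, so the unit costs, the `fuzzy_score` normalisation, and the second example sentence are assumptions made for illustration.

```python
from typing import Hashable, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Standard Levenshtein distance over two sequences of items.

    The items can be characters, words, or whole chunks; every
    insertion, deletion, and substitution costs 1 (an assumption).
    """
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # delete x
                            curr[j - 1] + 1,      # insert y
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def fuzzy_score(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Similarity in [0, 1]: 1.0 means the two sequences are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


# Chunked segments: the first is the slide's example, the second is hypothetical.
query_chunks = ["the driver", "steps", "on the gas pedal"]
stored_chunks = ["the driver", "steps", "on the brake pedal"]

chunk_score = fuzzy_score(query_chunks, stored_chunks)  # 1 substitution out of 3 chunks
word_score = fuzzy_score(" ".join(query_chunks).split(),
                         " ".join(stored_chunks).split())  # 1 substitution out of 7 words
```

Comparing at chunk level makes it possible to reuse the two matching chunks as they are and to flag only the third one for the translator, which is the kind of sub-sentence reuse a chunk-based memory aims for.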
  10. From SIMILIS to LIBELLEX
     [Diagram. Labels: Source Text, French Documents, English Documents, Memory (TMX), Glossary, (lexicon), Translated Text, Moderator, Translators, Linguists, Business Experts]
  11. LIBELLEX
     ● Translation memories meet corporate content management
     ● Target: global companies
       ● Many languages
         ● Customers
         ● Partners
         ● Employees
       ● Speakers
         ● Non-native
         ● Not language professionals
       ● Terminology and translation needs
         ● Official documentation
         ● Day-to-day internal communication
  12. Libellex
     ● Terminology management platform
       ● Builds corporate TM
       ● Extracts / checks terminology
       ● Helps employees communicate
     ● Translation management platform
       ● Manages translation jobs
       ● Terminologies for translation agencies
       ● Chunk matches for MT
  13. Libellex
     ● Look up a word, a term, an expression
     ● Manage terminology
     ● Have a document translated
     ● Check translations
     ● Check text
     ● Add new documents
  14. R-D-I at Lingua et Machina
     ● Ongoing
       ● Statistical term extraction
         ● "Cheap and quick" addition of new languages
         ● Consider hybridization with rule-based methods
       ● Term alignment in comparable corpora
       ● Model the translation process
     ● Planned
       ● Development of rule-based chunking for Chinese
       ● Extraction of "knowledge-rich contexts" for terminologies
  15. Research partnerships
     ● Statistical term extraction and alignment
       ● A. Lardilleux, Y. Lepage (Caen/Waseda)
     ● Chinese processing
       ● EDF, Kinep
     ● Comparable corpora
       ● National project + Ph.D. candidate
     ● KRC extraction
       ● European project submission
     ● Translation studies
       ● Ph.D. candidate: Stendhal University
  16. Statistical term extraction and alignment
     ● Algorithm developed by A. Lardilleux in a Ph.D. thesis
       ● http://users.info.unicaen.fr/~alardill/
     ● Uses "perfect alignments"
       ● Source and target words that only occur in the same source and target sentences
       ● Example: adf ↔ AD; b ↔ BE; b ↔ CF; a e ↔ AE; d D
     ● Randomly builds small samples of corpus
       ● Perfect alignments add up
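The notion of "perfect alignment" in slide 16 (source and target words that occur in exactly the same sentence pairs) can be sketched as follows. This is an illustrative reading of the slide's definition, not A. Lardilleux's implementation: the function name, the whitespace tokenisation, and the toy French-English corpus are assumptions, and the random sub-corpus sampling over which the alignments add up is left out.

```python
from collections import defaultdict


def perfect_alignments(sentence_pairs):
    """Group source and target words that share the same occurrence profile.

    A profile is the set of sentence-pair indices in which a word occurs.
    Words on both sides with an identical profile are reported together.
    """
    src_occ = defaultdict(set)
    tgt_occ = defaultdict(set)
    for idx, (src, tgt) in enumerate(sentence_pairs):
        for word in src.split():
            src_occ[word].add(idx)
        for word in tgt.split():
            tgt_occ[word].add(idx)

    by_profile = defaultdict(lambda: ([], []))
    for word, occ in src_occ.items():
        by_profile[frozenset(occ)][0].append(word)
    for word, occ in tgt_occ.items():
        by_profile[frozenset(occ)][1].append(word)

    # Keep only profiles that have words on both the source and target side.
    return [(tuple(sorted(src)), tuple(sorted(tgt)))
            for src, tgt in by_profile.values() if src and tgt]


# Hypothetical toy corpus, not taken from the slides.
toy_corpus = [
    ("le chat dort", "the cat sleeps"),
    ("le chien dort", "the dog sleeps"),
    ("le chat mange", "the cat eats"),
]
alignments = perfect_alignments(toy_corpus)
```

On a full corpus few words share an identical profile, which is presumably why the slide mentions small random samples: in a small sample many more words become perfectly aligned, and the counts can be accumulated across samples.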
  17. Chinese and other languages
     ● Chinese processing
       ● EDF uses Libellex
       ● Needs ZH ↔ FR and ZH ↔ EN translation
     ● Currently:
       ● Statistical term alignment and extraction
     ● Planned:
       ● Chinese chunking rules
       ● Develop hybrid statistical/rule-based chunk alignment
     ● Other languages:
       ● Asian
       ● Northern European
       ● Eastern European
  18. Metricc project
     ● Scope: national
     ● Bilingual terminology mining from comparable corpora
       ● CAT
       ● Translation memories
       ● CLIR
     ● Partners
       ● Syllabs, Sinéqua, LM
       ● IMAG, Valoria
     ● http://www.metricc.com
  19. Metricc: term alignment in comparable corpora
     ● Based on the distributional hypothesis
       ● Words that appear in similar contexts have similar meanings
     ● Represent the context of a word as a vector:
       ● Co-occurring words + normalized frequencies
     ● Translate the context vector with a seed lexicon
     ● Compute the distance between source and target vectors
     ● The closer, the better
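The four steps of slide 19 (build a context vector, translate it with a seed lexicon, compare it with target vectors, keep the closest) can be sketched as below. All specifics are assumptions made for the example: the window size, the relative-frequency normalisation, the use of cosine similarity as the closeness measure, and the toy sentences and seed lexicon. The slide does not state which normalisation or distance the Metricc project uses.

```python
import math
from collections import Counter


def context_vector(word, sentences, window=2):
    """Relative frequencies of the words found within `window` tokens of `word`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i, token in enumerate(tokens):
            if token != word:
                continue
            start = max(0, i - window)
            stop = min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[tokens[j]] += 1
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def translate_vector(vector, seed_lexicon):
    """Map each context word to its seed translation; unknown words are dropped."""
    translated = {}
    for w, value in vector.items():
        if w in seed_lexicon:
            target = seed_lexicon[w]
            translated[target] = translated.get(target, 0.0) + value
    return translated


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v.get(key, 0.0) for key, value in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def rank_candidates(src_word, src_sents, tgt_sents, candidates, seed_lexicon):
    """Order target candidates from most to least similar context."""
    src_vec = translate_vector(context_vector(src_word, src_sents), seed_lexicon)
    scored = [(cosine(src_vec, context_vector(c, tgt_sents)), c) for c in candidates]
    return [c for _, c in sorted(scored, key=lambda sc: sc[0], reverse=True)]


# Hypothetical comparable corpora and seed lexicon.
french = ["le docteur soigne le patient", "le docteur examine le patient"]
english = ["the doctor treats the patient", "the doctor examines the patient",
           "the driver starts the car"]
seed = {"le": "the", "soigne": "treats", "examine": "examines", "patient": "patient"}

ranking = rank_candidates("docteur", french, english, ["driver", "doctor"], seed)
```

In practice the quality of the ranking depends heavily on the coverage of the seed lexicon, since context words without a seed translation contribute nothing to the comparison.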
  20. Knowledge-Rich Contexts Extraction
     ● Project under submission
     ● Scope: European
     ● Partners:
       ● Inbenta, BEO
       ● Ljubljana University, LINA
     ● Knowledge-rich contexts
       ● Help understand the term
       ● Indicate how to use the term
  21. Knowledge-Rich Contexts Extraction
     ● Examples of KRC:
       ● Contains a definition
       ● Describes a relation between two terms
       ● Indicates a collocation
       ● Illustrates the term
     ● KRC linguistic description
       ● Examples, definitions in dictionaries
       ● Corpus study
     ● KRC automatic identification
       ● Morpho-syntactic patterns
       ● Statistical clues
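As a rough illustration of pattern-based KRC identification (slide 21), the sketch below flags sentences in which a term is followed by a definitional or relational cue. It deliberately swaps the morpho-syntactic patterns named on the slide for simple lexical regular expressions, since no tagger output is assumed here; the cue phrases, the labels, and the example sentences are all hypothetical.

```python
import re


def find_krcs(term, sentences):
    """Return (label, sentence) pairs for sentences matching a cue pattern.

    Purely lexical patterns stand in for the morpho-syntactic patterns
    mentioned on the slide; relation cues are tested before the more
    general definition cue.
    """
    t = re.escape(term)
    patterns = {
        "relation": re.compile(
            rf"\b{t}\b\s+(?:is\s+a\s+kind\s+of|is\s+part\s+of|consists\s+of)\b",
            re.IGNORECASE),
        "definition": re.compile(
            rf"\b{t}\b\s+(?:is|are)\s+(?:a|an|the)\b",
            re.IGNORECASE),
    }
    hits = []
    for sentence in sentences:
        for label, pattern in patterns.items():
            if pattern.search(sentence):
                hits.append((label, sentence))
                break  # one label per sentence is enough for this sketch
    return hits


# Hypothetical example sentences.
examples = [
    "A translation memory is a database of previously translated segments.",
    "A translation memory consists of aligned source and target segments.",
    "We bought a translation memory last year.",
]
krcs = find_krcs("translation memory", examples)
```

The third sentence mentions the term but says nothing about it, so it is not returned; separating such knowledge-poor contexts from useful ones is the point of the task. Statistical clues, the second approach listed on the slide, are not covered by this sketch.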
  22. Modeling of the translation process
     ● Research engineer / Ph.D. thesis
       ● Department of Translation Studies
       ● Université Stendhal, Grenoble
     ● How do we translate?
     ● What knowledge is helpful to translators?
     ● What is a good translation?
     ● Do non-professionals translate differently?
     ● How do you improve software usability?
  23. More information
     ● Lingua et Machina
       ● www.lingua-et-machina.com/
       ● contact(a)lingua-et-machina.com
     ● Libellex
       ● http://libellex.fr/
     ● Download Similis
       ● http://similis.org/Download/SimilisFreelance-2.16.04-Setup.exe
  24. Franco-Thai Workshop 2010: Thank you
     ● ed(a)lingua-et-machina.com
