DeepMiner: Integrating Translation Memories and Machine Translation. TEKOM, October 25th, 2012. Presenter: Daniel Benito
Presentation at TEKOM, October 25th, 2012

DeepMiner - Advanced Leveraging : Integrating Translation Memories and Machine Translation

  1. DeepMiner: Integrating Translation Memories and Machine Translation
     TEKOM, October 25th, 2012
     Presenter: Daniel Benito
  2. Introduction
     • History
     • Limitations of Translation Memory
     • Beyond Segment-Level Reuse
       – Machine Translation
       – Fuzzy Match Repair
       – Advanced Leveraging
       – Combining TM and MT
     • Current Limitations
     • Perspectives
     • Conclusion
  3. History
     • Past:
       – 1950s: Early Machine Translation (MT) experiments
       – 1960s: General awareness that MT was not going to replace human translators
       – 1970s: First proposals for Translator Workstations
       – 1990s: Translation Memory (TM) became viable
     • Present:
       – TM technology has barely advanced in the last ten years
       – MT has advanced to the point where its applications in the translation industry are incontrovertible
  4. Limitations of Translation Memory
     • Segment-level translation reuse is only useful in limited cases
     • Even in highly repetitive texts, most of the repetitions happen at the sub-segment level:
       – Terms and phrases
       – Sentence structure
     • Most Translation Memory systems are limited to providing fuzzy matches but are unable to exploit sub-segment repetition
  5. Beyond Segment-level Reuse
     • We need to translate:
         EN: The black cat usually sleeps in the hallway.
     • Our TM contains:
         EN: The grey cat usually sleeps in the living room.
         DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
     • What can we do to reduce the time spent editing fuzzy matches?
       – Ignore the fuzzy matches and use MT
       – Automatically repair the fuzzy matches
  6. Machine Translation
     • We need to translate:
         EN: The black cat usually sleeps in the hallway.
     • Results returned by various MT systems:
         DE: Die schwarze Katze in der Regel schläft im Flur.
         DE: Die schwarze Katze schläft normalerweise im Flur.
     • Achieving consistency and using specific terminology (e.g. Gang instead of Flur) will require some degree of training or post-editing
  7. Machine Translation
     • General-purpose MT engines such as Google Translate or Microsoft Translator usually require extensive post-editing, but can be used for inspiration
     • Rule-based and statistical MT engines customized for specific domains offer much higher quality but require expensive tuning or retraining
     • It is usually more expensive to use MT than to manually edit a fuzzy match
  8. Fuzzy Match Repair
     • Inspired by the translation by analogy concept from Example-Based Machine Translation (EBMT)
     • Attempts to maintain the quality and consistency of existing translations in the TM while increasing productivity
  9. Fuzzy Match Repair
     • We need to translate:
         EN: The black cat usually sleeps in the hallway.
     • Our TM contains:
         EN: The grey cat usually sleeps in the living room.
         DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
     • We can replace graue with schwarze and Wohnzimmer with Gang to produce an exact match.
  10. Fuzzy Match Repair
     • Requires knowing the following translations:
         grey → graue
         black → schwarze
         living room → Wohnzimmer
         hallway → Gang
     • What do we do if those translations are not explicitly in our TMs or termbases?
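
The repair described on the Fuzzy Match Repair slides can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions, not how DeepMiner is implemented: the function names `tokenize` and `repair_fuzzy_match` are hypothetical, the lexicon is the four word pairs from the slide, and only one-for-one replacements are handled (insertions and deletions make the function give up).

```python
import difflib
import re

def tokenize(text):
    """Split a segment into word tokens and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)

def repair_fuzzy_match(new_src, tm_src, tm_tgt, lexicon):
    """Patch tm_tgt so that it translates new_src instead of tm_src.

    Returns the repaired target, or None when a required translation is
    missing or the difference is not a plain replacement.
    """
    old_tok, new_tok = tokenize(tm_src), tokenize(new_src)
    matcher = difflib.SequenceMatcher(a=old_tok, b=new_tok, autojunk=False)
    repaired = tm_tgt
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag != "replace":
            return None  # insertions and deletions are out of scope here
        old_tgt = lexicon.get(" ".join(old_tok[i1:i2]))
        new_tgt = lexicon.get(" ".join(new_tok[j1:j2]))
        if old_tgt is None or new_tgt is None or old_tgt not in repaired:
            return None  # we cannot locate or translate the changed part
        repaired = repaired.replace(old_tgt, new_tgt, 1)
    return repaired

# Word pairs listed on the slide.
LEXICON = {
    "grey": "graue",
    "black": "schwarze",
    "living room": "Wohnzimmer",
    "hallway": "Gang",
}

result = repair_fuzzy_match(
    "The black cat usually sleeps in the hallway.",
    "The grey cat usually sleeps in the living room.",
    "Die graue Katze schläft gewöhnlich im Wohnzimmer.",
    LEXICON,
)
print(result)  # Die schwarze Katze schläft gewöhnlich im Gang.
```

If any of the four pairs is removed from the lexicon, the function returns `None`, which is exactly the situation the last bullet of the slide asks about.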
  11. Advanced Leveraging
     • Bilingual concordance search:
         EN: The grey cat usually sleeps in the living room.
         DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
         EN: Mary has bought a new pair of grey running shoes.
         DE: Maria hat ein neues Paar graue Laufschuhe gekauft.
         EN: This article is also available in grey.
         DE: Dieser Artikel ist auch in grau erhältlich.
  12. Advanced Leveraging
     • Statistically infer translations from the TM
     • Compare all of the German translations and suggest one or more probable translations (e.g. graue, grau)
     • Requires:
       – Large TMs with many examples
       – Consistent translations in the TM
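
One simple way to "statistically infer" a translation from concordance hits is a co-occurrence score. The sketch below uses the Dice coefficient, which is an assumption chosen for illustration; the slides do not say which statistic Advanced Leveraging actually uses. The names `infer_translations` and `tokens` are hypothetical, and the TM holds only the three segment pairs from the concordance slide.

```python
import re
from collections import Counter

# The three segment pairs shown on the concordance slide.
TM = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
]

def tokens(text):
    """Return the word tokens of a segment (punctuation is dropped)."""
    return re.findall(r"\w+", text)

def infer_translations(term, tm, top_n=3):
    """Rank target words by Dice co-occurrence with a source term."""
    term = term.lower()
    target_freq = Counter()  # segments in which each target word occurs
    cooc = Counter()         # ... restricted to segments containing term
    term_freq = 0
    for src, tgt in tm:
        tgt_words = set(tokens(tgt))
        target_freq.update(tgt_words)
        if term in (w.lower() for w in tokens(src)):
            term_freq += 1
            cooc.update(tgt_words)
    scored = [(w, 2 * c / (term_freq + target_freq[w]))
              for w, c in cooc.items()]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]

best_word, best_score = infer_translations("grey", TM)[0]
print(best_word, best_score)  # graue 0.8
```

With only three segments, graue stands out (score 0.8), but grau scores 0.5, the same as unrelated words such as Katze that happen to occur once. That tie is a small-scale illustration of why the slide lists large and consistent TMs as requirements.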
  13. Combining TM and MT
     • We can use MT as an additional resource for finding the translations needed to repair fuzzy matches
     • MT systems often give better results for terms and short phrases than for long sentences
     • We approach this combination based on the following premises:
       – A client’s own data is considered to be of higher quality and will always have priority over the Machine Translation results
       – A fuzzy match repaired with Machine Translation will usually be better than a normal fuzzy match, and better than an MT result for an entire segment
  14. Combining TM and MT
     • We need to translate:
         EN: The black cat usually sleeps in the hallway.
     • Our TM contains:
         EN: The grey cat usually sleeps in the living room.
         DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
     • Our termbase contains:
         EN: grey     DE: graue
         EN: black    DE: schwarze
         EN: hallway  DE: Gang
  15. Combining TM and MT
     • We do not have the translation for living room in our TM or our termbase, so we can request it from the MT system:
         EN: living room
         DE: Wohnzimmer
     • The combination of material in our TM, termbase and MT system allows us to perform the appropriate replacements and obtain:
         EN: The black cat usually sleeps in the hallway.
         DE: Die schwarze Katze schläft gewöhnlich im Gang.
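
The priority rule stated above (the client's own data first, MT only as a fallback) can be expressed as a lookup cascade. The sketch below is an assumption about how such a cascade might look, not a description of the product: `lookup_translation` is a hypothetical name, and `fake_mt` is a stand-in stub rather than a call to any real MT engine's API.

```python
def lookup_translation(term, tm_lexicon, termbase, mt_translate):
    """Return (translation, origin), preferring client data over MT.

    tm_lexicon and termbase are plain dicts; mt_translate is any callable
    that maps a source term to a translation or None.
    """
    for origin, resource in (("TM", tm_lexicon), ("termbase", termbase)):
        if term in resource:
            return resource[term], origin
    mt_result = mt_translate(term)
    return (mt_result, "MT") if mt_result else None

# Termbase entries from the slide; the TM-derived lexicon is empty here.
TERMBASE = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}

def fake_mt(term):
    """Stub standing in for an MT engine (hypothetical responses)."""
    return {"living room": "Wohnzimmer", "hallway": "Flur"}.get(term)

print(lookup_translation("living room", {}, TERMBASE, fake_mt))
# ('Wohnzimmer', 'MT')
print(lookup_translation("hallway", {}, TERMBASE, fake_mt))
# ('Gang', 'termbase')
```

The second call shows the point of the ordering: even though the stub would answer Flur, the termbase entry Gang wins, which keeps the client's preferred terminology.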
  16. Current Limitations
     • We need to translate:
         EN: The white dog usually sleeps in the living room.
     • Our TM contains:
         EN: The grey cat usually sleeps in the living room.
         DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
     • Our termbase contains:
         EN: grey cat
         DE: graue Katze
  17. Current Limitations
     • Asking the MT system for the missing translation, we get:
         EN: white dog
         DE: weißer Hund
     • The result of fixing the fuzzy match is:
         EN: The white dog usually sleeps in the living room.
         DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.
     • Some post-editing is still required
  18. Current Limitations
     • We need to translate:
         EN: The grey cat often sleeps in the living room.
     • Our TM contains:
         EN: The grey cat usually sleeps in the living room.
         DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
     • The translations we get from the MT system are:
         EN: usually  DE: normalerweise
         EN: often    DE: oft
     • We cannot repair the fuzzy match because we do not know how usually has been translated
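
Both limitations can be reproduced with a plain string substitution. The helper below is a hypothetical sketch (the name `substitute` is not from the slides): it replaces a known translation inside the TM target and returns `None` when that translation cannot be found there.

```python
import re

TM_TARGET = "Die graue Katze schläft gewöhnlich im Wohnzimmer."

def substitute(tm_tgt, old_translation, new_translation):
    """Replace old_translation in tm_tgt, or return None if it is absent."""
    pattern = rf"(?<!\w){re.escape(old_translation)}(?!\w)"
    if not re.search(pattern, tm_tgt):
        return None
    return re.sub(pattern, lambda _m: new_translation, tm_tgt, count=1)

# Limitation 1: the substitution succeeds, but the article no longer agrees.
print(substitute(TM_TARGET, "graue Katze", "weißer Hund"))
# Die weißer Hund schläft gewöhnlich im Wohnzimmer.

# Limitation 2: MT says usually = normalerweise, but the TM used gewöhnlich,
# so there is nothing to replace.
print(substitute(TM_TARGET, "normalerweise", "oft"))
# None
```

The first case needs knowledge of German grammar (Der weiße Hund), the second needs to know which of several valid translations the TM author picked. Neither is available from a single-best MT answer.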
  19. Future Developments
     • Greater integration with the MT engines
       – Access to internal translation candidates:
           • EN: usually
           • DE: normalerweise, gewöhnlich, sonst, ...
       – Access to internal language models:
           • DE: Die weißer Hund – never
           • DE: Der weiße Hund – often
       – Automatic upload of new TM material to the MT engine so it can be used for retraining in the future
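
To see why access to the MT engine's internal candidates would help, consider the usually/often example: if the engine exposed several candidates for usually, the repair step could pick whichever one actually occurs in the TM target. The sketch below assumes such a candidate list is available as a plain Python list; the function names are hypothetical and no real MT engine interface is implied.

```python
import re

def find_aligned_candidate(candidates, tm_tgt):
    """Return the first candidate that occurs as a whole word in tm_tgt."""
    for candidate in candidates:
        if re.search(rf"(?<!\w){re.escape(candidate)}(?!\w)", tm_tgt):
            return candidate
    return None

def repair_with_candidates(tm_tgt, old_candidates, new_translation):
    """Swap whichever old candidate the TM used for new_translation."""
    old = find_aligned_candidate(old_candidates, tm_tgt)
    if old is None:
        return None
    pattern = rf"(?<!\w){re.escape(old)}(?!\w)"
    return re.sub(pattern, lambda _m: new_translation, tm_tgt, count=1)

TM_TARGET = "Die graue Katze schläft gewöhnlich im Wohnzimmer."

# Candidates for "usually" as listed on the slide.
repaired = repair_with_candidates(
    TM_TARGET, ["normalerweise", "gewöhnlich", "sonst"], "oft")
print(repaired)  # Die graue Katze schläft oft im Wohnzimmer.
```

With only the single-best answer normalerweise, the same call returns `None`. The language-model bullet would be a further step (scoring Die weißer Hund against Der weiße Hund), which this sketch does not attempt.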
  20. Conclusion
     • Traditional segment-level translation reuse has reached its full potential
     • ATRIL’s Déjà Vu X2 already includes DeepMiner technology that improves productivity by cleverly combining all the approaches we described:
       – (Statistical) Machine Translation
       – Example-Based Machine Translation
       – Advanced Leveraging (sub-segment matching)
  21. Questions?
  22. Additional Topics
  23. Predictive Typing
     • Find all sub-segment matches and offer them to the translator as he or she types
     • Suggestions are context-sensitive, so there are never too many results to choose from
     • Translations are constructed piece by piece from previous texts, guided by the translator
  24. Advanced Predictive Typing
     • Advanced Leveraging techniques for statistically inferring sub-segment translations from the TM can be adapted to provide additional predictive typing suggestions
     • Translations from MT can be added to the predictive typing mechanism, to offer additional suggestions for translations of terms and phrases
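
A minimal sketch of context-sensitive predictive typing, under stated assumptions: the phrase table is a hand-built dictionary using word pairs from the earlier examples, "context-sensitive" is interpreted as "only offer translations of phrases that occur in the current source segment", and `suggest` is a hypothetical name. In a real tool the table would be filled from the TM, the termbase, Advanced Leveraging and MT.

```python
import re

# Sub-segment translations gathered from the earlier examples.
PHRASE_TABLE = {
    "grey": ["graue", "grau"],
    "black": ["schwarze"],
    "cat": ["Katze"],
    "hallway": ["Gang"],
    "living room": ["Wohnzimmer"],
}

def suggest(typed, source_segment, phrase_table, limit=5):
    """Suggest completions for the word currently being typed.

    Only translations of phrases present in source_segment are offered,
    which keeps the list short.
    """
    if not typed or typed[-1].isspace():
        return []  # no partial word to complete yet
    prefix = typed.split()[-1].casefold()
    suggestions = []
    for src_phrase, translations in phrase_table.items():
        pattern = rf"(?<!\w){re.escape(src_phrase)}(?!\w)"
        if not re.search(pattern, source_segment, re.IGNORECASE):
            continue  # phrase is not in this segment: out of context
        for translation in translations:
            if (translation.casefold().startswith(prefix)
                    and translation not in suggestions):
                suggestions.append(translation)
    return sorted(suggestions)[:limit]

SOURCE = "The black cat usually sleeps in the hallway."
print(suggest("Die sch", SOURCE, PHRASE_TABLE))  # ['schwarze']
print(suggest("Die g", SOURCE, PHRASE_TABLE))    # ['Gang']
```

Typing "g" offers Gang but not graue or grau, because grey does not occur in the source segment; that filtering is what keeps the number of suggestions manageable.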
  25. MT integrations in Déjà Vu X2
     • Systran Enterprise Server
     • Google Translate
     • Microsoft Translator
     • PROMT Translation Server
     • itranslate4eu
  26. Systran Enterprise Server
  27. Google Translate
  28. Microsoft Translator
  29. PROMT Translation Server
  30. itranslate4eu
