DeepMiner - Advanced Leveraging: Integrating Translation Memories and Machine Translation
Presentation at TEKOM, October 25th, 2012

Transcript

  • 1. DeepMiner: Integrating Translation Memories and Machine Translation
    TEKOM, October 25th, 2012
    Presenter: Daniel Benito
  • 2. Introduction
    • History
    • Limitations of Translation Memory
    • Beyond Segment-Level Reuse
      – Machine Translation
      – Fuzzy Match Repair
      – Advanced Leveraging
      – Combining TM and MT
    • Current Limitations
    • Perspectives
    • Conclusion
  • 3. History
    • Past:
      – 1950s: early Machine Translation (MT) experiments
      – 1960s: general awareness that MT was not going to replace human translators
      – 1970s: first proposals for translator workstations
      – 1990s: Translation Memory (TM) became viable
    • Present:
      – TM technology has barely advanced in the last ten years
      – MT has advanced to the point where its usefulness in the translation industry is beyond dispute
  • 4. Limitations of Translation Memory
    • Segment-level translation reuse is useful only in a limited number of cases
    • Even in highly repetitive texts, most repetition happens at the sub-segment level:
      – Terms and phrases
      – Sentence structure
    • Most Translation Memory systems can only provide fuzzy matches and are unable to exploit sub-segment repetition
  • 5. Beyond Segment-Level Reuse
    • We need to translate:
      EN: The black cat usually sleeps in the hallway.
    • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
    • What can we do to reduce the time spent editing fuzzy matches?
      – Ignore the fuzzy matches and use MT
      – Automatically repair the fuzzy matches
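Slide 5 calls the TM entry a fuzzy match without saying how the match percentage is computed. The sketch below assumes one common approach, a word-level Levenshtein distance normalized by the length of the longer segment. The function names are hypothetical, and this is not necessarily how Déjà Vu X2 scores matches.

```python
def tokenize(text: str) -> list[str]:
    # Split on whitespace, lowercase, and strip surrounding punctuation
    return [tok.strip(".,;:!?").lower() for tok in text.split()]


def word_edit_distance(a: list[str], b: list[str]) -> int:
    # Levenshtein distance computed over words instead of characters
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1,          # delete word_a
                            curr[j - 1] + 1,      # insert word_b
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def fuzzy_score(new_source: str, tm_source: str) -> float:
    # 1.0 means an exact match; lower values mean more editing is needed
    a, b = tokenize(new_source), tokenize(tm_source)
    longest = max(len(a), len(b)) or 1
    return 1.0 - word_edit_distance(a, b) / longest


score = fuzzy_score("The black cat usually sleeps in the hallway.",
                    "The grey cat usually sleeps in the living room.")
print(f"{score:.0%}")  # 67%: three word edits over nine words
```

The score says how much of the TM segment survives, but not which words differ; repairing the match needs that alignment as well.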
  • 6. Machine Translation
    • We need to translate:
      EN: The black cat usually sleeps in the hallway.
    • Results returned by various MT systems:
      DE: Die schwarze Katze in der Regel schläft im Flur.
      DE: Die schwarze Katze schläft normalerweise im Flur.
    • Achieving consistency and using specific terminology (e.g. Gang instead of Flur) will require some degree of training or post-editing
  • 7. Machine Translation
    • General-purpose MT engines such as Google Translate or Microsoft Translator usually require extensive post-editing, but can be used for inspiration
    • Rule-based and statistical MT engines customized for specific domains offer much higher quality but require expensive tuning or retraining
    • Using MT is usually more expensive than manually editing a fuzzy match
  • 8. Fuzzy Match Repair
    • Inspired by the translation-by-analogy concept from Example-Based Machine Translation (EBMT)
    • Attempts to maintain the quality and consistency of existing translations in the TM while increasing productivity
  • 9. Fuzzy Match Repair
    • We need to translate:
      EN: The black cat usually sleeps in the hallway.
    • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
    • We can replace graue with schwarze and Wohnzimmer with Gang to produce an exact match.
  • 10. Fuzzy Match Repair
    • Requires knowing the following translations:
      grey → graue
      black → schwarze
      living room → Wohnzimmer
      hallway → Gang
    • What do we do if those translations are not explicitly in our TMs or termbases?
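Slides 9 and 10 describe repair as aligning the new source with the TM source and then swapping the translations of the words that differ. The sketch below assumes Python's `difflib` for the word alignment and plain substring replacement in the target; both are simplifications chosen for illustration, not DeepMiner's actual method, and `repair_fuzzy_match` is a hypothetical name.

```python
from difflib import SequenceMatcher
from typing import Optional


def repair_fuzzy_match(new_source: str, tm_source: str, tm_target: str,
                       translations: dict[str, str]) -> Optional[str]:
    """Patch tm_target so that it translates new_source, or return None if we cannot."""
    new_words = new_source.rstrip(".").split()
    tm_words = tm_source.rstrip(".").split()
    repaired = tm_target
    # Align the two source segments word by word
    matcher = SequenceMatcher(a=tm_words, b=new_words, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        if tag != "replace":
            return None  # insertions and deletions are out of scope for this sketch
        old_span = " ".join(tm_words[i1:i2])   # e.g. "living room"
        new_span = " ".join(new_words[j1:j2])  # e.g. "hallway"
        old_tr = translations.get(old_span)
        new_tr = translations.get(new_span)
        # Both translations must be known, and the old one must occur in the TM target
        if old_tr is None or new_tr is None or old_tr not in repaired:
            return None
        repaired = repaired.replace(old_tr, new_tr, 1)
    return repaired


known = {"grey": "graue", "black": "schwarze",
         "living room": "Wohnzimmer", "hallway": "Gang"}
result = repair_fuzzy_match("The black cat usually sleeps in the hallway.",
                            "The grey cat usually sleeps in the living room.",
                            "Die graue Katze schläft gewöhnlich im Wohnzimmer.",
                            known)
print(result)  # Die schwarze Katze schläft gewöhnlich im Gang.
```

Plain substring replacement is fragile: a short translation could match inside a longer word, so a production system would need word-level alignment on the target side too. Returning `None` when a translation is missing is exactly the gap the question on slide 10 points to.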
  • 11. Advanced Leveraging
    • Bilingual concordance search:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
      EN: Mary has bought a new pair of grey running shoes.
      DE: Maria hat ein neues Paar graue Laufschuhe gekauft.
      EN: This article is also available in grey.
      DE: Dieser Artikel ist auch in grau erhältlich.
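A bilingual concordance search like the one on slide 11 can be approximated with a whole-word, case-insensitive regular expression over the source side of the TM. The TM below reuses the slide's examples plus the repaired segment from slide 15; the greyhound entry is a hypothetical addition showing why whole-word matching matters.

```python
import re


def concordance(tm: list[tuple[str, str]], term: str) -> list[tuple[str, str]]:
    """Return the (source, target) pairs whose source contains term as a whole word."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [(src, tgt) for src, tgt in tm if pattern.search(src)]


TM = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
    ("The black cat usually sleeps in the hallway.",
     "Die schwarze Katze schläft gewöhnlich im Gang."),
    # Hypothetical extra entry: "greyhound" must not count as a hit for "grey"
    ("A greyhound is a fast dog.", "Ein Windhund ist ein schneller Hund."),
]

hits = concordance(TM, "grey")
for src, tgt in hits:
    print(src, "|", tgt)
```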
  • 12. Advanced Leveraging
    • Statistically infer translations from the TM
    • Compare all of the German translations and suggest one or more probable translations (e.g. graue, grau)
    • Requires:
      – Large TMs with many examples
      – Consistent translations in the TM
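The slide does not say which statistic DeepMiner uses to infer translations. As one plausible stand-in, the sketch below ranks German words by their Dice coefficient with the English term, 2·c(term, w) / (c(term) + c(w)), counting TM segments. The TM reuses the slide examples plus two extra entries (the greyhound one is hypothetical) so that words appearing outside "grey" segments are penalized.

```python
import re
from collections import Counter


def words(text: str) -> set[str]:
    # Lowercased set of words in a segment, punctuation dropped
    return set(re.findall(r"\w+", text.lower()))


def infer_translations(tm: list[tuple[str, str]], term: str) -> list[tuple[str, float]]:
    """Rank target words by Dice association with a source term across TM segments."""
    term = term.lower()
    term_count = 0
    joint = Counter()         # segments containing both the term and the target word
    target_count = Counter()  # segments containing the target word at all
    for src, tgt in tm:
        tgt_words = words(tgt)
        target_count.update(tgt_words)
        if term in words(src):
            term_count += 1
            joint.update(tgt_words)
    scores = {w: 2 * joint[w] / (term_count + target_count[w]) for w in joint}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


TM = [
    ("The grey cat usually sleeps in the living room.",
     "Die graue Katze schläft gewöhnlich im Wohnzimmer."),
    ("Mary has bought a new pair of grey running shoes.",
     "Maria hat ein neues Paar graue Laufschuhe gekauft."),
    ("This article is also available in grey.",
     "Dieser Artikel ist auch in grau erhältlich."),
    ("The black cat usually sleeps in the hallway.",
     "Die schwarze Katze schläft gewöhnlich im Gang."),
    ("A greyhound is a fast dog.", "Ein Windhund ist ein schneller Hund."),
]

ranked = infer_translations(TM, "grey")
print(ranked[:3])
```

With so few examples, graue wins clearly (0.8), but grau ties at 0.5 with incidental words such as wohnzimmer. Separating them is exactly why the slide asks for large TMs with consistent translations.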
  • 13. Combining TM and MT
    • We can use MT as an additional resource for finding the translations needed to repair fuzzy matches
    • MT systems often give better results for terms and short phrases than for long sentences
    • We approach this combination based on the following premises:
      – A client's own data is considered to be of higher quality and always has priority over Machine Translation results
      – A fuzzy match repaired with Machine Translation will usually be better than both an unrepaired fuzzy match and an MT result for the entire segment
  • 14. Combining TM and MT
    • We need to translate:
      EN: The black cat usually sleeps in the hallway.
    • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
    • Our termbase contains:
      EN: grey → DE: graue
      EN: black → DE: schwarze
      EN: hallway → DE: Gang
  • 15. Combining TM and MT
    • We do not have the translation for living room in our TM or our termbase, so we can request it from the MT system:
      EN: living room → DE: Wohnzimmer
    • The combination of material in our TM, termbase and MT system allows us to perform the appropriate replacements and obtain:
      EN: The black cat usually sleeps in the hallway.
      DE: Die schwarze Katze schläft gewöhnlich im Gang.
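The priority rule from slide 13 (client data first, MT only as a fallback) can be sketched as an ordered lookup. In the sketch, the MT engine is replaced by a hypothetical dictionary stub, because no real engine API is implied. Its "Flur" for hallway echoes the MT output on slide 6. The substitution pairs are written out by hand to keep the focus on lookup order.

```python
from typing import Callable, Optional

# Hypothetical stand-in for an MT engine call
MOCK_MT = {"living room": "Wohnzimmer", "hallway": "Flur"}


def mock_machine_translate(phrase: str) -> Optional[str]:
    return MOCK_MT.get(phrase)


def make_resolver(client_sources: list[tuple[str, dict[str, str]]],
                  machine_translate: Callable[[str], Optional[str]]):
    """Look a phrase up in the client's own resources, in order, before asking MT."""
    def resolve(phrase: str) -> tuple[Optional[str], str]:
        for name, table in client_sources:
            if phrase in table:
                return table[phrase], name
        return machine_translate(phrase), "MT"
    return resolve


def apply_repairs(tm_target: str, substitutions: list[tuple[str, str]], resolve) -> Optional[str]:
    """Swap the translation of each old source phrase for the translation of its replacement."""
    for old, new in substitutions:
        old_tr, _ = resolve(old)
        new_tr, _ = resolve(new)
        if old_tr is None or new_tr is None or old_tr not in tm_target:
            return None
        tm_target = tm_target.replace(old_tr, new_tr, 1)
    return tm_target


tm_phrases = {"grey": "graue"}  # e.g. inferred from the TM as on slide 12
termbase = {"grey": "graue", "black": "schwarze", "hallway": "Gang"}
resolve = make_resolver([("TM", tm_phrases), ("termbase", termbase)], mock_machine_translate)
print(resolve("hallway"))      # ('Gang', 'termbase'): client data beats MT's "Flur"
print(resolve("living room"))  # ('Wohnzimmer', 'MT')
repaired = apply_repairs("Die graue Katze schläft gewöhnlich im Wohnzimmer.",
                         [("grey", "black"), ("living room", "hallway")], resolve)
print(repaired)  # Die schwarze Katze schläft gewöhnlich im Gang.
```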
  • 16. Current Limitations
    • We need to translate:
      EN: The white dog usually sleeps in the living room.
    • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
    • Our termbase contains:
      EN: grey cat → DE: graue Katze
  • 17. Current Limitations
    • Asking the MT system for the missing translation, we get:
      EN: white dog → DE: weißer Hund
    • The result of repairing the fuzzy match is:
      EN: The white dog usually sleeps in the living room.
      DE: Die weißer Hund schläft gewöhnlich im Wohnzimmer.
    • Some post-editing is still required, because the article and adjective no longer agree with the noun (the correct form is Der weiße Hund)
  • 18. Current Limitations
    • We need to translate:
      EN: The grey cat often sleeps in the living room.
    • Our TM contains:
      EN: The grey cat usually sleeps in the living room.
      DE: Die graue Katze schläft gewöhnlich im Wohnzimmer.
    • The translations we get from the MT system are:
      EN: usually → DE: normalerweise
      EN: often → DE: oft
    • We cannot repair the fuzzy match because we do not know how usually has been translated: the TM uses gewöhnlich, but MT only offers normalerweise
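The failure on slide 18 comes down to one check: the old span's translation must actually occur in the TM target before it can be replaced. The sketch below, with the hypothetical helper `replace_span`, shows that a single-best MT answer fails this check. It also shows that a longer candidate list, like the internal candidates proposed on slide 19, would let the repair go through. The candidate list is taken from that slide; how an engine would expose it is not specified.

```python
from typing import Optional

TM_TARGET = "Die graue Katze schläft gewöhnlich im Wohnzimmer."


def replace_span(tm_target: str, old_candidates: list[str], new_translation: str) -> Optional[str]:
    """Replace the first candidate translation of the old source span found in tm_target."""
    for candidate in old_candidates:
        if candidate in tm_target:
            return tm_target.replace(candidate, new_translation, 1)
    return None  # we cannot tell how the old span was translated in the TM


# Single-best MT output for "usually", as on slide 18: the repair fails
single_best = replace_span(TM_TARGET, ["normalerweise"], "oft")
print(single_best)  # None

# Several candidates, as proposed on slide 19: the repair succeeds
n_best = replace_span(TM_TARGET, ["normalerweise", "gewöhnlich", "sonst"], "oft")
print(n_best)  # Die graue Katze schläft oft im Wohnzimmer.
```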
  • 19. Future Developments
    • Greater integration with the MT engines
      – Access to internal translation candidates:
        EN: usually
        DE: normalerweise, gewöhnlich, sonst, ...
      – Access to internal language models:
        DE: Die weißer Hund (never)
        DE: Der weiße Hund (often)
      – Automatic upload of new TM material to the MT engine so it can be used for future retraining
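Slide 19 suggests using the MT engine's language model to tell "Der weiße Hund" from "Die weißer Hund". A toy add-one-smoothed bigram model is enough to show the idea. It is trained on a hypothetical three-sentence corpus. Real engines use far larger models with better smoothing, and nothing here reflects any particular engine's interface.

```python
import math
from collections import Counter

# Hypothetical training sentences standing in for the engine's German data
CORPUS = [
    "Der weiße Hund schläft im Garten",
    "Der Hund schläft gewöhnlich im Flur",
    "Die graue Katze schläft gewöhnlich im Wohnzimmer",
]


def train_bigrams(corpus: list[str]) -> tuple[Counter, Counter, int]:
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        vocab.update(tokens)
        for prev, word in zip(tokens, tokens[1:]):
            bigrams[(prev, word)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def log_prob(sentence: str, bigrams: Counter, contexts: Counter, vocab_size: int) -> float:
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing; the extra +1 reserves probability mass for unseen words
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size + 1))
    return total


model = train_bigrams(CORPUS)
candidates = [
    "Die weißer Hund schläft gewöhnlich im Wohnzimmer",
    "Der weiße Hund schläft gewöhnlich im Wohnzimmer",
]
best = max(candidates, key=lambda s: log_prob(s, *model))
print(best)  # Der weiße Hund schläft gewöhnlich im Wohnzimmer
```

The two candidates share every bigram from "Hund" onward, so the decision rests entirely on the article and adjective, which is the agreement error from slide 17.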
  • 20. Conclusion
    • Traditional segment-level translation reuse has reached its full potential
    • ATRIL's Déjà Vu X2 already includes DeepMiner technology, which improves productivity by combining all the approaches described here:
      – (Statistical) Machine Translation
      – Example-Based Machine Translation
      – Advanced Leveraging (sub-segment matching)
  • 21. Questions?
  • 22. Additional Topics
  • 23. Predictive Typing
    • Find all sub-segment matches and offer them to the translator as he or she types
    • Suggestions are context-sensitive, so there are never too many results to choose from
    • Translations are constructed piece by piece from previous texts, guided by the translator
  • 24. Advanced Predictive Typing
    • Advanced Leveraging techniques for statistically inferring sub-segment translations from the TM can be adapted to provide additional predictive typing suggestions
    • Translations from MT can be added to the predictive typing mechanism to offer additional suggestions for terms and phrases
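Slides 23 and 24 describe predictive typing that is context-sensitive and fed by both TM and MT material. A minimal sketch follows, built on two assumptions: a hypothetical phrase table per resource, and a simple context rule, namely that only phrases occurring in the current source segment can produce suggestions. The suggestions are then narrowed to those completing the word being typed, with TM material listed before MT in line with slide 13. The function name and data layout are illustrative only.

```python
import re


def suggest(typed: str, source_segment: str,
            tm_phrases: dict[str, list[str]],
            mt_phrases: dict[str, list[str]]) -> list[str]:
    """Offer translations of phrases in the current source that complete the word being typed."""
    padded_source = " " + " ".join(re.findall(r"\w+", source_segment.lower())) + " "
    partial = typed.rsplit(" ", 1)[-1].lower()  # the word currently being typed
    suggestions: list[str] = []
    # TM/termbase material first, then MT
    for table in (tm_phrases, mt_phrases):
        for src_phrase, translations in table.items():
            if f" {src_phrase} " not in padded_source:
                continue  # context filter: the phrase does not occur in this segment
            for tr in translations:
                if tr.lower().startswith(partial) and tr not in suggestions:
                    suggestions.append(tr)
    return suggestions


tm_phrases = {"grey": ["graue", "grau"], "black": ["schwarze"], "hallway": ["Gang"]}
mt_phrases = {"hallway": ["Flur"], "living room": ["Wohnzimmer"]}
source = "The black cat usually sleeps in the hallway."

all_relevant = suggest("Die schwarze Katze schläft gewöhnlich im ", source, tm_phrases, mt_phrases)
completions = suggest("Die schwarze Katze schläft gewöhnlich im G", source, tm_phrases, mt_phrases)
print(all_relevant)  # ['schwarze', 'Gang', 'Flur']
print(completions)   # ['Gang']
```

Filtering by the current source segment is what keeps the list short: graue and Wohnzimmer never appear, because grey and living room are not in the segment being translated.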
  • 25. MT Integrations in Déjà Vu X2
    • Systran Entreprise Server
    • Google Translate
    • Microsoft Translator
    • PROMT Translation Server
    • itranslate4eu
  • 26. Systran Entreprise Server
  • 27. Google Translate
  • 28. Microsoft Translator
  • 29. PROMT Translation Server
  • 30. itranslate4eu
