Welcome to the Cloud! Terminology as a Service, CHAT2013

Presenter: Andrejs Vasiljevs (Tilde)

This presentation is part of the TaaS project, funded by the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.

Published in: Technology, Business

  1. Welcome to the Cloud! Terminology as a Service. Andrejs Vasiļjevs, Tilde. tekom 2013, Wiesbaden, 07.11.2013
  2. Complexity of terminology work
     • Identifying terms in the source text
     • Consulting online databases and local files for translation equivalents
     • Creating and maintaining terminology glossaries
     • Sharing term glossaries and involving others in refining them
     • Structuring data in industry-standard formats
     • Integrating term glossaries into CAT and other productivity tools
     • Keeping terminology up to date
     • etc.
  3. Terminology as a Service: a cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data
  4. TaaS User Needs Survey Results: Importance of terminology work
     [Chart. Answer categories: Very important, Quite important, Less important, Not important. Values shown: 1.8%, 14.8%, 43.5%, 39.9%]
  5. TaaS User Needs Survey: willingness to share
     [Chart. Overall split shown: 60.5%, 39.5%]
     • "Yes, provided that…" categories: Joint contribution to the DB, Access control, Legal aspects, External quality control, Little effort, Anonymity, Other
     • "No, because…" categories: Legal restrictions, Poor quality/Lack of time, Own asset, Risk of misunderstanding
     • Category values shown: 16.7%, 8.3%, 24.9%, 6.0%, 4.6%, 16.5%, 48.6%, 7.6%, 19.2%, 11.4%, 14.2%, 22.0%
  6. TaaS Partners
     • Tilde, Latvia (Coordinator)
     • TAUS, Netherlands
     • Kilgray, Hungary
     • Cologne University of Applied Sciences, Germany
     • University of Sheffield, UK
  7. TaaS Mission
     • Simplify the process by which language workers prepare, store, and share task-specific multilingual term glossaries
     • Provide professional translators with instant access to term translation equivalents and translation candidates through CAT tools
     • Adapt statistical machine translation systems to specific domains through dynamic integration with TaaS-provided terminology data
  8. Key services of TaaS
     • Automatic extraction of monolingual term candidates from user-uploaded documents
     • Automatic retrieval of translation equivalents from different public and industry terminology databases
     • Acquisition of translation candidates from multilingual web data
     • Facilities for users to clean up automatically acquired terminological data
     • Data sharing and integration facilities through APIs and export tools
  9. Focus areas
     • Research: Term extraction; Collection of domain-specific multilingual corpora; Max(FTC)
     • Development: Quality; Performance; Scalability; Interoperability
     • Usage: Usability; Outreach; Sustainability
  10. TaaS Services
  11. Target Repositories
     • TAUS Data: repository of multilingual translation memories
     • EuroTermBank: databank of federated multilingual terminology
     • IATE: inter-institutional termbank of the European Union
     • META-SHARE: distributed pan-European repository of language resources
  12. Integration
     • Support for industry-standard formats
     • Integration into CAT and productivity tools
     • API to integrate TaaS services into various software applications
  13. Term identification and annotation
  14. HTML Term Annotation: term entries for terms identified in EuroTermBank are stored in TBX format in a <script> element placed in the HTML5 document.
  15. XLIFF Term Annotation
  16. Identifying and marking terms: ITS 2.0, the new W3C standard Internationalization Tag Set
     [Diagram. Inputs: plaintext and ITS 2.0 enriched content. Entry points: Showcase Web Page and Terminology Annotation Web Service API, both backed by TaaS Terminology Services. Outputs: ITS 2.0 term-annotated content, with export / visualisation for human users (e.g., translators, terminologists) and delivery to machine users (CAT tools, MT systems)]
  17. TaaS Architecture
     [Diagram. Components as labelled on the slide:]
     • Clients: CAT tools, MT, and external TDBs (https REST); web browsers (http/https, html)
     • Presentation Layer: Web Page UI, Public API
     • Application Logic Layer: terminology collection management, user management, terminology collection search, terminology collection creation, term extraction workflows (full collection creation workflow, monolingual collection creation)
     • Data Storage Layer (Shared Term Repository): Shared Term Repository DB, Statistical DB, File Store
     • High-performance Computing (HPC) Cluster: HPC frontend, SGE, CPU nodes
     • Modules: term extraction (TXT extractor, TWSC, Kilgray Term Extractor, term normalizer), collection creator, statistical DB acquisition, statistical DB feeding, text tagging with terms, parameter retriever, Bilingual Term Extraction System, translation candidate extraction, translation lookup (ETB & STR, IATE, TAUS API, Statistical DB), collection merger, result processing, collection importer, marked text enrichment
  18. How to instruct SMT to use the right terms? (Example on slide: koks / timber)
  19. Put TaaS in the service of MT
  20. Do-it-yourself MT factory on the cloud
  21. Boost in the quality of machine translation
     • Narrow domain: Automotive
     • MT: English – Latvian
     • Data: 2 M unique parallel sentences; 1.9 M monolingual sentences; 0.2 M in-domain monolingual
     • Quality: 16% improvement from terminology integration
  22. Come & Try: demo.taas-project.eu
  23. Thank you! andrejs@tilde.com
     The research within the TaaS project leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement no. 296312.
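
Slides 14 and 16 describe marking terms in HTML with ITS 2.0 and storing the matching term entries in TBX inside a <script> element. The slides do not show the markup itself, so the following is only a minimal Python sketch of that idea under stated assumptions. The glossary contents, the entry ids, the `annotate_terms` and `tbx_script` helper names, the `type` attribute of the script element, and the heavily simplified TBX entries are all hypothetical. Only the `its-term` and `its-term-info-ref` attributes come from the ITS 2.0 Terminology data category for HTML5. This is not the TaaS implementation.

```python
import html
import re

# Hypothetical glossary: term -> entry id. Terms and ids are made up.
GLOSSARY = {
    "translation memory": "tm-1",
    "term glossary": "tg-1",
    "translation": "tr-1",
}


def annotate_terms(text: str, glossary: dict[str, str]) -> str:
    """Wrap each glossary term found in plain text in an ITS 2.0 term span."""
    if not glossary:
        return html.escape(text)
    # Longest terms first, so "translation memory" wins over "translation".
    ordered = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term in ordered) + r")\b",
        re.IGNORECASE,
    )
    ids = {term.lower(): entry_id for term, entry_id in glossary.items()}
    parts = []
    pos = 0
    for match in pattern.finditer(text):
        # Escape the untouched text between matches.
        parts.append(html.escape(text[pos:match.start()]))
        entry_id = ids[match.group(0).lower()]
        parts.append(
            f'<span its-term="yes" its-term-info-ref="#{entry_id}">'
            f"{html.escape(match.group(0))}</span>"
        )
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)


def tbx_script(glossary: dict[str, str], lang: str = "en") -> str:
    """Serialize the glossary as simplified TBX entries in a <script> element."""
    entries = "\n".join(
        f'<termEntry id="{html.escape(entry_id)}"><langSet xml:lang="{lang}">'
        f"<tig><term>{html.escape(term)}</term></tig></langSet></termEntry>"
        for term, entry_id in glossary.items()
    )
    # The type attribute is an assumption; the slides do not name one.
    return f'<script type="application/xml">\n{entries}\n</script>'


if __name__ == "__main__":
    sample = "Upload a translation memory & a term glossary."
    print(annotate_terms(sample, GLOSSARY))
    print(tbx_script(GLOSSARY))
```

Matching the longest term first keeps a multi-word term from being split by a shorter term it contains. A production annotator would also need lemmatization and language-specific tokenization, which this sketch leaves out.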
