Welcome to the Cloud! Terminology as a Service, CHAT2013

Presenter: Andrejs Vasiljevs (Tilde)

This presentation is part of the TaaS project, funded by the European Union Seventh Framework Programme (FP7/2007-2013), grant agreement no. 296312.




    Presentation Transcript

    • Welcome to the Cloud! Terminology as a Service. Andrejs Vasiļjevs, Tilde. tekom 2013, Wiesbaden, 07.11.2013.
    • Complexity of terminology work
        • Term identification in the source text
        • Consulting online databases and local files for translation equivalents
        • Creating and maintaining terminology glossaries
        • Sharing term glossaries and involving others in polishing them
        • Structuring data in industry-standard formats
        • Integrating term glossaries into CAT and other productivity tools
        • Keeping terminology up to date
        • etc.
    • Terminology as a Service: a cloud-based platform for acquiring, cleaning up, sharing, and reusing multilingual terminological data
    • TaaS User Needs Survey results: importance of terminology work (pie chart)
        • Legend: Very important, Quite important, Less important, Not important
        • Values as extracted: 1.8%, 14.8%, 43.5%, 39.9%
    • TaaS User Needs Survey: willingness to share (charts)
        • Answers: "Yes, provided that…" and "No, because…"; overall split as extracted: 60.5%, 39.5%
        • Legend entries, in extraction order: Joint contribution to the DB, Access control, Legal aspects, External quality control, Little effort, Anonymity, Other, Legal restrictions, Poor quality/Lack of time, Own asset, Risk of misunderstanding
        • Remaining values, in extraction order: 16.7%, 8.3%, 24.9%, 6.0%, 4.6%, 16.5%, 48.6%, 7.6%, 19.2%, 11.4%, 14.2%, 22.0%
    • TaaS Partners
        • Tilde, Latvia (Coordinator)
        • TAUS, Netherlands
        • Kilgray, Hungary
        • Cologne University of Applied Sciences, Germany
        • University of Sheffield, UK
    • TaaS Mission
        • Simplify the process for language workers to prepare, store, and share task-specific multilingual term glossaries
        • Provide professional translators with instant access, through CAT tools, to term translation equivalents and translation candidates
        • Adapt statistical machine translation systems to a domain by dynamic integration with terminology data provided by TaaS
    • Key services of TaaS
        • Automatic extraction of monolingual term candidates from user-uploaded documents
        • Automatic retrieval of translation equivalents from different public and industry terminology databases
        • Acquisition of translation candidates from multilingual web data
        • Facilities for users to clean up automatically acquired terminological data
        • Data sharing and integration facilities through APIs and export tools
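The first service in this list, extraction of monolingual term candidates, can be illustrated with a deliberately naive sketch. It is not the TaaS extraction method, which the slides do not describe; it only counts word n-grams that neither start nor end with a stopword. The function name `term_candidates`, the stopword list, and the thresholds are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not taken from TaaS).
STOPWORDS = {"the", "a", "an", "of", "and", "or", "to", "in",
             "for", "is", "are", "on", "with", "by"}


def term_candidates(text, max_len=3, min_freq=2):
    """Return (candidate, frequency) pairs for n-grams of up to max_len words.

    An n-gram is kept only if its first and last words are not stopwords
    and it occurs at least min_freq times in the text.
    """
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if gram[0] in STOPWORDS or gram[-1] in STOPWORDS:
                continue
            counts[" ".join(gram)] += 1
    return [(term, freq) for term, freq in counts.most_common() if freq >= min_freq]
```

Candidates produced this way would still need the translation lookup and the manual clean-up steps listed above; a production extractor would normally add linguistic filtering and normalization as well.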
    • Focus areas
        • Research: term extraction; collection of domain-specific multilingual corpora; Max(FTC)
        • Development: quality, performance, scalability, interoperability
        • Usage: usability, outreach, sustainability
    • TaaS Services
    • Target Repositories
        • TAUS Data: repository of multilingual translation memories
        • EuroTermBank: databank of federated multilingual terminology
        • IATE: inter-institutional termbank of the European Union
        • META-SHARE: distributed pan-European repository of language resources
    • Integration
        • Support for industry-standard formats
        • Integration into CAT and productivity tools
        • API to integrate TaaS services into various software applications
    • Term identification and annotation
    • HTML Term Annotation: term entries for terms identified in EuroTermBank are stored in TBX format in a <script> element that is placed in the HTML5 document.
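A minimal sketch of what embedding TBX in an HTML5 document could look like. The slides do not show the actual markup, so the `type` and `id` values of the `<script>` element, the helper names, and the sample entry are assumptions; the TBX skeleton is simplified (no header) and is not meant to be schema-valid.

```python
from xml.sax.saxutils import escape, quoteattr


def tbx_term_entry(entry_id, terms):
    """Build one simplified TBX termEntry; `terms` maps language code -> term."""
    lang_sets = "".join(
        f"<langSet xml:lang={quoteattr(lang)}><tig><term>{escape(term)}</term></tig></langSet>"
        for lang, term in terms.items()
    )
    return f"<termEntry id={quoteattr(entry_id)}>{lang_sets}</termEntry>"


def tbx_document(entries):
    """Wrap term entries in a bare TBX body (header omitted for brevity)."""
    return '<martif type="TBX"><text><body>' + "".join(entries) + "</body></text></martif>"


def embed_in_html(html_doc, tbx_xml, script_id="tbx-terms"):
    """Insert the TBX data as a <script> data block just before </head>.

    The type and id used here are illustrative assumptions.
    """
    if "</head>" not in html_doc:
        raise ValueError("document has no </head> to anchor the script element")
    script = f'<script type="application/xml" id={quoteattr(script_id)}>{tbx_xml}</script>'
    return html_doc.replace("</head>", script + "</head>", 1)
```

Because browsers do not execute a `<script>` element with a non-JavaScript type, the block acts as an inert data island that an annotation tool can read back by its id.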
    • XLIFF Term Annotation
    • Identifying and marking terms (diagram), based on ITS 2.0, the new W3C standard for the Internationalization Tag Set
        • Input: plaintext or ITS 2.0 enriched content
        • Components: Showcase Web Page, Terminology Annotation Web Service API, TaaS Terminology Services
        • Output: ITS 2.0 term-annotated content, with export / visualisation
        • Human users (e.g., translators, terminologists)
        • Machine users: CAT tools, MT systems
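As a rough illustration of ITS 2.0 term annotation, the sketch below wraps known terms in plain text with `<span its-term="yes">`, which is the local HTML5 markup that ITS 2.0 defines for its Terminology data category. The matching strategy (case-insensitive, longest term first) and the function name `annotate_terms` are assumptions; this is not the TaaS annotation web service.

```python
import html
import re


def annotate_terms(text, terms):
    """Return HTML in which each known term is marked with its-term="yes"."""
    if not terms:
        return html.escape(text)
    # Longest terms first, so a multi-word term wins over the words inside it.
    alternatives = "|".join(
        re.escape(t) for t in sorted(terms, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    parts, pos = [], 0
    for match in pattern.finditer(text):
        parts.append(html.escape(text[pos:match.start()]))
        parts.append(f'<span its-term="yes">{html.escape(match.group(0))}</span>')
        pos = match.end()
    parts.append(html.escape(text[pos:]))
    return "".join(parts)
```

ITS 2.0 also provides `its-term-info-ref` and `its-term-confidence` for pointing at a term entry and recording a confidence score; they are left out here to keep the sketch short.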
    • TaaS Architecture (diagram)
        • Clients: CAT tools and MT (https, REST); web browsers (http/https, HTML); external TDBs (https, REST)
        • Presentation Layer: Public API, Web Page UI
        • Application Logic Layer: terminology collection management, user management, terminology collection search, terminology collection creation, term extraction workflows (full collection creation workflow, monolingual collection creation)
        • Data Storage Layer (Shared Term Repository): Shared Term Repository DB, Statistical DB, File Store
        • High-performance Computing (HPC) Cluster: HPC frontend, SGE, CPU nodes
        • Modules: term extraction (TXT extractor, TWSC, Kilgray Term Extractor), term normalizer, collection creator, statistical DB acquisition, statistical DB feeding, text tagging with terms, parameter retriever, translation candidate extraction, Bilingual Term Extraction System, translation lookup (ETB & STR, IATE, TAUS API, Statistical DB), collection merger, result processing, collection importer, marked text enrichment
    • How to instruct SMT to use the right terms? (Slide example: koks, timber)
    • Put TaaS in the service of MT
    • s do-it-yourself MT factory on the cloud
    • Boost in the quality of machine translation
        • Narrow domain: automotive
        • MT: English-Latvian
        • Data: 2 M unique parallel sentences; 1.9 M monolingual sentences; 0.2 M in-domain monolingual sentences
        • Quality: 16% improvement from terminology integration
    • Come & try: demo.taas-project.eu
    • Thank you! andrejs@tilde.com The research within the project TaaS leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013), Grant Agreement no 296312