Tralogy 2011: TTC User Scenarios

This paper presents usage scenarios of the platform being developed within the TTC project (Terminology Extraction, Translation Tools and Comparable Corpora) along with the first feedback from potential users. The TTC project aims at leveraging translation tools, computer-assisted translation tools, and terminology management tools by automatically generating bilingual terminologies from comparable corpora in several languages of the European Union (English, French, German, Latvian and Spanish), as well as in Chinese and Russian. The TTC platform includes a web crawler and a corpora management tool, as well as tools for monolingual term extraction and bilingual terminology alignment, online terminology management, and terminology export into CAT tools and MT systems.
Overall, the paper focuses on the language activities to be carried out with the TTC tools, on issues regarding the availability of the required language resources and linguistic knowledge, and on different user profiles and needs. To identify potential user needs, we discuss the results of an online questionnaire-based survey on terminology and corpora issues conducted in the translation and localization industry. Furthermore, we present the envisaged usage scenarios as well as first feedback from potential users. The expected TTC inputs and outputs are also outlined. Finally, since the amount of available data and resources will not be the same for all languages, we discuss technical solutions to achieve language coverage: the TTC tools will offer different approaches depending on the amount and type of linguistic knowledge available.

  1. User-centered Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT/CAT
     Helena Blancafort, Ulrich Heid, Tatiana Gornostay, Claude Méchoulam, Béatrice Daille, Serge Sharoff
     Project TTC: Terminology Extraction, Translation Tools and Comparable Corpora
     Tralogy, 3rd of March 2011
  2. The idea behind TTC: Tool Functions and Applications
     [Diagram: web crawling yields a corpus in the source language (SL) and a corpus in the target language (TL).]
  3. The idea behind TTC: Tool Functions and Applications
     [Diagram, continued: term extraction runs on each corpus. Example terms: "wind energy", "wind turbine" (SL); "aérogénérateur", "énergie éolienne" (TL).]
  4. The idea behind TTC: Tool Functions and Applications
     [Diagram, continued: the extracted SL and TL terms feed into term alignment.]
  5. The idea behind TTC: Tool Functions and Applications
     [Diagram, continued: the aligned terms are passed on to MT (rule-based: Systran; statistical: Moses) and to CAT tools.]
  6. First Interaction with Users: Needs and Expectations
     Online survey among the translation industry (March 2010)
     • 139 respondents from 31 countries
     Workshop with experts (Oct 2010)
     • Users, deployers, developers → feedback on TTC specifications
     Topics
     • Relevance of terminology work
     • User and application types
     • Input to TTC tools
     • Output of TTC tools
  7. User Needs Survey
     Relevance of terminology work (stable since the 2004 LISA survey)
     • 75% do systematic terminology work (LISA)
     • Over 50% spend 10-30% of their time on terminology (TTC)
     Use of tools
     • 74% use CAT tools (TTC), 27% use terminology tools (SDL)
     • 66% are interested in new solutions (TTC)
     • 50% collect corpora manually
     • 40% agree to share their terminology within an online database
     Need / opportunity for terminology tools
  8. Types of Potential TTC Tool Users: Requests of the Oct Workshop
     Standard users
     • Little time, small amount of information
     • Translators, technical writers
     Advanced users
     • Terminology specialists, translation proofreaders
     • Interest in broad documentation of output
     MT users
     • Interest in specific solutions
     • Focus on workflow integration
  9. Input to TTC Tools: Feedback of Domain Experts
     Web crawling
     • Awareness of mixed quality: genres, text types
     • → TTC output should include metadata
     Seeds
     • Users have different input: company data, existing terminologies, or just keywords
  10. Output of TTC Tools by User Types
     Standard users
     • Equivalents: maximum 5 candidates
     • Format for CAT tools: TBX, tables (Excel)
     Advanced users
     • Term origin → metadata (Dublin Core based)
     • Reliability → confidence values
     • Term variants
     MT users
     • Output adapted to the respective system
     • RBMT vs. SMT vs. CAT tools
  11. Next Steps
     Integrate lessons learnt from users into the TTC prototype
     • Metadata in focused crawler
     • Provide term variants
     • Different output formats
     Test of the TTC tool prototype with Advisory Board members
     2nd Users Workshop, Spring 2012
  12. Thank you!
     http://www.ttc-project.eu/
  13. TTC Output for Advanced Users: Requests from Potential Users
     1. Equivalents
     2. Example sentences
     3. Definitions x
     4. Style/usage
     5. Frequency
     6. Subject field
     7. Synonyms x
     8. Word classes
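
The output requirements on slide 10 (at most five equivalent candidates for standard users, confidence values, and export as a table or as TBX) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than part of the TTC platform: the `Candidate` class, the function names, and the confidence scores are hypothetical, and the XML is only a simplified TBX-style `termEntry` fragment, not a complete or validated TBX file.

```python
import csv
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Clark-notation name of the standard xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


@dataclass
class Candidate:
    """Hypothetical translation candidate for one source term."""
    target: str        # candidate equivalent in the target language
    confidence: float  # illustrative reliability score in [0, 1]


def top_candidates(candidates, limit=5):
    """Return at most `limit` candidates, highest confidence first."""
    return sorted(candidates, key=lambda c: c.confidence, reverse=True)[:limit]


def to_table(source_term, candidates):
    """Render candidates as CSV text that a spreadsheet can open."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["source_term", "target_term", "confidence"])
    for c in candidates:
        writer.writerow([source_term, c.target, f"{c.confidence:.2f}"])
    return buf.getvalue()


def to_tbx_like_entry(source_term, candidates, src_lang="en", tgt_lang="fr"):
    """Build a simplified TBX-style termEntry (not a full TBX document)."""
    entry = ET.Element("termEntry")
    groups = (
        (src_lang, [source_term]),
        (tgt_lang, [c.target for c in candidates]),
    )
    for lang, terms in groups:
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        for term in terms:
            tig = ET.SubElement(lang_set, "tig")
            ET.SubElement(tig, "term").text = term
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    found = [Candidate("aérogénérateur", 0.9), Candidate("éolienne", 0.8)]
    best = top_candidates(found)
    print(to_table("wind turbine", best))
    print(to_tbx_like_entry("wind turbine", best))
```

Keeping candidate selection separate from serialization mirrors the slide's point that the same term data has to be adapted to different user types and target systems.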
