Terminology Extraction Tools for Interpreters


Presentation given by Josh Goldsmith at "Interdependence and Innovation – 2nd Cologne Conference on Translation, Interpreting and Technical Documentation" on November 30, 2018


  1. TERMINOLOGY EXTRACTION TOOLS FOR INTERPRETERS
      JOSH GOLDSMITH
      2ND COLOGNE CONFERENCE ON TRANSLATION, INTERPRETING, AND TECHNICAL DOCUMENTATION, NOVEMBER 30, 2018
      JG@JOSHGOLDSMITH.COM @GOLDSMITH_JOSH http://xl8.link/TerminologyExtractionSlides
  2. 1. STATE OF THE ART
  3. DEFINITIONS
      TERM: “lexical items belonging to specialized areas of usage” Sager (1990: 2)
      TERMINOLOGY EXTRACTION: “Automatically isolating terminology from texts” Cabré, Estopà & Vivaldi (2001: 53)
  4. WHY TERMINOLOGY MATTERS FOR INTERPRETERS
      To be accepted as “insiders” and perceived as “competent,” interpreters must:
      ● Have sufficient specialized knowledge of the domain
      ● Know and use domain-specific terminology
      ● Master phraseology of specialized language
      Fantinuoli (2012: 41)
  5. WHY EXTRACT TERMINOLOGY?
      ● Limited preparation time: materials made available last minute (Pignataro 2012)
      ● Preparation is time-intensive; generally entails collecting parallel texts and extracting relevant terminology (Fantinuoli 2017)
      ● Collecting terminology is a regular part of preparing for an assignment (Bilgen 2009: 91)
      ● Preparation front-loads cognitively challenging tasks and can decrease cognitive load while interpreting (Stoll 2009)
      ● Terminological preparation may improve performance and processing, leading to target language renditions featuring more specialized terminology (Diaz Galaz 2015)
  6. TERMINOLOGY MANAGEMENT SYSTEMS
      ● Early studies survey professionals about terminology-related needs and practices to develop terminology management tools for interpreters. Rütten (2003), Bilgen (2009)
      ● Researchers analyze tools to see if they meet needs. Costa, Corpas Pastor & Durán Muñoz (2014, 2017); Will (2015)
      ● These studies tend to be based on researchers’ subjective assessments of interpreters’ needs rather than on objective criteria. Goldsmith (2017)
  7. COULD TERMINOLOGY EXTRACTION STREAMLINE PREPARATION?
      ● Tools could decrease preparation time and allow interpreters to focus on the most relevant terms during preparation. Rütten (2003)
      ● Corpus-based preparation gave rise to better terminology-related performance in simultaneous interpretation. Xu (2015)
  8. LACK OF TECHNOLOGY AND RESEARCH
      ● “No tool has been specifically developed to satisfy the needs of interpreters during the preparatory phase” Fantinuoli (2017: 24)
      ● Research has considered key features of terminology extraction tools for translators, but not interpreters. Costa, Zaretskaya, Corpas Pastor & Seghiri (2016)
  9. TYPES OF AUTOMATIC TERMINOLOGY EXTRACTION SYSTEMS
      LINGUISTIC: Use linguistic knowledge (morphology, etc.) to detect lexical units ▸ Noise tends to be high
      STATISTICAL: Use relative frequencies to identify high-frequency lexical units ▸ Hard to find low-frequency terms
      HYBRID: Combine statistical and linguistic measures
      Cabré, Estopà & Vivaldi (2001: 53); Fantinuoli (2012)
  10. ASSESSING TERMINOLOGY EXTRACTION SYSTEMS
      RECALL: “Capacity of the detection system to extract all terms from a document”
      SILENCE: “Terms contained in an analysed text that are not detected by the system”
      PRECISION: “Capacity to discriminate between those units detected by the system which are terms and those which are not”
      NOISE: “The rate between discarded candidates and accepted ones”
      Cabré, Estopà & Vivaldi (2001: 53-56)
  11. AIMS OF AUTOMATIC TERMINOLOGY EXTRACTION
      ● Reduce noise (be accurate)
      ● Reduce silence (be complete)
      ● Allow for manual selection of terms and validation of candidate terms. Heid (2001)
      ● “As usability is regarded as being fundamental for the acceptability of an interpreter-oriented tool, a terminology extraction system for interpreters must give priority to precision over recall.” Fantinuoli (2012: 49)
  12. 2. STUDY DESIGN AND PARTICIPANTS
  13. RESEARCH QUESTIONS
      1. What tools are interpreters using for terminology extraction?
      2. What are the strengths and weaknesses of these tools?
      3. In which settings are terminology extraction tools useful? In which settings should they be avoided?
      4. What does the terminology extraction process look like?
      5. How does terminology extraction compare to other types of preparation?
      6. In addition to the term itself, what should these tools extract?
      7. What features would an ideal terminology extraction tool offer?
  14. RESEARCH DESIGN
      EXPLORATORY, MULTI-PHASE MIXED METHODS RESEARCH TO
      ▸ Map the field of terminology extraction tools for interpreters
      ▸ Develop an instrument to assess tools (Creswell & Clark 2006)
      SEMI-STRUCTURED IN-DEPTH INTERVIEWS
      ▸ Develop detailed descriptions, present multiple perspectives, describe process, understand a situation from the inside (Weiss, 1994)
      ▸ Answers “are rich and thick with qualitative data” (Turner, 2010, p. 756)
      ▸ Zoom™, Speechmatics™
      ▸ Informed consent
      ▸ Anonymous
      INDUCTIVE THEMATIC ANALYSIS
      ▸ Transcribe interviews and inductively derive categories (Kvale, 1996)
      ▸ Coded with NVivo™ (CAQDAS program)
  15. PARTICIPANTS
      ▸ 10 respondents, all professional interpreters (2 women)
      ▸ Age 29 – 57 (μ = 42.2)
      ▸ Domiciled in Europe and North America
      ▸ 6 members of professional associations (60%)
      ▸ 2 staff interpreters (20%)
      ▸ Conference (100%), Media (10%), Court (10%) and Community (10%) interpreting
      ▸ Experience: 3 – 30 years (μ = 17.7)
      ▸ Experience using terminology extraction tools: 1 - 17 years (μ = 8.9)
      ▸ Translation, training, research, administration, voiceovers
  16. PARTICIPANTS’ EXPERIENCE
      PERCENTAGE OF ASSIGNMENTS USED: Manual 0 - 100% (μ = 48.0%); Semi-automatic 0 - 100% (μ = 18.9%); Automatic 0 - 100% (μ = 40.0%)
      NUMBER OF ASSIGNMENTS USED: Manual 0 - 840 (μ = 123.8); Semi-automatic 0 - 150 (μ = 17.2); Automatic 0 - 600 (μ = 135.6)
  17. THIS IS A PILOT STUDY. RESULTS CANNOT BE GENERALIZED, BUT DO AIM TO GIVE A GENERAL OVERVIEW OF TOOLS, EXPERIENCES AND EXPECTATIONS. PERCENTAGES ARE NOT STATISTICALLY SIGNIFICANT OR GENERALIZABLE.
  18. 3. TOOLS USED
  19. HARDWARE USED
      Desktop (50%); Laptop (75%); Tablet (20%)
      Windows operating system (80%); MacOS (40%); iOS (20%)
      ▸ Some users utilize multiple devices
  20. TERMINOLOGY EXTRACTION SOFTWARE USED
      InterpretBank (60%); Interpreters’ Help (40%); SketchEngine (20%; 30% used or tested); Intragloss (10%; 40% used or tested)
      Wordsmith, Terminotix, Readdle Documents, GoodReader, GT4T, dtSearch, Thermostat, as well as an in-house tool at an international organization (10% each)
      ▸ Users worked with or had tested multiple types of terminology extraction software
  21. OTHER SOFTWARE USED
      Terminology management tools (InterpretBank, Interpreters’ Help, Interplex, MS Access): 100%
      Annotation tools (Readdle Documents, GoodReader, PDF Exchange Editor, Skim): 50%
      Terminology database (e.g. IATE): 50%
      Wikipedia: 40%
      Linguee: 40%
      Search Engines: 30%
  22. 4. THE EXTRACTION PROCESS: DIFFERENT APPROACHES TO MANUAL, SEMI-AUTOMATIC AND AUTOMATIC TERMINOLOGY EXTRACTION
  23. TYPES OF TECHNOLOGY-ASSISTED TERMINOLOGY EXTRACTION
      MANUAL: User selects terms manually. Tool provides support, e.g., to: ▸ add terms to glossary ▸ look up translation ▸ help manage terms
      SEMI-AUTOMATIC: User provides document(s). Tool suggests terms. User reviews and accepts them.
      AUTOMATIC: User provides document(s). Tool suggests term candidates.
      Goldsmith (2018)
  24. MONOLINGUAL MANUAL TERMINOLOGY EXTRACTION WITH ANNOTATION
  25. BILINGUAL MANUAL TERMINOLOGY EXTRACTION WITH PARALLEL DOCUMENTS
  26. MONOLINGUAL/BILINGUAL MANUAL TERMINOLOGY EXTRACTION WITH PARALLEL DOCUMENTS
  27. MULTILINGUAL MANUAL TERMINOLOGY EXTRACTION
  28. MONOLINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
  29. MONOLINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
  30. BILINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
  31. BILINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION
  32. MULTILINGUAL SEMI-AUTOMATIC TERMINOLOGY EXTRACTION WITH ANNOTATION
  33. MONOLINGUAL/MULTILINGUAL AUTOMATIC TERMINOLOGY EXTRACTION
  34. BILINGUAL AUTOMATIC TERMINOLOGY EXTRACTION WITH ANNOTATION
  35. 5. OTHER PREPARATION STRATEGIES
  36. OTHER PREPARATION STRATEGIES
      Read documents (90%); Background reading (50%); Web research (50%); Memorize/drill terms (50%); Manual annotation (40%); Wikipedia (40%); Terminological research (30%); Gisting/text summarization (20%)
      Automatic translation; Concordancer; Build glossaries collaboratively; Read news; Read technical documents; Practice interpreting on similar topics (10%)
  37. 6. PROS, CONS AND EFFECTIVENESS
  38. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (1)
      FACILITATES PREPARATION: Saves time (100%); Provides terminology despite time pressure (90%); Quick extraction from lengthy documents (60%); Less hassle / menial copying and pasting (30%); Automatic annotation of term (and translation) (20%); Better preparation (10%)
      CONSISTENCY/RELIABILITY: Accurate/reliable results from automatic extraction (50%); Consistent preparation (20%)
  39. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (2)
      TERMINOLOGICAL PRECISION: Automatically extract most important / “right” terms (50%); Automatically look up translations on other sites (40%); Automatically extract named entities (10%); Add stop words (10%); Search function (10%)
      ERGONOMICS: Lightweight, portable, small footprint (30%)
  40. STRENGTHS OF TERMINOLOGY EXTRACTION TOOLS (3)
      DISPLAY/INTERFACE: Parallel scrolling (50%); Easy comparison of bilingual/multilingual texts (30%); Manual highlighting/annotation/bookmarking (20%); Easy to use; easy input of terms; visually appealing; filter/edit results (10% each)
      EXPORT/STORAGE: Export candidates to database (40%); Back up/digitize glossaries (30%); Export in shareable format (20%); Reuse for later assignments (10%)
  41. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (1)
      PREPARATION: Incomplete preparation if only use term extraction (40%); Time-intensive (manual, copy/paste) (20%); Slow with large glossaries (10%)
      IMPORT/EXPORT/STORAGE: Poor export/formatting of exported text (20%); Tool doesn’t recognize format (e.g. line breaks, images) (20%); Compatibility (Mac/PC, etc.) (10%); Poor import of documents/glossaries (10%); Export not provided (10%)
  42. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (2)
      EXTRACTION: Multilingual extraction not supported (50%); Too many terms extracted (50%); Results need cleaning up (30%); Too few/many words in term (20%); Noise (20%); Too few terms extracted (20%); Incomplete extraction (e.g. context missing) (10%); Tool reorders words (10%)
  43. WEAKNESSES OF TERMINOLOGY EXTRACTION TOOLS (3)
      DISPLAY: Poor/incomplete presentation of results (30%); Terminology entry lacks relevant fields (10%); Small screen size (tablet) (10%)
      CUSTOMIZATION: Tools not designed for interpreters (10%); Software doesn’t know user’s individual needs (10%)
      COST: Cost/subscription (20%)
  44. SETTINGS WHERE EXTRACTION TOOLS PREFERRED
      80% used extraction when documents available
      MANUAL: Parallel texts (40%); New topic (30%); Few documents (30%); Time permitting (30%); Focus on collocations (10%); Only monolingual documents available (10%)
      AUTOMATIC: Numerous/long documents (40%); For institutions (40%); Time pressure (40%); For hearings (20%); For automatic annotation when glossaries available (20%); Familiar subject matter (10%); All assignments (10%); When onsite (10%)
  45. SETTINGS WHERE EXTRACTION TOOLS AVOIDED
      Limited / no materials available (50%); Documents not available in digital format (30%); Need to understand content (30%); Text too general (20%); PowerPoint (20%); Faster to read than extract (20%); Recurring meeting/familiar with terminology (10%); Confidentiality (10%); Multilingual documents not available (10%); Vague subject matter (10%); Very large / small glossary available (10%)
  46. 70% of respondents felt terminology extraction was more effective than other types of preparation
      62.5% of respondents preferred terminology extraction over other types of preparation
      BUT ONLY 40% of respondents felt terminology extraction tools meet their needs
  47. 90% of respondents felt clients were not aware they used terminology extraction tools. Those who were aware reacted positively (20%) and found it professional (10%)
      80% of respondents felt colleagues were curious about terminology extraction tools, although some mentioned uninterested colleagues (40%) who were averse to new approaches (20%) or unwilling to change their habits (20%)
  48. 7. THE IDEAL TOOL
  49. THE IDEAL TOOL SHOULD EXTRACT
      Term (100%); Single and multi-word terms (100%); Context/examples (90%); Equivalents in other languages (70%); Source / source document (50%); Definition (40%); Frequencies (40%); Subject matter overview (40%); Collocations / phraseology (30%)
      Named entities; figures; domain; link to source (20%)
      Graphical information; images; hyponyms; semantic groupings (10%)
  50. THE IDEAL TOOL: ANNOTATION
      Allow manual annotation (70%); Highlight terms (60%); Highlight phraseology (60%); Print translations above extracted term (40%); Automatically annotate term occurrences from glossary (30%); Manually add sticky notes (30%); Highlight relevant content (20%); Annotations overview pane (20%)
      Bookmarks; Highlight phraseology; Highlight named entities (10% each)
  51. THE IDEAL TOOL: EXTRACTION/TRANSLATION
      Extract unknown terms (80%); Multilingual extraction available (80%); Statistical extraction/show frequencies (70%); Filter results (manually, chronologically, thematically, by frequency, by agenda item, etc.) (60%); Extract from multiple files (60%); Access external resources from within program (60%); Ignore stop words / decrease noise (60%); View parallel texts & manually extract equivalents (50%); Automatically rank most relevant terms (40%)
      No clean up necessary; access multiple termbases/dictionaries; search glossaries for extracted terms; tablet and/or stylus interface (30%) ...
  52. THE IDEAL TOOL: IMPORT
      Limited preprocessing / automatic conversion regardless of source file format (40%); Batch upload (30%); Import from parallel resources / in multiple languages (20%); Built-in webcrawler (10%); Import from your institutional calendar (10%); Flawless import (no errors with line breaks, etc.) (10%); Imports pre-existing glossaries (10%)
  53. THE IDEAL TOOL: EXPORT
      Multilingual export (60%); One-click import into database (50%); Export into widely used/compatible formats (30%); Export annotated text (20%); Print from tool (10%)
  54. THE IDEAL TOOL: FORMAT AND STORAGE
      FORMAT: Cross-platform (50%); Software suite / integration with terminology management tool (50%); Compatible with mobile devices (30%); “Available on my operating system” (30%); Compatible with translation tools/databases (20%); Checks pre-existing glossaries to avoid duplicates (20%)
      STORAGE: Local storage (40%); Offline to maintain confidentiality (30%); Cloud storage (30%)
  55. THE IDEAL TOOL: INTERFACE
      Link term to context (90%); View parallel texts side by side with synchronous scrolling (70%); Bilingual/multilingual term list (50%); Reliability marker/index (50%); Simple, uncluttered display (40%); Search within source documents (30%); Customize display (30%); Speech recognition interface (20%); Can manually annotate with stylus (20%); Clear color code (20%)/color code for fuzzy matches (20%); Search within/filter exported terms (20%); Extensive information available (20%) ...
  56. THE IDEAL TOOL: CUSTOMIZATION
      Configure number of terms extracted (50%); Configure working languages (40%); Customize external resources (40%); Custom results based on audience/domain/client (40%); Configure term length (n-gram) (30%); Customize display/user interface (30%); Knows interpreter’s preferences (20%); Designed for interpreters (20%); Tool knows interpreter’s background and adjusts accordingly (20%); Configure frequency threshold (20%)
      Learns from human postprocessing; preconfigure database / fields; configure domain; tool knows where to find information in document (10%)
  57. 8. CONCLUSIONS AND FUTURE RESEARCH
  58. CONCLUSIONS (1)
      ● Interpreters regularly use manual, semi-automatic and automatic terminology extraction tools.
      ● The terminology extraction process differs for every interpreter, although it tends to include document collection, extraction, glossary building, and possible annotation.
      ● Interpreters prefer different approaches (manual vs. [semi-]automatic) in different settings, and avoid terminology extraction when documents are not available or digitized or when they need an in-depth understanding of content and have time to read the entire text.
  59. CONCLUSIONS (2)
      ● Terminology extraction saves time and can lead to reliable results and terminological precision.
      ● Terminology extraction alone may be insufficient.
      ● Most respondents felt terminology extraction was more effective than other types of preparation.
      ● Most respondents felt that terminology extraction tools did not meet their needs.
  60. CONCLUSIONS (3)
      ● Interpreters use a wide variety of terminology extraction software, but few terminology extraction tools are designed for interpreters, and the perfect tool doesn’t exist yet.
      ● Minimally, the ideal tool should extract unknown terms, context, and translations and offer multilingual extraction, filtering of results, access to terminological resources, multilingual export, manual annotation, parallel scrolling, bilingual/multilingual term lists and significant customization.
  61. FUTURE WORK
      Phase 2: Survey to rank the features of ideal tools and make recommendations to designers.
      Phase 3: Use weighted rankings to assess existing tools and make recommendations to practitioners.
  62. THANK YOU! jg@joshgoldsmith.com @Goldsmith_Josh http://xl8.link/TerminologyExtractionSlides
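
The statistical approach described on slide 9 and the assessment measures quoted on slide 10 can be illustrated with a small Python sketch. It is not taken from the presentation or from any tool named in it. The function names `extract_candidates` and `evaluate`, the tiny stop-word list, the sample text, and the reference term list are all hypothetical. Precision and recall are computed in the usual way, silence is reported as the list of reference terms that were missed, and noise follows the quoted wording (discarded candidates divided by accepted ones). The `max_len`, `min_freq` and `stop_words` parameters loosely mirror the term length, frequency threshold and stop-word options respondents asked for on slides 51 and 56.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real tool would use a fuller, per-language list.
STOP_WORDS = {"the", "a", "an", "of", "and", "to", "in", "for", "is", "are", "on", "with"}


def extract_candidates(text, max_len=2, min_freq=2, stop_words=STOP_WORDS):
    """Return word n-grams (1..max_len words) that occur at least min_freq times."""
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    counts = Counter()
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            # Skip n-grams that start or end with a stop word.
            if gram[0] in stop_words or gram[-1] in stop_words:
                continue
            counts[" ".join(gram)] += 1
    return {term: freq for term, freq in counts.items() if freq >= min_freq}


def evaluate(candidates, reference_terms):
    """Score candidate terms against a human-validated reference list."""
    candidates = set(candidates)
    reference_terms = set(reference_terms)
    accepted = candidates & reference_terms    # candidates that are real terms
    discarded = candidates - reference_terms   # candidates a reviewer would reject
    missed = reference_terms - candidates      # silence: terms never proposed
    return {
        "precision": len(accepted) / len(candidates) if candidates else 0.0,
        "recall": len(accepted) / len(reference_terms) if reference_terms else 0.0,
        # Ratio of discarded to accepted candidates; undefined if nothing was accepted.
        "noise": len(discarded) / len(accepted) if accepted else None,
        "silence": sorted(missed),
    }


if __name__ == "__main__":
    sample = (
        "Terminology extraction helps interpreters prepare. "
        "Terminology extraction tools suggest candidate terms, "
        "and interpreters review the candidate terms."
    )
    found = extract_candidates(sample, max_len=2, min_freq=2)
    print(found)
    print(evaluate(found, {"terminology extraction", "candidate terms", "glossary"}))
```

Because this sketch relies on frequency alone, a term that appears only once in the sample never becomes a candidate, which is the low-frequency limitation slide 9 attributes to statistical systems. Raising `min_freq` or extending the stop-word list would be one way to trade recall for precision, in line with the priority quoted on slide 11.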