
Making Terms Matter 2015. Kara Warburton, Termologic



From the conference on September 25, 2015 in Stockholm:
Better, faster, cheaper!
Terminology management for big data and explosive content growth.

Published in: Business


  1. New Frontiers in Terminology Work. Kara Warburton
  2. The frontiers of terminology work are extending to such a degree that we are no longer dealing only with terms, but with subsegment-level linguistic data of various kinds, which are needed to process information in the digital age.
  3. Lexical data has many uses
     ● Computer-assisted translation (CAT)
     ● Controlled authoring (CA)
     ● Content Management Systems (CMS)
     ● Globalization Management Systems (GMS)
     ● Business process management (BPM)
     ● Global branding: products, features, marketing
     ● SEO and search keywords
     ● Spell checkers, typeahead, machine translation, indexing
     ● NLP and text mining, e.g. sentiment analysis, opinion mining, information forensics
  4. A new operational framework is needed
     ● Factors driving the changes: advances in technology, diversity of applications, increased availability of large-scale corpora, “industrialization” of terminology
     ● Changes in the notion of termhood: what we agree to “manage”
     ● Changes in theory, mission, basic principles
     ● Changes in methodology: how and what we do
  5. Trends we have been witnessing [Chart, 1990 to 2020. Series: normalization aim, crowdsourcing, range of tools, role of text, units accepted, role of concept, scope of applications]
  6. Classical notions of termhood are being challenged
     ● Classical definition of a term:
       – the designation of a concept in a structured concept system of a field of special knowledge (subject field).
     ● Now guided by two factors**:
       – relevance to the corpus: lexical structures that are “stable” and “salient” in a given corpus
       – relevance to the intended application: purposeful, productive, economical, efficient, internally coherent
     ** Bourigault, D., and Jacquemin, C. 2000. Construction de ressources terminologiques. In J-M. Pierrel, editor, Ingénierie des langues. Hermès, Paris.
  7. Definition from the “Textual theory of terminology”*
     A term is a construct that takes shape through an analysis which gives consideration to corpus evidence, validation by subject-matter experts, and the purpose of the terminographical product.
     Depending on the intended purpose, a collection of “terms” can differ in
     ● which lexical units are retained
     ● how they are documented
     * See works of D. Bourigault, C. Roche, A. Condamines, Slodzian, and M-C. L'Homme.
  8. Repurposability requires...
     ● A detailed, comprehensive data model
       – Adherence to ISO standards and principles
       – Takes into account different applications
       – Emphasis on textual context and concept relations
     ● A terminology management system (TMS) that supports such a data model
     ● Term selection criteria (termhood) according to purpose
  9. Lack of structure reduces reuse potential
  10. Knowledge bases are more repurposable than “flat” termbases
     ● Rich with concept relations
     ● Multi-level subject-field hierarchy
     ● Multimedia
  11. Multi-level subject-field hierarchy
  12. Multimedia
  13. Search query contraction. Faceted search without structured lexical resources. © Termologic, 2014. All rights reserved.
  14. Global Search Engine Optimization
     ● Increase traffic to a website by improving the site's rank in search engines
     ● A key SEO method is to add search keywords strategically to websites
  15. Keyword Effectiveness Index (KEI)
     KEI = (volume of searches* per day)^2 / (number of competing pages (hits))
     • A value greater than 1 is ideal but often difficult to achieve
     • Keywords with values lower than 1 can still be good keywords
     * You can get this data from:
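
The KEI ratio on this slide can be computed directly. The following is a minimal Python sketch, assuming the formula is daily search volume squared, divided by the number of competing pages. The function name and the sample figures are hypothetical and serve only as illustration.

```python
def keyword_effectiveness_index(daily_searches: float, competing_pages: int) -> float:
    """Return KEI = (daily searches)^2 / competing pages."""
    if competing_pages <= 0:
        raise ValueError("competing_pages must be a positive number")
    return daily_searches ** 2 / competing_pages


# Hypothetical figures (searches per day, competing pages); not real data.
candidates = {
    "loafers": (400, 250_000),
    "moccasins": (150, 20_000),
}

# Rank candidate keywords from highest to lowest KEI.
ranked = sorted(
    candidates,
    key=lambda kw: keyword_effectiveness_index(*candidates[kw]),
    reverse=True,
)
```

Because the search volume is squared, a keyword with modest traffic but few competing pages can outrank a popular keyword with heavy competition.
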
  16. Enterprise search can beat Google
     ● How can we associate the user's search words with different yet closely related words that are present in the text?
       ➔ Load the search engine (SE) with a lexical resource (LR) comprising terms from the domain in question.
     ● Can we do this for global SEO (e.g. Google, Baidu, Yandex)?
       ➔ No. The target domain of a search in a global SE is unknown.
       ➔ We can't load a global SE with an LR.
     ● Can we do this for an enterprise search?
       ➔ YES!
     ● Example of related words: loafers, shoes, moccasins, chappals, sandals
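
Loading a search engine with a lexical resource can be pictured as query expansion. The sketch below is a toy illustration in Python, not a description of any particular enterprise search product. The `LEXICAL_RESOURCE` structure, the function names, and the sample documents are all assumptions made for this example.

```python
# Hypothetical lexical resource: each set groups closely related terms
# from one domain (here, the footwear words used on the slide).
LEXICAL_RESOURCE = [
    {"loafers", "shoes", "moccasins", "chappals", "sandals"},
]


def expand_query(query, resource=LEXICAL_RESOURCE):
    """Return the query words plus every related word found in the resource."""
    terms = set(query.lower().split())
    expanded = set(terms)
    for group in resource:
        if terms & group:  # the query mentions at least one word of the group
            expanded |= group
    return expanded


def search(query, documents):
    """Return documents that contain any word of the expanded query."""
    terms = expand_query(query)
    return [doc for doc in documents if terms & set(doc.lower().split())]


documents = ["Leather moccasins on sale", "Garden furniture clearance"]
hits = search("loafers", documents)
```

Here a search for "loafers" retrieves the document about moccasins even though the word "loafers" never appears in it.
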
  17. Knowledge base feeds into enterprise search. If a user searches for “Venus”, the SE knows it is not the tennis player. A search for “planet” could suggest all individual planets as alternate searches.
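
The “Venus” and “planet” behaviour can be sketched with a tiny concept-oriented knowledge base. This is a hedged Python illustration under assumed names: the `Concept` class, the `KB` dictionary, and both helper functions are hypothetical and do not reflect any specific TMS or search engine.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    label: str
    subject_field: str
    narrower: list = field(default_factory=list)  # labels of more specific concepts


# Hypothetical knowledge base for an astronomy site.
KB = {
    "planet": Concept(
        "planet",
        "astronomy",
        ["Mercury", "Venus", "Earth", "Mars", "Jupiter", "Saturn", "Uranus", "Neptune"],
    ),
    "venus": Concept("Venus", "astronomy"),
}


def suggest_alternates(query):
    """Suggest narrower concepts as alternate searches; empty list if unknown."""
    concept = KB.get(query.lower())
    return list(concept.narrower) if concept else []


def subject_field_of(query):
    """Return the subject field the knowledge base assigns to the query, if any."""
    concept = KB.get(query.lower())
    return concept.subject_field if concept else None
```

Because the knowledge base covers a single domain, "Venus" resolves to the astronomy concept, and "planet" yields the individual planets as suggestions.
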
  18. Leveraging big data
     Using various NLP tools, terminologists can base decisions on objective statistical measures
     ● Generation of salient unigrams
     ● Term extraction tools
     ● Concordancing software
     ● Collocations
     ● Pattern clustering
     ● Concept maps
  19. Salient unigrams
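
One common way to approximate salience is to compare a word's relative frequency in the domain corpus with its relative frequency in a general reference corpus. The Python sketch below assumes that approach. The slide does not say which measure or tool was used, so the function names, the tokenizer, and the smoothing are illustrative assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a simplification)."""
    return re.findall(r"[a-z]+", text.lower())


def salient_unigrams(domain_text, reference_text, top_n=5):
    """Rank domain words by relative frequency ratio against a reference corpus."""
    domain = Counter(tokenize(domain_text))
    reference = Counter(tokenize(reference_text))
    domain_total = sum(domain.values())
    reference_total = sum(reference.values())
    scores = {}
    for word, count in domain.items():
        domain_rel = count / domain_total
        # Crude add-one adjustment (an assumption) so that words missing
        # from the reference corpus do not cause a division by zero.
        reference_rel = (reference[word] + 1) / (reference_total + 1)
        scores[word] = domain_rel / reference_rel
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


domain_text = "the termbase stores the term and the termbase entry"
reference_text = "the cat and the dog and the bird"
top = salient_unigrams(domain_text, reference_text, top_n=1)
```

Frequent general-language words such as "the" score low because they are just as common in the reference corpus.
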
  20. Concordance
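
A concordance lists each occurrence of a word together with its immediate context (keyword in context, or KWIC). The following is a minimal Python sketch with a hypothetical `concordance` function and made-up sample text. Dedicated concordancing software offers far more, such as sorting and sentence-aware context.

```python
import re


def concordance(text, keyword, width=3):
    """Return (left context, keyword, right context) for each occurrence."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines


sample = "The dimension of the table is fixed. Each dimension has a name."
kwic = concordance(sample, "dimension", width=2)
```

Note that this sketch ignores sentence boundaries, so a context window can span two sentences.
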
  21. Collocations
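
Collocates of a word can be approximated by counting what appears within a small window around it. The Python sketch below is a simple frequency count. The function name, the stopword list, and the sample text are assumptions, and real tools usually add an association measure on top of raw counts.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = frozenset({"the", "a", "of", "is", "and"})


def collocates(text, node, window=2, stopwords=STOPWORDS):
    """Count non-stopword tokens within `window` positions of each `node` hit."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] not in stopwords:
                counts[tokens[j]] += 1
    return counts.most_common()


sample = (
    "The time dimension is shared. Add a time dimension to the cube. "
    "The product dimension is new."
)
top_collocates = collocates(sample, "dimension")
```

In this sample, "time" is the strongest collocate of "dimension" because it occurs next to it twice.
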
  22. Patterns
  23. Collocations of “dimension”
  24. Expansion of bigram to trigram
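
Expanding a bigram to a trigram can be read as finding which three-word sequences contain a given two-word sequence and how often. The sketch below takes that reading, which is an assumption because the slide shows only a screenshot title. The function name and sample text are hypothetical.

```python
import re
from collections import Counter


def expand_bigram(text, bigram):
    """Count trigrams that extend `bigram` by one token to the left or right."""
    tokens = re.findall(r"[a-z]+", text.lower())
    first, second = bigram
    trigrams = Counter()
    for i in range(len(tokens) - 1):
        if tokens[i] == first and tokens[i + 1] == second:
            if i > 0:  # extend to the left
                trigrams[(tokens[i - 1], first, second)] += 1
            if i + 2 < len(tokens):  # extend to the right
                trigrams[(first, second, tokens[i + 2])] += 1
    return trigrams.most_common()


sample = "The time dimension table is large. Each time dimension table has keys."
expansions = expand_bigram(sample, ("time", "dimension"))
```

A trigram that recurs, such as "time dimension table" here, is a candidate multiword unit worth reviewing.
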
  25. Like the chameleon, which changes colours to adapt to its environment, terminologists need to adapt to new conditions. While respecting the traditions of the past where it makes sense, we also need to be prepared to unshackle ourselves from those traditions in order to play a greater role in the evolution of information technology.