16. Anne Schumann (USAAR) Terminology and Ontologies 1

Published in: Education, Technology

  1. Terminology and Ontologies, Section 1: Basics
     Anne-Kathrin Schumann, Saarland University
     “Expert” Winter School, Birmingham, November 13, 2013
  2. Overview
     • Why terminology?
     • Terms and concepts
     • Conceptual relations
     • Concept systems and concept-oriented terminology work
     • Resources and references
  3. Why terminology?
     • “Founder” of terminology: Eugen Wüster, an engineer
     • Encyclopedic dictionary Esperanto-German
     • 1931: „Die Internationale Sprachnormung in der Technik, besonders in der Elektrotechnik“ (International language standardization in technology, particularly in electrical engineering)
     • Founder of TC37 (later ISO)
     • Teacher at University of Vienna
     • Interlinguistics/planned languages
  4. Why terminology?
  5. Why terminology?
     • Controlled languages:
       • “Controlled language … can be defined as a subset of a language with a restricted grammar and a domain specific vocabulary designed to allow domain specialists to unambiguously formulate texts pertaining to their subject fields” (Wright, Sue E./Budin, Gerhard: Handbook of Terminology Management, vol. 2, p. 872)
     • Planned languages, e.g. Esperanto, Ido:
       • Avoidance of lexical ambiguity by means of the construction of an ambiguity-free lexicon
       • Avoidance of grammatical ambiguities and preference for easy-to-use structures
  6. Why terminology?
     • Means of expert communication
       • Text reception (what is the text about?)
       • Text production (production of comprehensible texts: correctness, univocity, acceptability of specialised texts)
     • Means of knowledge transfer for education
       • Instructive texts (textbooks)
       • Expert-to-layman communication: introduction and explicitation of terminology
       • Popularising texts
  7. Why terminology?
     Example: specialised text (journal abstract), Terminology 13(1): 2007, 35
  8. Why terminology?
     • Without knowing the meaning of the terms it is impossible to understand specialised texts
     • Terms work as “handles” to units of knowledge (or “units of understanding”, Temmerman)
     • Terminology is a means of reducing complexity
     • Correct use of terminology is a prerequisite for membership (credibility, social status, comprehensibility) in a community of experts: need for correct translation!
     • Means of social distinction?
  9. Why terminology?
     Example: popularising text (Wikipedia)
     • Terms are linked to (canonical) definitions and/or explanations
     • Humans typically acquire this kind of knowledge from specific types of text (educational texts)
  10. Why terminology?
      • Knowledge management (industry, big organisations)
        • Strategic management of the knowledge stock of an organisation
        • Identification of relevant rules, processes and concepts
        • Provision of information about these items (e.g. intranet, knowledge base) – knowledge transfer
        • Monitoring and management of knowledge evolution
        • Research and comparison with other communities’ knowledge
  11. Why terminology?
  12. Why terminology?
      • Other applications
        • domain adaptation of statistical MT systems
        • ontology-based information retrieval
        • QA and expert systems
        • …
  13. Terms and concepts
      The basics of structuralist semantics
      • Concept vs. term – the general language view (graphic by Elke Teich)
  14. Terms and concepts
      • Concept vs. term – the general language view
      • but in general language, ambiguities are ubiquitous: the relation between linguistic symbols (words, lexical units) and concepts is m:n
  15. Terms and concepts
      • Concept vs. term – the general language view (www.leo.org)
  16. Terms and concepts
      • Concept vs. term – the terminological view
      • Why are m:n mappings (read: inconsistent terminology) problematic for specialised domains? They
        • hamper comprehensibility of specialised texts
        • create semantic ambiguities (to be avoided at all costs in safety-sensitive environments, e.g. medicine, engineering or construction!)
        • reduce retrieval results
        • increase translation costs
        • lower translation quality (from the translation studies point of view, not necessarily in terms of BLEU points)
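A minimal sketch of how m:n mappings could be spotted in practice, assuming term usage has already been recorded as (term, concept) pairs. The data, the concept identifiers and the function name `find_ambiguities` are invented for illustration.

```python
from collections import defaultdict

# Hypothetical usage records: (term, concept identifier) pairs.
# All values are invented for illustration.
usages = [
    ("bank", "FINANCIAL_INSTITUTION"),
    ("bank", "RIVER_SIDE"),
    ("credit institution", "FINANCIAL_INSTITUTION"),
    ("hut", "HUT"),
]


def find_ambiguities(pairs):
    """Return (homonyms, synonyms) found in a list of term-concept pairs.

    homonyms: terms that designate more than one concept (1 term : n concepts)
    synonyms: concepts that carry more than one term (m terms : 1 concept)
    """
    concepts_by_term = defaultdict(set)
    terms_by_concept = defaultdict(set)
    for term, concept in pairs:
        concepts_by_term[term].add(concept)
        terms_by_concept[concept].add(term)
    homonyms = {t: sorted(c) for t, c in concepts_by_term.items() if len(c) > 1}
    synonyms = {c: sorted(t) for c, t in terms_by_concept.items() if len(t) > 1}
    return homonyms, synonyms


homonyms, synonyms = find_ambiguities(usages)
print(homonyms)
print(synonyms)
```

A term base with strict 1:1 mappings would leave both dictionaries empty.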
  17. Terms and concepts
      • Concept vs. term – the terminological view
      • Why are m:n mappings (read: inconsistent terminology) problematic?
      Examples: Ana Hoffmeister, Volkswagen After Sales Language Service
      http://fr46.uni-saarland.de/fileadmin/user_upload/personen/wurm/Workshops/Hoffmeister_Terminology_Processes_and_Quality_Assurance.pdf
  18. Terms and concepts
      • Concept vs. term – the terminological view
      [Diagram: term, concept and individual objects, with relation labels 1:1, 1:n and n:1]
      • concept: “unit of thought” – abstract mental representation of typical features (intension)
      • term: name, designation; arbitrary linguistic symbol
      • individual objects: material or immaterial (extension)
  19. Terms and concepts
      • Wüster’s answer to lexical ambiguities: active language planning/standardization -> prescriptive intervention into the lexicon of a specialised domain („bewußte Sprachgestaltung“, deliberate language design; „Soll-Norm“, prescriptive norm)
      • descriptive branches of terminology: corpus-based investigations, term extraction, use of (automatically acquired) terms in other applications
  20. Terms and concepts
      • What is the added value of the distinction between concepts and terms?
        • allows us to work with culture- and language-independent concepts rather than language-specific terms: terminology is not really a linguistic enterprise
        • concepts are understood as universal (independent of cultures and languages) representations of knowledge
  21. Terms and concepts
      • What is the added value of the distinction between concepts and terms?
        • concepts are understood as universal (independent of cultures and languages) representations of knowledge
        • abstract away from irrelevant differences (example concept: BREAD)
  22. Terms and concepts
      • What is the added value of the distinction between concepts and terms?
        • thus, we can easily map multilingual terms onto one single concept
        • rather than mapping incommensurable multilingual terms onto each other (difficult: lexical gaps, slight shifts in meaning)
      Brot, bread, pain, pane, maize, хлеб, … ∈ BREAD
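A minimal sketch of the mapping above, assuming one record per concept that lists its designations by language. The language codes, the dictionary layout and the helper names `concept_of` and `translate` are assumptions made for illustration; only the terms and the concept BREAD come from the slide.

```python
# Concept-oriented record: one entry per concept, designations grouped by language.
# The language codes are an assumption about which language each term belongs to.
termbase = {
    "BREAD": {
        "de": ["Brot"],
        "en": ["bread"],
        "fr": ["pain"],
        "it": ["pane"],
        "lv": ["maize"],
        "ru": ["хлеб"],
    },
}


def concept_of(term, lang, tb):
    """Return the identifier of the concept that `term` designates in `lang`, or None."""
    for concept_id, designations in tb.items():
        if term in designations.get(lang, []):
            return concept_id
    return None


def translate(term, src, tgt, tb):
    """Translate via the concept: source term -> concept -> target designations."""
    concept_id = concept_of(term, src, tb)
    if concept_id is None:
        return []
    return tb[concept_id].get(tgt, [])


print(translate("Brot", "de", "fr", termbase))
print(translate("pain", "fr", "de", termbase))
```

Because both directions pass through the same concept, the round trip Brot -> pain -> Brot is stable, and adding a language means adding one list rather than one mapping per language pair.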
  23. Terms and concepts
      • What is the added value of the distinction between concepts and terms?
      • we can distinguish between:
        • conceptual (semantic) relations – relations between concepts (e.g. HUT is-a HOUSE)
        • lexical relations – relations between lexical units (lemmas) (e.g. house, n. vs. to house, v.)
        • grammatical relations – relations between word forms (e.g. house vs. houses)
      • only conceptual relations are relevant to terminology
      • no interest in stylistic or connotational differences between terms (designations)
  24. Terms and concepts
      • Terms are also words, but what is the difference between general language words and specialised terms?
      • General language word:
        • has no specialised meaning
        • meaning often highly dependent on linguistic context (co-text)
        • less likely to be a foreign word
        • meaning transparent to competent speakers of the given language
      • Term:
        • can be a homonym of a general language word, but with a distinct specialised meaning (-> mapping to another concept)
        • can be an abbreviation, an acronym or a unit of measurement, a proper name or a symbol (e.g. mathematical symbols)
        • meaning defined independently from context
        • more likely to be a foreign word
        • meaning is part of expert knowledge, non-experts have to look up the concept definition
  25. Terms and concepts
      • Terms are often (but not always!) complex noun phrases
      (patterns developed within TTC project: www.ttc-project.eu)
  26. Terms and concepts
      • Terminological phraseology:
        • DIN 2342: a fixed group of words containing a verb serving as a designation of a given concept within a specialised language
          → einen Wechsel ziehen (to draw a bill of exchange), den Hochofen anstechen (to tap the blast furnace), in Phase sein (to be in phase)
          → to pass a bill, to file for divorce
        • less strict definition: fixed, reproducible, lexicalised and recurrent group of words that is typical for a specialised domain
      (cf. Gläser (2007): Fachphraseologie, HSK 28:1, 482-505, my translations from German)
  27. Terms and concepts
      • Terminological phrases have properties similar to those of single-word terms:
        • no expressive or stylistic connotations
        • reference to a context- and culture-independent concept
        • not generally comprehensible (need for explanations!)
        • non-compositional
      • Boundary cases:
        • support verb constructions: Einwände erheben vs. einwenden (to raise objections vs. to object), to make a decision vs. to decide
        • collocations: to levy taxes/soldiers/troops
        • multi-word terms (MWT)
      (cf. Gläser (2007): Fachphraseologie, HSK 28:1, 482-505)
  28. Conceptual relations
      • Conceptual relations
        • relations between concepts
        • define where a concept is located within the concept system
        • important for understanding the concept and for distinguishing it from neighbouring concepts
      “Semantic relations are at the core of any representational system, and are keys to enable the next generation of information processing systems with semantic and reasoning capabilities.” (Auger/Barrière 2008:1)
  29. Conceptual relations
      • Which kinds of relations are relevant to terminology (for concept analysis)?
      • Wüster: logical relations (similarity between concepts – hierarchical: is-a, siblings etc.) vs. ontological relations (temporal, spatial or causal relations)
      • terminologies can be represented as graphs:
        • concepts are nodes
        • relations are edges
        • relation types are edge labels
        • additional information is in the node attributes
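A minimal sketch of the graph view: concepts as nodes, relations as labelled edges, extra information as node attributes. The class names, the relation labels and every concept except HUT is-a HOUSE are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node: a concept identifier plus free-form attributes (e.g. a definition)."""
    identifier: str
    attributes: dict = field(default_factory=dict)


class ConceptSystem:
    """A tiny labelled directed graph of concepts."""

    def __init__(self):
        self.nodes = {}   # identifier -> Concept
        self.edges = []   # (source, relation label, target)

    def add_concept(self, identifier, **attributes):
        self.nodes[identifier] = Concept(identifier, attributes)

    def relate(self, source, relation, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both concepts must be added before relating them")
        self.edges.append((source, relation, target))

    def related(self, source, relation):
        """Direct neighbours of `source` along edges with the given label."""
        return [t for s, r, t in self.edges if s == source and r == relation]

    def ancestors(self, identifier, relation="is-a"):
        """Follow one relation type transitively, e.g. all superordinate concepts."""
        seen = []
        stack = [identifier]
        while stack:
            current = stack.pop()
            for parent in self.related(current, relation):
                if parent not in seen:
                    seen.append(parent)
                    stack.append(parent)
        return seen


system = ConceptSystem()
system.add_concept("BUILDING")
system.add_concept("HOUSE", definition="building for human habitation")
system.add_concept("HUT", definition="small, simple house")
system.add_concept("ROOF")
system.relate("HUT", "is-a", "HOUSE")        # generic relation
system.relate("HOUSE", "is-a", "BUILDING")   # generic relation
system.relate("ROOF", "part-of", "HOUSE")    # partitive relation

print(system.ancestors("HUT"))
print(system.related("ROOF", "part-of"))
```

Keeping the relation type as an edge label lets the same structure hold logical (is-a) and ontological (part-of, temporal, causal) relations side by side.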
  30. Conceptual relations
      • ISO 12620:2009:
        • Generic
        • Partitive
        • Temporal
        • Sequential
        • Causal
      • generic, broad-coverage relations, no domain-specific relations! (Nuopponen 1994: 533)
      • no consensus
      • synonymy, antonymy?
  31. Conceptual relations
      • To choose the right target-language (TL) term candidate, information about semantic relations is needed (esp. in the legal domain)
        • e.g. retrieved from definitions
        • but termbases/dictionaries often do NOT provide this information
  32. Conceptual relations
      Can we improve the representation of terminological information by providing richer descriptions for language workers? For example, by mining explanations, definitions or semantic relations?
  33. Concept systems and concept-oriented terminology work
      • terminography is concept-oriented (onomasiological approach)
        • structures descriptions around concepts, not around terms
      • lexicography is normally designation-oriented (semasiological approach) -> list of lemmas with corresponding enumeration of “word senses” (www.leo.org)
  34. Concept systems and concept-oriented terminology work
      • a typical “sense enumerative” dictionary entry (Tildes Birojs 2013)
  35. Concept systems and concept-oriented terminology work
      • What are shortcomings of “sense enumerative” lexicography/terminography?
        • no method for handling multilinguality, since semantic structures do not coincide across languages (language industry projects may involve up to 20-30 languages or even more, including translation to/from pivot languages)
        • no method for dealing with term variation, since variants are kept apart from preferred terms
        • no 1:1 mappings between multilingual designations – back-translation normally leads to a different result -> inconsistent translation
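A minimal sketch of the back-translation problem, assuming two toy word lists in the “sense enumerative” style where each lemma simply lists candidate equivalents. The entries and the helper names `first_equivalent` and `round_trip` are invented for illustration.

```python
# Toy bilingual word lists: lemma -> candidate equivalents, in the order
# a dictionary might print them. All entries are invented.
en_de = {
    "bank": ["Bank", "Ufer"],
    "shore": ["Ufer", "Küste"],
}
de_en = {
    "Bank": ["bank", "bench"],
    "Ufer": ["bank", "shore"],
    "Küste": ["coast"],
}


def first_equivalent(word, dictionary):
    """Pick the first listed equivalent, as a user without concept information might."""
    return dictionary[word][0]


def round_trip(word, forward, backward):
    """Translate a word and translate the result back again."""
    return first_equivalent(first_equivalent(word, forward), backward)


print(round_trip("shore", en_de, de_en))
print(round_trip("bank", en_de, de_en))
```

Here "shore" goes to "Ufer" and comes back as "bank": without an entry that ties both designations to one concept, nothing tells the user which equivalent belongs to which sense.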
  36. Concept systems and concept-oriented terminology work
  37. Concept systems and concept-oriented terminology work
      • Separate entries for different concepts in MultiTerm
  38. Concept systems and concept-oriented terminology work
      • Onomasiological approaches were not “invented” by terminology, but are ancient achievements of lexicography proper
      • Thesauri structure our knowledge of the world according to semantic relations, building a hierarchically organised inventory of concepts (similar to the old philosophical understanding of “ontology”)
        • Dornseiff: Der deutsche Wortschatz nach Sachgruppen
        • Roget’s Thesaurus of the English language
        • О. С. Баранов: Идеографический словарь русского языка
  39. Concept systems and concept-oriented terminology work
      • Onomasiological approaches were not “invented” by terminology, but are old achievements of lexicography
      • Thesauri structure the lexicon according to semantic relations
      [Figure annotations: concept with identifier as part of concept hierarchy; related concepts; designations for the concept “cosmos” + related terms]
  40. Concept systems and concept-oriented terminology work
      • Onomasiological approaches were not “invented” by terminology, but are old achievements of lexicography
      • Semantic field dictionaries structure the lexicon according to a notion of “semantic proximity”
        • Schumacher: Verben in Feldern
        • Шведова: Русский семантический словарь
  41. Concept systems and concept-oriented terminology work
      • Other kinds of onomasiological resources
        • A taxonomy is traditionally a scientific system of categories of concepts and hierarchical relations between them
        • But there are also “folk taxonomies”
        • Taxonomic approaches have been applied to the description of the lexicon of a given language (e.g. WordNet) (but are they really language-independent?)
  42. Concept systems and concept-oriented terminology work
      • Other kinds of onomasiological resources
        • A nomenclature is a list of designations in a given domain, especially in science
        • e.g. Bacterial nomenclature
        • http://www.dsmz.de/fileadmin/Bereiche/ChiefEditors/BacterialNomenclature/DSMZ_Bactnames.pdf
  43. Concept systems and concept-oriented terminology work
      • Other kinds of onomasiological resources
      • Finally, ontologies
        • Traditionally a discipline of theoretical philosophy/metaphysics: categorisation of elements of existence
        • In the narrower AI sense: form of knowledge representation that makes explicit concepts and the relations between them and imposes functions, restrictions, rules, axioms and the like
        • Ontologies can be lexicalised, but don’t have to be
        • Gruber: “An ontology is an explicit specification of a conceptualization.” (http://tomgruber.org/writing/onto-design.pdf)
  44. Concept systems and concept-oriented terminology work
      • Other kinds of onomasiological resources
      • Finally, ontologies
      • Examples:
        • Cyc, an ontology of common sense knowledge for AI
        • DOLCE, a descriptive ontology for linguistic and cognitive engineering
        • SUMO, the suggested upper merged ontology
        • … and many others
  45. Resources
      • Ontology languages and knowledge representation specifications:
        • RDF and RDF Schema
        • OWL, the web ontology language
        • SKOS, the simple knowledge organization system (builds on RDF and RDFS)
        • lemon, a lexicon model for ontologies
      • RDF, RDF Schema, OWL and SKOS are W3C standards
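A minimal sketch of what a concept looks like once expressed with SKOS on top of RDF: a handful of subject-predicate-object triples, written out here in N-Triples syntax using only the standard library. The `http://example.org/concepts/` namespace, the helper names and the sample concept are assumptions for illustration; the SKOS and RDF URIs are the published ones.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/concepts/"  # hypothetical namespace for this sketch


def uri(value):
    """Wrap a URI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(text, lang):
    """Build a language-tagged literal, escaping backslashes and double quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_triples(concept_id, labels, broader=None):
    """Describe one concept: its type, one preferred label per language, optional parent."""
    subject = uri(EX + concept_id)
    triples = [(subject, uri(RDF_TYPE), uri(SKOS + "Concept"))]
    for lang, text in sorted(labels.items()):
        triples.append((subject, uri(SKOS + "prefLabel"), literal(text, lang)))
    if broader is not None:
        triples.append((subject, uri(SKOS + "broader"), uri(EX + broader)))
    return triples


def to_ntriples(triples):
    """Serialise triples, one statement per line."""
    return "\n".join(f"{s} {p} {o} ." for s, p, o in triples)


triples = concept_triples("HUT", {"en": "hut", "de": "Hütte"}, broader="HOUSE")
ntriples = to_ntriples(triples)
print(ntriples)
```

In real projects an RDF library would normally handle parsing and serialisation; the point here is only that the concept, its labels and its hierarchical link are separate statements about one identifier.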
  46. Resources
      • Tools and semantic resources:
        • Protégé, an ontology editor with reasoning component
        • SNOMED CT, Systematized Nomenclature of Medicine – Clinical Terms
        • UMLS, Unified Medical Language System
  47. Resources
      • Relevant standards:
        • ISO 704 (2000): Terminology work – Principles and Methods
        • ISO 1087-1 (2000): Terminology work – Vocabulary – Part 1: Theory and application
        • ISO 12620 (2009): Terminology and other language and content resources – Specification of data categories and management of a Data Category Registry for language resources
        • ISO 30042 (2008): Systems to manage terminology, knowledge and content – Termbase eXchange (TBX) (http://www.ttt.org/oscarStandards/tbx/tbx_oscar.pdf)
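A minimal sketch of the concept-oriented entry layout that TBX encodes: one entry per concept, one language section per language, one term group per term. The element names (`termEntry`, `langSet`, `tig`, `term`) follow the 2008 edition of TBX as far as this sketch assumes; the fragment omits the surrounding header and body elements, is not validated against the standard, and uses an invented identifier and function name.

```python
import xml.etree.ElementTree as ET

# ElementTree's notation for the predefined xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def build_term_entry(concept_id, terms_by_lang):
    """Build one simplified, unvalidated TBX-style entry for a single concept."""
    entry = ET.Element("termEntry", {"id": concept_id})
    for lang in sorted(terms_by_lang):
        lang_set = ET.SubElement(entry, "langSet", {XML_LANG: lang})
        for term_text in terms_by_lang[lang]:
            tig = ET.SubElement(lang_set, "tig")   # term information group
            ET.SubElement(tig, "term").text = term_text
    return entry


entry = build_term_entry("c1", {"en": ["bread"], "de": ["Brot"]})
xml_text = ET.tostring(entry, encoding="unicode")
print(xml_text)
```

The nesting mirrors the onomasiological approach: the concept is the unit of the entry, and terms in any number of languages hang below it.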
  48. Resources
      • Web pages:
        • www.isocat.org – ISO TC 37 Terminology and Other Language and Content Resources: data category registry
        • www.taus.net – association of companies in the translation industry with interesting resources, downloadable TMs for members
        • termcoord.eu – web page of the European Parliament’s terminology coordination unit
        • tekom.de – German association for technical communication
  49. Resources
      • Journals and conferences:
        • Terminology (Benjamins)
        • TIA, Terminologie et Intelligence Artificielle
        • TKE, Terminology and Knowledge Engineering
        • TEKOM
  50. References: Literature
      • Auger, Alain / Barrière, Caroline (2008): “Pattern-based approaches to semantic relation extraction”. Terminology 14 (1), pp. 1-19.
      • Baranov, Oleg S. (1995): Ideografičeskij slovar’ russkogo jazyka. Moskva: ETS.
      • Dornseiff, Franz (2004): Der deutsche Wortschatz nach Sachgruppen. Berlin: de Gruyter.
      • Gläser, Rosemarie (2007): “Fachphraseologie”. In Burger et al. (eds.): Phraseologie. Vol. 1, pp. 482-505.
      • Gruber, Thomas (1993): “Toward Principles for the Design of Ontologies Used for Knowledge Sharing”. International Journal of Human-Computer Studies 43, pp. 907-928.
      • International Organization for Standardization (2000a): International Standard ISO 704:2000 (E) – Terminology Work – Principles and Methods. Geneva: ISO.
      • International Organization for Standardization (2000b): International Standard ISO 1087-1:2000 – Terminology Work – Vocabulary – Part 1: Theory and application. Geneva: ISO.
      • International Organization for Standardization (2008): International Standard ISO 30042:2008 – Systems to manage terminology, knowledge and content – Termbase eXchange (TBX). Geneva: ISO.
      • International Organization for Standardization (2009): International Standard ISO 12620:2009 – Terminology and Other Language and Content Resources – Specification of Data Categories and Management of a Data Category Registry for Language Resources. Geneva: ISO.
  51. References: Literature
      • Kipfer, Barbara A. (2010): Roget’s International Thesaurus. New York: Collins Reference.
      • Nuopponen, Anita (1994): “Wüster revisited: On Causal Concept Relationships and Causal Concept Systems”. 9th European Symposium on LSP, Bergen, Norway, August 2-6, 1993, pp. 532-539.
      • Schumacher, Helmut (1986): Verben in Feldern: Valenzwörterbuch zur Syntax und Semantik deutscher Verben. Berlin: de Gruyter.
      • Švedova, N. Ju. (2002): Russkij semantičeskij slovar’: Tolkovyj slovar’, sistematizirovannyj po klassam slov i značenij. Moskva: Azbukovnik.
      • Wright, Sue Ellen / Budin, Gerhard (eds.) (2001): Handbook of Terminology Management. Vol. 2: Application-Oriented Terminology Management. Amsterdam/Philadelphia: John Benjamins.
  52. References: Tools and Resources
      • http://www.cyc.com/platform/opencyc
      • http://www.loa.istc.cnr.it/DOLCE.html
      • http://www.ontologyportal.org/
      • http://lemon-model.net/
      • http://protege.stanford.edu/
      • http://www.nlm.nih.gov/research/umls/Snomed/snomed_main.html
      • https://uts.nlm.nih.gov/home.html
  53. Contributions to this Presentation
      • Dr. Ana Hoffmeister, Volkswagen After Sales Language Service
      • Prof. Elke Teich, Saarland University
      • Prof. Klaus Schubert, University of Hildesheim
  54. End of part 1 … Thanks for your attention!
