Porting terminologies to the Semantic Web (aka: the Semiotic Web). bernard.vatant@mondeca.com, making sense of content™
Mondeca at a glance. Facts and figures: established 1999; founder and CEO: Jean Delahousse; staff 2010: 22. Bernard Vatant has been Senior Consultant for Mondeca since 2000. Products: Intelligent Topic Manager (vocabularies and knowledge base management); CA Manager (content integration through semantic annotation). Services: consulting and training in Semantic Web technologies deployment; modeling, data and vocabulary migration and integration. References: publication, territorial management, tourism, public sector, health; Lexis Nexis, Wolters Kluwer, Thomson, BnF, Documentation Française, OPOCE … Participation in many national and European research projects, including DataLift http://datalift.org/ (just about to kick off). Ongoing participation in Semantic Web standards and linked data community: from Topic Maps (2000-2001) to OWL, SKOS, … In the Cloud: geonames.org, lingvoj.org ontologies.
Summary A semiotic view of terminology « Every sign is a thing » : signs (terms) are resources (business objects) The semiotic triangle : terms, concepts and referents Current approaches to term representations SKOS-XL, BS 8723, ISO 25964 The Eurovoc model : a term is a denotation of a concept Lexvo.org : a term is a sign defined by string + language ISO TC-37 standards (LMF) only XML schemas, no ontology  Moving forward Limits of current approaches A strawman « Simple Term System » Introducing explicit « meaning » objects (aka : references or significations)
The pervasive Web: quick reminder. Internet (ca. 1970): network of identified, connected and addressable computers; technical support: IP addresses. Web 1.0 (ca. 1990): network of identified, connected and addressable resources; technical support: URLs, HTTP. Semantic Web (ca. 2010): network of identified, connected and addressable representations; technical support: URIs, RDF, content negotiation. Just about anything can be represented and connected: people (Social Web), devices (Web of Things), places (GeoSemantic Web), concepts (Web of Vocabularies) … « Everything is a Thing ». Everything? Even signs?
Every sign is a thing (& vice versa) http://fr.wikipedia.org/wiki/Fichier:Impasse_%C3%A0_sens_unique.jpg   Impasse  Saint-Quentin
The semiotic triangle: road signs. Terms: impasse, cul-de-sac, voie sans issue, no through road, dead end, 死路 … Meaning: have to get out using the path you get in … sometimes no way to get out at all. Diagram labels: « signifiant », « signifié », « référent », denotation, representation.
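The semiotic triangle: lexical signs (terms). Term: ‘Arctique’@fr. Meaning: « L’Arctique est la région entourant le pôle Nord de la Terre à l’intérieur et aux abords du cercle polaire nord » (Wikipédia), that is, the Arctic is the region surrounding the Earth's North Pole, within and around the Arctic Circle. Diagram labels: « signifiant », « signifié », « référent », denotation, representation.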
Sorting out Terms, Concepts and Things. Terms are lexical entities (signifiants): generally used as denotations for concepts or things; if possible qualified by terminologists; expressed in some identified natural language. Devil in the details: encoding system, scripting system. Concepts are specific representations of « things »: in a certain view of the world; for a specific functional purpose (indexing, classification, search, inference). Things are ... just things: what users care about at the end of the day (people, places, products, ideas …). Terms, Concepts and Things should all be first-class citizens in the Semantic Web. Switching from a term-centric to a concept-centric view, as in SKOS and ISO 25964, does not mean that terms and terminology are out of the picture! They simply need to be defined and managed at a different level.
Translation into Semantic Web languages. Something (« référent »): owl:Thing, e.g. http://dbpedia.org/resource/Arctic. Concept (« signifié »): skos:Concept, e.g. http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m. Term (« signifiant »): skosxl:Label, e.g. http://lexvo.org/id/term/fra/Arctique, with skosxl:literalForm ‘Arctique’@fr. Links shown on the diagram: denotes, represents, foaf:focus, lvont:means, skosxl:prefLabel.
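The triangle on this slide can be written down as plain subject-predicate-object triples. The following is only a minimal sketch in plain Python, with no RDF library: the URIs are the ones on the slide with the extraction spaces removed, the direction of each link is read off the diagram, the target of lvont:means is an assumption, and `objects` is a hypothetical helper.

```python
# URIs taken from the slide (spaces removed).
THING = "http://dbpedia.org/resource/Arctic"
CONCEPT = "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m"
TERM = "http://lexvo.org/id/term/fra/Arctique"

# Each triple is (subject, predicate, object).
# Assumption: lvont:means points from the term to the thing.
triples = [
    (CONCEPT, "foaf:focus", THING),                    # concept represents thing
    (CONCEPT, "skosxl:prefLabel", TERM),               # concept is labelled by term
    (TERM, "skosxl:literalForm", ("Arctique", "fr")),  # term has a string + language
    (TERM, "lvont:means", THING),                      # term denotes thing
]


def objects(subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]
```

For example, `objects(TERM, "skosxl:literalForm")` gives the lexical form and language of the term, while `objects(CONCEPT, "foaf:focus")` gives the thing the concept represents.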
Concept-centric approach to terms (SKOS). The concept-centric approach puts concepts at the center of discourse: terms are denotations of concepts; standalone terms can be considered in theory, but not in practice. Minimal, shallow level of description of terms: basic properties are lexical form + language; no support for proper lexical properties (part of speech, lemma, tokenization, variant). Basic expressivity for term-to-term relationships: skosxl:labelRelation is just an abstract superproperty. Good expressivity of the term-to-concept relationships, but clearly asserted from a concept viewpoint. No support for context: the implicit context is the term-concept relationship inside a given concept scheme. Similar approach used by BS 8723 and ISO 25964. Also used in the EUROVOC model with customized extensions.
Concept-centric approach to homographs. A term can denote more than one concept (aka the homography or ambiguity issue). Q: Are homograph terms (denoting different concepts) the same resource, or not? In other words: should they be given the same URI? The SKOS-XL approach. SKOS-XL statement: if two instances of the class skosxl:Label have the same literal form, they are not necessarily the same resource. IOW: the existence of distinct terms (distinct URIs) bearing the same literal form in the same language is not forbidden. « table@en » can be the literal form of different terms (different URIs), e.g., denoting different concepts such as « table (furniture) », « table (data base) » … SKOS-XL does not enforce this distinction, either: using the same term (same URI) for different concepts is not forbidden.
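The SKOS-XL position on homographs can be illustrated with a small sketch: several label resources, each with its own URI, may share one literal form and language. The `example.org` URIs and the `homographs` helper below are hypothetical and only serve the illustration.

```python
from collections import defaultdict

# Hypothetical label URIs mapped to their (literal form, language).
labels = {
    "http://example.org/term/table-furniture": ("table", "en"),
    "http://example.org/term/table-database": ("table", "en"),
    "http://example.org/term/chair": ("chair", "en"),
}


def homographs(labels):
    """Group label URIs by literal form and keep forms shared by several URIs."""
    by_form = defaultdict(list)
    for uri, form in labels.items():
        by_form[form].append(uri)
    return {form: sorted(uris) for form, uris in by_form.items() if len(uris) > 1}
```

Here `homographs(labels)` reports « table@en » as the literal form of two distinct resources, which SKOS-XL allows but does not require.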
Concept-centric model: EUROVOC. The EUROVOC model is built as an extension of SKOS. Subclasses of skosxl:Label: eu:ThesaurusTerm, eu:PreferredTerm, eu:SimpleNonPreferredTerm … The type of a term is defined by the type of its relationship to a concept. No « standalone definition » of a term: a term is attached to a single concept. Specific relationships between terms: translation, permuted lexical form, full name/short name, acronym/expansion. No lexical (grammatical) level properties: no POS, lemma, variants … Homographs are distinct terms. Hence homographs attached to different concepts have different URIs …, are not linked whatsoever, except appearing as sibling results of a query …, and should not occur since EUROVOC should be a unique name space.
A concept representation in EUROVOC, as seen in Mondeca back-office (ITM). Screenshot callouts: pref label in current language; concept attributes; preferred term in current language; preferred terms in other languages; user language choice (25 languages available); concept schemes hierarchy (domains and microthesauri); related concepts.
A concept representation (continued). Screenshot callouts: non-preferred terms in various languages; broader-narrower hierarchy (display uses terms in current user language).
Term representation level. Screenshot callouts: lexical form; term type; term attributes; the term « meaning » concept (display uses the preferred term in current user language); relationships between terms; user language choice (25 languages available).
The term-centric (semiotic) approach, as used by Lexvo.org. A term is uniquely defined by a string and a language. This definition is made functional in the URI structure. Example: http://lexvo.org/id/term/fra/Arctique. A term can have zero or more declared « meanings »: values of the « lvont:means » property. The URI is functional whether there are zero, one or more declared « meanings ». Simple approach, but the number of meanings is anyone's guess. http://www.lexvo.org/id/term/eng/hubject : no meaning found in the data base, but the world is open. http://www.lexvo.org/id/term/eng/photosphere : two meanings found, linked by a lvont:nearlySameAs relationship. http://www.lexvo.org/id/term/eng/table : how many meanings?
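Because a term is identified by string + language alone, its URI can be computed without looking anything up. A minimal sketch of that idea follows; the function name `term_uri` is hypothetical, and the percent-encoding of the string is an assumption about how non-ASCII or reserved characters would be handled, not a description of the Lexvo.org service itself.

```python
from urllib.parse import quote

# Base of the term URIs shown on the slide.
LEXVO_TERM_BASE = "http://lexvo.org/id/term/"


def term_uri(string, lang):
    """Build a term URI from a string and a three-letter language code.

    Assumption: the string is percent-encoded as a single path segment.
    """
    return f"{LEXVO_TERM_BASE}{lang}/{quote(string, safe='')}"
```

With this sketch, `term_uri("Arctique", "fra")` reproduces the example URI from the slide, and the same string + language pair always yields the same URI, whatever the number of declared meanings.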
What « table@en » means: many more of the same…
ISO TC-37 terminology standards. Built on top of various other (ISO) standards. Define a lot of data models or schemas, either UML or XML schemas. Delve into deep, complex lexical details, addressing fine-grained terminology management issues. But provide no interoperability with the Semantic Web universe, not even as informative annexes. Example: Lexical Markup Framework. An attempt to produce an OWL representation of the LMF model: neither normative nor even OWL-conformant; it has been sitting useless on the LMF website for two years. Any feedback? Does anyone really care? http://www.lexicalmarkupframework.org/ Even if published in Semantic Web formats, chances of mainstream adoption are weak, due to their sheer complexity…
Adding context to the semiotic triangle. http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture ; ‘table’@en. Diagram labels: « signifiant », « signifié », « référent », denotation, representation. « Context »: Furniture.
Context of meaning in existing approaches. In SKOS and concept-centric models, the context of the meaning is the Concept Scheme: <http://id.loc.gov/authorities/sh85131792#concept> a skos:Concept ; skos:prefLabel "Table"@en ; skos:inScheme <http://id.loc.gov/authorities#topicalTerms> . Read from the viewpoint of the term: ‘Table’ is the English preferred term for concept ‘#sh85131792’ in the context of LCSH topical terms. In the purely semiotic approach of Lexvo.org, the only context is the declared language. Ambiguity is assumed, but not resolved. A term description is a bag of possible meanings and translations. Useful, but not enough. In a nutshell, regarding context: the concept-centric approach is too restrictive … the Lexvo.org approach is too open …
Trying to capture context. Context can be more than an implicit skos:ConceptScheme: a language; a country, a community; a document or corpus lexical context; any combination of the above … Actually a context might be any kind of relevant resource, including a list of resources. Neither term nor concept should be linked directly to a context. Need to define « reference » or « meaning » resources, linking one term to one concept and one context, and allowing attachment of metadata (e.g., Dublin Core).
Requirements for « STS ». STS = « Simple Terminology System », aka « Simple Terminology Semiotics ». As simple as SKOS is for representation of concepts, and as extensible. Based on core classes of LMF or any relevant ISO TC-37 model. Simpler than LMF but extensible to capture all LMF subtleties. Interoperable with concept layer formats (SKOS and SKOS-XL). As open and robust as the semiotic approach of Lexvo.org. Including representation of context/meanings/references. And of course recommended by a relevant standards body. Food for another W3C recommendation track?
STS draft model (built upon lexvo ontology). Diagram labels: lvont:Term sts:Context sts:signifier sts:Meaning skos:Concept sts:inContext sts:signified anything sts:contextProperty geo:SpatialThing sts:spatialContext time:Period sts:timeContext skosxl:Label lvont:Language lvont:language rdf:Literal skosxl:literalForm sts:lexicalProperty Dublin Core metadata dcterms:* anything. Extensions to fit e.g., TC-37 LMF schemas or EUROVOC management specifics …
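The central idea of the draft, an explicit « meaning » resource linking one term to one concept in one context, can be sketched as a small data structure. This is only an illustration under assumptions: the field names mirror sts:signifier, sts:signified and sts:inContext from the diagram, the furniture context URI and the metadata value are hypothetical, and `meanings_of` is a made-up helper, not part of any published STS vocabulary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Meaning:
    """One term linked to one concept in one context."""
    signifier: str        # term URI (sts:signifier)
    signified: str        # concept URI (sts:signified)
    in_context: str       # context URI (sts:inContext)
    metadata: tuple = ()  # (property, value) pairs, e.g. Dublin Core


TABLE = "http://lexvo.org/id/term/eng/table"
LCSH = "http://id.loc.gov/authorities#topicalTerms"
FURNITURE = "http://example.org/context/furniture"  # hypothetical context URI

MEANINGS = [
    Meaning(TABLE, "http://id.loc.gov/authorities/sh85131792#concept", LCSH),
    Meaning(
        TABLE,
        "http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
        FURNITURE,
        metadata=(("dcterms:creator", "example editor"),),  # hypothetical value
    ),
]


def meanings_of(term, context=None):
    """Return the meanings of a term, optionally restricted to one context."""
    return [
        m for m in MEANINGS
        if m.signifier == term and (context is None or m.in_context == context)
    ]
```

Without a context, `meanings_of(TABLE)` returns the open "bag of meanings" of the term-centric approach; with `context=LCSH` it narrows to the single concept that a concept scheme would give, which is the middle ground the draft is aiming for.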
Ready for a standardization track ?
