Porting terminologies to the Semantic Web

aka : the Semiotic Web. Presentation at ISKO UK Linked Data Event, London, 2010-09-14

Transcript

  • 1. Porting terminologies to the Semantic Web (aka: the Semiotic Web)
    • bernard.vatant@mondeca.com
    • making sense of content™
  • 2. Mondeca at a glance
    • Facts and figures
      • Established : 1999 - Founder and CEO : Jean Delahousse - Staff 2010 : 22
      • Bernard Vatant has been Senior Consultant for Mondeca since 2000
    • Products
      • Intelligent Topic Manager (Vocabularies and Knowledge base management)
      • CA Manager (Content integration through semantic annotation)
    • Services
      • Consulting and training in Semantic Web technologies deployment
      • Modeling, data and vocabulary migration and integration
    • References
      • Publication, territorial management, tourism, public sector, health
        • Lexis Nexis, Wolters Kluwer, Thomson, BnF, Documentation Française, OPOCE …
      • Participation in many national and European research projects
        • Including DataLift http://datalift.org/ (just about to kick off)
      • Ongoing participation in Semantic Web standards and linked data community
        • From Topic Maps (2000-2001) to OWL, SKOS, …
        • In the Cloud : geonames.org, lingvoj.org ontologies
  • 3. Summary
    • A semiotic view of terminology
      • « Every sign is a thing » : signs (terms) are resources (business objects)
      • The semiotic triangle : terms, concepts and referents
    • Current approaches to term representations
      • SKOS-XL, BS 8723, ISO 25964
      • The Eurovoc model : a term is a denotation of a concept
      • Lexvo.org : a term is a sign defined by string + language
      • ISO TC-37 standards (LMF) only XML schemas, no ontology
    • Moving forward
      • Limits of current approaches
      • A strawman « Simple Terminology System »
      • Introducing explicit « meaning » objects (aka : references or significations)
  • 4. The pervasive Web – quick reminder
    • Internet (ca.1970)
      • Network of identified, connected and addressable computers
        • Technical support : IP addresses
    • Web 1.0 (ca. 1990)
      • Network of identified, connected and addressable resources
        • Technical support : URLs, http
    • Semantic Web (ca. 2010)
      • Network of identified, connected and addressable representations
        • Technical support : URIs, RDF, content negotiation
      • Just about anything can be represented and connected
        • People (Social Web), Devices (Web of Things), Places (GeoSemantic Web), Concepts (Web of Vocabularies) … «  Everything is a Thing »
    • Everything? Even signs?
  • 5. Every sign is a thing (& vice versa) http://fr.wikipedia.org/wiki/Fichier:Impasse_%C3%A0_sens_unique.jpg Impasse Saint-Quentin
  • 6. The semiotic triangle : road signs
    • « signifiant » (signifier) : impasse, cul-de-sac, voie sans issue, no through road, dead end, 死路 …
    • « signifié » (signified) : have to get out using the path you got in … sometimes no way to get out at all
    • « référent » (referent)
    • Edges of the triangle : denotation, representation
  • 7. The semiotic triangle : lexical signs (terms)
    • « signifiant » : ‘Arctique’@fr
    • « signifié » : “The Arctic is the region surrounding the Earth's North Pole, within and around the Arctic Circle” (translated from Wikipédia)
    • « référent »
    • Edges of the triangle : denotation, representation
  • 8. Sorting out Terms, Concepts and Things
    • Terms are lexical entities (signifiants)
      • Generally used as denotations for concepts or things
      • If possible qualified by terminologists
      • Expressed in some identified natural language
        • Devil in the details : character encoding, writing system (script).
    • Concepts are specific representations of « things »
      • In a certain view of the world
      • For a specific functional purpose
        • Indexing, classification, search, inference
    • Things are ... just things
      • What users care about at the end of the day (people, places, products, ideas …)
    • Terms, Concepts and Things should all be first-class citizens in the Semantic Web
      • Switching from a term-centric to a concept-centric view …
        • Like in SKOS and ISO 25964
      • … does not mean that terms and terminology are out of the picture!
        • They simply need to be defined and managed at a different level
  • 9. Translation into Semantic Web languages
    • Something (« référent ») : owl:Thing
      • http://dbpedia.org/resource/Arctic
    • Concept (« signifié ») : skos:Concept
      • http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m
    • Term (« signifiant ») : skosxl:Label
      • http://lexvo.org/id/term/fra/Arctique
      • skosxl:literalForm : ‘Arctique’@fr
    • Properties shown between them : skosxl:prefLabel, foaf:focus, lvont:means
    • Edges of the triangle : denotes, represents
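The mapping on this slide can be sketched as a handful of triples. This is a minimal illustration in plain Python rather than an RDF toolkit. The URIs are the ones given on the slide. The direction of each property follows its usual definition (skosxl:prefLabel from concept to label, foaf:focus from concept to thing), and using the concept as the target of lvont:means is an assumption made for illustration only.

```python
# Resources named on the slide (URIs copied from it).
THING = "http://dbpedia.org/resource/Arctic"
CONCEPT = "http://stitch.cs.vu.nl/vocabularies/rameau/ark:/12148/cb11940481m"
TERM = "http://lexvo.org/id/term/fra/Arctique"

# (subject, predicate, object) triples; literals are (text, language) pairs.
TRIPLES = {
    (THING, "rdf:type", "owl:Thing"),
    (CONCEPT, "rdf:type", "skos:Concept"),
    (TERM, "rdf:type", "skosxl:Label"),
    (TERM, "skosxl:literalForm", ("Arctique", "fr")),
    (CONCEPT, "skosxl:prefLabel", TERM),
    (CONCEPT, "foaf:focus", THING),
    (TERM, "lvont:means", CONCEPT),  # assumed target, for illustration only
}


def objects(subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def referents_of(term):
    """Walk term -> concept -> thing (signifiant -> signifié -> référent)."""
    return {
        thing
        for concept in objects(term, "lvont:means")
        for thing in objects(concept, "foaf:focus")
    }
```

Calling `referents_of(TERM)` follows the two hops of the triangle and returns the DBpedia resource, which is one way to see that the term, the concept and the thing are three distinct, separately addressable resources.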
  • 10. Concept-centric approach to terms (SKOS)
    • The concept-centric approach puts concepts at the center of discourse
      • Terms are denotations of concepts
      • Standalone terms can be considered in theory, but not in practice
    • Minimal, shallow level of description of terms
      • Basic properties : lexical form + language
      • No support for proper lexical properties
        • Part of speech, lemma, tokenization, variant
      • Basic expressivity for term-to-term relationships
        • skosxl:labelRelation is just an abstract superproperty
    • Good expressivity of the term-to-concept relationships
      • But clearly asserted from a concept viewpoint
    • No support for context
      • Implicit context : the term-concept relationship inside a given concept scheme
    • Similar approach used by BS 8723 and ISO 25964
      • Also used in EUROVOC model with customized extensions
  • 11. Concept-centric approach to homographs
    • A term can denote more than one concept
      • aka: homography, ambiguity … issue
    • Q : Are homograph terms (denoting different concepts) the same resource, or not?
      • In other words : should they be given the same URI?
    • The SKOS-XL approach
      • SKOS-XL statement : If two instances of the class skosxl:Label have the same literal form, they are not necessarily the same resource.
      • In other words : the existence of distinct terms (distinct URIs) bearing the same literal form in the same language is not forbidden.
        • « table@en » can be the literal form of different terms (different URIs), e.g., denoting different concepts such as « table (furniture) », « table (database) » …
      • SKOS-XL does not enforce this distinction, either
        • Using the same term (same URI) for different concepts is not forbidden
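The two situations that SKOS-XL leaves open can be told apart mechanically. The sketch below is a hedged illustration in plain Python: all `example.org` URIs and the tuple layout are hypothetical, not taken from any real vocabulary. It finds literal forms shared by several label URIs, and label URIs shared by several concepts.

```python
from collections import defaultdict

# Hypothetical label data: (label URI, literal form, language, concept URI).
LABELS = [
    ("http://example.org/term/table-1", "table", "en",
     "http://example.org/concept/table-furniture"),
    ("http://example.org/term/table-2", "table", "en",
     "http://example.org/concept/table-database"),
    ("http://example.org/term/chair-1", "chair", "en",
     "http://example.org/concept/chair"),
]


def homographs(labels):
    """Group label URIs by (literal form, language); keep groups with several URIs."""
    groups = defaultdict(set)
    for uri, form, lang, _concept in labels:
        groups[(form, lang)].add(uri)
    return {key: uris for key, uris in groups.items() if len(uris) > 1}


def shared_labels(labels):
    """Find label URIs attached to more than one concept (not forbidden either)."""
    concepts = defaultdict(set)
    for uri, _form, _lang, concept in labels:
        concepts[uri].add(concept)
    return {uri: found for uri, found in concepts.items() if len(found) > 1}
```

With this sample data, `homographs(LABELS)` reports the two « table » labels, while `shared_labels(LABELS)` is empty because each label URI is attached to a single concept.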
  • 12. Concept-centric model : EUROVOC
    • The EUROVOC model is built as an extension of SKOS
    • Subclasses of skosxl:Label
      • eu:ThesaurusTerm, eu:PreferredTerm, eu:SimpleNonPreferredTerm …
        • Type of term defined by the type of relationship to a concept
      • No « standalone definition » of a term : a term is attached to a single concept
    • Specific relationships between terms
      • Translation, Permuted lexical form
      • Full name/short name, Acronym/expansion
    • No lexical (grammatical) level properties
      • No POS, lemma, variants …
    • Homographs are distinct terms
      • Hence homographs attached to different concepts
        • Have different URIs …
        • … are not linked whatsoever, except appearing as sibling results of a query …
        • … should not occur since EUROVOC should be a unique name space
  • 13. A concept representation in EUROVOC as seen in Mondeca back-office (ITM)
    • pref label in current language
    • concept attributes
    • preferred term in current language
    • preferred terms in other languages
    • User language choice (25 languages available)
    • concept schemes hierarchy (domains and microthesauri)
    • related concepts
  • 14. A concept representation (continued)
    • non-preferred terms in various languages
    • broader-narrower hierarchy
    • Display uses terms in current user language
  • 15. Term representation level
    • lexical form
    • term type
    • term attributes
    • The term « meaning » concept
    • Display uses the preferred term in current user language
    • relationships between terms
    • User language choice (25 languages available)
  • 16. The term-centric (semiotic) approach
    • As used by Lexvo.org
    • A term is uniquely defined by a string and a language
      • This definition is made functional in the URI structure
      • Example : http:// lexvo.org /id/ term /fra/Arctique
    • A term can have zero or more declared « meanings »
      • Values of the « lvont:means » property
    • The URI is functional whether there is zero, one or more declared « meanings »
    • Simple approach, but the number of meanings is anyone's guess
      • http://www.lexvo.org/id/term/eng/hubject
        • No meaning found in the data base, but the world is open 
      • http://www.lexvo.org/id/term/eng/photosphere
        • Two meanings found, linked by a lvont:nearlySameAs relationship
      • http://www.lexvo.org/id/term/eng/table
        • How many meanings?
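The idea that a term is uniquely defined by a string and a language, and that this definition is carried by the URI structure, can be sketched as a pair of functions. This is a hedged illustration: the base URI is taken from the first example on this slide (the later examples use the `www.lexvo.org` host), and percent-encoding the string with `urllib.parse.quote` is an assumption about how non-ASCII characters and spaces are handled, not a documented rule.

```python
from urllib.parse import quote, unquote

# Base taken from the slide's first example URI.
LEXVO_TERM_BASE = "http://lexvo.org/id/term/"


def term_uri(text, language):
    """Build a term URI from a string and a three-letter language code."""
    # Assumption: the string is percent-encoded when needed.
    return f"{LEXVO_TERM_BASE}{language}/{quote(text, safe='')}"


def parse_term_uri(uri):
    """Recover (text, language) from a URI produced by term_uri."""
    if not uri.startswith(LEXVO_TERM_BASE):
        raise ValueError(f"not a term URI: {uri}")
    language, _, encoded = uri[len(LEXVO_TERM_BASE):].partition("/")
    return unquote(encoded), language
```

Because the URI is a pure function of string and language, `term_uri("hubject", "eng")` is well defined whether or not any meaning has been declared for that term.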
  • 17. What « table@en » means
    • … many more of the same …
  • 18. ISO TC-37 terminology standards
    • Built on top of various other (ISO) standards
    • Define a lot of data models or schemas
      • Either UML or XML schemas
    • Delve into deep, complex lexical details
      • Addressing fine-grained terminology management issues
    • But provide no interoperability with the Semantic Web universe
      • Not even as informative annexes
    • Example : Lexical Markup Framework
      • An attempt to produce an OWL representation of the LMF model
      • Neither normative nor even OWL-conformant
      • Has been sitting unused on the LMF website for two years.
        • Any feedback? Does anyone really care? http://www.lexicalmarkupframework.org/
    • Even if published in Semantic Web formats
      • Chances of mainstream adoption are weak
      • Due to their sheer complexity…
  • 19. Adding context to the semiotic triangle
    • « signifiant » : ‘table’@en
    • « signifié » : http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture
    • « référent »
    • « context » : Furniture
    • Edges of the triangle : denotation, representation
  • 20. Context of meaning in existing approaches
    • In SKOS and concept-centric models
      • The context of the meaning is the Concept Scheme
        • <http://id.loc.gov/authorities/sh85131792#concept> a skos:Concept ; skos:prefLabel "Table"@en ; skos:inScheme <http://id.loc.gov/authorities#topicalTerms> .
      • Reads from the viewpoint of the term
        • ‘Table’ is the English preferred term for concept ‘#sh85131792’ in the context of LCSH topical terms
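To make the role of the concept scheme as implicit context concrete, here is a small hedged sketch in plain Python. The first entry reuses the LCSH URIs from the statement above. The second entry, with its `example.org` URIs, is hypothetical and only there to show what happens when the same label occurs in another scheme.

```python
# (concept URI, preferred label, language, concept scheme URI)
CONCEPTS = [
    ("http://id.loc.gov/authorities/sh85131792#concept", "Table", "en",
     "http://id.loc.gov/authorities#topicalTerms"),
    # Hypothetical entry: same label in a made-up scheme.
    ("http://example.org/db-vocab/table", "Table", "en",
     "http://example.org/db-vocab"),
]


def concepts_for(label, lang, scheme=None):
    """Concepts whose preferred label matches, optionally within one scheme."""
    return [
        uri
        for uri, pref, pref_lang, in_scheme in CONCEPTS
        if (pref, pref_lang) == (label, lang) and scheme in (None, in_scheme)
    ]
```

Read from the term side, `concepts_for("Table", "en")` is ambiguous, and only passing the scheme narrows it to a single concept: the scheme is the only context this model offers.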
    • In the purely semiotic approach of Lexvo.org
      • The only context is the declared language
      • Ambiguity is assumed, but not resolved
      • A term description is a bag of possible meanings and translations
      • Useful, but not enough
    • In a nutshell, regarding context
      • Concept-centric approach is too restrictive …
      • Lexvo.org approach is too open …
  • 21. Trying to capture context
    • Context can be more than an implicit skos:ConceptScheme
      • A language
      • A country, a community
      • A document or corpus lexical context
      • Any combination of the above …
    • Actually a context might be any kind of relevant resource
      • Including a list of resources
    • Neither term nor concept should be linked directly to a context
      • Need to define « reference » or « meaning » resources
      • Linking one term to one concept and one context
      • Allowing attachment of metadata (e.g., Dublin Core)
  • 22. Requirements for « STS »
    • STS = « Simple Terminology System »
      • aka : « Simple Terminology Semiotics »
    • As simple as SKOS is for representation of concepts
      • And as extensible
    • Based on core classes of LMF or any relevant ISO TC-37 model
      • Simpler than LMF but extensible to capture all LMF subtleties
    • Interoperable with concept layers formats (SKOS and SKOS-xl)
    • As open and robust as the semiotic approach of Lexvo.org
    • Including representation of context/meanings/references
    • And of course recommended by a relevant standards body
      • Food for another W3C Recommendation track?
  • 23. STS draft model (built upon lexvo ontology)
    • sts:Meaning
      • sts:signifier : lvont:Term
      • sts:signified : skos:Concept, or anything
      • sts:inContext : sts:Context
      • Dublin Core metadata : dcterms:*
    • sts:Context
      • sts:contextProperty : anything
      • sts:spatialContext : geo:SpatialThing
      • sts:timeContext : time:Period
    • lvont:Term (skosxl:Label)
      • lvont:language : lvont:Language
      • skosxl:literalForm : rdf:Literal
      • sts:lexicalProperty
    • Extensions to fit e.g., TC-37 LMF schemas or EUROVOC management specifics …
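The core of the draft model, a « meaning » resource linking one term to one concept and one context, can be sketched with a few data classes. This is only an illustration of the idea under stated assumptions: the attribute names mirror the sts: properties named on the slide, the OpenCyc URI comes from slide 19, and every `example.org` URI and the metadata value are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    literal_form: str  # skosxl:literalForm
    language: str      # lvont:language


@dataclass(frozen=True)
class Meaning:
    signifier: Term       # sts:signifier
    signified: str        # sts:signified (URI of a concept or thing)
    in_context: str       # sts:inContext (URI of a context resource)
    metadata: tuple = ()  # dcterms:* pairs attached to the meaning itself


TABLE = Term("table", "en")

MEANINGS = [
    Meaning(
        TABLE,
        "http://sw.opencyc.org/2009/04/07/concept/en/Table_PieceOfFurniture",
        "http://example.org/context/furniture",      # hypothetical context URI
        (("dcterms:creator", "example editor"),),    # hypothetical metadata
    ),
    Meaning(
        TABLE,
        "http://example.org/concept/table-database",  # hypothetical concept URI
        "http://example.org/context/databases",       # hypothetical context URI
    ),
]


def resolve(term, context):
    """Return what the term signifies in the given context."""
    return [
        m.signified
        for m in MEANINGS
        if m.signifier == term and m.in_context == context
    ]
```

The term stays a single resource with several meanings, as in Lexvo.org, while each meaning carries its own context and metadata, so the ambiguity is kept in the data but becomes resolvable.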
  • 24. Ready for a standardization track ?