Porting Library Vocabularies to the Semantic Web - IFLA 2010
Presentation at IFLA 2010


  • 1. Porting library vocabularies to the Semantic Web, and back: a win-win round trip
    • 2010-08-15, Session 149: Information Technology, Cataloguing, Classification and Indexing with Knowledge Management
    • [email_address]
    • making sense of content™
  • 2. Summary
    • Libraries and the Web have had a twenty-year love-hate affair, which should now come of age.
    • The role of vocabularies is critical in this affair.
    • The Linked Data architecture should leverage proven heritage vocabularies instead of reinventing them.
    • Specific features of library vocabularies make them more or less portable and useful to the Semantic Web.
    • To-do list and guidelines for vocabulary audit and publication.
    • Feedback from Semantic Web tools: helping vocabulary management.
  • 3. Libraries and the Web, a love and hate story
    • Libraries have been around for thousands of years
    • The Web is barely in its twenties
    • Webbies have been claiming the Web was bound to become the Universal Library
      • Bottom line: traditional libraries are obsolete
    • Librarians have been claiming the Web is a mess and will never improve
      • Bottom line: keep using libraries for serious stuff
    • But they look at each other with fascination
      • The Web: if only we could be as efficient as libraries at classification and indexing
      • Libraries: if only we could scale to the size of the Web, and be as user-friendly
    • They are bound to marry at the end of the day!
  • 4. What is there? Everything!
    • In libraries
      • Resources (aka records) and catalogues
      • Authority lists (aka descriptions of “real-world entities”)
      • Subject headings, thesauri, classification schemes (aka vocabularies)
      • Metadata linking resources to entities and subjects
    • On the Web
      • Resources (aka web pages) w/o catalogue
      • Real-world entities descriptions (w/o authority lists)
        • Example: Wikipedia, Facebook
      • Profusion of vocabularies, but without general schemes
        • Often called “taxonomies”, handcrafted for user navigation
      • More and more metadata based on the RDF family of standards
    • Vocabularies are the missing pieces of the Semantic Web
      • Libraries are the natural providers!
  • 5. Web vs Libraries (1.0 view)
    • Scope: global (Web) vs local, focused (Library)
    • Content: unknown, organic growth (Web) vs controlled (Library)
    • Organization of content: local, quality unknown (Web) vs native, vocabulary-based (Library)
    • Classification: local, quality unknown (Web) vs native, vocabulary-based (Library)
    • Search and retrieval: search algorithms (Web) vs vocabulary-based (Library)
    • Size: unknown, organic growth (Web) vs manageable (Library)
    • Type of content: unknown (Web) vs controlled (Library)
    • Methods: experimental, wild (Web) vs polished, proven (Library)
  • 6. Attempts at organizing the Web #fail
    • Directories
      • Could not cope with the scale of Web growth
      • Were often built by amateurs in classification and vocabulary management
      • Were biased by the commercial use of the Web
    • Vocabularies
      • Open Directory categories
      • Wikipedia categories
      • Globally messy, organic growth
    • Metadata in HTML <head>
      • Spammed, not in sync with the content
      • Ignored by most search engines now
    • Bottom line: the Web is not and will never be a Global Library
  • 7. What the Web is good at
    • Creating representations of “things”
      • Wikipedia pages
      • Facebook pages
      • Pages for products, species, places …
    • Providing standard identifiers (URIs) associated with an access protocol (HTTP)
      • The identity of things is encapsulated in resource URIs
    • Linking things together
      • Via the HTTP protocol, hypertext, etc.
    • The Semantic Web is just an extension of the Web
      • Leveraging all the above features
      • Making explicit the semantics of URIs and descriptions
      • Allowing better, less ambiguous access to resources
  • 8. The Semantic Web in perspective
    • INTERNET (ca.1970)
      • Network of identified, connected and addressable computers
        • Similar to library infrastructure level: buildings, rooms, shelves …
        • Technical support: IP addresses
    • WEB 1.0 (ca. 1990)
      • Network of identified, connected and addressable resources
        • Similar to library resources level: books, documents …
        • Technical support: URLs, HTTP
    • Semantic Web (ca. 2010)
      • Network of identified, connected and addressable concepts
        • Similar to library vocabulary level: thesauri, classifications, authority lists
        • Technical support: URIs, RDF, content negotiation
  • 9. Vocabularies as Core Data
    • Definition of “Core Data” (Hannemann & Kett, 2010)
      • Stable and reliable
      • Persistent nodes with a strict, transparent policy: data provenance, no deletions, versioning
      • Maintained or backed by trusted public organizations
      • Standards-based
    • Library Vocabularies are just that!
      • Or at least they should play this role
    • Vocabularies can be used on the Web as in libraries
      • Despite the difference in scope and size
    • Based on shared metadata standards
      • That’s where the Semantic Web comes in
  • 10. A roadmap for Semantic Web migration
    • Audit
      • Choose the vocabularies which are worth publishing
      • Sort out terms, concepts and things
      • Make explicit the semantics of your specific syntactic constructs
      • Check the actual transitivity of your hierarchies
      • Figure out the translation into vocabularies/ontologies popular on the Web
    • Make ready for publication
      • Package by domains
      • Define a strict URI policy, including versioning
      • Map to other vocabularies
    • Integration
      • Expose and promote your Vocabulary as a Service
        • Using dereferenceable URIs and SPARQL endpoints
      • Use Semantic Web software for vocabulary management
        • To ensure native standard conformance and logical consistency
  • 11. Semantic audit of vocabularies?
    • Publication for the Semantic Web can be a painful process
    • Not only technically (formats etc) but conceptually
    • Making semantics explicit often shows clearly where you’ve gone wrong
    • But it’s a healthy process anyway
    • Better to do most of the audit before any publication on the Web
    • But publishing early can trigger useful Web community feedback …
  • 12. Which vocabularies are worth it?
    • Named entities, authority files
      • A growing number of entities already defined in the Linked Data Cloud …
        • 3.4 million “things” identified and described at
        • Over 7 million “features” identified and described at
        • 1,687,119 entries and counting
      • Consider if duplicate efforts are worth it
        • Should you throw away your yet-another-Victor-Hugo entry?
        • No, but link to other descriptions in the Cloud (based on HTTP URIs) and keep existing identifiers for backward compatibility
    • Taxonomies, subject headings, classifications
      • That’s where library heritage is strong and the Web is weak
      • Such vocabularies can be structuring for the web of data as they are for libraries
      • Their publication should be a priority!
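A minimal sketch of the “link, but keep your identifiers” advice, assuming the authority file is published as N-Triples: the legacy identifier survives as a dcterms:identifier literal, and links point out to external descriptions. The base URI, the identifier n00012345 and the external URI are hypothetical placeholders; owl:sameAs is only a default link property, since its use for “things” is debated (see slide 18).

```python
# Sketch only: all URIs and identifiers below are hypothetical placeholders.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
DCT_IDENTIFIER = "http://purl.org/dc/terms/identifier"
LOCAL_BASE = "http://example.org/authorities/"  # hypothetical URI base


def nt_literal(value: str) -> str:
    """Serialize a plain string as an N-Triples literal."""
    escaped = (value.replace("\\", "\\\\")
                    .replace('"', '\\"')
                    .replace("\n", "\\n"))
    return f'"{escaped}"'


def link_authority(local_id: str, external_uris: list[str],
                   link_property: str = OWL_SAME_AS) -> list[str]:
    """Return N-Triples lines: the legacy ID is kept, links point outward."""
    subject = f"<{LOCAL_BASE}{local_id}>"
    lines = [f"{subject} <{DCT_IDENTIFIER}> {nt_literal(local_id)} ."]
    for uri in external_uris:
        lines.append(f"{subject} <{link_property}> <{uri}> .")
    return lines


for line in link_authority("n00012345", ["http://example.net/resource/Victor_Hugo"]):
    print(line)
```

Passing a weaker property (for instance skos:exactMatch or skos:closeMatch on concepts) as `link_property` is one way to avoid asserting strict identity when you are not sure of it.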
  • 13. Sort out Terms, Concepts and Things
    • Terms are denotations for concepts
      • In a given language
      • If possible qualified by vocabulary specialists
    • Concepts are specific representations of “things”
      • In a certain view of the world
      • For a specific functional purpose in mind
    • Things are ... just things
      • What users care about at the end of the day (people, places, products …)
    • Terms, Concepts and Things should all be first-class citizens in the Semantic Web
      • Switching from a term-centric to a concept-centric view
        • That’s what SKOS and ISO 25964 … are all about
      • Does not mean that terms and terminology are out of the picture!
        • They simply need to be defined and managed at a different level
  • 14. Building on SKOS and extensions
    • Thing (owl:Thing; e.g. geo:Feature, foaf:Person, time:TemporalEntity)
    • Concept (skos:Concept) represents a Thing, via foaf:focus*
    • Term (skos-xl:Label) denotes a Concept, attached via skos-xl:prefLabel
    • * Under discussion
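As an illustration of the Thing / Concept / Term pattern above, here is a small sketch that emits Turtle for one concept. It assumes SKOS-XL labels attached with skosxl:prefLabel and foaf:focus pointing from the concept to the thing; the ex: namespace, the concept ID and the thing URI are hypothetical.

```python
# Sketch: the ex: namespace and all identifiers are hypothetical; the other
# prefixes are the published namespaces for OWL, SKOS, SKOS-XL and FOAF.
PREFIXES = (
    "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
    "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
    "@prefix skosxl: <http://www.w3.org/2008/05/skos-xl#> .\n"
    "@prefix foaf: <http://xmlns.com/foaf/0.1/> .\n"
    "@prefix ex: <http://example.org/vocab/> .\n"
)


def turtle_string(text: str) -> str:
    """Quote a string as a Turtle literal (backslashes and quotes escaped)."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def describe(concept_id: str, label: str, lang: str,
             thing_uri: str, thing_class: str = "owl:Thing") -> str:
    concept = f"ex:{concept_id}"
    term = f"ex:{concept_id}-label-{lang}"  # the Term is a resource of its own
    return PREFIXES + "\n".join([
        "",
        # Concept: linked to its Term and to the Thing it represents
        f"{concept} a skos:Concept ;",
        f"    skosxl:prefLabel {term} ;",
        f"    foaf:focus <{thing_uri}> .",
        # Term: a skosxl:Label carrying the literal form in one language
        f"{term} a skosxl:Label ;",
        f"    skosxl:literalForm {turtle_string(label)}@{lang} .",
        # Thing: owl:Thing by default, or e.g. foaf:Person (prefix declared above)
        f"<{thing_uri}> a {thing_class} .",
    ]) + "\n"


print(describe("hugo", "Hugo, Victor", "fr",
               "http://example.org/things/victor-hugo", "foaf:Person"))
```

Keeping the label as a resource (rather than a plain skos:prefLabel literal) is what lets terms be managed “at a different level”, as slide 13 puts it.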
  • 15. Make explicit the semantics of your syntax
    • < http:// / authorities /sh00000562 > a skos:Concept
      • skos:prefLabel ‘Environmental justice--Religious aspects--Buddhism, [Christianity, etc.]’
    • What does such an aggregation mean?
      • Does “--” have the same semantics in all subject headings in LCSH?
      • If yes, which one?
      • Same questions for [ ]
    • How does this concept link to its components?
      • Currently it does not, although they are defined elsewhere in LCSH
      • : Environmental justice
      • : Environmental justice--Religious aspects
      • : Buddhism
    • Making explicit the links between the above concepts would definitely add value!
      • To do : figure out how (using flavors of skos:semanticRelation)
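One possible way to start on this to-do, sketched under assumptions: split the pre-coordinated heading on “--”, drop the bracketed pattern note, and link each heading to the one it refines and to its final component. Labels stand in for concept URIs (a label-to-URI lookup is assumed), and both property names are hypothetical sub-properties of skos:semanticRelation, not part of SKOS or LCSH.

```python
import re

# Hypothetical sub-properties of skos:semanticRelation (not defined by SKOS).
REFINES = "ex:refinesHeading"
HAS_FACET = "ex:hasSubjectFacet"


def split_heading(heading: str) -> list[str]:
    """Drop bracketed pattern notes such as '[Christianity, etc.]', split on '--'."""
    cleaned = re.sub(r"\[[^\]]*\]", "", heading).strip().rstrip(",").strip()
    return [part.strip() for part in cleaned.split("--")]


def decompose(heading: str) -> list[tuple[str, str, str]]:
    parts = split_heading(heading)
    links = []
    # Each heading refines the heading obtained by dropping its last subdivision.
    for i in range(len(parts), 1, -1):
        links.append(("--".join(parts[:i]), REFINES, "--".join(parts[:i - 1])))
    # The final subdivision is also linked as a concept in its own right.
    if len(parts) > 1:
        links.append(("--".join(parts), HAS_FACET, parts[-1]))
    return links


heading = "Environmental justice--Religious aspects--Buddhism, [Christianity, etc.]"
for link in decompose(heading):
    print(link)
```

This purely syntactic reading assumes “--” means the same thing everywhere, which is exactly the question the slide raises; the audit has to confirm it first.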
  • 16. Make sense of hierarchies
    • From the LCSH hierarchy (one leading dot per level down)
      • Auxiliary sciences of history
      • .Civilization
      • ..Learning and Scholarship
      • ...Humanities
      • ....Philosophy
      • .....Psychology
      • ......Attention
      • .......Listening
      • ........Eavesdropping
      • .........Wiretapping
    • Kind of semantic drift all the way down
      • Every local relation makes sense, but globally the result is weird if transitivity applies
      • But most automatic systems rely on transitivity as a default feature
    • Either fix it, or specify the hierarchy is not transitive
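To see the drift concretely, the sketch below treats the LCSH chain above as skos:broader links and computes what a system that assumes transitivity would infer. In SKOS itself, skos:broader is not declared transitive (skos:broaderTransitive is), which is one way to “specify the hierarchy is not transitive”. The dictionary encoding is an assumption made for the example.

```python
from collections import deque

# skos:broader links from the LCSH chain on the slide (narrower -> broader).
BROADER = {
    "Civilization": ["Auxiliary sciences of history"],
    "Learning and Scholarship": ["Civilization"],
    "Humanities": ["Learning and Scholarship"],
    "Philosophy": ["Humanities"],
    "Psychology": ["Philosophy"],
    "Attention": ["Psychology"],
    "Listening": ["Attention"],
    "Eavesdropping": ["Listening"],
    "Wiretapping": ["Eavesdropping"],
}


def broader_transitive(concept: str, broader: dict[str, list[str]]) -> list[str]:
    """All ancestors reachable by following skos:broader (breadth-first)."""
    ancestors, visited = [], set()
    queue = deque(broader.get(concept, []))
    while queue:
        current = queue.popleft()
        if current in visited:  # guard against cycles in the hierarchy
            continue
        visited.add(current)
        ancestors.append(current)
        queue.extend(broader.get(current, []))
    return ancestors


ancestors = broader_transitive("Wiretapping", BROADER)
print(len(ancestors), ancestors[-1])  # 9 Auxiliary sciences of history
```

A search for “Auxiliary sciences of history” expanded with this closure would return documents on wiretapping, which is the kind of surprise the audit should catch.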
  • 17. Consider how the vocabulary will be used
    • The Web is an open world
      • Whatever is not explicitly forbidden is allowed
      • Whereas in closed library practice, whatever is not explicitly allowed is forbidden
      • So be prepared for all sorts of misuse if you leave room for interpretation
    • Package by domains
      • Web users will more readily integrate hundreds of concepts than millions
      • Small, focused vocabularies are more reusable than general ones
      • The wider the scope, the more room for ambiguity!
    • Package by versions
      • Versioning at vocabulary level and/or at concept level (open issue)
        • Should a concept keep the same URI in successive versions?
        • When has a concept changed enough to be replaced by a different one?
      • Never delete a concept, deprecate it if necessary
        • Using e.g. dcterms:isReplacedBy
        • Concepts have a life cycle, but cool URIs don’t change
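The “never delete, deprecate” rule implies that consumers can always resolve an old URI to its current successor. A minimal sketch, assuming each deprecated concept carries a single dcterms:isReplacedBy value loaded into a dictionary; the concept URIs are hypothetical.

```python
# Sketch: follow dcterms:isReplacedBy links so that deprecated concepts, which
# are never deleted, still resolve to their current successor.
# Assumes at most one replacement per concept; URIs are hypothetical.
def current_concept(uri: str, replaced_by: dict[str, str]) -> str:
    seen = {uri}
    while uri in replaced_by:
        uri = replaced_by[uri]
        if uri in seen:  # a replacement loop is a data error worth reporting
            raise ValueError(f"dcterms:isReplacedBy cycle at {uri}")
        seen.add(uri)
    return uri


REPLACED_BY = {
    "http://example.org/vocab/c101": "http://example.org/vocab/c205",
    "http://example.org/vocab/c205": "http://example.org/vocab/c318",
}
print(current_concept("http://example.org/vocab/c101", REPLACED_BY))
```

If a concept is split into several successors, a single-valued map is no longer enough; that case belongs to the open versioning questions above.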
  • 18. Mapping to other vocabularies
    • A most important but still open area
      • Dealing with hard notions like identity, similarity, sameness …
    • Tools to help with alignment are emerging
      • See e.g. ONAGUI
      • Work in progress (TAE project)
        • More background in ontology mapping than vocabulary mapping
    • Mapping at the concept level seems to make sense
      • SKOS provides a basic vocabulary for simple mapping
      • But no provision for mapping a simple concept to an aggregate
      • In particular no boolean operators (Actor AND Musician vs Actor OR Musician)
    • Alignment of “things” is a contentious area
      • See various debates on use and abuse of owl:sameAs
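One reason mapping deserves care: in SKOS, skos:exactMatch is symmetric and transitive, while skos:closeMatch is only symmetric. The sketch below computes what a reasoner may infer from a few asserted exactMatch links, so one loose “exact” match propagates across every linked vocabulary. The vocA/vocB/vocC identifiers are hypothetical.

```python
from collections import defaultdict


# Sketch: infer the skos:exactMatch closure from asserted pairs.
# Identifiers are hypothetical; closeMatch links would NOT be closed this way.
def exact_match_closure(pairs: list[tuple[str, str]]) -> dict[str, set[str]]:
    graph = defaultdict(set)
    for a, b in pairs:
        graph[a].add(b)
        graph[b].add(a)  # symmetry
    closure = {}
    for start in graph:
        component, stack = set(), [start]
        while stack:  # transitivity: everything connected ends up matching
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(graph[node])
        closure[start] = component - {start}
    return closure


asserted = [("vocA:Actors", "vocB:Acteurs"), ("vocB:Acteurs", "vocC:Schauspieler")]
for concept, matches in exact_match_closure(asserted).items():
    print(concept, sorted(matches))
```

Note what the closure cannot express: there is still no way to map a single concept to “Actor AND Musician”, which is the gap the slide points out.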
  • 19. Ready to publish?
    • Follow recommended (best) practices
      • See e.g.
    • Provide usable packaging (see above, particularly if the vocabulary is large)
      • A 500 MB dump in one single SKOS file is not the most manageable form!
    • Choose and expose a clear licensing policy
      • Using e.g. Creative Commons license model
    • Make all concept URIs dereferenceable
      • Using content negotiation: one description for machines, one for humans
    • Publish a vocabulary description
      • For humans and for machines (using e.g., SIOC)
    • Publish your own use of the vocabulary
      • In metadata of your records
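A sketch of dereferenceable concept URIs with content negotiation, in the 303 “See Other” style: the server reads the Accept header, picks HTML for humans or an RDF serialization for machines, and redirects to a format-specific document. The suffix-based URL layout and the set of supported formats are hypothetical policy choices, and Accept parsing is simplified (the most specific matching range sets the q-value; ties go to server preference).

```python
from typing import Optional

SUPPORTED = {  # media type -> URL suffix, in server preference order (hypothetical)
    "text/html": ".html",
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
}


def parse_accept(header: str) -> list[tuple[str, float]]:
    prefs = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((fields[0].lower(), q))
    return prefs


def specificity(pattern: str) -> int:
    return 0 if pattern == "*/*" else 1 if pattern.endswith("/*") else 2


def quality(media: str, prefs: list[tuple[str, float]]) -> float:
    """q-value of the most specific Accept range matching this media type."""
    best = (-1, 0.0)
    for pattern, q in prefs:
        matches = (pattern == "*/*" or pattern == media or
                   (pattern.endswith("/*") and media.startswith(pattern[:-1])))
        if matches and specificity(pattern) > best[0]:
            best = (specificity(pattern), q)
    return best[1]


def negotiate(concept_uri: str, accept: Optional[str]) -> tuple[int, Optional[str]]:
    prefs = parse_accept(accept or "*/*")
    chosen, chosen_q = None, 0.0
    for media in SUPPORTED:  # ties go to the server's preferred format
        q = quality(media, prefs)
        if q > chosen_q:
            chosen, chosen_q = media, q
    if chosen is None:
        return 406, None  # Not Acceptable
    return 303, concept_uri + SUPPORTED[chosen]  # See Other


print(negotiate("http://example.org/concept/123", "text/turtle"))
```

A browser sending `text/html,...` lands on the human page, while a Linked Data client asking for `text/turtle` gets the machine description, both from the same concept URI.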
  • 20. Spread the word, join the community!
    • The Web is an open collaborative enterprise …
    • Push what you’ve done outside the library world
      • Join linked data and Semantic Web forums and work groups
      • Be ready to respond to feedback
    • The Semantic Web will be a Social Web
      • Watch whether and how your vocabulary is adopted by social applications
      • Facebook, Twitter and the like …
    • And of course, watch for further developments from the W3C Library Linked Data Incubator Group (LLD XG)
  • 21. Put Semantic Web software in the back-office
    • Publication of linked data can be done from any information system
      • It can be dealt with as yet another publication issue …
    • But it’s simpler if semantic formats are handled natively in the back-office
      • Supporting import/export of vocabularies in the system in standard formats
        • Native RDF and SKOS integration
      • Supporting semantic queries in the back-office
      • Supporting sanity checking, inference rules
    • … And making librarians feel natively at home with semantic technologies!
      • Not the least part of it
    • The technology is mature, software is on the market …
      • Time to think about it!
      • So, just ask 
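As a small example of back-office sanity checking, the sketch below applies two label rules from the SKOS integrity conditions: a concept has at most one skos:prefLabel per language tag, and skos:prefLabel, skos:altLabel and skos:hiddenLabel must not share the same literal. Triples are simplified to (concept, property, (text, lang)) tuples, and the ex: identifiers are hypothetical.

```python
from collections import defaultdict

LABEL_PROPS = {"skos:prefLabel", "skos:altLabel", "skos:hiddenLabel"}


def check_labels(triples):
    """Return human-readable problems found in SKOS label triples."""
    problems = []
    pref = defaultdict(set)                        # (concept, lang) -> texts
    usage = defaultdict(lambda: defaultdict(set))  # (concept, lang) -> text -> props
    for concept, prop, (text, lang) in triples:
        if prop not in LABEL_PROPS:
            continue
        if prop == "skos:prefLabel":
            pref[(concept, lang)].add(text)
        usage[(concept, lang)][text].add(prop)
    # Rule 1: no more than one prefLabel per language tag
    for (concept, lang), texts in pref.items():
        if len(texts) > 1:
            problems.append(f"{concept}: {len(texts)} prefLabels @{lang}")
    # Rule 2: label properties are pairwise disjoint
    for (concept, lang), by_text in usage.items():
        for text, props in by_text.items():
            if len(props) > 1:
                problems.append(
                    f"{concept}: '{text}'@{lang} used as {', '.join(sorted(props))}")
    return problems


triples = [
    ("ex:c1", "skos:prefLabel", ("Wiretapping", "en")),
    ("ex:c1", "skos:prefLabel", ("Phone tapping", "en")),
    ("ex:c1", "skos:altLabel", ("Phone tapping", "en")),
    ("ex:c1", "skos:prefLabel", ("Écoute téléphonique", "fr")),
]
for problem in check_labels(triples):
    print(problem)
```

Checks like these are cheap to run on every import, which is the practical argument for keeping RDF and SKOS native in the back-office.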
  • 22.
    • Thanks for your attention
    • Questions?