Porting Library Vocabularies to the Semantic Web - IFLA 2010

Presentation at IFLA 2010

Published in: Technology

Transcript

  • 1. Porting library vocabularies to the Semantic Web, and back: a win-win round trip
    • 2010-08-15, Session 149: Information Technology, Cataloguing, Classification and Indexing with Knowledge Management
    • [email_address]
    • making sense of content ™
  • 2. Summary
    • Libraries and the Web have had a twenty-year love-hate affair, which should now come of age.
    • The role of vocabularies is critical in this affair.
    • The Linked Data architecture should leverage proven heritage vocabularies instead of reinventing them.
    • Specific features of library vocabularies make them more or less portable and useful to the Semantic Web.
    • To-do list and guidelines for vocabulary audit and publication.
    • Feedback from Semantic Web tools: helping vocabulary management.
  • 3. Libraries and the Web, a love and hate story
    • Libraries have been around for thousands of years
    • The Web is barely in its twenties
    • Webbies have been claiming the Web was bound to become the Universal Library
      • Bottom line: traditional libraries are obsolete
    • Librarians have been claiming the Web is a mess and will never improve
      • Bottom line: keep using libraries for serious stuff
    • But they look at each other with fascination
      • The Web: if only we could be as efficient as libraries at classification and indexing
      • Libraries: if only we could scale to the size of the Web, and be as user-friendly
    • They are bound to be married at the end of the day!
  • 4. What is there? Everything!
    • In libraries
      • Resources (aka records) and catalogues
      • Authority lists (aka descriptions of « real-world entities »)
      • Subject headings, thesauri, classification schemes (aka vocabularies)
      • Metadata linking resources to entities and subjects
    • On the Web
      • Resources (aka web pages) w/o catalogue
      • Real-world entities descriptions (w/o authority lists)
        • Examples: Wikipedia, Facebook
      • Profusion of vocabularies, but w/o general schemes
        • Often called « taxonomies », handcrafted for user navigation
      • More and more metadata based on the RDF family of standards
    • Vocabularies are the missing pieces of the Semantic Web
      • Libraries are the natural providers!
  • 5. Web vs Libraries (1.0 view)
    • Methods: experimental, wild (the Web) vs polished, proven (library)
    • Type of content: unknown (the Web) vs controlled (library)
    • Size: unknown, organic growth (the Web) vs manageable (library)
    • Search and retrieval: search algorithms (the Web) vs vocabulary-based (library)
    • Classification: local, quality unknown (the Web) vs native, vocabulary-based (library)
    • Organization of content: local, quality unknown (the Web) vs native, vocabulary-based (library)
    • Content: unknown, organic growth (the Web) vs controlled (library)
    • Scope: global (the Web) vs local, focused (library)
  • 6. Attempts at organizing the Web #fail
    • Directories
      • Could not cope with the scale of Web growth
      • Were often built by amateurs in classification and vocabulary management
      • Were biased by the commercial use of the Web
    • Vocabularies
      • Open Directory categories
      • Wikipedia categories
      • Globally messy, organic growth
    • Metadata in HTML <head>
      • Spammed, not in sync with the content
      • Ignored by most search engines now
    • Bottom line: the Web is not and will never be a Global Library
  • 7. What the Web is good at
    • Creating representations of « things »
      • Wikipedia pages
      • Facebook pages
      • Pages for products, species, places …
    • Providing standard identifiers (URIs) associated with an access protocol (HTTP)
      • Identity of things is encapsulated in resource URIs
    • Linking things together
      • Via the HTTP protocol, hypertext, etc.
    • The Semantic Web is just an extension of the Web
      • Leveraging all the above features
      • Making the semantics of URIs and descriptions explicit
      • Allowing better, less ambiguous access to resources
  • 8. The Semantic Web in perspective
    • INTERNET (ca. 1970)
      • Network of identified, connected and addressable computers
        • Similar to library infrastructure level: buildings, rooms, shelves …
        • Technical support: IP addresses
    • WEB 1.0 (ca. 1990)
      • Network of identified, connected and addressable resources
        • Similar to library resources level: books, documents …
        • Technical support: URLs, HTTP
    • Semantic Web (ca. 2010)
      • Network of identified, connected and addressable concepts
        • Similar to library vocabulary level: thesauri, classifications, authority lists
        • Technical support: URIs, RDF, content negotiation
  • 9. Vocabularies as Core Data
    • Definition of « Core Data » (Hannemann & Kett, 2010)
      • Stable and reliable
      • Persistent nodes with a strict, transparent policy: data provenance, no deletions, versioning
      • Maintained or backed by trusted public organizations
      • Standards-based
    • Library Vocabularies are just that!
      • Or at least they should play this role
    • Vocabularies can be used on the Web as in libraries
      • Despite the difference in scope and size
    • Based on shared metadata standards
      • That’s where the Semantic Web comes in
  • 10. A roadmap for Semantic Web migration
    • Audit
      • Choose the vocabularies which are worth publishing
      • Sort out terms, concepts and things
      • Make the semantics of your specific syntactic constructs explicit
      • Check the actual transitivity of your hierarchies
      • Figure out the translation into vocabularies/ontologies popular on the Web
    • Make ready for publication
      • Package by domains
      • Define a strict URI policy, including versioning
      • Map to other vocabularies
    • Integration
      • Expose and promote your Vocabulary as a Service
        • Using dereferenceable URIs and SPARQL endpoints
      • Use Semantic Web software for vocabulary management
        • To ensure native standard conformance and logical consistency
  • 11. Semantic audit of vocabularies?
    • Publication for the Semantic Web can be a painful process
    • Not only technically (formats, etc.) but conceptually
    • Making semantics explicit often shows clearly where you’ve gone wrong
    • But it’s a healthy process anyway
    • Better to do most of the audit before any publication on the Web
    • But publishing early can trigger useful Web community feedback …
  • 12. Which vocabularies are worth it?
    • Named entities, authority files
      • A growing number of entities already defined in the Linked Data Cloud …
        • 3.4 million « things » identified and described at dbpedia.org
        • Over 7 million « features » identified and described at geonames.org
        • http://www.freebase.com/view/people/views/person 1,687,119 entries and counting
        • http://www.freebase.com/view/en/victor_hugo
        • http://dbpedia.org/resource/Victor_Hugo
      • Consider if duplicate efforts are worth it
        • Should you throw away your yet-another-Victor-Hugo entry?
        • No, but link to other descriptions in the Cloud (based on HTTP URIs) and keep existing identifiers for backward compatibility
    • Taxonomies, subject headings, classifications
      • That’s where library heritage is strong and the Web is weak
      • Such vocabularies can structure the web of data as they do libraries
      • Their publication should be a priority!
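A minimal sketch of the "keep your entry, but link it" advice, using only the Python standard library. `LOCAL_URI` is a hypothetical identifier invented for illustration; the two targets are the URIs quoted on the slide. The sketch uses the weak `rdfs:seeAlso` link as a cautious assumption, since the choice of a stronger identity link is a separate decision.

```python
# Sketch: keep the local identifier and point to existing descriptions in the Cloud.
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"

LOCAL_URI = "http://example.org/authorities/victor-hugo"  # hypothetical local entry
EXTERNAL_URIS = [
    "http://dbpedia.org/resource/Victor_Hugo",      # from the slide
    "http://www.freebase.com/view/en/victor_hugo",  # from the slide
]


def link_triples(local_uri, external_uris, predicate=RDFS_SEE_ALSO):
    """Return one N-Triples line per link from the local entry to an external one."""
    return [f"<{local_uri}> <{predicate}> <{target}> ." for target in external_uris]


if __name__ == "__main__":
    for line in link_triples(LOCAL_URI, EXTERNAL_URIS):
        print(line)
```

The local identifier stays the subject of every statement, so existing records that cite it keep working.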
  • 13. Sort out Terms, Concepts and Things
    • Terms are denotations for concepts
      • In a given language
      • If possible qualified by vocabulary specialists
    • Concepts are specific representations of « things »
      • In a certain view of the world
      • For a specific functional purpose in mind
    • Things are ... just things
      • What users care about at the end of the day (people, places, products …)
    • Terms, Concepts and Things should all be first-class citizens in the Semantic Web
      • Switching from a term-centric to a concept-centric view
        • That’s what SKOS and ISO 25964 … are all about
      • Does not mean that terms and terminology are out of the picture!
        • They simply need to be defined and managed at a different level
  • 14. Building on SKOS and extensions
    • Term (skos-xl:Label) denotes Concept (skos:Concept)
      • Linked by skos-xl:prefLabel
    • Concept (skos:Concept) represents Thing (owl:Thing)
      • Linked by foaf:focus *
    • Things: geo:Feature, foaf:Person, time:TemporalEntity
    • * Under discussion
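The three layers (Term, Concept, Thing) can be sketched as plain triples. This is an illustration only: the concept and label URIs are hypothetical, the label text is invented, and the thing URI is the DBpedia one quoted earlier in the deck. The property URIs are those of SKOS, SKOS-XL and FOAF.

```python
# Sketch: one term, one concept, one thing, each with its own URI.
SKOS = "http://www.w3.org/2004/02/skos/core#"
SKOSXL = "http://www.w3.org/2008/05/skos-xl#"
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def describe(concept, label, thing, literal, lang):
    """Return triples tying a term (label) to a concept, and the concept to a thing."""
    return [
        (concept, RDF_TYPE, SKOS + "Concept"),
        (label, RDF_TYPE, SKOSXL + "Label"),
        (label, SKOSXL + "literalForm", (literal, lang)),  # the term itself
        (concept, SKOSXL + "prefLabel", label),            # concept -> term
        (concept, FOAF + "focus", thing),                  # concept -> thing
    ]


def to_ntriples(triples):
    """Serialize the triples; a (text, lang) tuple becomes a language-tagged literal."""
    lines = []
    for s, p, o in triples:
        if isinstance(o, tuple):
            text, lang = o
            escaped = text.replace("\\", "\\\\").replace('"', '\\"')
            obj = f'"{escaped}"@{lang}'
        else:
            obj = f"<{o}>"
        lines.append(f"<{s}> <{p}> {obj} .")
    return lines


if __name__ == "__main__":
    triples = describe(
        "http://example.org/concept/hugo",        # hypothetical
        "http://example.org/label/hugo-fr",       # hypothetical
        "http://dbpedia.org/resource/Victor_Hugo",
        "Hugo, Victor",
        "fr",
    )
    print("\n".join(to_ntriples(triples)))
```

Because the label is a resource in its own right, terminology can be managed at its own level without being dropped from the picture.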
  • 15. Make the semantics of your syntax explicit
    • <http://id.loc.gov/authorities/sh00000562> a skos:Concept
      • skos:prefLabel ‘Environmental justice--Religious aspects--Buddhism, [Christianity, etc.]’
    • What does such an aggregation mean?
      • Does « -- » have the same semantics in all subject headings in LCSH?
      • If yes, which one?
      • Same questions for [ ]
    • How does this concept link to its components?
      • Currently it does not, although they are defined elsewhere in LCSH
      • http://id.loc.gov/authorities/sh97002483 : Environmental justice
      • http://id.loc.gov/authorities/sh00000564 : Environmental justice--Religious aspects
      • http://id.loc.gov/authorities/sh85017454 : Buddhism
    • Making the links between the above concepts explicit would definitely add value!
      • To do: figure out how (using flavors of skos:semanticRelation)
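One possible first step, sketched here under stated assumptions: split the pre-coordinated label on « -- », look each candidate up among known headings, and emit a generic `skos:semanticRelation` link. Treating « -- » as a plain separator and dropping the trailing `[ ]` pattern are assumptions made for illustration, which is exactly what the audit has to confirm. The three heading URIs are the ones listed on the slide.

```python
# Sketch: propose links from an aggregated heading to its already-defined components.
SEMANTIC_RELATION = "http://www.w3.org/2004/02/skos/core#semanticRelation"

KNOWN_HEADINGS = {
    "Environmental justice": "http://id.loc.gov/authorities/sh97002483",
    "Environmental justice--Religious aspects": "http://id.loc.gov/authorities/sh00000564",
    "Buddhism": "http://id.loc.gov/authorities/sh85017454",
}

AGGREGATE_URI = "http://id.loc.gov/authorities/sh00000562"
AGGREGATE_LABEL = "Environmental justice--Religious aspects--Buddhism, [Christianity, etc.]"


def candidate_components(label):
    """List shorter headings that a pre-coordinated label may be built from."""
    parts = [p.strip() for p in label.split("--")]
    # Assumption: a trailing ", [...]" is an open-ended pattern; keep the text before it.
    parts[-1] = parts[-1].split(", [")[0].strip()
    prefixes = ["--".join(parts[:i]) for i in range(1, len(parts))]
    candidates = []
    for candidate in prefixes + parts:
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


def component_links(concept_uri, label, known):
    """Return (subject, predicate, object) links to components found in `known`."""
    return [
        (concept_uri, SEMANTIC_RELATION, known[c])
        for c in candidate_components(label)
        if c in known and known[c] != concept_uri
    ]


if __name__ == "__main__":
    for triple in component_links(AGGREGATE_URI, AGGREGATE_LABEL, KNOWN_HEADINGS):
        print(triple)
```

A real migration would replace the generic property with a more specific sub-property once the meaning of each syntactic construct is settled.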
  • 16. Make sense of hierarchies
    • From LCSH hierarchy
      • Auxiliary sciences of history
      • .Civilization
      • ..Learning and Scholarship
      • ...Humanities
      • ....Philosophy
      • .....Psychology
      • ......Attention
      • .......Listening
      • ........Eavesdropping
      • .........Wiretapping
    • A kind of semantic drift all the way down
      • Every local relation makes sense, but globally it’s weird if transitivity applies
      • But most automatic systems will rely on transitivity as a default feature
    • Either fix it, or specify that the hierarchy is not transitive
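The drift becomes visible as soon as the chain is followed mechanically. The sketch below encodes the hierarchy from the slide as a child-to-parent mapping and computes what a transitive reading would infer. In SKOS terms, `skos:broader` is not declared transitive, while `skos:broaderTransitive` is, so the choice between the two is where this decision gets recorded.

```python
# Sketch: what a system infers if it treats the hierarchy as transitive.
BROADER = {
    "Wiretapping": "Eavesdropping",
    "Eavesdropping": "Listening",
    "Listening": "Attention",
    "Attention": "Psychology",
    "Psychology": "Philosophy",
    "Philosophy": "Humanities",
    "Humanities": "Learning and Scholarship",
    "Learning and Scholarship": "Civilization",
    "Civilization": "Auxiliary sciences of history",
}


def ancestors(term, broader):
    """Follow broader links upward and return every ancestor, nearest first."""
    chain = []
    seen = {term}
    while term in broader:
        term = broader[term]
        if term in seen:  # guard against accidental cycles in the data
            break
        seen.add(term)
        chain.append(term)
    return chain


if __name__ == "__main__":
    for depth, ancestor in enumerate(ancestors("Wiretapping", BROADER), start=1):
        print(depth, ancestor)
```

Reviewing the far end of each chain (here, Wiretapping under Auxiliary sciences of history) is a cheap way to spot hierarchies that should be fixed or declared non-transitive.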
  • 17. Consider how the vocabulary will be used
    • The Web is an open world
      • Whatever is not explicitly forbidden is allowed
      • Whereas in closed library practice, whatever is not explicitly allowed is forbidden
      • So be prepared for all sorts of misuse if you leave room for interpretation
    • Package by domains
      • Web users will be happy to integrate hundreds of concepts rather than millions
      • Small, focused vocabularies are more re-usable than general ones
      • The wider the scope, the more room for ambiguity!
    • Package by versions
      • Versioning at vocabulary level and/or at concept level (open issue)
        • Should a concept keep the same URI in successive versions?
        • When has a concept changed enough to be replaced by a different one?
      • Never delete a concept, deprecate it if necessary
        • Using e.g. dcterms:isReplacedBy
        • Concepts have a life cycle, but cool URIs don’t change
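A small sketch of the "deprecate, never delete" policy: deprecated concepts keep their URIs and carry a `dcterms:isReplacedBy` pointer, which a client can follow to the current concept. The `REPLACED_BY` data and the example.org URIs are hypothetical.

```python
# Sketch: resolve a possibly deprecated concept URI to its current replacement.
REPLACED_BY = {  # hypothetical dcterms:isReplacedBy statements
    "http://example.org/vocab/concept/12": "http://example.org/vocab/concept/57",
    "http://example.org/vocab/concept/57": "http://example.org/vocab/concept/90",
}


def current_concept(uri, replaced_by):
    """Follow replacement links until reaching a concept that has not been replaced."""
    seen = {uri}
    while uri in replaced_by:
        uri = replaced_by[uri]
        if uri in seen:
            raise ValueError(f"replacement cycle at {uri}")
        seen.add(uri)
    return uri


if __name__ == "__main__":
    print(current_concept("http://example.org/vocab/concept/12", REPLACED_BY))
```

Old URIs stay resolvable, so metadata that still cites concept 12 is not broken by the later versions.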
  • 18. Mapping to other vocabularies
    • A most important but still open area
      • Dealing with hard notions like identity, similarity, sameness …
    • Tools to help with alignment are emerging
      • See e.g. ONAGUI http://sourceforge.net/projects/onagui/
      • Work in progress (TAE project)
        • More background in ontology mapping than vocabulary mapping
    • Mapping at the concept level seems to make sense
      • SKOS provides basic vocabulary for simple mapping
      • But no provision for mapping a simple concept to an aggregate
      • In particular no boolean operators (Actor AND Musician vs Actor OR Musician)
    • Alignment of « things » is a contentious area
      • See various debates on use and abuse of owl:sameAs
        • http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf
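As an illustration of how small the basic SKOS mapping vocabulary is, the sketch below accepts only the five SKOS mapping properties and emits a one-to-one mapping statement. The example.org URIs are hypothetical. An aggregate such as "Actor AND Musician" has no place in this scheme, which is the limitation noted above.

```python
# Sketch: one-to-one concept mapping restricted to the SKOS mapping properties.
SKOS = "http://www.w3.org/2004/02/skos/core#"
MAPPING_PROPERTIES = {
    "exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch",
}


def mapping_triple(source, relation, target):
    """Return an N-Triples line mapping one concept to one other concept."""
    if relation not in MAPPING_PROPERTIES:
        raise ValueError(f"not a SKOS mapping property: {relation}")
    return f"<{source}> <{SKOS}{relation}> <{target}> ."


if __name__ == "__main__":
    print(mapping_triple(
        "http://example.org/vocabA/actor",      # hypothetical
        "closeMatch",
        "http://example.org/vocabB/performer",  # hypothetical
    ))
```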
  • 19. Ready to publish?
    • Follow recommended (best) practices
      • See e.g. http://www.w3.org/TR/swbp-vocab-pub/
    • Provide usable packaging (see above, particularly if the vocabulary is large)
      • A 500 MB dump in one single SKOS file is not the most manageable form!
    • Choose and expose a clear licensing policy
      • Using e.g. Creative Commons license model
    • Make all concept URIs dereferenceable
      • Using content negotiation: one description for machines, one for humans
    • Publish a vocabulary description
      • For humans and for machines (using e.g., SIOC)
    • Publish your own use of the vocabulary
      • In metadata of your records
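Content negotiation can be sketched without any web framework: read the client's `Accept` header and decide which document describes the concept. The suffix scheme (`.html`, `.rdf`, `.ttl`) and the example URI are assumptions for illustration; a real server would typically answer with a redirect to the chosen document.

```python
# Sketch: choose a human or machine description from the Accept header.
REPRESENTATIONS = {  # hypothetical suffix scheme
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "text/turtle": ".ttl",
}


def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    entries = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        entries.append((media_type, q))
    # sorted() is stable, so equal q-values keep the client's order.
    return sorted(entries, key=lambda e: e[1], reverse=True)


def choose_document(concept_uri, accept_header, default="text/html"):
    """Return (document_uri, media_type) for the best supported representation."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in REPRESENTATIONS:
            return concept_uri + REPRESENTATIONS[media_type], media_type
    # Wildcards and unsupported types fall back to the human-readable page.
    return concept_uri + REPRESENTATIONS[default], default


if __name__ == "__main__":
    uri = "http://example.org/vocab/concept/90"  # hypothetical
    print(choose_document(uri, "text/html,application/xhtml+xml;q=0.9,*/*;q=0.8"))
    print(choose_document(uri, "application/rdf+xml"))
```

The concept URI itself never changes; only the document returned for it depends on who is asking.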
  • 20. Spread the word, join the community!
    • The Web is an open collaborative enterprise …
    • Push what you’ve done outside the library world
      • Join linked data and Semantic Web forums and work groups
      • Be ready to respond to feedback
    • The Semantic Web will be a Social Web
      • Watch how your vocabulary is adopted, or not, by social applications
      • Facebook, Twitter and the like …
    • And of course look out for further developments at LLD XG
      • http://www.w3.org/2005/Incubator/lld/
  • 21. Put Semantic Web software in the back-office
    • Publication of linked data can be done from any information system
      • It can be dealt with as yet another publication issue …
    • But it’s simpler if semantic formats are handled natively in the back-office
      • Supporting import/export of vocabularies in the system in standard formats
        • Native RDF and SKOS integration
      • Supporting semantic queries in the back-office
      • Supporting sanity checking, inference rules
    • … And making librarians natively at ease with semantic technologies!
      • Not the least part of it
    • The technology is mature, software is on the market …
      • Time to think about it!
      • So, just ask 
  • 22.
    • Thanks for your attention
    • Questions?
