Porting Library Vocabularies to the Semantic Web - IFLA 2010

Presentation at IFLA 2010

  1. 2010-08-15, Session 149: Information Technology, Cataloguing, Classification and Indexing with Knowledge Management. Porting library vocabularies to the Semantic Web, and back: a win-win round trip. [email_address] making sense of content TM
  2. Summary <ul><li>Libraries and the Web have had a twenty-year affair of love and hate which should now come of age. </li></ul><ul><li>The role of vocabularies is critical in this affair. </li></ul><ul><li>The Linked Data architecture should leverage proven heritage vocabularies instead of reinventing them. </li></ul><ul><li>Specific features of library vocabularies make them more or less portable and useful to the Semantic Web. </li></ul><ul><li>To-do list and guidelines for vocabulary audit and publication. </li></ul><ul><li>Feedback from Semantic Web tools: helping vocabulary management. </li></ul>
  3. Libraries and the Web, a love and hate story <ul><li>Libraries have been around for thousands of years </li></ul><ul><li>The Web is barely in its twenties </li></ul><ul><li>Webbies have been claiming the Web was bound to become the Universal Library </li></ul><ul><ul><li>Bottom line: traditional libraries are obsolete </li></ul></ul><ul><li>Librarians have been claiming the Web is a mess and will never improve </li></ul><ul><ul><li>Bottom line: keep using libraries for serious stuff </li></ul></ul><ul><li>But they look at each other with fascination </li></ul><ul><ul><li>The Web: if only we could be as efficient as libraries at classification and indexing </li></ul></ul><ul><ul><li>Libraries: if only we could scale to the size of the Web, and be as user-friendly </li></ul></ul><ul><li>They are bound to get married at the end of the day! </li></ul>
  4. What is there? Everything! <ul><li>In libraries </li></ul><ul><ul><li>Resources (aka records) and catalogues </li></ul></ul><ul><ul><li>Authority lists (aka descriptions of « real-world entities ») </li></ul></ul><ul><ul><li>Subject headings, thesauri, classification schemes (aka vocabularies) </li></ul></ul><ul><ul><li>Metadata linking resources to entities and subjects </li></ul></ul><ul><li>On the Web </li></ul><ul><ul><li>Resources (aka web pages) without a catalogue </li></ul></ul><ul><ul><li>Descriptions of real-world entities (without authority lists) </li></ul></ul><ul><ul><ul><li>Examples: Wikipedia, Facebook </li></ul></ul></ul><ul><ul><li>A profusion of vocabularies, but without general schemes </li></ul></ul><ul><ul><ul><li>Often called « taxonomies », handcrafted for user navigation </li></ul></ul></ul><ul><ul><li>More and more metadata based on the RDF family of standards </li></ul></ul><ul><li>Vocabularies are the missing pieces of the Semantic Web </li></ul><ul><ul><li>Libraries are the natural providers! </li></ul></ul>
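  5. Web vs Libraries (1.0 view) <table><tr><th></th><th>The Web</th><th>Library</th></tr><tr><td>methods</td><td>experimental, wild</td><td>polished, proven</td></tr><tr><td>type of content</td><td>unknown</td><td>controlled</td></tr><tr><td>size</td><td>unknown, organic growth</td><td>manageable</td></tr><tr><td>search and retrieval</td><td>search algorithms</td><td>vocabulary-based</td></tr><tr><td>classification</td><td>local, quality unknown</td><td>native, vocabulary-based</td></tr><tr><td>organization of content</td><td>local, quality unknown</td><td>native, vocabulary-based</td></tr><tr><td>content</td><td>unknown, organic growth</td><td>controlled</td></tr><tr><td>scope</td><td>global</td><td>local, focused</td></tr></table>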
  6. Attempts at organizing the Web #fail <ul><li>Directories </li></ul><ul><ul><li>Could not cope with the scale of Web growth </li></ul></ul><ul><ul><li>Were often built by amateurs in classification and vocabulary management </li></ul></ul><ul><ul><li>Were biased by the commercial use of the Web </li></ul></ul><ul><li>Vocabularies </li></ul><ul><ul><li>Open Directory categories </li></ul></ul><ul><ul><li>Wikipedia categories </li></ul></ul><ul><ul><li>Globally messy, organic growth </li></ul></ul><ul><li>Metadata in the HTML &lt;head&gt; </li></ul><ul><ul><li>Spammed, not in sync with the content </li></ul></ul><ul><ul><li>Now ignored by most search engines </li></ul></ul><ul><li>Bottom line: the Web is not and will never be a Global Library </li></ul>
  7. What the Web is good at <ul><li>Creating representations of « things » </li></ul><ul><ul><li>Wikipedia pages </li></ul></ul><ul><ul><li>Facebook pages </li></ul></ul><ul><ul><li>Pages for products, species, places … </li></ul></ul><ul><li>Providing standard identifiers (URIs) associated with an access protocol (HTTP) </li></ul><ul><ul><li>The identity of things is encapsulated in resource URIs </li></ul></ul><ul><li>Linking things together </li></ul><ul><ul><li>Via the HTTP protocol, hypertext, etc. </li></ul></ul><ul><li>The Semantic Web is just an extension of the Web </li></ul><ul><ul><li>Leveraging all the above features </li></ul></ul><ul><ul><li>Making explicit the semantics of URIs and descriptions </li></ul></ul><ul><ul><li>Allowing better, less ambiguous access to resources </li></ul></ul>
  8. The Semantic Web in perspective <ul><li>INTERNET (ca. 1970) </li></ul><ul><ul><li>Network of identified, connected and addressable computers </li></ul></ul><ul><ul><ul><li>Similar to the library infrastructure level: buildings, rooms, shelves … </li></ul></ul></ul><ul><ul><ul><li>Technical support: IP addresses </li></ul></ul></ul><ul><li>WEB 1.0 (ca. 1990) </li></ul><ul><ul><li>Network of identified, connected and addressable resources </li></ul></ul><ul><ul><ul><li>Similar to the library resources level: books, documents … </li></ul></ul></ul><ul><ul><ul><li>Technical support: URLs, HTTP </li></ul></ul></ul><ul><li>Semantic Web (ca. 2010) </li></ul><ul><ul><li>Network of identified, connected and addressable concepts </li></ul></ul><ul><ul><ul><li>Similar to the library vocabulary level: thesauri, classifications, authority lists </li></ul></ul></ul><ul><ul><ul><li>Technical support: URIs, RDF, content negotiation </li></ul></ul></ul>
  9. Vocabularies as Core Data <ul><li>Definition of « Core Data » (Hannemann & Kett, 2010) </li></ul><ul><ul><li>Stable and reliable </li></ul></ul><ul><ul><li>Persistent nodes with a strict, transparent policy: data provenance, no deletions, versioning </li></ul></ul><ul><ul><li>Maintained or backed by trusted public organizations </li></ul></ul><ul><ul><li>Standards-based </li></ul></ul><ul><li>Library vocabularies are just that! </li></ul><ul><ul><li>Or at least they should play this role </li></ul></ul><ul><li>Vocabularies can be used on the Web as in libraries </li></ul><ul><ul><li>Despite the difference in scope and size </li></ul></ul><ul><li>Based on shared metadata standards </li></ul><ul><ul><li>That’s where the Semantic Web comes in </li></ul></ul>
  10. A roadmap for Semantic Web migration <ul><li>Audit </li></ul><ul><ul><li>Choose the vocabularies which are worth publishing </li></ul></ul><ul><ul><li>Sort out terms, concepts and things </li></ul></ul><ul><ul><li>Make explicit the semantics of your specific syntactic constructs </li></ul></ul><ul><ul><li>Check the actual transitivity of your hierarchies </li></ul></ul><ul><ul><li>Figure out the translation into vocabularies/ontologies popular on the Web </li></ul></ul><ul><li>Get ready for publication </li></ul><ul><ul><li>Package by domains </li></ul></ul><ul><ul><li>Define a strict URI policy, including versioning </li></ul></ul><ul><ul><li>Map to other vocabularies </li></ul></ul><ul><li>Integration </li></ul><ul><ul><li>Expose and promote your Vocabulary as a Service </li></ul></ul><ul><ul><ul><li>Using dereferenceable URIs and SPARQL endpoints </li></ul></ul></ul><ul><ul><li>Use Semantic Web software for vocabulary management </li></ul></ul><ul><ul><ul><li>To ensure native standard conformance and logical consistency </li></ul></ul></ul>
  11. Semantic audit of vocabularies? <ul><li>Publication for the Semantic Web can be a painful process </li></ul><ul><li>Not only technically (formats etc.) but conceptually </li></ul><ul><li>Making semantics explicit often shows clearly where you’ve gone wrong </li></ul><ul><li>But it’s a healthy process anyway </li></ul><ul><li>Better to do most of the audit before any publication on the Web </li></ul><ul><li>But publishing early can trigger useful feedback from the Web community … </li></ul>
  12. Which vocabularies are worth it? <ul><li>Named entities, authority files </li></ul><ul><ul><li>A growing number of entities are already defined in the Linked Data Cloud … </li></ul></ul><ul><ul><ul><li>3.4 million « things » identified and described at dbpedia.org </li></ul></ul></ul><ul><ul><ul><li>Over 7 million « features » identified and described at geonames.org </li></ul></ul></ul><ul><ul><ul><li>http://www.freebase.com/view/people/views/person : 1,687,119 entries and counting </li></ul></ul></ul><ul><ul><ul><li>http://www.freebase.com/view/en/victor_hugo </li></ul></ul></ul><ul><ul><ul><li>http://dbpedia.org/resource/Victor_Hugo </li></ul></ul></ul><ul><ul><li>Consider whether duplicate efforts are worth it </li></ul></ul><ul><ul><ul><li>Should you throw away your yet-another-Victor-Hugo entry? </li></ul></ul></ul><ul><ul><ul><li>No, but link it to other descriptions in the Cloud (based on HTTP URIs) and keep existing identifiers for backward compatibility </li></ul></ul></ul><ul><li>Taxonomies, subject headings, classifications </li></ul><ul><ul><li>That’s where library heritage is strong and the Web is weak </li></ul></ul><ul><ul><li>Such vocabularies can structure the Web of data as they structure libraries </li></ul></ul><ul><ul><li>Their publication should be a priority! </li></ul></ul>
  13. Sort out Terms, Concepts and Things <ul><li>Terms are denotations for concepts </li></ul><ul><ul><li>In a given language </li></ul></ul><ul><ul><li>If possible qualified by vocabulary specialists </li></ul></ul><ul><li>Concepts are specific representations of « things » </li></ul><ul><ul><li>In a certain view of the world </li></ul></ul><ul><ul><li>With a specific functional purpose in mind </li></ul></ul><ul><li>Things are ... just things </li></ul><ul><ul><li>What users care about at the end of the day (people, places, products …) </li></ul></ul><ul><li>Terms, Concepts and Things should all be first-class citizens in the Semantic Web </li></ul><ul><ul><li>Switching from a term-centric to a concept-centric view </li></ul></ul><ul><ul><ul><li>That’s what SKOS and ISO 25964 … are all about </li></ul></ul></ul><ul><ul><li>This does not mean that terms and terminology are out of the picture! </li></ul></ul><ul><ul><ul><li>They simply need to be defined and managed at a different level </li></ul></ul></ul>
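  14. Building on SKOS and extensions (diagram) <ul><li>Term denotes Concept; Concept represents Thing </li></ul><ul><ul><li>Term: skos-xl:Label, attached to the concept via skos-xl:prefLabel </li></ul></ul><ul><ul><li>Concept: skos:Concept, linked to the thing via foaf:focus* </li></ul></ul><ul><ul><li>Thing: owl:Thing, e.g. geo:Feature, foaf:Person, time:TemporalEntity </li></ul></ul><ul><li>* Under discussion </li></ul>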
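The three-level model above can be sketched with plain Python tuples standing in for RDF triples. This is a minimal illustration, not a library API: the `ex:` URIs and the label text are hypothetical, prefixed names stand in for full URIs, and the code uses the standard `skosxl:` prefix for what the slide writes as `skos-xl:`. The helper derives plain `skos:prefLabel` values from SKOS-XL label resources, which SKOS-XL defines as following from the `skosxl:prefLabel` / `skosxl:literalForm` chain.

```python
# Minimal sketch: RDF triples as (subject, predicate, object) tuples.
# All "ex:" identifiers and the label text are hypothetical examples.
Triple = tuple[str, str, str]

GRAPH: set[Triple] = {
    # Concept level
    ("ex:concept/hugo", "rdf:type", "skos:Concept"),
    # Term level: with SKOS-XL the label is a resource that can carry its own metadata
    ("ex:concept/hugo", "skosxl:prefLabel", "ex:label/hugo-en"),
    ("ex:label/hugo-en", "rdf:type", "skosxl:Label"),
    ("ex:label/hugo-en", "skosxl:literalForm", '"Hugo, Victor, 1802-1885"@en'),
    # Thing level: the concept points to the real-world person it is about
    ("ex:concept/hugo", "foaf:focus", "http://dbpedia.org/resource/Victor_Hugo"),
    ("http://dbpedia.org/resource/Victor_Hugo", "rdf:type", "foaf:Person"),
}


def dumb_down_labels(triples: set[Triple]) -> set[Triple]:
    """Derive plain skos:prefLabel triples from SKOS-XL label resources.

    Follows the chain: concept --skosxl:prefLabel--> label --skosxl:literalForm--> literal.
    """
    literal_form = {s: o for s, p, o in triples if p == "skosxl:literalForm"}
    return {
        (s, "skos:prefLabel", literal_form[o])
        for s, p, o in triples
        if p == "skosxl:prefLabel" and o in literal_form
    }
```

Keeping the term, the concept and the person as separate resources means each can be described, linked or versioned on its own, while simple SKOS consumers still get an ordinary `skos:prefLabel`.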
  15. Make explicit the semantics of your syntax <ul><li>&lt;http://id.loc.gov/authorities/sh00000562&gt; a skos:Concept </li></ul><ul><ul><li>skos:prefLabel ‘Environmental justice--Religious aspects--Buddhism, [Christianity, etc.]’ </li></ul></ul><ul><li>What does such an aggregation mean? </li></ul><ul><ul><li>Does « -- » have the same semantics in all LCSH subject headings? </li></ul></ul><ul><ul><li>If yes, which one? </li></ul></ul><ul><ul><li>Same questions for [ ] </li></ul></ul><ul><li>How does this concept link to its components? </li></ul><ul><ul><li>Currently it does not, although they are defined elsewhere in LCSH </li></ul></ul><ul><ul><li>http://id.loc.gov/authorities/sh97002483 : Environmental justice </li></ul></ul><ul><ul><li>http://id.loc.gov/authorities/sh00000564 : Environmental justice--Religious aspects </li></ul></ul><ul><ul><li>http://id.loc.gov/authorities/sh85017454 : Buddhism </li></ul></ul><ul><li>Making the links between the above concepts explicit would definitely add value! </li></ul><ul><ul><li>To do: figure out how (using flavors of skos:semanticRelation) </li></ul></ul>
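One way to start on the to-do above is to propose candidate links mechanically from the label syntax. The sketch below assumes that `--` separates subdivisions and that `[...]` marks a pattern placeholder to drop before lookup. `ex:hasComponent` is a hypothetical property standing in for whichever flavor of `skos:semanticRelation` is eventually chosen, and the label-to-URI index holds only the headings cited on the slide.

```python
import re

# Label -> URI index, limited to the LCSH headings cited on the slide.
LCSH_INDEX = {
    "Environmental justice": "http://id.loc.gov/authorities/sh97002483",
    "Environmental justice--Religious aspects": "http://id.loc.gov/authorities/sh00000564",
    "Buddhism": "http://id.loc.gov/authorities/sh85017454",
}

# Hypothetical property: per the slide, some flavor of skos:semanticRelation.
HAS_COMPONENT = "ex:hasComponent"


def component_links(heading_uri: str, label: str,
                    index: dict[str, str]) -> list[tuple[str, str, str]]:
    """Propose links from a pre-coordinated heading to the headings it is built from.

    Assumes "--" separates subdivisions and "[...]" marks a pattern
    placeholder, which is dropped before lookup.
    """
    parts = [p.strip() for p in label.split("--")]
    # Leading runs of subdivisions ("A", "A--B", ...), excluding the full heading.
    candidates = ["--".join(parts[:i]) for i in range(1, len(parts))]
    # Each subdivision on its own, without bracketed placeholders or trailing commas.
    candidates += [re.sub(r"\[.*?\]", "", p).strip(" ,") for p in parts]
    links: list[tuple[str, str, str]] = []
    for candidate in candidates:
        uri = index.get(candidate)
        link = (heading_uri, HAS_COMPONENT, uri)
        if uri and uri != heading_uri and link not in links:
            links.append(link)
    return links


links = component_links(
    "http://id.loc.gov/authorities/sh00000562",
    "Environmental justice--Religious aspects--Buddhism, [Christianity, etc.]",
    LCSH_INDEX,
)
for triple in links:
    print(triple)
```

For this heading the sketch proposes links to the three component headings listed on the slide; "Religious aspects" on its own finds no match simply because it is not in the index. A reviewer would still have to decide what each proposed link means.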
  16. Make sense of hierarchies <ul><li>From the LCSH hierarchy </li></ul><ul><ul><li>Auxiliary sciences of history </li></ul></ul><ul><ul><li>.Civilization </li></ul></ul><ul><ul><li>..Learning and Scholarship </li></ul></ul><ul><ul><li>...Humanities </li></ul></ul><ul><ul><li>....Philosophy </li></ul></ul><ul><ul><li>.....Psychology </li></ul></ul><ul><ul><li>......Attention </li></ul></ul><ul><ul><li>.......Listening </li></ul></ul><ul><ul><li>........Eavesdropping </li></ul></ul><ul><ul><li>.........Wiretapping </li></ul></ul><ul><li>Kind of semantic drift all the way down </li></ul><ul><ul><li>Every local relation makes sense, but globally it’s weird if transitivity applies </li></ul></ul><ul><ul><li>But most automated systems rely on transitivity as a default feature </li></ul></ul><ul><li>Either fix it, or specify that the hierarchy is not transitive </li></ul>
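The drift becomes concrete once the transitive closure is computed. This sketch encodes only the chain shown on the slide as direct broader links (a simplification of the real LCSH graph) and walks it breadth-first; `ancestors` and `max_depth` are illustrative names, not part of any SKOS tool.

```python
from collections import deque
from typing import Optional

# The broader-term chain from the slide, from the top of the hierarchy down.
CHAIN = [
    "Auxiliary sciences of history", "Civilization", "Learning and Scholarship",
    "Humanities", "Philosophy", "Psychology", "Attention", "Listening",
    "Eavesdropping", "Wiretapping",
]
# narrower term -> set of its direct broader terms
BROADER = {narrower: {broader} for broader, narrower in zip(CHAIN, CHAIN[1:])}


def ancestors(term: str, broader: dict[str, set[str]],
              max_depth: Optional[int] = None) -> dict[str, int]:
    """Return {broader term: distance} for every term reachable upwards.

    max_depth=None computes the full transitive closure, i.e. what a system
    that treats the hierarchy as transitive would infer; a small max_depth
    only follows the first few levels.
    """
    found: dict[str, int] = {}
    queue = deque([(term, 0)])
    while queue:
        current, depth = queue.popleft()
        if max_depth is not None and depth >= max_depth:
            continue
        for parent in broader.get(current, ()):
            if parent not in found:
                found[parent] = depth + 1
                queue.append((parent, depth + 1))
    return found
```

With the full closure, `ancestors("Wiretapping", BROADER)` places Wiretapping nine levels below, and therefore under, "Auxiliary sciences of history"; with `max_depth=1` only "Eavesdropping" remains. The gap between the two results is exactly what a publisher has to either fix or document.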
  17. Consider how the vocabulary will be used <ul><li>The Web is an open world </li></ul><ul><ul><li>Whatever is not explicitly forbidden is allowed </li></ul></ul><ul><ul><li>Whereas in closed library practice, whatever is not explicitly allowed is forbidden </li></ul></ul><ul><ul><li>So be prepared for all sorts of misuse if you leave room for interpretation </li></ul></ul><ul><li>Package by domains </li></ul><ul><ul><li>Web users will more readily integrate hundreds of concepts than millions </li></ul></ul><ul><ul><li>Small, focused vocabularies are more reusable than general ones </li></ul></ul><ul><ul><li>The wider the scope, the more room for ambiguity! </li></ul></ul><ul><li>Package by versions </li></ul><ul><ul><li>Versioning at vocabulary level and/or at concept level (open issue) </li></ul></ul><ul><ul><ul><li>Should a concept keep the same URI in successive versions? </li></ul></ul></ul><ul><ul><ul><li>When has a concept changed enough to be replaced by a different one? </li></ul></ul></ul><ul><ul><li>Never delete a concept; deprecate it if necessary </li></ul></ul><ul><ul><ul><li>Using e.g. dcterms:isReplacedBy </li></ul></ul></ul><ul><ul><ul><li>Concepts have a life cycle, but cool URIs don’t change </li></ul></ul></ul>
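The "never delete, deprecate" rule means a client holding an old concept URI should still be able to find the current concept. A minimal sketch, assuming the `dcterms:isReplacedBy` links have been loaded into a plain dict and that each deprecated concept has a single replacement; the `ex:` URIs and the `resolve` helper are hypothetical.

```python
# Hypothetical deprecation chain: each deprecated concept URI points to its
# replacement via dcterms:isReplacedBy (modelled here as a plain dict).
REPLACED_BY = {
    "ex:v1/c42": "ex:v2/c42",
    "ex:v2/c42": "ex:v3/c17",
}


def resolve(uri: str, replaced_by: dict[str, str]) -> str:
    """Follow dcterms:isReplacedBy links to the current concept.

    Deprecated URIs are never deleted, so old references keep working;
    they just lead to their successor. Assumes one replacement per concept
    and raises ValueError if the links form a cycle.
    """
    path = [uri]
    while uri in replaced_by:
        uri = replaced_by[uri]
        if uri in path:
            raise ValueError("replacement cycle: " + " -> ".join(path + [uri]))
        path.append(uri)
    return uri
```

A concept split into several successors would need a list of replacements instead of a single value; the single-successor dict keeps the sketch short.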
  18. Mapping to other vocabularies <ul><li>A most important but still open area </li></ul><ul><ul><li>Dealing with hard notions like identity, similarity, sameness … </li></ul></ul><ul><li>Tools to help with alignment are emerging </li></ul><ul><ul><li>See e.g. ONAGUI http://sourceforge.net/projects/onagui/ </li></ul></ul><ul><ul><li>Work in progress (TAE project) </li></ul></ul><ul><ul><ul><li>More background in ontology mapping than in vocabulary mapping </li></ul></ul></ul><ul><li>Mapping at the concept level seems to make sense </li></ul><ul><ul><li>SKOS provides a basic vocabulary for simple mapping </li></ul></ul><ul><ul><li>But no provision for mapping a simple concept to an aggregate </li></ul></ul><ul><ul><li>In particular, no Boolean operators (Actor AND Musician vs Actor OR Musician) </li></ul></ul><ul><li>Alignment of « things » is a contentious area </li></ul><ul><ul><li>See various debates on the use and abuse of owl:sameAs </li></ul></ul><ul><ul><ul><li>http://events.linkeddata.org/ldow2010/papers/ldow2010_paper09.pdf </li></ul></ul></ul>
  19. Ready to publish? <ul><li>Follow recommended (best) practices </li></ul><ul><ul><li>See e.g. http://www.w3.org/TR/swbp-vocab-pub/ </li></ul></ul><ul><li>Provide usable packaging (see above, particularly if the vocabulary is large) </li></ul><ul><ul><li>A 500 MB dump in one single SKOS file is not the most manageable form! </li></ul></ul><ul><li>Choose and expose a clear licensing policy </li></ul><ul><ul><li>Using e.g. the Creative Commons license model </li></ul></ul><ul><li>Make all concept URIs dereferenceable </li></ul><ul><ul><li>Using content negotiation: one description for machines, one for humans </li></ul></ul><ul><li>Publish a vocabulary description </li></ul><ul><ul><li>For humans and for machines (using e.g. SIOC) </li></ul></ul><ul><li>Publish your own use of the vocabulary </li></ul><ul><ul><li>In the metadata of your records </li></ul></ul>
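The content-negotiation step can be sketched as a small function that picks, for one concept URI, between an HTML page for humans and RDF serializations for machines. It is a simplified reading of HTTP `Accept` handling written for illustration only (media-type parameters other than `q` are ignored); the list of offered formats is an assumption, and a real deployment would normally rely on the web server's or framework's own negotiation.

```python
from typing import Optional

# Representations a concept URI might offer; order = server preference.
# The set of formats is an assumption for illustration.
OFFERED = ["text/html", "text/turtle", "application/rdf+xml"]


def negotiate(accept: str, offered: list[str]) -> Optional[str]:
    """Choose a representation for an HTTP Accept header.

    Simplified server-driven negotiation: honours q-values and the most
    specific matching media range (exact > type/* > */*), ignores other
    parameters. Returns None if nothing offered is acceptable.
    """
    if not accept.strip():
        return offered[0]  # no preference expressed: use the server default
    prefs = []
    for item in accept.split(","):
        fields = [f.strip() for f in item.split(";")]
        media, q = fields[0].lower(), 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media, q))

    def score(mime: str) -> float:
        main_type = mime.split("/")[0]
        specificity = {mime: 2, main_type + "/*": 1, "*/*": 0}
        best_spec, best_q = -1, 0.0
        for media, q in prefs:
            spec = specificity.get(media)
            if spec is not None and spec > best_spec:
                best_spec, best_q = spec, q
        return best_q

    best = max(offered, key=score)  # on ties, max keeps the first (server order)
    return best if score(best) > 0 else None
```

A typical browser header selects `text/html`, while an RDF client sending `Accept: text/turtle` gets Turtle from the same URI. A server using this choice would also send a `Vary: Accept` response header so that caches keep the human and machine descriptions apart.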
  20. Spread the word, join the community! <ul><li>The Web is an open collaborative enterprise … </li></ul><ul><li>Push what you’ve done outside the library world </li></ul><ul><ul><li>Join linked data and Semantic Web forums and working groups </li></ul></ul><ul><ul><li>Be ready to respond to feedback </li></ul></ul><ul><li>The Semantic Web will be a Social Web </li></ul><ul><ul><li>Watch whether and how your vocabulary is adopted by social applications </li></ul></ul><ul><ul><li>Facebook, Twitter and the like … </li></ul></ul><ul><li>And of course watch for further developments at the W3C Library Linked Data Incubator Group (LLD XG) </li></ul><ul><ul><li>http://www.w3.org/2005/Incubator/lld/ </li></ul></ul>
  21. Put Semantic Web software in the back-office <ul><li>Publication of linked data can be done from any information system </li></ul><ul><ul><li>It can be handled as yet another publication issue … </li></ul></ul><ul><li>But it’s simpler if semantic formats are handled natively in the back-office </li></ul><ul><ul><li>Supporting import/export of vocabularies in standard formats </li></ul></ul><ul><ul><ul><li>Native RDF and SKOS integration </li></ul></ul></ul><ul><ul><li>Supporting semantic queries in the back-office </li></ul></ul><ul><ul><li>Supporting sanity checking and inference rules </li></ul></ul><ul><li>… And making librarians natively at ease with semantic technologies! </li></ul><ul><ul><li>Not the least part of it </li></ul></ul><ul><li>The technology is mature, software is on the market … </li></ul><ul><ul><li>Time to think about it! </li></ul></ul><ul><ul><li>So, just ask </li></ul></ul>
  22. <ul><li>Thanks for your attention </li></ul><ul><li>Questions? </li></ul>
