
Linked data and voyager


Published in: Education

  1. 1. Ed Chamberlain Systems Development Librarian Cambridge University Library
  2. 2. Disclaimers … <ul><li>Apologies if you see the semantic web as up there with quantum mechanics … </li></ul><ul><ul><li>Will contain some techy stuff </li></ul></ul><ul><ul><li>Not that much on Voyager … </li></ul></ul>
  3. 3. Overview <ul><li>Linked data in theory </li></ul><ul><li>What we learnt </li></ul><ul><ul><li>IPR </li></ul></ul><ul><ul><li>Data </li></ul></ul><ul><ul><li>Supporting technology </li></ul></ul><ul><li>How could it be used by Ex Libris? </li></ul>
  4. 4. What is the semantic web? <ul><li>“The Semantic Web is a "man-made woven web of data" that facilitates machines to understand the semantics, or meaning, of information on the World Wide Web.” </li></ul><ul><li>“The concept of Semantic Web applies methods beyond linear presentation of information (Web 1.0) and multi-linear presentation of information (Web 2.0) to make use of hyper-structures leading to entities of hypertext.” </li></ul><ul><li>http://en.wikipedia.org/wiki/Semantic_Web </li></ul>
  5. 5. Eh? <ul><li>Semantic = its meaning is explained - self-describing data! </li></ul><ul><li>Hyperlinked = meaning contextualised elsewhere </li></ul><ul><li>Focus on machines rather than people </li></ul>
  6. 6. What is Linked Data … <ul><li>After several iterations of semantic web development … </li></ul><ul><li>Tim Berners-Lee has advocated four underlying design principles for linked data: </li></ul><ul><ul><li>Use URIs as names for things </li></ul></ul><ul><ul><li>Use HTTP URIs so that people can look up those names </li></ul></ul><ul><ul><li>When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) </li></ul></ul><ul><ul><li>Include links to other URIs, so that they can discover more things </li></ul></ul><ul><ul><ul><ul><ul><li> http://www.w3.org/DesignIssues/LinkedData.html </li></ul></ul></ul></ul></ul>
  7. 7. And RDF? <ul><li>The Resource Description Framework (RDF) is a family of World Wide Web Consortium (W3C) specifications originally designed as a metadata data model. It has come to be used as a general method for conceptual description or modeling of information that is implemented in web resources, using a variety of syntax formats. </li></ul><ul><li>http://en.wikipedia.org/wiki/Resource_Description_Framework </li></ul>
  8. 8. What does this mean in practice … <ul><li>RDF Data is expressed as triples: </li></ul><ul><li>DC XML … </li></ul><ul><li><dc:identifier>1000346</dc:identifier> </li></ul><ul><li><dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title> </li></ul><ul><li>Marc21 … </li></ul><ul><li>001 1000346 </li></ul><ul><li>245$aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 / </li></ul><ul><li>RDF triples … </li></ul><ul><li><http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> </li></ul><ul><li><http://purl.org/dc/terms/title> </li></ul><ul><li>"Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" . </li></ul>
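To make the shape of a triple concrete, here is a minimal Python sketch that builds the N-Triples line shown on this slide from a record number and a title. The function names `escape_literal` and `literal_triple` are illustrative assumptions and not part of the COMET tooling, which the slides describe as Perl scripts.

```python
def escape_literal(text: str) -> str:
    """Escape backslashes, quotes and line breaks for an N-Triples literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def literal_triple(subject: str, predicate: str, value: str) -> str:
    """Return one N-Triples line of the form <subject> <predicate> "value" ."""
    return f'<{subject}> <{predicate}> "{escape_literal(value)}" .'


# URI pattern and predicate as they appear on the slides
BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"
DC_TITLE = "http://purl.org/dc/terms/title"

record_id = "1000346"
title = (
    "Early medieval history of Kashmir : "
    "[with special reference to the Loharas] A.D. 1003-1171"
)

line = literal_triple(BASE + record_id, DC_TITLE, title)
print(line)
```

The subject is always a URI, the predicate is a URI taken from a shared vocabulary, and only the object may be a plain string.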
  9. 9. Most of a record … <ul><li>1. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" . </li></ul><ul><li>2. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> . </li></ul><ul><li>3. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> . </li></ul><ul><li>4. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" . </li></ul><ul><li>5. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" . </li></ul><ul><li>6. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> . </li></ul><ul><li>7. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> . </li></ul><ul><li>8. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/Elements/placeOfPublication> <http://id.loc.gov/vocabulary/countries/ii> . </li></ul>
  10. 10. Where is the linking exactly? <ul><li><http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> . </li></ul><ul><li><http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://www.w3.org/2000/01/rdf-schema#label> "Mohan, Krishna" . <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> . <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> <http://xmlns.com/foaf/0.1/name> "Mohan, Krishna" . </li></ul>
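The linking on this slide can be sketched in a few lines of Python: the object of the creator triple is itself a URI, so it can be used as the subject of a second lookup. The in-memory list and the helper names `objects` and `creator_labels` are assumptions made for illustration; a real consumer would dereference the URIs or query a triplestore.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
DC_CREATOR = "http://purl.org/dc/terms/creator"

ENTRY = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346"
PERSON = (
    "http://data.lib.cam.ac.uk/id/entity/"
    "cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0"
)

# Two of the triples from the slide, held as (subject, predicate, object) tuples
triples = [
    (ENTRY, DC_CREATOR, PERSON),
    (PERSON, RDFS_LABEL, "Mohan, Krishna"),
]


def objects(triples, subject, predicate):
    """Return every object recorded for the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def creator_labels(triples, entry):
    """Follow entry -> creator URI -> label, one hop at a time."""
    labels = []
    for creator in objects(triples, entry, DC_CREATOR):
        labels.extend(objects(triples, creator, RDFS_LABEL))
    return labels


print(creator_labels(triples, ENTRY))
```

Because the creator is a URI and not a string, other records by the same person can point at the same entity.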
  11. 11. External linking <ul><li><http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/subject> </li></ul><ul><li><http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> . </li></ul><ul><li><http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://www.w3.org/2004/02/skos/core#Concept> . <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/2004/02/skos/core#inScheme> <http://id.loc.gov/authorities#conceptScheme> . <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://www.w3.org/2004/02/skos/core#prefLabel> "Lohars -- History" . <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_43e3fa1b4404410454c90d8022578852> <http://purl.org/dc/terms/hasPart> <http://id.loc.gov/authorities/sh85078149#concept> . </li></ul>
  12. 12. Live demo … <ul><li>http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346 </li></ul>
  13. 13. Meanwhile … <ul><li>BNB </li></ul><ul><li>British Museum </li></ul><ul><li>Library of Congress </li></ul><ul><li>BBC Nature </li></ul>
  14. 14. The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod/
  15. 15. What was COMET? <ul><li>Cambridge Open Metadata </li></ul><ul><li>Cambridge University Library / CARET / OCLC </li></ul><ul><li>Funded by the JISC Infrastructure for Resource Discovery Project </li></ul><ul><li>February to July 2011 </li></ul><ul><ul><ul><li>http://discovery.ac.uk </li></ul></ul></ul>
  16. 16. What did COMET do … <ul><li>Experimentally convert as much of the Cambridge University Library catalogue as it could from Marc21 to RDF triples </li></ul><ul><li>Investigate IPR issues around Open License publishing and Marc21 </li></ul><ul><li>Construct an RDF publishing platform to sit behind those URIs … </li></ul><ul><li>Release tools for others to do the same </li></ul><ul><li>Blog and documentation </li></ul>
  17. 17. Why? <ul><li>Respond to academic / national demand for Open Data </li></ul><ul><li>Get our data to non-librarians! </li></ul><ul><li>Tax-payer value-for-money </li></ul><ul><li>CUL already provides public APIs </li></ul><ul><li>Gain in-house experience of RDF </li></ul><ul><li>Move library services forward </li></ul>
  18. 18. Why - IPR <ul><li>Linked data works best with a permissive license </li></ul><ul><li>CC0 or Public Domain Data License </li></ul><ul><li>Non-commercial licenses not suitable </li></ul><ul><li>Conflict with record vendors </li></ul>
  19. 19. How – IPR <ul><li>Examine contracts with major vendors </li></ul><ul><li>Decide on re-use conditions and contact them </li></ul><ul><li>Decode record ownership from Marc21 fields (Could not use Voyager SQL) </li></ul>
  20. 20. How – IPR <ul><li>Where does a record come from ? </li></ul><ul><li>Several places in Marc21 where this data could be held (015,035,038,994 …) </li></ul><ul><li>Logic and hierarchy for examination </li></ul><ul><li>Attempt at scripted analysis – list bib_ids by record vendor </li></ul>
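A scripted analysis of this kind might look like the following Python sketch. It checks the candidate fields in a fixed order and groups bib_ids by the first source it finds. The rule order, the marker strings, the vendor labels and the sample records are all assumptions for illustration; the slides do not give the project's actual hierarchy.

```python
from collections import defaultdict

# (tag, marker, label) rules, checked top to bottom.
# Both the order and the markers are illustrative assumptions.
RULES = [
    ("038", "", "licensor named in 038"),
    ("035", "(OCoLC)", "OCLC"),
    ("994", "", "OCLC"),
    ("015", "", "national bibliography"),
]


def guess_source(fields):
    """Return the label of the first rule that matches any value of its tag."""
    for tag, marker, label in RULES:
        for value in fields.get(tag, []):
            if marker in value:  # an empty marker matches any value
                return label
    return "unknown"


def bib_ids_by_vendor(records):
    """Group bib_ids by guessed record source."""
    grouped = defaultdict(list)
    for bib_id, fields in records.items():
        grouped[guess_source(fields)].append(bib_id)
    return dict(grouped)


# Made-up sample records: bib_id -> {tag: [field values]}
records = {
    "1": {"035": ["(OCoLC)123456"]},
    "2": {"015": ["GB8100001"]},
    "3": {"245": ["A title"]},
}

print(bib_ids_by_vendor(records))
```

Keeping the rules in a single ordered table makes the hierarchy easy to inspect and change when a vendor agreement turns out to need a different test.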
  21. 22. What - IPR <ul><li>Most vendors happy with permissive license for ‘non-marc21’ formats </li></ul><ul><li>RLUK / BL B.N.B. – PDDL </li></ul><ul><li>OCLC – ODC-By Attribution license </li></ul><ul><li>No good reason not to re-publish – need the right license! </li></ul>
  22. 23. IPR - What did we learn? <ul><li>Marc21 not fit for purpose here, no ‘authoritative code’ for license </li></ul><ul><li>National / international mandate to release open data </li></ul><ul><li>No good reason not to re-publish – need the right license! </li></ul>
  23. 24. How - data <ul><li>Several attempts – settled on SQL extracts based on lists of bib_ids </li></ul><ul><li>Use Perl scripting to ‘munge’ the data </li></ul><ul><li>You can try this at home ! (work) </li></ul>
  24. 25. How - marc problems <ul><li>Punctuation as a function </li></ul><ul><li>Binary encoding </li></ul><ul><li>Numbers for field names </li></ul><ul><li>Bad characters </li></ul><ul><li>Replication of data in fields </li></ul>
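"Punctuation as a function" refers to separators such as ` : ` and ` /` being stored inside the subfield values, as in the 245 field shown earlier. A minimal Python sketch of stripping them follows. The regular expression covers only a handful of trailing separators, and joining $a and $b with ` : ` is a simplifying assumption, so this is not a full treatment of ISBD punctuation.

```python
import re

# Trailing separators that cataloguing rules leave at the end of a subfield
TRAILING_PUNCT = re.compile(r"\s*[/:;,=]\s*$")


def clean_subfield(value: str) -> str:
    """Remove one trailing separator and surrounding whitespace."""
    return TRAILING_PUNCT.sub("", value).strip()


def title_from_245(subfields):
    """Join the cleaned $a and $b values of a 245 field with ' : '."""
    parts = [clean_subfield(v) for code, v in subfields if code in ("a", "b")]
    return " : ".join(part for part in parts if part)


# The 245 field from the earlier slide, as (subfield code, value) pairs
field_245 = [
    ("a", "Early medieval history of Kashmir : "),
    ("b", "[with special reference to the Loharas] A.D. 1003-1171 /"),
]

print(title_from_245(field_245))
```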
  25. 26. How – data vocab <ul><li>RDF allows you to freely mix vocabularies </li></ul><ul><li>Emerging consensus on bibliographic description </li></ul><ul><li>Our conversion script is CSV customisable </li></ul><ul><li>BL and others leading the way </li></ul>
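One way to read "CSV customisable" is that the mapping from Marc21 fields to RDF predicates lives in a CSV file instead of in the code. The Python sketch below illustrates that idea only: the column names, the two mapping rows and the function names are assumptions, and the actual COMET script is described as Perl.

```python
import csv
import io

# Hypothetical mapping file: one row per Marc21 tag/subfield pair
MAPPING_CSV = """tag,subfield,predicate
245,a,http://purl.org/dc/terms/title
260,c,http://purl.org/dc/terms/issued
"""


def load_mapping(text):
    """Read the CSV into a {(tag, subfield): predicate} dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return {(row["tag"], row["subfield"]): row["predicate"] for row in reader}


def convert(subject, fields, mapping):
    """Turn (tag, subfield, value) tuples into triples; skip unmapped fields."""
    return [
        (subject, mapping[(tag, sub)], value)
        for tag, sub, value in fields
        if (tag, sub) in mapping
    ]


mapping = load_mapping(MAPPING_CSV)
fields = [
    ("245", "a", "Early medieval history of Kashmir"),
    ("260", "c", "1981"),
    ("300", "a", "ignored because it is not in the mapping"),
]
subject = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346"

for triple in convert(subject, fields, mapping):
    print(triple)
```

Switching vocabulary then means editing the CSV, which fits the point that RDF lets you mix vocabularies freely.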
  26. 27. How - data publishing <ul><li>Bulk downloads </li></ul><ul><li>Queryable ‘endpoints’ </li></ul><ul><li>Data and code at http://data.lib.cam.ac.uk </li></ul>
  27. 28. How – linking <ul><li>PHP script to match text against LOC subject headings – enrich with LOC GUID </li></ul><ul><li>FAST / VIAF enrichment courtesy of OCLC </li></ul>
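The text-matching step can be sketched as a normalise-then-look-up routine. This Python version is an illustration under stated assumptions, not the PHP script itself: the lookup table holds one invented heading with an `example.org` identifier, and a real table would be built from the LOC authority data.

```python
import re


def normalise(heading: str) -> str:
    """Lower-case, drop a final full stop, unify '--' and collapse spaces."""
    heading = heading.strip().rstrip(".").lower()
    heading = re.sub(r"\s*--\s*", "--", heading)
    return re.sub(r"\s+", " ", heading)


# Hypothetical authority table: normalised heading -> identifier URI
AUTHORITY = {
    normalise("Example heading -- History"): "http://example.org/authorities/sh0000001",
}


def enrich(heading: str):
    """Return the matching identifier URI, or None when there is no match."""
    return AUTHORITY.get(normalise(heading))


print(enrich("example heading--History."))
print(enrich("Some unmatched heading"))
```

Exact matching after normalisation misses variant forms of a heading, so any unmatched headings would need manual review or a fuzzier comparison.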
  28. 29. Data - What did we learn? <ul><li>Marc / AACR2 cannot translate well to semantically rich formats </li></ul><ul><li>Need better container / transfer standards (not necessarily RDF) </li></ul>
  29. 30. What else?
  30. 31. RDF friendly database <ul><li>Called RDF stores, triplestores or quadstores </li></ul><ul><li>Vary in size, scale and scope </li></ul><ul><li>None are particularly admin / dev friendly right now … </li></ul>
  31. 32. How - SPARQL <ul><li>Query language for RDF stores </li></ul><ul><li>Still a work in progress </li></ul><ul><li>Some similarities with SQL </li></ul><ul><li>Bibliographic-centric tutorial </li></ul>
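The similarity with SQL is that a SPARQL query selects variables that satisfy a pattern; the difference is that the pattern is written over triples, not table columns. The Python sketch below imitates a single triple pattern against an in-memory list, with `None` standing in for a SPARQL variable. The `match` helper is an assumption for illustration and is not a SPARQL engine.

```python
DC = "http://purl.org/dc/terms/"
ENTRY = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346"

# A few triples from the sample record, as (subject, predicate, object)
triples = [
    (ENTRY, DC + "title", "Early medieval history of Kashmir"),
    (ENTRY, DC + "issued", "1981"),
    (ENTRY, DC + "identifier", "UkCU1000346"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Roughly equivalent to:
#   SELECT ?entry WHERE { ?entry <http://purl.org/dc/terms/issued> "1981" }
entries = [s for s, _, _ in match(triples, p=DC + "issued", o="1981")]
print(entries)
```

A full SPARQL query joins several such patterns on shared variables, which is where the comparison with SQL joins comes from.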
  32. 33. How – storage and access <ul><li>ARC2 - Lightweight MySQL / PHP solution </li></ul><ul><ul><li>Good fit for a six month project </li></ul></ul><ul><ul><li>Great for around 3-500 k records </li></ul></ul><ul><ul><li>Not so good for 1 million plus </li></ul></ul><ul><ul><li>20 million + ? </li></ul></ul>
  33. 34. Supporting tech - What did we learn? <ul><li>Triplestores are cumbersome </li></ul><ul><li>SPARQL alone does not do the trick </li></ul><ul><li>High entry barrier to RDF is partly a result of these accompanying technologies </li></ul>
  34. 35. What does this mean for Ex Libris? <ul><li>Building whole systems around RDF is not really a good idea </li></ul><ul><li>Need the flexibility to do this by dropping Marc21 </li></ul><ul><li>GUIDs for records (or allow us to have our own) – resolvable? </li></ul><ul><li>Ensure any RDF publishing capacity is flexible (as ours is) </li></ul><ul><li>RDF capability for Primo? </li></ul>
  35. 36. Always add value to RDF … <ul><li>Standalone RDF is just fiddly Dublin Core, so … </li></ul><ul><ul><li>Create HTTP URIs for entities </li></ul></ul><ul><ul><li>Link it to something useful (LOC, FAST, VIAF) </li></ul></ul><ul><ul><li>Endpoint (SPARQL?) </li></ul></ul><ul><ul><li>Don’t limit to the bibliographic </li></ul></ul>
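  36. 37. Beyond bibliographic <ul><li>Bibliographic </li></ul><ul><li>Holdings </li></ul><ul><li>FAST subject headings </li></ul><ul><li>Libraries </li></ul><ul><li>Transactions </li></ul><ul><li>Special collections </li></ul><ul><li>Archives </li></ul><ul><li>Creator / entity </li></ul><ul><li>Place of publication </li></ul><ul><li>LCSH subject headings </li></ul><ul><li>Course lists </li></ul><ul><li>Language </li></ul><ul><li>Librarians </li></ul>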
  37. 38. Do what Tim said … <ul><ul><li>Use URIs as names for things </li></ul></ul><ul><ul><li>Use HTTP URIs so that people can look up those names </li></ul></ul><ul><ul><li>When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL) </li></ul></ul><ul><ul><li>Include links to other URIs, so that they can discover more things </li></ul></ul><ul><ul><ul><ul><ul><li> http://www.w3.org/DesignIssues/LinkedData.html </li></ul></ul></ul></ul></ul>
  38. 39. Questions? <ul><li>@edchamberlain / [email_address] </li></ul><ul><li>http://data.lib.cam.ac.uk </li></ul><ul><li>http://cul-comet.blogspot.com/ </li></ul>
