Developments in catalogues and data sharing


A talk given at the Bodleian Libraries 'From cataloguing to metadata' event in November 2011.
Personal opinions on changing trends in library metadata creation and consumption. Also considers the challenges and rewards associated with providing and licensing data for re-use by machines and the people who program them.

Published in: Education, Technology

  1. 1. <ul><li>How our catalogues are evolving </li></ul><ul><li>Opening and sharing the data within them </li></ul><ul><li>Ed Chamberlain </li></ul><ul><li>Systems Development Librarian – Cambridge University Library </li></ul>
  2. 2. <ul><li>Systems Development Librarian at the other place </li></ul><ul><li>Data ‘munger’ </li></ul><ul><li>Data consumer? </li></ul>
  3. 3. <ul><li>Control over data creation </li></ul><ul><li>Control over data consumption </li></ul><ul><li>Control over data environment </li></ul><ul><li>Control over data technology </li></ul>
  4. 6. <ul><li>No longer the single authority for content and data </li></ul><ul><li>Commercial, social and academic discovery mechanisms </li></ul><ul><li>Explosion of digital content </li></ul><ul><li>Illusion of ‘all on the web’ </li></ul>
  5. 8. <ul><li>Studies into Google Generation / ‘Generation Y’ 1 </li></ul><ul><li>Cambridge Arcadia IRIS report 2009 2 </li></ul><ul><ul><li>Preference for search engine over catalogue </li></ul></ul><ul><ul><li>Online over in-building </li></ul></ul><ul><ul><li>Trust tutors and peers over Librarian </li></ul></ul><ul><ul><li>Still respect the library ‘brand’ </li></ul></ul><ul><ul><li>1) “The Google generation: the information behaviour of the researcher of the future” </li></ul></ul><ul><ul><li>Aslib Proceedings, V60, issue 4 10.1108/00012530810887953 </li></ul></ul><ul><ul><li>2) Arcadia IRIS Project report - http://arcadiaproject.lib.cam.ac.uk/docs/Report_IRIS_final.pdf </li></ul></ul>
  6. 9. <ul><li>So far … </li></ul><ul><li>Evolution of catalogues </li></ul><ul><li>Changes in exposure of data </li></ul><ul><li>To come? </li></ul><ul><li>Greater sharing of data </li></ul><ul><li>Library data used in non-library environments </li></ul>
  7. 11. <ul><li>Keyword based discovery services </li></ul><ul><li>New ways to exploit old data </li></ul><ul><ul><li>Relevancy ranking </li></ul></ul><ul><ul><li>Rich faceting </li></ul></ul><ul><ul><li>Greater linking </li></ul></ul><ul><ul><li>Search is the new browse </li></ul></ul><ul><ul><li>Repositories and archives </li></ul></ul><ul><li>Is the OPAC dead? </li></ul>
  8. 13. <ul><li>Citations </li></ul><ul><li>Abstracts </li></ul><ul><li>Table of Contents </li></ul>
  9. 15. <ul><li>Tags </li></ul><ul><li>Public lists </li></ul><ul><li>Reader reviews </li></ul><ul><li>Dramatic growth in access points </li></ul><ul><li>Input from true subject specialists </li></ul><ul><li>Lack of structure </li></ul><ul><li>No quality control </li></ul><ul><li>Compromise of sanctity? </li></ul>
  10. 16. <ul><li>Web scale - resource discovery concept taken further </li></ul><ul><ul><li>Primo Central </li></ul></ul><ul><ul><li>Summon </li></ul></ul><ul><ul><li>Ebsco Discovery </li></ul></ul><ul><ul><li>WorldCat Local </li></ul></ul><ul><li>HathiTrust data can be used for full text searching of print collections </li></ul>
  11. 17. <ul><li>Catalogue data is now: </li></ul><ul><ul><li>Consumed as keywords (not left anchored) </li></ul></ul><ul><ul><li>Faceted (not browsed) </li></ul></ul><ul><ul><li>Supplemented </li></ul></ul><ul><ul><li>Transformed </li></ul></ul><ul><ul><li>Merged </li></ul></ul><ul><ul><li>Amalgamated </li></ul></ul>
  12. 21. Our local catalogues; National / international aggregations; Joe Public; Teenage software developer / hacker; Booksellers; Web start-ups; Search engines; Wikipedia; Other libraries; Research group website
  13. 22. <ul><li>Bibliographic data linked to many aspects of successful teaching and research </li></ul><ul><ul><li>Citation lists – measure output </li></ul></ul><ul><ul><li>Shared bibliography – core of research group work </li></ul></ul><ul><ul><li>Reading lists – backbone of undergraduate teaching </li></ul></ul><ul><ul><li>High quality data needed for re-use </li></ul></ul><ul><li>Not all possible whilst data resides in the library ‘silo’ </li></ul>
  14. 23. <ul><li>“Library catalogues have imposed on them librarian or supplier-made decisions about what can/can’t be searched and in what way. Some of these decisions are limited by current cataloguing rules, but not all; often the data is recorded, but not in a usable way, or is there but isn’t tapped by the interface. For example, in most catalogues you can limit by publication type to newspapers, but you can’t limit by frequency of the issues.” </li></ul><ul><li>“Releasing data means that people can start to use it in the way they want to.” </li></ul>
  15. 24. <ul><li>Success of distributed access outside of cultural heritage </li></ul><ul><li>Single point of discovery? </li></ul><ul><li>Taxpayer generated – give it back! </li></ul>Why not share?
  16. 25. <ul><li>Past few years have seen a massive release of public data in government and cultural heritage sectors </li></ul><ul><ul><li>Open Government Data - http://data.gov.uk </li></ul></ul><ul><ul><li>Open Knowledge Foundation - http://okfn.org </li></ul></ul><ul><li>EU Commission mandate to open data </li></ul><ul><li>Shared in ways for easy reuse and linking </li></ul>
  17. 26. <ul><li>RLUK and JISC initiative </li></ul><ul><li>Galleries, libraries, archives, museums </li></ul><ul><li>The Discovery principles propose that: </li></ul><ul><ul><li>' Open metadata creates the opportunity for enhancing impact through the release of descriptive data about library, archival and museum resources. It allows such data to be made freely available and innovatively reused to serve researchers, teachers, students, service providers and the wider community in the UK and internationally.' </li></ul></ul>http://discovery.ac.uk
  18. 28. <ul><li>Why not? </li></ul><ul><li>WorldCat has done this for years </li></ul><ul><li>Schema.org microdata – some semantic structure </li></ul><ul><li>Use case for catalogue data in an advertising environment? </li></ul><ul><li>Google taken 10% (so far) </li></ul>
  19. 29. <h1 itemprop="name">The Cambridge companion to Spenser edited by Andrew Hadfield. [electronic resource] /</h1> <span style="display: none;" itemprop="publisher">Cambridge University Press,</span> <span style="display: none;" itemprop="datePublished">2001.</span>
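As a rough illustration of how a consumer such as a crawler might read the microdata on this slide, the Python sketch below collects `itemprop` values using only the standard library. The class name `ItempropCollector` and the shortened sample markup are assumptions made for illustration; full microdata processing has more rules (nested `itemscope` items, `content` attributes and so on) that this sketch ignores.

```python
from html.parser import HTMLParser


class ItempropCollector(HTMLParser):
    """Collect the text content of elements that carry an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._current = None  # name of the itemprop currently being read

    def handle_starttag(self, tag, attrs):
        name = dict(attrs).get("itemprop")
        if name:
            self._current = name
            self.properties[name] = ""

    def handle_data(self, data):
        # Only keep text that sits inside an itemprop element.
        if self._current:
            self.properties[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


# Shortened version of the markup on the slide (assumption: title trimmed).
SAMPLE = (
    '<h1 itemprop="name">The Cambridge companion to Spenser</h1>'
    '<span style="display: none;" itemprop="publisher">Cambridge University Press,</span>'
    '<span style="display: none;" itemprop="datePublished">2001.</span>'
)

collector = ItempropCollector()
collector.feed(SAMPLE)
collector.close()
properties = collector.properties
```

Note that the trailing ISBD punctuation ("," and ".") comes through as part of the values, which is one reason catalogue data can be awkward for non-library consumers.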
  20. 30. <ul><li>Application Programme Interface (API) </li></ul><ul><li>Layered over LMS </li></ul><ul><li>Expose catalogue data feeds for developers </li></ul><ul><li>Anyone can use them </li></ul><ul><li>Simple request, simple response </li></ul><ul><li>http://www.lib.cam.ac.uk/api </li></ul>
  21. 31. http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi?searchArg=darwin&databases=depfacaedb
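The "simple request" idea can be sketched in Python by assembling the query string shown on this slide. Only the URL and its two parameters (`searchArg`, `databases`) come from the slide; the helper name `build_search_url` is hypothetical, the response format is not described in the talk, and the service may no longer be available, so the sketch deliberately stops short of making a network call.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Endpoint as printed on the slide.
BASE_URL = "http://www.lib.cam.ac.uk/api/voyager/newtonSearch.cgi"


def build_search_url(search_arg, databases):
    """Return a search URL with the two query parameters seen on the slide."""
    query = urlencode({"searchArg": search_arg, "databases": databases})
    return f"{BASE_URL}?{query}"


url = build_search_url("darwin", "depfacaedb")

# Round-trip the query string to show what a server would receive.
params = parse_qs(urlsplit(url).query)
```

Using `urlencode` rather than string concatenation means search terms containing spaces or punctuation are escaped correctly.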
  22. 33. <ul><li>COMET project </li></ul><ul><li>80% of CUL bib records converted to Resource Description Framework (RDF) </li></ul><ul><li>Enriched with direct links to the Library of Congress </li></ul><ul><li>Vocab in-line with British Library work </li></ul><ul><li>OCLC FAST and VIAF authority sources </li></ul><ul><li>http://data.lib.cam.ac.uk </li></ul>
  23. 34. Marc21 … 001 1000346 245$aEarly medieval history of Kashmir : $b[with special reference to the Loharas] A.D. 1003-1171 /
      DC XML … <dc:identifier>1000346</dc:identifier> <dc:title>Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171</dc:title>
      RDF triples … <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171"
  24. 35.
      1. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/title> "Early medieval history of Kashmir : [with special reference to the Loharas] A.D. 1003-1171" .
      2. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/1cb251ec0d568de6a929b520c4aed8d1> .
      3. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/type> <http://data.lib.cam.ac.uk/id/type/46657eb180382684090fda2b5670335d> .
      4. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/identifier> "UkCU1000346" .
      5. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/issued> "1981" .
      6. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/creator> <http://data.lib.cam.ac.uk/id/entity/cambrdgedb_a5a6f7a184ff02e08b1befedc1b3a4d0> .
      7. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://purl.org/dc/terms/language> <http://id.loc.gov/vocabulary/iso639-2/eng> .
      8. <http://data.lib.cam.ac.uk/id/entry/cambrdgedb_1000346> <http://RDVocab.info/Elements/placeOfPublication> <http://id.loc.gov/vocabulary/countries/ii> .
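To make the shape of these triples concrete, here is a minimal Python sketch that writes N-Triples lines for a record. The URIs and values are taken from the slide; the function names (`record_to_triples`, `literal`, `uri`) and the split into "literals" and "links" are assumptions for illustration and say nothing about how the COMET conversion was actually implemented.

```python
# URI prefixes as they appear on the slide.
ENTRY_BASE = "http://data.lib.cam.ac.uk/id/entry/cambrdgedb_"
DCTERMS = "http://purl.org/dc/terms/"


def literal(value):
    """Quote a plain string literal, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def uri(value):
    """Wrap a URI in angle brackets."""
    return f"<{value}>"


def record_to_triples(record_id, literals, links):
    """Return one N-Triples line per Dublin Core property of the record."""
    subject = uri(ENTRY_BASE + record_id)
    lines = []
    for term, value in literals.items():
        lines.append(f"{subject} {uri(DCTERMS + term)} {literal(value)} .")
    for term, target in links.items():
        lines.append(f"{subject} {uri(DCTERMS + term)} {uri(target)} .")
    return lines


triples = record_to_triples(
    "1000346",
    literals={
        "title": "Early medieval history of Kashmir : "
                 "[with special reference to the Loharas] A.D. 1003-1171",
        "issued": "1981",
    },
    links={"language": "http://id.loc.gov/vocabulary/iso639-2/eng"},
)
```

The distinction between literals and links is the point of the exercise: a link to `id.loc.gov` can be followed by a machine, while a quoted string cannot.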
  25. 37. The Linking Open Data cloud diagram - http://richard.cyganiak.de/2007/10/lod
  26. 38. <ul><li>Wikipedia </li></ul><ul><li>Archives Hub </li></ul><ul><li>British Library BNB </li></ul><ul><li>British Museum </li></ul><ul><li>Library of Congress </li></ul><ul><li>LOD at Bibliothèque nationale de France </li></ul><ul><li>BBC Nature </li></ul><ul><li>University of Southampton </li></ul><ul><li>Open University </li></ul>
  27. 39. <ul><li>More data out there for cataloguers to reuse </li></ul><ul><li>More access points in records </li></ul><ul><li>Better mechanisms for record enrichment </li></ul><ul><li>Scope for revised cataloguing workflows </li></ul><ul><li>Records have a permanent identity on the web </li></ul>
  28. 40. <ul><li>Hard to understand and decode </li></ul><ul><li>Supporting ‘stack’ not up to scratch </li></ul><ul><li>No seriously compelling use case (yet) </li></ul><ul><li>Other ways to provide linked data </li></ul><ul><li>Use URIs for people and things </li></ul>
  29. 41. <ul><li>Initial attempts with RDF </li></ul><ul><li>Newer lightweight formats and databases </li></ul><ul><li>Focus on citation metadata for the sciences </li></ul><ul><li>New ways for scientists to share and work with bibliography </li></ul><ul><li>http://openbiblio.net/ </li></ul><ul><li>http://openbiblio.net/principles/ </li></ul>
  30. 42. <ul><li>If developers are now consumers of our data … </li></ul>
  31. 43. <ul><li>Most Cambridge data could be released under a permissive license (PDDL) </li></ul><ul><li>Europeana Digital Library approve Creative Commons ‘Zero’ licensing of data </li></ul><ul><li>British Library BNB – Creative Commons ‘Zero’ </li></ul><ul><li>OCLC looking at attribution only licensing </li></ul><ul><li>Move away from ‘non-commercial’ wording </li></ul>Open Data Commons Public Domain Dedication and License (PDDL)
  32. 44. <ul><li>No one wants OCLC to go under (partners on COMET) </li></ul><ul><li>Valued partners </li></ul><ul><li>Focus on sharing ‘non-marc21’ formats of greater use to the non-Librarian </li></ul><ul><li>Vendors aim to profit from services based on data rather than data for its own sake? </li></ul>
  33. 45. <ul><li>Based on a 40 year old format </li></ul><ul><li>Based on a need to print a human readable card </li></ul><ul><li>Syntax, vocabulary, field names and content all intertwined </li></ul><ul><li>According to OCLC Research : </li></ul><ul><ul><li>Only 10% of all Marc tags in Worldcat appear in 100% of all Worldcat records </li></ul></ul><ul><ul><li>65% of tags appear in less than 1% of records. </li></ul></ul>
  34. 46. <ul><li>AACR2 / MARC21 uses punctuation to denote content (100$d) </li></ul><ul><li>Mixed fields (text and numbers) (020$a) </li></ul><ul><li>Duplication </li></ul><ul><ul><li>author name </li></ul></ul><ul><ul><li>format </li></ul></ul><ul><ul><li>One hundred notes fields (or close enough)? </li></ul></ul>
      df100$aBradford, Gamaliel$d1863 - 1932.
      <authorParsed> <surname>Bradford</surname> <restOfName> Gamaliel</restOfName> <birthDate>1863</birthDate> <birthDateNormalised>18630101</birthDateNormalised> <deathDate>1932</deathDate> <deathDateNormalised>19320101</deathDateNormalised> </authorParsed>
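The work a data consumer has to do to get from the punctuated 100 field to the parsed structure on this slide can be sketched in Python as follows. The function name `parse_author` is hypothetical, and the normalisation rule (appending `0101` to the year) is an assumption inferred from the slide's own example rather than a documented convention.

```python
import re


def parse_author(subfield_a, subfield_d):
    """Split a 100$a name and a 100$d date range into separate parts."""
    surname, _, rest = subfield_a.partition(",")
    parsed = {"surname": surname.strip(), "restOfName": rest.strip()}

    # Dates are only recoverable by reading the punctuation: "1863 - 1932."
    match = re.match(r"\s*(\d{4})?\s*-\s*(\d{4})?", subfield_d)
    if match:
        birth, death = match.groups()
        if birth:
            parsed["birthDate"] = birth
            parsed["birthDateNormalised"] = birth + "0101"  # assumed rule
        if death:
            parsed["deathDate"] = death
            parsed["deathDateNormalised"] = death + "0101"  # assumed rule
    return parsed


author = parse_author("Bradford, Gamaliel", "1863 - 1932.")
living = parse_author("Bradford, Gamaliel", "1863-")
unparsed = parse_author("Bradford, Gamaliel", "fl. 1500")
```

Values such as "fl. 1500" fall through without dates, which illustrates the slide's complaint: the meaning lives in punctuation and free text, so every consumer ends up writing guesswork like this.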
  35. 47. <ul><li>Marc21 is binary encoded </li></ul><ul><li>Web-friendly standards are now the norm (XML/JSON) 1 </li></ul><ul><li>Numbers for field names? </li></ul><ul><li>Bad character encoding allowed </li></ul>
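For readers who have not seen the exchange format, the Python sketch below shows why it is unfriendly to web developers: in the ISO 2709 layout used for MARC21 transmission, fields are located through a directory of byte lengths and offsets rather than through named, self-describing markup. The helper names (`build_record`, `parse_record`) and the sample leader values are assumptions for illustration; a production system would use an established MARC library and handle character encodings properly.

```python
FIELD_TERMINATOR = b"\x1e"
RECORD_TERMINATOR = b"\x1d"
SUBFIELD_DELIMITER = b"\x1f"
LEADER_LENGTH = 24
DIRECTORY_ENTRY_LENGTH = 12  # tag (3) + field length (4) + start offset (5)


def build_record(fields):
    """Pack (tag, data) pairs into a minimal ISO 2709 style record."""
    directory = b""
    body = b""
    for tag, data in fields:
        data += FIELD_TERMINATOR
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    directory += FIELD_TERMINATOR
    base_address = LEADER_LENGTH + len(directory)
    record_length = base_address + len(body) + 1  # +1 for record terminator
    # Leader: record length, illustrative status/type codes, base address.
    leader = f"{record_length:05d}nam a22{base_address:05d} a 4500".encode("ascii")
    return leader + directory + body + RECORD_TERMINATOR


def parse_record(raw):
    """Read fields back by following the byte offsets in the directory."""
    record_length = int(raw[0:5].decode("ascii"))
    base_address = int(raw[12:17].decode("ascii"))
    directory = raw[LEADER_LENGTH:base_address - 1]  # drop its terminator
    fields = []
    for offset in range(0, len(directory), DIRECTORY_ENTRY_LENGTH):
        entry = directory[offset:offset + DIRECTORY_ENTRY_LENGTH].decode("ascii")
        tag, length, start = entry[0:3], int(entry[3:7]), int(entry[7:12])
        begin = base_address + start
        fields.append((tag, raw[begin:begin + length - 1]))  # drop terminator
    return record_length, fields


sample_fields = [
    ("001", b"1000346"),
    ("245", b"10" + SUBFIELD_DELIMITER + b"aEarly medieval history of Kashmir"),
]
raw = build_record(sample_fields)
record_length, parsed_fields = parse_record(raw)
```

Compare this with the XML or JSON equivalents, where a developer can read the title by asking for an element or key by name.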
  36. 48. <ul><li>LOC Bibliographic Framework Transition declares a shift away from Marc21 </li></ul><ul><li>Is the delay in introduction of RDA until we get a ‘better container’ ? </li></ul><ul><li>No system vendor is going forward with Marc21 </li></ul><ul><li>Will take 10+ years </li></ul><ul><li>What is to come next? </li></ul>
  37. 49. <ul><li>Steering for RDA and Marc replacement needs non-librarian input or ownership </li></ul><ul><li>Offer from NISO to take the work on </li></ul>Karen Coyle criticises the Marc21 Bibliographic Framework Transition Initiative for not including museums, publishing, and IT professionals … She argues that our data is not just for us to consume alone … “The next data carrier for libraries needs to be developed as a truly open effort. It should be led by a neutral organization (possibly ad hoc) that can bring together the wide range of interested parties and make sure that all voices are heard. Technical development should be done by computer professionals with expertise in metadata design. The resulting system should be rigorous yet flexible enough to allow growth and specialization.” http://kcoyle.blogspot.com/2011/08/bibliographic-framework-transition.html
  38. 50. <ul><li>It becomes (even) easier to go to Amazon </li></ul><ul><li>Our status as authoritative data providers will be (further) eroded </li></ul><ul><li>No-one will want to play with us if we cannot learn to share </li></ul>
  39. 51. <ul><li>http://www.discovery.ac.uk - Discovery </li></ul><ul><li>NGC4LIB mailing list </li></ul><ul><li>http://okfn.org - Open Knowledge Foundation </li></ul><ul><li>http://data.lib.cam.ac.uk </li></ul>
  40. 52. <ul><li>Ed Chamberlain </li></ul><ul><ul><li>@edchamberlain </li></ul></ul><ul><ul><li>[email_address] </li></ul></ul><ul><ul><li>http://www.slideshare.net/EdmundChamberlain/ </li></ul></ul>
