Cambridge university library ess update for ucs


Talk given at UCS morning seminar 4/5/11

Published in: Education, Technology

  1. 1. From Books to Bits - IT developments in the University Library Ed Chamberlain - Systems Development Librarian
  2. 2. Overview <ul><li>UL and the ‘shift to the digital’ </li></ul><ul><li>ESS team and work </li></ul><ul><ul><li>Resource Discovery service </li></ul></ul><ul><ul><li>COMET – Cambridge Open METadata </li></ul></ul>
  3. 3. Cambridge University Library … <ul><li>One of six legal deposit libraries </li></ul><ul><li>6.5+ million items </li></ul><ul><li>1 major site </li></ul><ul><ul><li>‘Mausoleum of dusty old books for the humanities’ </li></ul></ul><ul><li>4 ‘dependent libraries’ </li></ul><ul><li>Wider group of college and departmental libraries – not a federation or single service </li></ul>
  4. 4. Shift to the digital <ul><li>Approx. 40% of the UL and dependent libraries' materials budget is now spent on online resources (ejournals, database subscriptions) </li></ul><ul><ul><li>e.g. Science Direct, Web of Knowledge, JSTOR, CUP ejournals </li></ul></ul><ul><ul><li>Majority of this on STEM publications </li></ul></ul><ul><ul><li>Reliance on subscription content housed on publishers' websites </li></ul></ul><ul><ul><li>Increasing cost </li></ul></ul>
  5. 5. Shift to the digital <ul><li>Legal deposit electronic intake to start in next year </li></ul><ul><ul><li>Publishers can submit electronic versions of material for legal deposit to the legal deposit agency </li></ul></ul><ul><ul><li>Initially voluntary for periodicals only </li></ul></ul><ul><ul><li>Dependent on law being passed </li></ul></ul>
  6. 6. Internal digitisation <ul><li>Digitising special collections for some time with external funding </li></ul><ul><li>No Google books project </li></ul><ul><li>Planning for a unified digital library </li></ul><ul><li>DSpace </li></ul>
  7. 7. Organisational change <ul><li>New skills base in staff </li></ul><ul><li>Changes in buildings </li></ul><ul><li>Changes in services </li></ul><ul><li>Changes in approach </li></ul>
  8. 8. UL divisional layout
  9. 9. Who are we? <ul><li>30+ staff in the UL </li></ul><ul><li>Led by Patricia Killiard </li></ul><ul><li>Mix of skills and backgrounds (Librarians, I.T. Officers, Developers, Early Career Researchers ...) </li></ul>
  10. 10. ESS – major areas of activity
  11. 11. ESS – major areas of activity
  12. 12. Two recent projects … <ul><li>Both in similar area – how library readers can find our stuff, but very different in tone and scope </li></ul><ul><ul><li>Resource Discovery platform – commercial software acquisition and implementation (2008-2010) </li></ul></ul><ul><ul><li>COMET (Cambridge Open METadata) – JISC funded exercise in publishing linked open data (2011) </li></ul></ul>
  13. 13. <ul><li>Resource Discovery </li></ul>
  14. 14. What do you mean by Resource Discovery? <ul><ul><li>Catalogue alone does not represent the true scope of library resources </li></ul></ul><ul><ul><li>Library catalogues of print collections (Newton) </li></ul></ul><ul><ul><li>Online article databases </li></ul></ul><ul><ul><ul><li>Abstract only – Web of Knowledge, Scopus </li></ul></ul></ul><ul><ul><ul><li>Full text – JSTOR, Science Direct, journal publisher sites etc </li></ul></ul></ul><ul><ul><li>A-Z of ejournal titles </li></ul></ul><ul><ul><li>Ebook websites </li></ul></ul><ul><ul><li>Repository content </li></ul></ul><ul><ul><li>Archive catalogue </li></ul></ul><ul><ul><li>Other stuff (content on our websites) </li></ul></ul>
  15. 15. Problems with Newton <ul><li>Newton – traditional library catalogue: </li></ul><ul><ul><li>Replicates Author / Title / Subject card index on the web </li></ul></ul><ul><ul><li>Tied into Voyager – part of the same application stack as library ‘back office’ </li></ul></ul><ul><ul><li>Cambridge setup fragmented by databases (e.g. colleges A-N) </li></ul></ul><ul><li>Trend in search towards: </li></ul><ul><ul><li>Keyword based searching </li></ul></ul><ul><ul><li>Initial ‘dumb’ search – refine afterwards </li></ul></ul><ul><ul><li>Is this a good thing? </li></ul></ul>
  16. 16. Google generation? <ul><li>‘Although young people demonstrate an ease and familiarity with computers, they rely on the most basic search tools and do not possess the critical and analytical skills to assess the information that they find on the web.’ </li></ul><ul><li>‘The study calls for libraries to respond urgently to the changing needs of researchers and other users and to understand the new means of searching and navigating information. Learning what researchers want and need is crucial if libraries are not to become obsolete, the report warns.’ </li></ul><ul><li>Nicholas, D., et al. "The Google generation: the information behaviour of the researcher of the future." Aslib Proceedings 60.4 (2008): 290-310. </li></ul>
  17. 17. Diminishing brand? <ul><li>Many students expressed low levels of awareness of electronic resources, combined with a high use of Google. </li></ul><ul><li>Very few undergraduate students identified librarians as a source of either recommendations, or of help in searching for information. </li></ul><ul><li>However, they regarded the library as a key source of information material, and as a useful study space. </li></ul><ul><li>Information Skills Provision: Mapping the information skills of Cambridge undergraduates and induction / training provision across the University. Lizz Edwards-Waller, 2009 </li></ul>
  18. 18. Two main types of response … <ul><li>Attempt to ‘educate them’ </li></ul><ul><li>Try and adapt our resources and mechanisms to better suit their needs </li></ul><ul><li>Information rich, time poor </li></ul>
  19. 19. What could we do? <ul><li>Adopt the newer trend of library ‘resource discovery software’ </li></ul><ul><ul><li>Common features </li></ul></ul><ul><ul><li>Recognize that library resources do not end at the catalogue </li></ul></ul><ul><ul><li>Harvest resources from ‘silos’ (catalogue, repository) etc. </li></ul></ul><ul><ul><li>Separate front end application from backend </li></ul></ul>
  20. 21. What we did <ul><li>Went to tender (full EU): </li></ul><ul><ul><li>Open source options look promising now, but not there at the time </li></ul></ul><ul><ul><li>We wanted to move quickly </li></ul></ul><ul><ul><li>Reached decision by June 2009 </li></ul></ul>
  21. 22. What happened <ul><li>Five months of contract negotiation – signed in October 2009 </li></ul><ul><li>Hardware purchased December 2009 </li></ul><ul><li>Software installed January 2010 </li></ul><ul><li>Live by August 2010 </li></ul>
  22. 23. What we got <ul><li>Aquabrowser – used by Harvard, Edinburgh, York, Chicago, National Libraries of Wales and Scotland </li></ul><ul><li>Scalable, proven </li></ul><ul><li>Minimal hardware requirements </li></ul><ul><li>Relatively user friendly </li></ul><ul><li>Affordable </li></ul>
  23. 24. How does it work?
  24. 25. How does it work?
  25. 26. How does it work? <ul><li>Different stages: Import step 1 and 2 </li></ul><ul><li>First step </li></ul><ul><li>– Individual datasources </li></ul><ul><li>– Mapping data structures X, Y, Z to AquaBrowser data format </li></ul><ul><li>– Defining indexes </li></ul><ul><li>Second step </li></ul><ul><li>– Merging </li></ul><ul><li>• Bibs, holdings, items </li></ul><ul><li>– Enriching </li></ul><ul><li>• Edition grouping, FRBR’ization </li></ul>
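
The two import stages on this slide can be sketched roughly as follows. This is a minimal illustration only, not AquaBrowser's actual implementation: the field mappings, the common record format, the function names and the author/title grouping key are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical field mappings from two source formats to one common schema.
MAPPINGS = {
    "catalogue": {"001": "id", "245a": "title", "100a": "author"},
    "repository": {"dc:identifier": "id", "dc:title": "title", "dc:creator": "author"},
}


def import_step_one(source, raw_record):
    """Step 1: map one source-specific record to the common format."""
    mapping = MAPPINGS[source]
    record = {
        target: raw_record[field]
        for field, target in mapping.items()
        if field in raw_record
    }
    record["source"] = source
    return record


def import_step_two(records):
    """Step 2: merge mapped records, grouping editions by author and title."""
    groups = defaultdict(list)
    for record in records:
        # Assumed grouping key: normalised author plus normalised title.
        key = (
            record.get("author", "").strip().lower(),
            record.get("title", "").strip().lower(),
        )
        groups[key].append(record)
    return dict(groups)


raw = [
    ("catalogue", {"001": "b1", "245a": "Origin of Species", "100a": "Darwin, Charles"}),
    ("catalogue", {"001": "b2", "245a": "Origin of species ", "100a": "Darwin, Charles"}),
    ("repository", {"dc:identifier": "r9", "dc:title": "Thesis on Finches", "dc:creator": "Smith, A."}),
]
mapped = [import_step_one(source, record) for source, record in raw]
grouped = import_step_two(mapped)
```

Keeping the per-source mapping separate from the merge step means a new data source only needs a new mapping table; the grouping logic stays unchanged.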
  26. 30. What does it cover? <ul><li>Voyager / Newton catalogues (about 6 million bibliographic records) </li></ul><ul><li>Most of DSpace (harvested as Dublin Core) </li></ul><ul><li>‘Just in time’ search of article databases </li></ul>
  27. 31. What can we do with it? <ul><li>Web interface: </li></ul><ul><ul><li> </li></ul></ul><ul><ul><li>Branded as LibrarySearch for Cambridge </li></ul></ul><ul><li>XML API (rest-like) produces Marc21-XML and Dublin Core data: </li></ul><ul><ul><li> </li></ul></ul>
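
As a rough idea of how a client might consume Dublin Core output from such an API, here is a minimal sketch using only the Python standard library. The response layout (the `records` and `record` wrapper elements) and the sample data are assumptions for illustration; the real API's structure is not shown in the slides. Only the Dublin Core namespace URI is standard.

```python
import xml.etree.ElementTree as ET

# Standard Dublin Core element set namespace.
DC_NS = {"dc": "http://purl.org/dc/elements/1.1/"}

# Hypothetical response body; the wrapper elements are assumed.
SAMPLE_RESPONSE = """<records xmlns:dc="http://purl.org/dc/elements/1.1/">
  <record>
    <dc:title>On the Origin of Species</dc:title>
    <dc:creator>Darwin, Charles</dc:creator>
  </record>
  <record>
    <dc:title>Principia</dc:title>
    <dc:creator>Newton, Isaac</dc:creator>
  </record>
</records>"""


def parse_dublin_core(xml_text):
    """Return a list of dicts with the title and creator of each record."""
    root = ET.fromstring(xml_text)
    results = []
    for record in root.findall("record"):
        results.append({
            "title": record.findtext("dc:title", default="", namespaces=DC_NS),
            "creator": record.findtext("dc:creator", default="", namespaces=DC_NS),
        })
    return results


parsed = parse_dublin_core(SAMPLE_RESPONSE)
```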
  28. 32. Problems <ul><li>Historical nature of Cambridge bibliographic records </li></ul><ul><li>No policy of centralised cataloguing in Cambridge </li></ul><ul><ul><li>Lots of duplicate records across Cambridge libraries </li></ul></ul><ul><ul><li>ID centric de-duplication – works up to a point </li></ul></ul>
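
A minimal sketch of ID-centric de-duplication, assuming ISBN is the shared identifier (the slides do not say which IDs are used). The record layout and function names are hypothetical. It also shows why the approach only works "up to a point": records with no identifier cannot be matched at all.

```python
def normalise_isbn(isbn):
    """Strip hyphens and spaces so equivalent ISBNs compare equal."""
    return "".join(ch for ch in isbn if ch.isdigit() or ch in "Xx").upper()


def deduplicate(records):
    """Merge records that share an ISBN; return (merged, unmatched)."""
    merged = {}
    unmatched = []
    for record in records:
        isbn = normalise_isbn(record.get("isbn", ""))
        if not isbn:
            # No identifier, so ID-based matching cannot help here.
            unmatched.append(record)
            continue
        entry = merged.setdefault(
            isbn, {"title": record["title"], "isbn": isbn, "libraries": []}
        )
        entry["libraries"].append(record["library"])
    return list(merged.values()), unmatched


records = [
    {"title": "Linked Data", "isbn": "978-1-60845-430-3", "library": "UL"},
    {"title": "Linked data", "isbn": "9781608454303", "library": "College A"},
    {"title": "An old pamphlet", "library": "College B"},
]
merged, unmatched = deduplicate(records)
```

Older records catalogued before standard identifiers existed fall into the unmatched list, which is where the historical nature of the data becomes a problem.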
  29. 33. More problems <ul><li>Cannot totally replace Newton </li></ul><ul><ul><li>Not the original intention </li></ul></ul><ul><ul><li>Place for multiplicity of interfaces </li></ul></ul><ul><ul><li>Shift focus of marketing and development to LibrarySearch </li></ul></ul>
  30. 34. Coming soon: <ul><ul><li>British Library electronic legal deposit </li></ul></ul><ul><ul><li>Archives catalogues </li></ul></ul><ul><ul><li>Search engine crawlable ... </li></ul></ul>
  31. 35. <ul><ul><ul><ul><ul><li>COMET </li></ul></ul></ul></ul></ul><ul><ul><ul><ul><ul><li>(Cambridge Open METadata) </li></ul></ul></ul></ul></ul>
  32. 36. Background <ul><li>Peter Murray-Rust and the JISC Open Bibliography Project </li></ul><ul><li>JISC followed this up with a general call for ‘Infrastructure for Resource Discovery’ </li></ul>
  33. 37. COMET (Cambridge Open METadata) <ul><li>Releasing large subset of UL records under a Public Domain Data License </li></ul><ul><ul><li>Identifying IPR history of our bibliographic data </li></ul></ul><ul><ul><li>Documenting process and releasing tools for others to do the same </li></ul></ul><ul><ul><li>Some as Marc21 </li></ul></ul><ul><ul><li>Converting to useful linked RDF </li></ul></ul><ul><ul><li>Establishing a triplestore for the library </li></ul></ul>
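
To give a feel for what converting a bibliographic record to RDF involves, here is a minimal sketch that emits N-Triples using Dublin Core terms. It is not the COMET conversion itself: the base URI, the record layout and the choice of vocabulary are assumptions for illustration, and a real conversion from Marc21 would cover far more fields.

```python
DCTERMS = "http://purl.org/dc/terms/"
BASE_URI = "http://example.org/record/"  # hypothetical base URI


def escape_literal(value):
    """Escape backslashes, quotes and newlines for an N-Triples literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record):
    """Turn a simple record dict into a list of N-Triples lines."""
    subject = f"<{BASE_URI}{record['id']}>"
    lines = []
    for field in ("title", "creator"):
        if field in record:
            literal = escape_literal(record[field])
            lines.append(f'{subject} <{DCTERMS}{field}> "{literal}" .')
    return lines


triples = record_to_ntriples(
    {"id": "b1", "title": 'The "Origin"', "creator": "Darwin, Charles"}
)
```

Each line is a self-contained subject, predicate, object statement, which is the form a triplestore loads.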
  34. 38. Why? <ul><li>Part of a larger bid across the UK to open up data to provide data for national level discovery options </li></ul><ul><li>See what developers can do with our stuff </li></ul><ul><li>Gain in-house understanding of semantic web </li></ul><ul><li>Better realise value in records through contribution to the public domain </li></ul>
  35. 39. Why not the whole lot? <ul><li>Legal ownership of bibliographic data </li></ul><ul><ul><li>Large chunks of records from cataloguing collectives – reuse as RDF under public domain license not necessarily covered </li></ul></ul><ul><ul><li>OCLC, the major record provider, is a partner on the project </li></ul></ul>
  36. 40. Problems <ul><li>RDF vocabs – no accepted practice for bibliographic material </li></ul><ul><li>Marc21 does not translate well </li></ul><ul><li>Triplestores – relative immaturity of software </li></ul><ul><li>URI construction – needs to be done in a sensible, extensible fashion </li></ul>
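
One possible reading of "sensible, extensible" URI construction is a fixed base, a type segment, and a safely encoded local identifier, so that new entity types can be added without changing existing URIs. The sketch below assumes that pattern; the base URI, type names and function name are hypothetical and not taken from the project.

```python
from urllib.parse import quote

BASE = "http://data.example.org"  # hypothetical base URI


def build_uri(entity_type, local_id):
    """Build a URI of the form BASE/<type>/<percent-encoded local id>."""
    if not entity_type.isalpha():
        raise ValueError("entity type must contain letters only")
    # safe="" also encodes "/" so the local ID cannot add path segments.
    return f"{BASE}/{entity_type.lower()}/{quote(local_id.strip(), safe='')}"


record_uri = build_uri("Record", " cam/123 ")
person_uri = build_uri("person", "darwin-charles")
```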
  37. 41. What? <ul><li>Eventually hope that we could provide all our metadata in this way </li></ul><ul><li>Joint effort with Caret – parallel project at the Fitzwilliam </li></ul><ul><li>Triplestore at </li></ul><ul><li>Drawing on external developments – no modelling of data – use existing vocabs and URI guidelines </li></ul><ul><li>Project blogspot: / </li></ul>
  38. 42. Ed Chamberlain <ul><li>[email_address] </li></ul><ul><li>@edchamberlain </li></ul>