Copac: Reengineering the UK national academic union catalogue to serve the 21st Century researcher
Redesign, collection analysis, recommendations. Joy Palmer, Mimas, University of Manchester
Key points: background & context of Copac; development in progress; strategic issues and directions; R&D/innovations work: the Collections Management project and the Surfacing the Academic Long Tail project.
Copac… Aggregation of 50+ research & specialist libraries; 40 million records; approx. 1 million search sessions per month; primary academic use case: locating long-tail materials; primary workflow use case: cataloguing & ILL support; funded by JISC since 1996; sponsored by RLUK (and based on RLUK data +); in the process of re-engineering; expanding consistently to include specialist libraries.
Others include: Imperial War Museum; Chetham's Library; Windsor Castle; National Maritime Museum; British Museum; French Institute; University of Exeter Special Collections; The Women's Library; Institute of Education; Royal Academy of Music; Royal Botanic Gardens, Kew; Tate Gallery Library; Natural History Museum.
Half our users: advanced researchers; humanities-based; been with us a while…; looking for specific items. More later…
And the rest are mostly librarians: cataloguing support; collections management; ILL support; researcher support.
Current data model
[Diagram: current Copac data model, last updated 07/12/07. Inputs: records from contributing libraries and the RLUK MARC21 database (by ftp), with Nielsen Bookdata enrichment (ToC, reviews, summaries, URLs). Processing: pre-processing, de-duplication and conversion to MODS XML into the Copac database (50m merged records), with admin; search currently runs on OpenText LiveLink Discovery Server. Live data: book cover images (requested over http) and circulation data. Contributing library catalogues are linked via Z39.50 & CGI/http (consolidated records and single records plus local holdings). Content: books, journal titles, proceedings, theses, maps & plans, print & recorded music, video & film, spoken word, electronic materials, articles, archives, etc. Other components shown: harvesting, Google, database cross-search, search updates, ESTC at the BL, the OpenURL router, ILL/copy via users' OpenURL servers. Interfaces: user interfaces (c.1m sessions p/m), HTTP, social media, COinS, RSS, OpenURL, Z39.50 and SRU/SRW, including machine-to-machine (M2M) access. Users: open access; used by HE, FE, NHS, libraries, schools and the general public.]
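The data model lists SRU/SRW among Copac's machine-to-machine interfaces. The sketch below shows that an SRU 1.1 searchRetrieve request is just an HTTP GET with standard query parameters. It makes three assumptions: the base URL is hypothetical, because the real endpoint is not given here; the server accepts the `dc.title` CQL index; and the server offers a `mods` record schema.

```python
from urllib.parse import urlencode

# HYPOTHETICAL endpoint: the actual Copac SRU base URL is not given in the slides.
SRU_BASE_URL = "https://sru.example.ac.uk/copac"


def build_sru_search_url(cql_query: str, max_records: int = 10,
                         record_schema: str = "mods") -> str:
    """Build an SRU 1.1 searchRetrieve URL for a CQL query.

    operation, version, query, maximumRecords and recordSchema are standard
    SRU parameters. Which schemas and CQL indexes a given server supports is
    an assumption to check against that server's explain response.
    """
    params = {
        "operation": "searchRetrieve",
        "version": "1.1",
        "query": cql_query,
        "maximumRecords": str(max_records),
        "recordSchema": record_schema,
    }
    # urlencode percent-encodes the CQL query (spaces become '+').
    return f"{SRU_BASE_URL}?{urlencode(params)}"


if __name__ == "__main__":
    # The dc.title index is an assumption about the server's CQL support.
    print(build_sru_search_url('dc.title = "brittle paper"'))
```

Fetching such a URL (for example with `urllib.request.urlopen`) would return an SRU XML response. If the server supports the requested schema, the records come back in that schema.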
Development activity in progress: new hardware (Oracle); enhanced de-duplication; improved search (ranking, facets); 'FRBR-ised' record display; enhanced user interface; additional specialist libraries; graphic redesign.
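The slides do not describe how Copac's de-duplication works. One common approach to merging records from many catalogues is a normalised match key, and the sketch below illustrates that technique only. The key is hypothetical: a cleaned ISBN if one is present, otherwise normalised title, author surname and year. Records sharing a key are grouped into one consolidated record that keeps every holding library.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Record:
    """Simplified record from one contributing library (HYPOTHETICAL fields)."""
    library: str
    title: str
    author: str
    year: str
    isbn: str = ""


def normalise(text: str) -> str:
    """Lower-case, strip punctuation and collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())


def match_key(rec: Record) -> str:
    """Prefer a cleaned ISBN; otherwise fall back to title|surname|year.

    Assumes authors are recorded as 'Surname, Forename'.
    """
    isbn = re.sub(r"[^0-9Xx]", "", rec.isbn).upper()
    if isbn:
        return "isbn:" + isbn
    parts = normalise(rec.author).split()
    surname = parts[0] if parts else ""
    return f"tay:{normalise(rec.title)}|{surname}|{rec.year}"


def deduplicate(records: list[Record]) -> dict[str, set[str]]:
    """Group records by match key, returning holding libraries per merged record."""
    merged: dict[str, set[str]] = defaultdict(set)
    for rec in records:
        merged[match_key(rec)].add(rec.library)
    return dict(merged)
```

This simple key has two gaps. It would not merge an ISBN-10 with its ISBN-13 equivalent, and it would not merge a record that has an ISBN with one that lacks it. Real de-duplication typically needs more fields and fuzzier matching.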
Strategic issues (macro): changing technological landscape and user expectations; death of the physical?; 'good enough' = just-in-time (not a specific item); eBook search and discovery challenges; integrated and cross-domain search.
Strategic issues (micro): leadership and community role; identity and positioning; enabling infrastructure to support library workflows, or resource discovery service?; governance; collections policy (uniqueness vs. comprehensiveness?); innovation vs. service delivery.
Copac Collections Management Project
Can I release this book? How does my collection compare in strength to that of other UK libraries?
Project background: builds on the work of the White Rose Consortium; partners are the Universities of Leeds, York and Sheffield; funded by JISC as part of the Discovery initiative (making Copac data 'work harder'); sponsored & facilitated by RLUK; seven months long, with a limited budget.
How it works: web-based; identifies the locations at which items or batches of items are held; search by ISBN, RLUK #, author, title, subject; batch search (comma-delimited sets); data visualisation of results (map views, graphs); record export in MODS and CSV.
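The batch search and CSV export can be sketched in three steps: parse a comma-delimited batch, look up where each item is held, and write one CSV row per item. This assumes a hypothetical in-memory `holdings` index from ISBN to holding libraries. The real tool queried Copac itself.

```python
import csv
import io


def parse_batch(batch: str) -> list[str]:
    """Split a comma-delimited batch of ISBNs, dropping blanks and hyphens."""
    return [item.strip().replace("-", "") for item in batch.split(",") if item.strip()]


def batch_locations(batch: str,
                    holdings: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Return (ISBN, sorted holding libraries) for each ISBN in the batch.

    `holdings` is a HYPOTHETICAL lookup standing in for a Copac query.
    """
    return [(isbn, sorted(holdings.get(isbn, set()))) for isbn in parse_batch(batch)]


def to_csv(results: list[tuple[str, list[str]]]) -> str:
    """Export results as CSV: ISBN, number of holding libraries, library list."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["isbn", "holding_count", "libraries"])
    for isbn, libraries in results:
        writer.writerow([isbn, len(libraries), "; ".join(libraries)])
    return out.getvalue()
```

A `holding_count` of 1 flags a possible last copy, which corresponds to the withdrawal use case. This holds only as far as the aggregated data covers UK holdings.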
Exploratory and iterative in approach: a development/testing cycle, with partners trialling the tool and providing structured feedback.
Six use cases developed: identifying last copies among titles considered for withdrawal; identifying collection strengths; deciding whether to conserve a book; reviewing a collection at the shelves; prioritising a collection or items for digitisation; subject strengths (collection development and marketing/differentiation).
Findings: need for further development and refinement of the tools (esp. duplication & user interface issues); significant potential for answering strategic questions about the status of collections.
Particularly: overlap between the holdings of major UK research libraries in particular subject areas; differences in that overlap between subject disciplines and areas; the proportion of unique titles within those collections; the extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread, severe deterioration of printed materials through brittle paper and collapsing bindings.
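The overlap and uniqueness questions reduce to set operations over holdings. The sketch assumes each library's holdings in a subject area are available as a set of de-duplicated record identifiers; the test data is hypothetical. Overlap is expressed here as the Jaccard index, which is one choice among several.

```python
from itertools import combinations


def pairwise_overlap(holdings: dict[str, set[str]]) -> dict[tuple[str, str], float]:
    """Jaccard overlap |A & B| / |A | B| for every pair of libraries."""
    result = {}
    for a, b in combinations(sorted(holdings), 2):
        union = holdings[a] | holdings[b]
        result[(a, b)] = len(holdings[a] & holdings[b]) / len(union) if union else 0.0
    return result


def unique_proportion(holdings: dict[str, set[str]]) -> dict[str, float]:
    """Share of each library's titles held by no other library in the set."""
    result = {}
    for lib, titles in holdings.items():
        others = set().union(*(t for other, t in holdings.items() if other != lib))
        result[lib] = len(titles - others) / len(titles) if titles else 0.0
    return result
```

Running both functions per subject area would give a rough picture of how overlap differs between disciplines.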
Next proposed steps: expansion of test libraries and resilience testing of the tool; providing evidence of scalability; building collaborations and alliances with interested organisations pursuing complementary activity; developing a business model for a service beyond a pilot; more targeted communication and dissemination of the activity.
SALT: Surfacing the Academic Long Tail
Hypothesis: library circulation activity data can be used to support humanities research by surfacing underused 'long tail' library materials through search.
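The hypothesis rests on a "people who borrowed this also borrowed" style of recommendation. SALT's actual algorithm is not described in these slides, so the following is a sketch under stated assumptions. It takes anonymised loan records as hypothetical (borrower, item) pairs, counts co-borrowings, and filters them by a minimum threshold. The key findings note that lowering this kind of threshold throws up more "long tail" items.

```python
from collections import Counter, defaultdict


def build_coborrowing(loans: list[tuple[str, str]]) -> dict[str, Counter]:
    """For each item, count how many distinct borrowers also borrowed each other item."""
    # Collapse repeat loans so each borrower counts once per item.
    items_by_borrower: dict[str, set[str]] = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)

    co_counts: dict[str, Counter] = defaultdict(Counter)
    for items in items_by_borrower.values():
        for item in items:
            for other in items:
                if other != item:
                    co_counts[item][other] += 1
    return co_counts


def recommend(item: str, co_counts: dict[str, Counter],
              threshold: int = 2, limit: int = 10) -> list[tuple[str, int]]:
    """Items co-borrowed with `item` by at least `threshold` borrowers.

    Ordered by co-borrowing count (highest first), then by item ID.
    A lower `threshold` admits rarer co-borrowings, i.e. more long-tail items.
    """
    candidates = co_counts.get(item, Counter())
    ranked = sorted(((other, n) for other, n in candidates.items() if n >= threshold),
                    key=lambda pair: (-pair[1], pair[0]))
    return ranked[:limit]
```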
And also… How sustainable would an API-based national shared service be? Can such a service support users and also library workflows such as collections management? RLUK, M25, Leeds University, Cambridge University, Sussex University.
John Rylands University Library: 1.3 million bib records; 600,000 search sessions per month; 23% of records unique (cross-checked against WorldCat); 40,000 students; 10 years of circulation data.
Why aren’t we there yet?
Where's the business case? User demand, benefits, value, sustainability.
Arts & humanities researchers borrow books…
Market research reveals these users as… centrifugal searchers; berrypickers from various trails; quite isolated and prone to pitfalls.
And increasingly they just don't ask librarians… They ask their tutors and each other where to look…
Researchers are suspicious of user-generated content (UGC), especially ratings & reviews, but… they could see the immediate benefit of 'tacit' recommender functions…
What if this represented a national aggregation of data gathered from the usage activity of these researchers, collected as they worked with a national aggregation of unique or rare research collections?
In humanities research it’s all the way
What can this mean? Surfacing and increasing usage of hidden collections (and demonstrating value); providing new routes to discovery based on use and disciplinary contexts (not traditional classification); powering 'centrifugal searching' and discovery through serendipity; enabling new, original research: academic excellence…
Targeted academic researchers
Examine the relationship between relevance and frequency of borrowing
Does frequency of borrowing correlate with increased relevance?
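One way to examine this question is Spearman's rank correlation. This assumes each recommended item has a loan count and a relevance rating gathered in user testing; both inputs are hypothetical here. The sketch below computes Spearman's rho from average ranks without external libraries.

```python
def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho, computed as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)
```

A rho near +1 from `spearman(loan_counts, relevance_scores)` would suggest that more-borrowed items were rated more relevant. With small samples, the result should be read cautiously.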
What are users saying? 3 focus groups (18 people): MA/PhD humanities students (mixed ages). Recommendations are already key to them: supervisors/peers, Amazon, bibliographic citations. They don't accept recommendations blindly.
Focus groups and user testing: 3 focus groups (18 people), MA/PhD humanities students (mixed ages). How relevant/useful are the recommendations at first glance? Do any other recommendations look useful? Were you previously aware of these texts? How likely would you be to borrow the recommended item?
What are users saying? Recommendations are already key to them: supervisors/peers, Amazon, bibliographic citations. They don't accept recommendations blindly. Serendipity is important (but not to all).
What are users saying? Very supportive, but in practice they found the results too generalist, irrelevant, and sometimes bizarre! Lower-ranked recommendations were much better. Did you find something you'd borrow? (Yes!) Find something new? (Mixed.) Would you use it? (Yes!) Useful for searching more widely. More university data needed to improve results.
Relation between the key critical texts at the 'nose' of the curve and the other material further along it.
Can we make the data work harder to solve other shared problems?
Issues for sustainability: Is there a clear-cut case for a national shared service here? Data model: data out = easy; data in = not so much. Licensing & attribution: collective ownership of a collective pot? Is proof of our hypothesis key to sustainability?
Key findings: lower thresholds will throw up 'long tail' items, but their relevance and usefulness are not evident (but what is The Long Tail?); users aren't concerned about data privacy; this can be successful without a significant backlog of data; a shared service needs to aggregate activity data from more libraries (but not many more).
Proposed next steps: aggregate more data; assess impact over time; gather requirements and costs for a shared service; establish more data extraction recipes; investigate utility for collections management further; investigate usefulness for teachers & supervisors.
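The slides do not specify what a data extraction recipe contains, so what follows is one hypothetical example. It assumes a library management system can export loans as CSV with `borrower_id`, `item_id` and `loan_date` columns; these column names are assumptions. Under that assumption, a recipe might pseudonymise borrower IDs with a keyed hash before the data leaves the library.

```python
import csv
import hashlib
import hmac
import io


def pseudonymise_loans(csv_text: str, secret_key: bytes) -> list[dict[str, str]]:
    """Replace borrower IDs with an HMAC-SHA256 digest.

    Loans by the same borrower remain linkable for co-borrowing analysis,
    but the original ID is not exported. The input column names
    (borrower_id, item_id, loan_date) are HYPOTHETICAL.
    """
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        digest = hmac.new(secret_key, row["borrower_id"].encode("utf-8"),
                          hashlib.sha256).hexdigest()
        rows.append({"borrower": digest, "item_id": row["item_id"],
                     "loan_date": row["loan_date"]})
    return rows
```

This is pseudonymisation rather than full anonymisation. The secret key would need to stay within the contributing library.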