Copac: Reengineering the UK national academic union catalogue to serve the 21st Century researcher
 

  • Redesign, collection analysis, recommendations …
  • Copac
    Closest thing UK has to National Union Catalogue
    Over 50 UK academic or specialist libraries
    Growing
    Currently being radically re-engineered from ground up
    FRBR-ising
    Completely new UI
    Mobile apps
    Personalisation?
    Linked data – RDF planned for 2011
    Primarily used by humanities & philosophical studies postgrads and academics
    Heavily used -- 8 million sessions per month
  • Over sixty libraries….
  • Get Shirley’s latest data model slide
  • I need Shirley to send me stuff here…
    By July 2011 the new Copac service will include:
    Enhanced user interface and complete graphic redesign with ‘FRBRised’ record display, enhanced deduplication, improved search and ranking of results and faceted browsing support (released iteratively. First prototype available for testing by end of November 2011).
    Improved coverage through the incorporation of more UK academic libraries (6 more by July 2011 – expansion rate and tactics to be determined post July 2011).
    Integration of article level data (Zetoc) and access to full content (for authenticated academic users).
    An ‘Open Copac’ API and support tools for developers wishing to ‘mash’ Copac MODS XML content into new applications (e.g. local library technical developers creating scholarly support resources). Licensing issues are still to be explored as part of RDTF, but there are some ‘quick win’ ways forward here.
    More flexible personalisation features for end users so that they can export and repurpose content within citation management systems and social media contexts (blogs, virtual learning spaces, etc).
    Also in development, with prototypes released by September 2011:
    Collaborative Collections Management service prototype (supporting the decision-making processes for librarians managing, developing & disposing of collections).
    PENDING JISC FUNDING: Recommender functionality based on aggregations of UK university book circulation data (‘People who borrowed this also borrowed…’).
  • Key strategic issues
    Leadership & community role
    Copac’s ‘passivity’ and lack of overt leadership were raised several times. Copac needs to exploit its position as a community focal point, and take a more overt leadership role in engaging the UK HE/FE library community, understanding its needs, and how Copac can serve those needs. How Copac engages with OCLC also needs to be determined, with an emphasis on finding ways to collaborate rather than compete.
    Identity & brand positioning
    Copac lacks a strong brand identity. Work is required to position Copac as a brand which targets both end users and stakeholders (librarians, developers).
    Enabling Infrastructure or One Service?
    A key challenge in developing a coherent identity for Copac is the question of whether Copac is ‘simply’ a resource discovery service, or an infrastructure to support resource discovery but also other business requirements. The question of what drives Copac strategically also needs to be tackled and clarified (see Governance below).
    Governance
    What drives Copac and shapes its strategy and values? If Copac is to exploit its full potential as an enabling infrastructure, and to take a leadership role, then formal Governance structures need to be put into place.
    Collections policy. Exposing unique content; providing comprehensive coverage?
    Copac does not have a clear Collections Policy. There are strong drivers to incorporate more unique and specialist content into Copac, and there are equally strong drivers to provide more comprehensive coverage so that all HE/FE users can use Copac to locate content locally. Cross domain search and discovery also featured heavily. Copac needs to develop a sustainable collections policy that best serves the needs of end users and contributing institutions. Copac also needs to explore the technical feasibility of serving as a National Union Catalogue for all UK HE/FE libraries – are there risks around adding more libraries? (Will the value decrease as more data is added? Will standards need to drop?) Can coverage be achieved without aggregating?
    Innovation
    Copac does innovative work and has been successful in securing funds for specific projects. But the service needs to develop a more strategic and prioritised approach to innovation, and specifically for how it engages with social media, Open Data, Linked data, and the shared services agenda. Copac needs to develop a strategy that helps it innovate for new services or functionality that meets the needs of existing users, and also opens up the opportunity of reaching new users and markets. Copac also needs to identify how it wishes to engage the developer community through opening up data, or sharing source code.
  • “…to develop and test a service that will enable improved decision making regarding the retention, disposal, and redistribution of materials. The service will provide evidence of the wider availability of individual materials and/or collections when discussing the disposal of materials with academic staff within an institution.” And additionally will, “achieve the longer term aim of developing the technical framework required to support a more proactive and cohesive approach to collection management at a national level.  The development of the CCM Pilot will enable the practical demonstration of applying our tools to collection management workflows and provide an assessment of benefit that will feed into sustainability and business planning.
  • Builds on work of White Rose Consortium
    Partnering with RLUK
    7 months, minimal budget
    Proposal in development to extend (Discovery initiative).
    Uses Copac data
    Funded through RDTF/Discovery – making data ‘work harder.’
    “By building the service on top of Copac data, the project contributes significantly to furthering the work of the JISC & RLUK Resource Discovery Taskforce, which aims to explore how data can be opened up and made to ‘work harder.’”
    From the Interim Report at http://www.rluk.ac.uk/files/CopacCMInterimReportfinal.pdf
    Cheap & low profile…
  • How it works
    What we’ve built:
    Web-based tool
    Uses a variety of means to identify in which locations a particular item or batches of items exist.
    Data visualisation provides differing views of the results, for example map views to assess quickly where items are held across the country, and graphs to indicate how many items searched for exist within specific libraries.
    Access via IP address checking
    Facility to search for a set of records by entering a comma delimited set of local record identifiers or standard record numbers via a text box. There is an initial limit on the number of records in a set (~100).
    Result set display including holding libraries.
    Record export in MODS format.
    Option to view a map of the results to see where the documents are held.
    Option to see a graph showing the number of records held by each library.
    1. The ability to set up an RSS feed that will tell you when the results are available.
    2. A Search History button on the search screen that lets you look at all your batch search results.
    3. No limit on the number of records for batch searching.
    4. Revised search procedure behind-the-scenes.
    5. More information in the brief record.
    6. A basic full record html display.
    7. Records are sorted by author/title; records with no author will file at the top in title order.
    8. Export of visualisation data now saves as a csv file.
    9. There is a MARC exchange export format.
  • Exploratory in Approach – iterative development; put tools in front of users (WRC) for feedback so we can refine them. Very much a pilot/exploratory project
    Managing expectations….
  • 6.1 Interim Report Use Cases
    The Interim Project Report described four Use Case scenarios that the project team had developed in light of the participating library’s collection management requirements and experience of the first CCM interface trial, alongside the discussion arising from that work. These four covered the following:
    Use case 1: Identifying last copies among titles considered for withdrawal
    Use case 2: Identifying collection strengths
    Use case 3: Deciding whether to conserve a book
    Use case 4: Reviewing a collection at the shelves
    Resulting from continued consideration of the benefits the CCM Tool offers to libraries and how easily it can be integrated into existing workflows two further Use Case scenarios have been developed during Phase 3 of the project. The purpose of these illustrative use cases will be to demonstrate how this tool can be used productively. The aim will be to look to applying and testing all the use cases in live conditions once the CCM Tool is more fully developed.
    6.2 Detail of additional Use Case Scenarios
    6.2.1 Use Case 5: Prioritising a collection or item(s) for digitisation
    Context
    Many libraries are faced with the monumental task of potentially digitising entire collections either through user request or preservation needs. Dependent upon resources and funding some material may be lost due to the lack of data required to understand what should be prioritised over other content. With all the competing priorities of a digitisation service (on-demand, research led, project funded, preservation) having a tool that identifies what is a strength, is unique or endangered
    6.2.2 Use Case 6: Subject search - Collection development and marketing
    Context
    Universities are under pressure to attract more students and high quality researchers. This often involves creating new courses and/or research areas and expanding the topics covered by existing courses or research. It also requires strong marketing. An example of the former would be creating a new department. An example of the latter would be a department which historically focuses on their topic in the context of Europe and the US, but in order to offer a competitive course they expand this to include Asia and the Pacific, or even the whole world. The library’s collections will often be weak in the new area and will need building up. Libraries also need to market what they already have more effectively.
  • This case study, based on the current version of the Copac tools, demonstrates both the need for further development and refinement of those tools, but also their potential for answering strategic questions about the status of the collections in individual libraries and the opportunity to deepen our understanding of the parameters around collection development in terms of:
    Overlap between the holdings of major UK research libraries in particular subject areas;
    Differences in that overlap between different subject disciplines and areas;
    The proportion of unique titles within those collections;
    The extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread severe deterioration of printed materials through brittle paper and collapsing bindings.
  • Relatively plain sailing ‘til now – followed the guidelines of Dave Pattern to extract the data and create the appropriate algorithms for the recommender
    This is pulling from the API (which is not public yet, but will be)
  • Building on the work of MOSAIC, the SALT project will focus on a different use case where the barriers encountered by MOSAIC can be overcome. Academic researchers in the humanities make up a vast proportion of the institutional library OPAC usage. Much of the work of these researchers is monograph-based, and recent findings from Mimas and RIN indicate that a high level of postgraduate research centres on the use of unique or rare items held across the UK. These are the collections that make up the ‘Long Tail’ of UK research library collections. SALT will test the ‘long tail’ hypothesis in relation to advanced academic users of long tail collections held in UK research libraries which hold some of the richest heritage collections in the world. We will investigate how issues of relevance and frequency of borrowing might shift within the particular use case of humanities research, where low level of borrowing of rare or niche items may not necessarily equate with lower relevance to end users whose search behaviour is typically centrifugal and exploratory. We will look at the relationship between key critical texts and commonly read humanities secondary or primary research monographs that might occupy the ‘head’ of frequently borrowed items and follow activity trails to explore the relevancy of long tail, lesser borrowed, lesser known niche items.
  • A lower threshold may throw up ‘long tail’ items, but they are unlikely to be deemed relevant or useful by users (although they might be seen as ‘interesting’ and something they might look into further). Set a threshold of ten or so, as the University of Huddersfield has, and the quality of recommendations is relatively sound.
    Concerns over anonymisation and data privacy are not remotely shared by the users we spoke to. While we might question this response as potentially naive, it does indicate that users trust libraries to handle their data in a way that protects them and also benefits them.
    You don’t necessarily need a significant backlog of data to make this work locally. Yes, we had ten years’ worth from JRUL, which turned out to be a vast amount of data to crunch. But interestingly, in our testing phases when we worked with only 5 weeks of data, the recommendations were remarkably good. Of course, whether this is true elsewhere depends on the nature and size of the institution. But it’s certainly worth investigating.
    If the API is to work on the shared service level, then we need more (but potentially not many more) representative libraries to aggregate data from in order to ensure that recommendations aren’t skewed to represent one institution’s holdings, course listings or niche research interests, and can support different use cases (i.e. learning and teaching).

Presentation Transcript

  • Copac: Reengineering the UK national academic union catalogue to serve the 21st Century researcher
    Redesign, collection analysis, recommendations
    Joy Palmer, Mimas
    University of Manchester
  • Key points
    Background & context of Copac
    Development in progress
    Strategic issues and directions
    R&D/Innovations Work
    Collections management project
    Surfacing the Academic Long Tail project
  • Aggregation of 50+ research & specialist libraries
    40 million records <
    Approx. 1 million search sessions per month
    Primary academic use case – locating long tail materials
    Primary workflow use case – cataloguing & ILL support
    Funded by JISC since 1996
    Sponsored by RLUK (and based on RLUK data +)
    In re-engineering process
    Expanding consistently to include specialist libraries
    Copac…
  • Others include….
    Imperial War Museum
    Chetham’s library
    Windsor Castle
    National Maritime Museum
    British Museum
    French Institute
    University of Exeter Special Collections
    The Women’s Library
    Institute of Education
    Royal Academy of Music
    Kew Royal Botanic Gardens
    Tate Gallery Library
    Natural History Museum
  • Half our users
    Advanced researchers
    Humanities-based
    Been with us a while…
    Looking for specific items
    More later…..
  • And the rest are mostly librarians
    Cataloguing Support
    Collections Mgt
    ILL Support
    Researcher Support
  • Current data model
  • [Diagram: Copac data flow, last updated 07/12/07. Labels recoverable from the slide:]
    Data sources: contributing libraries, RLUK MARC21 db, Nielsen Bookdata (ToC, reviews, summaries, URLs), live requests for book cover images, live circulation data, harvesting, contributing library catalogues
    Transports shown on the input side: ftp, http, Z39.50 & CGI
    Processing: pre-processing, deduplication & conversion to MODS XML, admin
    Copac database: currently OpenText LiveLink Discovery Server; consolidated records & single records plus local holdings; 50m merged records; c.1m sessions p/m
    Content: includes details of books, journal titles, proceedings, theses, maps & plans, print & recorded music, video & film, spoken word, electronic materials, articles, archives, etc.
    Related services: Google, database cross-search, OpenURL router, ESTC at the BL
    Access and interfaces: user interfaces, search updates, ILL/copy via users’ OpenURL server, social media, HTTP, COinS, RSS, OpenURL, Z39.50, SRU/SRW, M2M
    Users: open access; use by HE, FE, NHS, libraries, schools, general public
  • Development activity in progress
    New hardware (Oracle)
    Enhanced de-duplication
    Improved search (ranking, facets)
    ‘FRBR-ised’ record display
    Enhanced user interface
    Additional specialist libraries
    Graphic redesign
  • Strategic issues -- macro
    Changing technological landscape and user-expectations
    Death of the physical?
    ‘Good enough’ = just-in-time (not a specific item)
    eBook search and discovery challenges
    Integrated and cross-domain search
  • Strategic issues - micro
    Leadership and community role
    Identity and positioning
    Enabling infrastructure to support library workflows, or resource discovery service?
    Governance
    Collections policy (uniqueness vs. comprehensiveness?)
    Innovation vs. service delivery
  • Copac Collections Management Project
  • Can I release this book?
    How does my collection compare in strength to that of other UK libraries?
  • Project background
    Builds on the work of the White Rose Consortium
    Partners are Leeds, York and Sheffield Universities
    Funded by JISC as part of the Discovery initiative (making Copac data ‘work harder’).
    Sponsored & facilitated by RLUK
    7 months and limited in budget
  • How it works
    Web-based
    Identifies in which locations items/batches exist
    Search by ISBN, RLUK #, author, title, subject
    Batch search (comma delimited sets)
    Data visualisation of results
    Map views
    Graphs
    Record export in MODS, CSV
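
The batch search and graph steps listed above can be sketched in a few lines of Python. This is only an illustration, not the project's code: the `HOLDINGS` table, the identifiers in it, and the function names are all hypothetical, and a real tool would query Copac data rather than an in-memory dictionary.

```python
import csv
import io
from collections import Counter

# Hypothetical sample data: record identifier -> libraries holding it.
# A real tool would look these up in the Copac database instead.
HOLDINGS = {
    "9780140449136": ["Leeds", "York", "Sheffield"],
    "9780199535569": ["Leeds"],
    "9780521618748": ["York", "Sheffield"],
}


def parse_batch(text):
    """Split a comma-delimited string of identifiers, dropping blanks."""
    return [part.strip() for part in text.split(",") if part.strip()]


def find_holdings(identifiers, holdings=HOLDINGS):
    """Map each identifier to its holding libraries ([] if none are known)."""
    return {ident: list(holdings.get(ident, [])) for ident in identifiers}


def count_by_library(results):
    """Count how many of the searched records each library holds."""
    counts = Counter()
    for libraries in results.values():
        counts.update(libraries)
    return counts


def to_csv(counts):
    """Serialise the per-library counts as CSV text, sorted by library name."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["library", "records_held"])
    for library in sorted(counts):
        writer.writerow([library, counts[library]])
    return buffer.getvalue()
```

Calling `to_csv(count_by_library(find_holdings(parse_batch(text))))` on the contents of the search box would give the per-library counts that a graph view could plot.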
  • Exploratory and iterative in approach
    Development/testing cycle with partners trialling and providing structured feedback
  • Six use cases developed
    Identifying last copies among titles considered for withdrawal
    Identifying collection strengths
    Deciding whether to conserve a book
    Reviewing a collection at the shelves
    Prioritising a collection or items for digitisation
    Subject strengths – collection development and marketing/differentiation
  • Findings
    Need for further development and refinement of the tools (esp. duplication & user interface issues)
    Significant potential for answering strategic questions about the status of collections
  • Particularly
    Overlap between the holdings of major UK research libraries in particular subject areas;
    Differences in that overlap between different subject disciplines and areas;
    The proportion of unique titles within those collections;
    The extent to which researchers will find that they no longer have access to such a wide range of research materials in future years, given current pressures on space and the widespread severe deterioration of printed materials through brittle paper and collapsing bindings.
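
The overlap and uniqueness questions above reduce to set operations once each library's holdings are expressed as a set of title identifiers. The sketch below assumes that representation; the `COLLECTIONS` data and the function names are hypothetical, and real records would first need the deduplication the findings mention before identifiers could be compared reliably.

```python
from itertools import combinations

# Hypothetical holdings: library -> set of title identifiers in one subject area.
COLLECTIONS = {
    "Leeds": {"t1", "t2", "t3", "t4"},
    "York": {"t2", "t3", "t5"},
    "Sheffield": {"t3", "t6"},
}


def pairwise_overlap(collections):
    """Return the number of shared titles for every pair of libraries."""
    return {
        (a, b): len(collections[a] & collections[b])
        for a, b in combinations(sorted(collections), 2)
    }


def unique_titles(collections, library):
    """Titles held by `library` and by no other library in the sample."""
    others = set().union(
        *(titles for name, titles in collections.items() if name != library)
    )
    return collections[library] - others


def unique_proportion(collections, library):
    """Share of a library's titles that are unique within the sample."""
    held = collections[library]
    return len(unique_titles(collections, library)) / len(held) if held else 0.0
```

The same `unique_titles` check is one way to flag possible last copies among titles considered for withdrawal, with the caveat that uniqueness here only means unique within the libraries sampled.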
  • Next proposed steps
    Expansion of test libraries and resilience testing of the tool.
    Address evidence of scalability.
    Building collaborations and alliances with interested organisations pursuing complementary activity
    Addressing the development of a business model for a service beyond a pilot
    More targeted communications and dissemination of the activity
  • SALT
    Surfacing the Academic Long Tail
  • Hypothesis… Library circulation activity data can be used to support humanities research by surfacing underused ‘long tail’ library materials through search
  • And also… how sustainable would an API-based national shared service be?
    Can such a service support users and also library workflows such as collections management?
    RLUK, M25, Leeds University, Cambridge University, Sussex University.
  • John Rylands University Library:
    1.3 million bib records
    600,000 search sessions per month
    23% of records unique (cross checked against WorldCat)
    40,000 students
    10 years of circulation data
  • Why aren’t we there yet?
  • Where’s the business case?
    user demand
    benefits
    value
    sustainability
  • arts & humanities researchers borrow books…
  • Y-
    market research reveals these users as…
    Centrifugal searchers
    Berrypickers from various trails
    Quite isolated and prone to pitfalls
  • And increasingly they just don’t ask librarians…
    They ask their tutors and each other where to look…
  • Researchers are suspicious about UGC, especially ratings & reviews, but….
    they could see the immediate benefit of ‘tacit’ recommender functions….
  • What if?
    this represented a national aggregation of data gathered from the usage activity of these researchers, collected as they worked with a national aggregation of unique or rare research collections?
  • In humanities research it’s
    all the way
  • What can this mean?
    Surfacing and increasing usage of hidden collections (& demonstrating value)
    Providing new routes to discovery based on use and disciplinary contexts (not traditional classification).
    Powering ‘centrifugal searching’ and discovery through serendipity
    Enabling new, original research – academic excellence…
    • Targeted academic researchers
    • Examine relationship between relevance and frequency of borrowing
    • Does frequency of borrowing correlate to increased relevancy?
  • Focus groups and user testing
    3 focus groups (18 people)
    MA/PhD humanities students (mixed ages)
    How relevant/useful are the recommendations at first glance?
    Do any other recommendations look useful?
    Were you previously aware of these texts?
    How likely would you be to borrow the recommended item?
  • What are users saying?
    Recommendations are already key to them:
    Supervisors/Peers
    Amazon
    Bib citations
    Don’t accept recommendations blindly
    Serendipity important (but not to all)
  • What are users saying?
    Very supportive, but in practice found results too generalist, irrelevant, and sometimes bizarre!
    Lower ranked recommendations much better
    Did you find something you’d borrow? (yes!)
    Find something new? (mixed)
    Would you use it? (yes!)
    Useful for searching more widely
    More university data needed to improve results
  • Relation between key critical texts at the nose
    And the other stuff here 
  • Can we make the data work harder to solve other shared problems?
  • Issues for sustainability
    Is there a clear-cut case for a national shared service here?
    Data model:
    data out = easy
    data in = not so much
    Licensing & Attribution: collective ownership of a collective pot?
    Is proof of our hypothesis key to sustainability?
  • Key findings
    Lower thresholds will throw up ‘long tail’ items, but their relevance and usefulness are not evident (but what is The Long Tail?)
    Users aren’t concerned about data privacy
    This can be successful without a significant backlog of data
    A shared service needs to aggregate activity data from more libraries (but not many more)
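
The threshold discussed in these findings can be illustrated with a minimal "people who borrowed this also borrowed" sketch. It is not the SALT implementation: the `LOANS` data, the function names, and the default threshold of ten (taken from the Huddersfield figure mentioned in the speaker notes) are assumptions for illustration only.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical anonymised loans: (borrower_id, item_id) pairs.
LOANS = [
    ("u1", "A"), ("u1", "B"), ("u1", "C"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "A"), ("u3", "C"),
    ("u4", "A"), ("u4", "B"),
]


def co_borrow_counts(loans):
    """For each item, count the distinct borrowers who also borrowed each other item."""
    items_by_borrower = defaultdict(set)
    for borrower, item in loans:
        items_by_borrower[borrower].add(item)
    counts = defaultdict(Counter)
    for items in items_by_borrower.values():
        for a, b in combinations(sorted(items), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, threshold=10, limit=5):
    """Items co-borrowed with `item` by at least `threshold` borrowers, most frequent first."""
    candidates = [
        (other, n) for other, n in counts.get(item, {}).items() if n >= threshold
    ]
    # Sort by descending count, then by identifier so ties are deterministic.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in candidates[:limit]]
```

Lowering `threshold` lets rarely co-borrowed items through, which is how long tail material surfaces, at the cost of the weaker relevance the focus groups reported.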
  • Proposed next steps
    Aggregate more data
    Assess impact over time
    Gather requirements and costs for a shared service
    Establish more data extraction recipes
    Investigate utility for collections mgt further
    Investigate usefulness for teachers & supervisors
  • Thanks for listening