This presentation was provided by Kat Hagedorn of The University of Michigan, during the NISO event, "Library Resource Management Systems: New Challenges, New Opportunities," held October 8 - 9, 2009.
Hagedorn, "Seamless Sharing: NYU, HathiTrust, ReCAP and the Cloud Library"
1. KAT HAGEDORN
HATHITRUST SPECIAL PROJECTS COORDINATOR
UNIVERSITY OF MICHIGAN LIBRARIES
OCTOBER 9, 2009
Seamless Sharing: NYU, HathiTrust, ReCAP and the Cloud Library
With thanks to Constance Malpas at OCLC and John Wilkin at University of Michigan for their considerable contributions
2. Overview
Need for cloud library
Our pilot project
Brief overview of HathiTrust
Scope and process for pilot project
Expectations and benefits
3. Cloud Library, not cloud computing
Similar but vastly different
Necessity/desire to share resources
leverage shared investment, reduce local cost
Multiple digital and print repositories
Repositories can now move into a “cloud” that will become a shared network resource
What infrastructure needed?
5. Perceived need
Already good support of other “virtual” shared services, e.g., ILL, doc delivery
What exists in off-site storage and digital repositories that isn’t currently accessible?
Collection development mechanisms need to discover accessibility and preservation statuses
How should we build such a service for consumers?
6. Partners in pilot
NYU – model customer
Acute space pressures; major library renovation
Limited mandate to build local collection of record
ReCAP – model supplier
Large-scale shared academic storage collection
HathiTrust – model supplier
Large-scale shared digital repository
OCLC Research and CLIR – consultants & convener
7. Demand for services
Multiple, sometimes overlapping, reasons institutions will be interested in being part of a cloud library
preserving titles that are rare and/or special in some manner
removing titles that are duplicated across many institutions
added value of shared materials in digital repository (discovery, search)
contributing to a public good
8. A bit about HathiTrust
To contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge
materials converted from print
improve access …to meet the needs of the co-owning institutions
reliable and accessible electronic representations
coordinate shared storage strategies
“public good” …sustaining the historical record
simultaneously …centralized …open
10. Goals of pilot study
service expectations for both digital and print repositories
cost/benefit analyses for sharing resources
processes for discovery of shareable titles
not the build-out of technical solutions
11. [Diagram: overlapping NYU, ReCAP and HathiTrust collections. Labels as extracted: N=7.6M, ReCAP, N=3.8M, HathiTrust, N=2.3M]
Material that NYU can already source through existing ILL – enhance local collection
Material that NYU can obtain through HT dependent on copyright status – enhance ‘local’ collection
opportunities for institutional cooperation
shared policy frameworks
joint service agreements
increased operational efficiencies
Intersections
Material that NYU may choose to relegate with appropriate service level agreement
Material that NYU can relegate with a high degree of confidence
12. Process for discovery of overlap
Ingestion on a monthly basis
Checking of OCLC numbers (records without one can’t be processed) – use of xID to derive more
New data structure…
14. HathiTrust: Looking forward
Ingesting from 4 institutions (UC, Indiana, Wisconsin, Michigan), more to come
Moving from off-site storage scanning to main libraries
Result: slight changes in number of PD volumes
Change in membership …broader base of institutions for cost-sharing
Future contracts will mostly be picklists
Internet Archive ingest starts this winter/late fall
Completion of TRAC certification
15. Expectations
Service expectations for both HathiTrust and ReCAP
turnaround time
continuity of operations
access privileges
For ReCAP, agreements similar to current processes
With HathiTrust, all are par for the course
16. Partners in cloud library with HathiTrust
With HathiTrust as a service partner, institutions can reap the benefits of…
preservation of texts and metadata
longevity and perpetuity
trust and reliability
access to titles not held by library (comprehensive)
opportunity for voice in HathiTrust development
17. Outcome
Increased reliance on a network of collections and services with a robust underpinning of shared policy and service infrastructures that are jointly owned by participating libraries
Naturally, as number of participants grows, value of partnership increases…
We want people to perceive an analogy with cloud-sourcing core business services (storage, distribution) in the library environment. This is motivated partly by economic imperatives (the opportunity to leverage shared investment and reduce local cost) but also by a fundamental transformation in the way that library services are consumed in the network.
A system diagram. We already have reasonably good infrastructure to support ‘virtual’ shared collections via inter-lending and document delivery. What we lack is a view of the assets that are locked up in digital repositories and off-site storage. To make the cloud library work, we need to make the preservation status and availability of this content more visible for collection management decision making. We also need to understand the service expectations that consumers will bring to the table. The diagram shows the data flows and business processes that will need to be supported in a large-scale shared collections environment; this requires that we make print/digital repositories part of a shared library infrastructure (the dotted line around shared collections). But before we build that kind of infrastructure, we have to figure out what kinds of inter-institutional agreements will actually support a wide-scale shift to reliance on shared collections.
The 'Cloud Library' project is seeking to expand the scope and scale of shared collections by making digital repositories (Hathi, primarily) and aggregate storage inventory (ReCAP, MLAC, SRLF, etc.; the 1 billion books that we've already socked away in high-density facilities) part of the core service infrastructure. We've got a good, robust and dependable architecture for informal collection sharing via ILL. What we lack is a comparable set of social agreements and registry services for the large-scale print and digital repository collections.
Mass digitization has created an opportunity for libraries to rethink the function of locally held inventory. The combination of large-scale digital access (via subscription mechanisms, in the main), large-scale preservation repositories (Hathi), and the already extant 'latent' infrastructure in our large-scale storage collections provides the foundation for a new collections economy that can free up a tremendous amount of library resource.
Sharing via ILL: mechanisms for disclosing holdings and the policies that govern inter-lending practices.
Worth noting here that NYU is representative of a large cohort of academic institutions that are looking for ways to participate in a 'multi-institutional' model of library management (this is a useful hook for the people who've read the CLIR “No Brief Candle” report that Paul Courant helped write).
The scope of the CL project is really limited to the 'social/economic infrastructure' embodied in service agreements and not the build-out of technical solutions.
The intersection between NYU and HathiTrust holdings creates opportunities for changed management of physical inventory at NYU (and other institutions), as does the intersection between NYU and ReCAP holdings. The nature of the opportunity varies according to the relative ‘availability’ of content in digital or print form as well as local demand patterns.
The intersection of NYU/Hathi/ReCAP collections represents a particularly significant opportunity: titles in the public domain can be sourced from Hathi; titles in copyright can be discovered and searched in Hathi and delivered from ReCAP.
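The sourcing rule in this note can be written down as a small decision function. This is only an illustrative sketch: the function name, its boolean parameters and the returned strings are hypothetical, and titles outside the three-way intersection are deliberately left undecided because the note does not cover them.

```python
from typing import Optional


def sourcing_route(in_nyu: bool, in_hathi: bool, in_recap: bool,
                   public_domain: bool) -> Optional[str]:
    """Return a sourcing route for a title held by all three collections.

    Hypothetical helper: returns None for titles outside the
    NYU/Hathi/ReCAP intersection, which this note does not address.
    """
    if not (in_nyu and in_hathi and in_recap):
        return None
    if public_domain:
        # Public-domain titles can be sourced digitally from Hathi.
        return "source from Hathi"
    # In-copyright titles: discovery and search in Hathi, print from ReCAP.
    return "discover and search in Hathi; deliver from ReCAP"


print(sourcing_route(True, True, True, public_domain=True))
print(sourcing_route(True, True, True, public_domain=False))
```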
Note that ReCAP and Hathi collections are growing at a faster pace than NYU’s own holdings, accelerating the rate at which NYU can increase its reliance on the shared repositories. Hathi is adding several hundred thousand volumes per month. ReCAP is adding about 57K vols per month. NYU is probably adding under 500K volumes to its collections annually.
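To put the three growth rates on the same annual footing, the arithmetic can be sketched as follows. The ReCAP and NYU figures come from the note; the Hathi monthly figure is an assumed placeholder, since the note only says "several hundred thousand".

```python
# Figures from the note above.
RECAP_VOLS_PER_MONTH = 57_000
NYU_VOLS_PER_YEAR_MAX = 500_000   # "under 500K" is treated as an upper bound

# ASSUMPTION: placeholder value standing in for "several hundred thousand".
HATHI_VOLS_PER_MONTH = 300_000

recap_per_year = RECAP_VOLS_PER_MONTH * 12
hathi_per_year = HATHI_VOLS_PER_MONTH * 12

print(f"ReCAP per year: {recap_per_year:,}")
print(f"Hathi per year (assumed monthly rate): {hathi_per_year:,}")
print(f"ReCAP exceeds NYU upper bound: {recap_per_year > NYU_VOLS_PER_YEAR_MAX}")
```

On the note's own numbers, ReCAP's annualized growth (684,000 volumes) is already above the upper bound given for NYU.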
The opportunities for virtual enhancement of local collections are on the periphery.
A process is currently in place for harvesting, merging and analyzing Hathi and WorldCat data. Some steps take longer than others, depending on staff and computing resources.
Need about 2 weeks per cycle; Hathi data made available on monthly cycle.
Current schedule is to draw down HT data the first week of the month, with the aim of completing processing and indexing by mid-month. This gives OCLC about 10 days to identify patterns and discuss with partners before the next harvest.
In terms of process, OCLC harvests HT data, finds those with an OCLC number (critical info), incorporates ReCAP holdings, and the result is a determination of the overlap of the three "repositories." Having identifying numbers is critical because there is no other way to appropriately match across the repositories. OCLC numbers are best, but other numbers can be folded into the OCLC processing routines.
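The matching step described in this note can be sketched as set intersections keyed on OCLC number. This is a minimal illustration, not OCLC's actual processing routine: the `Record` type, the function names and the sample data are all hypothetical, and NYU holdings are assumed to arrive as a comparable list of records.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    local_id: str                # identifier inside the source repository
    oclc_number: Optional[str]   # None when the record carries no OCLC number


def matchable(records: List[Record]) -> set:
    """Return the OCLC numbers in a batch; records without one are skipped."""
    return {r.oclc_number for r in records if r.oclc_number}


def overlap_report(nyu: List[Record], hathi: List[Record],
                   recap: List[Record]) -> dict:
    """Compute pairwise and three-way overlap keyed on OCLC number."""
    nyu_ids, ht_ids, recap_ids = matchable(nyu), matchable(hathi), matchable(recap)
    return {
        "nyu_and_hathi": nyu_ids & ht_ids,
        "nyu_and_recap": nyu_ids & recap_ids,
        "all_three": nyu_ids & ht_ids & recap_ids,
        # Harvested records that cannot take part in matching at all.
        "hathi_without_oclc": sum(1 for r in hathi if not r.oclc_number),
    }


# Made-up sample data, for illustration only.
nyu = [Record("nyu-1", "100"), Record("nyu-2", "200"), Record("nyu-3", "300")]
hathi = [Record("ht-1", "100"), Record("ht-2", "200"), Record("ht-3", None)]
recap = [Record("rc-1", "200"), Record("rc-2", "300")]

report = overlap_report(nyu, hathi, recap)
print(sorted(report["all_three"]))
print(report["hathi_without_oclc"])
```

Counting the records that lack an OCLC number separately makes it visible how much of each harvest falls outside the comparison, which is where deriving further identifiers would help.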
All these are important to fulfill the needs of the cloud library.