Reshaping the Research Library: Some Observations on the Future of Academic Collections


Presentation from Future of the Research Library lecture series at University of Maryland, College Park. (2011-04-28)

  • With that as background, I’d like to offer a prediction about the future of shared print: our attention will begin to shift to pooled management of the retrospective print book collection. With this shift, I think we will see the emergence of a relatively small number of larger service hubs providing just-in-time delivery and long-term preservation services on a subscription basis. Individual academic libraries will contract with those service providers because they offer a cost-efficient alternative to local operations and, more importantly, because they allow the library to redirect its attention and resources to renovating its service portfolio. As a result, I think we will see a progressive rationalization of the systemwide print book collection. I believe mass digitization of retrospective print collections will be a primary driver in this transition, preceding a broader shift to commercial provisioning of e-books.
  • How big is this shift likely to be and on what timeline? Over the last year we have studied the mass digitized book corpus in the context of systemwide print holdings and have found that a substantial part of the average academic library collection is already duplicated in that corpus. This scatter chart provides a simple but effective visualization of an important pattern that this project has revealed: the risks and opportunities associated with moving collection management ‘into the cloud’ are uniformly distributed across the research library community as a whole. [CLICK] This is a picture of the ARL membership (a microcosm of the larger research library community) that shows the level of duplication between individual library collections and the mass digitized book collection in Hathi. Over the course of this project, we have seen the rate of duplication between locally held print and mass digitized books increase steadily and significantly. In June of last year, an average of 20% of monographic titles in an academic library were duplicated in the Hathi repository; today that figure is about 30% (up to 40% for some institutions). [CLICK] In real terms, this means that the rate of digital replication is exceeding the pace of growth in monographic acquisitions in most academic institutions. We estimate that the rate of duplication has increased by about 8% per library in the past year. Monographic acquisitions typically grow at about 2% per year in research libraries. The standard deviation is very low (~4%), and across the population there is very little movement outside this range: two-thirds of the ARL community falls within one standard deviation.
[CLICK] We project that in a year’s time, many academic libraries are liable to find themselves “underwater,” holding a massive inventory of over-valued assets. Library directors will be called to account and expected to respond to questions about how an increasingly redundant local print collection is serving the educational and research mission of the parent institution. We need to be preparing for a world in which just-in-time, print-on-demand delivery is an option for a large share of the retrospective book collection.
  • Another major finding of our study is that the mass digitized book corpus is substantially ‘backed up’ in one or more large-scale storage collections. As I mentioned earlier, we have a very incomplete picture of what’s currently in storage, so this figure may actually be quite a bit higher. The figures here are based on just 5 major repositories. The important point is that we seem to have the beginnings of what I characterized earlier as a ‘strategic reserve’ of print that could significantly offset the costs of local operations. As you can see here, the proportion has remained relatively stable over the course of the past year. As of this month, about 2.5 million of the 3.5 million digitized books in Hathi are also held in one or more of 5 large-scale shared print repositories.
  • This is a picture of how the potential value of individual print storage collections has evolved in the past year, as the mass digitized corpus in Hathi has grown. Currently, about 75% of the mass digitized book collection is ‘backed up’ in one or more of the 5 large print storage collections that we have examined. I want to say a little bit about what’s causing this increase in shared print coverage, from just over 60% a year ago to almost 75% today. Much of the change is associated with the increasing visibility of individual storage repositories (institutional disclosure) and with the increasing visibility of holdings in those repositories. A few key observations: collectively, a small number of shared print repositories provide substantial coverage of the mass digitized book collection. We don’t need many libraries to tuck away inventory to ‘back up’ the digitized resource, and it may be counterproductive to do so. If you had to pick a single surrogate print supplier, it would be LC, whose collections substantially duplicate the corpus of mass digitized books. But it is not obvious that LC can or should assume this role. Finally, and perhaps most importantly, the net increase in coverage that we have seen in the last 12 months is due in large part to the increasing visibility of storage holdings at these repositories. For example, the big bump we see in UC Regional Library Facilities holdings happened in October last year when the NRLF holdings in Richmond, CA became visible under a distinctive library symbol in WorldCat. The visibility of ReCAP holdings increased when we enriched the holdings data (which are external to WorldCat) with OCLC numbers. I’m not sure what happened to increase the visibility of the LC holdings, but I’d guess it has to do with a batch process in WorldCat.
I want to emphasize that without better and more comprehensive disclosure of storage holdings, it is very difficult to assess the carrying capacity of existing print preservation infrastructure.
  • As we look to the future, it is clear that the academic library environment as a whole is changing. Here I have plotted projections for the duplication of academic print collections in the HathiTrust Digital Library for a range of academic libraries in the state of Pennsylvania. The blue and violet lines at the top of the stack represent smaller academic institutions. We predict that 50% of their library holdings will be duplicated within the coming year. At research-intensive institutions, that watershed moment will occur somewhat later. At the largest research libraries, it may take another year or two before redundant print inventory begins to look less like an asset and more like a liability. But this change is coming, and we need to plan for it.
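The talk does not say how the projections were computed. As a rough illustration only, the sketch below assumes a simple linear extrapolation from two observed duplication rates; the function name `months_until_threshold` and the choice of a 12-month interval between the "20% last June" and "about 30% today" figures are assumptions made for this example, not the method used in the OCLC Research study.

```python
def months_until_threshold(rate_then, rate_now, months_between, threshold=0.50):
    """Estimate months until a library's duplication rate reaches `threshold`.

    Hypothetical helper: assumes the duplication rate keeps growing linearly
    at the pace observed between two snapshots. Rates are fractions (0.30 = 30%).
    """
    slope = (rate_now - rate_then) / months_between  # growth per month
    if slope <= 0:
        raise ValueError("duplication rate is not increasing")
    # Already at or past the threshold: no waiting time.
    return max(0.0, (threshold - rate_now) / slope)


# Average library in the talk: ~20% a year ago, ~30% today
# (12-month interval assumed).
average_case = months_until_threshold(0.20, 0.30, 12)

# A library already at 40% duplication, growing at the same assumed pace.
high_case = months_until_threshold(0.30, 0.40, 12)

print(f"average library: ~{average_case:.0f} months to 50%")
print(f"library at 40%:  ~{high_case:.0f} months to 50%")
```

Under these assumptions a library already near 40% reaches the 50% mark about a year sooner than the average library, which matches the talk's point that institutions share a trajectory but sit at different points on the timeline.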
    1. 1. Reorganizing the Research Library: a system-wide perspective. Carnegie Mellon University, 26 January 2011. Constance Malpas, Program Officer, OCLC Research
    2. 2. OCLC Research: what we do. Supports global cooperative by providing internal data and process analyses to inform enterprise service development (R&D) and deploying collective research capacity to deepen public understanding of the evolving library system. Special focus on libraries in research institutions: in US, libraries supporting doctoral-level education account for <20% of academic libraries; >70% of library spending. Changes in this sector impact library system as a whole; collective preservation and access goals, shared infrastructure, &c.
    3. 3. OCLC Research: who we are • ~45 FTE with offices in Ohio, California and the UK • Sponsored by OCLC and a partnership of research libraries around the world that share: • A strong motivation to effect system-wide change • A commitment to collaboration as a means of achieving collective gains • A desire to engage internationally • Senior management ready to provide leadership within the transnational research library community • Deep and rich collections and a mandate to make them accessible • The capacity and the will to contribute
    4. 4. Our collaborators. Then: • ARL set the tone; size matters • Collections of distinction • Doing the same, better • Change is possible. Now: • Nimble institutions, unburdened by legacy print mandate • Distinctive purpose • Transforming the portfolio • Change is imperative. A new coalition is needed to advance the research library agenda
    5. 5. OCLC Research: current portfolios
    6. 6. System-wide organization Research theme addresses “big picture” questions about the future of libraries in the network environment; implications for collections, services, institutions embedded in complex networks of collaboration, cooperation and exchange • Characterization of the aggregate library resource: collections, services, user behaviors, institutional profiles • Re-organization of individual libraries in network context: institutions adapting to changes in system-wide organization • Re-organization of the library system in network context: ‘multi-institutional’ library framework, collective adaptation
    7. 7. Defining characteristics of SO activities • Emphasis on analytic frameworks and heuristic models that characterize (academic) library service environment as a whole • Identifying and interpreting patterns in distribution, character, use and value of library resource; implications for future organization of collections and services • Provides context for decision-making, not prescriptive judgments about a single, best course of action • Shared understanding of how network environment is transforming library organization on micro and macro level
    8. 8. Exemplar: Re-organization of library system • Externalization of print repository function facilitates redirection of institutional resources; new scholarly record • Cloud Library analysis (OCLC, Hathi, NYU, ReCAP) • Case study in de-composition of library service bundle: “cloud sourcing” research collections • Data-mining Hathi and WorldCat to determine where cost-effective reductions in print inventory can be achieved for individual libraries (micro economic context) • Characterizing optimal service profile for shared print/digital service providers; collective market for service (macro economic context) • Exploring social and economic infrastructure requirements; technical infrastructure a separate, secondary challenge
    9. 9. Prediction Within the next 5-10 years, focus of shared print archiving and service provision will shift to monographic collections • large scale service hubs will provide low-cost print management on a subscription basis; • reducing local expenditure on print operations, releasing space for new uses and facilitating a redirection of library resources; • enabling rationalization of aggregate print collection and renovation of library service portfolio Mass digitization of retrospective print collections will drive this transition
    10. 10. A global change in the library environment. Academic print book collection already substantially duplicated in mass digitized book corpus. [Scatter chart: % of titles in local collection (y-axis, 0% to 60%) against rank in 2008 ARL Investment Index (x-axis, 0 to 120). June 2009 median duplication: 19%. June 2010 median duplication: 31%.]
    11. 11. Mass Digitized Books in Shared Repositories. ~75% of mass digitized corpus is ‘backed up’ in one or more shared print repositories. [Chart: unique titles (y-axis, 0 to 3,500,000) by month, Sep-09 to Jun-10. Series: mass digitized books in Hathi digital repository (~3.5M titles); mass digitized books in shared print repositories (~2.5M).]
    12. 12. Shared Print Service Provision: Capacity Varies. [Line chart: % of mass digitized corpus duplicated (y-axis, 0% to 80%) by month, Sep-09 to Jun-10. Series: union of 5 major shared print collections; Library of Congress; UC NRLF/SRLF; ReCAP; CRL.]
    13. 13. Carnegie Mellon University Library Collections. Optimizing print holdings . . . • ~700,000 CMU holdings in WorldCat (PMC). Cf. 1.2M vols.; are WorldCat holdings up to date? • ~240,000 titles held by CMU (PMC) replicated in mass-digitized book collection; ~16,000 (6%) in the public domain • >190,000 mass-digitized titles held by CMU also held by PSU. Shared print agreement feasible?
    14. 14. 35% of titles held in CMU Libraries are duplicated in the HathiTrust Digital Library. ~700K Carnegie Mellon University (PMC) holdings in WorldCat; ~243K duplicated in HathiTrust Digital Library (15,785 titles Full View; 227,729 titles Limited View). Represents ~$1M in annual operating costs. OCLC Research. Analysis based on HathiTrust and WorldCat snapshots. Data current as of December 2010.
    15. 15. System-wide print distribution of CMU-owned titles duplicated in HathiTrust Digital Library. 89% of titles represent very low preservation risk; suitable for withdrawal, shared print agreement? [Chart axis: decreasing preservation risk.] OCLC Research. Analysis based on HathiTrust and WorldCat snapshots. Data current as of December 2010.
    16. 16. Subject distribution of CMU-owned titles duplicated in HathiTrust Digital Library. Represents 2.8 miles of library shelving; <1000 feet if limited to public domain. Public domain… low risk, limited return. [Bar chart: titles / editions (x-axis, 0 to 60,000) by subject, split into public domain and in copyright. Subjects as listed on the chart: Communicable Diseases & Misc.; Unclassified; Health Facilities; Physical Education & Recreation; Medicine By Body System; Agriculture; Medicine; Anthropology; Preclinical Sciences; Medicine By Discipline; Geography & Earth Sciences; Biological Sciences; Law; Psychology; Health Professions & Public Health; Government Documents; Chemistry; Education; Performing Arts; Computer Science; Library Science; Mathematics; Philosophy & Religion; Political Science; Sociology; Physical Sciences; Music; Engineering & Technology; Business & Economics; Art & Architecture; History & Auxiliary Sciences; Language, Literature, Linguistics.] OCLC Research. Analysis based on HathiTrust and WorldCat snapshots. Data current as of December 2010.
    17. 17. Maximize benefit, minimize risk
        Columns: Risk level | Strategy | Titles (PD) | Titles (IC) | Linear feet (min) | Linear feet (max) | Offsite $ p/a (min) | Offsite $ p/a (max)
        Highest | Relegate based on Hathi | 227,729 | 15,785 | 14,233 | 15,220 | $195,847 | $209,422
        High | … Hathi & total WC holdings >24 | 15,302 | 225,687 | 956 | 15,062 | $13,160 | $207,251
        Moderate | … Penn State without agreement | 9,101 | 182,142 | 569 | 11,953 | $7,827 | $164,469
        Lower | … Penn State without agreement & holdings >24 | 9,073 | 182,026 | 567 | 11,944 | $7,803 | $164,345
        Low | … Penn State with service agreement | 9,101 | 182,142 | 569 | 11,953 | $7,827 | $164,469
    18. 18. Academic libraries in the Keystone State: a common trajectory, different timelines. The next few years are critical. [Projection chart; marked dates: Jul ’11, Nov ’11, Aug ’12, Aug ’13.] OCLC Research. Projection based on HathiTrust and WorldCat snapshot data, Jun 2009 – Dec 2010.
    19. 19. For discussion • What is the function of local print collection in long-term library strategy? • Is selective externalization of print management functions to Penn State or another potential provider an option? • Can faculty be persuaded that shared print strategy is sound? • How soon does change need to happen?
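The linear-feet and offsite-cost figures on slide 17 are not explained in the transcript. The sketch below shows one set of conversion factors that happens to reproduce several of the table's rows: 16 volumes per linear foot and $13.76 per linear foot per year. Both constants are inferred here from the published numbers and are assumptions for illustration, not values stated by OCLC Research.

```python
# Assumed conversion factors, inferred from the slide 17 figures.
VOLUMES_PER_LINEAR_FOOT = 16    # assumption: shelving density
OFFSITE_COST_PER_FOOT = 13.76   # assumption: USD per linear foot per year


def linear_feet(titles):
    """Shelf length freed if `titles` volumes leave the local stacks."""
    return titles / VOLUMES_PER_LINEAR_FOOT


def offsite_cost(titles):
    """Annual offsite storage cost (USD) for `titles` volumes."""
    return linear_feet(titles) * OFFSITE_COST_PER_FOOT


# "High" row, public-domain titles only: 15,302 titles.
pd_titles = 15_302
print(round(linear_feet(pd_titles)), round(offsite_cost(pd_titles)))

# "Moderate" row, public domain plus in copyright: 9,101 + 182,142 titles.
all_titles = 9_101 + 182_142
print(round(linear_feet(all_titles)), round(offsite_cost(all_titles)))
```

Read this way, each row's minimum corresponds to relegating only public-domain titles and its maximum to relegating public-domain and in-copyright titles together.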