Boundless Opportunity
The Impact of Cloud-Based* Services for Libraries




                                                       Rachel L. Frick
                                 Director, Digital Library Federation
                      Council on Library and Information Resources

                                                    Ticer Summer School
                                                          August 21, 2012
Cloud-Based* Services

  • Not just technical infrastructure
      • Distributed Services
      • Collections
      • Expertise
Network Opportunities

  • Capacity to do more
  • Leverage local expertise
  • Amplify local excellence
Macrosolutions: towards convergence
           “Common to these efforts will be
           developing strong coalitions that bring
           together diverse institutions within a
           national framework; federating shared
           resources and interests, including
           collections, technology, and expertise;
           and creating a genuine, volitional
           dependency on other participating
           institutions for the provision of what was
           once a locally owned and managed
           asset. We are calling these
           collaborative projects macro solutions.”

               CLIR Annual Report, 2009-2010, p. 3
Collaboration Continuum
  • Common Interest
  • Common Values
  • Convergence

http://www.oclc.org/research/publications/library/2010/2010-09.pdf
High Risk / High Reward

  • Requires high trust threshold / risk tolerance
  • Dependence on others
  • Less control
Research Library at Web-scale

10,449,391 total volumes
5,516,747 book titles
272,663 serial titles
3,657,286,850 pages
468 terabytes

124 miles ≈ 199.6 kilometers
8,490 short tons (US) ≈ 7,702 metric tons
3,140,629 volumes (~30% of total) in the public domain
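The figures on this slide can be cross-checked with a few lines of arithmetic. This Python sketch uses the standard exact conversion factors: 1 international mile = 1.609344 km, and 1 US short ton = 0.90718474 metric tons. Treating "468 terabytes" as decimal terabytes (10^12 bytes) is an assumption, because the slide does not say which unit is meant.

```python
# Exact conversion factors: international mile and US short ton.
KM_PER_MILE = 1.609344
METRIC_TONS_PER_SHORT_TON = 0.90718474

total_volumes = 10_449_391
public_domain_volumes = 3_140_629
pages = 3_657_286_850
terabytes = 468

shelf_km = 124 * KM_PER_MILE
weight_metric_tons = 8_490 * METRIC_TONS_PER_SHORT_TON
public_domain_share = public_domain_volumes / total_volumes
pages_per_volume = pages / total_volumes
# Assumption: decimal terabytes, so 1 TB = 1,000,000 MB.
mb_per_volume = terabytes * 1_000_000 / total_volumes

print(f"Shelf length: {shelf_km:.1f} km")
print(f"Weight: {weight_metric_tons:,.0f} metric tons")
print(f"Public domain share: {public_domain_share:.1%}")
print(f"Pages per volume: {pages_per_volume:.0f}")
print(f"Storage per volume: {mb_per_volume:.1f} MB")
```

Run as written, the script reports the following:

  • about 199.6 km of shelving
  • about 7,702 metric tons
  • a public-domain share of about 30.1%
  • exactly 350 pages per volume
  • roughly 44.8 MB per volume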
Cloud Sourcing Library Collections

Managing Print in the Mass Digitized Library Environment
Constance Malpas, 2011

  • 1/3 of U.S. ARL content duplicated in HathiTrust
      • Shared Print Archiving / Collective Collections
      • Regional Print / Digital Archives
      • Service Centers

http://www.oclc.org/research/publications/library/2011/2011-01.pdf
Print Archiving: network scale

  • ReCAP - http://recap.princeton.edu/
  • WEST - http://www.cdlib.org/services/west/about/
  • ASERL / University of Florida: US Gov Docs
      • http://www.aserl.org/programs/gov-doc/
  • Maine Shared Print - http://www.maineinfonet.net/mscs/
  • Organizational Node: Center for Research Libraries
      • Print Archive Community Forum
        http://www.crl.edu/archiving-preservation/print-archives/forum
New Metrics
How do we –

  • Count collections?
  • Measure “quality”?
  • Reward high ratios of services and collections per budget dollar?
  • Rate trustworthiness?
  • Identify good collaborators / team players?

http://www.flickr.com/photos/blackcountrymuseums/4887803840
Pause for a moment

http://www.flickr.com/photos/hckyso/3870006964/
Networked Collections: not just books

  • Digitized Primary Resource Collections
      • Europeana - http://www.europeana.eu/portal/
      • Biodiversity Heritage Library - http://www.biodiversitylibrary.org/
  • Scholarly Communications
      • OA publications / IRs, disciplinary repositories
  • Research Data
      • DataONE - http://www.dataone.org/
      • OpenAIRE - http://www.openaire.eu/
Challenge of Data Collections

  • BIG DATA vs. small data
      • Data sharing, small science and institutional repositories. Melissa
        H. Cragin, Carole L. Palmer, Jacob R. Carlson, and Michael
        Witt. Philosophical Transactions of the Royal Society A 2010;
        368(1926): 4023-4038. doi:10.1098/rsta.2010.0165
  • Preservation services
      • Brief online interview with Sayeed Choudhury, JHU:
        http://youtu.be/oWw7Ifn1Xx8
  • Data post-production services:
      • Access, reuse, remix
Challenge of Data Collections

  • Researchers aligned with discipline, not institution
  • Restrictive campus IT policies
  • Inadequate network storage
  • Focus on publication, not curation
  • Data breach (privacy) a top concern
  • Library viewed as a dispensary of goods, not a data service partner

http://www.clir.org/pubs/reports/pub154
Data Preservation Communities

  • Professional organizations providing guidance
      • Digital Curation Centre - http://www.dcc.ac.uk/
      • Digital Preservation Coalition - http://www.dpconline.org/
      • National Digital Stewardship Alliance - http://www.digitalpreservation.gov/ndsa/index.html
      • Open Planets Foundation - http://www.openplanetsfoundation.org/
  • Centers that “bridge the gap”
      • Data to Insight Center - http://d2i.indiana.edu/
      • D2C2 - http://d2c2.lib.purdue.edu/
      • UC3 – California Digital Library - http://www.cdlib.org/services/uc3/
  • Networks that balance the load
      • TextGrid - http://www.textgrid.de/
      • DataONE - http://www.dataone.org/
      • Data Conservancy - http://dataconservancy.org/
Why prioritize data curation services?

  • Data are emerging as the research output of importance
      • Data papers, e.g., Ecological Society of America:
        http://esapubs.org/archive/archive_D.htm
      • Data citation: http://www.datacite.org/
      • Databib: http://databib.org/
  • Published journal articles will be less important
      • Metadata of the research data
      • Grave marker of research activity and version of dataset
What are the conversations on your campus?

  • How is the library positioning itself in your campus’ data ecology?
      • Active participant?
      • Research partner?
      • Passive – end of process?
  • How is your library connected to larger data communities?

http://www.flickr.com/photos/marcwathieu/2979581445/
Collections = DATA

  • Data sets are not just scientific and business tables or spreadsheets
  • Not just generated by satellites and sensors
  • Libraries (archives, museums): potential distributed data stores
Digital Collections: Libraries’ Big Data
Computational Research

  • Digital Humanities
      • Digging into Data Challenge
        http://www.diggingintodata.org/
      • CLIR publication: One Culture
        http://www.clir.org/pubs/reports/pub151/pub151.pdf
Case Study: Historic Newspapers
               • Chronicling America
                  • http://chroniclingamerica.loc.gov/

               • 5 million page images from historic
                 newspapers with OCR from
                 organizations in 25 states

               • ~ 4 million hits per day

               • Traditional research:
                   • SEARCHING for stories

               • Data research:
                  • MINING newspaper OCR for
                     trends across time periods and
                     geographic areas
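The contrast between searching and mining can be made concrete. Searching asks which pages mention a term. Mining counts how often the term appears across years and places.

This Python sketch assumes the OCR text has already been retrieved as (year, state, text) records. The sample records and the function names are hypothetical. This is not Chronicling America's own tooling or API.

```python
import re
from collections import Counter

# Hypothetical sample of OCR'd pages: (year, state, page text).
PAGES = [
    (1890, "Kansas", "Wheat prices fell as the railroad reached town."),
    (1890, "Nebraska", "The railroad company announced new freight rates."),
    (1891, "Kansas", "Railroad bonds were debated; the railroad expanded west."),
    (1891, "Texas", "Cattle drives continued despite the drought."),
]


def term_counts(records, term):
    """Count whole-word, case-insensitive matches of `term` per (year, state)."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for year, state, text in records:
        counts[(year, state)] += len(pattern.findall(text))
    return counts


def by_year(counts):
    """Collapse (year, state) counts into a per-year series for trend plots."""
    series = Counter()
    for (year, _state), n in counts.items():
        series[year] += n
    return dict(sorted(series.items()))


if __name__ == "__main__":
    counts = term_counts(PAGES, "railroad")
    print(by_year(counts))
    print(counts[(1891, "Kansas")])
```

Real OCR is noisy: a misread such as "rai1road" will not match. Digitized page counts also differ by year and state. Raw counts would therefore normally be normalized, for example per page, before trends are compared.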
Case Study: Historic Newspapers

http://www.stanford.edu/group/ruralwest/cgi-bin/drupal/visualizations/us_newspapers
Data Research Service Needs

  • To use collections as a whole, mining and organizing the information in novel and innovative ways
  • Algorithmic and visualization tools
  • Working with both the artifact and its data representation
Data Collection Services

  • Apart from scale, the ingest and inventory of such collections are basically understood.
  • How much ingest processing should be done with data collections, or collections that can be treated as data?
  • Before ingesting collections, do we process them to create a variety of derivatives that might be used in various forms of analysis?
  • Do we have sufficient infrastructure to support full discovery?
  • Do we load collections into analytical tools?
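The derivatives question above can be sketched as an ingest step with two parts. The step always records basic inventory and fixity data. It can optionally precompute an analysis-ready derivative.

This is a Python sketch under stated assumptions. The record layout, the `ingest` function, and the word-frequency derivative are hypothetical illustrations, not a description of any particular repository system. Only `hashlib.sha256` is a real standard-library call used for the checksum.

```python
import hashlib
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class IngestRecord:
    """Inventory entry for one object: fixity data plus optional derivatives."""
    identifier: str
    size_bytes: int
    sha256: str
    derivatives: dict = field(default_factory=dict)


def word_frequencies(text):
    """Example analysis-ready derivative: counts of lowercase alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def ingest(identifier, content, make_derivatives=True):
    """Record size and SHA-256 checksum; optionally precompute derivatives."""
    record = IngestRecord(
        identifier=identifier,
        size_bytes=len(content),
        sha256=hashlib.sha256(content).hexdigest(),
    )
    if make_derivatives:
        # Decode the bytes as UTF-8, replacing any undecodable bytes.
        text = content.decode("utf-8", errors="replace")
        record.derivatives["word_frequencies"] = word_frequencies(text)
    return record


if __name__ == "__main__":
    page = b"The railroad reached the town. The town grew."
    full = ingest("page-0001", page)
    lean = ingest("page-0001", page, make_derivatives=False)
    print(full.size_bytes, full.sha256[:12])
    print(full.derivatives["word_frequencies"].most_common(2))
    print(lean.derivatives)
```

Passing `make_derivatives=False` defers the derivative to later processing. The stored checksum can be recomputed at any time to verify fixity.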
Library Service Implications

  • Collections as “self-serve”
  • If we only provide access to data, do we limit it to the native format, or provide pre-processed or on-the-fly format transformation services for downloads?
  • Can we handle the download traffic?
  • Can our staff develop the expertise to guide researchers in using analytical tools?
  • Do we leave researchers to fend for themselves?
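The download question above is partly a choice between two designs: storing converted copies, or converting on request. Below is a Python sketch of the on-the-fly option. It assumes a hypothetical dataset held natively as CSV. `serve`, `TRANSFORMS`, and the sample data are illustrative names, not part of any real repository platform.

```python
import csv
import io
import json

# Hypothetical dataset kept by the library in its native format (CSV).
NATIVE_CSV = (
    "station,year,rainfall_mm\n"
    "A,2010,812\n"
    "A,2011,790\n"
    "B,2010,1034\n"
)


def csv_to_json(csv_text):
    """Convert CSV text to a JSON array of objects (all values stay strings)."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)


# Formats offered at download time; "csv" returns the native file untouched.
TRANSFORMS = {
    "csv": lambda text: text,
    "json": csv_to_json,
}


def serve(csv_text, requested_format):
    """Return the dataset in the requested format, converting on the fly."""
    transform = TRANSFORMS.get(requested_format)
    if transform is None:
        raise ValueError(f"unsupported format: {requested_format}")
    return transform(csv_text)


if __name__ == "__main__":
    print(serve(NATIVE_CSV, "json"))
```

Keeping only the native file avoids storing derivatives, but it spends computation on every request. That cost bears directly on the download-traffic question. Pre-processing makes the opposite trade.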
The De-centered Library
De-centered Networked Library

http://www.slideshare.net/yiibu/beyond-themobilewebbyyiibu/128
United by Brand
DPLA: Library as Platform
Constellation Model

http://s.socialinnovation.ca/files/constellation%20and%20open%20source%20article%20september08_osbr.pdf
New Librarianship

  • Honesty about the limits of re-tooling
  • Re-think the librarian’s role in research
  • Crucial leadership challenge
  • Priorities of traditional services
      • “Stop moving the books, okay?”
  • Back to Basics
      • Collections that are unique
      • REAL research support
      • Archiving, preservation, and access: distributed, but at scale
Get out of the comfort zone

  • Take the time to ask the hard questions
  • Consider the possibility of radical change
  • Are we deciding for today?
      • Or making the hard choice for tomorrow?
  • Are we network ready?

http://www.flickr.com/photos/iamthebestartist/203179552/
Being ready

  • Research environments (including library systems) with permeable borders
  • Advocacy value of “open data”
  • Facilitating information flow
  • Courage

http://www.clir.org/pubs/reports/pub154/pub154.pdf
Connected-ness

Bollen J, Van de Sompel H, Hagberg A, Bettencourt L, Chute R, et al. 2009 Clickstream Data Yields High-Resolution Maps of
Science. PLoS ONE 4(3): e4803. doi:10.1371/journal.pone.0004803
Action, Trust and Risk
Credits and Attribution

  • Ideas and contributions
      • Patricia Cruse, UC3 – California Digital Library
      • Lorcan Dempsey, OCLC
      • Josh Greenberg, Sloan Foundation
      • Leslie Johnston, Library of Congress
      • Gunter Waibel, Smithsonian Institution
      • Jon Voss, Historypin – We Are What We Do
      • Martin Kalfatovic – Smithsonian Institution Libraries / BHL
      • Charles Henry and my colleagues at CLIR


Editor's Notes

  • #3 Hopefully my talk today will serve as a bridge between the organizational and personnel concepts that we have talked about today and the more technical aspects that will be discussed in Module 2 tomorrow. Essentially, our world, both personal and professional, is becoming more driven by social networks. This demands a different way of working, both within our organizational confines and in how we work with others. Instead of organizing and linking in linear ways, with columns and rows, labels and categories, we have a flatter structure, where labels and relationships can shift, and there are more opportunities to engage and do more, but differently.
  • #4 Capacity to do more: but it is imperative to examine what work is no longer a priority. Driving interdisciplinary scholarship. More opportunities for individuals to work on projects outside their normal organizational confines. Opportunities for organizations to market what they do best, knowing which partnerships to engage in, and when, for maximum benefit.
  • #5 The President of CLIR, Charles Henry, has written much about macrosolutions, and I recommend his most recent article in the January issue of EDUCAUSE Review. Basically, macrosolutions are where institutions come together, share resources, and create solutions that produce convergence, or an integral dependency, to provide a service that was once locally owned. Paul Courant and John Wilkin of the University of Michigan refer to this as “above-campus” library services in an EDUCAUSE Review article from August 2010. Is this the same as outsourcing, or as some of the limited collaborative projects, like interlibrary loan? Although there are similar characteristics, macrosolutions are collaboration to the extreme: supersonic collaboration. Collaborations exist on a scale; Gunter Waibel discusses this in his OCLC report Collaboration Contexts: Framing Local, Group and Global Solutions.
  • #6 Deeper collaborations trend toward convergence, a transformative process that eventually will change behaviors, processes and organizational structures, and leads to a fundamental interconnectedness and interdependence among the partners. In transformative collaborations, participants find efficiencies that free up time and resources to focus on the things they do best. At the extreme end of the continuum, convergence in a specific area may turn into infrastructure: a service that is so deeply embedded into our everyday life that it becomes visible only when it breaks down. From Collaboration Contexts: Framing Local, Group and Global Solutions, Gunter Waibel.
  • #8 How many of you are familiar with HathiTrust? HathiTrust is a partnership of major research institutions building an immense digital preservation repository. The majority of the content comes from Google book scans, but other digital collections are represented. This project has achieved economies of scale for digital preservation and associated services, grown the pool of digital preservation expertise through real-world experience, and established trusted collaboration. As research libraries face financial pressures and weigh the relative value of print and digital volumes, this growing digital aggregation of research library content has the potential to support current local collection development decisions.
  • #9 In a recent OCLC Research report, Constance Malpas examines the feasibility of outsourcing the management of low-use print books held in academic libraries to shared service providers, including large-scale print repositories like ReCAP and digital repositories like HathiTrust. Based on a year-long study of data from New York University’s Bobst Library, HathiTrust, ReCAP, and WorldCat, the report concludes that there is sufficient material in HathiTrust to duplicate a portion of virtually any academic library in the United States. It also concludes that there is adequate duplication between HathiTrust and large-scale print storage facilities to enable a great number of academic libraries to reconsider their local print management operations. As of June 2010, the median rate of duplication between titles held by university libraries in the U.S. Association of Research Libraries (ARL) and the HathiTrust Digital Library exceeded 30%; that is to say, nearly a third of the content purchased by research-intensive libraries in the United States has already been digitized and is preserved in a shared digital repository. If the current growth trajectory of the HathiTrust Digital Library is sustained, more than 60% of the retrospective print collections held in ARL libraries are projected to be duplicated in HathiTrust by June 2014. This growth rate far exceeds average annual acquisitions in ARL libraries, suggesting that the digital replication of legacy collections will outpace the growth of new physical collections, enabling a transformation in traditional library operations, staffing and space requirements. For the median ARL library, the space involved is approximately 36,000 linear feet, or the equivalent of more than 45,000 assignable square feet (a conservative estimate).
    The total annual cost avoidance available today would amount to $500,000 to $2 million per ARL library (13,828,000 to 55,312,000 roubles), depending on the physical environment (e.g., open stacks on campus or high-density off-site storage) in which the titles would be managed locally. There are some obstacles to achieving this vision of a cloud-sourced library: only 30% of HathiTrust content is in the public domain; it requires a network of shared print services (the good news is that only a small number of print providers is needed to achieve 70% collection duplication); and there needs to be a service to manage access to this print repository network. Sources: http://www.oclc.org/research/publications/library/2011/2011-01.pdf and http://downloads.alcts.ala.org/ce/06062012_ACprecon_shared_collections_planning_slides.pdf. Benefits of shared print: increases preservation capacity; reinvests space; reduces the risk of loss of scarce and unique copies; shifts library resources to new services and materials; encourages greater access through digitization; increases support for scholarship through inter-institutional collaboration; and reduces the rate of unnecessarily duplicative print collection growth.
  • #16 These are library-focused, from the library perspective; I haven’t even gotten to the groups organized around data centers or HPC (high-performance computing).
  • #18 This is not the time to go it alone. For success in this area it is imperative to be connected to others, and it is complete folly not to share with others the load that this is going to take. Talk about DPN (the Digital Preservation Network) and other network efforts.
  • #20 We should be paying closer attention to the data curation conversation, as it is not just another siloed service in the library; it is the core service of our libraries. Data curation activities enable data discovery and retrieval, maintain data quality, add value, and provide for reuse over time.
  • #25 Our community used to expect researchers to come to us, ask us questions about our collections, and use our digital collections in our environment. Now researchers want their library resources presented as a platform…
  • #29 This is how Amazon is represented as a decentered model. Replace Amazon with your library, and look at the cloud as the “cloud library” envisioned by the OCLC report. By approaching library functions in this way, the library itself ceases to be a stand-alone island, a world unto itself. It transcends the idea of place and functions more like an ecosystem, enabling the freedom to experiment and respond proactively to user needs.
  • #30 In the decentered library model, collections and services are united not by a place or a website but by the library brand and its message. As a result, organizational borders become more permeable, and the “library” becomes more integrated into the information network.
  • #33 If our libraries are becoming flatter and more distributed, why are we still trying to organize in a linear fashion? The constellation model is an organizational model for solving the more challenging problems facing libraries, cultural heritage, and higher education. It is a faith-based model: trust that the people attracted to the need or opportunity are the best ones to solve the problem. This is the way to organize around information-based issues that require creative thinking, as these are large, overarching, international challenges that involve many stakeholders. It allows organizations and individuals to contribute to the process at the time and in the instance when they can do the most good and make the most impact. It is how the NDSA and the DPC are organized, and how the DPLA was organized in its planning stages. This is not an easy model, as it requires a lot of overhead and is risky, but ultimately it creates socially driven emergent communities capable of enacting global change.
  • #34 Leadership challenge – both top down and from where you are.
  • #35 Are we doing things, through hiring decisions, budget, and policy that keep us from fully taking advantage of the socially driven information ecology?
  • #36 Rufus Pollock of the Open Knowledge Foundation said at a conference that the best thing to do with your data will be done by someone else. This is not a bad thing; in fact, it is extraordinary. How are you and your library facilitating this? We are a remix culture: are you invested? Or do you abstain? Or worse, restrict? This is a golden moment of advocacy for libraries in the area of open. The CLIR report describes the challenges of data research, curation and reuse: restrictive research environments [insert reference]. Permeable: cross-discipline and cross-institutional research space. Courage in our skills, courage in our abilities, courage to trust our partners, and courage to expand our tolerance for risk.
  • #37 The connectedness of it all: it is all very fluid. In order to be thriving, relevant information organizations, or better yet, knowledge organizations, our libraries have to connect, collaborate, and converge, so that they can be vibrant hubs on the network. We need to be part of the research process and not passively wait for the end product to arrive on our loading docks and online through our subscription services. This is an unprecedented, boundless opportunity for libraries, limited only by us.