Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Utilizing the Cloud to Empower Research Efforts
John “JG” Chirapurath, Senior Vice President and General Manager, ProQuest Workflow Solutions
Migrating CDL Infrastructure to Amazon Web Services
Kurt Ewoldsen, Manager, Infrastructure and Applications Support, California Digital Library, University of California
Surveying the Horizon: Preservation and the Cloud
Heather Lea Moulaison, Assistant Professor, The iSchool (School of Information Science & Learning Technologies), University of Missouri
Oct 14 NISO Webinar: Cloud and Web Services for Libraries
1. NISO Webinar:
Cloud and Web Services for Librarians
Wednesday, October 14, 2015
Presenters:
John “JG” Chirapurath,
Senior Vice President and General Manager, ProQuest Workflow Solutions
Kurt Ewoldsen,
Manager, Infrastructure and Applications Support, California Digital Library,
University of California
Heather Lea Moulaison,
Assistant Professor, The iSchool (School of Information Science & Learning
Technologies), University of Missouri
http://www.niso.org/news/events/2015/webinars/cloud_services/
2. Utilizing the Cloud to
Empower Research Efforts
John "JG" Chirapurath, Senior Vice President and
General Manager, ProQuest Workflow Solutions
4. Derived from US Department of Education, NCES Academic Libraries Survey, 1998-2008.
Electronic Resources are the
Majority of Content in Collections
[Chart: Academic Library Expenditures on Purchased and Licensed Content, 1998-2020 (figures from 2014 onward projected); plots the share of expenditures on print books and journals on a 0-90% axis, with a "You are here" marker at 2014 and the projected change shown through 2020.]
5. Cloud-based Systems Benefit
Libraries
• Cloud technologies help libraries avoid spending
limited budgets on IT expenses and system
management
– Reduction in hardware costs, decrease in IT support
time and expense, more easily and rapidly updated
systems
– Provides a full disaster recovery system that libraries
don’t have to administer
• Reallocate budget to additional premium content
that benefits researchers
• Use librarians’ time to share experience and
insights with researchers, which delivers better value
6. Cloud-based Systems Support
Scalability & Researchers’ Access Needs
• Cloud-based systems' inherent scalability makes it
easier to manage electronic resources as volumes of
electronic and digital content continue to expand
• Better enable collaboration between librarians
and researchers in multiple
locations
• Support research anytime,
anywhere, on any device
7. Research & Publishing Has Changed
• Emphasis on more frequent publishing in
academia
– Researchers want to share near real-time
insights gained from open access information
resources
• Focus on research as a competitive
advantage
• Trend of interdisciplinary research and
collaboration
8. Cloud-based Systems Improve
Discoverability & Collaboration
• Easier to find and share information and
resources – content discoverable more
rapidly
– Tools like the Summon® service make new
content discoverable more rapidly
• 2.5 billion+ records covering more than 90 different
content types and more than 10,000 providers
• Better supports multi-location collaboration &
access on any device
– RefWorks and other reference management
solutions make it easier for researchers to share
and collaborate
9. Next Generation LSPs Offer
Comprehensive Resource Management
• Supporting management of print & electronic
resources across the entire collection
– Unified workflows
• Assess, manage and track resources across
multiple location/geographic boundaries
• Ability to share e-resources and information for
databases and holdings across different
locations and types of libraries
• Better supports evolving role of librarians
10. And in the Future: Linked Data
• Cloud-based technologies are what make
Linked Data possible
• Allows libraries to expose relationships among
entities that users can easily follow
• Improves user navigation between
related resources, concepts, and
entities – more serendipitous
discovery
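To make this concrete, below is a minimal sketch (not taken from the presentation) of how a library might express such entity relationships as Linked Data using the rdflib Python library; all URIs, names, and vocabulary choices here are illustrative assumptions.

```python
# Minimal sketch of exposing relationships among entities as Linked Data,
# using the rdflib Python library. All URIs and names are hypothetical.
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("http://example.org/library/")
SCHEMA = Namespace("http://schema.org/")

g = Graph()
work = EX["work/123"]
author = EX["person/jane-doe"]
topic = EX["concept/digital-preservation"]

# Describe the entities and the links between them.
g.add((work, RDF.type, SCHEMA.CreativeWork))
g.add((work, SCHEMA.name, Literal("Preservation and the Cloud")))
g.add((work, SCHEMA.author, author))
g.add((work, SCHEMA.about, topic))
g.add((author, RDF.type, SCHEMA.Person))
g.add((author, SCHEMA.name, Literal("Jane Doe")))
g.add((topic, RDFS.label, Literal("Digital preservation")))

# Serialized as Turtle (or JSON-LD), a graph like this is what lets a
# discovery interface lead a user from a work to its author and on to
# related concepts.
print(g.serialize(format="turtle"))
```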
11. Library => Virtual Knowledge Center
• More diverse content types and growing body of
knowledge accessible through the library
• Researchers accessing information more fluidly
– More seamless collaboration across locations
– Faster insights leading to interdisciplinary
breakthroughs
• Librarians as value-added information
professionals collaborating with researchers at
later stages
13. Migrating CDL IT Infrastructure
to the AWS Cloud
Presented by:
Kurt Ewoldsen
Director of Infrastructure & Application Support
Kurt.Ewoldsen@ucop.edu
www.cdlib.org
14. The California Digital Library (CDL)
• Mission: The California Digital Library exists to support the
University of California community’s pursuit of scholarship and to
extend the University’s public service mission.
• Vision: The California Digital Library’s vision is to elevate the digital
library for UC so that it becomes "expansively global and deeply
local". CDL will advance the digital transition of scholarly
information in three spheres:
– Access: Scholars will have access to the highest quality research
collections worldwide through services that support and enable new
scholarship and make it as open as possible.
– Formats: CDL will support all digital formats throughout their life cycle
with a full range of services, especially to surface UC’s unique digital
assets and collections.
– Scale: Through partnerships and alliances, CDL will elevate services to
the network level for maximum impact.
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 14
15. CDL Infrastructure Timeline
Past: Environments across 2 data centers
Migration to VMware environment underway
~90 VMs & ~45 physical systems (Sun/Solaris)
~200TB of SAN storage
2 small AWS accounts for grant-related work
Present: Environments across 3 data centers
Final Sun/Solaris systems retired; migration to AWS underway
~30 VMs & ~200TB of SAN storage
AWS account with ~100 EC2 instances & ~50 RDS instances
Future: Environments in a single platform (AWS)
VMware environment retired; no more physical infrastructure, all
equipment decommissioned
AWS account with ~150 EC2 instances & ~60 RDS instances
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 15
16. CDL Services in the Cloud
Customer Services: UC Libraries web site, CDL web site, Calisphere,
eScholarship, EZID, Online Archive of California, Request (ILL),
UC Library Reprints, HathiTrust Zephir
Infrastructure Services: Nagios, Tripwire, LDAP, Puppet Enterprise,
FTPS/SFTP, NFS Home Server, Bastion Servers
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 16
17. Why use the Cloud?
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 17
Cost
Scalability
Agility
18. Think Best Value, Not Lowest Cost
• Business case for AWS migration not based on
decreased annual cost (although savings are
expected)
• AWS platform provides significant benefits for the
same spend: HA, DR, managed services (RDS),
and more
• Biggest cost savings is actually cost avoidance
– CDL was due to spend ~$750,000 to refresh
infrastructure equipment over the next 2 years; this is
now unnecessary
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 18
19. Vendor Management Cost Savings
• To support our physical infrastructure, we had
annual maintenance contracts with 10+
different vendors, totaling ~$60,000/year
• To support our AWS infrastructure, we have a
monthly maintenance fee with a single
vendor, with an annual cost of ~$30,000
• In addition to these direct cost savings, we
regain a significant amount of time formerly
spent on vendor/contract management
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 19
20. Cost Reporting
• In the typical shared or virtual environment, it is
difficult to determine the actual infrastructure
costs for any given service
• AWS has cost reporting down to a science, and I
now know in detail the infrastructure costs of
every service we support, and can share that
information with CDL managers and staff
• Good cost information leads to good decisions
regarding service development & deployment
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 20
22. Cost Reporting By Tags - Monthly
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 22
23. Cost Reporting By Tags - Daily
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 23
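For those who want to reproduce this kind of tag-based cost breakdown, below is a minimal sketch using boto3 and the AWS Cost Explorer API (which postdates this 2015 webinar); the tag key "Service", the date range, and the assumption that cost allocation tags have already been activated in the billing console are all illustrative.

```python
# Minimal sketch (not part of the original slides): pull one month's
# unblended cost grouped by a "Service" cost-allocation tag, using boto3
# and the AWS Cost Explorer API. Tag key and dates are illustrative.
import boto3

ce = boto3.client("ce")  # Cost Explorer

response = ce.get_cost_and_usage(
    TimePeriod={"Start": "2015-09-01", "End": "2015-10-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "TAG", "Key": "Service"}],
)

# Each group is one tag value (e.g. "Service$eScholarship") with its cost.
for result in response["ResultsByTime"]:
    for group in result["Groups"]:
        tag_value = group["Keys"][0]
        amount = group["Metrics"]["UnblendedCost"]["Amount"]
        print(f"{tag_value}: ${float(amount):,.2f}")
```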
24. Agility & Scalability
• Our annual budget cycle requires that we forecast
infrastructure needs 18 months in advance, sometimes
more
• However, we frequently get last minute requests for
extraordinary service: significant amounts of data to
archive or processing to complete
• In our legacy environment, these requests were almost
impossible to accept
• In AWS, we have virtually unlimited capacity (provided
someone is willing to pay for it!)
• Conversely, since we have no long-term investment in
infrastructure, we can also scale down to ensure our costs
decrease if service volume or use decreases
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 24
25. Cost > Innovation > Lower Cost
Application Profile: An existing service that receives an occasional data
upload that must be ingested and indexed for discovery.
Legacy Configuration: A single, large shared system running multiple
applications at the same time.
1st AWS Iteration: The application runs on a large instance, waiting for
data to be uploaded and then processing it. The instance mostly sits idle,
so this is not a very cost-effective solution.
2nd AWS Iteration: A watchdog process runs on a single, small instance
waiting for data to be uploaded. When data is detected, multiple large
instances are started to process the data, then shut down. This decreases
the processing duration and the cost at the same time.
3rd AWS Iteration: A watchdog process runs on a single, small reserved
instance waiting for data to be uploaded. When data is detected, multiple
large instances are purchased on the "spot" market and started to process
the data, then shut down. This further reduces costs.
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 25
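A rough sketch of the watchdog pattern from the 2nd and 3rd iterations is shown below, using boto3: a small instance polls an S3 prefix for new uploads and, when one arrives, requests Spot capacity to do the processing. The bucket, prefix, AMI, instance type, and worker count are hypothetical; the slides do not describe CDL's actual implementation.

```python
# Rough sketch of the "watchdog" pattern: a small instance polls S3 for new
# uploads and launches Spot instances to process them. Bucket, prefix, AMI,
# and instance type are hypothetical placeholders.
import time
import boto3

s3 = boto3.client("s3")
ec2 = boto3.client("ec2")

BUCKET, PREFIX = "example-ingest-bucket", "uploads/"
seen = set()

def launch_spot_workers(count=4):
    # Request interruptible Spot capacity only for the duration of the job.
    return ec2.request_spot_instances(
        InstanceCount=count,
        LaunchSpecification={
            "ImageId": "ami-12345678",      # hypothetical worker AMI
            "InstanceType": "c4.2xlarge",
        },
    )

while True:
    listing = s3.list_objects_v2(Bucket=BUCKET, Prefix=PREFIX)
    new_keys = [o["Key"] for o in listing.get("Contents", []) if o["Key"] not in seen]
    if new_keys:
        seen.update(new_keys)
        launch_spot_workers()   # workers ingest/index the data, then terminate
    time.sleep(60)              # poll once a minute
```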
26. Grant-funded Efforts
• AWS accounts are a good way to handle grant-
funded efforts that have to be delivered upon
completion (e.g. www.archivesspace.org)
• Simply create a separate AWS account, then
configure the environment and develop the
application
• Collaboration is easy, as you control access to the
environment
• At the end of the grant, simply sign the AWS
account over to the appropriate party (provide
the access keys and update the billing entity)
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 26
27. Challenges of the Cloud Migration
• Migrations are extra work
• Re-architecting applications for the new
platform is even more work
• The cloud provides more variable
performance
– between local and cloud environments
– between processing cycles in the cloud
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 27
28. More Challenges
• The rapid rate of change in AWS can make
staying current a challenge; in fact just
understanding all of the AWS services and
potential benefits is a difficult task
• Systems still hang, crash and reboot in the
cloud and because you don’t manage the
hardware directly, you have even less visibility
into the conditions behind these events
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 28
29. What’s Next
• Complete the migration (by July 2016)
• Move from On-demand to Reserved instances
• Move all appropriate services to an H/A architecture
• Create DR capability in an alternate region
• Evaluate OpsWorks to replace Puppet
• Provide performance & cost visibility to application
development teams
• CloudTrail notifications from CloudWatch
• Look at new databases (Aurora, Redshift, DynamoDB)
• Elastic Beanstalk pilot underway (IaaS > PaaS)
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 29
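As an illustration of the "CloudTrail notifications from CloudWatch" roadmap item, the sketch below assumes CloudTrail is already delivering events to a CloudWatch Logs group and uses boto3 to create a metric filter plus an alarm that notifies an SNS topic; the log group name, filter pattern, threshold, and topic ARN are all hypothetical.

```python
# Hypothetical sketch of driving notifications off CloudTrail activity:
# match console sign-in failures in a CloudTrail log group and raise an
# alarm that notifies an SNS topic. Names and ARNs are made up.
import boto3

logs = boto3.client("logs")
cloudwatch = boto3.client("cloudwatch")

logs.put_metric_filter(
    logGroupName="CloudTrail/DefaultLogGroup",          # hypothetical
    filterName="ConsoleSigninFailures",
    filterPattern='{ $.eventName = "ConsoleLogin" && $.errorMessage = "Failed authentication" }',
    metricTransformations=[{
        "metricName": "ConsoleSigninFailureCount",
        "metricNamespace": "CloudTrailMetrics",
        "metricValue": "1",
    }],
)

cloudwatch.put_metric_alarm(
    AlarmName="ConsoleSigninFailures",
    MetricName="ConsoleSigninFailureCount",
    Namespace="CloudTrailMetrics",
    Statistic="Sum",
    Period=300,
    EvaluationPeriods=1,
    Threshold=3,
    ComparisonOperator="GreaterThanOrEqualToThreshold",
    AlarmActions=["arn:aws:sns:us-west-2:123456789012:ops-alerts"],  # hypothetical
)
```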
30. Thanks
• Special thanks to my system administration
team and CDL application development
teams, who accepted my vision for a better
future, adopted it as their own, and worked
tirelessly to turn the plan into reality. The
success we have achieved to date is due solely
to their dedication and professionalism.
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 30
31. Surveying the Horizon:
Preservation and the Cloud
October 14, 2015 NISO Webinar:
Cloud and Web Services for Librarians
Heather Lea Moulaison
iSchool at the University of Missouri
32. Rationale
• Preserving digital information is becoming an ever
more important role of libraries and archives in our
digital society
• Cloud computing and, more specifically, cloud-based
data storage offers some potential to help address the
issues information professionals face
• Yet, not everyone agrees that cloud computing is the best
solution
• By examining the problem, reviewing the literature, and
assessing best practices, it is possible to understand
some of the issues that need to be considered
33. Agenda
• What is digital preservation and why is it
important?
• Definitions
• Digital preservation challenges and opportunities
• What is cloud computing?
• Definitions
• Cloud computing challenges and opportunities
• Digital preservation and cloud computing
• Examples and case studies
• Benefits and risks
• Discussion: best practices and strategies
34. What is Digital Preservation?
• “Digital preservation combines policies, strategies and
actions to ensure access to reformatted and born
digital content regardless of the challenges of media
failure and technological change. The goal of digital
preservation is the accurate rendering of authenticated
content over time” (ALCTS, 2007).
• A number of other definitions exist
• All basically require information professionals to ensure
digital content is accessible/usable over a long period of
time.
• To do this requires the right people with the right
technology who have the support and vision they need.
35. Digital Preservation: Challenges
and Opportunities
• Challenges
• The nature of digital files
• DP is more complex than simply backing up a file
• DP requires additional technical maintenance
along with safeguarding context
• Threats include
• Technological
• Meaning-related (if metadata is lost, the context is
not clear)
• Budgetary support
• Opportunities
• Continued relevancy in digital age
• Ensure access to digital content for future
generations
• Fulfill legal requirements
[Slide image: Brother word processor]
36. What is Cloud Computing?
• “A model for enabling convenient, on-demand network access
to a shared pool of configurable computing resources (e.g.
networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management
effort or service provider interaction” (Mell & Grance, 2011 p. 1).
• As with digital preservation, a number of definitions exist
• All basically assume that information is being stored remotely through
the internet.
• Examples of a cloud service provider and its services: Amazon
(http://www.amazon.com/); see the deposit sketch below
• Amazon Elastic Compute Cloud (EC2)
• Amazon Simple Storage Service (S3)
• Amazon Glacier
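To make these storage services concrete, here is a minimal, illustrative sketch (not from the presentation) of depositing a preservation master into S3 with an MD5 integrity check: S3 rejects the upload if the bytes it receives do not match the supplied checksum. The bucket and key names are hypothetical placeholders.

# Hypothetical sketch: deposit a preservation master into S3 with an MD5 check.
# Bucket name and key layout are placeholders, not any institution's real setup.
import base64
import hashlib
import boto3

s3 = boto3.client("s3")

def deposit(path, bucket="example-preservation-bucket"):
    with open(path, "rb") as f:
        data = f.read()
    md5_b64 = base64.b64encode(hashlib.md5(data).digest()).decode("ascii")
    # S3 verifies the received bytes against Content-MD5 and rejects a corrupted transfer.
    s3.put_object(
        Bucket=bucket,
        Key="masters/" + path,
        Body=data,
        ContentMD5=md5_b64,
    )
    return md5_b64   # record the checksum locally for later fixity checks

deposit("scan_0001.tif")

A bucket lifecycle rule could then transition older masters to Glacier for cheaper long-term storage.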
37. The NIST Cloud Computing Service
Models
http://www.servercloudcanada.com/2013/10/defining-the-cloud/
38. Cloud Computing: Challenges and
Opportunities
• Challenges
• Security in the cloud is a concern for many and can be perceived as
both a positive and a negative aspect of cloud computing
• Placing trust in a third party to store content can also be a positive
or a negative
• The uncertain future: it is possible, but not likely, that a cloud
computing company will go out of business, taking a library’s data
with it
• When data is stored in the cloud, the cloud provider has access to it
• Opportunities
• Perceived lowering of technical and financial barriers
• Increased flexibility to quickly meet increased demands, disaster
recovery, decreased software application and server maintenance,
decreased capital expenses, increased security, and greater
environmental friendliness (Salesforce, 2015)
• Geographic diversity of server locations helps!
39. Digital Preservation and Cloud
Computing: Turnkey Solutions
• Examples:
• Preservica Cloud Edition: SaaS model.
• DuraSpace’s DuraCloud: cloud storage from either Amazon or the
San Diego Supercomputer Center (Schumacher et al., 2014).
• Cloud computing is widely utilized in industry, but Srivastava
and Verma (2015) observe that the “Application of cloud
computing in libraries is a relatively new area as compared
to its applications in business and corporate sector” (p. 33).
• Some information professional points of view:
• There are at least some benefits to using cloud computing for
digital preservation
• Questions: Is cloud computing truly the best approach for digital
preservation? If so, what might be the caveats?
40. Research into Digital Preservation
and Cloud Computing
• National Digital Stewardship Alliance’s Infrastructure Working
Group survey (Bailey, 2012):
• 74% of the members had a strong preference for controlling their
own preservation storage systems because of cost concerns,
trustworthiness, legal issues and security and risk management.
• Researchers from the University of Cape Town (Poulo, Phiri, &
Suleman, 2014):
• showed that digital library applications in the cloud can provide
adequate response time and that the response time is not
significantly affected by complexity or collection sizes.
• But… is the elasticity the cloud provides worth the cost?
• “Cloud storage really is cheaper if your demand is spiky, but digital
preservation is the canonical base-load application” (Rosenthal,
2014)
• Miller (2014) points out, “some of the most compelling attributes of
the public cloud are best suited to ephemeral or (relatively!) short-
term use cases” (¶ 2).
41. Research on Costs
• Rosenthal and Vargas (2013) performed a study in
which they experimented with running a Lots of Copies
Keep Stuff Safe (LOCKSS) (http://www.lockss.org/) box
on Amazon’s EC2 cloud backed with Amazon S3 cloud
storage.
• Their study concluded “that current cloud storage services
are not cost-competitive with local hardware for long term
storage, including for LOCKSS boxes” (Rosenthal & Vargas,
2013, p. 107).
• Han (2015), after reviewing this and other studies, “believes
that the combination of big price drops and free data transfer
within the same data zone makes the cloud storage a very
attractive solution for long-term digital preservation,
especially using [Amazon] Glacier” (p. 266).
• Benefits of cloud computing (e.g. geographically-
diverse storage) mean it is not a one-to-one
comparison.
• As Fryer and Brown (2014) write, more study in this area is needed
to determine the economic reality of using cloud storage for digital
preservation (a rough cost-comparison sketch appears below).
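As a back-of-the-envelope way to frame this comparison (every rate below is a hypothetical placeholder, not a quoted AWS or hardware price), the question is essentially steady "base-load" storage cost plus any data-transfer-out fees versus amortized local hardware and staffing:

# Hypothetical cost framing; every rate below is a placeholder, not a quoted price.
def annual_cloud_cost(tb_stored, per_gb_month=0.01, egress_gb=0.0, per_gb_egress=0.09):
    # Steady "base-load" storage cost plus any data-transfer-out fees.
    return tb_stored * 1024 * per_gb_month * 12 + egress_gb * per_gb_egress

def annual_local_cost(tb_stored, hardware_per_tb=100.0, lifetime_years=4, overhead=2000.0):
    # Amortized hardware plus a rough staffing/power overhead.
    return tb_stored * hardware_per_tb / lifetime_years + overhead

for tb in (10, 100, 500):
    print(tb, "TB/yr:", round(annual_cloud_cost(tb)), "cloud vs", round(annual_local_cost(tb)), "local")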
42. Best Practices/Case Studies
• Central Connecticut State University, USA (Iglesias,
2011; Iglesias & Meesangnil, 2010) decided to use
Amazon S3 for storing their long-term archival
masters for digital preservation purposes.
• In evaluating their decision, they determined that the decision to
use Amazon S3 for storage was a very good one for them:
costs were low and there was no downtime in the first
year of use (Iglesias & Meesangnil, 2010).
• The Parliamentary Archives, Houses of Parliament,
London, UK, adopted third-party cloud storage to
use with their Preservica Enterprise digital
preservation system (Fryer & Brown, 2014). In order
to minimize risk, the Parliamentary Archives decided
to:
• Keep sensitive data local
• Store the bulk of their (publicly accessible) data with
two different cloud providers
43. A Graceful Exit
• An exit strategy is important to consider when
choosing to use a cloud computing service.
• Exit strategies are “often neglected because few want to
consider the demise of what is, at the moment, a
seemingly wonderful solution, being adopted and
implemented with great effort and expectation”
(Schaffer, 2014, p. 4).
• But… how does the library get its data back (Robinson,
2015)? (See the sketch below.)
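As a purely illustrative sketch of the mechanics behind getting data back out of an object store such as S3 (the bucket name and local destination are hypothetical; egress fees, timescales, and contract terms are the harder part of any real exit):

# Hypothetical exit sketch: copy every object out of an S3 bucket to local storage.
# Bucket name and destination are placeholders; egress fees, elapsed time, and
# contract terms are the harder part of any real exit.
import os
import boto3

s3 = boto3.client("s3")

def export_bucket(bucket="example-preservation-bucket", dest="./exit-copy"):
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket):
        for obj in page.get("Contents", []):
            target = os.path.join(dest, obj["Key"])
            os.makedirs(os.path.dirname(target), exist_ok=True)
            s3.download_file(bucket, obj["Key"], target)

export_bucket()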
44. Conclusion
• Digital preservation is not easy. Although there are many
technical threats to digital preservation, ultimately digital
preservation is not merely a technological challenge.
• Policies and procedures need to be in place to ensure ongoing
digital preservation.
• Cloud computing can be used successfully and
economically for digital preservation in the appropriate
situations.
• Platforms used for digital preservation should be routinely reviewed
and re-evaluated.
• Digital preservation, in the cloud or locally, is not something
that can be done once and then forgotten about.
• Information professionals need to review both their digital
preservation strategy and the applicability of cloud computing as
part of that strategy on an ongoing basis.
46. References
• ALCTS. (2007, 24 June). Definitions of Digital Preservation. Retrieved from http://www.ala.org/alcts/resources/preserv/defdigpres0408
• Anderson, M. (2011, 7 September). B is for Bit Preservation. The Signal. Retrieved from http://blogs.loc.gov/digitalpreservation/2011/09/b-is-for-bit-preservation/
• Corrado, E. M., & Moulaison, H. L. (2015, August 12). Digital preservation and the cloud: Challenges and opportunities. IFLA 2015 Pre-Conference Satellite Meeting Preservation &
Conservation Section, Durban, South Africa, 12-13 August, 2015.
• Fryer, C., & Brown, A. (2014). Case study: Archives in the cloud: Challenges and opportunities. In B. Endicott-Popovsky (Ed.), International Conference on Cloud Security Management
ICCSM-2014: ICCSM2014. London, UK.
• Han, Y. (2015). Cloud storage for digital preservation: Optimal uses of Amazon S3 and Glacier. Library Hi Tech, 33(2), 261-271.
• Iglesias, E. (2011). Using Windows Home Server and Amazon S3 to back up high-resolution digital objects to the cloud. In E. M. Corrado & H. L. Moulaison (Eds.), Getting started with
cloud computing (pp. 143-151). New York: Neal-Schuman.
• Iglesias, E., & Meesangnil, W. (2010). Amazon S3 in digital preservation in a mid-sized academic library: A case study of CCSU ERIS digital archive system. The Code4Lib Journal, 12.
Retrieved from: http://journal.code4lib.org/articles/4468
• Mell, P., & Grance, T. (2011, September). The NIST definition of cloud computing, Special Publication 800-145, National Institute of Standards and Technology. Retrieved from:
http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
• Poulo, L., Phiri, L., & Suleman, H. (2014). Fine-grained scalability of digital library services in the cloud. In J. P. van Deventer, M. C. Matthee, H. Gelderblom, and A. Gerber
(Eds.), Proceedings of the Southern African Institute for Computer Scientist and Information Technologists Annual Conference 2014 on SAICSIT 2014 Empowered by
Technology (SAICSIT '14) (pp. 157-165), ACM, New York, NY, USA. http://doi.acm.org/10.1145/2664591.2664611
• Rackspace Support (2013, October 22). Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS. Retrieved from
https://www.rackspace.com/knowledge_center/whitepaper/understanding-the-cloud-computing-stack-saas-paas-iaas
• Robinson, J. D. (2015). The dogs bark and the circus moves on. The Bottom Line: Managing Library Finances, 28(1/2), 7-18. http://dx.doi.org/10.1108/BL-01-2015-0002
• Rosenthal, D. H. S. & Vargas, D. L. (2013). Distributed digital preservation in the cloud, The International Journal of Digital Curation, 8(1). http://dx.doi.org/10.2218/ijdc.v8i1.248
• Rosenthal, D. H. S. (2014). Talk "Costs: Why do we care?" [blog post]. Retrieved from http://blog.dshr.org/2014/11/talk-costs-why-do-we-care.html
• Ross, S. (2007). Digital preservation, archival science and methodological foundations for digital libraries: Keynote address at the 11th European Conference on Digital Libraries
(ECDL), Budapest (17 September 2007). Retrieved from http://glasgowsciencefestival.org.uk/media/media_113621_en.pdf
• Schaffer, H. (2014, March/April). Will you ever need an exit strategy? IT Pro. Retrieved from http://www.computer.org/csdl/mags/it/2014/02/mit2014020004.pdf
• Stark, L., & Tierney, M. (2014). Lockbox: Mobility, privacy and values in cloud storage. Ethics and Information Technology, 16(1), pp 1-13.
47. NISO Webinar • October 14, 2015
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2015/webinars/cloud_services/
October 14 NISO Webinar
Cloud and Web Services for Librarians
48. Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU
Editor's Notes
Volume and content type is growing as well as the number of locations that shared content might be in. This dramatically increases storage requirements and accessibility needs.
Modern users are demanding access to content in different ways and this has resulted in a shift in the collections libraries are building. This is data from the National Center for Education Statistics and it illustrates the shift from print to electronic in collections. You can see that we have reached the “tipping point” where a small percent of collections budgets are spent on print. Your print collection is still important, especially as a research collection, but electronic outweighs print in terms of time and priorities for support. You need to focus on your future and transform how you manage your collections.
Cloud-based systems offer expense and time savings
Cloud-based systems are designed to be more scalable for improved use, management and storage of data
Not only is content more diverse and the volume being managed by librarians growing, and the role and pace of research changing, the research process has evolved too.
Discovery services – large content volume to be searched, need for continual updates & updated indexing (AWS)
Improved processing for more frequent content indexing and metadata management
Summon is the largest unified index in the industry.
Open access information resources being ingested into cloud-based discovery systems help corporate researchers gain access to more current research.
Older technological systems are based on inventory and management of print resources and aren’t built to support the volume of electronic and digital content – opportunity for unified workflow
- Cloud systems better manage the rapid growth and complexity of content
Facilitates exposure of library data on the Web
Simplifies the processes associated with describing resources…and people, concepts, places, etc.
Simplifies management through new models for authority control
Reduces level of effort associated with traditional catalog management
Linked Data effort re-envisions a new bibliographic environment for libraries that makes the “network” central and interconnectedness commonplace
Makes library information accessible in the places where users are working = everywhere
Lowers metadata creation and maintenance costs
Librarians bring their unique expertise to the work of identifying and establishing more relationships between and among resources
Cloud based library systems better support the model of the library as a virtual knowledge center
Hello, my name is Kurt Ewoldsen and I am the Director of Infrastructure and Application Support for the CDL (which is just a fancy way of saying I am the IT manager here) and I am going to share a little about how we are moving the entire infrastructure supporting CDL services into the AWS cloud.
The California Digital Library supplies system-wide services to the University of California
Interesting Fact: The Department of Library Automation, which was the predecessor to the CDL, brought the first system-wide data network to UC, to support the Melvyl online library catalog, in 1982
Over the past several years, we have migrated our infrastructure from physical Sun/Solaris systems to Vmware VMs to AWS instances. That is an evolution through 3 different technology paradigms in a very short period of time.
We are a little more than 75% complete with our migration to AWS and have a number of customer-facing services in production on that platform, as well as all of the infrastructure services required to support our application development and deployment environments.
There were a number of factors that led us to move to the AWS cloud, but today I will focus on these three.
There are certainly less expensive hosting providers than AWS, but none that have the level of innovation and breadth of services that Amazon provides. The significant benefits come once you start fully leveraging the AWS ecosystem and all of the services that are available.
Every year at budget time, I would have to contact all of our support vendors and request a quote for the annual maintenance renewal.
Depending on how organized the vendor was, this could turn into a drawn out process: my contact from last year may have changed roles or left the company, they may have changed the way they assign contracts, etc.
I would use those estimates as part of my budget forecast.
Later in the year, when the actual renewal date approached, I would have to contact them all again.
First to renew the quotes, which are generally only good for 30 days, and then to work through the actual renewal process:
Generating the purchase request, ensuring the PO was created and sent to the vendor, receiving the invoice,
making sure the invoice was paid, and finally verifying that the vendor support portal or customer management system reflected the new expiration date for our account.
This is a lot of effort that does not provide direct benefit to users of our services, so I am happy to reduce the amount of my time dedicated to this activity.
When you have services sharing physical servers or VMs sharing physical hosts, it can be difficult to calculate the actual cost of the infrastructure dedicated to a particular service.
There are different types of costs to consider (capital and operational) and putting a price tag on virtual CPUs or memory assigned to a VM is challenging.
Because it is at the core of their business, AWS has cost reporting down to a science, and it is easy for this cost information to be shared within the organization.
We can track our spend on a daily basis, if we so desire, and break the costs down in a number of ways. This shows a breakdown of costs by AWS service type.
These are the monthly costs for our eScholarship service
These are the daily costs for our eScholarship service
This is one example of how an understanding of service costs led to several cycles of application architecture modifications that resulted in performance improvements and lowered costs.
AWS accounts are a good way to handle grant-funded efforts that have to be turned over at completion.
Instead of worrying about how to re-create the environment at another location when the grant ends, simply create a separate AWS account and perform the work in it. At the end of the grant, simply turn over the account credentials and change the billing information to the new owner.
We have done this with ArchivesSpace
Infrastructure migrations take time away from application development or improvement. While the long-term benefits are compelling, the short-term impact is real and can lead to delayed release of new services or features.
Modifying your applications to take advantage of the AWS platform and see the benefits of their H/A capabilities and other services takes even more effort, which takes even more time away from application development or improvement.
Performance in AWS is variable on all levels. Processing cycles will vary between legacy performance and performance in AWS. Even cycles in AWS can vary significantly between runs; a job that usually completes in 6 hours can occasionally take twice as long, for no apparent reason.
The rate of innovation at AWS is both a blessing and a curse. It is difficult to keep up with the rate of change within the AWS environment, and with the new products and services that are continually introduced. Just last week at their annual user conference, AWS introduced more than 20 new services or significant enhancements to existing services. We are using only 11 of their 50+ services at this time.
Using the cloud just means that you are using someone else’s equipment. Technology will always have problems, no matter who is managing it, so you will still experience system problems while using the cloud, and you will need to plan your cloud deployment to accommodate these issues.
This list was created in August, and is already out of date (remember what I said about the rate of change in the AWS environment)
One of the things we are interested in pursuing is use of Elastic Beanstalk or the AWS Container Service, to move us from IaaS to PaaS
I’m using a photo from my summer vacation here -- as the leaves begin to change, I’m keen on preserving the memory of summer travels!
The title: Surveying the horizon – not a completely nuts-and-bolts approach, but enough to get you thinking, I hope.
Before I go too far, acknowledge co-author Edward M. Corrado, Associate Dean for Technology at the Univ of Alabama Lib -- without whom this presentation wouldn’t have been possible.
Getting into this – in terms of the rationale…
Preserving digital information is becoming an ever more important part of what info pros do. Digital preservation is an economical, managerial, and technological challenge. So is cloud computing, in many ways.
Cloud computing and, more specifically, cloud-based data storage, offers some potential to help address issues faced – yet there is little in the way of out-of-the box guidance and help for preserving digital content in the cloud. Additionally, it’s not entirely clear that cloud computing is digital preservation’s best bet. [next]
[Again, this is not meant to be a workshop-like presentation, but more of an overview of points to consider and possible approaches when thinking about digital preservation and the cloud.] [next]
In terms of what we’ll cover in the next 15 or 20 minutes –
We’ll begin by talking about digital preservation – there are a lot of definitions out there, so I’ll explain what I mean when I say digital preservation and I’ll mention some of the tricky aspects of digital preservation to get us started.
Next, we’ll do the same thing for cloud computing.
I’d like to then explore the two topics together, and finish by talking about some implications. [next]
There are a number of definitions of digital preservation but they all point to the need to ensure that digital content is accessible over some period of time.
One definition of digital preservation prepared by the American Library Association’s Association for Library Collections and Technical Services (ALCTS) states that “Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time” (ALCTS, 2007).
Sometimes we hear information professionals and researchers talk about “digital curation” – their definition probably overlaps quite well with the spirit of these definitions of “digital preservation”
[I mentioned a few slides ago that digital preservation is an economical, managerial, and technological challenge – this is because planning and oversight are a huge part of making digital content available into the future. What do I mean?] [next]
What do I mean?
In terms of the digital preservation challenges, it is important to remember that “backup, alone, does not serve as an appropriate solution to digital archiving” (Payette, 2008).
Digital preservation is not just safeguarding the bits.
When something goes wrong and meaning or content is lost, we cannot say a digital object has been preserved. Part of the problem is the nature of digital objects. Digital objects, unlike many physical objects archived in libraries and archives, are relatively fragile. Technological threats can include physical deterioration of the storage medium used to store the digital object, being unable to read the file because the hardware and/or software is no longer supported or accessible, and loss of the software programs that can interpret the digital object (Waugh, Wilkinson, Hills, & Dell’oro, 2000).
*Nothing illustrates a point quite like an embarrassing story from personal experience – I will share with you that in the late 1980s/early 1990s I bought a word processor that was essentially an electric typewriter with a stand-alone monitor and the ability to save files to a diskette. Frankly, I couldn’t believe I actually found a photo of it online – so here it is; this is what it looked like. In any event, I held on to all those diskettes for a long, long time after the machine was gone – I finally threw them away about 10 or 15 years ago after giving up on ever getting the contents formatted in a way that could be accessed – right about the time that I acknowledged that the stuff I was saving on them as a freshman in college probably wasn’t worth saving anyway.*
There are other threats to long-term digital preservation besides technical failure, including budgetary ones. Digital preservation can be expensive and the benefits of digital preservation may not be immediately apparent to administrators since the benefits are mostly in the future.
And in terms of the opportunities:
But, the benefits are amazing when these challenges can be overcome! Society can use content into the future that otherwise would have been lost. This gives information professionals yet another way to be relevant in the information age. Depending on the kind of institution, saving necessary digital content can help fulfil legal requirements, or having explicit metadata explain future use scenarios has the potential to respect the wishes of donors or to better honor intellectual property rights for content, depending on the situation. [next]
What is Cloud Computing
The United States’ National Institute of Standards and Technology (NIST) defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell & Grance, 2011 p. 1).
There is more to it than this, though. The NIST report from which this definition is drawn goes on to explain that there are three service models, five essential characteristics including elasticity, and four deployment models involved with cloud computing.
Although this quite specific NIST understanding of cloud computing is useful for technologists, it might be more useful for non-IT information professionals to think of cloud computing as “library data and services hosted beyond the library’s walls and accessible via the web” (Corrado & Moulaison, 2012).
The online sales giant Amazon (http://www.amazon.com/) is one example of a cloud computing services provider, offering services such as the Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Amazon Glacier.
This is a pretty straightforward graphic showing the kinds of products involved with each service model.
In the last slide, I mentioned three service models as defined by NIST.
One of the most common cloud computing service models used by libraries is SaaS (pronounced “sass”), where software is used remotely through the cloud. In libraries, SpringShare’s products, including LibGuides, LibAnswers, and LibCal, are popular SaaS offerings.
PaaS is another of the service models set forth in the NIST definition. PaaS “is the set of tools and services designed to make coding and deploying [cloud-based] applications quick and efficient” (Rackspace Support, 2013). Libraries with their own software developers may make use of PaaS offerings such as Google App Engine, Heroku, and Microsoft Azure Services. In these cases, the library’s IT staff will develop software and then load content onto an already-robust technology platform that is running in a specific computing environment.
The third cloud computing service model is Infrastructure as a Service (IaaS) – “servers, storage and networking— on demand, in a pay-as-you-go model” (IBM, 2015). With IaaS, it is not necessary to make a significant up-front investment in computing hardware.
Cloud challenges:
As far as challenges go -- security in the cloud is a concern for many and can be perceived as both a positive and a negative aspect of cloud computing. Since systems administrators at the local institution don’t have as much control of security as they would with locally hosted applications, this concern is a valid one. However, since cloud services are typically provided by large organizations that have the resources and incentives to invest in security professionals, the cloud may provide better security than locally hosted options, especially when the library does not have a full complement of IT professionals (Srivastava & Verma, 2015). Additionally, the issue of placing trust in a third party to store content can also be a positive or a negative. Although it is possible that a large company’s servers might go off-line, it is probably less likely that a large company’s servers will go off-line than a single library’s.
It is also not likely that a cloud computing company will go out of business, taking a library’s data with it, but again, it is possible. Other potentially negative aspects of cloud computing that libraries need to consider are the privacy of personal information and the protection of content. When data is stored in the cloud, the cloud provider has access to it. As a result, there may be various legal and organizational policy implications.
Dark archives where content cannot be shared with users without permission must be sensitive to potential issues of privacy and security of content. Additionally, some governments do not permit certain data to be stored outside of their country or in specific countries due to the legislation under which the cloud computing providers are required to operate.
Opportunities:
Cloud computing has been touted as a solution for many technological challenges because of its primary perceived benefit: lowering technical and financial barriers. Other benefits of cloud computing include increased flexibility to quickly meet increased demands, disaster recovery, decreased software application and server maintenance, decreased capital expenses, increased security, and … cloud computing is more environmentally friendly (Salesforce, 2015).
These benefits can also apply when hosting digital library content in the cloud.
In particular, cloud-based storage permits on-demand provisioning and, because of its geographic diversity with servers located in different geographical regions, natural disasters such as earthquakes or floods might affect one set of servers where the information is housed, but not the other sets of servers located on the other side of the world. Keeping copies of data in multiple geographic areas, something cloud-based storage is designed to do, aligns well with digital preservation best practices of maintaining between two and six copies of digital content (Anderson, 2011).
Some turnkey solutions already exist for carrying out digital preservation in the cloud. Services such as Preservica Cloud Edition provide cloud-based digital preservation services using the SaaS model. DuraSpace also provides preservation services via their DuraCloud offering that utilizes cloud storage from either Amazon or the San Diego Supercomputer Center (Schumacher et al., 2014).
Cloud computing is widely utilized in business, industry, and the corporate sector, but is somewhat new in libraries. It is newer still when considering its use for digital preservation in libraries.
Many libraries see that there are benefits to using cloud computing for digital preservation, but there is not universal agreement that cloud computing is the best approach to use for digital preservation -- and the research at present is divided. [next]
Although cloud-based preservation has the many benefits of cloud computing mentioned earlier, there are some potential drawbacks. For instance, a survey of the National Digital Stewardship Alliance’s Infrastructure Working Group showed that 74% of the members had a strong preference for controlling their own preservation storage systems because of cost concerns, trustworthiness, legal issues, and security and risk management (Bailey, 2012).
Researchers from the University of Cape Town conducted a set of experiments to investigate “the scalability of typical digital library services that use cloud computing facilities for core processing and storage” (Poulo, Phiri, & Suleman, 2014, p. 157). Their experiments showed that digital library applications in the cloud can provide adequate response time and that the response time is not significantly affected by complexity or collection sizes.
Yet, Rosenthal reminds us that while the ability to scale up quickly and recruit additional computing power from the cloud is great, libraries tend to have a very predictable need for storage that does not necessitate this kind of elasticity.
On a somewhat related note, Miller (2014) points out, essentially, that the purported benefits of cloud computing, especially the financial savings, are achieved because of the elasticity that allows users to only pay for the computing resources they need, so if lots of computing power is needed quickly, cloud computing can support that need instantly. This benefit, then, isn’t really a benefit when it comes to long-term storage of digital content. [next]
Cost can be a major issue in digital preservation and also in cloud computing.
To investigate the costs of using cloud storage for long term storage using the Lots of Copies Keep Stuff Safe (LOCKSS) program from Stanford University (http://www.lockss.org/), Rosenthal and Vargas (2013) performed a study in which they experimented with running a LOCKSS box on Amazon’s EC2 cloud backed with Amazon S3 cloud storage. Their study concluded “that current cloud storage services are not cost-competitive with local hardware for long term storage, including for LOCKSS boxes” (Rosenthal & Vargas, 2013, p. 107).
One of the major advantages of using cloud-based storage versus local storage for digital preservation is that it typically includes redundant and geographically-diverse storage, which is considered a best practice for digital preservation. Therefore, it is not a one-to-one price comparison to the costs of storage on a local machine. As Fryer and Brown (2014) write, more study in this area is needed to determine the economic reality of using cloud storage for digital preservation. One thing that is relatively clear, however, is that it is important to understand the costs involved with cloud computing and cloud storage. In most cases the costs are not just based on the amount of data stored, but also on the bandwidth used. Therefore, information professionals need to factor in the potential data-transfer fees, which can be a large part of the overall costs of cloud storage (Han, 2015).
Of course, when in doubt, a great option is always to see what others have done!
There have been a few case studies published that involve preserving digital objects using the cloud. Two are presented here.
Edward Iglesias – who just recently left Central Connecticut State University – decided a few years ago to use Amazon S3 for storing their long-term archival masters for digital preservation purposes. After reviewing various options, they decided on Amazon S3 because the overall costs were lower than other options and it offered immediate access to archival files and provided redundant, geographically diverse storage. In evaluating their decision, they determined that using Amazon S3 for storage was a very good choice for them: the costs were low and there was no downtime in the first year of use (Iglesias & Meesangnil, 2010).
The Parliamentary Archives, Houses of Parliament, London, United Kingdom, adopted third-party cloud storage to use with their Preservica Enterprise digital preservation system. Although the Preservica Enterprise software is installed on a local server, most of the digital objects are stored in the cloud. Fryer and Brown’s case study explored some of the opportunities and challenges of using the cloud for digital preservation. One of the biggest challenges was how to deal with sensitive data. The Parliamentary Archives decided to keep sensitive data local while storing the bulk of their data, which is publicly accessible, using cloud storage. As mentioned, one of the risks associated with cloud computing is the cloud provider going out of business or otherwise doing something to make the digital objects in the cloud inaccessible. In order to minimize this risk, the Parliamentary Archives decided to store their data with two different cloud providers.
And, finally,
An exit strategy is important to consider when choosing to use a cloud computing service. Exit strategies are often neglected; however, no solution lasts forever and the perfect solution today may not be in the future! It is important to ask: if a library stores data in the cloud, but the cloud provider goes out of business, or raises their prices, or for whatever reason the library simply does not like its service any more, how does the library get its data back (Robinson, 2015)?
This is especially an issue when dealing with large amounts of data that might be stored for digital preservation. These questions should be addressed and answered when signing a contract if at all possible. If the library can’t negotiate with the provider (a small, single library will not likely have much leverage negotiating with Amazon or Google), it is important for the library to understand the terms of service and what they mean for the library’s ability to migrate to a different solution in the future. Librarians need to determine who owns the data that they enter into a cloud system, how they will store and manage their data in the cloud, and how they get their data back out. These issues can be even more pronounced in a PaaS or SaaS solution since the data may be intrinsically tied to the cloud application or platform.
In conclusion… ***
Digital preservation isn’t easy.
Although there are many technical threats to digital preservation, ultimately digital preservation is not merely a technological challenge. Policies and procedures need to be in place to ensure ongoing digital preservation.
Cloud computing can be successfully and economically used for digital preservation in the appropriate situations. However, cloud computing platforms used for digital preservation should be routinely reviewed and re-evaluated.
Digital preservation is not something that can be done once and then forgotten about. Likewise, cloud computing is a quickly evolving technical field. Information professionals need to review both their digital preservation strategy and the applicability of cloud computing as part of that strategy on an ongoing basis.