Agenda
Introduction
Todd Carpenter, Executive Director, NISO
Utilizing the Cloud to Empower Research Efforts
John “JG” Chirapurath, Senior Vice President and General Manager, ProQuest Workflow Solutions
Migrating CDL Infrastructure to Amazon Web Services
Kurt Ewoldsen, Manager, Infrastructure and Applications Support, California Digital Library, University of California
Surveying the Horizon: Preservation and the Cloud
Heather Lea Moulaison, Assistant Professor, The iSchool (School of Information Science & Learning Technologies), University of Missouri
October 14 NISO Webinar: Cloud and Web Services for Librarians
1. NISO Webinar:
Cloud and Web Services for Librarians
Wednesday, October 14, 2015
Presenters:
John “JG” Chirapurath,
Senior Vice President and General Manager, ProQuest Workflow Solutions
Kurt Ewoldsen,
Manager, Infrastructure and Applications Support, California Digital Library,
University of California
Heather Lea Moulaison,
Assistant Professor, The iSchool (School of Information Science & Learning
Technologies), University of Missouri
http://www.niso.org/news/events/2015/webinars/cloud_services/
2. Utilizing the Cloud to
Empower Research Efforts
John "JG" Chirapurath, Senior Vice President and
General Manager, ProQuest Workflow Solutions
4. Electronic Resources are the
Majority of Content in Collections
Derived from US Department of Education, NCES Academic Libraries Survey, 1998-2008.
[Chart: Academic Library Expenditures on Purchased and Licensed Content, 1998-2020; share of expenditures (0%-90%); series include Print Books and Journals, with a “You are here” marker and a projected change.]
5. Cloud-based Systems Benefit
Libraries
• Cloud technologies help libraries avoid spending
limited budgets on IT expenses and system
management
– Reduction in hardware costs, decrease in IT support
time and expense, more easily and rapidly updated
systems
– Provides a full disaster recovery system that libraries
don’t have to administer
• Reallocate budget to additional premium content
that benefits researchers
• Use librarians’ time to share their expertise and
insights with researchers, which brings greater value
6. Cloud-based Systems Support
Scalability & Researchers’ Access Needs
• The inherent scalability of cloud-based systems
lets them manage electronic resources better, which is key
with expanding volumes of electronic and digital
content
• Better enable collaboration between librarians
and researchers in multiple
locations
• Support research anytime,
anywhere, on any device
7. Research & Publishing Has Changed
• Emphasis on more frequent publishing in
academia
– Researchers want to share near real-time
insights gained from open access information
resources
• Focus on research as a competitive
advantage
• Trend of interdisciplinary research and
collaboration
8. Cloud-based Systems Improve
Discoverability & Collaboration
• Easier to find and share information and
resources
– Tools like the Summon® service make new
content discoverable more rapidly
• 2.5 billion+ records covering more than 90 different
content types and more than 10,000 providers
• Better supports multi-location collaboration &
access on any device
– RefWorks and other reference management
solutions make it easier for researchers to share
and collaborate
9. Next Generation LSPs Offer
Comprehensive Resource Management
• Supporting management of print & electronic
resources across the entire collection
– Unified workflows
• Assess, manage and track resources across
multiple locations and geographic boundaries
• Ability to share e-resources and information for
databases and holdings across different
locations and types of libraries
• Better supports evolving role of librarians
10. And in the Future: Linked Data
• Cloud-based technologies help make
Linked Data practical
• Allows libraries to expose relationships among
entities that users can easily follow
• Improves user navigation between
related resources, concepts, and
entities – more serendipitous
discovery
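To make "relationships among entities that users can easily follow" concrete, here is a minimal sketch that stores a few statements as subject-predicate-object triples and walks outward from one resource. The identifiers (ex:work1, ex:author1, and so on) and predicate names are hypothetical; real Linked Data would use HTTP URIs and an RDF vocabulary, which this in-memory illustration does not attempt.

```python
from collections import deque

# Hypothetical triples (subject, predicate, object); real Linked Data uses HTTP URIs.
TRIPLES = [
    ("ex:work1", "creator", "ex:author1"),
    ("ex:work1", "subject", "ex:topicA"),
    ("ex:work2", "subject", "ex:topicA"),
    ("ex:author1", "affiliation", "ex:org1"),
    ("ex:work3", "creator", "ex:author1"),
]


def neighbors(entity, triples):
    """Return (predicate, entity) pairs linked to `entity` in either direction."""
    out = []
    for s, p, o in triples:
        if s == entity:
            out.append((p, o))
        elif o == entity:
            out.append((p, s))
    return out


def reachable(start, triples, max_hops=2):
    """Breadth-first walk: map each entity within `max_hops` links of `start` to its hop count."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for _, nxt in neighbors(current, triples):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    del seen[start]
    return seen
```

Starting from ex:work1, two hops reach ex:work2 (same topic) and ex:work3 (same author), the kind of serendipitous path the slide describes.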
11. Library => Virtual Knowledge Center
• More diverse content types and growing body of
knowledge accessible through the library
• Researchers accessing information more fluidly
– More seamless collaboration across locations
– Faster insights leading to interdisciplinary
breakthroughs
• Librarians as value-added information
professionals collaborating with researchers at
later stages
13. Migrating CDL IT Infrastructure
to the AWS Cloud
Presented by:
Kurt Ewoldsen
Director of Infrastructure & Application Support
Kurt.Ewoldsen@ucop.edu
www.cdlib.org
14. The California Digital Library (CDL)
• Mission: The California Digital Library exists to support the
University of California community’s pursuit of scholarship and to
extend the University’s public service mission.
• Vision: The California Digital Library’s vision is to elevate the digital
library for UC so that it becomes "expansively global and deeply
local". CDL will advance the digital transition of scholarly
information in three spheres:
– Access: Scholars will have access to the highest quality research
collections worldwide through services that support and enable new
scholarship and make it as open as possible.
– Formats: CDL will support all digital formats throughout their life cycle
with a full range of services, especially to surface UC’s unique digital
assets and collections.
– Scale: Through partnerships and alliances, CDL will elevate services to
the network level for maximum impact.
10/1/2015 Migrating CDL Infrastructure to AWS - NISO 14
15. CDL Infrastructure Timeline
• Past: environments across 2 data centers
– Migration to VMware environment underway
– ~90 VMs & ~45 physical systems (Sun/Solaris)
– ~200TB of SAN storage
– 2 small AWS accounts for grant-related work
• Present: environments across 3 data centers
– Final Sun/Solaris systems retired; migration to AWS underway
– ~30 VMs & ~200TB of SAN storage
– AWS account with ~100 EC2 instances & ~50 RDS instances
• Future: environments on a single platform (AWS)
– VMware environment retired; no more physical infrastructure, all equipment decommissioned
– AWS account with ~150 EC2 instances & ~60 RDS instances
16. CDL Services in the Cloud
Customer Services
• UC Libraries web site
• CDL web site
• Calisphere
• eScholarship
• EZID
• Online Archive of California
• Request (ILL)
• UC Library Reprints
• HathiTrust Zephir
Infrastructure Services
• Nagios
• Tripwire
• LDAP
• Puppet Enterprise
• FTPS/SFTP
• NFS Home Server
• Bastion Servers
17. Why use the Cloud?
• Cost
• Scalability
• Agility
18. Think Best Value, Not Lowest Cost
• Business case for AWS migration not based on
decreased annual cost (although savings are
expected)
• AWS platform provides significant benefits for the
same spend: high availability (HA), disaster recovery (DR),
managed services (RDS), and more
• Biggest cost saving is actually cost avoidance
– CDL was due to spend ~$750,000 to refresh
infrastructure equipment over the next 2 years; this is
now unnecessary
19. Vendor Management Cost Savings
• To support our physical infrastructure, we had
annual maintenance contracts with 10+
different vendors, totaling ~$60,000/year
• To support our AWS infrastructure, we have a
monthly maintenance fee with a single
vendor, with an annual cost of ~$30,000
• In addition to these direct cost savings, we
regain a significant amount of time formerly
spent on vendor/contract management
20. Cost Reporting
• In the typical shared or virtual environment, it is
difficult to determine the actual infrastructure
costs for any given service
• AWS has cost reporting down to a science, and I
now know in detail the infrastructure costs of
every service we support, and can share that
information with CDL managers and staff
• Good cost information leads to good decisions
regarding service development & deployment
22. Cost Reporting By Tags - Monthly
23. Cost Reporting By Tags - Daily
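The monthly and daily tag reports group spending by a service tag. In AWS this grouping is normally done with cost allocation tags in the billing tools; the sketch below only illustrates the grouping logic. It assumes line items have already been exported into simple records with a service tag, an AWS service name, a date, and a cost. The field names, tag values, and amounts are hypothetical and are not CDL's actual data.

```python
from collections import defaultdict
from datetime import date

# Hypothetical exported billing line items; field names and amounts are illustrative only.
LINE_ITEMS = [
    {"service_tag": "escholarship", "aws_service": "EC2", "day": date(2015, 9, 1), "cost": 40.0},
    {"service_tag": "escholarship", "aws_service": "RDS", "day": date(2015, 9, 1), "cost": 12.5},
    {"service_tag": "escholarship", "aws_service": "EC2", "day": date(2015, 9, 2), "cost": 38.0},
    {"service_tag": "ezid", "aws_service": "EC2", "day": date(2015, 9, 1), "cost": 15.0},
    {"service_tag": "ezid", "aws_service": "S3", "day": date(2015, 10, 1), "cost": 3.0},
]


def costs_by(items, key_fn):
    """Sum the cost of every line item under the key produced by key_fn."""
    totals = defaultdict(float)
    for item in items:
        totals[key_fn(item)] += item["cost"]
    return dict(totals)


def monthly_by_tag(items):
    """Group costs by (service tag, year-month), like a monthly tag report."""
    return costs_by(items, lambda i: (i["service_tag"], i["day"].strftime("%Y-%m")))


def daily_for_tag(items, tag):
    """Group one service's costs by (day, AWS service type), like a daily report."""
    tagged = [i for i in items if i["service_tag"] == tag]
    return costs_by(tagged, lambda i: (i["day"].isoformat(), i["aws_service"]))
```

Because every cost carries a service tag, the same records answer both "what does eScholarship cost per month?" and "which AWS service drove yesterday's spend?".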
24. Agility & Scalability
• Our annual budget cycle requires that we forecast
infrastructure needs 18 months in advance, sometimes
more
• However, we frequently get last-minute requests for
extraordinary service: significant amounts of data to
archive or processing to complete
• In our legacy environment, these requests were almost
impossible to accept
• In AWS, we have virtually unlimited capacity (provided
someone is willing to pay for it!)
• Conversely, since we have no long-term investment in
infrastructure, we can also scale down to ensure our costs
decrease if service volume or use decreases
25. Cost > Innovation > Lower Cost
• Application profile: An existing service that gets an occasional data upload that must be ingested and indexed for discovery.
• Legacy configuration: A single, large shared system running multiple applications at the same time.
• 1st AWS iteration: The application runs on a large instance, waiting for data to be uploaded and then processing it. The instance mostly sits idle, so this is not a very cost-effective solution.
• 2nd AWS iteration: A watchdog process runs on a single, small instance, waiting for data to be uploaded. When data is detected, multiple large instances are started to process the data, then shut down. This decreases the processing duration and the cost at the same time.
• 3rd AWS iteration: A watchdog process runs on a single, small reserved instance, waiting for data to be uploaded. When data is detected, multiple large instances are purchased on the “spot” market and started to process the data, then shut down. This further reduces costs.
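The savings across the three iterations come from paying for large instances only while there is work to do. The sketch below compares monthly costs under stated assumptions: every hourly rate and the amount of processing work are hypothetical placeholders, not actual AWS prices or CDL figures; the work is assumed to split perfectly across instances; and billing-increment rounding is ignored.

```python
HOURS_PER_MONTH = 730  # common approximation for an average month


def iteration1_cost(large_rate):
    """One large on-demand instance left running all month, mostly idle."""
    return HOURS_PER_MONTH * large_rate


def iteration2_cost(small_rate, large_rate, work_hours):
    """Small always-on watchdog, plus large instances billed only for the work itself."""
    return HOURS_PER_MONTH * small_rate + work_hours * large_rate


def iteration3_cost(small_reserved_rate, large_spot_rate, work_hours):
    """Same pattern, with a reserved watchdog and large instances bought at spot prices."""
    return HOURS_PER_MONTH * small_reserved_rate + work_hours * large_spot_rate


def wall_clock_hours(work_hours, instances):
    """Elapsed time if the work splits evenly across `instances` machines (an idealization)."""
    return work_hours / instances


if __name__ == "__main__":
    # Hypothetical placeholder rates (USD/hour) and monthly workload; substitute real figures.
    large, small, small_reserved, large_spot, work = 0.50, 0.02, 0.013, 0.10, 40
    print(f"Iteration 1: ${iteration1_cost(large):.2f}/month")
    print(f"Iteration 2: ${iteration2_cost(small, large, work):.2f}/month")
    print(f"Iteration 3: ${iteration3_cost(small_reserved, large_spot, work):.2f}/month")
    print(f"Wall-clock time with 8 large instances: {wall_clock_hours(work, 8):.1f} h")
```

In this model the large-instance charge in iteration 2 is work_hours × rate no matter how many instances share the work, which is why adding instances shortens the run without raising its cost.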
26. Grant-funded Efforts
• AWS accounts are a good way to handle grant-funded
efforts that have to be delivered upon
completion (e.g. www.archivesspace.org)
• Simply create a separate AWS account, then
configure the environment and develop the
application
• Collaboration is easy, as you control access to the
environment
• At the end of the grant, simply sign the AWS
account over to the appropriate party (provide
the access keys and update the billing entity)
27. Challenges of the Cloud Migration
• Migrations are extra work
• Re-architecting applications for the new
platform is even more work
• The cloud provides more variable
performance
– between local and cloud environments
– between processing cycles in the cloud
28. More Challenges
• The rapid rate of change in AWS can make
staying current a challenge; in fact just
understanding all of the AWS services and
potential benefits is a difficult task
• Systems still hang, crash and reboot in the
cloud and because you don’t manage the
hardware directly, you have even less visibility
into the conditions behind these events
29. What’s Next
• Complete the migration (by July 2016)
• Move from On-demand to Reserved instances
• Move all appropriate services to a H/A architecture
• Create DR capability in an alternate region
• Evaluate OpsWorks to replace Puppet
• Provide performance & cost visibility to application
development teams
• CloudTrail notifications from CloudWatch
• Look at new databases (Aurora, Redshift, DynamoDB)
• Elastic Beanstalk pilot underway (IaaS > PaaS)
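Whether moving an instance from on-demand to reserved pricing pays off depends on how much of the time it actually runs. As a simplified model (the symbols are assumptions for illustration, and the reservation is treated as a single all-upfront payment rather than AWS's full set of pricing options): let C_R be the reservation cost for a term of T hours, p the on-demand hourly rate, and u the fraction of the term the instance would run. The reservation is cheaper when:

```latex
C_R < u\,T\,p
\quad\Longleftrightarrow\quad
u > u^{*} = \frac{C_R}{T\,p}
```

For example, if a hypothetical one-year reservation cost 60% of a full year of on-demand use, then u* = 0.6: an always-on service (u close to 1) favors the reservation, while a short-lived batch instance is better left on-demand or on the spot market.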
30. Thanks
• Special thanks to my system administration
team and CDL application development
teams, who accepted my vision for a better
future, adopted it as their own, and worked
tirelessly to turn the plan into reality. The
success we have achieved to date is due solely
to their dedication and professionalism.
31. Surveying the Horizon:
Preservation and the Cloud
October 14, 2015 NISO Webinar:
Cloud and Web Services for Librarians
Heather Lea Moulaison
iSchool at the University of Missouri
32. Rationale
• Preserving digital information is becoming an ever
more important role of libraries and archives in our
digital society
• Cloud computing and, more specifically, cloud-based
data storage offers some potential to help address the
issues information professionals face
• Yet, not everyone agrees that cloud computing is the best
solution
• By examining the problem, reviewing the literature, and
assessing best practices, it is possible to understand
some of the issues that need to be considered
33. Agenda
• What is digital preservation and why is it
important?
• Definitions
• Digital preservation challenges and opportunities
• What is cloud computing?
• Definitions
• Cloud computing challenges and opportunities
• Digital preservation and cloud computing
• Examples and case studies
• Benefits and risks
• Discussion: best practices and strategies
34. What is Digital Preservation?
• “Digital preservation combines policies, strategies and
actions to ensure access to reformatted and born
digital content regardless of the challenges of media
failure and technological change. The goal of digital
preservation is the accurate rendering of authenticated
content over time” (ALCTS, 2007).
• A number of other definitions exist
• All basically require information professionals to ensure
digital content is accessible/usable over a long period of
time.
• To do this requires the right people with the right
technology who have the support and vision they need.
35. Digital Preservation: Challenges
and Opportunities
• Challenges
• The nature of digital files
• DP is more complex than simply backing up a file
• DP requires additional technical maintenance
along with safeguarding context
• Threats include
• Technological
• Meaning-related (if metadata is lost, the context is
not clear)
• Budgetary support
• Opportunities
• Continued relevancy in digital age
• Ensure access to digital content for future
generations
• Fulfill legal requirements
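One reason digital preservation is more than simply backing up a file is that stored copies must be checked periodically to confirm they have not silently changed. A common technique is fixity checking: record a cryptographic checksum at ingest and recompute it later. The sketch below uses SHA-256 from Python's standard library. The manifest format (relative path mapped to a hex digest) is an assumption for illustration, not any particular preservation system's format.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute a file's SHA-256 digest, reading in chunks so large masters need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Record a checksum for every file under `root`, keyed by its path relative to `root`."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root, manifest):
    """Recompute checksums and report files that changed or went missing since ingest."""
    root = Path(root)
    changed, missing = [], []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file():
            missing.append(rel_path)
        elif sha256_of(target) != expected:
            changed.append(rel_path)
    return {"changed": changed, "missing": missing}


if __name__ == "__main__":
    # Demonstration on a throwaway directory.
    with tempfile.TemporaryDirectory() as workdir:
        Path(workdir, "master.tif").write_bytes(b"example bytes")
        manifest = build_manifest(workdir)
        Path(workdir, "master.tif").write_bytes(b"example byteZ")  # simulate silent corruption
        print(audit(workdir, manifest))
```

The same audit works whether the copies sit on local disk or in cloud storage; what matters is that the library keeps its own manifest and checks against it on a schedule.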
36. What is Cloud Computing?
• “A model for enabling convenient, on-demand network access
to a shared pool of configurable computing resources (e.g.
networks, servers, storage, applications, and services) that can be
rapidly provisioned and released with minimal management
effort or service provider interaction” (Mell & Grance, 2011 p. 1).
• As with digital preservation, a number of definitions exist
• All basically assume that information is being stored remotely through
the internet.
• Example of a cloud service provider and its services: Amazon
(http://www.amazon.com/)
• Amazon Elastic Compute Cloud (EC2)
• Amazon Simple Storage Service (S3)
• Amazon Glacier
37. The NIST Cloud Computing Service
Models
http://www.servercloudcanada.com/2013/10/defining-the-cloud/
38. Cloud Computing: Challenges and
Opportunities
• Challenges
• Security in the cloud is a concern for many and can be perceived as
both a positive and a negative aspect of cloud computing
• Placing trust in a third party to store content can also be a positive
or a negative
• The uncertain future: it is possible, but not likely, that a cloud
computing company will go out of business, taking a library’s data
with it
• When data is stored in the cloud, the cloud provider has access to it
• Opportunities
• Perceived lowering of technical and financial barriers
• Increased flexibility to quickly meet increased demands, disaster
recovery, decreased software application and server maintenance,
decreased capital expenses, increased security, and cloud
computing is more environmentally friendly (Salesforce, 2015)
• Geographic diversity of server locations helps!
39. Digital Preservation and Cloud
Computing: Turnkey Solutions
• Examples:
• Preservica Cloud Edition: SaaS model.
• DuraSpace’s DuraCloud: cloud storage from either Amazon or the
San Diego Supercomputer Center (Schumacher et al., 2014).
• Cloud computing is widely utilized in industry, but Srivastava
and Verma (2015) observe that the “Application of cloud
computing in libraries is a relatively new area as compared
to its applications in business and corporate sector” (p. 33).
• Some information professionals’ points of view:
• There are at least some benefits to using cloud computing for
digital preservation
• Questions: Is cloud computing truly the best approach for digital
preservation? If so, what might be the caveats?
40. Research into Digital Preservation
and Cloud Computing
• National Digital Stewardship Alliance’s Infrastructure Working
Group survey (Bailey, 2012):
• 74% of the members had a strong preference for controlling their
own preservation storage systems because of cost concerns,
trustworthiness, legal issues and security and risk management.
• Researchers from the University of Cape Town (Poulo, Phiri, &
Suleman, 2014):
• showed that digital library applications in the cloud can provide
adequate response time and that the response time is not
significantly affected by complexity or collection sizes.
• But… is the elasticity the cloud provides worth the cost?
• “Cloud storage really is cheaper if your demand is spiky, but digital
preservation is the canonical base-load application” (Rosenthal,
2014)
• Miller (2014) points out, “some of the most compelling attributes of
the public cloud are best suited to ephemeral or (relatively!) short-
term use cases” (¶ 2).
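Rosenthal's base-load argument can be illustrated with a toy model: local capacity must be bought for the peak and paid for in every period, while cloud capacity is paid only for what is used, at a higher unit price. The unit prices and demand profiles here are hypothetical placeholders chosen only to show the shape of the trade-off; the model deliberately ignores the non-cost benefits, such as geographic diversity, that make it not a one-to-one comparison.

```python
def local_cost(demand, local_unit_price):
    """Local hardware is sized for the peak and paid for in every period."""
    return max(demand) * local_unit_price * len(demand)


def cloud_cost(demand, cloud_unit_price):
    """Cloud capacity is paid per unit actually used in each period."""
    return sum(demand) * cloud_unit_price


def cheaper_option(demand, local_unit_price, cloud_unit_price):
    """Return "cloud" or "local", whichever costs less for this demand profile."""
    local = local_cost(demand, local_unit_price)
    cloud = cloud_cost(demand, cloud_unit_price)
    return "cloud" if cloud < local else "local"
```

With a cloud unit price twice the local one, a year with one month at ten times normal demand favors the cloud, while a flat year at that same peak level (the preservation case) favors local hardware.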
41. Research on Costs
• Rosenthal and Vargas (2013) performed a study in
which they experimented with running a Lots of Copies
Keep Stuff Safe (LOCKSS) (http://www.lockss.org/) box
on Amazon’s EC2 cloud backed with Amazon S3 cloud
storage.
• Their study concluded “that current cloud storage services
are not cost-competitive with local hardware for long term
storage, including for LOCKSS boxes” (Rosenthal & Vargas,
2013, p. 107).
• Han (2015), after reviewing this and other studies, “believes
that the combination of big price drops and free data transfer
within the same data zone makes the cloud storage a very
attractive solution for long-term digital preservation,
especially using [Amazon] Glacier” (p. 266).
• Benefits of cloud computing (e.g. geographically-
diverse storage) mean it is not a one-to-one
comparison.
• As Fryer and Brown (2014) write, more study in this area is
42. Best Practices/Case Studies
• Central Connecticut State University, USA (Iglesias,
2011; Iglesias & Meesangnil, 2010) decided to use
Amazon S3 for storing their long-term archival
masters for digital preservation purposes.
• In evaluating their decision, they determined that using
Amazon S3 for storage was a very good choice for them:
costs were low and there was no downtime in the first
year of use (Iglesias & Meesangnil, 2010).
• The Parliamentary Archives, Houses of Parliament,
London, UK, adopted third-party cloud storage to
use with their Preservica Enterprise digital
preservation system (Fryer & Brown, 2014). In order
to minimize risk, the Parliamentary Archives decided
to:
• Keep sensitive data local
43. A Graceful Exit
• An exit strategy is important to consider when
choosing to use a cloud computing service.
• Exit strategies are “often neglected because few want to
consider the demise of what is, at the moment, a
seemingly wonderful solution, being adopted and
implemented with great effort and expectation”
(Schaffer, 2014, p. 4).
• But… how does the library get its data back (Robinson,
2015)?
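Part of answering “how does the library get its data back?” is being able to show that what comes back is complete and unaltered. This is a minimal sketch, assuming the library keeps its own manifest of object identifiers and checksums recorded at deposit, and that checksums for the exported copies are recomputed locally rather than taken on the provider's word. The identifiers and digests are hypothetical.

```python
def reconcile(library_manifest, returned_manifest):
    """Compare the library's own record against what came back from the provider.

    Both arguments map an object identifier to its checksum.
    """
    expected = set(library_manifest)
    received = set(returned_manifest)
    return {
        "missing": sorted(expected - received),
        "unexpected": sorted(received - expected),
        "mismatched": sorted(
            oid for oid in expected & received
            if library_manifest[oid] != returned_manifest[oid]
        ),
    }


def exit_complete(report):
    """The exit can be signed off only when nothing is missing, extra, or altered."""
    return not any(report.values())
```

Keeping the library's manifest independent of the provider is the design point: it is the only record that survives if the provider relationship ends badly.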
44. Conclusion
• Digital preservation is not easy. Although there are many
technical threats to digital preservation, ultimately digital
preservation is not merely a technological challenge.
• Policies and procedures need to be in place to ensure ongoing
digital preservation.
• Cloud computing can be used successfully and
economically for digital preservation in the appropriate
situations.
• Platforms used for digital preservation should be routinely reviewed
and re-evaluated.
• Digital preservation, in the cloud or locally, is not something
that can be done once and then forgotten about.
• Information professionals need to review both their digital
preservation strategy and the applicability of cloud computing as
part of that strategy on an ongoing basis.
46. References
• ALCTS. (2007, 24 June). Definitions of Digital Preservation. Retrieved from http://www.ala.org/alcts/resources/preserv/defdigpres0408
• Anderson, M. (2011, 7 September). B is for Bit Preservation. The Signal. Retrieved from http://blogs.loc.gov/digitalpreservation/2011/09/b-is-for-bit-preservation/
• Corrado, E. M., & Moulaison, H. L. (2015, August 12). Digital preservation and the cloud: Challenges and opportunities. IFLA 2015 Pre-Conference Satellite Meeting, Preservation & Conservation Section, Durban, South Africa, 12-13 August 2015.
• Fryer, C., & Brown, A. (2014). Case study: Archives in the cloud: Challenges and opportunities. In B. Endicott-Popovsky (Ed.), International Conference on Cloud Security Management ICCSM-2014. London, UK.
• Han, Y. (2015). Cloud storage for digital preservation: Optimal uses of Amazon S3 and Glacier. Library Hi Tech, 33(2), 261-271.
• Iglesias, E. (2011). Using Windows Home Server and Amazon S3 to back up high-resolution digital objects to the cloud. In E. M. Corrado & H. L. Moulaison (Eds.), Getting started with cloud computing (pp. 143-151). New York: Neal-Schuman.
• Iglesias, E., & Meesangnil, W. (2010). Amazon S3 in digital preservation in a mid-sized academic library: a case study of CCSU ERIS digital archive system. The Code4Lib Journal, 12. Retrieved from http://journal.code4lib.org/articles/4468
• Mell, P., & Grance, T. (2011, September). The NIST definition of cloud computing (Special Publication 800-145). National Institute of Standards and Technology. Retrieved from http://csrc.nist.gov/publications/nistpubs/800-145/SP800-145.pdf
• Poulo, L., Phiri, L., & Suleman, H. (2014). Fine-grained scalability of digital library services in the cloud. In J. P. van Deventer, M. C. Matthee, H. Gelderblom, & A. Gerber (Eds.), Proceedings of the Southern African Institute for Computer Scientist and Information Technologists Annual Conference 2014 on SAICSIT 2014 Empowered by Technology (SAICSIT '14) (pp. 157-165). New York, NY: ACM. http://doi.acm.org/10.1145/2664591.2664611
• Rackspace Support. (2013, October 22). Understanding the Cloud Computing Stack: SaaS, PaaS, IaaS. Retrieved from https://www.rackspace.com/knowledge_center/whitepaper/understanding-the-cloud-computing-stack-saas-paas-iaas
• Robinson, J. D. (2015). "The dogs bark and the circus moves on". The Bottom Line: Managing Library Finances, 28(1/2), 7-18. http://dx.doi.org/10.1108/BL-01-2015-0002
• Rosenthal, D. H. S., & Vargas, D. L. (2013). Distributed digital preservation in the cloud. The International Journal of Digital Curation, 8(1). http://dx.doi.org/10.2218/ijdc.v8i1.248
• Rosenthal, D. H. S. (2014). Talk "Costs: Why do we care?" [blog post]. Retrieved from http://blog.dshr.org/2014/11/talk-costs-why-do-we-care.html
• Ross, S. (2007). Digital preservation, archival science and methodological foundations for digital libraries: Keynote address at the 11th European Conference on Digital Libraries (ECDL), Budapest (17 September 2007). Retrieved from http://glasgowsciencefestival.org.uk/media/media_113621_en.pdf
• Schaffer, H. (2014, March/April). Will you ever need an exit strategy? IT Pro. Retrieved from http://www.computer.org/csdl/mags/it/2014/02/mit2014020004.pdf
• Stark, L., & Tierney, M. (2014). Lockbox: Mobility, privacy and values in cloud storage. Ethics and Information Technology, 16(1), 1-13.
47. NISO Webinar • October 14, 2015
Questions?
All questions will be posted with presenter answers on
the NISO website following the webinar:
http://www.niso.org/news/events/2015/webinars/cloud_services/
October 14 NISO Webinar
Cloud and Web Services for Librarians
48. Thank you for joining us today.
Please take a moment to fill out the brief online survey.
We look forward to hearing from you!
THANK YOU
Editor's Notes
Volume and content type is growing as well as the number of locations that shared content might be in. This dramatically increases storage requirements and accessibility needs.
Modern users are demanding access to content in different ways, and this has resulted in a shift in the collections libraries are building. This is data from the National Center for Education Statistics, and it illustrates the shift from print to electronic in collections. You can see that we have reached the “tipping point” where only a small percentage of collections budgets is spent on print. Your print collection is still important, especially as a research collection, but electronic outweighs print in terms of time and priorities for support. You need to focus on your future and transform how you manage your collections.
Cloud-based systems offer expense and time savings
Cloud-based systems are designed to be more scalable for improved use, management and storage of data
Content is more diverse, the volume managed by librarians is growing, and the role and pace of research are changing; the research process itself has evolved too.
Discovery services – large content volume to be searched, need for continual updates & updated indexing (AWS)
Improved processing for more frequent content indexing and metadata management
Summon is the largest unified index in the industry.
Open access information resources being ingested into cloud-based discovery systems help corporate researchers gain access to more current research.
Older technological systems are based on inventory and management of print resources and aren’t built to support the volume of electronic and digital content – opportunity for unified workflow
- Cloud systems better manage the rapid growth and complexity of content
Facilitates exposure of library data on the Web
Simplifies the processes associated with describing resources…and people, concepts, places, etc.
Simplifies management through new models for authority control
Reduces level of effort associated with traditional catalog management
Linked Data effort re-envisions a new bibliographic environment for libraries that makes the “network” central and interconnectedness commonplace
Makes library information accessible in the places where users are working = everywhere
Lowers metadata creation and maintenance costs
Librarians bring their unique expertise to the work of identifying and establishing more relationships between and among resources
Cloud based library systems better support the model of the library as a virtual knowledge center
Hello, my name is Kurt Ewoldsen and I am the Director of Infrastructure and Application Support for the CDL (which is just a fancy way of saying I am the IT manager here) and I am going to share a little about how we are moving the entire infrastructure supporting CDL services into the AWS cloud.
The California Digital Library supplies system-wide services to the University of California
Interesting Fact: The Department of Library Automation, which was the predecessor to the CDL, brought the first system-wide data network to UC, to support the Melvyl online library catalog, in 1982
Over the past several years, we have migrated our infrastructure from physical Sun/Solaris systems to VMware VMs to AWS instances. That is an evolution through 3 different technology paradigms in a very short period of time.
We are a little more than 75% complete with our migration to AWS and have a number of customer-facing services in production on that platform, as well as all of the infrastructure services required to support our application development and deployment environments.
There were a number of factors that led us to move to the AWS cloud, but today I will focus on these three.
There are certainly less expensive hosting providers than AWS, but none that have the level of innovation and breadth of services that Amazon provides. The significant benefits come once you start fully leveraging the AWS ecosystem and all of the services that are available.
Every year at budget time, I would have to contact all of our support vendors and request a quote for the annual maintenance renewal.
Depending on how organized the vendor was, this could turn into a drawn-out process: my contact from last year may have changed roles or left the company, they may have changed the way they assign contracts, etc.
I would use those estimates as part of my budget forecast.
Later in the year, when the actual renewal date approached, I would have to contact them all again.
First to renew the quotes, which are generally only good for 30 days, and then to work through the actual renewal process:
Generating the purchase request, ensuring the PO was created and sent to the vendor, receiving the invoice,
making sure the invoice was paid, and finally verifying that the vendor support portal or customer management system reflected the new expiration date for our account.
This is a lot of effort that does not provide direct benefit to users of our services, so I am happy to reduce the amount of my time dedicated to this activity.
When you have services sharing physical servers or VMs sharing physical hosts, it can be difficult to calculate the actual cost of the infrastructure dedicated to a particular service.
There are different types of costs to consider (capital and operational) and putting a price tag on virtual CPUs or memory assigned to a VM is challenging.
Because it is at the core of their business, AWS has cost reporting down to a science, and it is easy for this cost information to be shared within the organization.
We can track our spend on a daily basis, if we so desire, and break the costs down in a number of ways. This shows a breakdown of costs by AWS service type.
These are the monthly costs for our eScholarship service
These are the daily costs for our eScholarship service
This is one example of how an understanding of service costs led to several cycles of application architecture modifications that resulted in performance improvements and lowered costs.
AWS accounts are a good way to handle grant-funded efforts that have to be turned over at completion.
Instead of worrying about how to re-create the environment at another location when the grant ends, simply create a separate AWS account and perform the work in it. At the end of the grant, simply turn over the account credentials and change the billing information to the new owner.
We have done this with ArchivesSpace
Infrastructure migrations take time away from application development or improvement. While the long-term benefits are compelling, the short-term impact is real and can lead to delayed release of new services or features.
Modifying your applications to take advantage of the AWS platform and see the benefits of their H/A capabilities and other services takes even more effort, which takes even more time away from application development or improvement.
Performance in AWS is variable on all levels. Processing times differ between the legacy environment and AWS. Even runs within AWS can vary significantly; a job that usually completes in 6 hours can occasionally take twice as long, for no apparent reason.
The rate of innovation at AWS is both a blessing and a curse. It is difficult to keep up with the rate of change within the AWS environment, and with the new products and services that are continually introduced. Just last week at their annual user conference, AWS introduced more than 20 new services or significant enhancements to existing services. We are using only 11 of their 50+ services at this time.
Using the cloud just means that you are using someone else’s equipment. Technology will always have problems, no matter who is managing it, so you will still experience system problems while using the cloud, and you will need to plan your cloud deployment to accommodate these issues.
This list was created in August, and is already out of date (remember what I said about the rate of change in the AWS environment)
One of the things we are interested in pursuing is use of Elastic Beanstalk or the Amazon EC2 Container Service, to move us from IaaS to PaaS.
I’m using a photo from my summer vacation here -- as the leaves begin to change, I’m keen on preserving the memory of summer travels!
The title: Surveying the horizon – not a completely nuts-and-bolts approach, but enough to get you thinking, I hope.
Before I go too far, I want to acknowledge my co-author, Edward M. Corrado, Associate Dean for Technology at the University of Alabama Libraries, without whom this presentation wouldn’t have been possible.
[Preserving digital information is becoming an ever more important role of libraries and archives in our digital society. Cloud computing and, more specifically, cloud-based data storage offers some potential to help address the issues information professionals face. Yet, not everyone agrees that cloud computing is the best solution. By examining the problem, reviewing the literature, and assessing best practices, it is possible to understand some of the issues that need to be considered.]
Getting into this – in terms of the rationale…
Preserving digital information is becoming an ever more important part of what info pros do. Digital preservation is an economic, managerial, and technological challenge. So is cloud computing, in many ways.
Cloud computing and, more specifically, cloud-based data storage, offers some potential to help address the issues information professionals face – yet there is little in the way of out-of-the-box guidance and help for preserving digital content in the cloud. Additionally, it’s not entirely clear that cloud computing is digital preservation’s best bet. [next]
[Again, this is not meant to be a workshop-like presentation, but more of an overview of points to consider and possible approaches when thinking about digital preservation and the cloud.] [next]
In terms of what we’ll cover in the next 15 or 20 minutes –
We’ll begin by talking about digital preservation – there are a lot of definitions out there, so I’ll explain what I mean when I say digital preservation and I’ll mention some of the tricky aspects of digital preservation to get us started.
Next, we’ll do the same thing for cloud computing.
I’d like to then explore the two topics together, and finish by talking about some implications. [next]
There are a number of definitions of digital preservation but they all point to the need to ensure that digital content is accessible over some period of time.
One definition of digital preservation prepared by the American Library Association’s Association for Library Collections and Technical Services (ALCTS) states that “Digital preservation combines policies, strategies and actions to ensure access to reformatted and born digital content regardless of the challenges of media failure and technological change. The goal of digital preservation is the accurate rendering of authenticated content over time” (ALCTS, 2007).
Sometimes we hear information professionals and researchers talk about “digital curation” – their definition probably overlaps quite well with the spirit of these definitions of “digital preservation.”
[I mentioned a few slides ago that digital preservation is an economic, managerial, and technological challenge – this is because planning and oversight are a huge part of making digital content available into the future. What do I mean?] [next]
What do I mean?
In terms of digital preservation challenges, it is important to remember that “backup, alone, does not serve as an appropriate solution to digital archiving” (Payette, 2008).
Digital preservation is not just safeguarding the bits.
When something goes wrong and meaning or content is lost, we cannot say a digital object has been preserved. Part of the problem is the nature of digital objects. Digital objects, unlike many physical objects archived in libraries and archives, are relatively fragile. Technological threats can include physical deterioration of the storage medium used to store the digital object, being unable to read the file because the hardware and/or software is no longer supported or accessible, and loss of the software programs that can interpret the digital object (Waugh, Wilkinson, Hills, & Dell’oro, 2000).
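One widely used safeguard against the first of these threats, silent deterioration of the stored bits, is routine fixity checking: record a checksum for each file at ingest and recompute it periodically. The Python sketch below is a minimal illustration using SHA-256 from the standard library; the manifest layout (a dictionary mapping file path to hex digest) is a hypothetical choice, not a standard format. As the passage stresses, a passing check only shows that the bits are intact; it says nothing about whether hardware or software that can interpret them still exists.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a checksum for each file, e.g. at ingest time (hypothetical layout)."""
    return {str(p): sha256_of(p) for p in paths}


def audit(manifest):
    """Return files that are missing or whose checksum no longer matches."""
    problems = {}
    for name, expected in manifest.items():
        path = Path(name)
        if not path.exists():
            problems[name] = "missing"
        elif sha256_of(path) != expected:
            problems[name] = "checksum mismatch"
    return problems
```

An empty result from `audit` means every file listed in the manifest is present and unchanged since the manifest was built.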
*Nothing illustrates a point quite like an embarrassing story from personal experience. In the late 1980s/early 1990s, I bought a word processor that was essentially an electric typewriter with a standalone monitor and the ability to save files to a diskette. Frankly, I couldn’t believe I actually found a photo of it online, so here it is; this is what it looked like. In any event, I held on to all those diskettes for a long, long time after the machine was gone. I finally threw them away about 10 or 15 years ago, after giving up on ever getting the contents into a format I could access, right about the time I acknowledged that what I had saved on them as a college freshman probably wasn’t worth saving anyway.*
There are other threats to long-term digital preservation besides technical failure, including budgetary ones. Digital preservation can be expensive, and its benefits may not be immediately apparent to administrators, since those benefits lie mostly in the future.
And opportunities:
But the benefits are amazing when these challenges can be overcome! Society can use content into the future that otherwise would have been lost. This gives information professionals yet another way to be relevant in the information age. Depending on the kind of institution, saving necessary digital content can help fulfil legal requirements, and explicit metadata describing future use scenarios can help respect the wishes of donors or better honor intellectual property rights for content. [next]
What Is Cloud Computing?
The United States’ National Institute of Standards and Technology (NIST) defines cloud computing as “a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources (e.g. networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell & Grance, 2011, p. 1).
There is more to it than this, though. The NIST report from which this definition is drawn goes on to explain that there are three service models, five essential characteristics including elasticity, and four deployment models involved with cloud computing.
Although this quite specific NIST understanding of cloud computing is useful for technologists, non-IT information professionals may find it more useful to think of library cloud computing as “library data and services hosted beyond the library’s walls and accessible via the web” (Corrado & Moulaison, 2012).
The online sales giant Amazon (http://www.amazon.com/) is one example of a cloud computing services provider, offering services such as the Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3), and Amazon Glacier.
This is a pretty straightforward graphic showing the kinds of products involved with each service model.
In the last slide, I mentioned three service models as defined by NIST.
One of the most common cloud computing service models used by libraries is Software as a Service, or SaaS (pronounced “sass”), where software is used remotely through the cloud. In libraries, Springshare’s offerings, including LibGuides, LibAnswers, and LibCal, are popular SaaS products.
Platform as a Service (PaaS) is another of the service models set forth in the NIST definition. PaaS “is the set of tools and services designed to make coding and deploying [cloud-based] applications quick and efficient” (Rackspace Support, 2013). Libraries with their own software developers may make use of PaaS offerings such as Google App Engine, Heroku, and Microsoft Azure. In these cases, the library’s IT staff develop software and then deploy it to an already-robust technology platform that is running in a specific computing environment.
The third cloud computing service model is Infrastructure as a Service (IaaS) – “servers, storage and networking— on demand, in a pay-as-you-go model” (IBM, 2015). With IaaS, it is not necessary to make a significant up-front investment in computing hardware.
Cloud challenges:
As far as challenges go -- security in the cloud is a concern for many and can be perceived as both a positive and a negative aspect of cloud computing. Since systems administrators at the local institution don’t have as much control over security as they would with locally hosted applications, this concern is a valid one. However, since cloud services are typically provided by large organizations that have the resources and incentives to invest in security professionals, the cloud may provide better security than locally hosted options, especially when the library does not have a full complement of IT professionals (Srivastava & Verma, 2015). Similarly, placing trust in a third party to store content can be either a positive or a negative. Although a large company’s servers might go offline, that is probably less likely than a single library’s servers going offline.
It is also unlikely, though possible, that a cloud computing company will go out of business and take a library’s data with it. Other potentially negative aspects of cloud computing that libraries need to consider are the privacy of personal information and the protection of content. When data is stored in the cloud, the cloud provider has access to it. As a result, there may be various legal and organizational policy implications.
Dark archives, where content cannot be shared with users without permission, must be sensitive to potential issues of privacy and security of content. Additionally, some governments do not permit certain data to be stored outside their country, or in specific countries, because of the legislation under which the cloud computing providers there are required to operate.
Opportunities:
Cloud computing has been touted as a solution for many technological challenges because of its primary perceived benefit: lowering technical and financial barriers. Other benefits of cloud computing include increased flexibility to quickly meet increased demands, disaster recovery, decreased software application and server maintenance, decreased capital expenses, increased security, and … cloud computing is more environmentally friendly (Salesforce, 2015).
These benefits can also apply when hosting digital library content in the cloud.
In particular, cloud-based storage permits on-demand provisioning and offers geographic diversity. Because servers are located in different geographic regions, a natural disaster such as an earthquake or flood might affect one set of servers where the information is housed, but not the other sets of servers located on the other side of the world. Keeping copies of data in multiple geographic areas, something cloud-based storage is designed to do, aligns well with the digital preservation best practice of maintaining between two and six copies of digital content (Anderson, 2011).
Some turnkey solutions already exist for carrying out digital preservation in the cloud. Services such as Preservica Cloud Edition provide cloud-based digital preservation using the SaaS model. DuraSpace also provides preservation services through DuraCloud, which uses cloud storage from either Amazon or the San Diego Supercomputer Center (Schumacher et al., 2014).
Cloud computing is widely used in business, industry, and the corporate sector, but is somewhat new in libraries, and its use for digital preservation in libraries is newer still.
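To make the “multiple copies in multiple places” guideline concrete, the sketch below takes a hypothetical inventory of where copies of one object are held and reports the copy count, the number of distinct regions, and whether the count falls within the two-to-six range mentioned here. The provider and region labels are illustrative assumptions, and counting distinct regions is only a rough proxy for real geographic diversity.

```python
def assess_copies(locations, min_copies=2, max_copies=6):
    """Summarize redundancy for one object.

    locations: list of (provider, region) pairs, one per stored copy.
    """
    regions = {region for _, region in locations}
    copies = len(locations)
    return {
        "copies": copies,
        "distinct_regions": len(regions),
        "within_guideline": min_copies <= copies <= max_copies,
        "geographically_diverse": len(regions) >= 2,
    }


# Hypothetical inventory for a single archival master.
example = [
    ("local", "on-premises"),
    ("cloud-provider-a", "us-west-2"),
    ("cloud-provider-a", "eu-west-1"),
]
print(assess_copies(example))
```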
Many libraries see that there are benefits to using cloud computing for digital preservation, but there is not universal agreement that cloud computing is the best approach to use for digital preservation -- and the research at present is divided. [next]
Although cloud-based preservation has the many benefits of cloud computing mentioned earlier, there are some potential drawbacks. For instance, a survey of the National Digital Stewardship Alliance’s Infrastructure Working Group showed that 74% of the members had a strong preference for controlling their own preservation storage systems because of cost concerns, trustworthiness, legal issues, and security and risk management (Bailey, 2012).
Researchers from the University of Cape Town conducted a set of experiments to investigate “the scalability of typical digital library services that use cloud computing facilities for core processing and storage” (Poulo, Phiri, & Suleman, 2014, p. 157). Their experiments showed that digital library applications in the cloud can provide adequate response time and that the response time is not significantly affected by complexity or collection sizes.
Yet Rosenthal reminds us that, while being able to scale up quickly and recruit additional computing power from the cloud is great, libraries tend to have a very predictable need for storage that does not require this kind of elasticity.
On a somewhat related note, Miller (2014) points out, essentially, that the purported benefits of cloud computing, especially the financial savings, are achieved because of the elasticity that allows users to only pay for the computing resources they need, so if lots of computing power is needed quickly, cloud computing can support that need instantly. This benefit, then, isn’t really a benefit when it comes to long-term storage of digital content. [next]
Cost can be a major issue in digital preservation and also in cloud computing.
To investigate the costs of using cloud storage for long-term preservation with the Lots of Copies Keep Stuff Safe (LOCKSS) program from Stanford University (http://www.lockss.org/), Rosenthal and Vargas (2013) ran a LOCKSS box on Amazon’s EC2 cloud, backed by Amazon S3 cloud storage. They concluded “that current cloud storage services are not cost-competitive with local hardware for long term storage, including for LOCKSS boxes” (Rosenthal & Vargas, 2013, p. 107).
One of the major advantages of using cloud-based storage versus local storage for digital preservation is that it typically includes redundant and geographically diverse storage, which is considered a best practice for digital preservation. Therefore, it is not a one-to-one price comparison to the costs of storage on a local machine. As Fryer and Brown (2014) write, more study in this area is needed to determine the economic reality of using cloud storage for digital preservation. One thing that is relatively clear, however, is that it is important to understand the costs involved with cloud computing and cloud storage. In most cases the costs are based not just on the amount of data stored, but also on the bandwidth used. Therefore, information professionals need to factor in potential data-transfer fees, which can be a large part of the overall cost of cloud storage (Han, 2015).
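Because bandwidth charges are easy to overlook, a back-of-the-envelope estimate should include them alongside storage. A minimal sketch of that arithmetic follows: monthly cost = (GB stored × storage price per GB-month) + (GB transferred out × transfer price per GB). The per-GB prices are hypothetical placeholders, not any provider’s actual rates, and real price lists often add tiers, request fees, and retrieval fees that this sketch ignores.

```python
from decimal import Decimal

# Hypothetical prices in USD; check a provider's current price list before use.
STORAGE_PRICE_PER_GB_MONTH = Decimal("0.03")
TRANSFER_PRICE_PER_GB = Decimal("0.09")


def monthly_cost(stored_gb, transferred_out_gb,
                 storage_price=STORAGE_PRICE_PER_GB_MONTH,
                 transfer_price=TRANSFER_PRICE_PER_GB):
    """Return (storage, transfer, total) cost for one month."""
    storage = Decimal(stored_gb) * storage_price
    transfer = Decimal(transferred_out_gb) * transfer_price
    return storage, transfer, storage + transfer


# A roughly 10 TB archive with light access, versus a month in which the
# whole archive is retrieved (for example, when moving to another provider).
print(monthly_cost(10_000, 500))
print(monthly_cost(10_000, 10_000))
```

With these placeholder rates, retrieving the whole archive once costs three times as much as storing it for a month.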
Of course, when in doubt, a great option is always to see what others have done!
There have been a few case studies published that involve preserving digital objects using the cloud. Two are presented here.
Edward Iglesias – who just recently left Central Connecticut State University – and his colleagues there decided a few years ago to use Amazon S3 to store the university’s long-term archival masters for digital preservation purposes. After reviewing various options, they chose Amazon S3 because its overall costs were lower than those of other options, and it offered immediate access to archival files and provided redundant, geographically diverse storage. In evaluating the decision, they determined that using Amazon S3 for storage was a very good choice for them: the costs were low and there was no downtime in the first year of use (Iglesias & Meesangnil, 2010).
The Parliamentary Archives, Houses of Parliament, London, United Kingdom, adopted third-party cloud storage to use with their Preservica Enterprise digital preservation system. Although the Preservica Enterprise software is installed on a local server, most of the digital objects are stored in the cloud. Fryer and Brown’s case study explored some of the opportunities and challenges of using the cloud for digital preservation. One of the biggest challenges was how to deal with sensitive data. The Parliamentary Archives decided to keep sensitive data local while storing the bulk of their data, which is publicly accessible, in the cloud. As mentioned, one of the risks associated with cloud computing is the cloud provider going out of business or otherwise doing something to make the digital objects in the cloud inaccessible. To minimize this risk, the Parliamentary Archives decided to store their data with two different cloud providers.
And, finally,
An exit strategy is important to consider when choosing to use a cloud computing service. Exit strategies are often neglected, but no solution lasts forever, and the perfect solution today may not be perfect in the future! It is important to ask: if a library stores data in the cloud and the cloud provider goes out of business, raises its prices, or for whatever reason the library simply no longer likes its service, how does the library get its data back (Robinson, 2015)?
This is especially an issue when dealing with the large amounts of data that might be stored for digital preservation. These questions should be addressed and answered when signing a contract, if at all possible. If the library can’t negotiate with the provider (a small, single library will not likely have much leverage negotiating with Amazon or Google), it is important for the library to understand the terms of service and what they mean for the library’s ability to migrate to a different solution in the future. Librarians need to determine who owns the data that they enter into a cloud system, how they will store and manage their data in the cloud, and how they will get their data back out. These issues can be even more pronounced in a PaaS or SaaS solution, since the data may be intrinsically tied to the cloud application or platform.
In conclusion…
Digital preservation isn’t easy.
Although there are many technical threats to digital preservation, ultimately digital preservation is not merely a technological challenge. Policies and procedures need to be in place to ensure ongoing digital preservation.
Cloud computing can be used successfully and economically for digital preservation in the appropriate situations. However, cloud computing platforms used for digital preservation should be routinely reviewed and re-evaluated.
Digital preservation is not something that can be done once and then forgotten about. Likewise, cloud computing is a quickly evolving technical field. Information professionals need to review both their digital preservation strategy and the applicability of cloud computing as part of that strategy on an ongoing basis.