OpenStack at EBSCO

Nate Baechtold, IT Architect
EBSCO Information Services
August 23, 2016

• The leading discovery service provider for libraries worldwide
with more than 10,000 discovery customers in over 100
countries.
• Preeminent provider of online research content for libraries,
including hundreds of research databases, historical archives,
point-of-care medical reference, and corporate learning tools
serving millions of end users at tens of thousands of institutions.
• Leading provider of electronic journals & books for libraries, with
more than 360,000 serials, including more than 57,000 e-journals,
as well as online access to more than 800,000 e-books.

What did we need?
• Self-service infrastructure for all development teams.
• Full-stack automation for all environments.
• Increase agility and productivity of operations and development
teams.
• Lower costs by leveraging open source solutions.
• Provide a solution that integrates well with other products and
allows other products and tools to easily integrate with it.

Why OpenStack?
• Easy to consume API that commoditizes infrastructure with the same
methodology used by public clouds.
• Abstraction of the underlying infrastructure, so that configuration or
hardware differences do not propagate to consumers and
automation.
• Standardized interface for compute, network and storage
• When software supports OpenStack it tends to “just work”
• Allows us to build an IaaS platform fit for live services and safely
hand out access to diverse teams through built in project isolation.
• Prefer to tell consumers that “if you break it then it is our fault” rather than
giving them a long list of things that they should never do.

Current Scale
• 3 OpenStack clouds
• Approximately 1100 running
instances
• Almost 500,000 instances
created and destroyed since
general availability
• 68% of workloads concentrated
in development environments
• Around 1/3 of all virtualized
workloads currently on
OpenStack
[Chart: Distribution By Running Instance. Legend: DevQa, Live DC 1, Live DC 2. Values shown: 68%, 10%, 22%]

Design Philosophy
• Build a platform to run production applications.
• Multi-tenant at its core
• Should be able to safely support development and operations teams sharing
the same cloud.
• All tools needed to build a highly available production application
need to be available
• Good enough for development but not production is not an acceptable
permanent state.
• Build general purpose solutions. Customize as little as possible.
• Provide an easy menu of infrastructure offerings
• Easy to use solution with safeguards to encourage experimentation
• Development is easier when you don’t need to worry about breaking the
environment

Current Architecture
[Diagram: EBSCO Private Cloud Platform]
• OpenStack Cloud: Nova, Neutron, Cinder, Glance, Keystone, Heat, Ceilometer, Horizon
• Monitoring, Operations, Dashboards, Load Balancing

Problems to Solve:
• Skills and training
• Selection of vendors and
integrations
• Deployment
• Adoption
• Productionization

Skills and training:
Our Experiences
• Internally develop a core group of OpenStack
SMEs before progressing too far.
• Do not waste learning opportunities by relying
too much on professional services.
• Look for candidates with strong Linux,
networking, virtualization and Python skills
rather than OpenStack experience.
• Give your team the time and opportunity to
experiment and learn how OpenStack works.
• Vendor support lowers the amount of
expertise you need to go to production.
• OpenStack skills are
VERY hard to hire
• Administration requires
good Linux experience
• Inexperienced
administrators can
cause huge amounts of
damage

Vendors and
integrations:
Our Experiences
• Prefer products that align with
OpenStack’s multi-tenancy model
whenever possible.
• Focus on vendors building for cloud rather
than trying to integrate it afterwards.
• Look at areas to improve everywhere in
the stack. Re-evaluate your product
decisions. There is high value when an
integration is done right.
• You will not know how good a vendor’s
integration is until you try it. There can be
many hidden landmines with missing
capabilities or API support.
• Tons of vendor
integrations with varying
degrees of quality
• Many established
vendors
• Users need access to
everything that they
need to deploy and
manage a highly
available production
application

Case Study – Existing Load Balancing
• Existing vendor had limited OpenStack knowledge and bare bones
integration at the time.
• Actual quote from support after a bug was discovered (vendor specific
lines edited)
• “For now, to avoid a failover, I would recommend to program the OpenStack not
to delete IPs.”
• LBaaS v1 was extremely limited. Would not have covered all
production use cases.
• Product did not support safe multi-tenancy. There were shared resources that
were a point of failure.
• Prolonged evaluation period of 6-8 months resulting in rejection.

Case Study – Cloud Load Balancer (AVI)
• Installation involves providing OpenStack credentials and it handles the
rest.
• Allowed us to make production-grade load balancing generally available in
development within a week and in production within a month.
• Multi-tenancy model aligns with OpenStack Projects and with Keystone
• Nobody had to ask for access. If you had access to OpenStack, you had
access to load balancing services.
• No fighting with permissions or concerns with preventing untrained users from
damaging the environment.

Problems to Solve:
Our Experiences
• Align resources for storage, networking
and datacenter teams and make sure that
someone on each team will make
troubleshooting installation issues a top
priority.
• OpenStack requires tight integration with all of
these elements. A slow troubleshooting
feedback loop will have a very negative effect
on the deployment.
• Understand what deployment choices are
difficult to change afterwards and make
sure that you got them right.
• Assume multiple tries to get a production
ready configuration.
• Deployment
• Deployments take a
long time and are
complex
• Some OpenStack
functionality is not ready
for production

Problems to Solve:
Our Experiences
• Have a close relationship with your early
adopters. They will help you increase the
resiliency of your deployment.
• Regularly speak with them in person to help them
understand OpenStack and to let them tell you
about issues before they become a problem.
• Get deployments into your users' hands as
soon as possible.
• Do not stall getting to production. Teams will
not want to code to an API that they cannot
use in production.
• Adoption will be limited until you can get
production availability.
• Solving problems “just for development
environments” is the wrong mentality.
• Early feedback is critical.
• Adoption
• Adoption is one of the
most critical elements to
success.

Problems to Solve:
Our Experiences
• Monitor OpenStack by actually using
OpenStack. Build instances and use
OpenStack functionality to detect failures.
• OpenStack is very complex and understanding
the effect of a failure can be difficult.
• If you monitor by using OpenStack you will
catch most failures before your users do and
know what functionality is impacted.
• Automate common operational and
maintenance tasks.
• OpenStack HA is complex but needed for
all environments.
• Productionization
• OpenStack provides
building blocks but
some assembly is
required to build a
product out of it.
• Monitoring and common
operational tasks are
not solved out of the
box.
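
The idea of monitoring OpenStack by actually using it can be sketched as a synthetic check that walks through an instance lifecycle and reports the first step that fails. This is only a minimal illustration, not the framework described in this deck: `CheckResult`, `run_synthetic_check`, `check_instance_lifecycle` and the `FakeCloud` stand-in are hypothetical names, and a real check would call an actual OpenStack client instead of the fake.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class CheckResult:
    ok: bool
    failed_step: Optional[str] = None  # first step that failed, if any
    detail: str = ""


def run_synthetic_check(steps: List[Tuple[str, Callable[[], None]]],
                        cleanup: Callable[[], None]) -> CheckResult:
    """Run steps in order, stop at the first failure, always attempt cleanup."""
    try:
        for name, action in steps:
            try:
                action()
            except Exception as exc:
                # The failing step name tells operators which functionality is impacted.
                return CheckResult(ok=False, failed_step=name, detail=str(exc))
        return CheckResult(ok=True)
    finally:
        try:
            cleanup()
        except Exception:
            pass  # best effort; a real check would likely report leaked resources


class FakeCloud:
    """Hypothetical stand-in for a real client so the sketch runs anywhere."""

    def __init__(self, broken_step: Optional[str] = None):
        self.broken_step = broken_step
        self.instances = set()

    def _step(self, name: str) -> None:
        if name == self.broken_step:
            raise RuntimeError(f"{name} did not complete")

    def boot_instance(self) -> None:
        self._step("boot_instance")
        self.instances.add("probe")

    def attach_volume(self) -> None:
        self._step("attach_volume")

    def ping_instance(self) -> None:
        self._step("ping_instance")

    def delete_instance(self) -> None:
        self.instances.discard("probe")


def check_instance_lifecycle(cloud) -> CheckResult:
    """Exercise compute, storage and network paths with one probe instance."""
    steps = [
        ("boot_instance", cloud.boot_instance),
        ("attach_volume", cloud.attach_volume),
        ("ping_instance", cloud.ping_instance),
    ]
    return run_synthetic_check(steps, cleanup=cloud.delete_instance)
```

Calling `check_instance_lifecycle(FakeCloud(broken_step="attach_volume"))` returns a result whose `failed_step` is `"attach_volume"`, which is the kind of signal that could be forwarded to a monitoring system.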

Phased Environments…
Prototype
• Single machine all
in one deployment
• Learn basics
• Validate direction
• Disposable
environment
Interim
• Break apart compute
and control
• Limited release to
early adopters
• Get feedback and
determine desired
configuration
DevQa
• Highly available
environment
• Treated like production
• General availability for
development workloads
• Determine
productionization tasks
needed
Production
• Implement
productionization
tasks
• Deploy production
clouds

What wound up happening…
Prototype Interim DevQa Production

Took too long to get
to production…
• Critical team member left
• Took too long finding a
replacement due to focus on
hiring OpenStack skillset.
• Additional work for monitoring
and operations automation
were required before we were
confident hosting production
workloads.
• Required skillsets that were not
a part of the OpenStack team
and focused manpower.

Solution: Create a focus squad
• Kicked off a six-week effort with a cross-functional team that had
all required skills.
• This team would focus 100% on getting OpenStack to live.
• OpenStack tasks must be top priority for all team members.
• Director quote: “Set your email to out of office if you have to.”
• The focused effort was incredibly efficient.
• Feedback loops for troubleshooting massively reduced.
• Reduction of blocked tasks created a higher quality implementation.

What did the focus squad do?
• Created a reliable monitoring solution based on Zabbix and a Python
framework for executing OpenStack checks.
• Created automated recovery for problems discovered in DevQa.
• Automated compute node evacuation
• Automated failed OpenStack service recovery
• Increased visibility into the environment with Zabbix and Grafana.
• Automated common operational tasks as push-button jobs in
Rundeck.
• Taking a compute or control node out of service
• Restarting OpenStack services
• Deployed all production OpenStack, Zabbix and Rundeck
infrastructure.
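
Automated recovery of failed services could look roughly like the loop below: check each service, restart unhealthy ones a bounded number of times, and escalate whatever does not come back. This is a sketch under stated assumptions, not the team's actual tooling. `recover_services`, `is_healthy`, `restart` and `max_restarts` are hypothetical names, and the health and restart callables would have to be supplied by the surrounding monitoring and job system.

```python
from typing import Callable, Dict, Iterable


def recover_services(services: Iterable[str],
                     is_healthy: Callable[[str], bool],
                     restart: Callable[[str], None],
                     max_restarts: int = 2) -> Dict[str, str]:
    """Return a status per service: 'ok', 'recovered' or 'escalate'."""
    report: Dict[str, str] = {}
    for name in services:
        if is_healthy(name):
            report[name] = "ok"
            continue
        status = "escalate"  # default if every restart attempt fails
        for _ in range(max_restarts):
            restart(name)
            if is_healthy(name):
                status = "recovered"
                break
        report[name] = status
    return report
```

Bounding the number of restarts keeps the automation from looping on a service that needs human attention; anything marked `escalate` would go to an operator.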

Tracking Success…
• Critical to getting continued commitment but hard to determine.
• We track the following metrics:
• Instance count and resource usage
• Number of teams and products leveraging OpenStack
• The number of instances created and deleted
• This can be a good indicator as to whether OpenStack was the right fit for your
organization. Indicates people using automation as opposed to manual usage.
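
The metrics above can be derived from a simple log of create and delete events. The sketch below assumes a hypothetical event format of `(project, action)` pairs; the function name `churn_summary` and the output keys are illustrative only.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def churn_summary(events: Iterable[Tuple[str, str]]) -> dict:
    """Summarize (project, action) events where action is 'create' or 'delete'."""
    created: Counter = Counter()
    deleted: Counter = Counter()
    for project, action in events:
        if action == "create":
            created[project] += 1
        elif action == "delete":
            deleted[project] += 1

    total_created = sum(created.values())
    total_deleted = sum(deleted.values())
    running = total_created - total_deleted
    # A high ratio suggests instances are created and destroyed by automation.
    ratio: Optional[float] = total_created / running if running > 0 else None
    return {
        "projects": len(created),
        "created": total_created,
        "deleted": total_deleted,
        "running": running,
        "created_per_running": ratio,
    }
```

With the figures from this deck (almost 500,000 instances created and about 1,100 running), the same ratio works out to roughly 450 instances created per running instance.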
