© 2014 VMware Inc. All rights reserved.
OpenStack: Toward a More
Resilient Cloud
Or: “How I Learned to Stop Worrying &
Embrace My Inner Cloudiness”
Mark T. Voelker
OpenStack Architect
Percona University Smart Data Event
February 12, 2015
• Who is this guy?
• A little background on OpenStack
• Building more resilient clouds
– Withstanding failures
– Quickly recovering from failures
• Questions?
@marktvoelker
• OpenStack Architect @ VMware, OpenStack ATC, Ex-StackForge Puppet
core dev, Triangle OpenStack Meetup founder, OS Foundation Member #54
• Fact: can be bribed with doughnuts
• Currently works in VMware’s Software Defined Datacenter R&D group
• In copious (hah!) spare time: OpenStack solutions, Big Data, Massively
Scalable Data Centers, Devops, making sawdust with extreme prejudice
• Tech lead, manager, software developer, architect
• Started in OpenStack in 2011 at the Diablo Design Summit
….I’ve built a few clouds.
Today’s talk won’t be overly formal….
…because I tend to get excited by this stuff.
“OpenStack is a global collaboration of developers and cloud computing
technologists producing the ubiquitous open source cloud computing
platform for public and private clouds. The project aims to deliver
solutions for all types of clouds by being simple to implement, massively
scalable, and feature rich. The technology consists of a series of
interrelated projects delivering various components for a cloud
infrastructure solution.”
-- openstack.org
Basically, it’s software to run cloud
services—including compute, network,
storage, and security—and the
community behind that software.
Source: IDG Connect Survey
• IRC Channels and Mailing Lists
• User/Meetup Groups
• Social Networking
– Twitter
– LinkedIn
– Facebook
– Ohloh
• Code in cgit, mirrored on GitHub, Bugs/Milestones in Launchpad
• For now…may move to StoryBoard in future
• Over 20 million lines of code by over 1,419 contributors
• Two Annual Design Summit/Conferences (coinciding roughly
w/releases)
• Want to contribute? Start here.
• Don’t be intimidated.
• HolycrapthingsmovereallyreallyfastinOpenStack
• Jump in feet first: be agile and flexible.
• This is going to feel a little different for some of you.
www.meetup.com/Triangle-OpenStack-Meetup
Next meetups: Feb. 24 & Mar. 11!
Now 550!
OpenStack components (with the roughly comparable AWS service where the slide shows one):
Horizon: AWS Management Console
Nova: EC2
Neutron: VPC
Swift (Object Storage): S3
Cinder (Block storage): EBS
Glance (VM Image Service)
Keystone (Identity Service)
Ceilometer (Telemetry Service)
Trove (Database Service)
Heat (Orchestration Service)
Sahara (Data Processing Service)
Library Projects
Supporting Projects
Documentation
Oslo (common code libraries)
Client libraries
Incubated Projects (may become core components in the future)
Designate (DNS service)
Zaqar (queuing service)
Gating Projects
CI & Infrastructure
DevStack (deployment script)
Tempest (integration test)
Barbican (key management)
Manila (shared FS as a service)
What’s a “resilient” cloud?
re·sil·ient
/rəˈzilyənt/
(adjective) Able to withstand or recover quickly from difficult
conditions.
• Today we’ll primarily focus on the cloud itself
• Workloads running *in* clouds are another story…but we’ve only
got one hour!
8am: “Uh-oh. Something tells me it’s going to be an interesting day in
the datacenter….”
• Hardware Failures
• OpenStack software bugs (yep, those exist)
• Underpinning software failures (database, message queue, etc)
• Operating system failures
• Network/storage/power failures
• Planned maintenance windows
• Hackers and malcontents
• Upgrades
• Automation failures
• “Whoops, did I do that?”
Some causes of outages in the past year
….did you plan for these?
Sometimes things break (in *any* system).
Withstand what you can. Quickly recover from the rest.
Because you don’t look this cute when your cloud is down.
High Availability? Sounds great--I’ll take two!
General Premise:
Assume hardware and software fail.
(because, shockingly, that actually happens)
What Does “HA” Mean in an OpenStack Cloud?
• Compute
• Multiple clusters
• Consider segmenting with Availability Zones, Host Aggregates, etc
• Consider ability to live migrate instances for hypervisor node
maintenance
• Ensure some capacity buffer for maintenance operations
• Storage
• Avoid single points of failure
• Multiple technologies can be used…but each has its own limitations
• Don’t think just Cinder here…your Glance backend and compute
storage matter too!
• Network
• Network disruptions will inevitably occur, so plan for them
• Design for control plane disruption (and pick technology accordingly)
• Control Plane
• May depend on the other things above
• Essential to keeping the cloud operational
• Data Plane
• Stuff that workloads running in the cloud depend on
High Availability Is Part of the Story….
….we need to think a bit about architecture.
(I’ll use a reference architecture from VMware Integrated OpenStack
as an example)
VIO Architecture – Logical Topology – Management
Notice something?
There’s a lot of stuff in there that isn’t OpenStack, but upon which
OpenStack depends.
VIO Architecture – RabbitMQ
• RabbitMQ is a messaging broker - an
intermediary for messaging. It gives
applications a common platform to send and
receive messages, and the messages a safe
place to live until received.
• RabbitMQ is the default AMQP server used by
OpenStack services (Qpid is also an option,
some support for 0mq). In production clouds,
this should be a highly available infrastructure
component.
• The OpenStack subcomponents (nova-
scheduler to nova-compute, for example)
communicate among themselves using this
hosted message queue service. They also
utilize the hosted Memcached services for
caching authentication tokens etc. As always,
they persist data to the Database.
• Component-to-component communication (Nova -> Neutron, for example)
is done via REST.
• For more details about the HA implementation
of RabbitMQ, please click here.
VIO Architecture – Database
• The database cluster is at the heart of the
infrastructure. Typically MySQL or MariaDB
are used, but other options such as
PostgreSQL are also supported.
• The VIO MariaDB implementation makes use
of a 3-node Galera cluster, which in itself is
Active-Active-Active. However, since some
OpenStack services enforce table locking,
reads and writes are directed to a single
node via the Load Balancers.
• Note that this database is for management
plane data. OpenStack services that store
data as part of their purpose may use
additional DB’s. For example: Ceilometer
may store meter data to MySQL, Mongo,
PostgreSQL, HBase, or DB2.
VIO Architecture – Load Balancers
• Most OpenStack Services run on the
Controllers, which are mirrored on each
controller VM and load-balanced. They
are accessible via the internal virtual IP.
• Some of the services, such as the
Dashboard, compute-api, glance-api,
keystone, cinder, neutron and
novncproxy are exposed to the end
users via the load balancer’s public virtual
IP.
• Likewise, the hosted Message Queue
(RabbitMQ) and Memcached services are
also load-balanced between 2 VMs.
• For the Database Service, the load
balancer is configured to use a primary
DB VM. In case of failure it will switch to
one of the two backup DB VMs.
• The load balancers used here are HAProxy
with Keepalived for high availability
Etc, etc, etc
• There’s a network connecting all that stuff
• It’s running (as virtual machines) on servers which have operating systems
• Things may get wonky if NTP fails and clocks are out of sync
• If DNS can’t resolve, Bad Things ™ will probably happen
• Consider whether you want active/active or active/passive
• Setup and tooling differs a bit, but I generally like active/active
• Note that docs.openstack.org has an HA Guide
• Currently undergoing lots of updates…patches welcome!
• Prioritize HA for the control plane
• That also means thinking about your database, network, and RPC bus
• Note: HA == more hardware
• Some components need at least 3 nodes
• Mitigate by virtualizing control plane
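Why three nodes and not two? For components that rely on majority quorum (Galera is one example), the cluster stays up only while more than half of its members can see each other. A minimal Python sketch of that arithmetic; the function names are made up for illustration and are not from any OpenStack tool:

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """Members that can fail while a majority is still reachable."""
    return nodes - quorum_size(nodes)


if __name__ == "__main__":
    # Columns: cluster size, quorum size, failures tolerated.
    for n in (1, 2, 3, 5):
        print(n, quorum_size(n), tolerated_failures(n))
```

A two-node cluster tolerates no more failures than a single node does, which is why three tends to be the practical minimum for quorum-based pieces.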
• Stuff OpenStack needs to run: message brokers
• Check out RabbitMQ clustering and mirrored queues
• Check out Galera for MySQL/MariaDB
• I often see Percona XtraDB in the wild
• Frontend with an HAProxy/Keepalived pair
• Memcached for caching
• Don’t do rabbit clustering over a WAN
• Be aware of the SELECT…FOR UPDATE issue
• Long story short: Neutron and some parts of Nova invoke an SQL
pattern known as “SELECT…FOR UPDATE” which Galera doesn’t
support due to issues with cross-node locking.
• Can cause deadlock-like symptoms due to locks not being
replicated.
• Neutron/nova code being refactored, but will likely not be done
soon.
• Meanwhile: use HAProxy to send writes to a single Galera node
and you should be fine
• With the obvious scalability bottleneck
• More info here, here, & here.
• Thank Jay Pipes of Mirantis & Peter Boros of Percona for the find!
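To make the "send writes to a single Galera node" workaround concrete, here is a rough Python sketch that renders an HAProxy listen section in which only the first database node is active and the rest are marked with HAProxy's backup keyword. The helper name, section name, and addresses are illustrative assumptions, not taken from VIO or any real deployment:

```python
from typing import List, Tuple


def galera_listen_block(vip: str, nodes: List[Tuple[str, str]], port: int = 3306) -> str:
    """Render an HAProxy 'listen' section with one active Galera node.

    The first node takes all traffic; the others carry the 'backup'
    keyword, so HAProxy only uses them when the first fails its check.
    """
    lines = [
        "listen galera",               # section name is arbitrary
        f"    bind {vip}:{port}",
        "    mode tcp",
    ]
    for index, (name, address) in enumerate(nodes):
        role = "" if index == 0 else " backup"
        lines.append(f"    server {name} {address}:{port} check{role}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical addresses from the documentation range 192.0.2.0/24.
    print(galera_listen_block(
        "192.0.2.10",
        [("db1", "192.0.2.11"), ("db2", "192.0.2.12"), ("db3", "192.0.2.13")],
    ))
```

Note that a bare check is only a TCP connect test. It says nothing about whether the node is synced with the cluster, so a real setup would likely want a Galera-aware health check on top.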
• Pick a highly available storage to back Glance
• Pick a highly available storage backend for Cinder too
– SAN, distributed, software defined, plethora of options here
• Use Keepalived/HAProxy to front-end multiple API servers
• Or another load balancer technology of your choice
• Can be deployed as dedicated nodes for scale, or cohabitate
• Data plane network: DVR & Provider Network Extensions
• Distributed Virtual Routers are a new experimental feature in Juno (not yet
ready for production)
• Please go test it and report/fix bugs!
• Provider networks essentially punt the availability issue to your physical
network
• Allows you to use standard tools like virtual port channels and VRRP
• Also highly performant
• Network: consider your backend technology
– Neutron offers a variety of plugins for various open source and vendor-supplied
network technology
– Physical networks need usual redundancy protections
– Overlays are popular for segmentation/isolation; some scale better than others
– Shameless plug: check out VMware NSX which has been used in some very
large OpenStack deployments!
• Actually, most of these techniques and technologies are things that
seasoned developers and sysadmins have used before.
• It doesn’t take a genius to learn lessons from the past and apply
them, tweak them, and tune them (but it’s a fair amount of work).
Simple Rules of Thumb
Planning for availability can go to extreme levels, so start simple:
– Can I take any one [server | switch | storage unit] out of service in my control
plane and still be operational?
– For all of the above, what’s the impact?
• Performance hit?
• Capacity loss?
• World is broken?!?
– For all the above, how easy is it to reintroduce a repaired/replaced $thing?
• Is there a recovery period that will further impact performance?
• Is it a complex procedure?
• Does the procedure cause more $things to be temporarily unavailable?
– For all of the above, how can I monitor & alert for failure?
Once you have that down, dig deeper to your heart’s content.
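The first rule of thumb ("can I take any one thing out of service?") is easy to check mechanically once you write down where things run. A small Python sketch, assuming a hand-maintained map of service name to the nodes hosting it; the function name and inventory are hypothetical:

```python
from typing import Dict, List, Set


def single_failure_impact(placement: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each node, list the services that would have no instance left
    if that one node were taken out of service."""
    all_nodes = set().union(*placement.values())
    impact = {}
    for node in sorted(all_nodes):
        # A service is lost when the failed node was its only host.
        impact[node] = [
            service
            for service, hosts in sorted(placement.items())
            if hosts == {node}
        ]
    return impact


if __name__ == "__main__":
    # Hypothetical control plane layout.
    inventory = {
        "keystone": {"ctl1", "ctl2"},
        "rabbitmq": {"ctl1", "ctl2"},
        "mariadb": {"db1", "db2", "db3"},
        "dns": {"util1"},
    }
    for node, lost in single_failure_impact(inventory).items():
        print(node, lost or "ok")
```

This only answers the "still operational?" question. Performance and capacity impact need real measurements.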
Recover Quickly
Rule 1: Assume You Will Need to Change Stuff
• Change can be a lot of things:
– Hardware or software upgrades/patches/replacements
– Configuration tweaks
– Adding or subtracting capacity
– All systems change over time; OpenStack clouds are no exception.
“Change in OpenStack? Yeah, I’m good with that…”
Rule 2: Assume You Can’t Manually Log In To All
The Nodes To Make Those Changes
• OpenStack is a series of cooperating distributed systems
– That means you could (potentially) have a lot of nodes
– Software & config must often be placed on many machines
– Manual changes == slow changes != quick recovery
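To put a shape on "manual changes == slow changes": the core trick of most automation tooling is simply running the same change against many nodes at once and collecting the results. A bare-bones Python sketch using the standard library thread pool; apply_everywhere and the change callable are placeholders, and real tools add transport, authentication, and retries on top:

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def apply_everywhere(nodes: List[str], change: Callable[[str], str],
                     workers: int = 8) -> Dict[str, str]:
    """Run 'change' against every node concurrently and collect results.

    Exceptions are recorded per node rather than aborting the whole run.
    """
    def safe(node: str) -> str:
        try:
            return change(node)
        except Exception as exc:  # report the failure, keep going
            return f"failed: {exc}"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as 'nodes'.
        return dict(zip(nodes, pool.map(safe, nodes)))
```

In practice the change callable would wrap whatever remote execution you already trust (SSH, an agent, a config management run).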
“I guess multitasking only speeds things up to a certain extent…”
Rule 3: Assume You Will Need To Test Stuff
• It’s a good idea to have a small test cloud where you can examine the
impact of changes
• When possible, roll out changes to a portion of your cloud and evaluate
before rolling out the rest
– Note: this means you need tests and monitoring…otherwise you don’t know
what “ok” looks like
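The "roll out to a portion, evaluate, then do the rest" idea can be sketched in a few lines of Python. Everything here is hypothetical scaffolding: apply_change and healthy stand in for whatever deployment step and monitoring check you actually have.

```python
from typing import Callable, List, Sequence


def staged_rollout(nodes: Sequence[str],
                   apply_change: Callable[[str], None],
                   healthy: Callable[[str], bool],
                   canary_count: int = 1) -> List[str]:
    """Apply a change to a few canary nodes, verify them, then do the rest.

    Returns the nodes that were changed. Stops after the canaries if any
    of them fails its health check, leaving the remaining nodes untouched.
    """
    canaries = list(nodes[:canary_count])
    rest = list(nodes[canary_count:])
    changed = []
    for node in canaries:
        apply_change(node)
        changed.append(node)
    if not all(healthy(node) for node in canaries):
        return changed  # evaluation failed: do not touch the rest
    for node in rest:
        apply_change(node)
        changed.append(node)
    return changed
```

The health check is the part that matters: without tests and monitoring behind it, the gate in the middle is just decoration.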
“It’s 3am and I’m still debugging in production…maybe I should have taken the
time to set up a test environment and automate some testing after all…”
Pile of Bash Scripts
• Software developers and operators are increasingly the same
people.
– Agile development
– Automate (almost) everything
– Treat config & changes as you would code
– Continuous integration, testing, deployment
– Incremental change & iteration
– Unified tooling & versioning
– Critical approach to working at scale
– Really useful for building resilient clouds
Image courtesy of Rajiv Pant (http://en.wikipedia.org/wiki/File:Devops.png)
How Configuration Management Tools Help
• Can orchestrate deployment….and re-deployment.
• Most can idempotently check configuration (no-op if everything is ok)
• Can touch many nodes in parallel
• Can type much faster and more accurately than you
• Are a great way to collaborate amongst teams of operators
• Most have strong communities within the OpenStack universe
– Using a commercial OpenStack? Most vendors are using one of these tools
to deploy and manage your cloud, whether you know it or not.
– Rolling your own? Check out StackForge for tons of Ansible/Puppet/Chef
modules you can use today
• Allow you to manage other things besides OpenStack itself
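If "idempotently check configuration" sounds abstract, this is roughly what it means: describe the desired state, change things only when reality differs, and report whether anything changed. A toy Python sketch (ensure_line is an illustrative helper, not the API of Puppet, Chef, or Ansible):

```python
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Make sure 'line' is present in the file at 'path'.

    Returns True if the file was changed, False if it was already in the
    desired state (the no-op case).
    """
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # already correct: do nothing
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True
```

Running it twice in a row changes the file once and then does nothing, which is what makes it safe to re-run across a whole fleet.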
• I’ve worked on a lot of OpenStack clouds and almost everyone has
their own preferred monitoring toolset.
• One possible exception: almost everybody seems to love Graphite.
• The golden rule is: use the tools that work for you!
• Very often this will be whatever you’re using in the rest of your infrastructure.
• Break it down into at least two buckets:
• Up/down and alerting (ex: Nagios or its derivatives…yes, there are
OpenStack plugins out there on NagiosExchange)
• Trending data collection/plotting (ex: collectd/statsd feeding graphite)
• Don’t forget logging
• LogInsight, Logstash, etc.
• Also: use your peers!
• Operators often willing to share, so ask on the openstack-operators list.
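The two buckets can be illustrated with a few lines of Python: a TCP up/down probe mapped to the conventional Nagios plugin exit codes (0 for OK, 2 for CRITICAL), and a formatter for Graphite's plaintext protocol (metric path, value, Unix timestamp). The function and metric names are invented for the example; real plugins do considerably more.

```python
import socket
import time
from typing import Optional

OK, CRITICAL = 0, 2  # conventional Nagios plugin exit codes


def port_is_up(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def nagios_status(up: bool) -> int:
    """Map an up/down result to a Nagios-style exit code."""
    return OK if up else CRITICAL


def graphite_line(metric: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one sample in Graphite's plaintext protocol."""
    ts = int(time.time()) if timestamp is None else timestamp
    return f"{metric} {value} {ts}\n"
```

A TCP connect only proves that something is listening. An API-level check (authenticate, list something) tells you far more about whether the service is actually working.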
• Ok, this could take another hour, so I’ll just hit a few highlights…
• Make use of OpenStack’s segregation features
– Availability zones, host aggregates, regions, server groups for compute
– Regions and zones for Swift
• Plan to make infrastructure maintenance less impacting
– Put API servers behind load balancers
– Virtualize tenant-facing parts of the control plane for greater scale and mobility
– Use host evacuation and live migration to reduce impact
– OpenStack is extremely pluggable…choose your backends wisely
• You should know how to operate, monitor, and troubleshoot them
• Understand how their drivers interact with OpenStack
• You should be comfortable with their failure and recovery modes
Questions?
@marktvoelker
http://openstack.org/
http://www.vmware.com/products/openstack/
Thank you!

More Related Content

What's hot

(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective
(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective
(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective
StackStorm
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre EvolutionGavin McCance
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
buildacloud
 
What's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with PuppetWhat's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with Puppet
Mark Voelker
 
OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...
OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...
OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...
Edureka!
 
Build public private cloud using openstack
Build public private cloud using openstackBuild public private cloud using openstack
Build public private cloud using openstackFramgia Vietnam
 
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)
Mirantis
 
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
Giuseppe Paterno'
 
OpenStack + VMware: Everything You Need to Know (Kilo-edition)
OpenStack + VMware: Everything You Need to Know (Kilo-edition)OpenStack + VMware: Everything You Need to Know (Kilo-edition)
OpenStack + VMware: Everything You Need to Know (Kilo-edition)
Dan Wendlandt
 
OpenStack Scale-out Networking Architecture
OpenStack Scale-out Networking ArchitectureOpenStack Scale-out Networking Architecture
OpenStack Scale-out Networking Architecture
Randy Bias
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalley
buildacloud
 
An Introduction to OpenStack
An Introduction to OpenStackAn Introduction to OpenStack
An Introduction to OpenStack
Scott Lowe
 
OpenStack Framework Introduction
OpenStack Framework IntroductionOpenStack Framework Introduction
OpenStack Framework Introduction
Jason TC HOU (侯宗成)
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebula Project
 
Successfully Deliver and Operate OpenStack in Production with VMware VIO
Successfully Deliver and Operate OpenStack in Production with VMware VIOSuccessfully Deliver and Operate OpenStack in Production with VMware VIO
Successfully Deliver and Operate OpenStack in Production with VMware VIO
Arraya Solutions
 
VMUG22 Filip Verloy VIO
VMUG22 Filip Verloy VIOVMUG22 Filip Verloy VIO
VMUG22 Filip Verloy VIOFilip Verloy
 
Bridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware AdministratorsBridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware Administrators
Kenneth Hui
 
Introduction to openstack
Introduction to openstackIntroduction to openstack
Introduction to openstack
Yaniv Zadka
 
Cloud Computing Open Stack Compute Node
Cloud Computing Open Stack Compute NodeCloud Computing Open Stack Compute Node
Cloud Computing Open Stack Compute Node
Palak Sood
 
Oct meetup open stack 101 clean
Oct meetup open stack 101   cleanOct meetup open stack 101   clean
Oct meetup open stack 101 clean
benrodrigue
 

What's hot (20)

(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective
(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective
(SCALE 12x) OpenStack vs. VMware - A System Administrator Perspective
 
CERN Data Centre Evolution
CERN Data Centre EvolutionCERN Data Centre Evolution
CERN Data Centre Evolution
 
The Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep VittalThe Future of SDN in CloudStack by Chiradeep Vittal
The Future of SDN in CloudStack by Chiradeep Vittal
 
What's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with PuppetWhat's New in Grizzly & Deploying OpenStack with Puppet
What's New in Grizzly & Deploying OpenStack with Puppet
 
OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...
OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...
OpenStack Cloud Tutorial | What is OpenStack | OpenStack Tutorial | OpenStack...
 
Build public private cloud using openstack
Build public private cloud using openstackBuild public private cloud using openstack
Build public private cloud using openstack
 
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)
2 Day Bootcamp for OpenStack--Cloud Training by Mirantis (Preview)
 
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...OpenStack Explained: Learn OpenStack architecture and the secret of a success...
OpenStack Explained: Learn OpenStack architecture and the secret of a success...
 
OpenStack + VMware: Everything You Need to Know (Kilo-edition)
OpenStack + VMware: Everything You Need to Know (Kilo-edition)OpenStack + VMware: Everything You Need to Know (Kilo-edition)
OpenStack + VMware: Everything You Need to Know (Kilo-edition)
 
OpenStack Scale-out Networking Architecture
OpenStack Scale-out Networking ArchitectureOpenStack Scale-out Networking Architecture
OpenStack Scale-out Networking Architecture
 
Introduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David NalleyIntroduction to Apache CloudStack by David Nalley
Introduction to Apache CloudStack by David Nalley
 
An Introduction to OpenStack
An Introduction to OpenStackAn Introduction to OpenStack
An Introduction to OpenStack
 
OpenStack Framework Introduction
OpenStack Framework IntroductionOpenStack Framework Introduction
OpenStack Framework Introduction
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
Successfully Deliver and Operate OpenStack in Production with VMware VIO
Successfully Deliver and Operate OpenStack in Production with VMware VIOSuccessfully Deliver and Operate OpenStack in Production with VMware VIO
Successfully Deliver and Operate OpenStack in Production with VMware VIO
 
VMUG22 Filip Verloy VIO
VMUG22 Filip Verloy VIOVMUG22 Filip Verloy VIO
VMUG22 Filip Verloy VIO
 
Bridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware AdministratorsBridging The Gap: Explaining OpenStack To VMware Administrators
Bridging The Gap: Explaining OpenStack To VMware Administrators
 
Introduction to openstack
Introduction to openstackIntroduction to openstack
Introduction to openstack
 
Cloud Computing Open Stack Compute Node
Cloud Computing Open Stack Compute NodeCloud Computing Open Stack Compute Node
Cloud Computing Open Stack Compute Node
 
Oct meetup open stack 101 clean
Oct meetup open stack 101   cleanOct meetup open stack 101   clean
Oct meetup open stack 101 clean
 

Similar to OpenStack: Toward a More Resilient Cloud

Cloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackCloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: Openstack
Microsoft
 
OpenStack 101
OpenStack 101OpenStack 101
OpenStack 101
All Things Open
 
Getting Started with Apache CloudStack
Getting Started with Apache CloudStackGetting Started with Apache CloudStack
Getting Started with Apache CloudStack
Joe Brockmeier
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stackNitin Mehta
 
Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014
Miguel Zuniga
 
CloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWestCloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWestke4qqq
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
Vinay Rao
 
PowerPoint Presentation
PowerPoint PresentationPowerPoint Presentation
PowerPoint Presentation
lalitjangra9
 
Openstack presentation
Openstack presentationOpenstack presentation
Openstack presentationSankalp Jain
 
Taking the open cloud to 11
Taking the open cloud to 11Taking the open cloud to 11
Taking the open cloud to 11
Joe Brockmeier
 
OpenStack Deployment in the Enterprise
OpenStack Deployment in the Enterprise OpenStack Deployment in the Enterprise
OpenStack Deployment in the Enterprise
Cisco Canada
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
Chiradeep Vittal
 
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxBetter, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
John Burwell
 
Cloud Native Camel Riding
Cloud Native Camel RidingCloud Native Camel Riding
Cloud Native Camel Riding
Christian Posta
 
Stackato v2
Stackato v2Stackato v2
Stackato v2
Jonas Brømsø
 
Chicago Microservices Integration Talk
Chicago Microservices Integration TalkChicago Microservices Integration Talk
Chicago Microservices Integration Talk
Christian Posta
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
Steve Wong
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWSTom Laszewski
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
Susan Wu
 

Similar to OpenStack: Toward a More Resilient Cloud (20)

Cloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: OpenstackCloud Architect Alliance #15: Openstack
Cloud Architect Alliance #15: Openstack
 
OpenStack 101
OpenStack 101OpenStack 101
OpenStack 101
 
Getting Started with Apache CloudStack
Getting Started with Apache CloudStackGetting Started with Apache CloudStack
Getting Started with Apache CloudStack
 
Hacking apache cloud stack
Hacking apache cloud stackHacking apache cloud stack
Hacking apache cloud stack
 
Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014Private cloud cloud-phoenix-april-2014
Private cloud cloud-phoenix-april-2014
 
Txlf2012
Txlf2012Txlf2012
Txlf2012
 
CloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWestCloudStack - LinuxFest NorthWest
CloudStack - LinuxFest NorthWest
 
State of the Container Ecosystem
State of the Container EcosystemState of the Container Ecosystem
State of the Container Ecosystem
 
PowerPoint Presentation
PowerPoint PresentationPowerPoint Presentation
PowerPoint Presentation
 
Openstack presentation
Openstack presentationOpenstack presentation
Openstack presentation
 
Taking the open cloud to 11
Taking the open cloud to 11Taking the open cloud to 11
Taking the open cloud to 11
 
OpenStack Deployment in the Enterprise
OpenStack Deployment in the Enterprise OpenStack Deployment in the Enterprise
OpenStack Deployment in the Enterprise
 
Directions for CloudStack Networking
Directions for CloudStack  NetworkingDirections for CloudStack  Networking
Directions for CloudStack Networking
 
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs reduxBetter, faster, cheaper infrastructure with apache cloud stack and riak cs redux
Better, faster, cheaper infrastructure with apache cloud stack and riak cs redux
 
Cloud Native Camel Riding
Cloud Native Camel RidingCloud Native Camel Riding
Cloud Native Camel Riding
 
Stackato v2
Stackato v2Stackato v2
Stackato v2
 
Chicago Microservices Integration Talk
Chicago Microservices Integration TalkChicago Microservices Integration Talk
Chicago Microservices Integration Talk
 
Introduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OSIntroduction to Apache Mesos and DC/OS
Introduction to Apache Mesos and DC/OS
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWSMigrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 

More from Mark Voelker

Open Clouds: The New Primitives in Enterprise IT & Mobile Networks
Open Clouds: The New Primitives in Enterprise IT & Mobile NetworksOpen Clouds: The New Primitives in Enterprise IT & Mobile Networks
Open Clouds: The New Primitives in Enterprise IT & Mobile Networks
Mark Voelker
 
Open Source in the Era of 5G - All Things Open 2018
Open Source in the Era of 5G - All Things Open 2018Open Source in the Era of 5G - All Things Open 2018
Open Source in the Era of 5G - All Things Open 2018
Mark Voelker
 
OpenStack & the Evolving Cloud Ecosystem
OpenStack & the Evolving Cloud EcosystemOpenStack & the Evolving Cloud Ecosystem
OpenStack & the Evolving Cloud Ecosystem
Mark Voelker
 
Interoperable Clouds and How to Build (or Buy) Them
Interoperable Clouds and How to Build (or Buy) ThemInteroperable Clouds and How to Build (or Buy) Them
Interoperable Clouds and How to Build (or Buy) Them
Mark Voelker
 
InteropWG Intro & Vertical Programs (May. 2017)
InteropWG Intro & Vertical Programs (May. 2017)InteropWG Intro & Vertical Programs (May. 2017)
InteropWG Intro & Vertical Programs (May. 2017)
Mark Voelker
 
Considerations for Operating An OpenStack Cloud
Considerations for Operating An OpenStack CloudConsiderations for Operating An OpenStack Cloud
Considerations for Operating An OpenStack Cloud
Mark Voelker
 

OpenStack: Toward a More Resilient Cloud

  • 1. © 2014 VMware Inc. All rights reserved. OpenStack: Toward a More Resilient Cloud Or: “How I Learned to Stop Worrying & Embrace My Inner Cloudiness” Mark T. Voelker OpenStack Architect Percona University Smart Data Event February 12, 2015
  • 2. • Who is this guy? • A little background on OpenStack • Building more resilient clouds – Withstanding failures – Quickly recovering from failures • Questions?
  • 3. @marktvoelker • OpenStack Architect @ VMware, OpenStack ATC, Ex-StackForge Puppet core dev, Triangle OpenStack Meetup founder, OS Foundation Member #54 • Fact: can be bribed with doughnuts • Currently works in VMware’s Software Defined Datacenter R&D group • In copious (hah!) spare time: OpenStack solutions, Big Data, Massively Scalable Data Centers, Devops, making sawdust with extreme prejudice
  • 4. • Tech lead, manager, software developer, architect • Started in OpenStack in 2011 at the Diablo Design Summit
  • 5. ….I’ve built a few clouds.
  • 6. Today’s talk won’t be overly formal….
  • 7. …because I tend to get excited by this stuff.
  • 8. • Who is this guy? • A little background on OpenStack • Building more resilient clouds – Withstanding failures – Quickly recovering from failures • Questions?
  • 9. “OpenStack is a global collaboration of developers and cloud computing technologists producing the ubiquitous open source cloud computing platform for public and private clouds. The project aims to deliver solutions for all types of clouds by being simple to implement, massively scalable, and feature rich. The technology consists of a series of interrelated projects delivering various components for a cloud infrastructure solution.” -- openstack.org Basically, it’s software to run cloud services—including compute, network, storage, and security—and the community behind that software.
  • 10.
  • 11.
  • 13. • IRC Channels and Mailing Lists • User/Meetup Groups • Social Networking – Twitter – LinkedIn – Facebook – Ohloh • Code in cgit, mirrored on GitHub, Bugs/Milestones in Launchpad • For now…may move to StoryBoard in future • Over 20 million lines of code by over 1,419 contributors • Two Annual Design Summit/Conferences (coinciding roughly w/releases) • Want to contribute? Start here.
  • 14. • Don’t be intimidated. • HolycrapthingsmovereallyreallyfastinOpenStack • Jump in feet first: be agile and flexible. • This is going to feel a little different for some of you.
  • 16. Horizon Nova Neutron Swift (Object Storage) Cinder (Block storage) Glance (VM Image Service) Keystone (Identity Service) AWS Management Console EC2 VPC S3 EBS Ceilometer (Telemetry Service) Trove (Database Service) Heat (Orchestration Service) Sahara (Data Processing Service)
  • 17. Library Projects Supporting Projects Documentation Oslo (common code libraries) Client libraries Incubated Projects (may become core components in the future) Designate (DNS service) Zaqar (queuing service) Gating Projects CI & Infrastructure DevStack (deployment script) Tempest (integration test) Barbican (key management) Manila (shared FS as a service)
  • 18.
  • 19. • Who is this guy? • A little background on OpenStack • Building more resilient clouds – Withstanding failures – Quickly recovering from failures • Questions?
  • 20. What’s a “resilient” cloud? re·sil·ient /rəˈzilyənt/ (adjective) Able to withstand or recover quickly from difficult conditions.
  • 21. • Today we’ll primarily focus on the cloud itself • Workloads running *in* clouds are another story…but we’ve only got one hour!
  • 22. 8am: “Uh-oh. Something tells me it’s going to be an interesting day in the datacenter….”
  • 23. • Hardware Failures • OpenStack software bugs (yep, those exist) • Underpinning software failures (database, message queue, etc) • Operating system failures • Network/storage/power failures • Planned maintenance windows • Hackers and malcontents • Upgrades • Automation failures • “Whoops, did I do that?”
  • 24. Some causes of outages in the past year ….did you plan for these? CONFIDENTIAL 24
  • 25. Sometimes things break (in *any* system). Withstand what you can. Quickly recover from the rest. Because you don’t look this cute when your cloud is down.
  • 26.
  • 28. General Premise: Assume hardware and software fail. (because, shockingly, that actually happens)
  • 29. What Does “HA” Mean in an OpenStack Cloud? • Compute • Multiple clusters • Consider segmenting with Availability Zones, Host Aggregates, etc • Consider ability to live migrate instances for hypervisor node maintenance • Ensure some capacity buffer for maintenance operations • Storage • Avoid single points of failure • Multiple technologies can be used…but each has its own limitations • Don’t think just Cinder here…your Glance backend and compute storage matter too! • Network • Network disruptions will inevitably occur, so plan for them • Design for control plane disruption (and pick technology accordingly) • Control Plane • May depend on the other things above • Essential to keeping the cloud operational • Data Plane • Stuff that workloads running in the cloud depend on
  • 30. High Availability Is Part of the Story…. ….we need to think a bit about architecture. (I’ll use a reference architecture from VMware Integrated OpenStack as an example)
  • 31. VIO Architecture – Logical Topology – Management
  • 32. Notice something? There’s a lot of stuff in there that isn’t OpenStack, but upon which OpenStack depends.
  • 33. VIO Architecture – RabbitMQ • RabbitMQ is a messaging broker - an intermediary for messaging. It gives applications a common platform to send and receive messages, and the messages a safe place to live until received. • RabbitMQ is the default AMQP server used by OpenStack services (Qpid is also an option, some support for 0mq). In production clouds, this should be a highly available infrastructure component. • The OpenStack subcomponents (nova-scheduler to nova-compute, for example) communicate among themselves using this hosted message queue service. They also utilize the hosted Memcached services for caching authentication tokens etc. As always, they persist data to the Database. • Component-to-component communication (Nova -> Neutron) is done via REST. • For more details about the HA implementation of RabbitMQ, please click here.
  • 34. VIO Architecture – Database • The database cluster is at the heart of the infrastructure. Typically MySQL or MariaDB is used, but other options such as PostgreSQL are also supported. • The VIO MariaDB implementation makes use of a 3-node Galera cluster, which in itself is Active-Active-Active. However, since some OpenStack services enforce table locking, reads and writes are directed to a single node via the Load Balancers. • Note that this database is for management plane data. OpenStack services that store data as part of their purpose may use additional DBs. For example: Ceilometer may store meter data to MySQL, Mongo, PostgreSQL, HBase, or DB2.
  • 35. VIO Architecture – Load Balancers • Most OpenStack Services run on the Controllers, which are mirrored on each controller VM and load-balanced. They are accessible via the internal virtual IP. • Some of the services, such as the Dashboard, compute-api, glance-api, keystone, cinder, neutron and novncproxy are exposed to the end users via the load balancer’s public virtual IP. • Likewise, the hosted Message Queue (RabbitMQ) and Memcached services are also load-balanced between 2 VMs. • For the Database Service, the load balancer is configured to use a primary DB VM. In case of failure it will switch to one of the two backup DB VMs. • Load Balancers used here are HAProxy with Keepalived for high availability.
  • 36. Etc, etc, etc • There’s a network connecting all that stuff • It’s running (as virtual machines) on servers which have operating systems • Things may get wonky if NTP fails and clocks are out of sync • If DNS can’t resolve, Bad Things ™ will probably happen
  • 37. • Consider whether you want active/active or active/passive • Setup and tooling differs a bit, but I generally like active/active • Note that docs.openstack.org has an HA Guide • Currently undergoing lots of updates…patches welcome! • Prioritize HA for the control plane • That also means thinking about your database, network, and RPC bus • Note: HA == more hardware • Some components need at least 3 nodes • Mitigate by virtualizing control plane
  • 38. • Stuff OpenStack needs to run: message brokers • Check out RabbitMQ clustering and mirrored queues • Check out Galera for MySQL/MariaDB • I often see Percona XtraDB in the wild • Frontend with an HAProxy/Keepalived pair • Memcached for caching
  • 39. • Don’t do rabbit clustering over a WAN • Be aware of the SELECT… FOR UPDATE issue
  • 40. • Long story short: Neutron and some parts of Nova invoke an SQL pattern known as “SELECT…FOR UPDATE” which Galera doesn’t support due to issues with cross-node locking. • Can cause deadlock-like symptoms due to locks not being replicated. • Neutron/nova code being refactored, but will likely not be done soon. • Meanwhile: use HAProxy to send writes to a single Galera node and you should be fine • With the obvious scalability bottleneck • More info here, here, & here. • Thank Jay Pipes of Mirantis & Peter Boros of Percona for the find!
  • 41. • Pick a highly available storage backend for Glance • Pick a highly available storage backend for Cinder too – SAN, distributed, software defined, plethora of options here • Use Keepalived/HAProxy to front-end multiple API servers • Or another load balancer technology of your choice • Can be deployed as dedicated nodes for scale, or cohabitate • Data plane network: DVR & Provider Network Extensions • Distributed Virtual Routers are a new experimental feature in Juno (not yet ready for production) • Please go test it and report/fix bugs! • Provider networks essentially punt the availability issue to your physical network • Allows you to use standard tools like virtual port channels and VRRP • Also highly performant
  • 42. • Network: consider your backend technology – Neutron offers a variety of plugins for various open source and vendor-supplied network technology – Physical networks need usual redundancy protections – Overlays are popular for segmentation/isolation; some scale better than others – Shameless plug: check out VMware NSX which has been used in some very large OpenStack deployments!
  • 43. • Actually, most of these techniques and technologies are things that seasoned developers and sysadmins have used before. • It doesn’t take a genius to learn lessons from the past and apply them, tweak them, and tune them (but it’s a fair amount of work).
  • 44. Simple Rules of Thumb Planning for availability can go to extreme levels, so start simple: – Can I take any one [server | switch | storage unit] out of service in my control plane and still be operational? – For all of the above, what’s the impact? • Performance hit? • Capacity loss? • World is broken?!? – For all the above, how easy is it to reintroduce a repaired/replaced $thing? • Is there a recovery period that will further impact performance? • Is it a complex procedure? • Does the procedure cause more $things to be temporarily unavailable? – For all of the above, how can I monitor & alert for failure? Once you have that down, dig deeper to your heart’s content.
  • 46. Rule 1: Assume You Will Need to Change Stuff • Change can be a lot of things: – Hardware or software upgrades/patches/replacements – Configuration tweaks – Adding or subtracting capacity – All systems change over time; OpenStack clouds are no exception. “Change in OpenStack? Yeah, I’m good with that…”
  • 47. Rule 2: Assume You Can’t Manually Log In To All The Nodes To Make Those Changes • OpenStack is a series of cooperating distributed systems – That means you could (potentially) have a lot of nodes – Software & config must often be placed on many machines – Manual changes == slow changes != quick recovery “I guess multitasking only speeds things up to a certain extent…”
  • 48. Rule 3: Assume You Will Need To Test Stuff • It’s a good idea to have a small test cloud where you can examine the impact of changes • When possible, roll out changes to a portion of your cloud and evaluate before rolling out the rest – Note: this means you need tests and monitoring…otherwise you don’t know what “ok” looks like “It’s 3am and I’m still debugging in production…maybe I should have taken the time to set up a test environment and automate some testing after all…”
  • 50. • Software developers and operators are increasingly the same people. – Agile development – Automate (almost) everything – Treat config & changes as you would code – Continuous integration, testing, deployment – Incremental change & iteration – Unified tooling & versioning – Critical approach to working at scale – Really useful for building resilient clouds Image courtesy of Rajiv Pant (http://en.wikipedia.org/wiki/File:Devops.png)
  • 51. How Configuration Management Tools Help • Can orchestrate deployment….and re-deployment. • Most can idempotently check configuration (no-op if everything is ok) • Can touch many nodes in parallel • Can type much faster and more accurately than you • Are a great way to collaborate amongst teams of operators • Most have strong communities within the OpenStack universe – Using a commercial OpenStack? Most vendors are using one of these tools to deploy and manage your cloud, whether you know it or not. – Rolling your own? Check out StackForge for tons of Ansible/Puppet/Chef modules you can use today • Allow you to manage other things besides OpenStack itself
  • 52.
  • 53.
  • 54. • I’ve worked on a lot of OpenStack clouds and almost everyone has their own preferred monitoring toolset. • One possible exception: almost everybody seems to love Graphite. • The golden rule is: use the tools that work for you! • Very often this will be whatever you’re using in the rest of your infrastructure. • Break it down into at least two buckets: • Up/down and alerting (ex: Nagios or its derivatives…yes, there are OpenStack plugins out there on NagiosExchange) • Trending data collection/plotting (ex: collectd/statsd feeding Graphite) • Don’t forget logging • LogInsight, Logstash, etc. • Also: use your peers! • Operators are often willing to share, so ask on the openstack-operators list.
  • 55. • Ok, this could take another hour, so I’ll just hit a few highlights… • Make use of OpenStack’s segregation features – Availability zones, host aggregates, regions, server groups for compute – Regions and zones for Swift • Plan to make infrastructure maintenance less impacting – Put API servers behind load balancers – Virtualize tenant-facing parts of the control plane for greater scale and mobility – Use host evacuation and live migration to reduce impact – OpenStack is extremely pluggable…choose your backends wisely • You should know how to operate, monitor, and troubleshoot them • Understand how their drivers interact with OpenStack • You should be comfortable with their failure and recovery modes
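
The first rule of thumb from slide 44 ("can I take any one thing out of service in my control plane and still be operational?") and the note on slide 37 that some components need at least 3 nodes can be sketched as a small check. This is only an illustrative sketch, not anything from the talk or from OpenStack itself: the `Component` class, the node counts, and the simple "strict majority" quorum model are all assumptions made for the example, and real clusters (Galera included) have more nuanced quorum rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """Hypothetical description of one control-plane component."""
    name: str
    nodes: int          # how many instances are deployed
    needs_quorum: bool  # True for majority-based clusters (assumed model)


def survives_single_failure(c: Component) -> bool:
    """Return True if the component stays usable after losing one node."""
    remaining = c.nodes - 1
    if c.needs_quorum:
        # Simplified rule: survivors must be a strict majority
        # of the original cluster membership.
        return remaining > c.nodes // 2
    # Non-quorum services only need one instance left behind the load balancer.
    return remaining >= 1


def audit(components):
    """Map each component name to whether it tolerates one node failure."""
    return {c.name: survives_single_failure(c) for c in components}


# Example inventory; names and counts are made up for illustration.
control_plane = [
    Component("database cluster", 3, True),
    Component("message queue", 2, False),
    Component("load balancer", 2, False),
    Component("lone API server", 1, False),
]

if __name__ == "__main__":
    for name, ok in audit(control_plane).items():
        print(f"{name}: {'ok' if ok else 'single point of failure'}")
```

Under this simplified model a two-node quorum cluster fails the check, which is one way to see why the slides call for at least three nodes for such components.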

Editor's Notes

  1. 1
  2. Messaging enables software applications to connect and scale. Applications can connect to each other, as components of a larger application. Messaging is asynchronous, decoupling applications by separating sending and receiving data. Advanced Message Queuing Protocol (AMQP) is an open standard that enables conforming client applications to communicate with conforming messaging middleware servers.
  3. Image courtesy of Rajiv Pant (http://en.wikipedia.org/wiki/File:Devops.png)
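
The decoupling described in the messaging note above (sending and receiving are separate, and messages have a place to live until received) can be illustrated with a toy example. This is a minimal sketch using only Python's standard library: the in-process `queue.Queue` is a stand-in for a broker queue and does not speak AMQP, and `run_demo` is a hypothetical helper name.

```python
import queue
import threading


def run_demo(messages):
    """Send all messages before any consumer exists, then receive them."""
    broker = queue.Queue()  # stand-in for a broker-hosted queue (not AMQP)
    received = []

    def consumer():
        # Drain the queue until the sentinel value None arrives.
        while True:
            msg = broker.get()
            if msg is None:
                break
            received.append(msg)

    # The producer finishes sending while no consumer is running yet;
    # the queue holds the messages until someone picks them up.
    for m in messages:
        broker.put(m)
    broker.put(None)  # sentinel: tells the consumer to stop

    worker = threading.Thread(target=consumer)
    worker.start()
    worker.join()
    return received


if __name__ == "__main__":
    print(run_demo(["boot instance", "attach volume"]))
```

The producer never waits on the consumer here, which is the property that lets services such as a scheduler and a compute agent run and restart independently of each other.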