Compute node HA
a.k.a. “can pets survive in OpenStack?”
Adam Spiers
Senior Software Engineer, Cloud & High Availability
aspiers@suse.com
OpenStack London Meetup, Wednesday 18th November
– short update on upstream development
High Availability in a typical OpenStack cloud today
Typical HA control plane in OpenStack
[Diagram: Pacemaker Cluster of control nodes running DRBD, PostgreSQL, RabbitMQ, Keystone, Glance, Nova, Dashboard, Cinder, Neutron]
[Diagram: Database Cluster (Node 1, Node 2; DRBD or shared storage; Database, Message Queue) and Services Cluster (Node 1, Node 2, Node 3; Orchestration, Keystone, Glance, Nova, Dashboard, Cinder, Telemetry, Neutron)]
• Maximises cloud uptime
• Automatic restart of OpenStack controller services
• Active/Active API services with load balancing
• DB + MQ either Active/Active or Active/Passive
Under the covers
[Diagram: Services Cluster (Node 1, Node 2, Node 3) with a Corosync, Pacemaker and HAProxy stack]
• Recommended by official HA guide
• HAProxy distributes service requests
• Pacemaker
‒ monitoring and control of nodes and services
• Corosync
‒ cluster membership / messaging / quorum / leadership election
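The quorum role listed for Corosync above can be illustrated with a little arithmetic. This is only a sketch: it assumes one vote per node and a plain majority rule, and the function names are made up for illustration. Real Corosync quorum behaviour is configurable and may differ.

```python
def quorum_threshold(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority (assumed rule)."""
    if total_votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_present: int) -> bool:
    """True if the surviving partition still holds a majority."""
    return votes_present >= quorum_threshold(total_votes)


def tolerated_failures(total_votes: int) -> int:
    """How many single-vote nodes can be lost before quorum is lost."""
    return total_votes - quorum_threshold(total_votes)


if __name__ == "__main__":
    # The three-node Services Cluster from the slide.
    print(quorum_threshold(3), tolerated_failures(3))
```

Under these assumptions, a three-node cluster keeps quorum after losing one node, whereas a two-node cluster cannot lose any.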
But what I really want to do is keep my workloads up!
HA only on control plane
[Diagram: an HA Cluster containing the control node (OS, Message queue, Database, Identity, Images, Block storage, Networking, Dashboard, Compute), plus three compute nodes, each with OS, nova-compute and libvirt]
Can we simply extend the cluster?
[Diagram: an HA Cluster containing the control node (OS, Message queue, Database, Identity, Images, Block storage, Networking, Dashboard, Compute), plus three compute nodes, each with OS, nova-compute and libvirt]
Scaling up
• Corosync requires <= 32 nodes
• But we want lots of compute nodes!
• The obvious workarounds are ugly
‒ Multiple compute clusters
  ‒ introduces unwanted artificial boundaries
‒ Clusters inside / between guest VM instances
  ‒ requires cloud users to modify guest images (installing & configuring cluster software)
  ‒ cluster stacks are not OS-agnostic
  ‒ cloud is supposed to make things easier not harder!
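To put a number on the "multiple compute clusters" workaround, here is a back-of-the-envelope sketch. It assumes the 32-node limit quoted above applies to every cluster member and, for simplicity, counts only compute nodes; the function name is hypothetical.

```python
CLUSTER_NODE_LIMIT = 32  # limit quoted on the slide


def clusters_needed(compute_nodes: int, limit: int = CLUSTER_NODE_LIMIT) -> int:
    """Minimum number of separate clusters needed to hold all compute nodes."""
    if compute_nodes < 0 or limit < 1:
        raise ValueError("node count must be >= 0 and limit >= 1")
    # Ceiling division using integers only.
    return -(-compute_nodes // limit)


if __name__ == "__main__":
    for nodes in (32, 33, 500):
        print(nodes, "compute nodes ->", clusters_needed(nodes), "cluster(s)")
```

Each extra cluster is one more of the artificial boundaries the slide complains about.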
pacemaker_remote to the rescue!
• New(-ish) Pacemaker feature
• Allows arbitrary scalability of an existing Pacemaker cluster
Extending the cluster to compute nodes
[Diagram: Services Cluster (Node 1, Node 2, Node 3; Corosync, Pacemaker, HAProxy) extended to four compute nodes, each running pacemaker_remote]
Capabilities
• Increases availability of compute nodes
‒ Detects failed compute services
‒ Automatic recovery of compute services where possible
• “Quarantines” failing compute nodes
‒ STONITH (fencing) extends to remote nodes
• Coordinates with control plane
‒ VMs on dead compute nodes are resurrected elsewhere
‒ In nova, this is described as “evacuation”
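The capabilities above imply an ordering: a dead compute node is fenced first, and only then are its VMs brought back elsewhere. The toy simulation below sketches that ordering. All class and function names are hypothetical, the round-robin placement is an assumption, and no real Pacemaker or nova API is called.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ComputeNode:
    name: str
    responsive: bool = True
    fenced: bool = False
    vms: List[str] = field(default_factory=list)


def recover(failed: ComputeNode, healthy: List[ComputeNode]) -> List[Tuple[str, str]]:
    """Fence an unresponsive node, then resurrect its VMs on healthy nodes."""
    if failed.responsive:
        return []  # node is fine, nothing to recover
    if not healthy:
        raise RuntimeError("no healthy compute node available")
    # Stand-in for STONITH: make sure the node is really off before
    # its VMs are started anywhere else.
    failed.fenced = True
    moves = []
    for i, vm in enumerate(list(failed.vms)):
        target = healthy[i % len(healthy)]  # assumed round-robin placement
        target.vms.append(vm)
        moves.append((vm, target.name))
    failed.vms.clear()
    return moves


if __name__ == "__main__":
    dead = ComputeNode("compute1", responsive=False, vms=["vm1", "vm2"])
    print(recover(dead, [ComputeNode("compute2"), ComputeNode("compute3")]))
```

Fencing before resurrection matters because a node that only looks dead could otherwise keep running the same VMs in parallel.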
Public Health Warning
nova evacuate does not really mean evacuation!
Think about earthquakes
[Images: “Not too late to evacuate” vs. “Too late to evacuate”]
nova terminology
nova live-migration
nova evacuate
Public Health Warning
• nova evacuate does not do evacuation
• nova evacuate does resurrection
• In Vancouver, nova developers considered a rename
‒ Hasn't happened yet
‒ Due to impact, seems unlikely to happen any time soon
‒ Whenever you see “evacuate” in a nova-related context, pretend you saw “resurrect”
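The terminology warning boils down to one decision: is the source host still alive? The sketch below encodes that decision. The helper is hypothetical and only returns the name of the nova operation; it does not invoke nova.

```python
def nova_operation(source_host_up: bool) -> str:
    """Pick the nova operation that matches the state of the source host."""
    if source_host_up:
        # "Not too late to evacuate": the VM is moved while still running.
        return "live-migration"
    # "Too late to evacuate": the host is gone, so the VM is resurrected
    # elsewhere, which nova calls "evacuate".
    return "evacuate"


if __name__ == "__main__":
    for up in (True, False):
        print("host up" if up else "host down", "->", nova_operation(up))
```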
Existing solutions
• NovaCompute / NovaEvacuate custom OCF RAs
‒ used by Red Hat / SUSE / Intel
‒ works with known limitations
• EvacuationD
‒ PoC to address above limitations
‒ decouples resurrection workflow from Pacemaker
• Masakari (NTT)
‒ similar architecture, different code
‒ monitoring at 3 layers (node, process, hypervisor)
• Approach of AWcloud / ChinaMobile
‒ very different; uses consul / raft / gossip
Proposed solutions
• Use Mistral to orchestrate resurrection workflow
• Intel currently working on prototype
• Possibly the most promising approach
‒ Mistral considered pretty solid
‒ This is exactly the kind of thing it was designed for
• However, Mistral currently a SPoF … oops
‒ Don't worry, should be fixed in Mitaka cycle
• Feasibility of convergence with Masakari will probably be analysed within next week or two
Community developments
• openstack-resource-agents project now on stackforge
‒ maintained by me
• New #openstack-ha IRC channel on FreeNode
‒ automatic notifications for activity on HA repositories
• New topic category on openstack-dev@ mailing list
Subject: [HA] i can haz pets in my cloud?
• Weekly IRC meetings on Mondays at 9am UTC
• HA guide currently undergoing a revamp
• Everyone welcome to get involved!

Mind map of terminologies used in context of Generative AI
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with SlackLet's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slack
 
Full-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalizationFull-RAG: A modern architecture for hyper-personalization
Full-RAG: A modern architecture for hyper-personalization
 
Removing Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software FuzzingRemoving Uninteresting Bytes in Software Fuzzing
Removing Uninteresting Bytes in Software Fuzzing
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
 

Compute node HA - current upstream development

  • 1. Compute node HA, a.k.a. “can pets survive in OpenStack?” (short update on upstream development)
      Adam Spiers, Senior Software Engineer, Cloud & High Availability, aspiers@suse.com
      OpenStack London Meetup, Wednesday 18th November
  • 2. High Availability in a typical OpenStack cloud today
  • 3. Typical HA control plane in OpenStack
      [Diagram: Pacemaker cluster of control nodes running DRBD, PostgreSQL, RabbitMQ, Keystone, Glance, Nova, Dashboard, Cinder, Neutron; Database Cluster (Node 1, Node 2) with DRBD or shared storage, Database, Message Queue; Services Cluster (Node 1, Node 2, Node 3) with Orchestration, Keystone, Glance, Nova, Dashboard, Cinder, Telemetry, Neutron]
      • Maximises cloud uptime
      • Automatic restart of OpenStack controller services
      • Active/Active API services with load balancing
      • DB + MQ either Active/Active or Active/Passive
  • 4. Under the covers
      [Diagram: Services Cluster (Node 1, Node 2, Node 3) with Corosync, Pacemaker, HAProxy]
      • Recommended by official HA guide
      • HAProxy distributes service requests
      • Pacemaker
        ‒ monitoring and control of nodes and services
      • Corosync
        ‒ cluster membership / messaging / quorum / leadership election
      But what I really want to do is keep my workloads up!
  • 5. HA only on control plane
      [Diagram: HA Cluster containing the control node (OS, Message queue, Database, Identity, Images, Block storage, Networking, Dashboard, Compute); three compute nodes, each with OS, nova-compute, libvirt]
  • 6. Can we simply extend the cluster?
      [Diagram: HA Cluster containing the control node (OS, Message queue, Database, Identity, Images, Block storage, Networking, Dashboard, Compute); three compute nodes, each with OS, nova-compute, libvirt]
  • 8. Scaling up
      • Corosync requires <= 32 nodes
      • But we want lots of compute nodes!
      • The obvious workarounds are ugly
        ‒ Multiple compute clusters
          ‒ introduces unwanted artificial boundaries
        ‒ Clusters inside / between guest VM instances
          ‒ requires cloud users to modify guest images (installing & configuring cluster software)
          ‒ cluster stacks are not OS-agnostic
          ‒ cloud is supposed to make things easier, not harder!
  • 9. pacemaker_remote to the rescue!
      • New(-ish) Pacemaker feature
      • Allows arbitrary scalability of an existing Pacemaker cluster
  • 10. Extending the cluster to compute nodes
      [Diagram: Services Cluster (Node 1, Node 2, Node 3) with Corosync, Pacemaker, HAProxy; four compute nodes, each running pacemaker_remote]
  • 11. Capabilities
      • Increases availability of compute nodes
        ‒ Detects failed compute services
        ‒ Automatic recovery of compute services where possible
      • “Quarantines” failing compute nodes
        ‒ STONITH (fencing) extends to remote nodes
      • Coordinates with control plane
        ‒ VMs on dead compute nodes are resurrected elsewhere
        ‒ In nova, this is described as “evacuation”
  • 12. Public Health Warning
      nova evacuate does not really mean evacuation!
  • 13. Think about earthquakes
      [Captions: “Not too late to evacuate” / “Too late to evacuate”]
  • 15. Public Health Warning
      • nova evacuate does not do evacuation
      • nova evacuate does resurrection
      • In Vancouver, nova developers considered a rename
        ‒ Hasn't happened yet
        ‒ Due to impact, seems unlikely to happen any time soon
        ‒ Whenever you see “evacuate” in a nova-related context, pretend you saw “resurrect”
  • 16. Existing solutions
      • NovaCompute / NovaEvacuate custom OCF RAs
        ‒ used by Red Hat / SUSE / Intel
        ‒ works with known limitations
      • EvacuationD
        ‒ PoC to address above limitations
        ‒ decouples resurrection workflow from Pacemaker
      • Masakari (NTT)
        ‒ similar architecture, different code
        ‒ monitoring at 3 layers (node, process, hypervisor)
      • Approach of AWcloud / ChinaMobile
        ‒ very different; uses consul / raft / gossip
  • 17. Proposed solutions
      • Use Mistral to orchestrate resurrection workflow
      • Intel currently working on a prototype
      • Possibly the most promising approach
        ‒ Mistral considered pretty solid
        ‒ This is exactly the kind of thing it was designed for
      • However, Mistral is currently a SPoF … oops
        ‒ Don't worry, should be fixed in the Mitaka cycle
      • Feasibility of convergence with Masakari will probably be analysed within the next week or two
  • 18. Community developments
      • openstack-resource-agents project now on stackforge
        ‒ maintained by me
      • New #openstack-ha IRC channel on FreeNode
        ‒ automatic notifications for activity on HA repositories
      • New topic category on openstack-dev@ mailing list
        ‒ Subject: [HA] i can haz pets in my cloud?
      • Weekly IRC meetings on Mondays at 9am UTC
      • HA guide currently undergoing a revamp
      • Everyone welcome to get involved!
  • 20. Unpublished Work of SUSE LLC. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary and trade secret information of SUSE LLC. Access to this work is restricted to SUSE employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of SUSE. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability. General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. SUSE makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for SUSE products remains at the sole discretion of SUSE. Further, SUSE reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All SUSE marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.
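
The recovery sequence from the “Capabilities” slide (detect a dead compute node, fence it, resurrect its VMs elsewhere) can be sketched as a small, self-contained Python simulation. Every name here (`ComputeNode`, `fence`, `resurrect`, `recover`) is hypothetical and does not correspond to Pacemaker, nova, or any of the projects listed in the slides. The strict fence-before-resurrect ordering and the round-robin placement are assumptions made for illustration only.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    """Hypothetical model of a compute node and the VMs it hosts."""
    name: str
    alive: bool = True
    fenced: bool = False
    vms: list[str] = field(default_factory=list)


def fence(node: ComputeNode) -> None:
    """Stand-in for STONITH: after this call the node is known to be off."""
    node.alive = False
    node.fenced = True


def resurrect(failed: ComputeNode, healthy: list[ComputeNode]) -> dict[str, str]:
    """Restart the failed node's VMs on healthy nodes; returns {vm: new_host}."""
    if not failed.fenced:
        # Assumption: an unfenced node might still be running its VMs,
        # so a second copy must not be started elsewhere.
        raise RuntimeError(f"{failed.name} must be fenced before resurrection")
    if not healthy:
        raise RuntimeError("no healthy compute node available")
    placement: dict[str, str] = {}
    for i, vm in enumerate(failed.vms):
        target = healthy[i % len(healthy)]  # naive round-robin placement
        target.vms.append(vm)
        placement[vm] = target.name
    failed.vms.clear()
    return placement


def recover(nodes: list[ComputeNode]) -> dict[str, str]:
    """Fence every dead node, then resurrect its VMs on the survivors."""
    healthy = [n for n in nodes if n.alive]
    placement: dict[str, str] = {}
    for node in nodes:
        if not node.alive:
            fence(node)
            placement.update(resurrect(node, healthy))
    return placement
```

Calling `recover` on a list of nodes in which one has `alive=False` moves that node's VMs to the remaining nodes, while calling `resurrect` directly on an unfenced node raises an error. That mirrors the slides' point that nova's “evacuate” is really resurrection of VMs from a node that is already dead.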