SlideShare a Scribd company logo
1
OpenStack Summit
May 12-16, 2014
Atlanta, Georgia
Enhancing High Availability
in the Context of OpenStack
Qiming Teng
tengqim@cn.ibm.com
IBM Research
2 © 2014 IBM Corporation
Agenda
 High Availability (HA) Overview
 Four Types of HA in OpenStack
 OpenStack HA
 VM/Application HA Options
‒ VM/App HA Orchestrated
‒ Open Questions
 HA as a Service?
3 © 2014 IBM Corporation
High Availability Overview
 Why HA?
‒ Single system
• Hardware failures
• Hypervisor defects
• OS (host/guest) crashes
• Application bugs
‒ In cloud
• Shared, virtualized storage
• Shared, virtualized network
 Use cases
‒ Server consolidation in private cloud
‒ Selling point for public cloud
‒ Ease of management
• Planned/unplanned downtime
‒ (potentially) a user consumable service
4 © 2014 IBM Corporation
How to Achieve HA?
 Three Technologies
‒ Redundancy
• Capacity Planning
• Cost
‒ Detection
• Watchdog
• Heartbeat messages
‒ Recovery
• Transparency
• Data consistency
• Interruption time
 Implications
‒ Automatic
‒ Autonomous
OPERATING
FAILURE RECOVERING
5 © 2014 IBM Corporation
Four Types of High Availability in an OpenStack Cloud
Host
OpenStack
VM
Application
• Physical nodes
• Physical network
• Physical storage
• Hypervisor
• Host OS
• …
• Compute Controller
• Network Controller
• Database
• Message Queue
• Storage
• …
• Service Resiliency
• Quality of Service
• Cost
• Transparency
• Data Integrity
• ...
• Virtual Machine
• Incl. Container
• Virtual Network
• Virtual Storage
• VM Mobility
• Ease of Management
• ...
6 © 2014 IBM Corporation
OpenStack HA: Deployment Pattern
 Main Focus
‒ Avoid SPOF (Single Point of Failure) in OpenStack services
• Controller, Network, Compute, Swift, etc.
‒ Stateful versus Stateless services
 Implementation
‒ Primarily based on Pacemaker/Corosync Linux-HA stack, plus
a load-balancer
‒ Keepalived/haproxy
 A Deployment Pattern, not part of OpenStack core
components
‒ HA Guide documentation
‒ Chef cookbooks
‒ TripleO elements
 Only deployment, no runtime management service
7 © 2014 IBM Corporation
An example setup (RDO)
Following chart are closing charts to promote IBM
sessions, demos, etcetera
8 © 2014 IBM Corporation
OpenStack HA: Intrinsic Supports
 Nova
‒ Host Aggregates
‒ Availability Zones
‒ Service Groups
• Internal heartbeat messages, zookeeper/memcached/matchmaker
‒ …
 Message Queues
‒ QPID heartbeats (60 seconds interval)
‒ ZeroMQ w/ MatchMaker
 Cinder
‒ Storwize driver (heartbeat: 10 seconds)
‒ Contrib services
 Swift
 ...
9 © 2014 IBM Corporation
OpenStack HA: Internal Heartbeats
[tengqm@node1 ~]$ nova service-list
+----+------------------+-------+----------+---------+-------+----------------------------+-----------------+
| Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason |
+----+------------------+-------+----------+---------+-------+----------------------------+-----------------+
| 1 | nova-conductor | node1 | internal | enabled | up | 2014-04-27T20:37:09.000000 | - |
| 3 | nova-cert | node1 | internal | enabled | up | 2014-04-27T20:37:05.000000 | - |
| 4 | nova-scheduler | node1 | internal | enabled | up | 2014-04-27T20:37:06.000000 | - |
| 5 | nova-consoleauth | node1 | internal | enabled | up | 2014-04-27T20:37:05.000000 | - |
| 6 | nova-compute | node1 | nova | enabled | up | 2014-04-27T20:37:06.000000 | - |
+----+------------------+-------+----------+---------+-------+----------------------------+-----------------+
[tengqm@node1 ~]$ neutron agent-list
Starting new HTTP connection (1): 9.186.106.171
Starting new HTTP connection (1): 9.186.106.171
+--------------------------------------+--------------------+-------+-------+----------------+
| id | agent_type | host | alive | admin_state_up |
+--------------------------------------+--------------------+-------+-------+----------------+
| 0f9b8470-577e-4439-84f1-36ce92eac77d | Metadata agent | node1 | :-) | True |
| 7ac10787-9a62-4a96-868f-bd90bb46d52b | L3 agent | node1 | :-) | True |
| c89d0bac-8a41-44ee-8df0-389a9c8db428 | Open vSwitch agent | node1 | :-) | True |
| e138db2d-bf3b-4ac2-89ab-50dbb8771a7b | DHCP agent | node1 | :-) | True |
+--------------------------------------+--------------------+-------+-------+----------------+
10 © 2014 IBM Corporation
VM/Application HA: Guest Clusters
hardware
Host OS / Hypervisor
App
App
Guest OS
App
App
Guest OS
App
App
Guest OS
App
App
Guest OS
nova
neutron
cinder
glance
…
heat
SCs/SDs
hardware
Host OS / Hypervisor
11 © 2014 IBM Corporation
VM/Application HA Timeline (reboot VM-2)
21:22:56 rgmanager Shutting down
rgmanager Stopping service load-balancer
rgmanager [script] Executing /etc/init.d/lbsvc stop
21:23:00
21:23:03
21:23:05
21:23:06
21:23:07
rgmanager [ip] Removing IPv4 address 10.0.2.212/24 from eth1
rgmanager Service load-balancer is stopped
rgmanager Resource groups locked; not evaluating
[CPG] got procleave message from node 2 (1)
dlm_controld: stop distributed lock manager (2)
21:23:04
rgmanager Dbus Released
rgmanager Stopped 1 service
rgmanager Disconnecting from CMAN
rgmanager Pausing to allow services to start on other nodes
[CPG] got procleave message from node 2
rgmanager Member 2 shutting down
dlm_controld: stop distributed lock manager (3)
rgmanager Evaluating load-balancer, stopped, owner none
rgmanager event (0:2:0) Processed
rgmanager Starting stopped service load-balancer
rgmanager [ip] Link for eth1: Detected
rgmanager [ip] Adding IP addr 10.0.2.212/24 to eth1
rgmanager [ip] Pinging addr 10.0.2.212 from dev eth1
rgmanager Event: Port Closed
rgmanager Node 2 is not listening
rgmanager [ip] Sending gratuitous ARP: 10.0.2.212
fa:16:3e:b4:69:b8 brd ff:ff:ff:ff:ff:ff
rgmanager [script] Executing /etc/init.d/lbsvc start
21:23:08
rgmanager Service load-balancer started rgmanager 1 events processed
VM-2(rebooted)VM-1
12 © 2014 IBM Corporation
VM/Application HA: Guest Clusters
App
App
Guest OS
App
App
Guest OS
App
App
Guest OS
App
App
Guest OS
nova
neutron
cinder
glance
…
heat
SCs/SDs
Service X
Image from
Dept A
Application Y
Image from
Dept B
Service X
Image from
Dept A
Service Z
Image from
Dept C
hardware
Host OS / Hypervisor
hardware
Host OS / Hypervisor
LIMITATIONS
- Ease of management - Application Specific - Intrusive
13 © 2014 IBM Corporation
VM/Application HA: Intrinsic Supports
 Redundancy
‒ Nova
• Server Groups
• Virtual Ensembles ?
• Virtual Clusters ?
‒ Heat
• InstanceGroup resource
• ResourceGroup resource
 Detection
‒ RPC notification, oslo.messaging
‒ Ceilometer
 Recovery
‒ Fencing support in nova, cinder, neutron [undergoing]
‒ VM reboot, rebuild, evacuation …
‒ OS::Heat::HARestarter resource in Heat (deprecating)
‒ …
Ceilometer Notification
14 © 2014 IBM Corporation
VM/Application HA: Heat Orchestrated – yesterday
Nova Server
heat-cfntools
heat-api-cloudwatch
Alarm
heat-engine
HARestarter
heat-api-cfn
Template
15 © 2014 IBM Corporation
VM/Application HA: Heat Orchestrated – yesterday
Nova Server
crond
cfn-hup
cfn-push-stats heat-api-cloudwatch
Alarm
heat-engine
create_watch_data
Watch_rule
HARestarter
restart()
delete; create
BOTO
heat-api-cfnBOTO
MQ
16 © 2014 IBM Corporation
VM/Application HA: Heat Orchestrated – today
Nova Server
os-collect-config
cfn-push-stats
ceilometer
Alarm
heat-engine
Alarm
BOTO
nova
MQ
metadata
heat-api-cfnMQHARestarter
restart()
delete; create
heat-api-cloudwatch
MQ
create_watch_data
metadata
HTTP
17 © 2014 IBM Corporation
VM/Application HA: Heat Orchestrated – tomorrow?
Nova Server
os-collect-config
???
ceilometer
Alarm
heat-engine
Alarm
MQnova
MQ
metadata
VMCluster
“Restart”
metadata
HTTP
notification
MQ
1
Application Heartbeats
2
VM Heartbeats
3
A native signal / alarm
4
A notion of VM groups for
• VM/Application redundancy
• HA policies
5
Native support VM and
application recovery:
• reboot
• rebuild
• migrate
• remote-restart
18 © 2014 IBM Corporation
VM/Application HA: Open Questions
 Physical placement of VMs
‒ No shared PDU/rack, no shared network switch
‒ HA-aware scheduling, e.g. server priority
 Detection of failures
‒ High availability and QoS, e.g. desired latency/throughput versus reality
‒ Reliable detection, application involvement, …
 Reasoning of failures
‒ Root cause, trend analysis
‒ Log collection and analysis
 HA management / orchestration
‒ As a cross-cutting concern, involving not only compute, but also storage and network
• Stack availability?
‒ Capacity planning / reservation
 Leverage existing HA software
‒ Can we leverage supports from hypervisors?
‒ Can/should we generalize this into a service?
19 © 2014 IBM Corporation
High Availability as a Service (HAaaS)
 Generic HA management service
‒ Applicable to different levels of HA
• Host, VM, App, OpenStack
‒ Applicable to different hypervisors
• vSphere, KVM, Xen, HyperV, PowerVM, …)
‒ Functionality determined via user authentication
 Well-defined service APIs
‒ Clusters management
‒ Application/Service resource definition
‒ HA policies
• Fail-over domain
• Fail-over priority, operation, timeout, retries, …
20 © 2014 IBM Corporation
HAaaS: OpenStack HA
VM or Physical Servers
have have
21 © 2014 IBM Corporation
HAaaS: VM HA
Physical Server
have have
22 © 2014 IBM Corporation
HAaaS: Application HA
VMs
have have
23 © 2014 IBM Corporation
Service Interfaces (1)
 Cluster Management
‒ cluster_create,
‒ cluster_destroy,
‒ cluster_start,
‒ cluster_stop,
‒ cluster_suspend,
‒ cluster_resume,
‒ cluster_set_attr,
‒ cluster_get_attr,
‒ cluster_by_host,
‒ cluster_get_status,
‒ cluster_get_log,
‒ ...
 Node Management (physical/virtual)
‒ node_join_cluster
‒ node_leave_cluster
‒ node_get_attr
‒ node_set_attr
‒ node_startup
‒ node_shutdown
‒ node_reboot
‒ node_evacuate
‒ node_get_status
‒ ...
24 © 2014 IBM Corporation
Service Interfaces (2)
 Resource Management
‒ resource_create
‒ resource_destroy
‒ resource_get_attr
‒ resource_set_attr
‒ ...
 Fencing Management
‒ Fencing_dev_add
‒ Fencing_dev_del
‒ Fencing_dev_associate
‒ Fencing_dev_deassociate
‒ Fencing_dev_set_opts
‒ Fencing_dev_get_opts
‒ ...
 Service Management
(aka. resource groups)
‒ service_create
‒ service_destroy
‒ service_add_resource
‒ service_del_resource
‒ service_list
‒ service_get_attr
‒ service_set_attr
‒ service_start
‒ service_stop
‒ service_restart
‒ service_relocate
‒ ...
25 © 2014 IBM Corporation
Monday, May 12 – Room B314
12:05-12:45
Wednesday, May 14 - Room B312
9:00-9:40
9:50-10:30
11:00-11:40
11:50-12:30
OpenStack is Rockin’ the OpenCloud Movement! Who‘s Next to Join the Band ?
Angel Diaz, VP Open Technology and Cloud Labs
David Lindquist, IBM Fellow, VP, CTO Cloud & Smarter Infrastructure
Getting from enterprise ready to enterprise bliss - why OpenStack and IBM is a match made in Cloud heaven.
Todd Moore - Director, Open Technologies and Partnerships
Taking OpenStack beyond Infrastructure with IBM SmartCloud Orchestrator.
Andrew Trossman - Distinguished Engineer, IBM Common Cloud Stack and SmartCloud Orchestrator
IBM, SoftLayer and OpenStack - present and future
Michael Fork - Cloud Architect
IBM and OpenStack: Enabling Enterprise Cloud Solutions Now.
Tammy Van Hove -Distinguished Engineer, Software Defined Systems
IBM Sponsored Sessions
26 © 2014 IBM Corporation
IBM Technical Sessions
Monday, May 12
3:40 - 4:20
3:40 - 4:20
Tuesday, May 13
11:15 - 11:55
2:00 - 2:40
5:30 - 6:10
5:30 - 6:10
Wednesday, May14
9:50 - 10:30
2:40 - 3:20
Thursday, May 15
9:50 - 10:30
1:30 - 2:10
2:20 - 3:00
27
Be sure to stop by the IBM booth to see some demos
and get your rockin’ OpenStack T-shirt while they last.
Thank you !

More Related Content

What's hot

Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with SenlinDeploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
Qiming Teng
 
Mirantis OpenStack-DC-Meetup 17 Sept 2014
Mirantis OpenStack-DC-Meetup 17 Sept 2014Mirantis OpenStack-DC-Meetup 17 Sept 2014
Mirantis OpenStack-DC-Meetup 17 Sept 2014
Mirantis
 
Deep dive into highly available open stack architecture openstack summit va...
Deep dive into highly available open stack architecture   openstack summit va...Deep dive into highly available open stack architecture   openstack summit va...
Deep dive into highly available open stack architecture openstack summit va...
Arthur Berezin
 
OpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN Controller
OpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN ControllerOpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN Controller
OpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN Controller
Yongyoon Shin
 
What's new in OpenStack Liberty
What's new in OpenStack LibertyWhat's new in OpenStack Liberty
What's new in OpenStack Liberty
Stephen Gordon
 
Chef and OpenStack Workshop from ChefConf 2013
Chef and OpenStack Workshop from ChefConf 2013Chef and OpenStack Workshop from ChefConf 2013
Chef and OpenStack Workshop from ChefConf 2013
Matt Ray
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler
Peeyush Gupta
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
tcp cloud
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
John Garbutt
 
Openstack nova
Openstack novaOpenstack nova
Openstack nova
Murali Boyapati
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Nati Shalom
 
Role of sdn controllers in open stack
Role of sdn controllers in open stackRole of sdn controllers in open stack
Role of sdn controllers in open stack
openstackindia
 
Container Orchestration
Container OrchestrationContainer Orchestration
Container Orchestration
dfilppi
 
OpenStack Magnum 2016-08-04
OpenStack Magnum 2016-08-04OpenStack Magnum 2016-08-04
OpenStack Magnum 2016-08-04
Adrian Otto
 
Neutron high availability open stack architecture openstack israel event 2015
Neutron high availability  open stack architecture   openstack israel event 2015Neutron high availability  open stack architecture   openstack israel event 2015
Neutron high availability open stack architecture openstack israel event 2015
Arthur Berezin
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migration
openstackindia
 
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Cloud Native Day Tel Aviv
 
Topologies of OpenStack
Topologies of OpenStackTopologies of OpenStack
Topologies of OpenStack
haribabu kasturi
 
Open stack in action enovance-quantum in action
Open stack in action enovance-quantum in actionOpen stack in action enovance-quantum in action
Open stack in action enovance-quantum in action
eNovance
 

What's hot (19)

Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with SenlinDeploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
Deploy an Elastic, Resilient, Load-Balanced Cluster in 5 Minutes with Senlin
 
Mirantis OpenStack-DC-Meetup 17 Sept 2014
Mirantis OpenStack-DC-Meetup 17 Sept 2014Mirantis OpenStack-DC-Meetup 17 Sept 2014
Mirantis OpenStack-DC-Meetup 17 Sept 2014
 
Deep dive into highly available open stack architecture openstack summit va...
Deep dive into highly available open stack architecture   openstack summit va...Deep dive into highly available open stack architecture   openstack summit va...
Deep dive into highly available open stack architecture openstack summit va...
 
OpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN Controller
OpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN ControllerOpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN Controller
OpenStack KOREA 정기 세미나_OpenStack meet iNaaS SDN Controller
 
What's new in OpenStack Liberty
What's new in OpenStack LibertyWhat's new in OpenStack Liberty
What's new in OpenStack Liberty
 
Chef and OpenStack Workshop from ChefConf 2013
Chef and OpenStack Workshop from ChefConf 2013Chef and OpenStack Workshop from ChefConf 2013
Chef and OpenStack Workshop from ChefConf 2013
 
OpenStack Nova Scheduler
OpenStack Nova Scheduler OpenStack Nova Scheduler
OpenStack Nova Scheduler
 
OpenStack HA
OpenStack HAOpenStack HA
OpenStack HA
 
OpenStack Nova - Developer Introduction
OpenStack Nova - Developer IntroductionOpenStack Nova - Developer Introduction
OpenStack Nova - Developer Introduction
 
Openstack nova
Openstack novaOpenstack nova
Openstack nova
 
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...Orchestration tool roundup   kubernetes vs. docker vs. heat vs. terra form vs...
Orchestration tool roundup kubernetes vs. docker vs. heat vs. terra form vs...
 
Role of sdn controllers in open stack
Role of sdn controllers in open stackRole of sdn controllers in open stack
Role of sdn controllers in open stack
 
Container Orchestration
Container OrchestrationContainer Orchestration
Container Orchestration
 
OpenStack Magnum 2016-08-04
OpenStack Magnum 2016-08-04OpenStack Magnum 2016-08-04
OpenStack Magnum 2016-08-04
 
Neutron high availability open stack architecture openstack israel event 2015
Neutron high availability  open stack architecture   openstack israel event 2015Neutron high availability  open stack architecture   openstack israel event 2015
Neutron high availability open stack architecture openstack israel event 2015
 
Guts & OpenStack migration
Guts & OpenStack migrationGuts & OpenStack migration
Guts & OpenStack migration
 
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
Scaling OpenStack Networking Beyond 4000 Nodes with Dragonflow - Eshed Gal-Or...
 
Topologies of OpenStack
Topologies of OpenStackTopologies of OpenStack
Topologies of OpenStack
 
Open stack in action enovance-quantum in action
Open stack in action enovance-quantum in actionOpen stack in action enovance-quantum in action
Open stack in action enovance-quantum in action
 

Similar to High Availability in OpenStack Cloud

IBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryIBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster Recovery
MarkTaylorIBM
 
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
VirtualTech Japan Inc.
 
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
IBM Integration Bus & WebSphere MQ - High Availability & Disaster RecoveryIBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
Rob Convery
 
IBM MQ High Availabillity and Disaster Recovery (2017 version)
IBM MQ High Availabillity and Disaster Recovery (2017 version)IBM MQ High Availabillity and Disaster Recovery (2017 version)
IBM MQ High Availabillity and Disaster Recovery (2017 version)
MarkTaylorIBM
 
Ame 2269 ibm mq high availability
Ame 2269 ibm mq high availabilityAme 2269 ibm mq high availability
Ame 2269 ibm mq high availability
Andrew Schofield
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
OpenStack Korea Community
 
Accelerating Public Cloud Migration with Multi-Cloud Load Balancing
Accelerating Public Cloud Migration with Multi-Cloud Load BalancingAccelerating Public Cloud Migration with Multi-Cloud Load Balancing
Accelerating Public Cloud Migration with Multi-Cloud Load Balancing
Avi Networks
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017
Guy Brown
 
Cloud stack troubleshooting
Cloud stack troubleshooting Cloud stack troubleshooting
Cloud stack troubleshooting
AlexTian
 
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PROIDEA
 
Secure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t Enough
Secure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t EnoughSecure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t Enough
Secure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t Enough
Khash Nakhostin
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
confluent
 
OpenStack State of Fibre Channel
OpenStack State of Fibre ChannelOpenStack State of Fibre Channel
OpenStack State of Fibre Channel
hemna6969
 
F5 TMOS v13.0
F5 TMOS v13.0F5 TMOS v13.0
F5 TMOS v13.0
MarketingArrowECS_CZ
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
DataWorks Summit
 
Automated Deployment and Management of Edge Clouds
Automated Deployment and Management of Edge CloudsAutomated Deployment and Management of Edge Clouds
Automated Deployment and Management of Edge Clouds
Jay Bryant
 
WebSphere Technical University: Top WebSphere Problem Determination Features
WebSphere Technical University: Top WebSphere Problem Determination FeaturesWebSphere Technical University: Top WebSphere Problem Determination Features
WebSphere Technical University: Top WebSphere Problem Determination Features
Chris Bailey
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
MarkTaylorIBM
 
Example of One of my Desgins for Cyber &Networking Solutions for Customers ...
Example of One  of my Desgins  for Cyber &Networking Solutions for Customers ...Example of One  of my Desgins  for Cyber &Networking Solutions for Customers ...
Example of One of my Desgins for Cyber &Networking Solutions for Customers ...
chen sheffer
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
Tokyo Azure Meetup
 

Similar to High Availability in OpenStack Cloud (20)

IBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster RecoveryIBM MQ - High Availability and Disaster Recovery
IBM MQ - High Availability and Disaster Recovery
 
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
OpenStackを利用したEnterprise Cloudを支える技術 - OpenStack最新情報セミナー 2016年5月
 
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
IBM Integration Bus & WebSphere MQ - High Availability & Disaster RecoveryIBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
IBM Integration Bus & WebSphere MQ - High Availability & Disaster Recovery
 
IBM MQ High Availabillity and Disaster Recovery (2017 version)
IBM MQ High Availabillity and Disaster Recovery (2017 version)IBM MQ High Availabillity and Disaster Recovery (2017 version)
IBM MQ High Availabillity and Disaster Recovery (2017 version)
 
Ame 2269 ibm mq high availability
Ame 2269 ibm mq high availabilityAme 2269 ibm mq high availability
Ame 2269 ibm mq high availability
 
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
[OpenInfra Days Korea 2018] Day 2 - E6 - OpenInfra monitoring with Prometheus
 
Accelerating Public Cloud Migration with Multi-Cloud Load Balancing
Accelerating Public Cloud Migration with Multi-Cloud Load BalancingAccelerating Public Cloud Migration with Multi-Cloud Load Balancing
Accelerating Public Cloud Migration with Multi-Cloud Load Balancing
 
F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017F5 Meetup presentation automation 2017
F5 Meetup presentation automation 2017
 
Cloud stack troubleshooting
Cloud stack troubleshooting Cloud stack troubleshooting
Cloud stack troubleshooting
 
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
PLNOG15: Practical deployments of Kea, a high performance scalable DHCP - Tom...
 
Secure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t Enough
Secure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t EnoughSecure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t Enough
Secure Remote Access to AWS: Why OpenVPN & Jump Hosts Aren’t Enough
 
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
Modern Cloud-Native Streaming Platforms: Event Streaming Microservices with A...
 
OpenStack State of Fibre Channel
OpenStack State of Fibre ChannelOpenStack State of Fibre Channel
OpenStack State of Fibre Channel
 
F5 TMOS v13.0
F5 TMOS v13.0F5 TMOS v13.0
F5 TMOS v13.0
 
Practice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China MobilePractice of large Hadoop cluster in China Mobile
Practice of large Hadoop cluster in China Mobile
 
Automated Deployment and Management of Edge Clouds
Automated Deployment and Management of Edge CloudsAutomated Deployment and Management of Edge Clouds
Automated Deployment and Management of Edge Clouds
 
WebSphere Technical University: Top WebSphere Problem Determination Features
WebSphere Technical University: Top WebSphere Problem Determination FeaturesWebSphere Technical University: Top WebSphere Problem Determination Features
WebSphere Technical University: Top WebSphere Problem Determination Features
 
IBM MQ Disaster Recovery
IBM MQ Disaster RecoveryIBM MQ Disaster Recovery
IBM MQ Disaster Recovery
 
Example of One of my Desgins for Cyber &Networking Solutions for Customers ...
Example of One  of my Desgins  for Cyber &Networking Solutions for Customers ...Example of One  of my Desgins  for Cyber &Networking Solutions for Customers ...
Example of One of my Desgins for Cyber &Networking Solutions for Customers ...
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
 

More from Qiming Teng

202203-技术沙龙-k8s-v1.pptx
202203-技术沙龙-k8s-v1.pptx202203-技术沙龙-k8s-v1.pptx
202203-技术沙龙-k8s-v1.pptx
Qiming Teng
 
Senlin deep dive 2016
Senlin deep dive 2016Senlin deep dive 2016
Senlin deep dive 2016
Qiming Teng
 
Autoscaling with magnum and senlin
Autoscaling with magnum and senlinAutoscaling with magnum and senlin
Autoscaling with magnum and senlin
Qiming Teng
 
VM HA and Cross-Region Scaling
VM HA and Cross-Region ScalingVM HA and Cross-Region Scaling
VM HA and Cross-Region Scaling
Qiming Teng
 
Managing Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native WayManaging Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native Way
Qiming Teng
 
Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20
Qiming Teng
 

More from Qiming Teng (6)

202203-技术沙龙-k8s-v1.pptx
202203-技术沙龙-k8s-v1.pptx202203-技术沙龙-k8s-v1.pptx
202203-技术沙龙-k8s-v1.pptx
 
Senlin deep dive 2016
Senlin deep dive 2016Senlin deep dive 2016
Senlin deep dive 2016
 
Autoscaling with magnum and senlin
Autoscaling with magnum and senlinAutoscaling with magnum and senlin
Autoscaling with magnum and senlin
 
VM HA and Cross-Region Scaling
VM HA and Cross-Region ScalingVM HA and Cross-Region Scaling
VM HA and Cross-Region Scaling
 
Managing Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native WayManaging Container Clusters in OpenStack Native Way
Managing Container Clusters in OpenStack Native Way
 
Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20Senlin deep dive 2015 05-20
Senlin deep dive 2015 05-20
 

Recently uploaded

Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid
 
Boost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management AppsBoost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management Apps
Jhone kinadey
 
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
kalichargn70th171
 
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISDECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
Tier1 app
 
Optimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptxOptimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptx
WebConnect Pvt Ltd
 
What’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 UpdateWhat’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 Update
VictoriaMetrics
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Paul Brebner
 
ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.
Maitrey Patel
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
ShulagnaSarkar2
 
Upturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in NashikUpturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in Nashik
Upturn India Technologies
 
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
kgyxske
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
Paul Brebner
 
42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert
vaishalijagtap12
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
Yara Milbes
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
safelyiotech
 
Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7
Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7
Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7
manji sharman06
 
Ensuring Efficiency and Speed with Practical Solutions for Clinical Operations
Ensuring Efficiency and Speed with Practical Solutions for Clinical OperationsEnsuring Efficiency and Speed with Practical Solutions for Clinical Operations
Ensuring Efficiency and Speed with Practical Solutions for Clinical Operations
OnePlan Solutions
 
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
campbellclarkson
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
confluent
 

Recently uploaded (20)

Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdfBaha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
Baha Majid WCA4Z IBM Z Customer Council Boston June 2024.pdf
 
Boost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management AppsBoost Your Savings with These Money Management Apps
Boost Your Savings with These Money Management Apps
 
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
The Power of Visual Regression Testing_ Why It Is Critical for Enterprise App...
 
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSISDECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
DECODING JAVA THREAD DUMPS: MASTER THE ART OF ANALYSIS
 
Optimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptxOptimizing Your E-commerce with WooCommerce.pptx
Optimizing Your E-commerce with WooCommerce.pptx
 
What’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 UpdateWhat’s New in VictoriaLogs - Q2 2024 Update
What’s New in VictoriaLogs - Q2 2024 Update
 
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
Why Apache Kafka Clusters Are Like Galaxies (And Other Cosmic Kafka Quandarie...
 
ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.ACE - Team 24 Wrapup event at ahmedabad.
ACE - Team 24 Wrapup event at ahmedabad.
 
14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision14 th Edition of International conference on computer vision
14 th Edition of International conference on computer vision
 
Upturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in NashikUpturn India Technologies - Web development company in Nashik
Upturn India Technologies - Web development company in Nashik
 
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
一比一原版(sdsu毕业证书)圣地亚哥州立大学毕业证如何办理
 
Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...Superpower Your Apache Kafka Applications Development with Complementary Open...
Superpower Your Apache Kafka Applications Development with Complementary Open...
 
bgiolcb
bgiolcbbgiolcb
bgiolcb
 
42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert42 Ways to Generate Real Estate Leads - Sellxpert
42 Ways to Generate Real Estate Leads - Sellxpert
 
The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024The Rising Future of CPaaS in the Middle East 2024
The Rising Future of CPaaS in the Middle East 2024
 
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
Safelyio Toolbox Talk Softwate & App (How To Digitize Safety Meetings)
 
Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7
Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7
Call Girls Bangalore🔥7023059433🔥Best Profile Escorts in Bangalore Available 24/7
 
Ensuring Efficiency and Speed with Practical Solutions for Clinical Operations
Ensuring Efficiency and Speed with Practical Solutions for Clinical OperationsEnsuring Efficiency and Speed with Practical Solutions for Clinical Operations
Ensuring Efficiency and Speed with Practical Solutions for Clinical Operations
 
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
🏎️Tech Transformation: DevOps Insights from the Experts 👩‍💻
 
Building API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructureBuilding API data products on top of your real-time data infrastructure
Building API data products on top of your real-time data infrastructure
 

High Availability in OpenStack Cloud

  • 1. 1 OpenStack Summit May 12-16, 2014 Atlanta, Georgia Enhancing High Availability in the Context of OpenStack Qiming Teng tengqim@cn.ibm.com IBM Research
  • 2. 2 © 2014 IBM Corporation Agenda  High Availability (HA) Overview  Four Types of HA in OpenStack  OpenStack HA  VM/Application HA Options ‒ VM/App HA Orchestrated ‒ Open Questions  HA as a Service?
  • 3. 3 © 2014 IBM Corporation High Availability Overview  Why HA? ‒ Single system • Hardware failures • Hypervisor defects • OS (host/guest) crashes • Application bugs ‒ In cloud • Shared, virtualized storage • Shared, virtualized network  Use cases ‒ Server consolidation in private cloud ‒ Selling point for public cloud ‒ Ease of management • Planned/unplanned downtime ‒ (potentially) a user consumable service
  • 4. 4 © 2014 IBM Corporation How to Achieve HA?  Three Technologies ‒ Redundancy • Capacity Planning • Cost ‒ Detection • Watchdog • Heartbeat messages ‒ Recovery • Transparency • Data consistency • Interruption time  Implications ‒ Automatic ‒ Autonomous OPERATING FAILURE RECOVERING
  • 5. 5 © 2014 IBM Corporation Four Types of High Availability in an OpenStack Cloud Host OpenStack VM Application • Physical nodes • Physical network • Physical storage • Hypervisor • Host OS • … • Compute Controller • Network Controller • Database • Message Queue • Storage • … • Service Resiliency • Quality of Service • Cost • Transparency • Data Integrity • ... • Virtual Machine • Incl. Container • Virtual Network • Virtual Storage • VM Mobility • Ease of Management • ...
  • 6. 6 © 2014 IBM Corporation OpenStack HA: Deployment Pattern  Main Focus ‒ Avoid SPOF (Single Point of Failure) in OpenStack services • Controller, Network, Compute, Swift, etc. ‒ Stateful versus Stateless services  Implementation ‒ Primarily based on Pacemaker/Corosync Linux-HA stack, plus a load-balancer ‒ Keepalived/haproxy  A Deployment Pattern, not part of OpenStack core components ‒ HA Guide documentation ‒ Chef cookbooks ‒ TripleO elements  Only deployment, no runtime management service
  • 7. 7 © 2014 IBM Corporation An example setup (RDO) Following chart are closing charts to promote IBM sessions, demos, etcetera
  • 8. 8 © 2014 IBM Corporation OpenStack HA: Intrinsic Supports  Nova ‒ Host Aggregates ‒ Availability Zones ‒ Service Groups • Internal heartbeat messages, zookeeper/memcached/matchmaker ‒ …  Message Queues ‒ QPID heartbeats (60 seconds interval) ‒ ZeroMQ w/ MatchMaker  Cinder ‒ Storwize driver (heartbeat: 10 seconds) ‒ Contrib services  Swift  ...
  • 9. 9 © 2014 IBM Corporation OpenStack HA: Internal Heartbeats [tengqm@node1 ~]$ nova service-list +----+------------------+-------+----------+---------+-------+----------------------------+-----------------+ | Id | Binary | Host | Zone | Status | State | Updated_at | Disabled Reason | +----+------------------+-------+----------+---------+-------+----------------------------+-----------------+ | 1 | nova-conductor | node1 | internal | enabled | up | 2014-04-27T20:37:09.000000 | - | | 3 | nova-cert | node1 | internal | enabled | up | 2014-04-27T20:37:05.000000 | - | | 4 | nova-scheduler | node1 | internal | enabled | up | 2014-04-27T20:37:06.000000 | - | | 5 | nova-consoleauth | node1 | internal | enabled | up | 2014-04-27T20:37:05.000000 | - | | 6 | nova-compute | node1 | nova | enabled | up | 2014-04-27T20:37:06.000000 | - | +----+------------------+-------+----------+---------+-------+----------------------------+-----------------+ [tengqm@node1 ~]$ neutron agent-list Starting new HTTP connection (1): 9.186.106.171 Starting new HTTP connection (1): 9.186.106.171 +--------------------------------------+--------------------+-------+-------+----------------+ | id | agent_type | host | alive | admin_state_up | +--------------------------------------+--------------------+-------+-------+----------------+ | 0f9b8470-577e-4439-84f1-36ce92eac77d | Metadata agent | node1 | :-) | True | | 7ac10787-9a62-4a96-868f-bd90bb46d52b | L3 agent | node1 | :-) | True | | c89d0bac-8a41-44ee-8df0-389a9c8db428 | Open vSwitch agent | node1 | :-) | True | | e138db2d-bf3b-4ac2-89ab-50dbb8771a7b | DHCP agent | node1 | :-) | True | +--------------------------------------+--------------------+-------+-------+----------------+
  • 10. 10 © 2014 IBM Corporation VM/Application HA: Guest Clusters hardware Host OS / Hypervisor App App Guest OS App App Guest OS App App Guest OS App App Guest OS nova neutron cinder glance … heat SCs/SDs hardware Host OS / Hypervisor
  • 11. 11 © 2014 IBM Corporation VM/Application HA Timeline (reboot VM-2) 21:22:56 rgmanager Shutting down rgmanager Stopping service load-balancer rgmanager [script] Executing /etc/init.d/lbsvc stop 21:23:00 21:23:03 21:23:05 21:23:06 21:23:07 rgmanager [ip] Removing IPv4 address 10.0.2.212/24 from eth1 rgmanager Service load-balancer is stopped rgmanager Resource groups locked; not evaluating [CPG] got procleave message from node 2 (1) dlm_controld: stop distributed lock manager (2) 21:23:04 rgmanager Dbus Released rgmanager Stopped 1 service rgmanager Disconnecting from CMAN rgmanager Pausing to allow services to start on other nodes [CPG] got procleave message from node 2 rgmanager Member 2 shutting down dlm_controld: stop distributed lock manager (3) rgmanager Evaluating load-balancer, stopped, owner none rgmanager event (0:2:0) Processed rgmanager Starting stopped service load-balancer rgmanager [ip] Link for eth1: Detected rgmanager [ip] Adding IP addr 10.0.2.212/24 to eth1 rgmanager [ip] Pinging addr 10.0.2.212 from dev eth1 rgmanager Event: Port Closed rgmanager Node 2 is not listening rgmanager [ip] Sending gratuitous ARP: 10.0.2.212 fa:16:3e:b4:69:b8 brd ff:ff:ff:ff:ff:ff rgmanager [script] Executing /etc/init.d/lbsvc start 21:23:08 rgmanager Service load-balancer started rgmanager 1 events processed VM-2(rebooted)VM-1
  • 12. 12 © 2014 IBM Corporation VM/Application HA: Guest Clusters App App Guest OS App App Guest OS App App Guest OS App App Guest OS nova neutron cinder glance … heat SCs/SDs Service X Image from Dept A Application Y Image from Dept B Service X Image from Dept A Service Z Image from Dept C hardware Host OS / Hypervisor hardware Host OS / Hypervisor LIMITATIONS - Ease of management - Application Specific - Intrusive
  • 13. 13 © 2014 IBM Corporation VM/Application HA: Intrinsic Supports  Redundancy ‒ Nova • Server Groups • Virtual Ensembles ? • Virtual Clusters ? ‒ Heat • InstanceGroup resource • ResourceGroup resource  Detection ‒ RPC notification, oslo.messaging ‒ Ceilometer  Recovery ‒ Fencing support in nova, cinder, neutron [undergoing] ‒ VM reboot, rebuild, evacuation … ‒ OS::Heat::HARestarter resource in Heat (deprecating) ‒ … Ceilometer Notification
  • 14. 14 © 2014 IBM Corporation VM/Application HA: Heat Orchestrated – yesterday Nova Server heat-cfntools heat-api-cloudwatch Alarm heat-engine HARestarter heat-api-cfn Template
  • 15. 15 © 2014 IBM Corporation VM/Application HA: Heat Orchestrated – yesterday Nova Server crond cfn-hup cfn-push-stats heat-api-cloudwatch Alarm heat-engine create_watch_data Watch_rule HARestarter restart() delete; create BOTO heat-api-cfnBOTO MQ
  • 16. 16 © 2014 IBM Corporation VM/Application HA: Heat Orchestrated – today Nova Server os-collect-config cfn-push-stats ceilometer Alarm heat-engine Alarm BOTO nova MQ metadata heat-api-cfnMQHARestarter restart() delete; create heat-api-cloudwatch MQ create_watch_data metadata HTTP
  • 17. 17 © 2014 IBM Corporation VM/Application HA: Heat Orchestrated – tomorrow? Nova Server os-collect-config ??? ceilometer Alarm heat-engine Alarm MQnova MQ metadata VMCluster “Restart” metadata HTTP notification MQ 1 Application Heartbeats 2 VM Heartbeats 3 A native signal / alarm 4 A notion of VM groups for • VM/Application redundancy • HA policies 5 Native support VM and application recovery: • reboot • rebuild • migrate • remote-restart
  • 18. 18 © 2014 IBM Corporation VM/Application HA: Open Questions  Physical placement of VMs ‒ No shared PDU/rack, no shared network switch ‒ HA-aware scheduling, e.g. server priority  Detection of failures ‒ High availability and QoS, e.g. desired latency/throughput versus reality ‒ Reliable detection, application involvement, …  Reasoning of failures ‒ Root cause, trend analysis ‒ Log collection and analysis  HA management / orchestration ‒ As a cross-cutting concern, involving not only compute, but also storage and network • Stack availability? ‒ Capacity planning / reservation  Leverage existing HA software ‒ Can we leverage supports from hypervisors? ‒ Can/should we generalize this into a service?
  • 19. 19 © 2014 IBM Corporation High Availability as a Service (HAaaS)  Generic HA management service ‒ Applicable to different levels of HA • Host, VM, App, OpenStack ‒ Applicable to different hypervisors • vSphere, KVM, Xen, HyperV, PowerVM, …) ‒ Functionality determined via user authentication  Well-defined service APIs ‒ Clusters management ‒ Application/Service resource definition ‒ HA policies • Fail-over domain • Fail-over priority, operation, timeout, retries, …
  • 20. 20 © 2014 IBM Corporation HAaaS: OpenStack HA VM or Physical Servers have have
  • 21. 21 © 2014 IBM Corporation HAaaS: VM HA Physical Server have have
  • 22. 22 © 2014 IBM Corporation HAaaS: Application HA VMs have have
  • 23. 23 © 2014 IBM Corporation Service Interfaces (1)  Cluster Management ‒ cluster_create, ‒ cluster_destroy, ‒ cluster_start, ‒ cluster_stop, ‒ cluster_suspend, ‒ cluster_resume, ‒ cluster_set_attr, ‒ cluster_get_attr, ‒ cluster_by_host, ‒ cluster_get_status, ‒ cluster_get_log, ‒ ...  Node Management (physical/virtual) ‒ node_join_cluster ‒ node_leave_cluster ‒ node_get_attr ‒ node_set_attr ‒ node_startup ‒ node_shutdown ‒ node_reboot ‒ node_evacuate ‒ node_get_status ‒ ...
  • 24. 24 © 2014 IBM Corporation Service Interfaces (2)  Resource Management ‒ resource_create ‒ resource_destroy ‒ resource_get_attr ‒ resource_set_attr ‒ ...  Fencing Management ‒ Fencing_dev_add ‒ Fencing_dev_del ‒ Fencing_dev_associate ‒ Fencing_dev_deassociate ‒ Fencing_dev_set_opts ‒ Fencing_dev_get_opts ‒ ...  Service Management (aka. resource groups) ‒ service_create ‒ service_destroy ‒ service_add_resource ‒ service_del_resource ‒ service_list ‒ service_get_attr ‒ service_set_attr ‒ service_start ‒ service_stop ‒ service_restart ‒ service_relocate ‒ ...
  • 25. 25 © 2014 IBM Corporation Monday, May 12 – Room B314 12:05-12:45 Wednesday, May 14 - Room B312 9:00-9:40 9:50-10:30 11:00-11:40 11:50-12:30 OpenStack is Rockin’ the OpenCloud Movement! Who‘s Next to Join the Band ? Angel Diaz, VP Open Technology and Cloud Labs David Lindquist, IBM Fellow, VP, CTO Cloud & Smarter Infrastructure Getting from enterprise ready to enterprise bliss - why OpenStack and IBM is a match made in Cloud heaven. Todd Moore - Director, Open Technologies and Partnerships Taking OpenStack beyond Infrastructure with IBM SmartCloud Orchestrator. Andrew Trossman - Distinguished Engineer, IBM Common Cloud Stack and SmartCloud Orchestrator IBM, SoftLayer and OpenStack - present and future Michael Fork - Cloud Architect IBM and OpenStack: Enabling Enterprise Cloud Solutions Now. Tammy Van Hove -Distinguished Engineer, Software Defined Systems IBM Sponsored Sessions
  • 26. 26 © 2014 IBM Corporation IBM Technical Sessions Monday, May 12 3:40 - 4:20 3:40 - 4:20 Tuesday, May 13 11:15 - 11:55 2:00 - 2:40 5:30 - 6:10 5:30 - 6:10 Wednesday, May14 9:50 - 10:30 2:40 - 3:20 Thursday, May 15 9:50 - 10:30 1:30 - 2:10 2:20 - 3:00
  • 27. 27 Be sure to stop by the IBM booth to see some demos and get your rockin’ OpenStack T-shirt while they last. Thank you !