High Availability and Fault Tolerance
            (OpenStack)

             Deepak Mane
            Cloud Architect
Objective & Motivation

• To build a fault-tolerant and highly available
  architecture (OpenStack)
• Motivation
   – To build a fault-tolerant architecture for OpenStack
   – To build a cluster architecture for the MySQL and
     RabbitMQ components
   – To build a high availability architecture for the network
   – To build a predictive and reactive model for detecting
     failures of Nova, Swift and Compute
Use cases
• Master-Master Cluster architecture for MySQL
• Disk-level replication for MySQL using DRBD for
  Glance, Swift and Cinder
• Session-level replication for RabbitMQ
• High availability for networking
• High availability for Horizon (OpenStack
  dashboard)
• Predictive model for detecting failure for all
  components
• Reactive model for recovery for all components.
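A reactive model of the kind listed above can be sketched as a detector that triggers recovery only after several consecutive failed health probes, so that a single missed probe does not cause a failover. This is a minimal illustration, not the deck's implementation: the class name `FailureDetector`, the `threshold` parameter and the probe results are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailureDetector:
    """Flags a service for recovery after several consecutive bad probes."""

    threshold: int = 3             # hypothetical: misses tolerated before acting
    consecutive_failures: int = 0  # running count of back-to-back failed probes

    def observe(self, probe_ok: bool) -> bool:
        """Record one probe result; return True when recovery should start."""
        if probe_ok:
            # A healthy probe resets the streak.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        return self.consecutive_failures >= self.threshold


if __name__ == "__main__":
    detector = FailureDetector(threshold=3)
    # Hypothetical probe history for one component (True = healthy).
    probes = [True, False, False, True, False, False, False]
    for step, ok in enumerate(probes):
        if detector.observe(ok):
            print(f"probe {step}: start recovery")
```

In practice the probe itself (an API call, a process check, a queue ping) and the recovery action would be component-specific; only the thresholding logic is shown here.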
Non Use Cases
• Scenarios not suitable for cloud
  – Redundancy of network components, such as
    switches and routers,
  – Redundancy of applications and automatic service
    migration,
  – Redundancy of storage components,
  – Redundancy of facility services such as power, air
    conditioning, fire protection, and others
Pacemaker – High availability for OpenStack
• The Pacemaker cluster stack is the state-of-the-art high
  availability and load balancing stack for the Linux platform
• Pacemaker is storage- and application-agnostic, and is in no
  way specific to OpenStack
• Pacemaker relies on the Corosync messaging
  layer for reliable cluster communications.
• Corosync implements the Totem single-ring
  ordering and membership protocol and provides
  UDP- and InfiniBand-based messaging, quorum,
  and cluster membership to Pacemaker.
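The quorum service mentioned above decides whether a partition of the cluster may keep running resources. A minimal sketch of the plain majority rule follows, assuming one vote per node; the function name `has_quorum` is illustrative, and real Corosync deployments can change this behaviour through configuration (for example, special handling of two-node clusters).

```python
def has_quorum(expected_votes: int, live_votes: int) -> bool:
    """Return True if live_votes is a strict majority of expected_votes."""
    if expected_votes <= 0:
        raise ValueError("expected_votes must be positive")
    if not 0 <= live_votes <= expected_votes:
        raise ValueError("live_votes must be between 0 and expected_votes")
    # Strict majority: more than half of all expected votes are present.
    return live_votes > expected_votes // 2


if __name__ == "__main__":
    # Assumed cluster sizes, one vote per node.
    for total, live in [(3, 2), (2, 1), (5, 2)]:
        print(f"{live}/{total} nodes up -> quorum: {has_quorum(total, live)}")
```

Under this rule a two-node cluster loses quorum as soon as one node fails, which is why two-node setups such as a master/slave cloud controller pair usually need extra configuration or a third vote.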
Required packages
• pacemaker
• corosync
• cluster-glue
• resource-agents
Architecture and Technology Information
HA Architecture – Cloud Controller

[Figure: cloud controller high availability]

Implemented using DRBD, Pacemaker and Corosync
DRBD Architecture – MySQL – Cloud Controller

[Figure: master cloud controller and slave cloud controller]
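When the master cloud controller fails, the slave has to take over in a safe order: a DRBD device can only be mounted on the node that holds the primary role, and MySQL needs both its data directory and its service address before it starts. The sketch below derives one valid order from those dependencies using Python's standard `graphlib` module (Python 3.9+). The step names are hypothetical labels, not Pacemaker resource names, and the dependency set is an assumption about a typical active/passive setup.

```python
from graphlib import TopologicalSorter

# Each step maps to the steps that must complete before it.
# Step names are illustrative placeholders.
FAILOVER_DEPENDENCIES = {
    "promote_drbd": set(),                       # slave becomes DRBD primary
    "mount_filesystem": {"promote_drbd"},        # mount the replicated device
    "move_virtual_ip": set(),                    # service address follows the node
    "start_mysql": {"mount_filesystem", "move_virtual_ip"},
}


def failover_order(dependencies: dict) -> list:
    """Return the steps in an order that respects every dependency."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in failover_order(FAILOVER_DEPENDENCIES):
        print(step)
```

In a Pacemaker cluster the same idea is expressed declaratively with ordering and colocation constraints rather than in application code.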
DRBD Architecture – RabbitMQ – Cloud Controller

[Figure: master cloud controller and slave cloud controller]
Nova - Recovery mode Approach
Cloud Controller – Recovery approach
