
OpenStack HA - Theory to Reality


OpenStack Summit, Vancouver 2015 Talk
Title: OpenStack HA - Theory to Reality

Published in: Technology

  1. OpenStack HA - Theory to Reality
     Gerd Prüßmann, Shamail Tahir, Sriram Subramanian, Kalin Nikolov
  2. Speakers
     ● Gerd Prüßmann, Cloud Architect, Deutsche Telekom AG
     ● Shamail Tahir, Cloud Architect, EMC Office of the CTO
     ● Sriram Subramanian, Founder & Cloud Specialist, CloudDon
     ● Kalin Nikolov, Cloud Engineer, PayPal
     Twitter: @2digitsleft, @ShamailXD, @sriramhere
  3. Agenda
     ● OpenStack HA - Introduction
     ● Active/Active
     ● Active/Passive
     ● DT Implementation
     ● eBay/PayPal Implementation
     ● Summary
  4. OpenStack HA - Introduction
     ● What does it mean?
     ● Why is it not by default?
     ● Stateless vs Stateful
     ● Challenges
     ● More than one way
       o Active/Passive
       o Active/Active
  5. Is This?
  6. Or This?
  7. Active/Active
     ● API Service Endpoints
     ● Database
     ● Networking
  8. Active/Active
     ● The OpenStack High Availability (HA) concept depends on the components used, e.g. network virtualization, storage backend, database system
     ● Various technologies are available to realize HA
       o Vendors use combinations of e.g. Pacemaker, Corosync, Galera, Keepalived, HAProxy, VRRP, DRBD … or their own tools
     The following description is derived from the generic proposal in the OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
  9. Active/Active
     ● Target: try to have all services of the platform highly available
       o Redundancy and resiliency against single service / node failure
     ● Stateless services are load balanced (HAProxy + Keepalived)
       o e.g. API endpoints / nova-scheduler
     ● Stateful services use individual HA technologies
       o e.g. RabbitMQ, MySQL DB
       o might be load balanced as well
     ● Some services/agents have no built-in HA feature
  10. Active/Active - API service endpoints
     API endpoints
     ● deploy on multiple nodes
     ● configure load balancing with virtual IPs (VIPs) in HAProxy
     ● use HAProxy's VIPs to configure the respective identity endpoints
     ● all service configuration files refer to these VIPs only
     Schedulers
     ● nova-scheduler, nova-conductor, cinder-scheduler, neutron-server, ceilometer-collector, heat-engine
     ● schedulers are configured with the clustered RabbitMQ nodes
  11. Active/Active - Databases
     ● MySQL or MariaDB with Galera cluster (wsrep) library extension
       o transaction commit level replication
     ● synchronous multi-master setup
       o min. 3 nodes to keep quorum in case of a network partition
     ● write to and read from any node
     ● other database options possible: Percona XtraDB, PostgreSQL etc.
  12. Active/Active - RabbitMQ
     ● RabbitMQ nodes clustered
     ● mirrored queues configured via policy (e.g. ha-mode: all)
     ● all services use the RabbitMQ nodes
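The mirrored-queue policy mentioned on this slide can be sketched as follows. This is a minimal illustration, not the presenters' setup: it only builds the argument list for `rabbitmqctl set_policy` with an `ha-mode` of `all`. The policy name `ha-all` and the pattern `^` (which matches every queue name) are assumptions chosen for the example.

```python
import json
import shlex


def mirrored_queue_policy_cmd(policy_name="ha-all", pattern="^"):
    """Build a rabbitmqctl command that mirrors matching queues to all nodes.

    policy_name and pattern are illustrative defaults (assumptions),
    not values taken from the slides.
    """
    # The policy definition is a JSON document; "ha-mode": "all" asks for
    # a mirror of each matching queue on every node in the cluster.
    definition = json.dumps({"ha-mode": "all"}, separators=(",", ":"))
    return ["rabbitmqctl", "set_policy", policy_name, pattern, definition]


if __name__ == "__main__":
    # Print a shell-quoted form that could be pasted on a cluster node.
    print(shlex.join(mirrored_queue_policy_cmd()))
```

Building the command as a list keeps quoting of the JSON definition out of the caller's hands; it could be passed to `subprocess.run` on a node where `rabbitmqctl` is installed.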
  13. Active/Active - Networking
     ● deploy multiple network nodes
     ● Neutron DHCP agent: configure multiple DHCP agents (dhcp_agents_per_network)
     ● Neutron L3 agent
       o Automatic L3 agent HA (allow_automatic_l3agent_failover)
       o VRRP (l3_ha, max_l3_agents_per_router, min_l3_agents_per_router)
     ● Neutron L2 agent: no HA available
     ● Neutron metadata agent: no HA available
     ● Neutron LBaaS agent: no HA available
     ● where no HA feature is available: active/passive Pacemaker/Corosync solution
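A rough sketch of how the Neutron options named on this slide might be rendered into a `[DEFAULT]` section. The numeric values (2 DHCP agents, 2 to 3 L3 agents per router) are assumptions for illustration only, and treating automatic failover and VRRP as alternatives is a simplification made by this sketch, not a statement from the slides.

```python
import configparser
import io


def render_neutron_ha_options(use_vrrp=True):
    """Render the HA-related options from the slide as INI text.

    All values are illustrative assumptions; only the option names
    come from the slide.
    """
    cp = configparser.ConfigParser()
    # Schedule every tenant network onto more than one DHCP agent.
    cp["DEFAULT"]["dhcp_agents_per_network"] = "2"
    if use_vrrp:
        # VRRP-based HA routers.
        cp["DEFAULT"]["l3_ha"] = "True"
        cp["DEFAULT"]["max_l3_agents_per_router"] = "3"
        cp["DEFAULT"]["min_l3_agents_per_router"] = "2"
    else:
        # Reschedule routers away from an L3 agent that is reported dead.
        cp["DEFAULT"]["allow_automatic_l3agent_failover"] = "True"
    buf = io.StringIO()
    cp.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_neutron_ha_options(use_vrrp=True))
```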
  14. Active/Active - Example
     Deployment example
  15. Active/Passive
     ● General
     ● Tools Overview
     ● Controllers Overview
  16. Active/Passive: General
     ● Components should leverage a Virtual IP
     ● The primary tools used for Active/Passive OpenStack configurations are general (non-OpenStack-specific): Pacemaker + Corosync, and DRBD
  17. Corosync
     ● Messaging layer used by the cluster
     ● Responsibilities include cluster membership and messaging
     ● Leverages RRP (Redundant Ring Protocol)
       o Rings can be set up as A/A or A/P
       o UDP only
       o mcastport specifies the receive port; mcastport minus 1 is the send port
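The port rule on this slide is easy to get wrong when two rings are configured, so here is a small worked sketch. It assumes 5405 as the example `mcastport` (a commonly used value); the helper names are hypothetical and not part of Corosync.

```python
def corosync_udp_ports(mcastport=5405):
    """Return (receive_port, send_port) for one ring.

    Follows the rule from the slide: mcastport is the receive port and
    mcastport - 1 is the send port. 5405 is only an example default.
    """
    if not 2 <= mcastport <= 65535:
        raise ValueError("mcastport must leave room for mcastport - 1")
    return mcastport, mcastport - 1


def rings_overlap(mcastports):
    """Return True if any two rings would end up sharing a UDP port."""
    used = []
    for port in mcastports:
        used.extend(corosync_udp_ports(port))
    return len(used) != len(set(used))


if __name__ == "__main__":
    print(corosync_udp_ports(5405))
    # Adjacent mcastport values collide because each ring uses two ports.
    print(rings_overlap([5405, 5406]), rings_overlap([5405, 5407]))
```

Because each ring occupies two consecutive ports, the ports chosen for redundant rings should differ by at least two, and firewall rules need to cover both ports per ring.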
  18. Pacemaker
     ● Cluster Resource Manager
     ● Cluster Information Base (CIB)
       o Represents current state of resources and cluster configuration (XML)
     ● Cluster Resource Management Daemon (CRMd)
       o Acts as decision maker (one master)
     ● Policy Engine (PEngine)
       o Sends instructions to LRMd and CRMd
     ● STONITHd
       o Fencing mechanism
     [Diagram: CRMd, STONITHd, CIB, PEngine, LRMd]
  19. DRBD
     ● Distributed Replicated Block Device
     ● Creates logical block devices (e.g. /dev/drbdX) that have backing volumes
     ● Reads are serviced locally
     ● Primary node writes are sent to the secondary node
  20. Active/Passive: Database
     [Diagram: Host1 and Host2, each running MySQL on top of DRBD, Pacemaker and Corosync]
     ● Use DRBD to back MySQL
     ● Leverage a VIP that can float between hosts
     ● Manage all resources (including the MySQL daemon) with Pacemaker
     ● MySQL/Galera is an alternative, but the current version of the HA Guide does not recommend it
  21. Active/Passive: RabbitMQ
     [Diagram: Host1 and Host2, each running RabbitMQ on top of DRBD, Pacemaker and Corosync]
     ● Use DRBD to back RabbitMQ
     ● Leverage a VIP that can float between hosts
     ● Ensure the Erlang cookie (.erlang.cookie) is identical on all nodes
       o Enables the nodes to communicate with each other
     ● RabbitMQ clustering does not tolerate network partitions well
  22. Active/Passive: Overview (From Guide)
     ● Leverage the DB and RabbitMQ VIPs in configuration files
     ● Configure Pacemaker resources for OpenStack services
       o Image API
       o Identity
       o Block Storage API
       o Telemetry Central Agent
       o Networking
       o L3-Agent
       o DHCP
  23. DT Implementation - Overview
     ● Business Market Place (BMP)
     ● SaaS offering
     ● https://portal.telekomcloud.com/
     ● SaaS applications from software partners (ISVs) and DT offered to SME customers
     ● Platform based on open source technologies only (OpenStack, Ceph, Linux)
     ● Project started in 2012 with OpenStack Essex and Ceph
     ● In production since 3/13
  24. DT Implementation
     DTAG scale out project (ongoing)
     Target: migrate production to a new DC and scale out
     Requirements:
     ● scale out compute by 30%, storage by 40%
     ● eliminate all SPOFs
     ● setup in two fire protection areas / physically separated DC rooms
  25. DT Implementation
     ● single region HA OpenStack instance
     ● all services distributed over two DC rooms
       o Compute and Storage distributed equally
       o All OpenStack services HA (as far as possible)
         - OSS (DNS, NTP, Puppet master, mirror etc., redundant perimeter firewall)
     ● Instance distribution: 4 Availability Zones, multiple host aggregates and scheduler filters
  26. DT Implementation
     ● Load Balancing
       o HAProxy for MySQL, services, RabbitMQ, APIs (nginx under test)
     ● MySQL
       o Galera multi-master replication (3 nodes)
     ● RabbitMQ
       o 2-node cluster / mirrored queues
     ● Neutron
       o DHCP: multiple agents started; Pacemaker/Corosync
     ● API Endpoints
       o Load balancing with round robin distribution
     ● Storage
       o 2 shared, distributed Ceph clusters (RBD/S3)
  27. DT Implementation
     Tests/Experiences so far
     ● Load balancing works well
     ● Database: OpenStack multi-node write issues
       o 1 node write / 2 nodes backup: diminishes Galera HA efficiency (monitoring)
     ● Specific issues with deployment in 2 DC rooms / uneven distribution of services (Galera)
       o if the “wrong” room fails
         - Galera: quorum requires a majority! Room with 2 nodes goes down → 3rd node will deactivate itself → DB outage
         - Storage specific:
           - Ceph may lose 2/3 of the replicas → heavy replication load on the Ceph cluster
           - danger of losing data (OSD/disk failure) → raise replica level / adapt CRUSH map
         - Network: recovering from a Neutron / L3 failure: <15 minutes to recover
       o pet applications are vulnerable and may suffer from hiccups during disasters anyway
     ● DHCP agent failures
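The "wrong room" problem above comes down to majority arithmetic, which the following sketch works through. It is a simplified model, assuming equally weighted nodes and an abrupt room failure; the room names and function names are hypothetical, and real Galera quorum handling has more cases than this.

```python
def has_quorum(alive_nodes, total_nodes):
    """A partition keeps quorum only with a strict majority of all nodes."""
    return alive_nodes > total_nodes / 2


def quorum_after_room_failure(nodes_per_room):
    """For each room, report whether the cluster survives losing that room.

    nodes_per_room maps a (hypothetical) room name to its node count.
    """
    total = sum(nodes_per_room.values())
    return {
        failed_room: has_quorum(total - count, total)
        for failed_room, count in nodes_per_room.items()
    }


if __name__ == "__main__":
    # Three nodes over two rooms: losing the 2-node room leaves 1 of 3.
    print(quorum_after_room_failure({"room_a": 2, "room_b": 1}))
    # One node per room in three rooms: any single room can fail.
    print(quorum_after_room_failure({"room_a": 1, "room_b": 1, "room_c": 1}))
```

With two rooms, one room always holds the majority, so one of the two possible room failures takes the database down regardless of how the three nodes are placed.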
  28. DT Implementation
     Plans for the future
     ● use DVR / VRRP in the future
       o make the network more resilient and elastic
     ● a third DC room would be desirable :-)
       o Ceph replicas / MONs, MySQL Galera
  29. eBay/PayPal Implementation
     The scope of eBay/PayPal OpenStack clouds
     ● 100% of PayPal web/mid tier
     ● Most of Dev/QA
     ● Number of HVs: 8,500
     ● Number of virtual machines: 70,000
     ● Number of users: several thousand
     ● Availability zones: 10
  30. eBay/PayPal Implementation
     ● Database: MySQL MMM replication, VIP with FailoverPersistence / Galera
     ● RabbitMQ: VIP with SingleNode FailoverPersistence or 3 nodes with mirrored queues
     ● NeutronDHCP / LBaaS: Corosync/Pacemaker
     ● API Endpoints: LB VIPs for every service with either RR or least connection
     ● Storage: shared storage with NFS/iSCSI
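The two balancing methods named for the API endpoints (RR and least connection) differ only in how the next backend is chosen. The sketch below is a toy model of that choice, not the behaviour of any particular load balancer; the class names and backend labels are hypothetical.

```python
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Hand out the backend with the fewest connections currently open."""

    def __init__(self, backends):
        self.active = {backend: 0 for backend in backends}

    def pick(self):
        # On a tie, min() returns the backend that was listed first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when a connection to this backend is closed.
        self.active[backend] -= 1


if __name__ == "__main__":
    rr = RoundRobin(["api1", "api2", "api3"])
    print([rr.pick() for _ in range(4)])
    lc = LeastConnections(["api1", "api2"])
    print(lc.pick(), lc.pick())
```

Round robin ignores how long requests take, while least connection steers new requests away from a backend that is stuck on slow calls, which is one reason it can suit long-lived connections better.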
  31. eBay/PayPal Implementation
     Successful HA Implementations
     ● Load-balanced HA: VIPs for every service
     ● LB Single Node Failover Persistence Profile
     ● Galera/Percona for Identity Service
     ● Global Identity Service using GLB
  32. eBay/PayPal Implementation
     HA Failures
     ● Corosync/Pacemaker: NeutronDHCP and LBaaS, missing advanced health checks
     ● RabbitMQ: Single Node Failover Persistence
     ● MySQL Replication: Single Node Failover Persistence sometimes doesn't work well. Implemented external monitoring and disabling of the failed member.
     ● VIPs without ECV health checks
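The failures listed above share a theme: a member can accept connections while being functionally broken. The following sketch illustrates a content-verifying health check and the "disable the failed member" step. It assumes ECV means a check that inspects the response content; the expected marker string `OK`, the member names and the function names are all hypothetical.

```python
def ecv_healthy(status, body, expected="OK"):
    """Content-verifying check: the member must answer 200 AND return
    the expected marker, not merely accept the connection."""
    return status == 200 and expected in body


def active_members(probe_results, expected="OK"):
    """Keep only members whose last probe passed the content check.

    probe_results maps member name -> (http_status, response_body),
    as an external monitor might collect them.
    """
    return [
        name
        for name, (status, body) in probe_results.items()
        if ecv_healthy(status, body, expected)
    ]


if __name__ == "__main__":
    probes = {
        "db1": (200, "OK"),
        "db2": (200, "replication stopped"),  # port open, service broken
        "db3": (503, ""),
    }
    print(active_members(probes))
```

In this model `db2` would pass a plain port check but is removed by the content check, which is the gap the slide attributes to VIPs without ECV health checks.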
  33. eBay/PayPal Implementation
     Future direction
     ● HA on Global or Regional Services: one leg in each Availability Zone (Keystone, LBaaS, Swift)
     ● RabbitMQ with 3 nodes / mirrored queues: LB VIP with least connections
     ● No shared NFS for Glance
  34. eBay/PayPal Global Identity Service
  35. eBay/PayPal Implementation
     Lessons Learned
     ● Try not to overcomplicate
     ● Simulate failures: before placing in production, make sure HA works
     ● Place your services in different Availability Zones or at least different fault zones
     ● Always make backups, no matter how robust your HA solution is
  36. Call to Action
     ● OpenStack HA Guide Update Efforts
     ● WTE Work Group (now known as ‘Enterprise’)
     ● Share Best Practices
  37. Reference
     ● OpenStack HA guide: http://docs.openstack.org/high-availability-guide/content/index.html
     ● Percona Resources: https://www.percona.com/resources/mysql-webinars/high-availability-using-mysql-cloud-today-tomorrow-and-keys-your-success
     ● HAProxy Documentation: http://www.haproxy.org/
