Creating a Reference Architecture for High Availability at Nokia
Rick Lane, Consulting Member of Technical Staff, Nokia
Geo-redundant HA reference architecture
Nokia Common Software Foundation (CSF)
● Central development organization for commonly used components
● Work with PUs within the company to determine needs/plans
● Determine which open-source products are used/planned by multiple PUs
● Productize widely used open-source products in the central organization, eliminating the silo-based development model, to
○ Reduce overall corporate development costs
○ Improve time to market for products
CSF OS/DB Department
● Centralized OS (Linux) and DB development and distribution
● Each DB component will define common reference architecture(s)
● The CSF DB component develops the following around the open-source DB for each reference architecture:
○ Deployment and Life Cycle Management (LCM) of the database in a Cloud/OpenStack environment
○ Docker container for deployment and LCM via Kubernetes/Helm → PaaS
○ Tools/scripts to support deployment and management of the DB as above, including Ansible
● Centralized support for OS and DB within the corporation
CSF CMDB (MariaDB Component)
● MariaDB was selected as one of the CSF DB components
● CMDB provides the DB server with the following reference architectures:
○ Standalone
○ Galera Cluster w/ or w/o Arbitrator
○ Replication: Master/Master and Master/Slave
● Galera or Master-Master w/ JDBC is mostly used for basic HA needs
● Deployment and LCM developed and packaged for:
○ Cloud/OpenStack Ansible-automated deployment for all reference architectures
○ Automated Kubernetes container deployment via Helm charts
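As an illustration of the packaged Kubernetes delivery, deployment reduces to a single Helm command along these lines; the chart name, release name, and values here are hypothetical placeholders rather than the actual CSF chart:
# Hypothetical example: chart/release names and values are placeholders
helm install --name cmdb-cluster csf/cmdb --set architecture=master-slave --set replicas=3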
Geo-Redundant HA Requirements
● Multi-datacenter deployment (dual data center initially)
● Must have HA clustering at each datacenter
● 99.999% availability
● 5-second WAN replication lag tolerance
● 6 hours of transaction buffering for datacenter disconnect (binlog retention sketch below)
● Automatic replication recovery after datacenter disconnect
● Procedure to recover the standby datacenter from the active datacenter after 6 hours of replication failure
[Diagram: applications at DC-A and DC-B, each with its own HA clustered DB]
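The 6-hour buffering requirement translates into binlog retention and disk sizing on each master: binary logs must be kept at least as long as the disconnect window so the standby datacenter can catch up. A minimal my.cnf sketch, with illustrative values that are assumptions rather than the Nokia settings:
[mysqld]
log_bin           = mysql-bin
log_slave_updates = ON     # required so slaves can serve as replication sources after failover
expire_logs_days  = 1      # keep binlogs well beyond the 6-hour disconnect window
max_binlog_size   = 1G     # size/rotate so 6 hours of writes fit on disk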
Geo-Redundant Application Assumptions
● Application will write to only one datacenter at a time
● Application is responsible for failing over to alternate datacenter on failure
● Inter-DC replication lag and binlog buffering are a function of application
transaction rate and WAN speed and reliability
● Some data loss acceptable on failure
Architecture Alternatives
● Galera Cluster at each DC, with segment IDs defined to limit inter-DC traffic to a single link (config sketch after this list)
○ Achieves HA at each DC with minimal (internal) data loss
○ Would ideally require 3 datacenters with 3 nodes each to satisfy quorum requirements
○ Synchronous replication between DCs could cause severe performance impacts for more write-intensive applications
● Master-Master at each DC
○ Can provide intra-DC HA via JDBC
○ auto_increment_offset/auto_increment_increment would have to be managed for all nodes at all DCs
○ Inter-DC replication path adjustments are still needed on local DC failover
● Either way, a node is still needed to run a service that monitors alternate-DC health and raises alarms
○ MaxScale makes sense here to provide the proxy and manage the local cluster
○ Nokia-added services will monitor alternate-DC health via MaxScale
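For reference, the two alternatives map onto server configuration roughly as follows; the values are illustrative assumptions. Galera's segment ID is set per node through gmcast.segment so that only one link carries inter-DC replication traffic, while Master-Master needs coordinated auto-increment settings on every writable node:
# Galera node in DC-B (my.cnf fragment, illustrative; DC-A nodes would use gmcast.segment=1)
[mysqld]
wsrep_provider_options = "gmcast.segment=2"

# Master-Master node, one of four masters across two DCs (illustrative)
[mysqld]
auto_increment_increment = 4   # total number of writable masters
auto_increment_offset    = 2   # unique value per node, 1 through 4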
Geo-Redundant Reference Architecture
[Diagram: applications at DC-A and DC-B, each reading from and writing to a local Master with Slaves]
● Master/Slave at each datacenter for HA and auto-failover
● Masters in each datacenter cross-replicate to each other
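The cross-replication between the two masters is ordinary asynchronous replication configured in both directions. A minimal sketch assuming GTID-based replication, with placeholder hostnames and credentials:
-- Run on the DC-A master to replicate from the DC-B master (illustrative names/credentials)
CHANGE MASTER TO
  MASTER_HOST = 'dcb-master.example.com',
  MASTER_PORT = 3306,
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_USE_GTID = slave_pos;
START SLAVE;
-- The mirror-image CHANGE MASTER is run on the DC-B master, pointing back at DC-A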
Datacenter HA (automatic failover)
● First started working with replication-manager to support auto-failover
○ replication-manager runs alongside MaxScale and performs the auto-failover
○ MaxScale was expected to auto-detect topology changes and update server status - it did not always do so
○ replication-manager configuration is very complex
● Discovered MaxScale 2.2 with auto-failover built in
○ Single-point topology manager (no inconsistencies)
○ No additional third-party open-source plugins required (fully supported)
○ Worked with the MaxScale development team as a beta-tester
MaxScale-2.2 Initial Testing
● Very simple configuration
ignore_external_masters = true
auto_failover = true
auto_rejoin = true
● MaxScale provides full datacenter HA
○ Automatic promotion of a Slave to new Master when the Master fails
○ Rejoin of the previously failed Master as a Slave when it recovers
○ MaxScale manages the intra-DC cluster; we would add scripts to fix inter-DC replication in both directions when a Master failover occurs in either DC
○ Huge simplification over using third-party replication managers; deterministic behavior
○ MaxScale now has crash-safe recovery
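For context, these parameters live in the MariaDB Monitor section of maxscale.cnf; a minimal sketch, where the server names, user, and interval are illustrative assumptions:
[MariaDB-Monitor]
type                    = monitor
module                  = mariadbmon
servers                 = server1,server2,server3
user                    = maxscale_monitor
password                = maxscale_pw
monitor_interval        = 500
auto_failover           = true
auto_rejoin             = true
ignore_external_masters = true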
MaxScale-2.2 Final Testing
● New configuration
ignore_external_masters = false
auto_failover = true
auto_rejoin = true
● Additional behavior
○ Changed to set ignore_external_masters = true
○ The new Master automatically replicates from the same external Master as the failed Master
○ A notification script plugin on topology changes lets us automatically fix the alternate datacenter Master to replicate from the new Master on the “new_master” notify event
○ Failover is very quick (500 ms monitor interval): roughly 2 to 3 seconds
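The notification hook is the monitor's script facility: the monitor invokes an external script on selected events and substitutes variables such as $EVENT and $INITIATOR. A sketch of the hookup, where the script path and its arguments are Nokia-specific and shown here only as hypothetical placeholders:
# Added to the monitor section (illustrative)
events = new_master
script = /usr/local/bin/fix_external_master.sh --event=$EVENT --new-master=$INITIATOR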
Master/Slave Database HA with MaxScale
• Automatic failover: election and promotion of a slave by MaxScale
• Continues routing read queries to slaves during failover
• Manual failover and switchover interface
[Diagram: the application's reads and writes go through MaxScale to a Master with Slaves]
1 Master fails
2 MaxScale elects a new master
3 MaxScale promotes the candidate slave as the new master
4 MaxScale instructs the rest of the slaves to replicate from the new master
5 MaxScale sets the new master to replicate from the old Master's external server (if one existed)
6 MaxScale fires the new_master event notification; a Nokia script fixes the external master to replicate from the new local master (sketch below)
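A minimal sketch of what the step-6 Nokia script might do; the hostnames, credentials, and argument handling are placeholders, not the actual script:
#!/bin/bash
# Invoked by the MaxScale monitor on a new_master event (illustrative sketch).
# Repoints the alternate-datacenter master at the newly promoted local master so that
# cross-DC replication resumes in that direction.
NEW_MASTER_HOST="$1"                       # address of the newly promoted local master
REMOTE_DC_MASTER="dcb-master.example.com"  # alternate-DC master to fix (placeholder)

mysql -h "$REMOTE_DC_MASTER" -u repl_admin -p"$REPL_ADMIN_PW" <<SQL
STOP SLAVE;
CHANGE MASTER TO
  MASTER_HOST = '$NEW_MASTER_HOST',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = 'repl_password',
  MASTER_USE_GTID = slave_pos;
START SLAVE;
SQL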
Containerized Delivery
● Deployment of Master/Slave container cluster
○ Automatic configuration of the first container as Master, the other two as Slaves
○ All containers automatically recover in the Slave role
○ Advertise IPs via etcd service advertisement
● Deployment of the MaxScale container
○ Gets all config from the Kubernetes manifest file and server IPs from the etcd advertisement
○ Automatically configures MaxScale and starts monitoring the DB Master/Slave cluster
● Container fails and is re-deployed with a different IP
○ Developed a service to monitor etcd IP advertisements and detect host IP changes for MaxScale (sketch below)
○ Run: maxadmin alter server <hostname> address=<ip>
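A minimal sketch of that IP-tracking service, assuming etcd v3 with the etcdctl client and a placeholder key layout; the real Nokia service and key names differ:
#!/bin/bash
# Poll etcd for the advertised IP of each DB container and update MaxScale when it changes.
declare -A last_ip
while true; do
  for server in server1 server2 server3; do
    ip=$(etcdctl get --print-value-only "/csf/cmdb/$server/ip")
    if [ -n "$ip" ] && [ "$ip" != "${last_ip[$server]}" ]; then
      maxadmin alter server "$server" address="$ip"
      last_ip[$server]="$ip"
    fi
  done
  sleep 5
done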
Nokia-developed capabilities
● Notify script plugin on the “new_master” event to set the remote-DC master to replicate from the newly promoted master
● An additional service will be developed (sketch after this list) to perform the following functions:
○ Monitor MaxScale in the alternate datacenter to verify datacenter health and verify that the local master is replicating to the correct external master
○ Monitor replication and generate SNMP traps when replication breaks
○ Monitor replication lag and generate SNMP traps when lag exceeds a threshold
○ Possible implementation of a CHANGE MASTER retry if replication fails due to GTID not found
○ Since log_slave_updates must be set, a configurable “slave flush interval” to flush binlogs on the slaves
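A minimal sketch of the replication checks behind the SNMP traps and the slave binlog flush; the lag threshold, credentials, and the trap-sending helper are illustrative assumptions, not the actual Nokia service:
#!/bin/bash
# Periodic replication health/lag check, run on each slave (illustrative sketch).
LAG_THRESHOLD=5                        # seconds, per the WAN replication lag requirement
send_snmp_trap() {                     # placeholder: the real service would emit Nokia SNMP traps
  logger -t cmdb-repl-monitor "$*"
}

status=$(mysql -u monitor -p"$MONITOR_PW" -e 'SHOW SLAVE STATUS\G')
io=$(awk  -F': ' '/Slave_IO_Running:/  {print $2}' <<<"$status")
sql=$(awk -F': ' '/Slave_SQL_Running:/ {print $2}' <<<"$status")
lag=$(awk -F': ' '/Seconds_Behind_Master:/ {print $2}' <<<"$status")

if [ "$io" != "Yes" ] || [ "$sql" != "Yes" ]; then
  send_snmp_trap "replication broken"
elif [ -n "$lag" ] && [ "$lag" != "NULL" ] && [ "$lag" -gt "$LAG_THRESHOLD" ]; then
  send_snmp_trap "replication lag ${lag}s exceeds threshold"
fi

# Configurable "slave flush interval": since log_slave_updates is set, rotate slave binlogs
mysql -u monitor -p"$MONITOR_PW" -e 'FLUSH BINARY LOGS'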
Future work
● Still need to see how replication holds up during failures under load conditions
● MaxScale enhancement to fully automate inter-DC master failover (in both directions)
● Support Galera Cluster as the HA solution in each DC
● Support a MaxScale HA cluster at each DC (via keepalived; illustrative config below)
● Support segregated database access on both databases at the same time?
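For the MaxScale HA item, keepalived is typically used to float a virtual IP between an active and a standby MaxScale node; a minimal sketch, where the interface, router ID, and VIP are illustrative assumptions:
# /etc/keepalived/keepalived.conf on the active MaxScale node (illustrative)
vrrp_instance MAXSCALE_VIP {
  state MASTER
  interface eth0
  virtual_router_id 51
  priority 101          # the standby node uses a lower priority, e.g. 100
  advert_int 1
  virtual_ipaddress {
    10.0.0.100/24
  }
}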
Thank you!
Q&A
