High Availability
ManageIQ/CloudForms
Brett Thurber - Red Hat
June 2016
Agenda
Introduction & Acknowledgements
What is HA?
Traditional HA
What’s on the horizon?
pglogical
BDR
Containers & Kubernetes
Summary
Introduction & Acknowledgements
Brett Thurber - RHCT, RHCE, RHCDS, RHCA, RHCVA
20+ years of IT experience
Been with Red Hat since 2011
Team lead in Systems Engineering focused on management and integrated solutions
Worked with MIQ/CloudForms since 2013
Authored 11 Reference Architectures
Presented at RH Summit 2015 - Application portability & interoperability with Red Hat Cloud Infrastructure
Contact: bthurber@redhat.com
What is HA?
“A system or component that is continuously operational for a desirably long
length of time. Availability can be measured relative to "100% operational" or
"never failing."” - Source: SearchDataCenter
“A characteristic of a system, which aims to ensure an agreed level of
operational performance for a higher than normal period.” - Source:
Wikipedia
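To make "measured relative to 100% operational" concrete, the sketch below converts an availability target into a yearly downtime budget. The function name and the sample targets are illustrative assumptions, not figures from this deck.

```python
# Minutes in a non-leap year; leap years are ignored to keep the sketch simple.
MINUTES_PER_YEAR = 365 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the downtime allowed per year for a given availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100")
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    # Hypothetical targets chosen only to show how quickly the budget shrinks.
    for target in (99.0, 99.9, 99.99):
        minutes = downtime_minutes_per_year(target)
        print(f"{target}% available -> {minutes:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the downtime budget by a factor of ten, which is one way to frame how much HA engineering a given target justifies.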
Traditional HA
Heavy Lift
Highly complex and resource intensive
Shared storage
iSCSI, NFS, fibre channel
Multiple bare metal or VM hosts
Minimum of 2 cluster hosts for pgsql database
2+ MIQ/CFME instances
HAProxy to load balance
Complex and time intensive deployment
Active/Passive Deployment Pattern: intra-site
MIQ/CFME
haproxy
VIP
MIQ/CFME
pgsql pgsql pacemaker
VIP
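The active/passive pattern above can be sketched as a small decision rule: the VIP stays with the current database node while it is healthy and moves to a standby only on failure. This is a simplified illustration, not how pacemaker is implemented; the `DbNode` class, the `choose_active` function and the node names are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DbNode:
    name: str
    healthy: bool = True


def choose_active(nodes: List[DbNode], current: Optional[DbNode]) -> Optional[DbNode]:
    """Return the node that should hold the VIP.

    Keep the current active node while it is healthy (no needless failover);
    otherwise pick the first healthy node, or None if every node is down.
    """
    if current is not None and current.healthy:
        return current
    for node in nodes:
        if node.healthy:
            return node
    return None


if __name__ == "__main__":
    nodes = [DbNode("pgsql-1"), DbNode("pgsql-2")]  # hypothetical host names
    active = choose_active(nodes, None)
    print("initial active:", active.name)

    nodes[0].healthy = False  # simulate losing the active database host
    active = choose_active(nodes, active)
    print("after failure:", active.name)
```

Only one node serves traffic at a time, which is why the passive host in this pattern is capacity that sits idle until a failure occurs.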
Active/Passive Deployment Pattern: inter-site
MIQ/CFME
haproxy
VIP
MIQ/CFME
pgsql pgsql
Streaming Replication
Site 1 Site 2
What’s on the horizon?
Interesting possibilities...
Emerging technologies present the possibility of reducing the complexity of HA
for PostgreSQL.
pglogical
BDR
Containers & Kubernetes
pglogical
pglogical
What is pglogical?
pglogical offers logical replication as a PostgreSQL extension and is a replacement
for streaming replication
Available for PostgreSQL 9.4 and later (MIQ Capablanca, CloudForms 4.1)
Less complex solution for database replication
pglogical works on a per-database level, not whole server level like physical
streaming replication
One Provider may feed multiple Subscribers without incurring additional disk
write overhead
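As a rough sketch of the provider/subscriber model described above, the helper below only builds the SQL statements that would be run in one database on each node; it does not connect to anything. The `pglogical.create_node`, `pglogical.replication_set_add_all_tables` and `pglogical.create_subscription` functions come from the pglogical extension, while the node names, host names and the `vmdb_production` database name are assumptions made for illustration.

```python
from typing import List


def sql_literal(value: str) -> str:
    """Quote a string as a SQL literal by doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def provider_setup(node_name: str, dsn: str, schema: str = "public") -> List[str]:
    """Statements to run in the provider's database."""
    return [
        "CREATE EXTENSION IF NOT EXISTS pglogical;",
        "SELECT pglogical.create_node("
        f"node_name := {sql_literal(node_name)}, dsn := {sql_literal(dsn)});",
        # Add every table in the schema to the built-in 'default' replication set.
        "SELECT pglogical.replication_set_add_all_tables("
        f"'default', ARRAY[{sql_literal(schema)}]);",
    ]


def subscriber_setup(node_name: str, dsn: str, provider_dsn: str) -> List[str]:
    """Statements to run in each subscriber's database."""
    return [
        "CREATE EXTENSION IF NOT EXISTS pglogical;",
        "SELECT pglogical.create_node("
        f"node_name := {sql_literal(node_name)}, dsn := {sql_literal(dsn)});",
        "SELECT pglogical.create_subscription("
        f"subscription_name := {sql_literal(node_name + '_sub')}, "
        f"provider_dsn := {sql_literal(provider_dsn)});",
    ]


if __name__ == "__main__":
    # Hypothetical connection strings; adjust to the real environment.
    provider_dsn = "host=db1.example.com port=5432 dbname=vmdb_production"
    for stmt in provider_setup("provider1", provider_dsn):
        print(stmt)
    for host in ("db2.example.com", "db3.example.com"):
        dsn = f"host={host} port=5432 dbname=vmdb_production"
        for stmt in subscriber_setup(host.split(".")[0], dsn, provider_dsn):
            print(stmt)
```

Because every statement targets a single database, the per-database scope noted above shows up directly: replicating a second database would mean repeating the setup there.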
How would it work?
pgsql pgsql pgsql pgsql
VMDB Database
MIQ/CFME MIQ/CFME
haproxy
VIP
Subscribers Publisher
What about failover?
pgsql pgsql pgsql pgsql
VMDB Database
MIQ/CFME MIQ/CFME
haproxy
VIP
Subscribers Publisher
??? ??? ???
pglogical limitations...
Not suitable for failover
Automatic DDL (data definition language) replication is not supported
Logical decoding doesn't decode catalog changes directly. So the plugin can't just
send a CREATE TABLE statement when a new table is added.
If the data being decoded is being applied to another PostgreSQL database then its
table definitions must be kept in sync via some means external to the logical
decoding plugin itself, such as:
Event triggers using DDL deparse to capture DDL changes as they happen and
write them to a table to be replicated and applied on the other end
Doing DDL management via tools that synchronise DDL on all nodes
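Since table definitions have to be kept in sync by some external means, one simple safeguard is a drift check that compares the schema on both ends before and after a change. The sketch below works on plain dictionaries (table name to column name to type); in practice those could be filled from `information_schema.columns`, but that step, the function name and the sample tables are assumptions for illustration.

```python
from typing import Dict

# table name -> {column name -> column type}
Schema = Dict[str, Dict[str, str]]


def schema_drift(provider: Schema, subscriber: Schema) -> Dict[str, str]:
    """Report tables whose definitions differ between provider and subscriber."""
    report = {}
    for table in sorted(set(provider) | set(subscriber)):
        p_cols = provider.get(table)
        s_cols = subscriber.get(table)
        if p_cols is None:
            report[table] = "missing on provider"
        elif s_cols is None:
            report[table] = "missing on subscriber"
        elif p_cols != s_cols:
            report[table] = "column mismatch"
    return report


if __name__ == "__main__":
    # Hypothetical schemas: the provider gained a column and a new table.
    provider = {
        "vms": {"id": "bigint", "name": "text", "power_state": "text"},
        "hosts": {"id": "bigint", "name": "text"},
    }
    subscriber = {
        "vms": {"id": "bigint", "name": "text"},
    }
    for table, problem in schema_drift(provider, subscriber).items():
        print(f"{table}: {problem}")
```

A check like this does not replicate DDL; it only flags the cases where a schema change reached one node and not the other.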
Bi-Directional Replication
BDR
What is BDR?
Bi-Directional Replication (BDR) is an asynchronous multi-master replication system
for PostgreSQL, specifically designed to allow geographically distributed clusters,
supporting up to 48 nodes (and possibly more in future releases). BDR is a low
overhead, low maintenance technology for distributed databases.
BDR excels in environments where users are distributed across high-latency
and/or unreliable network links where conventional tightly-coupled clustering
software does not work well
Support for DDL replication and Global DDL locking
Active/Active BDR Deployment Pattern: intra-site
MIQ/CFME
haproxy
VIP
MIQ/CFME
pgsql pgsql
BDR
Active/Active BDR Deployment Pattern: inter-site
MIQ/CFME
haproxy
VIP
MIQ/CFME
pgsql pgsql
BDR
Site 1 Site 2
BDR limitations...
Still under development; not production ready (requires a modified version of PostgreSQL 9.4)
Asynchronous replication
Changes made on one BDR node are not replicated to other nodes before they are
committed locally. As a result the data is not exactly the same on all nodes at any
given time
Non-shared storage architecture means additional storage space considerations
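The asynchronous behaviour listed above can be illustrated with a toy model: each node commits locally first and ships the change to its peer later, so the two copies differ until the queues drain. This is a conceptual sketch only and not BDR's implementation; the `Node` class, `replicate` function and keys are hypothetical, and conflict handling is deliberately left out.

```python
from collections import deque


class Node:
    """A toy database node with a local store and an outbound change queue."""

    def __init__(self, name: str):
        self.name = name
        self.data = {}
        self.outbox = deque()

    def commit(self, key: str, value: str) -> None:
        self.data[key] = value            # the commit completes locally first
        self.outbox.append((key, value))  # the change is shipped to peers later


def replicate(source: Node, target: Node) -> None:
    """Apply all of the source's queued changes on the target."""
    while source.outbox:
        key, value = source.outbox.popleft()
        target.data[key] = value


if __name__ == "__main__":
    site1, site2 = Node("site1"), Node("site2")
    site1.commit("vm-1", "on")   # written at site 1
    site2.commit("vm-2", "off")  # written at site 2 at about the same time

    print("in sync before replication:", site1.data == site2.data)

    replicate(site1, site2)
    replicate(site2, site1)
    print("in sync after replication:", site1.data == site2.data)
```

The window between the two prints is the period in which a read on one node can miss a write already committed on the other.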
Containers & Kubernetes
Containers?
Docker image for ManageIQ under development
Currently monolithic
Allows for a MIQ container image to be deployed to Atomic Host and other
container providers
Service decoupling on the horizon
Utilizing Kubernetes pods allows for:
Service distribution across multiple hosts
Persistent storage to be used for database
Highly available and scalable architecture
Possible Container Architecture
Container
Pod
http
rails
pgsql
Persistent Storage
Container
Pod
http
rails
pgsql
Persistent Storage
BDR
Node
Proxy
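One way to picture the pod in the diagram above is as a Kubernetes Pod manifest with three containers and a persistent volume claim for the database. The sketch below builds such a manifest as a Python dictionary and prints it as JSON. The image names, ports, mount path and claim name are placeholders invented for illustration, since the decoupled ManageIQ images described here did not exist yet.

```python
import json


def miq_pod(name: str, claim_name: str) -> dict:
    """Build a Pod manifest with http, rails and pgsql containers."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"app": "manageiq"}},
        "spec": {
            "containers": [
                {
                    "name": "http",
                    "image": "example/miq-http:latest",  # placeholder image
                    "ports": [{"containerPort": 443}],
                },
                {
                    "name": "rails",
                    "image": "example/miq-rails:latest",  # placeholder image
                },
                {
                    "name": "pgsql",
                    "image": "example/miq-pgsql:latest",  # placeholder image
                    "ports": [{"containerPort": 5432}],
                    # The database files live on the persistent volume.
                    "volumeMounts": [
                        {"name": "vmdb-data", "mountPath": "/var/lib/pgsql/data"}
                    ],
                },
            ],
            "volumes": [
                {
                    "name": "vmdb-data",
                    "persistentVolumeClaim": {"claimName": claim_name},
                }
            ],
        },
    }


if __name__ == "__main__":
    print(json.dumps(miq_pod("miq-0", "miq-0-vmdb"), indent=2))
```

Keeping the database on a claim rather than inside the container is what lets the pod be rescheduled onto another host without losing the VMDB data.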
Possible Container Architecture (cont'd)
Container
Pod
http
rails
pgsql
Persistent Storage
Container
Pod
http
rails
pgsql
Persistent Storage
BDR
Node
Proxy
Node
Proxy
Overlay Network
What about networking?
Kubernetes imposes the following network rules:
All containers can communicate with all other containers without NAT
All nodes can communicate with all containers (and vice-versa) without NAT
The IP that a container sees itself as is the same IP that others see it as
Supported overlay networks
L2 networks and Linux bridging
Flannel
Open vSwitch
Romana
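The "no NAT" rules above imply that every pod address must be unique and routable across the whole cluster. A common way to achieve that is to give each node its own slice of one cluster-wide range; the sketch below shows that idea with Python's standard `ipaddress` module. The CIDR, prefix length and node names are assumptions for illustration and are not taken from any specific overlay's defaults.

```python
import ipaddress
from itertools import islice
from typing import Dict, List


def allocate_node_subnets(
    cluster_cidr: str, node_names: List[str], prefix: int = 24
) -> Dict[str, ipaddress.IPv4Network]:
    """Give each node a distinct, non-overlapping subnet of the cluster range."""
    network = ipaddress.ip_network(cluster_cidr)
    subnets = list(islice(network.subnets(new_prefix=prefix), len(node_names)))
    if len(subnets) < len(node_names):
        raise ValueError("cluster range is too small for this many nodes")
    return dict(zip(node_names, subnets))


if __name__ == "__main__":
    # Hypothetical cluster range and node names.
    allocation = allocate_node_subnets("10.244.0.0/16", ["node-1", "node-2"])
    for node, subnet in allocation.items():
        first_pod_ip = next(subnet.hosts())
        print(f"{node}: pods use {subnet}, first address {first_pod_ip}")
```

Because the per-node ranges never overlap, a pod's own view of its IP matches what every other pod and node sees, which is the third rule in the list.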
Summary
In closing...
Traditional HA clustering is complex, expensive and time consuming to implement,
and poses some support limitations
pglogical is a good replacement for streaming replication but lacks some
features needed to make it a viable HA solution
BDR fills the gaps left by pglogical to offer a viable HA solution but is
still growing in maturity (> PostgreSQL 9.4)
Containers, coupled with Kubernetes, offer compelling use cases including
self-healing, upgrades, scaling and high availability
Q & A
Thank You!

High Availability - Brett Thurber - ManageIQ Design Summit 2016
