Deploying MariaDB for HA on Google Cloud Platform

2© The Pythian Group Inc., 2018
February 26, 2019 - New York City, NY, USA
Matthias Crauwels
Deploying MariaDB for
High Availability on
Google Cloud Platform

© The Pythian Group Inc., 2018 3
Who am I?

© The Pythian Group Inc., 2018 44© The Pythian Group Inc., 2017
Matthias Crauwels
● Living in Ghent, Belgium
● Bachelor Computer Science
● ~20 years Linux user / admin
● ~10 years PHP developer
● ~8 years MySQL DBA
● 3rd year at Pythian
● Currently Lead Database Consultant
● GCP Certified Professional Architect
● AWS Solutions Architect Associate
● Father of Leander

© The Pythian Group Inc., 2018 5© The Pythian Group Inc., 2019 5
PYTHIAN
A global IT company that helps businesses leverage disruptive technologies to better compete.
Our services and software solutions unleash the power of cloud, data and analytics to drive better
business outcomes for our clients.
Our 20 years in data, commitment to hiring the best talent, and our deep technical and business expertise
allow us to meet our promise of using technology to deliver the best outcomes faster.
© The Pythian Group Inc., 2019

AI / ML / BLOCKCHAIN
Intelligent analytics
and decision making
Software autonomy
Disruptive data technologies
CLOUD MIGRATION
& OPERATIONS
Plan, Migrate, Manage,
Optimize, Innovate
Multi-cloud, Hybrid-Cloud,
Cloud Native
ANALYTIC DATA SYSTEMS
Kick AaaS cloud-native, pre-packaged
analytics platform
Custom analytics platform design, implementation
and support services–for on-premises and cloud
Data science consulting and implementation services
OPERATIONAL DATA SYSTEMS
Database services–architecture
to ongoing management
On prem and in the cloud
Oracle, MS SQL, MySQL, Cassandra, MongoDB,
Hadoop, AWS/Azure/Google DBaaS

AGENDA
● Google Cloud Platform (GCP)
● 2 possible implementation
● More complex theoretical example

Let's get started!

9© The Pythian Group Inc., 2018© The Pythian Group Inc., 2019 9
● Public cloud by Google
● Started as "AppEngine" (GA since Nov 2011)
● New services grew quickly, Compute Engine, CloudSQL, ...
Google Cloud Platform

Service comparison to AWS
Google Cloud Platform (GCP) Amazon Web Services (AWS)
Google Compute Engine (GCE) Elastic Compute Cloud (EC2)
Google Cloud Storage (GCS) Simple Storage Service (S3)
Google Kubernetes Engine (GKE) Elastic Container Service for Kubernetes
Google CloudSQL Relational Database Service (RDS)
Google BigQuery Redshift
... ...

Google Cloud Storage options
Source: Coursera

2 relational database stores?
Google CloudSQL Google Spanner

13© The Pythian Group Inc., 2018 13© The Pythian Group Inc., 2019
Google CloudSQL Google Spanner
● Managed service for MySQL
and PostgreSQL
● Scales up to 10 TB storage
● Regional availability
● Fully managed service
● Read replicas in multiple
zones
● Relational database store
● Horizontally scalable
● Heavily sharded
● Global availability
● Highly Available
● Fully managed service

● Stability
CloudSQL is still a pretty young product, it's not really mature (yet).
● Flexibility
There are still limits to the SQL you can run, not a simple
lift-and-shift
● Cost
Spanner is more expensive than Compute Engine
Why do something else?

Solution 1

● MariaDB + Galera cluster
● Minimum 3 nodes (always odd number of nodes)
● Preferably within the same region (latency)
● Writing to any node is supported, but not recommended
● Suggested to use a proxy-layer
● MaxScale (for MariaDB subscribers)
● ProxySQL
MariaDB Cluster

MariaDB Cluster - schematics

On premise → GCP
1. Take a backup on premise
2. Use backup to seed first node in GCE
3. Bootstrap first node
4. Start node 2 and 3 → they will perform SST
5. Set up async replication from on prem to first node
6. Setup proxies and load-balancers
7. Cut over application to new load-balancer
MariaDB Cluster - migration path

MariaDB Cluster - conclusion
Pro Con
Lift and shift from on-prem to the cloud Will not keep scaling
You can put nodes in different zones within the
same region
Will not scale to multi-region
Proxy will perform read-write split All the perks from using Galera
All the benefits from using Galera

Solution 2

● Default replication method for years in MySQL world
● Very stable and reliable
● Minimum 2 nodes (1 master + 1 replica/slave)
● Can be multi-region
● What about master high availability?
Regular (asynchronous) replication

Basic database architecture

● Master is single-point-of-failure (spof)
● How to detect master failure?
● What to do when failure is detected?
● How to announce changes in topology to the application
● ...
● Slaves scale out pretty well but
● There is no global load balancer for port 3306
● ...
Problems

Orchestrator is a High Availability and replication management tool.
● Works with ALL flavours of MySQL / MariaDB
● Can be use for multiple purposes
● discovery
● visualisation
● refactoring
● recovery
Master high availability using Orchestrator

Orchestrator - discovery
Orchestrator can (and will) discover your entire replication technology as
soon as you connect it to a single server in the topology.
It will use SHOW SLAVE HOSTS, SHOW PROCESSLIST, SHOW
SLAVE STATUS to try and connect to the other servers in the topology.
Requirement: the orchestrator_topology_userneeds to be created
on every server in the cluster so it can connect.

Orchestrator - visualisation
Orchestrator comes with a web interface that visualizes the servers in the
topology.

Orchestrator - refactoring
Orchestrator can be used to refactor the topology.
This can be done from the command line tool, via the API or even via the
web interface by dragging and dropping.
You can do things like
● Repoint a slave to a new master
● Promote a server to a (co-)master
● Start / Stop slave
● ...

Orchestrator - recovery
All of these features are nice, but they still require a human to execute
them. This doesn’t help you much when your master goes down at 3AM
and you get paged to resolve this.
Orchestrator can be configured to automatically recover your topology
from an outage.

Orchestrator - how recovery works?
To be able to perform a recovery, Orchestrator first needs to detect a
failure.
As indicated before Orchestrator connects to every server in the topology
and gathers information from each of the instances.
Orchestrator uses this information to make decisions on the best action to
take. They call this the holistic approach.

Orchestrator - failure detection example

Orchestrator HA
Orchestrator was written with High Availability as a basic concept.
You can easily run multiple Orchestrator instances with a shared MySQL
backend. All instances will collect all information but they will allow only
one instance to be the “active node” and to make changes to the
topology.
To eliminate a single-point-of-failure in the database backend you can
use either master-master replication (2 nodes) or Galera synchronous
replication (3 nodes).

Orchestrator HA (2)
Since version 3.x of Orchestrator there is “Orchestrator-on-Raft”.
Orchestrator now implements the ‘raft consensus protocol’. This will
● Ensure that a leader node is elected from the available nodes
● Ensure that the leader node has a quorum (majority) at all times
● Allow to run Orchestrator without a shared database backend
● Allow to run without a MySQL backend but use a sqlite backend

● Managed Instance Group requiring 3 nodes
● Orchestrator nodes using
● Raft for leader elections
● SQLite backend for local state
● Orchestrator database is auto-healing so no data replication is
required
Orchestrator in GCP

Orchestrator in GCP

Putting it together

● We introduce a proxy layer
● to help with read write splitting
● to prevent application connections to break on failover
● to have an easy to manage endpoint
● Proxy options
● ProxySQL
● MaxScale
● In this example I will use ProxySQL
Read write splitting

● Layer 7 proxy, understands MySQL protocol
● Uses "hostgroups" to group hosts together
(example one hostgroup for master and one for slaves)
● Query rules to redirect traffic to hostgroups
example:
by default all queries go to master hostgroup
add query rule to have regex ^SELECT to go to slaves
● ProxySQL clustering
● Proxy's will share configuration
● On GCP, start with 2 instances in Managed Instance Group with
autoscaling, add load balancer for traffic
ProxySQL

ProxySQL

Putting it together

● Up until this slide ProxySQL and MaxScale are interchangeable
● Hostgroups and Query rules can be replaced by the
read-write-split-service.
● We picked ProxySQL because of what comes next:
How do we connect Orchestrator and the proxy in a safe way?
Why ProxySQL over MaxScale

● Consul
● Distributed key-value store
● Orchestrator has native support for Consul
● Works across DC's
● Consul-template
● Connects to a consul service
● Updates a templated file everytime consul is update
● Runs arbitrary command on every update
Consul and Consul-template

● Orchestrator detects failure
● Orchestrator takes action, promoting new master
● Orchestrator updates Consul KV store
● Consul distributes values and notifies watchers
● Consul-template receives a notification
● Consul-template recreates query file and launches configured
command
Consul workflow

● Orchestrator has built-in support to update master entries in the Consul KV store
● But if we want the new master to be taken out of the reader pool we need to
implement some "hooks" to update Consul
● Example
# cat /usr/local/orchestrator/postfailoverhook.sh
#!/bin/bash
CONSUL_EXEC="/usr/local/bin/consul"
CONSUL_SLAVE_PREFIX="mysql/slave"
echo "Failover occurred. Running custom PostMasterFailoverProcesses Hook"
# orchestrator needs to make call to consul so the following is accomplished:
# 1. remove new master from slave group
# 2. put old master in slave group
echo "Removing new master from slave pool:
${CONSUL_SLAVE_PREFIX}/${ORC_FAILURE_CLUSTER_ALIAS}/${ORC_SUCCESSOR_HOST}"
${CONSUL_EXEC} kv delete ${CONSUL_SLAVE_PREFIX}/${ORC_FAILURE_CLUSTER_ALIAS}/${ORC_SUCCESSOR_HOST}
echo "Adding old master to slave pool: ${CONSUL_SLAVE_PREFIX}/${ORC_FAILURE_CLUSTER_ALIAS}/${ORC_FAILED_HOST}"
${CONSUL_EXEC} kv put ${CONSUL_SLAVE_PREFIX}/${ORC_FAILURE_CLUSTER_ALIAS}/${ORC_FAILED_HOST}
Orchestrator hooks

● Example of "query" file
{{ if keyExists "mysql/master/testcluster/hostname" }}
DELETE FROM mysql_servers where hostgroup_id=10;
REPLACE into mysql_servers (hostgroup_id, hostname) values ( 10, " {{ key "mysql/master/testcluster/hostname" }}" );
{{ end }}
DELETE FROM mysql_servers where hostgroup_id=11;
{{ range tree "mysql/slave/testcluster" }}
REPLACE INTO mysql_servers (hostgroup_id, hostname) values ( 11, " {{ .Key }}{{ .Value }}" );
{{ end }}
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
● Example command
command = "/bin/bash -c 'mysql --defaults-file=/etc/proxysql-admin.my.cnf <
/opt/consul-template/templates/proxysql.sql'"
command_timeout = "60s"
Consul-template examples

Single region solution

● WePay
https://wecode.wepay.com/posts/highly-available-mysql-clusters-at-
wepay
● Implementation done by our sibling-team
● Chosen for HAProxy over ProxySQL because they needed
end-to-end SSL encryption
● ProxySQL 1.4.x does not support client-side SSL
● ProxySQL 2.x does support full end-to-end SSL encryption but was not
GA yet at time of implementation.
Publicly shared example

● We would want to create a multi-region setup
● Will be able to work on read-heavy environment that can tolerate
some "stale" reads.
● Master will still be a single instance that runs in one region.
● 1 or more slave(s) in each region
● ProxySQL cluster with internal load balancer in each region,
applications connect to local proxy cluster
● Orchestrator and Consul server can be scale to 1 instance per
region (recommended max 3 instances so not every region might
have an instance)
Where to go from here?

Theoretical Multi-region example

Contact
Matthias Crauwels
crauwels@pythian.com
+1 (613) 565-8696 ext. 1215

Deploying MariaDB for HA on Google Cloud Platform

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Similar to Deploying MariaDB for HA on Google Cloud Platform

Similar to Deploying MariaDB for HA on Google Cloud Platform (20)

More from MariaDB plc

More from MariaDB plc (20)

Recently uploaded

Recently uploaded (20)

Deploying MariaDB for HA on Google Cloud Platform