ProxySQL provides native support for high availability solutions like PXC, InnoDB Cluster, and regular MySQL replication. It can monitor the health of nodes and redirect traffic away from unavailable or stale nodes, improving availability. It supports various topologies out of the box through host groups, health checks, and failure detection. ProxySQL helps implement robust HA architectures by integrating these functions and allowing automatic traffic redirection based on node status.
1. 1
Robust HA Solutions - Native Support
for PXC and InnoDB Cluster in
ProxySQL
Marco Tusa
Percona
2. 2
• Open source enthusiast
• Principal architect
• Working in the DB world for over 25 years
• Open source developer and community contributor
About Me
3. 3
What this talk is about
High level description of High Availability
HA solutions
Misses from current vanilla solutions
Where ProxySQL fits
It is NOT about performance
It is NOT an in-depth description of HA or ProxySQL features
It is NOT an implementation guide
4. 4
• The need for, and dimension of, HA or DR is tied to the real
needs of your business.
• We are pathologically online/connected, and we often
expect over-dimensioned HA or DR.
• Business needs often do not require all the components to
be ALWAYS available.
How We Need to think about HA and DR
5. 5
Why We Need HA and DR
Do:
• Business needs
• Technical challenges
• Supportable solutions
• Knowhow
Don’t:
• Choose based on the “shiny object”
• Pick something you know nothing about
• Choose on your own and push the decision up or down the chain
• Use shortcuts to accelerate deployment time.
The first step toward a robust solution is designing the right solution for
your business.
6. 6
• High Availability covers
service availability
within a single location
• 10 Gb Ethernet: best-case
scenario
~400 m maximum distance
HA vs DR
7. 7
• Disaster Recovery assures that
we can restore service in a
geographically different location
• Real speed may vary
• Linear distance ~1,000 km
HA vs DR
8. 8
HA vs DR
DO NOT MIX
the solutions in your architecture
http://www.tusacentral.com/joomla/index.php/mysql-blogs/204-how-not-to-do-mysql-high-availability-geographic-node-distribution-with-galera-based-replication-misuse
http://www.tusacentral.com/joomla/index.php/mysql-blogs/205-mysql-high-availability-on-premises-a-geographically-distributed-scenario
9. 9
Replicate data is the key - Sync VS Async
[Diagram: with synchronous replication the nodes share a single data state; with asynchronous replication each node can be in a different data state]
10. 10
Data Replication is the Base
Tightly coupled database clusters
• Data-centric approach (single state
of the data, distributed commit)
• Data is consistent in time across
nodes
• Replication requires a high-performance link
• Geographic distribution is
forbidden
• DR is not supported
Loosely coupled database clusters
• Single-node approach (local commit)
• Data state differs by node
• A single node’s state does not affect the
cluster
• Replication link doesn’t need to be
high performance
• Geographic distribution is allowed
• DR is supported
11. 11
In the MySQL ecosystem we have different solutions
• NDB
• Galera replication
• InnoDB group replication
• Basic Replication
Different solutions
13. 13
Good things:
• Very high availability (five 9s)
• Write scale (for real, not fake)
• Read scale
• Data is in sync all the time
Bad things:
• To be very efficient it should
stay in memory as much as possible
• Complex to manage
• Requires deep understanding, also from the application point of view
Understand solutions: NDB
14. 14
Replication by internal design
Good things:
• Highly available
• Read scale
• Data is almost in sync
all the time
Bad things:
• Doesn’t scale writes
• More nodes mean more internal
overhead
• Network is a major impacting factor
• 1 master only (PLEASE!!!!!)
Understand solutions: Galera
15. 15
Understand solutions: Group replication
Replication by standard MySQL
Good things:
• Highly available
• Read scale
• Data is almost in sync
all the time
• More network tolerant
Bad things:
• Doesn’t scale writes
• 1 master only (PLEASE!!!!!)
When does something stop being a new thing?
16. 16
Understand solutions: Replication
Replication by internal design
Good things:
• Each node is independent
• Read scale
• Low network impact
Bad things:
• Stale reads
• Low HA
• Consistent only on the master
• Each node has its own data state
17. 17
1. No embedded R/W split
• Writes and reads are bound to 1 node
• A failing read channel will cause total failure
2. No recognition of possible stale reads
• In case of replication delay, the application can get stale data
3. No node failure identification and no single entry point
• The architecture must implement a double mechanism: a way to check node
status (a topology manager) and an entry point such as a VIP
• The application must use a connector that allows multiple entry points in case
of issues (like the Java connector)
What do they have in common?
18. 18
1. Embedded R/W split by group of servers (Host Group)
• Writes and reads are automatically redirected to working node(s)
2. Identifies stale nodes with the binary log reader
• In case of replication delay, a delayed node will not receive read
requests
3. Embedded node failure detection, automatic redirection
• ProxySQL provides native status recognition for Galera and Group
Replication
• Only when using basic replication do you need a topology manager
• Applications connect to ProxySQL; no need to add a VIP
What ProxySQL can do better
19. 19
Native health checks
read_only
wsrep_local_recv_queue
wsrep_desync
wsrep_reject_queries
wsrep_sst_donor_rejects_queries
primary_partition
ProxySQL for Galera
• writer_hostgroup: the hostgroup ID that refers to the WRITER
• backup_writer_hostgroup: the hostgroup ID referring to the
hostgroup that will contain the candidate writers
• reader_hostgroup: the reader hostgroup ID, containing the list of
servers that need to be taken into consideration
• offline_hostgroup: the hostgroup ID that will eventually contain the
writer that will be put OFFLINE
• active: True[1]/False[0], whether this configuration needs to be used or not
• max_writers: the MAX number of writers you want
to have at the same time. In a sane setup this should always be 1,
but if you want to have multiple writers, you can define it up to the
number of nodes.
• writer_is_also_reader: if true [1] the writer will NOT be removed
from the reader HG
• max_transactions_behind: the value
of wsrep_local_recv_queue after which the node will be set
OFFLINE. This must be set carefully, observing the node behaviour.
• comment: I suggest putting some meaningful notes here to identify what is
what.
20. 20
I have these nodes
192.168.1.205 (Node1)
192.168.1.21 (Node2)
192.168.1.231 (node3)
ProxySQL for Galera
Will use these groups:
Writer HG-> 100
Reader HG-> 101
BackupW HG-> 102
offHG HG-> 9101
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',100,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.205',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.21',101,3306,1000);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight) VALUES ('192.168.1.231',101,3306,1000);
INSERT INTO mysql_galera_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup,
offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind) VALUES
(100,102,101,9101,1,1,1,16);
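These rows only land in ProxySQL’s admin (memory) layer; as with the basic-replication example later in the deck, they still need to be loaded to runtime and persisted. A minimal sketch of the activation step, plus a sanity check against the Galera monitor log (this assumes ProxySQL 2.x, where the native Galera checks are recorded in mysql_server_galera_log):

```sql
-- Activate the configuration and persist it
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Verify the native Galera health checks are running (ProxySQL 2.x monitor table)
SELECT hostname, port, primary_partition, read_only, wsrep_local_recv_queue
FROM mysql_server_galera_log
ORDER BY time_start_us DESC
LIMIT 3;
```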
24. 22
ProxySQL for Group Replication
Native health checks
read_only
viable_candidate
transactions_behind
• writer_hostgroup: the hostgroup ID that refers to the WRITER
• backup_writer_hostgroup: the hostgroup ID referring to the
hostgroup that will contain the candidate writers
• reader_hostgroup: the reader hostgroup ID, containing the list of
servers that need to be taken into consideration
• offline_hostgroup: the hostgroup ID that will eventually contain the
writer that will be put OFFLINE
• active: True[1]/False[0], whether this configuration needs to be used or not
• max_writers: the MAX number of writers you want
to have at the same time. In a sane setup this should always be 1,
but if you want to have multiple writers, you can define it up to the
number of nodes.
• writer_is_also_reader: if true [1] the writer will NOT be removed
from the reader HG
• max_transactions_behind: determines the maximum number of
transactions behind the writer that ProxySQL should allow before
shunning the node to prevent stale reads (this is determined by
querying the transactions_behind field of
the sys.gr_member_routing_candidate_status table in MySQL).
• comment: I suggest putting some meaningful notes here to identify what is
what.
25. 23
I have these nodes
192.168.4.55 (Node1)
192.168.4.56 (Node2)
192.168.4.57 (node3)
ProxySQL for Group Replication
Will use these groups:
Writer HG-> 400
Reader HG-> 401
BackupW HG-> 402
offHG HG-> 9401
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.55',400,3306,10000,2000,'GR1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.55',401,3306,100,2000,'GR1');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.56',401,3306,10000,2000,'GR2');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment) VALUES ('192.168.4.57',401,3306,10000,2000,'GR3');
INSERT INTO mysql_group_replication_hostgroups (writer_hostgroup,backup_writer_hostgroup,reader_hostgroup,
offline_hostgroup,active,max_writers,writer_is_also_reader,max_transactions_behind)
VALUES (400,402,401,9401,1,1,1,100);
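As with the Galera setup, these rows must be loaded to runtime and saved to disk before ProxySQL acts on them. A minimal sketch of the activation step, plus a quick check of how the runtime layer has classified each node:

```sql
-- Activate the configuration and persist it
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- Sanity check: hostgroup assignment and status as ProxySQL sees them at runtime
SELECT hostgroup_id, hostname, port, status
FROM runtime_mysql_servers
ORDER BY hostgroup_id, hostname;
```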
select * from mysql_server_group_replication_log order by 3 desc,1 limit 3 ;
+--------------+------+------------------+-----------------+------------------+-----------+---------------------+-------+
| hostname | port | time_start_us | success_time_us | viable_candidate | read_only | transactions_behind | error |
+--------------+------+------------------+-----------------+------------------+-----------+---------------------+-------+
| 192.168.4.57 | 3306 | 1569593421324355 | 3085 | YES | YES | 0 | NULL |
| 192.168.4.56 | 3306 | 1569593421321825 | 2889 | YES | YES | 0 | NULL |
| 192.168.4.55 | 3306 | 1569593421321435 | 2764 | YES | NO | 0 | NULL |
+--------------+------+------------------+-----------------+------------------+-----------+---------------------+-------+
34. 27
ProxySQL for MySQL Replication
Native health checks
MySQL ping
read_only
Latency
Stale READS with GTID
DELETE FROM mysql_servers WHERE hostgroup_id IN (500,501);
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment)
VALUES ('10.0.0.50',500,3306,10000,2000,'DC1 writer');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment)
VALUES ('10.0.0.50',501,3306,10000,2000,'DC1 Reader');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment)
VALUES ('10.0.0.51',501,3306,10000,2000,'DC1 Reader');
INSERT INTO mysql_servers (hostname,hostgroup_id,port,weight,max_connections,comment)
VALUES ('10.0.0.52',501,3306,10000,2000,'DC1 Reader');
INSERT INTO mysql_replication_hostgroups VALUES (500,501,'read_only', 'Simple replica');
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
Easy to monitor
select b.weight, c.* from stats_mysql_connection_pool c left JOIN runtime_mysql_servers b ON
c.hostgroup=b.hostgroup_id and c.srv_host=b.hostname and c.srv_port = b.port where hostgroup in (500,501)
order by hostgroup,srv_host desc;
select * from mysql_server_read_only_log order by time_start_us desc limit 3;
+------------+------+------------------+-----------------+-----------+-------+
| hostname | port | time_start_us | success_time_us | read_only | error |
+------------+------+------------------+-----------------+-----------+-------+
| 10.0.0.52 | 3306 | 1569604127710729 | 895 | 1 | NULL |
| 10.0.0.51 | 3306 | 1569604127697866 | 542 | 1 | NULL |
| 10.0.0.50 | 3306 | 1569604127685005 | 795 | 0 | NULL |
+------------+------+------------------+-----------------+-----------+-------+
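The “Stale READS with GTID” check relies on ProxySQL 2.x GTID tracking: each MySQL node runs the proxysql_binlog_reader daemon, and ProxySQL is told where to reach it through the gtid_port column of mysql_servers. A minimal sketch, assuming the reader listens on port 3307 on every node (the port is an assumption, not part of the setup above):

```sql
-- Point ProxySQL at the binlog reader on each backend (assumed port 3307)
UPDATE mysql_servers SET gtid_port=3307 WHERE hostgroup_id IN (500,501);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;

-- The GTID executed sets tracked per server can then be inspected with:
SELECT * FROM stats_mysql_gtid_executed;
```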
39. 29
ProxySQL is cool because:
• It allows you to better distribute the load
• It reduces service downtime by shifting traffic to available node(s)
• If the writer is down, reads are still served
• With the binlog reader it allows you to identify nodes with stale data, even with basic
replication
• More stuff that we are not covering here…
• Multiplexing, sharding, masking, firewalling, etc.
In short, your HA solution will be more resilient and flexible, which
means that your business will be safer.
Conclusions