EXPERIENCE WITH MYSQL HA SOLUTION AND GROUP REPLICATION

MySQL User Camp, 13th Feb 2019: Experience with MySQL HA solutions and Group Replication, by Santhinesh Nagendran, Sr. DBA, Tesla, CA.

Published in: Engineering

  1. Experience with MySQL HA solutions and Group Replication
  2. Who am I? Santhinesh Kumar Nagendran. Currently working as Senior Database Administrator at Tesla Inc. Over 12 years of industry experience supporting environments such as healthcare and social networking applications like AOL, IBIBO, Sify, etc. I primarily focus on database high availability and database automation at large scale.
  3. Agenda: 1. Why HA? 2. HA Objectives 3. MySQL HA Solutions 4. Why MySQL GR? 5. Implementation 6. Conclusion
  4. Why HA? Continuation of services with minimal or no interruption. Improved operational standards for hardware upgrades (memory/CPU upgrades) and OS security patches. Meeting application, business, and customer SLAs.
  5. HA Objectives: How reliable is your HA solution? Can we afford the complexity of fixing issues caused by an improper failover? What is the cost of no failover or manual failover versus fixing an unexpected improper failover? Do we have the skill set to support the HA solution we implement?
  6. MySQL HA Solutions: Master-Master replication with HAProxy; MySQL MHA with Keepalived; MySQL MHA; InnoDB Cluster.
  7. M-M Replication with HAProxy [Diagram, before and after a crash: users/app reach the databases through F5 BigIP and two HAProxy servers (HA1, HA2), with separate application R/W and RO traffic. Before: S1 is Master / RW, S2 is Slave1 / RO, S3 is Slave2 / RO. After S1 crashes: the remaining servers are labeled Master / RW and Slave2 / RO.]
  8. M-M Replication with HAProxy. Good: failover happened seamlessly when the primary became inaccessible; new connections went to the new master without any user interruption; read/write split using separate TCP ports; replication was repointed to the new master. Bad: connections go back to the old primary if it comes back online in read-write mode; both Master-Master servers must be kept in read-write mode all the time; very high probability of accidental writes on both servers; fixing the data afterwards is a big mess.
  9. Existing Drawbacks and Future Requirements: Too many HAProxy servers to manage when deployed at large scale. Not cost-effective, since each 3-node cluster needed 2 HAProxy servers. HAProxy is not designed specifically for MySQL or databases. The old server must be removed from the config file immediately after a failover, to avoid failback when the failed server comes back online. Customers must go through non-database components to reach the database (F5, HAProxy, DB server).
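The manual step described above, removing the failed server from the HAProxy config right after a failover, could be scripted roughly as follows. This is a minimal sketch, not the setup used in the talk: the function name `drop_server` and the sample section, server names, and addresses are hypothetical, and it only assumes the usual HAProxy line shape `server <name> <address>:<port> [options]`.

```python
def drop_server(config_text: str, server_name: str) -> str:
    """Return the config text without any `server <server_name> ...` line.

    Hypothetical helper: it edits text only and does not touch a running
    HAProxy process.
    """
    kept = []
    for line in config_text.splitlines():
        tokens = line.split()
        # Skip only lines of the form: server <server_name> <addr>:<port> ...
        if len(tokens) >= 2 and tokens[0] == "server" and tokens[1] == server_name:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"


# Illustrative config; names and addresses are made up.
SAMPLE = """listen mysql_rw
    bind *:3306
    mode tcp
    server s1 10.0.0.1:3306 check
    server s2 10.0.0.2:3306 check backup
"""

if __name__ == "__main__":
    print(drop_server(SAMPLE, "s1"))
```

The edited file would still have to be written back and HAProxy reloaded before the change takes effect; that part is left out of the sketch.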
  10. MySQL MHA with Keepalived [Diagram, before and after a crash: users/app connect through an alias to the MHA VIP, which is a Keepalived VIP. Before: S1 is Master / RW, S2 is Slave1 / RO, S3 is Slave2 / RO; the Keepalived service should be running on the master and the candidate masters. After S1 crashes: MHA performs the failover by stopping Keepalived on the old master, and the remaining servers are labeled Master / RW and Slave2 / RO.]
  11. MySQL MHA with Keepalived. Good: failover happened seamlessly when the primary became inaccessible; new connections went to the new master without any user interruption; a corrupt server leaves the cluster by itself; only one server is kept in read-write mode, and all the others are, or should be, in read-only mode; manual failover is possible whether the existing master is alive or dead. Bad: the MHA manager daemon stops after a failover to avoid another one, so the DBA has to verify each failover; not a fully automatic solution, as it requires manual intervention; if a server becomes unreachable due to a firewall issue, Keepalived also fails over independently.
  12. Existing Drawbacks and Future Requirements: Non-standard, custom monitoring is required to monitor components and failures. Proper inventory and automation are needed to support MHA clusters at large scale. Too many false failovers triggered by Keepalived because of network glitches. Too many components for the customer to deal with in an HA setup (F5, Keepalived, MHA, DB server).
  13. MySQL MHA with F5 [Diagram, before and after a crash: users/app connect through F5 BigIP, which checks that the read_only parameter is off before sending traffic to prod. Before: S1 is Master / RW, S2 is Slave1 / RO, S3 is Slave2 / RO. After S1 crashes: MHA performs the failover without any other VIP involved, and the remaining servers are labeled Master / RW and Slave2 / RO.]
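The routing rule on this slide (send write traffic only to a server whose read_only parameter is off) can be sketched as a small decision function. This is an illustration under assumptions, not the actual F5 monitor from the talk: the function names are hypothetical, and the input is assumed to be the result of `SELECT @@global.read_only` collected per server, given as 0/1 or ON/OFF.

```python
def is_writable(read_only_value) -> bool:
    """True when the server reports read_only as off (0 or OFF)."""
    return str(read_only_value).strip().upper() in {"0", "OFF"}


def pick_write_targets(read_only_by_server: dict) -> list:
    """Return the servers that may receive write traffic, sorted by name.

    Hypothetical helper: the mapping is server name -> read_only value,
    gathered by whatever health check runs against each server.
    """
    return sorted(
        name for name, value in read_only_by_server.items() if is_writable(value)
    )


if __name__ == "__main__":
    # After a failover, only the promoted server should have read_only off.
    print(pick_write_targets({"s2": 0, "s3": 1}))
```

A result with more than one server would mean two servers are in read-write mode at once, which is the accidental-write risk described for the Master-Master setup, so a health check could treat that case as an alert rather than a routing target.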
  14. MySQL MHA with F5. Good: failover happened seamlessly when the primary became inaccessible; F5 checks for the server in read-write mode; new connections went to the new master without any user interruption; a corrupt server leaves the cluster by itself; only one server is kept in read-write mode, and all the others are in read-only mode; manual failover is possible whether the existing master is alive or dead. Bad: the MHA manager daemon stops after a failover to avoid another one, so it is the DBA's job to verify each failover completely; not a fully automatic solution, as it requires manual intervention; non-standard, custom monitoring of components such as mha_manager; complicated setup to support at large scale.
  15. Existing Drawbacks and Future Requirements: Non-standard, custom monitoring is required to alert on failures. Proper inventory and automation are needed to support MHA clusters at large scale. Too many false failovers due to Keepalived because of network glitches. Too many components for the customer to deal with in an HA setup (F5, MHA, DB server).
  16. InnoDB Cluster [Diagram: users/app connect through F5 BigIP and two router servers (RT1, RT2) to the cluster: S1 (primary, read-write), S2 and S3 (secondaries, read-only). Ports 3307 and 3306 are shown.]
  17. InnoDB Cluster. Good: powered by MySQL Shell; mysqlsh makes setting up an InnoDB Cluster extremely easy; MySQL Router servers can support multiple InnoDB Clusters, which is a great relief; util.checkForServerUpgrade() makes a DBA's life much easier and saves a lot of time. Notes: the default user authentication plugin changed from mysql_native_password (5.7) to caching_sha2_password (8.0); replication between multi-zonal clusters can be challenging when a failover happens. Other points: replication between multiple InnoDB Clusters is possible; filtered multi-master replication.
  18. How do you monitor? There are many ways to monitor. The cluster status can be obtained from MySQL Shell. It can also be fetched from performance_schema.replication_group_members. If the clusters are stored in a proper inventory, we can monitor each cluster and set up alerts for events. For example: if a node drops out of a 3-node cluster, the cluster status changes to OK_NO_TOLERANCE; and an alert can fire when the number of active group members does not equal the number of servers in that cluster according to the inventory.
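The inventory-based alert described above can be sketched as a comparison between the hosts an inventory expects and the members the group actually reports. This is a rough sketch under stated assumptions: the function name `check_cluster` and the inventory shape are hypothetical, and `member_rows` is assumed to be the (MEMBER_HOST, MEMBER_STATE) pairs returned by a query on performance_schema.replication_group_members. The tolerance figure uses the majority rule (a group of n members survives (n-1)//2 failures) and assumes failed members have already been expelled from the group.

```python
def check_cluster(expected_hosts, member_rows):
    """Compare inventory hosts against reported group members.

    expected_hosts: host names the (hypothetical) inventory lists for a cluster.
    member_rows: iterable of (MEMBER_HOST, MEMBER_STATE) pairs.
    """
    online = {host for host, state in member_rows if state == "ONLINE"}
    missing = sorted(set(expected_hosts) - online)
    # Majority rule: n online members can lose (n - 1) // 2 more and keep quorum.
    failures_tolerated = max(len(online) - 1, 0) // 2
    return {
        "expected": len(set(expected_hosts)),
        "online": len(online),
        "missing": missing,
        "failures_tolerated": failures_tolerated,
        "alert": bool(missing),
    }


if __name__ == "__main__":
    rows = [("s1", "ONLINE"), ("s2", "ONLINE"), ("s3", "UNREACHABLE")]
    print(check_cluster(["s1", "s2", "s3"], rows))
```

With one of three members gone, the sketch reports zero tolerated failures, which corresponds to the OK_NO_TOLERANCE status mentioned on the slide.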
  19. Conclusion: Group Replication has always been one of the best in-house products for MySQL. It is empowered by MySQL Router and the MySQL Shell utilities. It is one of the best and most stable HA solutions I have worked on so far.