MySQL HA with Pacemaker


My opendbcamp 2011 presentation on Pacemaker and MySQL opportunities


  1. MySQL HA with Pacemaker - Kris Buytaert, #opendbcamp
  2. Kris Buytaert
     ● I used to be a Dev, then became an Op; today I feel like a dev again
     ● Senior Linux and Open Source Consultant
     ● "Infrastructure Architect"
     ● Building clouds since before the Cloud
     ● Surviving the 10th floor test
     ● Co-author of some books
     ● Guest editor at some sites
  3. In this presentation
     ● High Availability?
     ● MySQL HA solutions
     ● Linux-HA / Pacemaker
  4. What is HA clustering?
     ● One service goes down => others take over its work
     ● IP address takeover, service takeover
     ● Not designed for high performance
     ● Not designed for high throughput (that is load balancing)
  5. Lies, Damn Lies, and Statistics - counting nines (slide by Alan R.)
     99.9999%   ~30 sec of downtime per year
     99.999%    ~5 min
     99.99%     ~52 min
     99.9%      ~9 hr
     99%        ~3.5 days
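The figures above are just arithmetic on the 525,600 minutes in a year; a quick shell sketch reproduces them:

```shell
# Allowed downtime per year for each availability level.
# 525600 = minutes in a (non-leap) year.
for a in 99 99.9 99.99 99.999 99.9999; do
  awk -v a="$a" 'BEGIN {
    mins = 525600 * (1 - a / 100)
    printf "%-8s%%  ->  %8.1f min/year\n", a, mins
  }'
done
```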
  6. The Rules of HA
     ● Keep it Simple
     ● Keep it Simple
     ● Prepare for failure
     ● Complexity is the enemy of reliability
     ● Test your HA setup
  7. Eliminating the SPOF
     ● Find out what will fail
       • Disks
       • Fans
       • Power (supplies)
     ● Find out what can fail
       • Network
       • Running out of memory
  8. Data vs Connection
     ● Data:
       • Replication
       • Shared storage
       • DRBD
     ● Connection:
       • LVS
       • Proxy
       • Heartbeat / Pacemaker
  9. Shared Storage
     ● 1 MySQL instance
     ● Monitor the MySQL node
     ● STONITH
     ● $$$: 1+1 <> 2
     ● Storage = SPOF
     ● Split brain :(
  10. DRBD
     ● Distributed Replicated Block Device
     ● In the Linux kernel
     ● Usually only 1 mount
       • Multi-mount as of 8.x
       • Requires GFS / OCFS2
     ● Regular FS: ext3, ...
     ● Only 1 active MySQL instance accessing the data
     ● Upon failover MySQL needs to be started on the other node
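A minimal two-node DRBD resource backing the MySQL data directory might look like the sketch below (DRBD 8.x syntax). The resource name, backing device, hostnames, and addresses are illustrative assumptions, not taken from the slides:

```text
# /etc/drbd.d/mysql.res - hypothetical two-node resource
resource mysql {
  protocol C;              # synchronous replication
  device    /dev/drbd0;
  disk      /dev/sdb1;     # backing device (assumption)
  meta-disk internal;
  on node-a {
    address 10.0.0.1:7788;
  }
  on node-b {
    address 10.0.0.2:7788;
  }
}
```

Only the node currently promoted to Primary mounts the filesystem and runs mysqld; the failover itself is what Pacemaker orchestrates in the later slides.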
  11. DRBD (2)
     ● What happens when you pull the plug on a physical machine?
       • Minimal timeout
       • Why did the crash happen?
       • Is my data still correct?
       • InnoDB consistency checks?
         • Lengthy?
         • Check your binlog size
  12. Other Solutions Today
     ● MySQL Cluster (NDBD)
     ● Multi-master replication
     ● MySQL Proxy
     ● MMM
     ● Flipper
     ● BYO
     ● ...
  13. Pulling Traffic
     ● E.g. for Cluster / multi-master setups:
       • DNS
       • Advanced routing
       • LVS
       • Or the upcoming slides
  14. Linux-HA Pacemaker
     ● Plays well with others
     ● Manages more than MySQL
     ● ...v3: don't even think about the rest anymore
  15. Heartbeat v1
     • Max 2 nodes
     • No fine-grained resources
     • Monitoring using "mon"
     • Configured in /etc/ha.d/haresources (e.g. IPaddr2::, mon, ntc-restart-mysql) and /etc/ha.d/authkeys
  16. Heartbeat v2
     • Stability issues
     • Forking? "A consulting opportunity" - LMB
  17. Clone Resources
     ● Clones in v2 were buggy
     ● Resources were started on 2 nodes
     ● Stopped again on "1"
  18. Heartbeat v3
     • No more /etc/ha.d/haresources
     • No more XML
     • Better integrated monitoring
     • /etc/ha.d/ha.cf has crm=yes
  19. Pacemaker?
     ● Not a fork
     ● Only the CRM code, taken out of Heartbeat
     ● As of Heartbeat 2.1.3:
       • Support for both OpenAIS and Heartbeat
       • Different release cycle than Heartbeat
  20. Heartbeat, OpenAIS, Corosync?
     ● All messaging layers
     ● Initially only Heartbeat, then OpenAIS
     ● Heartbeat became unmaintained
     ● OpenAIS had heisenbugs :(
     ● Corosync
     ● Heartbeat maintenance taken over by Linbit
     ● The CRM detects which layer it runs on
  21. The stack: Pacemaker runs on top of Heartbeat or OpenAIS, with Cluster Glue tying them together
  22. Pacemaker Architecture
     ● stonithd: the Heartbeat fencing subsystem
     ● lrmd: Local Resource Management Daemon; interacts directly with resource agents (scripts)
     ● pengine: Policy Engine; computes the next state of the cluster based on the current state and the configuration
     ● cib: Cluster Information Base; contains definitions of all cluster options, nodes, resources, their relationships to one another and current status; synchronizes updates to all cluster nodes
     ● crmd: Cluster Resource Management Daemon; largely a message broker for the PEngine and LRM, it also elects a leader to coordinate the activities of the cluster
     ● openais: messaging and membership layer
     ● heartbeat: messaging layer, an alternative to OpenAIS
     ● ccm: short for Consensus Cluster Membership; the Heartbeat membership layer
  23. Configuring Heartbeat Correctly (Puppet)

     heartbeat::hacf { "clustername":
       hosts   => ["host-a", "host-b"],
       hb_nic  => ["bond0"],
       hostip1 => [""],
       hostip2 => [""],
       ping    => [""],
     }
     heartbeat::authkeys { "ClusterName":
       password => "ClusterName",
     }
  24. CRM
     ● Cluster Resource Manager
     ● Keeps nodes in sync
     ● XML based
     ● cibadmin
     ● CLI manageable: crm

     configure
     property $id="cib-bootstrap-options" \
       stonith-enabled="FALSE" \
       no-quorum-policy="ignore" \
       start-failure-is-fatal="FALSE"
     rsc_defaults $id="rsc_defaults-options" \
       migration-threshold="1" \
       failure-timeout="1"
     primitive d_mysql ocf:local:mysql \
       op monitor interval="30s" \
       params test_user="sure" test_passwd="illtell" test_table="test.table"
     primitive ip_db ocf:heartbeat:IPaddr2 \
       params ip="" nic="bond0" \
       op monitor interval="10s"
     group svc_db d_mysql ip_db
     commit
  25. Heartbeat Resources
     ● LSB
     ● Heartbeat resource (+status)
     ● OCF (Open Cluster Framework) (+monitor)
     ● Clones (don't use in HAv2)
     ● Multi-state resources
  26. LSB Resource Agents
     ● LSB == Linux Standard Base
     ● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes
     ● LSB init scripts are stored under /etc/init.d/
     ● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script
     ● It is straightforward to change an LSB script into an OCF script
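Hooking an unmodified init script into the cluster is a one-liner in the crm shell; a sketch, assuming your distribution ships /etc/init.d/mysql (the resource name `p_mysql_lsb` is an illustrative choice):

```text
# crm configure: wrap /etc/init.d/mysql as a cluster resource
primitive p_mysql_lsb lsb:mysql \
    op monitor interval="30s"
```

Because LSB agents only expose start/stop/status, the monitor action here is limited to what the init script's status check reports; OCF agents (next slide) can do better.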
  27. OCF
     ● OCF == Open Cluster Framework
     ● OCF resource agents are the most powerful type of resource agent we support
     ● OCF RAs are extended init scripts
       • They have additional actions:
         • monitor - for monitoring resource health
         • meta-data - for providing information about the RA
     ● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/
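The shape of such an extended init script can be sketched as below: a minimal, hypothetical OCF-style agent written to a temp file and exercised. The action names (start/stop/monitor/meta-data) and exit codes follow the OCF convention; the agent name "mydaemon" and its PID file path are illustrative assumptions.

```shell
# Write a minimal sketch of an OCF resource agent to a temp file, then run it.
RA=$(mktemp)
cat > "$RA" <<'EOF'
#!/bin/sh
case "$1" in
  meta-data)
    cat <<XML
<?xml version="1.0"?>
<resource-agent name="mydaemon">
  <version>0.1</version>
  <actions>
    <action name="start"     timeout="20s"/>
    <action name="stop"      timeout="20s"/>
    <action name="monitor"   timeout="20s" interval="30s"/>
    <action name="meta-data" timeout="5s"/>
  </actions>
</resource-agent>
XML
    exit 0 ;;                              # OCF_SUCCESS
  start)   echo "starting mydaemon"; exit 0 ;;
  stop)    echo "stopping mydaemon"; exit 0 ;;
  monitor) [ -f /var/run/mydaemon.pid ] && exit 0 || exit 7 ;;  # 7 = OCF_NOT_RUNNING
  *)       exit 3 ;;                       # OCF_ERR_UNIMPLEMENTED
esac
EOF
chmod +x "$RA"
sh "$RA" meta-data   # a real agent lives in /usr/lib/ocf/resource.d/<provider>/
```

A real MySQL agent's monitor action would run a test query rather than just check a PID file, which is exactly what the `test_user`/`test_table` parameters in the crm examples are for.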
  28. Monitoring
     ● Defined in the OCF resource script
     ● Configured in the parameters
     ● You have to support multiple states:
       • Not running
       • Running
       • Failed
  29. Anatomy of a Cluster Config
     • Cluster properties
     • Resource defaults
     • Primitive definitions
     • Resource groups and constraints
  30. Cluster Properties

     property $id="cib-bootstrap-options" \
       stonith-enabled="FALSE" \
       no-quorum-policy="ignore" \
       start-failure-is-fatal="FALSE"

     no-quorum-policy="ignore": we'll ignore the loss of quorum on a 2-node cluster.
     start-failure-is-fatal="FALSE": the cluster will instead use the resource's failcount and the value of resource-failure-stickiness.
  31. Resource Defaults

     rsc_defaults $id="rsc_defaults-options" \
       migration-threshold="1" \
       failure-timeout="1" \
       resource-stickiness="INFINITY"

     failure-timeout means that after a failure there will be a 60 second timeout before the resource can come back to the node on which it failed.
     migration-threshold="1" means that after 1 failure the resource will try to start on the other node.
     resource-stickiness="INFINITY" means that the resource really wants to stay where it is now.
  32. Primitive Definitions

     primitive d_mine ocf:custom:tomcat \
       params instance_name="mine" \
         monitor_urls="health.html" \
         monitor_use_ssl="no" \
       op monitor interval="15s" on-fail="restart"
     primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
       params ip="" cidr_netmask="16" nic="bond0" \
       op monitor interval="10s"
  33. Parsing a Config
     ● Isn't always done correctly
     ● Even a verify won't find all issues
     ● Unexpected behaviour might occur
  34. Where a Resource Runs
     • Multi-state resources
       • Master - slave, e.g. MySQL master-slave, DRBD
     • Clones
       • Resources that can run on multiple nodes, e.g.:
         • Multi-master MySQL servers
         • MySQL slaves
         • Stateless applications
     • location: preferred location to run a resource, e.g. based on hostname
     • colocation: resources that have to live together, e.g. IP address + service
     • order: define which resource has to start first, or wait for another resource
     • groups: colocation + order
  35. E.g. a Service on DRBD
     ● DRBD can only be active on 1 node
     ● The filesystem needs to be mounted on that active DRBD node

     group svc_mine d_mine ip_mine
     ms ms_drbd_storage drbd_storage \
       meta master_max="1" master_node_max="1" \
         clone_max="2" clone_node_max="1" notify="true"
     colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
     order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
     location cli-prefer-svc_db svc_db \
       rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a
  36. A MySQL Resource
     ● OCF
       • Clone
         • Where do you hook up the IP?
       • Multi-state
         • But we have master-master replication
       • Meta resource
         • A dummy resource that can monitor:
           • Connection
           • Replication state
  37. Simple 2 Node Example

     primitive d_mysql ocf:ntc:mysql \
       op monitor interval="30s" \
       params test_user="just" test_passwd="kidding" test_table="really"
     primitive ip_mysql_svc ocf:heartbeat:IPaddr2 \
       params ip="" cidr_netmask="" nic="bond0" \
       op monitor interval="10s"
     group svc_mysql d_mysql ip_mysql_svc
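Loading and exercising such a configuration is typically done from the crm shell; a hedged sketch of the session (the node name `db-b` is an assumption):

```text
# crm configure                  -> enter configure mode, paste the primitives
# crm(live)configure# verify     # sanity check; won't catch everything (slide 33)
# crm(live)configure# commit     # push the changes to the CIB
# crm_mon -1                     # one-shot view of where svc_mysql runs
# crm resource migrate svc_mysql db-b    # manual failover test
```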
  38. Monitor your Setup
     ● Not just connectivity
     ● Also functional:
       • Query data
       • Check the resultset is correct
     ● Check replication:
       • MaatKit
       • OpenARK
  39. How to Deal with Replication State?
     ● Multiple slaves:
       • Use the DRBD OCF resource as a model
     ● 2 masters: use your own script
       • Replication is slow on the active node
         • Shouldn't happen; talk to HR / config-management people
       • Replication is slow on the passive node
         • Weight--
       • Replication breaks on the active node
         • Send out a warning, don't modify weights, and check the other node
       • Replication breaks on the passive node
         • Fence off the passive node
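The passive-node half of the decision table above can be sketched as a shell fragment such a custom script might contain. The hard-coded `SHOW SLAVE STATUS\G` output and the lag threshold are illustrative assumptions; a real script would query mysqld:

```shell
# Decide what a passive-node monitor should do from replication state.
# STATUS stands in for: mysql -e 'SHOW SLAVE STATUS\G'  (fake sample data)
STATUS='Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 120'

io=$(echo  "$STATUS" | awk -F': ' '/Slave_IO_Running/  {print $2}')
sql=$(echo "$STATUS" | awk -F': ' '/Slave_SQL_Running/ {print $2}')
lag=$(echo "$STATUS" | awk -F': ' '/Seconds_Behind_Master/ {print $2}')

MAX_LAG=60   # assumption: tolerate one minute of lag

if [ "$io" != "Yes" ] || [ "$sql" != "Yes" ]; then
  DECISION="fence-passive-node"    # replication broke on the passive node
elif [ "$lag" -gt "$MAX_LAG" ]; then
  DECISION="lower-weight"          # replication slow on the passive node: weight--
else
  DECISION="ok"
fi
echo "$DECISION"
```

The active-node cases differ only in the action taken (warn instead of fence, leave weights alone), so the same parsing logic can drive both sides.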
  40. Adding MySQL to the Stack

     [diagram] Node A and Node B each stack Hardware -> Heartbeat -> Pacemaker to form the cluster; on top run the MySQL resources ("MySQLd" on each node), the service IP, and MySQL replication between the two nodes.
  41. Pitfalls & Solutions
     ● Monitor:
       • Replication state
       • Replication lag
     ● MaatKit
     ● OpenARK
  42. Conclusion
     ● Plenty of alternatives
     ● Think about your data
     ● Think about getting queries to that data
     ● Complexity is the enemy of reliability
     ● Keep it simple
     ● Monitor inside the DB
  43. Contact
     Kris Buytaert
     Kris.Buytaert@inuits.be
     @KrisBuytaert

     Further Reading

     Inuits                      Esquimaux
     't Hemeltje                 Kheops Business Center
     Gemeentepark 2              Avenue Georges Lemaître 54
     2930 Brasschaat             6041 Gosselies
     891.514.231                 889.780.406
     +32 473 441 636             +32 495 698 668