MySQL HA with Pacemaker

My opendbcamp 2011 presentation on Pacemaker and MySQL opportunities

  1. MySQL HA with Pacemaker
     Kris Buytaert, #opendbcamp
  2. Kris Buytaert
     ● I used to be a Dev, then became an Op
     ● Today I feel like a Dev again
     ● Senior Linux and Open Source Consultant @inuits.be
     ● "Infrastructure Architect"
     ● Building Clouds since before the Cloud
     ● Surviving the 10th floor test
     ● Co-author of some books
     ● Guest editor at some sites
  3. In this presentation
     ● High Availability?
     ● MySQL HA Solutions
     ● Linux HA / Pacemaker
  4. What is HA Clustering?
     ● One service goes down => others take over its work
     ● IP address takeover, service takeover
     ● Not designed for high performance
     ● Not designed for high throughput (load balancing)
  5. Lies, Damn Lies, and Statistics: Counting nines (slide by Alan R)
     Availability   Downtime per year
     99.9999%       30 sec
     99.999%        5 min
     99.99%         52 min
     99.9%          9 hr
     99%            3.5 days
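The figures in the nines table follow from simple arithmetic. A minimal sketch, assuming a 365-day year; the function names are made up for illustration:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Allowed downtime per year, in seconds, for a given availability."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100.0)


def humanize(seconds: float) -> str:
    """Render a duration using the largest unit that keeps the value >= 1."""
    for unit, size in (("day", 86400), ("hr", 3600), ("min", 60)):
        if seconds >= size:
            return f"{seconds / size:.1f} {unit}"
    return f"{seconds:.1f} sec"


if __name__ == "__main__":
    for pct in (99.9999, 99.999, 99.99, 99.9, 99.0):
        print(f"{pct}%  {humanize(downtime_seconds_per_year(pct))}")
```

The slide rounds these values, which is why it shows 30 sec and 9 hr rather than the slightly larger computed numbers.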
  6. The Rules of HA
     ● Keep it Simple
     ● Keep it Simple
     ● Prepare for Failure
     ● Complexity is the enemy of reliability
     ● Test your HA setup
  7. Eliminating the SPOF
     ● Find out what Will Fail
       • Disks
       • Fans
       • Power (Supplies)
     ● Find out what Can Fail
       • Network
       • Going Out Of Memory
  8. Data vs Connection
     ● Data
       • Replication
       • Shared storage
       • DRBD
     ● Connection
       • LVS
       • Proxy
       • Heartbeat / Pacemaker
  9. Shared Storage
     ● 1 MySQL instance
     ● Monitor MySQL node
     ● Stonith
     ● $$$   1+1 <> 2
     ● Storage = SPOF
     ● Split Brain :(
  10. DRBD
      ● Distributed Replicated Block Device
      ● In the Linux Kernel
      ● Usually only 1 mount
        • Multi mount as of 8.X
        • Requires GFS / OCFS2
      ● Regular FS: ext3 ...
      ● Only 1 MySQL instance active, accessing the data
      ● Upon failover, MySQL needs to be started on the other node
  11. DRBD (2)
      ● What happens when you pull the plug of a physical machine?
        • Minimal timeout
        • Why did the crash happen?
        • Is my data still correct?
        • InnoDB consistency checks?
        • Lengthy?
        • Check your binlog size
  12. Other Solutions Today
      ● MySQL Cluster NDBD
      ● Multi Master Replication
      ● MySQL Proxy
      ● MMM
      ● Flipper
      ● BYO
      ● ....
  13. Pulling Traffic
      ● E.g. for Cluster, MultiMaster setups
        • DNS
        • Advanced Routing
        • LVS
        • Or the upcoming slides
  14. Linux-HA Pacemaker
      ● Plays well with others
      ● Manages more than MySQL
      ● ...v3 .. don't even think about the rest anymore
      ● http://clusterlabs.org/
  15. Heartbeat v1
      • Max 2 nodes
      • No fine-grained resources
      • Monitoring using "mon"
      /etc/ha.d/ha.cf
      /etc/ha.d/haresources
          mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 IPaddr2::10.16.0.13/16/bond0.16 mon
      /etc/ha.d/authkeys
  16. Heartbeat v2
      • Stability issues
      • Forking?
      "A consulting Opportunity" (LMB)
  17. Clone Resource
      Clones in v2 were buggy
      Resources were started on 2 nodes
      Stopped again on "1"
  18. Heartbeat v3
      • No more /etc/ha.d/haresources
      • No more XML
      • Better integrated monitoring
      • /etc/ha.d/ha.cf has crm=yes
  19. Pacemaker?
      ● Not a fork
      ● Only CRM code taken out of Heartbeat
      ● As of Heartbeat 2.1.3
        • Support for both OpenAIS / Heartbeat
        • Different release cycles than Heartbeat
  20. Heartbeat, OpenAIS, Corosync?
      ● All messaging layers
      ● Initially only Heartbeat
      ● OpenAIS
      ● Heartbeat got unmaintained
      ● OpenAIS had heisenbugs :(
      ● Corosync
      ● Heartbeat maintenance taken over by LinBit
      ● CRM detects which layer
  21. Pacemaker
      Heartbeat or OpenAIS
      Cluster Glue
  22. Pacemaker Architecture
      ● stonithd: The Heartbeat fencing subsystem.
      ● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).
      ● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
      ● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
      ● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.
      ● openais: Messaging and membership layer.
      ● heartbeat: Messaging layer, an alternative to OpenAIS.
      ● ccm: Short for Consensus Cluster Membership. The Heartbeat membership layer.
  23. Configuring Heartbeat Correctly
      heartbeat::hacf {"clustername":
          hosts   => ["host-a","host-b"],
          hb_nic  => ["bond0"],
          hostip1 => ["10.0.128.11"],
          hostip2 => ["10.0.128.12"],
          ping    => ["10.0.128.4"],
      }
      heartbeat::authkeys {"ClusterName":
          password => "ClusterName ",
      }
      http://github.com/jtimberman/puppet/tree/master/heartbeat/
  24. CRM
      ● Cluster Resource Manager
      ● Keeps nodes in sync
      ● XML based
      ● cibadmin
      ● CLI manageable
      ● crm
      configure
      property $id="cib-bootstrap-options" \
          stonith-enabled="FALSE" \
          no-quorum-policy=ignore \
          start-failure-is-fatal="FALSE"
      rsc_defaults $id="rsc_defaults-options" \
          migration-threshold="1" \
          failure-timeout="1"
      primitive d_mysql ocf:local:mysql \
          op monitor interval="30s" \
          params test_user="sure" test_passwd="illtell" test_table="test.table"
      primitive ip_db ocf:heartbeat:IPaddr2 \
          params ip="172.17.4.202" nic="bond0" \
          op monitor interval="10s"
      group svc_db d_mysql ip_db
      commit
  25. Heartbeat Resources
      ● LSB
      ● Heartbeat resource (+status)
      ● OCF (Open Cluster Framework) (+monitor)
      ● Clones (don't use in HAv2)
      ● Multi State Resources
  26. LSB Resource Agents
      ● LSB == Linux Standards Base
      ● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes
      ● LSB init scripts are stored under /etc/init.d/
      ● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script
      ● It's straightforward to change an LSB script to an OCF script
  27. OCF
      ● OCF == Open Cluster Framework
      ● OCF resource agents are the most powerful type of resource agent we support
      ● OCF RAs are extended init scripts
        • They have additional actions:
          • monitor: for monitoring resource health
          • meta-data: for providing information about the RA
      ● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/
  28. Monitoring
      ● Defined in the OCF resource script
      ● Configured in the parameters
      ● You have to support multiple states
        • Not running
        • Running
        • Failed
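The three states of the Monitoring slide map onto the exit codes an OCF monitor action returns. A minimal sketch of that mapping in Python; the numeric codes are the standard OCF values, while the function and its two boolean inputs are hypothetical stand-ins for whatever checks a real resource agent (normally a shell script) would perform:

```python
# Standard OCF return codes used by a monitor action.
OCF_SUCCESS = 0       # resource is running and healthy
OCF_ERR_GENERIC = 1   # resource is running but failed its health check
OCF_NOT_RUNNING = 7   # resource is cleanly stopped


def monitor(process_running: bool, health_check_ok: bool) -> int:
    """Map the observed state of a service to an OCF monitor exit code.

    Both arguments are assumptions for illustration: a real agent would
    derive them from a pid file, a socket probe, a test query, and so on.
    """
    if not process_running:
        return OCF_NOT_RUNNING
    if not health_check_ok:
        return OCF_ERR_GENERIC
    return OCF_SUCCESS
```

Distinguishing "not running" from "failed" matters because the cluster treats a cleanly stopped resource differently from one that needs recovery.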
  29. Anatomy of a Cluster config
      • Cluster properties
      • Resource Defaults
      • Primitive Definitions
      • Resource Groups and Constraints
  30. Cluster Properties
      property $id="cib-bootstrap-options" \
          stonith-enabled="FALSE" \
          no-quorum-policy="ignore" \
          start-failure-is-fatal="FALSE"
      no-quorum-policy: we'll ignore the loss of quorum on a 2-node cluster.
      start-failure-is-fatal: when set to FALSE, the cluster will instead use the resource's failcount and the value for resource-failure-stickiness.
  31. Resource Defaults
      rsc_defaults $id="rsc_defaults-options" \
          migration-threshold="1" \
          failure-timeout="1" \
          resource-stickiness="INFINITY"
      failure-timeout means that after a failure there will be a 60 second timeout before the resource can come back to the node on which it failed.
      migration-threshold=1 means that after 1 failure the resource will try to start on the other node.
      resource-stickiness=INFINITY means that the resource really wants to stay where it is now.
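How migration-threshold and failure-timeout interact can be sketched as a small model. This is a deliberately simplified illustration, not Pacemaker's actual implementation: the class and method names are invented, time is passed in explicitly, and the 60-second value is taken from the slide's explanation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FailureTracker:
    """Simplified per-node failure bookkeeping for one resource."""
    migration_threshold: int
    failure_timeout: float              # seconds
    failures: List[float] = field(default_factory=list)

    def record_failure(self, now: float) -> None:
        self.failures.append(now)

    def failcount(self, now: float) -> int:
        # Failures are forgotten once failure_timeout has passed
        # since the most recent failure.
        if self.failures and now - self.failures[-1] >= self.failure_timeout:
            self.failures.clear()
        return len(self.failures)

    def may_run_here(self, now: float) -> bool:
        # The node is off limits once failcount reaches the threshold.
        return self.failcount(now) < self.migration_threshold


if __name__ == "__main__":
    node_a = FailureTracker(migration_threshold=1, failure_timeout=60)
    node_a.record_failure(now=0)
    print(node_a.may_run_here(now=30))  # still within the timeout
    print(node_a.may_run_here(now=60))  # timeout elapsed
```

With resource-stickiness set to INFINITY, a resource that has moved away would still stay on the other node even after this node becomes eligible again.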
  32. Primitive Definitions
      primitive d_mine ocf:custom:tomcat \
          params instance_name="mine" monitor_urls="health.html" monitor_use_ssl="no" \
          op monitor interval="15s" on-fail="restart"
      primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
          params ip="10.8.4.131" cidr_netmask="16" nic="bond0" \
          op monitor interval="10s"
  33. Parsing a config
      ● Isn't always done correctly
      ● Even a verify won't find all issues
      ● Unexpected behaviour might occur
  34. Where a resource runs
      • Multi state resources
        • Master-Slave
        • e.g. MySQL master-slave, DRBD
      • Clones
        • Resources that can run on multiple nodes, e.g.
          • Multimaster MySQL servers
          • MySQL slaves
          • Stateless applications
      • location
        • Preferred location to run a resource, e.g. based on hostname
      • colocation
        • Resources that have to live together
        • e.g. IP address + service
      • order
        • Define which resource has to start first, or wait for another resource
      • groups
        • Colocation + order
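The "groups = colocation + order" line can be made concrete by expanding a group into the pairwise constraints it implies. A rough sketch; the function name and the tuple representation are invented here and are not crm shell syntax:

```python
from typing import List, Tuple


def expand_group(members: List[str]) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Expand an ordered group into implied constraints.

    Returns (orders, colocations) where
      orders      holds (first, then): start `first` before `then`
      colocations holds (rsc, with_rsc): run `rsc` on the node of `with_rsc`
    """
    orders = []
    colocations = []
    for first, then in zip(members, members[1:]):
        orders.append((first, then))
        colocations.append((then, first))
    return orders, colocations


if __name__ == "__main__":
    orders, colocations = expand_group(["d_mysql", "ip_db"])
    print("order:", orders)
    print("colocation:", colocations)
```

For a group such as svc_db this says the service IP follows the MySQL daemon both in placement and in start order.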
  35. E.g. a Service on DRBD
      ● DRBD can only be active on 1 node
      ● The filesystem needs to be mounted on that active DRBD node
      group svc_mine d_mine ip_mine
      ms ms_drbd_storage drbd_storage \
          meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" notify="true"
      colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
      order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
      location cli-prefer-svc_db svc_db \
          rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a
  36. A MySQL Resource
      ● OCF
        • Clone
          • Where do you hook up the IP?
        • Multi State
          • But we have Master-Master replication
        • Meta Resource
          • Dummy resource that can monitor
            • Connection
            • Replication state
  37. Simple 2 node example
      primitive d_mysql ocf:ntc:mysql \
          op monitor interval="30s" \
          params test_user="just" test_passwd="kidding" test_table="really"
      primitive ip_mysql_svc ocf:heartbeat:IPaddr2 \
          params ip="10.8.0.30" cidr_netmask="255.255.255.0" nic="bond0" \
          op monitor interval="10s"
      group svc_mysql d_mysql ip_mysql_svc
  38. Monitor your Setup
      ● Not just connectivity
      ● Also functional
        • Query data
        • Check resultset is correct
      ● Check replication
        • MaatKit
        • OpenARK
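A functional check in the sense of "query data, check the resultset" might look like the sketch below. It is written against the generic Python DB-API and uses an in-memory sqlite3 database purely as a runnable stand-in for MySQL; the table name, columns, and expected rows are all invented for the example.

```python
import sqlite3
from typing import Any, List, Tuple


def functional_check(conn: Any, query: str, expected: List[Tuple]) -> bool:
    """Return True only if the query runs and yields exactly the expected rows.

    Any error (lost connection, missing table, ...) counts as a failed check.
    """
    try:
        cur = conn.cursor()
        cur.execute(query)
        rows = [tuple(row) for row in cur.fetchall()]
    except Exception:
        return False
    return rows == expected


if __name__ == "__main__":
    # sqlite3 stands in for a MySQL connection here (assumption for the demo).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test_table (id INTEGER, val TEXT)")
    conn.execute("INSERT INTO test_table VALUES (1, 'ok')")
    print(functional_check(conn, "SELECT id, val FROM test_table", [(1, "ok")]))
```

A bare TCP connect would pass even when the server cannot answer queries, which is why the check compares actual rows.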
  39. How to deal with replication state?
      ● Multiple slaves
        • Use DRBD OCF resource
      ● 2 masters only: use own script
        • Replication is slow on the active node
          • Shouldn't happen; talk to HR / cfgmgmt people
        • Replication is slow on the passive node
          • Weight--
        • Replication breaks on the active node
          • Send out a warning, don't modify weights, and check the other node
        • Replication breaks on the passive node
          • Fence off the passive node
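The two-master decision rules above amount to a small lookup table. A sketch of how a custom monitoring script could encode them; the enum and function names are invented, and actually carrying out each action (alerting, adjusting weights, fencing) is left out:

```python
from enum import Enum


class Role(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"


class Problem(Enum):
    SLOW = "replication slow"
    BROKEN = "replication broken"


class Action(Enum):
    ESCALATE_TO_PEOPLE = "should not happen: talk to the people responsible"
    LOWER_WEIGHT = "decrease the node's weight"
    WARN_AND_CHECK_PEER = "warn, leave weights alone, check the other node"
    FENCE_PASSIVE = "fence off the passive node"


# One entry per bullet on the slide.
DECISIONS = {
    (Role.ACTIVE, Problem.SLOW): Action.ESCALATE_TO_PEOPLE,
    (Role.PASSIVE, Problem.SLOW): Action.LOWER_WEIGHT,
    (Role.ACTIVE, Problem.BROKEN): Action.WARN_AND_CHECK_PEER,
    (Role.PASSIVE, Problem.BROKEN): Action.FENCE_PASSIVE,
}


def decide(role: Role, problem: Problem) -> Action:
    """Look up the reaction for a replication problem on a given node."""
    return DECISIONS[(role, problem)]
```

Keeping the rules in one table makes it easy to see that the active node is never fenced or down-weighted automatically.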
  40. Adding MySQL to the stack
      [Diagram labels: Replication; Service IP; MySQL; "MySQLd" on each node; Resource: MySQL; Cluster Stack: Pacemaker, Heartbeat; Hardware: Node A, Node B]
  41. Pitfalls & Solutions
      ● Monitor
        • Replication state
        • Replication lag
      ● MaatKit
      ● OpenARK
  42. Conclusion
      ● Plenty of Alternatives
      ● Think about your Data
      ● Think about getting Queries to that Data
      ● Complexity is the enemy of reliability
      ● Keep it Simple
      ● Monitor inside the DB
  43. Contact
      Kris Buytaert, Kris.Buytaert@inuits.be
      Further Reading
      @KrisBuytaert
      http://www.krisbuytaert.be/blog/
      http://www.inuits.be/
      http://www.virtualization.com/
      http://www.oreillygmt.com/
      Inuits: 't Hemeltje, Gemeentepark 2, 2930 Brasschaat, 891.514.231, +32 473 441 636
      Esquimaux: Kheops Business Center, Avenue Georges Lemaître 54, 6041 Gosselies, 889.780.406, +32 495 698 668