MySQL HA with Pacemaker

  1. MySQL HA with Pacemaker Kris Buytaert #opendbcamp
  2. Kris Buytaert ● I used to be a Dev, then became an Op ● Today I feel like a Dev again ● Senior Linux and Open Source Consultant @inuits.be ● "Infrastructure Architect" ● Building clouds since before the Cloud ● Surviving the 10th floor test ● Co-author of some books ● Guest editor at some sites
  3. In this presentation ● High Availability ? ● MySQL HA Solutions ● Linux HA / Pacemaker
  4. What is HA Clustering ? ● One service goes down => others take over its work ● IP address takeover, service takeover ● Not designed for high performance ● Not designed for high throughput (load balancing)
  5. Lies, Damn Lies, and Statistics: Counting nines (slide by Alan R)
     Availability    Downtime per year
     99.9999%        ~30 sec
     99.999%         ~5 min
     99.99%          ~52 min
     99.9%           ~9 hr
     99%             ~3.5 days
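     The figures are simply the allowed unavailability multiplied by a year of wall-clock time, e.g.:
       downtime per year = (1 - availability) x 365 x 24 h
       99.9%   : 0.001   x 8760 h ≈ 8.8 h
       99.999% : 0.00001 x 8760 h ≈ 315 s ≈ 5.3 min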
  6. The Rules of HA ● Keep it Simple ● Keep it Simple ● Prepare for Failure ● Complexity is the enemy of reliability ● Test your HA setup
  7. Eliminating the SPOF ● Find out what Will Fail • Disks • Fans • Power (Supplies) ● Find out what Can Fail • Network • Running Out Of Memory
  8. Data vs Connection ● DATA : • Replication • Shared storage • DRBD ● Connection • LVS • Proxy • Heartbeat / Pacemaker
  9. Shared Storage ● 1 MySQL instance ● Monitor the MySQL node ● STONITH required ● $$$ : 1+1 <> 2 ● Storage = SPOF ● Split Brain :(
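     A STONITH device is itself just a cluster resource; a minimal sketch using the external/ipmi plugin (the management IP and credentials are placeholders):
       primitive st_node_a stonith:external/ipmi \
         params hostname="node-a" ipaddr="10.0.0.100" userid="admin" passwd="secret" interface="lan"
       location l_st_node_a st_node_a -inf: node-a   # a node must never run its own fencing device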
  10. DRBD ● Distributed Replicated Block Device ● In the Linux kernel ● Usually only 1 mount • Multi mount as of 8.X • Requires GFS / OCFS2 ● Regular FS (ext3, ...) ● Only 1 active MySQL instance accessing the data ● Upon failover MySQL needs to be started on the other node
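      A minimal DRBD 8.x resource for the MySQL volume could look like this (node names, backing disks, and addresses are placeholders):
        resource r0 {
          protocol C;                      # synchronous replication
          on node-a {
            device    /dev/drbd0;
            disk      /dev/sdb1;           # backing device holding /var/lib/mysql
            address   10.0.0.1:7788;
            meta-disk internal;
          }
          on node-b {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.0.2:7788;
            meta-disk internal;
          }
        }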
  11. DRBD (2) ● What happens when you pull the plug of a physical machine ? • Minimal timeout • Why did the crash happen ? • Is my data still correct ? • InnoDB consistency checks ? • Lengthy ? • Check your binlog size
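      Two my.cnf knobs bound how much work a post-failover restart has to do; the values below are illustrative only, not recommendations:
        [mysqld]
        max_binlog_size      = 100M   # smaller binlogs mean less to scan after a crash
        innodb_log_file_size = 256M   # bigger redo logs help writes but lengthen InnoDB crash recovery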
  12. Other Solutions Today ● MySQL Cluster NDBD ● Multi Master Replication ● MySQL Proxy ● MMM ● Flipper ● BYO ● ....
  13. Pulling Traffic ● E.g. for Cluster / MultiMaster setups • DNS • Advanced Routing • LVS • Or the upcoming slides
  14. Linux-HA / Pacemaker ● Plays well with others ● Manages more than MySQL ● Use v3: don't even think about the rest anymore ● http://clusterlabs.org/
  15. Heartbeat v1
      • Max 2 nodes
      • No fine-grained resources
      • Monitoring using "mon"
      Configured via /etc/ha.d/ha.cf, /etc/ha.d/authkeys and /etc/ha.d/haresources, e.g.:
        mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 IPaddr2::10.16.0.13/16/bond0.16 mon
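      For reference, a minimal /etc/ha.d/ha.cf for such a pair might look like this (a sketch; the timeouts and the second node name are assumptions):
        keepalive 2                          # heartbeat interval, seconds
        deadtime 30                          # declare a node dead after 30s of silence
        bcast bond0                          # heartbeat link
        auto_failback off
        node mdb-a.menos.asbucenter.dz
        node mdb-b.menos.asbucenter.dz       # assumed name of the peer node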
  16. Heartbeat v2 • Stability issues • Forking ? "A consulting opportunity" (LMB)
  17. Clone Resources • Clones in v2 were buggy • Resources were started on 2 nodes • Stopped again on "1"
  18. Heartbeat v3 • No more /etc/ha.d/haresources • No more XML • Better integrated monitoring • /etc/ha.d/ha.cf needs crm=yes
  19. Pacemaker ? ● Not a fork ● Only the CRM code, taken out of Heartbeat ● As of Heartbeat 2.1.3 • Supports both OpenAIS and Heartbeat • Different release cycle than Heartbeat
  20. Heartbeat, OpenAIS, Corosync ? ● All messaging layers ● Initially only Heartbeat ● Then OpenAIS ● Heartbeat became unmaintained ● OpenAIS had heisenbugs :( ● Then Corosync ● Heartbeat maintenance taken over by LinBit ● The CRM detects which layer is in use
  21. [Stack diagram: Pacemaker on top of Heartbeat or OpenAIS, on top of Cluster Glue]
  22. Pacemaker Architecture
      ● stonithd : the Heartbeat fencing subsystem.
      ● lrmd : Local Resource Management Daemon. Interacts directly with resource agents (scripts).
      ● pengine : Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
      ● cib : Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
      ● crmd : Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM; it also elects a leader to co-ordinate the activities of the cluster.
      ● openais : messaging and membership layer.
      ● heartbeat : messaging layer, an alternative to OpenAIS.
      ● ccm : short for Consensus Cluster Membership. The Heartbeat membership layer.
  23. Configuring Heartbeat Correctly
        heartbeat::hacf { "clustername":
          hosts   => ["host-a", "host-b"],
          hb_nic  => ["bond0"],
          hostip1 => ["10.0.128.11"],
          hostip2 => ["10.0.128.12"],
          ping    => ["10.0.128.4"],
        }
        heartbeat::authkeys { "ClusterName":
          password => "ClusterName",
        }
      http://github.com/jtimberman/puppet/tree/master/heartbeat/
  24. CRM : Cluster Resource Manager
      ● Keeps nodes in sync
      ● XML based
      ● cibadmin
      ● CLI manageable : crm
      Example crm configure session:
        configure
        property $id="cib-bootstrap-options" stonith-enabled="FALSE" no-quorum-policy="ignore" start-failure-is-fatal="FALSE"
        rsc_defaults $id="rsc_defaults-options" migration-threshold="1" failure-timeout="1"
        primitive d_mysql ocf:local:mysql op monitor interval="30s" params test_user="sure" test_passwd="illtell" test_table="test.table"
        primitive ip_db ocf:heartbeat:IPaddr2 params ip="172.17.4.202" nic="bond0" op monitor interval="10s"
        group svc_db d_mysql ip_db
        commit
  25. Heartbeat Resources ● LSB ● Heartbeat resources (+ status) ● OCF (Open Cluster Framework) (+ monitor) ● Clones (don't use in HAv2) ● Multi-state resources
  26. LSB Resource Agents ● LSB == Linux Standards Base ● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes ● LSB init scripts are stored under /etc/init.d/ ● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script ● It's straightforward to change an LSB script into an OCF script
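      For example, an existing init script can be used directly as a cluster resource; a sketch assuming a stock /etc/init.d/mysql with working start/stop/status:
        primitive p_mysql_lsb lsb:mysql \
          op monitor interval="30s"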
  27. OCF ● OCF == Open Cluster Framework ● OCF Resource agents are the most powerful type of resource agent we support ● OCF RAs are extended init scripts • They have additional actions: • monitor – for monitoring resource health • meta-data – for providing information about the RA ● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/
  28. Monitoring ● Defined in the OCF Resource script ● Configured in the parameters ● You have to support multiple states • Not running • Running • Failed
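      A stripped-down resource agent skeleton showing that contract; "myservice" and its pidfile are placeholders, and a real agent emits full XML metadata from meta-data:
        #!/bin/sh
        # Minimal OCF RA sketch.
        # OCF exit codes: 0 = running/success, 1 = generic error, 7 = not running.
        PIDFILE=/var/run/myservice.pid
        case "$1" in
          start)
            /usr/sbin/myservice && exit 0 || exit 1 ;;
          stop)
            [ -f "$PIDFILE" ] && kill "$(cat "$PIDFILE")"
            exit 0 ;;
          monitor)
            # must distinguish running (0) from cleanly stopped (7) and failed (1)
            if [ -f "$PIDFILE" ] && kill -0 "$(cat "$PIDFILE")" 2>/dev/null; then
              exit 0
            else
              exit 7
            fi ;;
          meta-data)
            echo '<resource-agent name="myservice"/>'   # a real RA prints its full XML metadata here
            exit 0 ;;
          *)
            exit 3 ;;   # OCF_ERR_UNIMPLEMENTED
        esac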
  29. Anatomy of a Cluster config • Cluster properties • Resource Defaults • Primitive Definitions • Resource Groups and Constraints
  30. Cluster Properties
        property $id="cib-bootstrap-options" stonith-enabled="FALSE" no-quorum-policy="ignore" start-failure-is-fatal="FALSE"
      no-quorum-policy="ignore" : we ignore the loss of quorum on a 2-node cluster
      start-failure-is-fatal="FALSE" : the cluster will instead use the resource's failcount and the value of resource-failure-stickiness
  31. Resource Defaults
        rsc_defaults $id="rsc_defaults-options" migration-threshold="1" failure-timeout="1" resource-stickiness="INFINITY"
      failure-timeout : after a failure there will be a timeout before the resource can come back to the node on which it failed
      migration-threshold="1" : after 1 failure the resource will try to start on the other node
      resource-stickiness="INFINITY" : the resource really wants to stay where it is now
  32. Primitive Definitions
        primitive d_mine ocf:custom:tomcat \
          params instance_name="mine" monitor_urls="health.html" monitor_use_ssl="no" \
          op monitor interval="15s" on-fail="restart"
        primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
          params ip="10.8.4.131" cidr_netmask="16" nic="bond0" \
          op monitor interval="10s"
  33. Parsing a config ● Isn't always done correctly ● Even a verify won't find all issues ● Unexpected behaviour might occur
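      crm_verify can at least catch syntax and reference errors before they bite:
        crm_verify -L -V        # -L checks the live CIB, -V adds verbose warnings
        crm_verify -x new.xml   # or check a config file before loading it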
  34. Where a resource runs
      • Multi-state resources : Master/Slave, e.g. MySQL master-slave, DRBD
      • Clones : resources that can run on multiple nodes, e.g. multi-master MySQL servers, MySQL slaves, stateless applications
      • location : preferred location to run a resource, e.g. based on hostname
      • colocation : resources that have to live together, e.g. IP address + service
      • order : define which resource has to start first, or wait for another resource
      • groups : colocation + order
  35. E.g. a Service on DRBD
      ● DRBD can only be active on 1 node
      ● The filesystem needs to be mounted on that active DRBD node
        group svc_mine d_mine ip_mine
        ms ms_drbd_storage drbd_storage \
          meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" notify="true"
        colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
        order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
        location cli-prefer-svc_db svc_db rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a
  36. A MySQL Resource
      ● OCF, deployed as:
        • a Clone : but where do you hook up the IP ?
        • a Multi-State resource : but we have master-master replication
        • a Meta resource : a dummy resource that can monitor the connection and the replication state
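      For the multi-state option, newer versions of the ocf:heartbeat:mysql agent can manage replication as a master/slave set; a minimal sketch, assuming an agent version with the replication_user/replication_passwd parameters (credentials are placeholders):
        primitive p_mysql ocf:heartbeat:mysql \
          params test_user="just" test_passwd="kidding" \
                 replication_user="repl" replication_passwd="slavepass" \
          op monitor interval="30s" role="Slave" \
          op monitor interval="29s" role="Master"
        ms ms_mysql p_mysql \
          meta master-max="1" clone-max="2" notify="true"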
  37. Simple 2 node example
        primitive d_mysql ocf:ntc:mysql \
          op monitor interval="30s" \
          params test_user="just" test_passwd="kidding" test_table="really"
        primitive ip_mysql_svc ocf:heartbeat:IPaddr2 \
          params ip="10.8.0.30" cidr_netmask="255.255.255.0" nic="bond0" \
          op monitor interval="10s"
        group svc_mysql d_mysql ip_mysql_svc
  38. Monitor your Setup ● Not just connectivity ● Also functional • Query data • Check resultset is correct ● Check replication • MaatKit • OpenARK
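      A functional check along those lines queries real data and inspects replication; a minimal shell sketch (user, password, and the expected value are placeholders):
        #!/bin/sh
        # functional check: query data and compare the resultset
        got=$(mysql -u monitor -psecret -N -e 'SELECT val FROM test.table WHERE id=1')
        [ "$got" = "expected_value" ] || echo "CRITICAL: unexpected resultset: $got"
        # replication check: thread state and lag
        mysql -u monitor -psecret -e 'SHOW SLAVE STATUS\G' \
          | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'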
  39. How to deal with replication state ?
      ● Multiple slaves : use the DRBD OCF resource
      ● 2 masters : use your own script
        • Replication is slow on the active node : shouldn't happen; talk to HR / config management people
        • Replication is slow on the passive node : weight--
        • Replication breaks on the active node : send out a warning, don't modify weights, and check the other node
        • Replication breaks on the passive node : fence off the passive node
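      One way to implement the "weight--" idea: have the monitoring job publish replication lag as a node attribute and key a location rule on it (attribute name, threshold, and credentials are made up; rule syntax per the crm shell; treat this as a sketch):
        # on each node, from cron or the monitor script:
        lag=$(mysql -u monitor -psecret -e 'SHOW SLAVE STATUS\G' \
              | awk '/Seconds_Behind_Master/ {print $2}')
        # NULL lag (broken replication) should be mapped to a huge value
        crm_attribute --node "$(uname -n)" --name repl_lag --update "${lag:-999999}"

        # in the CIB: keep the service off nodes that lag more than 60s
        location l_mysql_lag svc_mysql \
          rule -inf: repl_lag number:gt 60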
  40. Adding MySQL to the stack
      [Stack diagram: a replicated MySQL resource ("MySQLd" on Node A and Node B) plus a service IP, layered on the cluster stack of Pacemaker and Heartbeat running on the hardware of both nodes]
  41. Pitfalls & Solutions ● Monitor • Replication state • Replication lag ● MaatKit ● OpenARK
  42. Conclusion ● Plenty of Alternatives ● Think about your Data ● Think about getting Queries to that Data ● Complexity is the enemy of reliability ● Keep it Simple ● Monitor inside the DB
  43. Contact
      Kris Buytaert
      Kris.Buytaert@inuits.be
      @KrisBuytaert
      Further Reading: http://www.krisbuytaert.be/blog/ http://www.inuits.be/ http://www.virtualization.com/ http://www.oreillygmt.com/
      Inuits, 't Hemeltje, Gemeentepark 2, 2930 Brasschaat, 891.514.231, +32 473 441 636
      Esquimaux, Kheops Business Center, Avenue Georges Lemaître 54, 6041 Gosselies, 889.780.406, +32 495 698 668