Buytaertkrismysql pacemaker-120606052427-phpapp01


Published in: Technology

  1. MySQL HA with PaceMaker
     Kris Buytaert
  2. Kris Buytaert
     ● CTO and Open Source Consultant @inuits.eu
     ● „Infrastructure Architect“
     ● I don't remember when I started using MySQL
     ● Specializing in automated, large-scale deployments and highly available infrastructures, since 2008 also known as “the Cloud”
     ● Surviving the 10th floor test
     ● Cofounded devopsdays.org
  3. In this presentation
     ● High Availability?
     ● MySQL HA Solutions
     ● MySQL Replication
     ● Linux HA / Pacemaker
  4. What is HA Clustering?
     ● One service goes down => others take over its work
     ● IP address takeover, service takeover
     ● Not designed for high performance
     ● Not designed for high throughput (load balancing)
  5. Does it Matter?
     ● Downtime is expensive
     ● You miss out on $$$
     ● Your boss complains
     ● New users don't return
  6. Lies, Damn Lies, and Statistics
     Counting nines (slide by Alan R)
     Availability   Downtime per year
     99.9999%       30 sec
     99.999%        5 min
     99.99%         52 min
     99.9%          9 hr
     99%            3.5 day
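The figures on the "counting nines" slide are rounded. As a rough check, the sketch below computes the downtime budget for a given availability, assuming a 365-day year; the function name `downtime_per_year` is made up for this illustration.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # assumption: non-leap year


def downtime_per_year(availability_percent: float) -> float:
    """Return the downtime budget in seconds per year for an availability in percent."""
    return SECONDS_PER_YEAR * (1 - availability_percent / 100)


if __name__ == "__main__":
    for nines in (99.0, 99.9, 99.99, 99.999, 99.9999):
        seconds = downtime_per_year(nines)
        print(f"{nines}% -> {seconds:.0f} s ({seconds / 3600:.2f} h) per year")
```

Each extra nine divides the budget by ten, which is why the step from 99.9% to 99.999% usually means automated failover rather than a pager.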
  7. The Rules of HA
     ● Keep it Simple
     ● Keep it Simple
     ● Prepare for Failure
     ● Complexity is the enemy of reliability
     ● Test your HA setup
  8. You care about?
     ● Your data?
        • Consistent
        • Realtime
        • Eventually consistent
     ● Your connection
        • Always
        • Most of the time
  9. Eliminating the SPOF
     ● Find out what Will Fail
        • Disks
        • Fans
        • Power (Supplies)
     ● Find out what Can Fail
        • Network
        • Going Out Of Memory
  10. Split Brain
      ● Communications failures can lead to separated partitions of the cluster
      ● If those partitions each try to take control of the cluster, it is called a split-brain condition
      ● If this happens, then bad things will happen
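One common guard against split brain is a quorum rule: a partition may only run resources when it sees a strict majority of the configured nodes. The sketch below is a generic illustration of that rule, not Pacemaker's implementation; `has_quorum` is a hypothetical helper.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Return True when this partition sees a strict majority of the cluster."""
    return 2 * visible_nodes > total_nodes


if __name__ == "__main__":
    # In a two-node cluster a lone survivor never has a majority,
    # so such setups rely on fencing (STONITH) or ignore quorum altogether.
    print(has_quorum(1, 2))
    print(has_quorum(2, 3))
```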
  11. Historical MySQL HA
      ● Replication
         • 1 read-write node
         • Multiple read-only nodes
         • Application needed to be modified
  12. Solutions Today
      ● BYO
      ● DRBD
      ● MySQL Cluster NDBD
      ● Multi Master Replication
      ● MySQL Proxy
      ● MMM / Flipper
      ● Galera
      ● Percona XtraDB Cluster
  13. Data vs Connection
      ● Data
         • Replication
         • DRBD
      ● Connection
         • LVS
         • Proxy
         • Heartbeat / Pacemaker
  14. Shared Storage
      ● 1 MySQL instance
      ● Monitor MySQL node
      ● Stonith
      ● $$$ 1+1 <> 2
      ● Storage = SPOF
      ● Split Brain :(
  15. DRBD
      ● Distributed Replicated Block Device
      ● In the Linux Kernel (as of very recent)
      ● Usually only 1 mount
         • Multi mount as of 8.X
         • Requires GFS / OCFS2
      ● Regular FS ext3 ...
      ● Only 1 MySQL instance active accessing data
      ● Upon failover MySQL needs to be started on the other node
  16. DRBD (2)
      ● What happens when you pull the plug of a physical machine?
         • Minimal timeout
         • Why did the crash happen?
         • Is my data still correct?
         • InnoDB consistency checks?
         • Lengthy?
         • Check your BinLog size
  17. MySQL Cluster NDBD
      ● Shared-nothing architecture
      ● Automatic partitioning
      ● Synchronous replication
      ● Fast automatic fail-over of data nodes
      ● In-memory indexes
      ● Not suitable for all query patterns (multi-table JOINs, range scans)
  18. Title – Data
  19. MySQL Cluster NDBD
      ● All indexed data needs to be in memory
      ● Good and bad experiences
         • Better experiences when using the API
         • Bad when using the MySQL Server
      ● Test before you deploy
      ● Does not fit all apps
  20. How replication works
      ● Master server keeps track of all updates in the Binary Log
         • Slave requests to read the binary update log
         • Master acts in a passive role, not keeping track of which slave has read which data
      ● Upon connecting the slaves do the following:
         • The slave informs the master of where it left off
         • It catches up on the updates
         • It waits for the master to notify it of new updates
  21. Two Slave Threads
      ● How does it work?
         • The I/O thread connects to the master and asks for the updates in the master’s binary log
         • The I/O thread copies the statements to the relay log
         • The SQL thread implements the statements in the relay log
      ● Advantages
         • Long running SQL statements don’t block log downloading
         • Allows the slave to keep up with the master better
         • In case of master crash the slave is more likely to have all statements
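The split between the I/O thread and the SQL thread can be pictured as a producer and a consumer around the relay log. The toy model below is only an illustration of that decoupling: the binary log is a plain list, the relay log is a `queue.Queue`, and `replicate` is a made-up name, none of which reflects MySQL internals.

```python
import queue
import threading


def replicate(binary_log):
    """Copy events through a relay log using two threads; return the applied events."""
    relay_log = queue.Queue()
    applied = []

    def io_thread():
        # Downloads events from the master without waiting for them to be applied.
        for event in binary_log:
            relay_log.put(event)
        relay_log.put(None)  # sentinel: no more events in this toy run

    def sql_thread():
        # Applies events in order; a slow event here does not stall the download.
        while True:
            event = relay_log.get()
            if event is None:
                break
            applied.append(event)

    threads = [threading.Thread(target=io_thread), threading.Thread(target=sql_thread)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return applied


if __name__ == "__main__":
    print(replicate(["INSERT ...", "UPDATE ...", "DELETE ..."]))
```

Because the download side never waits on the apply side, the relay log can already hold events the slave has not executed yet, which is the basis for the "more likely to have all statements" advantage.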
  23. Show slave status\G
      Slave_IO_State: Waiting for master to send event
      Master_Host:
      Master_User: repli
      Master_Port: 3306
      Connect_Retry: 60
      Master_Log_File: XMS-1-bin.000014
      Read_Master_Log_Pos: 106
      Relay_Log_File: XMS-2-relay.000033
      Relay_Log_Pos: 251
      Relay_Master_Log_File: XMS-1-bin.000014
      Slave_IO_Running: Yes
      Slave_SQL_Running: Yes
      Replicate_Do_DB: xpol
      Replicate_Ignore_DB:
      Replicate_Do_Table:
      Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
      Replicate_Wild_Ignore_Table:
      Last_Errno: 0
      Last_Error:
      Skip_Counter: 0
      Exec_Master_Log_Pos: 106
      Relay_Log_Space: 547
      Until_Condition: None
      Until_Log_File:
      Until_Log_Pos: 0
      Master_SSL_Allowed: No
      Master_SSL_CA_File:
      Master_SSL_CA_Path:
      Master_SSL_Cert:
      Master_SSL_Cipher:
      Master_SSL_Key:
      Seconds_Behind_Master: 0
      Master_SSL_Verify_Server_Cert: No
      Last_IO_Errno: 0
      Last_IO_Error:
      Last_SQL_Errno: 0
      Last_SQL_Error:
      1 row in set (0.00 sec)
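A monitoring script typically looks at only three of these fields: `Slave_IO_Running`, `Slave_SQL_Running` and `Seconds_Behind_Master`. The sketch below parses the vertical (`\G`) output as plain text and applies a simple health rule. The helper names and the 30-second lag threshold are assumptions for illustration, not part of the deck.

```python
def parse_slave_status(text: str) -> dict:
    """Turn 'Key: value' lines from SHOW SLAVE STATUS\\G into a dictionary."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip row separators and the trailing "1 row in set" line
            status[key.strip()] = value.strip()
    return status


def replication_healthy(status: dict, max_lag_seconds: int = 30) -> bool:
    """Both threads must run and the reported lag must be a small number."""
    if status.get("Slave_IO_Running") != "Yes":
        return False
    if status.get("Slave_SQL_Running") != "Yes":
        return False
    lag = status.get("Seconds_Behind_Master", "NULL")
    # The server reports NULL when the lag is unknown, e.g. a stopped thread.
    return lag.isdigit() and int(lag) <= max_lag_seconds


SAMPLE = """\
Slave_IO_State: Waiting for master to send event
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Seconds_Behind_Master: 0
"""

if __name__ == "__main__":
    print(replication_healthy(parse_slave_status(SAMPLE)))
```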
  24. Row vs Statement
      Statement-based replication
      ● Pro
         • Proven (around since MySQL 3.23)
         • Smaller log files
         • Auditing of actual SQL statements
         • No primary key requirement for replicated tables
      ● Con
         • Non-deterministic functions and UDFs
      Row-based replication
      ● Pro
         • All changes can be replicated
         • Similar technology used by other RDBMSes
         • Fewer locks required for some INSERT, UPDATE or DELETE statements
      ● Con
         • More data to be logged
         • Log file size increases (backup/restore implications)
         • Replicated tables require explicit primary keys
         • Possible different result sets on bulk INSERTs
  25. Multi Master Replication
      ● Replicating the same table data both ways can lead to race conditions
         • Auto_increment, unique keys, etc. can cause problems if you write them on both nodes
      ● Both nodes are master
      ● Both nodes are slave
      ● Write on one node, get the updates on the other
      [Diagram: two nodes, each labelled M|S]
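For the auto-increment part of this problem, MySQL offers the `auto_increment_increment` and `auto_increment_offset` server variables, which let each master hand out a disjoint series of ids. The sketch below only models the resulting number series; the generator name is invented for the example, and the approach does nothing for other unique keys or conflicting updates.

```python
from itertools import count, islice


def auto_increment_ids(offset: int, increment: int):
    """Yield the id series offset, offset + increment, offset + 2 * increment, ..."""
    return count(start=offset, step=increment)


# Two masters, so increment = 2; each node gets its own offset.
node_a = list(islice(auto_increment_ids(offset=1, increment=2), 5))
node_b = list(islice(auto_increment_ids(offset=2, increment=2), 5))

if __name__ == "__main__":
    print(node_a)
    print(node_b)
```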
  26. MySQL Proxy
      ● Man in the middle
      ● Decides where to connect to
         • Lua
      ● Write rules to
         • Redirect traffic
  27. Master Slave & Proxy
      ● Split Read and Write Actions
      ● No Application change required
      ● Sends specific queries to a specific node
      ● Based on
         • Customer
         • User
         • Table
         • Availability
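The routing decision behind a read/write split can be sketched in a few lines. MySQL Proxy itself is scripted in Lua, so the Python below is only an illustration of the idea, with a hypothetical `QueryRouter` class: plain `SELECT` statements rotate over the slaves, everything else goes to the master.

```python
import itertools


class QueryRouter:
    """Route SELECTs round-robin to slaves and all other statements to the master."""

    def __init__(self, master, slaves):
        self.master = master
        self._slaves = itertools.cycle(slaves) if slaves else None

    def route(self, sql: str):
        words = sql.split(None, 1)
        first_word = words[0].upper() if words else ""
        if first_word == "SELECT" and self._slaves is not None:
            return next(self._slaves)
        # Writes, DDL, and anything unrecognised stay on the master.
        return self.master


if __name__ == "__main__":
    router = QueryRouter("master", ["slave1", "slave2"])
    for statement in ("SELECT 1", "UPDATE t SET x = 1", "SELECT 2"):
        print(statement, "->", router.route(statement))
```

A real rule set needs more care than this: `SELECT ... FOR UPDATE`, statements inside a transaction, and reads that must see a just-committed write all belong on the master.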
  28. MySQL Proxy
      ● Your new SPOF
      ● Make your Proxy HA too!
         • Heartbeat OCF Resource
  29. Breaking Replication
      ● If the master and slave get out of sync
      ● Updates on slave with identical index id
         • Check error log for disconnections and issues with replication
  30. Monitor your Setup
      ● Not just connectivity
      ● Also functional
         • Query data
         • Check resultset is correct
      ● Check replication
         • MaatKit
         • OpenARK
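A functional check goes one step beyond "the port answers": it runs a known query and compares the result with what is expected. The sketch below works with any Python DB-API connection. Here `sqlite3` is only a stand-in so the example runs without a MySQL server, and the `monitor` table and `functional_check` helper are invented for the illustration.

```python
import sqlite3


def functional_check(conn, query, expected_rows) -> bool:
    """Run a query and report whether the result set matches the expectation."""
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        return cursor.fetchall() == expected_rows
    except Exception:
        # Any error (lost connection, missing table, ...) counts as a failed check.
        return False


# Stand-in database with one known row to query.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE monitor (id INTEGER PRIMARY KEY, value TEXT)")
conn.execute("INSERT INTO monitor (id, value) VALUES (1, 'ok')")

if __name__ == "__main__":
    print(functional_check(conn, "SELECT value FROM monitor WHERE id = 1", [("ok",)]))
```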
  31. Pulling Traffic
      ● E.g. for Cluster, MultiMaster setups
         • DNS
         • Advanced Routing
         • LVS
         • Flipper / MMM
  32. MMM
      ● Multi-Master Replication Manager for MySQL
         • Perl scripts to perform monitoring/failover and management of MySQL master-master replication configurations
      ● Balance master / slave configs based on replication state
         • Map Virtual IP to the Best Node
  33. Flipper
      ● Flipper is a Perl tool for managing read and write access pairs of MySQL servers
      ● Master-master MySQL servers
      ● Client machines do not connect "directly" to either node; instead they use:
         • One IP for read
         • One IP for write
      ● Flipper allows you to move these IP addresses between the nodes in a safe and controlled manner.
      ● are/flipper/
  34. Linux-HA PaceMaker
      ● Plays well with others
      ● Manages more than MySQL
      ● ...v3 .. don't even think about the rest anymore
  35. Heartbeat
      ● Heartbeat v1
         • Max 2 nodes
         • No fine-grained resources
         • Monitoring using “mon”
      ● Heartbeat v2
         • XML usage was a consulting opportunity
         • Stability issues
         • Forking?
  36. Pacemaker Architecture
      ● stonithd: The Heartbeat fencing subsystem.
      ● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).
      ● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
      ● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
      ● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to co-ordinate the activities of the cluster.
      ● openais: Messaging and membership layer.
      ● heartbeat: Messaging layer, an alternative to OpenAIS.
      ● ccm: Short for Consensus Cluster Membership. The Heartbeat membership layer.
  37. Pacemaker?
      ● Not a fork
      ● Only CRM code taken out of Heartbeat
      ● As of Heartbeat 2.1.3
         • Support for both OpenAIS / Heartbeat
         • Different release cycle than Heartbeat
  38. Heartbeat, OpenAIS?
      ● Both are messaging layers
      ● Initially only Heartbeat
      ● OpenAIS
      ● Heartbeat became unmaintained
      ● OpenAIS has heisenbugs :(
      ● Heartbeat maintenance taken over by LinBit
      ● CRM detects which layer is in use
  39. [Diagram labels: OpenAIS or Heartbeat, Pacemaker, Cluster Glue]
  40. Configuring Heartbeat
      ● /etc/ha.d/
         Use crm = yes
      ● /etc/ha.d/authkeys
  41. Configuring Heartbeat
      heartbeat::hacf {"clustername":
        hosts   => ["host-a","host-b"],
        hb_nic  => ["bond0"],
        hostip1 => [""],
        hostip2 => [""],
        ping    => [""],
      }
      heartbeat::authkeys {"ClusterName":
        password => "ClusterName ",
      }
  42. Heartbeat Resources
      ● LSB
      ● Heartbeat resource (+status)
      ● OCF (Open Cluster FrameWork) (+monitor)
      ● Clones (don't use in HAv2)
      ● Multi State Resources
  43. A MySQL Resource
      ● OCF
         • Clone
            - Where do you hook up the IP?
         • Multi State
            - But we have Master Master replication
         • Meta Resource
            - Dummy resource that can monitor
               - Connection
               - Replication state
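The monitor action of such a meta resource boils down to mapping two checks onto OCF exit codes. The numeric codes below are the standard OCF values; how the two failure cases map onto them is an assumption made for this sketch, since the deck does not show its agent's source, and the `monitor` function is hypothetical.

```python
# Standard OCF resource agent exit codes.
OCF_SUCCESS = 0
OCF_ERR_GENERIC = 1
OCF_NOT_RUNNING = 7


def monitor(can_connect: bool, replication_ok: bool) -> int:
    """Combine the connection and replication checks into one OCF monitor result."""
    if not can_connect:
        # Assumption: an unreachable server is reported as cleanly stopped.
        return OCF_NOT_RUNNING
    if not replication_ok:
        # Assumption: a reachable server with broken replication is a failure.
        return OCF_ERR_GENERIC
    return OCF_SUCCESS


if __name__ == "__main__":
    print(monitor(can_connect=True, replication_ok=False))
```

The cluster reacts differently to the two failure codes, so which one a broken replication state should return is a policy decision worth testing on a real cluster.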
  44. CRM
      ● Cluster Resource Manager
      ● Keeps Nodes in Sync
      ● XML Based
      ● cibadm
      ● Cli manageable
      ● Crm
      configure
      property $id="cib-bootstrap-options" \
          stonith-enabled="FALSE" \
          no-quorum-policy=ignore \
          start-failure-is-fatal="FALSE"
      rsc_defaults $id="rsc_defaults-options" \
          migration-threshold="1" \
          failure-timeout="1"
      primitive d_mysql ocf:local:mysql \
          op monitor interval="30s" \
          params test_user="sure" test_passwd="illtell" test_table="test.table"
      primitive ip_db ocf:heartbeat:IPaddr2 \
          params ip="" nic="bond0" \
          op monitor interval="10s"
      group svc_db d_mysql ip_db
      commit
  45. Adding MySQL to the stack
      [Diagram labels: Node A, Node B; layers Hardware, Cluster Stack (HeartBeat, Pacemaker), Resource (“MySQLd” on each node); MySQL Replication; Service IP MySQL]
  46. Pitfalls & Solutions
      ● Monitor
         • Replication state
         • Replication lag
      ● MaatKit
      ● OpenARK
  47. Conclusion
      ● Plenty of Alternatives
      ● Think about your Data
      ● Think about getting Queries to that Data
      ● Complexity is the enemy of reliability
      ● Keep it Simple
      ● Monitor inside the DB
  48. Contact
      Kris Buytaert
      @krisbuytaert
      Inuits
      't Hemeltje
      Duboistraat 50
      2060 Antwerpen
      Belgium
      891.514.231
      +32 475 961221
      Further Reading
         • Or the upcoming slides