1. Linux HA anno 2014
Julien Pivotto
LOADays, Antwerp
April 4th, 2014
2. whoami
• sysadmin @ inuits
• open-source defender for 7+ years
• devops believer
• @roidelapluie on twitter/github
Julien Pivotto Linux HA
4. What is HA
• High Availability
• One service fails ⇒ another takes over its job
• Transparent for the end-user
5. Where HA will NOT help
• It is not about scalability
• It will not fix your application
• It will not make your application stable
• It is not a one-size-fits-all solution
• It is not about performance
• It is not backup
6. Why care about HA?
• Service goes down at 5pm on Friday?
• Downtime makes users unhappy
• Downtime costs money
7. What will not work
• Virtualization will not make your app HA
• VM mirroring is not HA
• Live migrations are not HA
• Containers are not HA
• Cloud lol
8. HA is about services
9. Start on a good basis
• Automation
• Monitoring
• CI / CD
• Testing
• ... then start working on HA
10. Eliminate the SPOF
• Single Points of Failure
• Hardware fails
• Disks always fail
• etc.
• Replicate...
11. Split-Brain
• Nodes can’t talk to each other
• They think they are alone
• They each make decisions and take leadership
• Data inconsistency
12. Fencing
• Shoot the other node in the head
• Be sure a node is dead
• Preserve integrity of the data
• Combine with quorum
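The link between split-brain and quorum can be illustrated with a few lines of Python. This is a minimal sketch, assuming plain majority voting with one vote per node; the function name `has_quorum` is hypothetical and is not part of Pacemaker, Corosync or CMAN.

```python
def has_quorum(visible_nodes: int, total_nodes: int) -> bool:
    """Return True when this partition holds a strict majority of the votes.

    Assumption for illustration: every node carries exactly one vote.
    """
    if total_nodes <= 0 or not 0 <= visible_nodes <= total_nodes:
        raise ValueError("invalid node counts")
    return visible_nodes > total_nodes // 2


if __name__ == "__main__":
    # 3-node cluster split 2/1: only the 2-node side may keep running services.
    print("2 of 3:", has_quorum(2, 3))
    print("1 of 3:", has_quorum(1, 3))
    # 2-node cluster split 1/1: neither side has a majority.
    print("1 of 2:", has_quorum(1, 2))
```

With two nodes a 1/1 split leaves nobody with a majority, which is one reason quorum is combined with fencing rather than used alone.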
13. Monitoring
• Checking whether a PID is running is useless
• Result-based monitoring
• Extract data out of it
• E.g. try to insert into the DB
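A result-based check of the "try to insert into the DB" kind could look like the sketch below. It uses SQLite from the Python standard library purely so the example is self-contained; the table name `ha_probe` and the function `database_is_healthy` are made up for illustration, and a real check would target the actual database the service depends on.

```python
import sqlite3
import uuid


def database_is_healthy(conn: sqlite3.Connection) -> bool:
    """Write a probe row, read it back, then remove it again."""
    token = uuid.uuid4().hex
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS ha_probe (token TEXT)")
            conn.execute("INSERT INTO ha_probe (token) VALUES (?)", (token,))
            row = conn.execute(
                "SELECT token FROM ha_probe WHERE token = ?", (token,)
            ).fetchone()
            conn.execute("DELETE FROM ha_probe WHERE token = ?", (token,))
        return row is not None and row[0] == token
    except sqlite3.Error:
        # Any database error means the service is not doing its job.
        return False


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    print("healthy" if database_is_healthy(connection) else "failed")
```

Unlike a PID check, this fails when the process is alive but cannot actually store or return data.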
14. Cluster?
• Active/active: everything is active
• Active/passive: nodes in standby
• N+1: One node waiting in standby
• N+M: Nodes waiting in standby
• Can be mixed, etc.
21. The State
• Stateless application
• Everything in DB
• Avoid temp files
• Disaster recovery
22. The right tools
• Make relevant choices for your app
• Look for HA in databases
• Look for HA in queuing systems
• Look for HA in filesystems?
• Master/Master vs Master/Slave
23. The configuration
• Same config everywhere
• Use puppet, chef, ...
• Config in one place
• KISS
25. Pacemaker
• It is the brain
• Decides what to do, when
• Gets information from resources
• Depends on a messaging layer and a cluster manager
• Does not require shared storage
26. Decisions
• A node fails: now what?
• A service fails: now what?
• Restart? Move?
• Needs to be quick and without intervention
• Scores, policies
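The "restart or move?" question can be pictured as a small policy function. The sketch below is an assumption-laden toy, not Pacemaker's decision engine: the function `decide` and the parameter `max_local_failures` are hypothetical, loosely inspired by the idea of moving a resource away after it has failed too often on one node.

```python
def decide(node_online: bool, fail_count: int, max_local_failures: int = 3) -> str:
    """Return "restart" or "move" for a failed resource.

    node_online: whether the node that hosted the resource is still usable.
    fail_count: how many times the resource already failed on that node.
    max_local_failures: hypothetical threshold before giving up on the node.
    """
    if not node_online:
        return "move"      # nothing left to restart on
    if fail_count >= max_local_failures:
        return "move"      # the node looks unhealthy for this resource
    return "restart"       # cheapest recovery: try again in place


if __name__ == "__main__":
    print(decide(node_online=True, fail_count=1))
    print(decide(node_online=True, fail_count=3))
    print(decide(node_online=False, fail_count=0))
```

The point is that the policy is written down in advance, so recovery is quick and needs no human intervention.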
27. CIB
• Cluster Information Base
• XML shared across the cluster
• Updated using "pcs"
• Contains knowledge about the cluster
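Because the CIB is XML, it can be inspected with any XML parser. The sketch below reads a hand-written, heavily trimmed fragment; the ids such as `ClusterIP-params` are invented for the example, and a real CIB contains many more sections (nodes, constraints, status, operations) than shown here.

```python
import xml.etree.ElementTree as ET

# Hand-written, simplified fragment for illustration only.
SAMPLE_CIB = """\
<cib>
  <configuration>
    <resources>
      <primitive id="ClusterIP" class="ocf" provider="heartbeat" type="IPaddr2">
        <instance_attributes id="ClusterIP-params">
          <nvpair id="ClusterIP-ip" name="ip" value="192.168.122.101"/>
          <nvpair id="ClusterIP-mask" name="cidr_netmask" value="32"/>
        </instance_attributes>
      </primitive>
    </resources>
  </configuration>
</cib>
"""


def list_primitives(cib_xml: str) -> dict:
    """Map each primitive id to its instance parameters."""
    root = ET.fromstring(cib_xml)
    result = {}
    for primitive in root.iter("primitive"):
        params = {
            nv.get("name"): nv.get("value")
            for nv in primitive.findall("instance_attributes/nvpair")
        }
        result[primitive.get("id")] = params
    return result


if __name__ == "__main__":
    for name, params in list_primitives(SAMPLE_CIB).items():
        print(name, params)
```

In practice the CIB is changed through the cluster tools rather than by editing XML by hand; parsing like this is mostly useful for read-only reporting.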
28. Primitives
• Service, IP address, mountpoint, ...
• Basic building blocks of a cluster
• Take a lot of parameters
primitive ClusterIP ocf:heartbeat:IPaddr2 \
    params ip="192.168.122.101" cidr_netmask="32" \
    op monitor interval="30s"
29. Resource Agent
• Script
• How to start
• How to stop
• How to change state (promote, demote)
• How to monitor (real monitoring)
• An init script but way better
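The contract of a resource agent (one action in, one exit code out) can be sketched as follows. Real agents are usually shell scripts that control a real daemon; here `ToyService` and `run_action` are hypothetical stand-ins, while the numeric return codes follow the OCF convention (0 success, 3 unimplemented, 7 not running).

```python
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7


class ToyService:
    """In-memory stand-in for a real daemon (illustration only)."""

    def __init__(self) -> None:
        self.running = False


def run_action(service: ToyService, action: str) -> int:
    """Dispatch one agent action and return an OCF-style exit code."""
    if action == "start":
        service.running = True
        return OCF_SUCCESS
    if action == "stop":
        service.running = False
        return OCF_SUCCESS  # stopping an already stopped service still succeeds
    if action == "monitor":
        return OCF_SUCCESS if service.running else OCF_NOT_RUNNING
    return OCF_ERR_UNIMPLEMENTED  # e.g. promote/demote in this toy agent


if __name__ == "__main__":
    svc = ToyService()
    for act in ("monitor", "start", "monitor", "stop", "monitor"):
        print(act, "->", run_action(svc, act))
```

A real `monitor` action would do result-based checking of the service instead of reading a flag.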
30. Clones
• Same resource running on multiple hosts
• Define the minimum and maximum number of running primitives
• Possible to run multiple on the same node
clone WebIP ClusterIP \
    meta globally-unique="true" clone-max="2" \
    clone-node-max="2"
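To make the effect of `clone-max` and `clone-node-max` concrete, here is a naive round-robin placement. It is only a sketch: the function `place_clones` is hypothetical, and Pacemaker's real placement is driven by scores and constraints rather than by this simple loop.

```python
def place_clones(nodes: list, clone_max: int, clone_node_max: int) -> dict:
    """Spread up to clone_max instances over nodes, at most clone_node_max each."""
    placement = {node: 0 for node in nodes}
    remaining = clone_max
    for _ in range(clone_node_max):      # one pass per allowed instance per node
        for node in nodes:
            if remaining == 0:
                return placement
            placement[node] += 1
            remaining -= 1
    return placement


if __name__ == "__main__":
    # Two nodes online: one instance each.
    print(place_clones(["pcmk-1", "pcmk-2"], clone_max=2, clone_node_max=2))
    # One node left: both instances end up on the survivor.
    print(place_clones(["pcmk-2"], clone_max=2, clone_node_max=2))
```

With `clone-node-max="2"` the surviving node can host both instances after a failure, which is what allows a cloned IP to keep answering for both.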
31. Master Slave (ms)
• Set of primitives with a role
• Masters and slaves (e.g. MySQL, LDAP)
• Can promote slaves to master
• Can demote masters to slave
• Multiple slaves / masters
ms WebDataClone WebData \
    meta master-max="2" master-node-max="1" \
    clone-max="2" clone-node-max="1"
32. Group
• Group of primitives of different kinds
• Implies colocation
• Starts in a fixed order
• Stops in the opposite order
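The start and stop behaviour of a group can be shown with a short sketch. The helper names `start_plan` and `stop_plan` are hypothetical; the member names are borrowed from the examples elsewhere in these slides.

```python
def start_plan(members: list) -> list:
    """Members start in the order they are listed in the group."""
    return [f"start {member}" for member in members]


def stop_plan(members: list) -> list:
    """Members stop in the reverse of the start order."""
    return [f"stop {member}" for member in reversed(members)]


if __name__ == "__main__":
    group = ["ClusterIP", "WebFS", "WebSite"]
    print(start_plan(group))
    print(stop_plan(group))
```

Listing the IP first and the web server last means the web server is stopped before the things it depends on disappear.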
33. Colocation
• Constraint
• Must run on the same host
• Has a score
• Order matters
• e.g. VIP with service
colocation website-with-ip inf: WebSite ClusterIP
34. Location
• Set preferred location
• Has a score
location prefer-apache-1 WebSite 50: apache-1
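How a location score turns into a placement can be sketched as picking the best-scored node among those online. This is a simplification under stated assumptions: `pick_node` is a hypothetical helper, unlisted nodes default to a score of 0, infinity is modelled with `math.inf`, and ties are broken alphabetically, which is not necessarily what the cluster does.

```python
import math

INFINITY = math.inf


def pick_node(scores: dict, online: set):
    """Return the online node with the highest score, or None if none is allowed."""
    candidates = {node: scores.get(node, 0) for node in online}
    # A score of -INFINITY means "never run here".
    allowed = {node: s for node, s in candidates.items() if s != -INFINITY}
    if not allowed:
        return None
    return max(sorted(allowed), key=allowed.get)


if __name__ == "__main__":
    prefs = {"apache-1": 50}
    print(pick_node(prefs, {"apache-1", "apache-2"}))  # preferred node wins
    print(pick_node(prefs, {"apache-2"}))              # falls back when it is down
```

A finite score such as 50 is a preference, not an obligation: the resource still runs elsewhere when the preferred node is unavailable.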
35. Order
• What starts after what
• Even across nodes
• Has a score
order WebFS-after-WebData inf: WebDataClone:promote \
    WebFSClone:start
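Order constraints form a dependency graph, and a valid start sequence is a topological sort of it. The sketch below uses `graphlib` from the Python standard library (Python 3.9 or later); the helper `start_sequence` and the exact dependency map are assumptions for illustration, not output of the cluster.

```python
from graphlib import CycleError, TopologicalSorter


def start_sequence(starts_after: dict) -> list:
    """Return a start order for a map of resource -> resources it must follow."""
    try:
        return list(TopologicalSorter(starts_after).static_order())
    except CycleError as exc:
        # Two constraints that contradict each other cannot be satisfied.
        raise ValueError("ordering constraints contain a loop") from exc


if __name__ == "__main__":
    constraints = {
        "WebFS": ["WebData"],   # mount the filesystem after the data is available
        "WebSite": ["WebFS"],   # start the web server after the filesystem
    }
    print(start_sequence(constraints))
```

Stopping is the same graph walked in reverse.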
37. Maintenance
• Manually move resources
• Set a DO-NOT-MANAGE flag
• Do not forget to revert
39. CMAN
• Manages membership and quorum
• Notifies pacemaker when something changes
• Starts and manages corosync
• Needs a cluster.conf that contains all the nodes
• Managed via ccs
• Will propagate the changes
40. Corosync
• Messaging layer
• Controlled via CMAN
• Next version will take over CMAN's role
42. Distributions
• Developed mainly by Red Hat and SUSE
• Used with OpenStack too
• Converging on a single stack
• Available in * distros
43. crmsh vs pcs
• crmsh used to be more widely used
• Disappeared in CentOS 6.4
• Getting used to pcs
• One goal: modify the CIB
• pcs is young / not widely used
45. Create a resource
pcs resource create ClusterIP ocf:heartbeat:IPaddr2 \
    ip=192.168.0.120 cidr_netmask=32 op monitor \
    interval=30s
46. Create constraints
pcs constraint colocation add WebFS WebDataClone \
    INFINITY with-rsc-role=Master
pcs constraint order promote WebDataClone then \
    start WebFS
48. Check the status of the cluster
pcs status
Last updated: Fri Sep 14 12:41:12 2012
Last change: Fri Sep 14 12:41:08 2012 via crm_attribute on pcmk-1
Stack: corosync
Current DC: pcmk-1 (1) - partition with quorum
Version: 1.1.8-1.el7-60a19ed12fdb4d5c6a6b6767f52e5391e447fec0
2 Nodes configured, unknown expected votes
5 Resources configured.
Node pcmk-1 (1): standby
Online: [ pcmk-2 ]
Full list of resources:
ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2
WebSite (ocf::heartbeat:apache): Started pcmk-2
Master/Slave Set: WebDataClone [WebData]
Masters: [ pcmk-2 ]
Stopped: [ WebData:1 ]
WebFS (ocf::heartbeat:Filesystem): Started pcmk-2
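Status output like the one above is easy to turn into data for a monitoring check. The sketch below is written only against the line format visible in this sample (`name (agent): state node`); other pcs versions may print something different, and the helper `parse_resources` is hypothetical.

```python
import re

# Matches lines such as: "ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2"
STATUS_LINE = re.compile(
    r"^\s*(?P<name>\S+)\s+\((?P<agent>[^)]+)\):\s+(?P<state>\S+)"
    r"(?:\s+(?P<node>\S+))?\s*$"
)


def parse_resources(status_text: str) -> dict:
    """Map resource name -> (state, node) for every matching status line."""
    resources = {}
    for line in status_text.splitlines():
        match = STATUS_LINE.match(line)
        if match:
            resources[match["name"]] = (match["state"], match["node"])
    return resources


if __name__ == "__main__":
    sample = (
        "Node pcmk-1 (1): standby\n"
        "ClusterIP (ocf::heartbeat:IPaddr2): Started pcmk-2\n"
        "WebFS (ocf::heartbeat:Filesystem): Started pcmk-2\n"
    )
    for name, (state, node) in parse_resources(sample).items():
        print(name, state, node)
```

Lines that describe nodes or master/slave sets do not match the pattern and are skipped.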
50. Percona Replication Manager
• MySQL replication with pacemaker
• Complete documentation
• Resource agents
• Supports multi-slave setups
• Good documentation
51. MySQL Resource Agent
• Keeps track of a score for each slave
• In case of failure, will switch to the "best-scored" slave
• Can be reused in your cluster
• https://github.com/percona/percona-pacemaker-agents
60. Be clever
• KISS
• Automate
• Monitor
• Be realistic
61. Do not promise the impossible
• WONTFIX your app
• Working together (devops)
• Not about scale
• Not about stability
• Do not talk in nines
62. Linux HA
• Reliable
• Pacemaker, Corosync, CMAN
• pcs, crmsh, ccs
• A lot of reading
• A lot of experience to build
63. RTFM
• http://clusterlabs.org
• Clusters From Scratch
• Pacemaker Explained
• http://blog.clusterlabs.org
• Old http://linux-ha.org