    Linux-HA with Pacemaker: Presentation Transcript

    • Linux High Availability Kris Buytaert
    • Kris Buytaert ● @krisbuytaert ● I used to be a Dev, then became an Op ● Senior Linux and Open Source Consultant @inuits.be ● „Infrastructure Architect“ ● Building Clouds since before the Cloud ● Surviving the 10th floor test ● Co-Author of some books ● Guest Editor at some sites
    • What is HA Clustering? ● One service goes down => others take over its work ● IP address takeover, service takeover ● Not designed for high performance ● Not designed for high throughput (load balancing)
    • Does it Matter? ● Downtime is expensive ● You miss out on $$$ ● Your boss complains ● New users don't return
    • Lies, Damn Lies, and Statistics ● Counting nines (slide by Alan R)
      99.9999%   30 sec
      99.999%    5 min
      99.99%     52 min
      99.9%      9 hr
      99%        3.5 day
    • The Rules of HA ● Keep it Simple ● Keep it Simple ● Prepare for Failure ● Complexity is the enemy of reliability ● Test your HA setup
    • Myths ● Virtualization will solve your HA needs ● Live migration is the solution to all your problems ● HA will make your platform more stable
    • You care about? ● Your data? • Consistent • Realtime • Eventually Consistent ● Your connection? • Always • Most of the time
    • Eliminating the SPOF ● Find out what Will Fail • Disks • Fans • Power (Supplies) ● Find out what Can Fail • Network • Going Out Of Memory
    • Split Brain ● Communication failures can lead to separated partitions of the cluster ● If those partitions each try to take control of the cluster, it's called a split-brain condition ● If this happens, bad things will happen • http://linux-ha.org/BadThingsWillHappen
    • Shared Storage ● Shared Storage ● Filesystem • e.g. GFS, GPFS ● Replicated? ● Exported Filesystem? ● $$$ 1+1 <> 2 ● Storage = SPOF ● Split Brain :( ● Stonith
    • (Shared) Data ● Issues: • Who writes? • Who reads? • What if 2 active applications want to write? • What if an active server crashes during writing? • Can we accept delays? • Can we accept read-only data? ● Hardware Requirements ● Filesystem Requirements (GFS, GPFS, ...)
    • DRBD ● Distributed Replicated Block Device ● In the Linux kernel (only very recently) ● Usually only 1 mount • Multi mount as of 8.X • Requires GFS / OCFS2 ● Regular FS, ext3 ... ● Only 1 application instance active, accessing the data ● Upon failover the application needs to be started on the other node
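      As an aside (not on the original slide), a minimal two-node DRBD resource definition might look like the sketch below; host names, devices and addresses are illustrative only, reusing the host-a/host-b examples from the later Puppet slide.

        resource r0 {
          protocol C;                      # fully synchronous replication
          on host-a {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.128.11:7788;
            meta-disk internal;
          }
          on host-b {
            device    /dev/drbd0;
            disk      /dev/sdb1;
            address   10.0.128.12:7788;
            meta-disk internal;
          }
        }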
    • DRBD (2) ● What happens when you pull the plug of a physical machine? • Minimal timeout • Why did the crash happen? • Is my data still correct?
    • Alternatives to DRBD ● GlusterFS looked promising • "Friends don't let Friends use Gluster" • Consistency problems • Stability problems • Maybe later ● MogileFS • Not POSIX • App needs to implement the API ● Ceph • ?
    • HA Projects ● Linux HA Project ● Red Hat Cluster Suite ● LVS/Keepalived ● Application Specific Clustering Software • e.g. Terracotta, MySQL NDBD
    • HeartBeat ● No shared storage ● Serial connections to UPS for STONITH ● (periodical/realtime) Replication or no shared data ● e.g. Static website, file server
    • Heartbeat ● Heartbeat v1 • Max 2 nodes • No fine-grained resources • Monitoring using "mon" ● Heartbeat v2 • XML usage was a consulting opportunity • Stability issues • Forking?
    • Heartbeat v1
      /etc/ha.d/ha.cf
      /etc/ha.d/haresources
        mdb-a.menos.asbucenter.dz ntc-restart-mysql mon IPaddr2::10.8.0.13/16/bond0 IPaddr2::10.16.0.13/16/bond0.16 mon
      /etc/ha.d/authkeys
    • Heartbeat v2: "A consulting Opportunity" (LMB)
    • Clone Resources ● Clones in v2 were buggy ● Resources were started on 2 nodes ● Stopped again on "1"
    • Heartbeat v3 • No more /etc/ha.d/haresources • No more XML • Better integrated monitoring • /etc/ha.d/ha.cf has crm=yes
    • Pacemaker? ● Not a fork ● Only the CRM code taken out of Heartbeat ● As of Heartbeat 2.1.3 • Support for both OpenAIS / Heartbeat • Different release cycles than Heartbeat
    • Heartbeat, OpenAIS, Corosync? ● All messaging layers ● Initially only Heartbeat ● OpenAIS ● Heartbeat became unmaintained ● OpenAIS had heisenbugs :( ● Corosync ● Heartbeat maintenance taken over by LinBit ● The CRM detects which layer is in use
    • Configuring Heartbeat 3 ● /etc/ha.d/ha.cf • Use crm = yes ● /etc/ha.d/authkeys
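      A minimal sketch of those two files, assuming a two-node cluster talking over bond0 (node names and the shared secret are made up):

        # /etc/ha.d/ha.cf
        autojoin none
        bcast bond0
        node host-a host-b
        crm yes

        # /etc/ha.d/authkeys (must be root-owned, mode 0600)
        auth 1
        1 sha1 SomeSharedSecret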
    • Configuring Heartbeat with puppet
      heartbeat::hacf { "clustername":
        hosts   => ["host-a", "host-b"],
        hb_nic  => ["bond0"],
        hostip1 => ["10.0.128.11"],
        hostip2 => ["10.0.128.12"],
        ping    => ["10.0.128.4"],
      }
      heartbeat::authkeys { "ClusterName":
        password => "ClusterName",
      }
      http://github.com/jtimberman/puppet/tree/master/heartbeat/
    • Pacemaker ● Heartbeat or OpenAIS ● Cluster Glue
    • Pacemaker Architecture
      ● stonithd: the Heartbeat fencing subsystem.
      ● lrmd: Local Resource Management Daemon. Interacts directly with resource agents (scripts).
      ● pengine: Policy Engine. Computes the next state of the cluster based on the current state and the configuration.
      ● cib: Cluster Information Base. Contains definitions of all cluster options, nodes, resources, their relationships to one another and current status. Synchronizes updates to all cluster nodes.
      ● crmd: Cluster Resource Management Daemon. Largely a message broker for the PEngine and LRM, it also elects a leader to coordinate the activities of the cluster.
      ● openais: messaging and membership layer.
      ● heartbeat: messaging layer, an alternative to OpenAIS.
      ● ccm: short for Consensus Cluster Membership. The Heartbeat membership layer.
    • CRM configure ● Cluster Resource Manager ● Keeps nodes in sync ● XML based ● cibadmin ● CLI manageable ● crm
      property $id="cib-bootstrap-options" \
              stonith-enabled="FALSE" \
              no-quorum-policy="ignore" \
              start-failure-is-fatal="FALSE"
      rsc_defaults $id="rsc_defaults-options" \
              migration-threshold="1" \
              failure-timeout="1"
      primitive d_mysql ocf:local:mysql \
              op monitor interval="30s" \
              params test_user="sure" test_passwd="illtell" test_table="test.table"
      primitive ip_db ocf:heartbeat:IPaddr2 \
              params ip="172.17.4.202" nic="bond0" \
              op monitor interval="10s"
      group svc_db d_mysql ip_db
      commit
    • Heartbeat Resources ● LSB ● Heartbeat resource (+status) ● OCF (Open Cluster Framework) (+monitor) ● Clones (don't use in HAv2) ● Multi State Resources
    • LSB Resource Agents ● LSB == Linux Standards Base ● LSB resource agents are standard System V-style init scripts commonly used on Linux and other UNIX-like OSes ● LSB init scripts are stored under /etc/init.d/ ● This enables Linux-HA to immediately support nearly every service that comes with your system, and most packages which come with their own init script ● It's straightforward to change an LSB script to an OCF script
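      For example (not from the slides), a stock init script can be wrapped as a cluster resource straight from the crm shell; the resource name and init script are hypothetical:

        # crm(live)configure#
        primitive p_httpd lsb:httpd \
                op monitor interval="30s" timeout="20s"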
    • OCF ● OCF == Open Cluster Framework ● OCF Resource Agents are the most powerful type of resource agent we support ● OCF RAs are extended init scripts • They have additional actions: • monitor – for monitoring resource health • meta-data – for providing information about the RA ● OCF RAs are located in /usr/lib/ocf/resource.d/provider-name/
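      Because an OCF RA is just a script that reads OCF_RESKEY_* environment variables and takes the action as its first argument, it can be exercised by hand; a rough sketch using the IPaddr2 agent (parameter values are made up):

        export OCF_ROOT=/usr/lib/ocf
        export OCF_RESKEY_ip=172.17.4.202
        export OCF_RESKEY_nic=bond0
        /usr/lib/ocf/resource.d/heartbeat/IPaddr2 monitor
        echo $?    # 0 = running (OCF_SUCCESS), 7 = not running (OCF_NOT_RUNNING)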
    • Monitoring ● Defined in the OCF resource script ● Configured in the parameters ● Tomcat: • Checks a configurable health page ● MySQL: • Checks a query against a configurable table ● Others: • Basic process state
    • Anatomy of a Cluster config • Cluster properties • Resource defaults • Primitive definitions • Resource groups and constraints
    • Cluster Properties
      property $id="cib-bootstrap-options" \
              stonith-enabled="FALSE" \
              no-quorum-policy="ignore" \
              start-failure-is-fatal="FALSE" \
              pe-error-series-max="9" \
              pe-warn-series-max="9" \
              pe-input-series-max="9"
      no-quorum-policy: we ignore the loss of quorum, as this is a 2-node cluster.
      pe-*: restrict logging.
      start-failure-is-fatal: when set to FALSE, the cluster will instead use the resource's failcount and value for resource-failure-stickiness.
    • Resource Defaults
      rsc_defaults $id="rsc_defaults-options" \
              migration-threshold="1" \
              failure-timeout="1" \
              resource-stickiness="INFINITY"
      failure-timeout means that after a failure there will be a 60 second timeout before the resource can come back to the node on which it failed.
      migration-threshold=1 means that after 1 failure the resource will try to start on the other node.
      resource-stickiness=INFINITY means that the resource really wants to stay where it is now.
    • Primitive Definitions
      primitive d_mine ocf:custom:tomcat \
              params instance_name="mine" monitor_urls="health.html" monitor_use_ssl="no" \
              op monitor interval="15s" on-fail="restart" timeout="30s"
      primitive ip_mine_svc ocf:heartbeat:IPaddr2 \
              params ip="10.8.4.131" cidr_netmask="16" nic="bond0" \
              op monitor interval="10s"
    • Parsing a config ● Isn't always done correctly ● Even a verify won't find all issues ● Unexpected behaviour might occur
    • Where a resource runs
      • Multi state resources • Master – Slave • e.g. MySQL master-slave, DRBD
      • Clones • Resources that can run on multiple nodes, e.g. • Multimaster MySQL servers • MySQL slaves • Stateless applications
      • location • Preferred location to run a resource, e.g. based on hostname
      • colocation • Resources that have to live together • e.g. IP address + service
      • order • Define what resource has to start first, or wait for another resource
      • groups • Colocation + order
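      A small sketch of the constraint types listed above, with made-up resource and node names: a location preference for node-a, a colocation keeping the IP with the service, an order starting the IP first, and a group as the shorthand for the last two.

        location loc_web_on_a d_web 100: node-a
        colocation col_ip_with_web inf: ip_web d_web
        order ord_ip_before_web inf: ip_web d_web
        group grp_web ip_web d_web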
    • A Tomcat app on DRBD ● DRBD can only be active on 1 node ● The filesystem needs to be mounted on that active DRBD node
    • Resource Groups and Constraints
      group svc_mine d_mine ip_mine
      ms ms_drbd_storage drbd_storage \
              meta master_max="1" master_node_max="1" clone_max="2" clone_node_max="1" notify="true"
      location cli-prefer-svc_db svc_db \
              rule $id="cli-prefer-rule-svc_db" inf: #uname eq db-a
      colocation fs_on_drbd inf: svc_mine ms_drbd_storage:Master
      order fs_after_drbd inf: ms_drbd_storage:promote svc_mine:start
    • Using crm ● crm configure ● Edit primitive ● Verify ● Commit
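      In practice that workflow looks roughly like this inside the crm shell, reusing the d_mysql primitive from the earlier configure slide:

        crm configure
        crm(live)configure# edit d_mysql     (opens the primitive in $EDITOR)
        crm(live)configure# verify           (checks the pending configuration)
        crm(live)configure# commit           (pushes the change to the cluster)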
    • crm commands
      crm                    Start the cluster resource manager shell
      crm resource           Change into resource mode
      crm configure          Change into configure mode
      crm configure show     Show the current resource config
      crm resource show      Show the current resource state
      cibadmin -Q            Dump the full Cluster Information Base in XML
    • But we love XML ● cibadmin -Q
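      For the XML-inclined, the CIB can be dumped to a file, edited, and pushed back with cibadmin; a sketch (the file path is arbitrary):

        cibadmin -Q > /tmp/cib.xml
        vi /tmp/cib.xml
        cibadmin --replace --xml-file /tmp/cib.xml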
    • Checking the Cluster State
      crm_mon -1
      ============
      Last updated: Wed Nov 4 16:44:26 2009
      Stack: Heartbeat
      Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
      Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
      2 Nodes configured, unknown expected votes
      2 Resources configured.
      ============
      Online: [ xms-1 xms-2 ]
      Resource Group: svc_mysql
          d_mysql (ocf::ntc:mysql): Started xms-1
          ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
      Resource Group: svc_XMS
          d_XMS (ocf::ntc:XMS): Started xms-2
          ip_XMS (ocf::heartbeat:IPaddr2): Started xms-2
          ip_XMS_public (ocf::heartbeat:IPaddr2): Started xms-2
    • Stopping a resource
      crm resource stop svc_XMS
      crm_mon -1
      ============
      Last updated: Wed Nov 4 16:56:05 2009
      Stack: Heartbeat
      Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
      Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
      2 Nodes configured, unknown expected votes
      2 Resources configured.
      ============
      Online: [ xms-1 xms-2 ]
      Resource Group: svc_mysql
          d_mysql (ocf::ntc:mysql): Started xms-1
          ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
    • Starting a resource
      crm resource start svc_XMS
      crm_mon -1
      ============
      Last updated: Wed Nov 4 17:04:56 2009
      Stack: Heartbeat
      Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
      Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
      2 Nodes configured, unknown expected votes
      2 Resources configured.
      ============
      Online: [ xms-1 xms-2 ]
      Resource Group: svc_mysql
          d_mysql (ocf::ntc:mysql): Started xms-1
          ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
      Resource Group: svc_XMS
    • Moving a resource ● resource migrate ● Is permanent, even upon failure ● Useful in upgrade scenarios ● Use resource unmigrate to restore
    • Moving a resource
      [xpoll-root@XMS-1 ~]# crm resource migrate svc_XMS xms-1
      [xpoll-root@XMS-1 ~]# crm_mon -1
      Last updated: Wed Nov 4 17:32:50 2009
      Stack: Heartbeat
      Current DC: xms-1 (c2c581f8-4edc-1de0-a959-91d246ac80f5) - partition with quorum
      Version: 1.0.5-462f1569a43740667daf7b0f6b521742e9eb8fa7
      2 Nodes configured, unknown expected votes
      2 Resources configured.
      Online: [ xms-1 xms-2 ]
      Resource Group: svc_mysql
          d_mysql (ocf::ntc:mysql): Started xms-1
          ip_mysql (ocf::heartbeat:IPaddr2): Started xms-1
      Resource Group: svc_XMS
          d_XMS (ocf::ntc:XMS): Started xms-1
          ip_XMS (ocf::heartbeat:IPaddr2): Started xms-1
          ip_XMS_public (ocf::heartbeat:IPaddr2): Started xms-1
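      Once the maintenance is finished, the location constraint that migrate created can be removed again (a sketch, not from the slides):

        [xpoll-root@XMS-1 ~]# crm resource unmigrate svc_XMS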
    • Putting a node in Standby
      [menos-val3-root@mss-1031a ~]# crm node standby
      [menos-val3-root@mss-1031a ~]# crm_mon -1
      ============
      Last updated: Wed Dec 22 14:33:45 2010
      Stack: Heartbeat
      Current DC: mss-1031a (45674b38-5aad-4a7c-bbf1-562b2f244763) - partition with quorum
      Version: 1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782
      2 Nodes configured, unknown expected votes
      1 Resources configured.
      ============
      Node mss-1031b (110dc817-e2ea-4290-b275-4e6d8ca7b031): OFFLINE (standby)
      Node mss-1031a (45674b38-5aad-4a7c-bbf1-562b2f244763): standby
    • Restoring a node from standby
      [menos-val3-root@mss-1031b ~]# crm node online
      [menos-val3-root@mss-1031b ~]# crm_mon -1
      ============
      Last updated: Thu Dec 23 08:36:21 2010
      Stack: Heartbeat
      Current DC: mss-1031b (110dc817-e2ea-4290-b275-4e6d8ca7b031) - partition with quorum
      Version: 1.0.7-d3fa20fc76c7947d6de66db7e52526dc6bd7d782
      2 Nodes configured, unknown expected votes
      1 Resources configured.
      ============
      Online: [ mss-1031b mss-1031a ]
    • Migrate vs Standby ● Think of clusters with more than 2 nodes ● Migrate: send the resource to node X • Only use that node ● Standby: do not send resources to node X • But use the other available ones
    • Debugging ● Check crm_mon -f ● Failcounts? ● Did the application launch correctly? ● /var/log/messages • Warning: very verbose ● Tomcat logs
    • Resource not running
      [menos-val3-root@mrs-a ~]# crm
      crm(live)# resource
      crm(live)resource# show
      Resource Group: svc-MRS
          d_MRS (ocf::ntc:tomcat) Stopped
          ip_MRS_svc (ocf::heartbeat:IPaddr2) Stopped
          ip_MRS_usr (ocf::heartbeat:IPaddr2) Stopped
    • Resource Failcount
      [menos-val3-root@mrs-a ~]# crm
      crm(live)# resource
      crm(live)resource# failcount d_MRS show mrs-a
      scope=status name=fail-count-d_MRS value=1
      crm(live)resource# failcount d_MRS delete mrs-a
      crm(live)resource# failcount d_MRS show mrs-a
      scope=status name=fail-count-d_MRS value=0
    • Pacemaker and Puppet ● Plenty of non-usable modules around • HAv1 ● https://github.com/rodjek/puppet-pacemaker.git • Strict set of ops / parameters ● Make sure your modules don't enable resources ● I've been using templates to populate the config ● cibadmin to configure ● crm is complex; even crm doesn't parse correctly yet ● Plenty of work ahead!
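      A sketch of that template-plus-cibadmin approach: Puppet renders an XML fragment from a template and cibadmin loads it into the resources section of the CIB (the file path is illustrative and option spelling may vary per Pacemaker version):

        cibadmin --replace --obj_type resources --xml-file /etc/cluster/resources.xml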
    • Getting Help ● http://clusterlabs.org ● #linux-ha on irc.freenode.org ● http://www.drbd.org/users-guide/
    • Contact
      Kris Buytaert
      Kris.Buytaert@inuits.be
      @krisbuytaert
      Further Reading:
      http://www.krisbuytaert.be/blog/
      http://www.inuits.be/
      http://www.virtualization.com/
      http://www.oreillygmt.com/
      Inuits, 't Hemeltje, Gemeentepark 2, 2930 Brasschaat, 891.514.231, +32 473 441 636
      Esquimaux, Kheops Business Center, Avenue Georges Lemaître 54, 6041 Gosselies, 889.780.406