Supporting SUSE® Linux Enterprise High Availability Extension 11
Support and Trouble-shooting




Lars Marowsky-Brée
Architect Storage and High-Availability
lmb@novell.com
Agenda

       Introduction

       Summary of Cluster Architecture

       Common Configuration Issues

       Gathering Cluster-wide Support Information

       Exploring Effects of Cluster Events

       Self-written Resource Agents

       Understanding Log Files
2   © Novell, Inc. All rights reserved.
Introduction
SUSE® Linux Enterprise
    Family
    •   SUSE Linux Enterprise Server

    •   SUSE Linux Enterprise Desktop

    •   SUSE Linux Enterprise Point of Service

    •   Extensions
         –   SUSE Linux Enterprise Real Time

         –   SUSE Linux Enterprise High Availability Extension

         –   SUSE Linux Enterprise Mono Extension


Data Center Challenges

                  Minimize unplanned downtime

                  Ensure quality of service

                  Contain costs

                  Utilize resources

                  Effectively manage multiple vendors

                  Minimize risk

SUSE® Linux Enterprise High Availability Extension
    Value Proposition
    •   An integrated suite of robust open source clustering
        technologies that implement highly available
        physical and virtual services on Linux.

    •   Used with SUSE Linux Enterprise Server, it helps to
        maintain business continuity, protect data, and
        reduce unplanned downtime for all mission critical
        Linux workloads.

    •   Used with virtualization, it adds workload based
        availability and reliability.


SUSE® Linux Enterprise High Availability Extension
    Benefits

                 Meet service-level agreements


                 Continuous access to systems and data


                 Maintain data integrity


                 Scale-out infrastructure


SUSE® Linux Enterprise High Availability Extension
    Key Features
    •   Service Availability 24/7
         –   Policy driven clustering
              >   OpenAIS messaging and membership layer
              >   Pacemaker cluster resource manager
    •   Sharing and Scaling Data-access by Multiple Nodes
         –   Cluster file system
              >   OCFS2
              >   Clustered logical volume manager
    •   Disaster Tolerance
         –   Data replication via IP
              >   Distributed replicated block device
    •   Scale Network Services
         –   IP load-balancing
    •   User-friendly Tools
         –   Graphical user interface
         –   Unified command line interface

SUSE® Linux Enterprise High Availability Extension
    HA Stack from 10 to 11
     Part of SLES 10    Added in SLE HA 11    Added in SLE HA 11 SP1
     Heartbeat          OCFS2 general FS      Metro-Area Cluster
     DRBD 0.7           Unified CLI           Storage Quorum Coverage
     Yast2-HB           Pacemaker             Samba Cluster
     OCFS2 / EVMS2      openAIS               Enhanced Data Replication
                        HA GUI                Cluster Config Synchronization
                        Yast2-DRBD            Node Recovery
                        Yast2-Multipath       Web GUI
SUSE® Linux Enterprise High Availability Extension
     Key Features in Service Pack 1
     •   Web GUI – Cross platform management
     •   Storage Based Quorum Coverage – Storage device as
         a quorum instance
     •   Integrated Samba Clustering – Integration of Samba with
         OCFS2 for higher throughput and scale out
     •   Metro-Area Clusters – Clustering between different data
         center locations
     •   Cluster-concurrent RAID1 – Improved resilience
     •   Enhanced Data Replication – DRBD with Linbit cooperation
     •   Node Recovery – ReaR to recover server nodes
     •   GFS2 Migration Support – Read-only access to GFS2
         for migration

SUSE® Linux Enterprise High Availability Extension

     Pricing

          –   x86 and x86_64

               >   USD 699 per year per server

               >   Support level inherited from base SUSE Linux
                   Enterprise Server

          –   Power, Itanium, System z
               >   Bundled with SUSE Linux Enterprise Server

               >   Support level inherited from base SUSE Linux
                   Enterprise Server
SUSE® Linux Enterprise High Availability Extension
     Promotion
     Existing Customers

          –   Free of charge subscription

               >   For all valid SUSE Linux Enterprise Server subscriptions

               >   Effective date: June 1st 2009

               >   Valid for subsequent subscription periods if base SUSE
                   Linux Enterprise Server is renewed on time



SUSE® Linux Enterprise High Availability Extension
     Competitive Landscape
                                             HP     IBM    Veritas  MSFT     Steeleye    RHAT       Novell  Novell     Novell
                                             HP-SG  HACMP  VCS      Cluster  Lifekeeper  Ad. Plat.  SLES10  SLE HA 11  SLE HA 11 SP1

     Upper Node Limit                        ↔      ↓      ↔        ↔        ↔           ↑          ↔       ↔          ↑
     Network Load-Balancing                  ↓      ↓      ↓        ↓        ↓           ↔          ↔       ↔          ↑
     System Recovery                         ↓      ↓      ↓        ↓        ↓           ↓          ↓       ↓          ↑
     Disk Mirroring                          ↔      ↓      ↓        ↓        ↓           ↑          ↔       ↑          ↑
     Platform Support                        ↔      ↓      ↑        ↓        ↑           ↔          ↔       ↔          ↔
     HW Support                              ↔      ↓      ↑        ↑        ↑           ↑          ↑       ↑          ↑
     Storage Support                         ↔      ↓      ↔        ↔        ↔           ↑          ↑       ↑          ↑
     ISV Support                             ↔      ↓      ↑        ↑        ↔           ↔          ↔       ↔          ↔
     Setup, Installation and Configuration   ↔      ↔      ↑        ↑        ↔           ↔          ↓       ↔          ↑
     GUI                                     ↔      ↔      ↑        ↑        ↑           ↔          ↓       ↔          ↑
     Command line                            ↑      ↔      ↑        ↓        ↔           ↔          ↔       ↑          ↑
     Monitoring                              ↔      ↔      ↑        ↑        ↑           ↔          ↔       ↑          ↑
     Documentation                           ↑      ↑      ↑        ↑        ↔           ↓          ↓       ↔          ↔

                                      Area with enhancements in SP1
SUSE® Linux Enterprise High Availability Extension
     Customer Examples
     DFS Deutsche Flugsicherung - government-owned German Air Traffic Control

     Ensures the availability of critical air traffic control services by implementing a fail-over
     solution using clusters of SUSE Linux Enterprise Servers.

     Getronics - the largest provider of IT services in the Netherlands

     Implemented a cost-effective high availability solution for a web-based customer
     information system supporting two million customers using SUSE Linux Enterprise
     Server, SAP, Oracle Real Application Clusters, and IBM System x3850 hardware.
     When the solution detects a failure in one node, it seamlessly recovers all running
     processes on the remaining node in its cluster.

     La Curacao - one of the top 100 electronics and appliance retailers in the U.S.,
     focusing on the Hispanic market

     Implemented SUSE Linux Enterprise Server in a clustered environment on HP
     ProLiant servers to run its mission critical databases and keep La Curacao's
     stores running without interruption.

     Unitop - one of the largest producers of anionic surfactant chemicals in India

     Implemented a certified high availability SAP ERP solution, using SUSE Linux
     Enterprise Server, IBM System x hardware, IBM DB2 information management
     software, and SAP, for all its business activities and information.

Cluster Architecture
3 Node Cluster Overview

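     [Figure: three cluster nodes connected through network links to clients and to shared storage.
      Each node runs its own kernel; Corosync + openAIS, Pacemaker, DLM and cLVM2+OCFS2 span all nodes.
      Example workloads on top: Xen VM 1, Xen VM 2, and a LAMP stack (Apache, IP, ext3).]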

Detailed View of Components
     Per Node:

     [Figure: layered component diagram. Labels recoverable from the slide, roughly top to bottom:
      - Tools and modules: Web GUI, Python GUI, CRM Shell, YaST2, LVS, DRBD, MPIO, OpenAIS
      - Resource Agents: SAP, MySQL, libvirt, Xen, Apache, iSCSI, Filesystems, IP address, DRBD, LSB init, ...
      - STONITH: DRAC, iLO, SBD
      - Pacemaker: LRM, Fencing, CIB, Policy Engine
      - OpenAIS, with clvmd, ocfs2_controld, dlm_controld
      - Linux Kernel: ext3, XFS, OCFS2, cLVM2, DLM, DRBD, Multipath IO, SCTP, TCP, UDP multicast, Bonding
      - Hardware: Local Disks, SAN (FC(oE), iSCSI), Ethernet, Infiniband]

Why Is This Talk Necessary?

     We heard comments:

     •   Can't you just make the software stack easy
         to understand?

     •   Why is a multi-node setup more complicated than a
         single node?

     •   Gosh, this is awfully complicated! Why is this stuff so
         powerful? I don't need those other features!

     This session addresses most of these questions.


Design and Architecture Considerations
General Considerations

     •   Consider the support level requirements of your
         mission-critical systems.
     •   Your staff is your key asset!
          –   Invest in training, processes, knowledge sharing.
          –   A good administrator will provide higher availability than a
              mediocre cluster setup.
     •   Get expert help for the initial setup.
     •   Write concise operation manuals that make sense at
         3am on a Saturday ;-)
     •   Thoroughly test the cluster regularly.
          –   Use a staging system before deploying large changes!

Manage Expectations Properly

     •   Clustering improves reliability, but does not
         achieve 100%, ever.

     •   Clusters are more complex than single nodes.

     •   Fail-over clusters reduce service outage, but do
         not eliminate it.

     •   Clustering broken applications will not fix them.

     •   Clusters do not replace backups, RAID, or
         good hardware.


Complexity Versus Reliability

     •   Every component has a failure probability.
          –   Good complexity: Redundant components.
          –   Undesirable complexity: chained components.
          –   Choke point → single point of failure
          –   Also consider: Administrative complexity.
     •   Use as few components (features) as feasible.
          –   Our extensive feature list is not a mandatory checklist for
              your deployment ;-)
     •   What is your fall-back in case the cluster breaks?
          –   Backups, non-clustered operation
          –   Architect your system so that this is feasible!
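
     The difference between redundant and chained components can be made concrete with basic probability. The following is a minimal sketch, assuming independent failures; the 99% availability per component is an illustrative figure, not a measurement from any real system.

```python
from math import prod

def chained(availabilities):
    """All components must work: availabilities multiply."""
    return prod(availabilities)

def redundant(availabilities):
    """At least one component must work: failure probabilities multiply."""
    return 1 - prod(1 - a for a in availabilities)

# Illustrative assumption: each component is up 99% of the time.
a = 0.99
print(f"single component : {a:.6f}")
print(f"3 chained        : {chained([a, a, a]):.6f}")
print(f"2 redundant      : {redundant([a, a]):.6f}")
```

     Under these assumptions, every component added to a chain lowers the overall availability, while every redundant component raises it.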

Cluster Size Considerations

     •   More nodes:
          –   Increased absolute redundancy and capacity.
          –   Decreased relative redundancy.
          –   One cluster → one failure domain.
     •   Does your work-load scale well to larger node counts?
     •   Choose odd node counts.
          –   4-node and 3-node clusters both lose majority once 2 nodes have failed.
     •   Question:
          –   5 cheaper servers, or
          –   3 higher quality servers with more capacity each?
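
     The odd-node-count advice follows from majority arithmetic. The sketch below assumes plain majority quorum with one vote per node; the function names are made up for this illustration.

```python
def quorum(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1

def tolerated_failures(nodes: int) -> int:
    """How many nodes may fail while the survivors still hold a majority."""
    return nodes - quorum(nodes)

for n in range(2, 8):
    print(f"{n} nodes: quorum {quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

     Going from 3 to 4 nodes adds capacity but not failure tolerance; the next gain in tolerated failures comes with the fifth node.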


Common Setup Issues
General Software Stack

     •   Please avoid chasing already solved problems!

     •   Please apply all available software updates:
          –   SUSE® Linux Enterprise Server 11
          –   SUSE Linux Enterprise High Availability Extension


     •   Consider migrating to SUSE Linux Enterprise High
         Availability Extension, if you have not already.
          –   Usability, ease of setup, integration are all much improved.
          –   SUSE Linux Enterprise Server 10 remains fully supported.


From One to Many Nodes

     •   Error: Configuration files not identical across nodes.
          –   /etc/drbd.conf, /etc/corosync/corosync.conf, /etc/ais/openais.conf,
              resource-specific configurations ...
     •   Symptoms: Weird misbehavior, works on one system but
         not on others, interoperability issues, and
         possibly others.
     •   Solution: Make sure they are synchronized.
          –   SUSE® Linux Enterprise High Availability Extension 11 SP1
              provides “csync2” to do this automatically for you.
               >   You can add your own files to this list as needed.
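
     Whether a file really is identical everywhere can be checked by comparing checksums. The following is a minimal sketch and not how csync2 works internally; it assumes the file contents have already been collected from each node, and the node names and sample contents are hypothetical.

```python
import hashlib
from collections import defaultdict

def group_by_digest(copies):
    """Map SHA-256 digest -> sorted list of nodes holding that exact content.

    `copies` maps a node name to the bytes of one configuration file as
    read on that node. How the bytes are collected is left to the caller.
    """
    groups = defaultdict(list)
    for node, content in copies.items():
        groups[hashlib.sha256(content).hexdigest()].append(node)
    return {digest: sorted(nodes) for digest, nodes in groups.items()}

def is_synchronized(copies):
    """True if every node holds byte-identical content."""
    return len(group_by_digest(copies)) <= 1

# Hypothetical sample data: node3 carries a diverging copy.
sample = {
    "node1": b"bindnetaddr: 192.168.1.0\n",
    "node2": b"bindnetaddr: 192.168.1.0\n",
    "node3": b"bindnetaddr: 192.168.2.0\n",
}
for digest, nodes in group_by_digest(sample).items():
    print(digest[:12], nodes)
print("synchronized:", is_synchronized(sample))
```

     More than one digest group means at least one node has drifted; the smaller group is usually the place to start looking.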



Networking

     •   Switches must support multicast properly.

     •   Bonding is preferable to using multiple rings:
          –   Reduces complexity

          –   Exposes redundancy to all cluster services and clients

     •   Firewall rules are not your friend.

     •   Keep firmware on switches up to date!

     •   Make NIC names identical on all nodes.

Fencing (STONITH)

     •   Error: Not configuring STONITH at all
          –   STONITH defaults to enabled, so resource start-up will block and
              the cluster will simply do nothing. This is for your own protection.
     •   Warning: Disabling STONITH
          –   DLM/OCFS2 will block forever waiting for a fence that is never
              going to happen.
     •   Error: Using “external/ssh”, “ssh”, “null” in production
          –   These plug-ins are for testing. They will not work in production!
          –   Use a “real” fencing device or external/sbd
     •   Error: Configuring several power switches in parallel.
     •   Error: Trying to use external/sbd on DRBD.

CIB Configuration Issues

     •   2 node clusters cannot have majority with 1 node failed
          –   # crm configure property no-quorum-policy=ignore
     •   Resources are starting up in “random” order or on
         “wrong” nodes
          –   Add required constraints! (We'll get back to that ...)
     •   Resources move around when
         something “unrelated” changes
          –   # crm configure property default-resource-stickiness=1000
     •   # crm_verify -L ; ptest -L -VVVV
          –   Will point out some basic issues
Configuring Cluster Resources

     •   Symptom: On start of one or more nodes, the cluster
         restarts resources!

     •   Cause: resources under cluster control are also started
         via the “init” sequence.
          –   The cluster “probes” all resources on start-up on a node, and
              when it finds resources active where they should not be –
              possibly even more than once in the cluster –, the recovery
              protocol is to stop them all (including all dependencies) and
              start them cleanly again.

     •   Solution: Remove them from the “init” sequence.

Setting Resource Time-outs

     •   Belief: “Shorter time-outs make the cluster
         respond faster.”
     •   Fact: Too short time-outs cause resource operations
         to “fail” erroneously, making the cluster unstable
         and unpredictable.
          –   A somewhat too long time-out will cause a fail-over delay;
          –   a slightly too short time-out will cause an unnecessary
              service outage.
     •   Consider that a loaded cluster node may be slower
         than during deployment testing.
          –   Check “crm_mon -t1” output for the actual run-times
              of resources.
OCFS2

     •   Using ocfs2-tools-o2cb (legacy mode)
          –   O2CB only works with Oracle RAC; full features of SUSE® Linux
              Enterprise High Availability Extension are only available in
              combination with Pacemaker
          –   # zypper rm ocfs2-tools-o2cb
          –   Forget about /etc/ocfs2/cluster.conf, /etc/init.d/ocfs2, /etc/init.d/o2cb
              and /etc/sysconfig/ocfs2
     •   Nodes crash on shutdown
          –   If you have active ocfs2 mounts, you need to umount before shutdown
          –   If openais is part of the boot sequence
               >   # insserv openais
     •   Consider: Do you really need OCFS2?
          –   Can your application really run concurrently?



Distributed Replicated Block Device

     •   Myth: DRBD has no shared state, thus no STONITH is needed.
          –   Fact: DRBD still needs fencing!
     •   Active/Active:
          –   Does not magically make ext3 or applications
              concurrency-safe, still can only be mounted once
          –   With OCFS2, split-brain is still fatal, as data diverges!
     •   Active/Passive:
          –   Ensures only one side can modify data, added protection.
          –   Does not magically make applications crash-safe.
     •   Issue: Replication traffic during reads.
          –   Reads update access times, which are replicated as writes;
              mount with “noatime” to avoid this.

Storage in General

     •   Activating non-battery backed caches for performance

          –   Causes data corruption.

     •   iSCSI over unreliable networks.

     •   Lack of multipath for storage.

     •   Believing that RAID replaces backups.
          –   RAID and replication immediately propagate logical errors!

     •   Please ensure that device names are identical on
         all nodes.
Exploring the Effect of Events
What Are Events?

     •   All state changes to the cluster are events
          –   They cause an update of the CIB
          –   Configuration changes by the administrator
          –   Nodes going up or going down
          –   Resource monitoring failures
     •   Response to events is configured using the CIB
         policies and computed by the Policy Engine
     •   This can be simulated using ptest
          –   Available comfortably through the “crm” shell


Simulating Node Failure

     hex-0:~ # crm

     crm(live)# cib new sandbox

     INFO: sandbox shadow CIB created

     crm(sandbox)# cib cibstatus node hex-0 unclean

     crm(sandbox)# ptest




Simulating Resource Failure

     crm(sandbox)# cib cibstatus load live
     crm(sandbox)# cib cibstatus op
     usage: op <operation> <resource> <exit_code> [<op_status>] [<node>]
     crm(sandbox)# cib cibstatus op start dummy1 not_running done hex-0
     crm(sandbox)# cib cibstatus op start dummy1 unknown timeout hex-0
     crm(sandbox)# configure ptest
     ptest[4918]: 2010/02/17_12:44:17 WARN: unpack_rsc_op: Processing failed op dummy1_start_0 on hex-0: unknown error (1)



Exploring Configuration Changes

     crm(sandbox)# cib cibstatus load live

     crm(sandbox)# configure primitive dummy42 ocf:heartbeat:Dummy

     crm(sandbox)# ptest




Configuration Changes - Woah!




Exploring Configuration Changes


     crm(sandbox)# configure rsc_defaults resource-stickiness=1000
     crm(sandbox)# ptest
     crm(sandbox)# configure order order-42 inf: dummy42 dummy1
     crm(sandbox)# ptest




Configuration Changes – Almost ...




Configuration Changes - Done




Log Files and Their Meaning
hb_report Is the Silver Support Bullet

     •   Compiles
          –   Cluster-wide log files,
          –   Package state,
          –   DLM/OCFS2 state,
          –   System information,
          –   CIB history,
          –   Parsed core dump reports,
         into a single tarball for all support needs.
     •   # hb_report -n "node1 node2 node3" -f 12:00 /tmp/hb_report_example1

Logging

     •   “The cluster generates too many log messages!”
          –   Alas, customers are even more upset when asked to reproduce
              a problem on their production system ;-)

          –   Incidentally, all command line invocations are logged.

     •   System-wide logs: /var/log/messages
     •   CIB history: /var/lib/pengine/*
          –   All cluster events are logged here and can be analyzed with
              hindsight (python GUI, ptest, and the crm shell).



Where Is the Real Cause?


         The answer is always in the logs.


         Even though the logs on the DC may print a reference
         to the error, the real cause may be on another node.


         Most errors are caused by resource agent
         misconfiguration.



Correlating Messages to Their Cause

     •   Feb 17 13:06:57 hex-8 pengine: [7717]: WARN: unpack_rsc_op: Processing failed op ocfs2-1:2_monitor_20000 on hex-0: not running (7)
          –   This is not the failure, just the Policy Engine reporting on the
              CIB state! The real messages are on hex-0, grep for the
              operation key:
     •   Feb 17 13:06:57 hex-0 Filesystem[24825]: [24861]: INFO: /filer is unmounted (stopped)
     •   Feb 17 13:06:57 hex-0 crmd: [7334]: info: process_lrm_event: LRM operation ocfs2-1:2_monitor_20000 (call=37, rc=7, cib-update=55, confirmed=false) not running
          –   Look for the error messages from the resource agent before the
              lrmd/pengine lines!
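
     The correlation step can be automated in a few lines. The following is a minimal sketch, assuming the classic syslog layout shown above ("Mon DD HH:MM:SS host tag: message") and a merged, cluster-wide log; the helper names are made up for this example, and the sample lines are the ones from this slide.

```python
import re

# Matches the Policy Engine warning and captures operation key and node.
FAILED_OP = re.compile(r"Processing failed op (\S+) on (\S+):")

def failed_ops(lines):
    """Yield (operation_key, node) pairs reported by the Policy Engine."""
    for line in lines:
        match = FAILED_OP.search(line)
        if match:
            yield match.group(1), match.group(2)

def lines_from_node(lines, node):
    """Return the lines whose syslog host field (4th word) equals `node`."""
    return [line for line in lines if line.split()[3:4] == [node]]

log = [
    "Feb 17 13:06:57 hex-0 Filesystem[24825]: [24861]: INFO: /filer is unmounted (stopped)",
    "Feb 17 13:06:57 hex-0 crmd: [7334]: info: process_lrm_event: LRM operation ocfs2-1:2_monitor_20000 (call=37, rc=7, cib-update=55, confirmed=false) not running",
    "Feb 17 13:06:57 hex-8 pengine: [7717]: WARN: unpack_rsc_op: Processing failed op ocfs2-1:2_monitor_20000 on hex-0: not running (7)",
]

for key, node in failed_ops(log):
    print(f"{key} failed on {node}; messages from that node:")
    for line in lines_from_node(log, node):
        print("   ", line)
```

     The resource agent's own message (here from Filesystem) appears among the lines of the affected node, which is where the actual cause is reported.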

Debugging Resource Agents
Common Resource Agent Issues

     •   Operations must succeed if the resource is already
         in the requested state.
     •   “monitor” must distinguish between at least
         “running/OK”, “running/failed”, and “stopped”
          –   Probes deserve special attention
     •   Meta-data must conform to DTD.
     •   3rd party resource agents do not belong under
         /usr/lib/ocf/resource.d/heartbeat – choose your own
         provider name!
     •   Use ocf-tester to validate your resource agent.
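
     The first two rules can be illustrated with a toy model. This is a sketch only: real resource agents are usually shell scripts called with the action as an argument, and the `ToyAgent` class is a hypothetical in-memory stand-in. The exit code values 0, 1 and 7 are the standard OCF codes for success, generic error and "not running".

```python
OCF_SUCCESS = 0       # action succeeded / resource is running fine
OCF_ERR_GENERIC = 1   # resource is running but failed, or the action failed
OCF_NOT_RUNNING = 7   # resource is cleanly stopped

class ToyAgent:
    """In-memory stand-in for a resource agent (hypothetical, for illustration)."""

    def __init__(self):
        self.running = False
        self.healthy = True

    def start(self):
        # Starting an already running resource must still succeed.
        self.running = True
        self.healthy = True
        return OCF_SUCCESS

    def stop(self):
        # Stopping an already stopped resource must still succeed.
        self.running = False
        return OCF_SUCCESS

    def monitor(self):
        # Distinguish "stopped", "running/OK" and "running/failed".
        if not self.running:
            return OCF_NOT_RUNNING
        return OCF_SUCCESS if self.healthy else OCF_ERR_GENERIC

agent = ToyAgent()
print("probe on stopped resource:", agent.monitor())
print("stop when stopped        :", agent.stop())
print("start, then start again  :", agent.start(), agent.start())
agent.healthy = False
print("monitor on failed service:", agent.monitor())
```

     An agent that returns 7 from "stop" on an already stopped resource violates the first rule.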

ocf-tester Example Output
     hex-0:~ # ocf-tester -n Example /usr/lib/ocf/resource.d/bs2010/Dummy

     Beginning tests for /usr/lib/ocf/resource.d/bs2010/Dummy...

     * Your agent does not support the notify action (optional)

     * Your agent does not support the demote action (optional)

     * Your agent does not support the promote action (optional)

     * Your agent does not support master/slave (optional)

     * rc=7: Stopping a stopped resource is required to succeed

     Tests failed: /usr/lib/ocf/resource.d/bs2010/Dummy failed 1 tests




Questions and Answers
Unpublished Work of Novell, Inc. All Rights Reserved.
This work is an unpublished work and contains confidential, proprietary, and trade secret information of Novell, Inc.
Access to this work is restricted to Novell employees who have a need to know to perform tasks within the scope
of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified,
translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of Novell, Inc.
Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability.


General Disclaimer
This document is not to be construed as a promise by any participating company to develop, deliver, or market a
product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in
making purchasing decisions. Novell, Inc. makes no representations or warranties with respect to the contents
of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any
particular purpose. The development, release, and timing of features or functionality described for Novell products
remains at the sole discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to
make changes to its content, at any time, without obligation to notify any person or entity of such revisions or
changes. All Novell marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc.
in the United States and other countries. All third-party trademarks are the property of their respective owners.

SUSE Linux Enterprise High Availability Extension 11: Support and Troubleshooting

  • 1. Supporting SUSE Linux® Enterprise High Availability Extension 11: Support and Trouble-shooting. Lars Marowsky-Brée, Architect Storage and High-Availability, lmb@novell.com
  • 2. Agenda
      • Introduction
      • Summary of Cluster Architecture
      • Common Configuration Issues
      • Gathering Cluster-wide Support Information
      • Exploring Effects of Cluster Events
      • Self-written Resource Agents
      • Understanding Log Files
  • 4. SUSE Linux® Enterprise Family
      • SUSE Linux Enterprise Server
      • SUSE Linux Enterprise Desktop
      • SUSE Linux Enterprise Point of Service
      • Extensions
          – SUSE Linux Enterprise Real Time
          – SUSE Linux High Availability Extension
          – SUSE Linux Enterprise Mono Extension
  • 5. Data Center Challenges
      • Minimize unplanned downtime
      • Ensure quality of service
      • Contain costs
      • Utilize resources
      • Effectively manage multiple vendors
      • Minimize risk
  • 6. SUSE Linux Enterprise High Availability Extension®: Value Proposition
      • An integrated suite of robust open source clustering technologies that implement highly available physical and virtual services on Linux.
      • Used with SUSE Linux Enterprise Server, it helps to maintain business continuity, protect data, and reduce unplanned downtime for all mission critical Linux workloads.
      • Used with virtualization, it adds workload based availability and reliability.
  • 7. SUSE Linux Enterprise High Availability Extension®: Benefits
      • Meet service-level agreements
      • Continuous access to systems and data
      • Maintain data integrity
      • Scale-out infrastructure
  • 8. SUSE Linux Enterprise High Availability Extension®: Key Features
      • Service Availability 24/7
          – Policy driven clustering
              > OpenAIS messaging and membership layer
              > Pacemaker cluster resource manager
      • Sharing and Scaling Data-access by Multiple Nodes
          – Cluster file system
              > OCFS2
              > Clustered logical volume manager
      • Disaster Tolerance
          – Data replication via IP
              > Distributed replicated block device
      • Scale Network Services
          – IP load-balancing
      • User-friendly Tools
          – Graphical user interface
          – Unified command line interface
  • 9. SUSE Linux Enterprise High Availability Extension®: HA Stack from 10 to 11
      [Diagram: stack components grouped as "Part of SLES 10", "Added in SLE HA 11", and "Added in SLE HA 11 SP1". Labels on the slide: Heartbeat, DRBD 0.7, Yast2-HB, OCFS2 / EVMS2, Yast2-DRBD, Yast2-Multipath, OCFS2 general FS, Unified CLI, Pacemaker, openAIS, HA GUI, Metro-Area Cluster, Storage Quorum Coverage, Samba Cluster, Enhanced Data Replication, Cluster Config Synchronization, Node Recovery, Web GUI.]
  • 10. SUSE Linux Enterprise High Availability Extension®: Key Features in Service Pack 1
      • Web GUI
          – Cross platform management
      • Storage Based Quorum Coverage
          – Storage device as a quorum instance
      • Integrated Samba Clustering
          – Integration of Samba with OCFS2 for higher throughput and scale out
      • Metro-Area Clusters
          – Clustering between different data center locations
      • Cluster-concurrent RAID1
          – Improved resilience
      • Enhanced Data Replication
          – DRBD with Linbit cooperation
      • Node Recovery
          – ReaR to recover server nodes
      • GFS2 Migration Support
          – Read-only access to GFS2 for migration
  • 11. SUSE Linux Enterprise High Availability Extension®: Pricing
      – x86 and x86_64
          > USD 699 per year per server
          > Support level inherited from base SUSE Linux Enterprise Server
      – Power, Itanium, System z
          > Bundled with SUSE Linux Enterprise Server
          > Support level inherited from base SUSE Linux Enterprise Server
  • 12. SUSE Linux Enterprise High Availability Extension®: Promotion
      Existing Customers
      – Free of charge subscription
          > For all valid SUSE Linux Enterprise Server subscriptions
          > Effective date: June 1st 2009
          > Valid for subsequent subscription periods if base SUSE Linux Enterprise Server is renewed on time
  • 13. SUSE Linux Enterprise High Availability Extension®: Competitive Landscape

                                             HP     IBM    Veritas  MSFT     Steeleye    RHAT       Novell  Novell     Novell
                                             HP-SG  HACMP  VCS      Cluster  Lifekeeper  Ad. Plat.  SLES10  SLE HA 11  SLE HA 11 SP1
      Upper Node Limit                       ↔      ↓      ↔        ↔        ↔           ↑          ↔       ↔          ↑
      Network Load-Balancing                 ↓      ↓      ↓        ↓        ↓           ↔          ↔       ↔          ↑
      System Recovery                        ↓      ↓      ↓        ↓        ↓           ↓          ↓       ↓          ↑
      Disk Mirroring                         ↔      ↓      ↓        ↓        ↓           ↑          ↔       ↑          ↑
      Platform Support                       ↔      ↓      ↑        ↓        ↑           ↔          ↔       ↔          ↔
      HW Support                             ↔      ↓      ↑        ↑        ↑           ↑          ↑       ↑          ↑
      Storage Support                        ↔      ↓      ↔        ↔        ↔           ↑          ↑       ↑          ↑
      ISV Support                            ↔      ↓      ↑        ↑        ↔           ↔          ↔       ↔          ↔
      Setup, Installation and Configuration  ↔      ↔      ↑        ↑        ↔           ↔          ↓       ↔          ↑
      GUI                                    ↔      ↔      ↑        ↑        ↑           ↔          ↓       ↔          ↑
      Command line                           ↑      ↔      ↑        ↓        ↔           ↔          ↔       ↑          ↑
      Monitoring                             ↔      ↔      ↑        ↑        ↑           ↔          ↔       ↑          ↑
      Documentation                          ↑      ↑      ↑        ↑        ↔           ↓          ↓       ↔          ↔

      Legend: area with enhancements in SP1 (highlighting not reproduced in this transcript).
  • 14. SUSE Linux Enterprise High Availability Extension®: Customer Examples
      • DFS Deutsche Flugsicherung, the government-owned German Air Traffic Control: Ensures the availability of critical air traffic control services by implementing a fail-over solution using clusters of SUSE Linux Enterprise Servers.
      • Getronics, the largest provider of IT services in the Netherlands: Implemented a cost-effective high availability solution for a web-based customer information system supporting two million customers using SUSE Linux Enterprise Server, SAP, Oracle Real Application Clusters, and IBM System x3850 hardware. When the solution detects a failure in one node, it seamlessly recovers all running processes on the remaining node in its cluster.
      • La Curacao, one of the top 100 electronics and appliance retailers in the U.S., focusing on the Hispanic market: Implemented SUSE Linux Enterprise Server in a clustered environment on HP ProLiant servers to run their mission critical databases and keep La Curacao's stores running without interruption.
      • Unitop, one of the largest producers of anionic surfactant chemicals in India: Implemented a certified high availability SAP ERP solution, using SUSE Linux Enterprise Server, IBM System x hardware, IBM DB2 information management software, and SAP, for all its business activities and information.
  • 16. 3 Node Cluster Overview
      [Diagram: three nodes, each with its own kernel, connected by network links and serving clients. Labels on the slide: LAMP (Apache, IP, ext3), Xen VM 1, Xen VM 2, cLVM2+OCFS2, DLM, Pacemaker, Corosync + openAIS, Storage.]
  • 17. Detailed View of Components Per Node
      [Diagram: per-node component stack, from management tools and resource agents through Pacemaker and OpenAIS down to the Linux kernel, storage, and network. Labels on the slide: Web GUI, Python GUI, CRM Shell, YaST2; Resource Agents (SAP, MySQL, Apache, Xen, libvirt, LSB init, LVS, DRBD, iSCSI, Filesystems, IP address, ...); STONITH (DRAC, iLO, SBD); Pacemaker (CIB, Policy Engine, LRM, Fencing); OpenAIS, clvmd, Ocfs2_controld, dlm_controld; Linux Kernel (ext3, XFS, OCFS2, DLM, cLVM2, DRBD, Multipath IO, Bonding, UDP multicast, SCTP, TCP); Local Disks, SAN (FC(oE), iSCSI), Ethernet, Infiniband.]
  • 18. Why Is This Talk Necessary?
      We heard comments:
      • Can't you just make the software stack easy to understand?
      • Why is a multi-node setup more complicated than a single node?
      • Gosh, this is awfully complicated! Why is this stuff so powerful? I don't need those other features!
      This session addresses most of these questions.
  • 19. Design and Architecture Considerations
  • 20. General Considerations
      • Consider the support level requirements of your mission-critical systems.
      • Your staff is your key asset!
          – Invest in training, processes, knowledge sharing.
          – A good administrator will provide higher availability than a mediocre cluster setup.
      • Get expert help for the initial setup.
      • Write concise operation manuals that make sense at 3am on a Saturday ;-)
      • Thoroughly test the cluster regularly.
          – Use a staging system before deploying large changes!
  • 21. Manage Expectations Properly
      • Clustering improves reliability, but does not achieve 100%, ever.
      • Clusters are more complex than single nodes.
      • Fail-over clusters reduce service outage, but do not eliminate it.
      • Clustering broken applications will not fix them.
      • Clusters do not replace backups, RAID, or good hardware.
  • 22. Complexity Versus Reliability
      • Every component has a failure probability.
          – Good complexity: redundant components.
          – Undesirable complexity: chained components.
          – Choke point → single point of failure
          – Also consider: administrative complexity.
      • Use as few components (features) as feasible.
          – Our extensive feature list is not a mandatory checklist for your deployment ;-)
      • What is your fall-back in case the cluster breaks?
          – Backups, non-clustered operation
          – Architect your system so that this is feasible!
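The difference between redundant and chained components can be made concrete with a small calculation. The following is only a sketch, assuming independent failures and a hypothetical availability figure of 0.99 per component (neither assumption comes from the slides): chained components must all work, so their availabilities multiply, while redundant components fail only if every one of them fails.

```python
from math import prod


def chained_availability(availabilities):
    """Availability when every component in the chain must work."""
    return prod(availabilities)


def redundant_availability(availabilities):
    """Availability when one working component is enough.

    Assumes failures are independent, which real deployments only approximate.
    """
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    component = 0.99  # hypothetical per-component availability
    print(f"3 chained components:   {chained_availability([component] * 3):.6f}")
    print(f"2 redundant components: {redundant_availability([component] * 2):.6f}")
```

Under these assumptions, each link added to a chain lowers availability, while each redundant copy raises it.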
  • 23. Cluster Size Considerations
      • More nodes:
          – Increased absolute redundancy and capacity.
          – Decreased relative redundancy.
          – One cluster → one failure domain.
      • Does your work-load scale well to larger node counts?
      • Choose odd node counts.
          – 4 and 3 node clusters both lose majority after 2 nodes.
      • Question:
          – 5 cheaper servers, or
          – 3 higher quality servers with more capacity each?
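The advice on odd node counts follows from majority arithmetic. As a minimal sketch, assuming one vote per node and a strict-majority quorum rule (the function names are illustrative, not part of any cluster tool):

```python
def votes_needed(total_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return total_nodes // 2 + 1


def tolerated_failures(total_nodes):
    """How many nodes may fail while the survivors still hold a majority."""
    return total_nodes - votes_needed(total_nodes)


if __name__ == "__main__":
    for n in range(2, 6):
        print(f"{n} nodes: majority = {votes_needed(n)}, "
              f"tolerated failures = {tolerated_failures(n)}")
```

In this model a fourth node adds capacity but no extra failure tolerance over three nodes, and a two-node cluster tolerates no failure at all while keeping a majority.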
  • 25. General Software Stack
      • Please avoid chasing already solved problems!
      • Please apply all available software updates:
          – SUSE Linux® Enterprise Server 11
          – SUSE Linux Enterprise High Availability Extension
      • Consider migrating to SUSE Linux Enterprise High Availability Extension, if you have not already.
          – Usability, ease of setup, integration are all much improved.
          – SUSE Linux Enterprise Server 10 remains fully supported.
  • 26. From One to Many Nodes
      • Error: Configuration files not identical across nodes.
          – /etc/drbd.conf, /etc/corosync/corosync.conf, /etc/ais/openais.conf, resource-specific configurations ...
      • Symptoms: Weird misbehavior, a setup that works on one system but not on others, interoperability issues, and possibly others.
      • Solution: Make sure they are synchronized.
          – SUSE Linux® Enterprise High Availability Extension 11 SP1 provides “csync2” to do this automatically for you.
              > You can add your own files to this list as needed.
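To check whether configuration files have drifted apart, their checksums can be compared across nodes. The sketch below is not how csync2 works; it is a hypothetical helper that assumes the file contents have already been collected from each node by some other means, and the node names are made up for illustration.

```python
import hashlib


def find_drift(files_by_node):
    """Return the sorted paths whose content differs between nodes.

    files_by_node maps a node name to {path: file content as bytes}.
    A path that is missing on some node also counts as drift.
    """
    all_paths = set()
    for files in files_by_node.values():
        all_paths.update(files)

    drifted = []
    for path in sorted(all_paths):
        digests = {
            hashlib.sha256(files[path]).hexdigest() if path in files else None
            for files in files_by_node.values()
        }
        if len(digests) > 1:  # more than one distinct version exists
            drifted.append(path)
    return drifted


if __name__ == "__main__":
    collected = {  # hypothetical node names and file contents
        "node1": {"/etc/drbd.conf": b"a", "/etc/corosync/corosync.conf": b"x"},
        "node2": {"/etc/drbd.conf": b"b", "/etc/corosync/corosync.conf": b"x"},
    }
    print(find_drift(collected))
```

Comparing digests rather than raw contents keeps the comparison cheap if only the hashes are gathered from each node.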
  • 27. Networking • Switches must support multicast properly. • Bonding is preferable to using multiple rings: – Reduces complexity – Exposes redundancy to all cluster services and clients • Firewall rules are not your friend. • Keep firmware on switches up to date! • Make NIC names identical on all nodes.
  • 28. Fencing (STONITH) • Error: Not configuring STONITH at all – It defaults to enabled; resource start-up will block and the cluster will simply do nothing. This is for your own protection. • Warning: Disabling STONITH – DLM/OCFS2 will block forever waiting for a fence that is never going to happen. • Error: Using “external/ssh”, “ssh”, “null” in production – These plug-ins are for testing. They will not work in production! – Use a “real” fencing device or external/sbd • Error: Configuring several power switches in parallel. • Error: Trying to use external/sbd on DRBD
  • 29. CIB Configuration Issues • 2-node clusters cannot have majority with 1 node failed – # crm configure property no-quorum-policy=ignore • Resources are starting up in “random” order or on “wrong” nodes – Add required constraints! (We'll get back to that ...) • Resources move around when something “unrelated” changes – # crm configure property default-resource-stickiness=1000 • # crm_verify -L ; ptest -L -VVVV – Will point out some basic issues
  • 30. Configuring Cluster Resources • Symptom: On start of one or more nodes, the cluster restarts resources! • Cause: Resources under cluster control are also started via the “init” sequence. – The cluster “probes” all resources when it starts up on a node. When it finds resources active where they should not be (possibly even more than once in the cluster), the recovery protocol is to stop them all (including all dependencies) and start them cleanly again. • Solution: Remove them from the “init” sequence.
  • 31. Setting Resource Time-outs • Belief: “Shorter time-outs make the cluster respond faster.” • Fact: Time-outs that are too short cause resource operations to “fail” erroneously, making the cluster unstable and unpredictable. – A somewhat too long time-out will cause a fail-over delay; – a slightly too short time-out will cause an unnecessary service outage. • Consider that a loaded cluster node may be slower than during deployment testing. – Check “crm_mon -t1” output for the actual run-times of resources.
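One way to act on the measured run-times is to derive the time-out from the slowest observation plus generous headroom. The sketch below is only one possible heuristic and not a recommendation from the slides: the function name, the safety factor of 3, and the 20 second floor are all assumptions chosen for illustration.

```python
def suggest_timeout(observed_runtimes_s, safety_factor=3.0, floor_s=20.0):
    """Suggest an operation time-out in seconds.

    observed_runtimes_s: run-times of one operation, measured under
    realistic load. The slowest run is scaled by safety_factor, and the
    result is never allowed to drop below floor_s.
    """
    if not observed_runtimes_s:
        raise ValueError("need at least one observed run-time")
    return max(floor_s, max(observed_runtimes_s) * safety_factor)


if __name__ == "__main__":
    # Hypothetical start durations of one resource, in seconds.
    print(suggest_timeout([2.0, 4.5, 12.0]))
```

The asymmetry in the slide motivates erring on the long side: a time-out that is too generous only delays fail-over, while one that is too tight causes an outage.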
  • 32. OCFS2 • Using ocfs2-tools-o2cb (legacy mode) – O2CB only works with Oracle RAC; full features of SUSE® Linux Enterprise High Availability Extension are only available in combination with Pacemaker – # zypper rm ocfs2-tools-o2cb – Forget about /etc/ocfs2/cluster.conf, /etc/init.d/ocfs2, /etc/init.d/o2cb and /etc/sysconfig/ocfs2 • Nodes crash on shutdown – If you have active ocfs2 mounts, you need to umount before shutdown – If openais is part of the boot sequence > # insserv openais • Consider: Do you really need OCFS2? – Can your application really run concurrently?
  • 33. Distributed Replicated Block Device • Myth: DRBD has no shared state, thus no STONITH is needed. – Fact: DRBD still needs fencing! • Active/Active: – Does not magically make ext3 or applications concurrency-safe; ext3 can still only be mounted once – With OCFS2, split-brain is still fatal, as data diverges! • Active/Passive: – Ensures only one side can modify data, added protection. – Does not magically make applications crash-safe. • Issue: Replication traffic during reads. – “noatime” mount option.
  • 34. Storage in General • Activating non-battery backed caches for performance – Causes data corruption. • iSCSI over unreliable networks. • Lack of multipath for storage. • Believing that RAID replaces backups. – RAID and replication immediately propagate logical errors! • Please ensure that device names are identical on all nodes.
  • 35. Exploring the Effect of Events
  • 36. What Are Events? • All state changes to the cluster are events – They cause an update of the CIB – Configuration changes by the administrator – Nodes going up or going down – Resource monitoring failures • Response to events is configured using the CIB policies and computed by the Policy Engine • This can be simulated using ptest – Available comfortably through the “crm” shell
  • 37. Simulating Node Failure
      hex-0:~ # crm
      crm(live)# cib new sandbox
      INFO: sandbox shadow CIB created
      crm(sandbox)# cib cibstatus node hex-0 unclean
      crm(sandbox)# ptest
  • 38. Simulating Node Failure
  • 39. Simulating Resource Failure
      crm(sandbox)# cib cibstatus load live
      crm(sandbox)# cib cibstatus op
      usage: op <operation> <resource> <exit_code> [<op_status>] [<node>]
      crm(sandbox)# cib cibstatus op start dummy1 not_running done hex-0
      crm(sandbox)# cib cibstatus op start dummy1 unknown timeout hex-0
      crm(sandbox)# configure ptest
      ptest[4918]: 2010/02/17_12:44:17 WARN: unpack_rsc_op: Processing failed op dummy1_start_0 on hex-0: unknown error (1)
  • 40. Simulating Resource Failure
  • 41. Exploring Configuration Changes
      crm(sandbox)# cib cibstatus load live
      crm(sandbox)# configure primitive dummy42 ocf:heartbeat:Dummy
      crm(sandbox)# ptest
  • 42. Configuration Changes - Woah!
  • 43. Exploring Configuration Changes
      crm(sandbox)# configure rsc_defaults resource-stickiness=1000
      crm(sandbox)# ptest
      crm(sandbox)# configure order order-42 inf: dummy42 dummy1
      crm(sandbox)# ptest
  • 44. Configuration Changes – Almost ...
  • 45. Configuration Changes - Done
  • 46. Log Files and Their Meaning
  • 47. hb_report Is the Silver Support Bullet • Compiles – Cluster-wide log files, – Package state, – DLM/OCFS2 state, – System information, – CIB history, – Parsed core dump reports, into a single tarball for all support needs. • # hb_report -n "node1 node2 node3" -f 12:00 /tmp/hb_report_example1
  • 48. Logging • “The cluster generates too many log messages!” – Alas, customers are even more upset when asked to reproduce a problem on their production system ;-) – Incidentally, all command line invocations are logged. • System-wide logs: /var/log/messages • CIB history: /var/lib/pengine/* – All cluster events are logged here and can be analyzed with hindsight (python GUI, ptest, and the crm shell).
  • 49. Where Is the Real Cause? The answer is always in the logs. Even though the logs on the DC may print a reference to the error, the real cause may be on another node. Most errors are caused by resource agent misconfiguration.
  • 50. Correlating Messages to Their Cause
    • Feb 17 13:06:57 hex-8 pengine: [7717]: WARN: unpack_rsc_op: Processing failed op ocfs2-1:2_monitor_20000 on hex-0: not running (7)
      – This is not the failure, just the Policy Engine reporting on the CIB state! The real messages are on hex-0, grep for the operation key:
    • Feb 17 13:06:57 hex-0 Filesystem[24825]: [24861]: INFO: /filer is unmounted (stopped)
    • Feb 17 13:06:57 hex-0 crmd: [7334]: info: process_lrm_event: LRM operation ocfs2-1:2_monitor_20000 (call=37, rc=7, cib-update=55, confirmed=false) not running
      – Look for the error messages from the resource agent before the lrmd/pengine lines!
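The correlation step described on this slide can be sketched in a few lines: take the operation key and node from the Policy Engine warning, then look at what that node itself logged. The sample lines are the ones from the slide, with the operation key written as a single token. The sketch assumes the usual syslog layout where the host name is the fourth whitespace-separated field, and the helper names are hypothetical.

```python
import re

SAMPLE_LOG = [
    "Feb 17 13:06:57 hex-8 pengine: [7717]: WARN: unpack_rsc_op: "
    "Processing failed op ocfs2-1:2_monitor_20000 on hex-0: not running (7)",
    "Feb 17 13:06:57 hex-0 Filesystem[24825]: [24861]: INFO: "
    "/filer is unmounted (stopped)",
    "Feb 17 13:06:57 hex-0 crmd: [7334]: info: process_lrm_event: "
    "LRM operation ocfs2-1:2_monitor_20000 (call=37, rc=7, cib-update=55, "
    "confirmed=false) not running",
]

# Matches the Policy Engine's report of a failed operation.
FAILED_OP = re.compile(r"Processing failed op (?P<op>\S+) on (?P<node>\S+):")


def failed_ops(lines):
    """Return (operation key, node) pairs reported by the Policy Engine."""
    found = []
    for line in lines:
        match = FAILED_OP.search(line)
        if match:
            found.append((match.group("op"), match.group("node")))
    return found


def host_of(line):
    """Host name of a syslog line ("Mon DD HH:MM:SS host ...")."""
    fields = line.split()
    return fields[3] if len(fields) > 3 else None


def messages_on_node(lines, node):
    """All lines that were logged by the given node, in original order."""
    return [line for line in lines if host_of(line) == node]


if __name__ == "__main__":
    for op, node in failed_ops(SAMPLE_LOG):
        print(f"{op} failed on {node}; messages from that node:")
        for line in messages_on_node(SAMPLE_LOG, node):
            print("   ", line)
```

In the output, the resource agent's own message (here from Filesystem) appears before the crmd line that carries the operation key, which is where the actual cause is usually found.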
  • 52. Common Resource Agent Issues • Operations must succeed if the resource is already in the requested state. • “monitor” must distinguish between at least “running/OK”, “running/failed”, and “stopped” – Probes deserve special attention • Meta-data must conform to DTD. • 3rd party resource agents do not belong under /usr/lib/ocf/resource.d/heartbeat – choose your own provider name! • Use ocf-tester to validate your resource agent.
  • 53. ocf-tester Example Output
      hex-0:~ # ocf-tester -n Example /usr/lib/ocf/resource.d/bs2010/Dummy
      Beginning tests for /usr/lib/ocf/resource.d/bs2010/Dummy...
      * Your agent does not support the notify action (optional)
      * Your agent does not support the demote action (optional)
      * Your agent does not support the promote action (optional)
      * Your agent does not support master/slave (optional)
      * rc=7: Stopping a stopped resource is required to succeed
      Tests failed: /usr/lib/ocf/resource.d/bs2010/Dummy failed 1 tests
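The failure reported above ("Stopping a stopped resource is required to succeed") is the idempotence rule from the previous slide. The following sketch shows the intended behavior with a state-file based dummy resource. It is an illustration only and not the Dummy agent shipped with the product; the class and the "ok" marker are hypothetical, while the numeric values are the standard OCF exit codes.

```python
import os
import tempfile

OCF_SUCCESS = 0       # running and healthy, or operation succeeded
OCF_ERR_GENERIC = 1   # running but failed
OCF_NOT_RUNNING = 7   # cleanly stopped


class DummyAgent:
    """Toy resource whose 'running' state is the existence of a file."""

    def __init__(self, state_file):
        self.state_file = state_file

    def monitor(self):
        # Distinguish stopped, running/OK, and running/failed.
        if not os.path.exists(self.state_file):
            return OCF_NOT_RUNNING
        with open(self.state_file) as handle:
            healthy = handle.read().strip() == "ok"
        return OCF_SUCCESS if healthy else OCF_ERR_GENERIC

    def start(self):
        # Starting an already running resource must succeed.
        if self.monitor() == OCF_SUCCESS:
            return OCF_SUCCESS
        with open(self.state_file, "w") as handle:
            handle.write("ok")
        return OCF_SUCCESS

    def stop(self):
        # Stopping an already stopped resource must succeed as well,
        # instead of passing on monitor's "not running" (7).
        if os.path.exists(self.state_file):
            os.remove(self.state_file)
        return OCF_SUCCESS


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    agent = DummyAgent(os.path.join(workdir, "dummy.state"))
    print("stop while stopped:", agent.stop())
    print("monitor:", agent.monitor())
    print("start:", agent.start(), "start again:", agent.start())
    print("monitor:", agent.monitor())
```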
  • 56. Unpublished Work of Novell, Inc. All Rights Reserved. This work is an unpublished work and contains confidential, proprietary, and trade secret information of Novell, Inc. Access to this work is restricted to Novell employees who have a need to know to perform tasks within the scope of their assignments. No part of this work may be practiced, performed, copied, distributed, revised, modified, translated, abridged, condensed, expanded, collected, or adapted without the prior written consent of Novell, Inc. Any use or exploitation of this work without authorization could subject the perpetrator to criminal and civil liability. General Disclaimer This document is not to be construed as a promise by any participating company to develop, deliver, or market a product. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. Novell, Inc. makes no representations or warranties with respect to the contents of this document, and specifically disclaims any express or implied warranties of merchantability or fitness for any particular purpose. The development, release, and timing of features or functionality described for Novell products remains at the sole discretion of Novell. Further, Novell, Inc. reserves the right to revise this document and to make changes to its content, at any time, without obligation to notify any person or entity of such revisions or changes. All Novell marks referenced in this presentation are trademarks or registered trademarks of Novell, Inc. in the United States and other countries. All third-party trademarks are the property of their respective owners.