Cl306

High-Availability with Novell Cluster Services™ for Novell® Open Enterprise Server on Linux
Tim Heywood, CTO, NDS8 [email_address]
Martin Weiss, Senior Technical Specialist [email_address]
Dr. Frieder Schmidt, Senior Technical Specialist [email_address]

Agenda
High Availability and Fault Tolerance
Novell Cluster Services™
Best Practices Deploying Cluster Services
What is Clusterable?
Demo

High-Availability and Fault Tolerance

High-Availability: Motivation
Murphy's Law is universal: faults will occur
Power failures, hardware crashes, software errors, human mistakes...
Unmasked faults show through to the user

How much does downtime of a service cost you?
Even if you can afford a 5 second blip, can you afford a day-long outage or, worse, loss of data?
Can you afford low-availability systems?
If you are selling or depending on a service, service unavailability translates to cost

Definition: Availability
Mean Time Between Failures (MTBF): follows a normal distribution
Mean Time To Repair (MTTR)

Availability: percentage of time that a system functions as expected

Always computed for a certain period, e.g. a month or a year
Example: MTBF: 360 days
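
Availability is commonly expressed through the two quantities defined above as MTBF / (MTBF + MTTR). The following is a minimal Python sketch assuming that steady-state formula. The function names are illustrative, and the MTTR value in the usage part is a hypothetical number chosen for demonstration, not a figure from the slides.

```python
def availability(mtbf_days: float, mttr_days: float) -> float:
    """Steady-state availability: fraction of time the system is up."""
    if mtbf_days <= 0 or mttr_days < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_days / (mtbf_days + mttr_days)


def downtime_days(avail: float, period_days: float = 365.0) -> float:
    """Expected downtime over the observation period (a year by default)."""
    return (1.0 - avail) * period_days


if __name__ == "__main__":
    # MTBF of 360 days is taken from the slide; the 1-day MTTR is an assumption.
    a = availability(360.0, 1.0)
    print(f"availability: {a:.4%}")
    print(f"expected downtime per year: {downtime_days(a):.2f} days")
```

Because the result depends on the observation period only through the downtime figure, the same availability can hide either many short outages or one long one.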

How to Determine Availability?
Availability of a complex system is determined by the availability of its individual components

Two ways to couple components: serial design and parallel design

Availability of a serial design: A_ser = A_1 * A_2; with A_1 = 0.99 and A_2 = 0.99, A_ser = 0.9801

Availability of a parallel design: A_par = 1 - (1 - A_1) * (1 - A_2); A_par = 1 - (1 - 0.99) * (1 - 0.99) = 1 - (0.01) * (0.01) = 0.9999
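
The two coupling formulas generalize to any number of components. A short Python sketch, assuming component failures are independent (the function names are illustrative, not part of any Novell tooling):

```python
import math


def serial_availability(*components: float) -> float:
    """Serial design: the system is up only if every component is up."""
    return math.prod(components)


def parallel_availability(*components: float) -> float:
    """Parallel design: the system is down only if every component is down."""
    return 1.0 - math.prod(1.0 - a for a in components)


if __name__ == "__main__":
    # Same component values as in the slide: A_1 = A_2 = 0.99.
    print(f"serial:   {serial_availability(0.99, 0.99):.4f}")
    print(f"parallel: {parallel_availability(0.99, 0.99):.4f}")
```

Chaining components in series always lowers availability below that of the weakest component, while adding a redundant component in parallel raises it, which is the arithmetic behind the redundancy rule on the next slide.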

“3R Rule” for High-Availability Systems
Redundancy, Redundancy, Redundancy

Fault Tolerance
“The ability of a system to respond gracefully to an unexpected hardware or software failure.” (Webopedia)

Computer System Fault Tolerance
“The ability of a computer system to continue to operate correctly even though one or more of its components are malfunctioning.” (Institute for Telecommunication Sciences, National Telecommunications and Information Administration, US Dept. of Commerce)

Managing Risk: Two Goals

Primary Goal: Increase Mean Time to Failure (MTTF)
Choose reliable hardware
Implement redundant / fault-tolerant systems
Easy to implement for some components (power supplies, LAN connectivity, SAN connectivity, RAID, etc.)
Not so easy for other components (main board, memory, processor, etc.)
Establish sound administrative practices

Secondary Goal: Reduce Mean Time to Repair (MTTR)
Keep hardware spares close at hand
Document repair procedures and train personnel
Choose Open Enterprise Server: Linux server with Novell Cluster Services™

High-Availability by Clustering
Redundant setup “clustered” to act as one; avoids a Single Point of Failure (SPOF)
Primary focus is availability, but can allow for increased performance
HA via fail-over: in case a failure of [an application on] a server is detected, another server takes over
Results achieved depend on failure detection time and startup delays
The [virtual] hand moves faster than the eye
The fault is masked before the user really notices

Depends on failure detection time, restart time, overhead
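
As a rough illustration of how these delays add up, the outage a user sees per fail-over can be modeled as detection time plus restart time plus overhead. This is a simplified Python sketch; the function names, the additive model, and all numbers are assumptions for illustration, not measured Novell Cluster Services values.

```python
def failover_outage_seconds(detection_s: float, restart_s: float,
                            overhead_s: float = 0.0) -> float:
    """Outage seen by a client during one fail-over (simple additive model)."""
    for value in (detection_s, restart_s, overhead_s):
        if value < 0:
            raise ValueError("durations must be non-negative")
    return detection_s + restart_s + overhead_s


def is_masked(outage_s: float, user_tolerance_s: float) -> bool:
    """A fault counts as masked if the outage fits within what the user tolerates."""
    return outage_s <= user_tolerance_s


if __name__ == "__main__":
    # Hypothetical figures: 8 s to detect, 20 s to restart, 2 s of overhead.
    outage = failover_outage_seconds(8.0, 20.0, 2.0)
    print(f"outage per fail-over: {outage:.0f} s")
    print("masked for a client that retries for 60 s:", is_masked(outage, 60.0))
    print("masked for a client that tolerates 5 s:", is_masked(outage, 5.0))
```

Whether a given fail-over is "masked" therefore depends as much on how long clients keep retrying as on the cluster itself.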

Novell Cluster Services™
Cluster Services allows a resource to be activated on any host in the cluster

Load distribution over multiple servers when having multiple resources

Monitors LAN and SAN/storage connectivity; in the event of a failure, fences the problematic node and relocates the resource

Supports active-passive clustering

Supports Linux and Novell® Open Enterprise Server services

Supports up to 32 nodes per cluster

Novell Cluster Services™
Easy Management

Easy Configuration
Load Script
Monitoring Script

iManager integration

Integration with Novell® Open Enterprise Server Services

Novell Cluster Services™
[Figure: Typical NCS 1.8 Architecture. Cluster nodes with dual NICs and dual HBAs connect through an Ethernet LAN fabric and a Fibre Channel or iSCSI SAN fabric to storage arrays (Storage Array, Novell iSCSI Storage Array) with controllers Ctrl 1 and Ctrl 2 and LUN 0, LUN 1, LUN …]

Cluster Services in Novell® Open Enterprise Server (OES) 2
New features are Linux only

New from OES2 FCS on:
Resource monitoring
x86_64 platform support, including mixed 32/64-bit node support
Dynamic Storage Technology

What's New in SP1/2?
Major rewrite of cluster code for SP2
Removed NetWare® translation layer
Typical load average of 0.2!

New/improved clustering for:
iFolder 3
…
NCP™ virtual server for POSIX filesystem resources :-(
