An Introduction to OpenSAF 5.17.2011



Systems that meet stringent service availability (SA) and high availability (HA) requirements have been around for decades, but diverse segments use varied terminology to describe the same concepts. This session will provide a high-level technical overview of the Service Availability™ Forum standards and the support of those standards within OpenSAF, allowing those familiar with HA concepts to map their terminology to SA Forum and OpenSAF terminology.

The session will also help those relatively new to OpenSAF or the HA domain to familiarize themselves with the terms and concepts. This session will lay the technical foundation for the remainder of the conference so that attendees get the most out of the more detailed presentations that follow.

OpenSAF involves a number of complex ideas and is designed to work in many different environments. To help new users get started, we will also describe the resources available for learning about OpenSAF, the environments suited to using the code base, and ways to interact with the community.

  1. Introduction to OpenSAF
     David Fick, Senior Software Architect, GoAhead Software
  2. Introduction to OpenSAF
     • Service availability and high availability systems and concepts have been around for decades
     • However, HA terminology tends to vary from industry to industry and company to company
     • Goals of this session:
       – High-level technical overview of the Service Availability™ Forum standards
       – Overview of the support of those standards within OpenSAF
       – Allow you to:
         • Familiarize yourself with SA Forum and OpenSAF concepts and terminology, OR
         • Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions
       – Resources for getting started with OpenSAF
  3. SA Forum Interfaces: AIS & HPI
     [Diagram: layered stack. Labels, top to bottom:]
     • Applications
     • Application Interface Specifications (AIS)
     • Service Availability Middleware; System Management
     • SAF standards implemented by OpenSAF:
       – Software Mgmt Framework (SMF)
       – Availability Management Framework (AMF)
       – Information Model Mgmt (IMM)
       – Cluster Membership (CLM)
       – Notification (NTF)
       – Log (LOG)
       – Platform Mgmt (PLM)
       – Lock (LCK)
       – Checkpoint (CKPT)
       – Event (EVT)
       – Message (MSG)
     • Operating System; Virtualization
     • Hardware Platform Interface (HPI)
     • Hardware Platform A, Hardware Platform B, Hardware Platform C, Hardware Platform D
  4. But how to make sense of the SA Forum “acronym soup”?
  5. AIS Service Groupings
     • First, understand that the AIS services fall into three logical groupings*:
       – System Management Services: services that manage central system capabilities commonly used by both AIS services and applications
         • Information Model Mgmt (IMM)
         • Software Mgmt Framework (SMF)
         • Notification (NTF)
         • Log (LOG)
       – Resource Availability Management Services: services that manage and monitor the state of key system resources that affect availability (hardware / operating system, cluster nodes, applications)
         • Availability Management Framework (AMF)
         • Cluster Membership (CLM)
         • Platform Mgmt (PLM)
       – Application Services: optional services to support application operations such as inter-process communication, state replication, and shared resource access control
         • Checkpoint (CKPT)
         • Event (EVT)
         • Message (MSG)
         • Lock (LCK)
     * Not official SA Forum AIS service groupings
  6. Fault Management Cycle
     • Second, AIS services that manage availability are designed around a standard fault management cycle:
       – Detection
         • E.g. component healthchecks
       – Isolation
         • E.g. blade power off
       – Recovery
         • E.g. failover of workload assignments to associated standby resources
       – Repair
         • E.g. automatic restart of failed resource
       – Notification
         • E.g. state change notifications sent by service managing the resource
     [Diagram: cycle of Detection, Isolation, Recovery, and Repair, with Notification]
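The cycle on the slide above can be read as an ordered sequence of phases. The following Python sketch is illustrative only: `Phase`, `EXAMPLE_ACTIONS`, and `handle_fault` are hypothetical names rather than SA Forum or OpenSAF APIs, and the phases are simply walked in the order the slide lists them.

```python
from enum import Enum


class Phase(Enum):
    """Phases of the fault management cycle, in the order listed on the slide."""
    DETECTION = 1
    ISOLATION = 2
    RECOVERY = 3
    REPAIR = 4
    NOTIFICATION = 5


# Hypothetical example actions, paraphrased from the slide's "e.g." bullets.
EXAMPLE_ACTIONS = {
    Phase.DETECTION: "component healthcheck fails",
    Phase.ISOLATION: "blade powered off",
    Phase.RECOVERY: "workload failed over to standby",
    Phase.REPAIR: "failed resource restarted",
    Phase.NOTIFICATION: "state change notification sent",
}


def handle_fault(resource: str) -> list[str]:
    """Walk one resource through every phase and return a trace of the steps."""
    return [
        f"{resource}: {phase.name.lower()}: {EXAMPLE_ACTIONS[phase]}"
        for phase in Phase  # Enum iteration follows definition order
    ]


if __name__ == "__main__":
    for line in handle_fault("blade-3"):
        print(line)
```

In a real deployment each phase would be carried out by the AIS service that manages the affected resource; the sketch only fixes the ordering.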
  7. Resource Dependencies
     • Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types:
       – Managed Applications
         • Simple to complex dependencies and relationships can be modeled between the various software elements
         • Dependency on a particular node also modeled
       – AMF Node
         • Represents a node where AMF services are provided
         • Depends on a CLM node
       – CLM Node
         • Represents a cluster node where AIS services are provided
         • Depends on an Execution Environment (optional)
       – Platform Resource
         • Containment and logical dependencies represented between platform resources
         • Execution Environment (EE): represents an operating system instance (standalone or virtual)
         • Hardware Element (HE): represents a physical hardware resource in the system
     [Diagram: dependency stack of Managed Applications, AMF Node, CLM Node, and Platform Resource (Hardware Element, Execution Environment)]
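The dependency stack on the slide above can be sketched as a simple lookup table. This is a minimal Python illustration under stated assumptions: all resource names are made up, each resource has exactly one dependency, and the execution environment is assumed to depend on a hardware element (the slide only says that containment and logical dependencies exist between platform resources).

```python
# Hypothetical resource names; each entry maps a resource to what it depends on.
DEPENDS_ON = {
    "app-1": "amf-node-1",       # managed application depends on an AMF node
    "amf-node-1": "clm-node-1",  # AMF node depends on a CLM node
    "clm-node-1": "ee-1",        # CLM node depends on an execution environment
    "ee-1": "he-1",              # assumption: the EE runs on a hardware element
    "he-1": None,                # hardware element is the bottom of the chain
}


def dependency_chain(resource: str) -> list[str]:
    """Return the resource followed by everything it transitively depends on."""
    chain = []
    current = resource
    while current is not None:
        chain.append(current)
        current = DEPENDS_ON[current]
    return chain


def affected_by(failed: str) -> list[str]:
    """Return every resource whose dependency chain includes the failed one."""
    return sorted(r for r in DEPENDS_ON if failed in dependency_chain(r))


if __name__ == "__main__":
    print(dependency_chain("app-1"))
    print(affected_by("ee-1"))
```

Walking the chain this way shows why a failure low in the stack, such as an operating system instance, affects every node and application modeled above it.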
  8. Common Design Patterns
     • Fourth, the AIS services follow common design patterns:
       – API
         • Common library lifecycle
         • Naming conventions
       – Resource managed by service: managed object
         • Typically with associated state model
         • Managed objects stored in common information model
       – Administrative operations
         • X.731 style administrative operations for resources which affect availability
       – Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)
  9. Resource Availability Management Services
     • Availability Management Framework (AMF)
       – Manages the lifecycle and monitors the state of the managed applications within the system
       – More detail in upcoming slides
     • Cluster Membership (CLM)
       – Provides cluster membership change notifications to AIS services and interested applications
       – OpenSAF CLM implements cluster management protocol dealing with:
         • Cluster formation
         • Active controller selection & failover
         • Node failure detection
     • Platform Management (PLM)
       – Manages state of modeled hardware elements and execution environments (operating system instances)
       – Hardware element states and events accessed through Hardware Platform Interface (HPI)
       – Manages graceful blade extraction / de-activation cases
       – Supports hardware element controls (power on/off and reset)
       – Optional service within OpenSAF
  10. Availability Management Framework (AMF): AMF Logical Entities
     • Structural Entities
       – AMF Application
         • Represents the highest-level service(s) provided by the system
       – Service Group (SG)
         • Represents a group of like logical resources that provide the same service(s)
         • Associated redundancy model (e.g. 1+1)
       – Service Unit (SU)
         • Aggregates a set of resources which when combined provide a higher-level service
       – Component
         • Represents one or more resources that perform a function within the system
     [Diagram: containment hierarchy. AMF Application contains 1..* Service Groups, each Service Group contains 1..* Service Units, each Service Unit contains 1..* Components]
  11. Availability Management Framework (AMF): AMF Logical Entities
     • Workload Entities
       – Service Instance (SI)
         • Represents a workload to be supported by the system
         • Has associated redundancy requirements (1+1, N+M, etc.)
         • Protected by an identified SG
         • Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced
       – Component Service Instance (CSI)
         • Represents a more granular workload that needs to be supported by the system
         • Assigned to one or more components
     [Diagram: the structural hierarchy from the previous slide, with a Service Instance protected by a Service Group and assigned to Service Units, and a Component Service Instance assigned to Components]
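The relationships between the structural and workload entities described on the two slides above can be sketched with a few data classes. This is a hedged Python illustration, not the AMF information model: the class and field names are invented for the example, and the AMF Application and CSI levels are left out to keep it short.

```python
from dataclasses import dataclass, field
from enum import Enum


class HAState(Enum):
    """The four HA states named on the slide."""
    ACTIVE = "active"
    STANDBY = "standby"
    QUIESCING = "quiescing"
    QUIESCED = "quiesced"


@dataclass
class Component:
    name: str


@dataclass
class ServiceUnit:
    name: str
    components: list[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # e.g. "2N"
    service_units: list[ServiceUnit] = field(default_factory=list)


@dataclass
class ServiceInstance:
    name: str
    protected_by: ServiceGroup
    # Maps SU name -> HA state assigned to that SU for this SI.
    assignments: dict[str, HAState] = field(default_factory=dict)

    def assign(self, su: ServiceUnit, state: HAState) -> None:
        """Assign this SI to an SU, which must belong to the protecting SG."""
        if su not in self.protected_by.service_units:
            raise ValueError(f"{su.name} is not in service group {self.protected_by.name}")
        self.assignments[su.name] = state


if __name__ == "__main__":
    su1 = ServiceUnit("SU1", [Component("comp-a")])
    su2 = ServiceUnit("SU2", [Component("comp-b")])
    sg = ServiceGroup("SG1", "2N", [su1, su2])
    si = ServiceInstance("SI1", protected_by=sg)
    si.assign(su1, HAState.ACTIVE)
    si.assign(su2, HAState.STANDBY)
    print(si.assignments)
```

The check in `assign` mirrors the slide's statement that an SI is protected by an identified SG: only service units in that group can receive the assignment.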
  12. Availability Management Framework (AMF): AMF Logical Entities
     • Common Characteristics
       – Well-defined state model for each logical entity type
       – X.731 style administrative operations
     • Common AMF Component Types
       – SA-aware
         • Applications modified to interact with AMF through AMF API
       – Non-proxied, non-SA-aware
         • Legacy or 3rd party applications that typically cannot be modified
         • Interact with AMF through command line scripts to manage application lifecycle
         • Always assigned active HA state if running
       – Proxied, non-SA-aware
         • Applications that have knowledge of HA concepts but do not directly communicate with AMF
         • Proxy application receives HA “commands” from AMF and forwards them to proxied application through a custom interface
     [Diagrams, one per component type. SA-aware: lifecycle management through CLC-CLI scripts and HA state assignment through the AMF library. Non-proxied: lifecycle management through CLC-CLI scripts. Proxied: the proxy component is managed through CLC-CLI scripts and the AMF library, and forwards proxied component lifecycle management and HA state assignment requests to the proxied component process]
  13. Availability Management Framework (AMF): Service Group Redundancy Models
     • 2N
       – Most common redundancy model
       – Preferred assignment model per SI:
         • 1 active resource
         • 1 standby resource
       – SUs can have either all active or all standby SI assignments
       – A.k.a. 1+1, active-standby, active-backup
       [Diagram: SI1 assigned active (A) to SU1 on Node1 and standby (S) to SU2 on Node2]
     • N+M
       – Preferred assignment model per SI:
         • 1 active resource
         • 1 standby resource
       – SUs can have either all active or all standby SI assignments
       – Both N and M are configurable
       – Common variation: N+1
       [Diagram: SI1 and SI2, each with one active (A) and one standby (S) assignment, across SU1, SU2, and SU3 on Node1, Node2, and Node3]
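The 2N rule on the slide above (one SU holds every active assignment, another holds every standby) can be sketched in a few lines. This is a simplified Python illustration under stated assumptions: `assign_2n` and `fail_over` are hypothetical helpers, SUs and SIs are plain strings, and a real AMF failover involves more steps than promoting the standby.

```python
def assign_2n(service_instances, service_units):
    """2N: one SU takes every active assignment, another takes every standby."""
    if len(service_units) < 2:
        raise ValueError("2N needs at least two service units")
    active_su, standby_su = service_units[0], service_units[1]
    return {
        si: {"active": active_su, "standby": standby_su}
        for si in service_instances
    }


def fail_over(assignments, failed_su):
    """Return new assignments after one SU fails.

    If the failed SU was active for an SI, the standby is promoted.
    Either way the SI is left without a standby until the SU is repaired.
    """
    result = {}
    for si, roles in assignments.items():
        if roles["active"] == failed_su:
            result[si] = {"active": roles["standby"], "standby": None}
        elif roles["standby"] == failed_su:
            result[si] = {"active": roles["active"], "standby": None}
        else:
            result[si] = dict(roles)
    return result


if __name__ == "__main__":
    before = assign_2n(["SI1", "SI2"], ["SU1", "SU2"])
    print(before)
    print(fail_over(before, "SU1"))
```

Because every SI shares the same active SU, a single SU failure moves the whole workload at once, which is the behavior usually meant by 1+1 or active-standby.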
  14. Availability Management Framework (AMF): Service Group Redundancy Models
     • No redundancy
       – Preferred assignment model per SI:
         • 1 active resource
       – Similar to an N+0 redundancy scheme where N is the number of protected SIs
       [Diagram: SI1 active (A) on SU1 on Node1; SI2 active (A) on SU2 on Node2]
     • N-way
       – Preferred assignment model per SI:
         • 1 active resource
         • Y standby resources (where Y is configurable)
       – SUs can concurrently have both active and standby assignments
       [Diagram: SU1, SU2, and SU3 on Node1, Node2, and Node3; SI1 is assigned A, S, S and SI2 is assigned S, A, S]
     • N-way Active
       – Preferred assignment model per SI:
         • X active resources (where X is configurable)
         • No standby resource
       [Diagram: SI1 and SI2 with active (A) assignments on SU1 and SU2 on Node1 and Node2]
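The N-way model on the slide above differs from 2N in that one SU may be active for one SI and standby for another. The Python sketch below illustrates that property under stated assumptions: `assign_n_way` is a hypothetical helper, and rotating the starting SU for each SI is just one simple way to spread assignments, not AMF's actual selection logic.

```python
def assign_n_way(service_instances, service_units, standbys_per_si):
    """N-way: each SI gets 1 active SU and Y standby SUs.

    The starting SU is rotated per SI (an assumption for this sketch),
    so an SU can be active for one SI and standby for another.
    """
    count = len(service_units)
    if standbys_per_si + 1 > count:
        raise ValueError("not enough service units for 1 active + Y standbys")
    result = {}
    for index, si in enumerate(service_instances):
        start = index % count
        rotated = service_units[start:] + service_units[:start]
        result[si] = {
            "active": rotated[0],
            "standby": rotated[1:1 + standbys_per_si],
        }
    return result


if __name__ == "__main__":
    print(assign_n_way(["SI1", "SI2"], ["SU1", "SU2", "SU3"], standbys_per_si=2))
```

With two SIs, three SUs, and two standbys each, the result matches the slide's diagram: SI1 is active on SU1, SI2 is active on SU2, and each SI is standby on the other two SUs.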
  15. Availability Management Framework (AMF): Error Recovery Policies
     • Pre-defined AMF component error recovery policies
       – Configurable
       – Can be overridden at runtime
     • Recovery policy scopes
       – Component
       – Service Unit
       – Node
     • Recovery policy types
       – Restart
       – Failover
       – Failfast
     • Up to 3 actions per policy
       – Isolation
       – Recovery
       – Repair
     • Error escalation policies
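The idea of escalating recovery across the scopes named on the slide above can be sketched as a small counter. This Python example is a loose illustration, not AMF's actual escalation rules: the `EscalationTracker` class, the fixed restart limit, and the final "failover node" step are all assumptions made for the example.

```python
# Recovery scopes from the slide, ordered from narrowest to widest.
SCOPES = ["component", "service unit", "node"]


class EscalationTracker:
    """Widen the recovery scope after repeated errors (hypothetical policy)."""

    def __init__(self, max_restarts_per_scope: int = 2):
        self.max_restarts = max_restarts_per_scope
        self.level = 0              # index into SCOPES
        self.restarts_at_level = 0  # restarts already tried at this scope

    def next_action(self) -> str:
        """Return the recovery action for a newly reported error."""
        if self.restarts_at_level < self.max_restarts:
            self.restarts_at_level += 1
            return f"restart {SCOPES[self.level]}"
        if self.level < len(SCOPES) - 1:
            # Restart limit reached: escalate to the next wider scope.
            self.level += 1
            self.restarts_at_level = 1
            return f"restart {SCOPES[self.level]}"
        # Widest scope exhausted: give up on restarts and fail over.
        return "failover node"


if __name__ == "__main__":
    tracker = EscalationTracker()
    for _ in range(7):
        print(tracker.next_action())
```

In a configured system the limits and the action taken at each level would come from the recovery and escalation policies rather than from constants like these.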
  16. System Management Services: Information Model Management (IMM)
     • Information Model Highlights
       – Based on pre-defined object classes (including AIS classes)
       – Holds both configuration and runtime objects
       – Used by AIS services to store current configuration and runtime state info
       – Can be used by applications as well
     • Object Management API
       – Object class management
       – Access object attribute values
       – Search information model
       – Configuration change requests
       – Administrative operation invocation
     • Object Implementer API
       – Runtime object management
       – CCB validation and application
       – Administrative operation handling
     • OpenSAF Implementation
       – Persistence of information model managed through Persistence BackEnd (PBE) feature
       – Replicated to multiple cluster nodes
  17. System Management Services: Software Management Framework (SMF)
     • SMF controls migration from one deployment configuration to another
     • Upgrade methods
       – Rolling upgrade
       – Single step upgrade
     • [De-]Activation Unit Scope
       – AMF Node
       – Service Unit
     • During the migration SMF
       – Maintains the campaign state change model
       – Takes measures to enable error recovery
       – Monitors for potential errors caused by the migration
       – Deploys error recovery procedures
     [Diagram labels: Upgrade Campaign Definition (“upgrade instructions”); Adaptation commands (SMF config object); Software Management Framework; Install / remove software bundles on target nodes; Admin operations and Read/Create/Delete/Update objects; Software Repository; Information Model]
  18. System Management Services
     • Notification (NTF)
       – Publish-and-subscribe semantics for system-level notifications
         • Reader interface for reading historical alarm info as well
       – Formal syntax and semantics for ITU X.73x notifications:
         • Alarm / security alarm / state change / object create / delete / attribute change
       – Used by AIS services to publish service-specific notifications
       – Alarm and security alarm notifications automatically logged through LOG service
     • Log (LOG)
       – Flexible, centralized, system-wide logging mechanism
       – Pre-defined log streams: alarm, notification, system
       – Supports multiple, custom application log streams
       – Log streams are configurable on a per log stream basis
         • Including log file full action: halt, wrap, and rotate
  19. Application Services
     • Checkpoint (CKPT)
       – Intended as a state replication mechanism for distributed applications
       – Can be used for all standby “temperature levels”
         • Cold
         • Warm
         • Hot
           – Through OpenSAF CKPT service API extension
       – Semantics of a checkpoint
         • Arbitrary set of sections containing opaque data
         • Stored in one or more replicas distributed across cluster
         • Reads and writes occur against the active replica
       – Both synchronous and asynchronous replication options available
       – Collocated checkpoint option provided for highest performance
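The checkpoint semantics on the slide above (sections of opaque data, several replicas, reads and writes against the active replica) can be sketched with an in-memory stand-in. This Python example is a toy model under stated assumptions: the `Checkpoint` class and its methods are invented for illustration, replication is synchronous only, and none of it corresponds to the actual CKPT API.

```python
class Checkpoint:
    """In-memory stand-in for a checkpoint: named sections of opaque bytes."""

    def __init__(self, replica_nodes):
        if not replica_nodes:
            raise ValueError("a checkpoint needs at least one replica")
        # One section table per replica, keyed by (hypothetical) node name.
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]

    def write(self, section_id: str, data: bytes) -> None:
        """Write to the active replica, then copy to the others (synchronous)."""
        self.replicas[self.active][section_id] = data
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data

    def read(self, section_id: str) -> bytes:
        """Read a section from the active replica."""
        return self.replicas[self.active][section_id]

    def set_active(self, node: str) -> None:
        """Make another replica the active one, e.g. after a failover."""
        if node not in self.replicas:
            raise ValueError(f"no replica on {node}")
        self.active = node


if __name__ == "__main__":
    ckpt = Checkpoint(["node1", "node2"])
    ckpt.write("session-42", b"state-v1")
    ckpt.set_active("node2")  # the standby takes over
    print(ckpt.read("session-42"))
```

The usage at the bottom shows the point of the service: after the active replica moves to another node, the standby application can read the state its predecessor wrote.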
  20. Application Services
     • Event (EVT)
       – Publish-and-subscribe communication paradigm
       – Flexible event channel, pattern, and filtering definition
       – Subscriber event queue maintained within app process
     • Message (MSG)
       – Messages sent to and read from message queues
       – Single message queue owner at a time
       – Message queue maintained outside app process
       – Message queues can be logically grouped
         • Messages can be sent to a message queue group
         • Associated distribution policy (round-robin, broadcast, etc.)
     • Lock (LCK)
       – Cluster-wide, distributed lock service
       – Can be used to control access to cluster-level shared resources
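The message queue group and its distribution policies mentioned on the slide above can be sketched as follows. This is a minimal Python illustration, assuming an in-process model: `QueueGroup` and its method names are hypothetical and are not the MSG service API.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    """A named set of message queues with two example distribution policies."""

    def __init__(self, queue_names):
        if not queue_names:
            raise ValueError("a queue group needs at least one queue")
        self.queues = {name: deque() for name in queue_names}
        self._next_queue = cycle(list(queue_names))

    def send_round_robin(self, message) -> str:
        """Deliver to the next queue in turn and return that queue's name."""
        name = next(self._next_queue)
        self.queues[name].append(message)
        return name

    def send_broadcast(self, message) -> None:
        """Deliver a copy of the message to every queue in the group."""
        for queue in self.queues.values():
            queue.append(message)

    def receive(self, queue_name: str):
        """Read the oldest message from one queue (raises IndexError if empty)."""
        return self.queues[queue_name].popleft()


if __name__ == "__main__":
    group = QueueGroup(["q1", "q2"])
    for msg in ("a", "b", "c"):
        print(msg, "->", group.send_round_robin(msg))
    group.send_broadcast("shutdown")
    print({name: list(q) for name, q in group.queues.items()})
```

Round-robin spreads work across the queue owners, while broadcast suits messages that every owner must see.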
  21. Getting Started with OpenSAF
     • OpenSAF Technical Educational Resources
       – Developer Wiki
       – OpenSAF Developers blog
       – OpenSAF mailing lists
         • Users
         • Announce
         • Development
       – Latest documentation [ documentation/archive/tip.tar.gz]
       – FAQ [ NSAF%20RELEASE%204%20Final%20for%20publication.docx]
       – README files in source code repository
  22. Questions