OpenSAF Symposium - Intro to OpenSAF_9.13.11


Systems that meet stringent service availability (SA) and high availability (HA) requirements have been around for decades, but diverse segments use varied terminology to describe the same concepts. This session will provide a high-level technical overview of the Service Availability Forum standards and the support of those standards within OpenSAF, allowing those familiar with HA concepts to map their terminology to SA Forum and OpenSAF terminology.

The session will also help those relatively new to OpenSAF or the HA domain to familiarize themselves with the terms and concepts. This session will lay the technical foundation for the remainder of the symposium so that attendees get the most out of the more detailed presentations that follow.

OpenSAF involves a number of complex ideas and is designed to work in many different environments. To help new users get started, we will also cover the resources available for learning about OpenSAF, the environments suited to working with the code base, and the ways to interact with the community.

Published in: Technology, Business


  1. Introduction to OpenSAF
     David Fick, Senior Software Architect, GoAhead Software
  2. Introduction to OpenSAF
     • Service availability and high availability systems and concepts have been around for decades
     • However, HA terminology tends to vary from industry to industry and company to company
     • Goals of this session:
       – High-level technical overview of the Service Availability™ Forum standards
       – Overview of the support of those standards within OpenSAF
       – Allow you to:
         • Familiarize yourself with general HA concepts and terminology, OR
         • Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions
       – Resources for getting started with OpenSAF
  3. SA Forum Interfaces: AIS & HPI
     [Layer diagram, top to bottom]
     • Applications
     • Application Interface Specifications (AIS): Service Availability Middleware and System Management (SAF standards implemented by OpenSAF)
       – Software Mgmt Framework (SMF)
       – Availability Management Framework (AMF)
       – Information Model Mgmt (IMM)
       – Cluster Membership (CLM)
       – Platform Mgmt (PLM)
       – Notification (NTF)
       – Log (LOG)
       – Lock (LCK)
       – Checkpoint (CKPT)
       – Event (EVT)
       – Message (MSG)
     • Operating System / Virtualization
     • Hardware Platform Interface (HPI)
     • Hardware Platforms A, B, C, D
  4. But how to make sense of the SA Forum “acronym soup”?
  5. AIS Service Groupings
     • First, understand that the AIS services fall into three logical groupings*:
       – System Management Services: Information Model Mgmt (IMM), Software Mgmt Framework (SMF), Notification (NTF), Log (LOG)
         • Services that manage central system capabilities commonly used by both AIS services and applications
       – Resource Availability Management Services: Availability Management Framework (AMF), Cluster Membership (CLM), Platform Mgmt (PLM)
         • Services that manage and monitor the state of key system resources that affect availability: hardware / operating system, cluster nodes, applications
       – Application Services: Checkpoint (CKPT), Event (EVT), Message (MSG), Lock (LCK)
         • Optional services to support application operations such as inter-process communication, state replication, and shared resource access control
     * Not official SA Forum AIS service groupings
  6. Fault Management Cycle
     • Second, AIS services that manage availability are designed around a standard fault management cycle:
       – Detection
         • E.g. component healthchecks
       – Isolation
         • E.g. blade power off
       – Recovery
         • E.g. failover of workload assignments to associated standby resources
       – Repair
         • E.g. automatic restart of failed resource
       – Notification
         • E.g. state change notifications sent by the service managing the resource
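The cycle on this slide can be pictured as an ordered set of phases that a fault passes through. The following is a minimal sketch and not OpenSAF code: the `Phase` enum, the `handle_fault` function, and the handler callables are all hypothetical names, and the phase order simply follows the order of the bullets above.

```python
from enum import Enum


class Phase(Enum):
    DETECTION = "detection"
    ISOLATION = "isolation"
    RECOVERY = "recovery"
    REPAIR = "repair"
    NOTIFICATION = "notification"


# Assumption: phases run in the order the slide lists them.
CYCLE = [Phase.DETECTION, Phase.ISOLATION, Phase.RECOVERY,
         Phase.REPAIR, Phase.NOTIFICATION]


def next_phase(phase):
    """Return the phase that follows `phase`, wrapping around the cycle."""
    index = CYCLE.index(phase)
    return CYCLE[(index + 1) % len(CYCLE)]


def handle_fault(resource, handlers):
    """Run the handler for each phase in order.

    `handlers` maps a Phase to a callable taking the resource name.
    Phases without a handler are recorded as "skipped".
    """
    log = []
    for phase in CYCLE:
        action = handlers.get(phase)
        result = action(resource) if action else "skipped"
        log.append((phase.value, result))
    return log


if __name__ == "__main__":
    handlers = {
        Phase.DETECTION: lambda r: f"healthcheck failed on {r}",
        Phase.RECOVERY: lambda r: f"workload of {r} failed over to standby",
    }
    for phase_name, result in handle_fault("comp1", handlers):
        print(phase_name, "->", result)
```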
  7. Resource Dependencies
     • Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types:
       – Managed Applications
         • Simple to complex dependencies and relationships can be modeled between the various software elements
         • Dependency on a particular node also modeled
       – AMF Node
         • Represents a node where AMF services are provided
         • Depends on a CLM node
       – CLM Node
         • Represents a cluster node where AIS services are provided
         • Depends on an Execution Environment (optional)
       – Platform Resource
         • Containment and logical dependencies represented between platform resources
         • Execution Environment (EE): represents an operating system instance (standalone or virtual)
         • Hardware Element (HE): represents a physical hardware resource in the system
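One way to read this dependency chain is as a graph in which a failure propagates upward to everything that depends on the failed resource. The sketch below is illustrative only: the resource names are invented, the `affected_by` helper is hypothetical, and the edge from the execution environment to the hardware element is an assumption about how the platform resources relate.

```python
# Each resource maps to the resources it directly depends on.
# All names are made up for illustration.
DEPENDS_ON = {
    "app-component": ["amf-node-1"],   # managed application depends on its AMF node
    "amf-node-1": ["clm-node-1"],      # AMF node depends on a CLM node
    "clm-node-1": ["ee-linux-1"],      # CLM node depends on an EE (optional)
    "ee-linux-1": ["he-blade-1"],      # assumption: the EE runs on this HE
    "he-blade-1": [],
}


def affected_by(failed, depends_on):
    """Return every resource that directly or transitively depends on `failed`."""
    affected = set()
    changed = True
    while changed:
        changed = False
        for resource, deps in depends_on.items():
            if resource in affected:
                continue
            if any(dep == failed or dep in affected for dep in deps):
                affected.add(resource)
                changed = True
    return affected


if __name__ == "__main__":
    print(sorted(affected_by("he-blade-1", DEPENDS_ON)))
```

In this toy model, powering off the blade affects every layer above it, while a failure of the AMF node affects only the managed application.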
  8. Common Design Patterns
     • Fourth, the AIS services follow common design patterns:
       – API
         • Common library lifecycle
         • Naming conventions
       – Resource managed by service: managed object
         • Typically with associated state model
         • Managed objects stored in common information model
       – Administrative operations
         • X.731 style administrative operations for resources which affect availability
       – Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)
  9. Resource Availability Management Services
     • Availability Management Framework (AMF)
       – Manages the lifecycle and monitors the state of the managed applications within the system
       – More detail in upcoming slides
     • Cluster Membership (CLM)
       – Provides cluster membership change notifications to AIS services and interested applications
       – OpenSAF CLM implements cluster management protocol dealing with:
         • Cluster formation
         • Active controller selection & failover
         • Node failure detection
     • Platform Management (PLM)
       – Manages the state of modeled hardware elements and execution environments (operating system instances)
       – Hardware element states and events accessed through Hardware Platform Interface (HPI)
       – Manages graceful blade extraction / de-activation cases
       – Supports hardware element controls (power on/off and reset)
       – Optional service within OpenSAF
  10. Availability Management Framework (AMF): AMF Logical Entities
     • Structural Entities
       – AMF Application
         • Represents the highest-level service(s) provided by the system
       – Service Group (SG)
         • Represents a group of like logical resources that provide the same service(s)
         • Associated redundancy model (e.g. 1+1)
       – Service Unit (SU)
         • Aggregates a set of resources which when combined provide a higher-level service
       – Component
         • Represents one or more resources that perform a function within the system
     [Diagram: containment hierarchy AMF Application → 1..* Service Group → 1..* Service Unit → 1..* Component]
  11. Availability Management Framework (AMF): AMF Logical Entities
     • Workload Entities
       – Service Instance (SI)
         • Represents a workload to be supported by the system
         • Has associated redundancy requirements (1+1, N+M, etc.)
         • Protected by an identified SG
         • Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced
       – Component Service Instance (CSI)
         • Represents a more granular workload that needs to be supported by the system
         • Assigned to one or more components
     [Diagram: a Service Instance is protected by a Service Group and assigned to Service Units; a Component Service Instance is assigned to Components]
  12. Availability Management Framework (AMF): AMF Logical Entities
     • Common Characteristics
       – Well-defined state model for each logical entity type
         • Operational
         • Administrative
         • Etc.
       – X.731 style administrative operations
         • Lock
         • Unlock
         • Shutdown
         • Etc.
     • Common AMF Component Types
       – SA-aware
       – Non-proxied, non-SA-aware
       – Proxied, non-SA-aware
     [Diagram: SA-aware Component Example, showing AMF lifecycle management through CLC-CLI scripts and AMF HA state assignment through the AMF library in the component process]
  13. Availability Management Framework (AMF): Service Group Redundancy Models
     • Key redundancy model characteristics
       – Preferred SI assignment model
         • # of active resource(s)
         • # of standby resource(s)
       – Allowed concurrent HA state assignments for SUs
       – # of assignable SUs
     • Redundancy model options
       – 2N
         • Most common redundancy model
         • 1 active resource and 1 standby resource per SI
         • SUs can have either all active or all standby SI assignments
       – N+M
       – No Redundancy
       – N-way
       – N-way active
     [Diagram: 2N Service Group Example, with SI1 and SI2 assigned active (A) to SU1 on Node1 and standby (S) to SU2 on Node2]
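The 2N example on this slide (SI1 and SI2 active on SU1, standby on SU2) can be mimicked with a small in-memory model. This is a hedged sketch rather than AMF behavior: `ServiceUnit`, `assign_2n`, and `fail_over` are hypothetical names, and real AMF assignment involves state machines and callbacks that are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceUnit:
    name: str
    node: str
    in_service: bool = True
    # Maps SI name to its HA state on this SU: "active" or "standby".
    assignments: dict = field(default_factory=dict)


def assign_2n(sus, sis):
    """Assign every SI active on one SU and standby on another (2N).

    An SU ends up with either all active or all standby assignments.
    """
    for su in sus:
        su.assignments.clear()
    available = [su for su in sus if su.in_service]
    if not available:
        return
    active = available[0]
    standby = available[1] if len(available) > 1 else None
    for si in sis:
        active.assignments[si] = "active"
        if standby is not None:
            standby.assignments[si] = "standby"


def fail_over(sus, sis, failed_name):
    """Take the named SU out of service and reassign, promoting the standby."""
    for su in sus:
        if su.name == failed_name:
            su.in_service = False
    # Put the former standby SU first so it becomes the new active SU.
    ordered = sorted(sus, key=lambda su: "standby" not in su.assignments.values())
    assign_2n(ordered, sis)


if __name__ == "__main__":
    su1 = ServiceUnit("SU1", "Node1")
    su2 = ServiceUnit("SU2", "Node2")
    assign_2n([su1, su2], ["SI1", "SI2"])
    print(su1.assignments, su2.assignments)
    fail_over([su1, su2], ["SI1", "SI2"], "SU1")
    print(su1.assignments, su2.assignments)
```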
  14. Availability Management Framework (AMF): Error Recovery Policies
     • Pre-defined AMF component error recovery policies
       – Configurable
       – Can be overridden at runtime
     • Up to 3 actions per policy
       – Isolation
       – Recovery
       – Repair
     • Recovery policy scopes
       – Component
       – Service Unit
       – Node
     • Recovery policy types
       – Restart
       – Failover
       – Failfast
     • Recovery escalation policies
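Recovery escalation can be thought of as a ladder: if the recovery action at one scope keeps failing, the next error triggers a wider action. The sketch below is a simplification under stated assumptions. The ladder contents, the `Escalator` class, and the fixed attempt count per level are all hypothetical; the slide does not describe the actual escalation rules or thresholds.

```python
# Assumed ladder, from narrowest to widest scope.
LADDER = [
    "component restart",
    "service unit restart",
    "service unit failover",
    "node failover",
]


class Escalator:
    """Pick a recovery action, escalating after repeated errors."""

    def __init__(self, max_attempts_per_level=2):
        self.max_attempts = max_attempts_per_level
        self.level = 0
        self.attempts = 0

    def on_error(self):
        """Return the action for this error report."""
        exhausted = self.attempts >= self.max_attempts
        if exhausted and self.level < len(LADDER) - 1:
            # Move to the next, wider recovery action.
            self.level += 1
            self.attempts = 0
        self.attempts += 1
        return LADDER[self.level]


if __name__ == "__main__":
    escalator = Escalator(max_attempts_per_level=2)
    for _ in range(7):
        print(escalator.on_error())
```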
  15. System Management Services: Information Model Management (IMM)
     • Information Model Highlights
       – Based on pre-defined object classes (including AIS classes)
       – Holds both configuration and runtime objects
       – Used by AIS services to store current configuration and runtime state info
       – Can be used by applications as well
     • Object Management API
       – Object class management
       – Access object attribute values
       – Search information model
       – Configuration change requests
       – Administrative operation invocation
     • Object Implementer API
       – Runtime object management
       – CCB validation and application
       – Administrative operation handling
     • OpenSAF Implementation
       – Persistence of information model managed through Persistence BackEnd (PBE) feature
       – Replicated to multiple cluster nodes
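The "CCB validation and application" bullet suggests a two-step pattern: a bundle of configuration changes is first validated, and applied only if every change passes. The sketch below illustrates that pattern in plain Python. It does not use the IMM API; `ConfigStore`, `apply_ccb`, and the validator signature are hypothetical.

```python
class ConfigStore:
    """Toy configuration store with validate-then-apply change bundles."""

    def __init__(self):
        self.objects = {}      # object name -> dict of attributes
        self.validators = []   # callables (name, attrs) -> error string or None

    def apply_ccb(self, changes):
        """Apply a bundle of changes atomically.

        `changes` maps object names to the attributes to set.
        Returns a list of validation errors; empty means applied.
        """
        errors = []
        for name, attrs in changes.items():
            for validate in self.validators:
                error = validate(name, attrs)
                if error:
                    errors.append(error)
        if errors:
            return errors  # nothing is applied when any change is rejected
        for name, attrs in changes.items():
            self.objects.setdefault(name, {}).update(attrs)
        return []


def reject_negative_timeout(name, attrs):
    """Example validator playing the role of an object implementer."""
    if attrs.get("timeout", 0) < 0:
        return f"{name}: timeout must not be negative"
    return None


if __name__ == "__main__":
    store = ConfigStore()
    store.validators.append(reject_negative_timeout)
    print(store.apply_ccb({"a": {"timeout": 5}, "b": {"timeout": -1}}))
    print(store.apply_ccb({"a": {"timeout": 5}}), store.objects)
```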
  16. System Management Services: Software Management Framework (SMF)
     • SMF controls migration from one deployment configuration to another
     • Upgrade methods
       – Rolling upgrade
       – Single step upgrade
     • [De-]Activation Unit Scope
       – AMF Node
       – Service Unit
     • During the migration SMF
       – Maintains the campaign state change model
       – Takes measures to enable error recovery
       – Monitors for potential errors caused by the migration
       – Deploys error recovery procedures
     [Diagram: the Software Management Framework takes “Upgrade Instructions” from an Upgrade Campaign Definition, runs adaptation commands (SMF config object), installs / removes software bundles on target nodes from the Software Repository, and performs admin operations and Read/Create/Delete/Update of objects in the Information Model]
  17. System Management Services
     • Notification (NTF)
       – Publish-and-subscribe semantics for system-level notifications
       – Syntax and semantics for ITU X.73x notifications:
         • Alarm / security alarm / state change / object create / delete / attribute change
       – Alarm and security alarm notifications automatically logged through LOG service
     • Log (LOG)
       – Flexible, centralized, system-wide logging mechanism
       – Pre-defined log streams: alarm, notification, system
       – Multiple, custom application log streams allowed
       – Configurable log stream characteristics including:
         • Log file full action: halt, wrap, and rotate
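The three "log file full" actions differ in what happens to the next record once the current file is full. The following in-memory sketch shows one plausible reading of each action; it is not the LOG service API, and the `LogStream` class, its record-count limit, and the `max_files` parameter are assumptions made for illustration.

```python
class LogStream:
    """Toy log stream whose 'files' are in-memory lists of records."""

    def __init__(self, max_records, full_action, max_files=2):
        if max_records < 1:
            raise ValueError("max_records must be at least 1")
        if full_action not in ("halt", "wrap", "rotate"):
            raise ValueError(f"unknown full action: {full_action}")
        self.max_records = max_records
        self.full_action = full_action
        self.max_files = max_files
        self.files = [[]]  # oldest file first; the last one is current

    def write(self, record):
        """Append a record; return False if it was dropped."""
        current = self.files[-1]
        if len(current) >= self.max_records:
            if self.full_action == "halt":
                return False              # stop logging when full
            if self.full_action == "wrap":
                current.pop(0)            # overwrite the oldest record
            else:                         # rotate: start a new file
                self.files.append([])
                if len(self.files) > self.max_files:
                    self.files.pop(0)     # discard the oldest file
                current = self.files[-1]
        current.append(record)
        return True


if __name__ == "__main__":
    for action in ("halt", "wrap", "rotate"):
        stream = LogStream(max_records=2, full_action=action)
        for record in "abcde":
            stream.write(record)
        print(action, stream.files)
```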
  18. Application Services
     • Checkpoint (CKPT)
       – Intended as a state replication mechanism for distributed applications
       – Can be used for all standby “temperature levels”
         • Cold
         • Warm
         • Hot
           – Through OpenSAF CKPT service API extension
       – Semantics of a checkpoint
         • Arbitrary set of sections containing opaque data
         • Stored in one or more replicas distributed across cluster
         • Reads and writes occur against the active replica
       – Both synchronous and asynchronous replication options available
       – Collocated checkpoint option provided for highest performance
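The checkpoint semantics above (sections of opaque data, several replicas, reads and writes against the active replica, synchronous or asynchronous replication) can be sketched with a small model. This is not the CKPT API: the `Checkpoint` class, its method names, and the explicit `flush` step standing in for asynchronous propagation are hypothetical.

```python
class Checkpoint:
    """Toy checkpoint: sections of opaque bytes held in per-node replicas."""

    def __init__(self, replica_nodes, synchronous=True):
        if not replica_nodes:
            raise ValueError("at least one replica is required")
        # node name -> {section id: data}
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]   # assumption: first node is active
        self.synchronous = synchronous
        self.pending = []                # updates not yet propagated

    def write(self, section_id, data):
        """Write to the active replica, then replicate now or later."""
        self.replicas[self.active][section_id] = data
        if self.synchronous:
            self._propagate(section_id, data)
        else:
            self.pending.append((section_id, data))

    def read(self, section_id):
        """Reads are served by the active replica."""
        return self.replicas[self.active][section_id]

    def flush(self):
        """Propagate queued updates (stands in for async replication)."""
        for section_id, data in self.pending:
            self._propagate(section_id, data)
        self.pending.clear()

    def _propagate(self, section_id, data):
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data


if __name__ == "__main__":
    ckpt = Checkpoint(["node1", "node2"], synchronous=False)
    ckpt.write("session-42", b"\x01\x02")
    print(ckpt.replicas["node2"])   # empty until flush
    ckpt.flush()
    print(ckpt.replicas["node2"])
```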
  19. Application Services
     • Event (EVT)
       – Publish-and-subscribe communication paradigm
       – Flexible event channel, pattern, and filtering definition
       – Subscriber event queue maintained within app process
     • Message (MSG)
       – Messages sent to and read from message queues
       – Single message queue owner at a time
       – Message queue maintained outside app process
       – Message queues can be logically grouped
         • Messages can be sent to a message queue group
         • Associated distribution policy (round-robin, broadcast, etc.)
     • Lock (LCK)
       – Cluster-wide, distributed lock service
       – Can be used to control access to cluster-level shared resources
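The difference between the round-robin and broadcast distribution policies for a message queue group can be shown in a few lines. The sketch below is a plain-Python illustration, not the MSG service API; `QueueGroup` and its `send` method are hypothetical names.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    """Toy message queue group with a distribution policy."""

    def __init__(self, queue_names, policy="round-robin"):
        if not queue_names:
            raise ValueError("a queue group needs at least one queue")
        if policy not in ("round-robin", "broadcast"):
            raise ValueError(f"unknown policy: {policy}")
        self.policy = policy
        self.queues = {name: deque() for name in queue_names}
        self._next = cycle(list(queue_names))

    def send(self, message):
        """Deliver a message according to the policy; return target queues."""
        if self.policy == "broadcast":
            targets = list(self.queues)      # every queue gets a copy
        else:
            targets = [next(self._next)]     # one queue, taken in turn
        for name in targets:
            self.queues[name].append(message)
        return targets


if __name__ == "__main__":
    group = QueueGroup(["q1", "q2"], policy="round-robin")
    for message in ("m1", "m2", "m3"):
        print(message, "->", group.send(message))
```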
  20. Getting Started with OpenSAF
     • OpenSAF Technical Educational Resources
       – Developer Wiki [http://devel.opensaf.org/wiki]
       – OpenSAF Developers blog [http://devel.opensaf.org/blog]
       – OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/]
         • Users [Archive: http://list.opensaf.org/pipermail/users/]
         • Development [Archive: http://list.opensaf.org/pipermail/devel/]
         • Announce [Archive: http://list.opensaf.org/pipermail/announce/]
       – Latest documentation [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz]
       – FAQ [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx]
       – README files in source code repository
     • SA Forum Application Interface Specifications [http://www.saforum.org/Service-Availability-Forum:-Application-Interface-Specification-~217404~16627.htm]
  21. Questions
