OpenSAF Symposium - Intro to OpenSAF_9.13.11

Systems that meet stringent service availability (SA) and high availability (HA) requirements have been around for decades, but diverse segments use varied terminology to describe the same concepts. This session will provide a high-level technical overview of the Service Availability Forum standards and the support of those standards within OpenSAF, allowing those familiar with HA concepts to map their terminology to SA Forum and OpenSAF terminology.

The session will also help those relatively new to OpenSAF or the HA domain to familiarize themselves with the terms and concepts. This session will lay the technical foundation for the remainder of the symposium so that attendees get the most out of the more detailed presentations that follow.

OpenSAF involves a number of complex ideas and is designed to work in many different environments. To make it easy for new users to get started, we will also detail the options they have for learning about OpenSAF, the relevant environments for using the code base, and ways to interact with the community.

Published in Technology, Business
Transcript

  • 1. Introduction to OpenSAF
      David Fick, Senior Software Architect, GoAhead Software
  • 2. Introduction to OpenSAF
      • Service availability and high availability systems and concepts have been around for decades
      • However, HA terminology tends to vary from industry to industry and company to company
      • Goals of this session:
          – High-level technical overview of the Service Availability™ Forum standards
          – Overview of the support of those standards within OpenSAF
          – Allow you to:
              • Familiarize yourself with general HA concepts and terminology, OR
              • Map the HA concepts and terminology with which you are familiar to the SA Forum and OpenSAF versions
          – Resources for getting started with OpenSAF
  • 3. SA Forum Interfaces: AIS & HPI
      [Diagram: layered stack, listed top to bottom]
      • Applications
      • Application Interface Specifications (AIS)
      • Service Availability Middleware, including System Management (SA Forum standards implemented by OpenSAF). Services shown:
          – Software Mgmt Framework (SMF)
          – Availability Management Framework (AMF)
          – Information Model Mgmt (IMM)
          – Cluster Membership (CLM)
          – Notification (NTF)
          – Log (LOG)
          – Platform Mgmt (PLM)
          – Lock (LCK)
          – Checkpoint (CKPT)
          – Event (EVT)
          – Message (MSG)
      • Operating System
      • Virtualization
      • Hardware Platform Interface (HPI)
      • Hardware Platform A, Hardware Platform B, Hardware Platform C, Hardware Platform D
  • 4. But how to make sense of the SA Forum “acronym soup”?
  • 5. AIS Service Groupings
      • First, understand that the AIS services fall into three logical groupings*:
          – System Management Services
              • Information Model Mgmt (IMM)
              • Software Mgmt Framework (SMF)
              • Notification (NTF)
              • Log (LOG)
              • Services that manage central system capabilities commonly used by both AIS services and applications
          – Resource Availability Management Services
              • Availability Management Framework (AMF)
              • Cluster Membership (CLM)
              • Platform Mgmt (PLM)
              • Services that manage and monitor the state of key system resources that affect availability: hardware / operating system, cluster nodes, applications
          – Application Services
              • Checkpoint (CKPT)
              • Event (EVT)
              • Message (MSG)
              • Lock (LCK)
              • Optional services to support application operations such as inter-process communication, state replication, and shared resource access control
      * Not official SA Forum AIS service groupings
  • 6. Fault Management Cycle
      • Second, AIS services that manage availability are designed around a standard fault management cycle
          – Detection
              • E.g. component healthchecks
          – Isolation
              • E.g. blade power off
          – Recovery
              • E.g. failover of workload assignments to associated standby resources
          – Repair
              • E.g. automatic restart of failed resource
          – Notification
              • E.g. state change notifications sent by service managing the resource
      [Diagram: cycle of Detection, Isolation, Recovery, Repair, Notification]
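The cycle on this slide can be read as an ordered sequence of phases. The following is a minimal sketch in Python, not OpenSAF code: the `Phase` enum, the `handle_fault` function, and the action strings are hypothetical names chosen for illustration, and the phases are simply taken in the order the slide lists them.

```python
from enum import Enum


class Phase(Enum):
    """Phases of the fault management cycle, in the order listed on the slide."""
    DETECTION = "detection"
    ISOLATION = "isolation"
    RECOVERY = "recovery"
    REPAIR = "repair"
    NOTIFICATION = "notification"


def handle_fault(resource: str) -> list[str]:
    """Walk one failed resource through every phase and return an audit trail."""
    # Hypothetical actions, loosely following the slide's examples.
    actions = {
        Phase.DETECTION: "healthcheck failed",
        Phase.ISOLATION: "resource fenced off",
        Phase.RECOVERY: "workload failed over to standby",
        Phase.REPAIR: "failed resource restarted",
        Phase.NOTIFICATION: "state change notification sent",
    }
    # Enum members iterate in definition order, so the trail follows the cycle.
    return [f"{resource}: {phase.value}: {actions[phase]}" for phase in Phase]
```

Calling `handle_fault("comp-1")` returns five entries, one per phase, starting with detection.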
  • 7. Resource Dependencies
      • Third, Availability Management in the AIS world is driven by a detailed understanding of the availability management dependencies across all resource types
          – Managed Applications
              • Simple to complex dependencies and relationships can be modeled between the various software elements
              • Dependency on a particular node also modeled
          – AMF Node
              • Represents a node where AMF services are provided
              • Depends on a CLM node
          – CLM Node
              • Represents a cluster node where AIS services are provided
              • Depends on an Execution Environment (optional)
          – Platform Resource
              • Containment and logical dependencies represented between platform resources
              • Execution Environment (EE)
                  – Represents an operating system instance (standalone or virtual)
              • Hardware Element (HE)
                  – Represents a physical hardware resource in the system
      [Diagram: Managed Applications, AMF Node, CLM Node, Platform Resource (Hardware Element, Execution Environment)]
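One way to picture this dependency chain is as a lookup table that is walked from an application down to the hardware. The sketch below is illustrative only: the resource names and the `DEPENDS_ON` table are hypothetical, and the link from the execution environment to a hardware element is an assumption made for the example rather than something the slide states outright.

```python
# Hypothetical resource names; each maps to the resource it depends on.
DEPENDS_ON = {
    "app-component": "amf-node-1",
    "amf-node-1": "clm-node-1",
    "clm-node-1": "ee-linux-1",   # execution environment (OS instance)
    "ee-linux-1": "he-blade-1",   # hardware element (assumed link)
    "he-blade-1": None,           # bottom of the chain
}


def dependency_chain(resource: str) -> list[str]:
    """Return the resource followed by everything it transitively depends on."""
    chain = []
    current = resource
    while current is not None:
        chain.append(current)
        current = DEPENDS_ON[current]
    return chain


def affected_by(failed: str) -> set[str]:
    """Return every resource whose dependency chain contains the failed one."""
    return {r for r in DEPENDS_ON if failed in dependency_chain(r)}
```

With this table, a failure of `ee-linux-1` affects the execution environment itself plus the CLM node, the AMF node, and the application component above it, but not the blade underneath.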
  • 8. Common Design Patterns
      • Fourth, the AIS services follow common design patterns:
          – API
              • Common library lifecycle
              • Naming conventions
          – Resource managed by service (managed object)
              • Typically with associated state model
              • Managed objects stored in common information model
          – Administrative operations
              • X.731 style administrative operations for resources which affect availability
          – Notifications automatically generated by AIS services for significant system events (alarms, state changes, etc.)
  • 9. Resource Availability Management Services
      • Availability Management Framework (AMF)
          – Manages the lifecycle and monitors the state of the managed applications within the system
          – More detail in upcoming slides
      • Cluster Membership (CLM)
          – Provides cluster membership change notifications to AIS services and interested applications
          – OpenSAF CLM implements cluster management protocol dealing with:
              • Cluster formation
              • Active controller selection & failover
              • Node failure detection
      • Platform Management (PLM)
          – Manages the state of modeled hardware elements and execution environments (operating system instances)
          – Hardware element states and events accessed through Hardware Platform Interface (HPI)
          – Manages graceful blade extraction / de-activation cases
          – Supports hardware element controls (power on/off and reset)
          – Optional service within OpenSAF
  • 10. Availability Management Framework (AMF): AMF Logical Entities
      • Structural Entities
          – AMF Application
              • Represents the highest-level service(s) provided by the system
          – Service Group (SG)
              • Represents a group of like logical resources that provide the same service(s)
              • Associated redundancy model (e.g. 1+1)
          – Service Unit (SU)
              • Aggregates a set of resources which when combined provide a higher-level service
          – Component
              • Represents one or more resources that perform a function within the system
      [Diagram: AMF Application 1..* Service Group 1..* Service Unit 1..* Component]
  • 11. Availability Management Framework (AMF): AMF Logical Entities
      • Workload Entities
          – Service Instance (SI)
              • Represents a workload to be supported by the system
              • Has associated redundancy requirements (1+1, N+M, etc.)
              • Protected by an identified SG
              • Assigned to one or more SUs with an HA state of active, standby, quiescing or quiesced
          – Component Service Instance (CSI)
              • Represents a more granular workload that needs to be supported by the system
              • Assigned to one or more components
      [Diagram: AMF Application 1..* Service Group 1..* Service Unit 1..* Component; Service Instance protected by Service Group and assigned to Service Units; Component Service Instance assigned to Components]
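The structural and workload entities from the two slides above can be sketched as plain data classes. This is an illustrative model only and does not use the AMF API: the class and field names are hypothetical, and component service instances (CSIs) are left out to keep the example short.

```python
from dataclasses import dataclass, field

HA_STATES = {"active", "standby", "quiescing", "quiesced"}


@dataclass
class Component:
    name: str


@dataclass
class ServiceUnit:
    name: str
    components: list[Component] = field(default_factory=list)


@dataclass
class ServiceGroup:
    name: str
    redundancy_model: str  # e.g. "2N"
    service_units: list[ServiceUnit] = field(default_factory=list)


@dataclass
class ServiceInstance:
    name: str
    protected_by: ServiceGroup
    # Maps SU name -> HA state for this workload.
    assignments: dict[str, str] = field(default_factory=dict)

    def assign(self, su: ServiceUnit, ha_state: str) -> None:
        """Assign this SI to an SU of the protecting SG with the given HA state."""
        if su not in self.protected_by.service_units:
            raise ValueError(f"{su.name} is not in {self.protected_by.name}")
        if ha_state not in HA_STATES:
            raise ValueError(f"unknown HA state: {ha_state}")
        self.assignments[su.name] = ha_state
```

The `assign` check mirrors the slide's rule that an SI is protected by one identified SG: a service unit from outside that group is rejected.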
  • 12. Availability Management Framework (AMF): AMF Logical Entities
      • Common Characteristics
          – Well-defined state model for each logical entity type
              • Operational
              • Administrative
              • Etc.
          – X.731 style administrative operations
              • Lock
              • Unlock
              • Shutdown
              • Etc.
      • Common AMF Component Types
          – SA-aware
          – Non-proxied, non-SA-aware
          – Proxied, non-SA-aware
      [Diagram: SA-aware Component Example. AMF performs lifecycle mgmt through CLC-CLI scripts and HA state assignment through the AMF library in the comp process]
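The lock, unlock, and shutdown operations named on this slide can be pictured as a small state machine over an administrative state. The sketch below is a simplified assumption for illustration: it models only three states, and the transition table is not taken from the slides or from the AMF specification, which defines more states and operations.

```python
from enum import Enum


class AdminState(Enum):
    UNLOCKED = "unlocked"
    SHUTTING_DOWN = "shutting-down"
    LOCKED = "locked"


# (current state, operation) -> next state. A simplified, assumed table.
TRANSITIONS = {
    (AdminState.UNLOCKED, "lock"): AdminState.LOCKED,
    (AdminState.UNLOCKED, "shutdown"): AdminState.SHUTTING_DOWN,
    (AdminState.SHUTTING_DOWN, "lock"): AdminState.LOCKED,
    (AdminState.LOCKED, "unlock"): AdminState.UNLOCKED,
}


def apply_operation(state: AdminState, operation: str) -> AdminState:
    """Return the next administrative state, or raise if the operation is invalid."""
    try:
        return TRANSITIONS[(state, operation)]
    except KeyError:
        raise ValueError(
            f"'{operation}' is not allowed in state {state.value}"
        ) from None
```

Keeping the transitions in one table makes invalid requests, such as shutting down an already locked entity, fail explicitly rather than silently.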
  • 13. Availability Management Framework (AMF): Service Group Redundancy Models
      • Key redundancy model characteristics
          – Preferred SI assignment model
              • # of active resource(s)
              • # of standby resource(s)
          – Allowed concurrent HA state assignments for SUs
          – # of assignable SUs
      • Redundancy model options
          – 2N
              • Most common redundancy model
              • 1 active resource and 1 standby resource per SI
              • SUs can have either all active or all standby SI assignments
          – N+M
          – No Redundancy
          – N-way
          – N-way active
      [Diagram: 2N Service Group Example. SI1 and SI2 are assigned active (A) to SU1 on Node1 and standby (S) to SU2 on Node2]
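The 2N rule on this slide (every SI gets one active and one standby assignment, and an SU carries either all active or all standby assignments) can be sketched as follows. The function names and the choice of the first two SUs as active and standby are assumptions made for the example, not AMF behavior.

```python
def assign_2n(service_instances, service_units):
    """Assign every SI active on the first SU and standby on the second."""
    if len(service_units) < 2:
        raise ValueError("2N needs one active and one standby SU")
    active_su, standby_su = service_units[0], service_units[1]
    return {
        si: {"active": active_su, "standby": standby_su}
        for si in service_instances
    }


def fail_over(assignments, failed_su):
    """Return new assignments after an SU fails, promoting standbys as needed."""
    result = {}
    for si, roles in assignments.items():
        if roles["active"] == failed_su:
            # The standby takes over; no standby remains until repair.
            result[si] = {"active": roles["standby"], "standby": None}
        elif roles["standby"] == failed_su:
            result[si] = {"active": roles["active"], "standby": None}
        else:
            result[si] = dict(roles)
    return result
```

Matching the slide's example, `assign_2n(["SI1", "SI2"], ["SU1", "SU2"])` makes SU1 active and SU2 standby for both SIs, and a failure of SU1 moves both active assignments to SU2.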
  • 14. Availability Management Framework (AMF): Error Recovery Policies
      • Pre-defined AMF component error recovery policies
          – Configurable
          – Can be overridden at runtime
      • Up to 3 actions per policy
          – Isolation
          – Recovery
          – Repair
      • Recovery policy scopes
          – Component
          – Service Unit
          – Node
      • Recovery policy types
          – Restart
          – Failover
          – Failfast
      • Recovery escalation policies
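Recovery escalation can be pictured as a ladder: when a narrow-scope recovery keeps failing, the next error is handled at a wider scope. The sketch below is hypothetical throughout. The ladder entries combine the scopes and types from the slide for illustration, and the `max_attempts_per_level` threshold is an invented parameter, not an AMF configuration attribute.

```python
# Escalation ladder, from narrowest to widest scope. The ordering and the
# threshold below are illustrative assumptions, not values from the slides.
LADDER = [
    "component restart",
    "service unit restart",
    "service unit failover",
    "node failover",
]


class EscalationTracker:
    def __init__(self, max_attempts_per_level: int = 2):
        self.max_attempts = max_attempts_per_level
        self.level = 0      # index into LADDER
        self.attempts = 0   # recoveries already tried at the current level

    def next_action(self) -> str:
        """Return the recovery action for the next error report."""
        at_top = self.level == len(LADDER) - 1
        if self.attempts >= self.max_attempts and not at_top:
            # The current level is exhausted: escalate to a wider scope.
            self.level += 1
            self.attempts = 0
        self.attempts += 1
        return LADDER[self.level]
```

With the default threshold, two errors are handled by component restarts, the next two by service unit restarts, and so on until the ladder tops out at node failover.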
  • 15. System Management Services: Information Model Management (IMM)
      • Information Model Highlights
          – Based on pre-defined object classes (including AIS classes)
          – Holds both configuration and runtime objects
          – Used by AIS services to store current configuration and runtime state info
          – Can be used by applications as well
      • Object Management API
          – Object class management
          – Access object attribute values
          – Search information model
          – Configuration change requests
          – Administrative operation invocation
      • Object Implementer API
          – Runtime object management
          – CCB validation and application
          – Administrative operation handling
      • OpenSAF Implementation
          – Persistence of information model managed through Persistence BackEnd (PBE) feature
          – Replicated to multiple cluster nodes
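The split between configuration change requests (Object Management API) and CCB validation and application (Object Implementer API) can be sketched as a bundle of changes that is validated first and then applied all-or-nothing. This is an in-memory illustration, not the IMM API: the `ConfigChangeBundle` class, its methods, and the `numSUs` attribute used in the usage note are hypothetical.

```python
class ConfigChangeBundle:
    """Collects attribute changes and applies them all-or-nothing."""

    def __init__(self, model: dict, validators: list):
        self.model = model            # object name -> {attribute: value}
        self.validators = validators  # callables playing the implementer role
        self.pending = []             # (object name, attribute, new value)

    def modify(self, obj: str, attr: str, value) -> None:
        """Queue a change; nothing touches the model until apply()."""
        self.pending.append((obj, attr, value))

    def apply(self) -> bool:
        """Validate every pending change; commit only if all are accepted."""
        for change in self.pending:
            if not all(validate(*change) for validate in self.validators):
                self.pending.clear()
                return False  # rejected: the model is left untouched
        for obj, attr, value in self.pending:
            self.model.setdefault(obj, {})[attr] = value
        self.pending.clear()
        return True
```

For example, with a validator that only accepts positive values, a bundle that sets `numSUs` to 3 on one object and to -1 on another is rejected as a whole, and the first object keeps its old value.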
  • 16. System Management Services: Software Management Framework (SMF)
      • SMF controls migration from one deployment configuration to another
      • Upgrade methods
          – Rolling upgrade
          – Single step upgrade
      • [De-]Activation Unit Scope
          – AMF Node
          – Service Unit
      • During the migration SMF
          – Maintains the campaign state change model
          – Takes measures to enable error recovery
          – Monitors for potential errors caused by the migration
          – Deploys error recovery procedures
      [Diagram: the Software Management Framework takes an Upgrade Campaign Definition ("Upgrade Instructions") and adaptation commands (SMF config object), installs / removes software bundles from the Software Repository on target nodes, and performs admin operations and read/create/delete/update of objects in the Information Model]
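A rolling upgrade with error recovery can be sketched as a loop that upgrades one activation unit at a time and rolls back on the first failure. The sketch is an assumption-laden illustration, not SMF campaign logic: `upgrade` and `rollback` are hypothetical callbacks supplied by the caller, and the lock/unlock log entries only hint at taking a unit out of service around the upgrade.

```python
def rolling_upgrade(nodes, upgrade, rollback):
    """Upgrade nodes one at a time; undo everything if any node fails.

    `upgrade(node)` returns True on success; `rollback(node)` reverts a node.
    Returns (succeeded, log) where log lists the steps taken in order.
    """
    log = []
    done = []
    for node in nodes:
        log.append(f"lock {node}")  # take the activation unit out of service
        if not upgrade(node):
            log.append(f"upgrade failed on {node}")
            # Revert the failed node first, then earlier nodes, newest first.
            for previous in reversed(done + [node]):
                rollback(previous)
                log.append(f"rollback {previous}")
            return False, log
        log.append(f"unlock {node}")  # put it back in service
        done.append(node)
    return True, log
```

Because only one node is locked at any moment, the remaining nodes can keep providing service while the upgrade proceeds, which is the point of the rolling method.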
  • 17. System Management Services
      • Notification (NTF)
          – Publish-and-subscribe semantics for system-level notifications
          – Syntax and semantics for ITU X.73x notifications:
              • Alarm / security alarm / state change / object create / delete / attribute change
          – Alarm and security alarm notifications automatically logged through LOG service
      • Log (LOG)
          – Flexible, centralized, system-wide logging mechanism
          – Pre-defined log streams: alarm, notification, system
          – Multiple, custom application log streams allowed
          – Configurable log stream characteristics including:
              • log file full action: halt, wrap, and rotate
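The three log file full actions can be illustrated with an in-memory stream that holds a fixed number of records per file. This is a simplified sketch under stated assumptions, not the LOG service API: the `LogStream` class is hypothetical, a "file" is just a Python list, and the interpretation of each action (halt refuses new records, wrap overwrites the oldest, rotate starts a new file) is a plain reading of the names.

```python
class LogStream:
    def __init__(self, max_records: int, full_action: str):
        if full_action not in {"halt", "wrap", "rotate"}:
            raise ValueError(f"unknown full action: {full_action}")
        self.max_records = max_records
        self.full_action = full_action
        self.current = []        # records in the file being written
        self.closed_files = []   # files closed by rotation

    def write(self, record: str) -> bool:
        """Append a record; return False if the stream refused it."""
        if len(self.current) >= self.max_records:
            if self.full_action == "halt":
                return False                 # stop logging when full
            if self.full_action == "wrap":
                self.current.pop(0)          # overwrite the oldest record
            else:                            # rotate
                self.closed_files.append(self.current)
                self.current = []            # start a fresh file
        self.current.append(record)
        return True
```

Writing three records to a two-record stream shows the difference: halt keeps the first two, wrap keeps the last two, and rotate keeps all three across two files.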
  • 18. Application Services
      • Checkpoint (CKPT)
          – Intended as a state replication mechanism for distributed applications
          – Can be used for all standby “temperature levels”
              • Cold
              • Warm
              • Hot
                  – Through OpenSAF CKPT service API extension
          – Semantics of a checkpoint
              • Arbitrary set of sections containing opaque data
              • Stored in one or more replicas distributed across cluster
              • Reads and writes occur against the active replica
          – Both synchronous and asynchronous replication options available
          – Collocated checkpoint option provided for highest performance
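The checkpoint semantics on this slide (sections of opaque data, several replicas, reads and writes against the active replica, synchronous or asynchronous replication) can be sketched in memory as follows. The `Checkpoint` class and its methods are hypothetical and do not correspond to the CKPT API; choosing the first node as the active replica and the explicit `flush` step are assumptions made for the example.

```python
class Checkpoint:
    def __init__(self, replica_nodes, synchronous=True):
        # node name -> {section id: opaque bytes}
        self.replicas = {node: {} for node in replica_nodes}
        self.active = replica_nodes[0]   # assumed: first node holds the active replica
        self.synchronous = synchronous
        self._backlog = []               # writes not yet copied to other replicas

    def write(self, section_id: str, data: bytes) -> None:
        """Write to the active replica, then replicate now or later."""
        self.replicas[self.active][section_id] = data
        if self.synchronous:
            self._propagate(section_id, data)
        else:
            self._backlog.append((section_id, data))

    def read(self, section_id: str) -> bytes:
        """Reads always go to the active replica."""
        return self.replicas[self.active][section_id]

    def flush(self) -> None:
        """Copy any backlog of asynchronous writes to the other replicas."""
        for section_id, data in self._backlog:
            self._propagate(section_id, data)
        self._backlog.clear()

    def _propagate(self, section_id: str, data: bytes) -> None:
        for node, sections in self.replicas.items():
            if node != self.active:
                sections[section_id] = data
```

The trade-off is visible in the code: a synchronous write returns only after every replica is updated, while an asynchronous write returns immediately and leaves the standby replicas briefly behind.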
  • 19. Application Services
      • Event (EVT)
          – Publish-and-subscribe communication paradigm
          – Flexible event channel, pattern, and filtering definition
          – Subscriber event queue maintained within app process
      • Message (MSG)
          – Messages sent to and read from message queues
          – Single message queue owner at a time
          – Message queue maintained outside app process
          – Message queues can be logically grouped
              • Messages can be sent to a message queue group
              • Associated distribution policy (round-robin, broadcast, etc.)
      • Lock (LCK)
          – Cluster-wide, distributed lock service
          – Can be used to control access to cluster-level shared resources
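The distribution policies mentioned for message queue groups can be illustrated with a small in-memory model. The `QueueGroup` class is hypothetical and is not the MSG API; it covers only the two policies the slide names, round-robin and broadcast.

```python
from collections import deque
from itertools import cycle


class QueueGroup:
    def __init__(self, queue_names, policy="round-robin"):
        if policy not in {"round-robin", "broadcast"}:
            raise ValueError(f"unknown policy: {policy}")
        self.queues = {name: deque() for name in queue_names}
        self.policy = policy
        self._next = cycle(list(queue_names))  # endless round-robin order

    def send(self, message):
        """Deliver a message per the group policy; return the target queues."""
        if self.policy == "broadcast":
            targets = list(self.queues)      # every queue gets a copy
        else:
            targets = [next(self._next)]     # exactly one queue, in turn
        for name in targets:
            self.queues[name].append(message)
        return targets
```

With two queues and round-robin, messages `a`, `b`, `c` land on the first, second, and first queue in turn; with broadcast, each message is appended to both.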
  • 20. Getting Started with OpenSAF
      • OpenSAF Technical Educational Resources
          – Developer Wiki [http://devel.opensaf.org/wiki]
          – OpenSAF Developers blog [http://devel.opensaf.org/blog]
          – OpenSAF mailing lists [Subscribe: http://list.opensaf.org/maillist/listinfo/]
              • Users [Archive: http://list.opensaf.org/pipermail/users/]
              • Development [Archive: http://list.opensaf.org/pipermail/devel/]
              • Announce [Archive: http://list.opensaf.org/pipermail/announce/]
          – Latest documentation [http://devel.opensaf.org/hg/opensaf-4.x-documentation/archive/tip.tar.gz]
          – FAQ [http://www.opensaf.org/HOA/assn14944/images/FREQUENTLY%20ASKED%20QUESTIONS%20ABOUT%20OPENSAF%20RELEASE%204%20Final%20for%20publication.docx]
          – README files in source code repository
      • SA Forum Application Interface Specifications [http://www.saforum.org/Service-Availability-Forum:-Application-Interface-Specification-~217404~16627.htm]
  • 21. Questions