This presentation is about the requirements for monitoring and management in a NextGrid context. NextGrid’s main goal is to make the Grid suited to commercial resource and capacity trading between businesses. SLA Management, as known from application and Internet service provisioning, plays a fundamental role in achieving this and is therefore the focus of this talk. Disclaimer: NextGrid just held its first set of kick-off meetings this week, so this is still very preliminary work in progress. To map some of the requirements to a real, deployed Grid system, SGAS and its requirements are presented, with a focus on resource management.
Novel architectures and reference implementations, with the end goal of achieving Grid sustainability and business viability.
The goals of the NextGrid task most relevant to this session, Advanced Deployment, Service Management and Migration in the Grid Foundations Work Package, will be presented. What is an SLA Manager expected to do, and what services does it use? Then a use case today, and future development as part of the NextGrid project. Finally, a summary of some of the NextGrid requirements.
Building blocks for business-viable Grid services, built on top of the OGSA architecture. Deliver a framework and reference implementations. Monitor and manage both resources and services. The goal is autonomous, self-healing systems without central control: highly dynamic systems with high availability goals. Phase 1 is to gather requirements, so this session provides very useful input to that work. Phase 2 is to develop a reference implementation of a framework for QoS-based SLA management. Finally, phase 3 integrates monitoring and management solutions to provide a business-viable solution capable of giving end-to-end QoS guarantees.
What’s involved in SLA management throughout the lifecycle of a contract. SLA construction can be done on the client side or the service side, automatically or manually. The initial offer must be negotiated to meet the needs of both parties: auctions, bidding, policy evaluation, fixed and dynamic terms. Clients broadcast requests for offers; services market themselves, actively seeking clients. Then make sure the guarantees are fulfilled, which may involve estimating penalties and rewards. Finally, pay for usage, archive usage records, and price according to the SLA outcome and rating policies.
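Pricing "according to SLA outcome" can be illustrated with a small rating function. This is a minimal sketch, not a NextGrid design: the `SlaOutcome` record, the single availability guarantee, and the `penalty_rate`/`reward_rate` knobs are all hypothetical stand-ins for a real rating policy.

```python
from dataclasses import dataclass


@dataclass
class SlaOutcome:
    """Hypothetical summary of how one contract was fulfilled."""
    cpu_hours: float              # metered usage taken from usage records
    promised_availability: float  # guarantee agreed in the SLA, e.g. 0.99
    measured_availability: float  # what monitoring actually observed


def rate(outcome: SlaOutcome, price_per_cpu_hour: float,
         penalty_rate: float = 0.5, reward_rate: float = 0.1) -> float:
    """Price the usage, then adjust it by the SLA outcome.

    A missed guarantee refunds `penalty_rate` of the base price; an
    exceeded guarantee adds a `reward_rate` bonus. Both are illustrative.
    """
    base = outcome.cpu_hours * price_per_cpu_hour
    if outcome.measured_availability < outcome.promised_availability:
        return base * (1 - penalty_rate)   # penalty for violating the SLA
    if outcome.measured_availability > outcome.promised_availability:
        return base * (1 + reward_rate)    # reward for over-delivering
    return base                            # guarantee met exactly
```

Keeping the rating policy in a separate function from usage metering mirrors the split between usage recording and price rating in the charging phase.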
Services an SLA Management framework must provide and/or interact with (or outsource to 3rd parties). Work that needs to be done to achieve agile, adaptive, autonomously managed resources. Standard interaction/negotiation protocols, and the ability to enforce agreed terms by dynamically customizing resource properties. Important for adaptive systems is to track and analyse usage and build up a knowledge repository for future optimizations. For policies and contracts to have an effect for end users, traffic needs to be policed/shaped to stop access that violates them; this involves authorization decisions. For NextGrid, which aims to make the Grid viable for B2B, price rating engines are needed to implement various business models. Example: neural networks to introduce learning capability into fuzzy networks. Ontologies are important for workflow engines but also for service discovery/negotiation. The usage repository is quantitative; the knowledge repository is qualitative.
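Policing traffic so that it stays within agreed terms is classically done with a token bucket. The sketch below shows that standard technique, not a NextGrid component; `rate` (sustained allowance per second) and `capacity` (maximum burst) are assumed contract parameters, and time is passed in explicitly to keep the example deterministic.

```python
class TokenBucket:
    """Minimal token-bucket policer.

    rate:     tokens added per second (sustained rate allowed by the contract)
    capacity: maximum number of stored tokens (largest permitted burst)
    """

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity   # start with a full bucket
        self.last = 0.0          # time of the previous request

    def allow(self, now: float, cost: float = 1.0) -> bool:
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True          # within the agreed rate: admit
        return False             # violating access: drop or delay
```

For example, with `TokenBucket(rate=1.0, capacity=2.0)` two back-to-back requests at time 0 are admitted, a third is rejected, and one more is admitted a second later. A shaper would queue rejected requests instead of dropping them.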
SweGrid interconnects 6 HPC centers in Sweden into a Grid serving the scientific community. Resource quotas are given out by a national allocations committee to promising research projects. SGAS, a coordinated accounting system, enforces allocations across the entire Grid. Policy-driven, on-line decisions. Standards-based. GGF-UR transformation and tracking as a basis for service charging. A meta-accounting system: an integration platform rather than a monolithic accounting application. Based on OGSI primitives, with a transition to WS-RF planned.
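Enforcing allocations "across the entire Grid" means usage reported by every center has to be summed per project before it is compared with the quota. A minimal sketch, assuming a drastically simplified `UsageRecord` with hypothetical field names (the real GGF-UR schema is much richer and is not reproduced here):

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class UsageRecord:
    """Hypothetical, simplified usage record (not the GGF-UR schema)."""
    project: str
    site: str         # which HPC center reported the usage
    cpu_hours: float


def usage_per_project(records):
    """Sum usage per project over all sites in the Grid."""
    totals = defaultdict(float)
    for r in records:
        totals[r.project] += r.cpu_hours
    return dict(totals)


def remaining_quota(records, quotas):
    """Quota left per project: allocation minus Grid-wide usage."""
    used = usage_per_project(records)
    return {p: q - used.get(p, 0.0) for p, q in quotas.items()}
```

Aggregating independently of which site produced a record is what lets a project draw on its allocation at any center.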
How we plan to introduce SLA management into the SGAS architecture. The RSL is sent to a metascheduler, which picks a resource to execute the job. The request is intercepted by SGAS, and a reservation is made in the bank corresponding to the cost of running the job and the time it is intended to run. Whether the reservation is allowed depends on resource policies, bank policies, and the user resource specification. A User Negotiation Agent aims to optimize user QoS, and a Resource Negotiation Agent aims to optimize resource utilization.
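The reservation step can be sketched as follows. This is an illustration under stated assumptions, not SGAS code: the `Bank` class, the `request_reservation` helper, the job dictionary, and the cost model (CPUs × requested hours × price) are all hypothetical. The user specification is the job request itself; resource and bank policies are passed in as predicates.

```python
from dataclasses import dataclass, field


@dataclass
class Bank:
    """Hypothetical allocation bank holding balances and reservations."""
    balances: dict
    reserved: dict = field(default_factory=dict)

    def available(self, project):
        return self.balances.get(project, 0.0) - self.reserved.get(project, 0.0)

    def reserve(self, project, amount):
        if amount > self.available(project):
            return False
        self.reserved[project] = self.reserved.get(project, 0.0) + amount
        return True

    def charge(self, project, reserved_amount, actual_amount):
        # After the job: release the reservation, debit actual usage.
        self.reserved[project] -= reserved_amount
        self.balances[project] -= actual_amount


def request_reservation(bank, project, cpus, hours, price_per_cpu_hour,
                        resource_policy, bank_policy):
    """Reserve the job's estimated cost if all three parties agree."""
    cost = cpus * hours * price_per_cpu_hour
    job = {"project": project, "cpus": cpus, "hours": hours, "cost": cost}
    if not resource_policy(job) or not bank_policy(job):
        return None                    # a policy rejected the user's request
    return cost if bank.reserve(project, cost) else None
```

Reserving the estimate up front and settling the actual amount afterwards keeps concurrent jobs at different centers from overdrawing the same allocation.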
Adapting information to its context involves the ability to do SLA-SLS mappings. Event handling for quickly detecting, or actively looking for, contract violations. Virtualization simplifies controlling and adapting resources to dynamically changing run-time environments. A way to restrict or sandbox applications is needed to enforce quotas/allocations.
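Detecting violations quickly (push) and actively looking for them (pull) can share one event channel. A minimal sketch, assuming a hypothetical three-level `Criticality` scale and a `check_sla` helper that compares one measured metric against its agreed limit:

```python
from enum import IntEnum


class Criticality(IntEnum):
    INFO = 0
    WARNING = 1
    VIOLATION = 2


class EventChannel:
    """Delivers events by push (callbacks) and retains them for pull."""

    def __init__(self):
        self._subscribers = []   # (minimum level, callback) pairs
        self._log = []           # all events, kept for polling

    def subscribe(self, callback, min_level=Criticality.INFO):
        self._subscribers.append((min_level, callback))

    def publish(self, level, message):
        self._log.append((level, message))
        for min_level, callback in self._subscribers:
            if level >= min_level:          # filter by criticality
                callback(level, message)

    def poll(self, min_level=Criticality.INFO):
        return [(lvl, msg) for lvl, msg in self._log if lvl >= min_level]


def check_sla(channel, metric, measured, limit):
    """Publish a violation if the measured value exceeds the agreed limit."""
    if measured > limit:
        channel.publish(Criticality.VIOLATION, f"{metric}={measured} exceeds {limit}")
    else:
        channel.publish(Criticality.INFO, f"{metric}={measured} within {limit}")
```

A negotiation agent would subscribe only to `VIOLATION` events to trigger re-negotiation, while an auditing component could poll the full log later.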
Resource administrators (RAs) are conservative: low risk, low intrusion, and user transparency are key. Policies must be introduced in a way that fits each stake-holder. This needs to be done dynamically, through PAPs (Policy Administration Points) or configuration of the running system.
NextGRID SLA Plans
NextGRID Monitoring and Fabric Management Requirements. SLA Management Example: SweGrid Accounting System and Test-bed. Thomas Sandholm, KTH, firstname.lastname@example.org
NextGRID <ul><li>How do we make the Grid sustainable? </li></ul>
Outline <ul><li>NextGrid WP4: Grid Foundations - Advanced Deployment, Service Management and Migration </li></ul><ul><li>SLA Management Lifecycle: Construction, Negotiation, Attainment, Charging </li></ul><ul><li>Towards Adaptive Systems: SLA Manager Bag of Services </li></ul><ul><li>Example Test-bed: SweGrid & SGAS </li></ul><ul><li>Example SLA Usage: SLA Management in SGAS </li></ul><ul><li>Requirements Checklist </li></ul>
NextGRID WP4: Grid Foundations - Advanced Deployment, Service Management and Migration <ul><li>Work Package - Grid Foundations: </li></ul><ul><ul><li>Address basic properties, protocols, and core services of individual OGSA services, e.g., QoS & Manageability – engineer a reference solution </li></ul></ul><ul><li>Task - advanced deployment, service management and migration: </li></ul><ul><ul><li>Requirement: Decentralized automatic control needed over the hardware comprising the Grid fabric as well as the applications and services running on that fabric </li></ul></ul><ul><ul><li>Requirement: Incremental evolution to avoid loss of service </li></ul></ul><ul><ul><li>Focus on autonomous service management and SLA management </li></ul></ul><ul><ul><li>Phase 1: analyse available monitoring and supervision solutions; gather requirements from existing Grid projects, e.g., Framework 5 Projects, GRASP, Android, SweGrid </li></ul></ul><ul><ul><li>Phase 2: develop management framework (SLA + negotiation) </li></ul></ul><ul><ul><li>Phase 3: integrate monitoring and management solution and introduce an intelligent decision-making process </li></ul></ul><ul><li>NG Partners: British Telecom (UK), HLRS (Germany), KTH (Sweden) </li></ul>
SLA Management Lifecycle <ul><li>Construction Phase: offers prepared by service providers (or their agents) with fixed and negotiable terms; service requests with QoS requirements prepared by customers (or their agents) </li></ul><ul><li>Negotiation Phase: negotiation protocol needed to settle on negotiable terms and sign the SLA. SLA-SLS mapping. </li></ul><ul><li>Attainment Phase: monitoring, policing, re-negotiation, re-configuration, obligation fulfillment. </li></ul><ul><li>Charging Phase: accounting, usage recording, auditing, archiving, price rating, billing. </li></ul>
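The four lifecycle phases can be read as a small state machine. This is a sketch under assumptions: the transition table is an interpretation, not something the slide specifies. Re-negotiation during attainment is modeled as a return to negotiation, and a failed negotiation as a return to construction for a new offer.

```python
from enum import Enum


class Phase(Enum):
    CONSTRUCTION = "construction"
    NEGOTIATION = "negotiation"
    ATTAINMENT = "attainment"
    CHARGING = "charging"


# Assumed transition table (see lead-in); CHARGING is terminal.
TRANSITIONS = {
    Phase.CONSTRUCTION: {Phase.NEGOTIATION},
    Phase.NEGOTIATION: {Phase.CONSTRUCTION, Phase.ATTAINMENT},
    Phase.ATTAINMENT: {Phase.NEGOTIATION, Phase.CHARGING},
    Phase.CHARGING: set(),
}


class Sla:
    """Tracks one contract's progress through the lifecycle."""

    def __init__(self):
        self.phase = Phase.CONSTRUCTION
        self.history = [self.phase]

    def advance(self, target):
        if target not in TRANSITIONS[self.phase]:
            raise ValueError(
                f"illegal transition {self.phase.name} -> {target.name}")
        self.phase = target
        self.history.append(target)
```

Rejecting illegal transitions explicitly gives an auditable history, for example, that a contract was re-negotiated once before it was charged.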
Example Test-bed: SweGrid & SGAS <ul><li>Swedish nation-wide computational resource comprising 600 Intel P4s at 6 HPC centers, interconnected by the 10 Gb/s GigaSunet network </li></ul><ul><li>Resources allocated to promising research projects with demanding computational and storage needs by the national allocations committee (SNAC) </li></ul><ul><li>SweGrid Accounting System (SGAS) provides soft real-time allocation enforcement across all centers in the Grid, based on SNAC quotas </li></ul><ul><li>3-party policy-driven resource access (user resource specification, local resource policy, allocation authority policy) </li></ul><ul><li>Standards-based infrastructure: Java Web services, OGSA, WS-Security, GSI, GGF-UR, XACML </li></ul><ul><li>Integration platform for workload managers and local accounting systems/schedulers </li></ul><ul><li>Currently built with GT3 (OGSI), transition to GT4 (WS-RF) next year </li></ul>
Example NextGRID Deliverable Use: SLA Management in SGAS [Architecture diagram. Component labels: Allocation Authority, Reservation Manager, Usage Tracker, Policy Manager, Resource Specification, Policy Manager, Bank, Resource Service, Registration/Discovery Service, Monitor, 3rd Party (ARC/Globus), Remote Execution Service, Negotiation Agent, Negotiation Agent]
Requirements Checklist (incomplete, in random order) <ul><li>Decentralized automatic control needed over the hardware comprising the Grid fabric as well as the applications and services running on that fabric (WP4) </li></ul><ul><li>Incremental evolution to avoid loss of service (WP4) </li></ul><ul><li>Common information models for service level agreements and for the management information that is required to deliver end-to-end application quality (WP3) </li></ul><ul><li>Techniques for adapting the representation of information according to its context (WP3) </li></ul><ul><li>Standardized QoS Ontologies to allow monitoring of predefined SLA parameters with well-defined metrics </li></ul><ul><li>Sensors and Controllers on various levels (e.g. Resource, Workflow) wrapping instrumented code – accessed using standard protocols defined in WSDL </li></ul><ul><li>Registration/Discovery of Sensors and Controllers – using standard protocols defined in WSDL </li></ul><ul><li>Both Push and Pull Event Handling of messages of various criticality (filterable) </li></ul><ul><li>Virtualization of Resources, Abstract Runtime (Hosting) Environments </li></ul><ul><li>Back-end SLS Control: CPU, Bandwidth, Storage, Memory </li></ul><ul><li>Front-end SLA Request: availability, run time, jitter, cost </li></ul>
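An SLA-SLS mapping translates front-end terms (run time, cost) into back-end settings (CPU, memory). The sketch below covers only that subset and ignores availability, jitter, bandwidth, and storage. Every name in it is hypothetical, as are the rules: reserve the full run-time window on the fewest CPUs that finish the work (assuming perfect parallel speed-up), give each CPU a fixed memory share, and reject the request if the resulting cost exceeds the budget.

```python
import math
from dataclasses import dataclass


@dataclass
class SlaRequest:
    """Front-end, user-facing terms (hypothetical)."""
    work_cpu_hours: float   # total work the job needs
    max_run_time_h: float   # agreed run-time limit
    max_cost: float         # budget ceiling


@dataclass
class Sls:
    """Back-end, resource-level settings (hypothetical)."""
    cpus: int
    reserved_hours: float
    memory_gb: float


def map_sla_to_sls(req, price_per_cpu_hour, mem_per_cpu_gb=2.0):
    """Return resource settings meeting the request, or None if infeasible."""
    # Fewest CPUs that finish the work within the run-time limit.
    cpus = math.ceil(req.work_cpu_hours / req.max_run_time_h)
    # Reserve (and charge for) the whole window on those CPUs.
    cost = cpus * req.max_run_time_h * price_per_cpu_hour
    if cost > req.max_cost:
        return None
    return Sls(cpus=cpus, reserved_hours=req.max_run_time_h,
               memory_gb=cpus * mem_per_cpu_gb)
```

Because CPUs are allocated in whole units, rounding up can push the cost over a tight budget, so the user-visible cost term has to be checked after the resource-level terms are fixed.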
Example Test-bed Experience: SGAS Resource Administrator Interaction and Policy Introduction <ul><li>Involve RAs early in the process with surveys </li></ul><ul><li>Feedback from the running system is crucial to move from prototype to production </li></ul><ul><li>Use a phased low-risk, low-intrusion deployment approach </li></ul><ul><li>Allow all stake-holders (e.g. RAs, users, resource owners) to customize local policies easily through XML-document-centric configurations and transformations, e.g. RSL, XACML, and GGF-UR style sheets. Provide sensible defaults. </li></ul>