G-Cloud Programme vision UK - technical architectureworkstrand-report t8

Data Centre Strategy, G-Cloud and Applications Store Programme Phase 2

DATA CENTRE MIGRATION, G-CLOUD AND APPLICATIONS STORE PROGRAMME PHASE 2
Technical Architecture Workstrand Report
5th May 2010
Version 1.5

G-Cloud Business Sponsor: Mark Ferrar, Director of Technology Strategy, Department of Health Informatics Directorate
Work Strand Lead: Miles Gray, Hardware Platform Architect, Department of Health Informatics Directorate
Industry Co-Lead: Kate Craig-Wood, Managing Director, Memset Ltd

Contents
1. Introduction
  1.1. Related Documents
  1.2. Key Assumptions
  1.3. Scope
  1.4. Objective
  1.5. Key stakeholders
2. Definitions from a technical perspective
  2.1. Contextual definitions
  2.2. Authoritative definitions within the G-Cloud programme
  2.3. Service layers: Infrastructure, Platform and Application
  2.4. US National Institute of Standards and Technology (NIST) definitions
  2.5. Application Workloads
3. Architectural Principles
  3.1. G-Cloud Technical Architecture
  3.2. Applications developed for the G-Cloud
4. High Level Logical Architecture
  4.1. Context
  4.2. Component descriptions
    4.2.1. Applications Store for Government (ASG)
    4.2.2. G-Cloud Services Interchange (GC-SI)
    4.2.3. Certified Components Repository (CCR)
    4.2.4. Data services
    4.2.5. Monitoring Services
    4.2.6. Public Sector Network Service Information Monitor (PSN SIM)
5. Technologies & market review
  5.1. Availability and usability of existing Cloud services
6. Proposed supplier and service certifications
  6.1. Service monitoring
  6.2. Infrastructure services certification
  6.3. Platform services certification
  6.4. Software developer services certification
  6.5. Service integration / aggregation / management certification
  6.6. Information assurance
  6.7. Supplier Certification and Impact Levels matrix
7. Utility Computing
  7.1. Units of Utility Computing (IaaS) resource specification
  7.2. Elasticity and “burstability” of resources
    7.2.1. Background
    7.2.2. Definitions for the purposes of this document
    7.2.3. Discussion and existing examples
    7.2.4. Illustrative example of different capacity types in the G-Cloud
  7.3. Utility / IaaS open specifications / standards recommendation
  7.4. Batch vs. Real-time Workloads
8. Interoperability between suppliers
  8.1. Workload interoperability and migration
  8.2. Workload scalability
  8.3. Data abstraction, sharing and interoperability
9. Data Centre Migration
  9.1. Data Centre Efficiency
  9.2. EU Code of Conduct for Data Centre Operations
  9.3. Consolidation and migration of existing services

08 FINAL _G-CLOUD_ Technical Architecture Workstrand Report _version v1.5_.doc UNCLASSIFIED

1. Introduction
This technical architecture provides a common foundation of software and hardware infrastructure principles for multiple business applications. The technical architecture provides the framework for interfaces, protocols, standards and products to be used in defining a platform that supports applications across UK public sector organisations.
This document provides a high-level overview of the proposed technical architecture for Government Cloud (G-Cloud), Data Centre Consolidation (DCC) and the Application Store for Government (ASG).

1.1. Related Documents
  • Government ICT Strategy
  • Open Source, Open Standards and Re-use: Government Action Plan
  • Work Strand reports from: Information Assurance; Commercial; Service Management
  • Greening Government ICT CIO and CTO Workbook
  • European Union Code of Conduct for Data Centre Operations

1.2. Key Assumptions
  • Adequate solutions can be architected within the necessary Information Governance and Security framework.
  • Suppliers are able to provide interoperable products and services that function within necessary Information Governance requirements at a cost that is attractive to the public sector.
  • Technology exists to meet all public sector Software, Platform and Infrastructure needs that can be configured as a suitable and acceptable service to the public sector.
  • Appropriate commercial models can be defined, agreed and set in place within the UK and EU procurement legislation to allow services to be bought by public sector organisations.
  • Cloud interoperability will increase.

1.3. Scope
In Scope:
  • The hardware and software components required to implement and support the G-Cloud, Data Centre Consolidation and the Applications Store.
Out of Scope:
  • Network connectivity between data centres. Responsibility hands over to the Public Sector Network (PSN) at (and including) the Customer Premises Equipment (CPE) router(s) that terminate a PSN network connection.
  • Data centre to user network connectivity (PSN's scope).
  • The desktop strategy and client-side aspects of applications and services (Desktop Strategy scope).
  • Detailed technical specifications of individual system elements.

1.4. Objective
The objective of this Technical Strategy is to define a technical architecture for Government Cloud, Data Centre consolidation and the Government Application Store that reflects current best practice in the industry, reasonably foreseeable future developments and the UK public sector's unique blend of requirements.

1.5. Key stakeholders
Stakeholder support for the strategy and approach is a critical success factor. Our stakeholders include:
  • The UK public sector CIO Council.
  • The UK public sector CTO Council.
  • The Public Sector Networks (PSN) programme.
  • All public sector organisations in the UK, including, but not limited to, Central Government Departments, Local Government Authorities, Non-Departmental Public Bodies (NDPB) and any other organisation within the definition of Contracting Authority within the UK.
  • The IT supply-side industry of product and service providers.
  • The Programme Team.

2. Definitions from a technical perspective
2.1. Contextual definitions
The following definitions are taken from the Government ICT Strategy. The concepts are explored in more detail in subsequent sections of this document.
  • Government Cloud or G-Cloud is an internet-based ICT infrastructure that enables public bodies to host, select and use ICT systems from a secure, resilient and cost-effective service environment.
  • Data Centre Rationalisation is the reduction in the number of data centres owned or used to host government application services from current (2009) levels to provisionally 10 or 12 highly resilient, secure data centres. something very much smaller, including significantly increasing utilisation of assets within the data centres and reducing environmental impact whilst not compromising the service or data integrity in the process.
  • The Application Store for Government (ASG) is the gateway to sharing and reuse of online business applications, services and components between public sector organisations.

2.2. Authoritative definitions within the G-Cloud programme
The following definitions have been used in the creation of this document:
  • Key words for use in RFCs to Indicate Requirement Levels: as defined in RFC 2119 (http://tools.ietf.org/html/rfc2119).
The following definitions are for the purposes of this document and phase 2 of the G-Cloud programme:
  • Utility Computing is the packaging of Computing Resources, such as computation and storage, as a metered service similar to a traditional service utility (such as electricity, water, natural gas, or telephone network).
  • Public Cloud means Utility Computing that is available to individuals, public and private sector organisations. Public Cloud is often non-geographically specific and can be accessed wherever there is an Internet connection.
  • Private Cloud means a Utility Computing infrastructure exclusively for the use of one organisation or community.
  • Hybrid Cloud means a combination of Public and Private Clouds, both remaining separate entities, but with Workload able to migrate between them.
  • Computing Resource refers to computer or server infrastructure resources, which include Processing, Storage and Network, described in more detail in section 7 of this document.
  • Workload refers to any service or software application which makes use of Computing Resources.
  • Burst Computing Resources automatically expand and contract in response to changes in application workload (see Elasticity and Burstability section).
  • Elastic resources must be requested by the user, operator or application (see Elasticity and Burstability section). “Elastic” differs from burst in that the application or user must request the additional resources, for example via an Application Programming Interface (API).
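The difference between the Elastic and Burst definitions can be illustrated with a minimal Python sketch. The `ResourcePool` class, its method names and the unit counts are hypothetical and exist only for illustration; the report does not define any such API.

```python
from dataclasses import dataclass


@dataclass
class ResourcePool:
    """Hypothetical allocation of Computing Resource units to one Workload."""
    allocated_units: int
    max_units: int

    def request_units(self, extra: int) -> int:
        """Elastic: the user, operator or application asks explicitly (e.g. via an API)."""
        granted = min(extra, self.max_units - self.allocated_units)
        self.allocated_units += granted
        return granted

    def auto_adjust(self, demand_units: int) -> int:
        """Burst: the allocation follows demand automatically, with no request made."""
        # Assumption: at least one unit stays allocated while the Workload exists.
        self.allocated_units = max(1, min(demand_units, self.max_units))
        return self.allocated_units


pool = ResourcePool(allocated_units=2, max_units=8)
pool.request_units(4)   # elastic growth, initiated by the caller
pool.auto_adjust(3)     # burst contraction, initiated by the platform
```

In both cases the supplier caps the allocation at `max_units`; what differs is only who initiates the change.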

2.3. Service layers: Infrastructure, Platform and Application
The following diagram illustrates the components of the technology stack that are referred to in the remainder of this document.
Figure 1: Service layers description
Figure 1 Note: * Assumed to incorporate subordinate layers.
The following sections are presented under the assumption that there shall exist a set of services known collectively as the G-Cloud and that they comprise Utility Computing services available to the UK public sector.
It is assumed that "as a service" means all services within the definition are fully integrated up to and including the respective level, thus incorporating any sub-levels. Therefore, Software as a Service (SaaS) providers could either sub-contract to a Platform as a Service (PaaS) provider, or would incorporate the PaaS themselves and provide it as part of the SaaS "stack". In turn the Infrastructure as a Service (IaaS) could be sub-contracted or incorporated. The customer would see an integrated service.
A better name for SaaS might be "Application as a Service", with "Application" plus "Platform" being synonymous with "Workload". However, these definitions are now recognised and widely used across the IT industry, so their use shall continue here.

2.4. US National Institute of Standards and Technology (NIST) definitions
The American NIST recently published a definition of Cloud Computing which has rapidly gained wide acceptance among the consultancy community and is regarded by some as the standard for reference. The G-Cloud definitions of Cloud Computing for the most part do not conflict with this definition, and the IaaS, PaaS and SaaS definitions fit very well.
The NIST definition of Cloud Computing describes five essential characteristics of Cloud, three service models and four deployment models. This can be diagrammatically represented as a cube, as shown in figure 2.
Figure 2: NIST Cloud Computing definition (diagrammatic representation)
NIST defines Hybrid Cloud as “a composition of two or more clouds (private, community, or public) that remain unique entities but are bound together by standardized or proprietary technology that enables data and application portability (e.g., cloud bursting for load-balancing between clouds)”. The G-Cloud can therefore be considered as being most closely aligned with the NIST definition for a Hybrid Cloud.

2.5. Application Workloads
A Workload would typically be a software application or service which requires use of the Computing Resources provided from one or more Certified Infrastructure and/or Platform Suppliers (see section 6, “Supplier Certifications”). Workloads would be available via the App Store marketplace, and would normally require a mixture of Computing Resources (Processing, Storage and Network).

3. Architectural Principles
The technical architecture shall be built using the following principles:

3.1. G-Cloud Technical Architecture
The G-Cloud technical architecture shall:
  • Not exclude foreseeable future technologies.
  • Assume implementation through a process of gradual, iterative change.
  • Be provided by suppliers certified for service types to comply with relevant Government Information Governance architecture and standards.
  • Allow for both long- and short-term contracts.
  • Enable rapid provisioning and de-provisioning.
  • Enable the automatic ability to meet peaks in demand.
  • Enable service metering and “pay-as-you-go” pricing.
  • Enable self-service ordering of services.
  • Allow sharing of existing hardware infrastructure where practical.
  • Access the benefits of data centre operations automation.
  • Have an overarching service management framework.
  • Enable and encourage software sharing through improved awareness.
  • Reduce total volume of software licensing through consolidation.
  • Enable better utilisation of hardware resources.
  • Abstract hardware from software (within compatible systems).
  • Be suitable for small, large, static or dynamic Workloads simultaneously.
  • Facilitate improved business value delivery from the underlying technologies.
  • Be approved to protect information and services with an impact of compromise profile of Confidentiality, Integrity, Availability (CIA) matrices (e.g. 224/444).
  • Allow for complete transparency of pricing, infrastructure and services.
  • Allow for a wide range of client devices, including mobiles, laptops, kiosks and desktop personal computers.
  • Evolve over time as new technologies become available.

3.2. Applications developed for the G-Cloud
New applications developed for the G-Cloud shall:
  • Re-use pre-existing components where and whenever possible, in line with the Government Action Plan on Open Source and Open Standards (http://www.cabinetoffice.gov.uk/cio/ict.aspx).
  • Be scalable to any reasonable combination of workloads that may be predicted across the UK public sector.
  • Be created by Certified software developers.
  • Be individually certified to defined information assurance requirements.

4. High Level Logical Architecture
4.1. Context
Figure 3 shows the physical and logical components of the Government Cloud Computing environment, the Government Application Store and their interaction with each other, the PSN and client devices. Information Assurance and Service Management are not shown as they are considered to be ubiquitous.
The intent of figure 3 is to:
  • Describe and map out the physical and logical technical components of the G-Cloud system, as opposed to a map of the conceptual elements.
  • Show how those elements interact with each other and with the Public Sector Network (PSN).
  • Show the boundaries of responsibility of the data centre strategy, PSN strategy and desktop strategy.
Additional notes:
  • Central services such as authentication and mail relay are considered “just another hosted application”.
  • Brand names are used for example purposes only and are not an indication of preference or an intent to purchase under any procurement.
  • Information Assurance and Service Management are not shown since they are ubiquitous; however, data flows are shown.

4.2. Component descriptions
4.2.1. Applications Store for Government (ASG)
The Government Application Store is defined in the Government ICT Strategy as a gateway to sharing and reuse of online business applications, services and components between public sector organisations. In other words, the ASG can be thought of as a portal where commissioners or purchasers of services can browse a catalogue of available services. The ASG will also provide detailed information relating to costs, capacity, service levels and lead times required to get a particular service live.
The ASG will use the G-Cloud Services Interchange, described below, as the data store of available services.
As a key piece of the G-Cloud infrastructure, the ASG must be fully resilient and fault tolerant. In line with the principles outlined in this document, the ASG must not act as either a bottleneck or a single point of failure for users wishing to purchase or provision services from the G-Cloud.

4.2.2. G-Cloud Services Interchange (GC-SI)
Certified Suppliers will publish services to the G-Cloud Services Interchange (GC-SI). The ASG will be the portal to purchase and provision services published on the GC-SI. Provisioning a service via the ASG will act as the trigger for the monitoring service data flows.
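The relationship between the GC-SI, the ASG and the monitoring trigger can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a design from the report: the class names, fields and the `on_provision` hook list are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass(frozen=True)
class ServiceListing:
    """Hypothetical catalogue entry published by a Certified Supplier."""
    service_id: str
    supplier: str
    layer: str  # "IaaS", "PaaS" or "SaaS"


@dataclass
class ServicesInterchange:
    """Stand-in for the GC-SI: the data store of available services."""
    listings: Dict[str, ServiceListing] = field(default_factory=dict)

    def publish(self, listing: ServiceListing) -> None:
        self.listings[listing.service_id] = listing


@dataclass
class ApplicationStore:
    """Stand-in for the ASG: a portal that reads from the interchange."""
    interchange: ServicesInterchange
    # Hypothetical hooks called on provisioning, e.g. to start monitoring data flows.
    on_provision: List[Callable[[ServiceListing, str], None]] = field(default_factory=list)

    def browse(self, layer: str) -> List[ServiceListing]:
        """Return the published services in one service layer."""
        return [s for s in self.interchange.listings.values() if s.layer == layer]

    def provision(self, service_id: str, customer: str) -> ServiceListing:
        """Provision a published service and fire the monitoring trigger(s)."""
        listing = self.interchange.listings[service_id]
        for hook in self.on_provision:
            hook(listing, customer)
        return listing
```

Keeping the catalogue in the interchange rather than in the store means the portal holds no service data of its own, which is one way to avoid the ASG becoming a single point of failure.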

4.2.3. Certified Components Repository (CCR)
Another element tightly linked to the ASG (via an API, or possibly as part of the same system) will be the repository of software, database schemas and virtual machine images certified for use in the G-Cloud. These components would be made available for selection via the ASG, from where they can be deployed onto the G-Cloud. This process could be automated. For example, the user or service integrator would request the desired infrastructure or platform via the ASG, which would then interface with the CCR to supply it with the information necessary (for example an IP address or an SSH key) to deploy the component onto the newly instantiated infrastructure or platform.

4.2.4. Data services
Section 8.3 discusses the opportunities provided by abstracting data and applying common data models.

4.2.5. Monitoring Services
In order to facilitate monitoring, maintenance and fault management of G-Cloud services, monitoring services will be necessary. It is not proposed that there be one central monitoring service. Instead, G-Cloud suppliers will be required to have their services monitored by a means to be decided. For example, a requirement on an IaaS provider might be that they supply an IP address for their data centres' core routers and allow ICMP pings from other select suppliers and G-Cloud users, and that they also allow for the installation of probe equipment in their data centres.

4.2.6. Public Sector Network Service Information Monitor (PSN SIM)
The PSN SIM is outside the scope of this document; however, a simplified description may be useful here for context.
The PSN SIM will be a centralised record of the PSN services, dependencies and users. Network suppliers will report the status of the portions of the network which they manage to the SIM to facilitate fault tracking. A Dependency Map will be used by the SIM to provide alerts and notifications for incidents relating to dependent services.
When provisioning services via the ASG, the Dependency Map, described by the Service Management work strand, will need to be updated so as to facilitate end-to-end service monitoring. Ideally the PSN SIM would be programmatically updated by the GC-SI. Automatically deployed services could then be monitored via the same system, although such functionality is not currently present in the SIM specification.
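One way to picture how a Dependency Map could drive alerts is as a graph walk from the failed component to everything that relies on it. The sketch below is an assumption about how such a map might be represented; the function name, the dictionary layout and the example service names are hypothetical and are not taken from the SIM specification.

```python
from collections import deque
from typing import Dict, List, Set


def affected_services(dependents: Dict[str, List[str]], failed: str) -> Set[str]:
    """Return every service that depends, directly or indirectly, on `failed`.

    `dependents` maps a component to the services that rely on it directly.
    """
    affected: Set[str] = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in affected:  # also guards against cycles
                affected.add(dependent)
                queue.append(dependent)
    return affected


# Hypothetical map: a network link carries an IaaS host, which carries two services.
dependency_map = {
    "psn-link-a": ["iaas-host-1"],
    "iaas-host-1": ["paas-db", "saas-payroll"],
    "paas-db": ["saas-payroll"],
}
to_notify = affected_services(dependency_map, "psn-link-a")
```

Updating the map at provisioning time, as the text proposes, amounts to adding an entry to `dependency_map` whenever a new service is deployed.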
Figure 3: High Level Logical Architecture

5. Technologies & market review
5.1. Availability and usability of existing Cloud services
There are a number of established public cloud services, though the majority at this time appear to have insufficiently flexible service level agreements and geo-location specificity to be usable for most G-Cloud purposes. However, some organisations can offer solely UK-based services and are willing to adapt their operational practices to suit Government requirements, for example by partitioning off an area of their existing data centre facilities and restricting utilisation of that hardware by non-government customers. That example would be an instance of “private cloud” services.
It is anticipated that there will be an overlap where some public cloud services are suitable for some Government purposes, as illustrated in figure 4:
Figure 4: Public Cloud services suitability for Government consumption

6. Proposed supplier and service certifications
This section should be read in conjunction with the Information Assurance Strategy section on 'Assurance Methodologies for Services'.
Four types of certification / accreditation are proposed (to be viewed in conjunction with figure 1). Certification processes shall include an audit element (akin to ISO accreditations) and must place the pass mark sufficiently high to ensure good quality, but not so high as to restrict or disable access to innovative Small and Medium-size Enterprises (SMEs).
IA assurance methodologies that will cover shared services in a cloud environment have yet to be defined. The principles covering the IA conditions and standards that will form part of the certification process will need to be established at the start of the next phase of the programme. A starting point for this work will be the Public Sector Network (PSN) Codes of Connection scheme (including the Technical Standards and the IA Conditions), but applied to the other service layers (infrastructure, platform and application software). This would compel suppliers to demonstrate their expertise in each of the key areas. The certifications would stipulate not only best practices, but also requirements such as the ability to interface systems and services with the centralised management system.
For Information Assurance purposes, developers will be certified to produce services at various IL-Ts conforming to a combination of best practices and standards. These standards will be defined during Phase 3 of the programme and will use existing standards, policies and practices as a starting point. The standards will cover how the organisation produces good quality services using an information assurance regime commensurate with the applicable Impact and Threat levels.
A key expectation is that certification of the supplier and the service they provide will allow the supplier's service(s) to be listed in the Applications Store as "available". The information in the Applications Store will specify the Impact Level (IL) and Threat (IL-T) combinations for which the service can be used, and the mandatory interfaces and components that are required for technical architecture and information assurance compatibility.
Completion of the certification process does not automatically mean that services and applications can be purchased and provisioned from the ASG. Further technical and information assurance processes may well be required.
There is also an expectation of a rating system through which existing service consumers can indicate their satisfaction with a particular service. This will provide some quality differentiation above and beyond the minimal requirements set by the certifications.

6.1. Service monitoring
All service suppliers should be required to allow selected 3rd party monitoring of the services provided, for example by allowing firewall access for standard monitoring protocols to end-point servers and core network equipment.

6.2. Infrastructure services certification
The following proposal will need to be confirmed with the relevant risk management roles during Phase 3:
The provider shall not have logical access to machines as root or administrator (i.e. excludes systems administration duties), but would have physical access (for hardware maintenance etc.). Providers would also manage the local network and switching fabric (e.g. service providers like Amazon EC2 or Rackspace). The provider must conform to security standards applicable to the IL-T rating (see below) and must also conform to standards relating to efficiency and best practice, such as the EU Code of Conduct for data centres.
It is envisaged that this certification would work like a "Code of Connection" for IaaS providers to make their services available through the G-Cloud.

6.3. Platform services certification
The following proposal will need to be confirmed with the relevant risk management roles during Phase 3:
The provider will have root / administrator access and will perform basic systems administration, probably in partnership with one or more application providers. Such services may include backup, service monitoring and help desk.

6.4. Software developer services certification
This certification is for software developers and suppliers providing applications (or software) as a service or application support. For software the accreditation will mandate standards related to acceptable development practices. This will ensure software developers are able to offer applications via the Apps Store for trial and interest generation (a prototype). These applications are likely to be offered through a separate development and test community cloud.
The use of case studies and pilots in this area will assist in drawing up criteria that can be used by the developers and the governance groups for technical architecture and information assurance.
Application prototyping in the first stage will allow for feedback and hence partially enable “crowd sourcing” of the application's development. Production applications and stand-alone components will then need to be individually scrutinised and certified before being made available, and shall not be altered by prospective customers in order to generate feedback and buy-in.

6.5. Service integration / aggregation / management certification
There is also a need for accreditation for systems and services integration, management and aggregation services, which could also be referred to as “Operate as a Service” (OaaS). The technical elements of such a certification would be covered by the certifications already described above.

6.6. Information assurance
The IA Strategy section on 'Assurance Methodologies for Services' covers the proposals for gaining assurance of services to be used in the G-Cloud environment.
The current proposal is that each certification will cover a range of security levels, following the IL-T ratings. Examples of mandated practices will include:
  • Infrastructure: Physical security of data centres, security screening of staff.
  • Platform: Patching regime (e.g. months vs. days) and virtualisation layer management (e.g. disallowing unknown kernels).
  • Software: Logging, peer reviewed code, security releases.
  • Integration: Staff screening.
6.7. Supplier Certification and Impact Levels matrix

During Phase 3 of the programme an extensive piece of work will be required to model the Skills, Impact and Threat levels against the IaaS, PaaS, SaaS and OaaS types of service, in order to identify the standards, policies and procedures that will be required.

7. Utility Computing

7.1. Units of Utility Computing (IaaS) resource specification

Certified infrastructure and platform suppliers (computing utilities) will be able to offer Computing Resource into the G-Cloud marketplace, for use in delivering Workloads. Computing Resource can be any combination of the following, with each element defined in the marketplace in a granular way.

  • Processing
      o Capacity (processor cores, clock speed and RAM)
      o Type (e.g. instruction sets, integer vs. floating-point optimisation)
  • Storage
      o Capacity (bytes)
      o Redundancy (probability of failure per unit time, e.g. RAID level)
      o Access latency (milliseconds)
      o Data Input/Output (I/O) rate (i.e. bits per second)
  • Network
      o Bandwidth capacity (bits per second)
      o Simultaneous connections capacity
      o Redundancy (number of physically separate / divergent connections to site)

All Computing Resources share the common characteristic of required availability. This is normally expressed as a percentage. For example, 99.9% availability equates to approximately 8 hours 45 minutes of downtime in a year, whereas 99.99% availability equates to less than one hour of downtime in a year. Such measures allow buyers to make an informed decision regarding cost and realistic availability requirements.

While it will be possible to purchase any type of physical or virtualised infrastructure resource, ensuring that standards do not exclude any particular type of hardware platform, it is important to standardise how resources are advertised in order to be able to treat (compare) them as a utility.
Processing capacity, for example, may need to be expressed in terms of a common benchmark or baseline comparison.

Each of the three Computing Resources could have all variables defined, but in practice this should be limited to just a few key variables. In order to allow for like-for-like comparison and better interoperability, as well as making provisioning simpler for suppliers, it is proposed that the Processing resource be a combination of instruction set type (e.g. x86 vs. RISC vs. GPU) and a small set of fixed ratios of CPU-to-RAM, which are also constrained by the existing architectures. For example, Processing resource might be offered as either standard, high memory or high CPU.
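The availability arithmetic and the idea of a small set of fixed CPU-to-RAM profiles in section 7.1 can be illustrated with a minimal Python sketch. The class name `ProcessingOffer`, the function names and the RAM-per-core ratios below are illustrative assumptions only; the report does not define any of these values.

```python
from dataclasses import dataclass

HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the downtime per year permitted by an availability percentage."""
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability must be between 0 and 100")
    return HOURS_PER_YEAR * (1.0 - availability_percent / 100.0)


@dataclass(frozen=True)
class ProcessingOffer:
    """Hypothetical advertised unit of Processing resource."""
    instruction_set: str        # e.g. "x86"
    cores: int
    ram_gb_per_core: float      # one of a small set of fixed ratios
    availability_percent: float

    @property
    def ram_gb(self) -> float:
        return self.cores * self.ram_gb_per_core


# Assumed ratios for illustration; real values would be set by the programme.
RAM_GB_PER_CORE = {"standard": 2.0, "high_memory": 8.0, "high_cpu": 0.5}


def make_offer(profile: str, cores: int, availability_percent: float,
               instruction_set: str = "x86") -> ProcessingOffer:
    """Build an offer from a named profile rather than free-form values."""
    return ProcessingOffer(instruction_set, cores,
                           RAM_GB_PER_CORE[profile], availability_percent)


if __name__ == "__main__":
    for availability in (99.9, 99.99):
        hours = annual_downtime_hours(availability)
        print(f"{availability}% availability: about {hours:.2f} hours "
              f"({hours * 60:.0f} minutes) of downtime per year")
```

Restricting offers to named profiles, rather than arbitrary core and RAM figures, is what makes two suppliers' offers directly comparable.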
7.2. Elasticity and "burstability" of resources

7.2.1. Background

There are two modes of delivering additional Computing Resources on a short time frame, and two terms are broadly used to refer to them. At present, however, the terms are poorly defined, so for the purposes of this document they shall be given clear meanings. It is worth noting some of the ways in which the terms elasticity and burstability are applied.

Elasticity, in the literal sense, describes something that can expand and will automatically contract to its previous state without intervention. Plastic describes something which, if expanded, stays in its new shape. Amazon's "Elastic Compute Cloud" is named for marketing purposes. Technically it would be more accurately described as a "plastic compute utility" since it is a) reconfigurable only in response to API requests, and b) a single large utility computing service. However, we shall use "elastic" to refer to how their service operates, since the term has become widely used in that sense.

"Burstability" is also a commonly used term, most frequently to refer to the ability of a network connection's bandwidth to spike above the normal levels for a short time. Burstability has also been applied as a term by the long-standing virtual machine provider community, now described as utility computing providers, to describe RAM or CPU resources that are available for utilisation in brief load spikes. In both cases the mechanism is normally used for peak load curtailment and to contend resources between users.

7.2.2. Definitions for the purposes of this document

In the following description "burst" means that the resources are automatically made available without the user or application having to do anything. It is a property of the underlying platform in response to changes in workload.
An example would be CPU utilisation: it is consumed automatically on demand and reverts to an idle state when not required.

"Elastic" means resources that must be requested by the user, operator or application, for example by the application requesting additional virtual machine instances via an Application Programming Interface (API) from an "overflow" pool in response to changes in workload. It is likely that the majority of existing applications would need modification in order to take advantage of elastic capacity.

All Computing Resources share some or all of the following characteristics:

  • Dedicated capacity: Resource entirely dedicated to the application workload, which is never re-allocated or shut down. This is how most capacity within the public sector is currently provisioned.
  • Guaranteed burst capacity: Capacity which is available for use at all times, but which may be automatically re-allocated or shut down when not used, without direction from the operator or hosted application. This capacity would be constrained to the limits of the hosting physical hardware or network connection, and would normally be restricted to one physical location.
  • Non-guaranteed burst capacity: As above, but not guaranteed. Common examples of this are consumer broadband, where a user can consume up to 8 Mbps but only when others are not using the bandwidth, and existing virtual machine suppliers, who allow CPU and RAM usage to exceed the minimum guaranteed limits.
  • Guaranteed elastic: Capacity, generally in the form of additional virtual machines, which can be requested by the operators (e.g. ahead of an expected load spike) or on demand programmatically by the application. Elastic capacity would normally have to be explicitly decommissioned (contrary to the implication in its name), and could be regarded simply as a form of very rapid provisioning.
  • Non-guaranteed elastic: As above, but not guaranteed to be available. An example of this sort of capacity is Amazon's Elastic Compute Cloud (EC2) platform.

Requirements may be specified as a timeline, with both capacity (quantity and type) and other characteristics, such as uptime requirements and response-time Service Level Agreements (SLAs), able to change over time. This will enable peak load curtailment: for example, HMRC's peak and the DVLA's peak fall at different times of the year, so the overall hardware requirement is reduced.

The size of the time units for dedicated (and other) capacities should not be heavily restricted (initially some suppliers may only be able to offer months). The minimum time unit will initially probably be one hour (like Amazon EC2), though in time government should expect and request finer granularity of charging, perhaps down to "per second" billing.

7.2.3. Discussion and existing examples

A key differentiator between elasticity and burstability is that burstability has pre-defined limits, whereas elasticity need not have a limit. However, in practice it is unlikely that government would rely on non-guaranteed elastic capacity for mission-critical real-time Workloads.
It is also unlikely that suppliers will provide guaranteed elastic capacity without charge. However, the likelihood of non-guaranteed elastic capacity being unavailable may be so low as to not be a concern to the end-user. Elastic capacity will probably be billed by the hour, with a surcharge for guaranteeing its availability.

Non-guaranteed burst capacity would normally be provided alongside a guaranteed component, for example a virtual machine on a shared host with a minimum guaranteed (within the SLA) CPU time share, or an ADSL connection (in both cases no additional charge is levied for the use of the non-guaranteed element).

Guaranteed burst capacity may be chargeable, perhaps in part, in advance. For example, it might be purchased like traditional dedicated capacity, but with a rebate if it is not used. As with elasticity, it is likely that guaranteed burstability will be more expensive than non-guaranteed (analogous to a contended ADSL line vs. a dedicated network connection). Non-guaranteed burst capacity is essentially the method of utilising the spare resources when the guaranteed capacity is being underutilised (e.g. at night), and may be useful for processing regular batch workloads, such as data analysis, cost effectively.
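The five capacity types defined in section 7.2.2 differ along two axes: how the capacity is obtained (dedicated, burst or elastic) and whether it is guaranteed. The following Python sketch encodes that taxonomy. The names `Mode`, `CapacityType` and the three properties are illustrative assumptions, not part of any G-Cloud specification.

```python
from enum import Enum


class Mode(Enum):
    DEDICATED = "dedicated"   # never re-allocated or shut down
    BURST = "burst"           # made available automatically by the platform
    ELASTIC = "elastic"       # must be requested by user, operator or application


class CapacityType(Enum):
    """The five capacity types, each a (mode, guaranteed) pair."""
    DEDICATED = (Mode.DEDICATED, True)
    GUARANTEED_BURST = (Mode.BURST, True)
    NON_GUARANTEED_BURST = (Mode.BURST, False)
    GUARANTEED_ELASTIC = (Mode.ELASTIC, True)
    NON_GUARANTEED_ELASTIC = (Mode.ELASTIC, False)

    def __init__(self, mode: Mode, guaranteed: bool) -> None:
        self.mode = mode
        self.guaranteed = guaranteed

    @property
    def requires_explicit_request(self) -> bool:
        """Elastic capacity must be requested, e.g. via an API call."""
        return self.mode is Mode.ELASTIC

    @property
    def has_predefined_limit(self) -> bool:
        """Burst capacity is bounded by the hosting hardware or connection."""
        return self.mode is Mode.BURST

    @property
    def suits_critical_realtime(self) -> bool:
        """Assumption: only guaranteed capacity is relied on for critical work."""
        return self.guaranteed


if __name__ == "__main__":
    for capacity in CapacityType:
        print(capacity.name, capacity.mode.value, capacity.guaranteed)
```

Treating the guarantee as a separate flag from the delivery mode mirrors the charging discussion above, where the guarantee is the element that attracts a surcharge.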
7.2.4. Illustrative example of different capacity types in the G-Cloud

Figure 6 below illustrates an example scenario of how the different types of capacity might be utilised in the G-Cloud, and also shows the importance of the differentiations:

Figure 6: Elasticity and burstability

At night, batch workload is able to burst into spare capacity. During daytime, the batch application workload (BAW) is partially squeezed back down by real-time (human-driven) application workload #1 (RTAW1), and squeezed all the way down to its guaranteed capacity on the VMHs it shares with RTAW2. Further, in this example RTAW1, the real-time (human-driven) workload, has needed to use more than its capacity in facility 1 by elastically spawning new virtual machine instances in facility 2. However, only half of RTAW2's elastic capacity in facility 2 is guaranteed, and although there was plenty of non-guaranteed capacity at night there is little during the day, thus RTAW2 has no further capacity to consume.
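The squeezing behaviour described in this scenario can be sketched for a single shared host. This is a simplified model under stated assumptions: capacity is measured in arbitrary units (for example CPU cores), the batch workload has a guaranteed minimum, and any real-time demand the host cannot serve becomes overflow that would have to be met elastically elsewhere. The function name `share_host` and the figures used are hypothetical.

```python
from typing import NamedTuple


class HostShare(NamedTuple):
    realtime_used: int      # capacity consumed by the real-time workload
    batch_available: int    # capacity the batch workload may burst into
    overflow: int           # real-time demand that must be met elsewhere


def share_host(host_capacity: int, batch_guaranteed: int,
               realtime_demand: int) -> HostShare:
    """Split one host between a real-time workload and a batch workload.

    The batch workload never drops below its guaranteed share; the
    real-time workload takes priority over everything above that share.
    """
    if not 0 <= batch_guaranteed <= host_capacity:
        raise ValueError("guaranteed share must fit within the host")
    if realtime_demand < 0:
        raise ValueError("demand cannot be negative")
    realtime_used = min(realtime_demand, host_capacity - batch_guaranteed)
    batch_available = host_capacity - realtime_used
    overflow = realtime_demand - realtime_used
    return HostShare(realtime_used, batch_available, overflow)


if __name__ == "__main__":
    # Hypothetical 16-unit host with 4 units guaranteed to batch work.
    for label, demand in (("night", 2), ("day", 10), ("peak", 15)):
        print(label, share_host(16, 4, demand))
```

In this model the overflow figure at peak is the quantity that an application would request from elastic capacity in a second facility.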
7.3. Utility / IaaS open specifications / standards recommendation

One of the requested outputs from the technical architecture work strand in phase two is a recommendation of which open standard(s) to utilise in the first instance of the G-Cloud for accessing IaaS / utility computing resources.

Cloud management systems being investigated include:

  • Eucalyptus - Open Source Cloud toolkit presented like Amazon EC2.
  • Haizea - An Open Source VM-based Lease Manager.
  • OpenNebula - Open Source Cloud management toolkit allowing hybrid model (burst into Public).
  • Microsoft System Centre Operations Manager (part of BPOS and BPOS-D).
  • Novell Intelligent Workload Management.

Open Cloud standards reviewed:

  • Amazon EC2 and Simple Storage Service (S3) APIs (being utilised by companies other than Amazon).
  • Open Grid Forum's (OGF) Open Cloud Computing interface (OCCi).
  • DMTF's OVMF (sponsored by Cisco and VMware).

At present Amazon's EC2 and S3 APIs have become something of a de facto standard, where considerable development effort is being expended. The EC2 API allows for maintenance of virtual machine (VM) images and for VMs to be provisioned and de-provisioned, whilst the S3 APIs (one based on REST, the other on SOAP) allow for simple data object storage, retrieval and management. Other functioning examples are Rackspace's Mosso API and Sun's Cloud API.

These APIs are very simple and do not include all the functionality that might be envisaged or that is covered in this document (e.g. specifying the full range of VM instance characteristics), but could still be used as a common starting point. However, the vendor-backed APIs are not patent-unencumbered, and so should probably not be used as a long-term solution.

The recommended long-term solution is the OCCi. However, at the time of writing the interface specification only loosely defines a protocol and does not contain a full API.
Discussions with the steering group of the OCCi have made it clear that their intent is to create a fully open, patent-unencumbered API for general use, which would be ideal for G-Cloud purposes. Further, there is an opportunity for the G-Cloud programme to influence the direction of development of the OCCi, which should be exploited.

7.4. Batch vs. Real-time Workloads

With elastic and burst resource capacities it will be possible to mix Workload types within one pool of servers, thus maximising utilisation, which is the most efficient way to use a server from both a cost and a carbon perspective. Even simply turning under-used servers off is inefficient when the embedded energy (i.e. the carbon cost of manufacture) and money (i.e. the capital expenditure) are taken into account.
Many applications must respond to real-time, human-generated demand and thus need guaranteed elastic or burst capacity. However, the majority of such applications need little resource at night time. In the commercial world this issue is being addressed by "following the moon": balancing Workloads across multiple time zones. However, this may not be an option for some G-Cloud services given the constraints outlined in the Asset Valuation and Aggregation section of the Information Assurance Strategy.

The resources can still be fully used, however, without relying on non-guaranteed burst or elastic capacity for real-time critical applications, by instead mixing real-time Workloads with batch processing Workloads within the same compute grid. There are many batch-type Workloads, such as large data set analysis, where the compute task will take a long time (longer than a human wishes to wait) and where it does not greatly matter to the user whether the task is completed in an hour or in a couple of days.

8. Interoperability between suppliers

8.1. Workload interoperability and migration

In the most flexible vision of the G-Cloud it will be possible to transition Workloads between Certified Providers. In reality, at this time, this will only be practical for multi-node server-type Workloads that operate in a stateless manner across a cluster of servers which probably share a common processor architecture. Interoperability will not work with Utility Computing platforms (PaaS) which take application code directly (e.g. Google Web Apps or Microsoft Azure), unless the receiving platform is capable of hosting the same software.
In order to allow for Workload distribution across multiple providers and the transition of Workloads from one infrastructure or platform provider to another, the control of that application's cluster will probably need to be centrally managed, maybe with a service like the PSN's SIM acting as the hub. One alternative to centralised cluster management would be DNS-based randomised load balancing.

Non-real-time, manual Workload transitions will be simpler; the process would be similar to rolling out a new version of an application, with new server nodes being provisioned, tested, and then scaled up in size and/or scaled out in number at the new compute utility.

8.2. Workload scalability

It is important to note that simply virtualising an application does not automatically make it scalable, and that most pre-existing applications would need to be extensively enhanced or rewritten in order to take advantage of elasticity. However, if a pre-existing application Workload is already designed to be hosted on a cluster of server nodes, and if it can be easily migrated from its existing environment onto a virtualised one, then it should be possible to host that Workload in an elastic environment, such that its resource utilisation can automatically expand and contract in response to demand.

8.3. Data abstraction, sharing and interoperability

There are cases where the data should be considered in isolation from the application, and where it is highly desirable for the data to be abstracted from the application. For example,
record sets containing citizen data are likely to have common components, and it is recommended that wherever possible common data models (data schemas) are developed and used. This would have three advantages:

  1. Abstraction: Allow data to be divorced from applications, thus preventing software vendor or SaaS provider lock-in.
  2. Interoperability: Allow for data sets to be operated on and utilised by more than one application, enabling greater innovation and flexibility.
  3. Sharing: Facilitate the exposure of existing isolated data sets to the G-Cloud, where they could be aggregated and re-distributed via a data distribution service, as suggested in figure 3.

For example, some individual police forces have their own data repositories which are not inherently interoperable. By developing a standard data schema and common data model for police records, those data sets could be pooled and accessed via a standardised data distribution service. This would result in greatly enhanced public services by allowing better sharing of data between police forces, and potentially other government organisations, without having to create one large central data store.

There are a number of existing examples of common data models being successfully deployed, resulting in reduced costs and enhanced service delivery:

  • Telecommunications sector and the Shared Information Model (SID).
  • NATO and JC3.
  • Banking system and SWIFT.
  • Insurance sector and ACCORD.

9. Data Centre Migration

The stated aim of the Data Centre migration activity is to reduce the overall number of data centres used by the UK public sector to approximately 10 to 12 secure, resilient facilities, with a corresponding reduction in cooling and power consumption.
The rationalisation and standardisation of common applications, such as email, provides an excellent opportunity to increase utilisation and reduce the amount of Computing Resource required. Section 7.4 of this document discusses how virtualised environments make the best use of the underlying hardware resources.

9.1. Data Centre Efficiency

The Power Usage Effectiveness (PUE) and Data Centre Infrastructure Efficiency (DCiE) metrics provide a measure of the efficiency of a facility in terms of the electricity usage of the facility compared to the electricity usage of the computing resources. They do not measure the utilisation rates of the computing resources themselves. For example, a facility which has a high DCiE figure (or low PUE figure) may well be running servers with low utilisation rates.

Modern data centres can achieve low PUE values, often less than 1.4, by a combination of efficient power infrastructure design and managing the air flow in the facility. Power equipment in a data centre often comprises power conditioning equipment, UPS, distribution units, cabling
and backup generators. The selection of this equipment should be based on a combination of the requirements for resilience and the actual power draw of the computing resources. The rating of a server's power supply is often several times greater than the power draw of the server, even when highly utilised. The implementation of hot and cold aisle containment can all but eliminate the need for the mechanical cooling of the cold air supply. The application of current best practices for the design of modern data centres, such as those listed in the EU Code of Conduct for Data Centre Operations, will contribute significantly to achieving such efficiencies.

9.2. EU Code of Conduct for Data Centre Operations

The EU Code of Conduct for data centre operations (EU CoC) provides a number of best practices which can be applied to data centres regardless of whether they are already in use, undergoing a retrofit process or still being planned. The best practices in the EU CoC are now listed in the Greening Government ICT CIO and CTO workbook. This paper recommends the application of those best practices to facilities providing Computing Resources for the G-Cloud.

9.3. Consolidation and migration of existing services

Performing an audit of applications in use across the UK public sector would be an enormous undertaking. The Service Specification and Business Transition process will identify services that are suitable for inclusion in the initial G-Cloud. Before initiating a data centre consolidation exercise it will be necessary to identify a range of applications and services that are in common usage across the UK public sector. For example, the NHS mail system is a common platform that is in use by hundreds of thousands of users spanning many NHS trusts.
Extending the use of this platform beyond the NHS would enable other public sector organisations to benefit from the low cost, high security and high availability of the NHS mail system. Other examples of common services and applications may include human resources, enterprise resource planning or finance applications.

Services in the G-Cloud shall be designed and built in accordance with the architectural principles outlined in this document so as to facilitate a gradual migration of services. As organisations migrate existing services into the G-Cloud, the services will scale in a way which guarantees efficient and high utilisation of the underlying Computing Resources.

Furthermore, the ASG will allow organisations to understand the costs and service levels offered by G-Cloud services prior to purchasing them. The ease of procurement, fast provisioning and known service levels will be compelling reasons to migrate existing systems and services into the G-Cloud.