This is a PowerPoint presentation delivered by Prof John Morrison (UCC) on 9 December 2016 at the IC4 and Host in Ireland Workshop: Data Centres in Ireland.
3. Partners
CloudLightning comprises
eight partners from academia
and industry and is
coordinated by University
College Cork.
Industrial partners:
• Intel Ireland (IE)
• Maxeler (UK)
Academic partners:
• University College Cork (IE)
• Norwegian University of
Science and Technology (NO)
• Institute e-Austria Timisoara
(RO)
• Democritus University of
Thrace (GR)
• The Centre for Research &
Technology, Hellas (GR)
• Dublin City University (IE)
5. Specific
Challenge
CloudLightning was
funded under Call H2020-
ICT-2014-1 Advanced
Cloud Infrastructures
and Services.
The aim is to develop
infrastructures, methods
and tools for high
performance, adaptive
cloud applications and
services that go beyond
the current capabilities.
• Cloud computing is being transformed by new requirements such as
- heterogeneity of resources and devices
- software-defined data centres
- cloud networking, security, and
- the rising demands for better quality of user experience.
• Cloud computing research will be oriented towards
- new computational and data management models (at both
infrastructure and services levels) that respond to the advent of
faster and more efficient machines,
- rising heterogeneity of access modes and devices,
- demand for low energy solutions,
- widespread use of big data,
- federated clouds and
- secure multi-actor environments including public administrations.
6. EU Use Case
Motivations
CloudLightning’s use cases
support the European
Union HPC strategy and
specific industries
identified by IDC in their
recent report on the
progress of the EU HPC
Strategy (IDC, 2015).
1 The health sector represents 10% of EU GDP and 8% of the
EU workforce (EC, 2014). HPC is increasingly central to
genome processing and thus advanced medicine and
bioscience research.
2 The oil and gas industry is responsible for 170,000 European
jobs and €440 billion of Europe's GDP (IDC, 2015). HPC
improves discovery performance and exploitation.
3 Ray tracing is a fundamental technology in many industries
and specifically in CAD/CAE, digital content and mechanical
design, sectors dominated by SMEs.
4 European ROI in HPC is very attractive - each euro invested
in HPC on average returned €867 in increased
revenue/income (IDC, 2015).
7. The HPC
Market
Although the EU has the
largest GDP in the world
(€13.2 trillion), the U.S. has
substantially outspent the
EU region in high
performance computing
which has a knock-on effect
in scientific discovery,
innovation and
competitiveness.
IDC estimates the HPC market at €21bn.
IDC forecasts that European HPC ecosystem spending will increase by
37.8% (6.6% CAGR) to reach about €5.2 billion in 2018, or 24.9% of
worldwide HPC ecosystem spending (€21.3 billion).
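The link between the 37.8% total increase and the 6.6% CAGR can be checked with a short calculation. This is a minimal sketch that assumes the forecast covers five years; the slide does not state the base year, and the function name is illustrative.

```python
def cagr(total_growth: float, years: int) -> float:
    """Compound annual growth rate implied by a total growth fraction."""
    return (1.0 + total_growth) ** (1.0 / years) - 1.0


# Assumption: the 37.8% increase accrues over five years.
rate = cagr(0.378, 5)
print(f"Implied CAGR: {rate:.1%}")  # prints "Implied CAGR: 6.6%"
```

Under that five-year assumption the two figures on the slide agree with each other.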
8. HPC
Challenges
“The challenge is less
about educating users
about cloud computing and
more about the ability of
clouds to handle more
types of HPC jobs over
time.”
IDC, 2015
Traditional High Performance Computing is…
1 Hard to use without deep IT knowledge
2 Expensive
3 Inaccessible to individuals and SMEs
4 Inflexible
Most HPC workloads are not ready to run on today’s cloud architectures.
9. The Market for
HPC in the
Cloud
The cloud segment is
one of the smallest but
fastest growing
segments in the HPC
market.
Spending on HPC in the
cloud and Hybrid-custom
HPC clouds is forecast to
grow from US$1.7bn in
2015 to US$5.2bn in 2017
(IDC, 2015).
The proportion of HPC sites employing cloud computing has grown from
13.8% in 2011, to 23.5% in 2013, to 34.1% in 2015 (IDC, 2015).
CloudLightning primary research suggests 48% of sites are using cloud
computing although for relatively less complex workloads.
Hybrid-Custom HPC Clouds (2017): $1.5 billion
HPC Public Clouds (2017): $3.7 billion
Traditional HPC Servers and Private Clouds (2017): $15.4 billion
10. Drivers and
Barriers to HPC in
the Cloud
Adoption
Our primary research
(n=92) confirms our desk
research which suggests
that there are significant
economic and capacity-
related drivers but both
general cloud and HPC-
specific barriers to HPC
in the cloud adoption.
Drivers
1 Access to extra capacity for overflow or surge workloads
2 Reduced capital costs
3 Access to a datacentre or specialised software
Barriers
1 Data protection and control
2 Communication speed concerns
3 Complexity and difficulties migrating and integrating existing systems with the Cloud
11. CloudLightning
Objectives
CloudLightning seeks
to address the
challenges in the HPC
market through 9
technical, commercial
and societal objectives.
Build Prototype
Management System
and Delivery Model
(WP4, WP5, WP6)
Competitive Advantage
through Infrastructure
Efficiencies
(WP4, WP8)
Energy Efficiency
(WP3, WP7)
Validate Approach with
Use Cases
(WP5, WP6)
Competitive Advantage
through Improved
Accessibility
(WP5, WP6, WP8)
Improved Accessibility
to Cloud Resources
(WP2, WP5, WP6)
Demonstrate
Scalability
(WP7)
Opportunities in Use
Case Domains
(WP2, WP8)
Scientific Advancement
(WP8)
Technical Objectives, Commercial Objectives, Societal Objectives
12. CloudLightning
Approach
CloudLightning proposes a
novel architecture for
provisioning heterogeneous
cloud resources to deliver
services, specified by the
user, using a bespoke
service description
language.
01
Complexity
CloudLightning uses self-
organisation and self-
management to manage
complexity effectively.
02
Heterogeneous
Resources
CloudLightning was
designed specifically for
heterogeneous hardware.
03
IaaS
Access
Clear service interface through
separation of concerns between
consumer and
provider.
04
Energy Efficiency
Achieved through
heterogeneous resources,
reducing overprovisioning,
maximising VM/server density
and turning off idle servers.
05
Resource
Utilisation
CloudLightning
uses dynamic
workload and
resource
management to
increase the
efficiency of
resource utilisation.
06
Service
Deployment
The CloudLightning
deployment
mechanism
simplifies the
operational
overhead for non-
technical users.
13. Gateway
Service
User Perspective
[Figure: the Gateway Service from the user perspective. Actors: End User, Blueprint Creator, Enterprise Cloud Operator. Components: Gateway Service UI, Services Catalogue, Blueprint Catalogue, Self-Organizing Self-Management System, Plug & Play Service, Resource Handler, Heterogeneous Resources, New Hardware. Labelled actions: Get Services, Create Blueprints, Extract / Modify Blueprints, Deploy Blueprint, Deploy Service, Get Status, Monitor Running Service, Discover Resource, Request Resource, Request to join CL-Resources.]
14. Progress Beyond
the State of the Art
CloudLightning is contributing,
and will continue to contribute,
to progress
beyond the state of the art
across all technical work
packages and primary use
cases.
We are contributing, and will
contribute, to:
1. The expected impacts
listed in the call topic
2. The innovative capacity
of the consortium
members
3. The innovative capacity
of European industry
4. Other European
environmental and
societal priorities
[Figure: eight numbered areas of progress: Cloud Architecture; Service Description Languages; Local Decision Strategy Framework; Resource Coalitions; Ray Tracing; Oil & Gas; Genome Processing; Large Scale Simulation.]
17. Design
Requirements
Create a Heterogeneous
Service-Oriented Cloud
Architecture to Support
HPC Workloads
1 Ease of Use
2 Improve Resource Utilization compared to current Cloud deployments
3 Support Heterogeneity
4 Improve Service Delivery
19. Blueprints,
Service
Catalogue and
Implementation
Library
Service Catalogue: Service 1, Service 2, Service 3
Implementation Library: Implementation 1, Implementation 2, Implementation 3
Blueprint Catalogue: Blueprint 1, Blueprint 2, Blueprint 3 (composition of services)
Implementation
- id: unique identifier
- definition: concrete SW/HW (...)
Service
- id: unique identifier
- definition: service specification
- constraints: logical expressions
- metrics: atomic values
- parameters: atomic values
Blueprint (no implementation)
- id: unique identifier
- constraints: logical expressions
- metrics: atomic values
- parameters: atomic values
• A Blueprint is a
composition of services.
• A service describes the
features of many
different hardware types
and executable code for
the same task.
• An implementation is
the executable code for
a task on a specific
hardware type.
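The three record types on this slide can be pictured as plain data classes. This is a minimal sketch, not the project's schema: the field names follow the slide, but the Python types, the `service_id` link, the example values and the `implementations_for` helper are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Implementation:
    id: str           # unique identifier
    definition: str   # concrete SW/HW
    service_id: str   # assumption: the service this implementation realises


@dataclass
class Service:
    id: str           # unique identifier
    definition: str   # service specification
    constraints: List[str] = field(default_factory=list)      # logical expressions
    metrics: Dict[str, float] = field(default_factory=dict)   # atomic values
    parameters: Dict[str, str] = field(default_factory=dict)  # atomic values


@dataclass
class Blueprint:
    id: str                   # unique identifier
    services: List[Service]   # a Blueprint is a composition of services
    constraints: List[str] = field(default_factory=list)
    metrics: Dict[str, float] = field(default_factory=dict)
    parameters: Dict[str, str] = field(default_factory=dict)


def implementations_for(service: Service,
                        library: List[Implementation]) -> List[Implementation]:
    """Hypothetical lookup: all library entries that realise the given service."""
    return [impl for impl in library if impl.service_id == service.id]


# Illustrative data only; none of these identifiers come from the project.
ray_tracing = Service(id="ray-tracing", definition="render a scene")
library = [
    Implementation("rt-cpu", "ray tracer binary on x86 servers", "ray-tracing"),
    Implementation("rt-mic", "ray tracer binary on MIC", "ray-tracing"),
    Implementation("gp-cpu", "genome pipeline on x86 servers", "genome-processing"),
]
blueprint = Blueprint(id="bp-1", services=[ray_tracing])
candidates = implementations_for(ray_tracing, library)
print([impl.id for impl in candidates])  # prints ['rt-cpu', 'rt-mic']
```

Keeping the service free of any concrete implementation is what lets one Blueprint be satisfied by different hardware types.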
20. CloudLightning
API Flow
The main CL system
components, APIs,
communication protocols
and a sequence of
documents that maintains
the state of each and every
interaction have been
defined.
24. Management of
physical
resources
We assume a Cloud with a
Resource Fabric far
greater than that currently
available.
Adding structure to the
Cloud Fabric by creating
virtual partitions and
grouping them together.
• The resource fabric is partitioned
into vRacks.
• Each vRack is managed by a
vRack Manager.
• A vRack Manager can form
Coalitions of its resources to
support services.
• vRack Managers self-organize to
optimize service delivery.
Heterogeneous
Physical Resources
25. vRacks and
vRack Managers
• A vRack is a homogeneous partition of the resource fabric.
• Each vRack is managed by a dedicated vRack Manager.
• vRack Managers of different types exist based on the resource types being managed.
[Figure: the Resources Fabric of servers (Svr) and specialized hardware partitioned into vRacks, each with its own vRack Manager; some vRacks are shown with a dedicated high-speed interconnection.]
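One way to picture the partitioning is to group the resource fabric by hardware type, so that every vRack is homogeneous and has its own manager. This is a sketch under assumptions: the class names, the `hw_type` labels and the one-vRack-per-type grouping are illustrative, and the slides do not say how vRack boundaries are actually chosen.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Resource:
    name: str
    hw_type: str   # hypothetical labels such as "cpu", "gpu", "mic"


@dataclass
class VRackManager:
    hw_type: str
    resources: List[Resource] = field(default_factory=list)


def partition_fabric(fabric: List[Resource]) -> Dict[str, VRackManager]:
    """Group resources by type so that each vRack is a homogeneous partition."""
    managers: Dict[str, VRackManager] = {}
    for res in fabric:
        manager = managers.setdefault(res.hw_type, VRackManager(res.hw_type))
        manager.resources.append(res)
    return managers


# Illustrative fabric; the resource names are made up.
fabric = [
    Resource("svr-1", "cpu"),
    Resource("svr-2", "cpu"),
    Resource("gpu-1", "gpu"),
    Resource("mic-1", "mic"),
]
managers = partition_fabric(fabric)
for hw_type, manager in managers.items():
    print(hw_type, [r.name for r in manager.resources])
```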
26. vRack Manager
Groups
• Groups of vRack Managers can be formed to simplify access to resources and to enable self-organization.
• There are three types of vRack Manager Groups.
[Figure: vRack Manager Groups labelled Type A, Type B and Type C, composed of vRack Managers over specialized hardware, over servers with a dedicated high-speed interconnection, and over servers.]
27. CL-Resources
To generically manipulate resources of different types, the SOSM system introduces the concept of a CL-Resource.
CL-Resources refer to different hardware types and to different configurations of those types.
Thus heterogeneity can be introduced dynamically.
[Figure: Resource Partitioning Possibilities. A Local Resource Manager over servers (Svr) and MICs; labelled examples: MIC-World, MIC, Cluster of Servers, Container/VM.]
28. Advanced
architecture
support
• Dynamic VPN creation
for Blueprint Service
Execution
• Autoscaling
• High availability
• Data locality
[Figure: a Blueprint with services S1, S2 and S3 placed on servers in two vRacks, joined by a Virtual Network Connection.]
30. A Framework
for Hosting and
Executing
SOSM
Strategies
A framework for hosting and
executing SOSM strategies
associated with any
hierarchical architecture.
As each component works to
achieve its local goal,
the whole system eventually
evolves towards the ideal
global goal state.
[Figure: framework elements: Perception (Metrics), Assessment Functions, Impetus (Weights), Suitability Index, Directed Evolution.]
33. Customizing the
self-organisation
self-management
framework with
CL strategies
The Assessment Functions and
Directed Evolution are related to
the CL-specific objectives of:
• Maximizing task throughput
• Maximizing energy efficiency
• Maximizing computational
efficiency
• Maximizing resource
management efficiency
[Figure: Perception (Metrics) and Impetus (Weights) feeding the Suitability Index.]
Local goal of each component: maximize its
Suitability Index
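A component's Suitability Index can be pictured as a weighted combination of assessment-function scores, with metrics arriving as Perception and weights arriving as Impetus. The slides do not give the formula, so the normalised weighted sum below, the metric names and the two assessment functions are assumptions chosen purely for illustration.

```python
from typing import Callable, Dict

Metrics = Dict[str, float]
AssessmentFunction = Callable[[Metrics], float]


def energy_efficiency(m: Metrics) -> float:
    # Hypothetical score in [0, 1]: drawing less power relative to the cap is better.
    return 1.0 - m["power_watts"] / m["power_cap_watts"]


def task_throughput(m: Metrics) -> float:
    # Hypothetical score in [0, 1]: completed tasks relative to a target rate.
    return min(m["tasks_per_s"] / m["target_tasks_per_s"], 1.0)


FUNCTIONS: Dict[str, AssessmentFunction] = {
    "energy": energy_efficiency,
    "throughput": task_throughput,
}


def suitability_index(metrics: Metrics, weights: Dict[str, float]) -> float:
    """Normalised weighted sum of assessment scores (assumed form)."""
    total_weight = sum(weights.values())
    score = sum(weights[name] * fn(metrics) for name, fn in FUNCTIONS.items())
    return score / total_weight


def most_suitable(candidates: Dict[str, Metrics], weights: Dict[str, float]) -> str:
    """Pick the candidate with the highest Suitability Index."""
    return max(candidates, key=lambda name: suitability_index(candidates[name], weights))


# Illustrative metrics for two made-up vRacks.
candidates = {
    "vrack-a": {"power_watts": 150, "power_cap_watts": 300,
                "tasks_per_s": 8, "target_tasks_per_s": 10},
    "vrack-b": {"power_watts": 240, "power_cap_watts": 300,
                "tasks_per_s": 10, "target_tasks_per_s": 10},
}
energy_first = most_suitable(candidates, {"energy": 0.8, "throughput": 0.2})
throughput_first = most_suitable(candidates, {"energy": 0.2, "throughput": 0.8})
print(energy_first, throughput_first)  # prints "vrack-a vrack-b"
```

Changing only the weights changes which candidate wins, which is the sense in which weights passed down the hierarchy can direct local decisions towards a global goal.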
35. Self-organisation
framework
augmentations in
support of
virtualization
Goals:
• Support for
virtualization
• Increase resource
utilization
• Decrease job rejection
rate
Add new assessment function reflecting
Memory consumption
Two-stage self-organisation strategy
introduced: CPU and vCPU
Resource over-commitment is addressed
36. Coalitions
WP 3
• Coalitions are used to support the process parallelism within a service.
• Coalitions exist entirely inside a vRack.
• The CL-Resources of a Coalition may span multiple servers within the same vRack.
[Figure: a vRack containing six servers.]
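The two constraints on this slide (a Coalition lives inside one vRack, but may draw CL-Resources from several of its servers) can be sketched as a simple first-fit allocation. The data layout, the first-fit policy and all names are assumptions; the slides do not describe how Coalitions are actually formed.

```python
from typing import Dict, List, Optional, Tuple


def form_coalition(free: Dict[str, int],
                   needed: int) -> Optional[List[Tuple[str, int]]]:
    """Take CL-Resources server by server within a single vRack.

    `free` maps each server in the vRack to its free CL-Resource count and is
    updated in place. Returns None when the vRack cannot satisfy the request,
    because a Coalition may not extend beyond its vRack.
    """
    if needed <= 0 or sum(free.values()) < needed:
        return None
    coalition: List[Tuple[str, int]] = []
    remaining = needed
    for server, available in free.items():
        if remaining == 0:
            break
        take = min(available, remaining)
        if take > 0:
            coalition.append((server, take))
            free[server] -= take
            remaining -= take
    return coalition


# Illustrative vRack with three servers; names and counts are made up.
vrack = {"server-1": 4, "server-2": 4, "server-3": 2}
first = form_coalition(vrack, 6)
second = form_coalition(vrack, 5)
print(first)   # prints [('server-1', 4), ('server-2', 2)]
print(second)  # prints None
```

The second request fails because only four CL-Resources remain free in the vRack after the first Coalition is formed.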
39. Determining
the local state
The Telemetry system provides updates to the SOSM system on the status of the resource fabric.
It is implemented using InfluxDB and SNAP.
[Figure: system overview. Actors: End User, Blueprint Creator, Enterprise Cloud Operator. Components: Gateway Service, Services Catalogue, Blueprint Catalogue, Self-Organizing Self-Management Framework, Plug & Play Service, a Deployed Blueprint running on Coalitions, Physical Resources.]
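InfluxDB accepts points written in its line protocol (`measurement,tags fields timestamp`). The sketch below formats one resource-status sample in that shape using only the standard library. The measurement name, tag keys and field names are hypothetical, the escaping rules are deliberately left out, and the slides do not describe the project's actual telemetry schema or how SNAP is configured.

```python
import time
from typing import Dict, Optional


def to_line_protocol(measurement: str,
                     tags: Dict[str, str],
                     fields: Dict[str, float],
                     timestamp_ns: Optional[int] = None) -> str:
    """Format one point as a line-protocol string.

    Simplification: assumes names and tag values contain no spaces,
    commas or '=' characters, so no escaping is performed.
    """
    tag_part = ",".join(f"{key}={value}" for key, value in sorted(tags.items()))
    field_part = ",".join(f"{key}={float(value)}" for key, value in sorted(fields.items()))
    head = f"{measurement},{tag_part}" if tag_part else measurement
    ts = timestamp_ns if timestamp_ns is not None else time.time_ns()
    return f"{head} {field_part} {ts}"


# Illustrative sample; the timestamp is an arbitrary value in nanoseconds.
line = to_line_protocol(
    "resource_status",
    {"vrack": "vrack-1", "server": "svr-3"},
    {"cpu_util": 0.75, "power_watts": 180},
    timestamp_ns=1481276400000000000,
)
print(line)
```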
40. Support for
new hardware
• The SOSM system supports the addition of new hardware by using a plug and play mechanism.
• New hardware can register with SOSM and it is automatically added and managed.
[Figure: the same system overview as on the previous slide, highlighting the Plug & Play Service between the Self-Organizing Self-Management Framework and the Physical Resources.]
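The register-then-manage flow can be sketched as a small registry that places newly announced hardware under the vRack for its type, creating that vRack if none exists yet. All class, method and resource names are hypothetical; the slides do not describe the Plug & Play Service's real interface.

```python
from typing import Dict, List


class PlugAndPlayService:
    """Hypothetical registry mapping hardware types to vRack membership."""

    def __init__(self) -> None:
        self.vracks: Dict[str, List[str]] = {}

    def register(self, resource_name: str, hw_type: str) -> str:
        """Add a resource to the vRack for its type and return the vRack name."""
        members = self.vracks.setdefault(hw_type, [])
        if resource_name not in members:   # registering twice is harmless
            members.append(resource_name)
        return f"vrack-{hw_type}"


pnp = PlugAndPlayService()
pnp.register("svr-1", "cpu")
pnp.register("svr-2", "cpu")
assigned = pnp.register("mic-1", "mic")   # first MIC: a new vRack appears
print(assigned, pnp.vracks)
```

Because a previously unseen hardware type simply creates a new entry, new kinds of resource can join without changes elsewhere in the sketch.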