Bringing Private Cloud Computing to
HPC and Science
Ignacio M. Llorente
OpenNebula Project Director
© OpenNebula Project. Creative Commons Attribution-NonCommercial-ShareAlike License
Computer Sciences Seminar
Berkeley, July 3rd, 2014
2/44Bringing Private Cloud Computing to HPC and Science !
Contents
Bringing Private Cloud Computing to HPC and Science
This presentation is about:
•  The Private HPC Cloud Use Case
•  Main Challenges for Private HPC Cloud
•  Resource Provisioning Framework
•  Private HPC Cloud Case Studies
•  About Grid and Cloud
•  About OpenNebula
The Private HPC and Science Cloud Use Case
Different Perspectives to Present Innovations in Cloud Computing
Demand Side
(Consumption Model)
Supply Side
(Provisioning Model)
HPC & Science
Applications
The Private HPC and Science Cloud Use Case
The Pre-cloud Era
LRMS (LSF, PBS, SGE…)
Grid Middleware
Access, Provision
The Private HPC and Science Cloud Use Case
OpenNebula as an Infrastructure Tool – Enhanced Capabilities
Virtual Worker Nodes
LRMS (LSF, PBS, SGE…)
Grid Middleware
Access, Provision, Service
•  Common interfaces
•  Grid integration
•  Custom environments
•  Dynamic elasticity
•  Consolidation of WNs
•  Simplified management
•  Physical – Virtual WNs
•  Dynamic capacity partitioning
•  Faster upgrades
Service/Provisioning Decoupling
The Private HPC and Science Cloud Use Case
OpenNebula as a Provisioning Tool – Enhanced Capabilities
Pilot Jobs, SSH…
IaaS Interface
Access, Provision, Service
•  Simple Provisioning Interface
•  Raw/Appliance VMs
•  Dynamic scalable computing
•  Custom access to capacity
•  Not only batch workloads
•  Not only scientific workloads
•  Improve utilization
•  Reduced service management
•  Cost efficiency
Main Challenges for Private HPC Cloud
Main Demands from Engineering, Research and Supercomputing
Flexible Definition of
Multi-tier Applications
Resource
Management
Application
Performance
Provisioning
Model
A Comprehensive Framework to Manage Complex Applications
•  Several tiers
•  Deployment dependencies between components
•  Each tier has its own cardinality and elasticity rules
Main Challenges for Private HPC Cloud
Execution of Multi-tiered Applications
Front-end
Worker Nodes
{
  "name": "Computing_Cluster",
  "deployment": "straight",
  "roles": [
    {
      "name": "frontend",
      "vm_template": 0
    }, {
      "name": "worker",
      "parents": ["frontend"],
      "cardinality": 2,
      "vm_template": 3,
      "min_vms": 1,
      "max_vms": 5,
      "elasticity_policies": [ {
        "expression": "CPU > 90%",
        "type": "CHANGE",
        "adjust": 2,
        "period_number": 3,
        "period": 10 }, …
Management of interconnected multi-VM applications:
•  Definition of application flows
•  Catalog with pre-defined applications
•  Sharing between users and groups
•  Management of persistent scientific data
•  Automatic elasticity
Main Challenges for Private HPC Cloud
Using the Cloud – Execution of Multi-tiered Applications
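The automatic elasticity rule in the service template (CPU above 90% for 3 consecutive periods, then add 2 workers, bounded by min_vms and max_vms) can be illustrated with a minimal sketch. This is not OneFlow's implementation: the function name `next_cardinality`, its parameters and the list-of-samples input are assumptions made only for illustration.

```python
def next_cardinality(current, cpu_samples, *, threshold=90.0, adjust=2,
                     period_number=3, min_vms=1, max_vms=5):
    """Return the new number of worker VMs for a CHANGE-type policy.

    cpu_samples holds one CPU reading (percent) per evaluation period,
    oldest first. The rule fires only when the last `period_number`
    readings all exceed `threshold` (hypothetical interpretation).
    """
    recent = cpu_samples[-period_number:]
    fired = len(recent) == period_number and all(s > threshold for s in recent)
    target = current + adjust if fired else current
    # Never leave the [min_vms, max_vms] range declared for the role.
    return max(min_vms, min(max_vms, target))


# Three hot periods in a row grow the role from 2 to 4 workers.
print(next_cardinality(2, [95.0, 96.0, 97.0]))
```

Requiring several consecutive periods before scaling is a common way to avoid reacting to short CPU spikes; clamping keeps the role inside its declared bounds.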
Main Challenges for Private HPC Cloud
Performance Penalty as a Small Tax You Have to Pay
Overhead in Virtualization
•  Single processor performance penalty between 1% and 5%
•  NASA has reported an overhead between 9% and 25% (HPCC and NPB) (1)
•  Growing number of users demanding containers (OpenVZ and LXC)
Need for Low-Latency High-Bandwidth Interconnection
•  The lower-performance interconnect typically used in clouds (10 GigE) has a significant negative impact (x2-x10, especially on latency) on HPC applications (1)
•  FermiCloud has reported MPI performance (HPL benchmark) on VMs with SR-IOV/InfiniBand with only a 4% overhead (2)
•  The Center for HPC at CSIR has contributed the KVM SR-IOV drivers for InfiniBand (3)
(1)  An Application-Based Performance Evaluation of Cloud Computing, NASA Ames, 2013
(2)  FermiCloud Update, Keith Chadwick, Fermilab, HEPiX Spring Workshop 2013
(3)  http://wiki.chpc.ac.za/acelab:opennebula_sr-iov_vmm_driver , 2013
Overhead in Input/Output
•  Growing number of Big Data apps
•  Support for multiple system datastores including automatic scheduling
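To put the quoted overheads in perspective, a small worked calculation may help. It assumes (this is an interpretation, not something the slide states) that an overhead of x% means a job takes x% longer than on bare metal; the 10-hour job and the function name `virtualized_runtime` are made up for illustration.

```python
def virtualized_runtime(native_hours, overhead):
    """Estimated wall-clock hours when running with a relative overhead.

    Assumption: an overhead of 0.04 (4%) means the job takes 4% longer
    than the same job on bare metal.
    """
    if overhead < 0:
        raise ValueError("overhead must be non-negative")
    return native_hours * (1.0 + overhead)


# Overheads quoted on the slide; the 10-hour job is a hypothetical example.
for label, overhead in [("SR-IOV/InfiniBand (HPL)", 0.04),
                        ("NASA lower bound", 0.09),
                        ("NASA upper bound", 0.25)]:
    print(f"{label}: {virtualized_runtime(10.0, overhead):.1f} h")
```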
Optimal Placement of Virtual Machines
•  Automatic placement of VM near input data
•  Striping policy to maximize the resources available to VMs
Fair Share of Resources
•  Resource quota management to allocate, track and limit resource utilization
Management of Different Hardware Profiles
•  Resource pools (physical clusters) with specific Hw and Sw profiles, or
security levels for different workload profiles (HPC and HTC)
Isolated Execution of Applications
•  Full Isolation of performance-sensitive applications
Provide VOs with Isolated Cloud Environments
•  Automatic provision of Virtual Data Centers
Hybrid Cloud Computing
•  Cloudbursting to address peak or fluctuating demands for non-critical and HTC workloads
Main Challenges for Private HPC Cloud
Resource Management
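The striping policy and the quota check listed above can be sketched in a few lines. This is an illustrative model only, not the OpenNebula scheduler: the `Host` class, `place_striping` and `within_quota` are hypothetical names, and CPU cores are the only resource considered.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    free_cpu: float       # free CPU cores on the host
    running_vms: int = 0


def place_striping(hosts, vm_cpu):
    """Striping: spread VMs by picking the fitting host with the fewest VMs."""
    candidates = [h for h in hosts if h.free_cpu >= vm_cpu]
    if not candidates:
        return None  # no capacity left; a cloudbursting step could go here
    # Ties are broken by host name so the result is deterministic.
    best = min(candidates, key=lambda h: (h.running_vms, h.name))
    best.free_cpu -= vm_cpu
    best.running_vms += 1
    return best.name


def within_quota(used_cpu, vm_cpu, quota_cpu):
    """Fair share: reject a request that would push a group over its quota."""
    return used_cpu + vm_cpu <= quota_cpu


hosts = [Host("node-a", 4.0), Host("node-b", 4.0)]
placements = [place_striping(hosts, 1.0) for _ in range(3)]
print(placements)
```

Spreading VMs this way leaves each VM more headroom on its host, at the cost of keeping more hosts busy than a packing policy would.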
The Resource Provisioning Framework
Challenges from the Organizational Perspective
Bio HTC Simulations, HPC Simulations, Big Data Analysis
Comprehensive Framework to Manage User Groups
•  Several divisions, units, organizations…
•  Different workload profiles
•  Different performance and security requirements
•  Dynamic groups that require admin privileges
=> From many private clusters to a single consolidated environment
Challenges from the Infrastructure Perspective
DC ESRIN, DC ESAC, Public Clouds
Comprehensive Framework to Manage Infrastructure Resources
•  Scalability: Several DCs with multiple clusters
•  Outsourcing: Access to several clouds for cloudbursting
•  Heterogeneity: Different hardware for specific workload profiles
The Resource Provisioning Framework
The Goal: Dynamic Allocation of Private and Public Resources to Groups of Users
DC West Coast, DC Europe, Public Clouds
Bio HTC Simulations, HPC Simulations, Big Data Analysis
The Resource Provisioning Framework
Definition of Clusters (Resource Providers)
Bio HTC Simulations, HPC Simulations, Big Data Analysis
DC West Coast, DC Europe, Public Clouds
The Resource Provisioning Framework
Definition of vDCs
DC West Coast, DC Europe, Public Clouds
Bio HTC Simulations, HPC Simulations, Big Data Analysis
The Resource Provisioning Framework
The Resource Provisioning Framework
Admins in Each Group/vDC Manage Their Own Virtual Private Cloud
•  Each vDC has an admin
•  Delegation of management in the vDC
•  Only virtual resources, not the underlying physical infrastructure
vDC Admin View
Users in Each Group/vDC Access Their Own Virtual Private Cloud
DC West Coast, DC Europe, Public Clouds
Bio HTC Simulations, HPC Simulations, Big Data Analysis
Cloud API
The Resource Provisioning Framework
New Level of Provisioning: IaaS as a Service
DC West Coast, DC Europe, Public Clouds
Cloud Admins, vDC Admins, Consumers
Bio HTC Simulations, HPC Simulations, Big Data Analysis
The Resource Provisioning Framework
Benefits
•  Partition of cloud resources
•  Complete isolation of users, organizations or workloads
•  Allocation of Clusters with different levels of security, performance or high
availability to different groups with different workload profiles
•  Containers for the execution of virtual appliances (SDDCs)
•  Way of hiding physical resources from Group members
•  Simple federation and scalability of cloud infrastructures beyond a single
cloud instance and data center
The Resource Provisioning Framework
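The vDC idea (a group of users, a delegated admin, and a set of clusters allocated to the group) can be modeled with a very small data structure. The sketch below is an illustration under stated assumptions, not OpenNebula's data model or API: the `VDC` class, its methods and the cluster names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class VDC:
    """A virtual data center: a group of users plus the clusters assigned to it."""
    name: str
    admin: str
    clusters: set = field(default_factory=set)
    users: set = field(default_factory=set)

    def add_user(self, requester, user):
        # Management is delegated: only the vDC admin may add members.
        if requester != self.admin:
            raise PermissionError(f"{requester} is not the admin of {self.name}")
        self.users.add(user)

    def can_deploy(self, user, cluster):
        # Members only reach the clusters allocated to their own vDC.
        is_member = user == self.admin or user in self.users
        return is_member and cluster in self.clusters


hpc = VDC("HPC Simulations", admin="alice", clusters={"dc-west/infiniband"})
hpc.add_user("alice", "bob")
print(hpc.can_deploy("bob", "dc-west/infiniband"))
```

Because membership and cluster allocation are checked together, users of one vDC never see capacity assigned to another, which is the isolation benefit listed above.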
Private HPC Cloud Case Studies
One of Our Main User Communities
Supercomputing Centers
Research Centers
Distributed Computing Infrastructures
Industry
FermiCloud
Nodes: KVM on 29 nodes (2 TB RAM – 608 cores), Koi Computer
Network: Gigabit and InfiniBand
Storage: CLVM+GFS2 on shared 120 TB Nexsan SATABeast
AuthN: X509
Linux: Scientific Linux
Interface: Sunstone Self-service and EC2 API
App Profile: Legacy, HTC and MPI HPC
http://www-fermicloud.fnal.gov/
Typical Workloads
•  Production VM-based batch system via the EC2 emulation => 1,000 VMs
•  Scientific stakeholders get access to on-demand VMs
•  Developers & integrators of new Grid applications
Private HPC Cloud Case Studies
CESGA Cloud
Nodes: KVM on 35 nodes (0.6 TB RAM – 280 cores), HP ProLiant
Network: 2 x Gigabit (1G and 10G)
Storage: ssh from remote EMC storage server
AuthN: X509 and core password
Linux: Scientific Linux 6.4
Interface: Sunstone Self-service and OCCI
App Profile: Individual VMs and virtualised computing clusters
Typical Workloads
•  160 users
•  Genomic, rendering…
•  Grid services on production at CESGA
•  Node at FedCloud project
•  UMD middleware testing
http://cloud.cesga.es/
Private HPC Cloud Case Studies
SARA Cloud
Nodes: KVM on 30 HPC nodes (256 GB RAM, 1,300 cores + 2 TB high-memory node), Dell PowerEdge, and 10 “light” nodes (64 GB RAM, 80 cores), Supermicro
Network: 2 x Gigabit (10G) with Arista switch
Storage: NFS on 500 TB NAS for HPC and ssh for “light”
AuthN: Core password
Linux: CentOS
Interface: Sunstone and OCCI
App Profile: MPI clusters, Windows clusters and independent VMs
http://www.cloud.sara.nl
Typical Workloads
•  Ad-hoc clusters with MPI and pilot jobs
•  Windows clusters for Windows-bound software
•  Single VMs, sometimes acting as web servers to disseminate results
Private HPC Cloud Case Studies
SZTAKI Cloud
Nodes: KVM on 8 nodes (2 TB RAM – 512 cores), Dell PowerEdge
Network: Redundant 10Gb
Storage: Dell storage servers: iSCSI (36 TB) and Ceph (288 TB)
AuthN: X509
Linux: CentOS 6.5
Interface: Sunstone Self-service, EC2 and OCCI
App Profile: Individual VMs and virtualised computing cluster
http://cloud.sztaki.hu/
Typical Workloads
•  Run standard and grid services (e.g. web servers, grid middleware…)
•  Development and testing of new codes
•  Research on performance and opportunistic computing
Private HPC Cloud Case Studies
KTH Cloud
Nodes: KVM on 768 cores (768 GB RAM), HP ProLiant
Network: InfiniBand and Gigabit
Storage: NFS and LVM
AuthN: X509 and core password
Linux: Ubuntu
Interface: Sunstone self-service, OCCI and EC2
App Profile: Individual VMs and virtualised computing cluster
http://www.pdc.kth.se/
Typical Workloads
•  Mainly BIO
•  Hadoop, Spark, Galaxy, Cloud Bio Linux…
Private HPC Cloud Case Studies
About Grid and Cloud
What is the Difference between a Grid Site and a Cloud Provider?
Definitions of Grid Site
•  “A resource provider is a site which provides services and resources (e.g. data storage) to this VO” (GridPP)
•  “What makes a grid site a grid site? A single grid resource (a grid site) offers compute and/or storage services to remote users via standardized interfaces” (GridKa)
•  “A typical (minimal) grid site provides computing and storage to supported
Virtual Organizations (VOs) and runs a few services to make those resources
visible on the grid” (StratusLab)
Definitions of Cloud Provider
•  “A cloud provider is a service provider that offers storage and compute resources on a private or public network” (IBM)
•  “(Cloud) Providers offer resources to the customer – either via dedicated APIs (PaaS), virtual machines and/or direct access to the resources (IaaS)” (EC Report on The Future of Cloud Computing)
Virtual CE, WN… Other (web, mail...) Raw machines
LRMS (LSF, PBS…)
Grid Middleware IaaS Interface
Access
•  Batch Job Processing
•  Custom Execution Environments
•  Grid Service Integration
•  Industry Applications
•  Other WMS (pilots)
•  Complete Services (cluster)
Grid Site External Providers
Provision, Service
About Grid and Cloud
The OpenNebula Vision for Grid Sites: Extending the Range of Applications
About Grid and Cloud
What is the Difference between a Grid and Cloud Federation?
Definitions of Grid
•  Ian Foster’s definition lists these primary attributes: “Computing resources
are not administered centrally, open standards are used, and nontrivial
quality of service is achieved”
•  Plaszczak/Wellner: “The technology that enables resource virtualization, on-
demand provisioning, and service (resource) sharing between organizations”
•  IBM: “The ability, using a set of open standards and protocols, to gain access
to applications and data, processing power, storage capacity and a vast
array of other computing resources over the Internet”
•  CERN: “A service for sharing computer power and data storage capacity
over the Internet”
Definition of Cloud Federation
•  “Cloud federation is the practice of interconnecting the cloud computing
environments of two or more service providers for the purpose of load
balancing traffic and accommodating spikes in demand”, Wikipedia
Grid Services
Grid API Cloud API Grid API Cloud API
Appliance Repo
MarketPlace
Cloud/Grid Site Cloud/Grid Site
•  Sharing existing VM images
•  Registry of metadata
•  Images are kept elsewhere
•  Supports trust
•  Federation facilities
•  Security
•  Grid specific services
•  Storage VM images
•  Distributed
•  Multi-protocol
About Grid and Cloud
The OpenNebula Vision for Grid Infrastructures
About Grid and Cloud
Grid and Cloud as Complementary Computing Models
Grids
Usage
•  Job Processing
•  Big Batch System
•  File Sharing Services
Achievements
•  Federation of Resources
•  VO Concept
But…
•  User experience
•  Complexity
Clouds
Usage
•  Raw infrastructure
•  Elasticity & Pay-per-use
•  Simple Web Interface
Achievements
•  Agile Infrastructures
•  IT is another Utility
But…
•  Interoperability
•  Federation
Customize Environments
Uniform Security
Resource Management
Scientific Applications
Resource Sharing
Flexibility & Simplicity
About Grid and Cloud
EGI Federated Cloud – The Architecture
EGI Core Platform: Federated AAI, Service Registry, Monitoring, Accounting
EGI Cloud Infrastructure Platform: Virtual Instance Mgmt, Information Discovery, Storage Management
Cloud Management Framework (OpenNebula, Synnefo, OpenStack …)
Help and Support, Security Co-ordination, Training and Outreach
EGI Collaboration Tools, EGI Application DB, Image Repository, EGI Cloud Marketplace
Sustainable Business Models
GSI, GLUE2, OCCI, CDMI, SAM, UR, OVF
User Community
EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
About Grid and Cloud
EGI Federated Cloud – The Standards
rOCCI-Server, Cloud Management Framework
EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
About Grid and Cloud
EGI Federated Cloud – CMP Compatibility
Cloud Mgmt. Framework | Fed. AAI | Information Pub | Monitoring | Accounting | Img. Mgmt. | OCCI | CDMI
OpenStack Yes Yes Yes Yes Yes Yes Yes
OpenNebula Yes Yes Yes Yes Yes Yes Yes
Synnefo Yes Yes Yes Yes - Yes -
Cloudstack Yes
Emotive Yes N/A Yes
Stoxy Yes Yes* N/A Yes
EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
About Grid and Cloud
EGI Federated Cloud – The Providers
15 certified resource providers from 12
countries from the public and private
sector
•  Czech Republic, Germany, Greece,
Hungary, Italy, Macedonia, Poland,
Slovakia, Spain, Sweden, Turkey,
United Kingdom
2 countries currently integrating
•  Croatia, Finland
6 countries interested
•  Bulgaria, France, Israel*, The
Netherlands, Portugal, Switzerland
Worldwide partnership/interest
•  Australia* (NECTAR)
•  South Africa* (SAGrid)
•  South Korea* (KISTI)
•  United States* (NIST, NSF Centres)
* Not shown on map
Certified
Integrating
Interested
Launch capability – 5,000 cores, 225 TB storage
Q4 2014 (planned) – 18,000 cores, 6000 TB storage
2020 Vision – 1,000,000 cores, 1 EB storage
About Grid and Cloud
EGI Federated Cloud – The Users
•  Ecology – BioVeL: Biodiversity Virtual e-Laboratory
•  Structural biology – WeNMR: a worldwide e-Infrastructure for NMR and structural biology
•  Linguistics – CLARIN: ‘British National Corpus’ service (BNCWeb)
•  Earth Observation – SSEP: European Space Agency’s Supersites Exploitation Platform for volcano and earthquake monitoring (collaboration with Helix Nebula)
•  Software Engineering – SCI-BUS: simulated environments for portal testing
•  Software Engineering – DIRAC: deploying ready-to-use distributed computing systems
•  Interdisciplinary research – Catania Science Gateway Framework
•  Musicology – Peachnote: dynamic analysis of musical scores
•  Earth Observation – ENVRI: Common Operations of Environmental Research
infrastructures (collaboration with EISCAT3D)
•  Geology – VERCE: Virtual Earthquake and seismology Research
•  Ecology – LifeWatch: E-Science European Infrastructure for Biodiversity and Ecosystem
Research
•  High Energy Physics – CERN ATLAS: ATLAS processing cluster via HelixNebula
EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
About Grid and Cloud
Different Names for the Same Model? Same Challenges but Different Technologies?
Grid Computing Cloud Computing
About OpenNebula
Simple but feature-rich, production-ready, customizable solution to build clouds
From Research Project to Open-source Project for Enterprise
2005
2008 2009 2010 2011 2012
• Develop & innovate
• Support the community
• Collaborate
Large-scale production
deployment: 16,000 VMs
2013 2014
Research
Project
5,000 downloads/month
About OpenNebula
2015
Very large-scale
production deployment:
200,000 VMs (10 zones)
Hybrid cloud partners
Tech Days
An Open Community Driven by Users
Develop
Open-source Apache code
Transparent process
Public roadmap
Communicate
User groups
Cloud technology days
OpenNebulaConf
Integrate
Add-ons catalog
Ecosystem directory
Use
Give us feedback
Contribute experiences
Help support new users
BBC
About OpenNebula
Adopt as innovation
platform or
interoperability tool
Standards Projects
Linux Distributions
Requirements for
innovative functionality
Adopt
standards
Contribute to
standards
Distribution
channel
Industry and Government
The Intersection between Industry, Standards and Research Projects
What is OpenNebula?
Requirements
Feedback
Contributions Adopt
open-source
What is OpenNebula?
Some Research References
•  B. Sotomayor, R. S. Montero, I. M. Llorente and I. Foster, “Virtual Infrastructure
Management in Private and Hybrid Clouds”, IEEE Internet Computing,
September/October 2009 (vol. 13 no. 5)
•  Rafael Moreno-Vozmediano,  Ruben S. Montero, Ignacio M. Llorente, “Multi-
Cloud Deployment of Computing Clusters for Loosely-Coupled MTC
Applications”, IEEE Transactions on Parallel and Distributed Systems, 22(6):
924-930, April 2011
•  Rafael Moreno-Vozmediano,  Ruben S. Montero, Ignacio M. Llorente, “IaaS Cloud
Architecture: From Virtualized Data Centers to Federated Cloud
Infrastructures”, IEEE Computer, 45(12):65-72, December 2012
•  Rafael Moreno-Vozmediano,  Ruben S. Montero, Ignacio M. Llorente, “Key
Challenges in Cloud Computing to Enable the Future Internet of Services”, IEEE
Internet Computing, 17(4):18-25, 2013.
Innovation in Cloud Architecture
Upcoming Community Events
We Will Be Happy to Answer Your Questions
Questions?
OpenNebula.org @OpenNebula

Bringing Private Cloud Computing to HPC and Science - Berkeley Lab - July 2014

  • 1.
    Bringing Private CloudComputing to HPC and Science Ignacio M. Llorente OpenNebula Project Director © OpenNebula Project. Creative Commons Attribution-NonCommercial-ShareAlike License Computer Sciences Seminar Berkeley, July 3rd, 2014
  • 2.
    2/44Bringing Private CloudComputing to HPC and Science ! Contents Building Private Cloud Computing to HPC and Science This presentation is about: •  The Private HPC Cloud Use Case •  Main Challenges for Private HPC Cloud •  Resource Provisioning Framework •  Private HPC Cloud Case Studies •  About Grid and Cloud •  About OpenNebula
  • 3.
    3/44Bringing Private CloudComputing to HPC and Science ! The Private HPC and Science Cloud Use Case Different Perspectives to Present Innovations in Cloud Computing! Demand Side (Consumption Model) Supply Side (Provisioning Model) HPC & Science Applications
  • 4.
    4/44Bringing Private CloudComputing to HPC and Science ! The Private HPC and Science Cloud Use Case The Pre-cloud Era! LRMS (LSF, PBS, SGE…) Grid Middleware AccessProvision
  • 5.
    5/44Bringing Private CloudComputing to HPC and Science ! The Private HPC and Science Cloud Use Case OpenNebula as an Infrastructure Tool – Enhanced Capabilities! Virtual Worker Nodes LRMS (LSF, PBS, SGE…) Grid Middleware AccessProvisionService •  Common interfaces •  Grid integration •  Custom environments •  Dynamic elasticity •  Consolidation of WNs •  Simplified management •  Physical – Virtual WNs •  Dynamic capacity partitioning •  Faster upgrades Service/Provisioning Decoupling!
  • 6.
    6/44Bringing Private CloudComputing to HPC and Science ! The Private HPC and Science Cloud Use Case OpenNebula as an Provisioning Tool – Enhanced Capabilities! Pilot Jobs, SSH… IaaS Interface AccessProvisionService •  Simple Provisioning Interface •  Raw/Appliance VMs •  Dynamic scalable computing •  Custom access to capacity •  Not only batch workloads •  Not only scientific workloads •  Improve utilization •  Reduced service management •  Cost efficiency
  • 7.
    7/44Bringing Private CloudComputing to HPC and Science ! Main Challenges for Private HPC Cloud Main Demands from Engineering, Research and Supercomputing ! Flexible Definition of Multi-tier Applications Resource Management Application Performance Provisioning Model
  • 8.
    8/44Bringing Private CloudComputing to HPC and Science ! A Comprehensive Framework to Manage Complex Applications •  Several tiers •  Deployment dependencies between components •  Each tier has its own cardinality and elasticity rules Main Challenges for Private HPC Cloud Execution of Multi-tiered Applications ! Front-end Worker Nodes { "name": ”Computing_Cluster", "deployment": "straight", "roles": [ { "name": "frontend", "vm_template": 0 }, { "name": "worker", "parents": frontend, "cardinality": 2, "vm_template": 3, "min_vms" : 1, "max_vms" : 5, "elasticity_policies" : { ”expressions" : ”CPU> 90%”, "type" : "CHANGE", "adjust" : 2, "period_number" : 3, "period" : 10}, …
  • 9.
    9/44Bringing Private CloudComputing to HPC and Science ! Management of interconnected multi-VM applications: •  Definition of application flows •  Catalog with pre-defined applications •  Sharing between users and groups •  Management of persistent scientific data •  Automatic elasticity Main Challenges for Private HPC Cloud Using the Cloud – Execution of Multi-tiered Applications !
  • 10.
    10/44Bringing Private CloudComputing to HPC and Science ! Main Challenges for Private HPC Cloud Performance Penalty as a Small Tax You Have to Pay! Overhead in Virtualization •  Single processor performance penalty between 1% and 5% •  NASA has reported an overhead between 9% and 25% (HPCC and NPB)1 •  Growing number of users demanding containers (OpenVZ and LXC) Need for Low-Latency High-Bandwidth Interconnection •  Lower performance, 10 GigE typically, used in clouds has a significant negative (x2-x10, especially latency) impact on HPC applications1 •  FermiCloud has reported MPI performance (HPL benchmark) on VMs and SR-IOV/Infiniband with only a 4% overhead2 •  The Center for HPC at CSR has contributed the KVM SR-IOV Drivers for Infiniband3 (1)  An Application-Based Performance Evaluation of Cloud Computing, NASA Ames, 2013 (2)  FermiCloud Update, Keith Chadwick!, Fermilab, HePIX Spring Workshop 2013 (3)  http://wiki.chpc.ac.za/acelab:opennebula_sr-iov_vmm_driver , 2013 Overhead in Input/Output •  Growing number of Big Data apps •  Support for multiple system datastores including automatic scheduling
  • 11.
    11/44Bringing Private CloudComputing to HPC and Science ! Optimal Placement of Virtual Machines •  Automatic placement of VM near input data •  Striping policy to maximize the resources available to VMs Fair Share of Resources •  Resource quota management to allocate, track and limit resource utilization Management of Different Hardware Profiles •  Resource pools (physical clusters) with specific Hw and Sw profiles, or security levels for different workload profiles (HPC and HTC) Isolated Execution of Applications •  Full Isolation of performance-sensitive applications Provide VOs with Isolated Cloud Environ •  Automatic provision of Virtual Data Centers Hybrid Cloud Computing •  Cloudbursting to address peak or fluctuating demands for no critical and HTC workloads Main Challenges for Private HPC Cloud Resource Management!
  • 12.
    12/44Bringing Private CloudComputing to HPC and Science ! The Resource Provisioning Framework Challenges from the Organizational Perspective! Bio HTC Simulations HPC Simulations Big Data Analysis Comprehensive Framework to Manage User Groups •  Several divisions, units, organizations… •  Different workloads profiles •  Different performance and security requirements •  Dynamic groups that require admin privileges => From many private clusters to a single consolidated environment
  • 13.
    13/44Bringing Private CloudComputing to HPC and Science ! Challenges from the Infrastructure Perspective! DC ESRIN DC ESACPublic Clouds Comprehensive Framework to Manage Infrastructure Resources •  Scalability: Several DCs with multiple clusters •  Outsourcing: Access to several clouds for cloudbursting •  Heterogeneity: Different hardware for specific workload profiles The Resource Provisioning Framework
  • 14.
    14/44Bringing Private CloudComputing to HPC and Science ! The Goal: Dynamic Allocation of Private and Public Resources to Groups of Users! DC West Coast DC EuropePublic Clouds Bio HTC Simulations HPC Simulations Big Data Analysis The Resource Provisioning Framework
  • 15.
    15/44Bringing Private CloudComputing to HPC and Science ! Definition of Clusters (Resource Providers)! Bio HTC Simulations HPC Simulations Big Data Analysis DC West Coast DC EuropePublic Clouds The Resource Provisioning Framework
  • 16.
    16/44Bringing Private CloudComputing to HPC and Science ! Definition of vDCs! DC West Coast DC EuropePublic Clouds Bio HTC Simulations HPC Simulations Big Data Analysis The Resource Provisioning Framework
  • 17.
    17/44Bringing Private CloudComputing to HPC and Science ! The Resource Provisioning Framework Admins in each Group/vDC Manage to its Own Virtual Private Cloud ! !•  Each vDC has an admin •  Delegation of management in the VDC •  Only virtual resources, not the underlying physical infrastructure vDC Admin View
  • 18.
    18/44Bringing Private CloudComputing to HPC and Science ! Users in each Group/vDC Access to its Own Virtual Private Cloud ! DC West Coast DC EuropePublic Clouds Bio HTC Simulations HPC Simulations Big Data Analysis Cloud API The Resource Provisioning Framework
  • 19.
    19/44Bringing Private CloudComputing to HPC and Science ! New Level of Provisioning: IaaS as a Service! DC West Coast DC EuropePublic Clouds Big Data Analysis CloudAdminsvDCAdminsConsumers HPC Simulations Bio HTC Simulations The Resource Provisioning Framework
  • 20.
    20/44Bringing Private CloudComputing to HPC and Science ! Benefits! •  Partition of cloud resources •  Complete isolation of users, organizations or workloads •  Allocation of Clusters with different levels of security, performance or high availability to different groups with different workload profiles •  Containers for the execution of virtual appliances (SDDCs) •  Way of hiding physical resources from Group members •  Simple federation and scalability of cloud infrastructures beyond a single cloud instance and data center The Resource Provisioning Framework
  • 21.
    21/44Bringing Private CloudComputing to HPC and Science ! Private HPC Cloud Case Studies One of Our Main User Communities! Supercomputing Centers Research Centers Distributed Computing Infrastructures Industry
  • 22.
    22/44Bringing Private CloudComputing to HPC and Science ! FermiCloud! Nodes KVM on 29 nodes (2 TB RAM – 608 cores) Koi Computer Network Gigabit and Infiniband Storage CLVM+GFS2 on shared 120TB NexSAN SataBeats AuthN X509 Linux Scientific Linux Interface Sunstone Self-service and EC2 API App Profile Legacy, HTC and MPI HPC http://www-fermicloud.fnal.gov/ Typical Workloads •  Production VM-based batch system via the EC2 emulation => 1,000 VMs •  Scientific stakeholders get access to on- demand VMs •  Developers & integrators of new Grid applications Private HPC Cloud Case Studies
  • 23.
    23/44Bringing Private CloudComputing to HPC and Science ! CESGA Cloud! Nodes KVM on 35 nodes (0.6 TB RAM – 280 cores) HP ProLiant Network 2 x Gigabit (1G and 10G) Storage ssh from remote EMC storage server AuthN X509 and core password Linux Scientific Linux 6.4 Interface Sunstone Self-service and OCCI App Profile Individual VMs and virtualised computing clusters Typical Workloads •  160 users •  Genomic, rendering… •  Grid services on production at CESGA •  Node at FedCloud project •  UMD middleware testing http://cloud.cesga.es/ Private HPC Cloud Case Studies
  • 24.
    24/44Bringing Private CloudComputing to HPC and Science ! SARA Cloud! Nodes KVM on 30 HPC nodes (256 GB RAM 1,300 cores + 2 TB High-memory node) Dell PowerEdge and 10 “light” nodes (64 GB RAM 80 cores) Supermicro Network 2 x Gigabit (10G) with Arista switch Storage NFS on 500 TB NAS for HPC and ssh for “light” AuthN Core password Linux CentOS Interface Sunstone and OCCI App Profile MPI clusters, windows clusters and independent VMs http://www.cloud.sara.nl Typical Workloads •  Ad-hoc clusters with MPI and pilot jobs •  Windows clusters for Windows-bound software •  Single VMs, sometimes acting as web servers to disseminate results Private HPC Cloud Case Studies
  • 25.
    25/44Bringing Private CloudComputing to HPC and Science ! SZTAKI Cloud! Nodes KVM on 8 nodes (2 TB RAM – 512 cores) DELL PowerEdge Network Redundant 10Gb Storage Dell storage servers: iSCSI ( 36TB ) and CEPH ( 288 TB ) AuthN X509 Linux CentOS 6.5 Interface Sunstone Self-service, EC2 and OCCI App Profile Individual VMs and virtualised computing cluster http://cloud.sztaki.hu/ . Typical Workloads •  Run standard and grid services (e.g.: web servers, grid middleware…) •  Development and testing of new codes •  Research on performance and opportunistic computing Private HPC Cloud Case Studies
  • 26.
    Private HPC Cloud Case Studies: KTH Cloud
    http://www.pdc.kth.se/
    Nodes: KVM on 768 cores (768 GB RAM), HP ProLiant
    Network: Infiniband and Gigabit
    Storage: NFS and LVM
    AuthN: X509 and core password
    Linux: Ubuntu
    Interface: Sunstone self-service, OCCI and EC2
    App Profile: Individual VMs and virtualised computing cluster
    Typical Workloads
    •  Mainly BIO
    •  Hadoop, Spark, Galaxy, Cloud Bio Linux…
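To put the case studies on a common footing, the short Python sketch below computes RAM per core from the figures quoted on the slides. It assumes 1 TB = 1000 GB, and it leaves out SARA because the slide does not make clear whether its RAM and core figures are per node or totals. The names `SITES` and `ram_per_core_gb` are illustrative only and are not part of any OpenNebula tooling.

```python
# RAM and core counts as quoted on the case-study slides.
# Assumption: 1 TB = 1000 GB (the slides do not say which convention they use).
SITES = {
    "FermiCloud": {"ram_gb": 2000, "cores": 608},
    "CESGA": {"ram_gb": 600, "cores": 280},
    "SZTAKI": {"ram_gb": 2000, "cores": 512},
    "KTH": {"ram_gb": 768, "cores": 768},
}


def ram_per_core_gb(site):
    """Return GB of RAM per core for one site entry."""
    return site["ram_gb"] / site["cores"]


if __name__ == "__main__":
    # Print sites from the most to the least memory per core.
    ranked = sorted(SITES.items(), key=lambda kv: ram_per_core_gb(kv[1]), reverse=True)
    for name, site in ranked:
        print(f"{name:<12}{ram_per_core_gb(site):5.2f} GB/core")
```

A ratio like this is only a rough indicator of which sites suit memory-hungry VMs. It ignores dedicated high-memory nodes and any overcommitment policy a site may apply.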
  • 27.
    About Grid and Cloud
    What is the Difference between a Grid Site and a Cloud Provider?
    Definitions of Grid Site
    •  “A resource provider is a site which provides services and resources (e.g. data storage) to this VO” (GridPP)
    •  “What makes a grid site a grid site? A single grid resource (a grid site) offers compute and/or storage services to remote users via standardized interfaces” (GridKa)
    •  “A typical (minimal) grid site provides computing and storage to supported Virtual Organizations (VOs) and runs a few services to make those resources visible on the grid” (StratusLab)
    Definitions of Cloud Provider
    •  “A cloud provider is a service provider that offers storage and compute resources on a private or public network” (IBM)
    •  “(Cloud) Providers offer resources to the customer – either via dedicated APIs (PaaS), virtual machines and / or direct access to the resources (IaaS)” (EC Report on The Future of Cloud Computing)
  • 28.
    About Grid and Cloud
    The OpenNebula Vision for Grid Sites: Extending the Range of Applications
    [Diagram; layers: Access, Provision, Service]
    Components: Virtual CE, WN…; Other (web, mail...); Raw machines; LRMS (LSF, PBS…); Grid Middleware; IaaS Interface
    Providers: Grid Site; External Providers
    •  Batch Job Processing
    •  Custom Execution Environments
    •  Grid Service Integration
    •  Industry Applications
    •  Other WMS (pilots)
    •  Complete Services (cluster)
  • 29.
    About Grid and Cloud
    What is the Difference between a Grid and Cloud Federation?
    Definitions of Grid
    •  Ian Foster’s definition lists these primary attributes: “Computing resources are not administered centrally, open standards are used, and nontrivial quality of service is achieved”
    •  Plaszczak/Wellner: “The technology that enables resource virtualization, on-demand provisioning, and service (resource) sharing between organizations”
    •  IBM: “The ability, using a set of open standards and protocols, to gain access to applications and data, processing power, storage capacity and a vast array of other computing resources over the Internet”
    •  CERN: “A service for sharing computer power and data storage capacity over the Internet”
    Definition of Cloud Federation
    •  “Cloud federation is the practice of interconnecting the cloud computing environments of two or more service providers for the purpose of load balancing traffic and accommodating spikes in demand” (Wikipedia)
  • 30.
    About Grid and Cloud
    The OpenNebula Vision for Grid Infrastructures
    [Diagram; components: Grid Services; Appliance Repo; MarketPlace; two Cloud/Grid Sites, each exposing a Grid API and a Cloud API]
    •  Sharing existing VM images
    •  Registry of metadata
    •  Images are kept elsewhere
    •  Supports trust
    •  Federation facilities
    •  Security
    •  Grid specific services
    •  Storage of VM images
    •  Distributed
    •  Multi-protocol
  • 31.
    About Grid and Cloud
    Grid and Cloud as Complementary Computing Models
    Grids
      Usage: Job Processing; Big Batch System; File Sharing Services
      Achievements: Federation of Resources; VO Concept
      But…: User experience; Complexity
    Clouds
      Usage: Raw infrastructure; Elasticity & Pay-per-use; Simple Web Interface
      Achievements: Agile Infrastructures; IT is another Utility
      But…: Interoperability; Federation
    Diagram labels: Customize Environments; Uniform Security; Resource Management; Scientific Applications; Resource Sharing; Flexibility & Simplicity
  • 32.
    About Grid and Cloud
    EGI Federated Cloud – The Architecture
    [Architecture diagram]
    User Community
    EGI Collaboration Tools: Help and Support; Security Co-ordination; Training and Outreach; Sustainable Business Models
    EGI Cloud Marketplace: EGI Application DB; Image Repository
    EGI Cloud Infrastructure Platform: Virtual Instance Mgmt; Information Discovery; Storage Management
    Cloud Management Framework (OpenNebula, Synnefo, OpenStack …)
    EGI Core Platform: Federated AAI; Service Registry; Monitoring; Accounting
    Standards and interfaces: GSI, GLUE2, OCCI, CDMI, SAM, UR, OVF
    Source: EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
  • 33.
    About Grid and Cloud
    EGI Federated Cloud – The Standards
    [Diagram; labels: rOCCI-Server; Cloud Management Framework]
    Source: EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
  • 34.
    About Grid and Cloud
    EGI Federated Cloud – CMP Compatibility
    Cloud Mgmt. Fram.  Fed. AAI  Information Pub  Monitoring  Accounting  Img. Mgmt.  OCCI  CDMI
    OpenStack          Yes       Yes              Yes         Yes         Yes         Yes   Yes
    OpenNebula         Yes       Yes              Yes         Yes         Yes         Yes   Yes
    Synnefo            Yes       Yes              Yes         Yes         -           Yes   -
    Partial rows (column positions not preserved):
    Cloudstack: Yes
    Emotive: Yes, N/A, Yes
    Stoxy: Yes, Yes*, N/A, Yes
    Source: EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
  • 35.
    About Grid and Cloud
    EGI Federated Cloud – The Providers
    15 certified resource providers from 12 countries from the public and private sector
    •  Czech Republic, Germany, Greece, Hungary, Italy, Macedonia, Poland, Slovakia, Spain, Sweden, Turkey, United Kingdom
    2 countries currently integrating
    •  Croatia, Finland
    6 countries interested
    •  Bulgaria, France, Israel*, The Netherlands, Portugal, Switzerland
    Worldwide partnership/interest
    •  Australia* (NECTAR)
    •  South Africa* (SAGrid)
    •  South Korea* (KISTI)
    •  United States* (NIST, NSF Centres)
    * Not shown on map
    Map legend: Certified; Integrating; Interested
    Capacity
    •  Launch capability – 5,000 cores, 225 TB storage
    •  Q4 2014 (planned) – 18,000 cores, 6,000 TB storage
    •  2020 Vision – 1,000,000 cores, 1 EB storage
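The capacity milestones on this slide imply steep growth. The Python sketch below works out the factor between consecutive milestones. It assumes decimal units (1 EB = 1,000,000 TB), which the slide does not state; `MILESTONES` and `growth_factors` are illustrative names.

```python
# Capacity milestones as quoted on the slide: (label, cores, storage in TB).
# Assumption: 1 EB = 1,000,000 TB (decimal units).
MILESTONES = [
    ("Launch", 5_000, 225),
    ("Q4 2014 (planned)", 18_000, 6_000),
    ("2020 Vision", 1_000_000, 1_000_000),
]


def growth_factors(milestones):
    """Return (label, core_factor, storage_factor) for each step to the next milestone."""
    steps = []
    # Pair each milestone with its successor and divide new capacity by old.
    for (_, cores0, tb0), (label, cores1, tb1) in zip(milestones, milestones[1:]):
        steps.append((label, cores1 / cores0, tb1 / tb0))
    return steps


if __name__ == "__main__":
    for label, core_x, storage_x in growth_factors(MILESTONES):
        print(f"{label}: cores x{core_x:.1f}, storage x{storage_x:.1f}")
```

In both steps the storage target grows faster than the core target, which suggests data volume rather than compute is the expected pressure point.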
  • 36.
    About Grid and Cloud
    EGI Federated Cloud – The Users
    •  Ecology – BioVeL: Biodiversity Virtual e-Laboratory
    •  Structural biology – WeNMR: a worldwide e-Infrastructure for NMR and structural biology
    •  Linguistics – CLARIN: ‘British National Corpus’ service (BNCWeb)
    •  Earth Observation – SSEP: European Space Agency’s Supersites Exploitation Platform for volcano and earthquakes monitoring (collaboration with Helix Nebula)
    •  Software Engineering – SCI-BUS: simulated environments for portal testing
    •  Software Engineering – DIRAC: deploying ready-to-use distributed computing systems
    •  Interdisciplinary research – Catania Science Gateway Framework
    •  Musicology – Peachnote: dynamic analysis of musical scores
    •  Earth Observation – ENVRI: Common Operations of Environmental Research infrastructures (collaboration with EISCAT3D)
    •  Geology – VERCE: Virtual Earthquake and seismology Research
    •  Ecology – LifeWatch: E-Science European Infrastructure for Biodiversity and Ecosystem Research
    •  High Energy Physics – CERN ATLAS: ATLAS processing cluster via Helix Nebula
    Source: EGI Federated Cloud: Use Cases and Architecture, David Wallom, July 2014
  • 37.
    About Grid and Cloud
    Different Names for the Same Model? Same Challenges but Different Technologies?
    Grid Computing / Cloud Computing
  • 38.
    About OpenNebula
    Simple but feature-rich, production-ready, customizable solution to build clouds
  • 39.
    About OpenNebula
    From Research Project to Open-source Project for Enterprise
    Timeline: 2005, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015
    Timeline labels:
    •  Research Project
    •  5,000 downloads/month
    •  Large-scale production deployment: 16,000 VMs
    •  Very large-scale production deployment: 200,000 VMs (10 zones)
    •  Hybrid cloud partners
    •  Tech Days
    Goals:
    •  Develop & innovate
    •  Support the community
    •  Collaborate
  • 40.
    About OpenNebula
    An Open Community Driven by Users
    Develop: Open-source Apache code; Transparent process; Public roadmap
    Communicate: User groups; Cloud technology days; OpenNebulaConf
    Integrate: Add-ons catalog; Ecosystem directory
    Use: Give us feedback; Contribute experiences; Help support new users
    BBC
  • 41.
    What is OpenNebula?
    The Intersection between Industry, Standards and Research Projects
    [Diagram; nodes: Projects; Standards; Linux Distributions; Industry and Government]
    Relationship labels:
    •  Adopt as innovation platform or interoperability tool
    •  Requirements for innovative functionality
    •  Adopt standards
    •  Contribute to standards
    •  Distribution channel
    •  Requirements
    •  Feedback
    •  Contributions
    •  Adopt open-source
  • 42.
    What is OpenNebula?
    Some Research References: Innovation in Cloud Architecture
    •  B. Sotomayor, R. S. Montero, I. M. Llorente and I. Foster, “Virtual Infrastructure Management in Private and Hybrid Clouds”, IEEE Internet Computing, 13(5), September/October 2009
    •  R. Moreno-Vozmediano, R. S. Montero and I. M. Llorente, “Multi-Cloud Deployment of Computing Clusters for Loosely-Coupled MTC Applications”, IEEE Transactions on Parallel and Distributed Systems, 22(6):924-930, April 2011
    •  R. Moreno-Vozmediano, R. S. Montero and I. M. Llorente, “IaaS Cloud Architecture: From Virtualized Data Centers to Federated Cloud Infrastructures”, IEEE Computer, 45(12):65-72, December 2012
    •  R. Moreno-Vozmediano, R. S. Montero and I. M. Llorente, “Key Challenges in Cloud Computing to Enable the Future Internet of Services”, IEEE Internet Computing, 17(4):18-25, 2013
  • 43.
    Upcoming Community Events
  • 44.
    Questions?
    We Will Be Happy to Answer Your Questions
    OpenNebula.org
    @OpenNebula