Lxcloud

PES CERN's Cloud Computing infrastructure

CERN's Cloud Computing Infrastructure

Tony Cass, Sebastien Goasguen, Belmiro Moreira, Ewan Roche,
Ulrich Schwickerath, Romain Wartel

Cloudview conference, Porto, 2010

See also related presentations:
HEPIX spring and autumn meeting 2009, 2010
Virtualization vision, Grid Deployment Board (GDB) 9/9/2009
Batch virtualization at CERN, EGEE09 conference, Barcelona

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Outline and disclaimer

An introduction to CERN

Why virtualization and cloud computing ?

Virtualization of batch resources at CERN

Building blocks and current status

Image management systems: ISF and ONE

Status of the project and first numbers

Disclaimer: We are still in the testing and evaluation phase. No final decision
has been taken yet on what we are going to use in the future.

All given numbers and figures are preliminary

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it
CERNs Cloud computing infrastructure - a status report - 2

PES Introduction to CERN

European Organization for Nuclear Research

The world’s largest particle physics laboratory
Located on Swiss/French border
Funded/staffed by 20 member states in 1954
With many contributors in the USA
Birth place of World Wide Web
Made popular by the movie “Angels and Demons”
Flag ship accelerator LHC
http://www.cern.ch
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Introduction to CERN: LHC and the experiments

TOTEM
LHCb LHCf Alice

Circumference of LHC: 26 659 m
Magnets : 9300
Temperature: -271.3°C (1.9 K)
Cooling: ~60T liquid He
Max. Beam energy: 7TeV
Current beam energy: 3.5TeV
ATLAS CMS

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Introduction to CERN

Data Signal/Noise ratio 10-9
Data volume:
High rate * large number of
channels * 4 experiments
 15 PetaBytes of new data
each year
Compute power
Event complexity * Nb. events *
thousands users
 100 k CPUs (cores)
Worldwide analysis & funding
Computing funding locally in
major regions & countries
Efficient analysis everywhere
 GRID technology
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES LCG computing GRID

Requried computing capacity:
~100 000 processors

Number of sites:
T0: 1 (CERN), 20%
T1: 11 round the world
T2: ~160

http://lcg.web.cern.ch/lcg/public/

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES The CERN Computer Center

Disk and tape:
1500 disk servers
5PB disk space
16PB tape storage

Computing facilities:
>20.000 CPU cores (batch only)
Up to ~10000 concurrent jobs
Job throughput ~200 000/day

CERN IT Department
http://it-dep.web.cern.ch/it-dep/communications/it_facts__figures.htm
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Why virtualization and cloud computing ?

Service consolidation:

Improve resource usage by squeezing mostly unused machines
onto single big hypervisors
Decouple hardware life cycle from applications running on the box
Ease management by supporting life migration

Virtualization of batch resources:

Decouple jobs and physical resources
Ease management of the batch farm resources
Enable the computer center for new computing models

This presentation is about virtualization of batch resources only

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Batch virtualization

CERN batch farm lxbatch:
~3000 physical hosts
~20000 CPU cores
>70 queues

Type 1:
Run my jobs in your VM

Type 2:
Run my jobs in my VM

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Towards cloud computing

Type 3:
Give me my infrastructure
i.e a VM or a batch of VMs

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Philosophy

(SLC = Scientific Linux CERN)
I

( Near future:

Batch

SLC4 WN SLC5 WN
Physical Physical
SLC4 WN SLC5 WN
hypervisor cluster

(far) future ?

Batch T0 development other/cloud applications

Internal cloud
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Visions beyond the current plan

Reusing/sharing images between different sites (phase 2)

HEPIX working group founded in autumn 2009 to define rules and
boundary conditions (https://www.hepix.org/)

Experiment specific images (phase 3)

Use of images which are customized for specific experiments

Use of resources in a cloud like operation mode (phase 4)

Images directly join experiment controlled scheduling systems
(phase 5)

Controlled by experiment
Spread across sites
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Virtual batch: basic ideas
Virtual batch worker nodes:

Clones of real worker nodes, same setup
Platform Infrastructure Sharing Facility
Mix with physical resources (ISF)
Dynamically join the batch farm as normal worker nodes
For high level VM management
Limited lifetime: stop accepting jobs after 24h

Destroy when empty

Only one user job per VM at a time

Note:
The limited lifetime allows for a fully automated system which dynamically adapts
to the current needs, and automatically deploys intrusive updates.

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Virtual batch: basic ideas, technical

Images:

staged on hypervisors Infrastructure Sharing Facility
Platform
(ISF)
master images, instances use LVM snapshots
Start with few different flavors only

Image creation:

Derived from a centrally managed “golden node”
Regularly updated and distributed to get updates “in”

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it


Images distribution:
(ISF)
Only shared file system available is AFS
Prefer peer to peer methods (more on that later)

SCP wave

Rtorrent

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it


VM placement and Platform Infrastructure Sharing Facility
management system
(ISF)
Use existing solutions
Testing both a free and a commercial solution

OpenNebula (ONE)

Platform's Infrastructure Sharing Facility (ISF)

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Batch virtualization: architecture

Centrally managed
Grey: physical resource
Golden nodes
Job submission Colored: different VMs
CE/interactive
VM kiosk

Batch system
management Hypervisors / HW resources

VM worker nodes
With limited lifetime VM management system
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Status of building blocks (test system)

Submission Hypervisor VM kiosk VM
and batch cluster and image management
managemen distribution system

Initial
deployment OK OK OK OK

Central ISF OK,
management OK OK Mostly ONE
implemented missing

Monitoring
and alarming OK Switched off missing missing
for tests

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Image distribution: SCP wave versus rtorrent

Slow nodes,
(under investigation)
Preliminary !

(BT = bit torrent)
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES VM placement and management

OpenNebula (ONE):
Basic model:

Single ONE master

Communication with hypervisors via ssh only

(Currently) no special tools on the hypervisors

Some scalability issues at the beginning (50 VM at the beginning)

Addressing issues as they turn up

Close collaboration with developers, ideas for improvements

Managed to start more than 7,500 VMs

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Scalability tests: some first numbers

“One shot” test with OpenNebula:
Inject virtual machine requests
And let them die
Record the number of alive machines seen by LSF every 30s

Units: 1h5min
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES VM placement and management

Platforms Infrastructure Sharing Facility (ISF)

One active ISF master node, plus fail-over candidates
Hypervisors run an agent which talks to XEN
Needed to be packaged for CERN
Resource management layer similar to LSF
Scalability expected to be good but needs verification
Tested with 2 racks (96 machines) so far, ramping up
Filled with ~2.000 VMs so far (which is the maximum)

See: http://www.platform.com

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Screen shots: ISF

VM status:

Status: 1 of 2 racks available

CERN IT Department
Note: one out of 2 racks enabled for demonstration purpose
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Screen shots: ISF

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Summary

Virtualization efforts at CERN are proceeding.
Still some work to be done.

Main challenges

Scalability considerations
provisioning systems
of the batch system
No decision on provisioning system to be used yet
Reliability and speed of image distribution
General readiness for production (hardening)
Seamless integration into the existing infrastructure

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Outlook

What's next ?

Continue testing of ONE and ISF in parallel

Solve remaining (known) issues

Release first VMs for testing by our users soon

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Questions ?

CERN IT Department
?
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Philosophy

VM provisioning system(s) (ISF)


OpenNebula pVMO

Hypervisor cluster
(physical resources)
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Details: Some additional explanations ...

“Golden node”:
A centrally managed (i.e. Quattor controlled) standard worker node
which
Is a virtual machine
Does not accept jobs
Receives regular updates
Purpose: creation of VM images

“Virtual machine worker node”:
A virtual machine derived from a golden node
Not updated during their life time
Dynamically adds itself to the batch farm
Accepts jobs for only 24h
Runs only one user job at a time
Destroys itself when empty

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES VM kiosk and Image distribution

Boundary conditions at CERN

Only available shared file system is AFS
Network infrastructure with a single 1GE connection
No dedicated fast network for transfers that could be used
(eg 10GE, IB or similar)

Tested options:

Scp wave:
Developed at Clemson university
Based on simple scp

rtorrent:
Infrastructure developed at CERN
Each node starts serving blocks it already hosts

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Details: VM kiosk and Image distribution

Local image distribution model at CERN

Virtual machines are instantiated as LVM snapshots of a base
image.This is process is very fast

Replacing a production image:

Approved images are moved to a central image repository (the “kiosk”)
Hypervisors check regularly for new images on the kiosk node
The new image is transferred to a temporary area on the hypervisors
When the transfer is finished, a new LV is created
The new image is unpacked into the new LV
The current production image is renamed (via lvrename)
The new image is renamed to become the production image

Note: may need a sophisticated locking strategy in this process
CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Image distribution with torrrent: transfer speed

7GB file compressed
452 target nodes
y
in ar
lim
Pre
90% finished after 25min

Slow nodes, under investigation

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

PES Image distribution: total distribution speed

→ Unpacking still needs some tuning !

7GB file compressed
452 target nodes

y
in ar All done after 1.5h

lim
Pre

CERN IT Department
CH-1211 Genève 23
Switzerland
www.cern.ch/it

Lxcloud

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (9)

Similar to Lxcloud

Similar to Lxcloud (20)

More from EuroCloud

More from EuroCloud (20)

Recently uploaded

Recently uploaded (20)

Lxcloud