CERN Cloud Architecture
Ops Midcycle - High Performance Computing with OpenStack - Manchester, 2016
Belmiro Moreira
belmiro.moreira@cern.ch @belmiromoreira
What is CERN?
CERN Cloud – LHC and Experiments
CMS detector
https://www.google.com/maps/streetview/#cern
CERN Cloud – AMS
OpenStack at CERN by numbers
~5500 compute nodes (~140k cores)
•  ~5300 KVM
•  ~200 Hyper-V
~2800 images (~44 TB in use)
~2000 volumes (~800 TB allocated)
~2200 users
~2500 projects
> 17000 VMs running
Number of VMs created (green) and VMs deleted (red) every 30 minutes
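The rounded figures above can be turned into rough per-unit averages. This is only back-of-the-envelope arithmetic on the "~" values quoted on the slide (decimal TB, and "> 17000" treated as 17000), so the results are approximate lower or average values, not measurements.

```python
"""Rough averages derived from the approximate numbers on this slide."""
from typing import Dict

COMPUTE_NODES = 5500
CORES = 140_000
VMS_RUNNING = 17_000          # slide says "> 17000", so this is a lower bound
IMAGES, IMAGES_TB = 2800, 44
VOLUMES, VOLUMES_TB = 2000, 800


def averages() -> Dict[str, float]:
    return {
        "cores_per_node": CORES / COMPUTE_NODES,          # about 25.5
        "vms_per_node": VMS_RUNNING / COMPUTE_NODES,      # about 3.1 (lower bound)
        "gb_per_image": IMAGES_TB * 1000 / IMAGES,        # about 15.7 GB
        "gb_per_volume": VOLUMES_TB * 1000 / VOLUMES,     # 400 GB allocated
    }


if __name__ == "__main__":
    for name, value in averages().items():
        print(f"{name}: {value:.1f}")
```

Read this way, the cloud runs only about three VMs per compute node on average, with roughly 25 cores per node.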
OpenStack timeline at CERN
Upstream releases:
•  ESSEX: 5 Apr 2012
•  FOLSOM: 27 Sep 2012
•  GRIZZLY: 4 Apr 2013
•  HAVANA: 17 Oct 2013
•  ICEHOUSE: 17 Apr 2014
•  JUNO: 16 Oct 2014
•  KILO: 30 Apr 2015
•  LIBERTY
CERN deployments:
•  “Guppy”: Jun 2012
•  “Ibex”: Mar 2013
•  Grizzly: Jul 2013 (start of the CERN production infrastructure)
•  “Hamster”: Oct 2013
•  Havana: February 2014
•  Icehouse: October 2014
•  Juno: April 2015
•  Kilo: October 2015
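From the month-level dates on the timeline, the delay between each upstream release and its deployment at CERN can be computed directly. This small sketch ignores the day of month and covers only the releases that appear on both tracks.

```python
"""Months between an upstream OpenStack release and its CERN deployment."""
from typing import Dict, Tuple

YearMonth = Tuple[int, int]

UPSTREAM: Dict[str, YearMonth] = {
    "Grizzly": (2013, 4), "Havana": (2013, 10), "Icehouse": (2014, 4),
    "Juno": (2014, 10), "Kilo": (2015, 4),
}
AT_CERN: Dict[str, YearMonth] = {
    "Grizzly": (2013, 7), "Havana": (2014, 2), "Icehouse": (2014, 10),
    "Juno": (2015, 4), "Kilo": (2015, 10),
}


def months_between(start: YearMonth, end: YearMonth) -> int:
    """Whole calendar months from start to end (days ignored)."""
    return (end[0] - start[0]) * 12 + (end[1] - start[1])


def deployment_lag() -> Dict[str, int]:
    return {release: months_between(UPSTREAM[release], AT_CERN[release])
            for release in AT_CERN}


if __name__ == "__main__":
    print(deployment_lag())
```

With these dates, each release reached CERN production roughly 3 to 6 months after the upstream release.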
OpenStack timeline at CERN
•  Evolution of the number of VMs created since July 2013
[Figure: number of VMs running and number of VMs created (cumulative)]
Infrastructure Overview
•  One region, two data centres, 33 cells
•  HA architecture only on the top cell
•  Child cells' control planes usually run as VMs in the shared infrastructure
•  nova-network with a custom CERN driver; Neutron in one cell
•  2 hypervisor types (KVM, Hyper-V)
•  Scientific Linux CERN 6; CERN CentOS 7; Windows Server 2012 R2
•  2 Ceph instances
•  Keystone integrated with the CERN account/lifecycle system
•  Nova, Keystone, Glance, Cinder, Heat, Horizon, Ceilometer, Rally, Magnum, Neutron
•  Deployment using the OpenStack Puppet modules and RDO
Architecture Overview
[Figure: a Nova Top Cell and many Nova Compute Cells spread across the Geneva Data Centre and the Budapest Data Centre; shared services (Keystone, Glance, Cinder, Heat, Ceilometer, Horizon, Neutron, Magnum) and a load balancer; Ceph and DB infrastructure in both data centres]
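Cells
[Figure: cells grouped into availability zones (AVZ_A, AVZ_B, AVZ_C) in the GVA (Geneva) and WIG (Budapest) data centres, mostly KVM cells plus a HyperV cell, with an example project (uuid1) mapped onto the cells]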
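The cells picture combines three attributes per cell (data centre, availability zone, hypervisor type) with a per-project mapping. The sketch below shows one way such a mapping could drive cell selection; it is a hypothetical illustration, not CERN's scheduler code. The `Cell` type, `pick_cell` function, cell names and the "explicit list of cells per project" model are all assumptions.

```python
"""Hypothetical sketch: pick a child cell for a project's new VM."""
import random
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Cell:
    name: str
    datacentre: str  # "GVA" or "WIG"; kept for reporting only
    avz: str         # e.g. "AVZ_A"
    hypervisor: str  # "KVM" or "HyperV"


def pick_cell(project_id: str,
              project_cells: Dict[str, List[Cell]],
              avz: Optional[str] = None,
              hypervisor: str = "KVM",
              rng: Optional[random.Random] = None) -> Cell:
    """Return one of the project's cells matching the requested AVZ and hypervisor."""
    candidates = [
        cell for cell in project_cells.get(project_id, [])
        if cell.hypervisor == hypervisor and (avz is None or cell.avz == avz)
    ]
    if not candidates:
        raise LookupError(f"no matching cell for project {project_id!r}")
    # Spread new VMs randomly across equally valid cells.
    chooser = rng if rng is not None else random.Random()
    return chooser.choice(candidates)
```

Keeping the project-to-cell mapping explicit means a project only ever lands in cells it was assigned to, while the availability-zone argument lets users choose between data centres.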
Nova Deployment at CERN
[Figure: a load balancer in front of the top cell controller / API node (nova-api, nova-cells, rabbitmq) and its DB; each child cell controller runs nova-api, nova-cells, nova-scheduler, nova-conductor, nova-network and rabbitmq with its own DB; compute nodes run nova-compute]
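In this layout a boot request is scheduled in two steps: the top cell's nova-cells service chooses a child cell, then that cell's nova-scheduler chooses a compute node. Nova's FilterScheduler uses a filter-then-weigh pattern for the second step; the sketch below mimics that pattern with a single capacity filter and a free-RAM weigher. The `Host` fields and `select_host` function are simplified stand-ins, not Nova's actual classes.

```python
"""Sketch of filter-then-weigh host selection inside one child cell."""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_ram_mb: int
    free_vcpus: int


def select_host(hosts: List[Host], ram_mb: int, vcpus: int) -> Optional[Host]:
    # Filter: drop hosts that cannot fit the requested flavor.
    fitting = [h for h in hosts
               if h.free_ram_mb >= ram_mb and h.free_vcpus >= vcpus]
    if not fitting:
        return None  # the cell has no room for this flavor
    # Weigh: prefer the host with the most free RAM, spreading VMs out.
    return max(fitting, key=lambda h: h.free_ram_mb)
```

Spreading by free RAM keeps load even across hypervisors; stacking (preferring the fullest host) is the opposite trade-off.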
Keystone Deployment at CERN
[Figure: a load balancer in front of two Keystone instances, one exposed to users and one dedicated to Ceilometer, each shown with a DB and a service catalogue; Keystone is integrated with Active Directory]
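One way to realise the user-facing/Ceilometer split behind a single load balancer is host-based routing: Ceilometer is configured with its own Keystone endpoint name, and the balancer picks a backend pool from the requested host, then round-robins inside it. This is a hypothetical sketch of that idea, not CERN's configuration; the host names, pool contents and `KeystoneRouter` class are made up.

```python
"""Hypothetical sketch: host-based routing to two Keystone backend pools."""
import itertools
from typing import Dict, Iterator, List

POOLS: Dict[str, List[str]] = {
    "keystone.example.org": ["ks-users-1", "ks-users-2"],
    "keystone-ceilometer.example.org": ["ks-ceilo-1"],
}


class KeystoneRouter:
    def __init__(self, pools: Dict[str, List[str]], default_host: str) -> None:
        # One endless round-robin iterator per virtual host.
        self._cycles: Dict[str, Iterator[str]] = {
            host: itertools.cycle(backends) for host, backends in pools.items()
        }
        self._default = default_host

    def backend_for(self, host_header: str) -> str:
        # Unknown host names fall back to the user-facing pool.
        cycle = self._cycles.get(host_header, self._cycles[self._default])
        return next(cycle)
```

Separating the pools means heavy Ceilometer traffic cannot exhaust the Keystone backends that serve users.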
Glance Deployment at CERN
[Figure: a load balancer in front of two Glance nodes (glance-api and glance-registry), one exposed to users and one only used for Ceilometer calls, with a DB; images are stored in Ceph (Geneva)]
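Both Glance nodes keep images in Ceph. As a sketch, the matching `[glance_store]` section can be rendered with Python's `configparser`. The option names are those of the glance_store RBD driver; the pool name, Ceph user and `ceph.conf` path are assumptions, not CERN's actual values.

```python
"""Sketch: render a [glance_store] section that keeps images in Ceph RBD."""
import configparser
import io


def glance_rbd_config(pool: str = "images", user: str = "glance",
                      ceph_conf: str = "/etc/ceph/ceph.conf") -> str:
    cfg = configparser.ConfigParser()
    cfg["glance_store"] = {
        "stores": "rbd",             # enable only the RBD store
        "default_store": "rbd",      # new images go to Ceph
        "rbd_store_pool": pool,      # assumed pool name
        "rbd_store_user": user,      # assumed cephx user
        "rbd_store_ceph_conf": ceph_conf,
    }
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(glance_rbd_config())
```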
Cinder Deployment at CERN
[Figure: a load balancer in front of the Cinder node (cinder-api, cinder-scheduler, cinder-volume) with its DB and rabbitmq; volume backends are Ceph in Geneva, Ceph in Budapest and NetApp]
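Serving three backends from one Cinder service is the job of Cinder's multi-backend support. The sketch below builds such a `cinder.conf` layout with `configparser`. `enabled_backends`, `volume_backend_name`, `volume_driver` and the `rbd_*`/`netapp_*` option names are real Cinder settings; every value (section names, pools, users, per-cluster `ceph.conf` paths, NetApp family, protocol and host) is an assumption, and options a working NetApp or libvirt/Ceph setup also needs are omitted.

```python
"""Sketch: a cinder.conf multi-backend layout for the three backends above."""
import configparser
import io

RBD_DRIVER = "cinder.volume.drivers.rbd.RBDDriver"
NETAPP_DRIVER = "cinder.volume.drivers.netapp.common.NetAppDriver"

BACKENDS = {
    "ceph-geneva": {
        "volume_driver": RBD_DRIVER,
        "rbd_pool": "volumes",
        "rbd_user": "cinder",
        "rbd_ceph_conf": "/etc/ceph/geneva.conf",
    },
    "ceph-budapest": {
        "volume_driver": RBD_DRIVER,
        "rbd_pool": "volumes",
        "rbd_user": "cinder",
        "rbd_ceph_conf": "/etc/ceph/budapest.conf",
    },
    "netapp": {
        "volume_driver": NETAPP_DRIVER,
        "netapp_storage_family": "ontap_cluster",
        "netapp_storage_protocol": "nfs",
        "netapp_server_hostname": "netapp.example.org",
    },
}


def cinder_multibackend_config() -> configparser.ConfigParser:
    cfg = configparser.ConfigParser()
    # cinder-volume loads one backend per section name listed here.
    cfg["DEFAULT"] = {"enabled_backends": ",".join(BACKENDS)}
    for name, options in BACKENDS.items():
        # Volume types select a backend by matching this name.
        cfg[name] = {"volume_backend_name": name, **options}
    return cfg


if __name__ == "__main__":
    buf = io.StringIO()
    cinder_multibackend_config().write(buf)
    print(buf.getvalue())
```

A volume type is then bound to a backend through its `volume_backend_name` extra spec, for example `cinder type-key <type> set volume_backend_name=ceph-geneva`.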
Ceilometer Deployment at CERN
[Figure: ceilometer-compute runs next to nova-compute on each compute node, alongside a ceilometer-central-agent; samples travel over RPC (sampleRPC) and UDP (sampleUDP); the Ceilometer notification agent consumes notifications from the cell rabbitmq; separate Ceilometer notification, pulling and UDP collectors and a dedicated Ceilometer rabbitmq; storage in HBase, MySQL and MongoDB; Ceilometer API instances, Aodh (API, evaluator & notifier) and Heat]
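The UDP path in the diagram is fire-and-forget: an agent sends each sample as a single datagram and the UDP collector decodes it. Ceilometer's own UDP publisher serialises samples with msgpack; this sketch uses JSON so it needs only the standard library, and the sample fields are a reduced, illustrative subset.

```python
"""Sketch: publish a metering sample over UDP and read it back."""
import json
import socket
from typing import Any, Dict, Tuple


def send_sample(sample: Dict[str, Any], addr: Tuple[str, int]) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # No acknowledgement: a lost datagram is a lost sample.
        sock.sendto(json.dumps(sample).encode("utf-8"), addr)


def receive_sample(sock: socket.socket, bufsize: int = 65535) -> Dict[str, Any]:
    data, _sender = sock.recvfrom(bufsize)
    return json.loads(data.decode("utf-8"))


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as collector:
        collector.bind(("127.0.0.1", 0))  # let the OS pick a free port
        collector.settimeout(2.0)
        send_sample({"counter_name": "cpu_util", "counter_volume": 12.5,
                     "resource_id": "vm-1"}, collector.getsockname())
        print(receive_sample(collector))
```

UDP trades delivery guarantees for very low overhead on the sender, which is why it suits high-volume, loss-tolerant samples better than the RPC path.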
Challenges
•  Capacity increase to 200k cores by Summer 2016
•  Live-migrate ~5000 VMs
•  Upgrade ~800 compute nodes from SLC6 to CC7
•  Retire old servers
•  Migrate to Neutron
•  Identity Federation with different scientific sites
•  Scale Magnum and containers deployment
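Upgrading ~800 compute nodes while live-migrating their VMs is essentially a batching problem. The sketch below groups hosts into drain batches under two limits; both limits, and the `plan_batches` function itself, are hypothetical illustrations rather than CERN's procedure.

```python
"""Hypothetical sketch: plan drain batches for a rolling SLC6 -> CC7 reinstall."""
from typing import Dict, List


def plan_batches(vms_per_host: Dict[str, int],
                 max_hosts: int,
                 max_migrations: int) -> List[List[str]]:
    """Group hosts so each batch drains at most max_hosts hosts and moves at
    most max_migrations VMs. A single host above max_migrations still gets its
    own batch, since its VMs cannot be split across batches here."""
    batches: List[List[str]] = []
    current: List[str] = []
    moved = 0
    for host, vms in sorted(vms_per_host.items()):
        if current and (len(current) == max_hosts or moved + vms > max_migrations):
            batches.append(current)
            current, moved = [], 0
        current.append(host)
        moved += vms
    if current:
        batches.append(current)
    return batches
```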
belmiro.moreira@cern.ch
@belmiromoreira
http://openstack-in-production.blogspot.com

CERN Cloud Architecture - February, 2016
