SlideShare a Scribd company logo
1 of 32
Download to read offline
CERN  Data  Centre  Evolution
Gavin  McCance
gavin.mccance@cern.ch
@gmccance
SDCD12:  Supporting  Science  with  Cloud  Computing
Bern
19th November  2012
What  is  CERN  ?
Gavin  McCance,  CERN 2
• Conseil Européen pour  la  
Recherche Nucléaire – aka  
European  Laboratory  for  
Particle  Physics
• Between  Geneva  and  the  
Jura  mountains,  straddling  
the  Swiss-­‐French  border
• Founded  in  1954  with  an  
international  treaty
• Our  business  is  fundamental  
physics  ,  what  is  the  
universe  made  of  and  how  
does  it  work
Gavin  McCance,  CERN 3
Answering fundamental questions…
• How  to  explain particles have  mass?
We have  theories and  accumulating experimental evidence..  Getting close…
• What is 96%  of  the  universe made  of  ?
We can only see 4%  of  its estimated mass!
• Why isn’t there anti-­‐matter
in  the  universe?
Nature  should be symmetric…
• What was the  state  of  matter just
after the  « Big Bang »  ?
Travelling  back  to  the  earliest instants  of
the  universe would help…
4
The  Large  Hadron  Collider  (LHC)  tunnel
Gavin  McCance,  CERN
Gavin  McCance,  CERN 5
Gavin  McCance,  CERN 6
• Data  Centre  by  Numbers
– Hardware  installation  &  retirement
• ~7,000  hardware  movements/year;  ~1,800  disk  failures/year
Xeon  
5150
2%
Xeon  
5160
10%
Xeon  
E5335
7%
Xeon  
E5345
14%
Xeon  
E5405
6%
Xeon  
E5410
16%
Xeon  
L5420
8%
Xeon  
L5520
33%
Xeon  
3GHz
4%
Fujitsu
3%
Hitachi
23%
HP
0%
Maxtor
0%
Seagate
15%
Western  
Digital
59%
Other
0%
High  Speed  Routers
(640 Mbps  →  2.4  Tbps)
24
Ethernet  Switches 350
10  Gbps  ports 2,000
Switching  Capacity 4.8 Tbps
1  Gbps  ports 16,939
10  Gbps  ports 558
Racks 828
Servers 11,728
Processors 15,694
Cores 64,238
HEPSpec06 482,507
Disks 64,109
Raw  disk  capacity  (TiB) 63,289
Memory  modules 56,014
Memory  capacity  (TiB) 158
RAID  controllers 3,749
Tape  Drives 160
Tape  Cartridges 45,000
Tape  slots 56,000
Tape Capacity  (TiB) 73,000
IT  Power  Consumption 2,456  KW
Total Power  Consumption 3,890  KW
Current  infrastructure
• Around  12k  servers
– Dedicated  compute,  dedicated  disk  server,  dedicated  service  nodes
– Majority  Scientific  Linux  (RHEL5/6  clone)
– Mostly  running  on  real  hardware
– Last  couple  of  years,  we’ve  consolidated  some  of  the  service  nodes  
onto  Microsoft  HyperV
– Various  other  virtualisation  projects  around
• In  2002  we  developed  our  own  management  toolset
– Quattor /  CDB  configuration  tool
– Lemon  computer  monitoring
– Open  source,  but  a  small  community
Gavin  McCance,  CERN 7
• Many  diverse  applications  (”clusters”)  
• Managed  by  different  teams  (CERN  IT  +  experiment  groups)
Gavin  McCance,  CERN 8
New  data  centre  to  expand  capacity
Gavin  McCance,  CERN 9
• Data  centre  in  Geneva  
at  the  limit  of  
electrical  capacity  at  
3.5MW
• New  centre  chosen  in  
Budapest,  Hungary
• Additional  2.7MW  of  
usable  power
• Hands  off  facility
• Deploying  from  2013  
with  200Gbit/s  
network  to  CERN
Time  to  change  strategy
• Rationale
– Need  to  manage  twice  the  servers  as  today
– No  increase  in  staff  numbers
– Tools  becoming  increasingly  brittle  and  will  not  scale  as-­‐is
• Approach
– CERN  is  no  longer  a  special  case  for  compute
– Adopt  an  open  source  tool  chain  model
– Our  engineers  rapidly  iterate
• Evaluate  solutions  in  the  problem  domain
• Identify  functional  gaps  and  challenge  old  assumptions
• Select  first  choice  but  be  prepared  to  change  in  future
– Contribute  new  function  back  to  the  community
Gavin  McCance,  CERN 10
Building  Blocks
Gavin  McCance,  CERN 11
Bamboo  
Koji,  Mock
AIMS/PXE
Foreman
Yum  repo
Pulp
Puppet-­DB
mcollective,  yum
JIRA
Lemon  /
Hadoop
git
OpenStack  
Nova
Hardware  
database
Puppet
Active  Directory  /
LDAP
Choose  Puppet  for  Configuration
• The  tool  space  has  exploded  in  last  few  years
– In  configuration  management  and  operations
• Puppet and  Chef are  the  clear  leaders  for  ‘core  tools’
• Many  large  enterprises  now  use  Puppet
– Its  declarative  approach  fits  what  we’re  used  to  at  CERN
– Large  installations:  friendly,  wide-­‐based  community
– You  can  buy  books  on  it
– You  can  employ  people  who  know  it  better  than  do
Gavin  McCance,  CERN 12
Puppet  Experience
• Excellent:  basic  puppet  is  easy  to  setup
and  can  be  scaled-­‐up  well
• Well  documented,  configuring  services  with  it  is  easy
• Handle  our  cluster  diversity  and  dynamic  clouds  well
• Lots  of  resource  (“modules”)  online,  though  of  varying  quality
• Large,  responsive  community  to  help
• Lots  of  nice  tooling  for  free
– Configuration  version  control  and  branching:  integrates  well  with  git
– Dashboard:  we  use  the  Foreman dashboard
• We’re  moving  all  our  production  service  over  in  2013
Gavin  McCance,  CERN 13
Gavin  McCance,  CERN 14
Preparing  the  move  to  cloud
• Improve  operational  efficiency  and  dynamicness
– Dynamic  multiple  operating  system  demand
– Dynamic  temporary  load  spikes  for  special  activities
– Hardware  interventions  with  long  running  programs  (live  migration)
• Improve  resource  efficiency
– Exploit  idle  resources,  especially  waiting  for  disk  and  tape  I/O
– Highly  variable  load  such  as  interactive  or  build  machines
• Enable  cloud  architectures
– Gradual  migration  from  traditional  batch  +  disk  to  cloud  interfaces  and  
workflows
• Improve  responsiveness
– Self-­‐Service  with  coffee  break  response  time
Gavin  McCance,  CERN 15
What  is  OpenStack  ?
• OpenStack  is  a  cloud  operating  system  that  controls  large  
pools  of  compute,  storage,  and  networking  resources  
throughout  a  datacenter,  all  managed  through  a  dashboard  
that  gives  administrators  control  while  empowering  their  users  
to  provision  resources  through  a  web  interface
Gavin  McCance,  CERN 16
Service  Model
Gavin  McCance,  CERN 17
• Pets  are  given  names  like  
pussinboots.cern.ch  
• They  are  unique,  lovingly  hand  raised  
and  cared  for
• When  they  get  ill,  you  nurse  them  back  
to  health
• Cattle  are  given  numbers  like  
vm0042.cern.ch
• They  are  almost  identical  to  other  cattle
• When  they  get  ill,  you  get  another  one
• Future  application  architectures  should  use  Cattle  but  Pets  with  
strong  configuration  management  are  viable  and  still  needed
Borrowed  from
@randybias at  Cloudscaling
http://www.slideshare.net/randybias/the-­‐cloud-­‐
revolution-­‐cyber-­‐press-­‐forum-­‐philippines
Basic  Openstack Components
Gavin  McCance,  CERN 18
Compute Scheduler
NetworkVolume
Registry Image
KEYSTONE HORIZON
NOVAGLANCE
• Each  component  has  an  API  and  is  pluggable
• Other  non-­‐core  projects  interact  with  these  components  
Supporting  the  Pets  with  OpenStack
• Network
– Interfacing  with  legacy  site  DNS  and  IP  management
– Ensuring  Kerberos  identity  before  VM  start
• Puppet
– Ease  use  of  configuration  management  tools  with  our  users
– Exploit  mcollective  for  orchestration/delegation
• External  Block  Storage
– Currently  using  nova-­‐volume  with  Gluster backing  store
• Live  migration  to  maximise  availability
– KVM  live  migration  using  Gluster
– KVM  and  Hyper-­‐V  block  migration
Gavin  McCance,  CERN 19
Current  Status  of  OpenStack  at  CERN
• Working  on  an  Essex  code  base  from  the  EPEL  repository
– Excellent  experience  with  the  Fedora  cloud-­‐sig  team
– Cloud-­‐init for  contextualisation,  oz for  images  with  RHEL/Fedora
• Components
– Current  focus  is  on  Nova  with  KVM  and  Hyper-­‐V
– Keystone  running  with  Active  Directory  and  Glance  for  Linux  and  
Windows  images
• Pre-­‐production  facility  with  around  200  Hypervisors,    with  
2000  VMs  integrated  with  CERN  infrastructure
– used  for  simulation  of  magnet  placement  using  LHC@Home and  batch  
physics  programs
Gavin  McCance,  CERN 20
Gavin  McCance,  CERN 21
Next  Steps
• Deploy  into  production  at  the  start  of  2013  with  Folsom  running  
production  services  and  compute  on  top  of  OpenStack  IaaS
• Support  multi-­‐site  operations  with  2nd data  centre  in  Hungary
• Exploit  new  functionality
– Ceilometer  for  metering
– Bare  metal  for  non-­‐virtualised  use  cases  such  as  high  I/O  servers
– X.509  user  certificate  authentication
– Load  balancing  as  a  service
Ramping  to  15K  hypervisors  with  100K  
VMs  by  2015  
Gavin  McCance,  CERN 22
Conclusions
• CERN  computer  centre  is  expanding
• We’re  in  the  process  of  refurbishing  the  tools  we  use  
to  manage  the  centre  based  on  Openstack for  IaaS
and  Puppet for  configuration  management
• Production  at  CERN  in  next  few  months  on  Folsom
– Gradual  migration  of  all  our  services
• Community  is  key  to  shared  success
– CERN  contributes  and  benefits
Gavin  McCance,  CERN 23
BACKUP  SLIDES
Gavin  McCance,  CERN 24
Training  and  Support
• Buy  the  book  rather  than  guru  mentoring
• Follow  the  mailing  lists  to  learn
• Newcomers  are  rapidly  productive  (and  often  know  more  than  us)
• Community  and  Enterprise  support  means  we’re  not  on  our  own
Gavin  McCance,  CERN 25
Staff  Motivation
• Skills  valuable  outside  of  CERN  when  an  engineer’s  contracts  
end
Gavin  McCance,  CERN 26
When  communities  combine…
• OpenStack’s  many  components  and  options  make  
configuration  complex  out  of  the  box
• Puppet  forge module  from  PuppetLabs  does  our  configuration
• The  Foreman  adds  OpenStack  provisioning  for  user  kiosk  to  a  
configured  machine  in  15  minutes
Gavin  McCance,  CERN 27
Foreman  to  manage  Puppetized VM
Gavin  McCance,  CERN 28
Active  Directory  Integration
• CERN’s  Active  Directory
– Unified  identity  management  across  the  site
– 44,000  users
– 29,000  groups
– 200  arrivals/departures  per  month
• Full  integration  with  Active  Directory  via  LDAP
– Uses  the  OpenLDAP backend  with  some  particular  configuration  
settings
– Aim  for  minimal  changes  to  Active  Directory
– 7  patches  submitted  around  hard  coded  values  and  additional  filtering
• Now  in  use  in  our  pre-­‐production  instance
– Map  project  roles  (admins,  members)  to  groups
– Documentation  in  the  OpenStack  wiki
Gavin  McCance,  CERN 29
What  are  we  missing  (or  haven’t  found  yet)  ?
• Best  practice  for
– Monitoring  and  KPIs  as  part  of  core  functionality
– Guest  disaster  recovery
– Migration  between  versions  of  OpenStack
• Roles  within  multi-­‐user  projects
– VM  owner  allowed  to  manage  their  own  resources  (start/stop/delete)
– Project  admins  allowed  to  manage  all  resources
– Other  members  should  not  have  high  rights  over  other  members  VMs
• Global  quota  management  for  non-­‐elastic  private  cloud
– Manage  resource  prioritisation  and  allocation  centrally
– Capacity  management  /  utilisation  for  planning
Gavin  McCance,  CERN 30
Opportunistic  Clouds  in  online  experiment  farms
• The  CERN  experiments  have  farms  of  1000s  of  Linux  servers  
close  to  the  detectors  to  filter  the  1PByte/s  down  to  6GByte/s  
to  be  recorded  to  tape
• When  the  accelerator  is  not  running,  these  machines  are  
currently    idle
– Accelerator  has  regular  maintenance  slots  of  several  days
– Long  Shutdown  due  from  March  2013-­‐November  2014
• One  of  the  experiments  are  deploying  OpenStack  on  their  farm
– Simulation  (low  I/O,  high  CPU)
– Analysis  (high  I/O,  high  CPU,  high  network)
Gavin  McCance,  CERN 31
New  architecture  data  flows
Gavin  McCance,  CERN 32

More Related Content

What's hot

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at FlickrJohn Allspaw
 
Whamcloud - Lustre for HPC and Ai
Whamcloud - Lustre for HPC and AiWhamcloud - Lustre for HPC and Ai
Whamcloud - Lustre for HPC and Aiinside-BigData.com
 
Docker Swarm Introduction
Docker Swarm IntroductionDocker Swarm Introduction
Docker Swarm Introductionrajdeep
 
Communication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeCommunication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeSumant Tambe
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101Bernd Ocklin
 
Clouds and Tools: Cheat Sheets & Infographics
Clouds and Tools: Cheat Sheets & InfographicsClouds and Tools: Cheat Sheets & Infographics
Clouds and Tools: Cheat Sheets & InfographicsThomas Poetter
 
Giới thiệu docker và ứng dụng trong ci-cd
Giới thiệu docker và ứng dụng trong ci-cdGiới thiệu docker và ứng dụng trong ci-cd
Giới thiệu docker và ứng dụng trong ci-cdGMO-Z.com Vietnam Lab Center
 
An Introduction To Jenkins
An Introduction To JenkinsAn Introduction To Jenkins
An Introduction To JenkinsKnoldus Inc.
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentJean-François Gagné
 
1. Docker Introduction.pdf
1. Docker Introduction.pdf1. Docker Introduction.pdf
1. Docker Introduction.pdfAmarGautam15
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Spark Summit
 
Head First to Container&Kubernetes
Head First to Container&KubernetesHead First to Container&Kubernetes
Head First to Container&KubernetesHungWei Chiu
 
Building a redundant CloudStack management cluster - Vladimir Melnik
Building a redundant CloudStack management cluster - Vladimir MelnikBuilding a redundant CloudStack management cluster - Vladimir Melnik
Building a redundant CloudStack management cluster - Vladimir MelnikShapeBlue
 
シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視Kazuhiro Nishiyama
 
Docker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityDocker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityJérôme Petazzoni
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...Severalnines
 
The dream is alive! Running Linux containers on an illumos kernel
The dream is alive! Running Linux containers on an illumos kernelThe dream is alive! Running Linux containers on an illumos kernel
The dream is alive! Running Linux containers on an illumos kernelbcantrill
 

What's hot (20)

10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
10+ Deploys Per Day: Dev and Ops Cooperation at Flickr
 
Whamcloud - Lustre for HPC and Ai
Whamcloud - Lustre for HPC and AiWhamcloud - Lustre for HPC and Ai
Whamcloud - Lustre for HPC and Ai
 
Docker Swarm Introduction
Docker Swarm IntroductionDocker Swarm Introduction
Docker Swarm Introduction
 
Communication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/SubscribeCommunication Patterns Using Data-Centric Publish/Subscribe
Communication Patterns Using Data-Centric Publish/Subscribe
 
MySQL NDB Cluster 101
MySQL NDB Cluster 101MySQL NDB Cluster 101
MySQL NDB Cluster 101
 
Clouds and Tools: Cheat Sheets & Infographics
Clouds and Tools: Cheat Sheets & InfographicsClouds and Tools: Cheat Sheets & Infographics
Clouds and Tools: Cheat Sheets & Infographics
 
Giới thiệu docker và ứng dụng trong ci-cd
Giới thiệu docker và ứng dụng trong ci-cdGiới thiệu docker và ứng dụng trong ci-cd
Giới thiệu docker và ứng dụng trong ci-cd
 
Docker & kubernetes
Docker & kubernetesDocker & kubernetes
Docker & kubernetes
 
An Introduction To Jenkins
An Introduction To JenkinsAn Introduction To Jenkins
An Introduction To Jenkins
 
MySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated EnvironmentMySQL Scalability and Reliability for Replicated Environment
MySQL Scalability and Reliability for Replicated Environment
 
1. Docker Introduction.pdf
1. Docker Introduction.pdf1. Docker Introduction.pdf
1. Docker Introduction.pdf
 
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
Building Real-Time BI Systems with Kafka, Spark, and Kudu: Spark Summit East ...
 
Head First to Container&Kubernetes
Head First to Container&KubernetesHead First to Container&Kubernetes
Head First to Container&Kubernetes
 
Building a redundant CloudStack management cluster - Vladimir Melnik
Building a redundant CloudStack management cluster - Vladimir MelnikBuilding a redundant CloudStack management cluster - Vladimir Melnik
Building a redundant CloudStack management cluster - Vladimir Melnik
 
シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視シェルスクリプトで簡易ping監視
シェルスクリプトで簡易ping監視
 
Docker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and securityDocker, Linux Containers (LXC), and security
Docker, Linux Containers (LXC), and security
 
Zabbix Monitoring Platform
Zabbix Monitoring Platform Zabbix Monitoring Platform
Zabbix Monitoring Platform
 
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
MySQL Load Balancers - Maxscale, ProxySQL, HAProxy, MySQL Router & nginx - A ...
 
The dream is alive! Running Linux containers on an illumos kernel
The dream is alive! Running Linux containers on an illumos kernelThe dream is alive! Running Linux containers on an illumos kernel
The dream is alive! Running Linux containers on an illumos kernel
 
Kubernetes PPT.pptx
Kubernetes PPT.pptxKubernetes PPT.pptx
Kubernetes PPT.pptx
 

Similar to CERN Data Centre Evolution

OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user storylaurabeckcahoon
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERNGavin McCance
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab OverviewEd Dodds
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Belmiro Moreira
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchTom Connor
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicTim Bell
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3Tim Bell
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Eshed Gal-Or
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Ian Lumb
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a BudgetSamir Ibradzic
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a BudgetSusan Wu
 
OpenStack at EBSCO
OpenStack at EBSCOOpenStack at EBSCO
OpenStack at EBSCOTesora
 
Dell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentDell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentKamesh Pemmaraju
 
NICS Puppet Case Study
NICS Puppet Case StudyNICS Puppet Case Study
NICS Puppet Case StudyPuppet
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Belmiro Moreira
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data DataCentred
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...David Wallom
 

Similar to CERN Data Centre Evolution (20)

OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 
OpenStack at EBSCO
OpenStack at EBSCOOpenStack at EBSCO
OpenStack at EBSCO
 
Dell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentDell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deployment
 
NICS Puppet Case Study
NICS Puppet Case StudyNICS Puppet Case Study
NICS Puppet Case Study
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 

Recently uploaded

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clashcharlottematthew16
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebUiPathCommunity
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 

Recently uploaded (20)

Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Dev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio WebDev Dives: Streamline document processing with UiPath Studio Web
Dev Dives: Streamline document processing with UiPath Studio Web
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 

CERN Data Centre Evolution

  • 1. CERN  Data  Centre  Evolution Gavin  McCance gavin.mccance@cern.ch @gmccance SDCD12:  Supporting  Science  with  Cloud  Computing Bern 19th November  2012
  • 2. What  is  CERN  ? Gavin  McCance,  CERN 2 • Conseil Européen pour  la   Recherche Nucléaire – aka   European  Laboratory  for   Particle  Physics • Between  Geneva  and  the   Jura  mountains,  straddling   the  Swiss-­‐French  border • Founded  in  1954  with  an   international  treaty • Our  business  is  fundamental   physics  ,  what  is  the   universe  made  of  and  how   does  it  work
  • 3. Gavin  McCance,  CERN 3 Answering fundamental questions… • How  to  explain particles have  mass? We have  theories and  accumulating experimental evidence..  Getting close… • What is 96%  of  the  universe made  of  ? We can only see 4%  of  its estimated mass! • Why isn’t there anti-­‐matter in  the  universe? Nature  should be symmetric… • What was the  state  of  matter just after the  « Big Bang »  ? Travelling  back  to  the  earliest instants  of the  universe would help…
  • 4. 4 The  Large  Hadron  Collider  (LHC)  tunnel Gavin  McCance,  CERN
  • 6. Gavin  McCance,  CERN 6 • Data  Centre  by  Numbers – Hardware  installation  &  retirement • ~7,000  hardware  movements/year;  ~1,800  disk  failures/year Xeon   5150 2% Xeon   5160 10% Xeon   E5335 7% Xeon   E5345 14% Xeon   E5405 6% Xeon   E5410 16% Xeon   L5420 8% Xeon   L5520 33% Xeon   3GHz 4% Fujitsu 3% Hitachi 23% HP 0% Maxtor 0% Seagate 15% Western   Digital 59% Other 0% High  Speed  Routers (640 Mbps  →  2.4  Tbps) 24 Ethernet  Switches 350 10  Gbps  ports 2,000 Switching  Capacity 4.8 Tbps 1  Gbps  ports 16,939 10  Gbps  ports 558 Racks 828 Servers 11,728 Processors 15,694 Cores 64,238 HEPSpec06 482,507 Disks 64,109 Raw  disk  capacity  (TiB) 63,289 Memory  modules 56,014 Memory  capacity  (TiB) 158 RAID  controllers 3,749 Tape  Drives 160 Tape  Cartridges 45,000 Tape  slots 56,000 Tape Capacity  (TiB) 73,000 IT  Power  Consumption 2,456  KW Total Power  Consumption 3,890  KW
  • 7. Current  infrastructure • Around  12k  servers – Dedicated  compute,  dedicated  disk  server,  dedicated  service  nodes – Majority  Scientific  Linux  (RHEL5/6  clone) – Mostly  running  on  real  hardware – Last  couple  of  years,  we’ve  consolidated  some  of  the  service  nodes   onto  Microsoft  HyperV – Various  other  virtualisation  projects  around • In  2002  we  developed  our  own  management  toolset – Quattor /  CDB  configuration  tool – Lemon  computer  monitoring – Open  source,  but  a  small  community Gavin  McCance,  CERN 7
  • 8. • Many  diverse  applications  (”clusters”)   • Managed  by  different  teams  (CERN  IT  +  experiment  groups) Gavin  McCance,  CERN 8
  • 9. New  data  centre  to  expand  capacity Gavin  McCance,  CERN 9 • Data  centre  in  Geneva   at  the  limit  of   electrical  capacity  at   3.5MW • New  centre  chosen  in   Budapest,  Hungary • Additional  2.7MW  of   usable  power • Hands  off  facility • Deploying  from  2013   with  200Gbit/s   network  to  CERN
  • 10. Time  to  change  strategy • Rationale – Need  to  manage  twice  the  servers  as  today – No  increase  in  staff  numbers – Tools  becoming  increasingly  brittle  and  will  not  scale  as-­‐is • Approach – CERN  is  no  longer  a  special  case  for  compute – Adopt  an  open  source  tool  chain  model – Our  engineers  rapidly  iterate • Evaluate  solutions  in  the  problem  domain • Identify  functional  gaps  and  challenge  old  assumptions • Select  first  choice  but  be  prepared  to  change  in  future – Contribute  new  function  back  to  the  community Gavin  McCance,  CERN 10
  • 11. Building  Blocks Gavin  McCance,  CERN 11 Bamboo   Koji,  Mock AIMS/PXE Foreman Yum  repo Pulp Puppet-­DB mcollective,  yum JIRA Lemon  / Hadoop git OpenStack   Nova Hardware   database Puppet Active  Directory  / LDAP
  • 12. Choose  Puppet  for  Configuration • The  tool  space  has  exploded  in  last  few  years – In  configuration  management  and  operations • Puppet and  Chef are  the  clear  leaders  for  ‘core  tools’ • Many  large  enterprises  now  use  Puppet – Its  declarative  approach  fits  what  we’re  used  to  at  CERN – Large  installations:  friendly,  wide-­‐based  community – You  can  buy  books  on  it – You  can  employ  people  who  know  it  better  than  do Gavin  McCance,  CERN 12
  • 13. Puppet  Experience • Excellent:  basic  puppet  is  easy  to  setup and  can  be  scaled-­‐up  well • Well  documented,  configuring  services  with  it  is  easy • Handle  our  cluster  diversity  and  dynamic  clouds  well • Lots  of  resource  (“modules”)  online,  though  of  varying  quality • Large,  responsive  community  to  help • Lots  of  nice  tooling  for  free – Configuration  version  control  and  branching:  integrates  well  with  git – Dashboard:  we  use  the  Foreman dashboard • We’re  moving  all  our  production  service  over  in  2013 Gavin  McCance,  CERN 13
  • 15. Preparing  the  move  to  cloud • Improve  operational  efficiency  and  dynamicness – Dynamic  multiple  operating  system  demand – Dynamic  temporary  load  spikes  for  special  activities – Hardware  interventions  with  long  running  programs  (live  migration) • Improve  resource  efficiency – Exploit  idle  resources,  especially  waiting  for  disk  and  tape  I/O – Highly  variable  load  such  as  interactive  or  build  machines • Enable  cloud  architectures – Gradual  migration  from  traditional  batch  +  disk  to  cloud  interfaces  and   workflows • Improve  responsiveness – Self-­‐Service  with  coffee  break  response  time Gavin  McCance,  CERN 15
  • 16. What  is  OpenStack  ? • OpenStack  is  a  cloud  operating  system  that  controls  large   pools  of  compute,  storage,  and  networking  resources   throughout  a  datacenter,  all  managed  through  a  dashboard   that  gives  administrators  control  while  empowering  their  users   to  provision  resources  through  a  web  interface Gavin  McCance,  CERN 16
  • 17. Service  Model Gavin  McCance,  CERN 17 • Pets  are  given  names  like   pussinboots.cern.ch   • They  are  unique,  lovingly  hand  raised   and  cared  for • When  they  get  ill,  you  nurse  them  back   to  health • Cattle  are  given  numbers  like   vm0042.cern.ch • They  are  almost  identical  to  other  cattle • When  they  get  ill,  you  get  another  one • Future  application  architectures  should  use  Cattle  but  Pets  with   strong  configuration  management  are  viable  and  still  needed Borrowed  from @randybias at  Cloudscaling http://www.slideshare.net/randybias/the-­‐cloud-­‐ revolution-­‐cyber-­‐press-­‐forum-­‐philippines
  • 18. Basic  Openstack Components Gavin  McCance,  CERN 18 Compute Scheduler NetworkVolume Registry Image KEYSTONE HORIZON NOVAGLANCE • Each  component  has  an  API  and  is  pluggable • Other  non-­‐core  projects  interact  with  these  components  
  • 19. Supporting  the  Pets  with  OpenStack • Network – Interfacing  with  legacy  site  DNS  and  IP  management – Ensuring  Kerberos  identity  before  VM  start • Puppet – Ease  use  of  configuration  management  tools  with  our  users – Exploit  mcollective  for  orchestration/delegation • External  Block  Storage – Currently  using  nova-­‐volume  with  Gluster backing  store • Live  migration  to  maximise  availability – KVM  live  migration  using  Gluster – KVM  and  Hyper-­‐V  block  migration Gavin  McCance,  CERN 19
  • 20. Current  Status  of  OpenStack  at  CERN • Working  on  an  Essex  code  base  from  the  EPEL  repository – Excellent  experience  with  the  Fedora  cloud-­‐sig  team – Cloud-­‐init for  contextualisation,  oz for  images  with  RHEL/Fedora • Components – Current  focus  is  on  Nova  with  KVM  and  Hyper-­‐V – Keystone  running  with  Active  Directory  and  Glance  for  Linux  and   Windows  images • Pre-­‐production  facility  with  around  200  Hypervisors,    with   2000  VMs  integrated  with  CERN  infrastructure – used  for  simulation  of  magnet  placement  using  LHC@Home and  batch   physics  programs Gavin  McCance,  CERN 20
  • 22. Next  Steps • Deploy  into  production  at  the  start  of  2013  with  Folsom  running   production  services  and  compute  on  top  of  OpenStack  IaaS • Support  multi-­‐site  operations  with  2nd data  centre  in  Hungary • Exploit  new  functionality – Ceilometer  for  metering – Bare  metal  for  non-­‐virtualised  use  cases  such  as  high  I/O  servers – X.509  user  certificate  authentication – Load  balancing  as  a  service Ramping  to  15K  hypervisors  with  100K   VMs  by  2015   Gavin  McCance,  CERN 22
  • 23. Conclusions • CERN  computer  centre  is  expanding • We’re  in  the  process  of  refurbishing  the  tools  we  use   to  manage  the  centre  based  on  Openstack for  IaaS and  Puppet for  configuration  management • Production  at  CERN  in  next  few  months  on  Folsom – Gradual  migration  of  all  our  services • Community  is  key  to  shared  success – CERN  contributes  and  benefits Gavin  McCance,  CERN 23
  • 25. Training  and  Support • Buy  the  book  rather  than  guru  mentoring • Follow  the  mailing  lists  to  learn • Newcomers  are  rapidly  productive  (and  often  know  more  than  us) • Community  and  Enterprise  support  means  we’re  not  on  our  own Gavin  McCance,  CERN 25
  • 26. Staff  Motivation • Skills  valuable  outside  of  CERN  when  an  engineer’s  contracts   end Gavin  McCance,  CERN 26
  • 27. When  communities  combine… • OpenStack’s  many  components  and  options  make   configuration  complex  out  of  the  box • Puppet  forge module  from  PuppetLabs  does  our  configuration • The  Foreman  adds  OpenStack  provisioning  for  user  kiosk  to  a   configured  machine  in  15  minutes Gavin  McCance,  CERN 27
  • 28. Foreman  to  manage  Puppetized VM Gavin  McCance,  CERN 28
  • 29. Active  Directory  Integration • CERN’s  Active  Directory – Unified  identity  management  across  the  site – 44,000  users – 29,000  groups – 200  arrivals/departures  per  month • Full  integration  with  Active  Directory  via  LDAP – Uses  the  OpenLDAP backend  with  some  particular  configuration   settings – Aim  for  minimal  changes  to  Active  Directory – 7  patches  submitted  around  hard  coded  values  and  additional  filtering • Now  in  use  in  our  pre-­‐production  instance – Map  project  roles  (admins,  members)  to  groups – Documentation  in  the  OpenStack  wiki Gavin  McCance,  CERN 29
  • 30. What  are  we  missing  (or  haven’t  found  yet)  ? • Best  practice  for – Monitoring  and  KPIs  as  part  of  core  functionality – Guest  disaster  recovery – Migration  between  versions  of  OpenStack • Roles  within  multi-­‐user  projects – VM  owner  allowed  to  manage  their  own  resources  (start/stop/delete) – Project  admins  allowed  to  manage  all  resources – Other  members  should  not  have  high  rights  over  other  members  VMs • Global  quota  management  for  non-­‐elastic  private  cloud – Manage  resource  prioritisation  and  allocation  centrally – Capacity  management  /  utilisation  for  planning Gavin  McCance,  CERN 30
  • 31. Opportunistic  Clouds  in  online  experiment  farms • The  CERN  experiments  have  farms  of  1000s  of  Linux  servers   close  to  the  detectors  to  filter  the  1PByte/s  down  to  6GByte/s   to  be  recorded  to  tape • When  the  accelerator  is  not  running,  these  machines  are   currently    idle – Accelerator  has  regular  maintenance  slots  of  several  days – Long  Shutdown  due  from  March  2013-­‐November  2014 • One  of  the  experiments  are  deploying  OpenStack  on  their  farm – Simulation  (low  I/O,  high  CPU) – Analysis  (high  I/O,  high  CPU,  high  network) Gavin  McCance,  CERN 31
  • 32. New  architecture  data  flows Gavin  McCance,  CERN 32

Editor's Notes

  1. Established by an international treaty at the end of 2nd world war as a place where scientists could work together for fundamental researchNuclear is part of the name but our world is particle physics
  2. Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions- Why do some particles have mass and others not ? The Higgs Boson is a theory but we need experimental evidence.Our theory of forces does not explain how Gravity worksCosmologists can only find 4% of the matter in the universe, we have lost the other 96%We should have 50% matter, 50% anti-matter… why is there an asymmetry (although it is a good thing that there is since the two anhialiate each other) ?When we go back through time 13 billion years towards the big bang, we move back through planets, stars, atoms, protons/electrons towards a soup like quark gluon plasma. What were the properties of this?
  3. The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the moon which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second being bent by the superconducting magnets cooled to 2K by liquid helium (-450F), colder than outer space. The beams themselves have a total energy similar to a high speed train so care needs to be taken to make sure they turn the corners correctly and don’t bump into the walls of the pipe.
  4. To improve the statistics, we send round beams of multiple bunches, as they cross there are multiple collisions as 100 billion protons per bunch pass through each otherSoftware close by the detector and later offline in the computer centre then has to examine the tracks to understand the particles involved
  5. So, to the Tier-0 computer centre at CERN… we are unusual in that we are public with our environment as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education and the computer center is a popular visit.The data centre has around 2.9MW of usable power looking after 12,000 servers.. In comparison, the accelerator uses 120MW, like a small town.With 64,000 disks, we have around 1,800 failing each year… this is much higher than the manufacturers’ MTBFs which is consistent with results from Google.Servers are mainly Intel processors, some AMD with dual core Xeon being the most common configuration.
  6. Asked member states for offers200Gbit/s links connecting the centresExpect to double computing capacity compared to today by 2015
  7. Double the capacity, same manpowerNeed to rethink how to solve the problem… look at how others approach itWe had our own tools in 2002 and as they become more sophisticated, it was not possible to take advantage of other developments elsewhere without a major break.Doing this while doing their ‘day’ jobs so it re-enforces the approach of taking what we can from the community
  8. Model based on Google Toolchain, Puppet is key for many operations. We’ve only had to write one new significant custom CERN software component which is in the certificate authority. Other parts such as Lemon for monitoring are from our previous implementation as we did not want to change all at once and they scale.
  9. Standardise hardware … buy in bulk and pile it up then work out what to use it forMemory, motherboards, cables or disks interventionsUsers waiting for I/O means wasted cycles. Build machines at night unused during the day. Interactive machines mainly during the dayMove to cloud APIs … need to support them but also maintain our existing applicationsDetails later on reception and testing
  10. Puppet applies well to the cattle model but we’re also using it to handle the pet cases that can’t yet move over due to software limitations. So, they get cloud provisioning but flexible configuration management.
  11. Complex to configure… take advantage of the experience of others
  12. We’ve been very pleased with our choices. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
  13. Many staff at CERN are short term contracts… good benefits for those staff to leave with skills in need.
  14. Communities integrating … when a new option is being used at CERN in OpenStack, we contribute the changes back to the puppet forge such as certificate handling. Even looking at Hyper-V/Windows openstack configuration…