CERN Data Centre Evolution
Gavin McCance
gavin.mccance@cern.ch
@gmccance
SDCD12: Supporting Science with Cloud Computing
Bern
19th November 2012
What is CERN?
Gavin  McCance,  CERN 2
• Conseil Européen pour la Recherche Nucléaire, also known as the European Laboratory for Particle Physics
• Between Geneva and the Jura mountains, straddling the Swiss-French border
• Founded in 1954 with an international treaty
• Our business is fundamental physics: what is the universe made of, and how does it work?
Answering fundamental questions…
• How do we explain that particles have mass?
– We have theories and accumulating experimental evidence. Getting close…
• What is 96% of the universe made of?
– We can only see 4% of its estimated mass!
• Why isn’t there antimatter in the universe?
– Nature should be symmetric…
• What was the state of matter just after the "Big Bang"?
– Travelling back to the earliest instants of the universe would help…
The Large Hadron Collider (LHC) tunnel
• Data Centre by Numbers
– Hardware installation & retirement
• ~7,000 hardware movements/year; ~1,800 disk failures/year
– Processor models (pie chart): Xeon 5150 2%, Xeon 5160 10%, Xeon E5335 7%, Xeon E5345 14%, Xeon E5405 6%, Xeon E5410 16%, Xeon L5420 8%, Xeon L5520 33%, Xeon 3GHz 4%
– Disk vendors (pie chart): Fujitsu 3%, Hitachi 23%, HP 0%, Maxtor 0%, Seagate 15%, Western Digital 59%, Other 0%

High Speed Routers (640 Mbps → 2.4 Tbps)    24
Ethernet Switches                           350
10 Gbps ports                               2,000
Switching Capacity                          4.8 Tbps
1 Gbps ports                                16,939
10 Gbps ports                               558

Racks                                       828
Servers                                     11,728
Processors                                  15,694
Cores                                       64,238
HEPSpec06                                   482,507
Disks                                       64,109
Raw disk capacity (TiB)                     63,289
Memory modules                              56,014
Memory capacity (TiB)                       158
RAID controllers                            3,749

Tape Drives                                 160
Tape Cartridges                             45,000
Tape slots                                  56,000
Tape Capacity (TiB)                         73,000

IT Power Consumption                        2,456 kW
Total Power Consumption                     3,890 kW
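
A few derived ratios follow directly from the figures on this slide. The short Python sketch below works them out; the variable names are ours, and reading total power divided by IT power as a facility overhead ratio is an assumption about how the two power figures were measured.

```python
# Figures copied from the "Data Centre by Numbers" slide.
servers = 11_728
cores = 64_238
hepspec06 = 482_507
it_power_kw = 2_456
total_power_kw = 3_890

# Average number of cores in one server.
cores_per_server = cores / servers

# Average HEPSpec06 benchmark score delivered by one core.
hs06_per_core = hepspec06 / cores

# Facility overhead: total power drawn per unit of IT power
# (assumption: both figures cover the same site and period).
power_overhead_ratio = total_power_kw / it_power_kw

print(f"cores per server:   {cores_per_server:.1f}")
print(f"HEPSpec06 per core: {hs06_per_core:.1f}")
print(f"total / IT power:   {power_overhead_ratio:.2f}")
```

This gives roughly 5.5 cores per server, 7.5 HEPSpec06 per core, and about 1.58 units of total power per unit of IT power.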
Current infrastructure
• Around 12k servers
– Dedicated compute, dedicated disk servers, dedicated service nodes
– Majority Scientific Linux (RHEL5/6 clone)
– Mostly running on real hardware
– Over the last couple of years, we’ve consolidated some of the service nodes onto Microsoft Hyper-V
– Various other virtualisation projects around
• In 2002 we developed our own management toolset
– Quattor / CDB configuration tool
– Lemon computer monitoring
– Open source, but a small community
• Many diverse applications (“clusters”)
• Managed by different teams (CERN IT + experiment groups)
New data centre to expand capacity
• Data centre in Geneva at the limit of its electrical capacity at 3.5 MW
• New centre chosen in Budapest, Hungary
• Additional 2.7 MW of usable power
• Hands-off facility
• Deploying from 2013 with a 200 Gbit/s network to CERN
Time to change strategy
• Rationale
– Need to manage twice as many servers as today
– No increase in staff numbers
– Tools are becoming increasingly brittle and will not scale as-is
• Approach
– CERN is no longer a special case for compute
– Adopt an open source tool chain model
– Our engineers rapidly iterate
• Evaluate solutions in the problem domain
• Identify functional gaps and challenge old assumptions
• Select first choice but be prepared to change in future
– Contribute new functionality back to the community
Building Blocks
[Diagram labels: Bamboo; Koji, Mock; AIMS/PXE; Foreman; Yum repo; Pulp; Puppet-DB; mcollective, yum; JIRA; Lemon / Hadoop; git; OpenStack Nova; Hardware database; Puppet; Active Directory / LDAP]
Choose Puppet for Configuration
• The tool space has exploded in the last few years
– In configuration management and operations
• Puppet and Chef are the clear leaders for ‘core tools’
• Many large enterprises now use Puppet
– Its declarative approach fits what we’re used to at CERN
– Large installations; friendly, broad-based community
– You can buy books on it
– You can employ people who know it better than we do
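
The declarative approach mentioned on this slide can be illustrated without Puppet itself. The sketch below is plain Python, not Puppet's DSL or its implementation; the `converge` and `apply_actions` helpers and the service names are hypothetical. It shows the core idea: state what the system should look like, derive only the changes needed, and a second run becomes a no-op.

```python
def converge(desired, actual):
    """Return the (name, state) changes needed to move `actual` to `desired`."""
    actions = []
    for name, state in sorted(desired.items()):
        if actual.get(name) != state:
            actions.append((name, state))
    return actions


def apply_actions(actions, actual):
    """Return a new state dict with the given changes applied."""
    new_state = dict(actual)
    for name, state in actions:
        new_state[name] = state
    return new_state


# Hypothetical desired and observed service states for one node.
desired = {"ntp": "running", "httpd": "stopped"}
actual = {"ntp": "stopped"}

first_run = converge(desired, actual)
actual_after = apply_actions(first_run, actual)
second_run = converge(desired, actual_after)

print(first_run)   # changes computed on the first run
print(second_run)  # nothing left to do on the second run
```

Because only the difference between desired and actual state is acted on, the same description can be applied repeatedly to very different nodes.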
Puppet Experience
• Excellent: basic Puppet is easy to set up and scales up well
• Well documented; configuring services with it is easy
• Handles our cluster diversity and dynamic clouds well
• Lots of resources (“modules”) online, though of varying quality
• Large, responsive community to help
• Lots of nice tooling for free
– Configuration version control and branching: integrates well with git
– Dashboard: we use the Foreman dashboard
• We’re moving all our production services over in 2013
Preparing the move to cloud
• Improve operational efficiency and dynamism
– Dynamic multiple operating system demand
– Dynamic temporary load spikes for special activities
– Hardware interventions with long running programs (live migration)
• Improve resource efficiency
– Exploit idle resources, especially waiting for disk and tape I/O
– Highly variable load such as interactive or build machines
• Enable cloud architectures
– Gradual migration from traditional batch + disk to cloud interfaces and workflows
• Improve responsiveness
– Self-service with coffee-break response time
What is OpenStack?
• OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface
Service Model
• Pets are given names like pussinboots.cern.ch
• They are unique, lovingly hand raised and cared for
• When they get ill, you nurse them back to health
• Cattle are given numbers like vm0042.cern.ch
• They are almost identical to other cattle
• When they get ill, you get another one
• Future application architectures should use Cattle but Pets with strong configuration management are viable and still needed
Borrowed from @randybias at Cloudscaling
http://www.slideshare.net/randybias/the-cloud-revolution-cyber-press-forum-philippines
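
The "cattle" half of this model can be sketched in a few lines. This is an illustration only: the `replace_unhealthy` helper is hypothetical, the `vmNNNN` naming simply follows the example on the slide, and a real deployment would call the cloud API to delete and boot instances rather than edit a list.

```python
def replace_unhealthy(fleet, unhealthy, next_number):
    """Swap each unhealthy VM for a freshly numbered one.

    Returns the new fleet and the next unused number.
    """
    new_fleet = []
    for name in fleet:
        if name in unhealthy:
            # Cattle are not nursed back to health: get another one.
            new_fleet.append(f"vm{next_number:04d}")
            next_number += 1
        else:
            new_fleet.append(name)
    return new_fleet, next_number


fleet = ["vm0041", "vm0042", "vm0043"]
new_fleet, next_number = replace_unhealthy(fleet, {"vm0042"}, next_number=44)
print(new_fleet)
```

A pet such as pussinboots.cern.ch would instead be repaired in place, which is why pets still depend on strong configuration management.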
Basic OpenStack Components
[Diagram: KEYSTONE; HORIZON; NOVA (Compute, Scheduler, Network, Volume); GLANCE (Registry, Image)]
• Each component has an API and is pluggable
• Other non-core projects interact with these components
Supporting the Pets with OpenStack
• Network
– Interfacing with legacy site DNS and IP management
– Ensuring Kerberos identity before VM start
• Puppet
– Ease the use of configuration management tools for our users
– Exploit mcollective for orchestration/delegation
• External Block Storage
– Currently using nova-volume with Gluster backing store
• Live migration to maximise availability
– KVM live migration using Gluster
– KVM and Hyper-V block migration
Current Status of OpenStack at CERN
• Working on an Essex code base from the EPEL repository
– Excellent experience with the Fedora cloud-sig team
– cloud-init for contextualisation, Oz for images with RHEL/Fedora
• Components
– Current focus is on Nova with KVM and Hyper-V
– Keystone running with Active Directory and Glance for Linux and Windows images
• Pre-production facility with around 200 hypervisors and 2,000 VMs, integrated with CERN infrastructure
– Used for simulation of magnet placement using LHC@Home and batch physics programs
Next Steps
• Deploy into production at the start of 2013 with Folsom, running production services and compute on top of OpenStack IaaS
• Support multi-site operations with 2nd data centre in Hungary
• Exploit new functionality
– Ceilometer for metering
– Bare metal for non-virtualised use cases such as high I/O servers
– X.509 user certificate authentication
– Load balancing as a service
Ramping to 15K hypervisors with 100K VMs by 2015
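
To put the 2015 target in context, the sketch below compares it with the pre-production facility described on the "Current Status" slide. The variable names are ours, and "around 200" hypervisors is taken as exactly 200 for the arithmetic.

```python
# Pre-production facility (figures from the "Current Status" slide).
preprod_hypervisors = 200   # "around 200", treated as exact here
preprod_vms = 2_000

# Target stated for 2015.
target_hypervisors = 15_000
target_vms = 100_000

preprod_density = preprod_vms / preprod_hypervisors
target_density = target_vms / target_hypervisors
hypervisor_growth = target_hypervisors / preprod_hypervisors

print(f"VMs per hypervisor today:  {preprod_density:.1f}")
print(f"VMs per hypervisor target: {target_density:.1f}")
print(f"hypervisor growth factor:  {hypervisor_growth:.0f}x")
```

The target works out to about 75 times as many hypervisors, at a somewhat lower average density (roughly 6.7 VMs per hypervisor against 10 in pre-production).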
Conclusions
• CERN computer centre is expanding
• We’re in the process of refurbishing the tools we use to manage the centre, based on OpenStack for IaaS and Puppet for configuration management
• Production at CERN in the next few months on Folsom
– Gradual migration of all our services
• Community is key to shared success
– CERN contributes and benefits
BACKUP SLIDES
Training and Support
• Buy the book rather than rely on guru mentoring
• Follow the mailing lists to learn
• Newcomers are rapidly productive (and often know more than us)
• Community and Enterprise support means we’re not on our own
Staff Motivation
• Skills are valuable outside CERN when an engineer’s contract ends
When communities combine…
• OpenStack’s many components and options make configuration complex out of the box
• Puppet Forge module from Puppet Labs does our configuration
• The Foreman adds OpenStack provisioning, taking a user from the kiosk to a configured machine in 15 minutes
Foreman to manage Puppetized VM
Active Directory Integration
• CERN’s Active Directory
– Unified identity management across the site
– 44,000 users
– 29,000 groups
– 200 arrivals/departures per month
• Full integration with Active Directory via LDAP
– Uses the OpenLDAP backend with some particular configuration settings
– Aim for minimal changes to Active Directory
– 7 patches submitted around hard coded values and additional filtering
• Now in use in our pre-production instance
– Map project roles (admins, members) to groups
– Documentation in the OpenStack wiki
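
The idea of mapping project roles to directory groups can be sketched as a simple lookup. Everything named here is hypothetical: the group names, the `ROLE_GROUPS` table and the `roles_for_user` helper are for illustration and do not reproduce CERN's configuration or the Keystone LDAP backend.

```python
# Hypothetical mapping from project roles to directory group names.
ROLE_GROUPS = {
    "admin": "cloud-project-admins",
    "member": "cloud-project-members",
}


def roles_for_user(user_groups, role_groups=ROLE_GROUPS):
    """Return the project roles implied by a user's group memberships."""
    return sorted(
        role for role, group in role_groups.items() if group in user_groups
    )


# Group memberships as they might be read from the directory.
print(roles_for_user({"cloud-project-members", "physics-users"}))
print(roles_for_user({"cloud-project-admins", "cloud-project-members"}))
```

Keeping the mapping in group membership means arrivals and departures are handled in the directory, with no per-user bookkeeping in the cloud layer.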
What are we missing (or haven’t found yet)?
• Best practice for
– Monitoring and KPIs as part of core functionality
– Guest disaster recovery
– Migration between versions of OpenStack
• Roles within multi-user projects
– VM owner allowed to manage their own resources (start/stop/delete)
– Project admins allowed to manage all resources
– Other members should not have elevated rights over other members’ VMs
• Global quota management for non-elastic private cloud
– Manage resource prioritisation and allocation centrally
– Capacity management / utilisation for planning
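
The role rules wished for on this slide amount to a small access check. The sketch below is one possible reading, with a hypothetical `may_manage` helper and made-up user names; it is not an existing OpenStack policy.

```python
# Actions named on the slide.
MANAGE_ACTIONS = {"start", "stop", "delete"}


def may_manage(user, action, vm_owner, project_admins):
    """Allow an action if the user owns the VM or is a project admin."""
    if action not in MANAGE_ACTIONS:
        return False
    return user == vm_owner or user in project_admins


admins = {"alice"}
print(may_manage("bob", "stop", vm_owner="bob", project_admins=admins))
print(may_manage("alice", "delete", vm_owner="bob", project_admins=admins))
print(may_manage("carol", "delete", vm_owner="bob", project_admins=admins))
```

The third call is the case the slide cares about: an ordinary project member gets no rights over another member's VM.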
Opportunistic Clouds in online experiment farms
• The CERN experiments have farms of thousands of Linux servers close to the detectors to filter the 1 PByte/s down to 6 GByte/s to be recorded to tape
• When the accelerator is not running, these machines are currently idle
– Accelerator has regular maintenance slots of several days
– Long Shutdown due from March 2013 to November 2014
• One of the experiments is deploying OpenStack on its farm
– Simulation (low I/O, high CPU)
– Analysis (high I/O, high CPU, high network)
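
The scale of the filtering done by these farms is easier to see as a ratio. The sketch below uses the two rates quoted on this slide; decimal prefixes (1 PByte = 10^15 bytes, 1 GByte = 10^9 bytes) are an assumption, and the variable names are ours.

```python
raw_rate_bytes_per_s = 1e15      # 1 PByte/s off the detectors
recorded_rate_bytes_per_s = 6e9  # 6 GByte/s recorded to tape

# How many bytes are seen for every byte that is kept.
reduction_factor = raw_rate_bytes_per_s / recorded_rate_bytes_per_s

# Fraction of the raw stream that survives the filter.
fraction_kept = recorded_rate_bytes_per_s / raw_rate_bytes_per_s

print(f"reduction factor: {reduction_factor:,.0f} to 1")
print(f"fraction kept:    {fraction_kept:.6%}")
```

Only about six bytes in every million reach tape, which is why so much CPU sits next to the detectors and is worth reusing when the accelerator is off.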
New architecture data flows

More Related Content

What's hot

[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
Ji-Woong Choi
 
VMware Outlines Its Own Journey to the Cloud
VMware Outlines Its Own Journey to the CloudVMware Outlines Its Own Journey to the Cloud
VMware Outlines Its Own Journey to the Cloud
VMware
 
Software defined datacenter SDDC
Software defined datacenter SDDCSoftware defined datacenter SDDC
Software defined datacenter SDDC
psjitha
 
OpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfOpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdf
ssuser1490e8
 
Kubernetes
KubernetesKubernetes
Kubernetes
erialc_w
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes
VMware Tanzu
 
Intro To Docker
Intro To DockerIntro To Docker
Intro To Docker
Jessica Lucci
 
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
Winton Winton
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform Training
Yevgeniy Brikman
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
Julien Pivotto
 
Openstack zun,virtual kubelet
Openstack zun,virtual kubeletOpenstack zun,virtual kubelet
Openstack zun,virtual kubelet
Chanyeol yoon
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
Aparna Pillai
 
Introduction to Docker - 2017
Introduction to Docker - 2017Introduction to Docker - 2017
Introduction to Docker - 2017
Docker, Inc.
 
Ansible Automation Platform.pdf
Ansible Automation Platform.pdfAnsible Automation Platform.pdf
Ansible Automation Platform.pdf
VuHoangAnh14
 
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
NAVER LABS
 
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
SlideTeam
 
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
Silicon Comté
 
Kubernetes in Docker
Kubernetes in DockerKubernetes in Docker
Kubernetes in Docker
Docker, Inc.
 
VMware HCI solutions - 2020-01-16
VMware HCI solutions - 2020-01-16VMware HCI solutions - 2020-01-16
VMware HCI solutions - 2020-01-16
David Pasek
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
AIMDek Technologies
 

What's hot (20)

[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
[오픈소스컨설팅] Open Stack Ceph, Neutron, HA, Multi-Region
 
VMware Outlines Its Own Journey to the Cloud
VMware Outlines Its Own Journey to the CloudVMware Outlines Its Own Journey to the Cloud
VMware Outlines Its Own Journey to the Cloud
 
Software defined datacenter SDDC
Software defined datacenter SDDCSoftware defined datacenter SDDC
Software defined datacenter SDDC
 
OpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdfOpenShift Virtualization- Technical Overview.pdf
OpenShift Virtualization- Technical Overview.pdf
 
Kubernetes
KubernetesKubernetes
Kubernetes
 
Getting Started with Kubernetes
Getting Started with Kubernetes Getting Started with Kubernetes
Getting Started with Kubernetes
 
Intro To Docker
Intro To DockerIntro To Docker
Intro To Docker
 
Open shift 4 infra deep dive
Open shift 4    infra deep diveOpen shift 4    infra deep dive
Open shift 4 infra deep dive
 
Comprehensive Terraform Training
Comprehensive Terraform TrainingComprehensive Terraform Training
Comprehensive Terraform Training
 
Introduction to Prometheus
Introduction to PrometheusIntroduction to Prometheus
Introduction to Prometheus
 
Openstack zun,virtual kubelet
Openstack zun,virtual kubeletOpenstack zun,virtual kubelet
Openstack zun,virtual kubelet
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Introduction to Docker - 2017
Introduction to Docker - 2017Introduction to Docker - 2017
Introduction to Docker - 2017
 
Ansible Automation Platform.pdf
Ansible Automation Platform.pdfAnsible Automation Platform.pdf
Ansible Automation Platform.pdf
 
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
Docker + Kubernetes를 이용한 빌드 서버 가상화 사례
 
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
Kubernetes Docker Container Implementation Ppt PowerPoint Presentation Slide ...
 
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
Introduction à Docker et utilisation en production /Digital apéro Besançon [1...
 
Kubernetes in Docker
Kubernetes in DockerKubernetes in Docker
Kubernetes in Docker
 
VMware HCI solutions - 2020-01-16
VMware HCI solutions - 2020-01-16VMware HCI solutions - 2020-01-16
VMware HCI solutions - 2020-01-16
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 

Similar to CERN Data Centre Evolution

OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebula Project
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
laurabeckcahoon
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
Gavin McCance
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
Ed Dodds
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Belmiro Moreira
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
Tom Connor
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
Tim Bell
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
Tim Bell
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
Tim Bell
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?
Eshed Gal-Or
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Ian Lumb
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
Susan Wu
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
Samir Ibradzic
 
OpenStack at EBSCO
OpenStack at EBSCOOpenStack at EBSCO
OpenStack at EBSCO
Tesora
 
Dell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentDell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deployment
Kamesh Pemmaraju
 
NICS Puppet Case Study
NICS Puppet Case StudyNICS Puppet Case Study
NICS Puppet Case Study
Puppet
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
Belmiro Moreira
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
DataCentred
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
David Wallom
 

Similar to CERN Data Centre Evolution (20)

OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
DOE Magellan OpenStack user story
DOE Magellan OpenStack user storyDOE Magellan OpenStack user story
DOE Magellan OpenStack user story
 
Configuration Management Evolution at CERN
Configuration Management Evolution at CERNConfiguration Management Evolution at CERN
Configuration Management Evolution at CERN
 
CloudLab Overview
CloudLab OverviewCloudLab Overview
CloudLab Overview
 
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013Deep Dive Into the CERN Cloud Infrastructure - November, 2013
Deep Dive Into the CERN Cloud Infrastructure - November, 2013
 
CLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB LaunchCLIMB System Introduction Talk - CLIMB Launch
CLIMB System Introduction Talk - CLIMB Launch
 
The OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack NordicThe OpenStack Cloud at CERN - OpenStack Nordic
The OpenStack Cloud at CERN - OpenStack Nordic
 
CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014CERN Mass and Agility talk at OSCON 2014
CERN Mass and Agility talk at OSCON 2014
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
20181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v320181219 ucc open stack 5 years v3
20181219 ucc open stack 5 years v3
 
Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?Who Needs Network Management in a Cloud Native Environment?
Who Needs Network Management in a Cloud Native Environment?
 
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
Dev / Test / Ops – Gain More Horsepower and Reduce Costs by Sharing Kubernete...
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 
Operating OpenStack on a Budget
Operating OpenStack on a BudgetOperating OpenStack on a Budget
Operating OpenStack on a Budget
 
OpenStack at EBSCO
OpenStack at EBSCOOpenStack at EBSCO
OpenStack at EBSCO
 
Dell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deploymentDell openstack cloud with inktank ceph – large scale customer deployment
Dell openstack cloud with inktank ceph – large scale customer deployment
 
NICS Puppet Case Study
NICS Puppet Case StudyNICS Puppet Case Study
NICS Puppet Case Study
 
Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015Unveiling CERN Cloud Architecture - October, 2015
Unveiling CERN Cloud Architecture - October, 2015
 
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
The Effectiveness, Efficiency and Legitimacy of Outsourcing Your Data
 
Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...Supporting Research through "Desktop as a Service" models of e-infrastructure...
Supporting Research through "Desktop as a Service" models of e-infrastructure...
 

Recently uploaded

Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
Goodbye Windows 11: Make Way for Nitrux Linux 3.5.0!
SOFTTECHHUB
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
How to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For FlutterHow to use Firebase Data Connect For Flutter
How to use Firebase Data Connect For Flutter
Daiki Mogmet Ito
 
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
みなさんこんにちはこれ何文字まで入るの?40文字以下不可とか本当に意味わからないけどこれ限界文字数書いてないからマジでやばい文字数いけるんじゃないの?えこ...
名前 です男
 
20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website20 Comprehensive Checklist of Designing and Developing a Website
20 Comprehensive Checklist of Designing and Developing a Website
Pixlogix Infotech
 
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024
Albert Hoitingh
 
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
“Building and Scaling AI Applications with the Nx AI Manager,” a Presentation...
Edge AI and Vision Alliance
 
National Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practicesNational Security Agency - NSA mobile device best practices
National Security Agency - NSA mobile device best practices
Quotidiano Piemontese
 
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdfUni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems Copilot event_05062024_C.Vlachos.pdf
Uni Systems S.M.S.A.
 
20240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 202420240605 QFM017 Machine Intelligence Reading List May 2024
20240605 QFM017 Machine Intelligence Reading List May 2024
Matthew Sinclair
 
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...Building RAG with self-deployed Milvus vector database and Snowpark Container...
Building RAG with self-deployed Milvus vector database and Snowpark Container...
Zilliz
 
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...
Neo4j
 
UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5UiPath Test Automation using UiPath Test Suite series, part 5
UiPath Test Automation using UiPath Test Suite series, part 5
DianaGray10
 
Monitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR EventsMonitoring Java Application Security with JDK Tools and JFR Events
Monitoring Java Application Security with JDK Tools and JFR Events
Ana-Maria Mihalceanu
 
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...
SOFTTECHHUB
 
Pushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 daysPushing the limits of ePRTC: 100ns holdover for 100 days
Pushing the limits of ePRTC: 100ns holdover for 100 days
Adtran
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1Communications Mining Series - Zero to Hero - Session 1
Communications Mining Series - Zero to Hero - Session 1
DianaGray10
 
RESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for studentsRESUME BUILDER APPLICATION Project for students
RESUME BUILDER APPLICATION Project for students
KAMESHS29
 
Large Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial ApplicationsLarge Language Model (LLM) and it’s Geospatial Applications
Large Language Model (LLM) and it’s Geospatial Applications
Rohit Gautam
 

CERN Data Centre Evolution

  • 1. CERN Data Centre Evolution. Gavin McCance, gavin.mccance@cern.ch, @gmccance. SDCD12: Supporting Science with Cloud Computing, Bern, 19th November 2012
  • 2. What is CERN? • Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics • Between Geneva and the Jura mountains, straddling the Swiss-French border • Founded in 1954 with an international treaty • Our business is fundamental physics: what is the universe made of and how does it work
  • 3. Answering fundamental questions… • How to explain that particles have mass? We have theories and accumulating experimental evidence. Getting close… • What is 96% of the universe made of? We can only see 4% of its estimated mass! • Why isn’t there anti-matter in the universe? Nature should be symmetric… • What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help…
  • 4. The Large Hadron Collider (LHC) tunnel
  • 6. Data Centre by Numbers
      – Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year
      – Processor models: Xeon 5150 2%; Xeon 5160 10%; Xeon E5335 7%; Xeon E5345 14%; Xeon E5405 6%; Xeon E5410 16%; Xeon L5420 8%; Xeon L5520 33%; Xeon 3GHz 4%
      – Disk vendors: Fujitsu 3%; Hitachi 23%; HP 0%; Maxtor 0%; Seagate 15%; Western Digital 59%; Other 0%
      – Network: High Speed Routers (640 Mbps → 2.4 Tbps) 24; Ethernet Switches 350; 10 Gbps ports 2,000; Switching Capacity 4.8 Tbps; 1 Gbps ports 16,939; 10 Gbps ports 558
      – Servers and storage: Racks 828; Servers 11,728; Processors 15,694; Cores 64,238; HEPSpec06 482,507; Disks 64,109; Raw disk capacity (TiB) 63,289; Memory modules 56,014; Memory capacity (TiB) 158; RAID controllers 3,749
      – Tape: Tape Drives 160; Tape Cartridges 45,000; Tape slots 56,000; Tape Capacity (TiB) 73,000
      – Power: IT Power Consumption 2,456 kW; Total Power Consumption 3,890 kW
  • 7. Current infrastructure • Around 12k servers – Dedicated compute, dedicated disk server, dedicated service nodes – Majority Scientific Linux (RHEL5/6 clone) – Mostly running on real hardware – Last couple of years, we’ve consolidated some of the service nodes onto Microsoft Hyper-V – Various other virtualisation projects around • In 2002 we developed our own management toolset – Quattor / CDB configuration tool – Lemon computer monitoring – Open source, but a small community
  • 8. Many diverse applications (“clusters”) • Managed by different teams (CERN IT + experiment groups)
  • 9. New data centre to expand capacity • Data centre in Geneva at the limit of electrical capacity at 3.5MW • New centre chosen in Budapest, Hungary • Additional 2.7MW of usable power • Hands off facility • Deploying from 2013 with 200Gbit/s network to CERN
  • 10. Time to change strategy • Rationale – Need to manage twice as many servers as today – No increase in staff numbers – Tools becoming increasingly brittle and will not scale as-is • Approach – CERN is no longer a special case for compute – Adopt an open source tool chain model – Our engineers rapidly iterate • Evaluate solutions in the problem domain • Identify functional gaps and challenge old assumptions • Select first choice but be prepared to change in future – Contribute new function back to the community
  • 11. Building Blocks: Bamboo, Koji, Mock, AIMS/PXE, Foreman, Yum repo, Pulp, Puppet-DB, mcollective, yum, JIRA, Lemon / Hadoop, git, OpenStack Nova, Hardware database, Puppet, Active Directory / LDAP
  • 12. Choose Puppet for Configuration • The tool space has exploded in last few years – In configuration management and operations • Puppet and Chef are the clear leaders for ‘core tools’ • Many large enterprises now use Puppet – Its declarative approach fits what we’re used to at CERN – Large installations: friendly, wide-based community – You can buy books on it – You can employ people who know it better than we do
  • 13. Puppet Experience • Excellent: basic Puppet is easy to set up and can be scaled up well • Well documented, configuring services with it is easy • Handles our cluster diversity and dynamic clouds well • Lots of resources (“modules”) online, though of varying quality • Large, responsive community to help • Lots of nice tooling for free – Configuration version control and branching: integrates well with git – Dashboard: we use the Foreman dashboard • We’re moving all our production services over in 2013
  • 15. Preparing the move to cloud • Improve operational efficiency and dynamicness – Dynamic multiple operating system demand – Dynamic temporary load spikes for special activities – Hardware interventions with long running programs (live migration) • Improve resource efficiency – Exploit idle resources, especially waiting for disk and tape I/O – Highly variable load such as interactive or build machines • Enable cloud architectures – Gradual migration from traditional batch + disk to cloud interfaces and workflows • Improve responsiveness – Self-Service with coffee break response time
  • 16. What is OpenStack? • OpenStack is a cloud operating system that controls large pools of compute, storage, and networking resources throughout a datacenter, all managed through a dashboard that gives administrators control while empowering their users to provision resources through a web interface
  • 17. Service Model • Pets are given names like pussinboots.cern.ch • They are unique, lovingly hand raised and cared for • When they get ill, you nurse them back to health • Cattle are given numbers like vm0042.cern.ch • They are almost identical to other cattle • When they get ill, you get another one • Future application architectures should use Cattle but Pets with strong configuration management are viable and still needed. Borrowed from @randybias at Cloudscaling: http://www.slideshare.net/randybias/the-cloud-revolution-cyber-press-forum-philippines
  • 18. Basic OpenStack Components: Compute, Scheduler, Network, Volume, Registry, Image; KEYSTONE, HORIZON, NOVA, GLANCE • Each component has an API and is pluggable • Other non-core projects interact with these components
  • 19. Supporting the Pets with OpenStack • Network – Interfacing with legacy site DNS and IP management – Ensuring Kerberos identity before VM start • Puppet – Ease use of configuration management tools with our users – Exploit mcollective for orchestration/delegation • External Block Storage – Currently using nova-volume with Gluster backing store • Live migration to maximise availability – KVM live migration using Gluster – KVM and Hyper-V block migration
  • 20. Current Status of OpenStack at CERN • Working on an Essex code base from the EPEL repository – Excellent experience with the Fedora cloud-sig team – Cloud-init for contextualisation, oz for images with RHEL/Fedora • Components – Current focus is on Nova with KVM and Hyper-V – Keystone running with Active Directory and Glance for Linux and Windows images • Pre-production facility with around 200 Hypervisors, with 2000 VMs integrated with CERN infrastructure – used for simulation of magnet placement using LHC@Home and batch physics programs
  • 22. Next Steps • Deploy into production at the start of 2013 with Folsom running production services and compute on top of OpenStack IaaS • Support multi-site operations with 2nd data centre in Hungary • Exploit new functionality – Ceilometer for metering – Bare metal for non-virtualised use cases such as high I/O servers – X.509 user certificate authentication – Load balancing as a service • Ramping to 15K hypervisors with 100K VMs by 2015
  • 23. Conclusions • CERN computer centre is expanding • We’re in the process of refurbishing the tools we use to manage the centre based on OpenStack for IaaS and Puppet for configuration management • Production at CERN in next few months on Folsom – Gradual migration of all our services • Community is key to shared success – CERN contributes and benefits
  • 25. Training and Support • Buy the book rather than guru mentoring • Follow the mailing lists to learn • Newcomers are rapidly productive (and often know more than us) • Community and Enterprise support means we’re not on our own
  • 26. Staff Motivation • Skills valuable outside of CERN when an engineer’s contract ends
  • 27. When communities combine… • OpenStack’s many components and options make configuration complex out of the box • Puppet Forge module from PuppetLabs does our configuration • The Foreman adds OpenStack provisioning for user kiosk to a configured machine in 15 minutes
  • 28. Foreman to manage Puppetized VM
  • 29. Active Directory Integration • CERN’s Active Directory – Unified identity management across the site – 44,000 users – 29,000 groups – 200 arrivals/departures per month • Full integration with Active Directory via LDAP – Uses the OpenLDAP backend with some particular configuration settings – Aim for minimal changes to Active Directory – 7 patches submitted around hard coded values and additional filtering • Now in use in our pre-production instance – Map project roles (admins, members) to groups – Documentation in the OpenStack wiki
  • 30. What are we missing (or haven’t found yet)? • Best practice for – Monitoring and KPIs as part of core functionality – Guest disaster recovery – Migration between versions of OpenStack • Roles within multi-user projects – VM owner allowed to manage their own resources (start/stop/delete) – Project admins allowed to manage all resources – Other members should not have high rights over other members’ VMs • Global quota management for non-elastic private cloud – Manage resource prioritisation and allocation centrally – Capacity management / utilisation for planning
  • 31. Opportunistic Clouds in online experiment farms • The CERN experiments have farms of 1000s of Linux servers close to the detectors to filter the 1PByte/s down to 6GByte/s to be recorded to tape • When the accelerator is not running, these machines are currently idle – Accelerator has regular maintenance slots of several days – Long Shutdown due from March 2013 to November 2014 • One of the experiments is deploying OpenStack on its farm – Simulation (low I/O, high CPU) – Analysis (high I/O, high CPU, high network)
  • 32. New architecture data flows
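
The role model asked for on slide 30 ("Roles within multi-user projects") can be written down as a small permission check. The Python sketch below is illustrative only: the names `Actor`, `VM` and `can_manage` and the project name are hypothetical, none of this is an OpenStack API, and the rule that users have no rights outside their own project is an assumption added to make the example complete.

```python
from dataclasses import dataclass


# Hypothetical types for illustration; not part of OpenStack.
@dataclass(frozen=True)
class Actor:
    name: str
    project: str
    is_project_admin: bool = False


@dataclass(frozen=True)
class VM:
    name: str
    project: str
    owner: str


def can_manage(actor: Actor, vm: VM) -> bool:
    """Return True if actor may start/stop/delete vm under the slide-30 rules."""
    if actor.project != vm.project:
        return False  # assumption: no rights outside one's own project
    if actor.is_project_admin:
        return True  # project admins may manage all resources
    return actor.name == vm.owner  # ordinary members manage only their own VMs


alice = Actor("alice", "demo-project")
bob = Actor("bob", "demo-project")
carol = Actor("carol", "demo-project", is_project_admin=True)
vm = VM("vm0042", "demo-project", owner="alice")

for actor in (alice, bob, carol):
    print(actor.name, can_manage(actor, vm))
```

Here the owner and the project admin are allowed to manage the VM, while another ordinary member of the same project is not.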

Editor's Notes

  1. Established by an international treaty after the Second World War as a place where scientists could work together on fundamental research. Nuclear is part of the name but our world is particle physics.
  2. Our current understanding of the universe is incomplete. A theory, called the Standard Model, proposes particles and forces, many of which have been experimentally observed. However, there are open questions. Why do some particles have mass and others not? The Higgs boson is a theory but we need experimental evidence. Our theory of forces does not explain how gravity works. Cosmologists can only find 4% of the matter in the universe; we have lost the other 96%. We should have 50% matter, 50% anti-matter… why is there an asymmetry (although it is a good thing that there is, since the two annihilate each other)? When we go back through time 13 billion years towards the big bang, we move back through planets, stars, atoms, protons/electrons towards a soup-like quark gluon plasma. What were the properties of this?
  3. The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the moon, which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second, being bent by the superconducting magnets cooled to 2K by liquid helium (-456F), colder than outer space. The beams themselves have a total energy similar to a high speed train, so care needs to be taken to make sure they turn the corners correctly and don’t bump into the walls of the pipe.
  4. To improve the statistics, we send round beams of multiple bunches; as they cross there are multiple collisions as 100 billion protons per bunch pass through each other. Software close by the detector and later offline in the computer centre then has to examine the tracks to understand the particles involved.
  5. So, to the Tier-0 computer centre at CERN… we are unusual in that we are public with our environment as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education and the computer centre is a popular visit. The data centre has around 2.9MW of usable power looking after 12,000 servers. In comparison, the accelerator uses 120MW, like a small town. With 64,000 disks, we have around 1,800 failing each year… this is much higher than the manufacturers’ MTBFs, which is consistent with results from Google. Servers are mainly Intel processors, some AMD, with dual core Xeon being the most common configuration.
  6. Asked member states for offers. 200Gbit/s links connecting the centres. Expect to double computing capacity compared to today by 2015.
  7. Double the capacity, same manpower. Need to rethink how to solve the problem… look at how others approach it. We had our own tools in 2002 and as they became more sophisticated, it was not possible to take advantage of other developments elsewhere without a major break. Doing this while doing their ‘day’ jobs, so it reinforces the approach of taking what we can from the community.
  8. Model based on Google Toolchain, Puppet is key for many operations. We’ve only had to write one new significant custom CERN software component which is in the certificate authority. Other parts such as Lemon for monitoring are from our previous implementation as we did not want to change all at once and they scale.
  9. Standardise hardware… buy in bulk and pile it up, then work out what to use it for. Memory, motherboards, cables or disks interventions. Users waiting for I/O means wasted cycles. Build machines at night unused during the day. Interactive machines mainly during the day. Move to cloud APIs… need to support them but also maintain our existing applications. Details later on reception and testing.
  10. Puppet applies well to the cattle model but we’re also using it to handle the pet cases that can’t yet move over due to software limitations. So, they get cloud provisioning but flexible configuration management.
  11. Complex to configure… take advantage of the experience of others
  12. We’ve been very pleased with our choices. Along with the obvious benefits of the functionality, there are soft benefits from the community model.
  13. Many staff at CERN are on short-term contracts… it is a good benefit for those staff to leave with skills that are in demand.
  14. Communities integrating… when a new option is being used at CERN in OpenStack, we contribute the changes back to the Puppet Forge, such as certificate handling. Even looking at Hyper-V/Windows OpenStack configuration…
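
The figures quoted in note 5 and on slide 6 ("Data Centre by Numbers") imply a few derived ratios that are easy to check. The short Python sketch below only redoes that arithmetic with the numbers copied from the slide; the variable names are illustrative, and treating total power over IT power as a PUE-like overhead ratio is an interpretation rather than something the slides state.

```python
# Figures copied from slide 6 ("Data Centre by Numbers").
DISKS = 64_109
DISK_FAILURES_PER_YEAR = 1_800  # approximate, as quoted on the slide
SERVERS = 11_728
CORES = 64_238
IT_POWER_KW = 2_456
TOTAL_POWER_KW = 3_890

# Fraction of the installed disks that fail in a year.
annual_disk_failure_rate = DISK_FAILURES_PER_YEAR / DISKS

# Average number of cores per server.
cores_per_server = CORES / SERVERS

# Total facility power divided by IT power (a PUE-like overhead ratio).
power_overhead_ratio = TOTAL_POWER_KW / IT_POWER_KW

print(f"Annual disk failure rate: {annual_disk_failure_rate:.1%}")
print(f"Average cores per server: {cores_per_server:.1f}")
print(f"Total / IT power: {power_overhead_ratio:.2f}")
```

With these inputs the disk failure rate comes out at roughly 2.8% per year, the average at about 5.5 cores per server, and the power ratio at about 1.58.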