CERN User Story

CERN, the European Organization for Nuclear Research, is one of the world’s largest centres for scientific research. Its business is fundamental physics: finding out what the universe is made of and how it works. At CERN, accelerators such as the 27 km Large Hadron Collider are used to study the basic constituents of matter. This talk reviews the challenges of recording and analysing the 25 Petabytes per year produced by the experiments, and the investigations into how OpenStack could help deliver a more agile computing infrastructure.


  1. Towards An Agile Infrastructure at CERN
     Tim Bell, Tim.Bell@cern.ch
     OpenStack Conference, 6th October 2011
  2. What is CERN?
     Conseil Européen pour la Recherche Nucléaire – aka the European Laboratory for Particle Physics
     Between Geneva and the Jura mountains, straddling the Swiss-French border
     Founded in 1954 with an international treaty
     Our business is fundamental physics and how our universe works
  3. Answering fundamental questions…
     How to explain particles have mass? We have theories but need experimental evidence
     What is 96% of the universe made of? We can only see 4% of its estimated mass!
     Why isn’t there anti-matter in the universe? Nature should be symmetric…
     What was the state of matter just after the « Big Bang »? Travelling back to the earliest instants of the universe would help…
  4. Community collaboration on an international scale
  5. The Large Hadron Collider
  6. (image slide)
  7. LHC construction
  8. The Large Hadron Collider (LHC) tunnel
  9. (image slide)
 10. Accumulating events in 2009-2011
 11. (image slide)
 12. Heavy Ion Collisions
 13. (image slide)
 14. Tier-0 (CERN): data recording, initial data reconstruction, data distribution
     Tier-1 (11 centres): permanent storage, re-processing, analysis
     Tier-2 (~200 centres): simulation, end-user analysis
     Data is recorded at CERN and the Tier-1s and analysed in the Worldwide LHC Computing Grid
     On a normal day, the grid provides 100,000 CPU days executing 1 million jobs
 15. Data Centre by Numbers
     Hardware installation & retirement: ~7,000 hardware movements/year; ~1,800 disk failures/year
 16. Our Environment
     Our users: experiments build on top of our infrastructure and services to deliver application frameworks for the 10,000 physicists
     Our custom user applications split into:
       Raw data processing from the accelerator and export to the Worldwide LHC Computing Grid
       Analysis of physics data
       Simulation
     We also have standard large-organisation applications: payroll, web, mail, HR, …
 17. Our Infrastructure
     Hardware is generally based on commodity, white-box servers
     Open tendering process based on SPECint/CHF, CHF/Watt and GB/CHF
     Compute nodes are typically dual-processor with 2GB of memory per core
     Bulk storage on 24x2TB disk storage-in-a-box servers with a RAID card
     The vast majority of servers run Scientific Linux, developed by Fermilab and CERN and based on Red Hat Enterprise Linux
     The focus is on stability, in view of the number of centres on the WLCG
 18. Our Challenges – Compute
     Optimise CPU resources
       Maximise production lifetime of servers
       Schedule interventions such as hardware repairs and OS patching
       Match memory and core requirements per job
       Reduce CPUs waiting idle for I/O
     Conflicting software requirements
       Different experiments want different libraries
       Maintenance of old programs needs old OSes
 19. Our Challenges – variable demand
 20. Our Challenges – Data storage
     25PB/year to record
     >20 years retention
     6GB/s average
     25GB/s peaks
 21. (image slide)
 22. Our Challenges – ‘minor’ other issues
     Power: living within a fixed envelope of 2.9MW available for the computer centre
     Cooling: only 6kW/m2 without using water-cooled racks (and no spare power)
     Space: new capacity replaces old servers in the same racks (as density is low)
     Staff: CERN staff headcount is fixed
     Budget: the CERN IT budget reflects member states’ contributions
 23. Server Consolidation
 24. Batch Virtualisation
 25. Infrastructure as a Service Studies
     CERN has been using virtualisation on a small scale since 2007
       Server consolidation with Microsoft System Center Virtual Machine Manager and Hyper-V
       Virtual batch compute farm using OpenNebula and Platform ISF on KVM
     We are investigating moving to a cloud service provider model for infrastructure at CERN
       Virtualisation consolidation across multiple sites
       Bulk storage / Dropbox / …
       Self-service
     Aims: improve efficiency, reduce operations effort, ease remote data centre support, enable cloud APIs
 26. OpenStack Infrastructure as a Service Studies
     Current focus
       Converge the current virtualisation services into a single IaaS
       Test Swift for bulk storage, compatibility with S3 tools and resilience on commodity hardware
       Integrate OpenStack with CERN’s infrastructure, such as LDAP and the network databases
     Status
       The Swift testbed (480TB) is being migrated to Diablo and expanded to 1PB with 10GbE networking
       48 hypervisors running RHEL/KVM/Nova are under test
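     As a rough illustration of the S3-compatibility testing above, a minimal sketch with boto might look like the following; it assumes a Swift proxy running the S3-compatibility middleware, and the endpoint, port and credentials are placeholders rather than the actual testbed settings.

        # Store and read back an object through Swift's S3-compatible API (sketch).
        import boto
        from boto.s3.connection import OrdinaryCallingFormat
        from boto.s3.key import Key

        conn = boto.connect_s3(
            aws_access_key_id='ACCESS_KEY',            # placeholder credentials
            aws_secret_access_key='SECRET_KEY',
            host='swift-proxy.example.cern.ch',        # placeholder proxy endpoint
            port=8080,
            is_secure=False,
            calling_format=OrdinaryCallingFormat())    # path-style URLs, no bucket DNS

        bucket = conn.create_bucket('s3-compat-test')  # maps to a Swift container
        key = Key(bucket, 'hello.txt')
        key.set_contents_from_string('stored via the S3 API, served by Swift')
        print(key.get_contents_as_string())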
 27. Areas where we struggled
     Networking configuration with Cactus
       Trying out the new Network-as-a-Service (Quantum) functions in Diablo
     Red Hat distribution base
       RPMs not yet in EPEL, but the Grid Dynamics RPMs helped
       Puppet manifests needed adapting, with multiple sources from OpenStack and Puppet Labs
     Currently only testing with KVM
       We’ll try Hyper-V once Diablo/Hyper-V support is fully in place
 28. OpenStack investigations: next steps
     Homogeneous servers for both storage and batch?
 29. OpenStack investigations: next steps
     Scale testing with CERN’s toolchains to install and schedule 16,000 VMs
     (Previous test results performed with OpenNebula)
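     As a rough illustration of driving such a scale test through the Nova API, a minimal sketch with python-novaclient (the v1.1 API current around Diablo) might look like the following; the auth URL, credentials, image and flavor names are placeholders, and a real test harness would batch requests and track failures rather than looping naively.

        # Boot a large number of identical instances and count how many reach ACTIVE.
        from novaclient.v1_1 import client

        nova = client.Client('testuser', 'secret', 'testproject',
                             'http://keystone.example.cern.ch:5000/v2.0/')

        image = nova.images.find(name='slc5-base')      # placeholder image name
        flavor = nova.flavors.find(name='m1.small')

        for i in range(16000):
            nova.servers.create(name='scaletest-%05d' % i,
                                image=image, flavor=flavor)

        active = [s for s in nova.servers.list() if s.status == 'ACTIVE']
        print('%d instances active' % len(active))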
 30. OpenStack investigations: next steps
     Investigate commodity solutions for external volume storage: Ceph, Sheepdog, Gluster, …
     Focus is on:
       Reducing the performance impact of I/O with virtualisation
       Enabling widespread use of live migration
       Understanding the future storage classes and service definitions
       Supporting remote data centre use cases
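     To make the live-migration point concrete, a minimal sketch with the libvirt Python bindings is shown below; shared external volume storage is what makes this routine, since the guest’s disks are visible from both hypervisors and do not need to be copied. The hostnames and domain name are placeholders.

        # Live-migrate a running KVM guest between two hypervisors (sketch).
        import libvirt

        src = libvirt.open('qemu+ssh://hv001.example.cern.ch/system')
        dst = libvirt.open('qemu+ssh://hv002.example.cern.ch/system')

        dom = src.lookupByName('scaletest-00042')        # placeholder guest name

        # VIR_MIGRATE_LIVE keeps the guest running while its memory is copied;
        # the disks stay in place on the shared volume storage.
        new_dom = dom.migrate(dst, libvirt.VIR_MIGRATE_LIVE, None, None, 0)
        print('%s now runs on %s' % (new_dom.name(), dst.getHostname()))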
 31. Areas of interest looking forward
     Nova and Glance
       Scheduling VMs near to the data they need
       Managing the queue of requests when there is “no credit card” and no free resources
       Orchestration of bare-metal servers within OpenStack
     Swift
       High-performance transfers through the proxies without encryption
       Long-term archiving on low-power disks or tape
     General
       Filling in the missing functions such as billing, availability and performance monitoring
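     The data-locality idea behind “scheduling VMs near to the data they need” can be sketched in a few lines; this is a toy illustration only, not Nova’s scheduler API, and the host names and dataset layout are made up.

        # Prefer a hypervisor that holds a replica of the job's input dataset and
        # still has free capacity; otherwise fall back to the least-loaded host.
        def pick_host(dataset, hosts, replicas, free_cores):
            candidates = [h for h in hosts if free_cores.get(h, 0) > 0]
            local = [h for h in candidates if h in replicas.get(dataset, set())]
            pool = local or candidates                     # data-local hosts first
            return max(pool, key=lambda h: free_cores[h])  # then least-loaded

        hosts = ['hv001', 'hv002', 'hv003']
        replicas = {'run2011A': {'hv002', 'hv003'}}
        free_cores = {'hv001': 8, 'hv002': 2, 'hv003': 5}
        print(pick_host('run2011A', hosts, replicas, free_cores))   # -> hv003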
 32. Final Thoughts
     A small project to share documents at CERN in the ’90s created the massive phenomenon that is today’s World Wide Web
     Open source
     Transparent governance
     Basis for innovation and competition
     Standard APIs where there is consensus
     Stable, production-ready solutions
     Vibrant ecosystem
     There is a strong need for a similar solution in the Infrastructure-as-a-Service space
     The next year is going to be exciting for OpenStack as the project matures and faces the challenges of production deployments
 33. References
 34. Backup Slides
 35. CERN’s tools
     The world’s most powerful accelerator: the LHC
       A 27 km long tunnel filled with high-tech instruments
       Equipped with thousands of superconducting magnets
       Accelerates particles to energies never before obtained
       Produces particle collisions creating microscopic “big bangs”
     Very large, sophisticated detectors
       Four experiments, each the size of a cathedral
       A hundred million measurement channels each
       Data acquisition systems treating Petabytes per second
     Top-level computing to distribute and analyse the data
       A Computing Grid linking ~200 computer centres around the globe
       Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis
 36. Other non-LHC experiments at CERN
 37. Superconducting magnets – October 2008
     A faulty connection between two superconducting magnets led to the release of a large amount of helium into the LHC tunnel and forced the machine to shut down for repairs
 38. CERN Computer Centre
 39. Our Challenges – keeping up to date
 40. CPU capacity at CERN during ‘80s and ‘90s
 41. Testbed Configuration for Nova / Swift
     24 servers
     Single server configuration for both compute and storage
     Supermicro-based systems
       Intel Xeon L5520 CPUs @ 2.27GHz
       12GB memory
       10GbE connectivity
       IPMI
 42. Data Rates at Tier-0
     Typical Tier-0 bandwidth
       Average in: 2 GB/s with peaks at 11.5 GB/s
       Average out: 6 GB/s with peaks at 25 GB/s
 43. Web Site Activity
