Academic cloud experiences cern v4

  1. Clouds at CERN. Tim Bell, tim.bell@cern.ch. Academic Cloud Experiences, 29th April 2013.
  2. CERN was founded in 1954 by 12 European States (“Science for Peace”). Today: 20 Member States.
      • Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, the Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom
      • Candidate for Accession: Romania
      • Associate Members in Pre-Stage to Membership: Israel, Serbia
      • Applicant States for Membership or Associate Membership: Brazil, Cyprus (awaiting ratification), Pakistan, Russia, Slovenia, Turkey, Ukraine
      • Observers to Council: India, Japan, Russia, Turkey, United States of America; European Commission and UNESCO
      • ~2300 staff; ~1000 other paid personnel; > 11000 users; budget (2013) ~1000 MCHF
  3. Is the Higgs boson the source of mass of our fundamental particles?
  4. Why is the universe made of matter and not equal amounts of matter/antimatter?
  5. Dark Matter and Dark Energy? We do not know the composition of 95% of the universe. (Image: temperature of the universe, WMAP satellite.)
  6. Blue tubes contain the two beam pipes and magnets at 1.8 K.
  7. ATLAS detector during construction in 2005.
  8. Search for Higgs decays to 4 “leptons” (electrons or muons). Plot: number of candidates (vertical axis) against mass of the candidates (horizontal axis). We observe an excess of candidates with a mass of 125 proton-masses. Also observed in the CMS experiment.
  9. July 4, 2012
  10. The Worldwide LHC Computing Grid (WLCG): an international collaboration to distribute and analyse LHC data. It integrates computer centres worldwide that provide computing and storage resources into a single infrastructure accessible by all LHC physicists.
      • Tier-0 (CERN): data recording, reconstruction and distribution
      • Tier-1: permanent storage, re-processing, analysis
      • Tier-2: simulation, end-user analysis
      • > 2 million jobs/day; ~250,000 cores; 173 PB of storage; nearly 160 sites in 35 countries; 10 Gb links
  11. IT Infrastructure Challenges
      • Staff numbers fixed
      • Materials budget decreasing
      • Increasing number of users of CERN’s facilities
      • Legacy tools are high maintenance and brittle
      • Additional data centre in Budapest now online, doubling potential capacity, with a 200 Gbit/s network
      How do we scale from our current 11,000 servers within these constraints?
  12. Approach
      • Remodel IT services on cloud layered models: IaaS, PaaS, SaaS
      • Move to commonly used open source tools: Puppet, OpenStack, Foreman, Koji, Oz, Kibana, …
      • Implement clouds at scale: IT aims for 15,000 hypervisors with 150,000 VMs by 2015
      • Exploit ecosystem solutions such as LBaaS, DBaaS, MQaaS rather than build our own
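
As a rough sanity check on the 2015 target, the two figures on this slide imply an average of about ten VMs per hypervisor. A minimal Python sketch using only those two numbers; the variable names are illustrative and not taken from the talk:

```python
# Figures quoted on the slide: the 2015 target for the CERN IT cloud.
target_hypervisors = 15_000
target_vms = 150_000

# Average number of VMs each hypervisor would have to host.
vms_per_hypervisor = target_vms / target_hypervisors
print(f"{vms_per_hypervisor:.0f} VMs per hypervisor on average")
```

This is only an average; the slide does not say how VMs would actually be distributed across hardware.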
  13. Clouds in High Energy Physics
      • Long-term preservation of software and data of HEP experiments
      • Utilize special computing resources attached to the detectors
      • Simplify the management of heterogeneous in-house resources
      • Use commercial clouds for exceptional computing demands
      • Distributed cloud computing using HEP and non-HEP clouds
  14. Service Models
      • Pets are given names like pussinboots.cern.ch. They are unique, lovingly hand raised and cared for. When they get ill, you nurse them back to health.
      • Cattle are given numbers like vm0042.cern.ch. They are almost identical to other cattle. When they get ill, you get another one.
      Future application architectures tend towards Cattle, but Pet support is needed for some specific zones of the cloud.
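
The pets-versus-cattle distinction can be sketched in a few lines of Python. This is a toy model under stated assumptions, not CERN tooling: the `Server` class, `new_cattle` and `handle_failure` are hypothetical names, and the only details taken from the slide are the two hostname styles and the repair-or-replace rule.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Server:
    hostname: str
    kind: str            # "pet" or "cattle"
    healthy: bool = True


# Hypothetical counter producing numbered hostnames like vm0042.cern.ch.
_ids = count(42)


def new_cattle() -> Server:
    """Create an interchangeable, numbered VM."""
    return Server(hostname=f"vm{next(_ids):04d}.cern.ch", kind="cattle")


def handle_failure(server: Server) -> Server:
    """Pets are repaired in place; cattle are discarded and replaced."""
    if server.kind == "pet":
        server.healthy = True   # stand-in for manual repair work
        return server
    return new_cattle()         # the sick VM is simply dropped


pet = Server("pussinboots.cern.ch", "pet", healthy=False)
cow = new_cattle()
cow.healthy = False

print(handle_failure(pet).hostname)   # same host, nursed back to health
print(handle_failure(cow).hostname)   # a different, freshly created host
```

The design point is that a failed cattle VM is never inspected: the caller just receives a new one, which is what makes the model cheap to operate at scale.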
  15. Refine Service Levels?
      • Hippos are cattle with bulk storage. Useful where Cassandra or MongoDB ensures redundancy.
      • Canaries are cattle at high risk, used to give early warning of failures. Deploy early, fail fast and fix.
  16. Infrastructure Overview (diagram labels: Microsoft Active Directory, CERN DB on Demand, CERN Network Database, Account mgmt. system, Horizon, Keystone, Network, Compute, Glance, Scheduler, Cinder, Nova, CERN Block Storage provider)
  17. Dashboard using Horizon
  18. Timelines
      • Deploy as stable release becomes available in EPEL
      • Keep up to date but not too close
      • Benefit from continuous integration testing of other companies
      (Timeline, 2012 to 2013: Folsom, Sep 27, 2012; Ibex, Feb 2013; Grizzly, Apr 4, 2013; Grizzly Service, May 2013; Havana, Oct 2013; Havana Service, Nov/Dec 2013)
  19. Status
      • CERN IT OpenStack Cloud: running Folsom on around 500 hypervisors with KVM and Hyper-V; high availability using load balancing; 75 users creating around 50 new VMs/day
      • Experiment farms: CMS currently running 1,300 hypervisors with 50,000 cores using Essex; ATLAS starting to ramp up to a similar size
      • Other HEP sites moving to private cloud: Brookhaven, IN2P3, FutureGrid, NeCTAR, IHEP, …
  20. Next Steps (I)
      • Move to Grizzly: target end of May 2013
      • Enable Kerberos and X.509 authentication: avoids users having to enter passwords
      • Recycle existing hardware and scale using cells: can recycle around 100 batch machines to hypervisors per week
  21. Cells
  22. We’re not alone … Already 6 sites running more than 10,000 hypervisors according to the latest OpenStack user survey
  23. Next Steps (II)
      • Block storage for Hippos and Pets: Cinder with Ceph, NetApp or GlusterFS
      • Heat for orchestration and auto-scaling
      • Load Balancing as a Service
      • Bare-metal to bring all servers under OpenStack
      • Move Ceilometer into production: accounting by project; move to wall-clock, vCPU metering
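
To make "accounting by project" with "wall-clock, vCPU metering" concrete, here is a minimal Python sketch. It does not use the Ceilometer API; the record layout, the project labels and the function name are all assumptions made for illustration. The idea shown is that each VM is charged vCPUs multiplied by the wall-clock time it existed, regardless of how busy it was.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical usage records: (project, vCPUs, VM created, VM deleted).
records = [
    ("atlas", 4, datetime(2013, 4, 1, 0, 0), datetime(2013, 4, 1, 12, 0)),
    ("cms",   8, datetime(2013, 4, 1, 6, 0), datetime(2013, 4, 1, 9, 0)),
    ("atlas", 2, datetime(2013, 4, 2, 0, 0), datetime(2013, 4, 2, 1, 0)),
]


def vcpu_hours_by_project(rows):
    """Sum vCPUs x wall-clock hours per project, whether or not the VM was busy."""
    totals = defaultdict(float)
    for project, vcpus, start, end in rows:
        hours = (end - start).total_seconds() / 3600
        totals[project] += vcpus * hours
    return dict(totals)


print(vcpu_hours_by_project(records))
```

Wall-clock metering of this kind charges for reserved capacity rather than consumed CPU time, which fits a quota-based model where idle VMs still occupy hardware.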
  24. Cost Model: CERN computing is funded from CERN central budgets; there is no billing, but there are quotas. (Diagram labels: IT resource manager, Experiment resource managers, Project Management)
  25. Quota Management
      • What to do when quota is exceeded? No credit card
      • If capacity is not used? Spot market on low SLA conditions
      • Fair share across the cloud? Worked for supercomputers but heavy for clouds at scale
      • Bursting to public clouds an option? IT provisioned or experiment decision
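
The slide poses these as open questions rather than stating a policy. Purely as an illustration, the Python sketch below shows one possible combination of two of the ideas: a hard quota with no billing, plus a low-SLA "spot" class that lets over-quota requests use otherwise idle capacity. The `Project` class, `place_request` and the three outcome labels are hypothetical and not taken from the talk.

```python
from dataclasses import dataclass


@dataclass
class Project:
    name: str
    quota_vcpus: int
    used_vcpus: int = 0


def place_request(project: Project, vcpus: int, free_vcpus: int) -> str:
    """Return 'normal', 'spot' or 'rejected' for a request of `vcpus`."""
    if vcpus > free_vcpus:
        return "rejected"                 # the cloud has no room at all
    if project.used_vcpus + vcpus <= project.quota_vcpus:
        project.used_vcpus += vcpus       # counts against the quota
        return "normal"
    return "spot"                         # over quota: low SLA, reclaimable


demo = Project("demo", quota_vcpus=100, used_vcpus=90)
print(place_request(demo, 8, free_vcpus=500))   # fits within the quota
print(place_request(demo, 8, free_vcpus=500))   # would exceed the quota
print(place_request(demo, 8, free_vcpus=4))     # no free capacity left
```

In this sketch spot VMs are not counted against the quota, on the assumption that they can be reclaimed whenever a within-quota request needs the capacity; a real scheduler would also need that reclaim step.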
  26. Cloud of clouds: the next big step
      • What is required to get to a cloud of clouds? Federated identity; image conversion and sharing; API standardisation; SLAs; security models
      • Many initiatives investigating this at different levels: public/private bursting; private/private sharing (as the grid); homogeneous and heterogeneous
      • We will see intensive efforts in this area over the coming year
  27. Conclusions
      • Clouds provide a framework for re-engineering how IT delivers responsive services to the physicists
      • OpenStack and the ecosystem provide a suitable solution, with flexibility and the opportunity to contribute as well as benefit from the work of others
      • Migration via recycling bare-metal machines to hypervisors provides a smooth transition
      • Cloud of clouds has potential to replace grid computing models in the future
  28. Questions?
  29. BACKUP SLIDES
  30. Job Opportunities
  31. Science is getting more and more global. CERN: x staff, x fellows
