CERN User Story

CERN, the European Organization for Nuclear Research, is one of the world’s largest centres for scientific research. Its business is fundamental physics: finding out what the universe is made of and how it works. At CERN, accelerators such as the 27km Large Hadron Collider are used to study the basic constituents of matter. This talk reviews the challenges of recording and analysing the 25 Petabytes/year produced by the experiments, and the investigations into how OpenStack could help to deliver a more agile computing infrastructure.



  • Established by an international treaty after the Second World War as a place where scientists could work together on fundamental research. Nuclear is part of the name, but our world is particle physics.
  • Our current understanding of the universe is incomplete. A theory called the Standard Model proposes particles and forces, many of which have been experimentally observed. However, there are open questions. Why do some particles have mass and others not? The Higgs boson is a theory, but we need experimental evidence. Our theory of forces does not explain how gravity works. Cosmologists can only find 4% of the matter in the universe; we have lost the other 96%. We should have 50% matter and 50% anti-matter, so why is there an asymmetry (although it is a good thing that there is, since the two annihilate each other)? When we go back through time 13 billion years towards the Big Bang, we move back through planets, stars, atoms and protons/electrons towards a soup-like quark-gluon plasma. What were the properties of this?
  • Biggest international scientific collaboration in the world, with over 10,000 scientists from 100 countries. Annual budget around 1.1 billion USD. Funding for CERN, the laboratory itself, comes from the 20 member states in ratio to their gross domestic product. Other countries contribute to experiments, including a substantial US contribution towards the LHC experiments.
  • The LHC is CERN’s largest accelerator: a 17 mile ring 100 meters underground where two beams of particles are sent in opposite directions and collided at the 4 experiments, ATLAS, CMS, LHCb and ALICE. Lake Geneva and the airport are visible at the top of the picture to give a sense of scale.
  • CERN is more than just the LHC. CNGS: neutrinos to Gran Sasso faster than the speed of light? CLOUD: demonstrating impacts of cosmic rays on weather patterns. Anti-hydrogen atoms contained for minutes in a magnetic vessel. However, for those of you who have read Dan Brown’s Angels and Demons or seen the film, there are no maniacal monks with pounds of anti-matter running around the campus.
  • LHC was conceived in the 1980s and construction was started in 2002 within the tunnel of a previous accelerator called LEP. 6,000 magnets, weighing up to 35 tons each, were lowered down 100m shafts.
  • The ring consists of two beam pipes, with a vacuum pressure 10 times lower than on the moon, which contain the beams of protons accelerated to just below the speed of light. These go round 11,000 times per second, bent by superconducting magnets that are cooled to 2K (about -456F) by liquid helium, colder than outer space. The beams themselves have a total energy similar to a high speed train, so care needs to be taken to make sure they turn the corners correctly and don’t bump into the walls of the pipe.
  • At 4 points around the ring, the beams are made to cross where detectors the size of cathedrals, weighing up to 12,500 tonnes, surround the pipe. These are like digital cameras, but they take 100 megapixel photos 40 million times a second. This produces up to 1 petabyte/s.
  • Collisions can be visualised by the tracks left in the various parts of the detectors. With many collisions, the statistics allow particle properties such as mass and charge to be identified. This is a simple one…
  • To improve the statistics, we send round beams of multiple bunches. As they cross, there are multiple collisions as the 100 billion protons per bunch pass through each other. Software close to the detector, and later offline in the computer centre, then has to examine the tracks to understand the particles involved.
  • To get quark-gluon plasma, the material closest to the Big Bang, we also collide lead ions, which is much more intensive. The temperatures reach 100,000 times that in the sun.
  • We cannot record 1PB/s, so there are hardware filters to remove uninteresting collisions such as those whose physics we already understand. The data is then sent to the CERN computer centre for recording via 10Gbit optical connections.
  • The Worldwide LHC Computing Grid is used to record and analyse this data. The grid currently runs around 1 million jobs/day, and less than 10% of the work is done at CERN. There is an agreed set of protocols for running jobs, data distribution and accounting between all the sites, which co-operate in order to support the physicists across the globe.
  • So, to the Tier-0 computer centre at CERN. We are unusual in that we are public about our environment, as there is no competitive advantage for us. We have thousands of visitors a year coming for tours and education, and the computer centre is a popular visit. The data centre has around 2.9MW of usable power looking after 12,000 servers. In comparison, the accelerator uses 120MW, like a small town. With 64,000 disks, we have around 1,800 failing each year. This is much higher than the manufacturers’ MTBFs would suggest, which is consistent with results from Google. Servers mainly have Intel processors, with some AMD; dual core Xeon is the most common configuration.
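To put the disk failure figures in this note into perspective, the observed annual failure rate can be compared with what a datasheet MTBF would predict. The sketch below is illustrative only: it uses the 64,000 disks and 1,800 failures/year quoted in the talk, assumes a constant failure rate so that the annual failure rate is approximately hours per year divided by MTBF, and the 1,000,000 hour datasheet value is a hypothetical figure chosen for comparison, not one given in the talk.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_failure_rate(failures_per_year: float, population: int) -> float:
    """Fraction of the disk population that fails in one year."""
    return failures_per_year / population


def implied_mtbf_hours(afr: float) -> float:
    """MTBF implied by an annual failure rate (constant-rate approximation)."""
    return HOURS_PER_YEAR / afr


# Figures quoted in the talk.
observed_afr = annual_failure_rate(1_800, 64_000)   # about 2.8% per year
observed_mtbf = implied_mtbf_hours(observed_afr)    # about 311,000 hours

# Hypothetical datasheet MTBF, used only for comparison.
DATASHEET_MTBF_HOURS = 1_000_000
datasheet_afr = HOURS_PER_YEAR / DATASHEET_MTBF_HOURS

print(f"Observed AFR:        {observed_afr:.2%}")
print(f"Implied MTBF:        {observed_mtbf:,.0f} hours")
print(f"Datasheet AFR:       {datasheet_afr:.2%}")
print(f"Observed/datasheet:  {observed_afr / datasheet_afr:.1f}x")
```

Under these assumptions the observed rate comes out a few times higher than the hypothetical datasheet figure, which is the kind of gap the note describes.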
  • CERN has around 10,000 physicist programmers. Applications split into data recording, analysis and simulation. It is high throughput computing, not high performance computing: no parallel programs are required, as each collision is independent and can be farmed out using commodity networking. The majority of servers run SL (Scientific Linux), with some RHEL for Oracle databases.
  • We purchase on an annual cycle, replacing around ¼ of the servers. This purchasing is based on performance metrics such as cost per SpecInt or cost/GB. Generally, we are seeing dual core compute servers with Intel or AMD processors and bulk storage servers with 24 or 36 2TB disks. The operating system is a Red Hat Linux based distribution called Scientific Linux. We share the development and maintenance with Fermilab in Chicago. The choice of a Red Hat based distribution comes from the need for stability across the grid, where the 200 centres need to keep running compatible Linux distributions.
  • Servers get burnt in quickly, go into production and retire late. Short vs long programs can vary by up to 1 week.
  • Generally running 30,000 jobs in the Tier-0 with up to 110,000 waiting to run, especially as conferences approach and physicists prepare their last-minute analyses.
  • Our data storage system has to record and preserve 25PB/year with an expected lifetime of 20 years. Keeping the old data is required to get the maximum statistics for discoveries. At times, physicists will want to skim this data looking for new physics. Data rates are around 6GB/s average, with peaks of 25GB/s.
  • Around 60,000 tape mounts / week so the robots are kept busy
  • Our service consolidation environment is intended to allow rapid machine requests, from development servers through to full servers with live migration for production. Currently based on Hyper-V and using SCVMM, we have around 1,600 guests running a mixture of Linux and Windows.
  • Provides virtual machines to run physics jobs such that the users do not see any difference between a physical machine and a virtual one. Currently based on OpenNebula, providing EC2 APIs for experiments to investigate using clouds.
  • Can we find a model where Compute and Mass Storage reside on the same server?
  • Previous tests were performed with OpenNebula. Bottlenecks were identified within CERN’s toolchain (LDAP and batch system) rather than with the orchestrator.
  • These are items which we foresee as being potentially interesting in a few months’ time, and which we would like to discuss with other users of OpenStack to understand potential solutions.
  • Infrastructure as a Service with a vibrant open source implementation such as OpenStack can offer efficiency and agility to IT services, both private and public. As more users and companies move towards production usage, we need to balance the rapid evolution with the need for stability. As demonstrated by the World Wide Web’s evolution from a CERN project to a global presence, a set of core standards allows innovation and competition. Let’s not forget, in our enthusiasm to enhance OpenStack, that there will be more and more sites facing the classic issues of production stability and maintenance. With good information sharing amongst the community, such as at these conferences, these can be addressed.
  • Peaks of up to 25GBytes/s to handle, with averages of 6GBytes/s over the year.
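The storage figures quoted in these notes can be cross-checked with some simple arithmetic. The sketch below converts 25PB/year into an average sustained rate and estimates the size of the archive after 20 years. It assumes decimal units (1 PB = 10^15 bytes), a 365-day year and, for the archive size, a constant 25PB/year over the whole period, which is a simplification and not a projection made in the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds
BYTES_PER_PB = 10**15               # decimal petabyte (assumption)
BYTES_PER_GB = 10**9                # decimal gigabyte (assumption)


def average_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Average sustained rate needed to write a yearly volume."""
    bytes_per_second = petabytes_per_year * BYTES_PER_PB / SECONDS_PER_YEAR
    return bytes_per_second / BYTES_PER_GB


def archive_size_pb(petabytes_per_year: float, years: int) -> float:
    """Archive size if the yearly volume stayed constant (simplification)."""
    return petabytes_per_year * years


new_data_rate = average_rate_gb_per_s(25)   # roughly 0.8 GB/s
archive_after_20_years = archive_size_pb(25, 20)

print(f"Average rate for 25PB/year: {new_data_rate:.2f} GB/s")
print(f"Archive after 20 years:     {archive_after_20_years:.0f} PB")
print(f"Quoted 6GB/s average is {6 / new_data_rate:.1f}x the new-data rate")
```

The quoted 6GB/s average is several times higher than the rate needed just to write 25PB/year. One possible reading is that it also counts data being read back and exported, but that is an interpretation rather than something the notes state.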

CERN User Story: Presentation Transcript

  • Towards An Agile Infrastructure at CERN
    Tim Bell
    OpenStack Conference
    6th October 2011
  • What is CERN ?
    OpenStack Conference, Boston 2011
    Tim Bell, CERN
    Conseil Européen pour la Recherche Nucléaire – aka European Laboratory for Particle Physics
    Between Geneva and the Jura mountains, straddling the Swiss-French border
    Founded in 1954 with an international treaty
    Our business is fundamental physics and how our universe works
  • Answering fundamental questions…
    • How to explain that particles have mass?
    We have theories but need experimental evidence
    • What is 96% of the universe made of?
    We can only see 4% of its estimated mass!
    • Why isn’t there anti-matter in the universe?
    Nature should be symmetric…
    • What was the state of matter just after the « Big Bang »?
    Travelling back to the earliest instants of the universe would help…
  • Community collaboration on an international scale
  • The Large Hadron Collider
  • LHC construction
  • The Large Hadron Collider (LHC) tunnel
  • Accumulating events in 2009-2011
  • Heavy Ion Collisions
  • Tier-0 (CERN):
    • Data recording
    • Initial data reconstruction
    • Data distribution
    Tier-1 (11 centres):
    • Permanent storage
    • Re-processing
    • Analysis
    Tier-2 (~200 centres):
    • Simulation
    • End-user analysis
    • Data is recorded at CERN and Tier-1s and analysed in the Worldwide LHC Computing Grid
    • In a normal day, the grid provides 100,000 CPU days executing 1 million jobs
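The two grid figures on this slide imply an average job size, which can be worked out as below. This is a rough sketch that assumes a "CPU day" means one core kept busy for 24 hours (the slide does not define the unit) and that the work is spread evenly over the jobs, which real workloads will not be.

```python
HOURS_PER_DAY = 24


def cpu_hours_per_job(cpu_days_per_day: float, jobs_per_day: float) -> float:
    """Average CPU time per job, in hours, from daily totals."""
    return cpu_days_per_day * HOURS_PER_DAY / jobs_per_day


# Figures quoted on the slide.
avg_cpu_hours = cpu_hours_per_job(100_000, 1_000_000)

# 100,000 CPU days delivered per calendar day is equivalent to about
# 100,000 cores busy around the clock (under the assumption above).
equivalent_busy_cores = 100_000

print(f"Average job: {avg_cpu_hours:.1f} CPU hours")
print(f"Equivalent continuously busy cores: {equivalent_busy_cores:,}")
```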
  • Data Centre by Numbers
    Hardware installation & retirement
    ~7,000 hardware movements/year; ~1,800 disk failures/year
  • Our Environment
    Our users
    Experiments build on top of our infrastructure and services to deliver application frameworks for the 10,000 physicists
    Our custom user applications split into
    Raw data processing from the accelerator and export to the world wide LHC computing grid
    Analysis of physics data
    We also have standard large organisation applications
    Payroll, Web, Mail, HR, …
  • Our Infrastructure
    Hardware is generally based on commodity, white-box servers
    Open tendering process based on SpecInt/CHF, CHF/Watt and GB/CHF
    Compute nodes typically dual processor, 2GB per core
    Bulk storage on 24x2TB disk storage-in-a-box with a RAID card
    Vast majority of servers run Scientific Linux, developed by Fermilab and CERN, based on Redhat Enterprise
    Focus is on stability in view of the number of centres on the WLCG
  • Our Challenges – Compute
    Optimise CPU resources
    Maximise production lifetime of servers
    Schedule interventions such as hardware repairs and OS patching
    Match memory and core requirements per job
    Reduce CPUs waiting idle for I/O
    Conflicting software requirements
    Different experiments want different libraries
    Maintenance of old programs needs old OSes
  • Our Challenges – variable demand
  • Our Challenges - Data storage
    • 25PB/year to record
    • >20 years retention
    • 6GB/s average
    • 25GB/s peaks
  • Our Challenges – ‘minor’ other issues
    Living within a fixed envelope of 2.9MW available for computer centre
    Only 6kW/m2 without using water cooled racks (and no spare power)
    New capacity replaces old servers in same racks (as density is low)
    CERN staff headcount is fixed
    CERN IT budget reflects member states contributions
  • Server Consolidation
  • Batch Virtualisation
  • Infrastructure as a Service Studies
    CERN has been using virtualisation on a small scale since 2007
    Server Consolidation with Microsoft System Centre VM manager and Hyper-V
    Virtual batch compute farm using OpenNebula and Platform ISF on KVM
    We are investigating moving to a cloud service provider model for infrastructure at CERN
    Virtualisation consolidation across multiple sites
    Bulk storage / Dropbox / …
    Improve efficiency
    Reduce operations effort
    Ease remote data centre support
    Enable cloud APIs
  • OpenStack Infrastructure as a Service Studies
    Current Focus
    Converge the current virtualisation services into a single IaaS
    Test Swift for bulk storage, compatibility with S3 tools and resilience on commodity hardware
    Integrate OpenStack with CERN’s infrastructure such as LDAP and network databases
    Swift testbed (480TB) is being migrated to Diablo and expanded to 1PB with 10Ge networking
    48 Hypervisors running RHEL/KVM/Nova under test
  • Areas where we struggled
    Networking configuration with Cactus
    Trying out new Network-as-a-Service Quantum functions in Diablo
    Redhat distribution base
    RPMs not yet in EPEL but Grid Dynamics RPMs helped
    Puppet manifests needed adapting and came from multiple sources (OpenStack and Puppetlabs)
    Currently only testing with KVM
    We’ll try Hyper-V once Diablo/Hyper-V support is fully in place
  • OpenStack investigations : next steps
    Homogeneous servers for both storage and batch ?
  • OpenStack investigations : next steps
    Scale testing with CERN’s toolchains to install and schedule 16,000 VMs
    Previous test results performed with OpenNebula
  • OpenStack investigations : next steps
    Investigate the commodity solutions for external volume storage
    Focus is on
    Reducing performance impact of I/O with virtualisation
    Enabling widespread use of live migration
    Understanding the future storage classes and service definitions
    Supporting remote data centre use cases
  • Areas of interest looking forward
    Nova and Glance
    Scheduling VMs near to the data they need
    Managing the queue of requests when “no credit card” and no resources
    Orchestration of bare metal servers within OpenStack
    High performance transfers through the proxies without encryption
    Long term archiving for low power disks or tape
    Filling in the missing functions such as billing, availability and performance monitoring
  • Final Thoughts
    • A small project to share documents at CERN in the ‘90s created the massive phenomenon that is today’s world wide web
    • Open Source
    • Transparent governance
    • Basis for innovation and competition
    • Standard APIs where consensus
    • Stable production ready solutions
    • Vibrant eco-system
    • There is a strong need for a similar solution in the Infrastructure-as-a-Service space
    • The next year is going to be exciting for OpenStack as the project matures and faces the challenges of production deployments
  • References
  • Backup Slides
  • CERN’s tools
    The world’s most powerful accelerator: LHC
    A 27 km long tunnel filled with high-tech instruments
    Equipped with thousands of superconducting magnets
    Accelerates particles to energies never before obtained
    Produces particle collisions creating microscopic “big bangs”
    Very large sophisticated detectors
    Four experiments each the size of a cathedral
    Hundred million measurement channels each
    Data acquisition systems treating Petabytes per second
    Top level computing to distribute and analyse the data
    A Computing Grid linking ~200 computer centres around the globe
    Sufficient computing power and storage to handle 25 Petabytes per year, making them available to thousands of physicists for analysis
  • Other non-LHC experiments at CERN
  • Superconducting magnets – October 2008
    A faulty connection between two superconducting magnets led to the release of a large amount of helium into the LHC tunnel and forced the machine to shut down for repairs
  • CERN Computer Centre
  • Our Challenges – keeping up to date
  • CPU capacity at CERN during ‘80s and ‘90s
  • Testbed Configuration for Nova / Swift
    24 servers
    Single server configuration for both compute and storage
    Supermicro based systems
    Intel Xeon CPU L5520 @ 2.27GHz
    12GB memory
    10Ge connectivity
  • Data Rates at Tier-0
    Typical tier-0 bandwidth
    Average in: 2 GB/s with peaks at 11.5 GB/s
    Average out: 6 GB/s with peaks at 25 GB/s
  • Web Site Activity