CERN: Big Science Meets Big Data

Uploaded on


More in: Education
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Be the first to comment
    Be the first to like this
No Downloads


Total Views
On Slideshare
From Embeds
Number of Embeds



Embeds 0

No embeds

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

    No notes for slide


  • 1. CERN: Big Science Meets Big Data Dr Bob Jones CERN Bob.Jones <at>
  • 2. CERN was founded 1954: 12 European StatesToday: 20 Member States~ 2300 staff~ 790 other paid personnel> 10000 usersBudget ~1000 MCHF annual 20 Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom Candidates for Accession: Romania, Israel Observers to Council: India, Japan, the Russian Federation, the United States of America, Bob Jones - December 2012 2 2 Turkey, the European Commission and UNESCO
  • 3. Accelerating Science and Innovation Bob Jones - December 2012 3
  • 4. Data flow to permanent storage: 4-6 GB/sec200-400 MB/sec 1-2 GB/sec 1.25 GB/sec 1-2 GB/sec Bob Jones - December 2012 4
  • 5. WLCG – what and why?• A distributed computing  infrastructure to provide the  production and analysis  environments for the LHC  experiments• Managed and operated by a  worldwide collaboration between  the experiments and the  participating computer centres• The resources are distributed – for  funding and sociological reasons• Our task was to make use of the  Tier-0 (CERN): Tier-2 (~130 centres): resources available to us – no  • Data recording • Simulation matter where they are located • Initial data reconstruction • End-user analysis • Data distribution Tier-1 (11 centres):• Secure access via X509 certificates  •Permanent storage issued by network of national  •Re-processing authorities ‐ International Grid  •Analysis Trust Federation (IGTF) –  Bob Jones ‐ December  2012 5
  • 6. WLCG: Data Taking• Castor service at Tier 0 well adapted to  HI: ALICE data into Castor  > 4 GB/s (red) the load: – Heavy Ions: more than 6 GB/s to tape  (tests show that Castor can easily  support >12 GB/s); Actual limit  now  is network from experiment to CC – Major improvements in tape  efficiencies – tape writing at ~native  HI: Overall rates to tape > 6 GB/s (r+b) drive speeds. Fewer drives needed – ALICE had x3 compression for raw  data in HI runs Accumulated Data 2012 0.1 0.2 0.8 0.0 ALICE 0.8 1.3 AMS 0.9 1.2 ATLAS CMS 5.8 COMPASS 4.6 LHCB 23 PB data written in 2011 NA48
  • 7. 0 10000000 20000000 30000000 40000000 50000000 60000000 Jan‐09 Feb‐09 Mar‐09 Apr‐09 May‐09 Jun‐09 CMS LHCb ALICE Jul‐09 ATLAS Aug‐09 Sep‐09 Oct‐09 Nov‐09 Dec‐09 Jan‐10 Feb‐10 Mar‐10 1.5M jobs/day Apr‐10 May‐10 Jun‐10Ja Jul‐10 100000000 200000000 300000000 400000000 500000000 600000000 700000000 800000000 900000000 0 1E+09 n‐ 10Fe Aug‐10 b‐ 10 Sep‐10M ar ‐1 Oct‐10 0Ap Nov‐10 r‐1 0 Dec‐10M ay ‐1 Jan‐11 0 Ju Feb‐11 CMS LHCb n‐ ALICE 10 Mar‐11 Ju l‐1 ATLAS Apr‐11 0Au May‐11 g‐ 10 Jun‐11Se p‐ Jul‐11 10Oc Aug‐11 t‐ 1 0 Sep‐11No v‐ 10 Oct‐11De Nov‐11 c‐ 10 Dec‐11Ja n‐ 11 Jan‐12Fe b‐ 11M ar ‐1 1Ap r‐1 1 Overall use of WLCGM ay ‐1 1Ju n‐ 109 HEPSPEC‐hours/month 11 (~150 k CPU continuous use) Ju l ‐1 1Au g‐ 11 ‐ # jobs/daySe ‐ CPU usage p‐ 11Oc t‐ 1 1No year technical stop v‐ 11 Usage continues to De c‐ 11 grow even over end of 
  • 8. Broader Impact of the LHC  Computing Grid• WLCG has been leveraged on both sides of  the Atlantic, to benefit the wider scientific  community – Europe (EC FP7): • Enabling Grids for E‐sciencE Archeology (EGEE) 2004‐2010 Astronomy Astrophysics • European Grid Infrastructure  Civil Protection (EGI)  2010‐‐ Comp. Chemistry – USA (NSF): Earth Sciences • Open Science Grid (OSG)  Finance Fusion 2006‐2012  (+ extension?) Geophysics High Energy Physics• Many scientific applications  Life Sciences Multimedia Material Sciences Bob Jones ‐ December  2012 8 …
  • 9. How to evolve LHC data processingMaking what we have today more sustainable is a challenge• Big Data issues – Data management and access – How to make reliable and fault tolerant systems – Data preservation and open access• Need to adapt to changing technologies – Use of many‐core CPUs – Global filesystem‐like facilities – Virtualisation at all levels (including cloud computing services) • Network infrastructure – Has proved to be very reliable so invest in networks and make full  use of the distributed system Bob Jones ‐ December  2012 9
  • 10. CERN openlab in a nutshell• A science – industry partnership to drive R&D and innovation with over a decade of success• Evaluate state-of-the-art technologies in a challenging environment and improve them• Test in a research environment today what will be used in many business sectors tomorrow• Train next generation of engineers/employees CONTRIBUTOR (2012)• Disseminate results and outreach to new audiences Bob Jones - December 2012 10
  • 11. Virtuous Cycle CERN requirements push the limit Produce Apply new advanced techniques products and and services technologies Test Joint prototypes in development CERN in rapid environment cyclesA public-private partnership between the research community and industry Bob Jones - December 2012 11
  • 12. Bob Jones - December 2012 12
  • 13. A European Cloud Computing Partnership: big science teams up with big business To create an Earth  To support the  Setting up a new  Strategic Plan computing capacity  service to simplify  Observation platform,  focusing on  needs for the ATLAS  analysis of large   Establish multi‐tenant,  experiment genomes, for a deeper  earthquake and  multi‐provider cloud  volcano research insight into evolution  infrastructure and biodiversity  Identify and adopt policies  for trust, security and  privacy  Create governance  structure  Define funding schemes Twitter: HelixNebulaSC Web: 13
  • 14. Initial flagship use cases ATLAS High Energy Genomic Assembly SuperSites Exploitation Physics Cloud Use in the Cloud PlatformTo support the computing A new service to simplify To create an Earthcapacity needs for the large scale genome Observation platform,ATLAS experiment analysis; for a deeper focusing on earthquake insight into evolution and and volcano research biodiversity  Scientific challenges with societal impact  Sponsored by user organisations  Stretch what is possible with the cloud today 14 Bob Jones - December 2012
  • 15. Conclusions• Big Data Management and Analytics require a solid organisational structure at all levels • Corporate culture: our community started preparing more than a  decade before real physics data arrived• Now, the LHC data processing is under control but data rates will  continue to increase (dramatically) for years to come: Big Data at the  Exabyte scale • CERN is working with other science organisations and leading IT  companies to address our growing big‐data needs Bob Jones ‐ December  2012 15
  • 16. Accelerating Science and Innovation 16