
Big Data Analytics in High-Energy Physics


There are four key issues to overcome if you want to tame Big Data: volume, variety, velocity and veracity. You have to be able to deal with lots and lots of data, of all kinds, moving really quickly, and not always of reliable quality.

Big Data Analytics has a huge impact on how we plan CERN’s overall technology strategy, as well as specific strategies for High-Energy Physics analysis. We want to profit from our investment in data and extract the knowledge it holds. This has to be done in a proactive, predictive and intelligent way.

This presentation shows how we use Big Data Analytics to improve the operation of the Large Hadron Collider. See also: http://alexloth.com/2012/06/03/challenges-big-data-analytics-high-energy-physics/


  1. Big Data Analytics in High-Energy Physics
     Alexander Loth, CERN, 23 May 2012
     CERN, CH-1211 Geneva 23, Switzerland, www.cern.ch
  2. CERN
     • CERN is the European Organization for Nuclear Research
     • Founded in 1954 by 12 countries for fundamental physics
     • Today: the global effort of 21 member states
       – About 1 billion CHF yearly budget
       – 3300 employees
     • Supporting the research activities of ~10,000 scientists of 110+ nationalities
  3. Fundamental Research at CERN
     • Why do particles have mass?
     • Why is there no antimatter left in the universe?
     • What was the state of the universe just after the Big Bang?
  4. CERN Accelerator Complex
  5. Potential of Big Data Analytics
     Aims:
       – Reduce and predict faults and corrective interventions
       – Increase availability and operational efficiency
     • Stage 4: WISDOM (decisions)
     • Stage 3: KNOWLEDGE GENERATION (predictions, reporting, visualization)
     • Stage 2: INFORMATION RETRIEVAL (queries, statistics, analysis)
     • Stage 1: DATA COLLECTION AND STORAGE (data integration, data merging, ETL)
     Data sources: control and monitoring systems
     Approach: proactive, predictive, intelligent
  6. What about Business Intelligence?
     • Traditional BI: GBs to TBs of data; operational sources; structured; repetitive
     • Big Data Analytics: TBs to EBs of data; external + operational sources; unstructured or semi-structured; ad hoc
  7. Challenges of Big Data Analytics
     • VOLUME (scale of data): in 2011 humankind created 1200 EB of information
     • VELOCITY (analysis of streaming data): worldwide digital content will double every 18 months
     • VARIETY (different forms of data): 80% of data is unstructured
     • VERACITY (uncertainty of data): poor data quality costs $3.1 trillion a year
     • CERN: 22 PB/year, peaking at 20 GB/s, with writing spread across 80 tape drives
     Sources: The Economist, Gartner, IDC, McKinsey
  8. Big Data Analytics Use Cases
  9. Why use Hadoop at CERN?
     • The system should manage and heal itself
       – Automatically and transparently route around failures
       – Speculatively execute redundant tasks if certain nodes are detected to be slow
     • Performance should scale linearly
       – Capacity changes in proportion to the resources added or removed
     • Computation should move to the data
       – Lower latency and less network bandwidth
     • Simple core that is modular and extensible
  10. Hadoop Clusters at CERN
      • CASTOR cluster with ~10 servers
        – ~100 GB of logs per day
        – >100 TB of logs in total
      • ATLAS cluster with ~20 servers
        – Event index catalogue for experimental data in the Grid
      • Monitoring cluster with ~10 servers
        – Log events from the CERN computer cluster
  11. Metadata from Physics Events (1)
      • Metadata are created when a physics event is recorded
      • Example 1: Event information
        – Run number, event number
        – Timestamp
        – Luminosity block number
        – Trigger that selected the event, etc.
  12. Metadata from Physics Events (2)
      • Metadata are created when a physics event is recorded
      • Example 2: Tape storage event log
        – On which tape is my file stored?
        – Is there a copy on disk?
        – List all events for a given tape or drive
        – Was the tape repacked?
  13. Questions?
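The CERN figures on slide 7 can be cross-checked with simple arithmetic. This sketch makes two assumptions. First, it uses decimal units (1 PB = 10^15 bytes) and a 365-day year. Second, it assumes the load is spread evenly over the drives. Under those assumptions, 22 PB/year is an average of roughly 0.7 GB/s, so the 20 GB/s peak is close to 30 times the average. At that peak, each of the 80 tape drives would have to absorb about 250 MB/s.

```python
# Back-of-the-envelope check of the CERN data rates quoted on slide 7.
# Assumptions: decimal SI units (1 PB = 10**15 B, 1 GB = 10**9 B),
# a 365-day year, and load spread evenly across all tape drives.

SECONDS_PER_YEAR = 365 * 24 * 3600  # ignores leap years


def average_rate_gb_per_s(petabytes_per_year: float) -> float:
    """Convert a yearly volume in PB into an average rate in GB/s."""
    bytes_per_year = petabytes_per_year * 10**15
    return bytes_per_year / SECONDS_PER_YEAR / 10**9


def per_drive_rate_mb_per_s(peak_gb_per_s: float, drives: int) -> float:
    """Write rate each drive must sustain if the peak is shared evenly."""
    return peak_gb_per_s * 1000 / drives


avg = average_rate_gb_per_s(22)                # about 0.70 GB/s on average
per_drive = per_drive_rate_mb_per_s(20, 80)    # 250 MB/s per drive at peak
print(f"average: {avg:.2f} GB/s, peak/average: {20 / avg:.0f}x, "
      f"per drive: {per_drive:.0f} MB/s")
```

Real tape writing is bursty and drives are not equally loaded, so these numbers are only an order-of-magnitude check.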
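Slide 9 lists speculative execution as one reason for choosing Hadoop. The plain-Python sketch below illustrates the idea. A task that has run for a while but trails the average progress of its peers by more than a fixed margin is flagged for a redundant backup copy on another node. The `TaskStatus` type, the 0.2 progress gap and the 60-second minimum runtime are illustrative assumptions. They are not Hadoop's scheduler code or configuration.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TaskStatus:
    task_id: str
    node: str
    progress: float   # fraction complete, 0.0 to 1.0
    runtime_s: float  # seconds since the task started


def speculative_candidates(tasks, gap=0.2, min_runtime_s=60.0):
    """Return unfinished tasks that look slow enough to deserve a backup copy.

    Hypothetical rule: a task qualifies if it has run for at least
    min_runtime_s seconds and its progress is more than `gap` below the
    average progress of all tasks in the job.
    """
    avg = mean(t.progress for t in tasks)
    return [
        t for t in tasks
        if t.progress < 1.0
        and t.runtime_s >= min_runtime_s
        and t.progress < avg - gap
    ]


tasks = [
    TaskStatus("m1", "node-a", 0.90, 120),
    TaskStatus("m2", "node-b", 0.85, 120),
    TaskStatus("m3", "node-c", 0.30, 120),  # lagging, e.g. on a slow node
    TaskStatus("m4", "node-d", 0.10, 30),   # started late, too early to judge
]
for t in speculative_candidates(tasks):
    print(f"launch backup of {t.task_id} (running on {t.node})")
```

Whichever copy finishes first wins and the other is discarded, which is why redundant execution lets the system route around a slow node without manual intervention.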
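Slide 10 describes Hadoop clusters that mainly process log data. Hadoop distributes map, shuffle and reduce phases across a cluster. The code below runs those phases in a single process to count log lines per host and severity. The log line layout (timestamp, host, severity, message) and the sample lines are made up for illustration. They are not the CASTOR or monitoring log format.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Emit one ((host, severity), 1) pair per well-formed log line."""
    parts = line.split(maxsplit=3)
    if len(parts) < 3:
        return []  # skip malformed lines
    _timestamp, host, severity = parts[:3]
    return [((host, severity), 1)]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Made-up sample lines in the assumed format.
logs = [
    "2012-05-23T10:00:01 diskserver01 ERROR checksum mismatch",
    "2012-05-23T10:00:02 diskserver01 INFO transfer complete",
    "2012-05-23T10:00:05 diskserver02 ERROR timeout",
    "2012-05-23T10:00:09 diskserver01 ERROR checksum mismatch",
]
print(run_job(logs))
```

On a real cluster the map function runs on the nodes that already hold the input blocks, which is the "computation should move to the data" point from slide 9.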
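The event information listed on slide 11 maps naturally onto a small record type and an index keyed by run and event number. This is a minimal in-memory sketch. The field names, the `EventCatalogue` class and the trigger names are hypothetical. They do not describe the schema of the ATLAS event index catalogue mentioned on slide 10.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical metadata written when a physics event is recorded."""
    run_number: int
    event_number: int
    timestamp: datetime
    lumi_block: int            # luminosity block number
    triggers: tuple[str, ...]  # trigger(s) that selected the event


class EventCatalogue:
    """In-memory index of event metadata keyed by (run number, event number)."""

    def __init__(self) -> None:
        self._by_id: dict[tuple[int, int], EventRecord] = {}

    def add(self, record: EventRecord) -> None:
        self._by_id[(record.run_number, record.event_number)] = record

    def lookup(self, run_number: int, event_number: int) -> EventRecord | None:
        return self._by_id.get((run_number, event_number))

    def events_with_trigger(self, run_number: int, trigger: str) -> list[int]:
        """Sorted event numbers in a run that were selected by `trigger`."""
        return sorted(
            r.event_number
            for r in self._by_id.values()
            if r.run_number == run_number and trigger in r.triggers
        )


catalogue = EventCatalogue()
t0 = datetime(2012, 5, 23, 10, 0, tzinfo=timezone.utc)
catalogue.add(EventRecord(1001, 1, t0, 12, ("TRIGGER_A",)))
catalogue.add(EventRecord(1001, 2, t0, 12, ("TRIGGER_A", "TRIGGER_B")))
catalogue.add(EventRecord(1002, 7, t0, 3, ("TRIGGER_B",)))

print(catalogue.events_with_trigger(1001, "TRIGGER_B"))  # [2]
print(catalogue.lookup(1002, 7).lumi_block)              # 3
```

A dictionary keyed by (run, event) gives constant-time lookup of a single event. Questions such as "which events fired a given trigger" need a scan here. At scale they would need a secondary index, which is the kind of workload a Hadoop-based catalogue is meant to handle.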
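The four questions on slide 12 become simple filters once tape operations are logged as structured records. The sketch makes three assumptions. It uses a hypothetical `TapeLogEntry` record. Its `action` field takes the made-up values `"write"`, `"disk_copy"` and `"repack"`. The log is in chronological order. The slides do not show the real CASTOR log schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TapeLogEntry:
    """Hypothetical tape storage log record (not the real CASTOR schema)."""
    file_id: str
    tape_id: str   # tape volume involved in the operation
    drive_id: str  # drive that performed the operation
    action: str    # assumed values: "write", "disk_copy", "repack"


def current_tape(log, file_id):
    """On which tape is my file stored? Uses the latest 'write' entry."""
    tape = None
    for entry in log:  # log assumed to be in chronological order
        if entry.file_id == file_id and entry.action == "write":
            tape = entry.tape_id
    return tape


def has_disk_copy(log, file_id):
    """Is there a copy on disk?"""
    return any(e.file_id == file_id and e.action == "disk_copy" for e in log)


def entries_for(log, tape_id=None, drive_id=None):
    """List all log entries for a given tape and/or drive."""
    return [
        e for e in log
        if (tape_id is None or e.tape_id == tape_id)
        and (drive_id is None or e.drive_id == drive_id)
    ]


def was_repacked(log, tape_id):
    """Was the tape repacked?"""
    return any(e.tape_id == tape_id and e.action == "repack" for e in log)


log = [
    TapeLogEntry("f1", "T001", "D01", "write"),
    TapeLogEntry("f2", "T001", "D01", "write"),
    TapeLogEntry("f1", "T001", "D02", "disk_copy"),  # f1 staged from T001 to disk
    TapeLogEntry("f1", "T001", "D03", "repack"),     # T001 repacked...
    TapeLogEntry("f1", "T002", "D03", "write"),      # ...f1 rewritten to T002
]

print(current_tape(log, "f1"))                 # T002
print(has_disk_copy(log, "f2"))                # False
print(len(entries_for(log, drive_id="D01")))   # 2
print(was_repacked(log, "T001"))               # True
```

Each function is a full scan of the log. That is fine for a sketch, but with ~100 GB of logs per day (slide 10) these filters are exactly the kind of query one would push down to a distributed job instead.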
