Big Data Analytics:
Unlocking the full potential of the Large Hadron Collider.
Manuel Martín Márquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
CERN
• CERN - European Laboratory for Particle Physics
• Founded in 1954 by 12 Countries for fundamental
physics research in a post-war Europe
• Major milestone in the post-World War II recovery/reconstruction
process
3Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
CERN openlab
4
• Public-private partnership between CERN and
leading ICT companies
• Accelerate cutting-edge solutions to be used by
the worldwide LHC community
• Train the next generation of top engineers and
scientists.
5Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
6Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
A World-Wide Collaboration
How the Universe works
and what is made of…
8Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
Fundamental Research
• Why do particles have mass?
9Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
Fundamental Research
• Why is there no antimatter left in the Universe?
• Nature should be symmetrical
• What was matter like during the first second of the
Universe, right after the "Big Bang"?
• A journey towards the beginning of the Universe
gives us deeper insight.
10Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
Fundamental Research
• What is 95% of the Universe made of?
11Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
12Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
1/20/2015 Document reference 13
The Large Hadron Collider (LHC)
Largest machine in the world
27km, 6000+ superconducting magnets
Emptiest place in the solar system
High vacuum inside the magnets
Hottest spot in the galaxy
During Lead ion collisions create temperatures 100 000x hotter than the heart of the sun;
Fastest racetrack on Earth
Protons circulate 11245 times/s (99.9999991% the speed of light)
1/20/2015 Document reference 14
The Large Hadron Collider (LHC)
CERN’s Accelerator Complex
15Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
1/20/2015 Document reference 16
ATLAS Detector
150 Million of sensor
Control and detection sensors
Massive 3D camera
Capturing 40+ million collisions per second
Data rate TB per second
1/20/2015 Document reference 17
CMS Detector
Raw Data
Was a detector element hint?
How much energy?
What time?
Reconstructed Data
Particle Type
Origin
Momentum of tracks (4 vectors)
Energy in cluster (jets)
Calibration Information
18
19
CERN Control Centre
CERN Accelerator Complex is unique installation
Therefore, we have to face unique challenges
Control and Operations
Million of sensors, large number of control devices, front-end equipment, etc.
Many critical systems: Cryogenics, Vacuums, Machine Protection, etc.
Data Analytics Challenges
20
Access System,
527
Controls, 158
Cryogenics,
655
Electricity, 455
Fluids, 657
Other, 12
Heavy
Handling, 266
Safety
Systems, 233
Technical
Infrastructure,
124
LHC Corrective Intervention: 3087 / year
Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
Data Analytics Challenges
• A look into the near Future
• LHC run 2 (2015)
21Manuel Martin Marquez
2015
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
Data Analytics Challenges
• Profit from our data investment
• Extracting knowledge.
• Optimize our systems is mandatory
• Reducing and predicting faults and corrective interventions
• Increase the availability and operations efficiency
• Control and Monitoring Systems
• Proactive
• Predictive
• Intelligent
22Manuel Martin Marquez
Intel IoT Ignition Lab – Cloud and Big Data
Munich, September 17th
Data & Analytics Environment
23Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
CERN’s Data Analytics Use Cases
24Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
Network Monitoring fo
WLCG relies heavily on the u
networks
Interconnect sites and resources
Simone.Campana@cern.ch – OPENLAB Workshop
• WLCG relies heavily on the underlying networks
• Network Monitoring WLCG
• Correlation in time and topology
• Real-Time Analytics
• Root Cause Analysis
• Early warning systems
CERN’s Data Analytics Use Cases
25Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
• Intelligent Data Placement for CMS
• Resources optimization
• Minimize number of replicas
• Remove Obsolete
• Job time in data access
• Job time in data analysis
• Resources Prediction
SDC
IT- SDC D. Giordano 20 Nov 2013
CMSComputing Infrastructure
[CHEP2013 -The CMSData Manageme
CERN’s Data Analytics Use Cases
26Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
• CASTOR - CERN Advance Storage Manager
• CERN Mass Storage Solution
• Disk + Tapes
• 12k disks, 30k tapes
• Expert system
• Spot and ongoing incidents
• Predictive analysis
• Predict problem occurrences
CERN’s Data Analytics Use Cases
27Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
• CASTOR – Anomaly Detection
• Data to be transferred
• Queue data
• Prediction Model
• Real Vs Prediction
CERN’s Data Analytics Use Cases
28Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
• Control Systems
• Control system Health
• Gas Breakdown
• Predictive maintenance
• Cryogenics
• Vacuum
• Machine Protection
• Quench Detection
CERN Accelerators Control System
29Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
the API must be pre-registered. At the time of writing
there are 124 applications registered to a
heterogeneous client community, which collectively
account for an average of 5 million extraction
requests per day. Direct SQL access is not permitted.
· A generic Java GUI called TIMBER is also provided
as a means to visualize and extract logged data. The
tool is heavily used, with more than 800 active users.
Figure 2: Logging ervice architecture overview.
The Java APIs for both logging and extracting data are
significantly optimized for performance, and more
importantly – service stability. For data extraction
• Close to 1 million pre-defined signals
• Cryogenics temperatures,
• Magnetic field strengths, Power dissipation, Vacuum Pressures,
• Beam intensities and positions…etc…
• About 5 million daily/average data requests
• Throughput over 100TB/Year, 300TB in 2015
Largest Cryogenics Installation
30Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
Largest Cryogenics Installation
31Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
Largest Cryogenics Installation
32
• Study based on sectors
 L4, R4, L8 and R8.
• Sensor Outputs
• aperture order (%)
• aperture measured (%)
• Three different status:
 Faulty,
 Not faulty
 Unknown
Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
Largest Cryogenics Installation
33
• Signals used:
• S = aperture order - aperture measured
• Features extractions based on S
• Variance
• Percentile 99.9
• Rope distance – R(S)
• Noise Band – B(S)
• Automatic Faulty Valves Detection System
• SVM - Support Vector Machine
Manuel Martin Marquez
Big Data & Analytics Innovation Summit
Singapore, February 27-28
Intel_IoT_Munich

Intel_IoT_Munich

  • 2.
    Big Data Analytics: Unlockingthe full potential of the Large Hadron Collider. Manuel Martín Márquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 3.
    CERN • CERN -European Laboratory for Particle Physics • Founded in 1954 by 12 Countries for fundamental physics research in a post-war Europe • Major milestone in the post-World War II recovery/reconstruction process 3Manuel Martin Marquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 4.
    CERN openlab 4 • Public-privatepartnership between CERN and leading ICT companies • Accelerate cutting-edge solutions to be used by the worldwide LHC community • Train the next generation of top engineers and scientists.
  • 5.
    5Manuel Martin Marquez IntelIoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 6.
    6Manuel Martin Marquez IntelIoT Ignition Lab – Cloud and Big Data Munich, September 17th A World-Wide Collaboration
  • 7.
    How the Universeworks and what is made of…
  • 8.
    8Manuel Martin Marquez IntelIoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 9.
    Fundamental Research • Whydo particles have mass? 9Manuel Martin Marquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 10.
    Fundamental Research • Whyis there no antimatter left in the Universe? • Nature should be symmetrical • What was matter like during the first second of the Universe, right after the "Big Bang"? • A journey towards the beginning of the Universe gives us deeper insight. 10Manuel Martin Marquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 11.
    Fundamental Research • Whatis 95% of the Universe made of? 11Manuel Martin Marquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 12.
    12Manuel Martin Marquez IntelIoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 13.
    1/20/2015 Document reference13 The Large Hadron Collider (LHC) Largest machine in the world 27km, 6000+ superconducting magnets Emptiest place in the solar system High vacuum inside the magnets Hottest spot in the galaxy During Lead ion collisions create temperatures 100 000x hotter than the heart of the sun; Fastest racetrack on Earth Protons circulate 11245 times/s (99.9999991% the speed of light)
  • 14.
    1/20/2015 Document reference14 The Large Hadron Collider (LHC)
  • 15.
    CERN’s Accelerator Complex 15ManuelMartin Marquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 16.
    1/20/2015 Document reference16 ATLAS Detector 150 Million of sensor Control and detection sensors Massive 3D camera Capturing 40+ million collisions per second Data rate TB per second
  • 17.
    1/20/2015 Document reference17 CMS Detector Raw Data Was a detector element hint? How much energy? What time? Reconstructed Data Particle Type Origin Momentum of tracks (4 vectors) Energy in cluster (jets) Calibration Information
  • 18.
  • 19.
    19 CERN Control Centre CERNAccelerator Complex is unique installation Therefore, we have to face unique challenges Control and Operations Million of sensors, large number of control devices, front-end equipment, etc. Many critical systems: Cryogenics, Vacuums, Machine Protection, etc.
  • 20.
    Data Analytics Challenges 20 AccessSystem, 527 Controls, 158 Cryogenics, 655 Electricity, 455 Fluids, 657 Other, 12 Heavy Handling, 266 Safety Systems, 233 Technical Infrastructure, 124 LHC Corrective Intervention: 3087 / year Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28
  • 21.
    Data Analytics Challenges •A look into the near Future • LHC run 2 (2015) 21Manuel Martin Marquez 2015 Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 22.
    Data Analytics Challenges •Profit from our data investment • Extracting knowledge. • Optimize our systems is mandatory • Reducing and predicting faults and corrective interventions • Increase the availability and operations efficiency • Control and Monitoring Systems • Proactive • Predictive • Intelligent 22Manuel Martin Marquez Intel IoT Ignition Lab – Cloud and Big Data Munich, September 17th
  • 23.
    Data & AnalyticsEnvironment 23Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28
  • 24.
    CERN’s Data AnalyticsUse Cases 24Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28 Network Monitoring fo WLCG relies heavily on the u networks Interconnect sites and resources Simone.Campana@cern.ch – OPENLAB Workshop • WLCG relies heavily on the underlying networks • Network Monitoring WLCG • Correlation in time and topology • Real-Time Analytics • Root Cause Analysis • Early warning systems
  • 25.
    CERN’s Data AnalyticsUse Cases 25Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28 • Intelligent Data Placement for CMS • Resources optimization • Minimize number of replicas • Remove Obsolete • Job time in data access • Job time in data analysis • Resources Prediction SDC IT- SDC D. Giordano 20 Nov 2013 CMSComputing Infrastructure [CHEP2013 -The CMSData Manageme
  • 26.
    CERN’s Data AnalyticsUse Cases 26Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28 • CASTOR - CERN Advance Storage Manager • CERN Mass Storage Solution • Disk + Tapes • 12k disks, 30k tapes • Expert system • Spot and ongoing incidents • Predictive analysis • Predict problem occurrences
  • 27.
    CERN’s Data AnalyticsUse Cases 27Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28 • CASTOR – Anomaly Detection • Data to be transferred • Queue data • Prediction Model • Real Vs Prediction
  • 28.
    CERN’s Data AnalyticsUse Cases 28Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28 • Control Systems • Control system Health • Gas Breakdown • Predictive maintenance • Cryogenics • Vacuum • Machine Protection • Quench Detection
  • 29.
    CERN Accelerators ControlSystem 29Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28 the API must be pre-registered. At the time of writing there are 124 applications registered to a heterogeneous client community, which collectively account for an average of 5 million extraction requests per day. Direct SQL access is not permitted. · A generic Java GUI called TIMBER is also provided as a means to visualize and extract logged data. The tool is heavily used, with more than 800 active users. Figure 2: Logging ervice architecture overview. The Java APIs for both logging and extracting data are significantly optimized for performance, and more importantly – service stability. For data extraction • Close to 1 million pre-defined signals • Cryogenics temperatures, • Magnetic field strengths, Power dissipation, Vacuum Pressures, • Beam intensities and positions…etc… • About 5 million daily/average data requests • Throughput over 100TB/Year, 300TB in 2015
  • 30.
    Largest Cryogenics Installation 30ManuelMartin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28
  • 31.
    Largest Cryogenics Installation 31ManuelMartin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28
  • 32.
    Largest Cryogenics Installation 32 •Study based on sectors  L4, R4, L8 and R8. • Sensor Outputs • aperture order (%) • aperture measured (%) • Three different status:  Faulty,  Not faulty  Unknown Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28
  • 33.
    Largest Cryogenics Installation 33 •Signals used: • S = aperture order - aperture measured • Features extractions based on S • Variance • Percentile 99.9 • Rope distance – R(S) • Noise Band – B(S) • Automatic Faulty Valves Detection System • SVM - Support Vector Machine Manuel Martin Marquez Big Data & Analytics Innovation Summit Singapore, February 27-28

Editor's Notes

  • #3 Introduce myself: 8 years data engineering, data analytics (make sense)
  • #15 But LHC is just the last part of an accelerator chain. Construction reducing cost and profit from the previous infrastructure.