Big Data Management at CERN: The CMS Example

Big Data Management at CERN: The CMS Example Presentation Transcript

  • 1. Big Data Management at CERN: The CMS Example. J.A. Coarasa, CERN, Geneva, Switzerland. DBTA Workshop on Big Data, Cloud Data Management and NoSQL, October 10th 2012, Bern, Switzerland
  • 2. Outline
    • Introduction
      • The Large Hadron Collider (LHC)
      • The 4 big experiments
    • Data Along the Data Path:
      • Origin of the DATA
      • DB for the DATA
      • Other Data
      • The Big (experimental) DATA
  • 3. Introduction
    • The Large Hadron Collider (LHC)
    • The 4 experiments
  • 4. The Large Hadron Collider (LHC)
  • 5. The Large Hadron Collider (LHC)
  • 6. LHC Multipurpose Experiments
    • ATLAS: A Toroidal LHC ApparatuS
    • CMS: Compact Muon Solenoid
  • 7. LHC Specific Experiments
    • ALICE (A Large Ion Collider Experiment): The ALICE Collaboration built a dedicated heavy-ion detector to study the physics of strongly interacting matter at extreme energy densities, where the formation of a new phase of matter, the quark-gluon plasma, is expected.
    • LHCb: Study of CP violation in B-meson decays at the LHC collider
  • 8. LHC Trigger and DAQ Summary
    Columns: No. of trigger levels; Level-0/1/2 trigger rate (Hz); event size (Byte); readout bandwidth (GB/s); HLT output in MB/s (Event/s)
    • ATLAS: 3 levels; LV-1 10^5 Hz, LV-2 3x10^3 Hz; event size 1.5x10^6; readout 4.5 GB/s; HLT out 300 MB/s (2x10^2)
    • CMS: 2 levels; LV-1 10^5 Hz; event size 10^6; readout 100 GB/s; HLT out O(1000) MB/s (10^2)
    • LHCb: 2 levels; LV-0 10^6 Hz; event size 3x10^4; readout 30 GB/s; HLT out 40 MB/s (2x10^2)
    • ALICE: 4 levels
      – Pb-Pb: 500 Hz; event size 5x10^7; readout 25 GB/s; HLT out 1250 MB/s (10^2)
      – p-p: 10^3 Hz; event size 2x10^6; HLT out 200 MB/s (10^2)
  • 9. Trigger and DAQ trends in HEP
  • 10. Data Along the Data Path:
    • Origin of the DATA
    • DB for the DATA
    • Other Data
    • The Big DATA
  • 11. The CMS Experiment
    • The collaboration has around 4300 active members
      – 179 institutes
      – 41 countries
    • The Detector
      – Weight: 12,500 t
      – Diameter: 15 m
      – Length: 21.6 m
      – Magnetic field: 3.8 T
      – Channels: ~70,000,000
  • 12. CMS: The Data Path
    Diagram labels: Raw Data: 100 Gbit/s; Events: 20 Gbit/s; Controls: 1 Gbit/s; To Tier-1 centers; Controls: 1 Gbit/s
  • 13. CMS: Collisions Overview
    Diagram labels: Collision rate; Event Rates: ~10^9 Hz; Event Selection: ~1/10^13
  • 14. CMS: Data Origin, The DAQ
    Large data volumes (~100 Gbytes/s data flow, 20 TB/day)
    – After the Level 1 Trigger, ~100 Gbytes/s (rate ~O(100) kHz) reach the event building (2 stages, ~2000 computers).
    – The HLT filter cluster selects 1 out of 1000. Max. rate to tape: ~O(100) Hz.
    ⇒ The storage manager (stores and forwards) can sustain 2 GB/s of traffic.
    ⇒ Up to ~300 Mbytes/s sustained are forwarded to the CERN T0 (>20 TB/day).
    Diagram labels: Detector Front-end; Computing Services; Readout Systems; Builder and Filter Systems; Event Manager; Builder Networks; Level 1 Trigger; Run Control; 40 MHz, 100 kHz, 100 Hz
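The figures on slide 14 can be cross-checked with simple arithmetic. The sketch below is illustrative only: the constants are the rounded values quoted on the slide, and the function names are made up for this example.

```python
# Back-of-the-envelope check of the DAQ figures quoted on slide 14.
# All constants are the rounded values from the slide; the helper names
# are hypothetical and exist only for this illustration.

L1_RATE_HZ = 100e3               # Level 1 accept rate, ~O(100) kHz
BUILDER_FLOW_BYTES_S = 100e9     # ~100 Gbytes/s reaching the event building
HLT_ACCEPT_FRACTION = 1 / 1000   # HLT selects 1 out of 1000
T0_FORWARD_BYTES_S = 300e6       # ~300 Mbytes/s forwarded to the CERN T0
SECONDS_PER_DAY = 86_400


def implied_event_size_bytes(flow_bytes_s: float, rate_hz: float) -> float:
    """Average event size implied by a data flow and an event rate."""
    return flow_bytes_s / rate_hz


def hlt_output_rate_hz(l1_rate_hz: float, accept_fraction: float) -> float:
    """Event rate left after the HLT keeps only a fraction of L1 accepts."""
    return l1_rate_hz * accept_fraction


def daily_volume_tb(bytes_per_second: float) -> float:
    """Data volume per day in TB (10^12 bytes) at a sustained rate."""
    return bytes_per_second * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    size = implied_event_size_bytes(BUILDER_FLOW_BYTES_S, L1_RATE_HZ)
    print(f"implied event size: {size / 1e6:.1f} MB")
    rate = hlt_output_rate_hz(L1_RATE_HZ, HLT_ACCEPT_FRACTION)
    print(f"HLT output rate:    {rate:.0f} Hz")
    volume = daily_volume_tb(T0_FORWARD_BYTES_S)
    print(f"volume to T0:       {volume:.1f} TB/day")
```

With these inputs the implied event size is about 1 MB, the HLT output rate is about 100 Hz, and 300 Mbytes/s sustained for a full day comes to roughly 26 TB, consistent with the ">20 TB/day" on the slide.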
  • 15. CMS: Online Computing
    More than 3000 computers, mostly under Scientific Linux CERN 5:
    – 640 (2-core) for 1st-stage building, equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet lines for data networking (1280 cores);
    – 1264 (720 (8-core) + 288 (12-core) + 256 (16-core)) as high level trigger computers with 2 Gbit Ethernet lines (13312 cores);
    – 16 (2-core) with access to 300 TBytes of FC storage, 4 Gbit Ethernet lines for data networking and 2 additional ones for networking to Tier 0;
    Diagram label: High bandwidth networking
  • 16. CMS: Online Computing
    More than 3000 computers, mostly under Scientific Linux CERN 5:
    – 640 (2-core) for 1st-stage building, equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet lines for data networking (1280 cores);
    – 1264 (720 (8-core) + 288 (12-core) + 256 (16-core)) as high level trigger computers with 2 Gbit Ethernet lines (13312 cores);
    – 16 (2-core) with access to 300 TBytes of FC storage, 4 Gbit Ethernet lines for data networking and 2 additional ones for networking to Tier 0;
    – More than 400 used by the subdetectors;
    – 90 running Windows for Detector Control Systems;
    – 12 computers as an ORACLE RAC;
    – 12 computers as CMS control computers;
    – 50 computers as desktop computers in the control rooms;
    – 200 computers for commissioning, integration and testing;
    – 15 computers as infrastructure and access servers;
    – 250 active spare computers;
    ⇒ Many different roles!
    Diagram label: High bandwidth networking
  • 17. CMS: Online Networking
    CMS Networks:
    – Private Networks:
      • Service Network (~3000 1 Gbit ports);
      • Data Network (~4000 1 Gbit ports)
        – Source routing on computers
        – VLANs on switches
      • Central Data Recording (CDR). Network to Tier 0.
      • Private networks for Oracle RAC
      • Private networks for subdetectors
    – Public CERN Network
    Diagram labels: CMS Networks; CMS Sites; Computer gateways; Readout, HLT; Control…; Firewall; Internet; Service Network; Data Networks; CDR Network; CERN Network; Storage Manager
  • 18. CMS: “official” Databases
    • Configuration information
      – Detectors, DAQ, L1 trigger, HLT…
    • Run, Beam and Luminosity information
      – Info on which files are written and sent to Tier-0, eLog…
    • Offline DB also hosting computing applications
      – Tier-0 workflow processing, Data distribution service (PhEDEx), Data Bookkeeping Service,…
    • Conditions data for offline reconstruction and analysis
      – Critical data, exposed to a large community
  • 19. CMS Databases: Challenge
    • Over 75 million channels in various detectors
    • Detector information in each channel
      – Conditions: Temperature, HV, LV, status…
      – Calibration: pedestals, charge/count…
      – Changes with time (temperature and radiation)
    • Necessary for performance monitoring
    • Subset used by offline reconstruction and physics analysis
      – Conditions data
      – Need to distribute to all Tier-N centres worldwide
  • 20. CMS DB Clients: Frontier
    • Offline (or HLT) reconstruction jobs could create a large load on the DBs
      – Tens of thousands of jobs, a few hundred queries each
    • Frontier squid caches minimize the direct access to Oracle servers
      – Additional latency as set by the cache refresh policy
      – Frontier service for Online
        • Used to distribute configuration and conditions to HLT
      – Frontier service for Offline (Tier-N)
        • Reading from “Snapshot” from Offline DB
        • Heavily used for reprocessing
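The effect described on slide 20 (many jobs issuing the same queries, a cache in front of the database, staleness bounded by a refresh policy) can be illustrated with a generic read-through cache. This is only a sketch of the caching idea, not Frontier's or squid's actual interface: `ReadThroughCache`, `query_database`, `simulate_jobs` and the one-hour refresh interval are all hypothetical names and values chosen for the example.

```python
import time


class ReadThroughCache:
    """Generic read-through cache with a fixed refresh interval (TTL).

    Illustrative only; this is not the Frontier or squid API.
    """

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch        # function that queries the real backend
        self._ttl = ttl_seconds    # how long a cached answer stays valid
        self._clock = clock        # injectable clock, handy for testing
        self._entries = {}         # query -> (expiry time, cached result)
        self.backend_calls = 0     # number of queries that reached the backend

    def get(self, query):
        now = self._clock()
        entry = self._entries.get(query)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: serve from the cache
        result = self._fetch(query)  # missing or expired: ask the backend
        self.backend_calls += 1
        self._entries[query] = (now + self._ttl, result)
        return result


def query_database(query):
    """Hypothetical stand-in for a query against the database server."""
    return f"result of {query}"


def simulate_jobs(n_jobs, queries_per_job, ttl_seconds=3600.0):
    """Run n_jobs jobs that each issue the same set of queries.

    Returns how many queries actually reached the backend.
    """
    cache = ReadThroughCache(query_database, ttl_seconds)
    for _ in range(n_jobs):
        for q in range(queries_per_job):
            cache.get(f"conditions-query-{q}")
    return cache.backend_calls


if __name__ == "__main__":
    jobs, per_job = 1000, 200
    reached = simulate_jobs(jobs, per_job)
    print(f"{jobs * per_job} queries issued, {reached} reached the backend")
```

When every job asks for the same conditions, only the first request for each distinct query reaches the backend within one refresh interval. The price is the one named on the slide: a changed value is not seen until the cached entry expires.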
  • 21. CMS Databases till end of 2011
    Diagram labels: P5; CERN CC; Off-Site; Firewall; CMSONR (omds, orcon, other) with inactive standby; CMSR (omds, orcoff, Comput.) with inactive standby; Orcoff Snap.; CMSARC; CMSINTR; INT2R; INT9R; main; test; Oracle Streams; Oracle Data Guard; Oracle 10
  • 22. CMS DB space usage
    • DB growth about 1.5 Tbyte/year
      – Both on/off-line
    • Condition data is only a small fraction
      – ~300 Gbyte now
      – Growth: +20 Gbyte/year
      – ~50 Global Tags/month
    Chart: DB size in TB (axis 0 to 6) for CMSONR and CMSR, Dec 09 to Dec 11
  • 23. CMS DB operations 2011
    • Smooth running
      – CMSONR availability: 99.88%
        • 10.5 hours downtime
      – CMSR availability: 99.64%
        • 30.7 hours downtime
      – SQL query time stable (few msec)
    Chart label: 10 ms
    Big thanks to the CERN DBAs!
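The availability figures on slide 23 follow from the quoted downtime. The sketch below assumes the reference period is the full calendar year 2011 (365 days), which the slide does not state; `availability_percent` is a name made up for this example.

```python
# Availability implied by a given downtime.
# Assumption (not stated on the slide): the reference period is the whole
# of 2011, i.e. 365 days of 24 hours.
HOURS_IN_2011 = 365 * 24


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_IN_2011) -> float:
    """Percentage of the period during which the service was up."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


if __name__ == "__main__":
    for name, downtime in (("CMSONR", 10.5), ("CMSR", 30.7)):
        print(f"{name}: {availability_percent(downtime):.3f}% available")
```

Under that assumption the results agree with the quoted 99.88% and 99.64% to within about 0.01 percentage points. The small difference for CMSR may come from rounding or from a slightly different reference period.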
  • 24. CMS Databases in 2012
    Diagram labels: P5; CERN CC; Off-Site; Firewall; CMSONR (omds, orcon, other) with inactive standby and active standby; CMSR (omds, orcoff, Comput.) with inactive standby; Orcoff Snap.; CMSARC; CMSINTR; INT2R; INT9R; main; test; Oracle Streams; Oracle Data Guard; Active Data Guard; Oracle 11g
  • 25. Other CMS Documents
    x 4000 people … for many decades
  • 26. Other CMS Documents: Size
    A printed pile of all CMS documents that are already in a managed system = 1.0 x (Empire State Building). Plus we have almost the same amount spread all over the place (PCs, afs, dfs, various websites …)
  • 27. Other CMS Documents: Size
    Chart: No. of CMS Documents, May 2012
  • 28. CMS website: cms.web.cern.ch
    • Migrated to a new Drupal infrastructure:
      – Offers a coherent view of all CMS information
    Public site focus is fresh news
    – Text, images, links, etc.
    – Keywords, groups/topics
    – Author / editor workflow
    Email notifications, RSS, Twitter, FB, G+ …
    Promote to home page or features slider
    Push to selected CMS groups
  • 29. LHC Offline Computing. The GRID
    Tier-0 (CERN): recording – reconstruction – distribution
    Tier-1 (~10 centres): storage – reprocessing – analysis
    Tier-2 (~140 centres): simulation – end-user analysis
    The GRID. A distributed computing infrastructure (~150 kCores), uniting resources of HEP institutes around the world to provide seamless access to CPU and storage for the LHC experiments. A common solution for an unprecedented demand (in HEP) of computing power for physics analysis.
  • 30. Scale-free Networks
    On/Off-line TDAQ (and GRID) systems are, by construction, scale-free systems; they are capable of operating efficiently, taking advantage of any additional resources that become available or as they change in size or volume of data handled.
    Other complex systems, e.g. the World Wide Web, show the same behavior. This is the result of the simple mechanism that allows networks to expand by the addition of new vertices which are attached to existing well-connected vertices.
    Diagram labels: On-line (TDAQ); Off-line (GRID); Size; Performance; Scale-free internet (2002 snapshot)
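The growth mechanism described on slide 30, where new vertices attach to vertices that are already well connected, is commonly modelled as preferential attachment. The sketch below follows the Barabási-Albert style of that model as an assumption; the slide does not name a specific model. The function names, the two edges per new vertex and the fixed seed are illustrative choices.

```python
import random
from collections import Counter


def preferential_attachment(n_vertices, edges_per_vertex=2, seed=0):
    """Grow a graph in which each new vertex links to existing vertices
    with probability proportional to their current degree.

    Returns the list of edges as (new_vertex, existing_vertex) pairs.
    """
    rng = random.Random(seed)
    m = edges_per_vertex
    # Start from a small fully connected core of m + 1 vertices.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each vertex appears in this list once per incident edge, so a uniform
    # draw from it picks a vertex with probability proportional to degree.
    endpoints = [v for edge in edges for v in edge]
    for new in range(m + 1, n_vertices):
        targets = set()
        while len(targets) < m:          # m distinct existing vertices
            targets.add(rng.choice(endpoints))
        for target in targets:
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def degree_counts(edges):
    """Map each vertex to its number of incident edges."""
    return Counter(v for edge in edges for v in edge)


if __name__ == "__main__":
    graph = preferential_attachment(1000)
    degrees = degree_counts(graph)
    print("edges:", len(graph))
    print("five best-connected vertices:", degrees.most_common(5))
```

Printing the best-connected vertices typically shows a few early vertices with far more links than the rest, which is the hub structure associated with scale-free networks.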
  • 31. LHC Storage: Sizes
    Chart: LHC storage, disk and tape, 0 PB to 400 PB; totals 348 PB (2012) and 382 PB (2013). Tape storage panel: ALICE 152 M, ATLAS 280 M, CMS 18 M, LHCb 5 M.
    ... the storage is aggregated and virtualized by experiment frameworks
  • 32. LHC Storage: Big Instances
    Charts: Volume (PB), Objects (Mio), Devices and Server Nodes for the instances FNAL dCache, CERN EOSATLAS, FZK dCache and BNL
  • 33. Big Storages: Sizes
  • 34. Big Storages: Number of Instances
  • 35. Coincidence
    1997, CERN: an LHC event builder prototype
    1997, Stanford: a Web search engine prototype
    2008: The CMS HLT center at Cessy and hundreds of Off-line GRID computing centres, 10^5 cores
    2008: One of Google's data centers, 10^6 cores
  • 36. Thank you. Questions?