Big Data Management at CERN: The CMS Example
J.A. Coarasa
CERN, Geneva, Switzerland
DBTA Workshop on Big Data, Cloud Data Management and NoSQL
October 10th 2012, Bern, Switzerland
Outline
•  Introduction
   •  The Large Hadron Collider (LHC)
   •  The 4 big experiments
•  Data Along the Data Path:
   •  Origin of the DATA
   •  DB for the DATA
   •  Other Data
   •  The Big (experimental) DATA
J.A. Coarasa 2
•  Introduction
   •  The Large Hadron Collider (LHC)
   •  The 4 experiments
J.A. Coarasa 3
The Large Hadron Collider (LHC)
J.A. Coarasa (CERN) 4
The Large Hadron Collider (LHC)
J.A. Coarasa (CERN) 5
LHC Multipurpose Experiments
J.A. Coarasa (CERN) 6

ATLAS: A Toroidal LHC ApparatuS
CMS: Compact Muon Solenoid
LHC Specific Experiments
J.A. Coarasa (CERN) 7

ALICE (A Large Ion Collider Experiment): the ALICE Collaboration built a dedicated heavy-ion detector to study the physics of strongly interacting matter at extreme energy densities, where the formation of a new phase of matter, the quark-gluon plasma, is expected.
LHCb: study of CP violation in B-meson decays at the LHC collider.
LHC Trigger and DAQ Summary
J.A. Coarasa (CERN) 8

Experiment | No. Levels | Level-0,1,2 Rate (Hz)   | Event Size (Byte) | Readout Bandw. (GB/s) | HLT Out MB/s (Event/s)
ATLAS      | 3          | LV-1 10^5, LV-2 3x10^3  | 1.5x10^6          | 4.5                   | 300 (2x10^2)
CMS        | 2          | LV-1 10^5               | 10^6              | 100                   | O(1000) (10^2)
LHCb       | 2          | LV-0 10^6               | 3x10^4            | 30                    | 40 (2x10^2)
ALICE      | 4          | Pb-Pb 500, p-p 10^3     | 5x10^7, 2x10^6    | 25                    | 1250 (10^2), 200 (10^2)
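The readout-bandwidth column is simply the event rate seen by the event builder times the event size; a minimal cross-check in plain Python (the inputs are the rounded design figures copied from the table, not measured values):

```python
# "Readout Bandwidth" = event rate after the last hardware trigger level
# times the event size. Rounded design figures from the table above.
table = {
    #  name:   (rate into event building [Hz], event size [Byte])
    "ATLAS": (3e3, 1.5e6),   # after LV-2
    "CMS":   (1e5, 1e6),     # after LV-1
    "LHCb":  (1e6, 3e4),     # after LV-0
    "ALICE": (5e2, 5e7),     # Pb-Pb running
}
for name, (rate_hz, size_byte) in table.items():
    print(f"{name}: ~{rate_hz * size_byte / 1e9:.1f} GB/s readout")
# -> ATLAS ~4.5, CMS ~100.0, LHCb ~30.0, ALICE ~25.0 GB/s, matching the table
```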
Trigger and DAQ trends in HEP
J.A. Coarasa (CERN) 9
•  Data Along the Data Path:
   •  Origin of the DATA
   •  DB for the DATA
   •  Other Data
   •  The Big DATA
J.A. Coarasa 10
The CMS Experiment
•  The collaboration has around 4300 active members
   – 179 institutes
   – 41 countries
•  The Detector
   – Weight: 12,500 t
   – Diameter: 15 m
   – Length: 21.6 m
   – Magnetic field: 3.8 T
   – Channels: ~70,000,000
J.A. Coarasa (CERN) 11
CMS: The Data Path
J.A. Coarasa (CERN) 12

[Data-path diagram: raw data 100 Gbit/s, events 20 Gbit/s, controls 1 Gbit/s, with a link to the Tier-1 centers.]
CMS: Collisions Overview
J.A. Coarasa (CERN) 13

Collision rate: event rates of ~10^9 Hz
Event selection: ~1/10^13
CMS: Data Origin, The DAQ
J.A. Coarasa (CERN) 14

Large data volumes (~100 GB/s data flow, 20 TB/day):
–  After the Level-1 trigger, ~100 GB/s (rate ~O(100) kHz) reach the event building (2 stages, ~2000 computers).
–  The HLT filter cluster selects roughly 1 event out of 1000. Max. rate to tape: ~O(100) Hz.
⇒  The storage manager (store and forward) can sustain 2 GB/s of traffic.
⇒  Up to ~300 MB/s sustained is forwarded to the CERN Tier-0 (>20 TB/day; the arithmetic is spelled out in the sketch below).
[DAQ architecture diagram: detector front-ends feed the Readout Systems; the Event Manager and Builder Networks route data to the Builder and Filter Systems and on to the Computing Services, steered by the Level-1 Trigger and Run Control; the rate drops from 40 MHz (collisions) to 100 kHz (Level-1 accept) to 100 Hz (to storage).]
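The chain of numbers above is simple arithmetic; a minimal back-of-the-envelope sketch in plain Python (the ~1 MB event size is taken from the trigger/DAQ table earlier in the deck, not from this slide):

```python
# Back-of-the-envelope check of the CMS data-flow numbers quoted above.
l1_rate_hz    = 100e3     # ~O(100) kHz after the Level-1 trigger
event_size_b  = 1e6       # ~1 MB per event (from the trigger/DAQ table)
hlt_rejection = 1000      # HLT keeps roughly 1 event in 1000
t0_rate_mb_s  = 300       # sustained transfer to the CERN Tier-0

readout_gb_s = l1_rate_hz * event_size_b / 1e9    # ~100 GB/s into event building
tape_rate_hz = l1_rate_hz / hlt_rejection         # ~100 Hz to storage
t0_tb_day    = t0_rate_mb_s * 86400 / 1e6         # MB/s -> TB/day

print(f"event building input : ~{readout_gb_s:.0f} GB/s")
print(f"rate to storage      : ~{tape_rate_hz:.0f} Hz")
print(f"Tier-0 transfer      : ~{t0_tb_day:.0f} TB/day")  # ~26 TB/day, i.e. >20 TB/day
```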
CMS: Online Computing
J.A. Coarasa 15

More than 3000 computers, mostly running Scientific Linux CERN 5:
–  640 (2-core) for the 1st event-building stage, each equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet links for data networking (1280 cores);
–  1264 (720 8-core + 288 12-core + 256 16-core) as high-level trigger computers with 2 Gbit Ethernet links (13312 cores);
–  16 (2-core) with access to 300 TB of FC storage, with 4 Gbit Ethernet links for data networking and 2 additional ones for networking to Tier-0;
[Figure: high-bandwidth networking]
CMS: Online Computing
J.A. Coarasa 16

More than 3000 computers, mostly running Scientific Linux CERN 5:
–  640 (2-core) for the 1st event-building stage, each equipped with 2 Myrinet and 3 independent 1 Gbit Ethernet links for data networking (1280 cores);
–  1264 (720 8-core + 288 12-core + 256 16-core) as high-level trigger computers with 2 Gbit Ethernet links (13312 cores; tallied in the sketch below);
–  16 (2-core) with access to 300 TB of FC storage, with 4 Gbit Ethernet links for data networking and 2 additional ones for networking to Tier-0;
–  More than 400 used by the subdetectors;
–  90 running Windows for the Detector Control Systems;
–  12 computers as an Oracle RAC;
–  12 computers as CMS control computers;
–  50 computers as desktops in the control rooms;
–  200 computers for commissioning, integration and testing;
–  15 computers as infrastructure and access servers;
–  250 active spare computers;
⇒ Many different roles!
[Figure: high-bandwidth networking]
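The core counts quoted in the list above follow directly from the machine inventory; a quick tally in plain Python:

```python
# Tally of the core counts quoted for the two largest groups of machines.
event_builder_cores = 640 * 2                      # 1st-stage event builders -> 1280 cores
hlt_machines = 720 + 288 + 256                     # -> 1264 HLT machines
hlt_cores = 720 * 8 + 288 * 12 + 256 * 16          # 5760 + 3456 + 4096 -> 13312 cores

print(event_builder_cores, hlt_machines, hlt_cores)   # 1280 1264 13312
```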
CMS: Online Networking
CMS Networks:
–  Private networks:
   •  Service network (~3000 1 Gbit ports);
   •  Data network (~4000 1 Gbit ports)
      –  Source routing on the computers
      –  VLANs on the switches
   •  Central Data Recording (CDR) network to Tier-0
   •  Private networks for the Oracle RAC
   •  Private networks for the subdetectors
–  Public CERN network
J.A. Coarasa (CERN) 17

[Network diagram: CMS networks linking the readout, HLT and control nodes and the storage manager; computer gateways connect the service, data, CDR and CERN networks, with a firewall between the CERN network and the Internet and links to the CMS sites.]
CMS: “official” Databases
•  Configuration information
   –  Detectors, DAQ, L1 trigger, HLT…
•  Run, beam and luminosity information
   –  Info on which files are written and sent to Tier-0, eLog…
•  Offline DB also hosting computing applications
   –  Tier-0 workflow processing, Data distribution service (PhEDEx), Data Bookkeeping Service,…
•  Conditions data for offline reconstruction and analysis
   –  Critical data, exposed to a large community
J.A. Coarasa (CERN) 18
CMS Databases: Challenge
•  Over 75 million channels in the various detectors
•  Detector information for each channel
   –  Conditions: temperature, HV, LV, status…
   –  Calibration: pedestals, charge/count…
   –  Changes with time (temperature and radiation; see the interval-of-validity sketch below)
•  Necessary for performance monitoring
•  A subset is used by offline reconstruction and physics analysis
   –  Conditions data
   –  Needs to be distributed to all Tier-N centres worldwide
J.A. Coarasa (CERN) 19
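Conditions that change with time are typically stored as payloads tagged with an interval of validity (IOV), so a job processing a given run picks up the calibration that was valid then. The sketch below is a conceptual illustration in plain Python, not the CMS conditions schema; the class and field names are invented for the example.

```python
# Conceptual sketch (not the CMS schema): time-varying conditions stored as
# payloads keyed by an interval of validity (IOV), so one can ask
# "what was the calibration at run/time t?".
import bisect

class ConditionsTag:
    """Each payload is valid from its 'since' value until the next entry's 'since'."""
    def __init__(self):
        self.since = []      # sorted IOV start points (e.g. run numbers)
        self.payloads = []   # one payload per IOV

    def insert(self, since, payload):
        i = bisect.bisect_left(self.since, since)
        self.since.insert(i, since)
        self.payloads.insert(i, payload)

    def get(self, at):
        i = bisect.bisect_right(self.since, at) - 1
        if i < 0:
            raise KeyError(f"no conditions valid at {at}")
        return self.payloads[i]

pedestals = ConditionsTag()
pedestals.insert(1000, {"mean": 2.1})   # calibration valid from run 1000
pedestals.insert(1500, {"mean": 2.4})   # new calibration from run 1500
print(pedestals.get(1234))              # -> {'mean': 2.1}
```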
CMS DB Clients: Frontier
•  Offline (or HLT) reconstruction jobs could create a large load on the DBs
   –  Tens of thousands of jobs, a few hundred queries each
•  Frontier squid caches minimize the direct access to the Oracle servers
   –  Additional latency as set by the cache refresh policy (illustrated below)
   –  Frontier service for Online
      •  Used to distribute configuration and conditions to the HLT
   –  Frontier service for Offline (Tier-N)
      •  Reading from a “snapshot” of the Offline DB
      •  Heavily used for reprocessing
J.A. Coarasa (CERN) 20
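Both effects quoted above, the reduced load on Oracle and the extra latency, come from the cache refresh policy. The toy read-through cache below (plain Python, not the Frontier or squid code; TTLCache and oracle_query are illustrative names) makes the trade-off concrete: identical queries are served from the cache, and a change in the database only becomes visible once the cached entry expires.

```python
# Toy read-through cache with a time-to-live (TTL): repeated queries never reach
# the backend, but updates are only seen once the cached entry expires.
import time

class TTLCache:
    def __init__(self, backend_query, ttl_seconds=300):
        self.backend_query = backend_query   # function doing the real DB query
        self.ttl = ttl_seconds
        self.entries = {}                    # key -> (value, expiry time)

    def get(self, key):
        value, expiry = self.entries.get(key, (None, 0.0))
        if time.time() >= expiry:            # miss or stale: go to the backend
            value = self.backend_query(key)
            self.entries[key] = (value, time.time() + self.ttl)
        return value

calls = 0
def oracle_query(key):                       # stand-in for the real Oracle access
    global calls
    calls += 1
    return f"payload for {key}"

cache = TTLCache(oracle_query, ttl_seconds=300)
for _ in range(10000):                       # 10000 identical client queries...
    cache.get("ECAL_pedestals")
print(calls)                                 # ...only 1 reaches the backend
```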
CMS Databases till end of 2011
J.A. Coarasa (CERN) 21

[Architecture diagram, Oracle 10: the online database CMSONR at P5 (omds, orcon and other schemas) with an inactive Data Guard standby and the CMINTR integration database; Oracle Streams replication (main and test paths) to the offline CMSR at the CERN CC (omds, orcoff and computing schemas) with its own inactive standby, plus an orcoff snapshot and the CMSARC archive database; INT2R and INT9R serve as test instances; a firewall separates the CMS, CERN CC and off-site domains.]
CMS DB space usage
•  DB growth of about 1.5 TB/year
   – Both online and offline
•  Conditions data is only a small fraction
   – ~300 GB now
   – Growth: +20 GB/year
   – ~50 Global Tags / month
J.A. Coarasa (CERN) 22

[Chart: DB size in TB for CMSONR and CMSR, Dec 09 to Dec 11 (axis up to 6 TB).]
CMS DB operations 2011
•  Smooth running
   – CMSONR availability: 99.88%
      •  10.5 hours of downtime
   – CMSR availability: 99.64%
      •  30.7 hours of downtime (both figures are checked in the sketch below)
   – SQL query time stable (a few ms)
J.A. Coarasa (CERN) 23

[Chart: SQL query times, ~10 ms scale.] A big thank you to the CERN DBAs!
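The downtime figures are essentially the unavailable fraction of a calendar year (8760 hours); a one-line check in plain Python:

```python
# Converting availability into hours of downtime over a calendar year (8760 h).
HOURS_PER_YEAR = 365 * 24

for name, availability in [("CMSONR", 0.9988), ("CMSR", 0.9964)]:
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    print(f"{name}: ~{downtime_h:.1f} h downtime")
# CMSONR -> ~10.5 h, matching the slide; CMSR -> ~31.5 h, close to the quoted
# 30.7 h (the quoted number was presumably measured over a slightly shorter period).
```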
CMS Databases in 2012
J.A. Coarasa (CERN) 24

[Architecture diagram, Oracle 11g: same layout as until 2011, but CMSONR at P5 now also has an active Data Guard standby (omds, orcon and other schemas) and Active Data Guard replication towards the CERN CC, alongside the Oracle Streams replication to CMSR, the orcoff snapshot, the CMSARC archive and the INT2R/INT9R test instances; a firewall still separates the CMS, CERN CC and off-site domains.]
Other CMS Documents
x 4000 people … for many decades
J.A. Coarasa (CERN) 25
Other CMS Documents: Size
A printed pile of all CMS documents that are already in a managed system = 1.0 x (Empire State Building).
Plus we have almost the same amount spread all over the place (PCs, AFS, DFS, various websites…).
J.A. Coarasa (CERN) 26
Other CMS Documents: Size
[Chart: number of CMS documents, May 2012]
J.A. Coarasa (CERN) 27
CMS website: cms.web.cern.ch
•  Migrated to a new Drupal infrastructure:
   – Offers a coherent view of all CMS information
J.A. Coarasa (CERN) 28

Public site focus is fresh news:
– Text, images, links, etc.
– Keywords, groups/topics
– Author / editor workflow
– Email notifications, RSS, Twitter, FB, G+ …
– Promote to home page or features slider
– Push to selected CMS groups
LHC Offline Computing. The GRID
J.A. Coarasa (CERN) 29

Tier-0 (CERN): recording – reconstruction – distribution
Tier-1 (~10 centres): storage – reprocessing – analysis
Tier-2 (~140 centres): simulation – end-user analysis
The GRID: a distributed computing infrastructure (~150 kCores), uniting resources of HEP institutes around the world to provide seamless access to CPU and storage for the LHC experiments. A common solution for an unprecedented (in HEP) demand for computing power for physics analysis.
Scale-free Networks
J.A. Coarasa (CERN) 30

On/off-line TDAQ (and GRID) systems are, by construction, scale-free systems: they are capable of operating efficiently and of taking advantage of any additional resources that become available as they change in size or in the volume of data handled.
Other complex systems, e.g. the World Wide Web, show the same behavior. This is the result of the simple mechanism that lets networks expand by adding new vertices that attach preferentially to existing well-connected vertices (sketched below).
[Plots: performance vs. size for the on-line (TDAQ) and off-line (GRID) systems; map of the scale-free Internet (2002 snapshot).]
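The growth mechanism described above is the classic preferential-attachment picture; the minimal sketch below (plain Python, unrelated to any CERN code) grows such a network and shows the heavy-tailed degree distribution that makes it scale-free: a few highly connected hubs and many weakly connected nodes.

```python
# Minimal preferential-attachment (Barabasi-Albert style) growth sketch:
# each new vertex attaches to m existing vertices picked with probability
# proportional to their current degree.
import random
from collections import Counter

def grow_network(n_nodes=10000, m=2, seed=42):
    random.seed(seed)
    # start from a small fully connected core of m+1 nodes
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # one entry per edge endpoint: sampling uniformly from this list is
    # equivalent to picking a node with probability proportional to its degree
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(random.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

degree = Counter(v for e in grow_network() for v in e)
print("max degree   :", max(degree.values()))                        # a few big hubs...
print("median degree:", sorted(degree.values())[len(degree) // 2])   # ...most nodes stay small
```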
LHC Storage: Sizes
J.A. Coarasa (CERN) 31

[Charts "LHC Storage" / "Tape Storage": total LHC storage of 348 PB in 2012, growing to 382 PB in 2013, split between disk and tape (each series in the 169-194 PB range); per-experiment counts: ALICE 152 M, ATLAS 280 M, CMS 18 M, LHCb 5 M.]
... the storage is aggregated and virtualized by the experiment frameworks.
LHC Storage: Big Instances
J.A. Coarasa (CERN) 32

[Charts "LHC Storage", per big instance (FNAL dCache, CERN EOS ATLAS, FZK dCache, BNL): volumes of roughly 5-13 PB, 10-65 million objects, up to ~5000 storage devices and up to ~300 server nodes each.]
Big Storages: Sizes
J.A. Coarasa (CERN) 33
Big Storages: Number of Instances
J.A. Coarasa (CERN) 34
Coincidence
J.A. Coarasa (CERN) 35

1997, CERN: an LHC event-builder prototype
1997, Stanford: a Web search engine prototype
2008: the CMS HLT centre at Cessy and hundreds of off-line GRID computing centres, ~10^5 cores
2008: one of Google's data centers, ~10^6 cores
"
"
"
"
"
"
"
"
Thank you. Questions?"
J.A. Coarasa 36
