Marco Cattaneo "Event data processing in LHCb"




Seminar "Using modern information technologies to solve current problems in particle physics" at the Yandex Moscow office, 3 July 2012

Marco Cattaneo, CERN


  1. Event Data Processing in LHCb
     Marco Cattaneo, CERN – LHCb
     On behalf of the LHCb Computing Group
  2. LHC interactions
     - LHC: two proton beams of ~1380 bunches, rotating at 11 kHz
       - ~15 MHz crossing rate (30 MHz in 2015)
       - Average 1.5 interactions per crossing in LHCb
     - Each crossing contains a potentially interesting “event”
  3. Typical event in LHCb
     - Raw event size ~60 kB
  4. Data reduction in real time (trigger)
     - pp collisions: 15 MHz
     - Level-0: custom hardware, partial detector information → 1 MHz
     - HLT: CPU farm (software trigger), full detector information,
       reconstruction in real time (~20k jobs in parallel),
       ~200 independent “lines” → ~5 kHz
     - Offline storage: 300 MB/s, 1.3 PB in 2012
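The cascade above can be checked with a back-of-envelope sketch. The rates come from the slide; the function and stage names are illustrative only.

```python
# Reduction factors between the trigger stages described on the slide.
def reduction_chain(stages):
    """Return the reduction factor between consecutive stage rates (in Hz)."""
    return [stages[i][1] / stages[i + 1][1] for i in range(len(stages) - 1)]

stages = [
    ("pp collisions", 15e6),  # crossing rate seen by LHCb
    ("Level-0",       1e6),   # custom hardware, partial detector info
    ("HLT",           5e3),   # software trigger farm, ~20k parallel jobs
]
factors = reduction_chain(stages)
# Level-0 rejects 14 of every 15 crossings; the HLT keeps ~1 in 200.
print(factors)        # [15.0, 200.0]

# Offline storage: ~60 kB raw events at ~5 kHz gives ~300 MB/s, as stated.
rate_mb_s = 5e3 * 60e3 / 1e6
print(rate_mb_s)      # 300.0
```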
  5. Event Reconstruction
     - On the full event sample (~2 billion events expected in 2012):
       - Pattern recognition to measure particle trajectories
         (identify vertices, measure momentum)
       - Particle ID to measure particle types
     - ~2 sec/event, ~5k concurrent jobs (run on the grid)
     - Reconstructed data stored in “FULL DST”
       - DST event size ~100 kB → 2 PB in 2012 (includes RAW)
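A rough throughput estimate follows from the slide's numbers (~2 billion events, ~2 s/event, ~5k concurrent grid jobs); this is a sketch, not an official LHCb figure.

```python
# Wall-clock time for one reconstruction pass at the quoted rates.
n_events   = 2e9    # events expected in 2012
sec_per_ev = 2.0    # reconstruction time per event
n_jobs     = 5e3    # concurrent grid jobs

wall_days = n_events * sec_per_ev / n_jobs / 86400
print(round(wall_days, 1))   # roughly nine days of continuous running
```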
  6. Data reduction offline (stripping)
     - Any given physics analysis selects 0.1–1.0% of events
       - Inefficient to allow individual physicists to run selection jobs over the FULL DST
     - All selections are grouped in a single selection pass, executed by a central production team
       - Runs over the FULL DST
       - Executes ~800 independent stripping “lines” (~0.5 sec/event in total)
       - Writes out only events selected by one or more of these lines
       - Output events are grouped into ~15 streams; each stream selects 1–10% of events
     - Overall data volume reduction: ×50 – ×500 depending on the stream
       - Few TB (<50) per stream, replicated at several places
       - Accessible to physicists for data analysis
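The stripping pass can be sketched as a loop that runs every selection line on each event and routes selected events to streams. The line and stream names below are invented for illustration; the real lines are physics selections.

```python
# Minimal sketch of a stripping pass: ~800 lines grouped into ~15 streams.
def strip(events, lines):
    """lines: {stream_name: [predicate, ...]}. An event is written to a
    stream if any of that stream's lines fires; unselected events are dropped."""
    streams = {name: [] for name in lines}
    for ev in events:
        for stream, predicates in lines.items():
            if any(p(ev) for p in predicates):
                streams[stream].append(ev)
    return streams

lines = {
    "Dimuon":  [lambda ev: ev["n_muons"] >= 2],
    "Bhadron": [lambda ev: ev["has_b_candidate"]],
}
events = [
    {"n_muons": 2, "has_b_candidate": False},
    {"n_muons": 0, "has_b_candidate": True},
    {"n_muons": 0, "has_b_candidate": False},  # selected by no line: dropped
]
out = strip(events, lines)
print(len(out["Dimuon"]), len(out["Bhadron"]))   # 1 1
```

Note that an event firing lines in two streams would be written to both, which is why the streams overlap slightly in total volume.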
  7. Simulation
     LHCb Monte Carlo simulation software:
     - Simulation of the physics event
     - Detailed detector and material description (GEANT)
     - Pattern recognition, trigger simulation and offline event selection
     - Implements detector inefficiencies, noise hits, effects of multiple collisions
  8. Resources for simulation
     - Simulation jobs require 1–2 minutes per event
       - Several billion events per year required
       - Runs on ~20k CPUs worldwide
     - Yandex contribution ~25%
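A quick consistency check on these numbers: ~20k CPUs producing events at 1–2 minutes each should indeed yield "several billion" per year. The midpoint of 90 s/event is my assumption.

```python
# Simulation capacity at the quoted scale.
cpus       = 20_000
sec_per_ev = 90        # assumed midpoint of the 1-2 minute range
sec_per_yr = 3.15e7    # seconds in a year

events_per_year = cpus * sec_per_yr / sec_per_ev
print(f"{events_per_year:.1e}")   # 7.0e+09
```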
  9. Physics Applications Software Organization
     - Applications (high level triggers, reconstruction, simulation,
       analysis): built on top of frameworks and implementing the required
       algorithms
     - Frameworks: one framework for basic services + various specialized
       frameworks (detector description, visualization, persistency,
       interactivity, simulation, etc.)
     - Toolkits
     - Foundation libraries: a series of widely used basic libraries
       (Boost, GSL, Root, etc.)
  10. Gaudi Framework (Object Diagram)
      [Diagram: the Application Manager coordinates Algorithms via the Event
      Selector, Message Service and JobOptions Service; transient stores
      (Event, Detector, Histogram) are served by the corresponding data
      services and backed by persistency services and Converters that read
      and write files; further services include the Particle Properties
      Service and other services.]
  11. The LHCb Computing Model
  12. LHCb Workload Management System: DIRAC
      - DIRAC forms an overlay network over the user community’s resources
        - A way to achieve grid interoperability for a given community,
          e.g. between Grid A (WLCG) and Grid B (NDG)
        - Needs a specific Agent Director per resource type
      - From the user perspective, all the resources are seen as a single
        large “batch system”
  13. DIRAC WMS
      - Jobs are submitted to the DIRAC Central Task Queue
      - VRC policies are applied here by prioritizing jobs in the queue
      - Pilot Jobs are submitted by specific Directors to various grids or
        computer clusters
        - Allows aggregating various types of computing resources
          transparently for the users
      - The Pilot Job gets the most appropriate user job
      - Jobs run in a verified environment with high efficiency
  14. DIRAC
      - Live DIRAC display
  15. Data replication
      - Active data placement
      - Split disk (for analysis) and archive
        - No disk replica for RAW and FULL DST (read few times)
      - Replication flow:
        RAW (CERN + 1 Tier1) → Brunel (reconstruction) → FULL DST (1 Tier1)
        → DaVinci (stripping and streaming) → DST1 (1 Tier1, scratch)
        → Merge → merged DST1 (CERN + 3 Tier1s)
  16. Datasets
      - Granularity at the file level
        - Data Management operations (replicate, remove replica, delete file)
        - Workload Management: input/output files of jobs
      - LHCbDirac perspective
        - DMS and WMS use Logical File Names (LFNs) to reference files
        - The LFN namespace refers to the origin of the file
          - Constructed by the jobs (uses production and job number)
          - Hierarchical namespace for convenience
          - Used to define file classes (tape-sets) for RAW, FULL.DST, DST
          - GUID used for internal navigation between files (Gaudi)
      - User perspective
        - A file is part of a dataset (consistent for physics analysis)
        - Dataset: specific conditions of data, processing version and
          processing level
          - Files in a dataset should be exclusive and consistent in quality
            and content
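A hierarchical LFN built from the production and job numbers, as described above, can be sketched like this. The path layout is inferred from the example file name shown on the event-indexing slide and is not an official specification.

```python
# Sketch of LFN construction from production and job numbers.
def make_lfn(activity, file_type, production, job, ext):
    """Zero-padded production/job numbers give a browsable hierarchy;
    the middle directory buckets jobs in groups of 10000."""
    return (f"/lhcb/LHCb/{activity}/{file_type}/"
            f"{production:08d}/{job // 10000:04d}/"
            f"{production:08d}_{job:08d}_1.{ext}")

lfn = make_lfn("Collision11", "DIMUON.DST", 13016, 37, "dimuon.dst")
print(lfn)
# /lhcb/LHCb/Collision11/DIMUON.DST/00013016/0000/00013016_00000037_1.dimuon.dst
```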
  17. Replica Catalog (1)
      - Logical namespace
        - Reflects somewhat the origin of the file (run number for RAW,
          production number for output files of jobs)
        - File type also explicit in the directory tree
      - Storage Elements
        - Essential component in the DIRAC DMS
        - Logical SEs: several DIRAC SEs can physically use the same hardware
          SE (same instance, same SRM space)
        - Described in the DIRAC configuration
          - Protocol, endpoint, port, SAPath, Web Service URL
          - Allows autonomous construction of the SURL:
            SURL = srm:<endPoint>:<port><WSUrl><SAPath><LFN>
        - SRM spaces at Tier1s
          - Used to have as many SRM spaces as DIRAC SEs, now only 3:
            - LHCb-Tape (T1D0): custodial storage
            - LHCb-Disk (T0D1): fast disk access
            - LHCb-User (T0D1): fast disk access for user data
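The SURL recipe on the slide concatenates fields from the DIRAC SE configuration with the LFN. A minimal sketch, using the standard `srm://` URL scheme; the endpoint values below are invented placeholders, not a real site's configuration.

```python
# Autonomous SURL construction from a DIRAC SE description plus an LFN,
# following SURL = srm:<endPoint>:<port><WSUrl><SAPath><LFN>.
se = {
    "endPoint": "srm.example-tier1.org",      # placeholder endpoint
    "port":     8443,
    "wsUrl":    "/srm/managerv2?SFN=",
    "saPath":   "/castor/example.org/grid",
}

def make_surl(se, lfn):
    return f"srm://{se['endPoint']}:{se['port']}{se['wsUrl']}{se['saPath']}{lfn}"

surl = make_surl(se, "/lhcb/test.raw")
print(surl)
# srm://srm.example-tier1.org:8443/srm/managerv2?SFN=/castor/example.org/grid/lhcb/test.raw
```

Because the SURL is derived, only the DIRAC SE name needs to be stored per replica; the physical URL can always be rebuilt from configuration.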
  18. Replica Catalog (2)
      - Currently using the LFC
        - Master write service at CERN
        - Replication using Oracle Streams to Tier1s
        - Read-only instances at CERN and Tier1s (mostly for redundancy,
          no need for scaling)
      - LFC information:
        - Metadata of the file
        - Replicas
          - Use the “host name” field for the DIRAC SE name
          - Store the SURL of creation for convenience (not used);
            allows lcg-util commands to work
        - Quality flag: one-character comment used to temporarily mark a
          replica as unavailable
      - Testing scalability of the DIRAC File Catalog
        - Built-in storage usage capabilities (per directory)
  19. Bookkeeping Catalog (1)
      - User selection criteria
        - Origin of the data (real or MC, year of reference),
          e.g. LHCb/Collision12
        - Conditions for data taking or simulation (energy, magnetic field,
          detector configuration…), e.g. Beam4000GeV-VeloClosed-MagDown
        - Processing Pass: the level of processing (reconstruction,
          stripping…) including compatibility version,
          e.g. Reco13/Stripping19
        - Event Type: mostly useful for simulation, single value for real
          data; an 8-digit numeric code (12345678, 90000000)
        - File Type: defines which type of output files the user wants for a
          given processing pass (e.g. which stream): RAW, SDST,
          BHADRON.DST (for a streamed file)
      - Bookkeeping search using a path:
        /<origin>/<conditions>/<processing pass>/<event type>/<file type>
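Assembling the bookkeeping path from the selection criteria is just string concatenation; the example below uses the values given on the slide.

```python
# Bookkeeping search path: /<origin>/<conditions>/<processing pass>/<event type>/<file type>
def bk_path(origin, conditions, processing_pass, event_type, file_type):
    return f"/{origin}/{conditions}/{processing_pass}/{event_type}/{file_type}"

path = bk_path("LHCb/Collision12",
               "Beam4000GeV-VeloClosed-MagDown",
               "Reco13/Stripping19",
               "90000000",          # single event-type value used for real data
               "BHADRON.DST")
print(path)
# /LHCb/Collision12/Beam4000GeV-VeloClosed-MagDown/Reco13/Stripping19/90000000/BHADRON.DST
```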
  20. Bookkeeping Catalog (2)
      - Much more than a dataset catalog!
      - Full provenance of files and jobs
        - Files are input of processing steps (“jobs”) that produce files
        - All files ever created are recorded, each processing step as well
          - Full information on the “job” (location, CPU, wall clock time…)
      - BK relational database
        - Two main tables: “files” and “jobs”
        - Jobs belong to a “production”
        - “Productions” belong to a “processing pass”, with a given “origin”
          and “condition”
        - Highly optimized search for files, as well as summaries
      - Quality flags
        - Files are immutable, but can have a mutable quality flag
        - Files have a flag indicating whether they have a replica or not
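The two-table provenance idea can be sketched with a toy schema: every file points to the job that produced it, and every job records its input files, so full provenance is a walk up this chain. The schema and values below are invented for illustration, not the real BK design.

```python
import sqlite3

# Toy provenance database in the spirit of the BK "files" and "jobs" tables.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE jobs  (id INTEGER PRIMARY KEY, production INTEGER);
    CREATE TABLE files (lfn TEXT PRIMARY KEY,
                        produced_by INTEGER REFERENCES jobs(id),
                        has_replica INTEGER);   -- mutable flag on an immutable file
    CREATE TABLE job_inputs (job_id INTEGER, lfn TEXT);
""")
db.execute("INSERT INTO jobs VALUES (1, 13016)")
db.execute("INSERT INTO files VALUES ('/lhcb/a.raw', NULL, 1)")  # primary data
db.execute("INSERT INTO files VALUES ('/lhcb/a.dst', 1, 1)")     # produced by job 1
db.execute("INSERT INTO job_inputs VALUES (1, '/lhcb/a.raw')")

# Provenance of a.dst: which production made it, and from what input file?
row = db.execute("""
    SELECT j.production, ji.lfn
    FROM files f JOIN jobs j        ON f.produced_by = j.id
                 JOIN job_inputs ji ON ji.job_id = j.id
    WHERE f.lfn = '/lhcb/a.dst'
""").fetchone()
print(row)   # (13016, '/lhcb/a.raw')
```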
  21. Bookkeeping browsing
      - Allows saving datasets
        - Filter, selection
        - Plain list of files
        - Gaudi configuration file
      - Can return files whose only replica is at a given location
  22. Event indexing
      - Bookkeeping has no information about individual events
      - But it can be beneficial to select events based on global criteria:
        - Number of tracks, clusters, etc.
        - Trigger or stripping lines fired
        - Global event activity counters
      - Prototype implemented by Andrey Ustyuzhanin
      - Search is supported by
      [Screenshot: example event 104248:539058326 with its event time, its
      DIMUON.DST and EW.DST file names, application tags (Brunel v41r1),
      fired stripping lines, and global activity counters such as nTracks,
      nPV, nVeloClusters and nMuonTracks]
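What such an index enables can be sketched as a filter over per-event counter records, selecting events without opening the DST files. The records and cut values below are invented, mirroring the counters shown in the screenshot; only the first event identifier comes from the slide.

```python
# Toy event index: one record of global counters per event.
index = [
    {"event": "104248:539058326", "nTracks": 92, "nPV": 2, "nMuonTracks": 9},
    {"event": "104248:000000001", "nTracks": 15, "nPV": 1, "nMuonTracks": 0},  # invented
]

def select(index, **cuts):
    """cuts are minimum values, e.g. nTracks=50 keeps events with >= 50 tracks."""
    return [rec["event"] for rec in index
            if all(rec.get(k, 0) >= v for k, v in cuts.items())]

print(select(index, nTracks=50, nMuonTracks=1))   # ['104248:539058326']
```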
  23. Staging: using files from tape
      - If jobs use files that are not online (on disk):
        - Before submitting the job, stage the file from tape and pin it in
          the cache
      - Stager agent
        - Also performs cache management
        - Throttles staging requests depending on the cache size and the
          amount of pinned data
        - Requires fine tuning (pinning and cache size)
          - Caching architecture is highly site dependent
          - No publication of cache sizes (except Castor and StoRM)
      - Jobs using staged files
        - First check that the file is still staged; if not, reschedule the job
        - Copy the file locally to the WN whenever possible
          - Space is released faster
          - More reliable access for very long jobs (reconstruction) or jobs
            using many files (merging)
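The throttling rule can be sketched as: only issue new stage requests while pinned-plus-requested data fits in the site cache. The policy and sizes below are illustrative, not the actual stager agent logic.

```python
# Sketch of stage-request throttling against a site cache budget (sizes in GB).
def stage_requests(pending, pinned_gb, cache_gb):
    """pending: list of (lfn, size_gb). Returns the LFNs to stage now;
    the rest wait until pinned space is released."""
    budget = cache_gb - pinned_gb
    staged = []
    for lfn, size in pending:
        if size <= budget:
            staged.append(lfn)
            budget -= size     # newly staged files are pinned in the cache
    return staged

pending = [("a.raw", 40), ("b.raw", 60), ("c.raw", 30)]
print(stage_requests(pending, pinned_gb=120, cache_gb=200))   # ['a.raw', 'c.raw']
```

The slide's point about site dependence shows up here: `cache_gb` is exactly the number most sites do not publish, which is why the tuning is hard.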
  24. Data Management Outlook
      - Improvements on staging
        - Improve the tuning of cache settings (depends on how caches are
          used by sites)
        - Pinning/unpinning: difficult if files are used by more than one job
      - Popularity
        - Record dataset usage
          - Reported by jobs: number of files used in a given dataset
          - Account the number of files used per dataset per day/week
        - Assess dataset popularity: relate usage to dataset size
        - Take decisions on the number of online replicas, taking into
          account available space and the expected need in the coming weeks
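The popularity idea, relating recent usage to dataset size to drive the replica count, can be sketched as a simple heuristic. The thresholds and replica limits are invented for illustration; the slide only states the principle.

```python
# Toy replica policy driven by the fraction of a dataset touched recently.
def target_replicas(files_used_last_week, n_files, current_replicas):
    usage = files_used_last_week / n_files
    if usage > 0.5:
        return min(current_replicas + 1, 4)   # popular: add a disk replica
    if usage < 0.05:
        return max(current_replicas - 1, 1)   # cold: drop towards one copy
    return current_replicas                   # steady state

print(target_replicas(800, 1000, 2))   # 3
print(target_replicas(10, 1000, 2))    # 1
```

A real policy would also fold in the available disk space, as the slide notes, before committing to extra replicas.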
  25. Longer term challenges
      - Computing model evolution
        - Networking performance allows evolution from the static “Tier”
          model of data access to a much more dynamic model
        - The current processing model of one sequential job per file per CPU
          breaks down in the multi-core era due to memory limitations
          → parallelisation of event processing
        - Whole-node scheduling, virtualisation, clouds
      - LHCb upgrade in 2018; from the computing point of view:
        - ×40 readout rate into the HLT farm
        - ×4 event rate to storage (20 kHz)
        - ×1.5 event size (100 kB/RAW event)
        - The ×10 increase in data rates implies:
          - scaling of the DIRAC WMS and DMS
          - scaling of the data selection catalogs
          - new ideas for data mining?
      - Data preservation and open access
        - Preserve the ability to analyse old data many years in the future
        - Make data available for analysis outside LHCb and to the general
          public