Niko Neufeld "A 32 Tbit/s Data Acquisition System"
 

Seminar "Using modern information technologies to solve current problems of particle physics", Yandex Moscow office, 3 July 2012

Niko Neufeld, CERN

  • Scheme showing the basic principle of the PU vertex algorithm implemented on the VEPROBs: on top, the $r$-coordinates of the hits are combined in a coincidence matrix; then the sum of entries in a wedge between lines of constant $\frac{R_B}{R_A}$ is used to extract the vertex information; finally the $z$-position of all vertex candidates is projected onto a histogram. The highest peak is labelled as the primary vertex (PV).
  • Left: vertex histogram obtained from the combinations of PU hits from a collision event of 2011 data. The histogram filled in black (red) is obtained before (after) the "peak-masking" phase. The second peak, i.e. the peak with the maximum number of entries in a 3-bin-wide window, is now clearly visible. Right: distance in mm between the $z$-position of the PU vertex candidate and the $z$-position of the offline (reconstructed) vertex, for events with at least 2 PU vertices and 2 reconstructed vertices. The histogram is obtained after applying the misalignment corrections to the Pile-Up.
  • TFC (TTC) system used as a load balancer. No separate event-builder units: event building is done directly on each trigger-farm node. Trigger-farm nodes send event requests to the TFC system. The TFC system broadcasts the IP address to the read-out boards. Read-out boards push data to the trigger-farm node. Single-stage read-out. Unreliable network protocol. Relies on large buffers in the network and some over-provisioning. Typical link load in the DAQ is 70 to 80% (for up-links).
  • Take advantage of unidirectionality.

Presentation Transcript

  • LHCb Trigger & DAQ: an Introductory Overview. Niko Neufeld, CERN/PH Department. Yandex, Moscow, July 3rd.
  • The Large Hadron Collider
  • Physics, Detectors, Trigger & DAQ (overview diagram): the collider delivers high-rate signals to fast electronics; the Trigger must take decisions because interesting data are rare among many collisions; a high-throughput Data Acquisition system feeds the Event Filter and Mass Storage.
  • The Data Acquisition Challenge at LHC
    – 15 million detector channels @ 40 MHz
    – = ~15 × 1,000,000 × 40 × 1,000,000 bytes
    – = ~600 TB/sec ?
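The raw-rate arithmetic on this slide can be checked with a one-line calculation; the assumption of roughly one byte per channel per bunch crossing is the slide's own simplification:

```python
# Back-of-the-envelope raw data rate at the LHC front-end
# (assumes ~1 byte per channel per bunch crossing, as on the slide)
channels = 15_000_000           # detector channels
crossing_rate_hz = 40_000_000   # 40 MHz bunch-crossing rate
bytes_per_channel = 1

rate_bytes_per_s = channels * crossing_rate_hz * bytes_per_channel
print(f"~{rate_bytes_per_s / 1e12:.0f} TB/s")   # -> ~600 TB/s
```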
  • Should we read everything?
    – A typical collision is “boring” (rate ~10^9 Hz), although we also need some of these “boring” data as a cross-check, a calibration tool, and for some important “low-energy” physics
    – “Interesting” physics (EWK & Top) is about 6–8 orders of magnitude rarer; EWK: 20–100 Hz
    – “Exciting” physics involving new particles/discoveries is 9 orders of magnitude below σ_tot: a 100 GeV Higgs at ~0.1 Hz*, a 600 GeV Higgs at ~0.01 Hz
    – We just need to efficiently identify these rare processes from the overwhelming background before reading out & storing the whole event
    *Note: this is just the production rate; properly finding it is much rarer!
  • Know Your Enemy: pp Collisions at 14 TeV at 10^34 cm^-2 s^-1
    – σ(pp) = 70 mb → > 7 × 10^8 interactions/s (!)
    – In ATLAS and CMS*, 20–30 minimum-bias events overlap
    – H → ZZ → 4 muons is the cleanest (“golden”) signature: reconstructed tracks with pT > 25 GeV; and this signature (not the H though…) repeats every 25 ns…
    *) LHCb @ 4×10^33 cm^-2 s^-1 isn’t much nicer, and ALICE (PbPb) is even busier
  • Trivial DAQ with a real trigger: block diagram of Sensor → Delay → ADC → Processing → Storage, with a Discriminator providing the Trigger that starts the ADC, and Busy Logic (set/clear, interrupt, ready) blocking new triggers while processing. Deadtime (%) is the ratio between the time the DAQ is busy and the total time.
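As a small illustration of the deadtime definition above, here is a sketch using the standard non-paralyzable model; the trigger rate and per-event busy time are invented numbers, not values from the talk:

```python
# Deadtime = busy time / total time.
# Non-paralyzable model: accepted rate = r / (1 + r*tau), busy fraction = r*tau / (1 + r*tau).
trigger_rate_hz = 1_000      # hypothetical incoming trigger rate r
busy_per_event_s = 200e-6    # hypothetical busy time tau per accepted event

x = trigger_rate_hz * busy_per_event_s
deadtime_pct = 100 * x / (1 + x)
accepted_rate = trigger_rate_hz / (1 + x)
print(f"deadtime ≈ {deadtime_pct:.1f} %, accepted rate ≈ {accepted_rate:.0f} Hz")
```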
  • A “simple” 40 MHz track trigger – the LHCb PileUp system
  • Finding vertices in FPGAs
    – Use r-coordinates of hits in Si-detector discs (detector geometry made for this task!)
    – Find coincidences between hits on two discs
    – Count & histogram
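A minimal Python sketch of the count-and-histogram step: every pair of r-coordinates on the two discs is extrapolated as a straight line to the beam axis and the crossing z is histogrammed. The plane positions and bin width here are illustrative, and the real system does this with coincidence matrices in FPGAs:

```python
import numpy as np

# Hypothetical z positions of the two pile-up sensor planes (mm)
Z_A, Z_B = -220.0, -320.0

def vertex_histogram(r_hits_A, r_hits_B, z_range=(-200.0, 200.0), bin_mm=5.0):
    """Combine every hit pair (r_A, r_B), extrapolate the straight line through
    (Z_A, r_A) and (Z_B, r_B) to r = 0, and histogram the resulting z."""
    z_candidates = []
    for rA in r_hits_A:
        for rB in r_hits_B:
            if rA == rB:
                continue                      # line parallel to the beam axis
            z_v = (rB * Z_A - rA * Z_B) / (rB - rA)
            z_candidates.append(z_v)
    bins = np.arange(z_range[0], z_range[1] + bin_mm, bin_mm)
    counts, edges = np.histogram(z_candidates, bins=bins)
    peak = np.argmax(counts)                  # highest peak -> primary vertex candidate
    return counts, edges, 0.5 * (edges[peak] + edges[peak + 1])

counts, edges, pv_z = vertex_histogram([5.1, 12.3, 20.0], [7.4, 17.8, 29.1])
print(f"PV candidate at z ≈ {pv_z:.0f} mm")
```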
  • LHCb Pileup: finding multiple vertices and quality; comparing with the “offline” truth (full tracking, calibration, alignment)
  • LHCb Pileup Algorithm
    – Time budget for this algorithm: about 2 µs
    – Runs in conventional FPGAs in a radiation-safe area
    – Limited to low pile-up (ok for LHCb)
  • After the Trigger: Detector Read-out and DAQ
  • DAQ design guidelines
    – Scalability: change in event size, luminosity (pile-up!)
    – Robust (very little dead time, high efficiency, non-expert operators) → intelligent control systems
    – Use industry-standard, commercial technologies (long-term maintenance) → PCs, Ethernet
    – Low cost → PCs, standard LANs
    – High bandwidth (many Gigabytes/s) → use local area networks (LAN)
    – “Creative” & “Flexible” (open for new things) → use software and reconfigurable logic (FPGAs)
  • One network to rule them all
    – Ethernet, IEEE 802.3xx, has almost become synonymous with Local Area Networking
    – Ethernet has many nice features: cheap, simple, cheap, etc…
    – Ethernet does not: guarantee delivery of messages; allow multiple network paths; provide quality of service or bandwidth assignment (albeit, to a varying degree, this is provided by many switches)
    – Because of this, raw Ethernet is rarely used; usually it serves as a transport medium for IP, UDP, TCP etc…
    – Flow control in standard Ethernet (Xoff) is only defined between immediate neighbours; a sending station is free to throw x-offed frames away (and often does)
  • Generic DAQ implemented on a LAN (typical number of pieces):
    – Detector: 1
    – Custom links from the detector: 1000
    – “Readout Units” for protocol adaptation: 100 to 1000
    – Powerful core routers: 2 to 8
    – Edge switches: 50 to 100
    – Servers for event filtering: > 1000
  • Congestion
    – “Bang” translates into random, uncontrolled packet loss
    – In Ethernet this is perfectly valid behavior and implemented by many low-latency devices
    – This problem comes from synchronized sources sending to the same destination at the same time
    – Either a higher-level “event-building” protocol avoids this congestion, or the switches must avoid packet loss with deep buffer memories
  • Push-Based Event Building with store & forward switching and load-balancing
    – Sources do not buffer, so the switch must buffer to avoid packet loss due to overcommitment
    – Event Builders notify the Event Manager of their available capacity (“Send me an event!”)
    – The Event Manager directs the readout system (“Send next event to EB1”, “… to EB2”, “… to EB3”) and ensures that data are sent only to nodes with available capacity
    – The readout system relies on this feedback from the Event Builders
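A hedged sketch of the load-balancing bookkeeping on this slide; the class and method names are illustrative only, and the real system exchanges network messages rather than function calls:

```python
from collections import deque

class EventManager:
    """Keeps a queue of event builders that announced free capacity and
    assigns each new event to one of them (push with capacity feedback)."""
    def __init__(self, builders):
        self.builders = builders          # builder_id -> list of assigned events
        self.ready = deque()              # builders with spare capacity

    def notify_capacity(self, builder_id):
        self.ready.append(builder_id)     # "Send me an event!"

    def dispatch(self, event_id):
        if not self.ready:                # nobody has room: sources must hold (or drop) data
            return None
        builder_id = self.ready.popleft()
        self.builders[builder_id].append(event_id)   # readout pushes the event there
        return builder_id

builders = {f"EB{i}": [] for i in (1, 2, 3)}
em = EventManager(builders)
for b in builders:                        # each builder advertises one free slot
    em.notify_capacity(b)
for evt in range(5):                      # only 3 credits -> last two events are not dispatched
    print(evt, "->", em.dispatch(evt))
```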
  • LHCb DAQ (architecture diagram): the detector (VELO, ST, OT, RICH, ECal, HCal, Muon) front-end electronics feed readout boards under the L0 trigger, the TFC system (LHC clock, timing and fast control) and the Experiment Control System (ECS); the readout boards send into the readout network (55 GB/s) for event building; switches fan out to the HLT and monitoring farms, with 200–300 MB/s to storage. Average event size 55 kB, average rate into the farm 1 MHz, average rate to tape 4–5 kHz.
  • LHCb DAQ
    – Events are very small (about 55 kB in total): each read-out board contributes about 200 bytes (only!!)
    – A UDP message on Ethernet takes 8 + 14 + 20 + 8 + 4 = 52 bytes → 25% overhead (!)
    – LHCb uses coalescence of messages, packing about 10 to 15 events into one message (called a MEP) → the message rate is ~80 kHz (c.f. CMS, ATLAS)
    – The protocol is a simple, single-stage push; every farm node builds complete events; the TTC system is used to assign IP addresses coherently to the read-out boards
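A small sketch of the overhead and message-rate arithmetic on this slide; the per-layer header sizes below are the usual Ethernet/IP/UDP figures and only approximately reproduce the slide's own byte count:

```python
# Fixed cost of one UDP datagram on Ethernet (preamble + Ethernet + IP + UDP + FCS)
preamble, eth_hdr, ip_hdr, udp_hdr, fcs = 8, 14, 20, 8, 4
overhead = preamble + eth_hdr + ip_hdr + udp_hdr + fcs

fragment = 200                 # bytes per read-out board per event
print(f"one event per packet: {100 * overhead / fragment:.0f} % overhead")

# Coalescing ~12 events into one Multi-Event Packet (MEP)
events_per_mep = 12
l0_rate_hz = 1_000_000         # 1 MHz trigger-accept rate
print(f"MEP rate ≈ {l0_rate_hz / events_per_mep / 1e3:.0f} kHz, "
      f"overhead ≈ {100 * overhead / (events_per_mep * fragment):.1f} %")
```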
  • DAQ network parameters (link load [%], technology, protocol, event building):
    – ALICE: 30%; Ethernet, TCP/IP, pull; InfiniBand (HLT), pull (RDMA)
    – ATLAS: 20% at 10 Gbit/s (L2), 50% (event collection); Ethernet, TCP/IP, pull
    – CMS: 65%, Myrinet, push (with credits); 40–80%, Ethernet, TCP/IP, pull
    – LHCb: 40–80%, Ethernet, UDP, push
  • LHC Trigger/DAQ parameters (as seen 2011/12): number of trigger levels; Level-0/1/2 rate (Hz); event size (Byte); network bandwidth (GB/s); storage (MB/s, events/s)
    – ALICE (4 levels): Pb-Pb 500 Hz, 5×10^7 B, 25 GB/s, 4000 MB/s (10^2/s); p-p 10^3 Hz, 2×10^6 B, 200 MB/s (10^2/s)
    – ATLAS (3 levels): LV-1 10^5 Hz, LV-2 3×10^3 Hz; 1.5×10^6 B; 6.5 GB/s; 700 MB/s (6×10^2/s)
    – CMS (2 levels): LV-1 10^5 Hz; 10^6 B; 100 GB/s; ~1000 MB/s (10^2/s)
    – LHCb (2 levels): LV-0 10^6 Hz; 5.5×10^4 B; 55 GB/s; 250 MB/s (4.5×10^3/s)
  • High Level Trigger Farms: and that, in simple terms, is what we do in the High Level Trigger
  • Online Trigger Farms 2012 (values given in the order ALICE / ATLAS / CMS / LHCb)
    – # cores (+ hyperthreading): 2700 / 17000 / 13200 / 15500
    – # servers (mainboards): ~2000 / ~1300 / 1574 (ATLAS / CMS / LHCb)
    – Total available cooling power: ~500 / ~820 / 800 / 525
    – Total available rack space (Us): ~2000 / 2400 / ~3600 / 2200
    – CPU type(s): AMD Opteron, Intel 54xx / Intel 54xx, Intel 56xx, Intel E5-2670 / Intel 54xx, Intel 56xx, AMD 6220 / Intel 5450, Intel 5650, Intel 56xx
    – And counting…
  • LHC planning (not yet approved!)
    – Long Shutdown 1 (LS1): CMS: Myrinet → InfiniBand / Ethernet; ATLAS: merge L2 and Event Collection infrastructures
    – Long Shutdown 2 (LS2): ALICE continuous read-out; LHCb 40 MHz read-out
    – Long Shutdown 3 (LS3): CMS track-trigger
  • Motivation
    – The LHC (Large Hadron Collider) collides protons every 25 ns (40 MHz)
    – Each collision produces about 100 kB of data in the detector
    – Currently a pre-selection in custom electronics rejects 97.5% of these events → unfortunately a lot of them contain interesting physics
    – In 2017 the detector will be changed so that all events can be read out into a standard compute platform for detailed inspection
  • LHCb after LS2
    – Ready for an all-software trigger (resources permitting)
    – Zero-suppression on the front-end electronics is mandatory!
    – Event size about 100 kB, read-out rate up to 40 MHz
    – Will need a network scalable up to 32 Tbit/s: InfiniBand, 10/40/100 Gigabit Ethernet?
  • Key figures
    – Minimum required bandwidth: > 32 Tbit/s
    – # of 100 Gigabit/s links: > 320
    – # of compute units: > 1500
    – An event (“snapshot of a collision”) is about 100 kB of data
    – # of events processed every second: 10 to 40 million
    – # of events retained after filtering: 20000 to 30000 (data reduction of at least a factor 1000)
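The headline bandwidth in the talk's title follows directly from these figures:

```python
# Required DAQ bandwidth for a 40 MHz read-out of ~100 kB events
event_size_bytes = 100_000
event_rate_hz = 40_000_000

bandwidth_bit_s = 8 * event_size_bytes * event_rate_hz
print(f"{bandwidth_bit_s / 1e12:.0f} Tbit/s")                  # -> 32 Tbit/s
print(f">= {bandwidth_bit_s / 100e9:.0f} x 100 Gbit/s links")  # -> 320 links
```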
  • LHCb DAQ as of 2018 (architecture diagram): the detector is read out over GBT links (custom radiation-hard links over MMF, 3.2 Gbit/s, about 10000 of them) into Readout Units; input into the DAQ network over 10/40 Gigabit Ethernet or FDR IB (1000 to 4000 links), crossing 100 m of rock; output from the DAQ network into compute-unit clusters over 100 Gbit Ethernet / EDR IB (200 to 400 links). Compute units could be servers with GPUs or other coprocessors.
  • Readout Unit
    – Needs to collect the custom links
    – Some pre-processing
    – Buffering
    – Coalescing of data fragments → reduces message rate / transport overheads
    – Needs an FPGA
    – Sends data using a standard network protocol (IB, Ethernet)
    – Sending of data can be done directly from the FPGA or via standard network silicon
    – Works together with Compute Units to build events
  • Compute Unit
    – A compute unit is a destination for the event-data fragments from the readout units
    – It assembles the fragments into a complete “event” and runs various selection algorithms on this event
    – About 0.1% of events are retained
    – A compute unit will be a high-density server platform (mainboard with standard CPUs), probably augmented with a co-processor card (like Intel MIC or GPU)
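A hedged sketch of the fragment-assembly role described above; the fragment format, the number of sources and the selection function are invented for illustration:

```python
from collections import defaultdict

N_SOURCES = 4        # hypothetical; a real system has hundreds of readout units

class ComputeUnit:
    """Collects per-event fragments from all readout units and, once an event is
    complete, runs a selection function on it (the software trigger)."""
    def __init__(self, select):
        self.partial = defaultdict(dict)   # event_id -> {source_id: payload}
        self.select = select

    def add_fragment(self, event_id, source_id, payload):
        frags = self.partial[event_id]
        frags[source_id] = payload
        if len(frags) < N_SOURCES:
            return None                    # still waiting for fragments
        event = b"".join(frags[s] for s in sorted(frags))
        del self.partial[event_id]
        return event if self.select(event) else None

# Toy usage: a dummy selection that keeps "large" events
cu = ComputeUnit(select=lambda ev: len(ev) > 10)
for src in range(N_SOURCES):
    kept = cu.add_fragment(event_id=42, source_id=src, payload=b"abc")
print("kept" if kept else "rejected")      # -> kept (4 x 3 bytes > 10)
```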
  • Future DAQ systems: trends
    – Certainly LAN-based: InfiniBand deserves a serious evaluation for high bandwidth (> 100 GB/s); in Ethernet, if DCB works, we might be able to build networks from smaller units, otherwise we will stay with large store & forward boxes
    – Trend to “trigger-free” → do everything in software → bigger DAQ will continue; physics data handling in commodity CPUs
    – Will there be a place for multi-core / coprocessor cards (Intel MIC / CUDA)? IMHO this will depend on whether we can establish a development framework which allows long-term maintenance of the software by non-“geek” users, much more than on the actual technology
  • Fat-Tree Topology for One Slice
    – 48-port 10 GbE switches
    – Mix readout boards (ROB) and filter-farm servers in one switch: 15 × readout boards, 18 × servers, 15 × uplinks
    – Non-blocking switching; uses 65% of installed bandwidth (classical DAQ only 50%)
    – Each slice accommodates 690 × inputs (ROBs) and 828 × outputs (servers)
    – The server/ROB ratio is adjustable
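A quick check of the port bookkeeping in this slice layout; the per-switch split is from the slide, while the edge-switch count is derived:

```python
PORTS_PER_SWITCH = 48                  # 48-port 10 GbE edge switches
robs, servers, uplinks = 15, 18, 15    # ports per edge switch
assert robs + servers + uplinks == PORTS_PER_SWITCH

edge_switches = 690 // robs            # -> 46 edge switches per slice
print(edge_switches * robs,            # -> 690 ROB inputs
      edge_switches * servers)         # -> 828 server outputs
```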
  • Pull-Based Event Building
    – Event Builders notify the Event Manager of their available capacity (“Send me an event!”)
    – The Event Manager elects an event-builder node (“EB1, get next event”, “EB2, get next event”, …)
    – The elected Event Builder then requests the data (“Send event 1 to EB1!”), so the readout traffic is driven by the Event Builders
  • Summary
    – Large modern DAQ systems are based entirely (mostly) on Ethernet and big PC-server farms
    – Bursty, uni-directional traffic is a challenge in the network and the receivers, and requires substantial buffering in the switches
    – The future: it seems that buffering in switches is being reduced (latency vs. buffering); advanced flow control is coming, but it will need to be tested whether it is sufficient for DAQ; Ethernet is still strongest, but InfiniBand looks like a very interesting alternative; integrated protocols (RDMA) can offload servers, but will be more complex; integration of GPUs, non-Intel processors and other many-cores will need to be studied
    – For the DAQ and triggering the question is not if we can do it, but how we can do it so we can afford it!
  • More Stuff
  • Cut-through switching: Head-of-Line Blocking
    – A packet to node 4 must wait even though the port to node 4 is free; the reason for this is the First-In-First-Out (FIFO) structure of the input buffer
    – Queuing theory tells us* that for random traffic (and infinitely many switch ports) the throughput of the switch will go down to 58.6%; that means on a 100 Mbit/s network the nodes will “see” effectively only ~58 Mbit/s
    *) “Input Versus Output Queueing on a Space-Division Packet Switch”; Karol, M. et al.; IEEE Trans. Comm., 35/12
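A small Monte Carlo sketch of the head-of-line-blocking result quoted above: a saturated input-queued switch with FIFO input buffers and uniformly random destinations; for a large number of ports the measured throughput approaches the 2 − √2 ≈ 0.586 limit of Karol et al.:

```python
import random

def hol_throughput(n_ports=32, n_slots=20_000, seed=1):
    """Saturated input-queued crossbar with FIFO (head-of-line) queues:
    every input always has a packet; each new packet picks a random output."""
    random.seed(seed)
    hol = [random.randrange(n_ports) for _ in range(n_ports)]  # HOL destination per input
    served = 0
    for _ in range(n_slots):
        contenders = {}                            # output -> inputs whose HOL packet wants it
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        served += len(contenders)                  # one packet per contended output is delivered
        for out, inps in contenders.items():
            winner = random.choice(inps)           # the others stay HOL-blocked
            hol[winner] = random.randrange(n_ports)
    return served / (n_slots * n_ports)

print(f"throughput ≈ {hol_throughput():.3f}  (large-N limit: 2 - sqrt(2) ≈ 0.586)")
```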
  • Event-building (diagram, same layout as the “LHCb DAQ as of 2018” slide): Readout Units send to Compute Units, Compute Units receive passively (“push architecture”); GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000 links); input into the DAQ network over 10/40 Gigabit Ethernet or FDR IB (1000 to 4000 links), crossing 100 m of rock; output from the DAQ network into compute-unit clusters over 100 Gbit Ethernet / EDR IB (200 to 400 links).
  • Runcontrol (image © Warner Bros.)
  • Runcontrol challenges
    – Start, configure and control O(10000) processes on farms of several 1000 nodes
    – Configure and monitor O(10000) front-end elements
    – Fast database access, caching, pre-loading, parallelization; and all this 100% reliable!
  • Runcontrol technologies
    – Communication: CORBA (ATLAS); HTTP/SOAP (CMS); DIM (LHCb, ALICE)
    – Behavior & automation: SMI++ (ALICE); CLIPS (ATLAS); RCMS (CMS); SMI++ (in PVSS, used also in the DCS)
    – Job/process control: based on XDAQ, CORBA, …; FMC/PVSS (LHCb, does also fabric monitoring)
    – Logging: log4C, log4j, syslog, FMC (again), …