Seminar "Using modern information technologies to solve modern problems in particle physics" at the Yandex office in Moscow, July 3, 2012
Niko Neufeld, CERN
1. LHCb Trigger & DAQ
an Introductory Overview
Niko Neufeld
CERN/PH Department
Yandex, Moscow, July 3rd
2. The Large Hadron Collider
3. Physics, Detectors, Trigger & DAQ
[Diagram] High-rate collider, many collisions, rare signals → fast electronics make the trigger decisions → data acquisition → event filter → mass storage
4. The Data Acquisition Challenge at LHC
• 15 million detector channels
• @ 40 MHz
• = ~15 * 1,000,000 * 40 * 1,000,000 bytes
• = ~ 600 TB/sec
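A quick back-of-the-envelope check of this number (a Python sketch; one byte per channel per crossing is the implicit assumption of the slide's arithmetic):

# Raw data rate implied by the numbers above
channels = 15e6          # detector channels
crossing_rate = 40e6     # Hz (bunch-crossing rate)
bytes_per_channel = 1    # byte per channel per crossing (assumption from the slide)

rate = channels * crossing_rate * bytes_per_channel
print(f"{rate / 1e12:.0f} TB/s")   # -> 600 TB/s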
5. Should we read everything?
• A typical collision is “boring” (total interaction rate ~10^9 Hz) – although we also need some of these “boring” data as a cross-check, as a calibration tool, and for some important “low-energy” physics (~5 x 10^6 Hz)
• “Interesting” physics (EWK & top) is about 6–8 orders of magnitude rarer: EWK 20–100 Hz
• “Exciting” physics involving new particles/discoveries is 9 orders of magnitude below σ_tot (~10 Hz and below):
– 100 GeV Higgs: 0.1 Hz*
– 600 GeV Higgs: 0.01 Hz
• We just need to efficiently identify these rare processes against the overwhelming background before reading out & storing the whole event
*Note: this is just the production rate; properly finding it is much rarer!
6. Know Your Enemy:
pp Collisions at 14 TeV at 10^34 cm^-2 s^-1
• σ(pp) ≈ 70 mb → more than 7 x 10^8 interactions/s (!)
• In ATLAS and CMS* 20–30 minimum-bias events overlap in every bunch crossing
• H → ZZ, Z → μμ: H → 4 muons is the cleanest (“golden”) signature
• [Event display] Reconstructed tracks with pT > 25 GeV – and this (not the H though…) repeats every 25 ns…
*) LHCb @ 4 x 10^33 cm^-2 s^-1 isn’t much nicer, and ALICE (Pb-Pb) is even busier
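The interaction rate quoted above follows directly from luminosity times cross-section:

$R = \mathcal{L}\,\sigma_{pp} = 10^{34}\,\mathrm{cm^{-2}s^{-1}} \times 70\,\mathrm{mb} = 10^{34}\,\mathrm{cm^{-2}s^{-1}} \times 7\times10^{-26}\,\mathrm{cm^{2}} = 7\times10^{8}\,\mathrm{s^{-1}}$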
7. Trivial DAQ with a real trigger (2)
[Block diagram] Sensor → delay → ADC → processing → storage. A discriminator on the sensor signal forms the trigger; the trigger starts the ADC only when the busy logic is idle (a flip-flop is set on start and cleared when processing signals “ready”).
Dead time (%) is the ratio between the time the DAQ is busy and the total time.
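A minimal sketch of the dead-time arithmetic behind this scheme (non-paralyzable model; the rates below are purely illustrative):

# Non-paralyzable dead time: the DAQ is busy for a time tau after each
# accepted trigger, and further triggers arriving in that window are lost.
trigger_rate = 100e3   # Hz, input trigger rate (illustrative)
tau = 5e-6             # s, busy time per accepted event (illustrative)

accepted_rate = trigger_rate / (1 + trigger_rate * tau)
dead_time = accepted_rate * tau          # busy time / total time

print(f"accepted rate: {accepted_rate / 1e3:.1f} kHz")   # -> 66.7 kHz
print(f"dead time    : {dead_time:.1%}")                 # -> 33.3%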
8. A “simple” 40 MHz track trigger – the
LHCb PileUp system
9. Finding vertices in FPGAs
• Use the r-coordinates of hits in silicon detector discs (the detector geometry was made for this task!)
• Find coincidences between hits on the two discs
• Count & histogram (see the sketch below)
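A sketch of the underlying idea in Python (the real algorithm runs in FPGAs; disc positions and hit radii here are purely illustrative):

# Two discs at z = z_A and z = z_B measure the radius r of each hit.
# A track from a vertex on the beam axis fixes the ratio r_B/r_A, so every
# (r_A, r_B) coincidence yields a z-vertex estimate; the estimates are
# histogrammed and the highest bin is taken as the vertex candidate.
import numpy as np

z_A, z_B = -320.0, -220.0   # mm, disc positions (illustrative)

def z_vertex(r_A, r_B):
    # straight line through (z_A, r_A) and (z_B, r_B), extrapolated to r = 0
    return (z_A * r_B - z_B * r_A) / (r_B - r_A)

hits_A = np.array([10.2, 15.7, 22.1])   # mm, hit radii on disc A (illustrative)
hits_B = np.array([7.0, 10.8, 15.2])    # mm, hit radii on disc B (illustrative)

z_est = [z_vertex(a, b) for a in hits_A for b in hits_B if a != b]
hist, edges = np.histogram(z_est, bins=100, range=(-200.0, 200.0))
print("vertex candidate near z =", edges[np.argmax(hist)], "mm")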
10. LHCb PileUp: finding multiple vertices and quality
Comparing with the “offline” truth (full tracking, calibration, alignment)
11. LHCb Pileup Algorithm
• Time budget for this algorithm: about 2 µs
• Runs in conventional FPGAs in a radiation-safe area
• Limited to low pile-up (OK for LHCb)
13. DAQ design guidelines
• Scalability – cope with changes in event size and luminosity (pileup!)
• Robust (very little dead time, high efficiency, non-expert operators) → intelligent control systems
• Use industry-standard, commercial technologies (long-term maintenance) → PCs, Ethernet
• Low cost → PCs, standard LANs
• High bandwidth (many gigabytes/s) → use local area networks (LANs)
• “Creative” & “flexible” (open for new things) → use software and reconfigurable logic (FPGAs)
14. One network to rule them all
• Ethernet, IEEE 802.3xx, has almost become synonymous with Local Area Networking
• Ethernet has many nice features: cheap, simple, cheap, etc…
• Ethernet does not:
– guarantee delivery of messages
– allow multiple network paths
– provide quality of service or bandwidth assignment (albeit to a varying degree this is provided by many switches)
• Because of this, raw Ethernet is rarely used; usually it serves as a transport medium for IP, UDP, TCP etc…
• Flow control in standard Ethernet is only defined between immediate neighbors
• A sending station is free to throw away x-offed (Xoff) frames – and often does
15. Generic DAQ implemented on a LAN
Typical number of pieces:
– Detector: 1
– Custom links from the detector: ~1000
– “Readout Units” for protocol adaptation: 100 to 1000
– Powerful core routers: 2 to 8
– Edge switches: 50 to 100
– Servers for event filtering: > 1000
16. Congestion
• "Bang" translates into
2 2 random, uncontrolled packet-
loss
• In Ethernet this is perfectly
valid behavior and
implemented by many low-
latency devices
• This problem comes from
synchronized sources sending
to the same destination at the
same time
Bang • Either a higher level “event-
building” protocol avoids this
congestion or the switches
must avoid packet loss with
2 deep buffer memories
LHC Trigger & DAQ - Niko Neufeld, CERN 16
17. Push-Based Event Building with store-and-forward switching and load balancing
• Sources do not buffer, so the switch must buffer to avoid packet loss due to overcommitment
• Event Builders notify the Event Manager of available capacity (“Send me an event!”)
• The Event Manager ensures that data are sent only to nodes with available capacity (“Send next event to EB1/EB2/EB3”) – it relies on feedback from the Event Builders (see the sketch below)
[Figure: readout system → Data Acquisition Switch → Event Builders 1–3, coordinated by the Event Manager]
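A toy illustration of this credit-based load balancing in Python (class and builder names are made up for the example):

# Event Builders report free capacity to the Event Manager ("Send me an event!");
# the Event Manager hands each new event to a builder that still has room.
from collections import deque

class EventManager:
    def __init__(self):
        self.credits = deque()              # builders that announced free capacity

    def notify_capacity(self, builder):     # "Send me an event!"
        self.credits.append(builder)

    def assign_next_event(self):            # "Send next event to EB_x"
        if not self.credits:
            raise RuntimeError("no builder has free capacity - readout must throttle")
        return self.credits.popleft()

manager = EventManager()
for builder in ("EB1", "EB2", "EB3"):
    manager.notify_capacity(builder)

for event_id in range(3):
    target = manager.assign_next_event()
    print(f"event {event_id} -> {target}")  # all readout boards push this event to 'target'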
18. LHCb DAQ
[Architecture diagram]
Detector (VELO, ST, OT, RICH, ECal, HCal, Muon) → front-end electronics → Readout Boards; the L0 trigger and the TFC system (driven by the LHC clock) steer the front-end, and the Experiment Control System (ECS) supervises everything.
Readout Boards → READOUT NETWORK (55 GB/s event building, MEP requests) → switches → HLT farm and MON farm (CPUs) → 200–300 MB/s to storage.
Legend: event data, timing and fast control signals, control and monitoring data.
Average event size: 55 kB; average rate into the farm: 1 MHz; average rate to tape: 4–5 kHz.
19. LHCb DAQ
• Events are very small (about 55 kB in total)
– each readout board contributes only about 200 bytes(!)
– a UDP message on Ethernet costs 8 + 14 + 20 + 8 + 4 = 54 bytes → roughly 25% overhead (see the arithmetic below)
• LHCb uses coalescence of messages, packing about 10 to 15 events into one message (called a MEP) → message rate is ~80 kHz (c.f. CMS, ATLAS)
• The protocol is a simple, single-stage push; every farm node builds complete events; the TTC system is used to assign IP addresses coherently to the readout boards
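The overhead arithmetic above, spelled out (Python sketch; the packing factor of 13 is just one value in the 10–15 range quoted on the slide):

# Per-packet overhead on Ethernet: preamble 8 + Ethernet header 14 + IP header 20
# + UDP header 8 + FCS 4 = 54 bytes, compared with a ~200-byte event fragment.
payload = 200                     # bytes per readout-board fragment
overhead = 8 + 14 + 20 + 8 + 4    # = 54 bytes

print(f"one event per packet: {overhead / payload:.0%} overhead")
mep_factor = 13                   # events coalesced into one MEP (10-15 on the slide)
print(f"one MEP of {mep_factor} events: {overhead / (mep_factor * payload):.1%} overhead")
print(f"message rate: {1e6 / mep_factor / 1e3:.0f} kHz")   # 1 MHz / 13 ~ 77 kHz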
22. High Level Trigger Farms
And that, in simple terms, is what
we do in the High Level Trigger
23. Online Trigger Farms 2012
Number of cores (+ hyperthreading): ALICE 2700, ATLAS 17000, CMS 13200, LHCb 15500
Number of servers (mainboards): ATLAS ~2000, CMS ~1300, LHCb 1574
Total available cooling power: ALICE ~500, ATLAS ~820, CMS 800, LHCb 525
Total available rack space (Us): ALICE ~2000, ATLAS 2400, CMS ~3600, LHCb 2200
CPU type(s): ALICE – AMD Opteron, Intel 54xx, Intel 56xx; ATLAS – Intel 54xx, Intel 56xx; CMS – Intel 54xx, Intel 56xx, Intel E5-2670; LHCb – Intel 5450, Intel 5650, AMD 6220
And counting…
24. LHC planning
(Not yet approved!)
• Long Shutdown 1 (LS1): CMS: Myrinet → InfiniBand / Ethernet; ATLAS: merge L2 and Event Collection infrastructures
• Long Shutdown 2 (LS2): ALICE continuous read-out; LHCb 40 MHz read-out
• Long Shutdown 3 (LS3): CMS track-trigger
25. Motivation
• The LHC (large hadron collider) collides protons
every 25 ns (40 MHz)
• Each collision produces about 100 kB of data in
the detector
• Currently a pre-selection in custom electronics rejects 97.5% of these events – unfortunately a lot of them contain interesting physics
• In 2017 the detector will be changed so that all events can be read out into a standard compute platform for detailed inspection
26. LHCb after LS2
• Ready for an all-software trigger (resources permitting)
• Zero-suppression on the front-end electronics is mandatory!
• Event size about 100 kB, readout rate up to 40 MHz
• Will need a network scalable up to 32 Tbit/s: InfiniBand, 10/40/100 Gigabit Ethernet?
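The 32 Tbit/s figure is just the event size times the readout rate:

$100\,\mathrm{kB/event} \times 40\times10^{6}\,\mathrm{events/s} = 4\,\mathrm{TB/s} = 32\,\mathrm{Tbit/s}$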
27. Key figures
• Minimum required bandwidth: > 32 Tbit/s
• # of 100 Gigabit/s links > 320
• # of compute units > 1500
• An event (“snapshot of a collision”) is about 100
kB of data
• # of events processed every second: 10 to 40 million
• # of events retained after filtering: 20,000 to 30,000 (a data reduction of at least a factor of 1000)
28. LHCb DAQ as of 2018
[Architecture diagram]
Detector → GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000 links), through 100 m of rock → Readout Units → input into the DAQ network (10/40 Gigabit Ethernet or FDR IB, 1000 to 4000 links) → DAQ network → output into the compute-unit clusters (100 Gbit Ethernet / EDR IB, 200 to 400 links) → Compute Units.
Compute units could be servers with GPUs or other coprocessors.
29. Readout Unit
• The Readout Unit needs to collect the custom links
• Some pre-processing
• Buffering
• Coalescing of data fragments → reduces message rate / transport overheads (see the sketch below)
• Needs an FPGA
• Sends data using a standard network protocol (IB, Ethernet)
• Sending of data can be done directly from the FPGA or via standard network silicon
• Works together with the Compute Units to build events
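A hedged sketch of the coalescing idea (the byte layout below is made up for illustration; it is not the real LHCb MEP format):

# The readout unit buffers one fragment per event and ships a single message
# covering several consecutive events, cutting the message rate accordingly.
import struct

def coalesce(fragments, first_event_id):
    """Pack consecutive event fragments into one message:
    header = (first_event_id, n_fragments), then (length, payload) per fragment."""
    msg = struct.pack("<II", first_event_id, len(fragments))
    for frag in fragments:
        msg += struct.pack("<I", len(frag)) + frag
    return msg

fragments = [bytes(200) for _ in range(13)]        # thirteen ~200-byte fragments
message = coalesce(fragments, first_event_id=42)
print(len(message), "bytes in one message instead of 13 separate packets")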
30. Compute Unit
• A compute unit is a destination for the event-
data fragments from the readout units
• It assembles the fragments into a complete
“event” and runs various selection algorithms on
this event
• About 0.1% of events are retained
• A compute unit will be a high-density server platform (a mainboard with standard CPUs), probably augmented with a co-processor card (like an Intel MIC or a GPU)
31. Future DAQ systems: trends
• Certainly LAN based
– InfiniBand deserves a serious evaluation for high bandwidth (> 100 GB/s)
– In Ethernet, if DCB works, we might be able to build networks from smaller units; otherwise we will stay with large store-and-forward boxes
• The trend towards “trigger-free” (do everything in software → bigger DAQ) will continue
– Physics data handling in commodity CPUs
• Will there be a place for many-core / coprocessor cards (Intel MIC / CUDA)?
– IMHO this will depend on whether we can establish a development framework which allows long-term maintenance of the software by non-”geek” users, much more than on the actual technology
32. Fat-Tree Topology for One Slice
• 48-port 10 GbE switches
• Mix readout boards (ROBs) and filter-farm servers in one switch:
– 15 x readout boards
– 18 x servers
– 15 x uplinks
→ non-blocking switching; uses 65% of the installed bandwidth (a classical DAQ only 50%)
• Each slice accommodates
– 690 x inputs (ROBs)
– 828 x outputs (servers)
→ the server/ROB ratio is adjustable (see the arithmetic below)
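A quick consistency check of the slice numbers (a sketch assuming identical 48-port leaf switches with the 15/18/15 port split quoted above):

# Each leaf switch dedicates 15 ports to readout boards, 18 to servers
# and 15 to uplinks, which exactly fills a 48-port switch.
ports_per_switch = 48
robs, servers, uplinks = 15, 18, 15
assert robs + servers + uplinks == ports_per_switch

leaf_switches = 690 // robs              # 690 ROB inputs per slice
print(leaf_switches)                     # -> 46 leaf switches per slice
print(leaf_switches * servers)           # -> 828 servers, matching the slide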
33. Pull-Based Event Building
• Event Builders notify the Event Manager of available capacity (“Send me an event!”)
• The Event Manager selects an event-builder node (“EB1, get next event”, “EB2, get next event”, …)
• The selected Event Builder then requests the data from the readout system (“Send event 1 to EB1!”) – readout traffic is driven by the Event Builders
[Figure: readout system → Data Acquisition Switch → Event Builders 1–3, coordinated by the Event Manager]
34. Summary
• Large modern DAQ systems are based entirely (mostly) on Ethernet and
big PC-server farms
• Bursty, uni-directional traffic is a challenge in the network and the
receivers, and requires substantial buffering in the switches
• The future:
– It seems that buffering in switches is being reduced (latency vs. buffering)
– Advanced flow control is coming, but it will need to be tested whether it is sufficient for DAQ
– Ethernet is still strongest, but InfiniBand looks like a very interesting
alternative
– Integrated protocols (RDMA) can offload servers, but will be more complex
– Integration of GPUs, non-Intel processors and other many-cores will need to be studied
• For the DAQ and triggering the question is not whether we can do it, but how we can do it so that we can afford it!
36. Cut-through switching – Head-of-Line Blocking
[Figure: a packet to node 4 must wait in the input FIFO even though the port to node 4 is free]
• The reason for this is the First-In First-Out (FIFO) structure of the input buffer
• Queuing theory tells us* that for random traffic (and infinitely many switch ports) the throughput of the switch will go down to 58.6%; that means on a 100 Mbit/s network the nodes will effectively "see" only ~58 Mbit/s (see the simulation sketch below)
*) "Input Versus Output Queueing on a Space-Division Packet Switch"; Karol, M. et al.; IEEE Trans. Comm., 35/12
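A small Monte-Carlo sketch of that limit (pure Python, illustrative parameters; the 58.6% value is the asymptotic result of Karol et al. for infinitely many ports):

# Saturated input-queued switch with FIFO inputs: every input always has a
# head-of-line (HOL) cell addressed to a uniformly random output; each slot,
# every contested output serves one of its contenders, the rest stay blocked.
import random

def hol_throughput(n_ports=32, slots=20000, seed=1):
    random.seed(seed)
    hol = [random.randrange(n_ports) for _ in range(n_ports)]   # HOL destinations
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, dst in enumerate(hol):
            contenders.setdefault(dst, []).append(inp)
        for dst, inputs in contenders.items():
            winner = random.choice(inputs)                       # output serves one input
            delivered += 1
            hol[winner] = random.randrange(n_ports)              # next cell in that FIFO
    return delivered / (slots * n_ports)

print(f"throughput ~ {hol_throughput():.2f}")   # close to the 0.586 limit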
37. Event-building
Detector Readout Units send to Compute Units; Compute Units receive passively → “push architecture”
[Diagram] Detector → GBT: custom radiation-hard link over MMF, 3.2 Gbit/s (about 10000 links), through 100 m of rock → Readout Units → input into the DAQ network (10/40 Gigabit Ethernet or FDR IB, 1000 to 4000 links) → DAQ network → output into the compute-unit clusters (100 Gbit Ethernet / EDR IB, 200 to 400 links) → Compute Units
39. Runcontrol challenges
• Start, configure and control O(10000)
processes on farms of several 1000 nodes
• Configure and monitor O(10000) front-end
elements
• Fast database access, caching, pre-loading, parallelization – and all of this 100% reliable!
40. Runcontrol technologies
• Communication:
– CORBA (ATLAS)
– HTTP/SOAP (CMS)
– DIM (LHCb, ALICE)
• Behavior & Automation:
– SMI++ (ALICE)
– CLIPS (ATLAS)
– RCMS (CMS)
– SMI++ (in PVSS) (used also in the DCS)
• Job/Process control:
– Based on XDAQ, CORBA, …
– FMC/PVSS (LHCb, does also fabric monitoring)
• Logging:
– log4C, log4j, syslog, FMC (again), …
Editor's Notes
Scheme showing the basic principle of the PU vertex algorithm implemented on the VEPROBs: on top, the $r$-coordinates of the hits are combined in a coincidence matrix; then the sum of entries in a wedge between lines of constant $\frac{R_B}{R_A}$ is used to extract the vertex information; finally the $z$-position of all vertex candidates is projected onto a histogram. The highest peak is labeled as the primary vertex (PV).
Left: Vertex histogram obtained from the combinations of PU hits from a collision event of 2011 data. The histogram filled in black (red) is obtained before (after) the ``peak-masking'' phase. The second peak, that is the peak with the maximum number of entries in a 3-bin-wide window, is now clearly visible. Right: Distance in mm between the $z$-position of the PU vertex candidate and the $z$-position of the offline (reconstructed) vertex, for events with at least 2 PU vertices and 2 reconstructed vertices. The histogram is obtained after applying the misalignment corrections to the Pile-Up.
TFC (TTC) system used as a load-balancer. No separate event-builder units – event-building done directly on each trigger farm node. Trigger farm nodes send event-requests to TFC system. TFC system broadcasts IP address to read-out board. Readout boards push data to trigger-farm node. Single stage read-out. Unreliable network protocol. Relies on large buffers in network and some over-provisioning. Typical link-load in DAQ 70 to 80% (for up-links)