


Published on

Published in: Technology


  1. 1. BTeV Trigger BTeV was terminated in February of 2005.
  2. 2. BTeV Trigger Overview <ul><li>Trigger Philosophy: Trigger on characteristics common to all heavy-quark decays, separated production and decay vertices. </li></ul><ul><li>Aim: Reject > 99.9% of background. Keep > 50% of B events. </li></ul><ul><li>The challenge for the BTeV trigger and data acquisition system is to reconstruct particle tracks and interaction vertices in every beam crossing. Looking for topological evidence of B (or D) decay. </li></ul><ul><li>This is feasible for BTeV detector and trigger system, because of </li></ul><ul><ul><li>Pixel detector – low occupancy, excellent spatial resolution, fast readout </li></ul></ul><ul><ul><li>Heavily pipelined and parallel architecture (~5000 processors) </li></ul></ul><ul><ul><li>Sufficient memory to buffer events while awaiting trigger decision </li></ul></ul><ul><ul><li>Rapid development in technology – FPGAs, processors, networking </li></ul></ul><ul><li>3 Levels : </li></ul><ul><ul><li>L1 Vertex trigger (pixels only) + L1 Muon trigger </li></ul></ul><ul><ul><li>L2 Vertex trigger – refined tracking and vertexing </li></ul></ul><ul><ul><li>L3 Full event reconstruction, data compression </li></ul></ul>
  3. 3. BTeV detector (figure labels: 30-station pixel detector; p and p beams)
  4. 4. Si pixel detector (figure labels: multichip module, 5 cm x 1 cm; Si pixel sensors with 50 µm x 400 µm pixels; 5 FPIX ROCs, each 128 rows x 22 columns; 14,080 pixels in total, 128 rows x 110 columns)
  5. 5. L1 vertex trigger algorithm <ul><li>Segment Finder (Pattern Recognition) </li></ul><ul><ul><li>Find beginning and ending segments of tracks from hit clusters in 3 adjacent stations (triplets): </li></ul></ul><ul><ul><ul><li>beginning segments: required to originate from the beam region </li></ul></ul></ul><ul><ul><ul><li>ending segments: required to project out of the pixel detector volume </li></ul></ul></ul><ul><li>Tracking and Vertex Finding </li></ul><ul><ul><li>Match beginning and ending segments found by the FPGA segment finder to form complete tracks. </li></ul></ul><ul><ul><li>Reconstruct primary interaction vertices using complete tracks with p_T < 1.2 GeV/c. </li></ul></ul><ul><ul><li>Find tracks that are “detached” from reconstructed primaries. </li></ul></ul><ul><li>Trigger Decision </li></ul><ul><ul><li>Generate a Level-1 accept if the crossing has two “detached” tracks going into the instrumented arm of the BTeV detector. </li></ul></ul>
  6. 6. BTeV trigger overview (data flow figure): 2.5 MHz crossing rate into Level 1 at 500 GB/s (200 KB/event); L1 rate reduction ~50x; 50 kHz into Level 2/3 at 12.5 GB/s (250 KB/event); L2/3 rate reduction ~20x; 2.5 kHz output at 200 MB/s (250 KB / 3.125 = 80 KB/event)
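The rates and event sizes on the overview slide can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes decimal units (1 KB = 1000 bytes) and pairs each rate with the event size quoted next to it; the names `stages` and `bandwidth` are introduced here and do not come from the slides.

```python
# Cross-check of the quoted trigger rates, event sizes and bandwidths.
# Assumption: decimal units (1 KB = 1e3 bytes, 1 GB = 1e9 bytes).
KB = 1e3

stages = [
    # (stage name, event rate in Hz, event size in bytes)
    ("L1 input", 2.5e6, 200 * KB),
    ("L1 output / L2 input", 50e3, 250 * KB),
    ("L2/3 output", 2.5e3, 250 * KB / 3.125),  # compressed to 80 KB
]


def bandwidth(rate_hz, event_bytes):
    """Return the data rate in bytes per second."""
    return rate_hz * event_bytes


for name, rate, size in stages:
    print(f"{name}: {bandwidth(rate, size) / 1e9:.1f} GB/s")

# Rate reductions implied by the quoted rates.
l1_reduction = stages[0][1] / stages[1][1]
l23_reduction = stages[1][1] / stages[2][1]
print(f"L1 reduction: {l1_reduction:.0f}x, L2/3 reduction: {l23_reduction:.0f}x")
```

Under these assumptions the three products reproduce the quoted 500 GB/s, 12.5 GB/s and 200 MB/s, and the rate ratios reproduce the quoted 50x and 20x reductions.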
  7. 7. Level 1 vertex trigger architecture (figure labels: 30 pixel stations; pixel pre-processors; FPGA segment finders; ~2500-node track/vertex farm; switch (sort by crossing number); MERGE; to Global Level-1 (GL1))
  8. 8. Pixel Preprocessor Counting Room Collision Hall Pixel stations FPIX2 Read-out chip DCB DCB DCB Data combiners Row (7bits) Column (5bits) BCO (8bits) ADC (3bits) sync (1bit) Optical links Pixel processor Pixel processor Pixel processor Pixel processor Optical Receiver Interface Time Stamp Expansion Event sorted by Time and column Hit cluster finder & x-y coordinate translator Level 1 Buffer Interface FPGA segment finder to neighboring FPGA segment finder
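The hit fields listed on this slide (row 7 bits, column 5 bits, BCO 8 bits, ADC 3 bits, sync 1 bit) add up to 24 bits. A minimal sketch of packing and unpacking such a word follows. The field order within the word is an assumption made for illustration, since the slide gives only the widths; `FIELDS`, `pack_hit` and `unpack_hit` are hypothetical names.

```python
# Field widths are taken from the slide; the ordering (most significant
# field first) is an assumption for illustration only.
FIELDS = [("sync", 1), ("bco", 8), ("column", 5), ("row", 7), ("adc", 3)]
WORD_BITS = sum(width for _, width in FIELDS)  # 24 bits in total


def pack_hit(**values):
    """Pack named field values into one integer hit word."""
    word = 0
    for name, width in FIELDS:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_hit(word):
    """Split an integer hit word back into its named fields."""
    fields = {}
    for name, width in reversed(FIELDS):
        fields[name] = word & ((1 << width) - 1)
        word >>= width
    return fields


hit = {"sync": 0, "bco": 200, "column": 21, "row": 127, "adc": 5}
word = pack_hit(**hit)
print(f"{word:#08x}", unpack_hit(word))
```

The widths match the chip geometry quoted earlier in the deck: 7 bits cover 128 rows and 5 bits cover 22 columns per read-out chip.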
  9. 9. The Segment Tracker Architecture <ul><li>Find interior and exterior track segments in parallel in FPGAs. </li></ul><ul><li>The Segment finder algorithm is implemented in VHDL </li></ul>Long doublets Triplets N+1 Short doublets N Short doublets N-1 Short doublets MUX Station N Bend Station N-1 Bend Station N+1 Bend Long doublet projections Triplets projection Station N-1 nonbend Station N nonbend Triplets projection Station N+1 nonbend Triplets projection Short doublet outputs BB33 outputs Station 15 Station 16 Station 17 Station 15 Station 16 Station 17 Bend view Nonbend view 12 half Pixel planes at 12 different Z locations.
  10. 10. L1 Track and Vertex Farm <ul><li>The original baseline of the L1 track and vertex farm used custom-made processor boards based on DSPs or other processors, with an estimated total of 2500 TI DSP 6711 processors. The L1 switch was also custom designed. </li></ul><ul><li>After the DOE CD1 review, BTeV changed the L1 baseline design. </li></ul><ul><ul><li>L1 Switch: commercial off-the-shelf Infiniband switch (or equivalent). </li></ul></ul><ul><ul><li>L1 Farm: array of commodity general-purpose processors, Apple G5 Xserves (or equivalent). </li></ul></ul>
  11. 11. Level 1 Trigger Architecture (New Baseline) 56 inputs at ~45 MB/s each Level 1 switch 33 outputs at ~76 MB/s each 1 Highway Trk/Vtx node #1 Trk/Vtx node #2 Trk/Vtx node #N PTSM network Global Level 1 Level 1 Buffer Track/Vertex Farm 33 “8GHz” Apple Xserve G5’s with dual IBM970’s Infiniband switch Ethernet network Apple Xserve identical to track/vertex nodes
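The input and output link figures for one highway can be checked for consistency: the aggregate rate into the Level 1 switch should roughly equal the aggregate rate out. The sketch below only multiplies the numbers quoted on the slide. Both per-link rates are approximate in the source, so exact agreement is not expected; the function name is introduced here for illustration.

```python
def aggregate_mb_per_s(links, mb_per_s_each):
    """Total bandwidth in MB/s for a set of identical links."""
    return links * mb_per_s_each


switch_in = aggregate_mb_per_s(56, 45)   # 56 inputs at ~45 MB/s each
switch_out = aggregate_mb_per_s(33, 76)  # 33 outputs at ~76 MB/s each
mismatch = abs(switch_in - switch_out) / switch_in

print(f"in:  {switch_in} MB/s")
print(f"out: {switch_out} MB/s")
print(f"relative difference: {mismatch:.2%}")
```

The two totals agree to better than 1%, which is within the rounding of the quoted per-link rates.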
  12. 12. R&D projects <ul><li>Software development for DSP Pre-prototype. </li></ul><ul><li>Level 1 trigger algorithm processing time studies on various processors. </li></ul><ul><ul><li>Part of trigger system R&D for a custom-made Level 1 trigger computing farm. </li></ul></ul><ul><li>StarFabric Switch test and bandwidth measurement. </li></ul><ul><ul><li>R&D for new Level 1 Trigger system baseline design. </li></ul></ul><ul><ul><li>After DOE CD1 review, BTeV collaboration decided to change baseline design of Level 1 trigger system. </li></ul></ul><ul><ul><ul><li>L1 Switch – replace custom switch with Infiniband switch(or equivalent). </li></ul></ul></ul><ul><ul><ul><li>L1 Farm – replace DSP hardware with Apple G5 Xserves (or equivalent). </li></ul></ul></ul><ul><li>Pixel Preprocessor of Level 1 trigger system. </li></ul><ul><ul><li>Clustering algorithm and firmware development. </li></ul></ul>
  13. 13. DSP Pre-prototype main goals <ul><li>Investigate current DSP hardware and software to determine technical choices for the baseline design. </li></ul><ul><li>Study I/O data flow strategies. </li></ul><ul><li>Study control and monitoring techniques. </li></ul><ul><li>Study FPGA firmware algorithms and simulation tools. </li></ul><ul><ul><li>Understand major blocks needed. </li></ul></ul><ul><ul><li>Estimate logic size and achievable data bandwidths. </li></ul></ul><ul><li>Measure internal data transfer rates, latencies, and software overheads between processing nodes. </li></ul><ul><li>Provide a platform to run DSP fault-tolerant routines. </li></ul><ul><li>Provide a platform to run trigger algorithms. </li></ul>
  14. 14. Features of DSP Pre-prototype Board <ul><li>Four DSP mezzanine cards on the board, so that different TI DSPs can be tested for comparison. </li></ul><ul><li>The FPGA Data I/O Manager provides two-way data buffering. It connects the PCI Test Adapter (PTA) card to each DSP. </li></ul><ul><li>Two Arcnet Network ports. </li></ul><ul><ul><li>Port I is the PTSM (Pixel Trigger Supervise Monitor). </li></ul></ul><ul><ul><li>Port II is the Global Level 1 result port. </li></ul></ul><ul><ul><li>Each Network port is managed by a Hitachi microcontroller. </li></ul></ul><ul><ul><li>The PTSM microcontroller communicates with the DSPs via the DSP Host Interface to perform initialization and issue commands. </li></ul></ul><ul><ul><li>The GL1 microcontroller receives trigger results via each DSP’s Buffered Serial Port (BSP). </li></ul></ul><ul><li>Compact Flash Card to store DSP software and parameters. </li></ul><ul><li>Multiple JTAG ports for debugging and initial startup. </li></ul><ul><li>Operator LEDs. </li></ul>
  15. 15. L1 trigger 4-DSP prototype board
  16. 16. Level 1 Pixel Trigger Test Stand for the DSP pre-prototype Xilinx programming cable PTA+PMC card ARCnet card TI DSP JTAG emulator DSP daughter card
  17. 17. DSP Pre-prototype Software (1) <ul><li>PTSM task on the Hitachi PTSM microcontroller. </li></ul><ul><ul><li>System initialization. Kernel and DSP application downloading. </li></ul></ul><ul><ul><li>Command parsing and distribution to subsystems. </li></ul></ul><ul><ul><li>Error handling and reporting. </li></ul></ul><ul><ul><li>Hardware and software status reporting. </li></ul></ul><ul><ul><li>Diagnostics and testing functions. </li></ul></ul><ul><li>GL1 task on the Hitachi GL1 microcontroller. </li></ul><ul><ul><li>Receives the trigger results from the DSPs and sends them to the GL1 host computer. </li></ul></ul><ul><li>Hitachi Microcontroller API. A library of low-level C routines has been developed to support many low-level functions. </li></ul><ul><ul><li>ArcNet network driver. </li></ul></ul><ul><ul><li>Compact Flash API. Supports the FAT16 file system. </li></ul></ul><ul><ul><li>LCD API. Displays messages on the on-board LCD. </li></ul></ul><ul><ul><li>Serial Port API </li></ul></ul><ul><ul><li>JTAG API </li></ul></ul><ul><ul><li>One Wire API </li></ul></ul><ul><ul><li>DSP Interface API. Boots and resets DSPs; accesses memory and registers on the DSPs. </li></ul></ul>
  18. 18. DSP Pre-prototype Software(2) <ul><li>Host computer software. </li></ul><ul><ul><li>PTSM Menu-driven interface. </li></ul></ul><ul><ul><li>GL1 message receiving and displaying. </li></ul></ul><ul><li>Custom defined protocol built on the lowest level of ArcNet network driver. Most efficient without standard protocol overhead. </li></ul>
  19. 19. Processor evaluation <ul><li>We continued to measure Level 1 trigger algorithm processing time on various new processors. </li></ul><ul><li>MIPS RM9000x2 processor. Jaguar-ATX evaluation board. </li></ul><ul><ul><li>Time studies on Linux 2.4 </li></ul></ul><ul><ul><li>Time studies on standalone. Compiler MIPS SDE Lite 5.03.06. </li></ul></ul><ul><ul><li>System (Linux) overhead for processing time is about 14%. </li></ul></ul><ul><li>PowerPC 7447 (G4) and PowerPC 8540 PowerQuiccIII (G5). </li></ul><ul><ul><li>GDA Tech PMC8540 eval card and Motorola Sandpoint eval board with PMC7447A. </li></ul></ul><ul><ul><li>Green Hills Multi2000 IDE with Green Hills probe for standalone testing. </li></ul></ul>Green Hills Probe 8540 eval board
  20. 20. Candidate processors for Level 1 Farm (processor: L1 algorithm processing time). TI TMS320C6711 (baseline): 1,571 us, provided for comparison. PMC Sierra MIPS RM9000x2: 341 us (600 MHz, MIPS SDE Lite 5.03.06). Motorola 8540 PQIII PPC: 271 us (660 MHz, GHS MULTI 2K 4.01). Motorola 74xx G4 PPC: 195 us (1 GHz 7455, Apple PowerMac G4). Motorola 7447A G4 PPC: 121 us (1.4 GHz, GHS MULTI 2K 4.01). Intel Pentium 4/Xeon: 117 us (2.4 GHz Xeon). IBM 970 PPC: 74 us (2.0 GHz Apple PowerMac G5). The Intel and IBM 970 processors are suited for an off-the-shelf solution using desktop PCs (or G5 servers) for the computing farm.
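To compare the candidates, the quoted processing times can be turned into speedups relative to the TI TMS320C6711 baseline. The sketch below is illustrative: the pairing of each time with a processor follows one reading of the flattened slide table, and the dictionary keys are labels made up here for readability.

```python
# L1 algorithm processing times in microseconds, as read from the slide.
# The time-to-processor pairing is an interpretation of the flattened table.
times_us = {
    "TI TMS320C6711 (baseline)": 1571,
    "PMC Sierra MIPS RM9000x2 (600 MHz)": 341,
    "Motorola 8540 PQIII PPC (660 MHz)": 271,
    "Motorola 7455 G4 PPC (1 GHz)": 195,
    "Motorola 7447A G4 PPC (1.4 GHz)": 121,
    "Intel Xeon (2.4 GHz)": 117,
    "IBM 970 PPC (2.0 GHz)": 74,
}

baseline = times_us["TI TMS320C6711 (baseline)"]
speedup = {name: baseline / t for name, t in times_us.items()}
fastest = min(times_us, key=times_us.get)

# Print the candidates from slowest to fastest.
for name, factor in sorted(speedup.items(), key=lambda item: item[1]):
    print(f"{name:38s} {times_us[name]:5d} us  {factor:5.1f}x")
```

On these numbers the 2.0 GHz IBM 970 is roughly 21 times faster than the DSP baseline per event, which is consistent with the new baseline farm needing far fewer nodes than the estimated 2500 DSPs.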
  21. 21. StarFabric Switch Testing and Bandwidth Measurement <ul><li>In the new baseline design of the BTeV Level 1 trigger system, a commercial, off-the-shelf switch will be used for the event builder. </li></ul><ul><li>Two commercial switch technologies were tested: Infiniband (by Fermilab) and StarFabric (by the IIT group with Fermilab). </li></ul><ul><li>Hardware setup for StarFabric switch testing. </li></ul><ul><ul><li>PC with PCI bus 32/33. </li></ul></ul><ul><ul><li>StarFabric adapter, StarGen 2010. </li></ul></ul><ul><ul><li>StarFabric switch, StarGen 1010. </li></ul></ul><ul><li>Software </li></ul><ul><ul><li>StarFabric Windows driver. </li></ul></ul>Test stand: P4/W2k SG2010 PCI 32/33; SG1010; Athlon/XP SG2010 PCI 32/33
  22. 22. L1 Switch Bandwidth Measurement <ul><li>The measured StarFabric bandwidth is between 74 and 84 MB/s for packet sizes of 1 kB to 8 kB. This does not meet the bandwidth requirement of the event builder. </li></ul><ul><li>A simple way to improve performance is to use PCI-X (32/66 or 64/66). The Infiniband test stand uses PCI-X adapters in the input/output computer nodes. </li></ul><ul><li>Based on this result and other considerations, Infiniband was chosen for the new baseline design of the Level 1 trigger system. However, we are still looking at StarFabric and other possible switch fabrics. </li></ul>Plot labels: 167 MB/s bandwidth target at peak luminosity (average of 6 interactions per BCO), with 50% excess capacity; Infiniband; StarFabric
  23. 23. Pixel Preprocessor Optical Receiver Interface Time Stamp Expansion Event sorted by Time and column Hit cluster finder & x-y coordinate translator Segment Trackers Pixel Detector Front-end Level 1 Buffer Interface DAQ 56 inputs at ~ 45 MB/s each Level 1 switch 33 outputs at ~76 MB/s each Infiniband switch 30 station pixel detector PP&ST Segment Tracker Nodes
  24. 24. Row and Column Clustering <ul><li>A track can hit more than one pixel due to charge sharing. </li></ul><ul><li>One function of the pixel preprocessor is to find adjacent pixel hits, group them into a cluster, and calculate the x-y coordinates of the cluster. </li></ul><ul><li>Adjacent hits in the same row form a row cluster. </li></ul><ul><li>Two overlapping row clusters in adjacent columns form a cross-column cluster. </li></ul>Pixel Chip
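The two clustering steps can be sketched in software. This is a minimal illustration, not the FPGA firmware. It assumes that a row cluster is a run of consecutive row numbers within one column (suggested by the "starting/ending row numbers" wording on the cluster finder slide), that overlapping clusters are chained across more than two columns, and that the cluster coordinate is an unweighted centroid in pixel units. All function names are hypothetical.

```python
def row_clusters(hits):
    """Group hits (column, row) into runs of consecutive rows per column.

    Returns a list of (column, first_row, last_row), sorted by column.
    """
    clusters = []
    for col, row in sorted(set(hits)):
        if clusters and clusters[-1][0] == col and row == clusters[-1][2] + 1:
            clusters[-1] = (col, clusters[-1][1], row)  # extend current run
        else:
            clusters.append((col, row, row))            # start a new run
    return clusters


def cross_column_clusters(rclusters):
    """Merge row clusters in adjacent columns whose row ranges overlap."""
    groups = []
    for col, lo, hi in rclusters:
        linked = [g for g in groups
                  if any(c == col - 1 and l <= hi and lo <= h for c, l, h in g)]
        merged = [(col, lo, hi)]
        for g in linked:
            merged = g + merged
            groups.remove(g)
        groups.append(merged)
    return groups


def centroid(group):
    """Unweighted (column, row) centroid of a cluster, in pixel units."""
    n = sum(h - l + 1 for _, l, h in group)
    col = sum(c * (h - l + 1) for c, l, h in group) / n
    row = sum((l + h) / 2 * (h - l + 1) for _, l, h in group) / n
    return col, row


hits = [(3, 10), (3, 11), (4, 11), (4, 12), (7, 50)]
for group in cross_column_clusters(row_clusters(hits)):
    print(group, centroid(group))
```

In the example, the hits in columns 3 and 4 share row 11 and merge into one cross-column cluster, while the isolated hit in column 7 stays a single-pixel cluster.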
  25. 25. Cluster Finder Block Diagram <ul><li>The order of input hits in a row is defined. However, the column order is not. </li></ul><ul><li>The hash sorter is used to produce a defined column order. </li></ul><ul><li>The row cluster processor identifies adjacent hits in a row and passes the starting/ending row numbers to the next stage. </li></ul><ul><li>The cross-column processor groups overlapping hits (or clusters) in adjacent columns together. </li></ul><ul><li>Cluster parameters are calculated in the cluster parameter calculator. </li></ul>Hit Input Row Cluster Processor: Cross-Row Clusters FIFO Hash Sorter: Column Ordering Cross-Column Processor Cross-Col. Clusters Col N Col N-1 Cluster Parameters Calculator Cluster
  26. 26. Implementation for Cross-Column Cluster Finder State Control Hits Cross-row headers Col. A Col. B Cross-column headers Hits Col. B Col. A The cluster in Col. A is a single column one and is popped out. The two clusters form a cross-column one and are popped out. Col. B Col. A If Col. B is not next to the Col. A, entire Col. A is popped out. FIFO2 FIFO1 The cluster in Col. B is not connected with Col. A and is filled into FIFO2.
  27. 27. Implementation for Cross-Column Cluster Finder (cont’d) <ul><li>The cross-column cluster finder firmware is written in VHDL. </li></ul>Flow chart labels: Fill Col. A; Col. B = Col. A + 1? (Y/N); Pop Col. A; Fill Col. B; comparison outcomes (1) u_AN < u_B1, (2) u_A1 > u_BN, or neither; actions Fill B, Pop A, Pop A and Pop B; State Control; Col. A; Col. B; FIFO1; FIFO2
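The three-way comparison in the flow chart can be written as a small decision function. This is a sketch under stated assumptions, not the VHDL state machine: it takes u_A1 and u_AN to be the first and last row of the current cluster in Col. A (likewise u_B1 and u_BN for Col. B), and it assigns an action to each condition by matching the flow chart against the captions on the previous slide. That assignment, and the name `next_action`, are inferences made here.

```python
def next_action(a, b):
    """Decide what to do with the head clusters of adjacent columns A and B.

    a and b are (first_row, last_row) tuples. The mapping of conditions to
    actions is an interpretation of the flattened flow chart.
    """
    a_first, a_last = a
    b_first, b_last = b
    if a_last < b_first:
        # Condition (1), u_AN < u_B1: A ends before B starts, so A is a
        # single-column cluster and is popped out.
        return "pop A"
    if a_first > b_last:
        # Condition (2), u_A1 > u_BN: B is not connected with A and is
        # filled into FIFO2.
        return "fill B"
    # Neither condition holds: the row ranges overlap, so the two clusters
    # form a cross-column cluster and both are popped out.
    return "pop A and pop B"


print(next_action((10, 11), (11, 12)))  # overlapping ranges
print(next_action((1, 2), (5, 6)))      # A entirely before B
print(next_action((8, 9), (2, 3)))      # A entirely after B
```

The case where Col. B is not the column next to Col. A is handled before this comparison in the flow chart: the whole of Col. A is popped out.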
  28. 28. BES-II DAQ System BES experiment upgraded its detector and DAQ system in 1997.
  29. 29. Beijing Spectrometer
  30. 30. Performance of BES-II and BES-I (subsystem, variable: BES-II value; BES-I value). MDC, σP/P: 1.78%(1+P^2)^1/2; 1.76%(1+P^2)^1/2. MDC, σxy: 198-224 µm; 200-250 µm. MDC, dE/dx: 8.0%; 7.8%. VC, σxy: 90 µm; 220 µm. TOF, σT: 180 ps; 375 ps. SC, energy resolution: 21% E^-1/2; 24.4% E^-1/2. MUON, σZ: 7.9 cm (layer 1), 10.6 cm (layer 2), 13.2 cm (layer 3). DAQ, dead time: 10 ms; 20 ms.
  31. 31. BES-II DAQ System <ul><li>Front-end electronics for all subsystems except the VC consist of CAMAC BADCs (Brilliant ADCs). </li></ul><ul><li>VCBD, VME CAMAC Branch Driver. Reads the data of one detector subsystem and stores it in the local buffer. </li></ul><ul><li>Two VME CPU modules with RT OS VMEexec. </li></ul><ul><ul><li>One for data acquisition and event building. </li></ul></ul><ul><ul><li>The other for event logging to tape and sending a fraction of events to the Alpha 3600. </li></ul></ul><ul><li>DEC Alpha 3600 machine. </li></ul><ul><ul><li>DAQ control console. </li></ul></ul><ul><ul><li>Status/error report. </li></ul></ul><ul><ul><li>Online data analysis and display. </li></ul></ul><ul><ul><li>Communication with BEPC control machines to obtain BEPC status parameters. </li></ul></ul><ul><li>The system dead time: 10 ms. </li></ul><ul><ul><li>BADC conversion: 6 ms. </li></ul></ul><ul><ul><li>VCBD readout: 3 ms. </li></ul></ul>
  32. 32. Fastbus subsystem for Vertex Chamber <ul><li>One Fastbus crate for 640 VC channels. </li></ul><ul><li>Fastbus logical board. </li></ul><ul><ul><li>Distributes all signals to the TDCs: common stop, reset (fast clear). </li></ul></ul><ul><ul><li>Produces internal start and stop test pulses. </li></ul></ul><ul><ul><li>A good-event signal tells the 1821 to read data from the 1879. </li></ul></ul>
  33. 33. Microcode for the 1821 <ul><li>Initialization for the 1879. </li></ul><ul><ul><li>TDC scale: 1 us. </li></ul></ul><ul><ul><li>Compact parameter: 10 ns. </li></ul></ul><ul><ul><li>Active Time Interval: 512 bins. </li></ul></ul><ul><li>Read out 1879 data into the data memory of the 1821. </li></ul><ul><ul><li>Block transfer. </li></ul></ul><ul><ul><li>Sparse data scan method. Only TDC modules containing data are read out. </li></ul></ul><ul><li>Send data ready signal (interrupt) to VME. </li></ul><ul><li>SONIC language. Symbolic Macro Assembler. Converted to microcode under LIFT. </li></ul><ul><li>LIFT (LeCroy Interactive Fastbus Toolkit). Tool for developing microcode and testing the FB system on a PC. </li></ul>
  34. 34. VC DAQ Software in VME <ul><li>A task running in VME 162. </li></ul><ul><li>Controlled by the BES-II DAQ main task through message queues. </li></ul><ul><li>Downloads the microcode into the 1821. </li></ul><ul><li>Controls the procedure of VC data taking. </li></ul><ul><li>Reads out time data from the 1821 into the 1131 data memory after receiving the interrupt signal. </li></ul><ul><li>Data transfer modes: </li></ul><ul><ul><li>High 16-bit: DMA. </li></ul></ul><ul><ul><li>Low 16-bit: word by word. </li></ul></ul><ul><li>Measured transfer rate. </li></ul><ul><ul><li>96 (chans) x 7 (modules) x 2 (both edges) + 3 (marks) = 1347 32-bit words. </li></ul></ul><ul><ul><li>High 16-bit: DMA: 1.1 ms @ VME 162. </li></ul></ul><ul><ul><li>Low 16-bit: word by word: 3.5 ms @ VME 162. </li></ul></ul>
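The word count and the two measured transfer times can be turned into effective throughput figures. The sketch below is illustrative and rests on one assumption: "High 16-bit" and "Low 16-bit" are taken to mean the upper and lower halves of each 32-bit word, so each mode moves 2 bytes per word. The constant and function names are introduced here.

```python
CHANNELS_PER_MODULE = 96
MODULES = 7
EDGES = 2          # both edges are recorded
MARKER_WORDS = 3

# Maximum event size in 32-bit words, as computed on the slide.
words = CHANNELS_PER_MODULE * MODULES * EDGES + MARKER_WORDS

# Assumption: each transfer mode moves one 16-bit half of every word.
half_bytes = words * 2


def rate_mb_per_s(n_bytes, seconds):
    """Effective throughput in MB/s (1 MB = 1e6 bytes)."""
    return n_bytes / seconds / 1e6


dma_rate = rate_mb_per_s(half_bytes, 1.1e-3)   # high half, DMA
pio_rate = rate_mb_per_s(half_bytes, 3.5e-3)   # low half, word by word

print(f"{words} words per event")
print(f"DMA:          {dma_rate:.2f} MB/s")
print(f"word by word: {pio_rate:.2f} MB/s")
```

Under this reading, DMA is about three times faster than the word-by-word transfer. The 7 modules of 96 channels give 672 TDC channels, enough for the 640 VC channels.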
  35. 35. End The End
  36. 36. Backup slides Backups
  37. 37. BTeV trigger architecture
  38. 38. L1 Highway Bandwidth Estimates Bandwidth estimates are for 6 interactions/crossing & include 50 % excess capacity DAQ Highway Switch