The document summarizes the software development for the new data acquisition system of the COMPASS experiment at CERN. It describes the existing DAQ system and its limitations. The new system replaces readout buffers and event builders with custom FPGA-based hardware for improved performance and reliability. It discusses the hardware architecture, software requirements, and layered software architecture based on master and slave processes. Preliminary performance and stability tests of the system show promising results for deployment in 2014.
Abstract— During the past year Xilinx, for the first time, set out to quantify the soft error rate of a multi-core microprocessor. This work builds on Xilinx's 10+ years of heritage in FPGA radiation testing. Built on the 28 nanometer technology node, Xilinx's Zynq™ family of devices integrates a processor subsystem with programmable logic. The processor subsystem includes two 32-bit ARM Cortex™-A9 CPUs, two NEON™ floating-point units, two SIMD processing units, L1 and L2 caches, on-chip SRAM, and various peripherals. The programmable logic is directly connected to the processing subsystem via ARM's AMBA™ 4 AXI interface. This programmable logic is based on the 7 Series FPGA fabric, consisting of 6-input LUTs and DFFs along with Block RAM, DSP slices, multi-gigabit transceivers, and other blocks. Tests were performed using a proton beam to analyze the soft error susceptibility of the new device. Proton beam testing was deemed acceptable because previous neutron and proton beam testing had shown virtually identical cross-sections for 7 Series programmable logic. The results are promising and yield a solid baseline for a typical embedded application targeting any of the Zynq SoC devices. As a foray into processor testing, this Zynq work has laid a solid foundation for future Xilinx SoC test campaigns.
Austin Lesea, Wojciech Koszek, Glenn Steiner, Gary Swift, and Dagan White Xilinx, Inc.
Paper: SELSE 2014 @ Stanford University (PDF, 456KB), 2014
Slides: (PDF, 933KB), 2014
Challenges in Assessing Single Event Upset Impact on Processor Systems - Wojciech Koszek
Abstract—This paper presents a test methodology developed at Xilinx for real-time soft-error rate testing as well as the software framework in which Device-Under-Test (DUT) and controlling computer are both synchronized with the proton beam controls and run experiments automatically in a predictable manner. The method presented has been successfully used for Zynq®-7000 All Programmable SoC testing at the UC Davis Crocker Nuclear Lab. Presented are the issues and challenges encountered during design and implementation of the framework, as well as lessons learned from the in-house experiments and bootstrapping tests performed with Thorium Foil. The method presented has helped Xilinx to deliver high-quality experimental data and to optimize time spent in the testing facility.
Keywords—Error detection, soft error, architectural vulnerability, statistical error, confidence level, beam facility control
The document discusses the development and testing of a centralized fault identification and location (CFL) system for a medium voltage direct current shipboard power system. Key points:
1) A CFL system was modeled to identify faults within 8 ms as required for the power system.
2) Testing of the CFL system demonstrated fault detection times of around 300 microseconds for different system configurations and fault conditions.
3) Performance models were developed to analyze how factors like topology, bandwidth, noise and others affect the CFL system and scaling.
Bridging the gap between hardware and software tracing - Christian Babeux
For a number of years, silicon vendors have been providing hardware tracing facilities to embedded developers. By using these, developers can resolve performance and latency issues more quickly, resulting in shorter time to market. In this talk, we will cover the hardware-based tracing facilities offered by various manufacturers and see how they differ from their software counterparts with respect to their instrumentation capabilities, transport mechanisms, output formats, etc. We will also show how joint hardware and software tracing can be used by developers to gain deeper insights into their applications’ behaviour. Finally, we will outline the on-going work within the Linux Trace Toolkit next generation (LTTng) project to enhance hardware tracing support and tracing data visualization.
Here are some useful GDB commands for debugging (a short example program for trying them out follows the list):
- break <function> - Set a breakpoint at a function
- break <file:line> - Set a breakpoint at a line in a file
- run - Start program execution
- next/n - Step over to next line, stepping over function calls
- step/s - Step into function calls
- finish - Step out of current function
- print/p <variable> - Print value of a variable
- backtrace/bt - Print the call stack
- info breakpoints/ib - List breakpoints
- delete <breakpoint#> - Delete a breakpoint
- layout src - Switch layout to source code view
- layout asm - Switch layout to assembly view
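These commands can be exercised on any small program compiled with debug information (gcc -g). The file below is a hypothetical example; the comments indicate where each command from the list is typically used:

/* demo.c - hypothetical program for trying the GDB commands above.
   Build with: gcc -g -o demo demo.c, then start: gdb ./demo */
#include <stdio.h>

int square(int x)            /* "break square" stops on entry here */
{
    int y = x * x;           /* "print x" and "print y" inspect locals */
    return y;                /* "finish" returns to the caller */
}

int main(void)
{
    int result = square(7);  /* "step" enters square(), "next" steps over it */
    printf("%d\n", result);  /* "backtrace" here shows only main */
    return 0;
}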
SAND: A Fault-Tolerant Streaming Architecture for Network Traffic Analytics - Qin Liu
SAND is a fault-tolerant streaming architecture designed for real-time network traffic analytics. It uses a C++ implementation to achieve processing speeds that Java-based systems like Storm cannot match. SAND introduces a new fault tolerance scheme using a checkpointing protocol that allows workers to reliably recover state after failures while providing correct results. Evaluation shows SAND can process over 9.6 million packets per second, outperforming alternatives by 3.7-37x, and reliably recover from failures within seconds.
1. The document describes an MMAP failure occurring occasionally with a DPDK secondary application.
2. Address Space Layout Randomization (ASLR) can interfere with shared memory mappings between primary and secondary DPDK processes. Disabling ASLR may resolve MMAP failures.
3. Providing a fixed base virtual address with the "--base-virtaddr" option can ensure primary and secondary applications mmap shared memory at the same locations if ASLR is enabled.
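As a concrete sketch of point 3, a secondary process can pass the documented --proc-type and --base-virtaddr EAL options to rte_eal_init(); the program name and address value below are only examples, and the primary must be started with the same --base-virtaddr:

/* Sketch: DPDK secondary process requesting a fixed mapping base.
   Assumes DPDK headers and an already-running primary process. */
#include <rte_eal.h>

int main(void)
{
    char *eal_argv[] = {
        "secondary_app",               /* program name (hypothetical) */
        "--proc-type=secondary",       /* attach to the primary's hugepage memory */
        "--base-virtaddr=0x100000000", /* same fixed base as the primary (example value) */
    };
    int eal_argc = sizeof(eal_argv) / sizeof(eal_argv[0]);

    /* rte_eal_init() returns < 0 if the shared memory cannot be mapped
       at the expected addresses, e.g. because ASLR moved them. */
    if (rte_eal_init(eal_argc, eal_argv) < 0)
        return 1;
    return 0;
}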
In the last few years, the energy efficiency of large-scale infrastructures has gained a lot of attention, as power consumption has become one of the most significant factors in the operating costs of a data center and in its Total Cost of Ownership. Power consumption can be observed at different layers of the data center: from the overall power grid, to each rack, down to each machine and system. Given the rise of application containers in cloud computing, it becomes increasingly important to measure power consumption at the application level as well, where power-aware schedulers and orchestrators can optimize the execution of workloads not only from a performance perspective but also considering performance/power trade-offs. DEEP-mon is a novel monitoring tool able to measure power consumption and attribute it to each thread and application container running in the system, without any prior knowledge of the application's characteristics and without any kind of workload instrumentation. DEEP-mon aggregates data for threads, application containers, and hosts with a negligible impact on the monitored system and on the running workloads.
Information obtained with DEEP-mon opens the way for a wide set of applications exploiting the capabilities offered by the monitoring tool, from power (and hence cost) metering of new software components deployed in the data center, to fine-grained power capping and power-aware scheduling and co-location.
DPDK in depth
This document provides an overview of DPDK (Data Plane Development Kit):
1. DPDK is an open source project for data plane programming and network acceleration. It started at Intel in 2010 and is now maintained by the Linux Foundation.
2. DPDK provides poll mode drivers (PMDs), libraries, and sample applications for fast packet processing. It uses hugepages and avoids kernel involvement for high performance (a minimal forwarding-loop sketch follows this list).
3. The document outlines several DPDK projects, libraries, PMDs, advantages and disadvantages, development process, and demonstrates a simple DPDK application (l2fwd) and the testpmd tool.
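To make the poll-mode approach concrete, here is a minimal sketch of the receive/forward loop at the core of an l2fwd-style application, using DPDK's documented rte_eth_rx_burst/rte_eth_tx_burst calls; port setup and error handling are omitted, and the queue id and burst size are illustrative:

/* Sketch of an l2fwd-style forwarding loop. Assumes both ports are
   already configured and started via the usual ethdev setup calls. */
#include <rte_ethdev.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

static void forward_loop(uint16_t rx_port, uint16_t tx_port)
{
    struct rte_mbuf *bufs[BURST_SIZE];

    for (;;) {
        /* Poll the NIC directly from userspace: no interrupts, no syscalls. */
        uint16_t nb_rx = rte_eth_rx_burst(rx_port, 0, bufs, BURST_SIZE);
        if (nb_rx == 0)
            continue;

        uint16_t nb_tx = rte_eth_tx_burst(tx_port, 0, bufs, nb_rx);

        /* Free any packets the TX queue could not accept. */
        for (uint16_t i = nb_tx; i < nb_rx; i++)
            rte_pktmbuf_free(bufs[i]);
    }
}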
This document discusses performance optimization for data centers on multi-core platforms and provides a case study analysis. It introduces Intel software tuning tools, describes a methodology for data center performance tuning involving system, application, and microarchitecture levels, and analyzes a case study where thread synchronization overhead was identified and reduced through the use of NPTL in Linux, improving CPU utilization and throughput.
PMIx enables dynamic coordination between runtime systems like MPI and OpenMP. Geoffroy Vallee discussed using PMIx to coordinate MPI and OpenMP runtimes for improved resource utilization. Specifically, PMIx could be used to explicitly place MPI ranks and OpenMP threads, allowing dynamic adjustment of the placement layout between application phases. This coordination could optimize application performance on modern complex multicore and manycore architectures.
This document discusses using static tracing and dynamic probes to debug DPDK applications. It provides an overview of tools like DPDK-PROCINFO, DPDK-PDUMP, LTTNG, and user probes. Screenshots demonstrate using eBPF binaries with DPDK to enable dynamic debugging of applications in production environments where other debugging techniques are not possible. Future areas to explore include developing user probes similar to dynamic tracing and integrating eBPF event data with tools like VTune.
This document discusses various Linux debugging tools including:
1. SIMD, cache monitoring, firmware checks, NUMA memory, interrupts using tools like lstopo, ethtool, lspci, and lshw.
2. Using GDB for debugging with features like breakpoints, disassembly, and core file generation.
3. Tools like strace, ltrace, nm, objdump, and readelf for system call tracing, library call tracing, symbol tables, and object file analysis.
4. Techniques like LD_PRELOAD, ulimit, and perf for custom debugging and performance analysis.
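To illustrate the LD_PRELOAD technique from point 4, here is a hypothetical interposer that counts malloc() calls; it is a generic sketch, not taken from the document (build with gcc -shared -fPIC -o count.so count.c -ldl, then run LD_PRELOAD=./count.so ./app):

/* Hypothetical LD_PRELOAD interposer: counts calls to malloc().
   dlsym(RTLD_NEXT, ...) resolves the next (real) malloc in link order. */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <stdlib.h>

static void *(*real_malloc)(size_t);
static unsigned long calls;

void *malloc(size_t size)
{
    if (!real_malloc)
        real_malloc = (void *(*)(size_t))dlsym(RTLD_NEXT, "malloc");
    calls++;
    return real_malloc(size);
}

__attribute__((destructor))
static void report(void)
{
    fprintf(stderr, "malloc called %lu times\n", calls);
}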
Altreonic was spun off in 2008 from Eonic Systems to focus on real-time operating systems using formal techniques. Their OpenComRTOS is a small, network-centric real-time OS that uses CSP concurrency and can scale from 1 to over 10,000 nodes. It provides priority-based communication and fault tolerance and has been implemented on many heterogeneous platforms from DSPs to many-core systems.
eBPF and IO Visor: The What, How, and What Next! - Affan Syed
Extended BPF (eBPF) provides a mechanism for running custom programs inside the Linux kernel that can be used for filtering network packets, monitoring system activity, and more. eBPF programs are written in a restricted subset of C and compiled to bytecode that is verified by the kernel for safety before being run. The BCC toolkit makes it easier to write and load eBPF programs. The IO Visor project aims to further develop eBPF and provide tools and use cases for networking, security, and system tracing applications.
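For illustration, the sketch below shows what such a restricted-C eBPF program can look like in BCC: it counts clone() calls per process in a BPF map. The names are made up; BPF_HASH and the kprobe attachment are BCC conventions (from the Python side one would attach trace_clone to the clone syscall with attach_kprobe):

// count_clone.c - hypothetical BCC-style eBPF program (restricted C).
// Counts clone() calls per PID in a kernel map readable from userspace.
#include <uapi/linux/ptrace.h>

BPF_HASH(counts, u32, u64);                       // map: PID -> call count

int trace_clone(struct pt_regs *ctx)
{
    u32 pid = bpf_get_current_pid_tgid() >> 32;   // upper 32 bits hold the tgid
    u64 zero = 0, *val;

    val = counts.lookup_or_try_init(&pid, &zero); // create-or-fetch the counter
    if (val)                                      // null check keeps the verifier happy
        (*val)++;
    return 0;
}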
DPDK is a set of drivers and libraries that allow applications to bypass the Linux kernel and access network interface cards directly for very high performance packet processing. It is commonly used for software routers, switches, and other network applications. DPDK can achieve over 11 times higher packet forwarding rates than applications using the Linux kernel network stack alone. While it provides best-in-class performance, DPDK also has disadvantages like reduced security and isolation from standard Linux services.
1. DPDK achieves high throughput packet processing on commodity hardware by reducing kernel overhead through techniques like polling, huge pages, and userspace drivers.
2. In Linux, packet processing involves expensive operations like system calls, interrupts, and data copying between kernel and userspace. DPDK avoids these by doing all packet processing in userspace.
3. DPDK uses techniques like isolating cores for packet I/O threads, lockless ring buffers, and NUMA awareness to further optimize performance. It can achieve throughput of over 14 million packets per second on 10GbE interfaces.
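As a sketch of the lockless ring buffers mentioned in point 3, two cores can hand packets off through DPDK's rte_ring; the ring name, size, and single-producer/single-consumer flags below are illustrative:

/* Sketch: lockless handoff between an I/O core and a worker core.
   SP/SC mode (one producer, one consumer) keeps the fast path cheap. */
#include <rte_ring.h>
#include <rte_mbuf.h>
#include <rte_lcore.h>

static struct rte_ring *ring;

void setup_ring(void)
{
    /* Ring size must be a power of two. */
    ring = rte_ring_create("io_to_worker", 1024, rte_socket_id(),
                           RING_F_SP_ENQ | RING_F_SC_DEQ);
}

/* I/O core: pass one received packet to the worker, lock-free. */
int handoff(struct rte_mbuf *pkt)
{
    return rte_ring_enqueue(ring, pkt);   /* returns 0 on success */
}

/* Worker core: fetch the next packet, if any. */
struct rte_mbuf *next_packet(void)
{
    void *pkt = NULL;
    if (rte_ring_dequeue(ring, &pkt) == 0)
        return pkt;
    return NULL;
}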
Achieving Performance Isolation with Lightweight Co-Kernels - Jiannan Ouyang, PhD
These slides were presented at the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15).
Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS co-kernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
The QuantStudio 12K Flex Real-Time PCR System is an all-in-one instrument designed for high throughput real-time PCR. It has five interchangeable blocks that allow it to analyze from 1 to over 12,000 samples in a single run. The OpenArray format can analyze up to four plates with 3,072 reactions each, generating over 12,000 data points per run. The system offers flexibility in sample volume and format, easy block changes, and software for integrated analysis and sample tracking from loading to results.
This document provides an introduction to eBPF and XDP. It discusses the history of BPF and how it evolved into eBPF. Key aspects of eBPF covered include the instruction set, JIT compilation, verifier, helper functions, and maps. XDP is introduced as a way to program the data plane using eBPF programs attached early in the receive path. Example use cases and performance benchmarks for XDP are also mentioned.
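To give a flavor of the XDP programming model described above, here is a minimal hypothetical XDP program that drops IPv6 frames at the earliest point of the receive path and passes everything else; it uses the standard xdp_md context and the SEC/bpf_htons conventions from libbpf:

// xdp_drop6.c - hypothetical minimal XDP program.
// Compile with clang -O2 -target bpf and attach to a NIC, e.g.:
//   ip link set dev eth0 xdp obj xdp_drop6.o sec xdp
#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int drop_ipv6(struct xdp_md *ctx)
{
    void *data = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Bounds check required by the eBPF verifier. */
    if ((void *)(eth + 1) > data_end)
        return XDP_PASS;

    if (eth->h_proto == bpf_htons(ETH_P_IPV6))
        return XDP_DROP;

    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";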
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUs - Jiannan Ouyang, PhD
These slides were presented at the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE'16).
Virtual Machine based approaches to workload consolidation, as seen in IaaS cloud as well as datacenter platforms, have long had to contend with performance degradation caused by synchronization primitives inside the guest environments. These primitives can be affected by virtual CPU preemptions by the host scheduler that can introduce delays that are orders of magnitude longer than those primitives were designed for. While a significant amount of work has focused on the behavior of spinlock primitives as a source of these performance issues, spinlocks do not represent the entirety of synchronization mechanisms that are susceptible to scheduling issues when running in a virtualized environment. In this paper we address the virtualized performance issues introduced by TLB shootdown operations. Our profiling study, based on the PARSEC benchmark suite, has shown that up to 64% of a VM's CPU time can be spent on TLB shootdown operations under certain workloads. In order to address this problem, we present a paravirtual TLB shootdown scheme named Shoot4U. Shoot4U completely eliminates TLB shootdown preemptions by invalidating guest TLB entries from the VMM and allowing guest TLB shootdown operations to complete without waiting for remote virtual CPUs to be scheduled. Our performance evaluation using the PARSEC benchmark suite demonstrates that Shoot4U can reduce benchmark runtime by up to 85% compared to an unmodified Linux kernel, and by up to 44% over a state-of-the-art paravirtual TLB shootdown scheme.
An Essential Relationship between Real-time and Resource Partitioning - Yoshitake Kobayashi
(ELC Europe 2013)
Running real-time and general-purpose applications on the same hardware is normally considered a bad idea in most cases. However, we focus on running both kinds of applications on one piece of hardware without virtualization. Resource partitioning enables the assignment of hardware resources (e.g., cores, execution time, memory bandwidth, or device access) to processes with special requirements (e.g., real-time performance or safety requirements).
In this talk, we discuss the current limitations of the Linux kernel and describe how to solve them.
NUSE is a library implementation of a network stack in userspace that allows new protocols and implementations to be added more quickly without modifying the kernel. It works by hijacking system calls related to networking at the library level, running the network stack code in a separate execution context using lightweight virtualization, and connecting to the network interface using options like raw sockets, DPDK, or netmap. This approach avoids the slow evolution of making kernel changes and allows network stacks and applications to be updated and deployed more flexibly on a per-application basis.
This document contains summaries of 14 software projects developed by LV Tailoring Software for the semiconductor and medical industries. The software projects include modifications to robotic arm cleaning systems, advanced matching for CDSEM metrology, remote CDSEM control centers, log file analyzers for various tools, and systems for fast creation of new product classes for metrology tools.
In this session, we’ll review how previous efforts, including Netfilter, Berkeley Packet Filter (BPF), Open vSwitch (OVS), and TC, approached the problem of extensibility. We’ll show you an open source solution available within the Red Hat Enterprise Linux kernel, where extending and merging some of the existing concepts leads to an extensible framework that satisfies the networking needs of datacenter and cloud virtualization.
In this deck from the Perth HPC Conference, Rob Farber from TechEnablement presents: AI is Impacting HPC Everywhere.
"The convergence of AI and HPC has created a fertile venue that is ripe for imaginative researchers — versed in AI technology — to make a big impact in a variety of scientific fields. From new hardware to new computational approaches, the true impact of deep- and machine learning on HPC is, in a word, “everywhere”. Just as technology changes in the personal computer market brought about a revolution in the design and implementation of the systems and algorithms used in high performance computing (HPC), so are recent technology changes in machine learning bringing about an AI revolution in the HPC community. Expect new HPC analytic techniques including the use of GANs (Generative Adversarial Networks) in physics-based modeling and simulation, as well as reduced precision math libraries such as NLAFET and HiCMA to revolutionize many fields of research. Other benefits of the convergence of AI and HPC include the physical instantiation of data flow architectures in FPGAs and ASICs, plus the development of powerful data analytic services."
Learn more: http://www.techenablement.com/ and http://hpcadvisorycouncil.com/events/2019/australia-conference/agenda.php
The document discusses using the Eclipse framework and OSGi for developing modular applications for embedded devices. It describes challenges in embedded systems development and how OSGi addresses these challenges through loose coupling and dynamic updating. It also summarizes several Eclipse projects - DeviceKit for hardware interfacing, an industrial graphics framework, and p2 for remote provisioning - and provides examples of applications developed with these technologies, including for the US Army.
Linux-Based Data Acquisition and Processing On Palmtop Computer - IOSR Journals
This document describes a Linux-based data acquisition and processing system implemented on a palmtop computer. The system uses a PCMCIA data acquisition card and free Linux drivers and libraries to acquire signals from sensors. As a demonstration, a phonometer application was created that can sample 1024 signals at 100 ksamples/s and compute the fast Fourier transform of the signal up to 6 times per second. The document outlines the hardware and software design of the system, including using a custom Linux kernel, COMEDI libraries for device control, and TCL/Tk for the user interface. Experimental results showed the system could successfully implement the phonometer application for acoustic signal analysis on the palmtop computer.
This document provides an overview of BARCoMmS, a ground station testing software created by NASA interns for the iSat project. BARCoMmS consists of four main modules - DITL, CFDP, Bulletin, and Command. The CFDP module enables reliable file transfers using CCSDS protocols and includes GUIs for control and monitoring. The Command module sends commands to and displays telemetry from the satellite. All modules communicate through signals and slots in a modular architecture, and additional modules can easily be added. BARCoMmS provides a framework for testing and developing the iSat flight software.
OSGi: Best Tool In Your Embedded Systems Toolbox - Brett Hackleman
We discuss several of our past and current OSGi-based solutions for defense systems, mining equipment, construction equipment, industrial automation, and automotive/telematics domains. We present some best practices for building flexible, cross-platform, high-performance embedded applications and the resulting lessons learned along the way. We demonstrate how the Eclipse Runtime Components and Frameworks can be used to access communication buses such as CAN, J1939, J1850, and MIL-STD-1553. Finally, we explain how using OSGi and Equinox can simplify the development, testing, and deployment of your next application, whether embedded or not.
This document provides an update on perfSONAR network measurement tools, the IRIS and DyGIR projects, the Archipelago measurement platform, network services on TransPAC3 and ACE, and the Data Logistics Toolkit. Key points include:
- perfSONAR and OSCARS software will be used to provide monitoring and dynamic circuit services on TransPAC3 and ACE.
- The IRIS and DyGIR projects will develop monitoring and dynamic circuit software packages for international research networks.
- The Archipelago platform conducts large-scale IPv4 topology measurements from over 50 probes worldwide.
- TransPAC3 and ACE will provide high-performance connectivity between regions and dedicated infrastructure for data movement using the
YOW2018 Cloud Performance Root Cause Analysis at Netflix - Brendan Gregg
Keynote by Brendan Gregg for YOW! 2018. Video: https://www.youtube.com/watch?v=03EC8uA30Pw . Description: "At Netflix, improving the performance of our cloud means happier customers and lower costs, and involves root cause analysis of applications, runtimes, operating systems, and hypervisors, in an environment of 150k cloud instances that undergo numerous production changes each week. Apart from the developers who regularly optimize their own code, we also have a dedicated performance team to help with any issue across the cloud, and to build tooling to aid in this analysis. In this session we will summarize the Netflix environment, procedures, and tools we use and build to do root cause analysis on cloud performance issues. The analysis performed may be cloud-wide, using self-service GUIs such as our open source Atlas tool, or focused on individual instances, and use our open source Vector tool, flame graphs, Java debuggers, and tooling that uses Linux perf, ftrace, and bcc/eBPF. You can use these open source tools in the same way to find performance wins in your own environment."
Presentation given at the Géant3 NMS workshop (GN3/NA3/T4 Campus Best Practices) in Belgrade, 20th October 2009
An overview of where NAV came from, what it does and how it is developed.
The ALICE experiment at the LHC requires a data acquisition system capable of handling both frequent small events from proton-proton collisions and rare but large events from heavy ion collisions. The ALICE DAQ system uses over 300 front-end processors to collect data from detectors at up to 2.5 GB/s and store over 1 PB of data per year using a multi-tiered storage architecture. Regular data challenges since 1998 have tested the DAQ system and achieved event building rates over 1 GB/s.
Akshay Sanjay Kale has a Master's degree in Computer Science from USC and a Bachelor's degree from the University of Pune, India. He has work experience at Qualcomm, AirTight Networks, and Speedy Packets. His academic projects include implementing process and thread architecture for a kernel as well as virtual file system and memory. He also developed tools for distributed denial of service detection and implemented routing and file transfer protocols.
Slides 23 and 24 mention experience with HDF-EOS.
Source: http://hdfeos.org/workshops/ws04/presentations/Jones/000901%20DPEAS%20Overview%20-%20HDFEOS%20Workshop.ppt
The document summarizes the use of the Sector and Sphere cloud computing software on the Open Cloud Testbed for the SC08 Bandwidth Challenge. Key points include:
- Sector is a distributed storage system and Sphere simplifies distributed data processing using a map-reduce model.
- The Open Cloud Testbed provided 101 nodes across 4 locations for running applications like TeraSort (sorting 1TB of data) and CreditStone (analyzing 3TB of credit card transactions).
- Sector/Sphere applications achieved transfer rates of up to 20Gbps for TeraSort and 7.2Gbps for CreditStone, utilizing the distributed resources for large-scale data processing.
This document is a curriculum vitae for VeerannaBabu I that outlines his professional experience and qualifications. He has 3 years of experience developing LabVIEW software and is a Certified LabVIEW Developer. Some of his project experience includes developing data acquisition systems, automated test jigs, and real-time control systems using NI hardware and LabVIEW for various defense organizations in India. He has expertise in communication protocols, NI hardware platforms, and software development best practices.
The document summarizes a tutorial presentation about the Open Grid Computing Environments (OGCE) software tools for building science gateways. The OGCE tools include a gadget container, workflow composer called XBaya, and application factory service called GFac. The presentation demonstrates how these tools can be used to build portals and compose workflows to access resources like the TeraGrid.
Extending the life of your device (firmware updates over LoRa) - LoRa AMM - Jan Jongboom
This document discusses extending the lifespan of IoT devices through firmware updates and outlines some challenges and solutions. It proposes a standardized approach using multicast transmissions, forward error correction, and an update server to efficiently deliver firmware over constrained low-power wide area networks. An open-source reference implementation is available to demonstrate feasibility on current hardware within radio regulations.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
This document summarizes a computer physics communications article about the conditions database system for the COMPASS experiment. The key points are:
1) COMPASS integrated a conditions database system to manage time-dependent detector conditions, calibration, and geometry alignment information using software from CERN.
2) The conditions database consists of administration tools, a data handling library, and software to transfer data from detector controls to the database.
3) Performance tests on the COMPASS computing farm showed the conditions database system was able to efficiently manage the large volumes of time-dependent experimental data needed for the COMPASS experiment.
The document provides a status report on testing the Helix Nebula Science Cloud for interactive data analysis by end users of the TOTEM experiment. It summarizes the deployment of a "Science Box" platform on the Helix Nebula Cloud using technologies like EOS, CERNBox, SWAN and SPARK. Initial tests of the platform were successful in 2017 using a single VM. Current tests involve a scalable deployment with Kubernetes and using SPARK as the computing engine. Synthetic benchmarks and a TOTEM data analysis example show the platform is functioning well with room to scale out storage and computing resources for larger datasets and analyses.
Prasad Meduri has over 8 years of experience in quality assurance testing. He has expertise in testing networking devices such as IP encryptors, routers, and VOIP interfaces. Some of his responsibilities include test case design, test execution, defect tracking, and ensuring software quality. He has worked on projects for clients such as ISRO and eSeva and aims to continuously acquire skills in emerging technologies.
Similar to Software development for the COMPASS experiment
1. Software development for the COMPASS experiment
Martin Bodlák¹, Vladimír Jarý¹∗, Igor Konorov², Alexander Mann², Josef Nový¹, Stephan Paul², Miroslav Virius¹
¹ Faculty of Nuclear Sciences and Physical Engineering, Czech Technical University in Prague
² Physik-Department, Technische Universität München
Conference “Tvorba softwaru 2012”
24th May 2012, Ostrava
∗ Vladimir.Jary@cern.ch
2. Overview
1 Introduction
COMPASS experiment
2 Current DAQ system
Architecture of the system
DATE package
3 Control and monitoring software for a new DAQ system
Motivation and requirements
Overview of the hardware architecture
Layers of the DAQ software
Implementation details
Performance tests
4 Conclusion and outlook
3. COMPASS experiment
COMPASS: Common Muon and Proton Apparatus for Structure and Spectroscopy
fixed-target experiment situated on the Super Proton Synchrotron particle accelerator at CERN [1]
scientific program approved in 1997 by CERN
experiments with hadron beam (glueballs, Primakoff scattering, charmed hadrons, ...)
experiments with muon beam (gluon contribution to the nucleon spin, transverse spin structure of nucleons, ...)
multiple types of polarized target
data taking started in 2002
plans to continue at least until 2016 as COMPASS-II
3 programs: GPDs, Drell-Yan, Primakoff scattering
international project: approximately 250 physicists from 11 countries and 29 institutions
4. COMPASS spectrometer
polarized target on the left, length approximately 50 m
COMPASS spectrometer, image taken from [1]
the spectrometer consists of detectors for:
1 measurement of deposited energy (calorimeters)
2 particle identification (RICH, muon filters)
3 particle tracking (wire chambers)
5. Terminology
event: collection of data describing the flight and interactions of a particle through the spectrometer
roles of the data acquisition system (DAQ):
1 reads out data produced by detectors (readout)
2 assembles full events from fragments (event building)
3 sends events into permanent storage (data logging)
4 enables configuration, control, and monitoring (run control)
5 preprocesses and filters data (e.g. track reconstruction, online filter)
trigger: selects physically interesting events (or rejects uninteresting events) in a high-rate environment with minimal latency
trigger efficiency: eff = N_good(selected) / N_good(produced) < 1
DAQ dead time: D = t_busy / t_total
when the system is busy, it cannot accept any other trigger, which leads to loss of data
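To make the dead time definition concrete, here is a small worked example with illustrative numbers not taken from the talk: if the system is busy for 2 ms out of every 10 ms, then

D = \frac{t_{\mathrm{busy}}}{t_{\mathrm{total}}} = \frac{2\ \mathrm{ms}}{10\ \mathrm{ms}} = 0.2,

so roughly 20% of uniformly arriving triggers fall into the busy periods, and the corresponding events are lost.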
6. Overview of the TDAQ system
Structure of the trigger and data acquisition system according to [4]
7. Current DAQ architecture
influenced by the cycle of the SPS particle accelerator:
12 s of accelaration, 4.8 s of extraction (spill/burst)
key aspects: multiple layers, parelelism, buffering
1 detector (frontend) electronics:
preamplify, digitize data
250000 data channels
2 concentrator modules (CATCH, GeSiCA):
perform readout (triggered by the Trigger Control System)
append subevent header
3 readout buffers: buffer subevents in spillbuffer PCI cards
make use of the SPS cycle to reduce the data rate to about 1/3 of the on-spill rate, yielding a roughly stable output rate (derandomization; see the arithmetic note after this list)
4 event builders:
assemble full events from subevents
send full events to the permanent storage, store
metainformation about events into the Oracle DB
additional tasks (online filter, data quality monitoring)
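The derandomization factor follows from the SPS cycle quoted above: data arrive only during the 4.8 s extraction of each 12 s + 4.8 s = 16.8 s cycle, so spreading the readout over the whole cycle reduces the average output rate to 4.8/16.8 ≈ 0.29, i.e. roughly 1/3 of the on-spill rate.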
8. Current DAQ software
based on the ALICE DATE package [2]
DATE distinguishes two kinds of processors:
1 local data concentrators (LDCs)
perform readout of subevents, correspond to readout buffers
2 global data collectors (GDCs)
perform event building, correspond to event builders
requirements on the nodes:
1 all nodes must be x86 compatible
2 all nodes must be powered by GNU/Linux OS
3 all nodes must be connected to a network supporting the TCP/IP stack
flexible system (fixed-target mode vs. collider mode)
scalable system (from a full-scale LHC experiment down to a small laboratory system with one processor)
performance:
40 GB/s readout
2.5 GB/s event building
1.25 GB/s storage
9. Functionality
DATE provides:
1 readout, data flow
2 event building
3 run control
4 interactive configuration (based on the MySQL database)
5 event monitoring (COOOL)
6 data quality monitoring (MurphyTV)
7 information reporting (infoLogger, infoBrowser)
8 online filter (Cinderella)
9 load balancing (EDM, optional)
10 log book
11 ...
10. Problems with existing DAQ system
Motivation
260 TB recorded during the 2002 Run, 508 TB during the
2004 Run, more than 2 PB during the 2010 Run
increasing number of detectors, detector channels, and trigger rate ⇒ increasing data rates
aging of the hardware ⇒ increasing failure rate of hardware
PCI technology deprecated
Main idea of the new system
replace ROBs and EVBs by custom FPGA-based HW
hardware based data flow control and event building
smaller number of components, higher reliability
11. Overview of the hardware architecture
frontend electronics and concentrator modules unchanged
readout buffers and event builders replaced with custom
hardware:
Field Programmable Gate Array (FPGA) technology
FPGA card designed as a module for Advanced
Telecommunications Computing Architecture (ATCA) carrier
card, in total 8 carrier cards:
6 for data multiplexing
2 for event building
each carrier card equipped with 4 FPGA modules
different functionality, same firmware
FPGA card equipped with 4 GB of RAM, 16 serial links
(bandwidth 3.25 GB/s)
softcore processor on each card runs the control and monitoring software, communication based on Ethernet
ROBs and EVBs will be reused in the computing farm
12. Hardware architecture
13. Requirements analysis
Requirements:
distributed system, communication based on TCP/IP
compatibility with Detector Control System
compatibility with software for physics analysis
remote control and monitoring
multiple user roles
real-time operation not required
Decisions:
use the DIM library for communication
do not use the DATE package
possibly reuse some DATE components (COOOL, MurphyTV)
keep data format unchanged
14. Software architecture
Roles participating in the control and monitoring software
15. Roles participating in the software
1 Master process
controls slave processes
receives commands from GUI
authenticates and authorizes users
reads and writes configuration to online database
2 Slave processes
monitor and control the hardware
receive configuration information and commands from the
master process
3 GUI
receives information about health of the system from the
master process
sends commands to the master process that distributes
these commands to the slave processes
4 Message logger: collects messages produced by other
processes and stores them into database
5 Message browser: displays the messages produced by other processes
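The command path from the GUI through the master to the slaves can be sketched with the DIM library chosen above. The following C++ fragment is only an illustration: the service names (MASTER/COMMAND, SLAVE1/COMMAND, ...) and the slave list are hypothetical, and the real master additionally handles authentication and configuration.

    // Hypothetical sketch: the master receives a command from the GUI and
    // forwards it to every slave process over DIM.
    #include <dic.hxx>     // DIM client part (sending commands)
    #include <dis.hxx>     // DIM server part (receiving commands)
    #include <unistd.h>
    #include <string>
    #include <vector>

    class MasterCommand : public DimCommand {
        std::vector<std::string> slaves;   // command services of the slaves
    public:
        MasterCommand()
            : DimCommand("MASTER/COMMAND", "C"),
              slaves{"SLAVE1/COMMAND", "SLAVE2/COMMAND"} {}
        void commandHandler() override {
            char *cmd = getString();       // command text sent by the GUI
            for (const auto &s : slaves)   // fan the command out to all slaves
                DimClient::sendCommand(s.c_str(), cmd);
        }
    };

    int main() {
        MasterCommand relay;               // register the command service
        DimServer::start("MASTER");        // announce this server to the DNS
        while (true) pause();              // DIM serves requests in its own threads
    }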
16. Implementation details
communication between nodes based on the DIM library
implementation based on the Qt framework
slave processes implemented in C++, without Qt
scripting in Python (e.g. starting the slave processes)
MySQL database (compatibility with Detector Control
System and DATE)
complex system ⇒ describe behavior of the master and
the slave processes by state machines
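As a sketch of the state-machine approach, the master's life cycle could be modeled with Qt's state machine framework (QStateMachine, part of Qt since 4.6). The states below (idle, configured, running) are a generic DAQ life cycle chosen for illustration, not the actual states of the COMPASS master process, which are shown on the next slide's diagram.

    #include <QCoreApplication>
    #include <QStateMachine>
    #include <QState>

    int main(int argc, char *argv[]) {
        QCoreApplication app(argc, argv);

        QStateMachine machine;
        QState *idle       = new QState();
        QState *configured = new QState();
        QState *running    = new QState();

        // In the real system the transitions would be driven by signals
        // emitted when commands (configure/start/stop) arrive via DIM, e.g.:
        // idle->addTransition(&controller, SIGNAL(configureRequested()), configured);
        machine.addState(idle);
        machine.addState(configured);
        machine.addState(running);
        machine.setInitialState(idle);
        machine.start();

        return app.exec();
    }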
17. State machines
State machine describing behavior of the master process
18. DIM Library [3]
developed for the DELPHI experiment at CERN
asynchronous one-to-many communication in a heterogeneous network environment [3]
based on the TCP/IP
interfaces to C, C++, Python, Java languages
communication between servers (publishers) and clients
(subscribers) through DIM Name Server (DNS)
types of messages:
services updated at regular intervals
services updated on demand
commands
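A minimal sketch of the first message type, a service updated at regular intervals, assuming DIM's documented C++ classes (DimService, DimServer, DimInfo); the service name NODE1/STATUS is hypothetical:

    // Server (publisher): registers an integer service with the DIM Name
    // Server and pushes a new value once per second.
    #include <dis.hxx>
    #include <unistd.h>

    int main() {
        int status = 0;
        DimService service("NODE1/STATUS", status);
        DimServer::start("NODE1");
        while (true) {
            sleep(1);
            ++status;
            service.updateService();   // notify all subscribed clients
        }
    }

    // Client (subscriber): infoHandler() is called on every update.
    #include <dic.hxx>
    #include <unistd.h>
    #include <cstdio>

    class StatusInfo : public DimInfo {
    public:
        StatusInfo() : DimInfo("NODE1/STATUS", -1) {}   // -1 = value when link is lost
        void infoHandler() override {
            std::printf("status = %d\n", getInt());
        }
    };

    int main() {
        StatusInfo info;
        while (true) pause();   // DIM delivers updates asynchronously
    }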
19. DIM Name Server
Position of the DIM Name Server
20. Evaluation of the system
Test scenario:
number of nodes: 2–16
message size: 100 B – 500 kB
COMPASS internal network during winter shutdown (Gigabit Ethernet)
standard x86 compatible hardware (event builders)
Tests performed:
performance
is the system able to update hardware status information every 100 ms? (see the sketch after this list)
stability
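The 100 ms question can be answered with a subscriber that timestamps consecutive updates. The sketch below illustrates the measurement idea only, not the actual test code; the service name is again hypothetical.

    #include <dic.hxx>
    #include <unistd.h>
    #include <chrono>
    #include <cstdio>

    class TimedInfo : public DimInfo {
        std::chrono::steady_clock::time_point last;
    public:
        explicit TimedInfo(const char *name)
            : DimInfo(name, -1), last(std::chrono::steady_clock::now()) {}
        void infoHandler() override {
            auto now = std::chrono::steady_clock::now();
            auto ms  = std::chrono::duration_cast<std::chrono::milliseconds>(now - last).count();
            last = now;
            if (ms > 100)   // requirement: hardware status refreshed every 100 ms
                std::printf("update interval %lld ms exceeds 100 ms\n", (long long)ms);
        }
    };

    int main() {
        TimedInfo info("NODE1/STATUS");
        while (true) pause();
    }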
21. Results of the performance tests
Transfer speed as a function of message size
22. Results of the stability tests
Stability of the software over time
23. Summary and outlook
1 Analysis of the existing data acquisition system
based on the DATE package
scalability issues, deprecated technologies (PCI bus)
2 Development of control and monitoring software for new
DAQ architecture
analysis of requirements on software
description of the hardware architecture
definition of roles and behavior of the system
implementation
performance tests
3 Goals:
to test the system on real hardware
to have a fully functional system in 2013
to deploy the system in 2014
24. Bibliography
[1] P. Abbon et al.: The COMPASS experiment at CERN. Nucl. Instrum. Methods Phys. Res. A 577, 3 (2007), pp. 455–518. See also the COMPASS homepage at http://wwwcompass.cern.ch
[2] T. Anticic et al. (ALICE DAQ Project): ALICE DAQ and ECS User’s Guide. CERN EDMS 616039, January 2006.
[3] C. Gaspar: Distributed Information Management System [online]. 2011. Available at: http://dim.web.cern.ch
[4] W. Vandelli: Introduction to Data Acquisition. In: International School of Trigger and Data Acquisition, Roma, February 2011.
Acknowledgement
This work has been supported by the MŠMT grants LA08015
and SGS 11/167.