Using the big guns: Advanced OS performance tools for troubleshooting databas... (Nikolay Savvinov)
Using OS performance tools and basic alternatives to troubleshoot production database issues
The document discusses using Linux performance tools like pidstat, ps, and tracing tools like perf, systemtap, and dtrace to troubleshoot complex database problems that may involve issues at the operating system, hardware, or network level. It provides examples of using these tools to diagnose specific issues like memory fragmentation, I/O problems, and network congestion and presents a methodology around reproducing issues, analyzing tool output, identifying root causes, and developing solutions.
This document discusses processes and process management in operating systems. It covers key topics such as:
- Process states including new, running, ready, waiting, and terminated.
- Process representation using Process Control Blocks (PCBs) which contain process state and scheduling information.
- Process scheduling with long-term, short-term, and medium-term schedulers that move processes between queues.
- Interprocess communication (IPC) which allows processes to communicate and synchronize, using both direct and indirect communication with mailboxes.
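The indirect-communication model in the list above can be sketched with threads standing in for processes: `queue.Queue` plays the role of the mailbox, and the producer/consumer names and message values are illustrative, not taken from the slides.

```python
# A minimal sketch of indirect IPC via a mailbox; queue.Queue is the mailbox.
import queue
import threading

mailbox = queue.Queue(maxsize=4)  # bounded mailbox shared by both "processes"

def producer(n):
    for i in range(n):
        mailbox.put(("msg", i))   # send: blocks if the mailbox is full
    mailbox.put(None)             # sentinel: no more messages

def consumer(results):
    while True:
        msg = mailbox.get()       # receive: blocks if the mailbox is empty
        if msg is None:
            break
        results.append(msg[1])

received = []
t1 = threading.Thread(target=producer, args=(5,))
t2 = threading.Thread(target=consumer, args=(received,))
t1.start(); t2.start()
t1.join(); t2.join()
print(received)  # -> [0, 1, 2, 3, 4]
```

Because both sides block on the shared mailbox rather than on each other directly, neither needs to know the other's identity — which is the point of indirect communication.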
RTLinux is a real-time operating system that allows real-time applications to run on top of Linux. It modifies the Linux kernel to add a virtual machine layer with a separate task scheduler that prioritizes real-time tasks over standard Linux processes. This enables RTLinux to support hard real-time deadlines. Programming in RTLinux involves creating modules that can be loaded and unloaded from the kernel using specific commands. Real-time threads and synchronization objects like mutexes are implemented using POSIX interfaces.
This document discusses the challenges of real-time computing on Linux and potential solutions. Real-time means very low maximum latency, below 100 microseconds. While Linux was not designed for real-time, it is now used in many embedded systems. Options to address real-time include using separate hardware, a hypervisor with a real-time operating system (RTOS), asymmetric multiprocessing (AMP) with an RTOS, or solutions within Linux like PREEMPT_RT, which adds preemption and CPU isolation techniques to reduce worst-case latency without changing applications. The document reviews these approaches and notes that real-time remains an important area as Linux is increasingly used in embedded systems.
These slides open with an introduction to the definition of real time and RTOSes, then present the RT Linux approaches and compare them with each other, and finally show a latency measurement test performed by Linutronix.
Real-Time Operating System (RTOS) Vs. General Purpose OS (GPOS)
Can Linux provide real-time guarantees?
Commercial RTOSs
RTLinux Vs. Linux: Architectural comparison
RTLinux Vs. Linux: Code perspective
Get the RTLinux setup ready
Things to Issue and Handling the hard disk
Lab #1: Detailed discussion
This document summarizes key aspects of real-time kernels. It begins by defining a kernel and its role. It then discusses the structure of a real-time kernel, including layers, states, data structures, and primitives. Scheduling mechanisms like ready queues, insertion, and extraction are covered. Task management, semaphores, and intertask communication using mailboxes and cyclical asynchronous buffers are summarized. The document also discusses system overhead considerations like context switching and interrupts.
Mike Bartley - Innovations for Testing Parallel Software - EuroSTAR 2012 (TEST Huddle)
EuroSTAR Software Testing Conference 2012 presentation on Innovations for Testing Parallel Software by Mike Bartley.
See more at: http://conference.eurostarsoftwaretesting.com/past-presentations/
This document provides an introduction to real-time systems and discusses approaches to making Linux a real-time operating system. It defines hard and soft real-time systems and explains why Linux is commonly used instead of dedicated real-time operating systems. The document then discusses two main solutions, PREEMPT_RT and Xenomai 3, which provide patches to make Linux meet timing constraints through different approaches. It also provides an overview of basic real-time concepts like scheduling algorithms, preemptive vs. non-preemptive scheduling, and interprocess communication.
The document discusses computation flow for reconfigurable systems. It covers several key points:
1) Computation flow involves both run-time and compile-time iterations for some applications.
2) Synchronization and blocking access are usually used between the processor and reconfigurable device for memory access.
3) Full and partial reconfiguration approaches involve either fully or partially reconfiguring the FPGA device while it continues running other tasks. Managing computation flow and reconfiguration presents challenges around fragmentation and communication between new and old tasks.
The document discusses processes and threads and their implementation. It covers:
- The differences between user-level and kernel-level threads and their trade-offs.
- Context switching, which involves saving the state of the current thread and restoring the state of the next thread to run. This happens when switching between threads.
- Typical context switch operations like saving registers to the stack, storing the old stack pointer, loading the new stack pointer and restoring registers.
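The save/restore sequence described above can be mimicked at a much higher level with Python generators: `yield` saves a thread's execution state (the analogue of pushing registers and storing the old stack pointer), and `next()` restores it. The worker names and step counts are illustrative.

```python
# A toy illustration of context switching between two cooperative "threads".
def worker(name, steps, trace):
    for i in range(steps):
        trace.append((name, i))
        yield  # "context switch out": execution state is saved here

trace = []
threads = [worker("A", 3, trace), worker("B", 3, trace)]

# Round-robin dispatcher: restore each thread in turn until all finish.
while threads:
    t = threads.pop(0)
    try:
        next(t)            # "context switch in": resume the saved state
        threads.append(t)  # still runnable: back of the ready queue
    except StopIteration:
        pass               # thread terminated

print(trace)  # -> [('A', 0), ('B', 0), ('A', 1), ('B', 1), ('A', 2), ('B', 2)]
```

A real kernel does the same bookkeeping in assembly on the actual register file and stacks; the generator version only preserves the analogy of suspending at a known point and resuming exactly there.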
High availability (HA) aims to ensure a prearranged level of operational performance by increasing the mean time between failures (MTBF) and decreasing the mean time to repair (MTTR). When implementing HA for a DMF system, considerations include redundant hardware, limiting single points of failure, minimizing downtime during repairs and upgrades, and having mechanisms to quickly address failures like STONITH ("shoot the other node in the head"). Real-world HA often involves initially setting up a single DMF server and testing it before converting to an active-passive HA configuration, with mechanisms to monitor the system and quickly transition services between nodes if needed.
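The MTBF/MTTR relationship can be made concrete: steady-state availability is MTBF / (MTBF + MTTR), so HA work either raises the numerator or shrinks MTTR. The hour figures below are made-up examples, not numbers from the document.

```python
# Steady-state availability: A = MTBF / (MTBF + MTTR).
def availability(mtbf_hours, mttr_hours):
    return mtbf_hours / (mtbf_hours + mttr_hours)

base = availability(1000, 10)   # slow manual repair
ha   = availability(1000, 0.1)  # fast automatic failover (e.g. active-passive)
print(round(base, 4), round(ha, 5))  # -> 0.9901 0.9999
```

Note that cutting MTTR by two orders of magnitude buys roughly two extra "nines" without touching MTBF at all, which is why failover mechanisms like STONITH matter so much.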
This document discusses the evolution of computer architecture from CISC to RISC designs. It covers major advances like cache memory and microprocessors that enabled RISC. Key RISC features include large register files optimized via register allocation algorithms. Pipelining is optimized in RISC via techniques like delayed branching. While CISC aimed to simplify compilers, RISC focuses on optimizing instruction execution through techniques like register referencing and simplified instruction sets. The document also notes ongoing debates around quantitatively and qualitatively comparing RISC and CISC designs.
The document discusses scheduling in computing systems. It defines scheduling as the process by which the scheduler decides which ready process to run next. Scheduling is important in multi-programmed, multi-user, and batch systems to determine the order of execution of ready processes. The goals of different scheduling algorithms are discussed, including fairness, policy enforcement, balance/efficiency for general algorithms, and minimizing response time and meeting deadlines for interactive and real-time systems respectively. Popular scheduling techniques like round robin, priorities, and real-time scheduling are explained through examples.
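The round-robin technique mentioned above can be sketched as a small simulation: each job has a remaining burst time, the quantum is fixed, and a preempted job goes to the back of the ready queue. Job names and burst times are illustrative.

```python
# A minimal round-robin scheduling simulation.
from collections import deque

def round_robin(bursts, quantum):
    ready = deque(bursts.items())   # (name, remaining) in arrival order
    finished = []
    while ready:
        name, remaining = ready.popleft()
        remaining -= quantum        # run for one quantum
        if remaining > 0:
            ready.append((name, remaining))  # preempted: back of the queue
        else:
            finished.append(name)            # completed within this quantum
    return finished

print(round_robin({"A": 3, "B": 1, "C": 2}, quantum=1))  # -> ['B', 'C', 'A']
```

Short jobs finish early without ever being starved, which is the fairness property round robin trades against the context-switch overhead of a small quantum.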
This document discusses concurrency and processes. It begins by defining threads of execution and how operating systems provide the illusion of multiple processors through time-sharing techniques. It then discusses how modern operating systems abstract processes to provide protection between threads through separate address spaces and the virtualization of hardware resources. The document covers techniques like context switching, scheduling, and inter-process communication to enable concurrency while maintaining isolation between processes.
On MPSoC Software Execution at the Transaction Level (TAIWAN)
The document discusses various strategies for executing software in transaction-level simulations of multiprocessor systems-on-chip, including instruction interpretation, dynamic binary translation, and native execution on the host machine. It compares the strategies based on speed, accuracy, and development time, finding that native execution provides the most efficient simulation speed while still enabling accurate performance evaluation through annotation strategies. The strategies are then integrated into a transaction-level modeling environment, with native execution found to be best suited for software development layers.
This document discusses various techniques for inter-process communication and synchronization between concurrent processes. It covers topics like mutual exclusion, semaphores, monitors, and classical synchronization problems. Mutual exclusion is required to prevent race conditions when accessing shared resources. Common solutions discussed are software algorithms, hardware support using test-and-set operations, and operating system semaphores. Monitors provide synchronization through condition variables. Message passing enables communication and synchronization between distributed processes.
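The semaphore-based mutual exclusion described above can be sketched with a binary semaphore guarding a deliberately non-atomic read-modify-write; without the P()/V() pair around the critical section, the four threads below would race on `counter`. The iteration counts are illustrative.

```python
# Mutual exclusion with a binary semaphore protecting a shared counter.
import threading

sem = threading.Semaphore(1)  # binary semaphore: at most one thread inside
counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        sem.acquire()          # P(): enter the critical section
        tmp = counter          # deliberately split read-modify-write
        counter = tmp + 1
        sem.release()          # V(): leave the critical section

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # -> 40000
```

The same guarantee could come from a hardware test-and-set spinlock or a monitor; the semaphore version is just the most portable of the mechanisms the summary lists.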
Getting Started with JDK Mission Control (Marcus Hirt)
This document provides an overview of Java Mission Control (JMC), which is a tool for profiling and diagnosing Java applications. It discusses how to get and run JMC, use the JMX console and Java Flight Recorder to analyze application performance, and do heap analysis with JOverflow. The document also provides demos of these JMC features and additional resources for learning more.
Solving Real-Time Scheduling Problems With RT_PREEMPT and Deadline-Based Sche... (peknap)
In dealing with a real-world problem of scheduling three classes of tasks – network packet forwarding, voice over IP, and application-level services for a home gateway device – the author found that the mechanisms that come with the vanilla Linux kernel are not enough. This talk covers the unique real-time requirements for each task class, why moving softirq to process context with the RT_PREEMPT patch is an important step in solving the problem, and how a deadline-based process scheduler would be a better solution than the regular real-time scheduling classes.
Real Time Debugging - What to do when a breakpoint just won't do (Lloyd Moore)
Abstract:
Debugging real-time issues presents a unique set of challenges and requirements to the developer. Normal debugging techniques such as breakpoints, printf statements, and logging frequently fail to locate the problem and can actually make the issue worse. This presentation examines why common debugging techniques fail when applied to real-time issues, and then presents tools and techniques that can successfully address the unique challenges of real-time debugging.
Bio:
Lloyd Moore is the founder and owner of CyberData Corporation, which provides consulting services in the robotics, machine vision, and industrial automation fields. Lloyd has worked in the software industry for 25 years, with formal training in biologically based artificial intelligence, electronics, and psychology. Lloyd is also currently the president of NWCPP and organizer of the Seattle Robotics Society Robothon event.
This document discusses several hardware memory models including Total Store Order (TSO), Processor Consistency (PC), and Weak Ordering. TSO allows loads to bypass earlier stores to different addresses but maintains order of loads and stores. PC is similar to TSO but does not guarantee write atomicity. Weak Ordering relaxes all instruction ordering and uses synchronization operations like locks and barriers to enforce ordering. The document also describes memory barrier instructions in PowerPC that can be used to enforce ordering between memory accesses.
This document discusses using GNU/Linux for safety-related systems. It introduces GNU/Linux and its development process using tools like git. It also discusses kernel development tools like git, cscope, sparse and tools for testing like gcov and gprof. Finally, it discusses safety standards like IEC 61508 and requirements for the highest safety integrity level like those in EN 50128, including requirements for modular design, coding standards, testing and configuration management. The goal is to determine if GNU/Linux is suitable for safety-critical applications.
Linux PREEMPT_RT improves the preemptiveness of the Linux kernel by allowing preemption everywhere except when preemption is disabled or interrupts are disabled. This reduces latency from preemption, critical sections, and interrupts. However, non-deterministic external interrupt events and timing as well as interrupt collisions can still cause unpredictable latency. Tracing tools can help analyze latency but practical issues remain in fully guaranteeing hard real-time behavior.
Hadoop Summit 2012 | HDFS High Availability (Cloudera, Inc.)
The HDFS NameNode is a robust and reliable service as seen in practice in production at Yahoo and other customers. However, the NameNode does not have automatic failover support. A hot failover solution called HA NameNode is currently under active development (HDFS-1623). This talk will cover the architecture, design and setup. We will also discuss the future direction for HA NameNode.
This document discusses real-time solutions in Linux. It explains that real-time systems focus on determinism, ensuring events and timing are known. There are several approaches to real-time Linux, including dual kernel systems like RTLinux that run a real-time kernel alongside Linux, and PREEMPT_RT kernels that modify Linux for lower latencies. Popular dual kernel options are RTAI and Xenomai, which provide real-time cores on top of Linux. RTAI focuses on x86 while Xenomai aims to facilitate porting and supports multiple personality "skins". The document provides references for further information on these real-time Linux solutions.
gcma: guaranteed contiguous memory allocator (SeongJae Park)
This document presents GCMA, a Guaranteed Contiguous Memory Allocator that improves upon the current Contiguous Memory Allocator (CMA) solution in Linux. CMA can have unpredictable latency and even fail when allocating contiguous memory, especially under memory pressure or with background workloads. GCMA guarantees fast latency for contiguous memory allocation, success of allocation, and reasonable memory utilization by using discardable memory as its secondary client instead of movable pages. Experimental results on a Raspberry Pi 2 show that GCMA has significantly faster allocation latency than CMA, keeps camera latency fast even with background workloads, and can improve overall system performance compared to CMA.
Constraint Programming in Compiler Optimization: Lessons Learned (Peter van Beek)
Constraint programming techniques were applied to compiler optimization problems like instruction selection, instruction scheduling, and register allocation. The techniques were able to find optimal solutions to some problems that were previously only solved heuristically. Constraint models were improved over time by adding implied constraints, dominance constraints, and preprocessing. Solvers were improved through techniques like restarts, portfolios, and machine learning of heuristics. The approach led to identifying and solving interesting subproblems with general applicability, like improved consistency algorithms for global constraints.
This document discusses real time operating systems for networked embedded systems. It provides an example of a fire alarm system with hundreds of sensors communicating over low bandwidth radio links to controllers and a central server. It outlines the challenges in ensuring sensor information is logged and appropriate action initiated in real time. The document then discusses real time operating systems, including scheduling algorithms like priority scheduling and clock driven scheduling. It covers features of RTOS like interrupt handling and resource allocation. Finally, it mentions specific RTOS like Linux, RTLinux, and the author's own rtker RTOS.
Open ComRTOS 1.4_tutorial_2o4_presentation (Eric Verhulst)
This document outlines a workshop on OpenComRTOS and its designer suite. It introduces OpenComRTOS as a CSP-inspired distributed RTOS and describes its various components, including the designer suite, runtime system, and available ports. It also provides examples of system composition using tasks, hubs and services, and summarizes interaction semantics like waiting and non-waiting.
The document summarizes the theoretical foundations of OpenComRTOS. It begins with an introduction that defines distributed heterogeneous systems and their advantages and problems. It then provides an overview of Communicating Sequential Processes (CSP), a formal method for modeling concurrent systems. CSP uses processes, alphabets and traces to model systems. The document applies these CSP concepts to model a counter and web server. It concludes with an outline of the workshop sessions on OpenComRTOS.
The document discusses Altreonic NV's approach to safety engineering, describing how they view safety as an emergent property of a quality engineering process rather than just preventing harm, and that their methodology focuses on developing simple, elegant system architectures through a formalized engineering process involving requirements capture, specification, modeling, verification and certification. It also outlines some of the challenges in ensuring safety for complex systems with multiple stakeholders and technical domains.
The document outlines a brand communication strategy for a new birthplace called The Birthplace (TBP) in 3 stages:
1. Stage 1 "Coming Soon - A New Birthplace" will create initial buzz and anticipation about the new brand.
2. Stage 2 "Coming Soon - TBP" will reveal more details about the concept and facilities while launching the logo and recruiting staff.
3. Stage 3 "We Are Open" will include activities for the official launch day and post-launch campaigns to communicate what TBP offers and engage with consumers through education and community events. The strategy aims to position TBP as exclusive yet approachable and define its care as attributes of a good birthing
The document outlines a marketing campaign targeting pregnant women in Hyderabad, India. The campaign aims to introduce the women to the Harry Potter brand through a website called duesoon.com that allows interaction. The campaign would drive traffic to the website through posters and banners, then engage the women by having them share what they wish for on the site and by giving away free t-shirts.
Unified Systems Engeneering with GoedelWorksEric Verhulst
1) The document discusses a metamodel for systems engineering called a "systems grammar" developed by Open License Society and used in various EU projects.
2) It is currently commercialized by Altreonic as GoedelWorks and refined by adding structure and properties to avoid overlapping concepts.
3) The metamodel takes a multi-level approach with different views and user levels that correspond to domains like process, engineering, modeling, and software.
Altreonic was spun off in 2008 from Eonic Systems to focus on real-time operating systems using formal techniques. Their OpenComRTOS is a small, network-centric real-time OS that uses CSP concurrency and can scale from 1 to over 10,000 nodes. It provides priority-based communication and fault tolerance and has been implemented on many heterogeneous platforms from DSPs to many-core systems.
Unified Systems Engineering feasibilityEric Verhulst
Is unified systems and safety engineering feasible?
This presentation introduces a new approach for developing composable systems with different SIL levels will be presented. It introduces the new notion of ARRL (Assured Reliability and Resilience Level).
This document outlines the session "OpenComRTOS Internals". It discusses how OpenComRTOS works internally, including interacting entities like tasks and hubs, the virtual single processor programming model, and priority inversion. It also describes the build process for OpenComRTOS systems and how to extend OpenComRTOS, including components, porting to a new platform, and device drivers. The speaker is Bernhard Sputh from Altreonic providing details on these OpenComRTOS internals.
The document discusses sequential processing, parallel processing, and pipelining techniques to improve CPU performance.
Sequential processing executes instructions one at a time based on the von Neumann architecture. Pipelining breaks jobs into stages to keep processor resources busy and improve throughput. Parallel processing uses multiple processors simultaneously to potentially reduce execution time by dividing a program across processors. Different parallel processor architectures include multiple instruction/multiple data streams and symmetric multiprocessors. The document compares sequential, pipelined, and parallel systems and their advantages and disadvantages for efficient processing.
Von Neumann Architecture microcontroller.pptxSUNILNYATI2
The von Neumann architecture consists of a main memory, central processing unit (CPU), and interconnection between them. The CPU contains a control unit and arithmetic logic unit. Main memory stores instructions and data at unique addresses. Registers inside the CPU allow for faster access. To address the bottleneck of CPU speed exceeding memory speed, modifications like caching, pipelining, and virtual memory were developed. Caching stores recently accessed data closer to the CPU. Pipelining divides operations into stages to allow for parallel processing. Virtual memory uses secondary storage like hard disks to supplement physical memory.
Asynchronous and Parallel Programming in .NETssusere19c741
This document provides an overview of asynchronous and parallel programming concepts including:
- Mono-processor and multiprocessor systems
- Flynn's taxonomy for classifying computer architectures
- Serial and parallel computing approaches
- .NET frameworks for parallel programming like the Task Parallel Library and Parallel LINQ
It includes demonstrations of using tasks and PLINQ for parallel programming in .NET.
Implementing Parallelism in PostgreSQL - PGCon 2014EDB
PostgreSQL's architecture is based heavily on the idea that each connection is served by a single backend process, but CPU core counts are rising much faster than CPU speeds, and large data sets can't be efficiently processed serially. Adding parallelism to PostgreSQL requires significant architectural changes to many areas of the system, including background workers, shared memory, memory allocation, locking, GUC, transactions, snapshots, and more.
Latest (storage IO) patterns for cloud-native applications OpenEBS
Applying micro service patterns to storage giving each workload its own Container Attached Storage (CAS) system. This puts the DevOps persona within full control of the storage requirements and brings data agility to k8s persistent workloads. We will go over the concept and the implementation of CAS, as well as its orchestration.
Parallel computing involves using multiple processing units simultaneously to solve computational problems. It can save time by solving large problems or providing concurrency. The basic design involves memory storing program instructions and data, and a CPU fetching instructions from memory and sequentially performing them. Flynn's taxonomy classifies computer systems based on their instruction and data streams as SISD, SIMD, MISD, or MIMD. Parallel architectures can also be classified based on their memory arrangement as shared memory or distributed memory systems.
Presentation given at the 2017 LinuxCon China
Unikernel is a novel software technology that links an application with OS in the form of a library and packages them into a specialized image that facilitates direct deployment on a hypervisor. Comparing to the traditional VM or the recent containers, Unikernels are smaller, more secure and efficient, making them ideal for cloud environments. There are already lots of open source projects like OSv, Rumprun and so on. But why these existing unikernels have yet to gain large popularity broadly? We think Unikernels are facing three major challenges: 1. Compatibility with existing applications; 2. Lack of production support (e.g. monitoring, debugging, logging); 3. Lack of compelling use case. In this presentation, we will review our investigations and exploration of if-how we can convert Linux as Unikernel to eliminate these significant shortcomings, plus some explorations of coordinating and cooperating with hypervisor.
This document provides an overview of high performance computing infrastructures. It discusses parallel architectures including multi-core processors and graphical processing units. It also covers cluster computing, which connects multiple computers to increase processing power, and grid computing, which shares resources across administrative domains. The key aspects covered are parallelism, memory architectures, and technologies used to implement clusters like Message Passing Interface.
Control Your Network ASICs, What Benefits switchdev Can Bring UsHungWei Chiu
In this slide, I will introduce what is switchdev and what problem it wants to solve. To this day, most of the hardware switch's application-specific integrated circuit (ASIC) only be controlled by the vendor's proprietary binary (SDK) and it's inconvenient for system administrator/developer. In order to break the chip vendor's lock-in situation, the switchdev had been designed to solve this. With the help of switchdev, we can develop a general solution for hardware switch chips and break the connection with vendor's binary-blob (SDK).
In order words. Linux kernel can directly communicate with the vendor's proprietary ASIC now, and the software programmer/system administrator can easily control that ASIC to provide more flexible, powerful and programmable network function.
This is the story of how we managed to scale and improve Tappsi’s RoR RESTful API to handle our ever-growing load - told from different perspectives: infrastructure, data storage tuning, web server tuning, RoR optimization, monitoring and architecture design.
Introduction to Cloud Computing Data Center and Network Issues to Internet Research Lab at NTU, Taiwan. Another definition of cloud computing and comparison of traditional IT warehouse and current cloud data center. (ppt slide for download.) Take a opensource data center management OS, OpenStack, as an example. Underlying network issues inside a cloud DC.
The Challenges facing Libraries and Imperative Languages from Massively Paral...Jason Hearne-McGuiness
The document discusses challenges related to parallel processing and massive parallel architectures. It covers topics like pipeline processors, multiprocessors, processing in memory architectures like Cyclops and picoChip, and cellular architectures. It also discusses code generation issues that arise from massive parallelism and possible solutions using compilers or libraries.
The document discusses new hardware and software capabilities introduced with the IBM z13 mainframe, including vector processing instructions, improved support for Java, XML, Unicode, and big data workloads. It explains how these features enable new types of applications in areas like analytics, search, and processing unstructured data more efficiently. The document also notes ways for customers to adopt these new workloads while controlling costs, such as through new application licensing charges and workload-based pricing models.
This document discusses Process Management Interface for Exascale (PMIx). It provides an overview and objectives of PMIx, which aims to establish an independent and open community effort to develop scalable client/server libraries for job launch and management. The document discusses performance status showing improvements over PMI2, integration status in Open MPI and SLURM, and roadmap for continued development including supporting evolving application needs through flexible resource allocation and fault tolerance. It also discusses different types of malleable and adaptive jobs that PMIx aims to support.
Reduced instruction set computing, or RISC (pronounced 'risk', /ɹɪsk/), is a CPU design strategy based on the insight that a simplified instruction set provides higher performance when combined with a microprocessor architecture capable of executing those instructions using fewer microprocessor cycles per instruction.
The document provides an introduction to high performance computing architectures. It discusses the von Neumann architecture that has been used in computers for over 40 years. It then explains Flynn's taxonomy, which classifies parallel computers based on whether their instruction and data streams are single or multiple. The main categories are SISD, SIMD, MISD, and MIMD. It provides examples of computer architectures that fall under each classification. Finally, it discusses different parallel computer memory architectures, including shared memory, distributed memory, and hybrid models.
Talk @ APT Group, University of Manchester, 06 August 2014
Abstract:
Nowadays HPC systems, such as those in the Top500, are equipped with a range of different processors, from multi-core CPUs to GPUs. Programming them can be a tough job, specially if we want to squeeze every last FLOPs of performance out of them.
As a Phd Student, I am now doing a brief research visit in the APT group, working in topics related to the programmability and efficient use of GPUs and many-core coprocessors. In particular, I am implementing a large database operation using OpenCL in these state-of-the-art systems. In this talk I will summarize my work in Manchester and discuss the future work in this topic.
How Texas Instruments Uses InfluxDB to Uphold Product Standards and to Improv...DevOps.com
This document provides an overview of Texas Instruments' use of InfluxDB and Grafana for time-series data analysis. It discusses (1) Texas Instruments' business and the importance of data-driven decisions, (2) their current setup ingesting metrics like tool utilization into InfluxDB, and (3) examples of Grafana dashboards created using this data. It also covers some issues encountered and plans to expand usage, including additional metrics, time-series modeling, and developing applications.
OpenPOWER Acceleration of HPCC SystemsHPCC Systems
JT Kellington, IBM and Allan Cantle, Nallatech present at the 2015 HPCC Systems Engineering Summit Community Day about porting HPCC Systems to the POWER8-based ppc64el architecture.
Similar to Session 1 introduction concurrent programming (20)
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices desire to take full advantage of the features
available on those devices, but many of the features provide convenience and capability but sacrifice security. This best practices guide outlines steps the users can take to better protect personal devices and information.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
Why You Should Replace Windows 11 with Nitrux Linux 3.5.0 for enhanced perfor...SOFTTECHHUB
The choice of an operating system plays a pivotal role in shaping our computing experience. For decades, Microsoft's Windows has dominated the market, offering a familiar and widely adopted platform for personal and professional use. However, as technological advancements continue to push the boundaries of innovation, alternative operating systems have emerged, challenging the status quo and offering users a fresh perspective on computing.
One such alternative that has garnered significant attention and acclaim is Nitrux Linux 3.5.0, a sleek, powerful, and user-friendly Linux distribution that promises to redefine the way we interact with our devices. With its focus on performance, security, and customization, Nitrux Linux presents a compelling case for those seeking to break free from the constraints of proprietary software and embrace the freedom and flexibility of open-source computing.
Best 20 SEO Techniques To Improve Website Visibility In SERPPixlogix Infotech
Boost your website's visibility with proven SEO techniques! Our latest blog dives into essential strategies to enhance your online presence, increase traffic, and rank higher on search engines. From keyword optimization to quality content creation, learn how to make your site stand out in the crowded digital landscape. Discover actionable tips and expert insights to elevate your SEO game.
GraphSummit Singapore | The Art of the Possible with Graph - Q2 2024Neo4j
Neha Bajwa, Vice President of Product Marketing, Neo4j
Join us as we explore breakthrough innovations enabled by interconnected data and AI. Discover firsthand how organizations use relationships in data to uncover contextual insights and solve our most pressing challenges – from optimizing supply chains, detecting fraud, and improving customer experiences to accelerating drug discoveries.
Pushing the limits of ePRTC: 100ns holdover for 100 daysAdtran
At WSTS 2024, Alon Stern explored the topic of parametric holdover and explained how recent research findings can be implemented in real-world PNT networks to achieve 100 nanoseconds of accuracy for up to 100 days.
“An Outlook of the Ongoing and Future Relationship between Blockchain Technologies and Process-aware Information Systems.” Invited talk at the joint workshop on Blockchain for Information Systems (BC4IS) and Blockchain for Trusted Data Sharing (B4TDS), co-located with with the 36th International Conference on Advanced Information Systems Engineering (CAiSE), 3 June 2024, Limassol, Cyprus.
AI 101: An Introduction to the Basics and Impact of Artificial IntelligenceIndexBug
Imagine a world where machines not only perform tasks but also learn, adapt, and make decisions. This is the promise of Artificial Intelligence (AI), a technology that's not just enhancing our lives but revolutionizing entire industries.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Communications Mining Series - Zero to Hero - Session 1DianaGray10
This session provides introduction to UiPath Communication Mining, importance and platform overview. You will acquire a good understand of the phases in Communication Mining as we go over the platform with you. Topics covered:
• Communication Mining Overview
• Why is it important?
• How can it help today’s business and the benefits
• Phases in Communication Mining
• Demo on Platform overview
• Q/A
1. Introduction to Concurrent Programming
Eric Verhulst
eric.verhulst@altreonic.com
http://www.altreonic.com
From Deep Space to Deep Sea
Content
• The von Neumann legacy
• Using more than one processor
• On concurrent multi-tasking
• How to program these things?
• From MP to VSP: the Virtual Single Processor
• Embedded Systems Engineering
• The formal development of OpenComRTOS
• Conclusion
11 January 2012 MC4ES workshop 2
2. The von Neumann ALU
vs. an embedded processor
• The sequential programming paradigm is based on the von Neumann architecture
• But this was only meant for a single ALU
• A real processor in an embedded system :
• Inputs data, processes the data (only this is covered by von Neumann), outputs the result
• In other words :
• at least two communications, often one computation
• => Communication/Computation ratio must be > 1 (in optimal case)
• Standard programming languages (C, Java, …) only cover the computation and sometimes limited runtime multitasking
• We have an imbalance, and have been living with it for decades
• Reason ? : history
• Computer scientists use workstations (the eye's Nyquist frequency is ~50 Hz)
• Only embedded systems must process data in real-time
• Embedded systems were first developed by hardware engineers
Multi-tasking:
doing more than one thing at a time
• Origin:
• A software solution to a hardware limitation
• von Neumann processors are sequential, the real-world is “parallel” by nature
and software is just modeling
• Developed out of industrial needs
• How?
• A function is a [callable] sequential stream of instructions
• Uses resources [mainly registers] => defines “context”
• Non-sequential processing =
• switching between ownership of processor(s)
• reducing overhead by using idle time or to avoid active wait :
• each function has its own workspace
• a task = function with proper context and workspace
• Scheduling to achieve real-time behavior for each task
• preemption capability : switch asynchronously
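The multi-tasking idea above — each task is a function with its own context and workspace, and the scheduler switches ownership of the processor between them — can be sketched in a few lines of Python. This is a cooperative (non-preemptive) illustration with made-up task names:

```python
# Cooperative multitasking sketch: each "task" is a generator whose
# suspended frame is its private context/workspace; the scheduler
# switches between tasks at explicit descheduling points (yield).

def task(name, steps):
    for i in range(steps):
        yield f"{name}:{i}"        # descheduling point: hand back the CPU

def round_robin(tasks):
    """Run tasks cooperatively, first in first served, until all finish."""
    trace = []
    while tasks:
        t = tasks.pop(0)
        try:
            trace.append(next(t))  # resume the task's saved context
            tasks.append(t)        # requeue at the back of the ready queue
        except StopIteration:
            pass                   # task terminated: drop it
    return trace

print(round_robin([task("A", 2), task("B", 2)]))  # → ['A:0', 'B:0', 'A:1', 'B:1']
```

A preemptive scheduler differs only in that the switch happens asynchronously, without the task yielding voluntarily.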
3. Scheduling approaches (1)
• Dominant real-time/scheduling system view paradigms:
• Control flow:
• Event driven - asynchronous : latency is the issue
• Traverse the state machine
• Uncovered states generate complexity
• Data-flow:
• Data-driven : throughput is the issue
• Multi-rate processing generates complexity
• Time-triggered:
• Play safe: allocate timeslots beforehand
• Reliable if system is predictable and stationary
• Static scheduling
• Play safe: single pre-calculated thread
• Reliable if no SW errors, if no HW failures
Scheduling approaches (2)
• REAL SYSTEMS :
• combination of above
• distinction is mainly an implementation and style issue, not conceptual
• SCHEDULING IS AN ORTHOGONAL ISSUE TO MULTI-TASKING
• multi-tasking gives logical behaviour
• scheduling gives timely behaviour
4. Classical (RT) scheduling (1)
• Superloop :
• loops into a single endless loop : [test, call function]
• Round-robin, FIFO :
• first in, first served
• cooperative multi-tasking, run till descheduling point, no preemption
• Priority based, pre-emptive
• tasks run when executable in order of priority
• can be descheduled at any moment
• Rate Monotonic Analysis (Liu & Layland) : CPU load < 70 %
• simplified model : tasks are independent
• graceful degradation by design
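The "CPU load < 70 %" rule of thumb comes from the rate-monotonic utilization bound n·(2^(1/n) − 1), which tends to ln 2 ≈ 69.3 % as the number of independent periodic tasks grows. A minimal sketch of this sufficient (not necessary) schedulability test, with illustrative task sets:

```python
# Sufficient schedulability test for priority-based, preemptive
# scheduling with rate-monotonic priorities (shorter period = higher
# priority). Bound: U <= n * (2**(1/n) - 1), tending to ln 2 ~= 0.693.

def rma_bound(n):
    return n * (2 ** (1 / n) - 1)

def schedulable(tasks):
    """tasks: list of (computation_time, period) for independent tasks."""
    utilization = sum(c / p for c, p in tasks)
    return utilization <= rma_bound(len(tasks))

# 0.25 + 0.40 = 65 % load, under the n=2 bound of ~82.8 %:
print(schedulable([(1, 4), (2, 5)]))  # → True
# 0.50 + 0.60 = 110 % load can never be scheduled:
print(schedulable([(2, 4), (3, 5)]))  # → False
```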
Classical (RT) scheduling (2)
• Earliest deadline first (EDF):
• most dynamic form : deadlines are time based
• but complex to use and implement
• time granularity is an issue (hardware lacking support)
• catastrophic failure mode
• For MP systems:
• No real generic solution for embedded
• IT solutions focus on QoS and soft real-time + high overhead
• Higher need for error recovery
• Graceful degradation : guarantee QoS even if HW resources fail
• RMA (priority, preemptive) still best approach
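EDF's "most dynamic form" boils down to always dispatching the ready task with the earliest absolute deadline. A sketch, with made-up task names and deadlines:

```python
# Earliest Deadline First: of all ready tasks, dispatch the one whose
# absolute deadline comes first. Deadlines are in time units since start.

def edf_pick(ready):
    """ready: list of (task_name, absolute_deadline) pairs."""
    return min(ready, key=lambda task: task[1])[0]

print(edf_pick([("logger", 40), ("sensor", 10), ("display", 25)]))  # → sensor
```

The catastrophic failure mode mentioned above follows from this rule: under overload, EDF keeps chasing the nearest deadline, so every task can end up missing its deadline rather than just the lowest-priority one.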
6. Why Multi-Processing ?
• Laws of diminishing return :
• Power consumption increases more than linearly with speed (F**2, V)
• Highest (peak) speed achieved by micro-parallel tricks :
• Pipelining, VLIW, out of order execution, branch prediction, …
• Efficiency depends on application code
• Requires higher frequencies and many more gates
• Creates new bottlenecks :
• I/O and communication become bottlenecks
• Memory access speed slower than ALU processing speed
• Result :
• 2 CPU@1F Hz can be better than 1 CPU@2F Hz if communication support (HW and
SW) is adequate
• The catch :
• Not supported by von Neumann model
• Scheduling, task partitioning and communication are inter-dependent
• BUT SCHEDULING IS NOT ORTHOGONAL TO PROCESSOR MAPPING AND
INTERPROCESSOR COMMUNICATION
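The "2 CPU@1F vs. 1 CPU@2F" argument can be made concrete with a toy dynamic-power model P = C·V²·f, under the simplifying (idealized) assumption that supply voltage must scale with frequency:

```python
# Toy model: dynamic power P = C * V^2 * f, with the idealized
# assumption that supply voltage V scales linearly with frequency f,
# so P grows roughly with f**3.

def dynamic_power(f, c=1.0):
    v = f                     # idealization: V proportional to f
    return c * v * v * f      # P = C * V^2 * f

one_core_at_2f = dynamic_power(2.0)
two_cores_at_f = 2 * dynamic_power(1.0)   # same aggregate throughput
print(one_core_at_2f / two_cores_at_f)    # → 4.0: dual core wins on power
```

The catch stated above still applies: the saving only materializes if the communication support makes the two slower cores usable.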
Generic MP system
• There is no conceptual difference between:
• Multicore (often assumed homogeneous)
• Manycore (often assumed heterogeneous)
• Parallel (closer to reality)
• Distributed
• Networked
• Difference is implementation architecture:
• Shared vs. distributed memory
• SMP vs. SIMD vs. MIMD
• Note: PCIe (RapidIO, …) emulate a shared bus using very fast point-to-point bit-serial lanes
7. Example: A Rack in a (MP)-SoC(1)
Example: A Rack in a (MP)-SoC(2)
8. Why MP-SoC is now standard
• High NRE, high frequency signals
• Conclusion :
• multi-core, coarse-grain asynchronous SoC design
• cores as proven components -> well defined interfaces
• keep critical circuits inside
• simplify I/O, high speed serial links, no buses
• NRE dictates high volume -> more reprogrammability
• system is now a component
• below minimum thresholds of power and cost
• it becomes cheap to “burn” gates
• software becomes the differentiating factor
On concurrent multi-tasking
9. On concurrent multi-tasking (1)
• Tasks need to interact
• synchronize
• pass data = communicate
• share resources
• A task = a virtual single processor or unit of abstraction
• A (SW) multi-tasking system can emulate a (HW) real system
• Multi-tasking needs communication services
On concurrent multi-tasking (2)
• Theoretical model :
• CSP : Communicating Sequential Processes (and its variations)
• C.A.R. Hoare
• CSP := sequential processes + channels (cfr. occam)
• Channels := synchronised (blocked) communication, no protocol
• Formal, but doesn’t match complexity of real world
• Generic model : module based, multi-tasking based, process oriented, …
• Generic model matches reality of MP-SoC
• Very powerful to break the von Neumann constrictor
10. There are only programs
• Simplest form of computation is assignment :
a:= b
• Semi-Formal :
BEFORE : a = UNDEF; b = VALUE(b)
AFTER : a = VALUE(b); b = VALUE(b)
• Implementation in typical von Neumann machine :
Load b, register X
Store X, a
CSP explained in occam
PROC P1, P2 :
CHAN OF INT32 c1, c2 :
PAR
  P1(c1, c2)
  P2(c1, c2)
/* c1 ? a : read from channel c1 into variable a */
/* c1 ! a : write variable a into channel c1 */
/* order of execution not defined by clock but by */
/* channel communication : execute when data is ready */
Needed :
- context
- communication
[Figure: processes P1 (holding variable a) and P2 (holding variable b) connected by channels C1 and C2]
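The occam channel semantics above — synchronised (blocking) communication, no protocol — can be modeled outside occam. Here is a minimal Python sketch of a rendezvous channel built from two hand-off queues; an illustration of the semantics, not the occam runtime:

```python
import threading
import queue

# Rendezvous channel: write (c ! a) blocks until read (c ? b) has
# taken the value, mimicking occam's synchronised communication.

class Channel:
    def __init__(self):
        self._data = queue.Queue(maxsize=1)
        self._ack = queue.Queue(maxsize=1)

    def write(self, value):          # c ! value
        self._data.put(value)
        self._ack.get()              # block until the reader has taken it

    def read(self):                  # c ? variable
        value = self._data.get()     # block until the writer offers a value
        self._ack.put(None)          # release the writer
        return value

def demo():
    c = Channel()
    received = []
    p2 = threading.Thread(target=lambda: received.append(c.read()))
    p2.start()
    c.write(42)                      # both sides synchronise here
    p2.join()
    return received[0]

print(demo())  # → 42
```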
11. A small parallel program
No assumption in the PAR case about order of execution => self-synchronising
P1 :
  INT32 a :
  SEQ
    a := ANY
    c1 ! a
P2 :
  INT32 b :
  SEQ
    b := ANY
    c1 ? b
Equivalent :
  INT32 a, b :
  SEQ
    a := ANY
    b := ANY
    b := a
The PAR version at von Neumann machine level
PROC_1
  Load a, register X
  Store X, output register
  (hidden : start channel transfer)
  (hidden : transfer control to PROC_2)
PROC_2
  (hidden : detect channel transfer)
  (hidden : transfer control to PROC_2)
  Load input register, X
  Store X, b
In between :
• Data moves from output register to input register
• Sequential case is an optimization of the parallel case
• But there are hidden assumptions about data validity!
12. The same program for hardware with Handel-C
void main(void)
{
    chan chan_between;
    int a, b;
    par /* WILL GENERATE PARALLEL HW (1 clock cycle) */
    {
        chan_between ! a;
        chan_between ? b;
    }
}
But :
void main(void)
{
    chan chan_between;
    int a, b;
    seq /* WILL GENERATE SEQUENTIAL HW (2 clock cycles) */
    {
        chan_between ! a;
        chan_between ? b;
    }
}
Consequences
• Data is protected inside scope of process.
• Interaction is through explicit communication
• In order to safeguard abstract equivalence :
• Communication backbone needed
• Automatic routing needed (but deadlock free)
• Process scheduler if on same processor
• In order to safeguard real-time behavior
• Prioritisation of communication for dynamic applications (or schedule)
• In order to handle multi-byte communication :
• Buffering at communication layer, Packetisation, DMA in background
• Result :
• prioritized packet switching : header, priority, payload
• Communication not fundamentally different from data I/O
13. The (next generation) SoC
Xilinx Zynq dual core ARM + FPGA
The (next generation) SoC
Altera dual core ARM + FPGA
14. How to program these things?
Where’s the (new) programming model?
• Issue: what’s wrong with the “good old” software?
• => von Neumann => shared memory syndrome
• Issue is not access to memory but integrity of memory
• Issue is not bandwidth to memory, but latency
• Sequential programs have lost the information of the inherent parallelism in the problem domain
• Most attempts (MPI, ...) just add a heavy comm layer
• Issue: underlying hardware still visible
• Difficult for:
• Porting to another target
• Scalability (from small to large AND vice-versa)
• Application domain specific
• Performance doesn’t scale
15. Beyond concurrent multi-tasking
• Multi-tasking = Process Oriented Programming
• A Task =
• Unit of execution
• Encapsulated functional behavior
• Modular programming
• High Level [Programming] Language :
• common specification :
• for SW: compile to asm
• for HW: compile to VHDL or Verilog
• E.g. program PPC with ANSI C (and RTOS), FPGA with Handel-C
• C level design is enabler for SoC “co-design”
• More abstraction gives higher productivity
• But interfaces must be better standardized for better re-use
• Interfaces can be “compiled” for higher volume applications
Multi-tasking API: an orthogonal set
• Events : binary data, one to one, local -> interface to HW
• [counting] semaphore : events with a memory, many to many, distributed
• FIFO queue : simple datacomm between tasks, many to many, distributed
• Mailbox/Port : variable size datacomm between tasks, rendez-vous,
one/many to one/many, distributed
• Resources : ownership protection, priority inheritance
• Memory maps/pools
• semantic issues : distributed operation, blocking, non-blocking, time-out,
asynchronous operation, group operators
• L1_memcpy_[A][WT] (not just memcpy!)
• L1_SendPacketToPort_[NW][W][T][A]
16. From MP to VSP:
Virtual Single Processor
Virtual Single Processor (VSP) model
• Transparent parallel programming
• Cross development on any platform + portability
• Scalability, even on heterogeneous targets
• Distributed semantics
• Program logic neutral to topology and object mapping
• Clean API provides for fewer programming errors
• Prioritized packet switching communication layer
• Based on “CSP” (C.A.R. Hoare): Communicating Sequential Processes; VSP is a pragmatic superset
• Implemented first in Virtuoso RTOS (now WRS/Intel) -> OpenComRTOS
Multitasking and message passing
Process oriented programming
Interfacing using communication protocols
Application doesn’t need to know physical layer
18. Full application : Matlab/Simulink type design
Embedded DSP app with GUI front-end
(Diagram: a GUI front-end with parameter knobs and monitor windows, a "parameter settings & control" task and a "monitor" task; the front-end can be written in any language and run remotely. It drives Virtuoso tasks and communication channels on a specific DSP card: ADC driver task -> process audio data, stages 1 and 2 -> split L-R channels -> stage 3 and stage 4 for the R and L channels -> channel joiner task -> process audio data, stages 5 and 6 -> DAC driver task, mapped across DSP 1 to DSP 4.)
Homogeneous OpenComRTOS
Block diagram at top level, executable spec in e.g. C
19. Heterogeneous OpenComRTOS with VSP
and host OS
Next:
Heterogeneous VSP with reprogrammable HW
20. Conclusion
• RTOS is much more than real-time
• General purpose “process oriented” design and programming
• Hide complexity inside chip for hardware (in SoC chip)
• Hide complexity inside task for software (with RTOS)
• Hide complexity of communication in system level support
• CSP provides a unified theoretical base for hardware and software; the RTOS makes it pragmatic for the real world:
• “DESIGN PARALLEL, OPTIMIZE SEQUENTIALLY”
• Software meets hardware with same development paradigm :
• Handel-C for FPGA, “Parallel” C for SW
• FPGA with macro-blocks is next generation SW defined SoC :
• Time for asynchronous HW design ?
Embedded Systems
Engineering
22. The OpenComRTOS approach
• Derived from a unified systems engineering
methodology
• Two keywords:
• Unified Semantics
• use of common “systems grammar”
• covers requirements, specifications, architecture, runtime, ...
• Interacting Entities (models almost any system)
• RTOS and embedded systems:
• Map very well on “interacting entities”
• Time and architecture mostly orthogonal
• Logical model is not communication but “interaction”
The OpenComRTOS project
• Target systems:
• Multi-many-core, parallel processors, networked systems, including “legacy” processing nodes running an older (RT)OS
• Methodology:
• Formal modeling and formal verification
• Architecture:
• Target is multi-node, hence communication is a system-level issue, not a programmer’s concern
• Scheduling is orthogonal issue
• An application function = a “task” or a set of “tasks”
• Composed of sequential “segments”
• In between Tasks synchronise and pass data (“interaction”)
23. TLA+ used for formal modelling
• TLA (the Temporal Logic of Actions) is a logic for specifying and reasoning about concurrent and reactive systems.
• Also UPPAAL for time-out services
Graphical view of RTOS “Hubs”
(Diagram: the Generic Hub (N-N) and its parts:)
• Buffer List : data needs to be buffered
• CeilingPriority : priority inheritance, for resources
• Owner Task : for resources
• Count : for semaphores
• Synchronising Predicate and Predicate Action : synchronisation
• Waiting Lists
• Threshold
Similar to Guarded Actions or a pragmatic superset of CSP
24. All RTOS entities are “HUBs”
OpenComRTOS application view:
any entity can be mapped onto any node
25. Unexpected: RTOS 10x smaller
• Reference is Virtuoso RTOS (ex-Eonic Systems)
• New architecture benefits:
• Much easier to port
• Same functionality (and more) in 10x less code
• Smallest size SP: 1 KByte program, 200 bytes of RAM
• Smallest size MP: 2 KBytes
• Full version MP: 5 Kbytes, grows to 8 KB for complex SoC
• Why is small better ?
• Much better performance (fewer instructions)
• Frees up more fast internal memory
• Easier to verify and modify
• Architecture allows new services without changing the RTOS
kernel task!
Clean architecture gives small code
OpenComRTOS L1 code size figures (MLX16), in bytes:

                     MP FULL   SP SMALL
L0 Port                  162        132
L1 Hub shared            574        400
L1 Port                    4          4
L1 Event                  68         70
L1 Semaphore              54         54
L1 Resource              104        104
L1 FIFO                  232        232
L1 Resource List         184        184
Total L1 services       1220       1048
Grand Total (L0 / L1): 3150 / 4532 (MP FULL), 996 / 2104 (SP SMALL)

Smallest application: 1048 bytes program code and 198 bytes RAM (data)
(SP, 2 tasks with 2 Ports sending/receiving Packets in a loop, ANSI-C)
Number of instructions: 605 instructions for one loop (= 2 x context switches, 2 x L0_SendPacket_W, 2 x L0_ReceivePacket_W)
26. Probably the smallest MP-demo in the world
Code size (bytes):
• Platform firmware: 520
• 2 application tasks + 2 UART driver tasks: 230
• Kernel task + Idle task: 338
• OpenComRTOS full MP (_NW, _W, _WT, _A): 3500
• Total: 4138 + 520

Data size (bytes):
• Application: 1002, of which kernel stack 100, task stacks 4 x 64, ISR stack 64, idle stack 50
• OpenComRTOS full MP: 568
• Total: 1002 + 568
Conclusions
• Don’t be afraid of multi-many-parallel-distributed-
network-centric programming.
• It’s more natural than sequential programming
• The benefits are:
• scalability, faster development, reuse, …
27. The book
Formal Development of a Network-Centric RTOS: Software Engineering for Reliable Embedded Systems
Verhulst, E., Boute, R.T., Faria, J.M.S., Sputh, B.H.C., Mezhuyev, V.
Springer
Documents the project (IWT co-funded) and the design, formal modeling and implementation of OpenComRTOS