This document discusses Linux memory management. It outlines the buddy system, zone allocation, and slab allocator used by Linux to manage physical memory. It describes how pages are allocated and initialized at boot using the memory map. The slab allocator is used to optimize allocation of kernel objects and is implemented as caches of fixed-size slabs and objects. Per-CPU allocation improves performance by reducing locking and cache invalidations.
The Linux Block Layer - Built for Fast Storage (Kernel TLV)
The arrival of flash storage introduced a radical change in the performance profiles of direct-attached devices. At the time, it was obvious that the Linux I/O stack needed to be redesigned to support devices capable of millions of IOPS at extremely low latency.
In this talk we revisit the changes to the Linux block layer over the last decade or so that made it what it is today: a performant, scalable, robust, and NUMA-aware subsystem. In addition, we cover the new NVMe over Fabrics support in Linux.
Sagi Grimberg
Sagi is Principal Architect and co-founder at LightBits Labs.
The document provides an introduction to ARMv8 AArch64, the 64-bit instruction set introduced in ARMv8. Some key points:
- It uses 64-bit pointers and registers, with fixed-length 32-bit instructions. It has 31 general-purpose 64-bit registers.
- Many traditional ARM features are removed or modified, such as conditional execution, immediate shifts, and load/store multiple. New features are added, like large PC-relative addressing and branching.
- It has a load-acquire/store-release memory model and supports atomic operations. Advanced SIMD and floating point are now mandatory.
qemu + gdb + sample_code: Run sample code in QEMU OS and observe Linux Kernel... (Adrian Huang)
This document describes setting up a QEMU virtual machine running Ubuntu 20.04.1 to debug Linux kernel code with gdb. The VM uses a 2-socket CPU configuration with 16 GB of memory, with KASAN and ASLR disabled. It can be used to run sample code and observe Linux kernel behavior under gdb, such as setting conditional breakpoints to analyze page-fault behavior for mmap addresses by referencing a gdb debugging text file.
LAS16-200: SCMI - System Management and Control Interface (Linaro)
Title: SCMI - System Management and Control Interface
Abstract: In this session we present a new standard proposal for system control and management. The industry, both in high end mobile and enterprise, is trending towards the use of power and system controllers. In most cases the controllers have very similar communication mechanisms between application processors and controllers. In addition, these controllers generally provide very similar functions, e.g. DVFS, power domain management, sensor management. This standard proposal provides an extensible, OS agnostic, and virtualizable interface to access these functions.
Speaker(s): Charles Garcia-Tobin
LCU13: An Introduction to ARM Trusted Firmware (Linaro)
Resource: LCU13
Name: An Introduction to ARM Trusted Firmware
Date: 28-10-2013
Speaker: Andrew Thoelke
Video: http://www.youtube.com/watch?v=q32BEMMxmfw
Note: When you view the slide deck in a web browser, the screenshots may be blurred. You can download the deck and view it offline (the screenshots are clear).
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded (Linaro)
LAS16-402: ARM Trusted Firmware – from Enterprise to Embedded
Speakers:
Date: September 29, 2016
★ Session Description ★
ARM Trusted Firmware has established itself as a key part of the ARMv8-A software stack. Broadening its applicability across all segments, from embedded to enterprise, is challenging. This session discusses the latest developments, including extension into the 32-bit space.
★ Resources ★
Etherpad: pad.linaro.org/p/las16-402
Presentations & Videos: http://connect.linaro.org/resource/las16/las16-402/
★ Event Details ★
Linaro Connect Las Vegas 2016 – #LAS16
September 26-30, 2016
http://www.linaro.org
http://connect.linaro.org
This slide deck provides a basic understanding of hypervisor support in ARMv8 and later processors, and aims to give automotive engineers some guidelines for comparing solutions and choosing the right one.
A Linux device driver summary:
1. Device drivers are implemented as kernel modules that can be dynamically loaded and unloaded. They provide access to hardware devices through the file system.
2. There are three main types of device drivers: character, block, and network. Character drivers provide a stream of bytes interface, block drivers handle block-based storage, and network drivers manage network interfaces.
3. The file_operations structure contains function pointers that drivers implement to handle operations like open, close, read, and write on character devices. This structure associates the driver with a major/minor number range allocated using functions like alloc_chrdev_region.
This document summarizes a presentation on static partitioning virtualization for RISC-V. It discusses the motivation for embedded virtualization, an overview of static partitioning hypervisors like Jailhouse and Xen, and the Bao hypervisor. It then provides an overview of the RISC-V hypervisor specification and extensions, including implemented features. It evaluates the performance overhead and interrupt latency of a prototype RISC-V hypervisor implementation with and without interference mitigations like cache partitioning.
Main topics covered at the seminar:
Tasks and components of the memory management subsystem;
Hardware capabilities of the x86_64 platform;
How physical and virtual memory are described in the kernel;
The memory management subsystem API;
Reclaiming previously used memory;
Monitoring tools;
Memory Cgroups;
Compaction - defragmentation of physical memory.
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware (Linaro)
The document discusses power management in ARMv8-A and the integration of OP-TEE with the ARM Trusted Firmware. It provides an overview of the software stack and PSCI requirements. It then describes OP-TEE's system view and how it integrates with ARM Trusted Firmware as a runtime service. Finally, it discusses the programmer's view of PSCI and provides examples of how CPU_ON, CPU_OFF, and CPU_SUSPEND operations are handled between Linux, ARM Trusted Firmware, and OP-TEE.
In this talk Liran will discuss interrupt management in Linux, effective handling, how to defer work using tasklets, workqueues and timers. We'll learn how to handle interrupts in userspace and talk about the performance and latency aspects of each method as well as look at some examples from the kernel source.
Liran is the CTO at Mabel technology and co-founder of DiscoverSDK - Software Libraries directory and DiscoverCloud - Business Apps directory.
More than 20 years of training experience including courses in: Linux, Android, Real-time and Embedded systems, and many more.
Linux Memory Management with CMA (Contiguous Memory Allocator) - Pankaj Suryawanshi
Fundamentals of Linux Memory Management and CMA (Contiguous Memory Allocator) In Linux.
Virtual memory, physical memory, swap space, DMA, IOMMU, paging, segmentation, TLB, hugepages, and the ION memory manager (Google/Android)
This document discusses Linux kernel debugging. It provides an overview of debugging techniques including collecting system information, handling failures, and using printk(), KGDB, and debuggers. Key points covered are the components of a debugger, how KGDB can be used with gdb to debug interactively, analyzing crash data, and some debugging tricks and print functions.
Linux Kernel Booting Process (1) - For NLKB (shimosawa)
Describes the bootstrapping part in Linux and some related technologies.
This is part one of the slides; the succeeding parts will contain the errata for this one.
Video: https://www.facebook.com/atscaleevents/videos/1693888610884236/ . Talk by Brendan Gregg from Facebook's Performance @Scale: "Linux performance analysis has been the domain of ancient tools and metrics, but that's now changing in the Linux 4.x series. A new tracer is available in the mainline kernel, built from dynamic tracing (kprobes, uprobes) and enhanced BPF (Berkeley Packet Filter), aka, eBPF. It allows us to measure latency distributions for file system I/O and run queue latency, print details of storage device I/O and TCP retransmits, investigate blocked stack traces and memory leaks, and a whole lot more. These lead to performance wins large and small, especially when instrumenting areas that previously had zero visibility. This talk will summarize this new technology and some long-standing issues that it can solve, and how we intend to use it at Netflix."
This document provides an overview of Linux memory management concepts including:
- RAM usage and primary vs secondary memory
- Memory mapping, process address spaces, and segmentation
- Pages, frames, page tables, and virtual memory
- Memory nodes, zones, and NUMA concepts
- Kernel memory allocation, page faults, and troubleshooting memory issues
This document discusses adding support for PCI Express and new chipset emulation to Qemu. It introduces a new Q35 chipset emulator with support for 64-bit BAR, PCIe MMCONFIG, multiple PCI buses and slots. Future work includes improving PCIe hotplug, passthrough and power management as well as switching the BIOS to SeaBIOS and improving ACPI table support. The goal is to modernize Qemu's emulation of PCI features to match capabilities of newer hardware.
It describes the MMC storage device driver functionality in the Linux kernel and its role. It explains the different types of storage devices available and how they are handled from the MMC driver's point of view. It describes eMMC (internal storage) and SD (external storage) devices in detail, along with the SD protocol used for communicating with these devices in Linux.
This document provides an overview of cBPF and eBPF. It discusses the history and implementation of cBPF, including how it was originally used for packet filtering. It then covers eBPF in more depth, explaining what it is, its history, implementation including different program types and maps. It also discusses several uses of eBPF including networking, firewalls, DDoS mitigation, profiling, security, and chaos engineering. Finally, it introduces XDP and DPDK, comparing XDP's benefits over DPDK.
The document discusses the Linux kernel and its structure. The Linux kernel acts as the interface between hardware and software, contains device drivers for peripherals, handles resource allocation, and tracks application access to files. It is also responsible for security and access controls for users. Kernel version numbers historically used even minor numbers to indicate stable releases.
VLIW and EPIC are processor architectures that aim to improve performance through instruction level parallelism. VLIW processors use very long instruction words containing multiple operations that can execute in parallel. EPIC evolved from VLIW and uses compiler scheduling to explicitly specify parallel instructions. The Intel Itanium architecture implemented the EPIC model using the IA-64 instruction set, featuring long instruction bundles, predication, and a large register file to facilitate parallel execution.
The document provides an overview of the initialization phase of the Linux kernel. It discusses how the kernel enables paging to transition from physical to virtual memory addresses. It then describes the various initialization functions that are called by start_kernel to initialize kernel features and architecture-specific code. Some key initialization tasks discussed include creating an identity page table, clearing BSS, and reserving BIOS memory.
The document provides an overview of the initialization process in the Linux kernel from start_kernel to rest_init. It lists the functions called during this process organized by category including functions for initialization of multiprocessor support (SMP), memory management (MM), scheduling, timers, interrupts, and architecture specific setup. The setup_arch section focuses on x86 architecture specific initialization functions such as reserving memory regions, parsing boot parameters, initializing memory mapping and MTRRs.
Arm device tree and linux device drivers (Houcheng Lin)
This document discusses how the Linux kernel supports different ARM boards using a common source code base. It describes how device tree is used to describe hardware in a board-agnostic way. The kernel initializes machine-specific code via the device tree and initializes drivers by matching compatible strings. This allows a single kernel binary to support multiple boards by abstracting low-level hardware details into the device tree rather than the kernel source. The document also contrasts the ARM approach to the x86 approach, where BIOS abstraction and standardized buses allow one kernel to support most x86 hardware.
1. The document provides instructions on how to produce comics using free and open-source software such as GIMP and Inkscape.
2. It explains how to scan drawings, remove noise, smooth strokes, apply colors, and add text using the features of these programs.
3. Finally, it gives tips on positioning speech balloons and resizing images for the final composition of the comic.
This document discusses Linux device drivers. It provides an overview of the history and purpose of Linux and device drivers. It explains that device drivers connect hardware and software by communicating at both the logical and physical layers. It also describes the difference between kernel and user modes, and discusses how device drivers are developed by utilizing Linux system calls and supporting development environments within the Linux kernel programming interface.
Linux Device Driver Training-Tutorials (DaddyStryker King)
The document outlines a Linux device driver training course which covers writing basic character drivers and modules, memory management, interrupts, synchronization, and debugging techniques. It also discusses related kernel topics like processes, scheduling, syscalls, and timers. Additionally, it introduces multimedia driver architectures for Android cameras and HDMI and provides a forum for questions.
Linux kernel and device driver development training focused on building a solid working knowledge of the Linux kernel, device drivers, and RTOS (real-time operating systems).
Breaking into Open Source and Linux: A USB 3.0 Success Story (Sage Sharp)
The Women in Computer Science Undergraduate Committee invites you to attend this term's public lecture in which we will host Sarah Sharp, Linux kernel developer and USB 3.0 driver author. Sarah will provide a deep dive into how USB 3.0 support was added to the Linux kernel, as an example of the technical, social, and cultural challenges in getting involved in open source development. Sarah will also provide tips for getting involved with the Linux kernel community.
An introduction to how the Linux kernel works: maintainers, scaling trust, and no regressions. This talk also gives tips to people who want to get involved with Linux kernel development, whether by reporting bugs, reviewing code, or developing code.
The document discusses the Linux kernel. It describes the kernel as the core of the operating system that manages system resources like CPU time, memory, and network connectivity. It allocates resources to processes and interacts with the memory subsystem through function calls. It also manages processes, memory, filesystems, devices, and networking. The document also discusses kernel modules, device drivers, security issues, version numbering, and licensing terms.
Emertxe's presentation in Open Source India 2014 about an innovative self-learning kit we have created. Linux device drivers is often perceived as a difficult topic, that prevents students, enthusiasts and professionals from learning it. This presentation talks about various features (Board, SDK and sample code) of the kit.
Some basic knowledge required for beginners in writing a Linux kernel module, with a description of the Linux source tree so that an idea of where and how development happens takes shape. The working of the insmod and rmmod commands is also described.
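The insmod/rmmod pair exercises a module's entry and exit hooks. A minimal sketch of such a module (it builds against kernel headers with a Kbuild makefile, not as a standalone program; the file and message names are invented):

```c
/* hello.c - minimal loadable kernel module sketch */
#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

static int __init hello_init(void)   /* runs at insmod */
{
    pr_info("hello: module loaded\n");
    return 0;                        /* nonzero would abort the load */
}

static void __exit hello_exit(void)  /* runs at rmmod */
{
    pr_info("hello: module unloaded\n");
}

module_init(hello_init);
module_exit(hello_exit);

MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Minimal example module");
```

A one-line makefile (`obj-m := hello.o`) plus `make -C /lib/modules/$(uname -r)/build M=$PWD modules` produces hello.ko; the pr_info messages then appear in dmesg when the module is inserted and removed.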
Introduction to embedded Linux device driver and firmware (definecareer)
This document provides an introduction to embedded Linux and device drivers. It defines an embedded system as a special-purpose computer designed to perform dedicated functions with real-time constraints, often embedded as part of a complete device. Linux is commonly used in embedded systems due to advantages like reuse of existing components, community support, and low cost. The document outlines the typical hardware and software requirements for developing embedded Linux systems, and provides an overview of the Linux kernel architecture for device drivers, including the unified device model, bus drivers, and platform devices.
The Linux kernel acts as an interface between applications and hardware, managing system resources and providing access through system calls. It uses a monolithic design in which all components run in kernel space for high performance; this can make the kernel harder to maintain, though Linux mitigates it by allowing dynamic loading of modules. Device drivers interface hardware such as keyboards, hard disks, and network devices with the operating system; they are implemented as loadable kernel modules that are compiled using special makefiles and registered with the kernel through system calls.
Achieving Performance Isolation with Lightweight Co-Kernels (Jiannan Ouyang, PhD)
These slides were presented at the 24th International Symposium on High-Performance Parallel and Distributed Computing (HPDC '15).
Performance isolation is emerging as a requirement for High Performance Computing (HPC) applications, particularly as HPC architectures turn to in situ data processing and application composition techniques to increase system throughput. These approaches require the co-location of disparate workloads on the same compute node, each with different resource and runtime requirements. In this paper we claim that these workloads cannot be effectively managed by a single Operating System/Runtime (OS/R). Therefore, we present Pisces, a system software architecture that enables the co-existence of multiple independent and fully isolated OS/Rs, or enclaves, that can be customized to address the disparate requirements of next generation HPC workloads. Each enclave consists of a specialized lightweight OS co-kernel and runtime, which is capable of independently managing partitions of dynamically assigned hardware resources. Contrary to other co-kernel approaches, in this work we consider performance isolation to be a primary requirement and present a novel co-kernel architecture to achieve this goal. We further present a set of design requirements necessary to ensure performance isolation, including: (1) elimination of cross OS dependencies, (2) internalized management of I/O, (3) limiting cross enclave communication to explicit shared memory channels, and (4) using virtualization techniques to provide missing OS features. The implementation of the Pisces co-kernel architecture is based on the Kitten Lightweight Kernel and Palacios Virtual Machine Monitor, two system software architectures designed specifically for HPC systems. Finally we will show that lightweight isolated co-kernels can provide better performance for HPC applications, and that isolated virtual machines are even capable of outperforming native environments in the presence of competing workloads.
ENERGY EFFICIENCY OF ARM ARCHITECTURES FOR CLOUD COMPUTING APPLICATIONSStephan Cadene
This thesis evaluates how the energy efficiency of the ARMv7 architecture based processors
Cortex-A9 MPCpre and Cortex-A8 compare in applications such as a SIPProxy
and a web server compared to Intel Xeon processors. The focus is on comparing
the energy efficiency between the two architectures rather than just the performance.
As the processors used in servers today have more computational power than
the Cortex-A9 MPCore, several of these slower but more energy efficient processors
are needed. Depending on the application, benchmarks indicate energy efficiency of
3-11 times greater for the ARM Cortex-A9 in comparison to the Intel Xeon. The topics
of interconnects between processors and overhead caused by using an increasing
number of processors, are left for later research
Shoot4U: Using VMM Assists to Optimize TLB Operations on Preempted vCPUsJiannan Ouyang, PhD
This slides were presented at the 12th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments (VEE’16).
Virtual Machine based approaches to workload consolidation, as seen in IaaS cloud as well as datacenter platforms, have long had to contend with performance degradation caused by synchronization primitives inside the guest environments. These primitives can be affected by virtual CPU preemptions by the host scheduler that can introduce delays that are orders of magnitude longer than those primitives were designed for. While a significant amount of work has focused on the behavior of spinlock primitives as a source of these performance issues, spinlocks do not represent the entirety of synchronization mechanisms that are susceptible to scheduling issues when running in a virtualized environment. In this paper we address the virtualized performance issues introduced by TLB shootdown operations. Our profiling study, based on the PARSEC benchmark suite, has shown that up to 64% of a VM's CPU time can be spent on TLB shootdown operations under certain workloads. In order to address this problem, we present a paravirtual TLB shootdown scheme named Shoot4U. Shoot4U completely eliminates TLB shootdown preemptions by invalidating guest TLB entries from the VMM and allowing guest TLB shootdown operations to complete without waiting for remote virtual CPUs to be scheduled. Our performance evaluation using the PARSEC benchmark suite demonstrates that Shoot4U can reduce benchmark runtime by up to 85% compared an unmodified Linux kernel, and up to 44% over a state-of-the-art paravirtual TLB shootdown scheme.
Denser, cooler, faster, stronger: PHP on ARM microserversJez Halford
This is the story of how I helped make arm.com run on a small collection of state-of-the art ARM-based microservers, and the lessons I learned along the way, from how not to crash your entire multi-node MySQL database several times a day, to how not to have nginx consume all your disk space within a few minutes, all the way through to how to build a rock solid PHP application that is *truly* scalable to almost any size.
LinuxCon2009: 10Gbit/s Bi-Directional Routing on standard hardware running Linuxbrouer
This talk my 2009 updates on the progress of doing 10Gbit/s routing on standard hardware running Linux. The results are good, BUT to achieve these results, a lot of tuning is required of hardware queues, MSI interrupts and SMP affinity, together with some (now) submitted patches. I\'ll explain the concept of network hardware queues and why interrupt and SMP tuning is essential. I\'ll present results from different hardware both 10GbE netcards and CPUs (current CPUs under test is AMD phenom and Core i7). Many future challenges still exists, especially in the area of more easy tuning. A high knowledge level about the Linux kernel is required to follow all the details.
Peemuperf is a Linux kernel module and userspace tool that uses the Performance Monitoring Unit (PMU) on ARM processors to monitor performance metrics like CPU cycles, cache misses, and stalls. It can profile the ARM Cortex A8 and A9 by dynamically configuring the number and types of performance counters. The tool outputs profiling data to the Linux proc filesystem for inspection in userspace. Peemuperf aims to provide cache monitoring capabilities for ARM devices where the oprofile tool is currently limited.
This document discusses scheduling for multicore processors. It begins by explaining that multicore processors pack multiple CPU cores onto a single chip to increase processing speed. However, traditional C programs only use one CPU, so simply adding more CPUs does not speed programs up. The document then covers several challenges in multicore scheduling, such as cache coherence and affinity. It proposes some solutions like multi-queue scheduling, where each CPU has its own job queue, to help address issues like lack of scalability from single-queue approaches. Common Linux schedulers like the O(1) scheduler and Completely Fair Scheduler that use multiple queues are also mentioned.
Cache coherence refers to maintaining consistency between data stored in caches and the main memory in a system with multiple processors that share memory. Without cache coherence protocols, modified data in one processor's cache may not be propagated to other caches or memory. There are different levels of cache coherence - from ensuring all processors see writes instantly to allowing different ordering of reads and writes. Cache coherence aims to ensure reads see the most recent writes and that write ordering is preserved across processors. Directory-based and snooping protocols are commonly used to maintain coherence between caches in multiprocessor systems.
This document provides an introduction to multiprocessor systems and discusses different multiprocessor architectures including shared memory, distributed memory, and distributed shared memory systems. It describes the key differences between Uniform Memory Access (UMA) and Non-Uniform Memory Access (NUMA) models. Cache coherence problems that can arise in shared memory systems are discussed along with solutions like snooping and directory-based cache coherence protocols.
The document discusses the architecture and internals of Unix operating systems. It describes the high-level architecture with the hardware at the bottom providing basic services, the operating system kernel interacting directly with hardware and providing common services to user programs, and user programs being isolated from hardware. It also discusses topics like multiprocessing, multi-core processors, operating system services like process management and scheduling, and the file subsystem as a key part of the Unix kernel architecture.
This document discusses computer system architecture and operating system structures. It covers single and multiprocessor systems, including symmetric and asymmetric multiprocessing. It also discusses clustered systems, operating system operations like interrupts and dual mode, and system calls. Finally, it discusses user interfaces like command line and graphical user interfaces, and simple operating system structures.
This document discusses operating system structures and components. It describes four main OS designs: monolithic systems, layered systems, virtual machines, and client-server models. For each design, it provides details on how the system is organized and which components are responsible for which tasks. It also discusses some advantages and disadvantages of the different approaches. The document concludes by explaining how client-server models address issues with distributing OS functions to user space by having some critical servers run in the kernel while still communicating with user processes.
This document discusses kernel synchronization in Linux. It begins by outlining kernel control paths and when synchronization is necessary, such as to prevent race conditions when kernel control paths are interleaved. It then describes various synchronization primitives like spin locks, semaphores, and RCU. Examples are given of how these primitives can be used to synchronize access to kernel data structures. Interrupt-aware versions of synchronization primitives are also outlined. The document concludes with examples of how race conditions are prevented for specific data structures and operations in the kernel.
The document proposes an operation zone based load balancer to improve user responsiveness on multicore embedded systems. It aims to reduce the costs of frequent task migration by the existing load balancers. The proposed approach divides the CPU utilization range into three zones - cold, warm and hot. The load balancer operates less frequently in the cold zone and more frequently in the hot zone, with intermediate behavior in the warm zone. Evaluation shows the approach reduces scheduling latency compared to CPU affinity based and non-affinity based systems under stress tests.
The document discusses multiprocessor and multicore systems. It defines multiprocessors as systems with two or more CPUs sharing full access to common RAM. It describes different hardware architectures for multiprocessors like bus-based, UMA, and NUMA systems. It discusses cache coherence protocols and issues like false sharing. It also covers scheduling and synchronization challenges in multiprocessor systems like load balancing, task assignment, and avoiding priority inversions.
The document provides an overview of embedded systems and ARM processors. It discusses key aspects of ARM processors including the pipeline, memory management features like cache, TCM, MMU and TLB. It also summarizes the AMBA specification and differences between operating in ARM and Thumb states. The document is intended as lecture material for an embedded systems course covering ARM architecture.
CSCI 2121- Computer Organization and Assembly Language Labor.docxannettsparrow
CSCI 2121- Computer Organization and Assembly Language
Laboratory No. 5
Week of March 12th, 2018
Submission Instructions:
1. Save the files as shift.sv and vending.sv
2. Put all files into one folder.
3. Compress the folder into a .zip file.
4. Submit the zip file on Brightspace.
5. Submission Deadline: Sunday, April 15th, 2018, 11.55PM
SRC CHIP PROJECT
In order to proceed on the project for this course, there are several key Verilog concepts which are
necessary in the implementation of this code.
Part 0: More Verilog concepts.
Shared busses and tri-state buffers
Before we begin implementing the CPU, let's go over a few concepts which will be needed for the lab
assignment. The first is the concept of a shared bus. We've already seen busses in Verilog in terms of an
array of wires or registers, however in the context of our CPU, the word "bus" has another meaning. Our
CPU uses a shared "bus" which is a wire connected between the inputs and outputs of many different
registers, in order to share data:
This can be easily modelled in Verilog, using the inout keyword for a module. This defines a wire which
can act as both an input and an output to our module. However, there is one critically important design
concept which goes along with the usage of shared busses:
Only one signal can be written to a shared bus at any given time.
If two signals are written to the bus, the result is a bus collision, and the value is undefined. In the
simulation, this is represented as "X", but in real life this would cause garbage data to exist on the bus,
potentially corrupting any process which is running on the CPU. As Verilog programmers, it's our job to
ensure that a bus collision never happens. To facilitate this, we'll use a digital logic component called a
tri-state buffer:
The purpose of a tri-state buffer is to act as a switch. If B is 1, then the data can pass through the buffer,
but if not, it simply disconnects the wire, outputting a high-impedance state (in Verilog, denoted by Z).
hz943141
A B C
0 0 Z
0 1 0
1 0 Z
1 1 1
In Verilog, we can't directly write a tri-state buffer, but it can be easily synthesized using a ternary
operator as shown on line 11:
Line 11 shows the construction of a ternary operator to use as a tri-state buffer. If the input called
activate_out is set to 1, then the tri-state buffer is activated, and the circuit will put my_result on the
bus. Otherwise, it puts the value 32'bz on the bus, which will disconnect my_result from the bus, and
allow another circuit to write to the bus.
Note: It is important that during your lab assignment, any circuits which write to the bus have a tri-
state buffer implemented to prevent bus collisions.
Verilog Tasks:
Verilog tasks are like object functions in Java. Read more about them here. You may find that they are
useful in separating the code for instructions within a module, particularly for th.
This document discusses parallel processing and cache coherence in computer architecture. It defines parallel processing as using multiple CPUs simultaneously to execute a program faster. It describes different types of parallel processor systems based on the number of instruction and data streams. It then discusses symmetric multiprocessors (SMPs), which have multiple similar processors that share memory and I/O facilities. Finally, it explains the cache coherence problem that can occur when multiple caches contain the same data, and describes the MESI protocol used to maintain coherence between caches.
This document discusses different approaches to memory management in operating systems. It begins by describing monoprogramming without swapping or paging, where one program uses all available memory at a time. It then describes multiprogramming using fixed memory partitions, either with separate queues for each partition or a single queue. The challenges of relocation and protection when programs are loaded at different addresses are also covered. Finally, it introduces the concepts of swapping and virtual memory for handling situations where not all active processes fit in main memory.
This document discusses multi-core programming and how the architecture of modern multi-core CPUs affects programming. It covers key topics like how adding more cores introduces bottlenecks due to shared resources, the differences between multi-CPU and multi-core machines, how memory is shared between cores, and how features like caches, pipelines and non-uniform memory access impact programming. The document provides guidance on how to optimize code for multi-core CPUs by avoiding delays from shared resources and improving instruction scheduling.
The document discusses processor management and device management. It covers topics like job scheduling, process scheduling algorithms, cache memory, interrupts, I/O devices, and the need for a device manager. It describes how the processor manager allocates CPUs to jobs using job schedulers and process schedulers. Common process scheduling algorithms are discussed like FCFS, SJN, priority scheduling, SRT, and round robin. Cache memory levels like L1, L2, and L3 are also summarized.
Aman 16 os sheduling algorithm methods.pptxvikramkagitapu
This document discusses operating systems and CPU scheduling algorithms. It begins by defining an operating system and providing examples of common operating systems. It then describes different types of operating systems including mainframe, batch processing, multiprogramming, time-sharing/multitasking, multiprocessor, distributed, and desktop systems. The document also discusses various CPU scheduling algorithms such as first-come first-served, priority-based, round robin, and shortest-job-first scheduling. Examples are provided to calculate average wait times for processes under different scheduling algorithms.
Chip Multithreading Systems Need a New Operating System Scheduler Sarwan ali
This document discusses the need for a new operating system scheduler for chip multithreading (CMT) systems. CMT combines chip multiprocessing and hardware multithreading to improve processor utilization. The current schedulers do not scale well to the large number of hardware threads in CMT systems. A new scheduler is proposed that would model resource contention and use this to minimize contention and maximize throughput when assigning threads to processors. Experiments show that resource contention, especially in the processor pipeline, has a significant impact on performance and a CMT-aware scheduler could improve performance by up to 2x.
AFFECT OF PARALLEL COMPUTING ON MULTICORE PROCESSORScscpconf
Our main aim of research is to find the limit of Amdahl's Law for multicore processors, to make number of cores giving more efficiency to overall architecture of the CMP(Chip Multi
Processor a.k.a. Multicore Processor). As it is expected this limit will be in the architecture of Multicore Processor, or in the programming. We surveyed the architecture of the Multicore
processors of various chip manufacturers namely INTEL™, AMD™, IBM™ etc., and the various techniques there followed in, for improving the performance of the Multicore
Processors. We conducted cluster experiments to find this limit. In this paper we propose an alternate design of Multicore processor based on the results of our cluster experiment.
Similar to Linux Device Driver parallelism using SMP and Kernel Pre-emption
2. Slide 2
• Understanding of Linux Device Drivers
• Basic understanding of Linux Synchronization mechanisms like
Semaphore, Mutex and Spin Locks
Prerequisites
4. Slide 4
Driver Parallelism
• Parallelism or concurrency arises when a system tries to do more than one
thing at once
– Concurrency is when two tasks can start, run, and complete in
overlapping time periods; they need not ever be running at the same
instant
– Parallelism is when tasks literally run at the same time
• The goal of parallelism/concurrency is to improve system performance
• The side effect is that it can also lead to race conditions
• The following slides highlight the sources of parallelism/concurrency in
Linux device drivers, how to improve performance, and how to avoid race
conditions
http://www.fasterj.com/cartoon/cartoon106.shtml
5. Slide 5
Kernel Preemption
• CONFIG_PREEMPT
– This kernel config option reduces kernel latency by making all kernel
code (that is not executing in a critical section) preemptible
– It improves reaction to interactive events by permitting a low-priority
process to be preempted involuntarily even while it is executing in
kernel mode
– After an asynchronous event such as an interrupt handler completes, the
current process is replaced if a higher-priority process is ready to run
– Useful for embedded systems with latency requirements in the millisecond
range
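The preemption model is chosen at kernel build time. A sketch of the relevant Kconfig options (option names as found in mainline kernels; which models are offered varies by kernel version):

```
# Kernel configuration fragment: pick one preemption model
# CONFIG_PREEMPT_NONE=y       - no forced preemption (server throughput)
# CONFIG_PREEMPT_VOLUNTARY=y  - preempt only at explicit voluntary points
CONFIG_PREEMPT=y              # preemptible kernel (low-latency desktop/embedded)
```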
6. Slide 6
SMP Architecture
• Evolution of multiprocessor architectures
– The late 60s saw a need for more CPU processing power for scientific
and compute-intensive applications
– Two or more CPUs were combined to form a single computer
• SMP (Symmetric Multiprocessing) is one such multiprocessor architecture
• AMP and clusters are others
• Basic idea: more tasks in parallel per unit time
7. Slide 7
SMP Architecture
[Fig 1: Logical view of SMP — four CPUs, each with its own cache, connected
by a shared bus to memory and I/O. In an actual hardware implementation the
cache would not be directly connected to the bus.]
8. Slide 8
SMP Architecture Contd
• In the 4-CPU SMP system shown in the diagram, all CPUs are symmetric,
i.e. of the same architecture, frequency, etc.
• CPUs, memory and I/O are tightly coupled by a high-speed interconnect
bus, allowing any unit connected to the bus to communicate with any other
unit
• A single globally accessible memory is used by all CPUs; there is no
local RAM in the CPUs, so data changes are visible to all CPUs
• Access to the global shared memory is symmetric (equal); its contents are
fully shared, and all CPUs use the same address when referring to the
same piece of data
• I/O access is also symmetric, i.e. any CPU can initiate I/O
9. Slide 9
SMP Architecture Contd
• Interrupts are distributed across CPUs by the PIC
• Access to the bus and memory has to be arbitrated so that no two CPUs
step on each other and all have guaranteed fair access
• The maximum number of usable CPUs depends on bus bandwidth
• Only one instance of the OS (Operating System) runs, loaded in main
memory
• Kernel data structures are accessed concurrently, hence the kernel needs
to be SMP-aware
11. Slide 11
SMP Intricacies: Cache Coherency
• In most implementations each CPU caches data to improve system
performance.
• Consider two threads running on two different CPUs in an SMP system, both
using a global variable “Data”. If one of them modifies it to 1, the change lands
only in that CPU’s cache; the values in main memory and in the other CPU’s
cache are stale, and if the other CPU reads them, the results are
unpredictable. Hence the need to maintain consistency, or coherency,
of the caches.
• This problem is typically solved by hardware cache-coherency protocols,
which include snooping with write-update or write-invalidate policies
12. Slide 12
SMP Intricacies: Atomic
operations
• Consider two threads trying to obtain the same semaphore simultaneously:
both read a value of 0, conclude it is available, and both set it to 1.
• Such issues are solved using atomic instructions provided by each
architecture
• Special instructions provide atomic test-and-set operations, e.g.
load-linked/store-conditional on MIPS and load-exclusive/
store-exclusive on ARM
13. Slide 13
USB Subsystem Analysis
[Figure: Simplified view of the USB subsystem. Linux host side: a USB print app
and a USB mass-storage app sit above the USB print class driver and USB
mass-storage class driver, which share the USB core, the EHCI driver, and the
USB host controller. Linux device side: a USB print app sits above the print
gadget driver and mass-storage gadget driver, the UDC driver, and the USB
device controller.]
14. Slide 14
USB Subsystem Analysis:
No preempt
• Assume the Linux host has initiated a large transfer to USB mass storage.
• Without preemption, the in-kernel transfer is not preempted until the available
data is exhausted.
• A high-priority print job with a small amount of data gets scheduled only after
the mass-storage transfer is complete.
• This hurts the end-user experience
15. Slide 15
USB Subsystem Analysis:
Preempt Enabled
• Assume the same scenario with kernel preemption enabled.
• The in-kernel mass-storage transfer can now be preempted and replaced by
the print data transfer, for example after a keyboard or timer interrupt is
processed
• This opens another parallel path into both the USB core and the EHCI driver,
since the mass-storage transfer is incomplete when the print transfer starts.
• The print transfer could re-open the same device, access the same data
structures used for initiating transfers, and could even disconnect the device.
16. Slide 16
USB Subsystem Analysis:
Preempt Enabled
• Hence driver design needs to identify all parallel paths and the points at
which it is safe to be preempted, while still enabling parallelism.
• For example, it may be safe to preempt once a URB request is queued,
but not while DMA is in progress, since the DMA
configuration registers could be overwritten.
17. Slide 17
USB Subsystem Analysis: SMP
• Assume the previous scenario on an SMP system
• Here the scheduler need not preempt the running mass-storage transfer;
it can schedule the print transfer on another CPU.
• This too opens a parallel path into the drivers, and both transfers execute
at the same instant of time.
• Hence if parallelism is handled in the driver, the driver is to a large extent SMP safe.
• On SMP systems the interrupt handler and driver code can run concurrently on
different CPUs
• Hence the need to protect data shared with interrupt handlers using spin locks
18. Slide 18
Driver Scenarios
struct ts_entry {
    struct list_head node;
    /* payload */
};

static LIST_HEAD(ts_list);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    local_irq_disable();
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list element */
        list_del(&ts->node);
    }
    local_irq_enable();
    return 0;
}

irqreturn_t ts_isr(int irq, void *dev_id)
{
    /* Process interrupt; entry comes from device data */
    list_add_tail(&entry->node, &ts_list);
    return IRQ_HANDLED;
}

local_irq_disable() protects against both the interrupt handler and
preemption, but only on the local CPU
spin_lock_irqsave() needs to be added in both the driver code and the ISR
to be SMP safe
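A sketch of the SMP-safe variant using spin_lock_irqsave(), assuming the same hypothetical ts_entry structure (kernel code shown for illustration only, not standalone-runnable):

```c
static LIST_HEAD(ts_list);
static DEFINE_SPINLOCK(ts_lock);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;   /* ts_entry: hypothetical list element */
    unsigned long flags;

    /* Disables local interrupts AND takes the lock: safe against the ISR
     * on this CPU and against code running on other CPUs. */
    spin_lock_irqsave(&ts_lock, flags);
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list element */
        list_del(&ts->node);
    }
    spin_unlock_irqrestore(&ts_lock, flags);
    return 0;
}

irqreturn_t ts_isr(int irq, void *dev_id)
{
    /* The ISR runs with its own IRQ line masked, but on SMP it still
     * needs the lock: process context may be running on another CPU. */
    spin_lock(&ts_lock);
    list_add_tail(&entry->node, &ts_list);  /* entry: from device data */
    spin_unlock(&ts_lock);
    return IRQ_HANDLED;
}
```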
19. Slide 19
Driver Scenarios: Cont
Locking with a mutex/semaphore doesn't disable preemption, but
guarantees that the data structure is not corrupted when preemption
occurs (usable in process context only, not in interrupt handlers)
Both SMP safe and preempt safe
struct ts_entry {
    struct list_head node;
    /* payload */
};

static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    if (mutex_lock_interruptible(&ts_lock))
        return -EINTR;
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list element */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
    return 0;
}

int process_rest_entries(void)
{
    struct ts_entry *ts;

    if (mutex_lock_interruptible(&ts_lock))
        return -EINTR;
    list_for_each_entry(ts, &ts_list, node) {
        /* Process remaining elements */
    }
    mutex_unlock(&ts_lock);
    return 0;
}
20. Slide 20
Driver Scenarios: Cont
Functions process_ts_entries() and
process_rest_entries() can deadlock: if one thread takes ts_lock and
is preempted, and another thread then takes tc_lock, each ends up
waiting for the lock the other holds
Locks must always be obtained in the same order to avoid
deadlock
static LIST_HEAD(ts_list);
static LIST_HEAD(tc_list);
static DEFINE_MUTEX(ts_lock);
static DEFINE_MUTEX(tc_lock);

int process_ts_entries(void)
{
    mutex_lock(&ts_lock);        /* order: ts_lock first */
    /* Some processing */
    mutex_lock(&tc_lock);
    /* ... */
    mutex_unlock(&tc_lock);
    mutex_unlock(&ts_lock);
    return 0;
}

int process_rest_entries(void)
{
    mutex_lock(&tc_lock);        /* BUG: opposite order, deadlock risk */
    /* Some processing */
    mutex_lock(&ts_lock);
    /* ... */
    mutex_unlock(&ts_lock);
    mutex_unlock(&tc_lock);
    return 0;
}
21. Slide 21
Driver Scenarios: Cont
In some cases it is better to access a resource from a single function
that handles its own locking, rather than have lock calls spread throughout the code
struct ts_entry {
    struct list_head node;
    /* payload */
};

static LIST_HEAD(ts_list);
static DEFINE_MUTEX(ts_lock);

int process_ts_entries(void)
{
    struct ts_entry *ts, *tmp;

    mutex_lock(&ts_lock);
    list_for_each_entry_safe(ts, tmp, &ts_list, node) {
        /* Process list element */
        list_del(&ts->node);
    }
    mutex_unlock(&ts_lock);
    return 0;
}

/* Callers (placeholder names) need no locking of their own */
int caller_one(void)
{
    return process_ts_entries();
}

int caller_two(void)
{
    return process_ts_entries();
}
22. Slide 22
Driver Scenarios
• Don’t use one big lock for everything; it reduces concurrency
• Too fine-grained locking increases overhead
• Need to balance both aspects
• Reader-writer locks
– Useful when data structures are read more often than updated
– Allow multiple read locks to be held simultaneously
– Allow a single write lock, which also prevents any read lock from
being taken while the write lock is held
– Available for both spin locks (rwlock_t) and semaphores (rw_semaphore)
• Stack variables/structures don’t need locking, since every invocation
gets its own copy on its own stack
23. Slide 23
Summary
• Concurrency/parallelism needs to be one of the criteria during the driver design
phase
• Analysis is required to determine the parallel paths and the protection needed for
critical sections
• Drivers that handle concurrency with appropriate locking techniques not only
avoid race conditions but can also improve performance
• Unit testing can exercise some of the parallel paths in a driver
– Two different applications that take parallel paths into the same driver
– Two instances of the same application