These slides provide a basic understanding of hypervisor support in ARMv8 and later processors, and are intended to give automotive engineers some guidelines for comparing and choosing the right solution.
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Speaker: Bill Fletcher
Date: September 24, 2015
★ Session Description ★
An introductory session giving a system-level overview of Power State Coordination
- Focus on ARMv8
- Goes top-down from ACPI
- A demo based on the current code in qemu
- The specifications are very dynamic - what’s ongoing for ACPI and PSCI
★ Resources ★
Video: https://www.youtube.com/watch?v=vXzPdpaZVto
Presentation: http://www.slideshare.net/linaroorg/sfo15tr9-psci-acpi-and-uefi-to-boot
Etherpad: pad.linaro.org/p/sfo15-tr9
Pathable: https://sfo15.pathable.com/meetings/303087
★ Event Details ★
Linaro Connect San Francisco 2015 - #SFO15
September 21-25, 2015
Hyatt Regency Hotel
http://www.linaro.org
http://connect.linaro.org
XPDDS17: Shared Virtual Memory Virtualization Implementation on Xen - Yi Liu,... - The Linux Foundation
This document discusses vSVM design on the Xen hypervisor. It proposes exposing SVM extensions in hardware like PASID, ATS and PRQ through virtual IOMMU capabilities in Xen. This would allow guest VMs to utilize shared virtual addressing between CPU and devices. The design would involve shadow extended context entries pointing to guest PASID tables and queuing invalidation of translation caches between host and guest. Currently Xen supports device assignment but not full IOMMU functionality or SVM extensions for shared virtual addressing across VMs.
HKG15-107: ACPI Power Management on ARM64 Servers (v2) - Linaro
HKG15-107: ACPI Power Management on ARM64 Servers
---------------------------------------------------
Speaker: Ashwin Chaugule
Date: February 9, 2015
---------------------------------------------------
★ Session Summary ★
Status of CPPC with runtime PM and discussion on idle PM with ACPI
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250767
Video: https://www.youtube.com/watch?v=eDDgYIkUHLI
Etherpad: http://pad.linaro.org/p/hkg15-107
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
U-Boot ("Das U-Boot", the Universal Bootloader) is a monitor program released under the GPL. This production-quality boot loader is used as the default boot loader by several board vendors. It is easy to port and to debug, supporting the PPC, ARM, MIPS, x86, m68k, NIOS, and MicroBlaze architectures. Here is a presentation that introduces U-Boot.
This presentation covers the general concepts about real-time systems, how Linux kernel works for preemption, the latency in Linux, rt-preempt, and Xenomai, the real-time extension as the dual kernel approach.
This document summarizes a presentation on static partitioning virtualization for RISC-V. It discusses the motivation for embedded virtualization, an overview of static partitioning hypervisors like Jailhouse and Xen, and the Bao hypervisor. It then provides an overview of the RISC-V hypervisor specification and extensions, including implemented features. It evaluates the performance overhead and interrupt latency of a prototype RISC-V hypervisor implementation with and without interference mitigations like cache partitioning.
The document discusses QEMU and adding a new device to it. It begins with an introduction to QEMU and its uses. It then discusses setting up a development environment, compiling QEMU, and examples of existing devices. The main part explains how to add a new "Devix" device by creating source files, registering the device type, initializing PCI configuration, and registering memory regions. It demonstrates basic functionality like interrupts and I/O access callbacks. The goal is to introduce developing new emulated devices for QEMU.
The document discusses the ioremap and mmap functions in Linux for mapping physical addresses into the virtual address space. ioremap maps device physical addresses, which lie outside the kernel's normal linear mapping, to virtual addresses that the CPU can access. mmap allows a process to map pages of a file into virtual memory; it is useful for reducing memory copies and improving the performance of file read/write operations. The document outlines the functions, flags, and flows of ioremap and mmap, and of implementing a custom mmap file operation for direct physical memory mapping.
This document discusses making Linux capable of hard real-time performance. It begins by defining hard and soft real-time systems and explaining that real-time does not necessarily mean fast but rather determinism. It then covers general concepts around real-time performance in Linux like preemption, interrupts, context switching, and scheduling. Specific features in Linux like RT-Preempt, priority inheritance, and threaded interrupts that improve real-time capabilities are also summarized.
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware - Linaro
HKG15-505: Power Management interactions with OP-TEE and Trusted Firmware
---------------------------------------------------
Speaker: Jorge Ramirez-Ortiz
Date: February 13, 2015
---------------------------------------------------
★ Session Summary ★
[Note: this is a joint Security/Power Management session] Understand which Power Management use cases have to interact with Trusted Firmware via Secure calls. Walk through some key use cases, like CPU Suspend, and explain how Linux PM drivers interact with Trusted Firmware / PSCI.
--------------------------------------------------
★ Resources ★
Pathable: https://hkg15.pathable.com/meetings/250855
Video: https://www.youtube.com/watch?v=hQ2ITjHZY4s
Etherpad: http://pad.linaro.org/p/hkg15-505
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2015 - #HKG15
February 9-13th, 2015
Regal Airport Hotel Hong Kong Airport
---------------------------------------------------
http://www.linaro.org
http://connect.linaro.org
Hardware accelerated Virtualization in the ARM Cortex™ Processors - The Linux Foundation
The document discusses hardware accelerated virtualization capabilities in ARM Cortex processors including the Cortex-A15. It describes new features like large physical addressing, virtualization extensions, and a virtual interrupt controller that allow multiple operating system instances and work environments to run simultaneously in isolation on ARM devices.
QEMU is a free and open-source hypervisor that performs hardware virtualization by emulating CPUs through dynamic binary translation and providing device models. This allows it to run unmodified guest operating systems. It can be used to create virtual machines similarly to VMware, VirtualBox, KVM, and Xen. QEMU also supports emulating different CPU architectures and can save and restore the state of a virtual machine.
This document discusses Linux memory management. It outlines the buddy system, zone allocation, and slab allocator used by Linux to manage physical memory. It describes how pages are allocated and initialized at boot using the memory map. The slab allocator is used to optimize allocation of kernel objects and is implemented as caches of fixed-size slabs and objects. Per-CPU allocation improves performance by reducing locking and cache invalidations.
The document provides an overview of Das U-Boot, a universal boot loader used to load operating systems and applications into memory on embedded systems. It discusses U-Boot's features such as its command line interface, ability to load images from different sources, and support for various architectures and boards. It also covers compiling and configuring U-Boot, as well as its basic command set and image support capabilities.
The document discusses Linux device trees and how they are used to describe hardware configurations. Some key points:
- A device tree is a data structure that describes hardware connections and configurations. It allows the same kernel to support different hardware.
- Device trees contain nodes that represent devices, with properties like compatible strings to identify drivers. They describe things like memory maps, interrupts, and bus attachments.
- The kernel uses the device tree passed by the bootloader to identify and initialize hardware. Drivers match based on compatible properties.
- Device tree files with .dts extension can be compiled to binary blobs (.dtb) and overlays (.dtbo) used at boot time to describe hardware.
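As an illustrative sketch of these points, a minimal device tree source fragment might look like the following (the board name, unit address, and interrupt number are invented for illustration; `ns16550a` is a common UART compatible string):

```dts
/dts-v1/;
/ {
    compatible = "acme,demo-board";   /* hypothetical board identifier */
    soc {
        #address-cells = <1>;
        #size-cells = <1>;
        serial@10000000 {
            compatible = "ns16550a";      /* string the driver matches on */
            reg = <0x10000000 0x100>;     /* memory-mapped register window */
            interrupts = <10>;            /* interrupt line */
        };
    };
};
```

Compiled with the device tree compiler (`dtc`), this source produces the binary blob (.dtb) that the bootloader passes to the kernel.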
This document discusses memory ordering and synchronization in multithreaded programs. It begins with background on mutexes, semaphores, and their differences. It then discusses problems that can occur with locking-based synchronization methods like mutexes, such as deadlocks, priority inversion, and performance issues. Alternative lock-free programming techniques using atomic operations are presented as a way to synchronize access without locks. Finally, memory ordering, consistency models, barriers, and their implementations in compilers, Linux kernels, and ARM architectures are covered in detail.
This document discusses SR-IOV (Single Root I/O Virtualization), which allows a PCIe device to appear as multiple separate devices. It describes how SR-IOV works by introducing physical functions and virtual functions. It then outlines the steps to enable SR-IOV on a Xen hypervisor, including configuring the network device, enabling virtual functions, binding VFs to the pciback driver, and assigning VFs to guest VMs. Reference links are also provided for additional information on SR-IOV and its implementation in Xen.
1. The document provides an overview of the ARM architecture, including details on registers, exceptions, interrupts, memory management and instructions.
2. It describes the evolution of the ARM architecture from ARMv7 to AArch64 (ARMv8), noting changes in registers and exception handling between the versions.
3. The document also covers memory management features like MMU support, different memory types (normal vs device memory), and caching behavior in the ARM architecture.
QEMU is an emulator that uses dynamic translation to emulate one instruction set architecture (ISA) on another host ISA. It translates guest instructions to an intermediate representation (TCG IR) code, and then compiles the IR code to native host instructions. QEMU employs techniques like translation block caching and chaining to improve the performance of dynamic translation. It also uses helper functions to offload complex operations during translation to improve efficiency.
This document discusses adding support for PCI Express and new chipset emulation to Qemu. It introduces a new Q35 chipset emulator with support for 64-bit BAR, PCIe MMCONFIG, multiple PCI buses and slots. Future work includes improving PCIe hotplug, passthrough and power management as well as switching the BIOS to SeaBIOS and improving ACPI table support. The goal is to modernize Qemu's emulation of PCI features to match capabilities of newer hardware.
This course gets you started with writing device drivers in Linux by providing real-time hardware exposure. It equips you with real-time tools, debugging techniques, and industry usage in a hands-on manner, using dedicated hardware from Emertxe's device driver learning kit, with special focus on character and USB device drivers.
U-Boot is an open source bootloader used widely in embedded systems. It initializes hardware and loads the operating system kernel. The document provides an overview of U-Boot from the user and developer perspectives, including its features, build process, file structure, and boot sequence. It also discusses modernizing efforts like adopting the driver model, device tree, and Kbuild configuration system to improve compatibility and support new platforms.
It describes the MMC storage device driver functionality in the Linux kernel and its role. It explains the different types of storage devices available and how they are handled from the MMC driver's point of view. It describes eMMC (internal storage) and SD (external storage) devices in detail, and the SD protocol used for communicating with these devices in Linux.
The document discusses Virtio, an interface for virtualized I/O devices. It introduces Virtio's architecture, which involves Virtqueues and Vrings to facilitate communication between guest drivers and hypervisor-level device emulators. It outlines the five Virtio APIs for adding, kicking, getting buffers and enabling/disabling callbacks. It also provides an overview of steps for adding new Virtio devices and drivers.
Linux Kernel Booting Process (1) - For NLKB - shimosawa
Describes the bootstrapping part in Linux and some related technologies.
This is part one of the slides; succeeding slides will contain the errata for this part.
Overview of the Linux Kernel, based on "Anatomy of the Linux Kernel" by M. Tim Jones, (IBM Developerworks) http://www.ibm.com/developerworks/linux/library/l-linux-kernel/
Embedded systems contain processors designed to perform dedicated functions. They tightly integrate hardware and software to perform tasks like controlling quadcopters, engines, and satellites. Embedded systems have processors unlike general purpose CPUs in PCs. They are integral parts of larger systems. Microcontrollers are commonly used embedded systems that integrate a processor, memory, and I/O on a single chip. They include peripherals like timers, analog-to-digital converters, and communication protocols. The microcontroller acts as the brain that processes instructions from memory and transfers data through buses to peripherals and memory to control inputs and outputs.
The document provides an overview of the ARM architecture and Cortex-M3 processor. It discusses ARM Ltd.'s history and business model as an IP licensing company. It then describes the Cortex-M3 microcontroller, including its programmer's model, exception and interrupt handling, pipeline, and instruction sets. Key points are the Cortex-M3's stack-based exception model, 3-stage pipeline, conditional execution support, and AHB/APB system design integration.
VTU University Micro Controllers-06ES42 Lecturer Notes - 24x7house
The document discusses microcontrollers and the 8051 microcontroller architecture. It begins with definitions of microprocessors and microcontrollers, and describes the differences between them. It then discusses the 8051 microcontroller in detail, including its hardware components, memory architecture, instruction cycle, and internal memory structure. The 8051 uses a Harvard architecture with separate memory for program and data. It has 128 bytes of internal RAM and can interface with external memory.
A Multi-core Software/Hardware Co-Debug Platform - Alan Su
This document describes a multi-core software/hardware co-debug platform that integrates various debug mechanisms to allow debugging of both software and hardware issues in a multi-core system-on-chip (SoC). It utilizes ARM CoreSight for on-chip debug and trace, an on-chip test architecture for hardware breakpoints and register inspection, and an AHB/AXI bus monitor. The platform allows stepping through execution, inspecting processor and component status/data, and identifying bugs like race conditions across software, processors, and hardware in the multi-core SoC. It was validated through debugging a multi-core 3D image application program on a quad-ARM1176 chip with the integrated debug mechanisms.
The document is a chapter about input/output and storage systems from a computer organization and assembly course. It discusses various I/O architectures like programmed I/O, interrupt-driven I/O, DMA, and channel I/O. It also covers data transmission modes, magnetic disk technology, hard disks, floppy disks, and file allocation tables. The key topics are I/O methods and architectures, storage media formats, RAID, data compression algorithms, and how I/O systems work.
The document describes the architecture of the Pentium family processor. It discusses the Pentium processor's architecture including its 64-bit data bus, separate code and data caches, pipeline sequence, and superscalar execution using two pipelines. It also describes the Pentium's registers including the general purpose, segment, debug, and EFlags registers. Finally, it discusses the Pentium's bus description including the address bus, data bus, control bus, byte enables, and bus cycles.
directCell - Cell/B.E. tightly coupled via PCI ExpressHeiko Joerg Schick
This document summarizes new features in PCI Express Gen 3, including Atomic Operations, TLP Processing Hints, TLP Prefix, Resizable BAR, and others. It describes how each feature enhances PCI Express functionality, such as enabling atomic operations to facilitate migration of SMP applications to PCIe accelerators, and TLP Prefix allowing expansion of header sizes to carry additional information.
This is a complete summer training report on embedded systems (AVR). It also includes a project with its code, along with other topics covered during the training.
The document discusses general-purpose processors and their basic architecture. It explains that general-purpose processors have a control unit and datapath that are designed to perform a variety of computation tasks through software programs. The control unit sequences through instruction cycles that involve fetching instructions from memory, decoding them, fetching operands, executing operations in the datapath, and storing results. Pipelining and other techniques can improve processor throughput and performance. The document also covers programming models and assembly-level instruction sets.
Smartphones architecture is generally different from
common desktop architectures. It is limited by power, size and
cost of manufacturing with the goal to provide the best
experience for users in a minimum cost. Stemming from this
fact, modern micro-processors are designed with an
architecture that has three main components: an application
processor that executes the end user’s applications, a modem
responding to baseband radio activities, and peripheral devices
for interacting with the end user.
Parallelism
Multicores:
The Cortex A7 MPCore processor implements the ARMv7-A
architecture. The Cortex A7 MPCore processor has one to
four processors in a single multi-processor device. The
following figure shows an example configuration with four
processors [3].
In this paper, we are discussing the architecture of the
application processor of Apple iPhone. Specifically, Apple
iPhone uses ARM Cortex generation of processors as their
core. The following sections discusses this architecture in terms
of Instruction Set Architecture, Memory Hierarchy and
Parallelism.
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203Linaro
Session ID: SFO17-203
Session Name: Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Speaker: Fu Wei
Track: LEG
★ Session Summary ★
This presentation gives an updated RAS architecture on ARM64 base on RAS extension (in ARMv8.2), SDEI (Software Delegated Exception Interface), APEI, UEFI PI-SMM. Will talk about all the components of the new RAS architecture on ARM64, gives audience the current status and the next step of development.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-203/
Presentation:
Video: https://www.youtube.com/watch?v=NReFBzbeWi0
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
---------------------------------------------------
Keyword:
'http://www.linaro.org'
'http://connect.linaro.org'
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
This document provides an introduction and overview of embedded systems and embedded system design. It discusses the following key points in 3 sentences:
1. It defines embedded systems and lists their essential components as well as characteristics including low cost, low power usage, and small size.
2. It discusses the requirements of embedded microcontroller cores including memory, ports, timers, interrupts, and serial data transfer standards to interface with real-world peripherals.
3. It also covers embedded programming, real-time operating systems, example applications, and textbooks on embedded systems design.
I have collected all the necessary information about various hardware blocks of Nvidia Tegra K1 processor and put them together. It would be helpful for those who are/going to work on it by giving the details in a very concise fashion.
A 16-bit microprocessor I designed during my final semester (2005) of my Bachelor of Technology program. The microprocessor circuitry design was coded in VHDL and then configured in a Xilinx XC9572 PC84 CPLD kit. Most of the design, the architecture and the instruction set were taken from Computer System Architecture (3rd ed.) by M. Morris Mano. See https://github.com/susam/mano-cpu for VHDL source code and other related files.
This document discusses the implementation of a narrow band digital filter on a low-cost microcontroller. It implemented a 1kHz center frequency, 500Hz bandwidth filter on an Atmel ATmega163 microcontroller. It describes the hardware and software tools used, including the Atmel STK500 starter kit, CodeVisionAVR C compiler, and Atmel AVR Studio. It discusses the analog input and output configurations, sampling rate, and filter design considerations like fixed-point representation effects. The performance and tradeoffs of the implementation are evaluated.
This document discusses ARM embedded systems and microprocessors. It covers ARM's RISC design philosophy, instruction set, and embedded system hardware and software components. The hardware components include the ARM processor, controllers, peripherals, and bus architecture. The software components include initialization code, operating systems, and applications. It also describes ARM registers, the program status register, pipelining, exceptions, interrupts, and the instruction set states.
Design of a low power processor for Embedded system applicationsROHIT89352
The document describes the design of a low power processor for embedded systems. It uses clock gating techniques and a standby mode to reduce power consumption. The processor is designed based on a modified MIPS microarchitecture and can operate using the RV32E instruction set. It has been implemented at the register transfer level in Verilog and synthesized into an 180nm CMOS technology. The processor consumes 189uA in normal mode and 11.1uA in standby mode, achieving low power operation.
Computer Organization : CPU, Memory and I/O organizationAmrutaMehata
This document provides information on CPU, memory, and I/O organization. It begins with an overview of the main components of a computer including the processor unit, memory unit, and input/output unit. It then describes the CPU in more detail including the arithmetic logic unit, control unit, and CPU block diagram. The document discusses the system bus and its various lines. It also covers CPU registers, instruction cycles, and status and control flags. The document provides an overview of instruction set architecture and compares RISC and CISC processor designs.
Implementing ELDs or Electronic Logging Devices is slowly but surely becoming the norm in fleet management. Why? Well, integrating ELDs and associated connected vehicle solutions like fleet tracking devices lets businesses and their in-house fleet managers reap several benefits. Check out the post below to learn more.
What Could Be Behind Your Mercedes Sprinter's Power Loss on Uphill RoadsSprinter Gurus
Unlock the secrets behind your Mercedes Sprinter's uphill power loss with our comprehensive presentation. From fuel filter blockages to turbocharger troubles, we uncover the culprits and empower you to reclaim your vehicle's peak performance. Conquer every ascent with confidence and ensure a thrilling journey every time.
Ever been troubled by the blinking sign and didn’t know what to do?
Here’s a handy guide to dashboard symbols so that you’ll never be confused again!
Save them for later and save the trouble!
Welcome to ASP Cranes, your trusted partner for crane solutions in Raipur, Chhattisgarh! With years of experience and a commitment to excellence, we offer a comprehensive range of crane services tailored to meet your lifting and material handling needs.
At ASP Cranes, we understand the importance of reliable and efficient crane operations in various industries, from construction and manufacturing to logistics and infrastructure development. That's why we strive to deliver top-notch solutions that enhance productivity, safety, and cost-effectiveness for our clients.
Our services include:
Crane Rental: Whether you need a crawler crane for heavy lifting or a hydraulic crane for versatile operations, we have a diverse fleet of well-maintained cranes available for rent. Our rental options are flexible and can be customized to suit your project requirements.
Crane Sales: Looking to invest in a crane for your business? We offer a wide selection of new and used cranes from leading manufacturers, ensuring you find the perfect equipment to match your needs and budget.
Crane Maintenance and Repair: To ensure optimal performance and safety, regular maintenance and timely repairs are essential for cranes. Our team of skilled technicians provides comprehensive maintenance and repair services to keep your equipment running smoothly and minimize downtime.
Crane Operator Training: Proper training is crucial for safe and efficient crane operation. We offer specialized training programs conducted by certified instructors to equip operators with the skills and knowledge they need to handle cranes effectively.
Custom Solutions: We understand that every project is unique, which is why we offer custom crane solutions tailored to your specific requirements. Whether you need modifications, attachments, or specialized equipment, we can design and implement solutions that meet your needs.
At ASP Cranes, customer satisfaction is our top priority. We are dedicated to delivering reliable, cost-effective, and innovative crane solutions that exceed expectations. Contact us today to learn more about our services and how we can support your project in Raipur, Chhattisgarh, and beyond. Let ASP Cranes be your trusted partner for all your crane needs!
Understanding Catalytic Converter Theft:
What is a Catalytic Converter?: Learn about the function of catalytic converters in vehicles and why they are targeted by thieves.
Why are They Stolen?: Discover the valuable metals inside catalytic converters (such as platinum, palladium, and rhodium) that make them attractive to criminals.
Steps to Prevent Catalytic Converter Theft:
Parking Strategies: Tips on where and how to park your vehicle to reduce the risk of theft, such as parking in well-lit areas or secure garages.
Protective Devices: Overview of various anti-theft devices available, including catalytic converter locks, shields, and alarms.
Etching and Marking: The benefits of etching your vehicle’s VIN on the catalytic converter or using a catalytic converter marking kit to make it traceable and less appealing to thieves.
Surveillance and Monitoring: Recommendations for using security cameras and motion-sensor lights to deter thieves.
Statistics and Insights:
Theft Rates by Borough: Analysis of data to determine which borough in NYC experiences the highest rate of catalytic converter thefts.
Recent Trends: Current trends and patterns in catalytic converter thefts to help you stay aware of emerging hotspots and tactics used by thieves.
Benefits of This Presentation:
Awareness: Increase your awareness about catalytic converter theft and its impact on vehicle owners.
Practical Tips: Gain actionable insights and tips to effectively prevent catalytic converter theft.
Local Insights: Understand the specific risks in different NYC boroughs, helping you take targeted preventive measures.
This presentation aims to equip you with the knowledge and tools needed to protect your vehicle from catalytic converter theft, ensuring you are prepared and proactive in safeguarding your property.
Expanding Access to Affordable At-Home EV Charging by Vanessa WarheitForth
Vanessa Warheit, Co-Founder of EV Charging for All, gave this presentation at the Forth Addressing The Challenges of Charging at Multi-Family Housing webinar on June 11, 2024.
EV Charging at MFH Properties by Whitaker JamiesonForth
Whitaker Jamieson, Senior Specialist at Forth, gave this presentation at the Forth Addressing The Challenges of Charging at Multi-Family Housing webinar on June 11, 2024.
2. Agenda
• To provide a simplified view of Virtualization for Automotive ECUs
• To understand and compare the different solutions available.
• To share this knowledge, so that this drop in the ocean joins with other drops and eventually quenches the thirst of some good souls in the universe!
Note: All contents, pictures, etc. are based either on what is already published on the web and/or on my own experience / learning / creations. My intent is not to violate any copyrights or NDA content. Please let me know if any violations have happened.
3. What is Virtualization?
• An Operating System (OS) abstracts hardware from its applications. Virtualization abstracts hardware from one or more OSes.
• In the automotive world, Virtualization is about abstracting applications, operating systems, the vehicle network, displays, audio systems, etc. away from the hardware.
4. Virtualization Types
• Fundamental types
• Type 1: Full virtualization, where the hypervisor takes control of the hardware and hosts the guest OSes; the guests are completely unaware that they are running in a virtualized environment.
• Type 2: Para-virtualization, where one of the operating systems (called the Host OS) takes charge of the hardware, and the guest OS is modified to connect with either the Host OS or the hardware devices.
• Derived types
• Hardware-assisted virtualization: Here the virtualization solution utilizes the support provided by the hardware to realize the virtualization goals.
• Example: Linux/KVM falls under this category. We will see this in detail later.
• Hybrid types: Here virtualization is realized by combining the other types.
• For example, the core virtualization functions are realized using a Type 1 hypervisor, while peripheral / device virtualization is done using Type 2 or other approaches, such as graphics / display virtualization using a server in the Host OS and clients running in the Guest OSes. We will see this in detail later.
• ... and many more
[Diagram: Type 1 – the Hypervisor runs directly on the Hardware, hosting OS 1 and OS 2. Type 2 – a Host OS runs on the Hardware; a Hypervisor runs alongside the Host OS Apps and hosts a Modified Guest OS.]
5. Stop all old stories! How to realize Virtualization?
• System Virtualization involves the following functions:
• Virtualization of CPU cores or Processing Elements
• Virtualization of memory and the memory management
• Virtualization of Interrupts
• Virtualization of Timers
• I/O or Peripheral Virtualization
• To get a better understanding of the different virtualization functions listed above, we need some example hardware, such as the Raspberry Pi 3 (ARM Cortex A53).
• The Raspberry Pi is chosen because it is the most open & common hardware available.
6. Overview of ARM Cortex A53
- CPU cores
- Exception Levels of ARMv8
- Memory management
- Memory Mapped I/O
- Interrupts
- Timers, Clocks, Resets
7. ARM Cortex A53 – CPU core hardware blocks
• 4 CPU Cores, each with
• Timer block
• Interrupt block
• Each core includes
• NEON Coprocessor
• FPU
• Crypto extensions
• L1 Cache [, L2 Cache]
• Debug & trace
• Trace block
• Debug block
• ACP – Accelerator Coherency Port for AXI slaves
• Master memory interface
• Power management interface
• Test interface
The Cortex-A53 processor is a mid-range, low-power processor that implements the ARMv8-A
architecture. The Cortex-A53 processor has one to four cores, each with an L1 memory system
and a single shared L2 cache.
Figure 1-1 shows an example of a Cortex-A53 MPCore configuration with four cores and either
an ACE or a CHI interface.
Figure 1-1 Example Cortex-A53 processor configuration
See About the Cortex-A53 processor functions on page 2-2 for more information about the
functional components.
[Figure 1-1 labels: Cores 0–3 (cores 1–3 optional), each with Interrupt, Timer, Debug and Trace blocks; AXI slave interface (ACP*); ACE or CHI master interface; Timer events, Counter; interrupt inputs (nIRQ, nFIQ, nVCPUMNTIRQ, ICDT*, ICCT*); PMU, ATB, APB debug; Power management and Power control; Test, DFT, MBIST; Clocks, Resets, Configuration. * Optional.]
Ref: https://developer.arm.com/documentation/ddi0500/e/introduction/about-the-cortex-a53-processor
8. ARM Cortex A53 – CPU Functional Blocks
• APB – Advanced Peripheral Bus, slow speed (compared to AXI)
• CTM – CoreSight Trigger Matrix (Debug & Trace)
• CTI – CoreSight Trigger Interface (Debug & Trace)
• GIC – Generic Interrupt Controller
• SCU – Snoop Control Unit that maintains cache coherency
• ACE – an extension to the AXI protocol
• CHI – a scalable protocol supporting multi-node interconnects
• ACP – an AMBA 4 AXI slave interface
2.1 About the Cortex-A53 processor functions
Figure 2-1 shows a top-level functional diagram of the Cortex-A53 processor.
Figure 2-1 Cortex-A53 processor block diagram
The following sections describe the main Cortex-A53 processor components and their
functions:
[Figure 2-1 labels: four cores (Core 0–3), each with L1 ICache, L1 DCache, Debug and trace, FPU and NEON extension, Crypto extension, Arch timer, GIC CPU interface, Clock and reset, CTI, Retention control, Debug over power down, and a per-core governor; a shared Level 2 memory system (L2 cache, SCU, ACP slave); ACE/AMBA 5 CHI master bus interface; Governor; APB decoder, APB ROM, APB multiplexer, CTM.]
Ref: https://developer.arm.com/documentation/ddi0500/d/functional-description/about-the-cortex-a53-processor-functions?lang=en
9. CPU Virtualization – Virtual CPU Cores (vCPU)
• A Raspberry Pi 3 has 4 physical cores or CPUs (refer to the previous slide).
• A vCPU is basically a time slot of a physical CPU.
• Note: ARM uses the term “vPE” (virtual Processing Element).
• There can be a 1-to-1 or many-to-1 relation between vCPUs and a real CPU core.
• For understanding purposes, let us imagine a single-core ARMv8 processor: if we can schedule 2 vCPUs from it (as shown in the picture below), then this system has a 2 vCPU to 1 real CPU relationship.
7 Virtualizing the Generic Timers
The Arm architecture includes the Generic Timer, which is a standardized set of timers available for each processor. The Generic Timer consists of a set of comparators that compare against a common system count. A comparator generates an interrupt when its value is equal to or less than the system count. In the following diagram, we can see the Generic Timer in a system (orange), and its components of comparators and a counter module.
The following diagram shows an example system with a hypervisor that hosts two virtual
CPUs (vCPUs):
[Diagram: a Single Core ARMv8 processor running a Hypervisor, which hosts VM 1 and VM 2.]
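The 2-vCPU-to-1-CPU relationship above can be sketched as a toy round-robin scheduler (an illustration only, not real hypervisor code; the function name and time-slice model are invented for this example):

```python
# Toy illustration: two vCPUs sharing one physical core via round-robin
# time slices, as in the 2-vCPU-to-1-CPU example above.
from collections import deque

def schedule(vcpus, time_slices):
    """Run `time_slices` slices of the single physical CPU, rotating
    through the runnable vCPUs in round-robin order."""
    runqueue = deque(vcpus)
    timeline = []
    for _ in range(time_slices):
        vcpu = runqueue.popleft()   # pick the next vCPU
        timeline.append(vcpu)       # "run" it for one slice
        runqueue.append(vcpu)       # put it back at the tail
    return timeline

print(schedule(["vCPU0 (VM 1)", "vCPU1 (VM 2)"], 4))
```

Each physical time slice is given to exactly one vCPU; a real hypervisor adds priorities, affinities and trap handling on top of this basic idea.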
12. ARM Cortex A53 (Raspberry Pi3) Memory Map
• The picture on the right shows the physical and virtual memory addresses of the RPi3 B.
• Physical memory map
• RAM (1 GB, from “bcm2837-rpi-3-b.dts”):
memory {
	reg = <0 0x40000000>;
};
• Memory Mapped I/O
• See the map for I/O Peripherals on the right.
• Virtual memory map
• User space: 0x0000 0000 to 0xBFFF FFFF
• Kernel space: 0xC000 0000 to 0xFFFF FFFF
[Diagram: physical address space – DDR2 RAM from 0x0000 0000, I/O Peripherals from 0x4000 0000 to 0x4003 FFFF; the ARM MMU translates into the virtual address space (32-bit split) – User Space from 0x0000 0000, kernel space from 0xC000 0000 to 0xFFFF FFFF.]
13. ARM Cortex A53 – MMU
• ARM64 instructions such as LDR/STR and registers such as PC/LR all operate on Virtual Addresses.
• This means any address or pointer in an application program points to a virtual address.
• MMU
• Sits between the CPU and the DDR controller (see next slide).
• Translates virtual to physical addresses. Once configured, there is no translation penalty.
• Address translation granules of 4KB (AArch32 & AArch64) and 64KB (AArch64 only) – pages.
• 16-bit ASID (AArch32 uses 8-bit) – used in the TLB (see the TLB slides).
• Max supported physical address size = 40 bits, i.e., 2^40 bytes = 1024 GB (1 TB).
• Provides fine-grained control through virt-to-phys address mappings and memory attributes held in page tables, loaded into the TLB (translation lookaside buffer).
• Generates exceptions if any access violation happens.
14. ARM Cortex A53 – MMU mapping with Example
• Let us take an example application that needs 16k of RAM (.text + .bss + .data and others).
• As soon as the application is spawned, assume that the OS allocates 16k at virtual address 0x80000000.
• As the page size is 4k, the OS allots 4 contiguous pages as shown in the table below:
Page | VA: start | VA: end  | PA: start | PA: end
1    | 80000000  | 80000FFF | 00017000  | 00017FFF
2    | 80001000  | 80001FFF | 00029000  | 00029FFF
3    | 80002000  | 80002FFF | 00003000  | 00003FFF
4    | 80003000  | 80003FFF | 0004F000  | 0004FFFF
This table is for illustrating the MMU lookup table. Note that the physical addresses in the last 2 columns are not contiguous.
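The lookup described by the table can be sketched in a few lines (a toy model, not the real multi-level ARM page-table walk; the dictionary stands in for the page table):

```python
# Sketch of the MMU lookup from the table above: with 4 KiB pages, the
# low 12 bits of an address are the page offset and the rest select a page.
PAGE_SHIFT = 12  # log2(4096)

# Virtual page number -> physical page number, taken from the slide's table.
PAGE_TABLE = {
    0x80000: 0x00017,
    0x80001: 0x00029,
    0x80002: 0x00003,
    0x80003: 0x0004F,
}

def translate(va):
    vpn, offset = va >> PAGE_SHIFT, va & 0xFFF
    pa_page = PAGE_TABLE[vpn]   # a KeyError here would be a translation fault
    return (pa_page << PAGE_SHIFT) | offset

print(hex(translate(0x80001234)))  # -> 0x29234: page 2, offset 0x234
```

Note how contiguous virtual pages map to non-contiguous physical pages, exactly as in the table.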
15. ARM Cortex A53 – Translation Lookaside Buffer (TLB)
• Based on the example provided in the previous slide, assume that for every 5 to 10 instructions, the CPU is asked to read a look-up table that is stored in external DDR memory. Do you think the CPU will be efficiently utilized? No.
• What is a TLB?
• The TLB is a ‘memory cache’ that contains the recent Virtual Address (VA) to Physical Address (PA) translations. This saves CPU time for translations that are performed often by the OS.
• In case of a TLB miss, the address translation info is generally fetched from the look-up table and the TLB is updated.
• The TLB is organized as the 4 major blocks listed below:
• Micro TLB
• A 10-entry first-level cache of translations, one each for instructions and data.
• Main TLB
• The second layer of the TLB structure that catches the misses from the Micro TLBs.
• Supports all VMSAv8 (Virtual Memory System Architecture) block sizes except 1 GB.
• IPA cache RAM
• The intermediate physical address (IPA) cache RAM holds mappings between intermediate physical addresses and physical addresses.
• Only Non-secure EL1 and EL0 “stage 2” translations use this cache.
• Walk cache RAM
• Holds the result of stage 1 (OS-controlled) translations.
• If a stage 1 translation results in a section or larger mapping, then nothing is placed in the walk cache RAM.
[Diagram: the CPU (Registers) issues a VA to the MMU; the TLB holds recent va→pa entries; on a miss, the translation is fetched via the DDR Controller from DDR Memory. Everything except the DDR memory sits inside the SoC.]
Note: The IPA is part of “stage 2” address translation, i.e., the hypervisor-controlled address translation. This will be discussed later.
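The hit/miss behavior that makes a TLB worthwhile can be shown with a toy cache (an illustration only; the class name, capacity and eviction policy are invented, and a real TLB is set-associative hardware, not a dict):

```python
# Toy TLB in front of a page-table walk: recent VA->PA translations are
# cached so repeated lookups skip the (slow) walk of the in-memory table.
class TinyTLB:
    def __init__(self, capacity=4):
        self.capacity = capacity
        self.entries = {}            # vpn -> ppn
        self.hits = self.misses = 0

    def lookup(self, vpn, walk):
        if vpn in self.entries:
            self.hits += 1
            return self.entries[vpn]
        self.misses += 1
        ppn = walk(vpn)              # slow path: table walk in DDR
        if len(self.entries) >= self.capacity:
            self.entries.pop(next(iter(self.entries)))  # crude eviction
        self.entries[vpn] = ppn
        return ppn

tlb = TinyTLB()
walk = {0x80000: 0x17, 0x80001: 0x29}.get
for vpn in [0x80000, 0x80001, 0x80000, 0x80000]:
    tlb.lookup(vpn, walk)
print(tlb.hits, tlb.misses)   # 2 hits, 2 misses
```

Only the first access to each page pays for the table walk; the repeats are served from the TLB.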
16. ARM Cortex A53 – TLB matching and Cache handling
TLB match process
• Each TLB entry contains a VA, block size, PA and a set of memory properties (type, access permissions, ...).
• Each entry is associated with a particular ASID, and contains a field to store the VMID.
• A TLB entry match occurs when the following conditions are met:
• Its VA matches that of the request, compared over VA bits [47:N], where N is log2(page size), e.g. 12 for 4k pages.
• Its memory space matches the memory space state of the request. The memory space can be one of four values:
• Secure EL3 (AArch64)
• Non-secure EL2
• Secure EL0, EL1 (and EL3 - AArch32)
• Non-secure EL0 or EL1
• Its ASID matches the current ASID held in CONTEXTIDR, TTBR0 or TTBR1, or the entry is marked as global.
• Its VMID matches the current VMID held in the VTTBR register.
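The match conditions above can be condensed into a single predicate (a toy model; the field names are invented and do not reflect the actual TLB entry encoding):

```python
# Sketch of the TLB entry-match test described above: an entry hits only if
# the VA (above the page-offset bits), the memory space, the ASID (unless
# the entry is global) and the VMID all match the request.
def tlb_entry_matches(entry, req, page_shift=12):
    return (entry["va"] >> page_shift == req["va"] >> page_shift
            and entry["space"] == req["space"]           # e.g. "NS-EL0/1"
            and (entry["global"] or entry["asid"] == req["asid"])
            and entry["vmid"] == req["vmid"])

entry = {"va": 0x80001000, "space": "NS-EL0/1",
         "global": False, "asid": 5, "vmid": 1}
req_hit  = {"va": 0x80001234, "space": "NS-EL0/1", "asid": 5, "vmid": 1}
req_miss = {"va": 0x80001234, "space": "NS-EL0/1", "asid": 7, "vmid": 1}
print(tlb_entry_matches(entry, req_hit), tlb_entry_matches(entry, req_miss))
```

The ASID and VMID checks are what let the TLB keep entries for several processes and several VMs at once without flushing on every context switch.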
Data cache coherency
• Uses the MOESI protocol to maintain data coherency between multiple cores:
• M - Modified - The line is in only this cache and is dirty (Unique Dirty).
• O - Owned - The line is possibly in more than one cache and is dirty (Shared Dirty).
• E - Exclusive - The line is in only this cache and is clean (Unique Clean).
• S - Shared - The line is possibly in more than one cache and is clean (Shared Clean).
• I - Invalid - The line is not in this cache.
[Diagram: CPU Registers issue a Virtual Address to the MMU + TLB, which presents a Physical Address to the L1 Cache, L2 Cache and DDR Memory; all but the DDR memory sit inside the SoC. The caches will be discussed soon.]
Key takeaway: Memory is already virtualized on a single OS, virtualizing it for more than one OS is done by adding stage-2 address translation.
18. ARMv8 Stage 2 Address Translation
• Allows the Hypervisor to control which memory-mapped system resources a VM can access, and how those resources appear within the VM.
• It can be used to ensure that a VM sees only the regions allocated to it, and not the resources that are allocated to other VMs or the hypervisor.
• In short, the OS-controlled translation table is called the stage 1 table, and the Hypervisor-controlled translation is called stage 2 translation.
https://developer.arm.com/architectures/learn-the-architecture/armv8-a-virtualization/stage-2-translation
For memory address translation, stage 2 translation is a second stage of translation. To support this, a new set of translation tables, known as Stage 2 tables, are required, as shown here:
An Operating System (OS) controls a set of translation tables that map from the virtual address space to what it thinks is the physical address space. However, this process undergoes a second translation into the real physical address space. This second stage is controlled by the hypervisor.
The OS-controlled translation is called stage 1 translation, and the hypervisor-controlled translation is called stage 2 translation. The address space that the OS thinks is physical memory is referred to as the Intermediate Physical Address (IPA) space.
Note: For an introduction to how address translation works, see our guide on Memory Management.
[Diagram: OS or VM virtual addresses → Hypervisor-controlled IPA space → RPi3 physical address space.]
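The two-stage flow (VA → IPA → PA) can be sketched by chaining two lookups (a toy model; the page numbers are invented for illustration and the real hardware walks multi-level tables):

```python
# Two-stage translation sketch: the guest OS's stage 1 tables map VA->IPA,
# and the hypervisor's stage 2 tables map IPA->PA.
PAGE_SHIFT = 12
STAGE1 = {0x80000: 0x00100}   # guest OS: VA page -> IPA page
STAGE2 = {0x00100: 0x3A000}   # hypervisor: IPA page -> PA page

def translate(va):
    offset = va & 0xFFF
    ipa_page = STAGE1[va >> PAGE_SHIFT]   # stage 1: OS-controlled
    pa_page = STAGE2[ipa_page]            # stage 2: hypervisor-controlled
    return (pa_page << PAGE_SHIFT) | offset

print(hex(translate(0x80000ABC)))  # -> 0x3a000abc
```

The guest only ever sees IPAs, so the hypervisor can place (or deny) each VM's memory anywhere in the real physical address space without the guest's knowledge.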
23. ARMv8 Exception Levels
• The ARMv8 model defines 4 exception levels:
• EL0 – least privileged
• EL1 – increased privilege (OS)
• EL2 – Hypervisor mode
• EL3 – highest privilege, Secure Monitor mode
• On processor reset (power-on reset), the system enters EL3.
• On taking an exception, the exception level either increases or remains the same. It doesn’t decrease.
• On return from an exception, the exception level decreases or remains the same.
• Every exception level has its own stack pointer. The boot loader or the initialization part of the operating system software has to set up these registers for all exception levels.
Figure 3-1 ARMv8 security model when EL3 is using AArch64
[Diagram: Non-secure state – App1/App2 at EL0 (User mode) on Guest OS1 and Guest OS2 at EL1 (System, FIQ, IRQ, Supervisor, Abort, Undefined modes), hosted by a Hypervisor at EL2 (Hyp mode); Secure state – Secure App1/App2 at EL0 on a Secure OS at EL1; the Secure monitor runs at EL3 in AArch64. † AArch64 permitted only if EL1 is using AArch64; ‡ AArch64 permitted only if EL2 is using AArch64.]
Ref: https://developer.arm.com/documentation/100095/0003/programmers-model/armv8-a-architecture-concepts/armv8-security-model
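The exception-level transition rules from slide 23 can be expressed as a tiny state model (an illustration only; function names are invented, and real hardware encodes the target EL in registers such as SPSR_ELx):

```python
# Toy model of the ARMv8 rules: taking an exception can only raise (or keep)
# the exception level; returning from one can only lower (or keep) it.
def take_exception(current_el, target_el):
    if target_el < current_el:
        raise ValueError("exception level never decreases on an exception")
    return target_el

def exception_return(current_el, target_el):
    if target_el > current_el:
        raise ValueError("exception level never increases on return")
    return target_el

el = 0                         # application running at EL0
el = take_exception(el, 1)     # syscall -> OS at EL1
el = take_exception(el, 2)     # hypervisor trap -> EL2
el = exception_return(el, 0)   # eret all the way back to the application
print(el)                      # 0
```

This is exactly why a guest OS at EL1 cannot "return" itself into EL2: only an exception (a trap) can move control up to the hypervisor.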
24. ARMv8 Security States & Virtualization Support
• Secure State
• Can access both secure and non-secure memory spaces.
• When executing at EL3, it can access all system control resources.
• Non-Secure State
• Can only access non-secure memory spaces.
• Even in EL3, it cannot access all system control resources.
• Virtualization support
• Software running in EL2 has access to several controls for virtualization:
• Stage 2 translation
• EL1/0 instruction and register access trapping
• Virtual exception generation
https://developer.arm.com/architectures/learn-the-architecture/armv8-a-virtualization/virtualization-in-aarch64
Armv8-A virtualization
3 Virtualization in AArch64
Software running at EL2 or higher has access to several controls for virtualization:
Stage 2 translation
EL1/0 instruction and register access trapping
Virtual exception generation
The Exception Levels (ELs) in Non-secure and Secure states are shown here:
In the diagram, Secure EL2 is shown in gray. This is because support for EL2 in Secure state is optional (it was introduced in Armv8.4-A).
26. ARM Cortex A53 – Interrupt Controller GICv4
• Interrupt Sources
• Message-based interrupts are generated by a memory write to an assigned address.
• Wire-based interrupts are generated by peripherals such as a UART or I2C via I/O pins.
• SPI – Shared Peripheral Interrupts, which can be either message-based or wire-based. Can be routed to any PEs configured to handle interrupts.
• PPI – Private Peripheral Interrupt; targets a single specific PE (Processing Element).
• LPI – Locality-specific Peripheral Interrupts are interrupts that use the ITS (Interrupt Translation Service) to route an interrupt to a specific Redistributor and PE.
• SGI – Software Generated Interrupts, generated by PEs.
• Distributor
• Performs interrupt prioritization & distribution of SPIs and SGIs to the Redistributors & CPU interfaces.
• Redistributor (red box)
• Holds the control, prioritization and pending information for all physical LPIs, using data structures that are held in memory. Provides a programming interface for:
• Enabling / disabling SGIs and PPIs, setting priority levels for SGIs and PPIs
• Setting each PPI to be level-sensitive or edge-triggered
• ...
• CPU interface (blue box)
• Provides a register interface to the PE. Provides a programming interface for:
• Control and configuration to enable interrupt handling in accordance with the security state and legacy support requirements of the implementation
• Acknowledging an interrupt, deactivation of an interrupt
• Performing a priority drop
• ...
3 GIC Partitioning
3.1 The GIC logical components
Figure 3-2 shows the GIC partitioning in an implementation that includes an ITS.
Figure 3-2 GIC logical partitioning with an ITS
The mechanism for communication between the ITS and the Redistributors is IMPLEMENTATION DEFINED.
The mechanism for communication between the CPU interfaces and the Redistributors is also IMPLEMENTATION
DEFINED.
[Figure 3-2 labels: wire-based SPIs enter the Distributor and message-based LPIs enter the ITS (Interrupt Translation Service); each PE (in clusters C0..Cn) has a Redistributor and CPU interface; PPIs go directly to the Redistributor; SGIs are generated by a PE and routed through the Distributor. The inclusion of an ITS is optional, and there might be more than one ITS in a GIC.]
Ref: https://static.docs.arm.com/ihi0069/c/IHI0069C_gic_architecture_specification.pdf, section 3.1
27. ARM Cortex A53 – Interrupt Lifecycle & Interrupt numbers
The GIC interrupt lifecycle is a series of high-level processes that apply to any interrupt, and provides a basis for describing the detailed steps of interrupt handling. The GIC maintains a state machine that controls interrupt state transitions during the lifecycle for physical interrupts.
Figure 4-1 Physical interrupt lifecycle
[Diagram: Generate → Distribute → Deliver → Activate → Priority drop → Deactivation → End (Deactivation does not apply to LPIs). A device or software generates the interrupt; the CPU interface delivers it to the PE; the PE acknowledges the interrupt, handles it, and ends it.]
INTID       | Interrupt Type | Details
0 - 15      | SGI            | These interrupts are local to the CPU interface.
16 - 31     | PPI            | These interrupts are local to the CPU interface.
32 - 1019   | SPI            | Shared peripheral interrupts that the Distributor can route to either a specific PE, or to any one of the PEs in the system that is a participating node.
1020 - 1023 | Special        | 1020 – GIC returns this from EL3 → handled at Secure EL1. 1021 – GIC returns this from EL3 → handled at Non-secure EL1. 1022 – legacy operations only. 1023 – GIC returns this as an interrupt acknowledge, or if there are errors handling the interrupt.
1024 - 8191 | Reserved       |
8192 and up | LPI            | Peripheral hardware interrupts that are routed (directly) to a specific PE. The upper bound is implementation defined.
(INTIDs 0-1023 are compatible with earlier versions of the GIC architecture.)
Ref: https://static.docs.arm.com/ihi0069/c/IHI0069C_gic_architecture_specification.pdf, section 4.1
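The INTID ranges in the table can be checked with a small classifier (a sketch; the function name is invented, and the LPI upper bound is implementation defined, so it is left open here):

```python
# Sketch of the INTID-range-to-type mapping from the table above
# (GICv3/v4 numbering).
def intid_type(intid):
    if 0 <= intid <= 15:
        return "SGI"
    if 16 <= intid <= 31:
        return "PPI"
    if 32 <= intid <= 1019:
        return "SPI"
    if 1020 <= intid <= 1023:
        return "Special"
    if 1024 <= intid <= 8191:
        return "Reserved"
    return "LPI"   # 8192 and up; upper bound is implementation defined

print(intid_type(27), intid_type(42), intid_type(8192))  # PPI SPI LPI
```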
28. ARMv8 Interrupt Grouping
• GICv3 onwards supports Interrupt Grouping as a mechanism to align interrupt handling with the ARMv8 exception & security model.
• In a system with two Security states (Secure, Non-secure), an interrupt is configured as one of the following:
• A Group 0 physical interrupt:
• ARM expects these interrupts to be handled at EL3.
• A Secure Group 1 physical interrupt:
• ARM expects these interrupts to be handled at Secure EL1.
• A Non-secure Group 1 physical interrupt:
• ARM expects these interrupts to be handled at Non-secure EL2 in systems using virtualization, or at Non-secure EL1 in systems not using virtualization.
• In a system with one Security state, an interrupt is configured to be either:
• Group 0, or
• Group 1.
• At the System level, GICD_CTLR.DS indicates if the GIC is configured with one or two Security states.
29. ARM Cortex A53 – Virtual Interrupt Handling
• Say a serial input device asserts its interrupt signal to the GIC.
• During initialization, software executing at EL3 or EL2 configures the PE to route interrupts to EL2 (hypervisor).
• The GIC generates a physical interrupt exception, either IRQ or FIQ, which then gets routed to EL2 (hypervisor).
• The hypervisor then configures the GIC to forward the physical interrupt as a virtual interrupt (vIRQ or vFIQ) to the right vCPU/VM.
• The hypervisor then returns control to the vCPU/VM.
• The vCPU/VM uses the Virtual CPU Interface to read and respond to the interrupts.
Note: In GICv4 onwards, LPIs can be directly injected to VM, which reduces the context switching to hypervisor.
Ref: Arm, Armv8-A virtualization (Doc ID 10214), section "Virtualizing exceptions". The diagram there illustrates these steps:
1. The physical peripheral asserts its interrupt signal into the GIC.
2. The GIC generates a physical interrupt exception, either IRQ or FIQ, which gets routed to EL2 by the configuration of HCR_EL2.IMO/FMO. The hypervisor identifies the peripheral, determines that it has been assigned to a VM, and checks which vCPU the interrupt should be delivered to.
31. ARM Cortex A53 – Generic Timer
Functional Description
• Each core has the following set of 64-bit timers:
• EL1 Non-secure physical timer
• EL1 Secure physical timer
• EL2 physical timer
• Virtual timer
• The system counter value (which resides in SoC) is
distributed to the Cortex-A53 processor via
CNTVALUEB[63:0]
• The system counter typically operates at a lower frequency than CLKIN (the main processor clock)
• Each timer provides an active-LOW interrupt output to the
SoC.
• External interrupt output pins (n = number of cores - 1):
• nCNTPNSIRQ[n:0] - EL1 Non-secure physical timer event
• nCNTPSIRQ[n:0] - EL1 Secure physical timer event
• nCNTHPIRQ[n:0] - EL2 physical timer event
• nCNTVIRQ[n:0] - Virtual timer event
https://developer.arm.com/documentation/ddi0500/e/generic-timer/generic-timer-functional-description
Because the system counter can operate at a lower frequency than the main processor CLKIN, the CNTCLKEN input is provided as a clock enable for the CNTVALUEB bus. CNTCLKEN is registered inside the Cortex-A53 processor before being used as a clock enable for the CNTVALUEB[63:0] registers. This allows a multicycle path to be applied to the CNTVALUEB[63:0] bus.
The value on the CNTVALUEB[63:0] bus is required to be stable whenever the internally registered version of the CNTCLKEN clock enable is asserted. CNTCLKEN must be synchronous and balanced with CLK and must toggle at integer ratios of the processor CLK. See Clocks in the TRM for more information about CNTCLKEN.
Figure 10-1 Architectural counter interface: CNTVALUEB[63:0] enters the Cortex-A53 processor through a clock gate controlled by a registered CNTCLKEN and feeds the architectural counter registers.
Each timer provides an active-LOW interrupt output to the SoC; Table 10-1 Generic Timer signals lists the external interrupt output pins.
• The timer schedules events and triggers interrupts based on an incrementing counter value.
• It provides:
• Generation of timer events as interrupt outputs
• Generation of event streams
32. ARM Cortex A53 – System Counter for Timer
• System counter (in SoC) generates the count value and
distributes to all cores (PEs)
• The system counter measures real time and is not affected by DVFS
• It provides an interface to programmers via the following frames:
• CNTControlBase – accessible only in EL3, contains the following registers:
• CNTCR – Control Register: enable, frequency selection, scaling selection, etc.
• CNTSR – Status Register: reports whether the counter is running or not.
• CNTCV – reports the current count value.
• ...
• CNTReadBase – a copy of CNTControlBase, but containing only the CNTCV register.
External timers: in addition to the timers in the processor, a system can also include external, memory-mapped timers. The programming interface for these timers mirrors that of the internal timers, but through memory-mapped registers. The location of these registers is determined by the SoC implementor, so refer to the datasheet for the SoC that you are working with. Interrupts from the external memory-mapped timers will typically be delivered as Shared Peripheral Interrupts (SPIs).
https://developer.arm.com/architectures/learn-the-architecture/generic-timer/what-is-the-generic-timer
33. ARMv8 – Timer Virtualization
• Similar to the “shared” UART driver example (slide 20), we could also trap timer accesses in the hypervisor. But this would add considerable CPU overhead, because timekeeping is at the core of any OS.
• The good news here is that ARMv8 allows a vCPU to access the following timers directly for its scheduling needs:
• EL1 Non-secure physical timer – read-only
• Virtual timer – read/write
• To generate a timer interrupt for a guest, the GIC (v4) needs to be configured to route the interrupt to a specific vCPU.
• Note: as discussed earlier (slide 29), using LPIs should reduce the interrupt context switches. This is something I need to research further and confirm.
34. ARM Cortex A53 – Clocks and Resets
Clock Tree
• The Cortex-A53 processor has a single
clock input, CLKIN
• RPi3 uses a 1.2 GHz clock
• All cores in Cortex-A53 & SCU are
clocked with a distributed version of
CLKIN.
• Clock Tree
• PCLK - APB interface / bus
• ACLK - ACE (extension to AXI) bus, ACP
slave interface
• SCLK - SCU interface only if CHI protocol
is used.
• ATCLK - ATB interface, which can operate
at any integer multiple of main clock
• CNTCLK - 64-bit counter
Reset Inputs
• The Cortex-A53 processor has the following active-LOW reset input signals:
• nCPUPORESET[N:0] – primary (cold) reset signals; initialize all resettable registers
• nCORERESET[N:0] – same as above, except debug and ETM registers
• nPRESETDBG – a single, cluster-wide signal that resets the integrated CoreSight components connected to the external PCLK domain, such as debug logic
• nL2RESET – resets all resettable registers in the L2 memory system and the logic in the SCU
• nMBISTRESET – an external MBIST controller can use this signal to reset the entire SoC.
Clock tree and resets are typically handled by Host OS. VMs generally don’t modify these.
36. Linux KVM/ARM
• KVM stands for Kernel-based Virtual
Machine, which can run “unmodified”
guests.
• As discussed in slide 4, this is a derived type, where the hypervisor is part of an OS and uses hardware-assisted virtualization features. It doesn't fall cleanly under type 1 or type 2.
• As shown in the picture on the right, the KVM implementation is split into 2 parts:
• Highvisor – runs in EL1
• Lowvisor – runs in EL2, to trap hypervisor calls and exceptions.
From the KVM/ARM paper (the source of the figure on this slide):
“…in slow and convoluted code paths. As a simple example, a page fault handler needs to obtain the virtual address causing the page fault. In Hyp mode this address is stored in a different register than in kernel mode.
Second, running the entire kernel in Hyp mode would adversely affect native performance. For example, Hyp mode has its own separate address space. Whereas kernel mode uses two page table base registers to provide the familiar 3GB/1GB split between user address space and kernel address space, Hyp mode uses a single page table register and therefore cannot have direct access to the user space portion of the address space. Frequently used functions to access user memory would require the kernel to explicitly map user space data into kernel address space and subsequently perform necessary teardown and TLB maintenance operations, resulting in poor native performance on ARM.
These problems with running a Linux hypervisor using ARM Hyp mode do not occur for x86 hardware virtualization. x86 root mode is orthogonal to its CPU privilege modes. The entire Linux kernel can run in root mode as a hypervisor because the same set of CPU modes available in non-root mode are available in root mode. Nevertheless, given the widespread use of ARM and the advantages of Linux on ARM, finding an efficient virtualization solution for ARM that can leverage Linux and take advantage…”
Figure 2: KVM/ARM System Architecture – QEMU runs in host user space (PL 0); the Highvisor runs in the host kernel (PL 1), alongside the VM kernel (PL 1) and VM user space (PL 0); the Lowvisor runs in PL 2 (Hyp) and handles traps between host and VM.
“…processing required and defers the bulk of the work to be done to the highvisor after a world switch to the highvisor is complete. The highvisor runs in kernel mode as part of the host Linux kernel. It can therefore directly leverage existing Linux functionality such as the scheduler, and can make use of standard kernel software data structures and mechanisms to implement its functionality, such as locking mechanisms and memory allocation functions. This makes higher-level functionality easier to implement in the highvisor. For example, while the lowvisor provides…”
https://www.cs.columbia.edu/~nieh/pubs/asplos2014_kvmarm.pdf
37. KVM/ARM Linux
• The source tree (picture on the right) shows all the files that realize the Highvisor and Lowvisor described in the previous slide.
• Most of the file names should be familiar by now.
• It should also give a feel for how small a hypervisor implementation can be (~16k lines of code).
<Linux Kernel Src>
├── COPYING
├── CREDITS
├── Documentation
~~~
└── virt
├── Makefile
├── built-in.a
├── kvm
│ ├── Kconfig
│ ├── arm
│ │ ├── aarch32.c
│ │ ├── arch_timer.c
│ │ ├── arm.c
│ │ ├── hyp
│ │ ├── mmio.c
│ │ ├── mmu.c
│ │ ├── perf.c
│ │ ├── pmu.c
│ │ ├── psci.c
│ │ ├── trace.h
│ │ └── vgic
│ ├── async_pf.c
│ ├── async_pf.h
│ ├── coalesced_mmio.c
│ ├── coalesced_mmio.h
│ ├── eventfd.c
│ ├── irqchip.c
│ ├── kvm_main.c
│ ├── vfio.c
│ └── vfio.h
├── lib
│ ├── Kconfig
│ ├── Makefile
│ ├── built-in.a
│ ├── irqbypass.c
│ ├── modules.builtin
│ └── modules.order
├── modules.builtin
└── modules.order
38. KVM and its suitability for Automotive
• The highvisor of KVM is basically a character device driver.
• From a software-component perspective, KVM is a kernel module loaded after the Linux kernel is initialized.
• Creating, modifying, and starting a VM happens through ioctl() calls from userspace.
• Auto-start of a VM is also possible under KVM.
• This means the first virtual machine can start only after the Linux kernel is initialized.
• So, a product with an architecture similar to the one in slide 3 will have difficulty meeting the safety and start-up time needs of Automotive.
• But if we use a safety-certified Linux as the hypervisor (i.e., a kernel configured with minimal features, ~2 MB in size) on a system with high-speed eMMC or UFS (more than 400 Mbps), then there is a possibility of meeting the timing and safety requirements of Automotive.
• Note: a 2 MB image @ 400 Mbps loads in 40 ms.
My view: it is not wise to go down this kind of path for Automotive. Strategically, we need light-weight Type 1 hypervisors for Automotive.
39. Minos Hypervisor (Type 1)
• Though there are many Type 1 hypervisors, the Minos project sounded interesting as it supports virtualization on the Raspberry Pi.
• Please evaluate it if you get time:
• https://github.com/minosproject/minos
• I will provide more details and my views as time permits.
40. Graphics, Display & Audio
How do current suppliers virtualize these peripherals, which are critical for
Automotive?
42. Acronyms – 1 / 3
• ACE – an extension to AXI protocol
• ACLK – ACE (extension to AXI bus) Clock
• ACP – Accelerator Coherency Ports for AXI slaves
• APB – Advanced Peripheral Bus for slower speed interface by ARM
• App - Software Application (e.g. Calculator App in Android)
• ASID – Address Space Identifier
• ATB – Interface for Trace by ARM
• ATCLK – ATB Clock
• AXI – Advanced eXtensible Interface by ARM
• BCM – Broadcom (the maker of Raspberry Pi)
• CHI – a scalable protocol supporting multi-node interconnect
• CLKIN – Clock In (main processor clock, 1.2 GHz)
• CNTCLK – Timer / Counter Clock
• CNTCLKEN – Counter Clock Enable
• CNTCR – Timer Control Register
• CNTCV – Timer Count Value
• CNTHPIRQ – EL2 physical timer event pin
• CNTPNSIRQ – EL1 Non-secure physical timer event pin
• CNTPSIRQ - EL1 Secure physical timer event pin
• CNTSR – Timer Status Register
• CNTVALUE – 64bit counter value
• CNTVIRQ - Virtual timer event pin
• CONTEXTIDR – Context ID Register (identifies current process ID and ASID)
• CORERESET – Core reset (debug and ETM registers are preserved)
• CPU – Central Processing Unit
• CPUPORESET – CPU Power On Reset
• CTI – CoreSight Trigger Interface (Debug & Trace)
• CTLR.DS – Control Register -> Disable Security bit
• CTM – CoreSight Trigger Matrix (Debug & Trace)
• DDR – Double Data Rate
• DFT – Design For Test
• DMA – Direct Memory Access (the unit that offloads the CPU for data copies)
43. Acronyms – 2 / 3
• DTS – Device Tree Source
• DVFS – Dynamic Voltage and Frequency Scaling
• ECC – Error Correction Code
• ECU – Electronic Control Units in Cars
• ELx – Exception Level ‘x’ [x: 0 to 3 for ARMv8]
• ERET – Exception Return
• ESR – Exception Syndrome Register
• FIQ – Fast Interrupt Request (takes higher priority than IRQ)
• FPU – Floating Point Unit
• GIC – Generic Interrupt Controller
• GICD - GIC Distributor
• GPU – Graphics Processing Unit
• HCR – Hypervisor Configuration Register
• HPFAR – Hyp IPA Fault Address Register. Holds the faulting IPA for some
aborts on stage 2 translation.
• HVC – Hypervisor Call
• I/O – Inputs and / or Outputs
• INTID – Interrupt Identifier
• IOMMU - I/O MMU, same as SMMU.
• IPA - Intermediate Physical Address
• IRQ – Interrupt Request (from I/O to CPU)
• ITS – Interrupt Translation Service; enables injecting interrupts directly into VMs
• IVI – In-Vehicle Infotainment unit.
• KVM – Kernel-based Virtual Machine
• L1 – Level 1
• L2RESET – L2 Memory system reset
• LDR – Load Register (load from memory)
• log2 – Binary Logarithm
• LPI – Locality-specific Peripheral Interrupt; interrupts that use the ITS
• LR – Link Register
• MBIST – Memory Built In Self Test
• MBISTRESET – an external MBIST controller can use this signal to reset the SoC
• MMU – Memory Management Unit
• NDA – Non Disclosure Agreement
44. Acronyms – 3 / 3
• OS – Operating Systems
• PA – Physical Address
• PC – Program Counter
• PCLK – APB interface (peripheral) clock
• PE – Processing Element
• PMU – Performance Monitoring Unit
• PoC – Proof of Concept
• PPI - Private Peripheral Interrupt, targets single specific PE
• PRESETDBG – single, cluster-wide debug reset signal
• QNX – QNX Operating System
• RAM – Random Access Memory
• ROM – Read Only Memory
• SCLK – SCU Clock
• SCU – Snoop Control Unit that maintains cache coherency
• SGI – Software Generated Interrupts, generated by PEs.
• SMMU – System Memory Management Unit (the MMU for peripherals)
• SoC – System on Chip
• SPI – Shared Peripheral Interrupt
• SRAM – Static Random Access Memory
• STR – Store Register (store to memory)
• TLB – Translation Lookaside Buffer
• TTBRx – Translation Table Base Register x [x: 0 or 1]
• UART – Universal Asynchronous Receive and Transmit. Serial Comms.
• VA – Virtual Address
• vCPU – Virtual CPU
• VM – Virtual Machine
• VMID – Virtual Machine Identifier
• VMSA – Virtual Memory System Architecture
• VTTBR – Virtualization Translation Table Base Register
• WFI – Wait For Interrupt