The document discusses Reliability, Availability, and Serviceability (RAS) on ARM64 platforms. It provides an overview of the ARM64 RAS architecture, including hardware support through the RAS extension and firmware support through the Secure Partition Manager (SPM) and the MM Secure Partition. It describes the status of prototypes for firmware-first error handling using the APEI protocol and CperLib, as well as plans to upstream code to various open source projects.
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64 Linaro
BUD17-209: Reliability, Availability, and Serviceability (RAS) on ARM64
Speaker: Wei Fu
Track: LEG
★ Session Summary ★
The RAS architecture based on the RAS extension, SDEI, and APEI.
1. The definition, function, and importance of RAS
2. Hardware requirements: CPU, GIC, and RAS Extension
3. Firmware support: SDEI dispatcher and APEI support, and how to take advantage of the RAS Extension
4. OS support: APEI driver and SDEI client, error reporting mechanism (trace events)
5. Userspace recorder: rasdaemon
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/bud17/bud17-209/
Presentation:
Video:
---------------------------------------------------
★ Event Details ★
Linaro Connect Budapest 2017 (BUD17)
6-10 March 2017
Corinthia Hotel, Budapest,
Erzsébet krt. 43-49,
1073 Hungary
---------------------------------------------------
ARM64, LEG, RAS
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://twitter.com/linaroorg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203 Linaro
Session ID: SFO17-203
Session Name: Reliability, Availability, and Serviceability (RAS) on ARM64 status - SFO17-203
Speaker: Fu Wei
Track: LEG
★ Session Summary ★
This presentation gives an updated RAS architecture on ARM64 based on the RAS extension (in ARMv8.2), SDEI (Software Delegated Exception Interface), APEI, and UEFI PI-SMM. It covers all the components of the new RAS architecture on ARM64 and gives the audience the current status and the next steps of development.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-203/
Presentation:
Video: https://www.youtube.com/watch?v=NReFBzbeWi0
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
LAS16-200 - Firmware Summit - RAS: What is it? Why do we need it? Linaro
Title: RAS What is it? Why do we need it?
A 101-style introduction to RAS, its purpose, and how we use it on ARM64. Covers the current status of implementation in the ASWG specs and the Linux kernel, and plans for future features that are essential for ARM64. Followed by a discussion period.
Speaker: Yazen Ghannam, Fu Wei
During the CXL Forum at OCP Global Summit, memory system architect Jungmin Choi of SK hynix talks about the need for memory bandwidth and capacity, and the SK hynix Niagara solution.
1. The document provides an overview of the ARM architecture, including details on registers, exceptions, interrupts, memory management and instructions.
2. It describes the evolution of the ARM architecture from ARMv7 to AArch64 (ARMv8), noting changes in registers and exception handling between the versions.
3. The document also covers memory management features like MMU support, different memory types (normal vs device memory), and caching behavior in the ARM architecture.
Demystifying Security Root of Trust Approaches for IoT/Embedded - SFO17-304 Linaro
Session ID: SFO17-304
Session Name: Demystifying Security Root of Trust Approaches for IoT/Embedded - SFO17-304
Speaker: Suresh Marisetty
Track: LHG,LITE,Security
★ Session Summary ★
The current trend in the IoT market segment is expected to enable and deploy about 50 billion connected devices by the year 2020. IoT devices will be deployed across the board to cater to multiple use cases such as home/building automation, automotive, and a highly fragmented embedded segment: gateways, set-top boxes, security cameras, industrial automation, digital signage, healthcare, etc. This trend will bring about the great challenge of securing connected endpoint IoT devices from a myriad of physical and remote attacks, for example the Mirai botnet DDoS attack launched through IoT devices such as digital cameras and DVR players.
Problem Statement: Each use case has its own IoT device constraints, such as cost, power, performance, memory footprint, and security objectives. The fundamental basis for any secure IoT and embedded solution is the Root of Trust (RoT), which provides assurance of the integrity of the system software, from boot and runtime firmware, to the OS loader, to the kernel, to the user applications. This poses a serious issue and challenges the one-size-fits-all RoT solution model.
ARM has taken on this challenge head-on to come up with a microcontroller security architecture solution that caters to the various IoT device constraints by offering the ARM Cortex-M family of processors. ARM’s flexible and scalable architecture solution will allow an OEM or silicon partner to adapt the base security architecture and to extend it in a seamless way. This caters to the requirements of different market segments through add-on hardware, firmware, and software security enhancements.
The session will present ARM’s base security system and software architecture based on the upcoming Cortex V8M solution, which will provide a hardware- and firmware-assisted TrustZone-based security RoT, aka TBSA-M, for a range of markets, including highly constrained IoT devices. Furthermore, the session will discuss how the base RoT capability can be extended in a seamless way with additional hardware-assisted mechanisms to offer higher levels of functionality and/or robustness for less constrained IoT devices, with options like TBSA-M+, TBSA-HSM, and a platform-level security software abstraction framework to decouple the chosen RoT capability from the various OSes and cloud security frameworks.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/sfo17/sfo17-304/
Presentation:
Video: https://www.youtube.com/watch?v=aIwmRXFOshs
---------------------------------------------------
★ Event Details ★
Linaro Connect San Francisco 2017 (SFO17)
25-29 September 2017
Hyatt Regency San Francisco Airport
The document discusses implementing PCIe Address Translation Services (ATS) in ARM-based systems-on-chip (SoCs). It describes an example ARM server system with various components such as CPUs, memory controllers, and I/O devices. It then explains how ATS improves memory access performance by allowing devices to cache address translations locally instead of relying solely on the IOMMU. The document outlines the typical components involved in ATS, such as the address translation cache, translating agent, and address translation protection table. It also describes how the ARM System MMU (SMMU) implements ATS and supports distributed address translation caching by endpoints.
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Speaker: Bill Fletcher
Date: September 24, 2015
★ Session Description ★
An introductory session giving a system-level overview of Power State Coordination
- Focus on ARMv8
- Goes top-down from ACPI
- A demo based on the current code in qemu
- The specifications are very dynamic - what’s ongoing for ACPI and PSCI
★ Resources ★
Video: https://www.youtube.com/watch?v=vXzPdpaZVto
Presentation: http://www.slideshare.net/linaroorg/sfo15tr9-psci-acpi-and-uefi-to-boot
Etherpad: pad.linaro.org/p/sfo15-tr9
Pathable: https://sfo15.pathable.com/meetings/303087
★ Event Details ★
Linaro Connect San Francisco 2015 - #SFO15
September 21-25, 2015
Hyatt Regency Hotel
This document provides information about ARM Ltd and the ARM architecture. It discusses the history and founding of ARM, the basic operating modes and registers in the ARM architecture, the instruction sets and pipeline stages of various ARM processors, and the features of ARM Cortex processors like the Cortex-A8 and Cortex-A9.
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick 44CON
Joe FitzPatrick gave a presentation on exploiting PCIe (Peripheral Component Interconnect Express) buses for hardware attacks. He discussed using DMA (direct memory access) over PCIe to read and write system memory, modify firmware, and potentially bypass mitigations such as the IOMMU (input-output memory management unit). FitzPatrick demonstrated proof-of-concept attacks on Macs and Windows PCs using custom PCIe devices and software. However, he noted that fully bypassing protections like VT-d on MacBooks had not yet been achieved and that more work is needed to build attacks without imitating a genuine device.
LCU13: An Introduction to ARM Trusted Firmware Linaro
Resource: LCU13
Name: An Introduction to ARM Trusted Firmware
Date: 28-10-2013
Speaker: Andrew Thoelke
Video: http://www.youtube.com/watch?v=q32BEMMxmfw
This document provides an overview of the AMD EPYC™ microprocessor architecture. It discusses the key tenets of the EPYC processor design including the "Zen" CPU core, virtualization and security features, high per-socket capability through its multi-chip module (MCM) design, high bandwidth fabric interconnect, large memory capacity and disruptive I/O capabilities. It also details the microarchitecture of the "Zen" core and how it was designed and optimized for data center workloads.
The document provides an overview of the Peripheral Component Interconnect (PCI) system architecture. PCI is an Intel-backed initiative introduced in 1992 to improve capabilities for adding and removing peripheral devices. It defines a standard configuration space for automatic detection of devices and simplifies driver development. PCI uses bus, device and function numbers to address devices and supports bridges and switches to connect multiple devices. The Linux kernel extracts device information from PCI configuration spaces and represents it in linked data structures for drivers.
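As a minimal sketch of the bus, device, and function addressing mentioned above, the following Python packs a bus/device/function triple into the 32-bit value used by the legacy PCI configuration mechanism (the CONFIG_ADDRESS layout: enable bit 31, bus in bits 23-16, device in bits 15-11, function in bits 10-8, register in bits 7-2). The function names are hypothetical and chosen for illustration; they are not taken from the summarized document or from the Linux kernel.

```python
def pci_config_address(bus: int, device: int, function: int, register: int = 0) -> int:
    """Pack bus/device/function/register into a legacy CONFIG_ADDRESS value."""
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus must fit in 8 bits")
    if not 0 <= device <= 0x1F:
        raise ValueError("device must fit in 5 bits")
    if not 0 <= function <= 0x7:
        raise ValueError("function must fit in 3 bits")
    if not 0 <= register <= 0xFF:
        raise ValueError("register offset must fit in 8 bits")
    enable = 0x80000000  # bit 31 enables the configuration cycle
    # The low two bits of the register offset are masked: accesses are dword-aligned.
    return enable | (bus << 16) | (device << 11) | (function << 8) | (register & 0xFC)


def format_bdf(bus: int, device: int, function: int) -> str:
    """Render the triple in the bb:dd.f notation commonly shown by PCI tools."""
    return f"{bus:02x}:{device:02x}.{function:x}"


if __name__ == "__main__":
    print(format_bdf(1, 2, 3), hex(pci_config_address(1, 2, 3, 0x10)))
```

Keeping validation in the packing function means an out-of-range device or function number fails loudly instead of silently spilling into a neighboring bit field.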
RISC-V & SoC Architectural Exploration for AI and ML Accelerators RISC-V International
This document discusses architectural exploration for AI and ML accelerators using simulation tools. It notes that current AI/ML applications require custom hardware configurations to achieve performance goals. The Imperas simulation tools allow analyzing performance on different hardware designs by running software on virtual platforms months before RTL implementation. Imperas provides virtual platforms for heterogeneous systems running full operating systems along with detailed analysis, profiling and debugging tools. It also includes a RISC-V reference model that enables developing custom instructions for architectural exploration of AI/ML accelerators.
The number of internet-connected devices is growing exponentially, enabling an increasing number of edge applications in environments such as smart cities, retail, and industry 4.0. These intelligent solutions often require processing large amounts of data, running models to enable image recognition, predictive analytics, autonomous systems, and more. Increasing system workloads and data processing capacity at the edge is essential to minimize latency, improve responsiveness, and reduce network traffic back to data centers. Purpose-built systems such as Supermicro’s short-depth, multi-node SuperEdge, powered by 3rd Gen Intel® Xeon® Scalable processors, increase compute and I/O density at the edge and enable businesses to further accelerate innovation.
Join this webinar to discover new insights in edge-to-cloud infrastructures and learn how Supermicro SuperEdge multi-node solutions leverage data center scale, performance, and efficiency for 5G, IoT, and Edge applications.
The document provides an introduction to ARMv8 AArch64, a 64-bit instruction set introduced in ARMv8. Some key points:
- It uses 64-bit pointers and registers, with 32-bit fixed length instructions. It has 31 general purpose 64-bit registers.
- Many traditional ARM features are removed or modified, such as conditional execution, immediate shifts, load/store multiple. New features are added like large PC-relative addressing and branching.
- It has a load-acquire/store-release memory model and supports atomic operations. Advanced SIMD and floating point are now mandatory.
This document discusses interrupts, exceptions, and system calls in operating systems. It begins by explaining that the OS is event-driven and only executes when there is an interrupt, trap, or system call. It then describes the different event types - interrupts, exceptions, and how hardware interrupts are handled using interrupt controllers like the 8259 PIC and APIC. It discusses interrupt vectors, interrupt descriptor tables, and how the CPU switches to the interrupt handler. Finally, it covers system calls and exceptions as a type of software interrupt, providing an example of the write system call.
Hardware accelerated Virtualization in the ARM Cortex™ Processors The Linux Foundation
The document discusses hardware accelerated virtualization capabilities in ARM Cortex processors including the Cortex-A15. It describes new features like large physical addressing, virtualization extensions, and a virtual interrupt controller that allow multiple operating system instances and work environments to run simultaneously in isolation on ARM devices.
The document discusses Linux support for the ARM 64-bit (AArch64) architecture. It covers key aspects of the AArch64 instruction set like 64-bit registers and memory accesses. It describes the exception model with multiple privilege levels and modes for virtualization. It also summarizes the Linux kernel port to AArch64 including boot process, memory management support, and compatibility for 32-bit applications. Future work is outlined to improve platform support and add new features to the AArch64 version of Linux.
Evaluating UCIe based multi-die SoC to meet timing and power Deepak Shankar
This document discusses evaluating a UCIe-based multi-die system-on-chip (SoC) using system modeling to meet timing and power constraints. It provides an overview of UCIe and how it can be used to connect multiple dies. It then describes assembling a system model in VisualSim Architect using UCIe components to analyze configurations and optimize latency, bandwidth, and power. Examples of multi-media and automotive applications using UCIe-based chiplet designs are also presented.
The document describes an AXI_PCIEX1 module that acts as a PCI Express to AXI bridge, allowing a system with an embedded AXI bus to connect to an external PCI Express bus. The module is compliant with DO-254 guidance for critical applications and has a low gate count and latency. It supports PCIe 1.0 at 2.5 Gbps on one lane and interfaces with AXI4 at 32 bits. The module is optimized for reliability and error reporting. DMAP provides Verilog RTL, verification testbenches, and all required certification documentation.
Project ACRN expose and pass through platform hidden PCIe devices to SOS Project ACRN
ACRN needs to expose and pass through platform hidden PCIe devices to the Service OS (SOS) in order to properly support device access and power management. Currently some devices are hidden by the BIOS and cannot be discovered through normal enumeration. This causes issues when the SOS tries to access or configure these hidden devices. ACRN will add support to discover hidden PCIe devices and assign them to the SOS similar to normal devices, addressing the access and power management problems.
1) Design and Implementation of Multicore Processors
2) Coherence and Consistency
3) Power and Temperature
4) Interconnects
5) Multicore Caches
6) Security
7) Real world examples
The document describes a memory controller for DDR SDRAM that is implemented using Verilog HDL. DDR SDRAM operates at double the frequency of the processor and transfers data on both the rising and falling edges of the clock, allowing it to have higher bandwidth than SDR SDRAM. The controller generates timing and control signals to properly initialize and refresh the memory and handle read and write operations. Simulation and synthesis of the controller design is done using Xilinx ISE 14.5 software.
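The bandwidth advantage of transferring on both clock edges can be worked through with a short calculation. This sketch assumes a 64-bit data bus and a 200 MHz memory clock (the DDR-400 configuration); neither figure comes from the summarized document, and the function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(clock_hz: int, bus_width_bits: int, transfers_per_clock: int) -> int:
    """Peak rate = clock frequency * transfers per clock * bus width in bytes."""
    return clock_hz * transfers_per_clock * (bus_width_bits // 8)


# Assumed example figures, not taken from the summarized document.
CLOCK_HZ = 200_000_000
BUS_WIDTH_BITS = 64

sdr = peak_bandwidth_bytes_per_s(CLOCK_HZ, BUS_WIDTH_BITS, transfers_per_clock=1)
ddr = peak_bandwidth_bytes_per_s(CLOCK_HZ, BUS_WIDTH_BITS, transfers_per_clock=2)

if __name__ == "__main__":
    print(f"SDR peak: {sdr / 1e6:.0f} MB/s")
    print(f"DDR peak: {ddr / 1e6:.0f} MB/s")
```

At the same clock and bus width, the double data rate interface has exactly twice the peak bandwidth, which is the only difference the calculation models; real sustained throughput is lower because of refresh and command overhead.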
Linux Kernel Booting Process (1) - For NLKB shimosawa
Describes the bootstrapping part in Linux and some related technologies.
This is part one of the slides, and the succeeding slides will contain the errata for this one.
Spike yuan server ras and uefi cper final parth bera
The document discusses server Reliability, Availability and Serviceability (RAS) challenges and potential solutions. It covers RAS basics including definitions of reliability, availability and serviceability. It then discusses the key pillars of RAS - fault avoidance, detection, correction and fault management. The document also outlines firmware and software building blocks that enable RAS capabilities, including error reporting and Linux RAS blocks. It provides an overview of the Common Platform Error Record (CPER) format and Linux CPER implementation.
Session ID: HKG18-116
Session Name: HKG18-116 - RAS Solutions for Arm64 Servers
Speaker: Jonathan (Zhixiong) Zhang
Track: Enterprise
★ Session Summary ★
In order to get ARM64 servers into large-scale production, both out-of-band and in-band RAS (Reliability/Availability/Serviceability) solutions have to be on par with servers of other architectures. This presentation discusses the requirements and designs of various RAS solutions and how hardware, firmware, and higher-level software work together to achieve the goals, with a focus on firmware. RAS topics covered include memory, PCIe, CPU, thermal management, and catastrophic errors. In addition, future directions and expectations of server RAS technologies are presented.
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-116/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-116.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-116.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: Enterprise
SFO15-TR9: PSCI, ACPI (and UEFI to boot)
Speaker: Bill Fletcher
Date: September 24, 2015
★ Session Description ★
An introductory session of a system-level overview at Power State Coordination
- Focus on ARMv8
- Goes top-down from ACPI
- A demo based on the current code in qemu
- The specifications are very dynamic - what’s onging for ACPI and PSCI
★ Resources ★
Video: https://www.youtube.com/watch?v=vXzPdpaZVto
Presentation: http://www.slideshare.net/linaroorg/sfo15tr9-psci-acpi-and-uefi-to-boot
Etherpad: pad.linaro.org/p/sfo15-tr9
Pathable: https://sfo15.pathable.com/meetings/303087
★ Event Details ★
Linaro Connect San Francisco 2015 - #SFO15
September 21-25, 2015
Hyatt Regency Hotel
http://www.linaro.org
http://connect.linaro.org
This document provides information about ARM Ltd and the ARM architecture. It discusses the history and founding of ARM, the basic operating modes and registers in the ARM architecture, the instruction sets and pipeline stages of various ARM processors, and the features of ARM Cortex processors like the Cortex-A8 and Cortex-A9.
44CON 2014 - Stupid PCIe Tricks, Joe Fitzpatrick44CON
Joe FitzPatrick gave a presentation on exploiting PCIe (Peripheral Component Interconnect Express) buses for hardware attacks. He discussed using DMA (direct memory access) over PCIe to read and write system memory, modify firmware, and potentially bypass mitigations like IOMMU (input-output memory management unit). FitzPatrick demonstrated proof-of-concept attacks on Macs and Windows PCs using custom PCIe devices and software. However, he noted that fully bypassing protections like VT-d on Macbooks had not yet been achieved and more work is needed to build attacks without imitating a genuine device.
LCU13: An Introduction to ARM Trusted FirmwareLinaro
Resource: LCU13
Name: An Introduction to ARM Trusted Firmware
Date: 28-10-2013
Speaker: Andrew Thoelke
Video: http://www.youtube.com/watch?v=q32BEMMxmfw
This document provides an overview of the AMD EPYCTM microprocessor architecture. It discusses the key tenets of the EPYC processor design including the "Zen" CPU core, virtualization and security features, high per-socket capability through its multi-chip module (MCM) design, high bandwidth fabric interconnect, large memory capacity and disruptive I/O capabilities. It also details the microarchitecture of the "Zen" core and how it was designed and optimized for data center workloads.
The document provides an overview of the Peripheral Component Interconnect (PCI) system architecture. PCI is an Intel-backed initiative introduced in 1992 to improve capabilities for adding and removing peripheral devices. It defines a standard configuration space for automatic detection of devices and simplifies driver development. PCI uses bus, device and function numbers to address devices and supports bridges and switches to connect multiple devices. The Linux kernel extracts device information from PCI configuration spaces and represents it in linked data structures for drivers.
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
This document discusses architectural exploration for AI and ML accelerators using simulation tools. It notes that current AI/ML applications require custom hardware configurations to achieve performance goals. The Imperas simulation tools allow analyzing performance on different hardware designs by running software on virtual platforms months before RTL implementation. Imperas provides virtual platforms for heterogeneous systems running full operating systems along with detailed analysis, profiling and debugging tools. It also includes a RISC-V reference model that enables developing custom instructions for architectural exploration of AI/ML accelerators.
The number of internet-connected devices is growing exponentially, enabling an increasing number of edge applications in environments such as smart cities, retail, and industry 4.0. These intelligent solutions often require processing large amounts of data, running models to enable image recognition, predictive analytics, autonomous systems, and more. Increasing system workloads and data processing capacity at the edge is essential to minimize latency, improve responsiveness, and reduce network traffic back to data centers. Purpose-built systems such as Supermicro’s short-depth, multi-node SuperEdge, powered by 3rd Gen Intel® Xeon® Scalable processors, increase compute and I/O density at the edge and enable businesses to further accelerate innovation.
Join this webinar to discover new insights in edge-to-cloud infrastructures and learn how Supermicro SuperEdge multi-node solutions leverage data center scale, performance, and efficiency for 5G, IoT, and Edge applications.
5. Hardware support for RAS
CPU
● ARMv8-A architecture (a mandatory extension to ARMv8.2)
● EL2, EL3, or both
● Virtualization extension or Security extensions or both
GICv3
● Interrupt routing modes
● Private and shared interrupts (PPI/SPI)
● Ability to set an interrupt pending
● Event signaling and delegation
● Interrupt groups/priority
RAS Extension
● ESB (Error Synchronization Barrier) instructions
● RAS Extension registers
● Corrupted data poisoning
6. RAS Extension: Gather HW error info for FW
ESB instruction
● Helps to locate errors
RAS Extension registers
● Provide the error info to FW
● Let FW control the interrupts
ARMv8-A RAS extensions standardize the interface between HW and FW
● a mandatory extension to ARMv8.2, an optional extension to the Armv8.0 and
[Diagram: Firmware and RAS Extension registers]
7. Arm® Reliability, Availability, and Serviceability (RAS) Specification Documentation
RAS specification: the latest version for now is DDI 0587C.b
Arm Architecture Reference Manual: the latest version for now is DDI 0487E.a
8. Firmware support for RAS: SPM & MM Secure Partition
● Secure Partition Manager in BL31 exports standard ABI to
− Initialize the partition
− Delegate SMC requests to the partition
− For RAS, delegate RAS error handling to an MM Secure Partition for RAS
● MM Secure Partition implements management functions, runs in S-EL0
− Leverages existing firmware code based on EDK2: StandaloneMm
− Minimises code in EL3
9. Arm PSA Documentation
The latest version for now is: DEN 0063 1.0.0-2
https://developer.arm.com/architectures/security-architectures/platform-security-architecture
10. Firmware support for RAS: APEI (ACPI Platform Error Interfaces)
APEI provides a standard way to convey error info from Firmware to OS
● EINJ: for testing
● ERST: for storage
● BERT: for last crash (critical error)
● HEST: for runtime
● CPER (UEFI): for error info format
11. ACPI and UEFI Documentation
ACPI: the latest version for now is Version 6.3
UEFI: the latest version for now is Version 2.8
12. ACPI for the Armv8 RAS Extensions 1.0 (Provisional)
This document describes ACPI extensions that enable kernel-first RAS handling for systems that employ the Arm RAS extensions.
For PEs, this specification covers:
1. Armv8 & 8.4 RAS extension
2. RAS system architecture v1.0 & v1.1.
[Diagram labels: AEST: Arm Error Source Table; Integrating AEST into HEST; Armv8 RAS extension section; ACPI; CPER; Code First]
13. From firmware to OS: SDEI usage in RAS
Software Delegated Exception Interface: an interface between FW & OS for registering, notifying and servicing system events using SMC/HVC. SDEI Specification (ARM DEN0054A)
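The register / notify / service cycle can be illustrated with a small in-memory model. This is only a sketch: it does not issue SMC/HVC calls, and every name in it (`model_register`, `model_enable`, `model_dispatch`, the slot table) is hypothetical rather than taken from the SDEI specification or from any firmware code base. The firmware dispatcher is represented by a plain C table, and the OS handler by a function pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_EVENTS 8

/* Hypothetical handler type: the OS-side routine that services an event. */
typedef void (*model_handler_t)(uint32_t event, void *arg);

struct model_slot {
    uint32_t        event;
    model_handler_t handler;
    void           *arg;
    int             registered;
    int             enabled;
};

/* Stands in for the dispatcher state that firmware would hold. */
static struct model_slot model_slots[MODEL_MAX_EVENTS];

static struct model_slot *model_find(uint32_t event)
{
    for (size_t i = 0; i < MODEL_MAX_EVENTS; i++)
        if (model_slots[i].registered && model_slots[i].event == event)
            return &model_slots[i];
    return NULL;
}

/* "Registering": the OS hands firmware a handler for one event number. */
static int model_register(uint32_t event, model_handler_t handler, void *arg)
{
    if (handler == NULL || model_find(event) != NULL)
        return -1;
    for (size_t i = 0; i < MODEL_MAX_EVENTS; i++) {
        if (!model_slots[i].registered) {
            model_slots[i].event = event;
            model_slots[i].handler = handler;
            model_slots[i].arg = arg;
            model_slots[i].registered = 1;
            model_slots[i].enabled = 0;
            return 0;
        }
    }
    return -1; /* table full */
}

/* A registered event is not delivered until it has been enabled. */
static int model_enable(uint32_t event)
{
    struct model_slot *slot = model_find(event);
    if (slot == NULL)
        return -1;
    slot->enabled = 1;
    return 0;
}

/* "Notifying": firmware delivers the event; returns 1 if a handler ran. */
static int model_dispatch(uint32_t event)
{
    struct model_slot *slot = model_find(event);
    if (slot == NULL || !slot->enabled)
        return 0;
    slot->handler(event, slot->arg); /* "servicing" happens in the handler */
    return 1;
}

/* Example handler: counts how many times it has been called. */
static void model_count_handler(uint32_t event, void *arg)
{
    (void)event;
    ++*(int *)arg;
}
```

In this model an OS would call `model_register` and `model_enable` once at start-up, and the firmware side would call `model_dispatch` when an error is detected.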
19. SBSA QEMU & Secure Partition
❖ SBSA QEMU
➢ SBSA QEMU patchset has been upstreamed
Great thanks to Hongbo and Radosław
❖ ARM-TF
➢ SBSA QEMU support has been partly upstreamed, and the rest of the patches are posted.
➢ StandaloneMm support (on SBSA QEMU) patches are being debugged
20. SBSA QEMU & Secure Partition
❖ EDK2 and edk2-platform
➢ StandaloneMm support patchset has been upstreamed
➢ StandaloneMm support for SBSA QEMU patches are being debugged.
➢ APEI protocol and CperLib patches are being debugged.
21. TODO
❖ Continue upstreaming the ARM-TF patchset (target: v2.2)
❖ Hardware
➢ Test on real hardware (ARMv8.2+)
❖ Firmware
➢ EDK2:
■ Improve and upstream APEI protocol and CperLib code (target: 201912)
■ EINJ prototype (target: before next Connect)
■ Prototype solution for BERT
22. EINJ: Error Injection with Common Fault Injection Model Extension
[Diagram labels: Linux (EINJ driver, APEI, test application); EDK2 (BL33) in EL2; BL32 in S-EL0 (MM Foundation/Core); non-secure mem for firmware; EINJ shared mem; Trigger Error Action Tables; RAS Node registers; Common Fault Injection Model Extension]
23. Acknowledgments
Achin Gupta
Arvind Chauhan
Charles Garcia-Tobin
Daniil Egranov
Dong Wei
Supreeth Venkatesh
Thomas Abraham
Leif Lindholm
Radosław Biernacki
Tanmay Jagdale
Zhang HongBo
Al Stone
John Feeney
Jon Masters
24. Thank you
Join Linaro to accelerate deployment of your Arm-based solutions through collaboration
contactus@linaro.org
26. What is RAS? -- Definition
Reliability
Continuity. Computation needs to be correct and reliable.
Availability
Readiness. The system needs to remain available as long as possible.
Serviceability
Ability to undergo modifications and repairs. The system should provide information to the administrator to aid in system servicing.
The RAS architecture primarily cares about the ERRORs produced by HARDWARE.
27. Why do we need RAS? -- Reality
Impacts
Enterprise systems often provide mission-critical services. Any system failure or data corruption impacts the customer’s business and reputation.
Inevitability
Although faults are rare, enterprise systems can be very large. So failures are inevitable.
28. Why do we need RAS? -- Importance
So we inevitably have to maintain the system very well.
To reduce the expense of maintenance, we should:
● Replace only failed parts
● Prefer scheduled maintenance, which is cheaper than unscheduled service outages
29. History
ECC in memory controllers and I/O RAMs
Machine Check Architecture (MCA)
● A mechanism in which the CPU reports hardware errors to the OS
• model-specific registers (MSRs)
■ set up machine checking
■ record detected errors
■ the info they contain is CPU specific
• Machine Check Exception (MCE)
■ signals the detection of an uncorrected machine-check error
■ handler collects information about the error from MSRs
• Utility: mcelog
PCI-E: Advanced Error Reporting (AER)
Linux kernel
● EDAC (Error Detection and Correction)
• designed to report and possibly act on hardware errors
• inspects the hardware directly (system-specific handling and decoding)
• only supports memory controller and PCI/AGP errors
Firmware First (FF)
● APEI
● UEFI
32. RAS Extension: ESB instruction
● ESB (Error Synchronization Barrier) can be used to synchronize Unrecoverable errors (containable errors).
● Software can determine that:
○ The error was reported as Unrecoverable.
○ The preferred return address of the SEI is an ESB instruction.
○ The software between that ESB and the previous ESB can be isolated.
○ ESB might update :
■ DISR_EL1 / DISR (Deferred Interrupt Status Register)
■ VDISR_EL2 / VDISR (Virtual Deferred Interrupt Status Register)
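As a rough illustration of how software might interpret the value ESB leaves in DISR_EL1, the sketch below decodes an already-read register value. The field positions (A at bit 31, IDS at bit 24, AET at bits [12:10]) and the AET encodings are written here as assumptions and should be checked against the Arm Architecture Reference Manual before use. Reading the register itself (an `MRS` at EL1 or higher) is deliberately left out, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed DISR_EL1 field positions; verify against the Arm ARM. */
#define DISR_A       (UINT64_C(1) << 31)   /* an SError was deferred by ESB  */
#define DISR_IDS     (UINT64_C(1) << 24)   /* syndrome is implementation defined */
#define DISR_AET(v)  (((v) >> 10) & 0x7u)  /* asynchronous error type        */

/* Assumed AET encodings (same values as the SError syndrome). */
enum aet {
    AET_UNCONTAINABLE = 0,
    AET_UNRECOVERABLE = 1,
    AET_RESTARTABLE   = 2,
    AET_RECOVERABLE   = 3,
    AET_CORRECTED     = 6
};

struct deferred_error {
    bool     pending;       /* A bit set: ESB deferred an SError          */
    bool     impl_defined;  /* IDS bit set: AET is not meaningful         */
    unsigned aet;           /* valid only if pending && !impl_defined     */
};

/* Split a raw DISR_EL1 value into the fields used below. */
static struct deferred_error decode_disr(uint64_t disr)
{
    struct deferred_error e;
    e.pending = (disr & DISR_A) != 0;
    e.impl_defined = (disr & DISR_IDS) != 0;
    e.aet = (unsigned)DISR_AET(disr);
    return e;
}

/*
 * True when a deferred error was recorded and it is containable, i.e. the
 * code between the previous ESB and this one can be treated as the only
 * software affected.
 */
static bool can_isolate(struct deferred_error e)
{
    if (!e.pending || e.impl_defined)
        return false;
    return e.aet != AET_UNCONTAINABLE;
}
```

A caller would pass the value read from DISR_EL1 right after an ESB, and use `can_isolate` to decide whether only the work done since the previous ESB has to be abandoned.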
33. RAS Extension: Registers
System register views (in Arm® Architecture Reference Manual)
● Processor Feature Register
● Memory Model Feature Register
● Error Record Register
○ Error Record ID Register
○ Error Record Select Register
○ [Selected Error Record] Feature Register
○ [...] Control Register
○ [...] Primary Status Register
○ [...] Address Register
○ [...] Miscellaneous Register [0-3]
● Deferred Interrupt Status Register
● Hypervisor Configuration Register
● Virtual SError Exception Syndrome Register
● Virtual Deferred Interrupt Status Register
● Secure Configuration Register
Memory-mapped view
● Error Record Feature Register
● Error Record Control Register
● Error Record Primary Status Register
● Error Record Address Register
● Error Record Miscellaneous Register[0-3]
● Pseudo-fault Generation Feature register
● Pseudo-fault Generation Control register
● Pseudo-fault Generation Countdown register
● Error Group Status Register
● Implementation Identification Register
● Error Interrupt Configuration Register[FHI, ERI, CRI][0-2]
● Error Interrupt Status Register
● Device Affinity Register
● Device Architecture Register
● Device Configuration Register
● Peripheral Identification Register[0-4]
● Component Identification Register[0-3]
34. APEI tables
BERT: Boot Error Record Table
Records fatal errors, then reports them on the next boot
CPER: Common Platform Error Record (in the Appendix of the UEFI spec)
35. APEI tables
HEST: Hardware Error Source Table
● Key info:
− HOW to get the trigger
− WHERE the error records are
− HOW to release the records’ memory
● For ARM64: GHES v2
− HOW to get the trigger: Notification Structure
− WHERE the error records are: Error Status Address (GAS: Generic Address Structure)
− HOW to release the records’ memory: Read Ack Register
● For IA-32: MCE, CMC, NMI
● For PCI: AER (Root Port, Endpoint, Bridge)
● For generic hardware: GHES V1/V2 (Generic Hardware Error Source)
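The three GHES v2 answers (notification structure, error status address, read ack register) map onto fields of the HEST entry. The sketch below lays the entry out as packed C structs, following the field order I understand from the ACPI 6.x specification; the struct and field names are my own, and the layout should be checked against the specification before being relied on. The acknowledge helper is simplified and assumes the Read Ack Register has a bit offset of 0.

```c
#include <assert.h>
#include <stdint.h>

#pragma pack(push, 1)
/* Generic Address Structure (GAS): says where a register or buffer lives. */
struct acpi_gas {
    uint8_t  space_id;
    uint8_t  bit_width;
    uint8_t  bit_offset;
    uint8_t  access_size;
    uint64_t address;
};

/* HOW to get the trigger: polled, interrupt, SEA, SEI, SDEI, ... */
struct hest_notify {
    uint8_t  type;
    uint8_t  length;
    uint16_t config_write_enable;
    uint32_t poll_interval;
    uint32_t vector;
    uint32_t polling_threshold_value;
    uint32_t polling_threshold_window;
    uint32_t error_threshold_value;
    uint32_t error_threshold_window;
};

/* One GHES v2 error source entry inside the HEST. */
struct hest_ghes_v2 {
    uint16_t type;                    /* 10 for GHES v2 */
    uint16_t source_id;
    uint16_t related_source_id;
    uint8_t  flags;
    uint8_t  enabled;
    uint32_t records_to_preallocate;
    uint32_t max_sections_per_record;
    uint32_t max_raw_data_length;
    struct acpi_gas    error_status_address; /* WHERE the records are */
    struct hest_notify notify;               /* HOW to get the trigger */
    uint32_t error_block_length;
    struct acpi_gas    read_ack_register;    /* HOW to release the memory */
    uint64_t read_ack_preserve;
    uint64_t read_ack_write;
};
#pragma pack(pop)

_Static_assert(sizeof(struct acpi_gas) == 12, "GAS is 12 bytes");
_Static_assert(sizeof(struct hest_notify) == 28, "notification is 28 bytes");
_Static_assert(sizeof(struct hest_ghes_v2) == 92, "GHES v2 entry is 92 bytes");

/*
 * Value the OS writes back to the Read Ack Register once it has consumed
 * the error status block: keep the preserved bits, set the ack bits.
 * Simplification: assumes read_ack_register.bit_offset == 0.
 */
static uint64_t ghes_v2_ack_value(const struct hest_ghes_v2 *g, uint64_t current)
{
    return (current & g->read_ack_preserve) | g->read_ack_write;
}
```

After processing a record, an OS would read the register described by `read_ack_register`, compute the new value with `ghes_v2_ack_value`, and write it back so that firmware can reuse the record memory.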
36. APEI tables
ERST: Error Record Serialization Table
Operation abstraction: provides the details necessary to communicate with on-board persistent storage for error recording.
EINJ: Error Injection Table
Operation abstraction: provides a generic interface through which OSPM can inject hardware errors into the platform without requiring platform-specific software.
37. ACPI for the Armv8 RAS Extensions 1.0(proposal-1)
AEST: Arm Error Source Table
An AEST node is composed of the following parts:
● A header
● A set of common fields for all error nodes (Node generic data)
● A component specific section
● A section that describes the interfaces associated with an error node.
● A section describing the interrupts associated with the error node.
38. ACPI for the Armv8 RAS Extensions 1.0(proposal-2)
Integrating AEST into HEST
Q: Do we have to add a brand new table for the extension?
An alternative way is to integrate the “AEST” nodes into the HEST directly by creating a new error source structure, Type: 12 - ARMv8 RAS Extension error node
39. ACPI for the Armv8 RAS Extensions 1.0(proposal-3)
CPER Armv8 RAS extension section
The section gathers the content of a RAS extension node’s registers without any specific information that can associate it with components in the SoC, because it is used alongside other error sections already defined for CPER, such as processor sections (generic and Arm) or memory sections.
40. Uncorrected Error -- HEST & MM
1. System boot: BootROM-->BL2-->BL3x
a. BL31 initializes SPM, SDEI dispatcher and BL32 (MM dispatcher)
b. UEFI (BL33), DXE, UEFI Platform Driver:
i. queries MM partitions by APEI protocol for error source info
ii. MM partitions return error source info back to UEFI
iii. UEFI maps in and marks the error record region as Runtime Services Data Region
iv. updates/adds error source info in HEST
2. OS starts running: the HEST driver scans the HEST table and registers error handlers by SDEI
3. A UE occurs; the event is routed to EL3 (SPM)
4. SPM routes the event to the RAS error handler in S-EL0 (MM partitions)
5. MM Foundation (CperLib) creates the CPER blobs from the info in the RAS Extension
6. SPM notifies SDEI to call the corresponding OS registered handler
7. OS gets the CPER blobs from the Error Status Address block, processes the error and tries to recover.
8. OS reports the error event as a RAS event
9. rasdaemon logs error info from the RAS event to the recorder
[Diagram labels: Boot Time, RunTime]
41. APEI protocol: API
Main
● Apei.c
○ UpdateApei: updates error source information to the specified APEI (HEST/BERT) Table
typedef
EFI_STATUS
(EFIAPI *EFI_APEI_UPDATE) (
  IN CONST EFI_APEI_PROTOCOL *This,
  IN UINT32 Signature
  );
● ApeiCommon.c
○ GetApeiTable
○ AcpiTableChecksum
For HEST table:
● ApeiHest.c
○ UpdateHest
■ GetErrorSources
■ UpdateGhes
For BERT table(TBD):
● ApeiBert.c:
○ UpdateBert
■ GetErrorInfo
■ FillBert
42. CperLib: API
● CperInit: parses the error source info and returns the error record address info
EFI_STATUS
EFIAPI
CperInit (
  IN EFI_APEI_ERROR_SOURCE *ErrorSource,
  IN OUT UINTN *ErrorRecordAddress
  );
The Dynamic Tables module passes the ErrorSource to help CperLib get the memory region info for other APIs, like CperWrite
● CperWrite: creates a CPER blob at the defined memory region from Section info data
EFI_STATUS
EFIAPI
CperWrite (
  IN SECTIONS_INFO *SectionInfo,
  IN UINTN ErrorRecordAddress
  );
Wraps the given section data into a CPER blob and puts it into the specified memory region
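To make the "wrap section data into a blob" step concrete, here is a minimal sketch in plain C. It is not the CperLib implementation: `wrap_section` and the struct names are hypothetical, and the layout is the Generic Error Status Block followed by one revision-3 Generic Error Data Entry as I understand them from the ACPI specification, so the field order and sizes should be verified there. The section GUID and payload are treated as opaque bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Severity values as used in generic error status blocks (assumed). */
enum { SEV_RECOVERABLE = 0, SEV_FATAL = 1, SEV_CORRECTED = 2, SEV_NONE = 3 };

#pragma pack(push, 1)
struct generic_error_status {
    uint32_t block_status;     /* bit 0: UE valid, bit 1: CE valid, bits 4..13: entry count */
    uint32_t raw_data_offset;
    uint32_t raw_data_length;
    uint32_t data_length;      /* bytes of data entries that follow */
    uint32_t error_severity;
};

struct generic_error_data {
    uint8_t  section_type[16]; /* GUID naming the section format */
    uint32_t error_severity;
    uint16_t revision;
    uint8_t  validation_bits;
    uint8_t  flags;
    uint32_t error_data_length;
    uint8_t  fru_id[16];
    uint8_t  fru_text[20];
    uint64_t timestamp;
};
#pragma pack(pop)

_Static_assert(sizeof(struct generic_error_status) == 20, "status header is 20 bytes");
_Static_assert(sizeof(struct generic_error_data) == 72, "revision 3 entry is 72 bytes");

/*
 * Write one status block plus one data entry plus the payload into
 * `region`. Returns the number of bytes written, or 0 if an argument is
 * missing or the region is too small.
 */
static size_t wrap_section(uint8_t *region, size_t region_len,
                           const uint8_t section_type[16], uint32_t severity,
                           const void *payload, uint32_t payload_len)
{
    struct generic_error_status status;
    struct generic_error_data entry;
    size_t total = sizeof(status) + sizeof(entry) + (size_t)payload_len;
    uint32_t valid = 0;

    if (region == NULL || section_type == NULL || payload == NULL ||
        total > region_len)
        return 0;

    if (severity == SEV_CORRECTED)
        valid = 0x2u;                      /* correctable error valid   */
    else if (severity == SEV_RECOVERABLE || severity == SEV_FATAL)
        valid = 0x1u;                      /* uncorrectable error valid */

    memset(&status, 0, sizeof(status));
    status.block_status = valid | (1u << 4); /* one data entry */
    status.data_length = (uint32_t)sizeof(entry) + payload_len;
    status.error_severity = severity;

    memset(&entry, 0, sizeof(entry));
    memcpy(entry.section_type, section_type, 16);
    entry.error_severity = severity;
    entry.revision = 0x0300;
    entry.error_data_length = payload_len;

    memcpy(region, &status, sizeof(status));
    memcpy(region + sizeof(status), &entry, sizeof(entry));
    memcpy(region + sizeof(status) + sizeof(entry), payload, payload_len);
    return total;
}
```

In the flow described on the earlier slide, the region would be the error record memory that UEFI marked as Runtime Services Data, and the payload would be the register contents gathered from the RAS Extension node.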
43. BERT: When catastrophic errors occur (TODO)
● If the whole system has crashed, the BMC or another system controller should generate the error blob and save it.