1 |
Xen Functional Safety - Update
Michal Orzel – AMD
Ayan Kumar Halder - AMD
2 |
Introduction
• Michal Orzel
o Maintainer of Xen on Arm
o SMTS at AMD
o Founding member of Xen Safety certification within AMD
o Active Xen community member (350+ patches authored/reviewed in the last 3 years)
• Ayan Kumar Halder
o Supporting Xen hypervisor on AMD products (as a part of Virtualization team)
o MTS at AMD
o Coordinating Xen functional safety efforts across different teams
o 15 years of experience working on the low-level software stack (kernel, Zephyr, ethos-n-driver, bootloaders) and post-silicon validation of Arm-based products
3 |
Functional Safety
• “Functional safety is part of the overall safety that depends on a system or
equipment operating correctly in response to its inputs.” (IEC 61508)
• “Absence of unreasonable risk due to hazards caused
by malfunctioning behavior of E/E systems” (ISO 26262)
• “Safety is freedom from unacceptable risk” (ISO 14971)
4 |
However, it provides benefits beyond safety
Safety
Quality
Usability
Maintainability
Upgradability
Diverse Open Source Community
Safety Certification Efforts
5 |
Safety certifying Xen Hypervisor
• AMD is working on making Xen safety-certifiable for AMD platforms
• ARM and AMD x86 platforms
• IEC 61508 SIL 3 (Systematic Capability 3) & ISO 26262 ASIL D
• Certification based on Xen upstream community processes and upstream codebase
• Not working with a private fork, so the certification can be updated with limited effort
• Certification docs & artifacts available for AMD customers
• Open to collaboration with other community members on upstreaming
• Assumptions and Scope
• Common code and core components in Xen
• AMD x86: AMD-V, AMD-Vi, IOMMU, HPET, vPCI
• ARM: SMMUv3, GICv3, Arch Timer, Hypervisor Extensions, vPCI
• Easy to port to future generations of hardware
• Xen enabling components for Virtio and Xen PV Drivers
• Safe memory sharing using Virtio with grant table
• Virtio with grant table enables Virtio frontends in Safe VMs
• No OS/hypervisor dependencies: run (multiple) Safe VMs and QM VMs of your choice
6 |
Core features of Xen for safety certification
• Microkernel architecture
• Small code size (less than 50K LOC on Arm)
• Dom0less
• Static partitioning
• Each domain has direct hardware access (IOMMU protected)
• Real-time
• Strong isolation between the domains
• Real-time isolation
• Failure isolation
• By default, each component has only enough privilege to do what it needs to do
o Parallel boot
• Dom0 becomes optional
• Faster time to boot/service
• Real-time with the null scheduler
• Cache isolation with cache coloring
[Figure: Xen creates a SafeOS domain (e.g. Zephyr) and a QM domain (e.g. Linux); each domain accesses its own HW partition. QM == Non-Safe]
7 |
Xen Safety Progress
• Xen MISRA C compliance
• MISRA C: coding guidelines for safe C programming
• Goal: improve the Xen codebase
• MISRA C compliance is never at the expense of quality
• Requirements, Test Cases, Tests
• Define scope and requirements structure
• “market”, “product” and “software safety” requirements
• Traceability
• Link requirements, tests, and code
8 |
Xen Safety Progress: MISRA C
• Preliminary tailoring resulted in the selection of 143 MISRA C rule candidates
• MISRA C rules adoption in progress:
• 135 discussed among maintainers and Bugseng experts
• 114 rules adopted and added to docs/misra/rules.rst
• Only 6 rules left to discuss!
• Rules added to docs/misra/rules.rst
• Xen 4.18 release: 148 commits to fix MISRA C violations by
• MISRA C unjustified violations down from 2 million to 90,000!
• ECLAIR MISRA C scanner integrated in the upstream Xen GitLab CI loop
• 76 rules checked with zero unjustified violations (“clean” and checked against regressions)
• 9 more rules are also clean on arm64 only and 1 rule on x86_64 only
• 21 additional rules will also be checked against regressions (some violations are present)
9 |
Xen Safety Progress: Requirements
• Derived directly from the technical safety requirements allocated to software or
are requirements for safety functions and properties that, if not fulfilled, could lead to a
violation of the technical safety requirements allocated to software
• 400 requirements:
• Market Requirements (or L1 reqs)
• Product Requirements (or L2 reqs)
• Software Safety Requirements (or L3 reqs)
[Figure: traceability chain, left to right: Market requirements, Product requirements, Software safety requirements, Test case, Test code, Test job. Link labels, left to right: M to N, M to N, 1 to N, N to 1, 1 to 1]
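The link labels above (M to N, M to N, 1 to N, N to 1, 1 to 1) can be checked mechanically once the links are recorded as data. The following is a minimal sketch, not part of the Xen tooling: it assumes the labels apply to adjacent levels in the order listed, and the function name, the "1:N"-style notation, and all IDs are hypothetical.

```python
from collections import Counter


def check_cardinality(links, kind):
    """Check links between two adjacent levels of the chain.

    links: list of (left_id, right_id) pairs, without duplicates.
    kind:  "M:N", "1:N", "N:1" or "1:1", read left to right.
    Returns a sorted list of IDs that violate the cardinality.
    """
    left_count = Counter(left for left, _ in links)
    right_count = Counter(right for _, right in links)
    offenders = set()
    if kind in ("1:N", "1:1"):
        # Each right-hand item may have only one left-hand partner.
        offenders |= {item for item, n in right_count.items() if n > 1}
    if kind in ("N:1", "1:1"):
        # Each left-hand item may have only one right-hand partner.
        offenders |= {item for item, n in left_count.items() if n > 1}
    return sorted(offenders)


# Hypothetical links: TC-2 is wrongly attached to two software safety requirements.
ssr_to_test_case = [("SSR-1", "TC-1"), ("SSR-1", "TC-2"), ("SSR-2", "TC-2")]
test_code_to_job = [("CODE-1", "JOB-1"), ("CODE-2", "JOB-2")]

print(check_cardinality(ssr_to_test_case, "1:N"))
print(check_cardinality(test_code_to_job, "1:1"))
```

An "M:N" link never produces offenders in this sketch, which matches the idea that a product requirement may serve several market requirements and vice versa.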
10 |
Why write requirements
• Before developing a new feature, we need to answer three questions:
o What the feature is
o Why the new feature is required
o How it is designed/implemented
• Currently
o Why/What/How are explained in the commit message
o Optionally, there may be a design note explaining how
• With requirements being written as a separate entity
o Why and What are decoupled from the code, making it easy to view the big picture
o How can still be addressed in the commit message or design note.
o Linking "Why --> What --> How" ??
[Figure: Rationale (explains why) → What the feature is → How the feature is being implemented by Xen (commit message, design notes, documentation)]
11 |
Market Requirements
• Identify the scope of the safety certification for Xen
• Define the expectations of Xen for automotive and embedded use cases
o So, this is mostly of interest to the product marketing or FAE folks who have expectations from a hypervisor.
• Written with a high-level view of the system
• Example:
Name | Description
Static VM definition | Xen shall specify the resources required to boot and run safe and non-safe VMs.
Run Arm64 and AMD-x86 VMs | Xen shall run Arm64 and AMD x86 VMs.
VM device assignment | Xen shall be able to assign devices to each VM. For example, it should be able to assign the GPU to VM1 and the MMC to VM2. Only the VM assigned to a device shall have exclusive access to the device.
12 |
1 Market Requirement → N Product Requirements
• Product Requirements explain how Market Requirements are fulfilled by Xen
• Product Requirements are Xen specific
o So, this is of interest to Xen architects who understand how the requirements are fulfilled by Xen.
• Still written with a high-level view of the system
• Product Requirements can sometimes be linked to more than one Market Requirement
Emulated UART
13 |
1 Product Requirement → N Software Safety Requirements
Domain shall be able to read the frequency of the system counter (either via
CNTFRQ_EL0 register or "clock-frequency" device tree property if present).
Access virtual timer from a domain
Trigger the physical timer interrupt from a domain
Trigger the virtual timer interrupt from a domain
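A test for the system counter requirement above needs an expected value to compare against. The following is a minimal sketch of how a test helper might pick that value. It assumes, purely for illustration and not as Xen's documented behaviour, that a "clock-frequency" device tree property takes precedence over CNTFRQ_EL0 when it is present; the function name and arguments are hypothetical.

```python
def expected_counter_frequency(cntfrq_el0, dt_clock_frequency=None):
    """Return the frequency (in Hz) a domain is expected to observe.

    cntfrq_el0:         value read from the CNTFRQ_EL0 register
    dt_clock_frequency: "clock-frequency" device tree property, or None if absent
    """
    if dt_clock_frequency is not None:
        # Assumption: the device tree property, when present, wins.
        return dt_clock_frequency
    if cntfrq_el0 == 0:
        raise ValueError(
            "CNTFRQ_EL0 is not programmed and no device tree property is present"
        )
    return cntfrq_el0


print(expected_counter_frequency(100_000_000))
print(expected_counter_frequency(100_000_000, dt_clock_frequency=50_000_000))
```

The error case corresponds to a platform where neither source provides a usable frequency, which a test would report as a violated precondition rather than a failed requirement.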
14 |
Characteristics of Software Safety Requirements
• The requirements are written in plain English, from the perspective of what Xen is expected to fulfil
• The software safety requirements (SSR) are the most granular form
• Engineers are expected to refer to an SSR (and architecture spec) to write a test to validate it
o This is of interest to the subsystem maintainers or folks working on specific parts of Xen (e.g. generic timer).
• Each SSR should be tested independently
• SSR should be unambiguous, complete, consistent, correct
• SSR should be traceable all the way to market requirements
15 |
Organizing the software safety requirements
Booting
Domain Creation and Runtime
•Domain Creation
•Domain Fully Emulated Resources
•Domain Partially Emulated Resources
•Hypercalls
•Physical Resources
Firmware Physical resources
• Xen shall be able to create a domain using a specified kernel image.
• Domain shall be able to transmit data in polling mode (i.e. without involving interrupts). (Emulated UART)
• Domain shall be able to access the counter-timer kernel control register to allow controlling the access to the timer from userspace (EL0). (Generic Timer)
• Domain shall be able to access __HYPERVISOR_xen_version passing XENVER_version as a command.
• Xen shall validate the presence of mandatory SMMU features ….
• Xen shall be able to configure and use HPET timer in one shot mode.
• Xen shall be able to receive HPET interrupt.
• Xen shall be able to invoke SMCCC_VERSION as a parameter to PSCI_FEATURES to obtain the SMCCC version.
• Xen shall invoke psci (PSCI_FEATURES) to obtain the features.
• Xen shall enable Memory Management Unit.
• Xen shall use 4KB page granularity.
• Xen shall enable instruction cache.
Still something is missing ??
16 |
Assumptions of Use
Hardware
Firmware
Bootloader
Domain
Compiler
Xen relies on them to fulfill its functionality
• GCC version should be 5.1 or later (arm64)
• GCC version should be 4.1.2_20070115 or later (x86)
• Xen shall be loaded at Non-Secure EL2 exception level.
• Bootloader shall pass physical address of the host device tree in x0 register.
• The hardware needs to have the Arm Generic Interrupt Controller, version 3
• The hardware needs to have the Arm System Memory Management Unit, version 3 onwards
• Domain should not write to GICD_ISACTIVER<n> registers
• Domain should not use physical LPIs without ITS
• CNTFRQ_EL0 needs to be programmed with the system timer frequency; otherwise, the "clock-frequency" device tree property should be used.
• TF-A shall provide PSCI API (PSCI_VERSION) to read the version.
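Assumptions such as the compiler versions above lend themselves to an automated check before a build is accepted. The following is a minimal sketch, not taken from the Xen build system: the minimum versions come from the list above, while the function names and the choice to ignore the date suffix in "4.1.2_20070115" are assumptions made for illustration.

```python
# Minimum GCC versions as listed above, keyed by architecture.
MIN_GCC = {"arm64": (5, 1), "x86": (4, 1, 2)}


def parse_version(text):
    """Turn '4.1.2_20070115' or '12.3.0' into a tuple of ints.

    Assumption: any suffix after '_' (such as a snapshot date) is ignored.
    """
    core = text.split("_", 1)[0]
    return tuple(int(part) for part in core.split("."))


def pad(version, width):
    """Right-pad a version tuple with zeros so tuples of equal length are compared."""
    return version + (0,) * (width - len(version))


def gcc_meets_assumption(version_text, arch):
    """Return True if the given GCC version satisfies the assumption for arch."""
    found = parse_version(version_text)
    required = MIN_GCC[arch]
    width = max(len(found), len(required))
    return pad(found, width) >= pad(required, width)


print(gcc_meets_assumption("12.3.0", "arm64"))
print(gcc_meets_assumption("4.1.1", "x86"))
```

Padding matters because Python compares tuples element by element, so (5, 1) and (5, 1, 0) would otherwise not be treated as the same version.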
17 |
Software Architecture Specification
• Defines the major elements and subsystems of the
software, how they are interconnected, and how the
required attributes, particularly safety integrity, will be
achieved
• Defines the overall behaviour of the software, and how
software elements interface and interact
• Satisfies both safety and non-safety requirements
• Initial version of the document written
18 |
Software Validation Test Cases
• Validation - "confirmation by examination and provision of objective evidence that the particular
requirements for a specific intended use are fulfilled" (IEC 61508)
• 120 test cases (written as RST docs):
• Define:
§ test objectives
§ test prerequisites
§ steps required to achieve the objectives
§ test pass/fail criteria
Methods | ASIL A | ASIL B | ASIL C | ASIL D
1a Analysis of requirements | ++ | ++ | ++ | ++
1b Generation and analysis of equivalence classes | + | ++ | ++ | ++
1c Analysis of boundary values | + | + | ++ | ++
1d Error guessing based on knowledge or experience | + | + | ++ | ++
1e Analysis of functional dependencies | + | + | ++ | ++
1f Analysis of operational use cases | + | ++ | ++ | ++
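Methods 1b and 1c in the table above can be illustrated for a single integer input. The following is a minimal sketch, assuming a hypothetical parameter whose valid range is 1 to 8; the range and the function names are illustrative and do not come from any Xen requirement.

```python
def boundary_values(low, high):
    """Values at and adjacent to each edge of the valid range [low, high]."""
    return sorted({low - 1, low, low + 1, high - 1, high, high + 1})


def equivalence_classes(low, high):
    """One representative per class: below the range, inside it, above it."""
    return {"below": low - 1, "valid": (low + high) // 2, "above": high + 1}


print(boundary_values(1, 8))
print(equivalence_classes(1, 8))
```

A test case derived this way would exercise each listed value and state the expected outcome (accept or reject) as its pass/fail criterion.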
19 |
Software Validation Tests
• 160 tests
• Verification criteria:
o Compliance with software design spec
o Correct implementation of the functionality
o Robustness check
o Absence of unintended functionality
• 3 types of tests:
o Linux
§ designed for high-level functionality testing through a black-box approach
§ tests written as userspace apps, kernel modules
§ example: device tree parsing testing
o Zephyr
§ target mid-level functionalities with a focus on components such as UART, Timer, etc.
§ tests written as Zephyr applications
§ example: emulated UART testing
o XTF
§ designed for low-level functionality testing, suited to examining core functionality such as hypercalls
§ tests written as XTF guests
§ example: event channel hypercall testing
Methods | ASIL A | ASIL B | ASIL C | ASIL D
1a Requirements based test | ++ | ++ | ++ | ++
1b Interface test | ++ | ++ | ++ | ++
1c Fault injection test | + | + | ++ | ++
1d Resource usage evaluation | ++ | ++ | ++ | ++
1e Back-to-back comparison test between model and code, if applicable | + | + | ++ | ++
1f Verification of the control flow and data flow | + | + | ++ | ++
1g Static code analysis | ++ | ++ | ++ | ++
1h Static analysis based on abstract interpretation | + | + | + | +
20 |
Xen Safety Progress: Traceability
• OFT (OpenFastTrace) selected as a requirements management tool
• Requirements-as-Code methodology
• Detect missing dependencies (missing links)
• Detect obsolete requirements (old versions)
• Generate traceability reports
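The two detection steps above (missing links and obsolete versions) can be sketched in a few lines. This is an illustration of the idea only, not OpenFastTrace itself: the identifiers follow the type~name~revision shape that OpenFastTrace uses, but the artifact types (mreq, preq, sreq, test), the item names, and the dictionary layout are hypothetical.

```python
def parse_id(item_id):
    """Split 'type~name~revision' into (type, name, int(revision))."""
    kind, name, revision = item_id.split("~")
    return kind, name, int(revision)


def trace(items):
    """items: dict mapping item ID -> {"needs": [types], "covers": [IDs]}.

    Returns (missing, outdated):
      missing  - (item ID, needed type) pairs with no covering item of that type
      outdated - (covering ID, covered ID) pairs that point at a revision
                 other than the one currently defined
    """
    current = {}  # (type, name) -> revision currently defined
    for item_id in items:
        kind, name, revision = parse_id(item_id)
        current[(kind, name)] = revision

    covered_by = {item_id: set() for item_id in items}
    outdated = []
    for item_id, item in items.items():
        for target in item["covers"]:
            kind, name, _ = parse_id(target)
            if target in items:
                covered_by[target].add(parse_id(item_id)[0])
            elif (kind, name) in current:
                outdated.append((item_id, target))

    missing = [
        (item_id, needed)
        for item_id, item in items.items()
        for needed in item["needs"]
        if needed not in covered_by[item_id]
    ]
    return sorted(missing), sorted(outdated)


# Hypothetical items: the software safety requirement still covers revision 1
# of a product requirement that has since moved to revision 2.
ITEMS = {
    "mreq~static-vm~1": {"needs": ["preq"], "covers": []},
    "preq~dom0less-boot~2": {"needs": ["sreq"], "covers": ["mreq~static-vm~1"]},
    "sreq~create-domain~1": {"needs": ["test"], "covers": ["preq~dom0less-boot~1"]},
}

missing, outdated = trace(ITEMS)
print(missing)
print(outdated)
```

Because the stale link does not count as coverage, the revised product requirement shows up as missing a software safety requirement until the link is updated, which is the behaviour a traceability report is meant to surface.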
21 |
Community Collaboration
How can the community participate
• Keep doing the stringent code reviews
• Writing and upstreaming requirements
• Writing and upstreaming architecture specs
How does the community benefit
• Better code quality
• Ease of onboarding new engineers
• Easier to explain to customers, FAE, management
22
Thank you

Using Xen Hypervisor for Functional Safety
