Xen in Safety-Critical Systems
Stefano Stabellini & Bertrand Marquis
Critical Summit NA 2022
What is safety?
• “Safety critical embedded software applications are developed for systems whose
failures contribute to hazards in the system for safety of life”
• Safety certifications (ISO 26262)
• Strict coding guidelines (MISRA C)
• Strict testing and documentation requirements
Why Xen matters for safety
• It is common to have a mix of critical and non-critical
components (mixed-criticality)
• Xen has been enabling mixed-criticality workloads for years
• Componentization
• Highly secure environments
• Isolate critical apps from non-critical apps
• Separate the work environment from the personal environment on a laptop, e.g. Qubes OS and OpenXT
• Xen as a static partitioning tool for embedded systems
• from industrial to medical and automotive
• the real-time (critical) domain must be isolated from the non-critical ones
• the real-time domain cannot miss any deadlines
• Safety-critical systems are mixed-criticality systems
[Diagram: Xen hosting one Critical domain and two Non-Critical domains]
Why Xen is a good fit for safety
• Small codebase (fewer than 50K LOC)
• Micro-kernel architecture
• only the hypervisor requires privileges
• dom0 is now optional
• Xen supports disaggregation and driver domains: large amounts of code run unprivileged
• No need for a privileged “dom0” environment
• Linux is not required
• Supports real-time and cache coloring
• Thorough review process
• Thorough security process
[Diagram: Zephyr and Linux domains running on Xen]
Example: Industrial
• Xen static partitioning configuration with dom0less
• 2 domains with hardware directly assigned
• no dom0
• 1 Linux VM: networking and cloud
• 1 Zephyr VM: motor controller application with real-time requirements
[Diagram: Xen running a Linux VM (networking, cloud APIs) and a Zephyr VM (real-time controller)]
Example: Automotive
• Xen static partitioning configuration with dom0less, plus a dom0 for monitoring
• 4 domains created statically at boot time
• 1 minimal dom0 VM (Zephyr): system monitoring
• 1 Linux VM: infotainment
• 1 Zephyr VM: real-time sensor processing
• 1 Instrument Cluster VM
[Diagram: Xen running a Zephyr mini-dom0, a Linux infotainment VM, an Instrument Cluster VM and a Zephyr real-time sensors VM]
Real-time and Xen
• What is real-time?
• Real-time does not mean fast
• "I will answer in 5 ms on average" is not real-time
• Real-time is a guarantee on the maximum time to respond
• "I will respond to an event in no more than 100 ms"
• Why does it matter in a safety context?
• Safety usually implies time constraints
• If I detect a wall, I must stop before the wall
• How long does it take to apply the brakes?
• If longer ….
[Chart: “My system” response-time distribution over the buckets <1 ms, 1-5 ms, 5-95 ms, 95-100 ms and >100 ms]
• In safety software: Worst Case Execution Time (WCET)
• Established by demonstration, not by test
• The worst case is usually impossible to trigger by testing
• For Xen, several topics are being investigated
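To make the "maximum, not average" distinction concrete, here is a minimal sketch that summarizes a set of measured response times and checks them against a deadline. The function names and the microsecond unit are illustrative assumptions, not part of Xen. Note that a measured maximum is only a lower bound on the real WCET, which is why safety arguments rely on demonstration rather than test.

```c
#include <stddef.h>
#include <stdint.h>

/* Summary of measured response times, in microseconds (hypothetical unit). */
typedef struct {
    uint64_t mean_us;
    uint64_t max_us;
} latency_summary;

latency_summary summarize(const uint64_t *samples_us, size_t n)
{
    latency_summary s = { 0u, 0u };
    uint64_t sum = 0u;

    for (size_t i = 0u; i < n; i++) {
        sum += samples_us[i];
        if (samples_us[i] > s.max_us) {
            s.max_us = samples_us[i];
        }
    }
    if (n > 0u) {
        s.mean_us = sum / n;
    }
    return s;
}

/* Real-time means every response meets the deadline, so the check uses the
 * maximum, never the mean. The observed maximum is still only a lower bound
 * on the true WCET, which has to be established by analysis. */
int meets_deadline(const uint64_t *samples_us, size_t n, uint64_t deadline_us)
{
    return (summarize(samples_us, n).max_us <= deadline_us) ? 1 : 0;
}
```

For example, samples of {200, 200, 200, 200, 101000} µs average about 20 ms, well under a 100 ms deadline, yet the set fails `meets_deadline` because one response took 101 ms.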
Real-time - Interrupts
• Interrupt latency
• Maximum time until a guest receives an IRQ
• Depends on the time required by the guest
• Depends on the hardware
• Context of the analysis
• Arm64
• Guest alone on its core
• Zephyr as guest
• Timer interrupt
• Procedure
• Code analysis/inspection
• Confirmation with tracing on a real target
[Diagram: the timer IRQ is taken by Xen, which forwards it to the Zephyr timer IRQ handler]
Real-time - Interrupts
• Overall result: 1090 instructions
• Save guest context (CPU and IRQ controller): 356
• Xen IRQ handler: 144
• Xen virtual timer: 360
• Xen exit IRQ handler: 97
• Restore guest: 133
• Assumptions and limitations
• No hypercalls from the real-time guest
• No interaction with guests on other cores
• After the Xen init phase
• Fixed configuration (guests, communication, etc.)
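The overall figure is the sum of the per-phase instruction counts listed above (356 + 144 + 360 + 97 + 133 = 1090). The sketch below encodes that path as a table and sums it. The conversion to nanoseconds is a hypothetical back-of-the-envelope helper: the cycles-per-instruction and clock values are assumptions supplied by the caller, and it ignores memory and cache effects, so it is not a substitute for the published analysis.

```c
#include <stddef.h>
#include <stdint.h>

/* Per-phase instruction counts for the timer IRQ path, as listed on the slide. */
typedef struct {
    const char *phase;
    uint32_t    instructions;
} irq_phase;

const irq_phase timer_irq_path[] = {
    { "save guest context (CPU + IRQ controller)", 356u },
    { "Xen IRQ handler",                           144u },
    { "Xen virtual timer",                         360u },
    { "Xen exit IRQ handler",                       97u },
    { "restore guest",                             133u },
};

const size_t timer_irq_path_len = sizeof timer_irq_path / sizeof timer_irq_path[0];

/* Worst-case instruction count of a path: the sum of its phases. */
uint32_t path_instructions(const irq_phase *path, size_t n)
{
    uint32_t total = 0u;

    for (size_t i = 0u; i < n; i++) {
        total += path[i].instructions;
    }
    return total;
}

/* Hypothetical conversion: cpi_x100 is an assumed worst-case cycles per
 * instruction times 100 (integer math), cpu_mhz an assumed core clock.
 * ns = instructions * CPI * 1000 / MHz. Ignores memory/cache stalls. */
uint64_t path_time_ns(uint32_t instructions, uint32_t cpi_x100, uint32_t cpu_mhz)
{
    uint64_t cycles_x100 = (uint64_t)instructions * cpi_x100;

    return (cycles_x100 * 1000u) / ((uint64_t)cpu_mhz * 100u);
}
```

With an assumed CPI of 1.0 at 1000 MHz, `path_time_ns(1090u, 100u, 1000u)` gives 1090 ns; doubling the assumed CPI doubles the estimate, which shows how sensitive such a conversion is to its assumptions.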
[Diagram: timer IRQ path through Xen (enter IRQ handler, save guest context, virtual timer, restore guest context, exit IRQ handler) to the Zephyr timer IRQ handler]
Real-time - Interrupts
• Issues and future work
• IPI interrupts and Xen RCU tasks
• Limited to a guest isolated on its own core
• WFI/WFE handling disabled (impacts power consumption)
• No PV drivers
• No hypercalls in the real-time guest
• Status: Full analysis is public, link
Real-time – MPU support
• MMU hard to use for real-time
• TLB miss -> page table walk
• Influence of other cores (TLB sync)
• Influence of other guests (TLB miss)
[Diagram: the MMU translates VA to PA through a TLB backed by L1/L2/L3 page tables; the MPU maps VA to PA using region registers]
• MPU
• No translation (1:1 mapping)
• Register based (no page tables)
• No TLB or page-table caching effects
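A minimal software model of why an MPU is easier to bound: permission checks consult a small, fixed set of region registers, with no translation and no page-table walk. The region count, field names and first-match rule below are illustrative assumptions; real Arm MPUs compare regions in hardware and have their own architectural rules for overlapping regions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MPU model: a fixed number of region registers. Addresses are
 * used as-is (1:1), so there is no translation and no page-table walk. */
#define MPU_NUM_REGIONS 4u

typedef struct {
    bool     enabled;
    uint64_t base;
    uint64_t limit;      /* inclusive upper bound */
    bool     writable;
} mpu_region;

/* Returns true if the access is permitted. The cost is bounded by
 * MPU_NUM_REGIONS comparisons regardless of what other cores or guests do,
 * unlike a TLB miss that triggers a multi-level page-table walk.
 * This sketch uses the first matching region. */
bool mpu_check(const mpu_region regions[MPU_NUM_REGIONS], uint64_t addr, bool is_write)
{
    for (size_t i = 0u; i < MPU_NUM_REGIONS; i++) {
        const mpu_region *r = &regions[i];

        if (r->enabled && (addr >= r->base) && (addr <= r->limit)) {
            return (!is_write) || r->writable;
        }
    }
    return false; /* no matching region: the access faults */
}
```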
Real-time and Xen – MPU support
• Arm Cortex-R (Cortex-R82)
• Both MMU and MPU
• EL2 (Xen): MPU
• For Xen
• Controls the memory each guest is allowed to access
• EL1 (RTOS): MPU
• Real-time
• EL1 (Linux): MMU
• Not real-time
• Coexistence of MPU and MMU guests
• Xen and RTOS real-time
• Linux or another non-real-time OS running on the same system
• Status: proof of concept available
• Upstreaming into Xen ongoing
[Diagram: Linux and Zephyr guests on Xen]
Real-time – Cache coloring
[Diagram: four cores, each with a private L1 cache, sharing one L2 cache in front of DDR]
Real-time – Cache coloring
• CPU clusters often share an L2 cache
• Interference via the L2 cache affects performance
• App0 running on CPU0 can cause cache-entry evictions, which affect App1 running on CPU1
• App1 running on CPU1 could miss a deadline due to App0’s behavior
• This can happen between apps running on the same OS and between VMs on the same hypervisor
• Hypervisor solution: cache partitioning, AKA cache coloring
• Each VM gets its own allocation of cache entries
• No shared cache entries between VMs
• Allows real-time apps to run with deterministic IRQ latency
• 3 µs IRQ latency
[Diagram: the shared L2 cache and DDR partitioned among cores 1-4]
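The core idea of page coloring can be sketched in a few lines: in a physically indexed shared cache, the page-number bits that also select the cache set define a page's "color". Pages of different colors cannot compete for the same sets. The 4 KiB page size, the example 1 MiB 16-way L2 and the bitmask representation are assumptions for illustration, not Xen's actual interface.

```c
#include <stdint.h>

#define PAGE_SHIFT 12u                    /* assumption: 4 KiB pages */
#define PAGE_SIZE  (1u << PAGE_SHIFT)

/* Number of colors = size of one cache way / page size. */
uint32_t num_colors(uint32_t cache_size, uint32_t ways)
{
    uint32_t way_size = cache_size / ways;

    return way_size / PAGE_SIZE;
}

/* Pages with the same color map to the same cache sets. */
uint32_t page_color(uint64_t paddr, uint32_t colors)
{
    return (uint32_t)((paddr >> PAGE_SHIFT) % colors);
}

/* In this sketch a VM owns a bitmask of colors (at most 32) and may only
 * receive pages whose color is in its mask, so VMs with disjoint masks never
 * evict each other's L2 lines. */
int page_allowed(uint64_t paddr, uint32_t colors, uint32_t color_mask)
{
    return (int)((color_mask >> page_color(paddr, colors)) & 1u);
}
```

With the assumed 1 MiB, 16-way L2 there are 16 colors; giving one VM colors 0-7 (mask `0xFF`) and another colors 8-15 (mask `0xFF00`) splits the cache in half, at the cost of each VM only being able to use half of the physical pages.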
Static configuration with Xen
• What is it?
• Defining the whole system (guests and communication) statically
• Why does it matter in a safety context?
• No random behaviour
• Same after reboot (target, task, guest, etc.)
• Example: application on the same core, at the same address in memory
• Reduce the amount of testing
• Limit possibilities
• Example: compile in only the functions used, compile out the rest
• No dynamic behaviour
• Limit complexity
• Example: allocation at boot or static, no free
• Conclusion: reduced certification costs
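The "allocation at boot or static, no free" pattern can be illustrated with a boot-time bump allocator: memory is handed out from a fixed arena during initialization, there is deliberately no free function, and further allocation is refused once init has finished. All names and the arena size are hypothetical; this is a sketch of the pattern, not Xen's allocator.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical boot-time arena: allocate during init only, never free. */
#define BOOT_ARENA_SIZE 4096u

static uint8_t boot_arena[BOOT_ARENA_SIZE];
static size_t  boot_used;
static bool    boot_done;

/* align must be a non-zero power of two. Returns NULL after init, on a bad
 * alignment, or when the arena is exhausted. */
void *boot_alloc(size_t size, size_t align)
{
    if (boot_done || (align == 0u) || ((align & (align - 1u)) != 0u)) {
        return NULL;
    }

    uintptr_t base    = (uintptr_t)boot_arena;
    uintptr_t cur     = base + boot_used;
    uintptr_t aligned = (cur + (align - 1u)) & ~(uintptr_t)(align - 1u);
    size_t    offset  = (size_t)(aligned - base);

    if ((offset > BOOT_ARENA_SIZE) || (size > (BOOT_ARENA_SIZE - offset))) {
        return NULL;
    }
    boot_used = offset + size;
    return (void *)aligned;
}

/* Called once when initialization is complete; the layout is now frozen. */
void boot_alloc_finish(void)
{
    boot_done = true;
}
```

Because the sequence of boot-time requests is fixed by the static configuration, the resulting memory layout is the same on every boot, and there is no free-list behaviour to analyze at run time.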
Static configuration – dom0less
• Define the system during the design phase
• How many guests
• With what characteristics
• memory, device access, CPUs
• Create them directly at boot
• Defined in configuration (device-tree)
• Advantages for safety
• No need for a complex dom0
• No dependency on Linux (not certifiable)
• Faster boot
• Guests start directly at boot
• Reduced system complexity
• No dynamic guest creation
• Status: available
[Diagram: Linux and Zephyr guests started by Xen at boot]
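Device tree:
domU1 {
    #address-cells = <1>;
    #size-cells = <1>;
    compatible = "xen,domain";
    memory = <0 0x20000>;            /* 64-bit amount in KB: 0x20000 KB = 128 MiB */
    cpus = <1>;                      /* number of vCPUs */
    vpl011;                          /* virtual PL011 UART for the guest console */
    module@2000000 {
        compatible = "multiboot,kernel", "multiboot,module";
        reg = <0x2000000 0xffffff>;  /* location and size of the kernel image */
        bootargs = "console=ttyAMA0";
    };
};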
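Device-tree properties are stored as 32-bit big-endian cells, and a 64-bit value such as `memory = <0 0x20000>` is the high cell followed by the low cell. The sketch below decodes such a property and converts it to bytes, assuming (as the comment in the snippet above states) that the dom0less `memory` property counts kilobytes. The helper names are hypothetical.

```c
#include <stdint.h>

/* Flattened device-tree cells are stored as 32-bit big-endian words. */
uint32_t dt_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* A two-cell value such as <0 0x20000> is (high << 32) | low. */
uint64_t dt_cells_to_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* Guest RAM in bytes from the raw 8-byte "memory" property,
 * assuming the value is a count of kilobytes. */
uint64_t guest_mem_bytes(const uint8_t prop[8])
{
    return dt_cells_to_u64(dt_be32(prop), dt_be32(prop + 4)) * 1024u;
}
```

By the same rule, the `memory = <0x0 0x80000>` used in the static memory example corresponds to 512 MiB, which matches the 0x20000000-byte `xen,static-mem` region given to that domain.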
Static configuration – memory
• Define the address and size of memory regions
• For guest memory
• For the Xen heap
• Internal Xen allocations
• For the Xen guest heap
• Xen allocations related to a guest
• Defined in configuration (device-tree)
• Advantages for safety
• System or guest identical upon reboot
• Reduce possible interferences
• A guest cannot impact another
• Adding a guest in the future
• Current guests unchanged
• Incremental certification
• Status: available in the next Xen release
[Diagram: Xen assigning separate static memory regions to the Zephyr and Linux guests]
Device tree:
domU1 {
    compatible = "xen,domain";
    #address-cells = <0x2>;
    #size-cells = <0x2>;
    cpus = <2>;
    memory = <0x0 0x80000>;                      /* 0x80000 KB = 512 MiB */
    #xen,static-mem-address-cells = <0x1>;
    #xen,static-mem-size-cells = <0x1>;
    xen,static-mem = <0x30000000 0x20000000>;    /* base 0x30000000, size 512 MiB */
    ...
};
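One practical benefit of static memory is that the whole layout can be checked offline. The sketch below reads a `(base, size)` pair using the cell counts given by `#xen,static-mem-address-cells` and `#xen,static-mem-size-cells`, then verifies that no two regions overlap, so no guest is placed on another guest's memory. The types and function names are hypothetical, and the sketch assumes `base + size` does not wrap around.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint64_t base;
    uint64_t size;
} mem_region;

/* Read one (base, size) pair from host-order cells, with 1 or 2 cells
 * for the address and 1 or 2 cells for the size. */
mem_region read_region(const uint32_t *cells, uint32_t addr_cells, uint32_t size_cells)
{
    mem_region r = { 0u, 0u };

    for (uint32_t i = 0u; i < addr_cells; i++) {
        r.base = (r.base << 32) | cells[i];
    }
    for (uint32_t i = 0u; i < size_cells; i++) {
        r.size = (r.size << 32) | cells[addr_cells + i];
    }
    return r;
}

/* Half-open intervals [base, base + size) overlap test. */
bool regions_overlap(mem_region a, mem_region b)
{
    return (a.base < (b.base + b.size)) && (b.base < (a.base + a.size));
}

/* True if no two regions in the table overlap. */
bool layout_is_disjoint(const mem_region *regions, size_t n)
{
    for (size_t i = 0u; i < n; i++) {
        for (size_t j = i + 1u; j < n; j++) {
            if (regions_overlap(regions[i], regions[j])) {
                return false;
            }
        }
    }
    return true;
}
```

A build-time check like this fits the "incremental certification" argument: adding a guest only requires showing that its new region is disjoint from the existing ones.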
Static configuration – communication
[Diagram: Linux and Zephyr guests on Xen sharing a memory region (SHM) and signaling through an event channel]
Device tree:
shared-mem@10000000 {
    compatible = "xen,domain-shared-memory-v1";
    role = "owner";
    xen,shm-id = <0x0>;
    xen,shared-mem = <0x10000000 0x10000000 0x10000000>;
};
domU1 {
    ….
    domU1-shared-mem@10000000 {
        compatible = "xen,domain-shared-memory-v1";
        role = "borrower";
        xen,shm-id = <0x0>;
        xen,shared-mem = <0x10000000 0x10000000 0x50000000>;
    };
    ….
};
• Xenbus is too complex for safety
• Needs dom0 or Linux
• Drivers are complex
• One guest accesses another guest's memory
• Several components for safety (and not only for safety)
• Static shared memory
• Area of memory accessible by several guests
• Defined in configuration (device-tree)
• Static event channels
• Let one guest notify (ping) another
• Defined in configuration (device-tree)
• Any protocol can be built on top
• Status: available in the next Xen release
• Linux support
• Zephyr support
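As an example of "any protocol can be built on top", here is a minimal single-producer/single-consumer ring that could live at the start of a static shared-memory region, with a caller-supplied notify hook standing in for sending a static event channel. The layout, sizes and names are hypothetical; the sketch assumes both guests see the region as coherent memory and that 32-bit C11 atomics are lock-free (true on Arm64).

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout at the start of the shared region. One guest only
 * sends, the other only receives. */
#define RING_SLOTS 16u   /* power of two, so index wrap-around stays consistent */
#define MSG_SIZE   32u

typedef struct {
    _Atomic uint32_t head;               /* written by the producer only */
    _Atomic uint32_t tail;               /* written by the consumer only */
    uint8_t slots[RING_SLOTS][MSG_SIZE];
} shm_ring;

/* Stand-in for an event-channel notification via the guest OS's own API. */
typedef void (*notify_fn)(void);

bool ring_send(shm_ring *r, const void *msg, uint32_t len, notify_fn notify)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if ((len > MSG_SIZE) || ((head - tail) >= RING_SLOTS)) {
        return false;                    /* message too large or ring full */
    }
    uint8_t *slot = r->slots[head % RING_SLOTS];
    memcpy(slot, msg, len);
    memset(slot + len, 0, MSG_SIZE - len);   /* no stale bytes in the slot */
    /* Release: the payload is visible before the new head is. */
    atomic_store_explicit(&r->head, head + 1u, memory_order_release);
    if (notify != NULL) {
        notify();
    }
    return true;
}

/* Copies one MSG_SIZE-byte message into out. Returns false if empty. */
bool ring_recv(shm_ring *r, void *out)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail) {
        return false;
    }
    memcpy(out, r->slots[tail % RING_SLOTS], MSG_SIZE);
    atomic_store_explicit(&r->tail, tail + 1u, memory_order_release);
    return true;
}
```

Keeping each index single-writer avoids any locking between guests; the receiver would typically call `ring_recv` from its event-channel handler until it returns false.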
Static configuration - cpupools
• Define which core(s) can be used by whom
• Xen cpupool: a pool of cores
• One or several cores
• Scheduler to use
• A guest can be assigned to a cpupool
• Several guests can be in one cpupool
• Schedulers are independent between cpupools
• A core can only be assigned to one cpupool
• Defined in configuration (device-tree)
• Advantages for safety
• Static core assignment
• Isolation between guests
• Scheduler per cpupool
• Status: available in the next Xen release
[Diagram: Xen with two cpupools, Pool-0 and Pool-1, hosting Linux1, Linux2 and Zephyr]
Device tree:
cp0:cpupool0 {
    compatible = "xen,cpupool";
    cpupool-cpus = <&a72_0 &a72_1>;
    cpupool-sched = "credit2";
};
domU1 {
    #address-cells = <1>;
    #size-cells = <1>;
    compatible = "xen,domain";
    memory = <0 0x20000>;
    cpus = <1>;
    domain-cpupool = <&cp0>;
    …
};
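The rule "a core can only be assigned to one cpupool" is easy to check on a static configuration. In the sketch below each pool is a bitmask of physical cores (for instance, `cpupool-cpus = <&a72_0 &a72_1>` might become `0x3` under an assumed core numbering) plus a scheduler name. The representation and function names are hypothetical; this is not how Xen stores cpupools internally.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical static cpupool description: bit n set = core n in the pool. */
typedef struct {
    uint32_t    cpus;
    const char *sched;
} cpupool_cfg;

/* Each pool must have at least one core and no core may appear in two pools. */
bool cpupools_valid(const cpupool_cfg *pools, size_t n)
{
    uint32_t seen = 0u;

    for (size_t i = 0u; i < n; i++) {
        if ((pools[i].cpus == 0u) || ((seen & pools[i].cpus) != 0u)) {
            return false;
        }
        seen |= pools[i].cpus;
    }
    return true;
}

/* Index of the pool owning a core, or -1 if this table does not list it
 * (cores are limited to 0-31 in this sketch). */
int cpupool_of(const cpupool_cfg *pools, size_t n, uint32_t core)
{
    if (core >= 32u) {
        return -1;
    }
    for (size_t i = 0u; i < n; i++) {
        if (((pools[i].cpus >> core) & 1u) != 0u) {
            return (int)i;
        }
    }
    return -1;
}
```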
Safety Certification Activities
• Xen can already be safety-certified, but at what cost?
• It has already been done
• Safety experts have analyzed the code and deemed it safety-certifiable
• Requires significant downstream work
• GOAL: make it easier for users to deploy Xen in safety environments
• "safety-certifiable", not safety-certified
• users can fill the gaps
• we can be flexible: it is OK to decide not to follow certain rules
• let's focus on what we do best: robustness of the code
• Clarity: What does Xen support? What's missing?
• Users should be able to estimate precisely the work required
Code First
• Robustness and Safety of the code
• Code is Xen Project's primary output
• The most important item for safety-certifications is robustness/safety of the codebase
• Documentation, requirements, and tests can be more easily outsourced
• Main code safety aspects:
• Coding style and MISRA C rules
• Determinism: deterministic IRQ handling and memory allocations
• Enhanced Kconfig for a smaller codebase (less to certify)
• Why MISRA C?
• A de facto standard in all industry sectors
• Maintained and backed by an authoritative organization (the MISRA consortium)
• A pragmatic approach and a perfect match for Xen: MISRA documents clearly state that code quality should never be sacrificed for compliance (deviation process)
MISRA C: status
• MISRA C tailoring completed: ~100 rules considered relevant for Xen
• MISRA C rule adoption in progress: ~15/100 rules
• Xen already follows many MISRA C rules, just not officially
• Add the rules Xen already follows to CODING_STYLE
• Discuss the others
• Decide we follow a rule, add it to CODING_STYLE, check for it using MISRA C scanners
• Decide we follow a rule with deviations
• deviations are intentional and documented exceptions to the rule
• document deviations with in-code or out-of-code comments so that MISRA C scanners will “ignore” them
• check for the rule automatically using MISRA C scanners
• Do not follow the rule and do not scan for it
• cannot be scanned automatically by static analyzers
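To give a feel for what rule-following code looks like, here is a small function written in the style of MISRA C:2012 Rule 15.6 (the body of every `if`, loop and `switch` is a compound statement), Rule 16.4 (every `switch` has a `default` label) and Rule 15.5 (a single point of exit). It is an illustrative sketch only; it does not reflect which rules Xen has adopted, and the type and function names are hypothetical.

```c
#include <stdint.h>

typedef enum { STATE_IDLE, STATE_RUNNING, STATE_FAULT } vm_state;

uint32_t state_to_code(vm_state s)
{
    uint32_t code;

    switch (s) {
    case STATE_IDLE:
        code = 0u;
        break;
    case STATE_RUNNING:
        code = 1u;
        break;
    case STATE_FAULT:
        code = 2u;
        break;
    default:
        /* Defensive: unreachable for valid enum values, but an out-of-range
         * value (e.g. from corrupted memory) still gets a defined result. */
        code = UINT32_MAX;
        break;
    }

    return code;   /* single point of exit */
}
```

The `default` branch is a typical case where the rule also serves defensive programming: a scanner can check its presence automatically, and any intentional exception would be recorded as a documented deviation.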
MISRA C: status
• Benefits:
• Static code analyzers available to check for the rules, from ECLAIR to cppcheck
• Check individual patches in advance, before review even starts
• Ease code reviews & reduce maintainers' work
• Improve code quality
• Reduce defects
• Working with Roberto Bagnara and BUGSENG (ECLAIR)
• Improve existing coding style and coding conventions in Xen
• Improve safety of the code
• Improve code security – defensive programming
• Widen compiler compatibility
• Ensure we do not violate the C99 standard
• Ensure we do not unknowingly use language extensions that may not be available in other compilers
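Two common GNU C extensions and their ISO C99 equivalents illustrate the "unknowingly use language extensions" point. The GNU forms are shown only in comments; the compiled code is plain C99. The names `max_u32`, `msg` and `msg_size` are hypothetical examples, not Xen code.

```c
#include <stddef.h>
#include <stdint.h>

/* GNU C only (statement expression + typeof):
 *   #define MAX(a, b) ({ typeof(a) a_ = (a); typeof(b) b_ = (b); a_ > b_ ? a_ : b_; })
 * Portable C99 alternative: a typed static inline function, which also
 * evaluates each argument exactly once. */
static inline uint32_t max_u32(uint32_t a, uint32_t b)
{
    return (a > b) ? a : b;
}

/* GNU zero-length array (extension):  uint8_t data[0];
 * ISO C99 flexible array member:      uint8_t data[];  */
typedef struct {
    uint32_t len;
    uint8_t  data[];
} msg;

/* Bytes needed for a msg carrying len payload bytes. */
size_t msg_size(uint32_t len)
{
    return offsetof(msg, data) + (size_t)len;
}
```

Compiling with `gcc -std=c99 -pedantic` should report the GNU forms in the comments (braced groups within expressions, zero-size arrays) if they are used in real code, which is a cheap complement to a MISRA scanner for catching extensions.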
Tooling
• Cppcheck
• Available to any developer without a license
• Good for pre-submission checks
• Open source
• Good coverage but not 100%
• ECLAIR
• 100% coverage of MISRA C:2012 with very high accuracy
• Automatically adapts to the toolchain to capture all implementation-defined aspects of the language
• Very detailed and actionable reports
• Made publicly available by BUGSENG at http://eclairit.com
ECLAIR
• Enter the system by clicking “See ECLAIR in action”
Future Work
• Deterministic interrupt handling code path
• Deterministic memory allocations at system boot
• Further reducing code size via Kconfig
• Complete MISRA C rule adoption and fix violations
• Documentation
• Testing
• Xen Test Framework “XTF”
• GitLab CI
Questions?

More Related Content

What's hot

Fosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using VirtualizationFosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using Virtualization
The Linux Foundation
 

What's hot (20)

Rootlinux17: An introduction to Xen Project Virtualisation
Rootlinux17:  An introduction to Xen Project VirtualisationRootlinux17:  An introduction to Xen Project Virtualisation
Rootlinux17: An introduction to Xen Project Virtualisation
 
OSSNA18: Xen Beginners Training
OSSNA18: Xen Beginners Training OSSNA18: Xen Beginners Training
OSSNA18: Xen Beginners Training
 
ALSS14: Xen Project Automotive Hypervisor (Demo)
ALSS14: Xen Project Automotive Hypervisor (Demo)ALSS14: Xen Project Automotive Hypervisor (Demo)
ALSS14: Xen Project Automotive Hypervisor (Demo)
 
Dom0less - Xen Developer Summit 2019
Dom0less  - Xen Developer Summit 2019Dom0less  - Xen Developer Summit 2019
Dom0less - Xen Developer Summit 2019
 
Fosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using VirtualizationFosdem 18: Securing embedded Systems using Virtualization
Fosdem 18: Securing embedded Systems using Virtualization
 
Xen Memory Management
Xen Memory ManagementXen Memory Management
Xen Memory Management
 
XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...
XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...
XPDS13: Xen in OSS based In–Vehicle Infotainment Systems - Artem Mygaiev, Glo...
 
XPDDS18: Design and Implementation of Automotive: Virtualization Based on Xen...
XPDDS18: Design and Implementation of Automotive: Virtualization Based on Xen...XPDDS18: Design and Implementation of Automotive: Virtualization Based on Xen...
XPDDS18: Design and Implementation of Automotive: Virtualization Based on Xen...
 
RunX: deploy real-time OSes as containers at the edge
RunX: deploy real-time OSes as containers at the edgeRunX: deploy real-time OSes as containers at the edge
RunX: deploy real-time OSes as containers at the edge
 
Cache coloring Xen Summit 2020
Cache coloring Xen Summit 2020Cache coloring Xen Summit 2020
Cache coloring Xen Summit 2020
 
BKK16-315 Graphics Stack Update
BKK16-315 Graphics Stack UpdateBKK16-315 Graphics Stack Update
BKK16-315 Graphics Stack Update
 
Xen Debugging
Xen DebuggingXen Debugging
Xen Debugging
 
Safety-Certifying Open Source Software: The Case of the Xen Hypervisor
Safety-Certifying Open Source Software: The Case of the Xen HypervisorSafety-Certifying Open Source Software: The Case of the Xen Hypervisor
Safety-Certifying Open Source Software: The Case of the Xen Hypervisor
 
05.2 virtio introduction
05.2 virtio introduction05.2 virtio introduction
05.2 virtio introduction
 
GPU Virtualization in Embedded Automotive Solutions
GPU Virtualization in Embedded Automotive SolutionsGPU Virtualization in Embedded Automotive Solutions
GPU Virtualization in Embedded Automotive Solutions
 
Understanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panicUnderstanding a kernel oops and a kernel panic
Understanding a kernel oops and a kernel panic
 
ALSF13: Xen on ARM - Virtualization for the Automotive Industry - Stefano Sta...
ALSF13: Xen on ARM - Virtualization for the Automotive Industry - Stefano Sta...ALSF13: Xen on ARM - Virtualization for the Automotive Industry - Stefano Sta...
ALSF13: Xen on ARM - Virtualization for the Automotive Industry - Stefano Sta...
 
Reconnaissance of Virtio: What’s new and how it’s all connected?
Reconnaissance of Virtio: What’s new and how it’s all connected?Reconnaissance of Virtio: What’s new and how it’s all connected?
Reconnaissance of Virtio: What’s new and how it’s all connected?
 
Linux Kernel Module - For NLKB
Linux Kernel Module - For NLKBLinux Kernel Module - For NLKB
Linux Kernel Module - For NLKB
 
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARMXPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
XPDS16: Porting Xen on ARM to a new SOC - Julien Grall, ARM
 

Similar to Xen in Safety-Critical Systems - Critical Summit 2022

Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCPOscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
The Linux Foundation
 
XenTT: Deterministic Systems Analysis in Xen
XenTT: Deterministic Systems Analysis in XenXenTT: Deterministic Systems Analysis in Xen
XenTT: Deterministic Systems Analysis in Xen
The Linux Foundation
 
Xen and the Art of Virtualization
Xen and the Art of VirtualizationXen and the Art of Virtualization
Xen and the Art of Virtualization
Susheel Thakur
 
Kernel Mode Threats and Practical Defenses
Kernel Mode Threats and Practical DefensesKernel Mode Threats and Practical Defenses
Kernel Mode Threats and Practical Defenses
Priyanka Aash
 

Similar to Xen in Safety-Critical Systems - Critical Summit 2022 (20)

RHEL5 XEN HandOnTraining_v0.4.pdf
RHEL5 XEN HandOnTraining_v0.4.pdfRHEL5 XEN HandOnTraining_v0.4.pdf
RHEL5 XEN HandOnTraining_v0.4.pdf
 
Considerations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmfConsiderations when implementing_ha_in_dmf
Considerations when implementing_ha_in_dmf
 
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCPOscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
Oscon 2012 : From Datacenter to the Cloud - Featuring Xen and XCP
 
XenTT: Deterministic Systems Analysis in Xen
XenTT: Deterministic Systems Analysis in XenXenTT: Deterministic Systems Analysis in Xen
XenTT: Deterministic Systems Analysis in Xen
 
Virtualization: A Case Study from the IT Trenches - Darren Schoen, Broward Ce...
Virtualization: A Case Study from the IT Trenches - Darren Schoen, Broward Ce...Virtualization: A Case Study from the IT Trenches - Darren Schoen, Broward Ce...
Virtualization: A Case Study from the IT Trenches - Darren Schoen, Broward Ce...
 
Xen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdfXen Safety Embedded OSS Summit April 2024 v4.pdf
Xen Safety Embedded OSS Summit April 2024 v4.pdf
 
Platform Security Summit 18: Xen Security Weather Report 2018
Platform Security Summit 18: Xen Security Weather Report 2018Platform Security Summit 18: Xen Security Weather Report 2018
Platform Security Summit 18: Xen Security Weather Report 2018
 
Xen Community Update 2011
Xen Community Update 2011Xen Community Update 2011
Xen Community Update 2011
 
Containers 101
Containers 101Containers 101
Containers 101
 
Xen revisited
Xen revisitedXen revisited
Xen revisited
 
Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)Deployment Strategies (Mongo Austin)
Deployment Strategies (Mongo Austin)
 
Bare-Metal Hypervisor as a Platform for Innovation
Bare-Metal Hypervisor as a Platform for InnovationBare-Metal Hypervisor as a Platform for Innovation
Bare-Metal Hypervisor as a Platform for Innovation
 
Workshop: XenClient Serve & Manage your road warriors with local virtual desktop
Workshop: XenClient Serve & Manage your road warriors with local virtual desktopWorkshop: XenClient Serve & Manage your road warriors with local virtual desktop
Workshop: XenClient Serve & Manage your road warriors with local virtual desktop
 
IITCC15: The Bare-Metal Hypervisor as a Platform for Innovation
IITCC15: The Bare-Metal Hypervisor as a Platform for InnovationIITCC15: The Bare-Metal Hypervisor as a Platform for Innovation
IITCC15: The Bare-Metal Hypervisor as a Platform for Innovation
 
Xen: Hypervisor for the Cloud - CCC13
Xen: Hypervisor for the Cloud - CCC13Xen: Hypervisor for the Cloud - CCC13
Xen: Hypervisor for the Cloud - CCC13
 
17-virtualization.pptx
17-virtualization.pptx17-virtualization.pptx
17-virtualization.pptx
 
Xen and the Art of Virtualization
Xen and the Art of VirtualizationXen and the Art of Virtualization
Xen and the Art of Virtualization
 
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community) [발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
[발표자료] 오픈소스 Pacemaker 활용한 zabbix 이중화 방안(w/ Zabbix Korea Community)
 
Kernel Mode Threats and Practical Defenses
Kernel Mode Threats and Practical DefensesKernel Mode Threats and Practical Defenses
Kernel Mode Threats and Practical Defenses
 
Xen arm
Xen armXen arm
Xen arm
 

More from Stefano Stabellini

More from Stefano Stabellini (7)

RunX ELCE 2020
RunX ELCE 2020RunX ELCE 2020
RunX ELCE 2020
 
System Device Tree update: Bus Firewalls and Lopper
System Device Tree update: Bus Firewalls and LopperSystem Device Tree update: Bus Firewalls and Lopper
System Device Tree update: Bus Firewalls and Lopper
 
Xen Cache Coloring: Interference-Free Real-Time System
Xen Cache Coloring: Interference-Free Real-Time SystemXen Cache Coloring: Interference-Free Real-Time System
Xen Cache Coloring: Interference-Free Real-Time System
 
Xen Project for ARM Servers
Xen Project for ARM ServersXen Project for ARM Servers
Xen Project for ARM Servers
 
Xen and OpenStack
Xen and OpenStackXen and OpenStack
Xen and OpenStack
 
XDS15: Project Raisin
XDS15: Project RaisinXDS15: Project Raisin
XDS15: Project Raisin
 
OpenStack and Xen
OpenStack and XenOpenStack and Xen
OpenStack and Xen
 

Recently uploaded

Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Globus
 

Recently uploaded (20)

OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
Innovating Inference - Remote Triggering of Large Language Models on HPC Clus...
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Software Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdfSoftware Testing Exam imp Ques Notes.pdf
Software Testing Exam imp Ques Notes.pdf
 
A Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdfA Comprehensive Look at Generative AI in Retail App Testing.pdf
A Comprehensive Look at Generative AI in Retail App Testing.pdf
 
De mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FMEDe mooiste recreatieve routes ontdekken met RouteYou en FME
De mooiste recreatieve routes ontdekken met RouteYou en FME
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
Multiple Your Crypto Portfolio with the Innovative Features of Advanced Crypt...
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital TransformationWSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
 
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.ILBeyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
Beyond Event Sourcing - Embracing CRUD for Wix Platform - Java.IL
 
Enhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdfEnhancing Research Orchestration Capabilities at ORNL.pdf
Enhancing Research Orchestration Capabilities at ORNL.pdf
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data AnalysisProviding Globus Services to Users of JASMIN for Environmental Data Analysis
Providing Globus Services to Users of JASMIN for Environmental Data Analysis
 
Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024Globus Connect Server Deep Dive - GlobusWorld 2024
Globus Connect Server Deep Dive - GlobusWorld 2024
 
How Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptxHow Recreation Management Software Can Streamline Your Operations.pptx
How Recreation Management Software Can Streamline Your Operations.pptx
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Strategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptxStrategies for Successful Data Migration Tools.pptx
Strategies for Successful Data Migration Tools.pptx
 
SOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBrokerSOCRadar Research Team: Latest Activities of IntelBroker
SOCRadar Research Team: Latest Activities of IntelBroker
 

Xen in Safety-Critical Systems - Critical Summit 2022

  • 1. Xen in Safety-Critical Systems Stefano Stabellini & Bertrand Marquis Critical Summit NA 2022
  • 2. What is safety? • “Safety critical embedded software applications are developed for systems whose failures contribute to hazards in the system for safety of life” • Safety certifications (ISO 26262) • Strict coding guidelines (MISRA C) • Strict testing and documentation requirements
  • 3. Why Xen matters for safety • It is common to have a mix of critical and non-critical components (mixed-criticality) • Xen has been enabling mixed-criticality workloads for years • Componentization • Highly secure environments • Isolate critical apps from non-critical apps • Separate work environment from Personal environment on a laptop e.g. Qubes OS and OpenXT • Xen as static partitioning tool for embedded • from industrial to medical and automotive • the real-time (critical) domain must be isolated from the non-critical • the real-time domain cannot miss any deadlines • Safety-critical systems are mixed-criticality systems Xen Critical Non- Critical Non- Critical
  • 4. Why Xen is a good fit for safety • Small codebase (less than 50K LOC) • Micro-kernel architecture • only the hypervisor requires privileges • dom0 is only optional now • Xen supports disaggregation and driver domains: large amounts of code run unprivileged • No need for a “dom0” privileged environment • Linux is not required • Supports real-time and cache coloring • Thorough review process • Thorough security process Xen Zephyr Linux
  • 5. Example Industrial • Xen static partitioning configuration with dom0less • 2 domains with hardware directly assigned • no dom0 • 1 Linux VM, networking and cloud • 1 Zephyr VM, motor controller application with real-time requirements Xen Linux networking, cloud APIs Zephyr real-time controller
  • 6. Example Automotive • Xen static partitioning configuration with dom0less and also dom0 for monitoring • 4 domains created statically at boot time • 1 minimal dom0 VM (Zephyr) for system monitoring • 1 Linux VM, infotainment • 1 Zephyr VM, real-time sensor processing • 1 Instrument Cluster VM Xen Zephyr mini-Dom0 Linux infotainment Instrument Cluster Zephyr real-time sensors
  • 7. Real-time and Xen • What is Real-time ? • Real time is not fast • I will answer on average in 5ms is not real-time • Real time is a guarantee on the maximum time to respond • I will give a response to an event in no more than 100ms • Why does it matter in a safety context ? • Safety usually equals time constraints • If I detect a wall, stop before the wall • How long to action the breaks • If longer …. 0 10 20 30 40 50 60 70 80 < 1 m s 1 - 5 m s 5 - 9 5 m s 9 5 - 1 0 0 m s > 1 0 0 m s My system
  • 8.
  • 9. Real-time and Xen • What is Real-time ? • Real time is not fast • I will answer on average in 5ms is not real-time • Real time is a guarantee on the maximum time to respond • I will give a response to an event in no more than 100ms • Why does it matter in a safety context ? • Safety usually equals time constraints • If I detect a wall, stop before the wall • How long to action the breaks • If longer …. • In safety software: Worst Case Execution Time (WCET) • By demonstration, not by test • Usually, the WCET is a case impossible to trigger by test • For Xen several subjects are being investigated
  • 10. Real-time - Interrupts • Interrupt latency • Maximum time until a guest receives an irq • Depend on time required by guest • Time depend on hardware • Context of the analysis • Arm64 • Guest alone on his core • Zephyr as guest • Timer interrupt • Procedure • Code analysis/inspection • Confirm with tracing on a real target Zephyr Xen Timer irq handler Forward irq Timer irq
  • 11. Real-time - Interrupts • Overall result: 1090 instructions • Save guest context (CPU and IRQ controller): 356 • Xen IRQ handler: 144 • Xen virtual timer: 360 • Xen exit IRQ handler: 97 • Restore guest: 133 • Assumptions and limitations • No hypercall from the real-time guest • No interaction with guests on other cores • After the Xen init phase • Fixed configuration (guests, communication, etc.) (Diagram: timer IRQ → enter Xen IRQ handler → save guest context → virtual timer → exit IRQ handler → restore guest context → Zephyr timer IRQ handler)
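The per-stage counts above add up to the overall result, and an instruction count only becomes a time bound once a worst-case cycles-per-instruction and a clock frequency are chosen. A minimal C sketch, assuming both of those values are hypothetical inputs (they are not part of the published analysis; a real bound must also account for cache and memory effects):

```c
#include <stddef.h>

/* Per-stage instruction counts taken from the slide (Arm64, Zephyr guest
 * alone on its core, timer interrupt). */
struct irq_stage {
    const char  *name;
    unsigned int instructions;
};

static const struct irq_stage irq_path[] = {
    { "save guest context (CPU and IRQ controller)", 356U },
    { "Xen IRQ handler",                              144U },
    { "Xen virtual timer",                            360U },
    { "Xen exit IRQ handler",                          97U },
    { "restore guest",                                133U },
};

/* Total instructions from the physical timer IRQ to the guest handler. */
unsigned int irq_path_total(void)
{
    unsigned int total = 0U;
    for (size_t i = 0U; i < (sizeof(irq_path) / sizeof(irq_path[0])); i++) {
        total += irq_path[i].instructions;
    }
    return total;
}

/* Rough latency bound in microseconds. worst_cpi and clock_mhz are
 * hypothetical platform parameters: cycles = instructions * CPI, and a
 * clock of N MHz executes N cycles per microsecond. */
double irq_path_bound_us(double worst_cpi, double clock_mhz)
{
    return ((double)irq_path_total() * worst_cpi) / clock_mhz;
}
```

For example, with an assumed worst case of 2 cycles per instruction at 1000 MHz, the bound would be 1090 × 2 / 1000 ≈ 2.2 µs.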
  • 12. Real-time - Interrupts • Issues and future work • IPI interrupts and Xen RCU tasks • Limited to a guest isolated on its own core • WFI/WFE handling disabled (higher power consumption) • No PV drivers • No hypercalls in the real-time guest • Status: full analysis is public, link
  • 13. Real-time – MPU support • MMU hard to use for real-time • TLB miss -> page table walk • Influence of other cores (TLB sync) • Influence of other guests (TLB miss) (Diagram: VA translated to PA through the TLB or through an L1/L2/L3 page-table walk)
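To make "TLB miss -> page table walk" concrete: under a hypervisor, a guest virtual address goes through the guest's stage-1 tables and then the hypervisor's stage-2 tables, and every stage-1 table entry itself sits at a guest physical address that needs a stage-2 walk. The sketch below counts worst-case memory accesses for one such nested walk, assuming no TLB or page-walk-cache hits at all (real cores cache intermediate entries, so this is an upper bound, not a typical cost).

```c
/* Worst-case memory accesses for one TLB miss with two-stage translation.
 * g: number of stage-1 (guest) page-table levels.
 * h: number of stage-2 (hypervisor) page-table levels.
 * Each of the g stage-1 entries is read at a guest physical address that
 * first needs an h-level stage-2 walk, plus the read itself: g * (h + 1).
 * The final guest physical address needs one more stage-2 walk: + h.
 * Equivalent to (g + 1) * (h + 1) - 1. */
unsigned int nested_walk_accesses(unsigned int g, unsigned int h)
{
    return (g * (h + 1U)) + h;
}
```

With 4 levels in both stages this gives 24 memory accesses, against 4 for a single-stage walk; with the 3 levels shown in the diagram it gives 15. Contention from other cores and guests on those accesses is what makes the latency hard to bound.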
  • 14. Real-time – MPU support • MMU hard to use for real-time • TLB miss -> page table walk • Influence of other cores (TLB sync) • Influence of other guests (TLB miss) • MPU • No translation (1 to 1 mapping) • Register based (no page tables) • No cache effect (Diagram: VA checked against the MPU region registers, PA = VA)
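The MPU properties listed above (no translation, register based) can be modelled in a few lines. This is a simplified software model, assuming non-overlapping regions; the structure and field names are illustrative and do not follow the Cortex-R82 register layout, and real hardware compares all regions in parallel rather than looping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative MPU region: inclusive [base, limit] plus permissions. */
struct mpu_region {
    uint64_t base;
    uint64_t limit;
    bool     writable;
    bool     enabled;
};

/* Checks one access against a small, fixed set of region registers.
 * No translation happens: the physical address is the input address
 * (1 to 1 mapping). The cost is bounded by the number of regions, with
 * no page-table walk and no dependency on TLB state. */
bool mpu_access_allowed(const struct mpu_region *regions, size_t nr,
                        uint64_t addr, bool is_write)
{
    for (size_t i = 0U; i < nr; i++) {
        const struct mpu_region *r = &regions[i];
        if (r->enabled && (addr >= r->base) && (addr <= r->limit)) {
            return (!is_write) || r->writable;
        }
    }
    return false; /* no matching region: the access faults */
}
```

Because the check depends only on a handful of registers, its timing is the same for every access, which is the property a WCET analysis needs.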
  • 15. Real-time and Xen – MPU support • Arm Cortex-R (Cortex-R82) • Both MMU and MPU • EL2 (Xen): MPU • For Xen • Controls which memory each guest is allowed to access • EL1 (RTOS): MPU • Real-time • EL1 (Linux): MMU • Not real-time • Cohabitation of MPU and MMU guests • Xen and RTOS real-time • Linux or other non-real-time OS running on the same system • Status: proof of concept available • Upstreaming into Xen ongoing (Diagram: Xen hosting Linux and Zephyr)
  • 16. Real-time – Cache coloring (Diagram: Core 1 to Core 4, each with a private L1 cache, sharing an L2 cache in front of DDR)
  • 18. Real-time – Cache coloring • CPU clusters often share an L2 cache • Interference via the L2 cache affects performance • App0 running on CPU0 can cause cache-entry evictions, which affect App1 running on CPU1 • App1 running on CPU1 could miss a deadline due to App0’s behavior • It can happen between apps running on the same OS and between VMs on the same hypervisor • Hypervisor solution: cache partitioning, AKA cache coloring • Each VM gets its own allocation of cache entries • No shared cache entries between VMs • Allows real-time apps to run with deterministic IRQ latency • 3 µs IRQ latency (Diagram: four cores sharing one L2 cache in front of DDR)
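Cache coloring works at page granularity: the physical address bits just above the page offset select a group of L2 sets (a "color"), and giving each VM pages of disjoint colors keeps their data out of each other's cache sets. A minimal sketch, assuming a physically indexed L2 whose way size is a multiple of a 4 KiB page; the cache geometry in the usage note and the color-mask representation are illustrative, and Xen's own implementation may differ in detail.

```c
#include <stdbool.h>
#include <stdint.h>

#define COLOR_PAGE_SHIFT 12U                        /* assumption: 4 KiB pages */
#define COLOR_PAGE_SIZE  (1UL << COLOR_PAGE_SHIFT)

/* Number of colors = size of one cache way / page size.
 * Pages with the same color compete for the same L2 sets. */
unsigned long nr_colors(unsigned long l2_size, unsigned long l2_ways)
{
    return (l2_size / l2_ways) / COLOR_PAGE_SIZE;
}

/* Color of a physical address: the set-index bits above the page offset. */
unsigned long addr_color(uint64_t paddr, unsigned long colors)
{
    return (unsigned long)((paddr >> COLOR_PAGE_SHIFT) % colors);
}

/* A VM owns a bitmask of colors (bit n = color n, up to 32 colors here);
 * the hypervisor only hands it pages whose color is in that mask. */
bool page_allowed(uint64_t paddr, unsigned long colors, uint32_t vm_colors)
{
    unsigned long color = addr_color(paddr, colors);
    return (color < 32U) && (((vm_colors >> color) & 1U) != 0U);
}
```

With a hypothetical 1 MiB, 16-way L2 there are 16 colors; giving one VM the mask 0x00FF and another 0xFF00 means they never share a cache set, so one cannot evict the other's lines.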
  • 19. Static configuration with Xen • What is it? • Defining the whole system (guests and communication) statically • Why does it matter in a safety context? • No random behaviour • Same after reboot (target, task, guest, etc.) • Example: application on the same core at the same address in memory • Reduce the amount of testing • Limit possibilities • Example: only used functions, compile out the rest • No dynamic behaviour • Limit complexity • Example: allocation at boot or static, no free • Conclusion: reduce certification costs
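The "allocation at boot or static, no free" pattern can be sketched as a bump allocator over a statically sized pool that is sealed once initialization ends. This is an illustrative sketch, not Xen's allocator; the pool size, alignment limit and function names are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define POOL_SIZE  4096U   /* assumption: pool size fixed at build time */
#define POOL_ALIGN 16U     /* largest alignment supported */

static _Alignas(POOL_ALIGN) uint8_t pool[POOL_SIZE];
static size_t pool_used;   /* next free offset in the pool */
static int    init_done;   /* set once the boot phase is over */

/* Allocates during boot only. There is deliberately no free(): the same
 * sequence of requests produces the same layout on every boot, and no
 * fragmentation or allocation failure can appear at run time. */
void *boot_alloc(size_t size, size_t align)
{
    if ((init_done != 0) || (align == 0U) || (align > POOL_ALIGN) ||
        ((align & (align - 1U)) != 0U)) {
        return NULL;       /* sealed, or alignment not a supported power of two */
    }
    size_t start = (pool_used + (align - 1U)) & ~(align - 1U);
    if ((start > POOL_SIZE) || (size > (POOL_SIZE - start))) {
        return NULL;       /* pool exhausted: a configuration error at boot */
    }
    pool_used = start + size;
    return &pool[start];
}

/* Called at the end of initialization: later allocations are refused. */
void boot_alloc_seal(void)
{
    init_done = 1;
}
```

Any allocation failure shows up during boot, where it can be caught by testing a fixed configuration, rather than at an unpredictable point at run time.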
  • 20. Static configuration – dom0less • Define the system during the design phase • How many guests • What characteristics • memory, device access, cpus • Create them directly at boot • Defined in configuration (device-tree) • Advantages for safety • No need for a complex dom0 • No dependency on Linux (not certifiable) • Faster boot • Guests start directly at boot • Reduced system complexity • No dynamic guest creation • Status: available (Diagram: Xen hosting Linux and Zephyr) Device tree: domU1 { #address-cells = <1>; #size-cells = <1>; compatible = "xen,domain"; memory = <0 0x20000>; cpus = <1>; vpl011; module@2000000 { compatible = "multiboot,kernel", "multiboot,module"; reg = <0x2000000 0xffffff>; bootargs = "console=ttyAMA0"; }; };
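The `memory = <0 0x20000>` property in the example is a 64-bit value spread over two 32-bit device-tree cells. Xen's dom0less bindings express it in KiB, so the example guest gets 128 MiB. A minimal sketch of that decoding, assuming the cells have already been converted from big-endian to host order (the helper names are hypothetical):

```c
#include <stdint.h>

/* A 64-bit device-tree integer spans two 32-bit cells, most significant
 * cell first. Assumes the cells are already in host byte order. */
uint64_t dt_read_u64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | (uint64_t)lo;
}

/* The dom0less "memory" property is a size in KiB. */
uint64_t dom0less_memory_bytes(uint32_t hi, uint32_t lo)
{
    return dt_read_u64(hi, lo) * 1024U;
}
```

`memory = <0 0x20000>` decodes to 0x20000 KiB = 131072 KiB = 128 MiB.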
  • 21. Static configuration – memory • Define the address and size of memory regions • For guest memory • For the Xen heap • Internal Xen allocations • For the Xen guest heap • Xen allocations related to a guest • Defined in configuration (device-tree) • Advantages for safety • System and guests identical upon reboot • Reduced possible interference • A guest cannot impact another guest • Adding a guest in the future • Current guests unchanged • Incremental certification • Status: available in the next Xen release (Diagram: Xen with Linux and Zephyr, each in its own static memory region) Device tree: domU1 { compatible = "xen,domain"; #address-cells = <0x2>; #size-cells = <0x2>; cpus = <2>; memory = <0x0 0x80000>; #xen,static-mem-address-cells = <0x1>; #xen,static-mem-size-cells = <0x1>; xen,static-mem = <0x30000000 0x20000000>; ... };
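In the example, `xen,static-mem = <0x30000000 0x20000000>` gives a 512 MiB bank at 0x30000000, which matches `memory = <0x0 0x80000>` (0x80000 KiB = 512 MiB). A design-time check like the sketch below can confirm that each guest's static bank matches its `memory` property and that no two guests' banks overlap, which is what prevents one guest from impacting another. The structure and function names are hypothetical; the equality check reflects the example above, and Xen performs its own validation when it parses the device tree.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One xen,static-mem bank: base address and size in bytes. */
struct static_bank {
    uint64_t base;
    uint64_t size;
};

/* The "memory" property (KiB) should equal the size of the static bank. */
bool bank_matches_memory(const struct static_bank *b, uint64_t memory_kib)
{
    return (memory_kib * 1024U) == b->size;
}

/* Half-open ranges [base, base + size) overlap if each starts before the
 * other ends. (Assumes base + size does not overflow 64 bits.) */
bool banks_overlap(const struct static_bank *a, const struct static_bank *b)
{
    return (a->base < (b->base + b->size)) && (b->base < (a->base + a->size));
}

/* True if no two guests' banks share any address. */
bool banks_disjoint(const struct static_bank *banks, size_t n)
{
    for (size_t i = 0U; i < n; i++) {
        for (size_t j = i + 1U; j < n; j++) {
            if (banks_overlap(&banks[i], &banks[j])) {
                return false;
            }
        }
    }
    return true;
}
```

Because banks are fixed addresses, adding a new guest later only requires checking its bank against the existing ones; the current guests' layout does not change.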
  • 22. Static configuration – communication • Xenbus is too complex for safety • Needs dom0 or Linux • Drivers are complex • One guest accesses another guest's memory • Several components needed for safety (and not only for safety) • Static shared memory • Area of memory accessible by several guests • Defined in configuration (device-tree) • Static event channels • Solution to notify ("ping") between 2 guests • Defined in configuration (device-tree) • Any protocol can be built on top • Status: available in the next Xen release • Linux support • Zephyr support (Diagram: Linux and Zephyr on Xen, connected by a shared memory area (SHM) and an event channel) Device tree: shared-mem@10000000 { compatible = "xen,domain-shared-memory-v1"; role = "owner"; xen,shm-id = <0x0>; xen,shared-mem = <0x10000000 0x10000000 0x10000000>; }; domU1 { … domU1-shared-mem@10000000 { compatible = "xen,domain-shared-memory-v1"; role = "borrower"; xen,shm-id = <0x0>; xen,shared-mem = <0x10000000 0x10000000 0x50000000>; }; … };
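As an example of "any protocol can be built on top", a single-producer/single-consumer ring placed at the start of the static shared-memory region is enough for one guest to pass messages to the other, with the static event channel used only to wake the peer. The sketch below is hypothetical: the ring layout and names are invented, `notify_peer()` stands in for sending on the event channel (here it only counts calls), and it assumes the region is mapped as normal cacheable memory in both guests so C11 atomics order the accesses.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 8U    /* must be a power of two */
#define MSG_MAX    32U

struct ring_slot {
    uint32_t len;
    uint8_t  data[MSG_MAX];
};

/* Layout placed at the start of the shared-memory region. */
struct shm_ring {
    _Atomic uint32_t prod;   /* written only by the producer guest */
    _Atomic uint32_t cons;   /* written only by the consumer guest */
    struct ring_slot slot[RING_SLOTS];
};

/* Hypothetical stand-in for raising the static event channel. */
static unsigned int notifications;
static void notify_peer(void)
{
    notifications++;
}

/* Producer side: copy the message into the next slot, publish it with a
 * release store, then notify. Fails if the ring is full or len too big. */
bool ring_send(struct shm_ring *r, const void *msg, uint32_t len)
{
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_relaxed);
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_acquire);
    if ((len > MSG_MAX) || ((prod - cons) == RING_SLOTS)) {
        return false;
    }
    struct ring_slot *s = &r->slot[prod % RING_SLOTS];
    memcpy(s->data, msg, len);
    s->len = len;
    atomic_store_explicit(&r->prod, prod + 1U, memory_order_release);
    notify_peer();
    return true;
}

/* Consumer side: buf must hold at least MSG_MAX bytes. */
bool ring_recv(struct shm_ring *r, void *buf, uint32_t *len)
{
    uint32_t cons = atomic_load_explicit(&r->cons, memory_order_relaxed);
    uint32_t prod = atomic_load_explicit(&r->prod, memory_order_acquire);
    if (prod == cons) {
        return false;            /* ring empty */
    }
    const struct ring_slot *s = &r->slot[cons % RING_SLOTS];
    memcpy(buf, s->data, s->len);
    *len = s->len;
    atomic_store_explicit(&r->cons, cons + 1U, memory_order_release);
    return true;
}
```

Each index is written by only one side, so no lock is shared between the guests, and the fixed slot count bounds the memory used.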
  • 23. Static configuration - cpupools • Define which core(s) are usable by whom • Xen cpupool: a pool of cores • One or several cores • Scheduler to use • A guest can be assigned to a cpupool • Several guests can be in one cpupool • Schedulers are independent between cpupools • A core can only be assigned to one cpupool • Defined in configuration (device-tree) • Advantages for safety • Static core assignment • Isolation between guests • Scheduler per cpupool • Status: available in the next Xen release (Diagram: Linux1, Linux2 and Zephyr on Xen, split across Pool-0 and Pool-1) Device tree: cp0:cpupool0 { compatible = "xen,cpupool"; cpupool-cpus = <&a72_0 &a72_1>; cpupool-sched = "credit2"; }; domU1 { #address-cells = <1>; #size-cells = <1>; compatible = "xen,domain"; memory = <0 0x20000>; cpus = <1>; domain-cpupool = <&cp0>; … };
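The rule "a core can only be assigned to one cpupool" is easy to check at design time if each pool's `cpupool-cpus` list is represented as a bitmask of physical CPUs. A minimal sketch with hypothetical names; Xen performs its own checks when it parses the cpupool nodes, and this only shows the invariant.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* pool_cpus[i] is the CPU bitmask of pool i (bit n = physical CPU n). */
bool cpupools_valid(const uint64_t *pool_cpus, size_t nr_pools)
{
    uint64_t seen = 0U;
    for (size_t i = 0U; i < nr_pools; i++) {
        if (pool_cpus[i] == 0U) {
            return false;            /* a pool needs at least one core */
        }
        if ((seen & pool_cpus[i]) != 0U) {
            return false;            /* core already belongs to another pool */
        }
        seen |= pool_cpus[i];
    }
    return true;
}
```

For instance, pools {CPU0, CPU1} and {CPU2, CPU3} (masks 0x3 and 0xC) are valid, while 0x3 and 0x6 are rejected because CPU1 would be in both pools.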
  • 24. Safety Certification Activities • Xen can already be safety-certified, but at what cost? • It has been done already • Safety experts have analyzed the code and deemed it safety-certifiable • Requires significant downstream work • GOAL: make it easier for users to deploy Xen in safety environments • "safety-certifiable", not safety-certified • Users can fill the gaps • We can be flexible: it is OK to decide not to follow certain rules • Let's focus on what we do best: robustness of the code • Clarity: What does Xen support? What's missing? • Users should be able to estimate precisely the work required
  • 25. Code First • Robustness and safety of the code • Code is the Xen Project's primary output • The most important item for safety certification is the robustness/safety of the codebase • Documentation, requirements, and tests can be more easily outsourced • Main code safety aspects: • Coding style and MISRA C rules • Determinism: deterministic IRQ handling and memory allocations • Enhanced Kconfig for a smaller codebase (less to certify) • Why MISRA C? • A de facto standard in all industry sectors • Maintained and backed by an authoritative organization (the MISRA consortium) • A pragmatic approach and a perfect match for Xen: MISRA documents clearly state that code quality should never be sacrificed for compliance (deviation process)
  • 26. MISRA C: status • MISRA C tailoring completed: ~100 rules considered relevant for Xen • MISRA C rules adoption in progress: ~15/100 rules • Xen already follows many MISRA C rules, just not officially • Add the rules Xen already follows to CODING_STYLE • Discuss the others • Decide we follow a rule: add it to CODING_STYLE and check for it using MISRA C scanners • Decide we follow a rule with deviations • Deviations are intentional and documented exceptions to the rule • Document deviations with in-code or out-of-code comments so that MISRA C scanners will “ignore” them • Check for the rule automatically using MISRA C scanners • Decide not to follow the rule and not scan for it • Cannot be scanned automatically by static analyzers
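To give a flavour of what following a rule looks like in code, the hypothetical helper below is written with three MISRA C:2012 rules in mind: Rule 15.6 (bodies of `if` and `switch` are compound statements), Rule 15.7 (an `if … else if` chain ends with a final `else`), and Rule 16.4 (every `switch` has a `default` label). It is not taken from Xen, and only a MISRA C scanner can confirm compliance with the full rule set.

```c
#include <stdint.h>

typedef enum { PRIO_LOW, PRIO_MID, PRIO_HIGH } prio_t;

/* Rule 16.4: the switch has a default label even though every enum value
 * is handled, so an out-of-range value still has defined behaviour. */
static uint32_t budget_for(prio_t p)
{
    uint32_t budget;
    switch (p) {
    case PRIO_LOW:
        budget = 100U;
        break;
    case PRIO_MID:
        budget = 500U;
        break;
    case PRIO_HIGH:
        budget = 1000U;
        break;
    default:
        budget = 0U;
        break;
    }
    return budget;
}

/* Rules 15.6 and 15.7: every branch is a compound statement and the
 * else-if chain is terminated by a final else. */
uint32_t clamp_budget(prio_t p, uint32_t requested)
{
    uint32_t limit = budget_for(p);
    uint32_t result;
    if (requested == 0U) {
        result = limit;              /* 0 means "use the default budget" */
    } else if (requested > limit) {
        result = limit;
    } else {
        result = requested;
    }
    return result;
}
```

A scanner reports any deviation from these patterns automatically, which is what lets individual patches be checked before review starts.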
  • 27. MISRA C: status • Benefits: • Static code analyzers are available to check for the rules, from ECLAIR to Cppcheck • Check individual patches in advance, before review even starts • Ease code reviews and reduce maintainers' work • Improve code quality • Reduce defects • Working with Roberto Bagnara and BUGSENG (ECLAIR) • Improve the existing coding style and coding conventions in Xen • Improve the safety of the code • Improve code security: defensive programming • Widen compiler compatibility • Ensure we do not violate the C99 standard • Ensure we do not unknowingly use language extensions that may not be available in other compilers
  • 28. Tooling • Cppcheck • Available to any developer without a license • Good for pre-submission checks • Open source • Good coverage but not 100% • ECLAIR • 100% coverage of MISRA C:2012 with very high accuracy • Automatically adapts to the toolchain to capture all implementation-defined aspects of the language • Very detailed and actionable reports • Made publicly available by BUGSENG at http://eclairit.com
  • 29. ECLAIR • Enter the system by clicking “See ECLAIR in action”
  • 30. Future Work • Deterministic interrupt handling code path • Deterministic memory allocations at system boot • Further reducing code size via Kconfig • Complete MISRA C rules adoption and fix violations • Documentation • Testing • Xen Test Framework “XTF” • GitLab CI