Shared coprocessor framework on ARM
Oleksandr Andrushchenko, Lead Software Engineer, EPAM Systems Inc.
Kyiv, Ukraine
Team of developers at EPAM Systems Inc., based in Kyiv, Ukraine
We are focused on:
• Xen on ARM
• Automotive use-cases
• Para-virtualized front drivers, backends and managers: sound, display, input
• SoC’s HW virtualization
• TEE integration
• Power management
• FuSa certification: IEC 61508 / ISO 26262
• Yocto-based build system for multi-domain distributions
We are upstreaming to Xen Project: see us at https://github.com/xen-troops
Introduction
In this talk
1. What is it about?
2. Why would one want to share a coprocessor?
3. Scheduling a virtual coprocessor
4. Configuration approaches
5. IOMMU support
6. Proprietary code and native applications
7. Virtual GPU
Rationale
● Not only CPU anymore, but SoC
○ GPUs, multimedia encoders, DSPs, FPGAs… you name it
○ Used to offload processing from CPUs to dedicated HW
● Good for a one-OS-does-everything system
● With several domains we have to isolate parts of the system
○ Split HW blocks between users (if HW allows that, e.g. display)
○ Choose which part uses real HW and which does SW emulation
○ Use para-virtual devices
Split and para-virtualize
[Diagram: domains running on Xen: Dom0, DomD, safety domain (instrument cluster), in-vehicle infotainment, driver assistance (ADAS). Labelled blocks: display backend, audio backend, SW GPU/rendering, SW audio, GPU/rendering, media encoders, custom HW. Hardware: display HW, audio HW, graphics HW, encoders HW.]
Sources: http://www.aa1car.com/library/instrument_cluster.htm;
https://www.xda-developers.com/panasonic-automotive-to-build-android-automotive-in-vehicle-infotainment-system-into-fiat-chrysler-vehicles/
Shared coprocessors
● Why would one bother with sharing coprocessors?
○ Performance and complexity issues with para-virtual devices
■ Memory copying
■ Complex ABI (just imagine para-virtual OpenGL)
○ HW cannot be split
○ Different guests may need to run different FW/driver
○ Multiple domains may benefit from platform’s HW capabilities
● It is always a question what needs to be shared or para-virtualized
With shared coprocessors
[Diagram: domains running on Xen: Dom0, DomD, safety domain (instrument cluster), in-vehicle infotainment, driver assistance (ADAS). Labelled blocks: display backend, audio backend. Xen exposes vGPU, vMEncoder, vDSP and vFPGA on top of the physical GPU, media encoders, DSP and FPGA; display and audio hardware are also shown.]
Pictures from: http://www.aa1car.com/library/instrument_cluster.htm
https://www.xda-developers.com/panasonic-automotive-to-build-android-automotive-in-vehicle-infotainment-system-into-fiat-chrysler-vehicles/
Shared coprocessor framework (SCF)
Shared Coprocessor Framework
• SCF will simplify sharing a coprocessor
• Leave all the burden to the framework, focus on your coproc
• Make coproc support unified
• Benefit from framework bug fixes and the work others do, and contribute back
(Virtual) coprocessor resources
● Memory-mapped I/O ranges
● Interrupts
● Power - reset - clock management (PRCM)
● IOMMU context(s)
● The coprocessor’s own MMU, if any
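
The resource list above could be captured in a single descriptor per coprocessor. The following is only a sketch under stated assumptions: every type, field and function name is hypothetical and is not taken from the SCF sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not the SCF data structures. */

struct mmio_range {
    uint64_t base;   /* physical base address of the register window */
    uint64_t size;   /* window size in bytes */
};

struct coproc_resources {
    const struct mmio_range *mmio;  /* memory-mapped I/O ranges */
    size_t nr_mmio;
    const uint32_t *irqs;           /* interrupt numbers */
    size_t nr_irqs;
    uint32_t power_domain;          /* PRCM: power, reset and clock handles */
    uint32_t reset_line;
    uint32_t clock_id;
    uint32_t iommu_ctx;             /* IOMMU context identifier */
    int has_own_mmu;                /* non-zero if the coprocessor has its own MMU */
};

/* Return the index of the MMIO range that contains addr, or -1 if none does. */
int coproc_find_mmio(const struct coproc_resources *res, uint64_t addr)
{
    assert(res != NULL);
    for (size_t i = 0; i < res->nr_mmio; i++) {
        const struct mmio_range *r = &res->mmio[i];
        if (addr >= r->base && addr - r->base < r->size)
            return (int)i;
    }
    return -1;
}
```

A lookup like `coproc_find_mmio` is the kind of helper a trap handler might use to decide which register window a guest access belongs to.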
Scheduling, or why not Xen’s vCPU scheduler?
• Cannot use the vCPU scheduler
  • Not all HW allows a context switch, or the switch can be complex
  • A guest may be inactive, but its tasks may still be processed by the coproc
  • An active guest may not use the coproc at all, so let others utilize it
  • An IOMMU context switch may be needed for a vcoproc
• Requirements for a coproc scheduler
  • Priority of a guest - mission critical tasks
  • Coproc load/usage - time is not the best measure
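
One possible reading of the two requirements above, guest priority first and coprocessor usage rather than time as the secondary measure, is sketched below. The structure, its fields and the selection policy are all assumptions made for illustration; the framework may weigh these inputs differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-vcoproc accounting; not part of SCF. */
struct vcoproc_stat {
    int priority;         /* higher value = more critical guest */
    uint64_t busy_units;  /* work consumed in coproc-specific units (jobs, cycles), not wall time */
    bool has_work;        /* true if the guest has queued work for the coproc */
};

/*
 * Pick the vcoproc to run next: highest priority wins, and among equal
 * priorities the one that has consumed the least coprocessor work so far.
 * Returns the index of the chosen entry, or -1 if nobody has work.
 */
int pick_by_priority_then_usage(const struct vcoproc_stat *v, size_t nr)
{
    int best = -1;

    assert(v != NULL || nr == 0);
    for (size_t i = 0; i < nr; i++) {
        if (!v[i].has_work)
            continue;
        if (best < 0 ||
            v[i].priority > v[best].priority ||
            (v[i].priority == v[best].priority &&
             v[i].busy_units < v[best].busy_units))
            best = (int)i;
    }
    return best;
}
```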
Scheduling a vCoproc
● Round-robin at the first stage
● Can existing schedulers be used?
○ Null scheduler could be a match
○ Credit/Credit2 seem to need much work
○ Real-time schedulers
■ ARINC 653
■ Real-Time-Deferrable-Server (RTDS)
● Or do we need to (re)implement the same for coproc?
● Do we need to be real-time? (mission critical, audio/video use-cases)
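
The first-stage round-robin policy can be sketched as follows. This is a minimal illustration, not the PoC code: the types and names are hypothetical, and skipping instances without pending work is an assumption that follows the earlier point about letting others use a coprocessor an active guest does not need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical types; not the SCF scheduler interface. */
struct vcoproc {
    int domid;      /* owning domain */
    bool has_work;  /* true if this instance has something to run */
};

struct rr_sched {
    struct vcoproc *v;  /* all virtual instances of one physical coproc */
    size_t nr;
    size_t last;        /* index of the instance that ran most recently */
};

/*
 * Walk the instances in circular order, starting just after the one that
 * ran last, and return the index of the first one with pending work.
 * Returns -1 if every instance is idle.
 */
int rr_pick_next(struct rr_sched *s)
{
    assert(s != NULL);
    for (size_t step = 1; step <= s->nr; step++) {
        size_t idx = (s->last + step) % s->nr;
        if (s->v[idx].has_work) {
            s->last = idx;
            return (int)idx;
        }
    }
    return -1;
}
```

The walk ends on the previously selected instance, so a single busy guest keeps the coprocessor when all others are idle.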
Configuration
• Configure: MMIO ranges, interrupts, IOMMU etc.
• Need to configure both privileged and guest domains
  • Privileged domains may not have a configuration file (e.g. Dom0), although DomD does
  • A guest is configured with a configuration file
• Must be able to configure multiple vcoprocs per domain
  • To allow coprocessor sharing within the same guest running different FW/drivers, e.g. OpenGL concurrently with OpenCL for vGPU
Configuration
• Current implementation
  • Device tree bootargs to configure Dom0
  • Partial dtb + DomU configuration file (similar to ARM passthrough)
  • Partial dtb for DomU (with the pdtb passed to Xen) was rejected after community discussion
• How to pass variable structure data to Xen
  • Device tree, but no x86 support
  • ACPI, but is it ARM ready yet?
  • Introduce a new ABI:
    • Pass memory ranges, interrupts etc. in a flexible way
    • Have converters for ACPI, DTB etc.?
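
To make the "new ABI" option more concrete, here is one way variable-length configuration data could be laid out in a flat buffer: a fixed header carrying counts, followed by the MMIO ranges and then the interrupt numbers. This is purely illustrative; the layout and all names are assumptions, and no such ABI is defined by the talk.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical wire layout: header | nr_mmio ranges | nr_irqs interrupt numbers. */
struct vcoproc_cfg_hdr {
    uint32_t nr_mmio;
    uint32_t nr_irqs;
};

struct vcoproc_cfg_mmio {
    uint64_t base;
    uint64_t size;
};

/* Number of bytes needed to hold a configuration with the given counts. */
size_t vcoproc_cfg_size(uint32_t nr_mmio, uint32_t nr_irqs)
{
    return sizeof(struct vcoproc_cfg_hdr) +
           (size_t)nr_mmio * sizeof(struct vcoproc_cfg_mmio) +
           (size_t)nr_irqs * sizeof(uint32_t);
}

/* Serialize into buf; returns bytes written, or 0 if buf is too small. */
size_t vcoproc_cfg_pack(void *buf, size_t len,
                        const struct vcoproc_cfg_mmio *mmio, uint32_t nr_mmio,
                        const uint32_t *irqs, uint32_t nr_irqs)
{
    struct vcoproc_cfg_hdr hdr = { nr_mmio, nr_irqs };
    size_t need = vcoproc_cfg_size(nr_mmio, nr_irqs);
    uint8_t *p = buf;

    if (buf == NULL || len < need)
        return 0;
    memcpy(p, &hdr, sizeof(hdr));
    p += sizeof(hdr);
    if (nr_mmio) {
        memcpy(p, mmio, (size_t)nr_mmio * sizeof(*mmio));
        p += (size_t)nr_mmio * sizeof(*mmio);
    }
    if (nr_irqs)
        memcpy(p, irqs, (size_t)nr_irqs * sizeof(*irqs));
    return need;
}
```

A converter from a device tree or ACPI description would then only need to produce this one buffer format, which is the appeal of the "converters" idea.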
IOMMU support
• HW expects to see physically contiguous memory, e.g. for DMA operations
• A guest cannot guarantee that; the “bad” options are:
  • A 1:1 mapped guest
  • If the coproc has its own MMU - trap memory accesses and update the MMU manually in SW
• Utilize the IOMMU to overcome these problems with better performance:
  • 1:1 mapping is not required
  • Better memory isolation - control the coproc’s memory access
  • Overcome the 4 GB limit for 32-bit DMA-capable devices
• Switch handled by the framework
  • No changes to existing FW/driver
  • No changes to the coproc Xen driver
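
The effect of the IOMMU on what the device sees can be illustrated with a toy, single-level translation table: scattered physical pages, including one above 4 GB, appear to the device as one contiguous range of I/O virtual addresses. This is a simulation for explanation only; real IOMMUs use multi-level tables, and every name here is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_SIZE  (1ULL << TOY_PAGE_SHIFT)
#define TOY_IOVA_PAGES 16
#define TOY_INVALID_PA UINT64_MAX

/* One translation context: IOVA page number -> physical page address. */
struct toy_iommu_ctx {
    uint64_t pa[TOY_IOVA_PAGES];
};

void toy_iommu_init(struct toy_iommu_ctx *c)
{
    for (size_t i = 0; i < TOY_IOVA_PAGES; i++)
        c->pa[i] = TOY_INVALID_PA;
}

/* Map n page-aligned physical pages at consecutive IOVA pages; 0 on success, -1 on overflow. */
int toy_iommu_map(struct toy_iommu_ctx *c, size_t iova_pfn,
                  const uint64_t *pages, size_t n)
{
    if (iova_pfn > TOY_IOVA_PAGES || n > TOY_IOVA_PAGES - iova_pfn)
        return -1;
    for (size_t i = 0; i < n; i++) {
        assert((pages[i] & (TOY_PAGE_SIZE - 1)) == 0);
        c->pa[iova_pfn + i] = pages[i];
    }
    return 0;
}

/* Translate a device-visible address; TOY_INVALID_PA models an IOMMU fault. */
uint64_t toy_iommu_translate(const struct toy_iommu_ctx *c, uint64_t iova)
{
    uint64_t pfn = iova >> TOY_PAGE_SHIFT;

    if (pfn >= TOY_IOVA_PAGES || c->pa[pfn] == TOY_INVALID_PA)
        return TOY_INVALID_PA;
    return c->pa[pfn] | (iova & (TOY_PAGE_SIZE - 1));
}
```

Unmapped addresses fault instead of reaching memory, which is the isolation property listed above; switching guests amounts to switching which context the device uses.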
Proprietary code
● There is always room for someone’s IP...
● Cannot disclose source/interface: NDA, incompatible license
● Need to move part of coproc’s code into a black box
● Options are being discussed (Volodymyr Babchuk will cover in detail during the Summit):
○ Stubdom
○ EL0 applications
● Once a decision is made, it will be adopted by the framework
What is expected from a “native application”
• Latency is critical
• MMIO access
• IRQ handling
• System stability
• Recovery from misbehaving proprietary code
• Power and clock management
• Solution to legal problems
Next generation car
Picture from http://www.designhmi.com/2015/02/23/in-car-connectivity-and-iot-internet-of-things/
Virtual GPU
● One of the key components for automotive use-cases
○ Instrument cluster (IC)
○ Head-up display (HUD)
○ In-vehicle infotainment (IVI)
● Performance and stability are both critical:
○ Not only OpenGL/Vulkan, but OpenCL and more - different firmware at the same time, even within the same guest
○ IVI crash must not affect IC
vGPU status
● Proof-of-concept is limited, but working
○ Context switch via power off/on sequence of the GPU
○ IOMMU switch is done via iommu_deassign_dt_device/iommu_assign_dt_device
○ Future work:
■ Avoid complete off/on sequences
■ Faster switch via context save
● Need proper integration with IOMMU
● Need decision on proprietary code placement
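
The proof-of-concept switch described above (power the GPU off, move the IOMMU assignment, power it back on) can be pictured as the ordered sequence below. The exact ordering is an assumption drawn from the bullets, and the code only records steps in a trace. It does not call iommu_deassign_dt_device or iommu_assign_dt_device and makes no claim about their signatures; all names in the sketch are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical step identifiers for a full off/on vGPU switch. */
enum vgpu_step {
    VGPU_POWER_OFF,
    VGPU_IOMMU_DEASSIGN,
    VGPU_IOMMU_ASSIGN,
    VGPU_POWER_ON
};

#define VGPU_MAX_STEPS 8

/* Trace of performed steps, standing in for real hardware and IOMMU calls. */
struct vgpu_trace {
    enum vgpu_step steps[VGPU_MAX_STEPS];
    int domid[VGPU_MAX_STEPS];
    size_t n;
};

void vgpu_record(struct vgpu_trace *t, enum vgpu_step s, int domid)
{
    assert(t != NULL && t->n < VGPU_MAX_STEPS);
    t->steps[t->n] = s;
    t->domid[t->n] = domid;
    t->n++;
}

/* Switch the physical GPU from domain 'from' to domain 'to'. */
void vgpu_switch(struct vgpu_trace *t, int from, int to)
{
    vgpu_record(t, VGPU_POWER_OFF, from);       /* stop the outgoing guest's work */
    vgpu_record(t, VGPU_IOMMU_DEASSIGN, from);  /* detach the device from its IOMMU context */
    vgpu_record(t, VGPU_IOMMU_ASSIGN, to);      /* attach it to the incoming guest's context */
    vgpu_record(t, VGPU_POWER_ON, to);          /* bring the GPU back up for the new guest */
}
```

The future-work items would replace the first and last steps with a register-level context save and restore, which is why the full off/on cycle is called out as the thing to avoid.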
SCF status and open questions
• In progress
  • Initial shared coprocessor framework design document is available (needs update)
  • Native application approaches are being discussed
  • SCF configuration discussion started
  • POC is available
• Not started
  • Power - reset - clock management
    • Need to control clocks and power
    • What if an external PMIC is used (HW interface, driver, which domain?)
What we are working on
Xen
Native EL0 apps / stub domains
Real time scheduling
Heterogeneous big.LITTLE support
PMF (cpufreq, cpuidle, thermal, vcoprocpm)
SCF
IOMMUF & IPMMU support
SMC/HVC bridge
PV frontends
Xen apps
PM governor +SoC drivers
TEE manager +OP-TEE driver
GPU mediator +SGX driver
OP-TEE multi-domain support
Integration
Android HALs
Sound/Display managers
PV backends
Certification: IEC 61508, route 3S
CI Build/release system
Resources
● https://github.com/xen-troops
○ Shared coprocessor framework
○ Para-virtual drivers and backends (generic backend library, display, sound, multi-touch etc.)
○ Multidomain Yocto-based build system (xt-distro)
● With your help we will upstream it all
Questions
Thank you!

XPDDS17: Keynote: Shared Coprocessor Framework on ARM - Oleksandr Andrushchenko, EPAM Systems
