Project ACRN: Expose and pass through platform hidden PCIe devices to SOS
SSP.ACRN
Li, Fei
Agenda
⚫ Background
⚫ Why ACRN needs to expose and pass through platform hidden PCI(e) devices to SOS
⚫ How ACRN exposes and passes through platform hidden PCI(e) devices to SOS
PCI-compatible Configuration Mechanism
Bus Number - an encoded value used to select 1 of 256 buses in a system.
Device Number - an encoded value used to select 1 of 32 devices on a given bus.
Function Number - an encoded value used to select 1 of 8 possible functions on a multifunction device.
Register Number - an encoded value used to select a DWORD in the Configuration Space of the intended target.
PCI-compatible Configuration Mechanism
CONFIG_ADDRESS (CF8h)
CONFIG_DATA (CFCh - CFFh)
Example: read the Header Type register (offset 0Eh) of PCIe device 00:1c.0
Write (1 << 31) | (0 << 16) | (0x1c << 11) | (0 << 8) | (0xE & 0xFC) = 0x8000E00C to IO port CF8h
Read one byte from IO port CFCh + (0xE - (0xE & 0xFC)) = CFEh
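The address arithmetic above can be sketched in Python. This only computes the values to write and the port to read; it does not touch hardware. The function names cf8_address and cf8_data_port are made up for this illustration.

```python
CONFIG_ADDRESS = 0xCF8  # address port
CONFIG_DATA = 0xCFC     # data port, covers CFCh to CFFh


def cf8_address(bus, dev, func, reg):
    """Build the 32-bit value to write to CONFIG_ADDRESS (hypothetical helper)."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8 and 0 <= reg < 256):
        raise ValueError("bus/device/function/register out of range")
    # Bit 31 is the enable bit; the low two bits of the register number are dropped
    # because the mechanism selects a whole DWORD.
    return (1 << 31) | (bus << 16) | (dev << 11) | (func << 8) | (reg & 0xFC)


def cf8_data_port(reg):
    """IO port that holds the byte at 'reg' inside the selected DWORD."""
    return CONFIG_DATA + (reg & 0x3)


addr = cf8_address(0x00, 0x1C, 0x0, 0x0E)
port = cf8_data_port(0x0E)
print(hex(addr), hex(port))  # 0x8000e00c 0xcfe
```

Note that reg - (reg & 0xFC) and reg & 0x3 give the same byte offset for an 8-bit register number.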
PCI Express Enhanced Configuration Access Mechanism (ECAM)
PCI MMCONFIG address: 0xe0000000 (physical address, bus 00-ff)
Example: read the Header Type register (offset 0Eh) of PCIe device 00:1c.0
Read four bytes from MMIO address 0xe0000000 + ((0 << 20) | (0x1c << 15) | (0 << 12) | (0xE & 0xFC)) = 0xe00e000c; the Header Type is byte 2 of the returned DWORD
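A minimal Python sketch of the ECAM address calculation, assuming the MMCONFIG base of 0xe0000000 quoted above. The helper names and the sample DWORD value are made up for illustration; no MMIO access is performed.

```python
MMCFG_BASE = 0xE0000000  # MMCONFIG base from the example above


def ecam_address(base, bus, dev, func, offset):
    """Physical address of a config-space offset under ECAM (hypothetical helper)."""
    if not (0 <= bus < 256 and 0 <= dev < 32 and 0 <= func < 8 and 0 <= offset < 4096):
        raise ValueError("bus/device/function/offset out of range")
    return base + ((bus << 20) | (dev << 15) | (func << 12) | offset)


def extract_byte(dword, offset):
    """Pick the byte for 'offset' out of the aligned DWORD that contains it."""
    return (dword >> (8 * (offset & 0x3))) & 0xFF


dword_addr = ecam_address(MMCFG_BASE, 0x00, 0x1C, 0x0, 0x0E & 0xFC)
print(hex(dword_addr))  # 0xe00e000c

# 0x00810000 is an invented DWORD value; byte 2 of it would be the Header Type.
print(hex(extract_byte(0x00810000, 0x0E)))  # 0x81
```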
PCIe devices on NUC
Vendor ID Register (Offset 00h)
The Vendor ID register is HwInit and the value in this register identifies the
manufacturer of the Function. In keeping with PCI-SIG procedures, valid vendor
identifiers must be allocated by the PCI-SIG to ensure uniqueness. Each vendor must
have at least one Vendor ID. It is recommended that software read the Vendor ID
register to determine if a Function is present, where a value of FFFFh indicates that no
Function is present.
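The presence check described above can be sketched as follows. The config reader is passed in as a function so that the example runs against an invented, in-memory config space; scan_bus and fake_read_cfg16 are hypothetical names. Real enumeration code usually also checks the multi-function bit before probing functions 1 to 7, which is skipped here.

```python
VENDOR_ID_OFFSET = 0x00
NO_FUNCTION = 0xFFFF


def function_present(read_cfg16, bus, dev, func):
    """A function exists if its Vendor ID register does not read back as FFFFh."""
    return read_cfg16(bus, dev, func, VENDOR_ID_OFFSET) != NO_FUNCTION


def scan_bus(read_cfg16, bus):
    """Probe every device/function number on one bus."""
    found = []
    for dev in range(32):
        for func in range(8):
            if function_present(read_cfg16, bus, dev, func):
                found.append((bus, dev, func))
    return found


# Invented config space: two functions answer with Intel's vendor ID (8086h).
FAKE_VENDOR_IDS = {(0, 0x00, 0): 0x8086, (0, 0x1C, 0): 0x8086}


def fake_read_cfg16(bus, dev, func, offset):
    if offset != VENDOR_ID_OFFSET:
        raise NotImplementedError("only the Vendor ID register is modeled")
    return FAKE_VENDOR_IDS.get((bus, dev, func), NO_FUNCTION)


print(scan_bus(fake_read_cfg16, 0))  # [(0, 0, 0), (0, 28, 0)]
```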
Part Two - Why ACRN needs to expose and pass through platform hidden PCI(e) devices to SOS
ACRN needs to trap ECAM for SOS
⚫ SOS shouldn't access the PCIe devices which have been assigned to UOS.
⚫ ACRN needs to trap some PCIe extended capability accesses from SOS and handle them correctly, for example for SR-IOV support.
So the ACRN hypervisor now traps every PCIe configuration access (whether through IO port or MMIO) and checks whether the PCIe device belongs to the guest that triggered the access.
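The ownership check can be modeled roughly as below. This is a toy model, not ACRN code: the class name, the VM labels and the choice to answer a rejected read with all ones are assumptions made for the illustration.

```python
ALL_ONES = 0xFFFFFFFF  # assumed answer for a read the guest is not allowed to make


class CfgAccessFilter:
    """Toy model of per-guest filtering of trapped config-space reads."""

    def __init__(self):
        self.owner = {}  # (bus, dev, func) -> name of the owning VM

    def assign(self, bdf, vm):
        self.owner[bdf] = vm

    def read(self, vm, bdf, offset, hw_read):
        # Only the owning guest reaches the device; everything else is ignored.
        if self.owner.get(bdf) != vm:
            return ALL_ONES
        return hw_read(bdf, offset)


flt = CfgAccessFilter()
flt.assign((0, 2, 0), "SOS")
flt.assign((0, 3, 0), "UOS")


def fake_hw_read(bdf, offset):
    return 0x12345678  # invented register content


print(hex(flt.read("UOS", (0, 3, 0), 0, fake_hw_read)))     # 0x12345678
print(hex(flt.read("SOS", (0, 3, 0), 0, fake_hw_read)))     # 0xffffffff
print(hex(flt.read("SOS", (0, 0x0D, 0), 0, fake_hw_read)))  # 0xffffffff
```

The last call shows what happens to a device that was never assigned to anyone: the access is dropped even though it came from SOS.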
Why ACRN needs to expose and pass through platform hidden PCI(e) devices to SOS
However, some PCIe devices are hidden by the BIOS and can't be discovered simply by reading the Vendor ID, for example the Intel Primary to Sideband Bridge (P2SB). The BIOS can hide or unhide it by setting or clearing the HIDE bit in the P2SB Control Register.
To make things worse, SOS on APL UP2 calls an ACPI method to set the eMMC controller to the D3hot power state, which triggers P2SB ECAM accesses.
Scope (_SB.PCI0)
{
    OperationRegion (P2CG, SystemMemory, 0xE00680D0, 0x20) // PCI device 00:0d.0 config space [0xD0, 0xF0)
    Field (P2CG, DWordAcc, NoLock, Preserve)
    {
        SBAD, 32,
        SBDA, 32,
        IRDY, 1,
        , 6,
        POST, 1,
        OPCD, 8,
        SBID, 16,
        SBEA, 32,
        Offset (0x11),
        P2HD, 8 // P2SB Control: Hide Device
    }
Method (SBIM, 4, Serialized)
{
    Local0 = Zero
    Local1 = Acquire (_GL, 0x1F40)
    If ((Local1 == Zero))
    {
        P2HD = Zero
        While (IRDY)
        {
            Sleep (One)
        }
        SBAD &= 0x00F00000
        SBAD |= Arg0
        SBEA = Zero
        SBDA = Arg1
        SBID &= 0x0800
        SBID |= Arg3
        POST = Zero
        OPCD = Arg2
        IRDY = One
        While (IRDY)
        {
            Sleep (One)
        }
        Local0 = SBDA /* _SB_.PCI0.SBDA */
        P2HD = One
    }
    Release (_GL)
Scope (SDHA)
{
    Name (_DDN, "Intel(R) eMMC Controller - 80865ACC") // _DDN: DOS Device Name
    Name (_UID, One) // _UID: Unique ID
    Method (_PS3, 0, NotSerialized) // _PS3: Power State 3
    {
        Local0 = SBIM (0xD600003C, Zero, Zero, 0x30E0)
        Local1 = SBIM (0xD6000834, Zero, Zero, 0x30E0)
        Local2 = SBIM (0xD6000840, Zero, Zero, 0x30E0)
        If ((Local0 & 0x00800000))
        {
            Local3 = (((Local2 & 0x1F) * 0x02) + ((Local1 & 0x3F00) >> 0x08))
            Local1 = ((Local1 & 0xFFFFFF80) | (Local3 & 0x7F))
            SBIM (0xD6000834, Local1, One, 0x30E0)
        }
        DPGE = One
        I3EN = One
        SCPG (One, 0x41)
        Local0 = PMSR /* _SB_.PCI0.SDHA.PMSR */
        Local0 &= One
    }
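To see why these ACPI methods end up as ECAM accesses to the P2SB, the SystemMemory address of the P2CG OperationRegion (0xE00680D0) can be decoded back into a bus:device.function and a register offset. The sketch below is in Python for illustration only and assumes the MMCONFIG base of 0xe0000000 given earlier; ecam_decode is a made-up helper name.

```python
MMCFG_BASE = 0xE0000000  # MMCONFIG base assumed from the ECAM slide


def ecam_decode(base, addr):
    """Split an ECAM physical address into (bus, dev, func, register offset)."""
    off = addr - base
    if not 0 <= off < (256 << 20):
        raise ValueError("address is outside the ECAM window")
    return (off >> 20) & 0xFF, (off >> 15) & 0x1F, (off >> 12) & 0x7, off & 0xFFF


bus, dev, func, reg = ecam_decode(MMCFG_BASE, 0xE00680D0)
print(f"{bus:02x}:{dev:02x}.{func} offset {reg:#x}")  # 00:0d.0 offset 0xd0

# P2HD sits at Offset (0x11) inside the region, i.e. config offset 0xE1.
p2hd = ecam_decode(MMCFG_BASE, 0xE00680D0 + 0x11)
print(hex(p2hd[3]))  # 0xe1
```

Every read or write of SBAD, SBDA, IRDY or P2HD inside SBIM is therefore a config-space access to device 00:0d.0, which the hypervisor traps.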
Previously, hidden PCIe devices were not assigned to SOS. As a result, the device was treated as not belonging to SOS, and the access was ignored.
Setting the eMMC controller power state therefore failed. This left SOS barely working, since it seemed to do nothing else but retry the setting each time it failed.
Part Three - How ACRN exposes and passes through platform hidden PCI(e) devices to SOS
How ACRN exposes and passes through platform hidden PCI(e) devices to SOS
Add the platform hidden PCIe devices to the plat_hidden_pdevs structure of your board configuration in arch/x86/configs/$(BOARD)/board.c, and set the number of platform hidden PCIe devices (MAX_HIDDEN_PDEVS_NUM) in arch/x86/configs/$(BOARD)/misc_cfg.h.
NOTE: Not every platform hidden PCIe device should be exposed and passed through to SOS, for example a hidden device that is only used to trigger the watchdog. (Once RT-VM is supported, SOS should not trigger the hardware watchdog to reset the whole system while the RT-VM is alive.)
How ACRN discovers platform hidden PCI(e) devices
dm/vpci/vpci.c
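As a rough model of the idea, and not the actual vpci.c code, discovery can be thought of as a normal Vendor ID scan followed by appending the board's list of hidden devices. The Python sketch below uses an invented vendor-ID reader; PLAT_HIDDEN_PDEVS mirrors the role of plat_hidden_pdevs, while discover_pdevs and the data layout are assumptions.

```python
NO_FUNCTION = 0xFFFF

# Stand-in for the board's plat_hidden_pdevs list: the P2SB at 00:0d.0.
PLAT_HIDDEN_PDEVS = [(0x00, 0x0D, 0x0)]


def discover_pdevs(read_vendor_id, hidden_pdevs, buses=range(1)):
    """Scan by Vendor ID, then add the devices the scan cannot see."""
    found = []
    for bus in buses:
        for dev in range(32):
            for func in range(8):
                if read_vendor_id(bus, dev, func) != NO_FUNCTION:
                    found.append((bus, dev, func))
    for bdf in hidden_pdevs:
        if bdf not in found:
            found.append(bdf)
    return found


# Invented platform: the hidden P2SB answers FFFFh like an empty slot.
VISIBLE = {(0, 0x00, 0), (0, 0x1C, 0)}


def fake_read_vendor_id(bus, dev, func):
    return 0x8086 if (bus, dev, func) in VISIBLE else NO_FUNCTION


pdevs = discover_pdevs(fake_read_vendor_id, PLAT_HIDDEN_PDEVS)
print(pdevs)  # [(0, 0, 0), (0, 28, 0), (0, 13, 0)]
```

Once 00:0d.0 is in the list of known physical devices, it can be assigned to SOS like any other device.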
DM Emulated/Touched Legacy/ACPI Devices
Device       | IRQ   | IO/MMIO                   | Expose to UOS by                   | Comments
RTC          | 8     | 0x70~0x71                 | DSDT: PNP0B00                      | Expose if LPC is enabled
UART         | 3, 4  | 0x2F8, 0x3F8              | DSDT: PNP0501                      | Expose if LPC is enabled
ACPI PM      | 9     | 0xCF9, 0x400~0x404, 0xb2  | FADT: reset register, PM1A, SMI    | Always expose
ACPI Idle    | N/A   | N/A                       | DSDT                               | Always expose; HV emulated, hcall to fetch Cx info; not needed after CPU sharing
ACPI P-state | N/A   | N/A                       | DSDT                               | Always expose; HV emulated, hcall to fetch Px info; not needed after CPU sharing
PIC          | 2     | 0x20, 0xA0                | DSDT: PNP0000                      | Expose if LPC is enabled (except RTVM); HV emulated
PIT          | 0     | 0x40                      | DSDT: PNP0100                      | Expose if LPC is enabled
HPET         | N/A   | 0xFED00000                | HPET table; DSDT: PNP0103          | Always expose
KBD/MOU      | 1, 12 | 0x60, 0x64                | DSDT: PNP0303, PNP0F13             | Expose if LPC is enabled
TPM          | N/A   | 0xFED40000                | TPM2 table; DSDT: MSFT0101         | Optional (through cmdline opt)
DM Emulated PCI Devices
• Interrupt (if the device requests INTx support)
  • DM allocates INTA/B/C/D for each device in the same slot (bus:dev) -> fills IRQ_PIN (0x3d)
  • DM allocates a PIRQ (pin & pic_irq) for each slot INTA/B/C/D -> fills IRQ_LINE (0x3c)
  • DM allocates an AIRQ (IOAPIC gsi) for each slot INTA/B/C/D
  • DM creates PPRT & APRT based on PIRQ (SLOT:INTx vs. PIRQ) & AIRQ (SLOT:INTx vs. AIRQ)
• BAR
  • Each PCI bridge 32bit/64bit MMIO range -> DSDT
  • DM allocates BARs for each device based on each bus
• BDF
  • Set by DM cmdline
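The INTx part of this list can be illustrated with a small Python sketch that fills the two config-space registers named above. The pin allocation policy (rotating by function number) and the PIRQ values are assumptions for the example, not the DM's actual policy; the register offsets 0x3c and 0x3d are the standard Interrupt Line and Interrupt Pin registers.

```python
IRQ_LINE = 0x3C  # Interrupt Line register
IRQ_PIN = 0x3D   # Interrupt Pin register: 1 = INTA, 2 = INTB, 3 = INTC, 4 = INTD


def assign_intx(cfg, func, pic_irq_for_pin):
    """Pick a pin for this function and record pin and PIC IRQ in its config space.

    The rotation by function number is an assumed policy for illustration.
    """
    pin = (func % 4) + 1
    cfg[IRQ_PIN] = pin
    cfg[IRQ_LINE] = pic_irq_for_pin[pin]
    return pin


# Invented per-slot routing: which PIC IRQ each of INTA..INTD is wired to.
slot_pirq = {1: 5, 2: 10, 3: 11, 4: 7}

cfg_space = bytearray(256)  # virtual config space of one emulated function
pin = assign_intx(cfg_space, func=1, pic_irq_for_pin=slot_pirq)
print(pin, cfg_space[IRQ_PIN], cfg_space[IRQ_LINE])  # 2 2 10
```

The same slot-to-IRQ table is what a _PRT entry would describe to the guest, once for PIC mode and once for IOAPIC mode.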
Assumptions for Device Emulation between HV & DM
• Limit HV emulated devices
  • Legacy devices: vUART & vRTC
  • PCI devices: PCI bridge (for pre-launched or service VM), PT devices, vWatchdog & PCI-based vUART
  • NO ACPI device
• For PCI devices in a post-launched VM
  • DM owns the PCI bridge and other purely emulated PCI devices
  • HV owns PT devices
  • HV owns the necessary purely emulated PCI devices, which only support MSI interrupts
• For RTVM under LAPIC PT mode, PT devices shall only support MSI interrupts, NO INTx
• If a PT device explicitly depends on other PT devices, only support it when all the devices it depends on are passed through to this VM
• If a PT device implicitly depends on some other resource coming from the native ACPI table, do NOT support it
• If a PT device depends on a DM emulated device, do NOT support it?
Proposal for HV Emulated Legacy Device in DM-launched VM
• DM owns overall resource management
  • Interrupt - IOAPIC/PIC pins
  • ACPI - DSDT (_PRT) etc.
• Legacy devices
  • Pre-alignment of the resource configuration between DM & HV
  • Configurable to choose the DM or HV one, or just directly use the HV vm_config and ignore the DM config, as HV IO emulation has priority
Proposal-1 for HV Emulated PCI Devices in DM-launched VM
• DM owns overall resource management
  • Interrupt - IOAPIC/PIC pins
  • PCI - BDF, BAR range, INTx
  • ACPI - DSDT (_PRT) etc.
• PCI devices
  • PT PCI devices move to HV
  • vPCI for UOS
    • DM adds a device through a hypercall with the necessary resource info allocated
      • type: could be vWatchdog, vUART, PT
      • vBDF, pBDF: pBDF only for the PT device type
      • INTx
      • BAR (including MSIX BAR)
Proposal-1 - PT PCI Dev in UOS: Initialization
[Figure: SOS (PT vPCI Dev in DM, other devices in DM, SOS vpci), UOS (PCI Dev) and HV (UOS vpci; PT physical device in HV with pBDF, config space, BAR cfg, MMIO BAR, INTx cfg, MSI cfg, MSIX cfg, MSIX BAR), linked by hypercall, IO request and PCI request paths for steps 1 to 3]
1) DM allocates resources
❑ DM analyzes the PT device's physical info based on SOS vPCI to allocate all necessary resources [vBDF, BAR, INTx]
2) DM issues a hypercall to add_dev
❑ Combine the hypercalls (ASSIGN_PTDEV, SET_PTDEV_INTR_INFO) into one: ADD_DEV [type=PTDEV, vBDF, pBDF, BAR, INTx]
❑ vpci_init_pt_dev with the DM-allocated resources
3) UOS PCI Dev accesses trap to HV
❑ Parse the PCI IO request
❑ Filter out PCI IO requests for HV-owned PT PCI devices
❑ Other PCI IO requests are still sent up to DM as before
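The three steps can be modeled with a short Python sketch. Only the field list [type, vBDF, pBDF, BAR, INTx] comes from the proposal; the class names, the string type tags and the routing function are hypothetical and do not describe an existing ACRN interface.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Bdf = Tuple[int, int, int]


@dataclass
class AddDevRequest:
    """Hypothetical payload of the combined ADD_DEV hypercall."""
    dev_type: str                 # "PTDEV", "VWATCHDOG" or "VUART"
    vbdf: Bdf                     # BDF the device gets inside the UOS
    pbdf: Optional[Bdf] = None    # physical BDF, only meaningful for PTDEV
    bars: List[Tuple[int, int]] = field(default_factory=list)  # (guest base, size)
    intx_gsi: Optional[int] = None


class HvVpci:
    """Toy model of the HV side: keeps the devices it owns for one UOS."""

    def __init__(self):
        self.devs = {}

    def add_dev(self, req):
        # Step 2: DM hands over the resources it allocated in step 1.
        if (req.dev_type == "PTDEV") != (req.pbdf is not None):
            raise ValueError("pBDF must be given for PTDEV and only for PTDEV")
        self.devs[req.vbdf] = req

    def route_cfg_access(self, vbdf):
        # Step 3: HV handles its own devices, everything else goes up to DM.
        return "HV" if vbdf in self.devs else "DM"


hv = HvVpci()
hv.add_dev(AddDevRequest("PTDEV", vbdf=(0, 3, 0), pbdf=(0, 0x15, 0),
                         bars=[(0xC0000000, 0x1000)], intx_gsi=20))
print(hv.route_cfg_access((0, 3, 0)))  # HV
print(hv.route_cfg_access((0, 4, 0)))  # DM
```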
Proposal-2 for HV Emulated PCI Devices in DM-launched VM
• Pre-partitioned resources between DM & HV
  • BDF:
    • DM owns bus 0 and HV owns bus 1
  • BAR:
    • Pre-aligned partition of the MMIO range for bus 0 and bus 1; DM needs to prepare the ACPI info for the bus 1 PCI bridge
  • INTx:
    • Pre-aligned partition of the INTx gsi range; DM needs to prepare the ACPI PRT for PCI devices under bus 1
  • Config:
    • HV could pre-define the UOS PT device list in its vm_config, or DM could define it during runtime configuration
Suggestion for Emulated Device Matrix
Device                                          | Resource[1] allocated in: DM | HV          | Emulated in: DM | HV | ACPI prepared by: DM | HV | Support assumption
Host Bridge                                     | Y                            |             | Y               |    | Y                    |    | Always needed
Normal Bridge                                   | Y                            |             | Y               |    | Y                    |    | If there is an extra PCI bus
PTDev: INTx (w/ vLAPIC)                         | Proposal 1                   | Proposal 2  |                 | Y  | Y - for PRT          |    | Not supported if the VM is using PT LAPIC; necessary to pass through UART/I2C kinds of legacy PCI devices; ACPI table only covers PRT
PTDev: MSI/MSIX                                 | Proposal 1                   | Proposal 2  |                 | Y  |                      |    | No ACPI table for it
PTDev: other PTDev explicitly depended on [2]   | Proposal 1                   | Proposal 2  |                 | Y  | Y - for PSx          |    | Only supported if all the devices depended on are passed through to this VM; ACPI table only covers PM-related methods if needed??
PTDev: other implicit resource dependency [3]   |                              |             |                 |    |                      |    | No plan to support
Other devices: HV emulated PCI device           | Proposal 1                   | Proposal 2  |                 | Y  |                      |    | Only MSI is supported for emulated PCI devices
Other devices: HV emulated legacy device        | Pre-aligned                  | Pre-aligned |                 | Y  | Y                    |    | ACPI only covers the emulated legacy device
Other devices: DM emulated legacy & PCI device  | Y                            |             | Y               |    | Y                    |    |
[1] Resources include BAR, INTx, BDF etc.
[2] A PTDev may explicitly depend on another PTDev such as I2C or UART: its driver, or an ACPI method such as PSx, needs such devices to work.
[3] A PTDev may implicitly depend on other resources or on a DM emulated device. For example, it may need a DM emulated GPIO, or its ACPI method such as PSx may rely on an MMIO address passed through NVS RAM, like the side-band access.
DM Emulated PCI Devices
[Figure: pure vPCI Dev in DM (SOS) with virtual config space (vBDF, BAR cfg, INTx cfg, MSI cfg, MSIX cfg, MSIX BAR) and MMIO BAR; UOS PCI Dev; HV with vLAPIC/vIOAPIC/vPIC; linked by IO request and hypercall]
• DM emulates all PCI resources
  • Virtual config space
  • INTx
  • BAR range
  • vBDF
  • ACPI table (PRT etc.)
• DM injects virq through hypercall
DM Emulated PT PCI Devices - Initialization
[Figure: PT vPCI Dev in DM (SOS) with virtual config space (vBDF, BAR cfg, INTx cfg, MSI cfg, MSIX cfg, MSIX BAR map) on top of SOS vpci; UOS PCI Dev; HV with PT physical device (pBDF, config space, BAR cfg, MMIO BAR, INTx cfg, MSI cfg, MSIX cfg, MSIX BAR), IOMMU, PTIRQ and EPT; hypercall path; steps 1 to 5]
• SOS owns the PT dev originally; DM accesses it through SOS vPCI
1) DM builds
❑ the vDev cfg space based on the SOS vPCI dev
❑ the vMSIX BAR map based on the SOS vPCI dev
2) DM initializes the HV side through hypercall
3) Assign the IOMMU domain for this PT dev to UOS
4) Assign PTIRQ entries for this PT dev to UOS
❑ vINTx vs. pINTx
❑ vMSI vs. pMSI
❑ vMSIX vs. pMSIX
5) Add the EPT mapping of the MMIO BAR for this PT dev to UOS
DM Emulated PT PCI Devices - MMIO BAR Access
[Figure: same components as the Initialization slide; highlighted path: UOS PCI Dev -> EPT -> MMIO BAR of the PT physical device in HV]
• UOS PCI Dev directly accesses the physical BAR based on the EPT mapping
DM Emulated PT PCI Devices - Interrupt Injection
[Figure: same components as the Initialization slide; highlighted path: PT physical device -> Do_irq -> PTIRQ -> UOS PCI Dev, steps 1 to 3]
1) The physical device triggers a physical interrupt
❑ pINTx
❑ pMSI/pMSIX
2) Do_irq matches the ptirq mapping
❑ pINTx -> vINTx -> vVector
❑ pMSI/pMSIX -> vMSI/vMSIX -> vVector
3) Do_irq injects the vVector into the guest
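A toy Python model of the lookup in steps 2 and 3 is shown below. It collapses the pINTx -> vINTx -> vVector chain into a single table lookup, and the class, key layout and vector numbers are invented for the illustration; it is not the HV's ptirq implementation.

```python
class PtirqTable:
    """Maps a physical interrupt source to the guest vector to inject."""

    def __init__(self):
        self.entries = {}  # (kind, physical source) -> (vm, virtual vector)

    def add(self, kind, phys_src, vm, vvector):
        self.entries[(kind, phys_src)] = (vm, vvector)

    def do_irq(self, kind, phys_src, inject):
        # Step 2: match the physical source against the ptirq mapping.
        entry = self.entries.get((kind, phys_src))
        if entry is None:
            return False  # not a passthrough interrupt
        # Step 3: inject the virtual vector into the owning guest.
        vm, vvector = entry
        inject(vm, vvector)
        return True


injected = []
table = PtirqTable()
table.add("intx", 20, "UOS", 0x31)                 # physical gsi 20 (made up)
table.add("msix", ((0, 0x15, 0), 0), "UOS", 0x41)  # pBDF 00:15.0, MSI-X entry 0

table.do_irq("intx", 20, lambda vm, vec: injected.append((vm, vec)))
table.do_irq("msix", ((0, 0x15, 0), 0), lambda vm, vec: injected.append((vm, vec)))
print(injected)  # [('UOS', 49), ('UOS', 65)]
```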
DM Emulated PT PCI Devices - Modify MMIO BAR CFG
[Figure: same components as the Initialization slide; highlighted path: UOS PCI Dev -> HV -> IO request -> PT vPCI Dev in DM -> SOS vpci -> EPT, steps 1 to 3]
1) UOS PCI Dev access to the cfg space traps to HV
2) HV raises an IO request to DM
3) DM notices the MMIO BAR update
❑ Directly modifies the BAR cfg based on the SOS vPCI dev
❑ SOS vPCI remaps the EPT mapping
** It's actually a long & obscure flow
DM Emulated PT PCI Devices - Modify MSIX
[Figure: same components as the Initialization slide; highlighted path: UOS PCI Dev -> HV -> IO request -> PT vPCI Dev in DM -> SOS vpci -> PTIRQ, steps 1 to 3]
1) UOS PCI Dev access to the MSIX MMIO traps to HV
2) HV raises an IO request to DM
3) DM notices the MSIX MMIO access
❑ Directly modifies the MSIX BAR map based on the SOS vPCI dev
❑ SOS vPCI modifies the PTIRQ entry for the related MSIX entry
** An MSIX modification can be a destination or delivery mode change, which leads to a change of the physical MSI message
** It's actually a long & obscure flow
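To make the remark about destination and delivery mode concrete, the sketch below composes and decodes x86 MSI messages (destination ID in address bits 19:12, delivery mode in data bits 10:8, vector in data bits 7:0) and shows how a guest-side destination change alters the physical message. The vCPU-to-physical-LAPIC map, the fixed physical vector and the remap function are assumptions for the illustration.

```python
MSI_ADDR_BASE = 0xFEE00000


def msi_compose(dest_id, vector, delivery_mode=0):
    """Build an (address, data) pair for an x86 MSI message."""
    addr = MSI_ADDR_BASE | ((dest_id & 0xFF) << 12)
    data = ((delivery_mode & 0x7) << 8) | (vector & 0xFF)
    return addr, data


def msi_decode(addr, data):
    return {
        "dest_id": (addr >> 12) & 0xFF,
        "delivery_mode": (data >> 8) & 0x7,
        "vector": data & 0xFF,
    }


def remap_msi(virt_addr, virt_data, vdest_to_pdest, phys_vector):
    """Derive the physical message from what the guest wrote (toy remapping)."""
    v = msi_decode(virt_addr, virt_data)
    return msi_compose(vdest_to_pdest[v["dest_id"]], phys_vector, v["delivery_mode"])


vdest_to_pdest = {0: 2, 1: 3}  # invented: guest LAPIC id -> physical LAPIC id

first = remap_msi(*msi_compose(0, 0x41), vdest_to_pdest, phys_vector=0x61)
second = remap_msi(*msi_compose(1, 0x41), vdest_to_pdest, phys_vector=0x61)
print(hex(first[0]), hex(second[0]))  # 0xfee02000 0xfee03000
```

Only the destination changed in the guest's entry, yet the physical address programmed into the device differs, which is why the PTIRQ entry has to be updated on such writes.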
DM Emulated PT PCI Devices - Modify MSI cfg
[Figure: same components as the Initialization slide; highlighted path: UOS PCI Dev -> HV -> IO request -> PT vPCI Dev in DM -> SOS vpci -> PTIRQ, steps 1 to 3]
1) UOS PCI Dev access to the cfg space traps to HV
2) HV raises an IO request to DM
3) DM notices the MSI cfg space access
❑ Directly modifies the MSI cfg based on the SOS vPCI dev
❑ SOS vPCI modifies the PTIRQ entry for the related MSI entry
** It's actually a long & obscure flow