PCI Passthrough and GICv3-ITS in Xen ARM
Manish Jaggi
Vijaya Kumar Kilari
Cavium, Inc. Proprietary
+ Demo on Dual Socket 48x2 Core ARMv8 Board
Agenda
 Status of Xen Support from Cavium
 Top Level Architecture
 Additions in Xen for PCI passthrough
 ITS architecture
– ARM specification
– Virtual ITS driver in Xen
 Xen NUMA Demo on Cavium ThunderX platform
 Questions
Status of Xen Support from Cavium
 Xen 4.5+ (Current)
– Demoed at Linaro Connect
– Initial NUMA support
 Xen 4.6
– Basic ThunderX platform support
– GICv3 support
 Xen 4.7
– vITS support
– PCI Passthrough patches in Xen and Linux
– NUMA Patches
Linaro Connect – Demo
Xen running on a single-socket 48-core ThunderX
ThunderX System Dual Socket Reference Platform
Standard Industry Form Factor:
½ SSI Motherboard
2U 19” Rack Mount Chassis
Volume Server I/O:
PCIe Gen3
10Gb or 40Gb Ethernet
Integrated SATA
Up to 128GB Memory
Full Systems Management w/ BMC and IPMI
http://cavium.com/pdfFiles/ThunderX_CRB_2S_Rev1.pdf
Xen NUMA running on dual socket 48x2 cores
[Diagram: dom0 and two domUs, each with vCPUs and a vITS, on the Xen hypervisor spanning Node 0 (48 cores) and Node 1 (48 cores), each node with its own DDR.]
Top Level Architecture
[Diagram: dom0 and a domU, each with vCPUs and a vITS backed by the Xen virtual ITS driver. PCIe endpoints EP1 and EP2 sit behind a PCIe host bridge; their MSI/X writes reach the GICv3 ITS, whose interrupt translation table maps (DeviceID, MSI_Index) => LPI, and Xen injects the corresponding vLPI into the guest. I/O virtualization uses the System MMU, mapping StreamID => ContextBank, where ContextBank = {…, Domain PageTable, …}. Memory accesses go through the DDR controller.]
Additions in xen-arm… (proposed / implemented)
 PCIe HostController support in Xen
– pci_conf_read/write calls handled by the host controller driver
– device tree based
 vITS emulation support
 Hypercall to map a Linux SegmentID to the appropriate PCI HostController
 xl-toolstack additions
– Mapping of GITS_ITRANSLATER space into the domain
– assign_device hypercall enhanced to support vDeviceID
 Frontend-backend changes
– No frontend-backend communication needed for MSI
– Frontend PCI bus msi-parent => its node in guest device tree
 SMMU additions
PCIe Host Controller support in Xen
The init function in the PCI host controller driver registers its host bridge callbacks:
int pci_hostbridge_register(pci_hostbridge_t *pcihb);
struct pci_hostbridge_ops {
    /* Config-space accessors implemented by the host controller driver. */
    u32  (*pci_conf_read)(struct pci_hostbridge *, u32 bus, u32 devfn,
                          u32 reg, u32 bytes);
    void (*pci_conf_write)(struct pci_hostbridge *, u32 bus, u32 devfn,
                           u32 reg, u32 bytes, u32 val);
};
struct pci_hostbridge {
    u32 segno;                      /* PCI segment number */
    paddr_t cfg_base;               /* config space base address */
    paddr_t cfg_size;               /* config space size */
    struct dt_device_node *dt_node; /* device tree node of the bridge */
    struct pci_hostbridge_ops ops;  /* config-space accessors */
    struct list_head list;          /* linked into the global bridge list */
};
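For illustration, a host controller driver might hook into this interface roughly as follows. This is a sketch only: the thunder_pcie_* names, the example cfg_base/cfg_size values, and the probe flow are assumptions; only pci_hostbridge_register() and the two structures above come from the proposal.

static u32 thunder_pcie_conf_read(struct pci_hostbridge *hb, u32 bus,
                                  u32 devfn, u32 reg, u32 bytes)
{
    /* Placeholder: would perform an ECAM read relative to hb->cfg_base. */
    return ~0U;
}

static void thunder_pcie_conf_write(struct pci_hostbridge *hb, u32 bus,
                                    u32 devfn, u32 reg, u32 bytes, u32 val)
{
    /* Placeholder: would perform an ECAM write relative to hb->cfg_base. */
}

static pci_hostbridge_t thunder_hb = {
    .segno    = 0,
    .cfg_base = 0x848000000000UL,  /* assumed example address */
    .cfg_size = 0x2000000,         /* assumed example size */
    .ops = {
        .pci_conf_read  = thunder_pcie_conf_read,
        .pci_conf_write = thunder_pcie_conf_write,
    },
};

static int __init thunder_pcie_probe(struct dt_device_node *node)
{
    thunder_hb.dt_node = node;     /* driver is device-tree probed */
    return pci_hostbridge_register(&thunder_hb);
}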
PHYSDEVOP_pci_host_bridge_add
#define PHYSDEVOP_pci_host_bridge_add    44
struct physdev_pci_host_bridge_add {
    /* IN */
    uint16_t seg;
    uint64_t cfg_base;
    uint64_t cfg_size;
};
This hypercall is invoked before dom0 invokes the PHYSDEVOP_pci_device_add
hypercall. The handler code invokes the function below to update the segment number in the matching pci_hostbridge:
int pci_hostbridge_setup(uint32_t segno, uint64_t cfg_base, uint64_t cfg_size);
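As a sketch of the dom0 side (assuming Linux's HYPERVISOR_physdev_op interface and that the new struct is exported through the public headers; the wrapper function itself is hypothetical):

static int xen_add_host_bridge(u16 seg, u64 cfg_base, u64 cfg_size)
{
    struct physdev_pci_host_bridge_add add = {
        .seg      = seg,
        .cfg_base = cfg_base,
        .cfg_size = cfg_size,
    };

    /* Must run before any PHYSDEVOP_pci_device_add calls for devices
     * behind this bridge, so Xen can resolve their segment numbers. */
    return HYPERVISOR_physdev_op(PHYSDEVOP_pci_host_bridge_add, &add);
}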
xl toolstack additions - DOMCTL
For a domU, while creating the domain, the toolstack reads the IPA from the
macro GITS_ITRANSLATER_SPACE in xen/include/public/arch-arm.h. The PA is
obtained from a new hypercall which returns the PA of the GITS_ITRANSLATER_SPACE.
The toolstack then issues a hypercall to create the stage-2 mapping.
Hypercall Details: XEN_DOMCTL_get_itranslater_space
/* XEN_DOMCTL_get_itranslater_space */
struct xen_domctl_get_itranslater_space {
    /* OUT variables. */
    uint64_aligned_t start_addr;
    uint64_aligned_t size;
};
typedef struct xen_domctl_get_itranslater_space xen_domctl_get_itranslater_space;
DEFINE_XEN_GUEST_HANDLE(xen_domctl_get_itranslater_space);
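A sketch of the toolstack side follows. xc_get_itranslater_space() is a hypothetical libxc wrapper around the new domctl, and GITS_ITRANSLATER_SPACE is assumed to expand to the guest IPA base; xc_domain_memory_mapping() is the existing libxc call for creating stage-2 mappings.

#include <xenctrl.h>

static int map_itranslater_space(xc_interface *xch, uint32_t domid)
{
    uint64_t pa, size;

    /* Hypothetical wrapper around XEN_DOMCTL_get_itranslater_space. */
    if (xc_get_itranslater_space(xch, &pa, &size))
        return -1;

    /* Stage-2 map the guest IPA (GITS_ITRANSLATER_SPACE from arch-arm.h)
     * onto the physical ITS translater frames. */
    return xc_domain_memory_mapping(xch, domid,
                                    GITS_ITRANSLATER_SPACE >> XC_PAGE_SHIFT,
                                    pa >> XC_PAGE_SHIFT,
                                    size >> XC_PAGE_SHIFT,
                                    DPCI_ADD_MAPPING);
}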
xl toolstack additions – device assignment: reserved areas in guest memory space
Part of the guest address space is reserved for mapping assigned PCI devices' BAR
regions. The toolstack is responsible for allocating ranges from this area and creating
the stage-2 mappings for the domain. This area is defined in public/arch-arm.h:
/* For 32bit BARs*/
#define GUEST_BAR_BASE_32 <<>>
#define GUEST_BAR_SIZE_32 <<>>
/* For 64bit BARs*/
#define GUEST_BAR_BASE_64 <<>>
#define GUEST_BAR_SIZE_64 <<>>
New entries in xenstore for device BARs
/local/domain/0/backend/pci/1/0
vdev-N
    BDF = ""
    BAR-0-IPA = ""
    BAR-0-PA = ""
    BAR-0-SIZE = ""
...
    BAR-M-IPA = ""
    BAR-M-PA = ""
    BAR-M-SIZE = ""
Hypercall Modification (XEN_DOMCTL_assign_device)
struct xen_domctl_assign_device {
    uint32_t dev;                      /* XEN_DOMCTL_DEV_* */
    union {
        struct {
            uint32_t machine_sbdf;     /* machine PCI ID of assigned device */
            uint32_t guest_sbdf;       /* guest PCI ID of assigned device */
        } pci;
        struct {
            uint32_t size;             /* Length of the path */
            XEN_GUEST_HANDLE_64(char) path; /* path to the device tree node */
        } dt;
    } u;
};
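A sketch of how a libxc-style wrapper might issue the extended domctl. DECLARE_DOMCTL and do_domctl() are libxc-internal helpers, and the PCI_SBDF() packing macro is an assumption added here for illustration; only the guest_sbdf field itself comes from the proposal.

#define PCI_SBDF(seg, bus, dev, fn) \
    (((uint32_t)(seg) << 16) | ((bus) << 8) | ((dev) << 3) | (fn))

static int assign_device_with_vsbdf(xc_interface *xch, uint32_t domid,
                                    uint32_t machine_sbdf, uint32_t guest_sbdf)
{
    DECLARE_DOMCTL;

    domctl.cmd = XEN_DOMCTL_assign_device;
    domctl.domain = domid;
    domctl.u.assign_device.dev = XEN_DOMCTL_DEV_PCI;
    domctl.u.assign_device.u.pci.machine_sbdf = machine_sbdf;
    domctl.u.assign_device.u.pci.guest_sbdf = guest_sbdf;  /* proposed field */

    return do_domctl(xch, &domctl);
}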
SMMU Code additions
iommu_ops functions, paired with the PCI physdev hypercalls that invoke them:
PHYSDEVOP_pci_add_device    => .add_device    = arm_smmu_add_dom0_dev
PHYSDEVOP_pci_remove_device => .remove_device = arm_smmu_remove_device
Mapping between streamID - deviceID - PCI sbdf - requesterID
 In the simple case, all of these are equal to the BDF.
 But some devices use a different requester ID for DMA transactions.
 Suggestions on how to handle this are welcome.
pci-frontend bus gicv3-its node binding for domU
 It is assumed that the toolstack generates a gicv3-its node in the domU device tree.
 As of now, the ARM PCI passthrough design supports device assignment only to guests that have gicv3-its support.
 All devices assigned to a domU are enumerated on a PCI frontend bus. On this bus the interrupt parent is set to gicv3-its for ARM systems.
 As the gicv3-its is emulated in Xen, all accesses by the domU driver are trapped. This enables configuration and direct injection of MSIs (LPIs) into the guest; frontend-backend communication for MSI is no longer required.
 Frontend-backend communication is required only for reading PCI configuration space by dom0 on behalf of domU.
ITS
 The Interrupt Translation Service (ITS) is the ARM specification for supporting PCI MSI(-X).
 MSI(-X) interrupts are handled as Locality-specific Peripheral Interrupts (LPIs), starting from IRQ number 8192.
 LPIs are delivered directly to the CPU.
 SW sends ITS commands such as MAPD, MAPVI, MOVI, INT, SYNC, and INV to the ITS HW to set up MSI(-X) translation.
 Command completion is notified by:
– Polling (see the sketch below)
– Interrupt notification, by placing an INT command
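For the polling case, completion can be detected by watching GITS_CREADR catch up with GITS_CWRITER. A minimal sketch, with register offsets from the GICv3 spec; the its_readq() MMIO accessor is an assumption:

#include <stdint.h>

#define GITS_CWRITER 0x0088  /* SW-written queue write offset */
#define GITS_CREADR  0x0090  /* HW-advanced queue read offset */

uint64_t its_readq(void *its_base, unsigned int offset); /* assumed accessor */

static void its_wait_for_completion(void *its_base)
{
    uint64_t cwriter = its_readq(its_base, GITS_CWRITER);

    /* The ITS advances CREADR as it consumes commands; once it reaches
     * CWRITER the queue has drained and all queued commands are done.
     * A real driver would mask the offset bits and bound the wait. */
    while (its_readq(its_base, GITS_CREADR) != cwriter)
        ;
}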
ITS HW-SW Interaction
[Diagram: software writes ITS commands into the command queue, described by the BASER, CWRITER, and CREADR registers; the ITS HW reads the commands and configures the device table, the per-device ITT tables, the LPI configuration table, and the per-CPU LPI pending tables. These structures are allocated by SW; some are used by HW alone and some by both SW and HW.]
Major challenges in virtualizing ITS
 ITS commands should be processed with minimal latency, without blocking a vCPU for a long duration
 All guests should get a fair share of time for processing their guest ITS commands
 A guest must not be able to DoS Xen by sending commands continuously
– Solution: do not send guest ITS commands to the HW; just emulate them
 Processing global ITS commands like SYNC, INVALL etc. on platforms with a multi-node ITS
– Solution: one virtual ITS per domain, and ignore the guest's SYNC, INVALL, and DISCARD commands
Major challenges in virtualizing ITS
 Handling guests that use the INT command for completion notification
– Solution: Xen injects the virtual LPI back into the guest when the INT command is emulated
ITS virtualization in XEN
 ITS Virtualization
– Command Queue virtualization
– LPI configuration table virtualization
– GITS registers virtualization
XEN ITS Initialization
[Diagram: Xen allocates the memory for the physical ITS command queue (BASER, CWRITER, CREADR) used by the ITS HW. (1) The guest sends the PHYSDEVOP_pci_device_add hypercall; (2) Xen allocates an ITT table for the device and sends a MAPD command to the ITS HW; (3) Xen allocates (physical) LPIs to the device and sends MAPVI commands.]
ITS command Virtualization
[Diagram: the guest allocates the memory for its virtual command queue, device table, and ITT tables. (1) A guest update of a command in the virtual queue traps to Xen; (2) Xen uses the guest's device table and ITT table memory to note down the guest's ITS command information.]
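A minimal sketch of the trap handler, assuming hypothetical helper names and structure layout; the point is that guest commands are consumed and emulated entirely in Xen, with only the virtual CREADR advancing:

#include <stdint.h>

#define ITS_CMD_SIZE 32           /* each ITS command is 32 bytes */

struct vits {
    uint64_t creadr;              /* emulated GITS_CREADR */
    uint64_t cwriter;             /* last guest-written GITS_CWRITER */
    uint64_t cmd_queue_size;      /* from the guest's GITS_CBASER */
};

/* Assumed helpers: copy one command out of the guest's virtual queue,
 * and emulate a single command (MAPD, MAPVI, INT, ...). */
int vits_read_guest_cmd(struct vits *v, uint64_t off, uint8_t cmd[ITS_CMD_SIZE]);
int vits_emulate_cmd(struct vits *v, const uint8_t cmd[ITS_CMD_SIZE]);

/* Called when a trapped guest write to GITS_CWRITER is decoded. */
static int vits_handle_cwriter_write(struct vits *v, uint64_t new_cwriter)
{
    v->cwriter = new_cwriter;

    /* Walk the virtual queue from CREADR to CWRITER, emulating each
     * command; nothing is forwarded to the physical ITS. */
    while (v->creadr != v->cwriter) {
        uint8_t cmd[ITS_CMD_SIZE];

        if (vits_read_guest_cmd(v, v->creadr, cmd) ||
            vits_emulate_cmd(v, cmd))
            return -1;

        /* Advance the virtual CREADR, wrapping at the queue end. */
        v->creadr = (v->creadr + ITS_CMD_SIZE) % v->cmd_queue_size;
    }
    return 0;  /* the guest now reads CREADR == CWRITER: completed */
}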
MAPD/MAPVI ITS command Virtualization
[Diagram: the guest allocates the memory for the virtual command queue, device table, and ITT tables. For MAPD (DevID, ITT IPA, Size): (1) Xen reads the MAPD command and finds the IPA and size of the ITT table for the devid; (2) Xen records the ITT IPA (8 bytes) and Size (8 bytes) in the guest's device table entry. For MAPVI (DevID, vID, Collection): (3) Xen reads the MAPVI command; (4) Xen uses the guest's device table to find the address of the ITT for the device and updates the ITT entry indexed by the event ID with the vLPI (vID) and collection ID.]
LPI Routing to Guest
[Diagram: the device table and ITT memory are allocated by the guest. (1) Xen receives a pLPI from the HW; (2) Xen queries the device table and gets the device's ITT table (ITT IPA and Size, 8 bytes each); (3) from the ITT table, Xen gets the virtual LPI (vLPI) and collection ID; (4) Xen injects the vLPI into the guest.]
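A sketch of that lookup path, with table layouts mirroring the diagrams above; all names, including the pLPI-to-device reverse map, are assumptions:

#include <stdint.h>

struct itt_entry {
    uint32_t vlpi;               /* virtual LPI written by MAPVI emulation */
    uint32_t collection_id;
};

struct vits_device {
    struct itt_entry *itt;       /* guest-allocated ITT, mapped by Xen */
    uint64_t itt_entries;        /* number of ITT entries */
};

struct domain;                   /* opaque here */

/* Assumed helpers maintained by the vITS driver. */
struct vits_device *device_table_lookup(struct domain *d, uint32_t devid);
int plpi_to_device_event(uint32_t plpi, uint32_t *devid, uint32_t *event);
void vgic_inject_lpi(struct domain *d, uint32_t vlpi);

static void do_lpi(struct domain *d, uint32_t plpi)
{
    uint32_t devid, event;
    struct vits_device *dev;

    if (plpi_to_device_event(plpi, &devid, &event))  /* (1) pLPI received */
        return;

    dev = device_table_lookup(d, devid);             /* (2) device table */
    if (!dev || event >= dev->itt_entries)
        return;

    vgic_inject_lpi(d, dev->itt[event].vlpi);        /* (3)+(4) ITT -> vLPI, inject */
}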
References:
 vITS Design doc
– http://xenbits.xen.org/people/ianc/vits/draftG.pdf
 Patches (22)
– http://osdir.com/ml/general/2015-07/msg35182.html
 PCI passthrough design doc
– http://www.gossamer-threads.com/lists/xen/devel/394962
Xen Dual (Socket/Node) NUMA Demo
[Diagram: dom0 and two domUs, each with vCPUs and a vITS, on the Xen hypervisor spanning Node 0 (48 cores) and Node 1 (48 cores), each node with its own DDR.]
 # xl list
Name          ID   Mem  VCPUs  State  Time(s)
Domain-0       0  2048      8  r-----   128.9
domu-node0     1  2048      4  -b----     1.4
domu-node1     2  2048      4  -b----     0.6
 # xl cpupool-list
Name        CPUs  Sched   Active  Domain count
Pool-node0    48  credit  y       2
Pool-node1    48  credit  y       1
 # xl cpupool-list -c
Name        CPU list
Pool-node0  0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47
Pool-node1  48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95
Questions
MSI-X Routing (backup)