XS Oracle 2009 Fujitsu
1. Improvement of the PCI pass-through
Jun Kamada <kama@jp.fujitsu.com>
Akio Takebe <takebe_akio@jp.fujitsu.com>
FUJITSU LIMITED
All Rights Reserved, Copyright (C) FUJITSU 2007 - 2009
2. Agenda
Background
Why SCSI ?
pvSCSI and PCI pass-through
Part 1: Current status of pvSCSI enhancements
Part 2: Booting a guest with PCI pass-through
3. Background
Why SCSI ? (1/2)
Backup to tape is a fundamental functionality
for reliability and availability.
Free to move (to safe place)
Long term preservation
[Figure: tape backup workflow. Storage is backed up to and restored from a tape drive; tape cartridges are loaded and unloaded, and can be moved to a safety box for preservation.]
Tape drives are usually controlled through SCSI commands.
SCSI support in guest VMs (issuing SCSI commands from the guest VM) is highly desirable in a virtualized environment.
4. Background
Why SCSI ? (2/2)
In a data center, reliability and availability features (e.g. hardware snapshot, tape backup) are provided through SCSI.
In a virtualized environment, these servers are consolidated onto a single server.
[Figure: data center example. A DB server (DBMS) and a backup server on a LAN issue SCSI commands over a SAN to RAID storage (hardware snapshot of data) and to a tape drive (load, unload, reset).]
SCSI support on guest VM is mandatory.
(Issuing SCSI command from guest VM)
5. Background
pvSCSI and PCI pass-through
We have developed the pvSCSI driver and will continue to enhance it.
Report the current status of the enhancements. (Part 1)
On the other hand, we also need to provide:
Reliability with hardware assistance (e.g. PCIe AER, …)
Seamless move between physical (P) and virtual (V) environments
We are focusing on SAN/PXE boot using VT-d/IOMMU.
Report the guest BIOS enhancements needed to provide SAN/PXE boot. (Part 2)
6. Part 1: Current status of pvSCSI enhancements
7. Current implementation (Xen 3.3.0)
The pvSCSI driver for Xen 3.3.0 provides:
LUN (Logical Unit Number) pass-through
LUN hot-plug
[Figure: LUN pass-through. In Dom0, physical LUNs (0:0:0:1, 0:1:2:3, 1:0:1:3) on the physical SCSI tree(s) are arbitrarily mapped to virtual LUNs (2:0:0:0, 2:0:0:1) on a virtual SCSI host (host=2), which is attached to the guest domain. Step labels in the figure: (1) Add, (2) Attach, (3) Add, (4) Appear immediately.]
8. Issue of current implementation
The current implementation provides a completely virtualized (arbitrarily mapped) SCSI tree to the guest domain.
It provides flexibility, but …
Some SCSI commands (REPORT_LUN, EXTENDED_COPY, …) must be emulated in the backend.
(They depend on the physical topology of the SCSI tree.)
Implementing emulation logic for all commands is a lot of work, so the current implementation supports only mandatory commands.
It does not support full SCSI functionality. :-(
9. How to solve the issue
1. Implement all emulation logic step by step.
Hard work.
May not be able to support some vendor-specific commands.
2. Add a new mode that attaches a whole HBA to the guest domain. (This allows the backend driver to bypass SCSI command emulation.)
Easy to implement. (Details are shown on the following slides.)
Can support all vendor-specific commands.
We took the second approach.
10. Posted implementation (1/2)
Additional implementation provides:
Host (HBA: Host Bus Adaptor) pass through
[Figure: Host pass-through. In Dom0, a virtual SCSI host (host=2) is (1) created for the physical SCSI host (host=0) and (2) attached to the guest domain. Physical LUNs 0:0:0:1 and 0:1:2:3 appear as virtual LUNs 2:0:0:1 and 2:1:2:3. The channel:target:lun part (underlined in the original figure) is the same; only the host number differs.]
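The difference between the two modes can be sketched as an address translation. This is a minimal Python illustration, not the driver's code: `ScsiAddr`, `to_virtual_host_mode`, and `to_virtual_lun_mode` are hypothetical names, and the LUN-mode table in the usage note is an invented example, since the slides do not say which physical LUN maps to which virtual LUN.

```python
from typing import Dict, NamedTuple


class ScsiAddr(NamedTuple):
    """A SCSI address in host:channel:target:lun form."""
    host: int
    channel: int
    target: int
    lun: int

    @classmethod
    def parse(cls, text: str) -> "ScsiAddr":
        host, channel, target, lun = (int(part) for part in text.split(":"))
        return cls(host, channel, target, lun)

    def __str__(self) -> str:
        return f"{self.host}:{self.channel}:{self.target}:{self.lun}"


def to_virtual_host_mode(phys: ScsiAddr, virtual_host: int) -> ScsiAddr:
    """Host mode: only the host number changes; channel:target:lun is kept."""
    return phys._replace(host=virtual_host)


def to_virtual_lun_mode(phys: ScsiAddr,
                        mapping: Dict[ScsiAddr, ScsiAddr]) -> ScsiAddr:
    """LUN mode: every LUN is mapped individually through an explicit table."""
    return mapping[phys]
```

In host mode, `to_virtual_host_mode(ScsiAddr.parse("0:1:2:3"), 2)` gives `2:1:2:3`, matching the figure. In LUN mode the result depends entirely on the table, for example a hypothetical `{ScsiAddr.parse("1:0:1:3"): ScsiAddr.parse("2:0:0:0")}`.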
11. Posted implementation (2/2)
The following modifications were actually needed:
Backend Driver
LUN/Host mode identification flag for each virtual SCSI tree
Emulation bypassing logic (if the flag shows “Host mode”)
Frontend Driver
No need to modify
xend
User interface (in order to specify “Host mode”)
LUN scan logic (shorter processing time by using the “lsscsi” command, if it exists; requested by the community)
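The backend change amounts to a per-tree mode flag that short-circuits emulation. The sketch below illustrates that idea in Python under stated assumptions: `Mode`, `route_command`, and the returned strings are hypothetical and are not the backend driver's actual interface. The two opcodes are the standard SCSI values for REPORT LUNS (0xA0) and EXTENDED COPY (0x83).

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical per-virtual-SCSI-tree flag."""
    LUN = "lun"
    HOST = "host"


# Commands whose results depend on the physical topology of the SCSI tree.
REPORT_LUNS = 0xA0
EXTENDED_COPY = 0x83
TOPOLOGY_DEPENDENT_OPCODES = frozenset({REPORT_LUNS, EXTENDED_COPY})


def route_command(mode: Mode, cdb: bytes) -> str:
    """Decide how the backend handles one SCSI command descriptor block."""
    if not cdb:
        raise ValueError("empty CDB")
    if mode is Mode.HOST:
        # Host mode: the guest sees the real topology, so nothing is emulated.
        return "pass-through"
    # LUN mode: topology-dependent commands need emulation in the backend.
    if cdb[0] in TOPOLOGY_DEPENDENT_OPCODES:
        return "emulate"
    return "pass-through"
```

With this routing, a REPORT LUNS command goes to emulation only when the tree is in LUN mode; in host mode every command, including vendor-specific ones, reaches the device unchanged.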
12. Conclusion (Part 1)
We posted a series of patches last week, and they have already been merged into the unstable tree. (Thanks!)
Please try them and evaluate them. Comments are much appreciated.
Thanks
13. Part 2: Booting a guest with PCI pass-through
14. Introduction
What is the problem with booting from a PCI pass-through device?
[Figure: Before: the guest boots from an emulated disk provided by qemu in dom0 and uses the pass-through disk only as a data disk. After: the guest uses the pass-through disk as both boot disk and data disk.]
15. Contents of Part 2
What is required for SAN/SAS boot
Details of the requirements
Status of the requirements
Sample
Other challenge (PXE boot)
Some concerns
16. What is required for SAN/SAS boot?
• int 0x13 handler of the pass-through device
(support for the BCV-style calling convention)
• BIOS function
• PMM (POST Memory Manager) service
• PnP runtime function
• IPL/BCV table
BCV: Boot Connection Vector. It is typically used by SCSI controllers.
17. Details of the requirements
18. Calling PCI expansion ROM (1/3)
[Figure: the BIOS needs to read the MBR of a boot disk. It calls int 0x13, which is served by the IDE disk handler, CD-ROM handler, FDD handler, or PnP device handler.]
What is BCV style?
A BCV is a pointer to code inside the Expansion ROM. Using this code, PCI cards that support the BCV style of the BIOS Boot Specification can hook INT 0x13 during device initialization. The BIOS can then access the hard disk connected to the PCI card through that special INT 0x13 handler.
19. Calling PCI expansion ROM (2/3)
How to initialize the Expansion ROM
ROM header layout:
Offset  Field
0h      Signature 0xaa55
2h      Image size
3h      Entry point for INIT function (jmp <address>)
6h      Reserved
18h     Pointer to PCI data structure
1Ah     Pointer to PnP Expansion Header
Image contents: ROM header, PCI data structure, PnP Expansion Header
Memory region shown in the figure: 0xc0000 to 0xea000
Steps:
1. Hvmloader maps the Expansion ROM to 0xc0000.
2. Hvmloader and rombios check some data.
3. rombios jumps to the entry point for the INIT function after loading the ax register with the bus:dev:function number.
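The header checks and the ax value from this slide can be sketched in Python. This is an illustration only, not hvmloader or rombios code: `RomHeader`, `parse_rom_header`, and `bdf_to_ax` are hypothetical names. It assumes the conventional option ROM layout, where the image size byte counts 512-byte blocks, and the usual ax encoding with the bus in AH, the device in the upper five bits of AL, and the function in the lower three bits.

```python
import struct
from typing import NamedTuple

ROM_SIGNATURE = 0xAA55  # stored little-endian, i.e. bytes 0x55 0xAA


class RomHeader(NamedTuple):
    image_size: int         # in bytes
    init_entry: int         # offset of the INIT entry point (always 3h)
    pci_data_offset: int    # from offset 18h
    pnp_header_offset: int  # from offset 1Ah, 0 if absent


def parse_rom_header(image: bytes) -> RomHeader:
    """Validate the ROM signature and extract the fields shown on the slide."""
    if len(image) < 0x1C:
        raise ValueError("image too short for a ROM header")
    (signature,) = struct.unpack_from("<H", image, 0x00)
    if signature != ROM_SIGNATURE:
        raise ValueError(f"bad ROM signature 0x{signature:04x}")
    size_blocks = image[0x02]  # assumption: units of 512 bytes
    pci_data, pnp = struct.unpack_from("<HH", image, 0x18)
    return RomHeader(size_blocks * 512, 0x03, pci_data, pnp)


def bdf_to_ax(bus: int, dev: int, func: int) -> int:
    """Encode bus:dev:function into the 16-bit value passed in ax."""
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= func <= 0x07):
        raise ValueError("bus/dev/func out of range")
    return (bus << 8) | (dev << 3) | func
```

For a device at 01:02.3, `bdf_to_ax(1, 2, 3)` yields `0x0113`.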
20. Calling PCI expansion ROM (3/3)
How to initialize the Expansion ROM
ROM header layout:
Offset  Field
0h      Signature 0xaa55
2h      Image size
3h      Entry point for INIT function (jmp <address>)
6h      Reserved
18h     Pointer to PCI data structure
1Ah     Pointer to PnP Expansion Header
PnP Expansion Header layout:
Offset  Field
0h      Signature $PnP
06h     Offset of next header (0000h is none)
09h     Checksum
16h     BCV (code to hook INT 0x13h)
Image contents: ROM header, PCI data structure, PnP Expansion Header, next PnP Expansion Header
Memory region shown in the figure: 0xc0000 to 0xea000
Step:
4. Jump to the BCV to hook INT 0x13h.
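Walking the chain of PnP Expansion Headers to find each BCV can be sketched as follows. This is a Python illustration, not rombios code, and `iter_pnp_headers` is a hypothetical name. The offsets 06h, 09h, and 16h come from the slide; the length byte at offset 05h (in 16-byte units) and the rule that all header bytes sum to zero are taken from the Plug and Play BIOS Specification and are not shown on the slide.

```python
import struct
from typing import Iterator, Tuple


def iter_pnp_headers(image: bytes, first_offset: int) -> Iterator[Tuple[int, int]]:
    """Yield (header_offset, bcv_offset) for each $PnP expansion header.

    A bcv_offset of 0 means the header carries no Boot Connection Vector.
    """
    offset = first_offset
    seen = set()  # guards against a corrupted, cyclic chain
    while offset != 0 and offset not in seen:
        seen.add(offset)
        if image[offset:offset + 4] != b"$PnP":
            raise ValueError(f"no $PnP signature at 0x{offset:x}")
        length = image[offset + 0x05] * 16  # assumption: 16-byte units
        if length < 0x18 or offset + length > len(image):
            raise ValueError(f"bad header length at 0x{offset:x}")
        if sum(image[offset:offset + length]) & 0xFF != 0:
            raise ValueError(f"bad checksum at 0x{offset:x}")
        (next_offset,) = struct.unpack_from("<H", image, offset + 0x06)
        (bcv,) = struct.unpack_from("<H", image, offset + 0x16)
        yield offset, bcv
        offset = next_offset
```

A BIOS following step 4 would call the code at each non-zero BCV offset (relative to the ROM segment) so the card can install its INT 0x13 hook.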
21. PMM service?
The PMM provides memory allocation only during POST.
PCI expansion ROMs use the PMM service. For example, a PCI expansion ROM needs a memory block to decompress its code and to allocate a data area that is used only during initialization.
22. PnP runtime function?
PnP runtime functions are used by the OS and by application programs. They allow them to access BIOS features (get version, number of devices, …).
A PCI expansion ROM may check only the PnP Installation Check structure to determine whether the system has a Plug and Play BIOS.
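The check an expansion ROM performs can be sketched as a scan for the Installation Check structure. This Python illustration is not taken from any ROM or from rombios, and `find_pnp_installation_check` is a hypothetical name. It relies on details from the Plug and Play BIOS Specification rather than from the slides: the structure starts with "$PnP" on a 16-byte boundary in the 0xF0000 to 0xFFFFF range, its length byte is at offset 05h (at least 21h), and all of its bytes sum to zero.

```python
from typing import Optional


def find_pnp_installation_check(bios: bytes, base: int = 0xF0000) -> Optional[int]:
    """Return the physical address of a valid $PnP structure, or None.

    `bios` is assumed to be a copy of the memory starting at `base`.
    """
    min_length = 0x21
    for offset in range(0, len(bios) - min_length + 1, 16):
        if bios[offset:offset + 4] != b"$PnP":
            continue
        length = bios[offset + 0x05]
        if length < min_length or offset + length > len(bios):
            continue
        if sum(bios[offset:offset + length]) & 0xFF == 0:
            return base + offset
    return None
```

A guest BIOS that provides a structure passing this scan would satisfy a ROM that checks nothing else.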
23. IPL/BCV table?
IPL table/IPL priority
The IPL table and IPL priority decide the order in which devices are selected for booting.
In Xen, they are configured with a setting like “boot=cda” in the guest configuration file.
BCV table/BCV priority
The BCV table and BCV priority decide the order in which devices are selected for installing their INT 0x13 handlers.
This order affects the boot order.
24. Example of Boot Order
IPL table:
1  HDD disk
2  CD-ROM
3  Network
4  Floppy
BCV table:
1  IDE disk
2  Additional PCI card (e.g. SCSI card)
IPL priority: 4, 2, 1, 3
BCV priority: 2, 1
Resulting boot order:
Floppy (IPL 4)
CD-ROM (IPL 2)
Additional PCI card (IPL 1, BCV 2)
IDE disk (IPL 1, BCV 1)
Network (IPL 3)
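The way the two tables combine in this example can be written out as a short Python sketch. The function name `boot_order` and its parameters are hypothetical; the rule it encodes is the one the example implies: the IPL priority is walked in order, and the hard disk entry expands into the BCV devices in BCV priority order.

```python
from typing import Dict, List


def boot_order(ipl_table: Dict[int, str], ipl_priority: List[int],
               bcv_table: Dict[int, str], bcv_priority: List[int],
               hdd_entry: str = "HDD disk") -> List[str]:
    """Expand the IPL priority list into a flat list of boot devices."""
    order: List[str] = []
    for ipl_index in ipl_priority:
        device = ipl_table[ipl_index]
        if device == hdd_entry:
            # The hard disk slot stands for all BCV devices, in BCV priority.
            order.extend(bcv_table[i] for i in bcv_priority)
        else:
            order.append(device)
    return order


ipl = {1: "HDD disk", 2: "CD-ROM", 3: "Network", 4: "Floppy"}
bcv = {1: "IDE disk", 2: "Additional PCI card"}
print(boot_order(ipl, [4, 2, 1, 3], bcv, [2, 1]))
# ['Floppy', 'CD-ROM', 'Additional PCI card', 'IDE disk', 'Network']
```

Swapping the BCV priority to `[1, 2]` would put the IDE disk ahead of the additional PCI card without touching the IPL priority.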
25. Status of the requirements
26. Status (1/3)
Calling convention
We support the BCV-style calling convention of the BIOS Boot Specification.
BCV covers not only PCI devices but also ISA devices.
However, the IOMMU does not support ISA devices.
So we support the BCV-style calling convention only for PCI devices.
27. Status (2/3)
BIOS function
PMM is needed by some PCI cards.
PMM support has already been implemented by Kouya Shimura.
PnP runtime functions do not seem to be called (in my experience).
But we need to provide a dummy PnP runtime function, because some Expansion ROMs may check only whether PnP runtime functions are supported.
The dummy PnP runtime function is easy to support.
In the Bochs community, Sebastian Herbszt has already posted a patch.
28. Status (3/3)
BCV table/BCV priority
How to support BCV priority for pass-through devices on Xen?
A) If there are no emulated disks, boot from a pass-through device.
B) If a pass-through device is specified as bootable, only that device's expansion ROM is loaded.
For example, pci= [ “bb:dd.ff,boot=1” ]
C) Enhance the IPL table. If a pass-through device is specified in the boot order, the pass-through device with the boot=1 option is selected as the boot device.
For example, boot=“p”.
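How options B and C could fit together can be sketched in Python. Everything here is hypothetical: the slide only proposes the `boot=1` suffix and the `p` boot letter, and `parse_pci_entry`, `select_boot_device`, and the sample addresses are invented for illustration and are not xend's parser. The sketch accepts only the short bus:device.function form.

```python
import re
from typing import Dict, List, NamedTuple, Optional


class PciAssignment(NamedTuple):
    bdf: str
    options: Dict[str, str]


_BDF = re.compile(r"^[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$")


def parse_pci_entry(entry: str) -> PciAssignment:
    """Split 'bb:dd.f,key=value,...' into the address and its options."""
    bdf, *opts = [part.strip() for part in entry.split(",")]
    if not _BDF.match(bdf):
        raise ValueError(f"not a bus:dev.func address: {bdf!r}")
    options = dict(opt.split("=", 1) for opt in opts)
    return PciAssignment(bdf, options)


def select_boot_device(pci_entries: List[str], boot: str) -> Optional[str]:
    """Return the pass-through device to boot from, if 'p' is in the boot order."""
    if "p" not in boot:
        return None
    for entry in pci_entries:
        assignment = parse_pci_entry(entry)
        if assignment.options.get("boot") == "1":
            return assignment.bdf
    return None
```

For instance, with the made-up entries `["01:00.0,boot=1", "02:00.0"]` and `boot="pc"`, the sketch selects `01:00.0`; with `boot="cda"` it selects nothing.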
29. Sample
30. Other challenge
(PXE boot with PCI pass-through)
31. Issue of PXE boot
Expansion ROM of Ethernet
Most PnP Ethernet devices do not carry an Expansion ROM image themselves.
So we tried to use gPXE to boot from a pass-through device.
32. Try PXE boot with gPXE
Configuration & Hack
Comment out the device number check in hvmloader.
Do not specify an emulated NIC.
Specify only the pass-through NIC.
Recompile gPXE with the driver for the device and remake eb-roms.h.
33. Result…
gPXE may not support the NIC.
gPXE may check the device ID, vendor ID, and so on internally.
More debugging is needed…
34. Some concerns
Lack of I/O port space
Boot devices use I/O ports, but the I/O port space is only 64 KB.
MMIO problem
See docs/misc/vtd.txt (Assigning devices to HVM domains)
Dependency between functions of multifunction devices
Some multifunction devices do not work when only a single function is passed to the guest.
pci.hide option
If we use many pass-through devices, the pci.hide option becomes very long…
35. Q&A
Any questions?
36. This work was partly funded by the Ministry of Economy, Trade and Industry (METI) of Japan as the Secure Platform project of the Association of Super-Advanced Electronics Technologies (ASET).
37. Thank you