XS Oracle 2009 Fujitsu

PCI Pass-through

Transcript

  • 1. Improvement of the PCI pass-through Jun Kamada <kama@jp.fujitsu.com> Akio Takebe <takebe_akio@jp.fujitsu.com> FUJITSU LIMITED All Rights Reserved, Copyright (C) FUJITSU 2007 - 2009
  • 2. Agenda. Background: Why SCSI?; pvSCSI and PCI pass-through. Part 1: Current status of pvSCSI enhancements. Part 2: Booting a guest with PCI pass-through.
  • 3. Background: Why SCSI? (1/2) Backup to tape is fundamental to reliability and availability: cartridges can be moved freely to a safe place and preserved for the long term. [Figure: storage is backed up to and restored from a tape drive; tape cartridges are loaded, unloaded, and moved to a safety box for preservation.] Tape drives are usually controlled through SCSI, so SCSI support in the guest VM (issuing SCSI commands from the guest VM) is highly desired in a virtualized environment.
  • 4. Background: Why SCSI? (2/2) In the data center, reliability and availability features (e.g. hardware snapshot, tape backup) are provided through SCSI. In a virtualized environment these servers are consolidated onto a single server. [Figure: a DBMS server and a backup server on a LAN issue SCSI commands over the SAN to RAID storage (hardware snapshot of data files) and to a tape drive (load, unload, reset).] SCSI support in the guest VM (issuing SCSI commands from the guest VM) is therefore mandatory.
  • 5. Background: pvSCSI and PCI pass-through. We have developed the pvSCSI driver and will continue to enhance it; Part 1 reports the current status of that enhancement. We also need to provide reliability with hardware assistance (e.g. PCIe AER, …) and seamless moves between physical (P) and virtual (V) machines. For this we are focusing on SAN/PXE boot using VT-d/IOMMU; Part 2 reports the guest BIOS enhancements needed to provide SAN/PXE boot.
  • 6. Part 1: Current status of pvSCSI enhancements
  • 7. Current implementation (Xen 3.3.0). The pvSCSI driver for Xen 3.3.0 provides LUN (Logical Unit Number) pass-through and LUN hot-plug. [Figure: Dom0 holds the physical SCSI host (host=0) and a virtual SCSI host (host=2); the guest domain sees the same virtual SCSI host (host=2). Physical LUNs such as 0:0:0:1, 0:1:2:3, and 1:0:1:3 in the physical SCSI tree(s) are mapped arbitrarily to virtual LUNs 2:0:0:0 and 2:0:0:1 in the virtual SCSI tree. Steps labeled in the figure: (1) Add, (2) Attach, (3) Add, (4) Appear immediately.]
  • 8. Issue with the current implementation. The current implementation presents a completely virtualized (arbitrarily mapped) SCSI tree to the guest domain. This is flexible, but some SCSI commands (REPORT_LUN, EXTENDED_COPY, …) must be emulated in the backend because they depend on the physical topology of the SCSI tree. Implementing emulation logic for every command is a lot of work, so the current implementation supports only the mandatory commands and does not provide full SCSI functionality. :-(
  • 9. How to solve the issue. 1. Implement all the emulation logic step by step: hard work, and some vendor-specific commands probably still could not be supported. 2. Add a new mode that attaches a whole HBA to the guest domain, which lets the backend driver bypass SCSI command emulation: easy to implement (details on the following slides) and able to support all vendor-specific commands. We took the second approach.
  • 10. Posted implementation (1/2). The additional implementation provides host (HBA: Host Bus Adapter) pass-through. [Figure: Dom0 holds the physical SCSI host (host=0) and a virtual SCSI host (host=2), which the guest domain also sees. Steps: (1) Create, (2) Attach. Physical LUNs 0:0:0:1 and 0:1:2:3 appear as virtual LUNs 2:0:0:1 and 2:1:2:3; the underlined part of each ID (everything after the host number) stays the same.]
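The ID rule in the figure can be illustrated with a minimal Python sketch. It is not the pvSCSI backend code; the function names `parse_hctl`, `format_hctl`, and `host_mode_id` are hypothetical, and the rule "replace the host number, keep channel:target:LUN" is read off the example IDs on the slide.

```python
def parse_hctl(text):
    """Split a 'host:channel:target:lun' string into four integers."""
    parts = tuple(int(p) for p in text.split(":"))
    if len(parts) != 4:
        raise ValueError("expected host:channel:target:lun, got %r" % text)
    return parts


def format_hctl(hctl):
    """Join four integers back into 'host:channel:target:lun'."""
    return ":".join(str(n) for n in hctl)


def host_mode_id(virtual_host, physical_id):
    """Host mode (assumed rule): only the host number is replaced;
    channel, target, and LUN are carried over unchanged."""
    _host, channel, target, lun = parse_hctl(physical_id)
    return format_hctl((virtual_host, channel, target, lun))


if __name__ == "__main__":
    for physical in ("0:0:0:1", "0:1:2:3"):
        print(physical, "->", host_mode_id(2, physical))
```

In LUN mode, by contrast, the mapping is an arbitrary table supplied by the administrator, so no such rule applies.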
  • 11. Posted implementation (2/2). The following modifications were actually needed. Backend driver: a LUN/Host mode identification flag for each virtual SCSI tree, and emulation-bypass logic (used if the flag indicates "Host mode"). Frontend driver: no modification needed. xend: a user interface for specifying "Host mode", and LUN scan logic that shortens processing time by using the "lsscsi" command if it exists (a community request).
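How a per-tree mode flag can bypass emulation is sketched below in Python. This is an illustration under assumptions, not the real backend driver (which is kernel C code): the names `Mode`, `TOPOLOGY_DEPENDENT`, and `dispatch` are hypothetical, and the command set contains only the two commands named on the earlier slide.

```python
from enum import Enum


class Mode(Enum):
    LUN = "lun"    # arbitrarily mapped virtual SCSI tree
    HOST = "host"  # whole HBA attached to the guest


# Commands the slides name as depending on the physical SCSI topology.
TOPOLOGY_DEPENDENT = {"REPORT_LUN", "EXTENDED_COPY"}


def dispatch(mode, command, emulate, forward):
    """Route one guest command according to the tree's mode flag."""
    if mode is Mode.HOST:
        # Host mode: the guest sees the real topology, so nothing is emulated.
        return forward(command)
    if command in TOPOLOGY_DEPENDENT:
        # LUN mode: the answer must reflect the virtual tree, not the real one.
        return emulate(command)
    return forward(command)


if __name__ == "__main__":
    emulate = lambda c: ("emulated", c)
    forward = lambda c: ("forwarded", c)
    print(dispatch(Mode.LUN, "REPORT_LUN", emulate, forward))
    print(dispatch(Mode.HOST, "REPORT_LUN", emulate, forward))
```

Keeping the flag per virtual SCSI tree, as the slide describes, means LUN-mode and Host-mode trees can coexist on the same backend.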
  • 12. Conclusion (Part 1). We posted a series of patches last week, and they have already been merged into the unstable tree. (Thanks!) Please try and evaluate them; comments are very welcome. Thanks.
  • 13. Part 2: Booting a guest with PCI pass-through
  • 14. Introduction. What is the problem with booting from a PCI pass-through device? [Figure: Before: the guest boots from a disk emulated by qemu in dom0, and the pass-through disk serves only as a data disk. After: the pass-through disk serves as both boot disk and data disk.]
  • 15. Contents of Part 2: what is required for SAN/SAS boot; details of the requirements; status of the requirements; sample; another challenge (PXE boot); some concerns.
  • 16. What is required for SAN/SAS boot? • An int 0x13 handler for the pass-through device (supporting the BCV-style calling convention) • BIOS functions: the PMM (POST Memory Manager) service and PnP runtime functions • The IPL/BCV table. BCV: Boot Connection Vector, typically used by SCSI controllers.
  • 17. Details of the requirements
  • 18. Calling PCI expansion ROM (1/3). The BIOS needs to read the MBR of the boot disk through int 0x13. [Figure: the BIOS int 0x13 entry dispatches to the IDE disk handler, CD-ROM handler, FDD handler, or a PnP device handler.] What is BCV style? A BCV is a pointer to code inside the expansion ROM. PCI cards that support the BCV style of the boot specification use this code to hook INT 0x13 during device initialization. The BIOS can then access the hard disk connected to the card through that card-specific INT 0x13 handler.
  • 19. Calling PCI expansion ROM (2/3). How to initialize the expansion ROM: 1. hvmloader maps the expansion ROM at 0xc0000. 2. hvmloader and rombios check some data in the ROM header. 3. rombios jumps to the entry point of the INIT function after loading the ax register with the bus:dev:function number. [Figure: ROM header layout: 0h signature 0xaa55; 2h image size; 3h entry point for the INIT function (jmp <address>); 6h reserved; 18h pointer to the PCI data structure; 1Ah pointer to the PnP expansion header. The image contains the ROM header, the PCI data structure, and the PnP expansion header; the figure marks addresses 0xc0000 and 0xea000.]
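The slide says only that ax carries the bus:dev:function number. A minimal sketch of one way to pack that value follows, assuming the conventional PCI firmware layout (bus in AH, device in the upper five bits of AL, function in the lower three bits); the function name `encode_bdf_ax` is hypothetical and this is not rombios code.

```python
def encode_bdf_ax(bus, device, function):
    """Pack bus:dev:function into a 16-bit value for the AX register.

    Assumed layout: AH = bus, AL[7:3] = device, AL[2:0] = function.
    """
    if not 0 <= bus <= 0xFF:
        raise ValueError("bus must fit in 8 bits")
    if not 0 <= device <= 0x1F:
        raise ValueError("device must fit in 5 bits")
    if not 0 <= function <= 0x7:
        raise ValueError("function must fit in 3 bits")
    return (bus << 8) | (device << 3) | function


if __name__ == "__main__":
    # Hypothetical device 01:00.0
    print(hex(encode_bdf_ax(0x01, 0x00, 0x0)))
```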
  • 20. Calling PCI expansion ROM (3/3). How to initialize the expansion ROM (continued): 4. Jump to the BCV to hook INT 0x13. [Figure: the ROM header as on the previous slide, with the 1Ah field pointing to the PnP expansion header. PnP expansion header layout: 0h signature $PnP; 06h offset of the next header (0000h if none); 09h checksum; 16h BCV, pointing to the code that hooks INT 0x13. The figure marks addresses 0xc0000 and 0xea000.]
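The header walk described on these slides can be sketched in Python as follows. The field offsets (0h, 2h, 18h, 1Ah, and 06h/16h inside the PnP expansion header) come from the slides; everything else is an assumption for illustration: the function name `parse_option_rom`, the interpretation of the image size byte as 512-byte units, and the decision not to verify the 09h checksum. It is not the hvmloader or rombios implementation.

```python
import struct


def parse_option_rom(image):
    """Return the BCV offsets found by walking the PnP expansion headers."""
    signature, size_units = struct.unpack_from("<HB", image, 0x00)
    if signature != 0xAA55:
        raise ValueError("missing 0xAA55 expansion ROM signature")

    # 18h: pointer to PCI data structure, 1Ah: pointer to PnP expansion header
    pci_data_offset, pnp_offset = struct.unpack_from("<HH", image, 0x18)

    bcv_offsets = []
    visited = set()
    while pnp_offset and pnp_offset not in visited:
        visited.add(pnp_offset)  # guard against a looping header chain
        if image[pnp_offset:pnp_offset + 4] != b"$PnP":
            break
        (next_offset,) = struct.unpack_from("<H", image, pnp_offset + 0x06)
        (bcv,) = struct.unpack_from("<H", image, pnp_offset + 0x16)
        if bcv:
            bcv_offsets.append(bcv)
        pnp_offset = next_offset  # 0000h means no further header

    return {
        "image_size": size_units * 512,  # assumed unit: 512-byte blocks
        "pci_data_offset": pci_data_offset,
        "bcv_offsets": bcv_offsets,
    }


if __name__ == "__main__":
    # Build a synthetic 1 KiB ROM image with one PnP expansion header.
    rom = bytearray(1024)
    struct.pack_into("<HB", rom, 0x00, 0xAA55, 2)
    struct.pack_into("<HH", rom, 0x18, 0x0040, 0x0060)
    rom[0x60:0x64] = b"$PnP"
    struct.pack_into("<H", rom, 0x60 + 0x16, 0x0123)
    print(parse_option_rom(bytes(rom)))
```

A BIOS would then far-call each BCV offset within the ROM segment so the card can install its INT 0x13 hook.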
  • 21. PMM service? The PMM provides memory allocation during POST only. PCI expansion ROMs use the PMM service: for example, an expansion ROM may need a memory block in which to decompress its code, or a data area that is used only during initialization.
  • 22. PnP runtime functions? PnP runtime functions are used by the OS and by application programs to access BIOS features (get the version, the number of devices, …). A PCI expansion ROM may check only the PnP Installation Check structure to determine whether the system has a Plug and Play BIOS.
  • 23. IPL/BCV table? IPL table/IPL priority: decides the order in which devices are selected for booting. In Xen, this is configured with a setting such as “boot=cda” in the guest configuration file. BCV table/BCV priority: decides the order in which devices are selected for installing their INT 0x13 handlers. This order affects the boot order.
  • 24. Example of boot order. IPL table: 1 HDD, 2 CD-ROM, 3 Network, 4 Floppy. BCV table: 1 IDE disk, 2 additional PCI card (e.g. SCSI card). IPL priority: 4, 2, 1, 3. BCV priority: 2, 1. Resulting boot order: Floppy (IPL 4), CD-ROM (IPL 2), additional PCI card (IPL 1, BCV 2), IDE disk (IPL 1, BCV 1), Network (IPL 3).
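The way the two priority lists combine in this example can be sketched in Python. The assumption, inferred from the slide's result rather than stated on it, is that the HDD entry of the IPL table expands in place into the BCV devices in BCV priority order; `boot_order` and its parameters are hypothetical names.

```python
def boot_order(ipl_table, ipl_priority, bcv_table, bcv_priority, hdd_entry=1):
    """Expand the HDD slot of the IPL order into the BCV-ordered disks."""
    order = []
    for ipl_index in ipl_priority:
        if ipl_index == hdd_entry:
            # The single "HDD" IPL entry stands for every BCV device.
            order.extend(bcv_table[bcv_index] for bcv_index in bcv_priority)
        else:
            order.append(ipl_table[ipl_index])
    return order


if __name__ == "__main__":
    ipl_table = {1: "HDD", 2: "CD-ROM", 3: "Network", 4: "Floppy"}
    bcv_table = {1: "IDE disk", 2: "Additional PCI card"}
    for device in boot_order(ipl_table, [4, 2, 1, 3], bcv_table, [2, 1]):
        print(device)
```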
  • 25. Status of the requirements
  • 26. Status (1/3). Calling convention: we added support for the BCV-style calling convention to the BIOS. In the BIOS Boot Specification, BCV covers ISA devices as well as PCI devices, but the IOMMU does not support ISA devices, so we support the BCV-style calling convention for PCI devices only.
  • 27. Status (2/3). BIOS functions: some PCI cards need PMM, and PMM support has already been implemented by Kouya Shimura. In my experience the PnP runtime functions are not actually called, but we need to provide dummy PnP runtime functions because some expansion ROMs may check only whether PnP runtime functions are supported. Dummy PnP runtime functions are easy to provide; in the Bochs community, Sebastian Herbszt has already posted a patch.
  • 28. Status (3/3). BCV table/BCV priority: how should BCV priority be supported for pass-through devices on Xen? A) If there are no emulated disks, boot from a pass-through device. B) If a pass-through device is specified as bootable, load the expansion ROM of that device only. For example, pci= [ “bb:dd.ff,boot=1” ]. C) Enhance the IPL table: if a pass-through device is specified in the boot order, the pass-through device with the boot=1 option is selected as the boot device. For example, boot=“p”.
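Options B and C could look like the following fragment of a guest configuration file, which uses Python syntax. This is a sketch of the syntax proposed on the slide only: the `boot=1` device option and the `p` boot letter are proposals from this talk and may not exist in any released Xen, and the device address `01:00.0` is a hypothetical placeholder.

```python
# Hypothetical guest configuration fragment (proposed syntax, not confirmed
# to be accepted by any released xend).

# Option B: mark one pass-through device as bootable so that only its
# expansion ROM is loaded. "01:00.0" is a placeholder bus:dev.func address.
pci = ["01:00.0,boot=1"]

# Option C: put the pass-through device ("p") in the boot order, here tried
# before the emulated hard disk ("c") and the CD-ROM ("d").
boot = "pcd"
```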
  • 29. Sample
  • 30. Other challenge (PXE boot with PCI pass-through)
  • 31. Issue of PXE boot: expansion ROM of Ethernet devices. Most PnP Ethernet devices do not carry an expansion ROM image themselves, so we tried to use gPXE to boot from a pass-through device.
  • 32. Trying PXE boot with gPXE. Configuration and hacks: comment out the device-number check in hvmloader; do not specify an emulated NIC, only the pass-through NIC; recompile gPXE with the driver for the device and regenerate eb-roms.h.
  • 33. Result… gPXE may not support the NIC. gPXE may check the device ID, vendor ID, and so on internally. More debugging is needed…
  • 34. Some concerns. Lack of I/O port space: boot devices use I/O ports, but the I/O port space is only 64K. MMIO problem: see docs/misc/vtd.txt (Assigning devices to HVM domains). Dependencies within multifunction devices: some multifunction devices do not work when only a single function is passed to the guest. pci.hide option: if many pass-through devices are used, the pci.hide option becomes very long…
  • 35. Q&A. Any questions?
  • 36. This work was partly funded by the Ministry of Economy, Trade and Industry (METI) of Japan as the Secure Platform project of the Association of Super-Advanced Electronics Technologies (ASET).
  • 37. Thank you