XS Oracle 2009 Fujitsu

PCI Pass-through

Published in: Technology

  1. 1. Improvement of the PCI pass-through. Jun Kamada <kama@jp.fujitsu.com>, Akio Takebe <takebe_akio@jp.fujitsu.com>, FUJITSU LIMITED. All Rights Reserved, Copyright (C) FUJITSU 2007 - 2009
  2. 2. Agenda
     • Background: Why SCSI? pvSCSI and PCI pass-through
     • Part 1: Current status of pvSCSI enhancements
     • Part 2: Booting a guest with PCI pass-through
  3. 3. Background: Why SCSI? (1/2)
     • Backup to tape is fundamental to reliability and availability: cartridges can be moved freely to a safe place and preserved long term.
     • [Diagram: storage is backed up to and restored from a tape drive; tape cartridges are loaded, unloaded, and moved to a safety box for preservation.]
     • Tape drives are usually controlled through SCSI commands.
     • SCSI support in the guest VM (issuing SCSI commands from the guest VM) is therefore highly desired in virtualized environments.
  4. 4. Background: Why SCSI? (2/2)
     • In the data center, reliability and availability features (e.g. hardware snapshot, tape backup) are provided through SCSI.
     • In a virtualized environment, these servers are consolidated onto a single server.
     • [Diagram: a DB server (DBMS) and a backup server on a LAN issue SCSI commands over a SAN to RAID storage (hardware snapshot of data files) and to a tape drive (load, unload, reset).]
     • SCSI support in the guest VM (issuing SCSI commands from the guest VM) is mandatory.
  5. 5. Background: pvSCSI and PCI pass-through
     • We have developed the pvSCSI driver and will continue to enhance it. Part 1 reports the current status of the enhancements.
     • We also need to provide reliability with hardware assist (e.g. PCIe AER, …) and seamless movement between physical (P) and virtual (V) environments.
     • We are focusing on SAN/PXE boot using VT-d/IOMMU. Part 2 reports the guest BIOS enhancements needed to provide SAN/PXE boot.
  6. 6. Part 1: Current status of pvSCSI enhancements
  7. 7. Current implementation (Xen 3.3.0)
     • The pvSCSI driver for Xen 3.3.0 provides LUN (Logical Unit Number) pass-through and LUN hot-plug.
     • [Diagram: in Dom0, physical LUNs (e.g. 1:0:1:3, 0:0:0:1, 0:1:2:3) in the physical SCSI tree(s) are arbitrarily mapped to virtual LUNs 2:0:0:0 and 2:0:0:1 on a virtual SCSI host (host=2); the guest domain sees the same virtual SCSI tree (host=2). Step labels: (1) Add, (2) Attach, (3) Add, (4) Appear immediately.]
  8. 8. Issue of current implementation
     • The current implementation presents a completely virtualized (arbitrarily mapped) SCSI tree to the guest domain. This provides flexibility, but it has drawbacks.
     • Some SCSI commands (REPORT_LUN, EXTENDED_COPY, …) must be emulated in the backend because they depend on the physical topology of the SCSI tree.
     • Implementing emulation logic for all such commands is a lot of work, so the current implementation supports only the mandatory commands.
     • As a result, it does not support full SCSI functionality. :-(
  9. 9. How to solve the issue
     • Option 1: Implement all the emulation logic step by step. This is hard work and may still be unable to support some vendor-specific commands.
     • Option 2: Add a new mode that attaches a whole HBA to the guest domain, which allows the backend driver to bypass SCSI command emulation. This is easy to implement (details on the following slides) and can support all vendor-specific commands.
     • We took the second approach.
  10. 10. Posted implementation (1/2)
      • The additional implementation provides host (HBA: Host Bus Adapter) pass-through.
      • [Diagram: in Dom0, physical LUNs 0:0:0:1 and 0:1:2:3 on the physical SCSI host (host=0) appear as virtual LUNs 2:0:0:1 and 2:1:2:3 on a virtual SCSI host (host=2); the guest domain sees the same virtual SCSI tree (host=2). Step labels: (1) Create, (2) Attach. The underlined part of each ID in the original slide is the same in the physical and virtual trees; in this example that is everything after the host number.]
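The difference between the two attachment modes can be illustrated with a small Python sketch. It is not the pvSCSI driver code: the function names are hypothetical, and the specific LUN-mode pairing of virtual to physical IDs is made up for illustration (the slides only say the mapping is arbitrary). The host-mode rule is read off the example IDs on the slide, where only the host number changes.

```python
from typing import Dict, Tuple

# A SCSI ID as (host, channel, target, lun), e.g. "0:1:2:3".
ScsiId = Tuple[int, int, int, int]


def parse_id(text: str) -> ScsiId:
    """Parse 'host:channel:target:lun' into a tuple of integers."""
    host, channel, target, lun = (int(part) for part in text.split(":"))
    return (host, channel, target, lun)


def format_id(sid: ScsiId) -> str:
    """Format a SCSI ID tuple back into 'host:channel:target:lun'."""
    return ":".join(str(part) for part in sid)


def lun_mode_lookup(table: Dict[str, str], virtual_id: str) -> str:
    """LUN mode: every virtual LUN needs an explicit table entry,
    and the table may map it to any physical LUN."""
    return table[virtual_id]


def host_mode_virtual_id(physical_id: str, virtual_host: int) -> str:
    """Host mode: keep channel:target:lun and replace only the host number."""
    _, channel, target, lun = parse_id(physical_id)
    return format_id((virtual_host, channel, target, lun))


# Hypothetical LUN-mode table (virtual -> physical); the pairing is invented.
LUN_MODE_TABLE = {"2:0:0:0": "0:0:0:1", "2:0:0:1": "0:1:2:3"}

if __name__ == "__main__":
    print(lun_mode_lookup(LUN_MODE_TABLE, "2:0:0:1"))
    print(host_mode_virtual_id("0:1:2:3", virtual_host=2))
```

Because the topology below the host is preserved in host mode, topology-dependent commands can be forwarded without a translation table, which is the motivation the slides give for bypassing emulation.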
  11. 11. Posted implementation (2/2): modifications actually needed
      • Backend driver: a LUN/Host mode identification flag for each virtual SCSI tree, and emulation-bypass logic used when the flag indicates Host mode.
      • Frontend driver: no modification needed.
      • xend: a user interface for specifying Host mode, and LUN scan logic that shortens processing time by using the "lsscsi" command if it exists (a community request).
  12. 12. Conclusion (Part 1)
      • We posted a series of patches last week, and they have already been merged into the unstable tree. (Thanks!)
      • Please try and evaluate them. Comments are much appreciated.
  13. 13. Part 2: Booting a guest with PCI pass-through
  14. 14. Introduction: What is the problem with booting from a PCI pass-through device?
      • [Diagram, before: the guest boots from a disk emulated by qemu in dom0 and uses the pass-through disk only as a data disk.]
      • [Diagram, after: the guest uses the pass-through disk as both boot disk and data disk.]
  15. 15. Contents of Part 2
      • What is required for SAN/SAS boot
      • Details of the requirements
      • Status of the requirements
      • Sample
      • Other challenge (PXE boot)
      • Some concerns
  16. 16. What is required for SAN/SAS boot?
      • An int 0x13 handler for the pass-through device (support for the BCV-style calling convention)
      • BIOS functions: PMM (POST Memory Manager) service and PnP runtime functions
      • IPL/BCV table
      • BCV: Boot Connection Vector. It is typically used by SCSI controllers.
  17. 17. Details of the requirements
  18. 18. Calling PCI expansion ROM (1/3)
      • The BIOS needs to read the MBR of the boot disk.
      • [Diagram: the BIOS int 0x13 service dispatches to the IDE disk handler, CD-ROM handler, FDD handler, or PnP device handler.]
      • What is BCV style? The BCV is a pointer to code inside the expansion ROM. With this code, PCI cards that support the BCV-style boot specification can hook INT 0x13 during device initialization. The BIOS can then access a hard disk connected to the card through that special INT 0x13 handler.
  19. 19. Calling PCI expansion ROM (2/3): How to initialize the expansion ROM
      • Step 1: hvmloader maps the expansion ROM at 0xc0000.
      • Step 2: hvmloader and rombios check some data in the image.
      • Step 3: rombios jumps to the entry point for the INIT function after loading the AX register with the bus:dev:function number.
      • ROM header layout: 0h signature 0xaa55; 2h image size; 3h entry point for INIT function (jmp <address>); 6h reserved; 18h pointer to PCI data structure; 1Ah PnP expansion header.
      • [Diagram: the expansion ROM image consists of the ROM header, the PCI data structure, and the PnP expansion header; the address labels 0xc0000 and 0xea000 mark the region.]
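Step 3 passes the device address in AX. The slide does not spell out the bit layout; the Python sketch below assumes the conventional PCI option ROM encoding (bus number in AH, device number in the upper five bits of AL, function number in the lower three bits). The function names are hypothetical and this is not code from rombios.

```python
def encode_bdf_ax(bus: int, dev: int, fn: int) -> int:
    """Pack bus:dev.fn into a 16-bit AX value.

    Assumed layout: AH = bus, AL[7:3] = device, AL[2:0] = function.
    """
    if not (0 <= bus <= 0xFF and 0 <= dev <= 0x1F and 0 <= fn <= 0x07):
        raise ValueError("bus, device, or function number out of range")
    return (bus << 8) | (dev << 3) | fn


def decode_bdf_ax(ax: int) -> tuple:
    """Unpack a 16-bit AX value into (bus, dev, fn)."""
    return ((ax >> 8) & 0xFF, (ax >> 3) & 0x1F, ax & 0x07)


if __name__ == "__main__":
    ax = encode_bdf_ax(bus=0x01, dev=0x00, fn=0)
    print(f"AX = {ax:#06x}")
    print(decode_bdf_ax(ax))
```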
  20. 20. Calling PCI expansion ROM (3/3): How to initialize the expansion ROM
      • Step 4: Jump to the BCV to hook INT 0x13.
      • ROM header layout: 0h signature 0xaa55; 2h image size; 3h entry point for INIT function (jmp <address>); 6h reserved; 18h pointer to PCI data structure; 1Ah PnP expansion header.
      • PnP expansion header layout: 0h signature $PnP; 06h offset of next header (0000h if none); 09h checksum; 16h BCV (code to hook INT 0x13).
      • [Diagram: the expansion ROM image, with address labels 0xc0000 and 0xea000, contains the ROM header, the PCI data structure, the PnP expansion header, and a next PnP expansion header.]
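The header walk described on these slides can be sketched in Python against a synthetic ROM image. The offsets 0h, 1Ah, 06h, 09h, and 16h come from the slides. Two details are assumptions taken from the usual PnP expansion header layout, not from the slides: the length field at offset 05h (in 16-byte units) and the rule that all header bytes sum to zero modulo 256. The function names and the sample ROM contents are hypothetical.

```python
import struct

ROM_SIGNATURE = 0xAA55      # offset 0h of the ROM header (slide)
PNP_HEADER_PTR = 0x1A       # offset 1Ah: PnP expansion header (slide)
PNP_LENGTH = 0x05           # ASSUMPTION: header length in 16-byte units
PNP_NEXT = 0x06             # offset 06h: next header, 0000h if none (slide)
PNP_CHECKSUM = 0x09         # offset 09h: checksum (slide)
PNP_BCV = 0x16              # offset 16h: BCV (slide)


def build_sample_rom() -> bytes:
    """Build a made-up 1 KiB ROM image with one PnP expansion header."""
    rom = bytearray(1024)
    struct.pack_into("<H", rom, 0x00, ROM_SIGNATURE)
    rom[0x02] = 2                                    # image size (two 512-byte blocks)
    struct.pack_into("<H", rom, PNP_HEADER_PTR, 0x0060)
    pnp = 0x60
    rom[pnp:pnp + 4] = b"$PnP"
    rom[pnp + PNP_LENGTH] = 2                        # 2 * 16 = 32 bytes
    struct.pack_into("<H", rom, pnp + PNP_NEXT, 0)   # no further header
    struct.pack_into("<H", rom, pnp + PNP_BCV, 0x0100)
    # Choose the checksum byte so the 32 header bytes sum to 0 mod 256.
    rom[pnp + PNP_CHECKSUM] = (-sum(rom[pnp:pnp + 32])) & 0xFF
    return bytes(rom)


def find_bcv_offsets(rom: bytes) -> list:
    """Walk the chain of PnP expansion headers and collect non-zero BCVs."""
    (signature,) = struct.unpack_from("<H", rom, 0)
    if signature != ROM_SIGNATURE:
        raise ValueError("missing 0xAA55 ROM signature")
    bcvs = []
    seen = set()
    (offset,) = struct.unpack_from("<H", rom, PNP_HEADER_PTR)
    while offset != 0 and offset not in seen:
        seen.add(offset)                             # guard against loops
        if rom[offset:offset + 4] != b"$PnP":
            break
        length = rom[offset + PNP_LENGTH] * 16
        if length < 0x18 or offset + length > len(rom):
            break                                    # too short to hold a BCV
        if sum(rom[offset:offset + length]) & 0xFF != 0:
            break                                    # checksum mismatch
        (bcv,) = struct.unpack_from("<H", rom, offset + PNP_BCV)
        if bcv != 0:
            bcvs.append(bcv)
        (offset,) = struct.unpack_from("<H", rom, offset + PNP_NEXT)
    return bcvs


if __name__ == "__main__":
    print([hex(bcv) for bcv in find_bcv_offsets(build_sample_rom())])
```

A BIOS would then far-call each BCV inside the mapped ROM segment; that real-mode step is outside what a Python sketch can show.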
  21. 21. PMM service?
      • The PMM provides memory allocation during POST only.
      • PCI expansion ROMs use the PMM service. For example, an expansion ROM may need a memory block to decompress its code into, or a data area used only during initialization.
  22. 22. PnP runtime function?
      • PnP runtime functions are used by the O/S and application programs. They allow them to access BIOS features (get version, number of devices, …).
      • A PCI expansion ROM may check only the PnP Installation Check structure to determine whether the system has a Plug and Play BIOS.
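The check an expansion ROM might perform can be sketched as a scan over a BIOS memory image in Python. The details are assumptions from the usual Plug and Play BIOS layout and are not stated on the slide: the structure starts with "$PnP" on a 16-byte boundary within F0000h to FFFFFh, its length in bytes is at offset 05h (normally 21h), and all of its bytes sum to zero modulo 256. The function name and the sample image are hypothetical.

```python
from typing import Optional

BIOS_BASE = 0xF0000          # ASSUMPTION: scan window is F0000h-FFFFFh
MIN_STRUCT_LEN = 0x21        # ASSUMPTION: structure is at least 21h bytes


def find_pnp_installation_check(bios: bytes, base: int = BIOS_BASE) -> Optional[int]:
    """Return the physical address of a valid '$PnP' structure, or None."""
    for off in range(0, len(bios) - MIN_STRUCT_LEN + 1, 16):
        if bios[off:off + 4] != b"$PnP":
            continue
        length = bios[off + 5]               # length in bytes
        if length < MIN_STRUCT_LEN or off + length > len(bios):
            continue
        if sum(bios[off:off + length]) & 0xFF == 0:
            return base + off
    return None


def build_sample_bios() -> bytes:
    """Build a made-up 64 KiB image containing one dummy structure."""
    image = bytearray(0x10000)
    off = 0x1230                             # 16-byte aligned, arbitrary
    image[off:off + 4] = b"$PnP"
    image[off + 4] = 0x10                    # version 1.0
    image[off + 5] = MIN_STRUCT_LEN
    image[off + 8] = (-sum(image[off:off + MIN_STRUCT_LEN])) & 0xFF
    return bytes(image)


if __name__ == "__main__":
    address = find_pnp_installation_check(build_sample_bios())
    print(hex(address) if address is not None else "no PnP BIOS found")
```

This is why a dummy implementation can be enough for such ROMs: they only need the structure to be present and valid, not the runtime entry points to do real work.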
  23. 23. IPL/BCV table?
      • IPL table/IPL priority: decides the order in which devices are selected for booting. In Xen, it is configured with a setting like boot="cda" in the guest configuration file.
      • BCV table/BCV priority: decides the order in which devices are selected for installing their INT 0x13 handlers. This order affects the boot order.
  24. 24. Example of boot order
      • IPL table: 1 HDD, 2 CD-ROM, 3 Network, 4 Floppy. IPL priority: 4, 2, 1, 3.
      • BCV table: 1 IDE disk, 2 Additional PCI card (e.g. SCSI card). BCV priority: 2, 1.
      • Resulting boot order: Floppy (IPL 4), CD-ROM (IPL 2), Additional PCI card (IPL 1, BCV 2), IDE disk (IPL 1, BCV 1), Network (IPL 3).
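The way the two priority lists combine in this example can be sketched in Python: walk the IPL priority, and wherever the hard-disk entry appears, expand it into the BCV devices in BCV priority order. The table contents are the ones from the slide; the function name and the choice to model the tables as dictionaries are illustrative assumptions, not the rombios data structures.

```python
def resolve_boot_order(ipl_table, ipl_priority, bcv_table, bcv_priority,
                       hdd_entry="HDD"):
    """Expand the hard-disk IPL entry into BCV devices, keeping IPL order."""
    order = []
    for ipl_index in ipl_priority:
        device = ipl_table[ipl_index]
        if device == hdd_entry:
            # The hard-disk slot stands for every BCV device, in BCV priority order.
            order.extend(bcv_table[bcv_index] for bcv_index in bcv_priority)
        else:
            order.append(device)
    return order


# Tables and priorities from the slide.
IPL_TABLE = {1: "HDD", 2: "CD-ROM", 3: "Network", 4: "Floppy"}
IPL_PRIORITY = [4, 2, 1, 3]
BCV_TABLE = {1: "IDE disk", 2: "Additional PCI card"}
BCV_PRIORITY = [2, 1]

if __name__ == "__main__":
    for position, device in enumerate(
            resolve_boot_order(IPL_TABLE, IPL_PRIORITY, BCV_TABLE, BCV_PRIORITY),
            start=1):
        print(position, device)
```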
  25. 25. Status of the requirements
  26. 26. Status (1/3): Calling convention
      • We added support for the BCV-style calling convention to the BIOS.
      • In the boot spec, BCV covers not only PCI devices but also ISA devices. However, the IOMMU does not support ISA devices, so we support the BCV-style calling convention only for PCI devices.
  27. 27. Status (2/3): BIOS function
      • PMM is needed by some PCI cards. PMM support has already been added by Kouya Shimura.
      • PnP runtime functions are not called, in my experience. However, we need to provide a dummy PnP runtime function because some expansion ROMs may check only whether PnP runtime functions are supported.
      • The dummy PnP runtime function is easy to support. In the Bochs community, Sebastian Herbszt has already posted a patch.
  28. 28. Status (3/3): BCV table/BCV priority
      • How should BCV priority be supported for pass-through devices on Xen?
      • A) If there are no emulated disks, boot from a pass-through device.
      • B) If a pass-through device is specified as bootable, load the expansion ROM of that device only. For example, pci = [ "bb:dd.ff,boot=1" ].
      • C) Enhance the IPL table. If a pass-through device is specified in the boot order, the pass-through device with the boot=1 option is selected as the boot device. For example, boot="p".
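Options B and C can be illustrated with a small Python sketch (xm-style guest configuration files use Python syntax, so the two settings are written as they might appear there). The `boot=1` device option and the `p` boot letter are the proposals from this slide, not options known to exist in released Xen. The device addresses, the helper names, and the way the boot string is interpreted are hypothetical.

```python
# Hypothetical guest configuration values following the slide's proposal.
pci = ["01:00.0,boot=1", "02:00.0"]   # made-up addresses; first one is bootable
boot = "pc"                           # try pass-through first, then hard disk

# Existing HVM boot letters, plus "p" as proposed on the slide.
BOOT_LETTERS = {"a": "floppy", "c": "hard disk", "d": "CD-ROM", "n": "network"}


def parse_pci_entry(entry):
    """Split 'bb:dd.f,key=value,...' into (address, options dict)."""
    address, *raw_options = entry.split(",")
    options = dict(option.split("=", 1) for option in raw_options)
    return address, options


def select_boot_devices(pci_entries, boot_order):
    """List boot candidates in order, expanding 'p' to boot=1 devices."""
    bootable = [address
                for address, options in map(parse_pci_entry, pci_entries)
                if options.get("boot") == "1"]
    candidates = []
    for letter in boot_order:
        if letter == "p":
            candidates.extend(f"pass-through {address}" for address in bootable)
        else:
            candidates.append(BOOT_LETTERS.get(letter, letter))
    return candidates


if __name__ == "__main__":
    print(select_boot_devices(pci, boot))
```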
  29. 29. Sample
  30. 30. Other challenge (PXE boot with PCI pass-through)
  31. 31. Issue of PXE boot: Expansion ROM of Ethernet devices
      • Most Ethernet PnP devices do not carry an expansion ROM image themselves.
      • So we tried to use gPXE for booting from a pass-through device.
  32. 32. Try PXE boot with gPXE: Configuration & hack
      • Comment out the device number check in hvmloader.
      • Do not specify an emulated NIC. Specify only the pass-through NIC.
      • Recompile gPXE with the driver for the device and regenerate eb-roms.h.
  33. 33. Result…
      • gPXE may not support the NIC.
      • gPXE may check the device ID/vendor ID and so on internally.
      • More debugging is needed…
  34. 34. Some concerns
      • Lack of I/O port space: boot devices use I/O ports, but the I/O port space is only 64k.
      • MMIO problem: see docs/misc/vtd.txt (Assigning devices to HVM domains).
      • Dependency between functions of a multifunction device: some multifunction devices do not work when a single function is passed to the guest.
      • pci.hide option: if many pass-through devices are used, the pci.hide option becomes very long…
  35. 35. Q&A: Any questions?
  36. 36. This work was partly funded by the Ministry of Economy, Trade and Industry (METI) of Japan as the Secure Platform project of the Association of Super-Advanced Electronics Technologies (ASET).
  37. 37. Thank you
