XS Boston 2008 Networking Direct Assignment

Caitlin Bestler: Virtual Networking via Direct Assignment
  1. June 2008
     Caitlin.Bestler@neterion.com
  2. SR-IOV enables a new generation of NICs with multiple PCI Functions:
     ◦ Each function operates as an independent NIC.
     ◦ The functions actually share an external physical port.
     ◦ Example: Neterion X3100.
     Direct PCI Function assignment provides Native performance:
     ◦ While maintaining Hypervisor, Dom0/DomD and GOS control.
     ◦ Using the same drivers as the Native OS no matter which Hypervisor is in use (or none).
     ◦ Is not dependent on the Native OS being SR-IOV aware.
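To make the host-side view concrete, here is a minimal Python sketch (not from the slides) that groups the entries under /sys/bus/pci/devices by domain:bus:slot on a Linux system, so each group shows the PCI functions presented by one physical device:

```python
# Minimal sketch: enumerate PCI functions per physical device via Linux
# sysfs. Assumes the standard DDDD:BB:SS.F naming of the entries under
# /sys/bus/pci/devices.
import os
from collections import defaultdict

def functions_by_device(sysfs="/sys/bus/pci/devices"):
    groups = defaultdict(list)
    for entry in sorted(os.listdir(sysfs)):
        slot, function = entry.rsplit(".", 1)  # "0000:05:00.1" -> ("0000:05:00", "1")
        groups[slot].append(function)
    return groups

if __name__ == "__main__":
    for slot, funcs in functions_by_device().items():
        if len(funcs) > 1:  # multi-function devices, e.g. a multi-function NIC
            print(f"{slot}: functions {', '.join(funcs)}")
```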
  3. Multi-port NIC has multiple physical ports.
     ◦ Or possibly simply multiple NICs.
     Each port is a distinct PCI function.
     ◦ Each PCI function can be directly assigned to a Guest.
     ◦ Provides benefits of Native Driver usage:
        No virtualization penalty.
        Full features of native mode available.
        Single driver in the OS image.
     But an entire port is a big thing to assign:
     ◦ Not enough space for physical ports and cables.
     ◦ No granularity on assignment.
     ◦ Bandwidth wasted if the assignee has nothing to send.
  4. Multi-queue NICs provide multiple independent queues within a PCI function.
     Native OS can use multiple queues itself:
     ◦ CPU Affinity, QoS, Ethernet Priorities.
     DomD can utilize Guest Specific queues, but this is not true device assignment:
     ◦ Backend must validate/translate each request (WQE/TxD).
     ◦ Does not enable the vendor's native driver,
        which already knows how to use multiple queues.
     ◦ Does not provide Function Level Reset.
  5. Fastpath operations are direct.
     Fastpaths are created/maintained by frontend/backend.
     Hardware specific code is required in both the Guest and DomD.
     Single-threaded control means entanglement between Guests:
     ◦ Are resource handles migratable?
     ◦ No Function Level Reset.
  6. Multi-function NICs present each external port as its own multi-queue NIC.
     Each PCI function can be directly assigned.
     Frame Forwarding and External Ports are shared.
     Sharing is resolved on the device:
     ◦ Subject to Policy from Hypervisor, DomD and GOS.
  7. This presentation is not a call to add support for multi-function NICs in Xen:
     ◦ Because the support is already there.
     Xen, and the various GOSs, already have almost everything they need to support multi-function NICs:
     ◦ Xen has PCI Function Delegation.
     ◦ Xen has migration.
     ◦ GOSs support bonding/teaming drivers.
     ◦ GOSs support PCI device insertion/removal.
  8. Assignment of each PCI Function can enable direct networking support for Guests:
     Eliminating virtualization overhead.
     ◦ To be precise: costs of virtualization have been offloaded.
     Enabling a single driver in the OS image:
     ◦ Regardless of which Hypervisor is deployed (or none).
     While still supporting migration.
     While still preserving Xen control.
  9. NIC presents itself as multiple PCI functions.
     ◦ Xen can assign as many to each guest as it wants to.
     Relies on an Address Translation Solution:
     ◦ IOMMU is just the most likely solution.
     ◦ GOS does not need to be IOMMU aware.
     Still needs to deal with more VMs than direct PCI Functions:
     ◦ Correctly complements frontend/backend.
     Worse issue: a h/w specific driver in the guest.
     ◦ That's not a bug. That's a feature.
     Full L2+ switch functionality on NIC.
     ◦ No need for a "full switch".
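For illustration, delegating one such function with Xen's xm tooling of this era could look roughly like the config fragment below (xm domain configs are Python syntax). This is a sketch, not from the slides: the BDF 0000:05:00.1 and the domain name are made-up examples, and the function is assumed to already be hidden from Dom0 via pciback:

```python
# Sketch of an xm domain config fragment delegating one PCI function of
# a multi-function NIC to a guest. The BDF address is a made-up example.
name = "guest-with-direct-nic"
memory = 1024
vif = [""]                 # netfront kept alongside the direct device
pci = ["0000:05:00.1"]     # one PCI function of the multi-function NIC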
  10. This is not a bug. It is a feature.
      There already is a device specific driver in the Guest OS image.
      The vendor worked very hard to get it there:
      ◦ And to get it tested, and certified.
      ◦ There is already a distribution chain,
         which customers strongly prefer.
      ◦ It already integrates hardware capabilities with Native OS requirements.
  11. With Direct Assignment, only one driver is needed per OS:
      ◦ No separate distribution, development, testing or certification is required.
      ◦ Driver is operationally identical in all modes, not just a matter of packaging multiple drivers in one binary.
      ◦ One Driver can be distributed with the OS Image to work with any or no Hypervisor.
  12. Only the raw frame forwarding services are needed on the NIC.
      Typical switch/bridge design is already split between a frame forwarding engine and management/control plane processing.
      The latter is usually in a conventional processor that sits on the side of the frame forwarding hardware.
      ◦ A "Level-3"/MF-NIC has a very powerful processor attached to it.
      ◦ In fact the customer already paid for it.
  13. A Shared Device is a generic PCI device.
      A Generic PCI Function can be assigned without understanding it:
      ◦ Or what services it provides.
      ◦ What specific device model it is.
      ◦ What driver is required.
      Xen already supports this.
      There are other configuration issues that need to be addressed somewhere:
      ◦ Xen may be involved here.
  14. There are other shared configuration issues to be resolved:
      ◦ On-chip resources must be allocated, but only the device needs to know how this is done.
      ◦ Ethernet device can only set one link state.
      ◦ Load sharing between the VNICs. Who gets to transmit how much?
      Some of these may already be addressed by device independent Xen and/or Network Management:
      ◦ Rate shaping.
      ◦ Uplink physical link configuration.
  15. Neither the Hypervisor nor DomD needs to be involved:
      ◦ The Shared Device and its PCI Function Drivers can implement their own solution.
      ◦ The device already knows how to talk with each VF Driver, and which VFs are active.
      DomD can control things it already knows:
      ◦ MAC Address of VNIC ports.
      ◦ VLAN Membership.
      But DomD does not need to deal with new device specific controls:
      ◦ All required logic can be implemented in device specific drivers and/or network management daemons.
  16. Many methods possible:
      ◦ As though an 802.1Q Bridge per external port.
      ◦ Static defaults applied as an unmanaged switch:
         All VNIC MAC Addresses are Manufacturer supplied.
      ◦ Privileged operations via the Native Driver,
         enabled for DomD or stand-alone Native Drivers.
      ◦ Combinations of the above.
      Existing vif-bridge script could easily configure the vsport matching the VNIC for a directly assigned VIF:
      ◦ It already has the MAC Address and any VLAN ID.
      ◦ Suggested naming convention: use the PCI Function number to name the Backend instance.
         Simplifies pairing with the direct device (see the sketch below).
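A hypothetical sketch of that naming convention; the "vsport-..." name format and the helper are invented for illustration:

```python
# Hypothetical sketch of the suggested naming convention: derive the
# backend instance name from the PCI function number of the directly
# assigned VNIC, so the pair can be matched trivially later.
def backend_name_for(bdf: str) -> str:
    slot, function = bdf.rsplit(".", 1)   # "0000:05:00.1" -> ("0000:05:00", "1")
    return f"vsport-{slot}-f{function}"   # invented format, for illustration

print(backend_name_for("0000:05:00.1"))  # vsport-0000:05:00-f1
```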
  17. Migration Support using Guest OS Services
  18. Frontend/Backend is kept in place and is always available.
      Direct Assignment is used for the most important Guests:
      ◦ Each multi-function device will have a limit on how many guests it can directly support.
      Native Driver talks directly to the NIC through its own PCI function, if enabled.
      Bonding Driver uses frontend/backend if the direct NIC is not available.
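A minimal sketch of such a team, assuming the Linux bonding driver's sysfs control interface, root privileges, and interface names eth0 (netfront) and eth1 (direct VNIC); all of these are assumptions, and real setups also care about interface up/down ordering:

```python
# Sketch: build an active-backup bond with the directly assigned NIC as
# primary and the netfront device as backup, via the Linux bonding
# driver's sysfs interface. Interface names are assumptions; run as root.
def write(path, value):
    with open(path, "w") as f:
        f.write(value)

write("/sys/class/net/bonding_masters", "+bond0")      # create bond0
write("/sys/class/net/bond0/bonding/mode", "active-backup")
write("/sys/class/net/bond0/bonding/slaves", "+eth0")  # netfront backup
write("/sys/class/net/bond0/bonding/slaves", "+eth1")  # direct VNIC
write("/sys/class/net/bond0/bonding/primary", "eth1")  # prefer direct path
```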
  19. Page Protection is not enough:
      ◦ Full Function Assignment requires true isolation of each PCI Function.
         Protecting Pages is not enough.
         If a bad configuration register can hang the device then the functions are not truly independent.
      ◦ Some devices can only support direct Fastpaths.
      A direct fastpath does not address Driver Distribution Issues: it still requires two drivers.
      ◦ One when virtualized.
      ◦ One when running in native mode.
      Single-path for Slowpath Control means entangled Slowpaths:
      ◦ Untangling for migration is not guaranteed to be easy.
      RDMA users complain about the cost of slowpath operations:
      ◦ Virtualization will only make it worse.
  20. Directly Assigned Devices can be migrated using existing services:
      ◦ GOS Device Bonding / Failover.
         Including netfront/netback in the team enables migrations between platforms with different hardware.
      ◦ GOS support of PCI device insertion/removal.
         Including check-pointing of any stateful data in host memory.
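For insertion/removal, the standard Linux sysfs knobs suffice; a sketch, assuming a made-up BDF and root privileges inside the guest:

```python
# Sketch: hot-remove a directly assigned PCI function from the guest's
# view, then rescan after migration, via standard Linux sysfs knobs.
# The BDF 0000:05:00.1 is a made-up example; run as root in the guest.
def write(path, value):
    with open(path, "w") as f:
        f.write(value)

write("/sys/bus/pci/devices/0000:05:00.1/remove", "1")  # removal event
# ... guest migrates; on the new platform, once the function is inserted:
write("/sys/bus/pci/rescan", "1")                       # rediscover functions
```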
  21. PCI Function assignment can support Multi-function NICs as generic PCI devices:
      ◦ No special netfront/netback features are required.
      ◦ The same features that enable direct assignment of entire devices enable assignment of PCI Functions that actually share on-device resources.
      Leverage work done for the Native OSs:
      ◦ Multi-queue.
      ◦ Bonding/Failover.
      ◦ Driver certification.
      ◦ Driver distribution.
      Direct PCI Function assignment eliminates the overhead of network virtualization.
  22. Support PCI Function assignment:
      ◦ It's not just for special purpose devices.
      ◦ It is well suited for high performance devices such as NICs and Graphics adapters.
      Rely on the Native OS Distribution Chain.
      Work to standardize control of switching services whether in DomD or on the NIC.
      Any Follow-up Questions?
      ◦ DirectIO@neterion.com
  23. Additional Material that there will not be time for.
  24. Multi-function NIC is unlikely to fully support all netfilter rules in hardware.
      When considering Direct Assignment (decision procedure sketched below):
      ◦ Determine which netfilter rules are implemented by the Multi-function NIC's frame forwarding services.
      ◦ Determine if the remaining netfilter rules can be trusted to DomU.
      ◦ If there are filters that the hardware cannot implement, and cannot be trusted to DomU, then don't do the direct assignment.
      Direct Assignment complements frontend/backend. It is not a replacement.
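A hypothetical sketch of that decision procedure; the rule names and both predicates are invented for illustration:

```python
# Hypothetical sketch of the decision procedure above.
def allow_direct_assignment(rules, hw_implemented, trustable_in_domu):
    """Allow direct assignment only if every netfilter rule is either
    enforced by the NIC's frame forwarding hardware or safe to delegate
    to the guest (DomU)."""
    return all(hw_implemented(r) or trustable_in_domu(r) for r in rules)

# Example: MAC anti-spoofing is in hardware; a rate limit is trusted to DomU.
rules = ["no-spoofed-src-mac", "tcp-rate-limit"]
print(allow_direct_assignment(
    rules,
    hw_implemented=lambda r: r == "no-spoofed-src-mac",
    trustable_in_domu=lambda r: r == "tcp-rate-limit",
))  # True -> direct assignment is acceptable
```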
  25. Auto-negotiate the uplinks.
      Divide resources evenly over configured/enabled PCI functions.
      Do not enable other VLANs.
      But any non-default configuration must be done via a privileged PCI function.
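A trivial sketch of the even-division default; the resource counts are made up:

```python
# Sketch: default policy of dividing on-chip resources evenly over the
# enabled PCI functions. Counts are made-up examples.
def divide_evenly(total, nfuncs):
    share, spare = divmod(total, nfuncs)
    # the first `spare` functions get one extra unit so nothing is wasted
    return [share + (1 if i < spare else 0) for i in range(nfuncs)]

print(divide_evenly(1024, 8))  # e.g. 1024 RX descriptors over 8 functions
```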
  26. No performance penalty:
      ◦ GOS Driver is interacting with the Device the same way it would without virtualization.
         There is zero penalty to the host.
      Multi-function NICs offload the cost of sharing.
      ◦ Frontend/Backend solutions always cost more:
         Address translation has non-zero cost.
         Copying even more.
         Latency penalty unavoidable. An extra step cannot take zero time.
      Can support ANY service supported by the Native OS:
      ◦ Because the Native OS Driver sees the same resources.
  27. Frontend/Backend supplies excellent migration already:
      ◦ But requires a Hypervisor specific frontend driver.
      Because it is the only universally supported solution, it plays a critical role in enabling migration.
  28. Availability:
      ◦ Is the driver installed in the Guest OS image?
      Efficiency:
      ◦ Does the driver interface efficiently with the NIC?
      Migration:
      ◦ Can Guests using this Driver be migrated?
      Flexibility:
      ◦ Can new services be supported?
  29. Availability:
      ◦ Excellent. NICs to be emulated are selected based on widespread deployment.
      Performance:
      ◦ Terrible.
      Migration:
      ◦ Not a problem.
      Flexibility:
      ◦ None. You're emulating a 20th century NIC.
  30. Availability:
      ◦ Good, but there is a lag before the current frontend makes it into the OS distribution.
      Performance:
      ◦ Tolerable.
      Migration:
      ◦ Not a problem.
      Flexibility:
      ◦ New features require extensive collaboration.
  31. Availability:
      ◦ Excellent. The same driver is used whether running natively or under any Hypervisor.
      ◦ NIC vendors already deal with OS distributions.
      Performance:
      ◦ Same as native.
      Migration:
      ◦ Not really a problem; details to follow.
      Flexibility:
      ◦ Same as native.
  32. Multi-queue is a valuable feature:
      ◦ But it does not really compensate for being a Single PCI Function Device.
      Multi-function NICs are multi-queue NICs:
      ◦ But each queue is owned by a specific PCI Function.
      ◦ It operates within the function specific IO MAP,
         allowing the GOS to communicate GPAs directly to the NIC.
      Each PCI Function has its own:
      ◦ MSI-X.
      ◦ PCI Config space.
      ◦ Function Level Reset.
      ◦ Statistics.
  33. More on GOS Enabled Migration
  34. Requirement: the device must be able to checkpoint any per-client stateful image in the client's memory space.
      ◦ Device is told when to checkpoint any Guest-specific stateful information in the Guest memory image.
      ◦ Migrating a Guest's check-pointed memory image is a known problem that is already solved.
      Device driver on the new host is told to restore from the check-pointed memory image.
      ◦ Check-pointed image should be devoid of any absolute (non-VF relative) references.
      ◦ If this is not certain, a "Migration Notice" is needed to enable the driver to fix all absolute references.
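A hypothetical sketch of that checkpoint contract; the class and method names are invented, and a real VF driver would keep this state in guest memory so the Hypervisor migrates it along with the VM:

```python
# Hypothetical sketch of the checkpoint/restore contract described above.
from dataclasses import dataclass, field

@dataclass
class VfCheckpoint:
    """Guest-resident device state: only VF-relative references allowed."""
    ring_head: int = 0
    ring_tail: int = 0
    queue_states: dict = field(default_factory=dict)  # queue id -> state blob

class VfDriver:
    def __init__(self):
        self.state = VfCheckpoint()

    def checkpoint(self) -> VfCheckpoint:
        # Quiesce the function, then snapshot its state into guest memory.
        return self.state

    def restore(self, image: VfCheckpoint):
        # On the new host: reprogram the (same-model) VF from the image.
        # No absolute addresses appear in it, so no fix-up pass is needed.
        self.state = image
```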
  35. Same-to-same migration only requires checkpoint/restore of any Device state via the VF Driver.
      ◦ Once state is checkpointed in VM memory, the Hypervisor knows how to migrate the VM.
      Many services do not require migration:
      ◦ Each VM implements one instance of a distributed Service. Persistence is a shared responsibility.
      ◦ Most Web servers fall in this category.
      GOS already provides failover between dissimilar devices through bonding drivers.
  36. Not all platforms have the same direct-access NICs, but same-to-same migration can be used anyway.
      Method A: post-migration makes right.
      ◦ Just do a Same-to-Same Migration anyway.
      ◦ It will work: because the actual device is missing on the new platform, the re-activated instance will of course fail,
         invoking existing device failover logic within the Guest.
      ◦ Possible Enhancement: provide a PCI Device removal event immediately on migration.
  37. Method B: migrate Same-to-same via netfront (sketched below).
      ◦ Fail the Directly Assigned device.
      ◦ GOS will failover to the Frontend device.
      ◦ Migrate same-to-same to the new target platform,
         which can always support netfront.
      ◦ Enable the appropriate Directly Assigned device on the new platform.
      ◦ GOS is informed of the newly inserted PCI Function.
      ◦ GOS will failover to the preferred device as though it were being restored to service.
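A hypothetical orchestration of Method B; every helper below is an invented stand-in for a real management action (xm calls, sysfs writes), stubbed so the sequence runs end to end:

```python
# Hypothetical orchestration of Method B. All helper names are invented.
def migrate_via_netfront(guest, bdf, target_host):
    fail_direct_device(guest, bdf)            # 1. fail the assigned function
    wait_for_failover(guest, to="netfront")   # 2. bond falls back to frontend
    live_migrate(guest, target_host)          # 3. netfront exists everywhere
    assign_direct_device(guest, bdf)          # 4. insert a matching function
    wait_for_failover(guest, to="direct")     # 5. bond restores preferred path

# Stubs so the sketch is self-contained:
def fail_direct_device(g, b): print(f"failing {b} in {g}")
def wait_for_failover(g, to): print(f"{g}: active path is now {to}")
def live_migrate(g, host): print(f"migrating {g} -> {host}")
def assign_direct_device(g, b): print(f"assigning {b} to {g}")

migrate_via_netfront("guest1", "0000:05:00.1", "host-b")
```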
  38. Frame Forwarding Services
      On-chip "Switch"
  39. Frame Forwarding Services needed:
      ◦ Directs incoming frames to the correct guest VNIC / PCI Function.
      ◦ Provides internal VNIC-to-VNIC frame forwarding.
      ◦ Provides VNIC to external port forwarding:
         Some form of traffic shaping probably required.
         Must prevent forged source addresses.
         Must enforce VLAN membership.
      Must work with DomD "soft switch":
      ◦ Must agree when to enable Spanning Tree.
      ◦ NIC Frame Forwarding may be statically controlled, not learned.
      ◦ NIC Frame Forwarding must allow DomD to be "catchall".
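A sketch of the decision the on-chip "switch" must make for a frame arriving from a guest VNIC; the table layout and field names are invented, and real hardware would use statically programmed lookup tables rather than Python dictionaries:

```python
# Sketch: enforcement plus static forwarding for a frame from a guest VNIC.
def forward(frame, vsport, domd_port):
    # Enforcement first: drop forged source MACs and wrong VLANs.
    if frame["src_mac"] != vsport["mac"]:
        return "drop: forged source address"
    if frame["vlan"] not in vsport["vlans"]:
        return "drop: VLAN membership violation"
    # Static forwarding: known VNIC destination, else DomD is the catchall.
    table = {"aa:00:00:00:00:02": "vnic2"}   # statically controlled, not learned
    dest = table.get(frame["dst_mac"])
    return f"forward to {dest}" if dest else f"forward to {domd_port} (catchall)"

vsport = {"mac": "aa:00:00:00:00:01", "vlans": {10}}
frame = {"src_mac": "aa:00:00:00:00:01", "dst_mac": "aa:00:00:00:00:02", "vlan": 10}
print(forward(frame, vsport, "domd"))   # forward to vnic2
```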
  40. Spanning Tree
  41. MF-NIC treats each external port as the uplink for a distinct bridge.
      The Uplink is always the Root Port.
      Guest VNICs are always Downstream Ports.
      There is never a Blocked Port.
  42. If DomD or DomU forwards frames between "external" ports then there are problems:
      • If Spanning Tree is not used then loops can result.
         But the MF-NIC Bridge does not know this.
      • If Spanning Tree is used then one MF-NIC port may be deactivated:
         the external bridge thinks this is a 2nd path to the Soft-Bridge, and therefore blocks it.