  • This presentation covers the various server I/O network types and their requirements, and looks at the standards that are contending for share in each network type.

Transcript

  • 1. Server I/O Networks Past, Present, and Future Renato Recio Distinguished Engineer Chief Architect, IBM eServer I/O Copyrighted, International Business Machines Corporation, 2003
  • 2. Legal Notices All statements regarding future direction and intent for IBM, the InfiniBand™ Trade Association, the RDMA Consortium, or any other standards organization mentioned are subject to change or withdrawal without notice, and represent goals and objectives only. Contact your local IBM Branch Office or IBM Authorized Reseller for the full text of a specific Statement of General Direction. IBM may have patents or pending patent applications covering subject matter in this presentation. The furnishing of this presentation does not give you any license to these patents. Send license inquiries, in writing, to IBM Director of Licensing, IBM Corporation, 500 Columbus Avenue, Thornwood, NY 10594 USA. The information contained in this presentation has not been submitted to any formal IBM test and is distributed as is. While each item may have been reviewed by IBM for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. The use of this information or the implementation of any techniques described herein is a customer responsibility and depends on the customer's ability to evaluate and integrate them into the customer's operational environment. Customers attempting to adapt these techniques to their own environments do so at their own risk. The following terms are trademarks of International Business Machines Corporation in the United States and/or other countries: AIX, PowerPC, RS/6000, SP, S/390, AS/400, zSeries, iSeries, pSeries, xSeries, and Remote I/O. UNIX is a registered trademark in the United States and other countries, licensed exclusively through X/Open Company, Limited. Ethernet is a registered trademark of Xerox Corporation. TPC-C, TPC-D, and TPC-H are trademarks of the Transaction Processing Performance Council. InfiniBand™ is a trademark of the InfiniBand™ Trade Association. Other product or company names mentioned herein may be trademarks or registered trademarks of their respective companies or organizations.
  • 3. In other words… Regarding Industry Trends and Directions
    • IBM respects the copyrights and trademarks of other companies…
    • and
    • These slides represent my views:
      • They do not imply IBM views or directions.
      • They do not imply the views or directions of the InfiniBand Trade Association, RDMA Consortium, PCI-SIG, or any other standards group.
    • These slides simply represent my view.
  • 4. Agenda
    • Server I/O
      • Network types
      • Requirements,
      • Contenders.
    • Server I/O
      • I/O Attachment and I/O Networks
        • PCI family and InfiniBand
      • Network stack offload
        • Hardware, OS, and Application considerations
      • Local Area Networks
      • Cluster Area Networks
        • InfiniBand and Ethernet
      • Storage Area Networks
        • FC and Ethernet
    • Summary
  • 5. Purpose of Server I/O Networks
    Server I/O networks are used to connect devices and other servers.
    [Diagram: processors (uP, $), memory, and memory controllers attach through I/O expansion networks, bridges, and virtual adapters to I/O attachments, and through a switch to a Local Area Network, a Cluster Network, and a Storage Area Network.]
  • 6. Server I/O Network Requirements
    • In the past, servers have placed the following requirements on I/O networks:
      • Standardization, so many different vendors' products can be connected;
      • Performance (scalable throughput and bandwidth; low latency and overhead);
      • High availability, so connectivity is maintained despite failures;
      • Continuous operations, so changes can occur without disrupting availability;
      • Connectivity, so many units can be connected;
      • Distance, both to support scaling and to enable disaster recovery; and
      • Low total cost, which interacts strongly with standardization through volumes and also depends on the amount of infrastructure build-up required.
    • More recently, servers have added the following requirements:
      • Virtualization of host, fabric and devices;
      • Service differentiation (including QoS), to manage fabric utilization peaks; and
      • Adequate security, particularly in multi-host (farm or cluster) situations.
  • 7. Server I/O Network History
    • Historically, no single technology satisfied all the above requirements,
      • So many types of fabrics proliferated:
        • Local Area Networks
        • Cluster Networks (a.k.a. HPCN, CAN)
        • Storage Area Networks
        • I/O Expansion Networks, etc…
      • and many link solutions proliferated:
        • Standard:
          • FC for SAN,
          • Ethernet for LAN
        • Proprietary:
          • a handful of IOENs (IBM’s RIO, HP’s remote I/O, SGI’s XIO, etc…)
          • a handful of CANs (IBM’s Colony, Myricom’s Myrinet, Quadrics, etc…)
    • Consolidation solutions are now emerging, but the winner is uncertain.
      • PCI family: IOA and IOEN
      • Ethernet: LAN, SAN, CAN
      • InfiniBand: CAN, IOEN, and possibly higher-end IOA/SAN.
  • 8. Recent Server I/O Network Evolution Timeline
    • Proprietary fabrics (e.g. IBM channels, IBM RIO, IBM-STI, IBM-Colony, SGI-XIO, Tandem/Compaq/HP-ServerNet)
    • InfiniBand: Rattner pitch (2/98); NGIO goes public (11/98); FIO goes public (2/99); NGIO spec available (7/99); FIO spec available (9/99); FIO and NGIO merge into IB (9/99); InfiniBand spec releases: Verb 1.0 (10/00), 1.0.a (6/01), 1.1 (11/2002), Ext. (12/03)
    • PCI: PCI-X 1.0 spec available (9/99); 3GIO described at IDF (11/01); PCI-X 2.0 announced (2/02); PCI-Express 1.0 spec (7/02); PCI-X 2.0 DDR/QDR spec (7/02); AS 1.0 spec (2003) ?
    • RDMA over IP: begins (6/00); 53rd IETF: ROI BOF calls for an IETF ROI WG (12/01); RDDP WG chartered at the 54th IETF (3/02); RDMAC announced (5/02); RDMA, DDP, MPA 1.0 specs (10/02); Verbs, SDP, iSER, … 1.0 specs (4/03)
  • 9. PCI
    • PCI standard’s strategy is:
      • Add evolutionary technology enhancements to the standard,
      • that maintain the existing PCI eco-system.
    • Within the standard, two contenders are vying for IOA market share:
      • PCI-X
        • 1.0 is shipping now,
        • 2.0 is next and targets 10 Gig Networking generation.
      • PCI-Express
        • Maintains existing PCI software/firmware programming model,
        • adds new: protocol layers, physical layer, and associated connectors.
        • Can also be used as an IOEN, but does not satisfy all enterprise class requirements:
          • Enterprise class RAS is optional (e.g. multipathing),
          • Fabric virtualization is missing,
          • A more efficient I/O communication model is missing, …
        • Will likely be extended to support:
          • Faster speed link,
          • Mandatory enterprise class RAS.
  • 10. I/O Attachment Comparison (PCI-Express 1.0/2.0 vs. PCI-X 1.0/2.0)
    • Cost
      • Infrastructure build-up – PCI-Express: new chip core (macro); PCI-X: delta to existing PCI chips
      • Fabric consolidation potential – PCI-Express: IOEN and I/O attachment; PCI-X: none
    • Virtualization
      • Host virtualization – both: performed by host
      • Network virtualization – both: none
      • I/O virtualization – both: no standard mechanism
    • Self-management
      • Unscheduled outage protection – PCI-Express: interface checks, CRC; no redundant paths; PCI-X: interface checks, parity, ECC; no redundant paths
      • Scheduled outage protection – both: hot-plug and dynamic discovery
      • Service level agreement – PCI-Express: traffic classes, virtual channels; PCI-X: N/A
    • Connectivity and distance
      • Connectivity – PCI-Express: memory-mapped switched fabric; PCI-X: multi-drop bus or point-point
      • Distance – PCI-Express: chip-chip, card-card connector, cable; PCI-X: chip-chip, card-card connector
    • Performance
      • Effective link widths – PCI-Express: serial 1x, 4x, 8x, 16x; PCI-X: parallel 32-bit, 64-bit
      • Effective link frequency – PCI-Express: 2.5 GHz -> 5 or 6.25 GHz; PCI-X: 33, 66, 100, 133, 266, 533 MHz
      • Bandwidth range – PCI-Express: 250 MB/s to 4 GB/s; PCI-X: 132 MB/s to 4.17 GB/s
  • 11. InfiniBand
    • IB’s strategy:
      • Provide a new , very efficient, I/O communication model ,
      • that satisfies enterprise server requirements, and
      • can be used for I/O, cluster, and storage.
      • IB’s model
        • Enables middleware to communicate across a low-latency, high-bandwidth fabric, through message queues that can be accessed directly from user space.
        • But… it required a completely new infrastructure
        • (management, software, endpoint hardware, fabric switches, and links).
      • The I/O adapter industry viewed IB’s model as too complex.
        • Sooo… I/O adapter vendors are staying on PCI;
        • IB may be used to attach high-end I/O to enterprise-class servers.
      • Given current I/O attachment reality, enterprise class vendors will likely:
        • Continue extending their proprietary fabric(s), or
        • Tunnel PCI traffic through IB, and provide IB-PCI bridges.
  • 12. I/O Expansion Network Comparison (IB vs. PCI-Express)
    • Self-management
      • Unscheduled outage protection – IB: interface checks, CRC; memory access controls; redundant paths; PCI-Express: interface checks, CRC; no native memory access controls; no redundant paths
      • Scheduled outage protection – both: hot-plug and dynamic discovery
      • Service level agreement – IB: service levels, virtual channels; PCI-Express: traffic classes, virtual channels
    • Connectivity, distance, topology
      • Connectivity – IB: identifier-based switched fabric; PCI-Express: memory-mapped switched fabric
      • Distance – both: chip-chip, card-card connector, cable
      • Topology – IB: multi-host, general; PCI-Express: single host, root tree
    • Performance
      • Link widths – IB: serial 1x, 4x, 12x; PCI-Express: serial 1x, 4x, 8x, 16x
      • Link frequency – both: 2.5 GHz
      • Bandwidth range – IB: 250 MB/s to 3 GB/s; PCI-Express: 250 MB/s to 4 GB/s
      • Latency – IB: native: message-based asynchronous operations (Send and RDMA); tunnel: PIO-based synchronous operations; PCI-Express: PIO-based synchronous operations (network traversal for PIO reads)
  • 13. I/O Expansion Network Comparison… Continued (IB vs. PCI-Express)
    • Next steps
      • Higher frequency links – both: 5 or 6.25 GHz (work in process)
      • Advanced functions – IB: verb enhancements; PCI-Express: mandatory interface checks, CRC
    • Virtualization
      • Host virtualization – IB: standard mechanisms available; PCI-Express: performed by host
      • Network virtualization – IB: end-point partitioning; PCI-Express: none
      • I/O virtualization – IB: standard mechanisms available; PCI-Express: no standard mechanism
    • Cost
      • Infrastructure build-up – IB: new infrastructure; PCI-Express: new chip core (macro)
      • Fabric consolidation potential – IB: IOEN, CAN, high-end I/O attachment; PCI-Express: IOEN and I/O attachment
  • 14. Server Scale-up Topology Options
    [Diagram: alternative scale-up topologies – processors (uP, $), memory, and memory controllers reaching adapters either through a PCI-Express bridge and switch, or through PCI-X bridges and a switched IOEN spanning multiple SMP nodes.]
    • Key PCI-Express IOEN value proposition
      • Bandwidth scaling
      • Short-distance remote I/O
      • Proprietary based virtualization
      • QoS (8 traffic classes, virtual channels)
      • Low infrastructure build-up
        • Evolutionary compatibility with PCI
    • Key IB IOEN value proposition
      • Bandwidth scaling
      • Long distance remote I/O
      • Native, standard based virtualization
      • Multipathing for performance and HA
      • QoS (16 service levels, virtual lanes)
      • CAN and IOEN convergence
    [Diagrams: a PCI-Express IOEN topology and an IB or proprietary IOEN topology.]
    • PCI-Express:
      • SMP only
    • Proprietary or IB:
      • PCI tunneling across SMP sub-nodes
    • For large SMPs, a memory fabric must be used to access I/O that is not local to an SMP sub-node.
  • 15. Server IOA Outlook
    • Server I/O Attachment
      • Next steps in PCI family roadmap:
        • 2003-05: PCI-X 2.0 DDR and QDR.
        • 2005: PCI-Express
      • Key drivers for PCI-Express are:
        • AGP replacement on clients (16x)
        • CPU chipset on clients and servers (8 or 16x)
      • IB as an IOA
        • Complexity and eco-system issues will limit IB to a small portion of high-end IOA.
    • Server I/O Expansion
      • Options for scale-up servers:
        • Migrate to IB and tunnel PCI I/O through it.
        • Continue upgrading proprietary IOEN.
        • Migrate to PCI-Express.
      • SHV servers will likely pursue PCI-Express:
        • Satisfies low-end requirements,
        • but not all enterprise class requirements.
    [Charts: I/O attachment bandwidth (GB/s) vs. year (1994-2009) for MCA, PCI/PCI-X, and PCI-Express; I/O expansion network bandwidth (GB/s) vs. year for PCI-E (8/16x), IB (12x), ServerNet, SGI XIO, IBM RIO/STI, and HP.]
  • 16. Problems with Sockets over TCP/IP
    • Network intensive applications consume a large percent of the CPU cycles:
      • Small 1 KB transfers spend 40% of the time in TCP/IP, and 18% in copy/buffer mgt
      • Large 64 KB transfers spend 25% of the time in TCP/IP, and 49% in copy/buffer mgt
    • Network stack processing consumes a significant amount of the available server memory bandwidth (3x the link rate on receives).
    [Chart: CPU time breakdown – copy/data management, TCP, IP, NIC interrupt processing, socket library, and time available for the application.]
    • Note:
      • 1 KB: based on Erich Nahum’s Tuxedo on Linux, 1 KB files, 512-client run, plus 0.5 instructions per byte added for copy.
      • 64 KB: based on Erich Nahum’s Tuxedo on Linux, 64 KB files, 512-client run, plus 0.5 instructions per byte added for copy.
    [Charts: CPU utilization (%) with no offload for 1 KB and 64 KB transfers, and receive-side server memory-to-link bandwidth ratio (about 3x) for a standard NIC.]
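To make the copy and TCP/IP shares above concrete, here is a minimal sketch (my illustration, not from the slides) of the conventional sockets receive path those measurements describe: the kernel runs the protocol stack on the host CPU and copies each arriving payload from its socket buffers into the application's buffer.

```c
/* Minimal sketch of the conventional (non-offloaded) sockets receive path.
 * Every recv() call makes the kernel copy data from its socket buffers into
 * the user buffer, and TCP/IP processing runs on the host CPU -- the
 * copy/buffer-management and TCP/IP shares quantified above. */
#include <sys/types.h>
#include <sys/socket.h>

ssize_t drain_socket(int sock_fd)
{
    char buf[64 * 1024];                 /* user-space staging buffer */
    ssize_t total = 0, n;

    /* Each iteration costs interrupt handling and protocol processing in
     * the kernel, plus a memory-bandwidth-consuming copy into buf. */
    while ((n = recv(sock_fd, buf, sizeof(buf), 0)) > 0)
        total += n;                      /* application would parse buf here */

    return (n < 0) ? -1 : total;
}
```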
  • 17. Network Offload – Basic Mechanisms
    • Successful network stack offload requires five basic mechanisms:
      • Direct user space access to a send/receive Queue Pair (QP) on the offload adapter.
        • Allows middleware to directly send/receive data through the adapter.
      • Registration of virtual to physical address translations with the offload adapter.
        • Allows the hardware adapter to directly access user space memory.
      • Access controls between registered memory resources and work queues.
        • Allows privileged code to associate adapter resources (memory registrations, QPs, and Shared Receive Queues) to a combination of: OS image, process, and, if desired, thread.
      • Remote direct data placement (a.k.a. Remote Direct Memory Access - RDMA).
        • Allows adapter to directly place incoming data into a user space buffer.
      • Efficient implementation of the offloaded network stack.
        • Otherwise offload may not yield desired performance benefits.
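As a concrete illustration of mechanisms 2 and 3 above, the sketch below uses the OpenFabrics libibverbs API (a later, Linux-specific realization of the verbs interface the deck describes; not part of the original slides) to register a user buffer with the adapter inside a protection domain. This is what allows the hardware to access user-space memory directly and lets privileged code scope who may use it.

```c
/* Sketch: registering user memory with an offload adapter via libibverbs.
 * The protection domain (PD) provides the access-control scope; the memory
 * region's lkey/rkey are the handles that later work requests (and, for the
 * rkey, remote peers) must present. Error handling is omitted. */
#include <stdlib.h>
#include <infiniband/verbs.h>

struct ibv_mr *register_buffer(struct ibv_context *ctx, size_t len,
                               struct ibv_pd **pd_out, void **buf_out)
{
    struct ibv_pd *pd = ibv_alloc_pd(ctx);     /* access-control scope */
    void *buf = malloc(len);                   /* ordinary user-space memory */

    /* The adapter pins these pages and records the virtual-to-physical
     * translations, so it can DMA directly to/from user space later. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_WRITE);
    *pd_out = pd;
    *buf_out = buf;
    return mr;                                 /* NULL on failure */
}
```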
  • 18. Network Stack Offload – InfiniBand Host Channel Adapter Overview
    • Verb consumer – Software that uses the HCA to communicate with other nodes.
    • Communication is through verbs that:
      • Manage connection state.
      • Manage memory and queue access.
      • Submit work to HCA.
      • Retrieve work and events from HCA.
    • Channel Interface (CI) performs work on behalf of the consumer.
    • CI consists of:
      • Driver – Performs privileged functions.
      • Library – Performs user space functions.
      • HCA – hardware adapter.
    • SQ – Send Queue
    • RQ – Receive Queue
    • SRQ – Shared RQ
    • QP – Queue Pair
    • QP = SQ + RQ
    • CQ – Completion Queue
    [Diagram: the verb consumer issues verbs to the CI (HCA driver/library); the HCA data engine layer maintains the QP context (QPC), SQ/RQ/SRQ, CQ, asynchronous events (AE), the memory translation and protection table (TPT), and the IB transport and network layers.]
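The verb flow in this slide maps fairly directly onto the same libibverbs API; below is a sketch (again my illustration, not from the deck) of a consumer submitting an RDMA Write work request on a QP's send queue and retrieving the completion from the CQ. The qp, cq, mr, remote_addr, and rkey arguments are assumed to come from earlier connection setup, which is omitted.

```c
/* Sketch: submitting work to the HCA and retrieving the completion.
 * The consumer builds a work request, posts it on the send queue, and
 * busy-polls the completion queue; no kernel call is needed on this fast
 * path once the QP has been mapped for user-space access. */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int rdma_write(struct ibv_qp *qp, struct ibv_cq *cq, struct ibv_mr *mr,
               uint64_t remote_addr, uint32_t rkey, uint32_t len)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)mr->addr,        /* registered local buffer */
        .length = len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.wr_id               = 1;
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.opcode              = IBV_WR_RDMA_WRITE;   /* direct data placement */
    wr.send_flags          = IBV_SEND_SIGNALED;   /* ask for a CQ entry */
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    if (ibv_post_send(qp, &wr, &bad_wr))          /* hand the work to the HCA */
        return -1;

    struct ibv_wc wc;
    while (ibv_poll_cq(cq, 1, &wc) == 0)          /* poll for completion */
        ;
    return (wc.status == IBV_WC_SUCCESS) ? 0 : -1;
}
```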
  • 19. Network Stack Offload – iONICs
    • iONIC
      • An internet Offload Network Interface Controller (iONIC).
      • Supports one or more internet protocol suite offload services.
    • RDMA enabled NIC (RNIC)
      • An iONIC that supports the RDMA Service.
    • IP suite offload services include, but are not limited to:
      • TCP/IP Offload Engine (TOE) Service
      • Remote Direct Memory Access (RDMA) Service
      • iSCSI Service
      • iSCSI Extensions for RDMA (iSER) Service
      • IPSec Service
    [Diagram: host sockets applications running over the Ethernet link service (NIC driver), the TOE service (TOE driver and service library), and the RDMA service (RNIC driver and service library), with TCP, IP, Ethernet, and RDMA/DDP/MPA offloaded onto the iONIC. Only the Ethernet link, TOE, and RDMA services are shown.]
  • 20. Network Stack Offload – iONIC RDMA Service Overview
    • Verb consumer – Software that uses the RDMA Service to communicate with other nodes.
    • Communication is through verbs that:
      • Manage connection state.
      • Manage memory and queue access.
      • Submit work to iONIC.
      • Retrieve work and events from iONIC.
    • RDMA Service Interface (RI) performs work on behalf of the consumer.
    • RI consists of:
      • Driver – Performs privileged functions.
      • Library – Performs user space functions.
      • RNIC – hardware adapter.
    [Diagram: the verb consumer issues verbs to the RI (RNIC driver/library); the iONIC RDMA Service data engine layer maintains the QP context (QPC), SQ/RQ/SRQ, CQ, asynchronous events (AE), the memory translation and protection table (TPT), and the RDMA/DDP/MPA/TCP/IP layers.]
    • SQ – Send Queue
    • RQ – Receive Queue
    • SRQ – Shared RQ
    • QP – Queue Pair
    • QP = SQ + RQ
    • CQ – Completion Queue
  • 21. Network I/O Transaction Efficiency
    • Graph shows a complete transaction:
      • Send and Receive for TOE
      • Combined Send+RDMA Write and Receive for RDMA
    [Chart: CPU instructions per byte vs. transfer size in bytes (send and receive pair) for ELS, TOE 1, TOE 2, SDP, and RDMA.]
  • 22. Network Offload Benefits Middleware View Multi-tier Server Environment
    • Benefit of network stack offload (IB or iONIC) depends on the ratio of:
      • Application/Middleware (App) instructions :: network stack instructions.
    [Diagram tiers: presentation server; DB client & replication; web application server; business function server (OLTP & BI DB; HPC). Annotation: NC not useful at present due to XML & Java overheads.]
    • Sockets-level NC support beneficial
    • (5 to 6% performance gain for communication between App tier and business function tier)
    • (0 to 90% performance gain for communication between browser and web server)
    • Low-level (uDAPL, ICSC) support most beneficial
    • (4 to 50% performance gain for business function tier)
    • iSCSI, DAFS support beneficial
    • (5 to 50% gain for NFS/RDMA compared to NFS performance)
    [Legend. Note: all tiers are logical; they can potentially run on the same server OS instance(s). The business function tier traditionally uses a cluster network. Diagram elements: client tier (browser, user), web server, presentation data, application data, business data.]
  • 23. TCP/IP/Ethernet Are King of LANs
    • Ethernet is standard and widely deployed as a LAN.
      • Long distance links (from card-card to 40 Km).
      • High availability through session, adapter, or port level switchover.
      • Dynamic congestion management when combined with IP Transports.
      • Scalable security levels.
      • Sufficient performance for LAN.
      • Good enough performance for many clusters, and
      • high performance (when combined with TCP Offload)
    • Strategy is to extend Ethernet’s role in Wide Area Networks, Cluster, and Storage, through a combination of:
      • Faster link speed (10 Gb/s) at competitive costs
        • No additional cost for copper (XAUI).
        • $150 to $2000 transceiver cost for fiber.
      • internet Offload Network Interface Controllers (iONICs)
        • Multiple services
      • Lower latency switches
        • Sub-microsecond latencies for data-center (cluster and storage) networks.
  • 24. Market Value of Ethernet
    • 250 million Ethernet ports installed to date.
    [Charts: cumulative switched-port shipments (thousands of ports, 1993-2004, for 10 Mb/s, 100 Mb/s, 1 Gb/s, and 10 Gb/s switched Ethernet) and server Ethernet NIC prices ($, Jan-96 to Jan-05, for 10 Mb/s, 100 Mb/s, 1 Gb/s, IB 4x, 10 Gb/s copper and fiber, and 1 Gb/s and 10 Gb/s iONICs).]
  • 25. LAN Switch Trends
    • The traditional LAN switch IHV business model has been to pursue higher-level protocols and functions, with less attention to latency.
    • iONICs and 10 GigE are expected to increase the role Ethernet will play in Cluster and Storage Networks.
      • iONICs and 10 GigE provide an additional business model for switch vendors, one focused on satisfying the performance needs of Cluster and Storage Networks.
      • Some Ethernet switch vendors (e.g. Nishan) are pursuing this new model.
    Switch latencies
    • Nishan switch:
    • IBM 0.18um ASIC
    • 25 million transistors
    • 15mm x 15mm die size
    • 928 signal pins
    • Less than 2us latency
    • 1997: general-purpose switch, 20-100 us range
    • 2002: data-center (e.g. iSCSI) focused switch, <2 us range; general-purpose switch, 10-20 us range
    • 2006: data-center focused switch, <1 us range; general-purpose switch, 3-5 us range
  • 26. LAN Outlook
    • Ethernet vendors are gearing up for storage and higher performance cluster networks:
      • 10 GigE to provide higher bandwidths;
      • iONIC to solve CPU and memory overhead problems; and
      • lower latency switches to satisfy end-end process latency requirements.
    • If the above comes to pass,
    • how well will Ethernet play in the
      • Cluster market?
      • Storage market?
    [Chart: LAN bandwidth (GB/s) vs. year (1974-2004) for Ethernet, Token Ring, ATM, and FDDI – roughly 2x every 16 months over the past 10 years.]
  • 27. Cluster Network Contenders
    • Proprietary networks
      • Strategy:
        • Provide advantage over standard networks.
          • Lower latency/overhead, higher link performance, and advanced functions
        • Eco-system completely supplied by one vendor.
      • Two approaches:
        • Multi-server – can be used on server from more than one vendor.
        • Single-server – only available on server from one vendor.
    • Standard networks
      • IB
        • Strategy is to provide almost all the advantages available on proprietary networks;
        • thereby, eventually, displacing proprietary fabrics.
      • 10 Gb/s Ethernet internet Offload NIC (iONIC)
        • Strategy is to increase Ethernet’s share of the Cluster network pie,
        • by providing lower CPU/memory overhead and advanced functions,
        • though at a lower bandwidth than IB and proprietary networks.
    • Note: PCI-Express doesn’t play, because it is missing many functions.
  • 28. Cluster Network Usage in HPC
    • Use of proprietary cluster networks for high-end clusters will continue to decline.
      • Multi-platform cluster networks have already begun to gain significant share.
      • Standards-based cluster networks will become the dominant form.
    [Chart: cluster interconnect technology mix (standards-based, multi-platform, single-platform) for the Top 500 supercomputers, broken out by Top 100 / Next 100 / Last 100, in June 2000, June 2002, and November 2002. Source: Top 500 study by Tom Heller.]
  • 29. Reduction in Process-Process Latencies
    [Charts: 256 B and 8 KB block LAN process-process latencies, absolute and normalized. Time to compute 100 MFLOP: 1 GigE = 19 us, 10 GigE = 6 us, IB = 6 us; latency reductions of 1.2x, 4.6x, and 8.4x and of 2.5x, 3.0x, and 3.9x across the cases shown.]
  • 30. HPC Cluster Network Outlook
    • Proprietary fabrics will be displaced by Ethernet and IB.
    • Server’s with the most stringent performance requirements will use IB.
    • Cluster Networks will continue to be predominately Ethernet.
      • iONIC and low-latency switches will increase Ethernet’s participation in the cluster network market.
    [Chart: bandwidth (GB/s) vs. year (1985-2009) for high-performance standard links (IB/Ethernet) and for Ethernet, Token Ring, ATM, FDDI, FCS, IB, HiPPI, ServerNet, Memory Channel, SGI GIGAchannel, IBM SP/RIO/STI, and Synfinity.]
  • 31. Current SAN and NAS Positioning Overview
    • Current SAN differentiators
      • Block level I/O access
      • High performance I/O
      • Low latency, high bandwidth
      • Vendor-unique fabric management protocols
      • Learning curve for IT folks
      • Homogeneous access to I/O.
    • Current NAS differentiators
      • File level I/O access
      • LAN level performance
      • High latency, lower bandwidth
      • Standard fabric mgt protocols
      • Low/zero learning curve for IT folks
      • Heterogeneous platform access to files
    • Commonalities
      • Robust remote recovery and storage management requires special tools for both.
      • Each can optimize disk access, though SAN does require virtualization to do it.
    • Contenders:
      • SAN: FC and Ethernet.
      • NAS: Ethernet.
    [Diagram: SAN provides block-level (LUN/LBA) access; NAS provides file-level access over the LAN via NFS, HTTP, etc.]
  • 32. Storage Models for IP
    • Parallel SCSI and FC have a very efficient path through the O/S
      • Existing driver to hardware interface has been tuned for many years.
      • An efficient driver-HW interface model has been a key iSCSI adoption issue.
    • Next steps in iSCSI development:
      • Offload TCP/IP processing to the host bus adapter,
      • Provide switches that satisfy SAN latency requirements,
      • Improve read and write processing overhead at the initiator and target.
    [Diagrams: host software stacks for parallel SCSI or FC (FS API, FS/LVM, storage driver, storage adapter), for the iSCSI service in the host (FS API, FS/LVM, iSCSI, TCP/IP, NIC driver, NIC), and for the iSCSI service in an iONIC (FS API, FS/LVM, storage driver, adapter driver, iSCSI HBA). Chart: CPU instructions per byte vs. transfer size for parallel SCSI, iSCSI service in host, and iSCSI service in iONIC.]
  • 33. Storage Models for IP
    • RDMA will significantly improve NAS server performance.
      • Host network stack processing will be offloaded to an iONIC.
        • Removes TCP/IP processing from the host path.
        • Allows zero copy.
      • NAS (NFS with RDMA extensions) protocols will exploit RDMA.
      • RDMA allows a file-level access device to approach block-level access device performance levels,
      • creating a performance discontinuity for storage.
    [Diagrams: host software stacks for NFS over an ELS NIC (NFS API, NFS, TCP/IP, NIC driver, NIC) and for NFS extensions for RDMA over the RDMA service in an iONIC (NFS API, NFS, RDMA/DDP, NIC driver, RNIC with MPA/TCP and IP/Ethernet offload). Chart: CPU instructions per byte vs. transfer size for NFS over ELS NIC, NFS over RNIC, and parallel SCSI.]
  • 34. Storage I/O Network Outlook
    • Storage network outlook:
      • Link bandwidth trends will continue.
        • Paced by optic technology enhancements.
      • Adapter throughput trend will continue
        • Paced by higher frequency circuits, higher performance microprocessors, and larger fast-write and read cache memory.
      • SANs will gradually transition from FC to IP/Ethernet networks.
        • Motivated by TCO/complexity reduction.
        • Paced by availability of:
          • iSCSI with efficient TOE (possibly RNIC)
          • Lower latency switches
      • NAS will be more competitive against SAN.
        • Paced by RNIC availability.
    [Charts: SAN link bandwidth (GB/s, 1990-2005) for SCSI, FC, disk head, and iSCSI/Ethernet; single adapter/controller throughput (K IOPS, 1994-2008). Sources: product literature from 14 companies, typically using a workload of 100% reads of 512-byte data; not a good measure of overall sustained performance, but a good measure of adapter/controller front-end throughput capability.]
  • 35. Summary
    • Server I/O adapters will likely attach through the PCI family,
    • because of PCI’s low cost and simplicity of implementation.
    • I/O expansion networks will likely use
      • Proprietary or IB (with PCI tunneling) links that satisfy enterprise requirements, and
      • PCI-Express on Standard High Volume, Low-End servers.
    • Cluster networks will likely use
      • Ethernet networks for the high-volume portion of market, and
      • InfiniBand when performance (latency, bandwidth, throughput) is required.
    • Storage area networks will likely
      • Continue using Fibre Channel, but gradually migrate to iSCSI over Ethernet.
    • LANs: Ethernet is King.
    • Several network stack offload design approaches will be attempted,
      • from all firmware on slow microprocessors,
      • to heavy state machine usage, to all points in between.
      • After the weaker design approaches are weeded out of the market,
      • iONICs will eventually become a prevalent feature on low-end to high-end servers.