Multicore I/O Processors In Virtual Data Centers



  1. 1. 5th Annual Application of Multicore I/O Processors in Virtualized Data Centers. Nabil Damouny, Rolf Neugebauer. ESC – Multicore Expo, San Jose, CA, April 27, 2010
  2. 2. Outline • Networking Market Dynamics • Cloud Computing & the Virtualized Data Center • The Need for an Intelligent I/O Coprocessor • I/O Processing in Virtualized Data Centers: 1. SW-based (Bridge & vSwitch) 2. I/O Gateway 3. Virtual Ethernet Port Aggregation (VEPA) 4. Server-based • I/O Coprocessor Requirements • Meeting the I/O Coprocessor Challenge in Virtualized Data Centers • Heterogeneous Multicore Architecture • Netronome’s Network Flow Processors and Acceleration Cards • Summary and Conclusion. Data center virtualization is not complete until the I/O subsystem is also virtualized. ESC Silicon Valley – April, 2010
  3. 3. About Netronome • Fabless semiconductor company developing Network Flow Processing solutions for high-performance, programmable L2-L7 applications • Network coprocessors for x86 designs • More complex processing per packet than any other architecture • Best-in-class performance per watt • Unmatched integration with x86 CPUs • Family of products including processors, acceleration cards, development tools, software libraries and professional services. Summary • Founded in 2003 • Solid background in networking, communications, security, voice and video applications, and high-performance computing • Comprised of networking and silicon veterans • Global presence: Boston, Massachusetts; Santa Clara, California; Pittsburgh, Pennsylvania; Cambridge, United Kingdom; Shenzhen, China; Penang, Malaysia. Intel Agreements • IXP28XX Technology License • SDK Software License • HDK Hardware License • Sales and Marketing • QPI Technology License
  4. 4. Networking Market Dynamics. Eventually, every packet from every flow of communications services will be intelligently processed. [Figure: market drivers. Application Awareness (email, Web, multimedia); Integrated Security (VPN, SSL, spam, anti-virus, IDS/IPS, firewall); Content Inspection (voice, video, data, executables); Increasing Bandwidth (millions of packets and flows at 10GigE and beyond); Device Virtualization (multicore, multi-OS, multi-app, multi-I/O); Intelligent Networking (switching, routing, WiMax, 3GPP LTE, security blades and appliances, data center servers). Source: Morgan Stanley] Increasing bandwidth, greater security requirements and the need for application- and content-aware networking are driving the evolution to intelligent networking (L2-L7) from today’s simpler (L2-L3 only) networks.
  5. 5. Unified Computing in Virtualized Data Centers ... Requires Intelligent Networking • Unified Computing: the convergence of computing, networking, and storage in a virtualized environment • Applies to the enterprise (private or internal) and service providers • Environment: uncorrelated high I/O data rates • Networking • Web servers, especially virtualized servers • Unified Computing: combination of servers and networking • Requirements for high-performance intelligent networking • I/O coprocessing for multicore IA/x86 to scale applications • Intelligent flow-based switching for inter-VM communications • Manage complex high-performance networking interfaces. The advent of many VMs and the need for IOV create a new set of requirements that mandates a more intelligent approach to managing I/O.
  6. 6. Cloud Computing ... Definition & Services. Cloud computing defined: • IT-related capabilities are provided “as a service” using Internet technologies to multiple external customers • Public Clouds • Private Clouds. Types of services available in cloud computing: • Software-as-a-service: software applications delivered over the Web • Infrastructure-as-a-service: remotely accessible server and storage capacity • Platform-as-a-service: a compute-and-software platform that lets developers build and deploy Web applications on a hosted infrastructure. Cloud computing technologies play a crucial role in allowing companies to scale their data center infrastructure to meet performance and TCO requirements.
  7. 7. The Need for an I/O Coprocessor ... in the Virtualized Data Center • Efficient delivery of data to VMs at high rates (20+ Gbps) • Requires an intelligent IOV solution • Just L2+ processing is not enough • VLANs, ACLs, etc. only cover the basics • Stateful load balancing requires flow awareness • Clouds are hostile environments: • Stateful firewalls, IPS/IDS, deep packet inspection capabilities • Multicore x86 CPUs: • Show poor packet processing performance • Are unsuitable for handling millions of stateful flows • Have high power consumption. Introduce an intelligent I/O coprocessor to assist x86 multicore CPUs.
  8. 8. IDC ... on I/O Virtualization • “If I/O is not sufficient, then it could limit all the gains brought about by the virtualization process” • The I/O subsystem needs to deliver peak throughput and lower latency to the VMs and to the applications they host. • As VM density increases, most customers are scaling I/O capacity by installing more adapters. • IOV is simply the abstraction of the logical details of I/O from the physical, essentially to separate the upper-layer protocols from the physical connection or transport.
  9. 9. I/O Coprocessor in a Virtualized Heterogeneous Multicore Architecture [Figure: block diagram. Labels: two multicore CPUs, each running VM1 ... VMn with a guest OS and a VNIC per VM; x86 chipset; control plane; PCIe Gen2; data plane; IOV; I/O coprocessor with two 10GE ports; high-speed serial interface (Interlaken), marked as future.]
  10. 10. I/O Coprocessor Requirements in a Heterogeneous Multicore Architecture: Addressing the Inter-VM Switching and I/O Challenge • Inter-chip access • Demultiplexing and classification • TCP offload • Host offload for burdensome I/O, security, DPI functions • Zero-copy, big-block transfers to multiple cores, VMs or endpoints • Full I/O virtualization with Intel VT-d • Programmable egress traffic management [Figure labels: Multicore x86; IOV; Multicore Flow Processor] Heterogeneous multicore processing solutions deliver more than 4x the performance of a multicore x86 with a standard NIC.
  11. 11. Challenges in Virtualized Data Centers • 2004 (5 years ago): rack of single-core servers and switches • 2009: many virtual machines and cores in one server • What was a rack of servers five years ago is now a single server, including networking (switch, IPS, FW, ...). Many cores result in tens of VMs and a network I/O challenge.
  12. 12. IEEE 802.1: Addressing Ethernet Virtualization in the Data Center • Current IEEE 802.1Q bridges: • Do not allow a packet to be sent back to the same port within the same VLAN • Do not have visibility into the identity of VMs within physical stations • Extensions to bridge and end station behaviors are needed to support virtualization • IEEE 802.1Qbg EVB (Edge Virtual Bridging), VEB/VEPA (Virtual Ethernet Bridge / Virtual Ethernet Port Aggregation) & 802.1Qbh Bridge Port Extension (PE): • Address management issues created by the explosion in VMs in data centers sharing access to the network through an embedded bridge • Discuss methods to offload policy, security, and management processing from virtual switches on NICs and blade servers to physical Ethernet switches. Managing network I/O and inter-VM switching will require various implementation alternatives.
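
The same-port restriction on this slide can be illustrated with a minimal sketch. It assumes a bridge that already knows on which port the destination was learned; the function name `egress_ports` and the flag `reflective_relay` are hypothetical names chosen for illustration, and flooding is simplified so that it never returns a frame on its ingress port.

```python
from typing import List, Optional


def egress_ports(ports: List[str], ingress: str, learned_port: Optional[str],
                 reflective_relay: bool = False) -> List[str]:
    """Return the ports a frame is sent out of (simplified model).

    learned_port is the port where the destination was learned, or None
    if the destination is unknown and the frame is flooded.
    """
    if learned_port is None:
        # Unknown destination: flood, but not back out of the ingress port
        # (a simplification kept for both modes in this sketch).
        return [p for p in ports if p != ingress]
    if learned_port == ingress and not reflective_relay:
        # Classic bridge behaviour: never send a frame back where it came from.
        return []
    # Either a different port, or a hairpin turn allowed by reflective relay.
    return [learned_port]


ports = ["p1", "p2", "p3"]
# Two VMs share the server uplink attached to p1.
classic = egress_ports(ports, ingress="p1", learned_port="p1")
hairpin = egress_ports(ports, ingress="p1", learned_port="p1", reflective_relay=True)
print(classic, hairpin)
```

In this model a classic bridge drops traffic between two VMs behind the same uplink, which is why inter-VM traffic needs either a switch inside the server or an external switch that is allowed to turn the frame around.
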
  13. 13. OpenFlow Switching / vSwitch • OpenFlow switching includes: • Flow tables used to implement packet processing • The OpenFlow protocol, used to manipulate the flow entries • Enables acceleration of stateful security functions: • Application VM with associated security VM (e.g. FW, IPS, anti-virus) • Network traffic will be classified and transit the security VM prior to being allowed to reach the application VM • If a new flow has been “blessed”, pass packets straight to the app VM • Flow-based policies for white/black lists (not just L2) • Software-based virtual switches will have difficulty coping with: • Large numbers of flows per second • Many packets per second, i.e. high throughput at small packet sizes • Assuring low latency. The Network Flow Processor architecture fits well with OpenFlow.
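
The "blessed flow" behaviour described on this slide can be sketched as a small exact-match flow table. This is an illustration under stated assumptions, not the OpenFlow protocol itself: real OpenFlow tables use wildcard matches, priorities and controller messages, and the names `FlowKey`, `FlowTable`, `bless` and the action strings are hypothetical.

```python
from typing import Dict, NamedTuple


class FlowKey(NamedTuple):
    """5-tuple used here as an exact-match flow identifier (assumption)."""
    src_ip: str
    dst_ip: str
    proto: int
    src_port: int
    dst_port: int


class FlowTable:
    """Exact-match flow table: flow key -> action name."""

    def __init__(self, default_action: str = "to_security_vm") -> None:
        self.entries: Dict[FlowKey, str] = {}
        self.default_action = default_action

    def lookup(self, key: FlowKey) -> str:
        # A table miss falls back to the default action (inspect first).
        return self.entries.get(key, self.default_action)

    def bless(self, key: FlowKey) -> None:
        # The security VM approved the flow: install a fast-path entry.
        self.entries[key] = "to_app_vm"

    def block(self, key: FlowKey) -> None:
        # Blacklist entry: drop every later packet of this flow.
        self.entries[key] = "drop"


table = FlowTable()
flow = FlowKey("10.0.0.1", "10.0.0.2", 6, 40000, 443)
first = table.lookup(flow)   # unknown flow goes through the security VM
table.bless(flow)
later = table.lookup(flow)   # blessed flow goes straight to the app VM
print(first, later)
```

Only the first packets of a flow pay the cost of the security VM; every later packet is a single table lookup, which is the part a flow processor can take over from the host.
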
  14. 14. 1A. Software-Based Switching (Bridge) in Virtual Server • Software virtual switch: VMware, Xen & Linux Bridge (initially had no ACL or VLAN support). VMware and Xen put switches as software modules in their VMM, but they lacked key features and were slow!
  15. 15. 1B. Enhanced Software-Based Switching (vSwitch) in Virtual Server • Cisco Nexus 1000V (ACLs, VLANs, IOS) for VMware; Open vSwitch (flow-based) for XenServer • But with added functionality, performance drops hugely. What happens if FW and IPS are added? [Figure: Example: Cisco Nexus N1000] A good solution for low-performance systems: high latency, low performance.
  16. 16. 2. I/O Gateway. Delivers three key functions: • In-rack server communications switch: • replaces top-of-rack Ethernet switch • 10/20 Gbps PCIe fabric • Centralized enclosure for I/O adapters used by servers in the rack: • shared (network, storage) I/O • assigned (specialty accelerators) • Virtualized I/O configuration. Source: Aprius. Note: Virtensys, Next I/O and Xsigo use similar concepts. New approach using PCIe or InfiniBand interconnects, and security functions within the gateway.
  17. 17. 3. Virtual Ethernet Port Aggregation (VEPA) • Offloads policy, security and management processing from virtual switches on NICs and blade servers into physical Ethernet switches (e.g. ToR switch) • IEEE VEPA is an extension to physical and virtual switching • VEPA allows VMs to use external switches to access features like ACLs, policies, VLAN assignments. All inter-VM traffic has to traverse the physical network infrastructure. Additional security features, load balancers, etc. are implemented in external appliances.
  18. 18. 4. Moving Switching Into The Server. Switch moved from IA/x86 into Netronome NFP-32xx. Moving the switching to a Netronome-based coprocessor releases cycles on the IA and increases application performance. Adding IPS or FW is no problem! Server-based NIC or LoM: use existing wiring. Security processing in the server.
  19. 19. Intelligent I/O Sharing Alternatives; Summary. Addressing Inter-VM Switching and the Network I/O Challenge. Columns: Software-based switch | I/O Gateway | VEPA | Server-based switch. Performance: Poor | Very good | Very good, except for inter-VM switching | Very good. Power: Poor, wastes IA cycles | Good | Good | Good. Management: Unclear – standard Network or server Network admin Depends who owns the admin if I/O Gateway owns switch implements a switch. Security: Software-based, adds latency | Centralized. Adding security increases cost and latency | Centralized. Adding security increases cost and latency | Centralized + Distributed. Flexibility: High | Depends on architecture | Medium – standard switch | High. Reliability: Low | Good | Good | Good, distributed. Cost: <VEPA – card is same <VEPA: Card in Low, but higher for Less costly but as CNA in ToR. But wastes IA cycles server <CNA & ToR intelligent ToR VEPA much simpler, Sw part of Gateway switches cheaper
  20. 20. Performance of SR-IOV NIC, Linux Bridge and a vSwitch. vSwitches require more packet processing and hence drop packets much earlier.
  21. 21. Performance of SR-IOV NIC, an old-style Bridge and a vSwitch. vSwitches provide more flexibility and functionality, but they drop packets earlier and consume more CPU cycles.
  22. 22. Performance & CPU Load of SR-IOV NIC, Linux Bridge and a vSwitch. Combining the flexibility of vSwitches with the performance of SR-IOV NICs requires an intelligent I/O coprocessor.
  23. 23. Requirements for I/O Coprocessor • Intelligent, stateful, flow-based switching • Integrated IOV • Load balancing • Integrated security • Glueless interface to CPU subsystem. Netronome “Network Flow Processor” is an intelligent I/O coprocessor.
  24. 24. Netronome Silicon & PCIe Cards • NFP-3240 based PCIe cards • 20 Gbps of line-rate packet and flow processing per NFE • 6x1GigE, 2x10GigE (SFP+), netmod interfaces • PCIe Gen2 (8 lanes) • Virtualized Linux drivers via SR-IOV • Flexible/configurable memory options • Packet time stamping with nanosecond granularity • Integrated cryptography • Packet capture and inline applications • Hardware-based stateful flow management • TCAM-based traffic filtering • Dynamic flow-based load balancing to x86 CPUs. Highly programmable, intelligent, virtualized acceleration cards for network security appliances and virtualized servers. © 2009 Netronome Systems Confidential
  25. 25. Summary and Conclusion • Inter-VM switching and intelligent I/O device sharing are an integral part of data center virtualization • There are many implementation alternatives • Heterogeneous architecture addresses this challenge • The I/O coprocessor complements multicore x86 with packet processing performance, handling of millions of stateful flows, and lower power consumption • Netronome’s NFP-32xx processor family integrates inter-VM switching and I/O virtualization capabilities • Netronome’s PCIe card family integrates intelligent, programmable, flow-based network card functionality with IOV for the data center. Heterogeneous architecture (Network Flow Processing + multicore x86) addresses the need for inter-VM switching and intelligent I/O sharing.
  26. 26. Backup
  27. 27. Session Info & Abstract. Application of Multicore I/O Processors in Virtualized Data Centers. Speakers: Nabil Damouny (Senior Director, Marketing, Netronome Systems), Rolf Neugebauer (Staff Software Engineer, Netronome Systems). Date/Time: April 27, 2010, 8:30am to 9:15am. Formats: Audience level: Intermediate. Presentation Abstract: This presentation will discuss the applications of integrated multicore processors, optimized for networking I/O applications, in virtualized data centers. Data centers are increasingly being built with multicore virtualized servers. As the number of cores in the server increases, the number of VMs goes up at an even faster pace. These servers need to have access to high-performance network I/O, resulting in the requirement to implement I/O sharing in a virtualized, intelligent way. In addition, a mechanism for high-performance inter-VM switching will also be needed. Flow-based solutions, such as flow classification, routing and load balancing, supporting in excess of 8M flows, are effective ways to address the above challenges. Track: Multicore Expo – Networking & Telecom
  28. 28. NFP-32xx Integrates Flow-Based L2 Functions For Inter-VM Switching • Flow classification • Switching between physical networking ports • Switching between virtual NICs, without host intervention • Switching between any physical port and any virtual port • Stateful flow-based switching [Figure: VM1 ... VMn on host CPU cores C1 ... Cn, each with a VNIC; NFE with an Ethernet switch, Rx/Tx ports and interconnection links.] NFP-32xx supports more than 8 million flows.
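
Switching between any physical port and any virtual port can be illustrated with a plain MAC-learning switch. This is a swapped-in, simplified technique for illustration and not Netronome's flow-based implementation; the class `LearningSwitch`, the port names and the shortened MAC strings are all hypothetical.

```python
from typing import Dict, List


class LearningSwitch:
    """L2 switch that treats virtual NICs and physical ports alike."""

    def __init__(self, ports: List[str]) -> None:
        self.ports = ports
        self.mac_table: Dict[str, str] = {}  # MAC address -> port

    def forward(self, in_port: str, src_mac: str, dst_mac: str) -> List[str]:
        # Learn (or refresh) where the sender lives.
        self.mac_table[src_mac] = in_port
        out = self.mac_table.get(dst_mac)
        if out is None:
            # Unknown destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            # Destination is on the ingress port: nothing to forward.
            return []
        return [out]


sw = LearningSwitch(["phys0", "phys1", "vnic1", "vnic2"])
flooded = sw.forward("vnic1", "aa:aa", "bb:bb")  # bb:bb not yet known
reply = sw.forward("vnic2", "bb:bb", "aa:aa")    # aa:aa was learned on vnic1
direct = sw.forward("vnic1", "aa:aa", "bb:bb")   # now a direct VM-to-VM hop
print(flooded, reply, direct)
```

Once both addresses are learned, VM-to-VM traffic goes from one virtual NIC to the other without touching a physical port, and, if the table lives on the coprocessor, without host CPU involvement.
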
  29. 29. I/O Virtualization (IOV) Requirements • Support multiple virtual functions (VFs) over PCIe • Lower cost, lower power • Dynamically assign VFs to different VMs • Support multiple NIC functions: crypto, PCAP, etc. • Capability to pin an I/O device to a specific CPU core/VM • Enable consolidation and isolation • Flow-based load balancing to x86 multicore CPUs • Higher performance at lower power. Intelligent I/O virtualization is required in multicore CPU designs. PCI-SIG introduced SR-IOV standards for this purpose.
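
Flow-based load balancing and pinning, as listed on this slide, can be sketched as a hash over the flow's 5-tuple with an optional override table. This is a minimal sketch under assumptions: the function `pick_core`, the `pinned` mapping and the choice of CRC32 as the hash are illustrative and not taken from the presentation.

```python
import zlib
from typing import Dict, Optional, Tuple

# (src_ip, dst_ip, protocol, src_port, dst_port)
FiveTuple = Tuple[str, str, int, int, int]


def pick_core(flow: FiveTuple, num_cores: int,
              pinned: Optional[Dict[FiveTuple, int]] = None) -> int:
    """Map a flow to a CPU core so all its packets land on the same core."""
    if pinned and flow in pinned:
        # Explicit pinning wins over hashing.
        return pinned[flow]
    key = "|".join(str(field) for field in flow).encode()
    # CRC32 is deterministic, so the mapping is stable across packets.
    return zlib.crc32(key) % num_cores


flow = ("10.0.0.1", "10.0.0.2", 6, 40000, 443)
core_a = pick_core(flow, 8)
core_b = pick_core(flow, 8)
pinned_core = pick_core(flow, 8, pinned={flow: 3})
print(core_a == core_b, pinned_core)
```

Keeping every packet of a flow on one core preserves packet order and keeps per-flow state local to that core, which is the usual motivation for balancing by flow rather than by packet.
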
  30. 30. The Need for Intelligent I/O Virtualization • Use commodity multicore hardware • Virtualization for: • Consolidation • Moving “legacy” applications & OSs to multicore • Isolation • I/O devices need to be shared • Load balance/direct traffic to VMs • Pin VMs to cores • Direct traffic to cores/VMs • Isolate device access from VMs. A good IOV solution provides all of the above!
  31. 31. NFP Security Capabilities • Internal instruction unit • DMA, bulk crypt/hash, PKI control, sequenced through cryptography instructions with a multithreaded controller • Hardware-accelerated bulk cryptography (20+ Gbps) • AES with 128-, 192- and 256-bit keys • ECB, CBC, GCM, CTR, OFB, CFB, CM, f8 support • 3DES, DES with ECB, CBC support • ARC-4 • SHA-1, SHA-1 HMAC • SHA-2, SHA-2 HMAC family • 224/256/384/512-bit support • PKI modular exponentiation • 20k+ ops • Up to 2048 bit • Supports CRT [Figure labels: Encrypt/Authenticate; PKI Modular Exponentiation] Integrated high-performance modern crypto algorithms, with a PKI engine, in a multithreaded programmable environment. © 2010 Netronome, Inc. - Confidential.
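
The slide lists modular exponentiation with CRT support. A worked toy example shows what CRT buys: two half-size exponentiations replace one full-size one. The RSA framing, the function name `modexp_crt` and the tiny primes are assumptions made for illustration; nothing here describes how the NFP hardware implements it.

```python
def modexp_crt(c: int, d: int, p: int, q: int) -> int:
    """Compute c**d mod (p*q) via the Chinese Remainder Theorem."""
    dp = d % (p - 1)               # reduced exponent modulo p - 1
    dq = d % (q - 1)               # reduced exponent modulo q - 1
    q_inv = pow(q, -1, p)          # modular inverse, needs Python 3.8+
    m1 = pow(c, dp, p)             # half-size exponentiation mod p
    m2 = pow(c, dq, q)             # half-size exponentiation mod q
    h = (q_inv * (m1 - m2)) % p    # Garner's recombination step
    return m2 + h * q


p, q = 61, 53                      # toy primes, far too small for real use
n = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent matching e
message = 65
cipher = pow(message, e, n)
recovered = modexp_crt(cipher, d, p, q)
print(recovered == message, recovered == pow(cipher, d, n))
```

The CRT result matches the direct computation `pow(cipher, d, n)`, while each inner exponentiation works on numbers half the bit length of the modulus.
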