NIC Virtualization on IBM Flex Systems


SOURCE URL: http://www.redbooks.ibm.com/redpieces/abstracts/sg248223.html

Learn how to deploy vNICs with IBM Flex System Manager patterns!


Front cover
NIC Virtualization in IBM Flex System Fabric Solutions
SG24-8223-00
ibm.com/redbooks

Scott Irwin, Scott Lorditch, Matt Slavin, Ilya Krutov

  • Introduces NIC virtualization concepts and technologies
  • Discusses UFP and vNIC deployment scenarios
  • Provides UFP and vNIC configuration examples
International Technical Support Organization
NIC Virtualization in IBM Flex System Fabric Solutions
June 2014
SG24-8223-00
© Copyright International Business Machines Corporation 2014. All rights reserved.
Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp.

First Edition (June 2014)

This edition applies to:
  • IBM Networking Operating System 7.8
  • IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch
  • IBM Flex System Fabric EN4093R 10Gb Scalable Switch
  • IBM Flex System Embedded 10Gb Virtual Fabric Adapter
  • IBM Flex System CN4054 10Gb Virtual Fabric Adapter
  • IBM Flex System CN4054R 10Gb Virtual Fabric Adapter

This document was created or updated on July 18, 2014.

Note: Before using this information and the product it supports, read the information in “Notices”.
Contents

Notices
  Trademarks
Preface
  Authors
  Now you can become a published author, too!
  Comments welcome
  Stay connected to IBM Redbooks
Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment
  1.1 Overview of Flex System network virtualization
  1.2 Introduction to NIC virtualization
    1.2.1 vNIC based NIC virtualization
    1.2.2 Unified Fabric Port based NIC virtualization
    1.2.3 Comparing vNIC modes and UFP modes
  1.3 Introduction to I/O module virtualization
    1.3.1 Introduction to vLAG
    1.3.2 Introduction to stacking
    1.3.3 Introduction to SPAR
    1.3.4 Easy Connect Q-in-Q solutions
    1.3.5 Introduction to the Failover feature
  1.4 Introduction to converged fabrics
    1.4.1 Fibre Channel over Ethernet
    1.4.2 iSCSI
    1.4.3 iSCSI versus FCoE
Chapter 2. IBM Flex System networking architecture and Fabric portfolio
  2.1 Enterprise Chassis I/O architecture
  2.2 IBM Flex System Fabric I/O modules
    2.2.1 IBM Flex System Fabric EN4093R 10Gb Scalable Switch
    2.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch
    2.2.3 IBM Flex System Fabric SI4093 System Interconnect Module
    2.2.4 I/O modules and cables
  2.3 IBM Flex System Virtual Fabric adapters
    2.3.1 Embedded 10Gb Virtual Fabric Adapter
    2.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters
Chapter 3. NIC virtualization considerations on the switch side
  3.1 Virtual Fabric vNIC solution capabilities
    3.1.1 Virtual Fabric mode vNIC
    3.1.2 Switch Independent mode vNIC
  3.2 Unified Fabric Port feature
    3.2.1 UFP Access and Trunk modes
    3.2.2 UFP Tunnel mode
    3.2.3 UFP FCoE mode
    3.2.4 UFP Auto mode
    3.2.5 UFP vPort considerations
  3.3 Compute node NIC to I/O module connectivity mapping
    3.3.1 Embedded 10 Gb VFA (LOM) - Mezzanine 1
    3.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1
    3.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2
    3.3.4 IBM Flex System x222 Compute Node
Chapter 4. NIC virtualization considerations on the server side
  4.1 Enabling virtual NICs on the server via UEFI
    4.1.1 Getting in to the virtual NIC configuration section of UEFI
    4.1.2 Initially enabling virtual NIC functionality via UEFI
    4.1.3 Special settings for the different modes of virtual NIC via UEFI
    4.1.4 Setting the Emulex virtual NIC settings back to factory default
  4.2 Enabling virtual NICs via Configuration Patterns
  4.3 Utilizing physical and virtual NICs in the OSes
    4.3.1 Introduction to teaming/bonding on the server
    4.3.2 OS side teaming/bonding and upstream network requirements
    4.3.3 Discussion of physical NIC connections and logical enumeration
Chapter 5. Flex System NIC virtualization deployment scenarios
  5.1 Introduction to deployment examples
  5.2 UFP mode virtual NIC and Layer 2 Failover
    5.2.1 Components
    5.2.2 Topology
    5.2.3 Use Cases
    5.2.4 Configuration
    5.2.5 Confirming operation of the environment
  5.3 UFP mode virtual NIC with vLAG and FCoE
    5.3.1 Components
    5.3.2 Topology
    5.3.3 Use cases
    5.3.4 Configuration
    5.3.5 Confirming operation of the environment
  5.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover
    5.4.1 Components
    5.4.2 Topologies
    5.4.3 Use cases
    5.4.4 Configurations
    5.4.5 Verifying operation
  5.5 Switch Independent mode with SPAR
    5.5.1 Components
    5.5.2 Topology
    5.5.3 Use cases
    5.5.4 Configuration
    5.5.5 Verifying operation
Abbreviations and acronyms
Related publications
  IBM Redbooks
  Help from IBM
  6. 6. © Copyright IBM Corp. 2014. All rights reserved. v Draft Document for Review July 18, 2014 10:18 pm 8223spec.fm Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. 
To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
  7. 7. 8223spec.fm Draft Document for Review July 18, 2014 10:18 pm vi NIC Virtualization in IBM Flex System Fabric Solutions Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Blade Network Technologies® BladeCenter® BNT® IBM® IBM Flex System® PowerVM® PureFlex® Redbooks® Redbooks (logo) ® System x® VMready® The following terms are trademarks of other companies: Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
Preface

The deployment of server virtualization technologies in data centers requires significant efforts in providing sufficient network I/O bandwidth to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen applications and services. Each of these services requires a certain bandwidth (or speed) to function properly. Furthermore, because of different network traffic patterns that are relevant to different service types, these traffic flows can interfere with each other. They can lead to serious network problems, including the inability of the service to perform its functions.

NIC virtualization in IBM® Flex System Fabric solutions addresses these issues. The solutions are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged Enhanced Ethernet infrastructure. This infrastructure is built on IBM Flex System Fabric CN4093 and EN4093R 10 Gbps Ethernet switch modules and IBM Flex System Fabric SI4093 Switch Interconnect modules in the chassis, and the Emulex Virtual Fabric Adapters in each compute node.

This IBM Redbooks® publication introduces NIC virtualization concepts and technologies, discusses their deployment scenarios, and provides configuration examples that use IBM Networking OS technologies combined with the Emulex Virtual Fabric Adapters.

This book is for IBM, IBM Business Partner, and client networking professionals who want to learn how to implement NIC virtualization solutions and switch interconnect technologies on IBM Flex System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and IBM Virtual Fabric mode. This book assumes that the reader has basic knowledge of networking concepts and technologies, including the OSI model, Ethernet LANs, Spanning Tree protocol, VLANs, VLAN tagging, uplinks, trunks, and static and dynamic (LACP) link aggregation.

Authors

This book was produced by a team of specialists from around the world working at the International Technical Support Organization, Raleigh Center.

Ilya Krutov is a Project Leader at the ITSO Center in Raleigh and has been with IBM since 1998. Before he joined the ITSO, Ilya served in IBM as a Run Rate Team Leader, Portfolio Manager, Brand Manager, Technical Sales Specialist, and Certified Instructor. Ilya has expert knowledge in IBM System x®, BladeCenter®, and Flex System products and technologies, virtualization and cloud computing, and data center networking. He has authored over 150 books, papers, product guides, and solution guides. He has a bachelor’s degree in Computer Engineering from the Moscow Engineering and Physics Institute.

Scott Irwin is a Consulting System Engineer (CSE) for IBM System Networking. He joined IBM in November of 2010 as part of the Blade Network Technologies® (BNT®) acquisition. His networking background spans well over 16 years as both a Customer Support Escalation Engineer and a customer-facing Field Systems Engineer. In May of 2007, he was promoted to Consulting Systems Engineer with a focus on deep customer troubleshooting. His responsibilities are to support customer Proof of Concepts, assist with paid installations and training, and provide support for both pre- and post-sales activities, focusing on all verticals (Public Sector, High Frequency Trading, Service Provider, Mid Market, and Enterprise).
Scott Lorditch is a Consulting Systems Engineer for IBM System Networking. He performs network architecture assessments and develops designs and proposals for implementing GbE Switch Module products for the IBM BladeCenter. He also developed several training and lab sessions for IBM technical and sales personnel. Previously, Scott spent almost 20 years working on networking in various industries, working as a senior network architect, a product manager for managed hosting services, and manager of electronic securities transfer projects. Scott holds a BS degree in Operations Research with a specialization in computer science from Cornell University.

Matt Slavin is a Consulting Systems Engineer for IBM Systems Networking, based out of Tulsa, Oklahoma, and currently providing network consulting skills to the Americas. He has a background of over 30 years of hands-on systems and network design, installation, and troubleshooting. Most recently, he has focused on data center networking, where he is leading client efforts in adopting new and potentially game-changing technologies into their day-to-day operations. Matt joined IBM through the acquisition of Blade Network Technologies, and prior to that worked at some of the top systems and networking companies in the world.

Thanks to the following people for their contributions to this project:

Tamikia Barrow, Cheryl Gera, Chris Rayns, Jon Tate, David Watts, Debbie Willmschen
International Technical Support Organization, Raleigh Center

Nghiem Chu, Sai Chan, Michael Easterly, Heidi Griffin, Bob Louden, Richard Mancini, Shekhar Mishra, Heather Richardson, Hector Sanchez, Tim Shaughnessy
IBM

Jeff Lin
Emulex

Now you can become a published author, too!

Here’s an opportunity to spotlight your skills, grow your career, and become a published author—all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base. Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html
  10. 10. Preface ix Draft Document for Review July 18, 2014 10:18 pm 8223pref.fm Comments welcome Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways: Use the online Contact us review Redbooks form found at: ibm.com/redbooks Send your comments in an email to: redbooks@us.ibm.com Mail your comments to: IBM Corporation, International Technical Support Organization Dept. HYTD Mail Station P099 2455 South Road Poughkeepsie, NY 12601-5400 Stay connected to IBM Redbooks Find us on Facebook: http://www.facebook.com/IBMRedbooks Follow us on Twitter: http://twitter.com/ibmredbooks Look for us on LinkedIn: http://www.linkedin.com/groups?home=&gid=2130806 Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter: https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm Stay current on recent Redbooks publications with RSS Feeds: http://www.redbooks.ibm.com/rss.html
Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment

This chapter introduces the various virtualization features available with certain I/O modules and converged network adapters (CNAs) in the IBM PureFlex® System environment. The primary focus of this paper is the EN4093R, CN4093, and SI4093, along with the related server-side converged network adapter (CNA) or Virtual Fabric Adapter (VFA) virtualization features. Although other I/O modules are available for the Flex System Enterprise Chassis environment, unless otherwise noted, those other I/O modules do not support the virtualization features discussed in this document and are not covered here.

This chapter includes the following sections:
  • 1.1, “Overview of Flex System network virtualization”
  • 1.2, “Introduction to NIC virtualization”
  • 1.3, “Introduction to I/O module virtualization”
  • 1.4, “Introduction to converged fabrics”
1.1 Overview of Flex System network virtualization

The term virtualization can mean many different things to different people, and in different contexts. For example, in the server world it is often associated with taking bare metal platforms and putting in a layer of software (referred to as a hypervisor) that permits multiple virtual machines (VMs) to run on that single physical platform, with each VM thinking it owns the entire hardware platform.

In the network world, there are many different concepts of virtualization. These include overlay technologies that let a user run one network on top of another network, usually with the goal of hiding the complexities of the underlying network (often referred to as overlay networking). Another form of network virtualization is OpenFlow technology, which decouples the control plane from the switch and allows the switching path decisions to be made from a central control point. And then there are other forms of virtualization, such as cross-chassis aggregation (also known as cross-switch aggregation), virtualized NIC technologies, and converged fabrics.

This paper is focused on the latter set of virtualization forms, specifically the following set of features:
  • Converged fabrics: Fibre Channel over Ethernet (FCoE) and Internet Small Computer System Interface (iSCSI)
  • Virtual Link Aggregation (vLAG): a form of cross-switch aggregation
  • Stacking: virtualizing the management plane and the switching fabric
  • Switch Partitioning (SPAR): masking the I/O module from the host and upstream network
  • Easy Connect Q-in-Q solutions: more ways to mask the I/O modules from connecting devices
  • NIC virtualization: allowing a single physical 10 GbE NIC to be presented as multiple NICs to the host OS

Although we will be introducing all of these topics in this chapter, the primary focus of this paper will be on how the last item (NIC virtualization) integrates with the various other features and the surrounding customer environment. The specific NIC virtualization features that will be discussed in detail in this paper include the following:
  • IBM Virtual Fabric mode, also known as vNIC Virtual Fabric mode, including both Dedicated Uplink Mode (default) and Shared Uplink Mode (optional) operations
  • Switch Independent Mode, also known as vNIC Switch Independent Mode
  • Unified Fabric Port, also known as IBM Unified Fabric Protocol, or just UFP (all modes)

Important: The term vNIC can be used both generically for all virtual NIC technologies and as a vendor-specific term. For example, VMware calls the virtual NIC that resides inside a VM a vNIC. Unless otherwise noted, the use of the term vNIC in this paper refers to a specific feature available on the Flex System I/O modules and Emulex CNAs inside physical hosts. In a related fashion, the term vPort has multiple connotations; for example, it is used by Microsoft for their Hyper-V environment. Unless otherwise noted, the use of the term vPort in this paper refers to the UFP feature on the Flex System I/O modules and Emulex CNAs inside physical hosts.
1.2 Introduction to NIC virtualization

This section introduces the two primary types of NIC virtualization (vNIC and UFP) available on the Flex System I/O modules and adapters, as well as the various sub-elements of these virtual NIC technologies.

The deployment of server virtualization technologies in data centers requires significant efforts to provide sufficient network I/O bandwidth (or speed) to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen network applications and services, and each of these services requires a certain bandwidth to function properly. Furthermore, because of different network traffic patterns relevant to different service types, these traffic flows might interfere with each other. This interference can lead to serious network problems, including the inability of the service to perform its functions.

Providing sufficient bandwidth and isolation to virtualized applications in a 1 Gbps network infrastructure might be challenging for blade-based deployments where the number of physical I/O ports per compute node is limited. For example, a maximum of 12 physical ports per single-wide compute node (up to six Ethernet ports per adapter) can be utilized for network connectivity. With 1 GbE, a total network bandwidth of 12 Gb per compute node is available for Gigabit Ethernet infrastructures, leaving no room for future growth. In addition, traffic flows are isolated on a physical port basis. Also, the bandwidth per interface is static with a maximum bandwidth of 1 Gb per flow, thus limiting the flexibility of bandwidth usage.

IBM Flex System Fabric solutions address these issues by increasing the number of available Ethernet ports and providing more flexibility in allocating the available bandwidth to meet specific application requirements. By virtualizing a 10 Gbps NIC, its resources can be divided into multiple logical instances or virtual NICs. Each virtual NIC appears as a regular, independent NIC to the server operating system or hypervisor, and each virtual NIC uses a portion of the overall bandwidth of the physical NIC. For example, a NIC partition with a maximum bandwidth of 4 Gbps appears to the host applications as a physically distinct 4 Gbps Ethernet adapter. Also, the NIC partitions provide traffic forwarding and port isolation.

The virtual NIC technologies discussed for the I/O modules here are all directly tied to the Emulex CNA offerings for the Flex System environment, which are documented in 2.3, “IBM Flex System Virtual Fabric adapters” on page 42.

1.2.1 vNIC based NIC virtualization

vNIC is the original virtual NIC technology utilized in the IBM BladeCenter 10Gb Virtual Fabric Switch Module, and it has been brought forward into the PureFlex System environment to allow customers that have standardized on vNIC to continue using it with the PureFlex System solutions.

Important: All I/O module features discussed in this paper are based on the latest available firmware at the time of writing (IBM Networking OS 7.8 for the EN4093R, CN4093, and SI4093 modules).
vNIC has two primary modes:

  • IBM Virtual Fabric mode

    Virtual Fabric mode offers advanced virtual NICs to servers, and it requires support on the switch side. In IBM Virtual Fabric mode, the Virtual Fabric Adapter (VFA) in the compute node communicates with the Flex System switch to obtain vNIC parameters (using DCBX). A special tag is added within each data packet and is later removed by the NIC and switch for each vNIC group to maintain separation of the virtual data paths. In IBM Virtual Fabric mode, you can change the bandwidth allocations through the IBM switch user interfaces without requiring a reboot of the server. vNIC bandwidth allocation and metering is performed by both the switch and the VFA. In such a case, a bidirectional virtual channel of an assigned bandwidth is established between them for every defined vNIC.

  • Switch Independent mode

    Switch Independent mode offers virtual NICs to the server with no special I/O module side configuration. It extends the existing customer VLANs to the virtual NIC interfaces. The IEEE 802.1Q VLAN tag is essential to the separation of the vNIC groups by the NIC adapter or driver and the switch. The VLAN tags are added to the packet by the applications or drivers at each end station rather than by the switch. vNIC bandwidth allocation and metering is performed only by the VFA itself. The switch is completely unaware that the 10 GbE NIC is being seen as multiple logical NICs in the OS. In such a case, a unidirectional virtual channel is established where the bandwidth management is performed only for the outgoing traffic on the VFA side (server-to-switch). The incoming traffic (switch-to-server) uses all available physical port bandwidth, as there is no metering performed on either the VFA or the switch side.

Virtual Fabric mode vNIC has two sub-modes:

  • vNIC Virtual Fabric - Dedicated Uplink Mode
    – Provides a Q-in-Q tunneling action for each vNIC group
    – Each vNIC group must have its own dedicated uplink path out
    – vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first exiting to the upstream network for Layer 3 routing

  • vNIC Virtual Fabric - Shared Uplink Mode
    – Each vNIC group provides a single VLAN for all vNICs in that group
    – Each vNIC group must be a unique VLAN (the same VLAN cannot be used on more than a single vNIC group)
    – Servers cannot use tagging when Shared Uplink Mode is enabled
    – Like vNICs in Dedicated Uplink Mode, vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first exiting to the upstream network for Layer 3 routing

Details for enabling and configuring these modes can be found in Chapter 4, “NIC virtualization considerations on the server side” on page 65 and Chapter 5, “Flex System NIC virtualization deployment scenarios” on page 123.

1.2.2 Unified Fabric Port based NIC virtualization

UFP is the current direction of IBM NIC virtualization, and it provides a more feature-rich solution compared to the original vNIC Virtual Fabric mode. Like Virtual Fabric mode vNIC,
UFP allows carving up a single 10 Gb port into four virtual NICs (called vPorts in UFP). UFP also has a number of modes associated with it, including:

  • Tunnel mode: Provides a Q-in-Q mode, where the vPort is customer VLAN-independent (very similar to vNIC Virtual Fabric Dedicated Uplink Mode)
  • Trunk mode: Provides a traditional 802.1Q trunk mode (multi-VLAN trunk link) to the virtual NIC (vPort) interface, that is, it permits host-side tagging
  • Access mode: Provides a traditional access mode (single untagged VLAN) to the virtual NIC (vPort) interface, which is similar to a physical port in access mode
  • FCoE mode: Provides FCoE functionality to the vPort
  • Auto-VLAN mode: Automatic VLAN creation for IEEE 802.1Qbg and IBM VMready® environments

Only one vPort (vPort 2) per physical port can be bound to FCoE. If FCoE is not desired, vPort 2 can be configured for one of the other modes.

Details for enabling and configuring these modes can be found in Chapter 4, “NIC virtualization considerations on the server side” on page 65 and Chapter 5, “Flex System NIC virtualization deployment scenarios” on page 123.

1.2.3 Comparing vNIC modes and UFP modes

As a general rule of thumb, if a customer desires virtualized NICs in the PureFlex System environment, UFP is usually the preferred solution, as all new feature development is going into UFP. If a customer has standardized on the original vNIC Virtual Fabric mode, they can continue to use that mode in a fully supported fashion.

If a customer does not want any of the virtual NIC functionality controlled by the I/O module (only controlled and configured on the server side), then Switch Independent mode vNIC is the solution of choice. This mode has the advantage of being I/O module independent, such that any upstream I/O module can be utilized. Some of the downsides to this mode are that bandwidth restrictions can only be enforced from the server side, not the I/O module side, and changing the bandwidth requires a reboot of the server (bandwidth control for the other virtual NIC modes discussed here is changed from the switch side, enforces bandwidth restrictions bidirectionally, and can be changed on the fly, with no reboot required).
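To make the comparison more concrete, the following fragment sketches what the switch-side setup of vNIC Virtual Fabric mode and of a UFP vPort might look like, using an ISCLI-style syntax modeled on IBM Networking OS. This is a hedged illustration only: the port names (INTA1, EXT1), VLAN numbers, index numbers, and bandwidth values are invented for the example, and the exact command syntax should be verified against the configuration guide for the Networking OS release in use.

    ! vNIC Virtual Fabric mode (illustrative): carve vNIC INTA1.1 out of internal port INTA1
    ! with 25% of the 10 Gb bandwidth, then bind it to vNIC group 1 with uplink EXT1
    vnic enable
    vnic port INTA1 index 1
        bandwidth 25
        enable
        exit
    vnic vnicgroup 1
        vlan 100
        member INTA1.1
        port EXT1
        enable
        exit

    ! UFP (illustrative): define vPort 1 on the same physical port in tunnel mode with a
    ! guaranteed minimum of 25% bandwidth; the value can be changed later without a server reboot
    ufp port INTA1 vport 1
        network mode tunnel
        network default-vlan 4091
        qos bandwidth min 25
        enable
        exit
    ufp port INTA1 enable
    ufp enable

In both cases the vNIC or vPort is presented to the operating system as a separate NIC; the differences lie in how VLANs, uplinks, and bandwidth are tied to it.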
Table 1-1 shows some of the items that may affect the decision-making process.

Table 1-1 Attributes of virtual NIC options

Capability | Virtual Fabric vNIC: Dedicated uplink | Virtual Fabric vNIC: Shared uplink | Switch Independent mode vNIC | UFP
Requires support in the I/O module | Yes | Yes | No | Yes
Requires support in the NIC/CNA | Yes | Yes | Yes | Yes
Supports adapter transmit rate control | Yes | Yes | Yes | Yes
Supports I/O module transmit rate control | Yes | Yes | No | Yes
Supports changing rate without restart of node | Yes | Yes | No | Yes
Requires a dedicated uplink path per vNIC group or vPort | Yes | No | No | Yes for vPorts in Tunnel mode
Support for node OS-based tagging | Yes | No | Yes | Yes
Support for failover per vNIC group/UFP vPort | Yes | Yes | No | Yes
Support for more than one uplink path per vNIC/vPort group | No | Yes | Yes | Yes for vPorts in Trunk and Access modes
Supported regardless of the model of the Flex System I/O module | No | No | Yes | No
Supported with vLAG | No | No | Yes | Yes for uplinks out of the I/O module carrying vPort traffic
Supported with SPAR | No | No | Yes | No
Supported with stacking | Yes | Yes | Yes | Yes
Supported with SI4093 | No | No | Yes | Yes
Supported with EN4093 | Yes | Yes | Yes | Yes
Supported with CN4093 | Yes | Yes | Yes | Yes

For a deeper dive into virtual NIC operational characteristics from the switch side, see Chapter 3, “NIC virtualization considerations on the switch side” on page 47. For virtual NIC operational characteristics from the server side, see Chapter 4, “NIC virtualization considerations on the server side” on page 65.

1.3 Introduction to I/O module virtualization

This section provides a brief overview of Flex System I/O module virtualization technologies. The following topics are covered:
  • 1.3.1, “Introduction to vLAG”
  • 1.3.2, “Introduction to stacking”
  • 1.3.3, “Introduction to SPAR”
  • 1.3.4, “Easy Connect Q-in-Q solutions”
  • 1.3.5, “Introduction to the Failover feature”
1.3.1 Introduction to vLAG

In its simplest terms, vLAG is a technology designed to enhance traditional Ethernet link aggregations (sometimes referred to generically as Portchannels or Etherchannels). It is important to note that vLAG is not a form of aggregation in its own right, but an enhancement to aggregations.

As some background, under current IEEE specifications, an aggregation is still defined as a bundle of similar links between two, and only two, devices, bound together to operate as a single logical link. By today’s standards-based definitions, you cannot create an aggregation on one device and have the links of that aggregation connect to more than a single device on the other side of the aggregation. The use of only two devices in this fashion limits the ability to offer certain robust designs. Although the standards bodies are working on a solution that provides split aggregations across devices, most vendors have developed their own versions of this multi-chassis aggregation. For example, Cisco has virtual Port Channel (vPC) on NX-OS products and Virtual Switch System (VSS) on the 6500 IOS products. IBM offers virtual Link Aggregation (vLAG) on many of the IBM Top of Rack (ToR) solutions, and on the EN4093R and CN4093 Flex System I/O modules.

The primary goal of virtual link aggregation is to overcome the limit imposed by the current standards-based aggregation, and provide a distributed aggregation across a pair of switches instead of a single switch. Doing so results in a reduction of single points of failure, while still maintaining a loop-free, non-blocking environment.

Figure 1-1 on page 7 shows an example of how vLAG can create a single common uplink out of a pair of embedded I/O modules. This creates a non-looped path with no blocking links, offering the maximum amount of bandwidth for the links, and no single point of failure.

Figure 1-1 Non-looped design using multi-chassis aggregation on both sides

Although this vLAG-based design is considered the most optimal, not all I/O module virtualization options support this topology; for example, Virtual Fabric vNIC mode and SPAR are not supported with vLAG. Another potentially limiting factor with vLAG (and other such cross-chassis aggregations, such as vPC and VSS) is that it only supports a pair of switches acting as one for this cross-chassis aggregation, and not more than two. If the desire is to split an aggregation across more than two switches, stacking might be an option to consider.
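The following fragment is a hedged sketch of the kind of configuration that establishes a vLAG pair between two I/O modules, again in an ISCLI-style syntax modeled on IBM Networking OS. The tier ID, ISL ports, and LACP keys are invented for illustration, and the exact commands (including the recommended vLAG health check settings) should be checked against the switch documentation.

    ! On each of the two vLAG peer I/O modules (illustrative values)
    ! 1. Build the inter-switch link (ISL) between the peers as an LACP aggregation
    interface port EXT8-EXT9
        lacp mode active
        lacp key 200
        exit
    ! 2. Identify the vLAG instance and tie the ISL aggregation to it
    vlag tier-id 10
    vlag isl adminkey 200
    ! 3. Mark the uplink aggregation as a vLAG so both peers present it as one logical aggregation
    interface port EXT1-EXT2
        lacp mode active
        lacp key 1000
        exit
    vlag adminkey 1000 enable
    vlag enable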
1.3.2 Introduction to stacking

Stacking provides the ability to take up to eight physical I/O modules and treat them as a single logical switch from a port usage and management perspective. This means that ports on different I/O modules in the stack can be part of a common aggregation, and you only log in to a single IP address to manage all I/O modules in the stack. For devices that are attaching to the stack, the stack looks and acts like a single large switch.

Stacking is supported on the EN4093R and CN4093 I/O modules. It is provided by reserving a group of uplinks as stacking links and creating a ring of I/O modules with these links. The ring design ensures that the loss of a single link or single I/O module in the stack does not lead to a disruption of the stack.

Before the v7.7 releases of code, it was possible to stack the EN4093R only into a common stack of like-model I/O modules. However, in v7.7 and later code, support was added to add a pair of CN4093s into a hybrid stack of EN4093Rs to add Fibre Channel Forwarder (FCF) capability into the stack. The limit for this hybrid stacking is a maximum of 6x EN4093Rs and 2x CN4093s in a common stack. Stacking the Flex System chassis I/O modules with IBM Top of Rack switches that also support stacking is not allowed.

Connections from a stack of Flex System chassis I/O modules to upstream switches can be made with normal single or aggregated connections, including the use of vLAG/vPC on the upstream switches to connect links across stack members into a common non-blocking fabric between the stack and the Top of Rack switches. An example of four I/O modules in a highly available stacking design is shown in Figure 1-2.

Important: When using the EN4093R and CN4093 in hybrid stacking, only the CN4093 is allowed to act as a stack master or stack backup master for the stack.
Figure 1-2 Example of stacking in the Flex System environment

This example shows a design with no single point of failure, via a stack of four I/O modules in a single stack and a pair of upstream vLAG/vPC connected switches.

One of the potential limitations of the current implementation of stacking is that if an upgrade of code is needed, a reload of the entire stack must occur. Because upgrades are uncommon and should be scheduled for non-production hours anyway, a single-stack design is usually efficient and acceptable. But some customers do not want to have any downtime (scheduled or otherwise), and a single-stack design is thus not an acceptable solution. For these users that still want to make the most use of stacking, a two-stack design might be an option. This design features stacking a set of I/O modules in bay 1 into one stack, and a set of I/O modules in bay 2 into a second stack. The primary advantage of a two-stack design is that each stack can be upgraded one at a time, with the running stack maintaining connectivity for the compute nodes during the upgrade and reload of the other stack. The downside of the two-stack design is that traffic flowing from one stack to the other stack must go through the upstream network to reach that stack.

As can be seen, stacking might not be suitable for all customers. However, if it is desired, it is another tool that is available for building a robust infrastructure by using the Flex System I/O modules.

1.3.3 Introduction to SPAR

Switch partitioning (SPAR) is a feature that, among other things, allows a physical I/O module to be divided into multiple logical switches. After SPAR is configured, ports within a given SPAR group can communicate only with each other. Ports that are members of different
SPAR groups on the same I/O module cannot communicate directly with each other without going outside the I/O module. The EN4093R, CN4093, and SI4093 I/O modules support SPAR.

SPAR features two modes of operation:

  • Pass-through domain mode (also known as transparent mode): This mode of SPAR uses a Q-in-Q function to encapsulate all traffic passing through the switch in a second layer of VLAN tagging. This is the default mode when SPAR is enabled, and it is VLAN-agnostic owing to this Q-in-Q operation. It passes tagged and untagged packets through the SPAR session without looking at or interfering with any customer-assigned tag. SPAR pass-through mode supports passing FCoE packets to an upstream FCF, but without FIP snooping within the SPAR group.

  • Local domain mode: This mode is not VLAN-agnostic and requires a user to create any required VLANs in the SPAR group. Currently, there is a limit of 256 VLANs in Local domain mode. Support is available for FIP snooping on FCoE sessions in Local domain mode. Unlike pass-through domain mode, Local domain mode provides strict control of end-host VLAN isolation.

Consider the following points regarding SPAR:
  • SPAR is disabled by default on the EN4093R and CN4093.
  • SPAR is enabled by default on the SI4093, with all base-licensed internal and external ports defaulting to a single pass-through SPAR group. This default SI4093 configuration can be changed if desired.
  • Any port can be a member of only a single SPAR group at one time.
  • Only a single uplink path is allowed per SPAR group (it can be a single link, a single static aggregation, or a single LACP aggregation). This SPAR-enforced restriction ensures that no network loops are possible with ports in a SPAR group.
  • SPAR cannot be used with UFP or Virtual Fabric vNIC at this time. Switch Independent mode vNIC is supported with SPAR. UFP support is slated for a possible future release.
  • Up to eight SPAR groups per I/O module are supported. This number might be increased in a future release.
  • SPAR is not supported with the vLAG, stacking, or tagpvid-ingress features.

SPAR can be a useful solution in environments where simplicity is paramount.

1.3.4 Easy Connect Q-in-Q solutions

The Easy Connect concept, often referred to as Easy Connect mode or Transparent mode, is not a specific feature but a way of using one of four different existing features to attempt to minimize ongoing I/O module management requirements. The primary goal of Easy Connect is to make an I/O module transparent to the hosts and the upstream network they need to access, thus reducing the management requirements for I/O modules in an Easy Connect mode. As noted, there are actually several features that can be used to accomplish an Easy Connect solution, with the following being common aspects of Easy Connect solutions:

  • At the heart of Easy Connect is some form of Q-in-Q tagging, to mask packets traveling through the I/O module. This is a fundamental requirement of any Easy Connect solution
and lets the attached hosts and upstream network communicate using any VLAN (tagged or untagged). The I/O module passes those packets through to the other side of the I/O module by wrapping them in an outer VLAN tag and then removing that outer VLAN tag as the packet exits the I/O module, thus making the I/O module VLAN-agnostic. This Q-in-Q operation is what removes the need to manage VLANs on the I/O module, which is usually one of the larger ongoing management requirements of a deployed I/O module.

  • Pre-creating an aggregation of the uplinks, in some cases all of the uplinks, to remove the possibility of loops (if all uplinks are not used, any unused uplinks/ports should be disabled to ensure loops are not possible).

  • Optionally disabling spanning tree so the upstream network does not receive any spanning tree BPDUs. This is especially important in the case of upstream devices that will shut down a port if BPDUs are received, such as a Cisco FEX device or an upstream switch running some form of BPDU guard.

After it is configured, an I/O module in Easy Connect mode does not require ongoing configuration changes as a customer adds and removes VLANs on the hosts and upstream network. In essence, Easy Connect turns the I/O module into a VLAN-agnostic port aggregator, with support for growing up to the maximum bandwidth of the product (for example, adding upgrade Feature on Demand (FoD) keys to the I/O module to increase the 10 Gb links to Compute Nodes and the 10 Gb and 40 Gb links to the upstream networks).

The following are the two primary methods for deploying an Easy Connect solution:

  • Use an I/O module that defaults to a form of Easy Connect:
    – For customers that want an Easy Connect type of solution that is immediately ready for use out of the box (zero-touch I/O module deployment), the SI4093 provides this by default. The SI4093 accomplishes this by having the following factory default configuration:
      • All base-licensed internal and external ports are put into a single SPAR group.
      • All uplinks are put into a single common LACP aggregation, and the LACP suspend-port feature is enabled.
      • The Failover feature is enabled on the common LACP key.
      • No spanning tree support (the SI4093 is designed to never permit more than a single uplink path per SPAR, so it cannot create a loop and does not support spanning tree).

  • For customers that want the option to be able to use advanced features, but also want an Easy Connect mode solution, the EN4093R and CN4093 offer configurable options that can make them transparent to the attaching Compute Nodes and upstream network switches, while maintaining the option of changing to more advanced modes of configuration when needed.

As noted, the SI4093 accomplishes this by defaulting to the SPAR feature in pass-through mode, which puts all compute node ports and all uplinks into a common Q-in-Q group. For the EN4093R and CN4093, there are a number of features that can be implemented to accomplish this Easy Connect support. The primary difference between these I/O modules and the SI4093 is that you must first perform a small set of configuration steps to set up the EN4093R and CN4093 in an Easy Connect mode, after which minimal management of the I/O module is required. For these I/O modules, this Easy Connect mode can be configured by using one of the following four features:
  • The SPAR feature that is the default on the SI4093, which can be configured on both the EN4093R and CN4093 as well
  • The tagpvid-ingress feature
  • vNIC Virtual Fabric Dedicated Uplink Mode
  • UFP vPort tunnel mode

In general, all of these features provide this Easy Connect functionality, with each having some pros and cons. For example, if the desire is to use Easy Connect with vLAG, you should use the tagpvid-ingress mode or the UFP vPort tunnel mode (SPAR and Virtual Fabric vNIC do not permit the vLAG ISL). But if you want to use Easy Connect with FCoE today, you cannot use tagpvid-ingress and must utilize a different form of Easy Connect, such as vNIC Virtual Fabric Dedicated Uplink Mode or UFP tunnel mode (SPAR pass-through mode allows FCoE but does not support FIP snooping, which may or may not be a concern for some customers).

As an example of how Easy Connect works (in all Easy Connect modes), consider the tagpvid-ingress Easy Connect mode operation shown in Figure 1-3. When all internal ports and the desired uplink ports are placed into a common PVID/Native VLAN (4091 in this example) and tagpvid-ingress is enabled on these ports (with any wanted aggregation protocol on the uplinks that is required to match the other end of those links), all ports with a matching Native or PVID setting on this I/O module are part of a single Q-in-Q tunnel. The Native/PVID VLAN on the port acts as the outer tag, and the I/O module switches traffic based on this outer tag VLAN. The inner customer tag rides through the fabric encapsulated in this Native/PVID VLAN to the destination port (or ports) in this tunnel, and then has the outer tag stripped off as it exits the I/O module, thus re-exposing the original customer-facing tag (or no tag) to the device attaching to that egress port.

Figure 1-3 Packet flow with Easy Connect

In all modes of Easy Connect, local switching based on destination MAC address is still used.
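As a hedged illustration of the tagpvid-ingress variant just described, a minimal ISCLI-style configuration might look like the following. The port ranges and LACP key are example values built around the VLAN 4091 example above, and the exact syntax (including how spanning tree is disabled globally) should be verified for the specific I/O module and Networking OS release.

    ! Illustrative tagpvid-ingress Easy Connect setup on an EN4093R/CN4093
    ! Aggregate the uplinks that will carry the tunnel toward the upstream network
    interface port EXT1-EXT2
        lacp mode active
        lacp key 1000
        exit
    ! Put the internal ports and the uplinks into the same PVID/Native VLAN (the outer tag)
    ! and enable tagpvid-ingress so customer VLANs are wrapped in VLAN 4091
    interface port INTA1-INTA14,EXT1-EXT2
        pvid 4091
        tagpvid-ingress
        exit
    ! Optionally keep spanning tree BPDUs away from the upstream network
    spanning-tree mode disable

After this small one-time setup, customer VLANs can be added or removed on the hosts and upstream switches without touching the I/O module, which is the point of Easy Connect.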
Some considerations on which form of Easy Connect mode makes the most sense for a given situation:

  • For users that require virtualized NICs, are already using vNIC Virtual Fabric mode, and are more comfortable staying with it, vNIC Virtual Fabric Dedicated Uplink Mode might be the best solution for Easy Connect functionality.
  • For users that require virtualized NICs and have no particular opinion on which mode of virtualized NIC they prefer, UFP tunnel mode would be the best choice for Easy Connect mode, since the UFP feature is the future direction of virtualized NICs in the Flex System I/O module solutions.
  • For users planning to make use of the vLAG feature, either the UFP tunnel mode or the tagpvid-ingress mode form of Easy Connect is required (vNIC Virtual Fabric mode and SPAR Easy Connect modes do not work with the vLAG feature).
  • For users that do not need vLAG or virtual NIC functionality, SPAR is a very simple and clean solution to implement as an Easy Connect solution.

1.3.5 Introduction to the Failover feature

Failover, sometimes referred to as Layer 2 Failover or Trunk Failover, is not a virtualization feature in its own right, but it can play an important role when NICs on a server are making use of teaming/bonding (forms of NIC virtualization in the OS). Failover is particularly important in an embedded environment, such as in a Flex System chassis.

When NICs are teamed/bonded in an operating system, the OS needs to know when a NIC is no longer able to reach the upstream network so that it can decide whether to use that NIC in the team. Most commonly, this is a simple link up/link down check on the server: if the link is reporting up, use the NIC; if the link is reporting down, do not use the NIC. In an embedded environment, this can be a problem if the uplinks out of the embedded I/O module go down but the internal link to the server is still up. In that case, the server still reports the NIC link as up, even though there is no path to the upstream network, which leads to the server sending traffic out a NIC that has no path out of the embedded I/O module and disrupts server communications.

The Failover feature can be implemented in these environments so that when the set of uplinks the Failover feature is tracking goes down, configurable internal ports are also taken down, alerting the embedded server to a path fault in this direction. At that point the server can utilize the team/bond to select a different NIC and maintain network connectivity.

An example of how Failover can protect Compute Nodes in a PureFlex chassis when there is an uplink fault out of one of the I/O modules can be seen in Figure 1-4 on page 14.
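The following fragment is a hedged sketch of how the Failover feature might be tied to an uplink aggregation, again in an ISCLI-style syntax modeled on IBM Networking OS. The trigger number, LACP admin key, and internal port range are illustrative assumptions, and the exact commands should be confirmed against the switch documentation before use; Figure 1-4 then shows the resulting behavior when the monitored uplinks fail.

    ! Illustrative Layer 2 Failover configuration on each embedded I/O module
    ! Monitor the uplink LACP aggregation (admin key 1000); if it goes down,
    ! take down the internal compute node ports so that NIC teaming fails over
    failover trigger 1 mmon monitor admin-key 1000
    failover trigger 1 mmon control member INTA1-INTA14
    failover trigger 1 enable
    failover enable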
Figure 1-4 Example of Failover in action. The figure illustrates how Failover works:
1. All uplinks out of the I/O module have gone down (because of a link failure, a failure of ToR switch 1, and so forth).
2. Trunk failover takes down the link to NIC 1 to notify the compute node that the path out of I/O module 1 is gone.
3. NIC teaming on the compute node begins utilizing the still-functioning NIC 2 for all communications.

Without Failover or some other form of remote link failure detection, embedded servers would potentially be exposed to loss of connectivity if the uplink path on one of the embedded I/O modules were to fail. Note that designs that utilize vLAG or some form of cross-chassis aggregation, such as stacking, are not exposed to this issue (and thus do not need the Failover feature) because they have a different way of coping with uplinks out of an I/O module going down (for example, with vLAG, packets that need to get upstream can cross the vLAG ISL and use the other I/O module's uplinks to reach the upstream network).

1.4 Introduction to converged fabrics

As the name implies, converged fabrics are all about taking a set of protocols and data designed to run on top of one kind of physical medium and allowing them to be carried on top of a different physical medium. This provides a number of cost benefits, such as reducing the number of physical cabling plants that are required, removing the need for separate physical NICs and HBAs, and potentially reducing power and cooling. From an OpEx perspective, it can reduce the cost associated with the management of separate physical infrastructures.

In the data center world, two of the most common forms of converged fabrics are FCoE and iSCSI. FCoE allows a host to use its 10 Gb Ethernet connections to access Fibre Channel attached storage as if it were physically Fibre Channel attached to the host, when in fact the FC traffic is encapsulated into FCoE frames and carried to the remote storage over an Ethernet network. iSCSI takes a protocol that was originally designed for hosts to talk to relatively close physical storage over physical SCSI cables and converts it to use IP and run over an Ethernet network, allowing access to storage well beyond the limitations of a physical SCSI-based solution.
iSCSI can be used in existing (lossy) and new (lossless) Ethernet infrastructures, with different performance characteristics. FCoE, however, requires a lossless converged enhanced Ethernet network, and it relies on additional functionality known from Fibre Channel (for example, name server and zoning services).

1.4.1 Fibre Channel over Ethernet

FCoE assumes the existence of a lossless Ethernet, such as one that implements the Data Center Bridging (DCB) extensions to Ethernet. The EN4093R, CN4093, G8264, and G8264CS switches support FCoE; the G8264 and EN4093R function as FCoE transit switches, while the CN4093 and G8264CS have Omni Ports that can be set to function as either FC ports or Ethernet ports, as specified in the switch configuration.

The basic notion of FCoE is that the upper layers of FC are mapped onto Ethernet. The upper layer protocols and services of FC remain the same in an FCoE deployment. Zoning, fabric services, and similar services still exist with FCoE. The difference is that the lower layers of FC are replaced by lossless Ethernet, which also implies that FC concepts, such as port types and lower-layer initialization protocols, must be replaced by new constructs in FCoE. Such mappings are defined by the FC-BB-5 standard and are briefly addressed here.

Figure 1-5 shows the perspective on FCoE layering compared to other storage networking technologies. In this figure, FC and FCoE layers are shown with other storage networking protocols, including iSCSI.

Figure 1-5 Storage Network Protocol Layering (the figure shows the SCSI layer carried over FCP/FC, FCP/FCoE/Ethernet, iSCSI/TCP/IP/Ethernet, iFCP and FCIP over TCP/IP, and SRP over InfiniBand)
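On the Flex System I/O modules that act as FCoE transit switches, the DCB and FIP snooping functions described above are typically turned on with a pair of global settings. The following is a minimal sketch in IBM Networking OS isCLI; the exact syntax and any per-port FIP snooping options should be confirmed in the Application Guide for your firmware level.

  ! Minimal FCoE transit enablement sketch - illustrative only
  ! Turn on the DCB functions (PFC, ETS, DCBX)
  cee enable
  ! Enable FIP snooping so that FCoE sessions are protected by automatic ACLs
  fcoe fips enable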
1.4.2 iSCSI

The iSCSI protocol allows for longer distances between a server and its storage when compared to the traditionally restrictive parallel SCSI solutions or the newer serial-attached SCSI (SAS). iSCSI technology can use a hardware initiator, such as a host bus adapter (HBA), or a software initiator to issue requests to target devices. Within iSCSI storage terminology, the initiator is typically known as the client, and the target is the storage device. The iSCSI protocol encapsulates SCSI commands into protocol data units (PDUs) within the TCP/IP protocol and then transports them over the network to the target device.

iSCSI provides block-level access to storage, as does Fibre Channel, but it uses TCP/IP over Ethernet instead of the Fibre Channel protocol. Therefore, iSCSI is attractive for its relative simplicity and its usage of widely available Ethernet skills. Its chief limitations historically have been the relatively lower speeds of Ethernet compared to Fibre Channel and the extra TCP/IP encapsulation required. With lossless 10 Gb Ethernet now available, the attractiveness of iSCSI is expected to grow rapidly: TCP/IP encapsulation is still used, but 10 Gbps Ethernet speeds dramatically increase the appeal of iSCSI.

1.4.3 iSCSI versus FCoE

This section highlights the similarities and differences between iSCSI and FCoE. However, in most cases, considerations other than purely technical ones will influence your decision in choosing one over the other.

iSCSI and FCoE have the following similarities:

- Both protocols are block-oriented storage protocols. That is, the file system logic for accessing storage with either of them is on the computer where the initiator is, not on the storage hardware. Therefore, they are both different from typical network-attached storage (NAS) technologies, which are file oriented.
- Both protocols implement Ethernet-attached storage.
- Both protocols can be implemented in hardware, which is detected by the operating system of the host as an HBA.
- Both protocols can also be implemented by using software initiators, which are available in various server operating systems. However, this approach uses resources of the main processor to perform tasks that would otherwise be performed by the hardware of an HBA.
- Both protocols can use the Converged Enhanced Ethernet (CEE, also referred to as Data Center Bridging) standards to deliver "lossless" traffic over Ethernet.
- Both protocols are alternatives to traditional FC storage and FC SANs.

iSCSI and FCoE have the following differences:

- iSCSI uses TCP/IP as its transport, and FCoE uses Ethernet. iSCSI can use media other than Ethernet, such as InfiniBand, and iSCSI can use Layer 3 routing in an IP network.
- Numerous vendors provide local iSCSI storage targets, some of which also support Fibre Channel and other storage technologies. Relatively few native FCoE targets are available at this time, which might allow iSCSI to be implemented at a lower overall capital cost.
- FCoE requires a gateway function, usually called a Fibre Channel Forwarder (FCF), which allows FCoE access to traditional FC-attached storage. This approach allows FCoE and traditional FC storage access to coexist, either as a long-term approach or as part of a migration. The G8264CS and CN4093 switches can be used to provide FCF functionality.
- iSCSI-to-FC gateways exist but are not required when a storage device is used that can accept iSCSI traffic directly.
- Except in the case of a local FCoE storage target, the last leg of an FCoE connection uses FC to reach the storage.
- FC uses 8b/10b encoding, which means that sending 8 bits of data requires transmitting 10 bits over the wire, a 25% overhead relative to the data, to prevent corruption of the data. 10 Gbps Ethernet uses 64b/66b encoding, which has a far smaller overhead.
- iSCSI includes IP headers and Ethernet (or other media) headers with every frame, which adds overhead.
- The largest payload that can be sent in an FCoE frame is 2112 bytes. iSCSI can use jumbo frame support on Ethernet and send 9 KB or more in a single frame.
- iSCSI has been on the market for several years longer than FCoE. Therefore, the iSCSI standards are more mature than the FCoE standards.
- Troubleshooting FCoE end-to-end requires both Ethernet networking skills and FC SAN skills.
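As a back-of-the-envelope comparison of the two encodings (simple arithmetic, not taken from the source):

\[
\text{8b/10b: } \frac{8}{10} = 80\%\ \text{efficient}\quad\Bigl(\frac{2}{8} = 25\%\ \text{overhead relative to the data}\Bigr)
\]
\[
\text{64b/66b: } \frac{64}{66} \approx 97\%\ \text{efficient}\quad\Bigl(\frac{2}{64} \approx 3\%\ \text{overhead}\Bigr)
\]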
Chapter 2. IBM Flex System networking architecture and Fabric portfolio

The Flex System chassis delivers high-speed performance complete with integrated servers, storage, and networking for multi-chassis management in data center compute environments. Furthermore, its flexible design can meet the needs of varying workloads with independently scalable IT resource pools for higher usage and lower cost per workload. Increased security and resiliency protect vital information and promote maximum uptime, while the integrated, easy-to-use management system reduces setup time and complexity, providing a quicker path to return on investment (ROI).

This chapter includes the following topics:
- 2.1, “Enterprise Chassis I/O architecture” on page 20
- 2.2, “IBM Flex System Fabric I/O modules” on page 23
- 2.3, “IBM Flex System Virtual Fabric adapters” on page 42
2.1 Enterprise Chassis I/O architecture

The Fabric networking I/O architecture for the IBM Flex System Enterprise Chassis includes an array of connectivity options for server nodes that are installed in the enclosure. Flex System Fabric I/O modules offer a local switching model that provides superior performance, cable reduction, and a rich feature set that is fully integrated into the operation and management of the Enterprise Chassis.

From a physical I/O module bay perspective, the Enterprise Chassis has four I/O bays in the rear of the chassis. The physical layout of these I/O module bays is shown in Figure 2-1.

Figure 2-1 Rear view of the Enterprise Chassis showing I/O module bays 1 through 4

From a midplane wiring point of view, the Enterprise Chassis provides 16 lanes out of each half-wide node bay (toward the rear I/O bays), with each lane capable of 16 Gbps or higher speeds. How these lanes are used is a function of which adapters are installed in a node, which I/O module is installed in the rear, and which port licenses are enabled on the I/O module. How the midplane lanes connect between the node bays up front and the I/O bays in the rear is shown in Figure 2-2 on page 21.

The concept of an I/O module Upgrade Feature on Demand (FoD) also is shown in Figure 2-2 on page 21. From a physical perspective, an upgrade FoD in this context is a bank of 14 ports and some number of uplinks that can be enabled and used on a switch module. By default, all I/O modules include the base set of ports, and thus have 14 internal ports, one connected to each of the 14 compute node bays in the front. By adding an upgrade license to the I/O module, it is possible to add more banks of 14 ports (plus some number of uplinks) to an I/O module. The node needs an adapter that has the necessary physical ports to connect to the new lanes enabled by the upgrades. Those lanes connect to the ports in the I/O module that are enabled by the upgrade.
Figure 2-2 Sixteen lanes total of a single half-wide node bay toward the I/O bays (each of the four I/O bays offers Base, Upgrade 1, Upgrade 2, and Future port banks)

For example, if a node is installed with only the dual-port LAN on system board (LOM) adapter, only two of the 16 lanes are used (one to I/O bay 1 and one to I/O bay 2), as shown in Figure 2-3 on page 22. If a node is installed without LOM and with two quad-port adapters, eight of the 16 lanes are used (two to each of the four I/O bays). This installation can potentially provide up to 320 Gb of full duplex Ethernet bandwidth (16 lanes x 10 Gb x 2) to a single half-wide node and over half a terabit (Tb) per second of bandwidth to a full-wide node.

Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the Fabric I/O modules, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, as shown in Figure 2-2, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports, or trading off four 10 GbE ports for the use of an external 40 GbE port.
Figure 2-3 Dual-port LOM connecting to ports on I/O bays 1 and 2 (all other lanes unused)

Today, there are limits on the port density of the current I/O modules, in that only the first three lanes are potentially available from the I/O module. By default, each I/O module provides a single connection (lane) to each of the 14 half-wide node bays up front. By adding port licenses, an EN4093R 10Gb Scalable Switch, CN4093 10Gb Converged Scalable Switch, or SI4093 System Interconnect Module can each provide up to three 10 Gb ports to each of the 14 half-wide node bays. As an example, if two 8-port adapters were installed and four I/O modules were installed with all upgrades, the end node has access to twelve 10 Gb lanes (three to each I/O module). On the 8-port adapter, two lanes are unavailable at this time.

Concerning port licensing, the default available upstream connections also are associated with port licenses. For more information about these connections and the node-facing links, see 2.2, “IBM Flex System Fabric I/O modules” on page 23.

All I/O modules include a base set of 14 downstream ports. The Ethernet switching and interconnect I/O modules support more than the base set of ports, and the additional ports are enabled by the upgrades. For more information, see the respective I/O module section in 2.2, “IBM Flex System Fabric I/O modules” on page 23.

As of this writing, although no I/O module and node adapter combination can use all 16 lanes between a compute node bay and the I/O bays, the lanes exist to ensure that the Enterprise Chassis can use future available capacity.

Beyond the physical aspects of the hardware, there are certain logical aspects that ensure that the Enterprise Chassis can integrate seamlessly into any modern data center's infrastructure. Many of these enhancements, such as vNIC, VMready, and 802.1Qbg, revolve around integrating virtualized servers into the environment. Fibre Channel over Ethernet (FCoE)
allows users to converge their Fibre Channel traffic onto their 10 Gb Ethernet network, which reduces the number of cables and points of management that are necessary to connect the Enterprise Chassis to the upstream infrastructures.

The wide range of physical and logical Ethernet networking options that are available today and in development ensure that the Enterprise Chassis can meet the most demanding I/O connectivity challenges now and as the data center evolves.

2.2 IBM Flex System Fabric I/O modules

The IBM Flex System Enterprise Chassis features a number of Fabric I/O module solutions that provide a combination of 1 Gb and 10 Gb ports to the servers and 1 Gb, 10 Gb, and 40 Gb ports for uplink connectivity to the outside upstream infrastructure. The IBM Flex System Enterprise Chassis ensures that a suitable selection is available to meet the needs of the server nodes.

The following Flex System Fabric modules are available for deployment within the Enterprise Chassis:
- 2.2.1, “IBM Flex System Fabric EN4093R 10Gb Scalable Switch”
- 2.2.2, “IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch” on page 30
- 2.2.3, “IBM Flex System Fabric SI4093 System Interconnect Module” on page 36

External cabling: SFP, SFP+, and QSFP+ transceivers or DAC cables are required for external fabric module connectivity. Compatible transceivers and cables are listed in 2.2.4, “I/O modules and cables” on page 41.

Some of the Fabric I/O module selection criteria are summarized in Table 2-1.

Table 2-1 Fabric module selection criteria

Requirement | SI4093 System Interconnect Module | EN4093R 10Gb Scalable Switch | CN4093 10Gb Converged Scalable Switch
Gigabit Ethernet to nodes | Yes | Yes | Yes
10 Gb Ethernet to nodes | Yes | Yes | Yes
10 Gb Ethernet uplinks | Yes | Yes | Yes
40 Gb Ethernet uplinks | Yes | Yes | Yes
Basic Layer 2 switching | Yes | Yes | Yes
Advanced Layer 2 switching: IEEE features (STP, QoS) | No | Yes | Yes
Layer 3 IPv4 switching (forwarding, routing, ACL filtering) | No | Yes | Yes
Layer 3 IPv6 switching (forwarding, routing, ACL filtering) | No | Yes | Yes
10 Gb Ethernet CEE | Yes | Yes | Yes
FCoE FIP Snooping Bridge support | Yes | Yes | Yes
FCF support | No | No | Yes
Native FC port support | No | No | Yes
Switch stacking | No | Yes | Yes
802.1Qbg Edge Virtual Bridge support | Yes | Yes | Yes
vLAG support | No | Yes | Yes
UFP support | Yes | Yes | Yes
Virtual Fabric mode vNIC support | No | Yes | Yes
Switch Independent mode vNIC support | Yes | Yes | Yes
SPAR support | Yes | Yes | Yes
OpenFlow support | No | Yes | No

2.2.1 IBM Flex System Fabric EN4093R 10Gb Scalable Switch

The IBM Flex System Fabric EN4093R 10Gb Scalable Switch provides unmatched scalability, port flexibility, and performance, while also delivering innovations to help address a number of networking concerns today and providing capabilities that will help you prepare for the future. This switch is capable of supporting up to sixty-four 10 Gb Ethernet connections while offering Layer 2/3 switching, in addition to OpenFlow and "easy connect" modes. It is designed to install within the I/O module bays of the IBM Flex System Enterprise Chassis. This switch can help clients migrate to a 10 Gb or 40 Gb Ethernet infrastructure and offers cloud-ready virtualization features like Virtual Fabric and VMready, in addition to being Software Defined Networking (SDN) ready. The EN4093R switch is shown in Figure 2-4.

Figure 2-4 The IBM Flex System Fabric EN4093R 10Gb Scalable Switch

The EN4093R switch is initially licensed for 24x 10 GbE ports. Further ports can be enabled with the Upgrade 1 and Upgrade 2 license options. Upgrade 1 must be applied before Upgrade 2 can be applied.

EN4093: The EN4093 (non-R) is no longer being marketed. For information on the older EN4093, see the IBM Flex System InfoCenter publications.
Table 2-2 lists the part numbers for ordering the switch and the upgrades.

Table 2-2 EN4093R 10Gb Scalable Switch part numbers and port upgrades (default port mapping)

Part number | Feature code (x-config / e-config) | Product description | Total 10 GbE ports enabled (internal) | 10 GbE ports (external) | 40 GbE ports (external)
95Y3309 | A3J6 / ESW7 | IBM Flex System Fabric EN4093R 10Gb Scalable Switch: 14x internal 10 GbE ports, 10x external 10 GbE ports | 14 | 10 | 0
49Y4798 | A1EL / 3596 | IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 1): adds 14x internal 10 GbE ports, 2x external 40 GbE ports | 28 | 10 | 2
88Y6037 | A1EM / 3597 | IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 2) (requires Upgrade 1): adds 14x internal 10 GbE ports, 4x external 10 GbE ports | 42 | 14 | 2

With flexible port mapping, clients have licenses for a specific number of ports:
- 95Y3309 is the part number for the base switch, and it provides 24x 10 GbE port licenses that can enable any combination of internal and external 10 GbE ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port).
- 49Y4798 (Upgrade 1) upgrades the base switch by activating 14 internal 10 GbE ports and two external 40 GbE ports, which is equivalent to adding 22 more 10 GbE port licenses for a total of 46x 10 GbE port licenses. Any combination of internal and external 10 GbE ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base switch.
- 88Y6037 (Upgrade 2) requires that the base switch and Upgrade 1 already be activated and simply activates all the ports on the EN4093R: 42 internal 10 GbE ports, 14 external SFP+ ports, and two external QSFP+ ports. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the EN4093R are enabled.

Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the EN4093R, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports, or trading off four 10 GbE ports for the use of an external 40 GbE port. Flexible port mapping is not available in Stacking mode. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the EN4093R are enabled.
Table 2-3 lists the supported port combinations with flexible port mapping.

Table 2-3 EN4093R 10Gb Scalable Switch part numbers and port upgrades (flexible port mapping)

Part number | Feature code (x-config / e-config) | Product description | 10 GbE ports (internal and external) | 40 GbE ports (external)
95Y3309 | A3J6 / ESW7 | IBM Flex System Fabric EN4093R 10Gb Scalable Switch | 24 | 0
 | | | 20 | 1
 | | | 16 | 2
49Y4798 | A1EL / 3596 | IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 1) | 46 | 0
 | | | 42 | 1
 | | | 38 | 2
88Y6037 | A1EM / 3597 | IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 2) (requires Upgrade 1)* | 56 | 2

* Flexible port mapping is not used with Upgrade 2 because with Upgrade 2 all ports on the switch become licensed and there is no need to reassign ports.

The IBM Flex System Fabric EN4093R 10Gb Scalable Switch has the following features and specifications:

Internal ports
– 42 internal full-duplex 10 Gigabit Ethernet ports
– Two internal full-duplex 1 GbE ports connected to the chassis management module

External ports
– 14 ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ direct-attach copper (DAC) cables. SFP+ modules and DAC cables are not included and must be purchased separately.
– Two ports for 40 Gb Ethernet QSFP+ transceivers, QSFP+ to QSFP+ DAC cables, or QSFP+ to 4x 10 Gb SFP+ break-out cables. QSFP+ modules and DAC cables are not included and must be purchased separately.
– One RS-232 serial port (mini-USB connector) that provides an additional means to configure the switch module.

Scalability and performance
– 40 Gb Ethernet ports for extreme external bandwidth and performance
– Fixed-speed external 10 Gb Ethernet ports to leverage 10 GbE core infrastructure
– Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps
– Media access control (MAC) address learning: automatic update, support for up to 128,000 MAC addresses
– Up to 128 IP interfaces per switch
– Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total external bandwidth per switch, up to 64 trunk groups, up to 16 ports per group
– Support for jumbo frames (up to 9,216 bytes)
– Broadcast/multicast storm control
– IGMP snooping to limit flooding of IP multicast traffic
– IGMP filtering to control multicast traffic for hosts participating in multicast groups
– Configurable traffic distribution schemes over trunk links based on source/destination IP or MAC addresses, or both
– Fast port forwarding and fast uplink convergence for rapid STP convergence

Availability and redundancy
– Virtual Router Redundancy Protocol (VRRP) for Layer 3 router redundancy
– IEEE 802.1D STP for providing L2 redundancy
– IEEE 802.1s Multiple STP (MSTP) for topology optimization; up to 32 STP instances are supported by a single switch
– IEEE 802.1w Rapid STP (RSTP) provides rapid STP convergence for critical delay-sensitive traffic like voice or video
– Per-VLAN Rapid STP (PVRST) enhancements
– Layer 2 Trunk Failover to support active/standby configurations of network adapter teaming on compute nodes
– Hot Links provides basic link redundancy with fast recovery for network topologies that require Spanning Tree to be turned off

VLAN support
– Up to 4095 VLANs supported per switch, with VLAN numbers ranging from 1 to 4095 (4095 is used for the management module's connection only)
– 802.1Q VLAN tagging support on all ports
– Private VLANs

Security
– VLAN-based, MAC-based, and IP-based access control lists (ACLs)
– 802.1x port-based authentication
– Multiple user IDs and passwords
– User access control
– RADIUS, TACACS+, and LDAP authentication and authorization
– NIST 800-131A encryption
– Selectable encryption protocol; SHA 256 enabled as default
– IPv6 ACL metering

Quality of Service (QoS)
– Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and destination addresses, VLANs) traffic classification and processing
– Traffic shaping and re-marking based on defined policies
– Eight Weighted Round Robin (WRR) priority queues per port for processing qualified traffic

IPv4 Layer 3 functions
– Host management
– IP forwarding
– IP filtering with ACLs, up to 896 ACLs supported
– VRRP for router redundancy
– Support for up to 128 static routes
– Routing protocol support (RIP v1, RIP v2, OSPF v2, BGP-4); up to 2048 entries in a routing table
– Support for DHCP Relay
– Support for IGMP snooping and IGMP relay
– Support for Protocol Independent Multicast (PIM) in Sparse Mode (PIM-SM) and Dense Mode (PIM-DM)

IPv6 Layer 3 functions
– IPv6 host management (except the default switch management IP address)
– IPv6 forwarding
– Up to 128 static routes
– Support for the OSPF v3 routing protocol
– IPv6 filtering with ACLs
– Virtual Station Interface Data Base (VSIDB) support

OpenFlow support
– OpenFlow 1.0 and 1.3.1
– OpenFlow hybrid mode

Virtualization
– Virtual NICs (vNICs)
  • Ethernet, iSCSI, or FCoE traffic is supported on vNICs
– Unified fabric ports (UFPs) (see the configuration sketch after this feature list)
  • Ethernet or FCoE traffic is supported on UFPs
  • Supports up to 256 VLANs for the virtual ports
  • Integration with L2 failover
– Virtual link aggregation groups (vLAGs)
– 802.1Qbg Edge Virtual Bridging (EVB) is an emerging IEEE standard for allowing networks to become virtual machine (VM)-aware
  • Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are mechanisms for switching between VMs on the same hypervisor
  • Edge Control Protocol (ECP) is a transport protocol that operates between two peers over an IEEE 802 LAN, providing reliable, in-order delivery of upper layer protocol data units
  • Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows centralized configuration of network policies that persist with the VM, independent of its location
  • EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, and VDP
– VMready
– Switch partitioning (SPAR)
  • SPAR forms separate virtual switching contexts by segmenting the data plane of the module. Data plane traffic is not shared between SPARs on the same switch.
  • SPAR operates as a Layer 2 broadcast network. Hosts on the same VLAN attached to a SPAR can communicate with each other and with the upstream switch. Hosts on the same VLAN but attached to different SPARs communicate through the upstream switch.
  • SPAR is implemented as a dedicated VLAN with a set of internal compute node ports and a single external port or link aggregation (LAG). Multiple external ports or LAGs are not allowed in a SPAR. A port can be a member of only one SPAR.

Converged Enhanced Ethernet
– Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends the 802.3x standard flow control to allow the switch to pause traffic based on the 802.1p priority value in each packet's VLAN tag
– Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth based on the 802.1p priority value in each packet's VLAN tag
– Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities
– Support for SPAR and FCoE

Fibre Channel over Ethernet (FCoE)
– FC-BB-5 FCoE specification compliant
– FCoE transit switch operations
– FCoE Initialization Protocol (FIP) support for automatic ACL configuration
– FCoE Link Aggregation Group (LAG) support
– Multi-hop RDMA over Converged Ethernet (RoCE) with LAG support
– Supports 2,000 secure FCoE sessions with FIP Snooping by using Class ID ACLs

Stacking
– Up to eight switches in a stack, with single IP management
– Hybrid stacking support (from two to six EN4093R switches with two CN4093 switches)
– FCoE support
  • FCoE LAG on external ports
– 802.1Qbg support
– vNIC and UFP support
  • Support for UFP with 802.1Qbg

Manageability
– Simple Network Management Protocol (SNMP V1, V2, and V3)
– HTTP browser GUI
– Telnet interface for CLI
– SSH
– Secure FTP (sFTP)
– Service Location Protocol (SLP)
– Serial interface for CLI
– Scriptable CLI
– Firmware image update (TFTP and FTP)
– Network Time Protocol (NTP) and Precision Time Protocol (PTP)

Monitoring
– Switch LEDs for external port status and switch module status indication
– Remote Monitoring (RMON) agent to collect statistics and proactively monitor switch performance
– Port mirroring for analyzing network traffic passing through the switch
– Change tracking and remote logging with the syslog feature
– Support for the sFlow agent for monitoring traffic in data networks (a separate sFlow analyzer is required elsewhere)
– POST diagnostics
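As referenced in the Virtualization list above, the following is a minimal sketch of carving a UFP virtual port in tunnel mode on an internal port, written in IBM Networking OS isCLI. The port name, vPort number, VLAN, and bandwidth value are illustrative assumptions; the verified syntax and full UFP deployment scenarios appear later in this book and in the Application Guide.

  ! Minimal UFP vPort (tunnel mode) sketch - illustrative only
  ! Define vPort 1 on internal port INTA1, pass all customer VLANs through
  ! outer VLAN 4091, and guarantee the vPort 25% of the 10 Gb physical port
  ufp port INTA1 vport 1
    network mode tunnel
    network default-vlan 4091
    qos bandwidth min 25
    enable
    exit
  ufp port INTA1 enable
  ufp enable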
For more information, see IBM Flex System Fabric EN4093R 10Gb Scalable Switch, TIPS0864, which is available at this website:

http://www.redbooks.ibm.com/abstracts/tips0864.html

2.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch

The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch provides unmatched scalability, port flexibility, performance, convergence, and network virtualization, while also delivering innovations to help address a number of networking concerns today and providing capabilities that will help you prepare for the future. The switch offers full Layer 2/3 switching, a transparent "easy connect" mode, and FCoE Full Fabric and Fibre Channel NPV Gateway operations to deliver a truly converged integrated solution, and it is designed to install within the I/O module bays of the IBM Flex System Enterprise Chassis. The switch can help clients migrate to a 10 GbE or 40 GbE converged Ethernet infrastructure and offers virtualization features like Virtual Fabric and VMready.

Figure 2-5 shows the IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch.

Figure 2-5 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch

The CN4093 switch is initially licensed for 22x 10 GbE ports. Further ports can be enabled with the Upgrade 1 and Upgrade 2 license options. Upgrade 1 and Upgrade 2 can be applied on the switch independently of each other or in combination for full feature capability.

Table 2-4 lists the part numbers for ordering the switch and the upgrades.

Table 2-4 CN4093 10Gb Converged Scalable Switch part numbers and port upgrades (default port mapping)

Part number | Feature code (x-config / e-config) | Product description | 10 GbE ports (internal) | 10 GbE ports (external) | 10 Gb Omni ports (external) | 40 GbE ports (external)
00D5823 | A3HH / ESW2 | IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch: 14x internal 10 GbE ports, 2x external 10 GbE ports, 6x external Omni ports | 14 | 2 | 6 | 0
00D5845 | A3HL / ESU1 | IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 1): adds 14x internal 10 GbE ports, 2x external 40 GbE ports | 28 | 2 | 6 | 2
00D5847 | A3HM / ESU2 | IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 2): adds 14x internal 10 GbE ports, 6x external Omni ports | 28 | 2 | 12 | 0
Both Upgrade 1 and Upgrade 2 applied | | | 42 | 2 | 12 | 2

With flexible port mapping, clients have licenses for a specific number of ports:
- 00D5823 is the part number for the base switch, and it provides 22x 10 GbE port licenses that can enable any combination of internal and external 10 GbE ports, Omni Ports, and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port).
- 00D5845 (Upgrade 1) upgrades the base switch by activating 14 internal 10 GbE ports and two external 40 GbE ports, which is equivalent to adding 22 more 10 GbE port licenses for a total of 44x 10 GbE port licenses. Any combination of internal and external 10 GbE ports, Omni Ports, and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base switch.
- 00D5847 (Upgrade 2) upgrades the base switch by activating 14 internal 10 GbE ports and six external Omni Ports, which is equivalent to adding 20 more 10 GbE port licenses for a total of 42x 10 GbE port licenses. Any combination of internal and external 10 GbE ports, Omni Ports, and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base switch.
- Applying both 00D5845 (Upgrade 1) and 00D5847 (Upgrade 2) simply activates all the ports on the CN4093: 42 internal 10 GbE ports, two external SFP+ ports, 12 external Omni Ports, and two external QSFP+ ports.

Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the CN4093, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports and Omni Ports, or trading off four 10 GbE ports for the use of an external 40 GbE port. Flexible port mapping is not available in Stacking mode. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the CN4093 are enabled.
Table 2-5 lists the supported port combinations with flexible port mapping.

Table 2-5 CN4093 10Gb Converged Scalable Switch part numbers and port upgrades (flexible port mapping)

Part number | Feature code (x-config / e-config) | Product description | 10 GbE ports (internal and external) and Omni ports (external) | 40 GbE ports (external)
00D5823 | A3HH / ESW2 | IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch | 22 | 0
 | | | 18 | 1
 | | | 14 | 2
00D5845 | A3HL / ESU1 | IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 1) | 44 | 0
 | | | 40 | 1
 | | | 36 | 2
00D5847 | A3HM / ESU2 | IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 2) | 42 | 0
 | | | 38 | 1
 | | | 34 | 2
Both Upgrade 1 and Upgrade 2 applied* | | | 56 | 2

* Flexible port mapping is not used when both Upgrade 1 and Upgrade 2 are applied because with both upgrades all ports on the switch become licensed and there is no need to reassign ports.

The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch has the following features and specifications:

Internal ports
– Forty-two internal full-duplex 10 Gigabit ports
– Two internal full-duplex 1 GbE ports connected to the Chassis Management Module

External ports
– Two ports for 1 Gb or 10 Gb Ethernet SFP/SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ direct-attach copper (DAC) cables. SFP+ modules and DAC cables are not included and must be purchased separately.
– Twelve IBM Omni Ports, each of which can operate as 10 Gb Ethernet (support for 10GBASE-SR, 10GBASE-LR, or 10 GbE SFP+ DAC cables) or as auto-negotiating 4/8 Gb Fibre Channel, depending on the SFP+ transceiver installed in the port. SFP+ modules and DAC cables are not included and must be purchased separately. (Omni Ports do not support 1 Gb SFP Ethernet transceivers.)
– Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DAC cables. In addition, you can use break-out cables to break out each 40 GbE port into four 10 GbE SFP+ connections. QSFP+ modules and DAC cables are not included and must be purchased separately.
– One RS-232 serial port (mini-USB connector) that provides an additional means to configure the switch module.
