NIC Virtualization on IBM Flex Systems

SOURCE URL: http://www.redbooks.ibm.com/redpieces/abstracts/sg248223.html

Learn how to deploy vNICs with IBM Flex System Manager patterns!


Transcript

  • 1. Draft Document for Review July 18, 2014 10:18 pm SG24-8223-00 ibm.com/redbooks Front cover NIC Virtualization in IBM Flex System Fabric Solutions Scott Irwin Scott Lorditch Matt Slavin Ilya Krutov Introduces NIC virtualization concepts and technologies Discusses UFP and vNIC deployment scenarios Provides UFP and vNIC configuration examples
  • 2. International Technical Support Organization NIC Virtualization in IBM Flex System Fabric Solutions June 2014 Draft Document for Review July 18, 2014 10:18 pm 8223edno.fm SG24-8223-00
  • 3. © Copyright International Business Machines Corporation 2014. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. 8223edno.fm Draft Document for Review July 18, 2014 10:18 pm First Edition (June 2014) This edition applies to: IBM Networking Operating System 7.8 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch IBM Flex System Fabric EN4093R 10Gb Scalable Switch IBM Flex System Embedded 10Gb Virtual Fabric Adapter IBM Flex System CN4054 10Gb Virtual Fabric Adapter IBM Flex System CN4054R 10Gb Virtual Fabric Adapter This document was created or updated on July 18, 2014. Note: Before using this information and the product it supports, read the information in “Notices” on page v.
  • 4. © Copyright IBM Corp. 2014. All rights reserved. iii Draft Document for Review July 18, 2014 10:18 pm 8223TOC.fm Contents
    Notices  v
    Trademarks  vi
    Preface  vii
    Authors  vii
    Now you can become a published author, too!  viii
    Comments welcome  ix
    Stay connected to IBM Redbooks  ix
    Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment  1
    1.1 Overview of Flex System network virtualization  2
    1.2 Introduction to NIC virtualization  3
    1.2.1 vNIC based NIC virtualization  3
    1.2.2 Unified Fabric Port based NIC virtualization  4
    1.2.3 Comparing vNIC modes and UFP modes  5
    1.3 Introduction to I/O module virtualization  6
    1.3.1 Introduction to vLAG  7
    1.3.2 Introduction to stacking  8
    1.3.3 Introduction to SPAR  9
    1.3.4 Easy Connect Q-in-Q solutions  10
    1.3.5 Introduction to the Failover feature  13
    1.4 Introduction to converged fabrics  14
    1.4.1 Fibre Channel over Ethernet  15
    1.4.2 iSCSI  16
    1.4.3 iSCSI versus FCoE  16
    Chapter 2. IBM Flex System networking architecture and Fabric portfolio  19
    2.1 Enterprise Chassis I/O architecture  20
    2.2 IBM Flex System Fabric I/O modules  23
    2.2.1 IBM Flex System Fabric EN4093R 10Gb Scalable Switch  24
    2.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch  30
    2.2.3 IBM Flex System Fabric SI4093 System Interconnect Module  36
    2.2.4 I/O modules and cables  41
    2.3 IBM Flex System Virtual Fabric adapters  42
    2.3.1 Embedded 10Gb Virtual Fabric Adapter  42
    2.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters  44
    Chapter 3. NIC virtualization considerations on the switch side  47
    3.1 Virtual Fabric vNIC solution capabilities  48
    3.1.1 Virtual Fabric mode vNIC  49
    3.1.2 Switch Independent mode vNIC  53
    3.2 Unified Fabric Port feature  55
    3.2.1 UFP Access and Trunk modes  56
    3.2.2 UFP Tunnel mode  58
    3.2.3 UFP FCoE mode  59
    3.2.4 UFP Auto mode  60
    3.2.5 UFP vPort considerations  60
    3.3 Compute node NIC to I/O module connectivity mapping  61
  • 5. 8223TOC.fm Draft Document for Review July 18, 2014 10:18 pm iv NIC Virtualization in IBM Flex System Fabric Solutions
    3.3.1 Embedded 10 Gb VFA (LOM) - Mezzanine 1  62
    3.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1  62
    3.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2  63
    3.3.4 IBM Flex System x222 Compute Node  64
    Chapter 4. NIC virtualization considerations on the server side  65
    4.1 Enabling virtual NICs on the server via UEFI  66
    4.1.1 Getting in to the virtual NIC configuration section of UEFI  66
    4.1.2 Initially enabling virtual NIC functionality via UEFI  75
    4.1.3 Special settings for the different modes of virtual NIC via UEFI  76
    4.1.4 Setting the Emulex virtual NIC settings back to factory default  81
    4.2 Enabling virtual NICs via Configuration Patterns  82
    4.3 Utilizing physical and virtual NICs in the OSes  105
    4.3.1 Introduction to teaming/bonding on the server  106
    4.3.2 OS side teaming/bonding and upstream network requirements  112
    4.3.3 Discussion of physical NIC connections and logical enumeration  119
    Chapter 5. Flex System NIC virtualization deployment scenarios  123
    5.1 Introduction to deployment examples  124
    5.2 UFP mode virtual NIC and Layer 2 Failover  125
    5.2.1 Components  125
    5.2.2 Topology  125
    5.2.3 Use Cases  127
    5.2.4 Configuration  127
    5.2.5 Confirming operation of the environment  132
    5.3 UFP mode virtual NIC with vLAG and FCoE  135
    5.3.1 Components  135
    5.3.2 Topology  136
    5.3.3 Use cases  137
    5.3.4 Configuration  137
    5.3.5 Confirming operation of the environment  145
    5.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover  149
    5.4.1 Components  150
    5.4.2 Topologies  151
    5.4.3 Use cases  151
    5.4.4 Configurations  152
    5.4.5 Verifying operation  166
    5.5 Switch Independent mode with SPAR  174
    5.5.1 Components  174
    5.5.2 Topology  174
    5.5.3 Use cases  176
    5.5.4 Configuration  178
    5.5.5 Verifying operation  184
    Abbreviations and acronyms  189
    Related publications  191
    IBM Redbooks  191
    Help from IBM  191
  • 6. © Copyright IBM Corp. 2014. All rights reserved. v Draft Document for Review July 18, 2014 10:18 pm 8223spec.fm Notices This information was developed for products and services offered in the U.S.A. IBM may not offer the products, services, or features discussed in this document in other countries. Consult your local IBM representative for information on the products and services currently available in your area. Any reference to an IBM product, program, or service is not intended to state or imply that only that IBM product, program, or service may be used. Any functionally equivalent product, program, or service that does not infringe any IBM intellectual property right may be used instead. However, it is the user's responsibility to evaluate and verify the operation of any non-IBM product, program, or service. IBM may have patents or pending patent applications covering subject matter described in this document. The furnishing of this document does not grant you any license to these patents. You can send license inquiries, in writing, to: IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk, NY 10504-1785 U.S.A. The following paragraph does not apply to the United Kingdom or any other country where such provisions are inconsistent with local law: INTERNATIONAL BUSINESS MACHINES CORPORATION PROVIDES THIS PUBLICATION "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESS OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF NON-INFRINGEMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Some states do not allow disclaimer of express or implied warranties in certain transactions, therefore, this statement may not apply to you. This information could include technical inaccuracies or typographical errors. Changes are periodically made to the information herein; these changes will be incorporated in new editions of the publication. IBM may make improvements and/or changes in the product(s) and/or the program(s) described in this publication at any time without notice. Any references in this information to non-IBM websites are provided for convenience only and do not in any manner serve as an endorsement of those websites. The materials at those websites are not part of the materials for this IBM product and use of those websites is at your own risk. IBM may use or distribute any of the information you supply in any way it believes appropriate without incurring any obligation to you. Any performance data contained herein was determined in a controlled environment. Therefore, the results obtained in other operating environments may vary significantly. Some measurements may have been made on development-level systems and there is no guarantee that these measurements will be the same on generally available systems. Furthermore, some measurements may have been estimated through extrapolation. Actual results may vary. Users of this document should verify the applicable data for their specific environment. Information concerning non-IBM products was obtained from the suppliers of those products, their published announcements or other publicly available sources. IBM has not tested those products and cannot confirm the accuracy of performance, compatibility or any other claims related to non-IBM products. Questions on the capabilities of non-IBM products should be addressed to the suppliers of those products. This information contains examples of data and reports used in daily business operations. 
To illustrate them as completely as possible, the examples include the names of individuals, companies, brands, and products. All of these names are fictitious and any similarity to the names and addresses used by an actual business enterprise is entirely coincidental. COPYRIGHT LICENSE: This information contains sample application programs in source language, which illustrate programming techniques on various operating platforms. You may copy, modify, and distribute these sample programs in any form without payment to IBM, for the purposes of developing, using, marketing or distributing application programs conforming to the application programming interface for the operating platform for which the sample programs are written. These examples have not been thoroughly tested under all conditions. IBM, therefore, cannot guarantee or imply reliability, serviceability, or function of these programs.
  • 7. 8223spec.fm Draft Document for Review July 18, 2014 10:18 pm vi NIC Virtualization in IBM Flex System Fabric Solutions Trademarks IBM, the IBM logo, and ibm.com are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. These and other IBM trademarked terms are marked on their first occurrence in this information with the appropriate symbol (® or ™), indicating US registered or common law trademarks owned by IBM at the time this information was published. Such trademarks may also be registered or common law trademarks in other countries. A current list of IBM trademarks is available on the Web at http://www.ibm.com/legal/copytrade.shtml The following terms are trademarks of the International Business Machines Corporation in the United States, other countries, or both: Blade Network Technologies® BladeCenter® BNT® IBM® IBM Flex System® PowerVM® PureFlex® Redbooks® Redbooks (logo) ® System x® VMready® The following terms are trademarks of other companies: Intel, Intel Xeon, Intel logo, Intel Inside logo, and Intel Centrino logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.
  • 8. © Copyright IBM Corp. 2014. All rights reserved. vii Draft Document for Review July 18, 2014 10:18 pm 8223pref.fm Preface The deployment of server virtualization technologies in data centers requires significant efforts in providing sufficient network I/O bandwidth to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen applications and services. Each of these services requires certain bandwidth (or speed) to function properly. Furthermore, because of different network traffic patterns that are relevant to different service types, these traffic flows can interfere with each other. They can lead to serious network problems, including the inability of the service to perform its functions. The NIC virtualization in IBM® Flex System Fabric solutions addresses these issues. The solutions are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged Enhanced Ethernet infrastructure. This infrastructure is built on IBM Flex System Fabric CN4093 and EN4093R 10 Gbps Ethernet switch modules, and IBM Flex System Fabric SI4093 Switch Interconnect modules in the chassis and the Emulex Virtual Fabric Adapters in each compute node. This IBM Redbooks® publication introduces NIC virtualization concepts and technologies, discusses their deployment scenarios, and provide configuration examples that use IBM Networking OS technologies combined with the Emulex Virtual Fabric adapters. This book is for IBM, IBM Business Partner and client networking professionals who want to learn how to implement NIC virtualization solutions and switch interconnect technologies on IBM Flex System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and IBM Virtual Fabric mode. This book assumes that the reader has basic knowledge of the networking concepts and technologies, including OSI model, Ethernet LANs, Spanning Tree protocol, VLANs, VLAN tagging, uplinks, trunks, and static and dynamic (LACP) link aggregation. Authors This book was produced by a team of specialists from around the world working at the International Technical Support Organization, Raleigh Center. Ilya Krutov is a Project Leader at the ITSO Center in Raleigh and has been with IBM since 1998. Before he joined the ITSO, Ilya served in IBM as a Run Rate Team Leader, Portfolio Manager, Brand Manager, Technical Sales Specialist, and Certified Instructor. Ilya has expert knowledge in IBM System x®, BladeCenter®, and Flex System products and technologies, virtualization and cloud computing, and data center networking. He has authored over 150 books, papers, product guides, and solution guides. He has a bachelor’s degree in Computer Engineering from the Moscow Engineering and Physics Institute. Scott Irwin is a Consulting System Engineer (CSE) for IBM System Networking. He joined IBM in November of 2010 as part of the Blade Network Technologies®, (BNT®) acquisition. His Networking background spans well over 16 years as both a Customer Support Escalation Engineer and a Customer facing Field Systems Engineer. In May of 2007, he was promoted to Consulting Systems Engineer with a focus on deep customer troubleshooting. His responsibilities are to support customer Proof of Concepts, assist with paid installations and training and provide support for both pre and post Sales focusing on all verticals (Public Sector, High Frequency Trading, Service Provider, Mid Market and Enterprise).
  • 9. 8223pref.fm Draft Document for Review July 18, 2014 10:18 pm viii NIC Virtualization in IBM Flex System Fabric Solutions Scott Lorditch is a Consulting Systems Engineer for IBM System Networking. He performs network architecture assessments, and develops designs and proposals for implementing GbE Switch Module products for the IBM BladeCenter. He also developed several training and lab sessions for IBM technical and sales personnel. Previously, Scott spent almost 20 years working on networking in various industries, working as a senior network architect, a product manager for managed hosting services, and manager of electronic securities transfer projects. Scott holds a BS degree in Operations Research with a specialization in computer science from Cornell University. Matt Slavin is a Consulting Systems Engineer for IBM Systems Networking, based out of Tulsa, Oklahoma, and currently providing network consulting skills to the Americas. He has a background of over 30 years of hands-on systems and network design, installation, and troubleshooting. Most recently, he has focused on data center networking where he is leading client efforts in adopting new and potently game-changing technologies into their day-to-day operations. Matt joined IBM through the acquisition of Blade Network Technologies, and prior to that has worked at some of the top systems and networking companies in the world. Thanks to the following people for their contributions to this project: Tamikia Barrow, Cheryl Gera, Chris Rayns, Jon Tate, David Watts, Debbie Willmschen International Technical Support Organization, Raleigh Center Nghiem Chu, Sai Chan, Michael Easterly, Heidi Griffin, Bob Louden, Richard Mancini, Shekhar Mishra, Heather Richardson, Hector Sanchez, Tim Shaughnessy IBM Jeff Lin Emulex Now you can become a published author, too! Here’s an opportunity to spotlight your skills, grow your career, and become a published author—all at the same time! Join an ITSO residency project and help write a book in your area of expertise, while honing your experience using leading-edge technologies. Your efforts will help to increase product acceptance and customer satisfaction, as you expand your network of technical contacts and relationships. Residencies run from two to six weeks in length, and you can participate either in person or as a remote resident working from your home base. Find out more about the residency program, browse the residency index, and apply online at: ibm.com/redbooks/residencies.html
  • 10. Preface ix Draft Document for Review July 18, 2014 10:18 pm 8223pref.fm Comments welcome Your comments are important to us! We want our books to be as helpful as possible. Send us your comments about this book or other IBM Redbooks publications in one of the following ways: Use the online Contact us review Redbooks form found at: ibm.com/redbooks Send your comments in an email to: redbooks@us.ibm.com Mail your comments to: IBM Corporation, International Technical Support Organization Dept. HYTD Mail Station P099 2455 South Road Poughkeepsie, NY 12601-5400 Stay connected to IBM Redbooks Find us on Facebook: http://www.facebook.com/IBMRedbooks Follow us on Twitter: http://twitter.com/ibmredbooks Look for us on LinkedIn: http://www.linkedin.com/groups?home=&gid=2130806 Explore new Redbooks publications, residencies, and workshops with the IBM Redbooks weekly newsletter: https://www.redbooks.ibm.com/Redbooks.nsf/subscribe?OpenForm Stay current on recent Redbooks publications with RSS Feeds: http://www.redbooks.ibm.com/rss.html
  • 11. 8223pref.fm Draft Document for Review July 18, 2014 10:18 pm x NIC Virtualization in IBM Flex System Fabric Solutions
  • 12. © Copyright IBM Corp. 2014. All rights reserved. 1 Draft Document for Review July 18, 2014 10:18 pm Introduction.fm Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment This chapter introduces the various virtualization features available with certain I/O Modules and converged network adapters (CNAs) in the IBM PureFlex® System environment. The primary focus of this paper is the EN4093R, CN4093, and the SI4093, along with the related server-side CNA or Virtual Fabric Adapter (VFA) virtualization features. Although other I/O modules are available for the Flex System Enterprise Chassis environment, unless otherwise noted, those other I/O modules do not support the virtualization features discussed in this document and are not covered here. This chapter includes the following sections: 1.1, “Overview of Flex System network virtualization” on page 2; 1.2, “Introduction to NIC virtualization” on page 3; 1.3, “Introduction to I/O module virtualization” on page 6; 1.4, “Introduction to converged fabrics” on page 14
  • 13. Introduction.fm Draft Document for Review July 18, 2014 10:18 pm 2 NIC Virtualization in IBM Flex System Fabric Solutions 1.1 Overview of Flex System network virtualization The term virtualization can mean many different things to different people, and in different contexts. For example, in the server world it is often associated with taking bare metal platforms and putting in a layer of software (referred to as a hypervisor) that permits multiple virtual machines (VMs) to run on that single physical platform, with each VM thinking it owns the entire hardware platform. In the network world, there are many different concepts of virtualization. One example is overlay technologies, which let a user run one network on top of another network, usually with the goal of hiding the complexities of the underlying network (often referred to as overlay networking). Another form of network virtualization is OpenFlow technology, which decouples a switch's control plane from the switch itself and allows the switching path decisions to be made from a central control point. And then there are other forms of virtualization, such as cross-chassis aggregation (also known as cross-switch aggregation), virtualized NIC technologies, and converged fabrics. This paper is focused on the latter set of virtualization forms, specifically the following set of features:
    Converged fabrics - Fibre Channel over Ethernet (FCoE) and Internet Small Computer Systems Interconnect (iSCSI)
    Virtual Link Aggregation (vLAG) - A form of cross-switch aggregation
    Stacking - Virtualizing the management plane and the switching fabric
    Switch Partitioning (SPAR) - Masking the I/O Module from the host and upstream network
    Easy Connect Q-in-Q solutions - More ways to mask the I/O Modules from connecting devices
    NIC virtualization - Allowing a single physical 10 GbE NIC to represent multiple NICs to the host OS
Although we will be introducing all of these topics in this chapter, the primary focus of this paper will be on how the last item (NIC virtualization) integrates with the various other features and the surrounding customer environment. The specific NIC virtualization features that will be discussed in detail in this paper include the following:
    IBM Virtual Fabric mode - also known as vNIC Virtual Fabric mode, including both Dedicated Uplink Mode (default) and Shared Uplink Mode (optional) operations
    Switch Independent Mode - also known as vNIC Switch Independent Mode
    Unified Fabric Port - also known as IBM Unified Fabric Protocol, or just UFP - All modes
Important: The term vNIC can be used either generically for all virtual NIC technologies or as a vendor-specific term. For example, VMware calls the virtual NIC that resides inside a VM a vNIC. Unless otherwise noted, the use of the term vNIC in this paper refers to a specific feature available on the Flex System I/O modules and Emulex CNAs inside physical hosts. In a related fashion, the term vPort has multiple connotations; for example, it is used by Microsoft for their Hyper-V environment. Unless otherwise noted, the use of the term vPort in this paper refers to the UFP feature on the Flex System I/O modules and Emulex CNAs inside physical hosts.
  • 14. Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment 3 Draft Document for Review July 18, 2014 10:18 pm Introduction.fm 1.2 Introduction to NIC virtualization This section introduces the two primary types of NIC virtualization (vNIC and UFP) available on the Flex System I/O modules and adapters, and introduces the various sub-elements of these virtual NIC technologies. The deployment of server virtualization technologies in data centers requires significant efforts to provide sufficient network I/O bandwidth (or speed) to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen network applications and services, and each of these services requires a certain bandwidth to function properly. Furthermore, because of different network traffic patterns relevant to different service types, these traffic flows might interfere with each other. This interference can lead to serious network problems, including the inability of the service to perform its functions. Providing sufficient bandwidth and isolation to virtualized applications in a 1 Gbps network infrastructure might be challenging for blade-based deployments where the number of physical I/O ports per compute node is limited. For example, a maximum of 12 physical ports per single-wide compute node (up to six Ethernet ports per adapter) can be utilized for network connectivity. With 1 GbE, a total network bandwidth of 12 Gb per compute node is available for Gigabit Ethernet infrastructures, leaving no room for future growth. In addition, traffic flows are isolated on a physical port basis. Also, the bandwidth per interface is static with a maximum bandwidth of 1 Gb per flow, thus limiting the flexibility of bandwidth usage. IBM Flex System Fabric solutions address these issues by increasing the number of available Ethernet ports and providing more flexibility in allocating the available bandwidth to meet specific application requirements. By virtualizing a 10 Gbps NIC, its resources can be divided into multiple logical instances or virtual NICs. Each virtual NIC appears as a regular, independent NIC to the server operating system or hypervisor, and each virtual NIC uses a portion of the overall bandwidth of the physical NIC. For example, a NIC partition with a maximum bandwidth of 4 Gbps appears to the host applications as a physically distinct 4 Gbps Ethernet adapter. Also, the NIC partitions provide traffic forwarding and port isolation. The virtual NIC technologies discussed for the I/O module here are all directly tied to the Emulex CNA offerings for the Flex System environment, and documented in 2.3, “IBM Flex System Virtual Fabric adapters” on page 42. 1.2.1 vNIC based NIC virtualization vNIC is the original virtual NIC technology utilized in the IBM BladeCenter 10Gb Virtual Fabric Switch Module, and has been brought forward into the PureFlex System environment to allow customers that have standardized on vNIC to still use it with the PureFlex System solutions. Important: All I/O module features discussed in this paper are based on the latest available firmware at the time of writing (IBM Networking OS 7.8 for the EN4093R, CN4093, and SI4093 modules).
  • 15. Introduction.fm Draft Document for Review July 18, 2014 10:18 pm 4 NIC Virtualization in IBM Flex System Fabric Solutions vNIC has two primary modes:
    IBM Virtual Fabric mode: Virtual Fabric mode offers advanced virtual NICs to servers, and it requires support on the switch side. In IBM Virtual Fabric mode, the Virtual Fabric Adapter (VFA) in the compute node communicates with the Flex System switch to obtain vNIC parameters (using DCBX). A special tag is added within each data packet and is later removed by the NIC and switch for each vNIC group to maintain separation of the virtual data paths. In IBM Virtual Fabric mode, you can change the bandwidth allocations through the IBM switch user interfaces without requiring a reboot of the server. vNIC bandwidth allocation and metering is performed by both the switch and the VFA. In such a case, a bidirectional virtual channel of an assigned bandwidth is established between them for every defined vNIC.
    Switch Independent mode: Switch Independent mode offers virtual NICs to servers with no special I/O module side configuration. It extends the existing customer VLANs to the virtual NIC interfaces. The IEEE 802.1Q VLAN tag is essential to the separation of the vNIC groups by the NIC adapter or driver and the switch. The VLAN tags are added to the packet by the applications or drivers at each end station rather than by the switch. vNIC bandwidth allocation and metering is performed only by the VFA itself. The switch is completely unaware that the 10 GbE NIC is being seen as multiple logical NICs in the OS. In such a case, a unidirectional virtual channel is established where the bandwidth management is performed only for the outgoing traffic on the VFA side (server-to-switch). The incoming traffic (switch-to-server) uses all of the available physical port bandwidth, as there is no metering performed on either the VFA or the switch side.
Virtual Fabric mode vNIC has two sub-modes:
    vNIC Virtual Fabric - Dedicated Uplink Mode
    – Provides a Q-in-Q tunneling action for each vNIC group
    – Each vNIC group must have its own dedicated uplink path out
    – Any vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first exiting to the upstream network with Layer 3 routing
    vNIC Virtual Fabric - Shared Uplink Mode
    – Each vNIC group provides a single VLAN for all vNICs in that group
    – Each vNIC group must be a unique VLAN (the same VLAN cannot be used on more than a single vNIC group)
    – Servers cannot use tagging when Shared Uplink Mode is enabled
    – Like vNICs in Dedicated Uplink Mode, any vNICs in one vNIC group cannot talk with vNICs in any other vNIC group without first exiting to the upstream network with Layer 3 routing
Details for enabling and configuring these modes can be found in Chapter 4, “NIC virtualization considerations on the server side” on page 65 and Chapter 5, “Flex System NIC virtualization deployment scenarios” on page 123.
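To make the vNIC Virtual Fabric concepts above more concrete, the following ISCLI-style fragment is a minimal sketch of defining one vNIC and one vNIC group on an internal port. It is illustrative only: the port names (INTA1, EXT1), the vNIC index, group number, VLAN, and bandwidth value are arbitrary placeholders, and the command syntax is paraphrased from memory rather than taken from this book, so verify it against the IBM Networking OS 7.8 Application Guide and the worked examples in Chapter 5.

```
! Hypothetical minimal sketch - vNIC Virtual Fabric (Dedicated Uplink Mode) on one port
vnic enable                     ! turn on the vNIC feature globally
vnic port INTA1 index 1         ! define vNIC 1 on internal port INTA1
        bandwidth 25            ! intended as 25% of the 10 Gb link; verify the units
        enable
        exit
vnic vnicgroup 1                ! a vNIC group acts as a Q-in-Q tunnel
        vlan 100                ! outer tag used to keep this group separate
        member INTA1.1          ! the vNIC defined above
        port EXT1               ! the dedicated uplink for this group
        failover                ! shut the member vNICs if the group uplink fails
        enable
        exit
```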
  • 16. Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment 5 Draft Document for Review July 18, 2014 10:18 pm Introduction.fm 1.2.2 Unified Fabric Port based NIC virtualization UFP is the current direction of IBM NIC virtualization, and provides a more feature-rich solution compared to the original vNIC Virtual Fabric mode. Like Virtual Fabric mode vNIC, UFP allows carving up a single 10 Gb port into four virtual NICs (called vPorts in UFP). UFP also has a number of modes associated with it, including:
    Tunnel mode: Provides Q-in-Q mode, where the vPort is customer VLAN-independent (very similar to vNIC Virtual Fabric Dedicated Uplink Mode)
    Trunk mode: Provides a traditional 802.1Q trunk mode (multi-VLAN trunk link) to the virtual NIC (vPort) interface, that is, it permits host-side tagging
    Access mode: Provides a traditional access mode (single untagged VLAN) to the virtual NIC (vPort) interface, which is similar to a physical port in access mode
    FCoE mode: Provides FCoE functionality to the vPort
    Auto-VLAN mode: Automatic VLAN creation for IEEE 802.1Qbg and IBM VMready® environments
Only one vPort (vPort 2) per physical port can be bound to FCoE. If FCoE is not desired, vPort 2 can be configured for one of the other modes. Details for enabling and configuring these modes can be found in Chapter 4, “NIC virtualization considerations on the server side” on page 65 and Chapter 5, “Flex System NIC virtualization deployment scenarios” on page 123.
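As a rough illustration of the vPort modes listed above, the following ISCLI-style sketch defines a tunnel-mode vPort and an FCoE vPort on one internal port. It is a hypothetical example: port INTA1, the default VLANs, and the bandwidth percentages are placeholders, and the syntax is reproduced from memory, so treat every keyword as an assumption and confirm it against the UFP configuration examples in Chapter 5 and the IBM Networking OS 7.8 Application Guide.

```
! Hypothetical minimal sketch - UFP vPorts on internal port INTA1
ufp port INTA1 vport 1
        network mode tunnel          ! Q-in-Q; customer VLAN independent
        network default-vlan 4091    ! outer tag for the tunnel
        qos bandwidth min 25         ! guarantee roughly 25% of the 10 Gb link
        qos bandwidth max 100        ! allow bursting to line rate
        enable
        exit
ufp port INTA1 vport 2               ! only vPort 2 can be bound to FCoE
        network mode fcoe
        network default-vlan 1002    ! example FCoE VLAN
        qos bandwidth min 50
        enable
        exit
ufp port INTA1 enable                ! enable UFP on the physical port
ufp enable                           ! enable the feature globally
```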
  • 17. Introduction.fm Draft Document for Review July 18, 2014 10:18 pm 6 NIC Virtualization in IBM Flex System Fabric Solutions Table 1-1 shows some of the items that may affect the decision-making process.
Table 1-1 Attributes of virtual NIC options (values listed as: Virtual Fabric vNIC Dedicated Uplink / Virtual Fabric vNIC Shared Uplink / Switch Independent mode vNIC / UFP)
    Requires support in the I/O module: Yes / Yes / No / Yes
    Requires support in the NIC/CNA: Yes / Yes / Yes / Yes
    Supports adapter transmit rate control: Yes / Yes / Yes / Yes
    Supports I/O module transmit rate control: Yes / Yes / No / Yes
    Supports changing rate without restart of node: Yes / Yes / No / Yes
    Requires a dedicated uplink path per vNIC group or vPort: Yes / No / No / Yes for vPorts in Tunnel mode
    Support for node OS-based tagging: Yes / No / Yes / Yes
    Support for failover per vNIC group/UFP vPort: Yes / Yes / No / Yes
    Support for more than one uplink path per vNIC/vPort group: No / Yes / Yes / Yes for vPorts in Trunk and Access modes
    Supported regardless of the model of the Flex System I/O module: No / No / Yes / No
    Supported with vLAG: No / No / Yes / Yes for uplinks out of the I/O Module carrying vPort traffic
    Supported with SPAR: No / No / Yes / No
    Supported with stacking: Yes / Yes / Yes / Yes
    Supported with SI4093: No / No / Yes / Yes
    Supported with EN4093: Yes / Yes / Yes / Yes
    Supported with CN4093: Yes / Yes / Yes / Yes
For a deeper dive into virtual NIC operational characteristics from the switch side see Chapter 3, “NIC virtualization considerations on the switch side” on page 47. For virtual NIC operational characteristics from the server side, see Chapter 4, “NIC virtualization considerations on the server side” on page 65. 1.3 Introduction to I/O module virtualization This section provides a brief overview of Flex System I/O module virtualization technologies. The following topics are covered:
    1.3.1, “Introduction to vLAG”
    1.3.2, “Introduction to stacking” on page 8
    1.3.3, “Introduction to SPAR” on page 9
    1.3.4, “Easy Connect Q-in-Q solutions” on page 10
    1.3.5, “Introduction to the Failover feature” on page 13
  • 18. Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment 7 Draft Document for Review July 18, 2014 10:18 pm Introduction.fm 1.3.1 Introduction to vLAG In its simplest terms, vLAG is a technology designed to enhance traditional Ethernet link aggregations (sometimes referred to generically as Portchannels or Etherchannels). It is important to note that vLAG is not a form of aggregation in its own right, but an enhancement to aggregations. As some background, under current IEEE specifications, an aggregation is still defined as a bundle of similar links between two, and only two, devices, bound together to operate as a single logical link. By today’s standards-based definitions, you cannot create an aggregation on one device and have these links of that aggregation connect to more than a single device on the other side of the aggregation. The use of only two devices in this fashion limits the ability to offer certain robust designs. Although the standards bodies are working on a solution that provides split aggregations across devices, most vendors have developed their own versions of this multi-chassis aggregation. For example, Cisco has virtual Port Channel (vPC) on NX-OS products, and Virtual Switch System (VSS) on the 6500 IOS products. IBM offers virtual Link Aggregation (vLAG) on many of the IBM Top of Rack (ToR) solutions, and on the EN4093R and CN4093 Flex System I/O modules. The primary goal of virtual link aggregation is to overcome the limit imposed by the current standards-based aggregation, and provide a distributed aggregation across a pair of switches instead of a single switch. Doing so results in a reduction of single points of failure, while still maintaining a loop-free, non-blocking environment. Figure 1-1 on page 7 shows an example of how vLAG can create a single common uplink out of a pair of embedded I/O Modules. This creates a non-looped path with no blocking links, offering the maximum amount of bandwidth for the links, and no single point of failure. Figure 1-1 Non-looped design using multi-chassis aggregation on both sides Although this vLAG based design is considered the most optimal, not all I/O module virtualization options support this topology; for example, Virtual Fabric vNIC mode and SPAR are not supported with vLAG. Another potentially limiting factor with vLAG (and other such cross-chassis aggregations such as vPC and VSS) is that it only supports a pair of switches acting as one for this cross-chassis aggregation, and not more than two. If the desire is to split an aggregation across more than two switches, stacking might be an option to consider.
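To give a feel for what the vLAG design in Figure 1-1 involves on the I/O module side, the following ISCLI-style sketch shows the general shape of a vLAG configuration on one of the two peers. It is only a sketch: the tier ID, LACP keys, ISL, and uplink ports are placeholders, the health-check and ISL port configuration are omitted, and the syntax is from memory, so verify it against the vLAG examples in Chapter 5 and the Application Guide.

```
! Hypothetical minimal sketch - one vLAG peer (repeat, mirrored, on the other module)
interface port EXT1,EXT2
        lacp mode active
        lacp key 100                 ! uplink aggregation shared across both peers
        exit
vlag tier-id 10                      ! must match on both vLAG peers
vlag isl adminkey 200                ! LACP key used for the inter-switch link (ISL)
vlag adminkey 100 enable             ! treat key-100 uplinks on both peers as one LAG
vlag enable
```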
  • 19. Introduction.fm Draft Document for Review July 18, 2014 10:18 pm 8 NIC Virtualization in IBM Flex System Fabric Solutions 1.3.2 Introduction to stacking Stacking provides the ability to take up to eight physical I/O modules and treat them as a single logical switch from a port usage and management perspective. This means ports on different I/O modules in the stack can be part of a common aggregation, and you only log in to a single IP address to manage all I/O modules in the stack. For devices that are attaching to the stack, the stack looks and acts like a single large switch. Stacking is supported on the EN4093R and CN4093 I/O modules. It is provided by reserving a group of uplinks as stacking links and creating a ring of I/O modules with these links. The ring design ensures the loss of a single link or single I/O module in the stack does not lead to a disruption of the stack. Before the v7.7 releases of code, it was possible to stack the EN4093R only into a common stack of like-model I/O modules. However, in v7.7 and later code, support was added for adding a pair of CN4093s into a hybrid stack of EN4093Rs to add Fibre Channel Forwarder (FCF) capability into the stack. The limit for this hybrid stacking is a maximum of 6x EN4093Rs and 2x CN4093s in a common stack. Stacking the Flex System chassis I/O modules with IBM Top of Rack switches that also support stacking is not allowed. Connections from a stack of Flex System chassis I/O modules to upstream switches can be made with normal single or aggregated connections, including the use of vLAG/vPC on the upstream switches to connect links across stack members into a common non-blocking fabric between the stack and the Top of Rack switches. An example of four I/O modules in a highly available stacking design is shown in Figure 1-2. Important: When using the EN4093R and CN4093 in hybrid stacking, only the CN4093 is allowed to act as a stack master or stack backup master for the stack.
  • 20. Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment 9 Draft Document for Review July 18, 2014 10:18 pm Introduction.fm Figure 1-2 Example of stacking in the Flex System environment This example shows a design with no single points of failure, using four I/O modules in a single stack and a pair of upstream vLAG/vPC connected switches. One of the potential limitations of the current implementation of stacking is that if an upgrade of code is needed, a reload of the entire stack must occur. Because upgrades are uncommon and should be scheduled for non-production hours anyway, a single stack design is usually efficient and acceptable. But some customers do not want to have any downtime (scheduled or otherwise), and for them a single stack design is thus not an acceptable solution. For these users that still want to make the most use of stacking, a two-stack design might be an option. This design features stacking a set of I/O modules in bay 1 into one stack, and a set of I/O modules in bay 2 into a second stack. The primary advantage to a two-stack design is that each stack can be upgraded one at a time, with the running stack maintaining connectivity for the compute nodes during the upgrade and reload of the other stack. The downside of the two-stack design is that traffic that is flowing from one stack to another stack must go through the upstream network to reach the other stack. As can be seen, stacking might not be suitable for all customers. However, if it is desired, it is another tool that is available for building a robust infrastructure by using the Flex System I/O modules. 1.3.3 Introduction to SPAR Switch partitioning (SPAR) is a feature that, among other things, allows a physical I/O module to be divided into multiple logical switches. After SPAR is configured, ports within a given SPAR group can communicate only with each other. Ports that are members of different
  • 21. Introduction.fm Draft Document for Review July 18, 2014 10:18 pm 10 NIC Virtualization in IBM Flex System Fabric Solutions SPAR groups on the same I/O module cannot communicate directly with each other without going outside the I/O module. The EN4093R, CN4093, and the SI4093 I/O Modules support SPAR. SPAR features two modes of operation:
    Pass-through domain mode (also known as transparent mode): This mode of SPAR uses a Q-in-Q function to encapsulate all traffic passing through the switch in a second layer of VLAN tagging. This is the default mode when SPAR is enabled and is VLAN agnostic owing to this Q-in-Q operation. It passes tagged and untagged packets through the SPAR session without looking at or interfering with any customer-assigned tag. SPAR pass-through mode supports passing FCoE packets to an upstream FCF, but without FIP snooping within the SPAR group.
    Local domain mode: This mode is not VLAN agnostic and requires a user to create any required VLANs in the SPAR group. Currently, there is a limit of 256 VLANs in Local domain mode. Support is available for FIP snooping on FCoE sessions in Local domain mode. Unlike pass-through domain mode, Local domain mode provides strict control of end host VLAN isolation.
Consider the following points regarding SPAR:
    SPAR is disabled by default on the EN4093R and CN4093. SPAR is enabled by default on the SI4093, with all base licensed internal and external ports defaulting to a single pass-through SPAR group. This default SI4093 configuration can be changed if desired.
    Any port can be a member of only a single SPAR group at one time.
    Only a single uplink path is allowed per SPAR group (it can be a single link, a single static aggregation, or a single LACP aggregation). This SPAR-enforced restriction ensures that no network loops are possible with ports in a SPAR group.
    SPAR cannot be used with UFP or Virtual Fabric vNIC at this time. Switch Independent Mode vNIC is supported with SPAR. UFP support is slated for a possible future release.
    Up to eight SPAR groups per I/O module are supported. This number might be increased in a future release.
    SPAR is not supported with the vLAG, stacking, or tagpvid-ingress features.
SPAR can be a useful solution in environments where simplicity is paramount. 1.3.4 Easy Connect Q-in-Q solutions The Easy Connect concept, often referred to as Easy Connect mode, or Transparent mode, is not a specific feature but a way of using one of four different existing features to attempt to minimize ongoing I/O module management requirements. The primary goal of Easy Connect is to make an I/O module transparent to the hosts and the upstream network they need to access, thus reducing the management requirements for I/O Modules in an Easy Connect mode. As noted, there are actually several features that can be used to accomplish an Easy Connect solution, with the following being common aspects of Easy Connect solutions: At the heart of Easy Connect is some form of Q-in-Q tagging, to mask packets traveling through the I/O module. This is a fundamental requirement of any Easy Connect solution
  • 22. Chapter 1. I/O module and NIC virtualization features in the IBM Flex System environment 11 Draft Document for Review July 18, 2014 10:18 pm Introduction.fm and lets the attached hosts and upstream network communicate using any VLAN (tagged or untagged), and the I/O module will pass those packets through to the other side of the I/O module by wrapping them in an outer VLAN tag, and then removing that outer VLAN tag as the packet exits the I/O module, thus making the I/O module VLAN agnostic. This Q-in-Q operation is what removes the need to manage VLANs on the I/O module, which is usually one of the larger ongoing management requirements of a deployed I/O module. Pre-creating an aggregation of the uplinks, in some cases, all of the uplinks, to remove the possibility of loops (if all uplinks are not used, any unused uplinks/ports should be disabled to ensure loops are not possible). Optionally disabling spanning-tree so the upstream network does not receive any spanning-tree BPDUs. This is especially important in the case of upstream devices that will shut down a port if BPDUs are received, such as a Cisco FEX device, or an upstream switch running some form of BPDU guard. After it is configured, an I/O module in Easy Connect mode does not require on-going configuration changes as a customer adds and removes VLANs to the hosts and upstream network. In essence, Easy Connect turns the I/O module into a VLAN agnostic port aggregator, with support for growing up to the maximum bandwidth of the product (for example, add upgrade Feature on Demand (FoD) keys to the I/O module to increase the 10 Gb links to Compute Nodes and 10 Gb and 40 Gb links to the upstream networks). The following are the two primary methods for deploying an Easy Connect solution: Use an I/O module that defaults to a form of Easy Connect: – For customers that want an Easy Connect type of solution that is immediately ready for use out of the box (zero touch I/O module deployment), the SI4093 provides this by default. The SI4093 accomplishes this by having the following factory default configuration: • All base licensed internal and external ports are put into a single SPAR group. • All uplinks are put into a single common LACP aggregation and the LACP suspend-port feature is enabled. • The failover feature is enabled on the common LACP key. • No spanning-tree support (the SI4093 is designed to never permit more than a single uplink path per SPAR, so it can not create a loop and does not support spanning-tree). For customers that want the option to be able to use advanced features, but also want an Easy Connect mode solution, the EN4093R and CN4093 offer configurable options that can make them transparent to the attaching Compute Nodes and upstream network switches, while maintaining the option of changing to more advanced modes of configuration when needed. As noted, the SI4093 accomplishes this by defaulting to the SPAR feature in pass-through mode, which puts all compute node ports and all uplinks into a common Q-in-Q group. For the EN4093R and CN4093, there are a number of features that can be implemented to accomplish this Easy Connect support. The primary difference between these I/O modules and the SI4093 is that you must first perform a small set of configuration steps to set up the EN4093R and CN4093 into an Easy Connect mode, after which minimal management of the I/O module is required. For these I/O modules, this Easy Connect mode can be configured by using one of the following four features:
  • 23. Introduction.fm Draft Document for Review July 18, 2014 10:18 pm 12 NIC Virtualization in IBM Flex System Fabric Solutions The SPAR feature that is default on the SI4093 can be configured on both the EN4093R and CN4093 as well Utilize the tagpvid-ingress feature Configure vNIC Virtual Fabric Dedicated Uplink Mode Configure UFP vPort tunnel mode In general, all of these features provide this Easy Connect functionality, with each having some pros and cons. For example, if the desire is to use Easy Connect with vLAG, you should use the tagpvid-ingress mode or the UFP vPort tunnel mode (SPAR and Virtual Fabric vNIC do not permit the vLAG ISL). But, if you want to use Easy Connect with FCoE today, you cannot use tagpvid-ingress and must utilize a different form of Easy connect, such as the vNIC Virtual Fabric Dedicated Uplink Mode or UFP tunnel mode (SPAR pass-through mode allows FCoE but does not support FIP snooping, which may or may not be a concern for some customers). As an example of how Easy Connect works (in all Easy Connect modes), consider the tagpvid-ingress Easy Connect mode operation shown in Figure 1-3. When all internal ports and the desired uplink ports are placed into a common PVID/Native VLAN (4091 in this example) and tagpvid-ingress is enabled on these ports (with any wanted aggregation protocol on the uplinks that are required to match the other end of those links), all ports with a matching Native or PVID setting on this I/O module are part of a single Q-in-Q tunnel. The Native/PVID VLAN on the port acts as the outer tag and the I/O module switches traffic based on this outer tag VLAN. The inner customer tag rides through the fabric encapsulated on this Native/PVID VLAN to the destination port (or ports) in this tunnel, and then has the outer tag stripped off as it exits the I/O module, thus re-exposing the original customer facing tag (or no tag) to the device attaching to that egress port. Figure 1-3 Packet flow with Easy Connect In all modes of Easy Connect, local switching based on destination MAC address is still used.
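As a rough sketch of the tagpvid-ingress form of Easy Connect described above, the following ISCLI-style fragment puts the internal ports and an aggregated pair of uplinks into a common outer VLAN. The VLAN number, port ranges, and LACP key are placeholders, spanning-tree handling is not shown, and the syntax is paraphrased from memory, so confirm the exact commands in the Application Guide before relying on it.

```
! Hypothetical minimal sketch - tagpvid-ingress Easy Connect
interface port INTA1-INTA14,EXT1,EXT2
        pvid 4091                    ! common outer (PVID/Native) VLAN for the tunnel
        tagpvid-ingress              ! add the outer tag on ingress, strip it on egress
        exit
interface port EXT1,EXT2
        lacp mode active             ! pre-built uplink aggregation so no loop is possible
        lacp key 100
        exit
! Spanning tree can additionally be disabled so no BPDUs are sent upstream
```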
• 24. Some considerations on which form of Easy Connect mode makes the most sense for a given situation:
– For users that require virtualized NICs, are already using vNIC Virtual Fabric mode, and are more comfortable staying with it, vNIC Virtual Fabric dedicated uplink mode might be the best solution for Easy Connect functionality.
– For users that require virtualized NICs and have no particular preference for which mode of virtualized NIC they use, UFP tunnel mode is the best choice for Easy Connect mode, because the UFP feature is the future direction of virtualized NICs in the Flex System I/O module solutions.
– For users planning to make use of the vLAG feature, either UFP tunnel mode or tagpvid-ingress mode forms of Easy Connect are required (vNIC Virtual Fabric mode and SPAR Easy Connect modes do not work with the vLAG feature).
– For users that do not need vLAG or virtual NIC functionality, SPAR is a very simple and clean solution to implement as an Easy Connect solution.
1.3.5 Introduction to the Failover feature
Failover, sometimes referred to as Layer 2 Failover or Trunk Failover, is not a virtualization feature in its own right, but it can play an important role when NICs on a server make use of teaming/bonding (forms of NIC virtualization in the OS). Failover is particularly important in an embedded environment, such as in a Flex System chassis.
When NICs are teamed/bonded in an operating system, the OS needs to know when a NIC can no longer reach the upstream network so it can decide whether to use that NIC in the team. Most commonly this is a simple link up/link down check in the server: if the link is reporting up, use the NIC; if the link is reporting down, do not use the NIC. In an embedded environment, this can be a problem if the uplinks out of the embedded I/O module go down but the internal link to the server is still up. In that case, the server still reports the NIC link as up even though there is no path to the upstream network, which leads to the server sending traffic out a NIC that has no path out of the embedded I/O module and disrupts server communications.
The Failover feature can be implemented in these environments. When the set of uplinks that the Failover feature is tracking goes down, configurable internal ports are also taken down, alerting the embedded server to a path fault in this direction, at which time the server can use the team/bond to select a different NIC and maintain network connectivity. An example of how failover can protect Compute Nodes in a PureFlex chassis when there is an uplink fault out of one of the I/O modules can be seen in Figure 1-4 on page 14.
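As a companion to the figure that follows, here is a minimal sketch of how such a failover trigger might look in IBM Networking OS ISCLI. The trigger number, the monitored uplinks (EXT1 and EXT2), and the controlled internal ports (INTA1-INTA14) are assumed example values, and the exact syntax (for example, monitoring an LACP aggregation by its admin key rather than by member ports) varies by configuration and release, so treat this as illustrative only.
   ! Illustrative sketch only (assumed port names; verify syntax for your release)
   ! Monitor the uplinks; if they all fail, bring down the controlled internal ports
   failover trigger 1 mmon monitor member EXT1,EXT2
   failover trigger 1 mmon control member INTA1-INTA14
   failover trigger 1 enable
   ! Globally enable the Failover feature
   failover enable
With a trigger like this on each I/O module, an uplink failure on I/O module 1 is reflected on the compute node as NIC 1 going down, so the NIC team fails over to NIC 2 through I/O module 2, which is the behavior shown in Figure 1-4.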
• 25. Figure 1-4 Example of Failover in action. The figure illustrates how Failover works:
1. All uplinks out of the I/O module have gone down (because of a link failure, a failure of ToR switch 1, and so forth).
2. Trunk failover takes down the link to NIC 1 to notify the compute node that the path out of I/O module 1 is gone.
3. NIC teaming on the compute node begins using the still-functioning NIC 2 for all communications.
Without failover or some other form of remote link failure detection, embedded servers would potentially be exposed to loss of connectivity if the uplink path on one of the embedded I/O modules were to fail. Note that designs that utilize vLAG or some sort of cross-chassis aggregation, such as stacking, are not exposed to this issue (and thus do not need the Failover feature) because they have a different method of coping with uplinks out of an I/O module going down (for example, with vLAG, the packets that need to get upstream can cross the vLAG ISL and use the other I/O module's uplinks to get to the upstream network).
1.4 Introduction to converged fabrics
As the name implies, converged fabrics are all about taking a set of protocols and data designed to run on top of one kind of physical medium and allowing them to be carried on top of a different physical medium. This provides a number of cost benefits, such as reducing the number of physical cabling plants that are required and removing the need for separate physical NICs and HBAs, including a potential reduction in power and cooling. From an OpEx perspective, it can reduce the cost associated with the management of separate physical infrastructures.
In the data center world, two of the most common forms of converged fabrics are FCoE and iSCSI. FCoE allows a host to use its 10 Gb Ethernet connections to access Fibre Channel attached storage as if it were physically Fibre Channel attached to the host, when in fact the FC traffic is encapsulated into FCoE frames and carried to the remote storage via an Ethernet network. iSCSI takes a protocol that was originally designed for hosts to talk to relatively close physical storage over physical SCSI cables and converts it to use IP and run over an Ethernet network, and thus be able to access storage well beyond the limitations of a physical SCSI-based solution.
• 26. iSCSI can be used in existing (lossy) and new (lossless) Ethernet infrastructures, with different performance characteristics. However, FCoE requires a lossless converged enhanced Ethernet network, and it relies on additional functionality known from Fibre Channel (for example, name server and zoning).
1.4.1 Fibre Channel over Ethernet
FCoE assumes the existence of a lossless Ethernet, such as one that implements the Data Center Bridging (DCB) extensions to Ethernet. The EN4093R, CN4093, G8264, and G8264CS switches support FCoE; the G8264 and EN4093R function as FCoE transit switches, while the CN4093 and G8264CS have Omni Ports that can be set to function as either FC ports or Ethernet ports, as specified in the switch configuration.
The basic notion of FCoE is that the upper layers of FC are mapped onto Ethernet. The upper layer protocols and services of FC remain the same in an FCoE deployment. Zoning, fabric services, and similar services still exist with FCoE. The difference is that the lower layers of FC are replaced by lossless Ethernet, which also implies that FC concepts, such as port types and lower-layer initialization protocols, must be replaced by new constructs in FCoE. Such mappings are defined by the FC-BB-5 standard and are briefly addressed here.
Figure 1-5 shows the perspective on FCoE layering compared to other storage networking technologies. In this figure, FC and FCoE layers are shown with other storage networking protocols, including iSCSI.
Figure 1-5 Storage Network Protocol Layering (the figure shows the SCSI layer carried over native FC, FCoE, iSCSI over TCP/IP, iFCP and FCIP, and SRP over InfiniBand)
• 27. 1.4.2 iSCSI
The iSCSI protocol allows for longer distances between a server and its storage when compared to the traditionally restrictive parallel SCSI solutions or the newer serial-attached SCSI (SAS). iSCSI technology can use a hardware initiator, such as a host bus adapter (HBA), or a software initiator to issue requests to target devices. Within iSCSI storage terminology, the initiator is typically known as the client, and the target is the storage device.
The iSCSI protocol encapsulates SCSI commands into protocol data units (PDUs) within the TCP/IP protocol and then transports them over the network to the target device. iSCSI provides block-level access to storage, as does Fibre Channel, but uses TCP/IP over Ethernet instead of the Fibre Channel protocol. Therefore, iSCSI is attractive for its relative simplicity and its use of widely available Ethernet skills. Its chief limitations historically have been the relatively lower speeds of Ethernet compared to Fibre Channel and the extra TCP/IP encapsulation required. With lossless 10 Gb Ethernet now available, the attractiveness of iSCSI is expected to grow rapidly. TCP/IP encapsulation will still be used, but 10 Gbps Ethernet speeds dramatically increase the appeal of iSCSI.
1.4.3 iSCSI versus FCoE
This section highlights the similarities and differences between iSCSI and FCoE. However, in most cases, considerations other than purely technical ones will influence your decision in choosing one over the other.
iSCSI and FCoE have the following similarities:
– Both protocols are block-oriented storage protocols. That is, the file system logic for accessing storage with either of them is on the computer where the initiator is, not on the storage hardware. Therefore, they are both different from typical network-attached storage (NAS) technologies, which are file oriented.
– Both protocols implement Ethernet-attached storage.
– Both protocols can be implemented in hardware, which is detected by the operating system of the host as an HBA.
– Both protocols can also be implemented by using software initiators, which are available in various server operating systems. However, this approach uses resources of the main processor to perform tasks that would otherwise be performed by the hardware of an HBA.
– Both protocols can use the Converged Enhanced Ethernet (CEE) standards, also referred to as Data Center Bridging (DCB), to deliver “lossless” traffic over Ethernet.
– Both protocols are alternatives to traditional FC storage and FC SANs.
iSCSI and FCoE have the following differences:
– iSCSI uses TCP/IP as its transport, and FCoE uses Ethernet. iSCSI can use media other than Ethernet, such as InfiniBand, and iSCSI can use Layer 3 routing in an IP network.
– Numerous vendors provide local iSCSI storage targets, some of which also support Fibre Channel and other storage technologies. Relatively few native FCoE targets are available at this time, which might allow iSCSI to be implemented at a lower overall capital cost.
– FCoE requires a gateway function, usually called a Fibre Channel Forwarder (FCF), which allows FCoE access to traditional FC-attached storage. This approach allows FCoE and traditional FC storage access to coexist, either as a long-term approach or as part of a migration. The G8264CS and CN4093 switches can be used to provide FCF functionality.
• 28. – iSCSI-to-FC gateways exist but are not required when a storage device is used that can accept iSCSI traffic directly. Except in the case of a local FCoE storage target, the last leg of an FCoE connection uses FC to reach the storage.
– FC uses 8b/10b encoding, which means that sending 8 bits of data requires transmitting 10 bits over the wire, a 25% overhead that is transmitted over the network to prevent corruption of the data. 10 Gbps Ethernet uses 64b/66b encoding, which has a far smaller overhead. However, iSCSI includes IP headers and Ethernet (or other media) headers with every frame, which adds overhead.
– The largest payload that can be sent in an FCoE frame is 2,112 bytes. iSCSI can use jumbo frame support on Ethernet and send 9 KB or more in a single frame.
– iSCSI has been on the market for several years longer than FCoE. Therefore, the iSCSI standards are more mature than those for FCoE.
– Troubleshooting FCoE end-to-end requires both Ethernet networking skills and FC SAN skills.
• 30. Chapter 2. IBM Flex System networking architecture and Fabric portfolio
The Flex System chassis delivers high-speed performance complete with integrated servers, storage, and networking for multi-chassis management in data center compute environments. Furthermore, its flexible design can meet the needs of varying workloads with independently scalable IT resource pools for higher usage and lower cost per workload. Increased security and resiliency protect vital information and promote maximum uptime, while the integrated, easy-to-use management system reduces setup time and complexity, which provides a quicker path to return on investment (ROI).
This chapter includes the following topics:
– 2.1, “Enterprise Chassis I/O architecture” on page 20
– 2.2, “IBM Flex System Fabric I/O modules” on page 23
– 2.3, “IBM Flex System Virtual Fabric adapters” on page 42
• 31. 2.1 Enterprise Chassis I/O architecture
The Fabric networking I/O architecture for the IBM Flex System Enterprise Chassis includes an array of connectivity options for server nodes that are installed in the enclosure. Flex System Fabric I/O modules offer a local switching model that provides superior performance, cable reduction, and a rich feature set that is fully integrated into the operation and management of the Enterprise Chassis.
From a physical I/O module bay perspective, the Enterprise Chassis has four I/O bays in the rear of the chassis. The physical layout of these I/O module bays is shown in Figure 2-1.
Figure 2-1 Rear view of the Enterprise Chassis showing I/O module bays 1 through 4
From a midplane wiring point of view, the Enterprise Chassis provides 16 lanes out of each half-wide node bay (toward the rear I/O bays), with each lane capable of 16 Gbps or higher speeds. How these lanes are used is a function of which adapters are installed in a node, which I/O module is installed in the rear, and which port licenses are enabled on the I/O module. How the midplane lanes connect between the node bays up front and the I/O bays in the rear is shown in Figure 2-2 on page 21.
The concept of an I/O module Upgrade Feature on Demand (FoD) also is shown in Figure 2-2 on page 21. From a physical perspective, an upgrade FoD in this context is a bank of 14 ports and some number of uplinks that can be enabled and used on a switch module. By default, all I/O modules include the base set of ports, and thus have 14 internal ports, one each connected to the 14 compute node bays in the front. By adding an upgrade license to the I/O module, it is possible to add more banks of 14 ports (plus some number of uplinks) to an I/O module. The node needs an adapter that has the necessary physical ports to connect to the new lanes enabled by the upgrades; those lanes connect to the ports in the I/O module enabled by the upgrade.
• 32. Figure 2-2 Sixteen lanes total of a single half-wide node bay toward the I/O bays
For example, if a node were installed with only the dual port LAN on system board (LOM) adapter, only two of the 16 lanes are used (one to I/O bay 1 and one to I/O bay 2), as shown in Figure 2-3 on page 22. If a node was installed without LOM and two quad port adapters were installed, eight of the 16 lanes are used (two to each of the four I/O bays). This installation can potentially provide up to 320 Gb of full duplex Ethernet bandwidth (16 lanes x 10 Gb x 2) to a single half-wide node and over half a terabit (Tb) per second of bandwidth to a full-wide node.
Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the Fabric I/O modules, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, as shown in Figure 2-2, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports, or trading off four 10 GbE ports for the use of an external 40 GbE port.
• 33. Figure 2-3 Dual port LOM connecting to ports on I/O bays 1 and 2 (all other lanes unused)
Today, there are limits on the port density of the current I/O modules, in that only the first three lanes are potentially available from the I/O module. By default, each I/O module provides a single connection (lane) to each of the 14 half-wide node bays up front. By adding port licenses, an EN4093R 10Gb Scalable Switch, CN4093 10Gb Converged Scalable Switch, or SI4093 System Interconnect Module can each provide up to three 10 Gb ports to each of the 14 half-wide node bays. As an example, if two 8-port adapters were installed and four I/O modules were installed with all upgrades, the end node has access to 12 10 Gb lanes (three to each switch). On the 8-port adapter, two lanes are unavailable at this time.
Concerning port licensing, the default available upstream connections also are associated with port licenses. For more information about these connections and the node-facing links, see 2.2, “IBM Flex System Fabric I/O modules” on page 23. All I/O modules include a base set of 14 downstream ports. The Ethernet switching and interconnect I/O modules support more than the base set of ports, and the additional ports are enabled by the upgrades. For more information, see the respective I/O module section in 2.2, “IBM Flex System Fabric I/O modules” on page 23.
As of this writing, although no I/O module and node adapter combination can use all 16 lanes between a compute node bay and the I/O bays, the lanes exist to ensure that the Enterprise Chassis can use future available capacity.
Beyond the physical aspects of the hardware, there are certain logical aspects that ensure that the Enterprise Chassis can integrate seamlessly into any modern data center's infrastructure. Many of these enhancements, such as vNIC, VMready, and 802.1Qbg, revolve around integrating virtualized servers into the environment. Fibre Channel over Ethernet (FCoE)
• 34. allows users to converge their Fibre Channel traffic onto their 10 Gb Ethernet network, which reduces the number of cables and points of management that are necessary to connect the Enterprise Chassis to the upstream infrastructures. The wide range of physical and logical Ethernet networking options that are available today and in development ensures that the Enterprise Chassis can meet the most demanding I/O connectivity challenges now and as the data center evolves.
2.2 IBM Flex System Fabric I/O modules
The IBM Flex System Enterprise Chassis features a number of Fabric I/O module solutions that provide a combination of 1 Gb and 10 Gb ports to the servers and 1 Gb, 10 Gb, and 40 Gb ports for uplink connectivity to the outside upstream infrastructure. The IBM Flex System Enterprise Chassis ensures that a suitable selection is available to meet the needs of the server nodes. The following Flex System Fabric modules are available for deployment within the Enterprise Chassis:
– 2.2.1, “IBM Flex System Fabric EN4093R 10Gb Scalable Switch”
– 2.2.2, “IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch” on page 30
– 2.2.3, “IBM Flex System Fabric SI4093 System Interconnect Module” on page 36
External cabling: SFP, SFP+, and QSFP+ transceivers or DAC cables are required for external fabric module connectivity. Compatible transceivers and cables are listed in 2.2.4, “I/O modules and cables” on page 41.
Some of the Fabric I/O module selection criteria are summarized in Table 2-1.
Table 2-1 Fabric module selection criteria (suitability listed as SI4093 System Interconnect Module / EN4093R 10Gb Scalable Switch / CN4093 10Gb Converged Scalable Switch)
– Gigabit Ethernet to nodes: Yes / Yes / Yes
– 10 Gb Ethernet to nodes: Yes / Yes / Yes
– 10 Gb Ethernet uplinks: Yes / Yes / Yes
– 40 Gb Ethernet uplinks: Yes / Yes / Yes
– Basic Layer 2 switching: Yes / Yes / Yes
– Advanced Layer 2 switching, IEEE features (STP, QoS): No / Yes / Yes
– Layer 3 IPv4 switching (forwarding, routing, ACL filtering): No / Yes / Yes
– Layer 3 IPv6 switching (forwarding, routing, ACL filtering): No / Yes / Yes
– 10 Gb Ethernet CEE: Yes / Yes / Yes
– FCoE FIP Snooping Bridge support: Yes / Yes / Yes
– FCF support: No / No / Yes
– Native FC port support: No / No / Yes
• 35. Table 2-1 Fabric module selection criteria (continued; SI4093 / EN4093R / CN4093)
– Switch stacking: No / Yes / Yes
– 802.1Qbg Edge Virtual Bridge support: Yes / Yes / Yes
– vLAG support: No / Yes / Yes
– UFP support: Yes / Yes / Yes
– Virtual Fabric mode vNIC support: No / Yes / Yes
– Switch Independent mode vNIC support: Yes / Yes / Yes
– SPAR support: Yes / Yes / Yes
– OpenFlow support: No / Yes / No
2.2.1 IBM Flex System Fabric EN4093R 10Gb Scalable Switch
The IBM Flex System Fabric EN4093R 10Gb Scalable Switch provides unmatched scalability, port flexibility, and performance, while also delivering innovations to help address a number of networking concerns today and providing capabilities that will help you prepare for the future. This switch is capable of supporting up to sixty-four 10 Gb Ethernet connections while offering Layer 2/3 switching, in addition to OpenFlow and "easy connect" modes. It is designed to install within the I/O module bays of the IBM Flex System Enterprise Chassis. This switch can help clients migrate to a 10 Gb or 40 Gb Ethernet infrastructure and offers cloud-ready virtualization features like Virtual Fabric and VMready, in addition to being Software Defined Network (SDN) ready.
EN4093: The EN4093 (non-R) is no longer being marketed. For information about the older EN4093, see the IBM Flex System InfoCenter publications.
The EN4093R switch is shown in Figure 2-4.
Figure 2-4 The IBM Flex System Fabric EN4093R 10Gb Scalable Switch
The EN4093R switch is initially licensed for 24x 10 GbE ports. Further ports can be enabled with Upgrade 1 and Upgrade 2 license options. Upgrade 1 must be applied before Upgrade 2 can be applied.
• 36. Table 2-2 lists the part numbers for ordering the switch and the upgrades.
Table 2-2 EN4093R 10Gb Scalable Switch part numbers and port upgrades (default port mapping); feature codes are x-config / e-config
– 95Y3309 (A3J6 / ESW7): IBM Flex System Fabric EN4093R 10Gb Scalable Switch. Enables 10x external 10 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 14 internal 10 GbE, 10 external 10 GbE, 0 external 40 GbE.
– 49Y4798 (A1EL / 3596): IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 1). Adds 2x external 40 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 28 internal 10 GbE, 10 external 10 GbE, 2 external 40 GbE.
– 88Y6037 (A1EM / 3597): IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 2, requires Upgrade 1). Adds 4x external 10 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 42 internal 10 GbE, 14 external 10 GbE, 2 external 40 GbE.
Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the EN4093R, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports, or trading off four 10 GbE ports for the use of an external 40 GbE port. Flexible port mapping is not available in Stacking mode. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the EN4093R are enabled.
With flexible port mapping, clients have licenses for a specific number of ports:
– 95Y3309 is the part number for the base switch, and it provides 24x 10 GbE port licenses that can enable any combination of internal and external 10 GbE ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port).
– 49Y4798 (Upgrade 1) upgrades the base switch by activating 14 internal 10 GbE ports and two external 40 GbE ports, which is equivalent to adding 22 more 10 GbE port licenses for a total of 46x 10 GbE port licenses. Any combination of internal and external 10 GbE ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base switch.
– 88Y6037 (Upgrade 2) requires that the base switch and Upgrade 1 already be activated, and it simply activates all the ports on the EN4093R, which is 42 internal 10 GbE ports, 14 external SFP+ ports, and two external QSFP+ ports. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the EN4093R are enabled.
• 37. Table 2-3 lists supported port combinations with flexible port mapping.
Table 2-3 EN4093R 10Gb Scalable Switch part numbers and port upgrades (flexible port mapping); 10 GbE counts include internal and external ports, 40 GbE ports are external; feature codes are x-config / e-config
– 95Y3309 (A3J6 / ESW7), IBM Flex System Fabric EN4093R 10Gb Scalable Switch: 24x 10 GbE + 0x 40 GbE, or 20x 10 GbE + 1x 40 GbE, or 16x 10 GbE + 2x 40 GbE
– 49Y4798 (A1EL / 3596), IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 1): 46x 10 GbE + 0x 40 GbE, or 42x 10 GbE + 1x 40 GbE, or 38x 10 GbE + 2x 40 GbE
– 88Y6037 (A1EM / 3597), IBM Flex System Fabric EN4093 10Gb Scalable Switch (Upgrade 2, requires Upgrade 1): 56x 10 GbE + 2x 40 GbE. Flexible port mapping is not used with Upgrade 2 because with Upgrade 2 all ports on the switch become licensed and there is no need to reassign ports.
The IBM Flex System Fabric EN4093R 10Gb Scalable Switch has the following features and specifications: Internal ports – 42 internal full-duplex 10 Gigabit Ethernet ports. – Two internal full-duplex 1 GbE ports connected to the chassis management module. External ports – 14 ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ direct-attach copper cables. SFP+ modules and DAC cables are not included and must be purchased separately. – Two ports for 40 Gb Ethernet QSFP+ transceivers, QSFP+ to QSFP+ DAC cables, or QSFP+ to 4x 10 Gb SFP+ break-out cables. QSFP+ modules and DAC cables are not included and must be purchased separately. – One RS-232 serial port (mini-USB connector) that provides an additional means to configure the switch module. Scalability and performance – 40 Gb Ethernet ports for extreme external bandwidth and performance – Fixed-speed external 10 Gb Ethernet ports to leverage 10 GbE core infrastructure – Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps – Media access control (MAC) address learning: automatic update, support for up to 128,000 MAC addresses – Up to 128 IP interfaces per switch – Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total external bandwidth per switch, up to 64 trunk groups, up to 16 ports per group
  • 38. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 27 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm – Support for jumbo frames (up to 9,216 bytes) – Broadcast/multicast storm control – IGMP snooping to limit flooding of IP multicast traffic – IGMP filtering to control multicast traffic for hosts participating in multicast groups – Configurable traffic distribution schemes over trunk links based on source/destination IP or MAC addresses, or both – Fast port forwarding and fast uplink convergence for rapid STP convergence Availability and redundancy – Virtual Router Redundancy Protocol (VRRP) for Layer 3 router redundancy – IEEE 802.1D STP for providing L2 redundancy – IEEE 802.1s Multiple STP (MSTP) for topology optimization, up to 32 STP instances are supported by a single switch – IEEE 802.1w Rapid STP (RSTP) provides rapid STP convergence for critical delay-sensitive traffic like voice or video – Per-VLAN Rapid STP (PVRST) enhancements – Layer 2 Trunk Failover to support active/standby configurations of network adapter teaming on compute nodes – Hot Links provides basic link redundancy with fast recovery for network topologies that require Spanning Tree to be turned off VLAN support – Up to 4095 VLANs supported per switch, with VLAN numbers ranging from 1 to 4095 (4095 is used for management module’s connection only.) – 802.1Q VLAN tagging support on all ports Private VLANs Security – VLAN-based, MAC-based, and IP-based access control lists (ACLs) – 802.1x port-based authentication – Multiple user IDs and passwords – User access control – Radius, TACACS+ and LDAP authentication and authorization – NIST 800-131A Encryption – Selectable encryption protocol; SHA 256 enabled as default – IPv6 ACL metering Quality of Service (QoS) – Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and destination addresses, VLANs) traffic classification and processing – Traffic shaping and re-marking based on defined policies – Eight Weighted Round Robin (WRR) priority queues per port for processing qualified traffic IP v4 Layer 3 functions – Host management – IP forwarding – IP filtering with ACLs, up to 896 ACLs supported
  • 39. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 28 NIC Virtualization in IBM Flex System Fabric Solutions – VRRP for router redundancy – Support for up to 128 static routes – Routing protocol support (RIP v1, RIP v2, OSPF v2, BGP-4); up to 2048 entries in a routing table – Support for DHCP Relay – Support for IGMP snooping and IGMP relay – Support for Protocol Independent Multicast (PIM) in Sparse Mode (PIM-SM) and Dense Mode (PIM-DM). IPv6 Layer 3 functions – IPv6 host management (except default switch management IP address) – IPv6 forwarding – Up to 128 static routes – Support for OSPF v3 routing protocol – IPv6 filtering with ACLs – Virtual Station Interface Data Base (VSIDB) support OpenFlow support – OpenFlow 1.0 and 1.3.1 – OpenFlow hybrid mode Virtualization – Virtual NICs (vNICs) • Ethernet, iSCSI, or FCoE traffic is supported on vNICs – Unified fabric ports (UFPs) • Ethernet or FCoE traffic is supported on UFPs • Supports up to 256 VLAN for the virtual ports • Integration with L2 failover – Virtual link aggregation groups (vLAGs) – 802.1Qbg Edge Virtual Bridging (EVB) is an emerging IEEE standard for allowing networks to become virtual machine (VM)-aware. • Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are mechanisms for switching between VMs on the same hypervisor. • Edge Control Protocol (ECP) is a transport protocol that operates between two peers over an IEEE 802 LAN providing reliable, in-order delivery of upper layer protocol data units. • Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows centralized configuration of network policies that will persist with the VM, independent of its location. • EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, VDP. – VMready – Switch partitioning (SPAR) • SPAR forms separate virtual switching contexts by segmenting the data plane of the module. Data plane traffic is not shared between SPARs on the same switch. • SPAR operates as a Layer 2 broadcast network. Hosts on the same VLAN attached to a SPAR can communicate with each other and with the upstream switch. Hosts on the same VLAN but attached to different SPARs communicate through the upstream switch.
  • 40. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 29 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm • SPAR is implemented as a dedicated VLAN with a set of internal compute node ports and a single external port or link aggregation (LAG). Multiple external ports or LAGs are not allowed in SPAR. A port can be a member of only one SPAR. Converged Enhanced Ethernet – Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow control to allow the switch to pause traffic based on the 802.1p priority value in each packet’s VLAN tag. – Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth based on the 802.1p priority value in each packet’s VLAN tag. – Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities. – Support for SPAR and FCoE Fibre Channel over Ethernet (FCoE) – FC-BB5 FCoE specification compliant – FCoE transit switch operations – FCoE Initialization Protocol (FIP) support for automatic ACL configuration – FCoE Link Aggregation Group (LAG) support – Multi-hop RDMA over Converged Ethernet (RoCE) with LAG support – Supports 2,000 secure FCoE sessions with FIP Snooping by using Class ID ACLs Stacking – Up to eight switches in a stack - single IP management – Hybrid stacking support (from two to six EN4093R switches with two CN4093 switches) – FCoE support • FCoE LAG on external ports – 802.1Qbg support – vNIC and UFP support • Support for UFP with 802.1Qbg Manageability – Simple Network Management Protocol (SNMP V1, V2 and V3) – HTTP browser GUI – Telnet interface for CLI – SSH – Secure FTP (sFTP) – Service Location Protocol (SLP) – Serial interface for CLI – Scriptable CLI – Firmware image update (TFTP and FTP) – Network Time Protocol (NTP) and Precision Time Protocol (PTP) Monitoring – Switch LEDs for external port status and switch module status indication – Remote Monitoring (RMON) agent to collect statistics and proactively monitor switch performance – Port mirroring for analyzing network traffic passing through switch – Change tracking and remote logging with syslog feature – Support for sFLOW agent for monitoring traffic in data networks (separate sFLOW analyzer required elsewhere) – POST diagnostics
• 41. For more information, see IBM Flex System Fabric EN4093R 10Gb Scalable Switch, TIPS0864, which is available at this website:
http://www.redbooks.ibm.com/abstracts/tips0864.html
2.2.2 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch
The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch provides unmatched scalability, port flexibility, performance, convergence, and network virtualization, while also delivering innovations to help address a number of networking concerns today and providing capabilities that will help you prepare for the future. The switch offers full Layer 2/3 switching, transparent "easy connect" mode, as well as FCoE Full Fabric and Fibre Channel NPV Gateway operations to deliver a truly converged integrated solution, and it is designed to install within the I/O module bays of the IBM Flex System Enterprise Chassis. The switch can help clients migrate to a 10 GbE or 40 GbE converged Ethernet infrastructure and offers virtualization features like Virtual Fabric and VMready.
Figure 2-5 shows the IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch.
Figure 2-5 IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch
The CN4093 switch is initially licensed for 22x 10 GbE ports. Further ports can be enabled with Upgrade 1 and Upgrade 2 license options. Upgrade 1 and Upgrade 2 can be applied on the switch independently from each other, or in combination for full feature capability. Table 2-4 lists the part numbers for ordering the switch and the upgrades.
Table 2-4 CN4093 10Gb Converged Scalable Switch part numbers and port upgrades (default port mapping); feature codes are x-config / e-config
– 00D5823 (A3HH / ESW2): IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch. Enables 2x external 10 GbE ports, 6x external Omni Ports, and 14x internal 10 GbE ports. Total ports enabled: 14 internal 10 GbE, 2 external 10 GbE, 6 external Omni, 0 external 40 GbE.
– 00D5845 (A3HL / ESU1): IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 1). Adds 2x external 40 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 28 internal 10 GbE, 2 external 10 GbE, 6 external Omni, 2 external 40 GbE.
• 42. – 00D5847 (A3HM / ESU2): IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 2). Adds 6x external Omni Ports and 14x internal 10 GbE ports. Total ports enabled: 28 internal 10 GbE, 2 external 10 GbE, 12 external Omni, 0 external 40 GbE.
– Both Upgrade 1 and Upgrade 2 applied: total ports enabled: 42 internal 10 GbE, 2 external 10 GbE, 12 external Omni, 2 external 40 GbE.
Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the CN4093, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports and Omni Ports, or trading off four 10 GbE ports for the use of an external 40 GbE port. Flexible port mapping is not available in Stacking mode. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the CN4093 are enabled.
With flexible port mapping, clients have licenses for a specific number of ports:
– 00D5823 is the part number for the base switch, and it provides 22x 10 GbE port licenses that can enable any combination of internal and external 10 GbE ports and Omni Ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port).
– 00D5845 (Upgrade 1) upgrades the base switch by activating 14 internal 10 GbE ports and two external 40 GbE ports, which is equivalent to adding 22 more 10 GbE port licenses for a total of 44x 10 GbE port licenses. Any combination of internal and external 10 GbE ports and Omni Ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base switch.
– 00D5847 (Upgrade 2) upgrades the base switch by activating 14 internal 10 GbE ports and six external Omni Ports, which is equivalent to adding 20 more 10 GbE port licenses for a total of 42x 10 GbE port licenses. Any combination of internal and external 10 GbE ports and Omni Ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base switch.
– Applying both 00D5845 (Upgrade 1) and 00D5847 (Upgrade 2) simply activates all the ports on the CN4093, which is 42 internal 10 GbE ports, two external SFP+ ports, 12 external Omni Ports, and two external QSFP+ ports.
• 43. Table 2-5 lists supported port combinations with flexible port mapping.
Table 2-5 CN4093 10Gb Converged Scalable Switch part numbers and port upgrades (flexible port mapping); 10 GbE counts include internal and external 10 GbE ports and external Omni Ports, 40 GbE ports are external; feature codes are x-config / e-config
– 00D5823 (A3HH / ESW2), IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch: 22x 10 GbE/Omni + 0x 40 GbE, or 18x 10 GbE/Omni + 1x 40 GbE, or 14x 10 GbE/Omni + 2x 40 GbE
– 00D5845 (A3HL / ESU1), IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 1): 44x 10 GbE/Omni + 0x 40 GbE, or 40x 10 GbE/Omni + 1x 40 GbE, or 36x 10 GbE/Omni + 2x 40 GbE
– 00D5847 (A3HM / ESU2), IBM Flex System Fabric CN4093 Converged Scalable Switch (Upgrade 2): 42x 10 GbE/Omni + 0x 40 GbE, or 38x 10 GbE/Omni + 1x 40 GbE, or 34x 10 GbE/Omni + 2x 40 GbE
– Both Upgrade 1 and Upgrade 2 applied: 56x 10 GbE/Omni + 2x 40 GbE. Flexible port mapping is not used when both Upgrade 1 and Upgrade 2 are applied because with both upgrades all ports on the switch become licensed and there is no need to reassign ports.
The IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch has the following features and specifications: Internal ports – Forty-two internal full-duplex 10 Gigabit ports – Two internal full-duplex 1 GbE ports connected to the Chassis Management Module External ports – Two ports for 1 Gb or 10 Gb Ethernet SFP/SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ direct-attach copper (DAC) cables. SFP+ modules and DAC cables are not included and must be purchased separately. – Twelve IBM Omni Ports, each of which can operate as a 10 Gb Ethernet port (support for 10GBASE-SR, 10GBASE-LR, or 10 GbE SFP+ DAC cables) or as an auto-negotiating 4/8 Gb Fibre Channel port, depending on the SFP+ transceiver installed in the port. SFP+ modules and DAC cables are not included and must be purchased separately. (Omni Ports do not support 1 Gb SFP Ethernet transceivers.) – Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DAC cables. In addition, you can use break-out cables to break out each 40 GbE port into four 10 GbE SFP+ connections. QSFP+ modules and DAC cables are not included and must be purchased separately. – One RS-232 serial port (mini-USB connector) that provides an additional means to configure the switch module.
  • 44. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 33 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm Scalability and performance – 40 Gb Ethernet ports for more external bandwidth and performance – Fixed-speed external 10 Gb Ethernet ports to leverage 10 GbE upstream infrastructure – Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps on Ethernet ports – Media access control (MAC) address learning: automatic update, support for up to 128,000 MAC addresses – Up to 128 IP interfaces per switch – Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total external bandwidth per switch, up to 64 trunk groups, up to 16 ports per group – Support for jumbo frames (up to 9,216 bytes) – Broadcast/multicast storm control – IGMP snooping to limit flooding of IP multicast traffic – IGMP filtering to control multicast traffic for hosts participating in multicast groups – Configurable traffic distribution schemes over trunk links based on source/destination IP or MAC addresses, or both – Fast port forwarding and fast uplink convergence for rapid STP convergence Availability and redundancy – Virtual Router Redundancy Protocol (VRRP) for Layer 3 router redundancy – IEEE 802.1D STP for providing L2 redundancy – IEEE 802.1s Multiple STP (MSTP) for topology optimization; up to 32 STP instances are supported by a single switch – IEEE 802.1w Rapid STP (RSTP) provides rapid STP convergence for critical delay-sensitive traffic such as voice or video – Per-VLAN Rapid STP (PVRST) enhancements – Layer 2 Trunk Failover to support active/standby configurations of network adapter teaming on compute nodes – Hot Links provides basic link redundancy with fast recovery for network topologies that require Spanning Tree to be turned off VLAN support – Up to 4095 VLANs supported per switch, with VLAN numbers ranging from 1 to 4095 (4095 is used for management module’s connection only.) – 802.1Q VLAN tagging support on all ports – Private VLANs Security – VLAN-based, MAC-based, and IP-based access control lists (ACLs) – 802.1x port-based authentication – Multiple user IDs and passwords – User access control – Radius, TACACS+, and LDAP authentication and authorization – NIST 800-131A Encryption – Selectable encryption protocol; SHA 256 enabled as default – IPv6 ACL metering
  • 45. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 34 NIC Virtualization in IBM Flex System Fabric Solutions Quality of Service (QoS) – Support for IEEE 802.1p, IP ToS/DSCP, and ACL-based (MAC/IP source and destination addresses, VLANs) traffic classification and processing – Traffic shaping and re-marking based on defined policies – Eight Weighted Round Robin (WRR) priority queues per port for processing qualified traffic IP v4 Layer 3 functions – Host management – IP forwarding – IP filtering with ACLs; up to 896 ACLs supported – VRRP for router redundancy – Support for up to 128 static routes – Routing protocol support (RIP v1, RIP v2, OSPF v2, BGP-4); up to 2048 entries in a routing table – Support for DHCP Relay – Support for IGMP snooping and IGMP relay – Support for Protocol Independent Multicast (PIM) in Sparse Mode (PIM-SM) and Dense Mode (PIM-DM) IP v6 Layer 3 functions – IPv6 host management (except default switch management IP address) – IPv6 forwarding – Up to 128 static routes – Support for OSPF v3 routing protocol – IPv6 filtering with ACLs – Virtual Station Interface Data Base (VSIDB) support Virtualization – Virtual NIC (vNIC) • Ethernet, iSCSI, or FCoE traffic is supported on vNICs – Unified fabric port (UFP) • Ethernet or FCoE traffic is supported on UFPs • Supports up to 256 VLAN for the virtual ports (vPorts) • Integration with L2 failover – Virtual link aggregation groups (vLAGs) – 802.1Qbg Edge Virtual Bridging (EVB) is an emerging IEEE standard for allowing networks to become virtual machine (VM)-aware. • Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are mechanisms for switching between VMs on the same hypervisor. • Edge Control Protocol (ECP) is a transport protocol that operates between two peers over an IEEE 802 LAN providing reliable, in-order delivery of upper layer protocol data units. • Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows centralized configuration of network policies that will persist with the VM, independent of its location. • EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, VDP.
  • 46. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 35 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm – VMready – Switch partitioning (SPAR) • SPAR forms separate virtual switching contexts by segmenting the data plane of the switch. Data plane traffic is not shared between SPARs on the same switch. • SPAR operates as a Layer 2 broadcast network. Hosts on the same VLAN attached to a SPAR can communicate with each other and with the upstream switch. Hosts on the same VLAN but attached to different SPARs communicate through the upstream switch. • SPAR is implemented as a dedicated VLAN with a set of internal compute node ports and a single external port or link aggregation (LAG). Multiple external ports or LAGs are not allowed in SPAR. A port can be a member of only one SPAR. Converged Enhanced Ethernet – Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow control to allow the switch to pause traffic based on the 802.1p priority value in each packet’s VLAN tag. – Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth based on the 802.1p priority value in each packet’s VLAN tag. – Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities. – Multi-hop RDMA over Converged Ethernet (RoCE) with LAG support. Fibre Channel and Fibre Channel over Ethernet (FCoE) – FC-BB-5 FCoE specification compliant – Native FC Forwarder (FCF) switch operations – End-to-end FCoE support (initiator to target) – FCoE Initialization Protocol (FIP) support – FCoE Link Aggregation Group (LAG) support – Optimized FCoE to FCoE forwarding – Omni Ports support 4/8 Gb FC when FC SFPs+ are installed in these ports – Support for F_port, E_Port ISL, NP_port and VF_port FC port types – Full Fabric mode for end-to-end FCoE or FCoE gateway; NPV Gateway mode for external FC SAN attachments (support for IBM B-type, Brocade, and Cisco MDS external SANs) – Sixteen buffer credits supported – Fabric Device Management Interface (FDMI) – NPIV support – Fabric Shortest Path First (FSPF) – Port security – Fibre Channel ping, debugging – Supports 2,000 secure FCoE sessions with FIP Snooping by using Class ID ACLs – Fabric services in Full Fabric mode: • Name Server • Registered State Change Notification (RSCN) • Login services • Zoning
  • 47. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 36 NIC Virtualization in IBM Flex System Fabric Solutions Stacking – Hybrid stacking support (from two to six EN4093/EN4093R switches with two CN4093 switches) - single IP management – FCoE support • FCoE LAG on external ports – 802.1Qbg support – vNIC and UFP support • Support for UFP with 802.1Qbg Manageability – Simple Network Management Protocol (SNMP V1, V2 and V3) – HTTP browser GUI – Telnet interface for CLI – SSH – Secure FTP (sFTP) – Service Location Protocol (SLP) – Serial interface for CLI – Scriptable CLI – Firmware image update (TFTP and FTP) – Network Time Protocol (NTP) for switch clock synchronization Monitoring – Switch LEDs for external port status and switch module status indication – Remote Monitoring (RMON) agent to collect statistics and proactively monitor switch performance – Port mirroring for analyzing network traffic passing through the switch – Change tracking and remote logging with syslog feature – Support for sFLOW agent for monitoring traffic in data networks (separate sFLOW analyzer required elsewhere) – POST diagnostics For more information, see the IBM Redbooks Product Guide IBM Flex System Fabric CN4093 10Gb Converged Scalable Switch, TIPS0910, found at: http://www.redbooks.ibm.com/abstracts/tips0910.html 2.2.3 IBM Flex System Fabric SI4093 System Interconnect Module The IBM Flex System Fabric SI4093 System Interconnect Module enables simplified integration of IBM Flex System into your existing networking infrastructure and provides the capability of building simple connectivity for points of delivery (PODs) or clusters up to 252 nodes. The SI4093 requires no management for most data center environments, eliminating the need to configure each networking device or individual ports, thus reducing the number of management points. It provides a low latency, loop-free interface that does not rely upon spanning tree protocols, thus removing one of the greatest deployment and management complexities of a traditional switch. The SI4093 offers administrators a simplified deployment experience while maintaining the performance of intra-chassis connectivity.
• 48. The SI4093 System Interconnect Module is shown in Figure 2-6.
Figure 2-6 IBM Flex System Fabric SI4093 System Interconnect Module
The SI4093 module is initially licensed for 24x 10 GbE ports. Further ports can be enabled with Upgrade 1 and Upgrade 2 license options. Upgrade 1 must be applied before Upgrade 2 can be applied. Table 2-6 shows the part numbers for ordering the switches and the upgrades.
Table 2-6 SI4093 System Interconnect Module part numbers and port upgrades (default port mapping); feature codes are x-config / e-config
– 95Y3313 (A45T / ESWA): IBM Flex System Fabric SI4093 System Interconnect Module. Enables 10x external 10 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 14 internal 10 GbE, 10 external 10 GbE, 0 external 40 GbE.
– 95Y3318 (A45U / ESW8): IBM Flex System Fabric SI4093 System Interconnect Module (Upgrade 1). Adds 2x external 40 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 28 internal 10 GbE, 10 external 10 GbE, 2 external 40 GbE.
– 95Y3320 (A45V / ESW9): IBM Flex System Fabric SI4093 System Interconnect Module (Upgrade 2, requires Upgrade 1). Adds 4x external 10 GbE ports and 14x internal 10 GbE ports. Total ports enabled: 42 internal 10 GbE, 14 external 10 GbE, 2 external 40 GbE.
Flexible port mapping: With IBM Networking OS version 7.8 or later, clients have more flexibility in assigning the ports that they have licensed on the SI4093, which can help eliminate or postpone the need to purchase upgrades. While the base model and upgrades still activate specific ports, flexible port mapping provides clients with the capability of reassigning ports as needed by moving internal and external 10 GbE ports, or trading off four 10 GbE ports for the use of an external 40 GbE port. When both Upgrade 1 and Upgrade 2 are activated, flexible port mapping is no longer used because all the ports on the SI4093 are enabled.
• 49. With flexible port mapping, clients have licenses for a specific number of ports:
– 95Y3313 is the part number for the base module, and it provides 24x 10 GbE port licenses that can enable any combination of internal and external 10 GbE ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port).
– 95Y3318 (Upgrade 1) upgrades the base module by activating 14 internal 10 GbE ports and two external 40 GbE ports, which is equivalent to adding 22 more 10 GbE port licenses for a total of 46x 10 GbE port licenses. Any combination of internal and external 10 GbE ports and external 40 GbE ports (with the use of four 10 GbE port licenses per one 40 GbE port) can be enabled with this upgrade. This upgrade requires the base module.
– 95Y3320 (Upgrade 2) requires that the base module and Upgrade 1 already be activated, and it simply activates all the ports on the SI4093, which is 42 internal 10 GbE ports, 14 external SFP+ ports, and two external QSFP+ ports.
Table 2-7 lists supported port combinations with flexible port mapping.
Table 2-7 SI4093 System Interconnect Module part numbers and port upgrades (flexible port mapping); 10 GbE counts include internal and external ports, 40 GbE ports are external; feature codes are x-config / e-config
– 95Y3313 (A45T / ESWA), IBM Flex System Fabric SI4093 System Interconnect Module: 24x 10 GbE + 0x 40 GbE, or 20x 10 GbE + 1x 40 GbE, or 16x 10 GbE + 2x 40 GbE
– 95Y3318 (A45U / ESW8), IBM Flex System Fabric SI4093 System Interconnect Module (Upgrade 1): 46x 10 GbE + 0x 40 GbE, or 42x 10 GbE + 1x 40 GbE, or 38x 10 GbE + 2x 40 GbE
– 95Y3320 (A45V / ESW9), IBM Flex System Fabric SI4093 System Interconnect Module (Upgrade 2, requires Upgrade 1): 56x 10 GbE + 2x 40 GbE. Flexible port mapping is not used when both Upgrade 1 and Upgrade 2 are applied because with both upgrades all ports on the module become licensed and there is no need to reassign ports.
The SI4093 System Interconnect Module has the following features and specifications: Modes of operations – Transparent (or VLAN-agnostic) mode. In VLAN-agnostic mode (default configuration), the SI4093 transparently forwards VLAN tagged frames without filtering on the customer VLAN tag, providing an end host view to the upstream network. The interconnect module provides traffic consolidation in the chassis to minimize TOR port utilization, and it also enables compute node to compute node communication for optimum performance (for example, vMotion). It can be connected to the FCoE transit switch or FCoE gateway (FC Forwarder) device. – Local Domain (or VLAN-aware) mode. In VLAN-aware mode (optional configuration), the SI4093 provides additional security for multi-tenant environments by extending client VLAN traffic isolation to the interconnect module and its external ports. VLAN-based access control lists (ACLs)
  • 50. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 39 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm can be configured on the SI4093. When FCoE is used, the SI4093 operates as an FCoE transit switch, and it should be connected to the FCF device. – IBM Flex System Interconnect Fabric mode In Flex System Interconnect Fabric mode, the SI4093 module is running optional Interconnect Fabric software image and operates as a leaf switch in the leaf-spine fabric. Flex System Interconnect Fabric integrates the entire point of delivery (POD) into a seamless network fabric for compute node and storage under single IP management, and it attaches to the upstream data center network as a loop-free Layer 2 network fabric with a single Ethernet external connection or aggregation group to each Layer 2 upstream network. Internal ports – Forty-two internal full-duplex 10 Gigabit ports. – Two internal full-duplex 1 GbE ports connected to the chassis management module. External ports – Fourteen ports for 1 Gb or 10 Gb Ethernet SFP+ transceivers (support for 1000BASE-SX, 1000BASE-LX, 1000BASE-T, 10GBASE-SR, or 10GBASE-LR) or SFP+ direct-attach copper (DAC) cables. SFP+ modules and DAC cables are not included and must be purchased separately. – Two ports for 40 Gb Ethernet QSFP+ transceivers or QSFP+ DAC cables. QSFP+ modules and DAC cables are not included and must be purchased separately. – One RS-232 serial port (mini-USB connector) that provides an additional means to configure the interconnect module. Scalability and performance – 40 Gb Ethernet ports for extreme external bandwidth and performance. – External 10 Gb Ethernet ports to leverage 10 GbE upstream infrastructure. – Non-blocking architecture with wire-speed forwarding of traffic and aggregated throughput of 1.28 Tbps. – Media access control (MAC) address learning: automatic update, support for up to 128,000 MAC addresses. – Static and LACP (IEEE 802.3ad) link aggregation, up to 220 Gb of total external bandwidth per interconnect module. – Support for jumbo frames (up to 9,216 bytes). Availability and redundancy – Layer 2 Trunk Failover to support active/standby configurations of network adapter teaming on compute nodes. – Built-in link redundancy with loop prevention without a need for Spanning Tree protocol. VLAN support – Up to 32 VLANs supported per interconnect module SPAR partition, with VLAN numbers 1 - 4095. (4095 is used for management module’s connection only.) – 802.1Q VLAN tagging support on all ports. – Private VLANs. Note: Flexible port mapping is not available in Flex System Interconnect Fabric mode.
  • 51. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 40 NIC Virtualization in IBM Flex System Fabric Solutions Security – VLAN-based access control lists (ACLs) (VLAN-aware mode). – Multiple user IDs and passwords. – User access control. – Radius, TACACS+, and LDAP authentication and authorization. – NIST 800-131A Encryption. – Selectable encryption protocol; SHA 256 enabled as default. Quality of service (QoS) – Support for IEEE 802.1p traffic classification and processing. Virtualization – Switch Independent Virtual NIC (vNIC2). • Ethernet, iSCSI, or FCoE traffic is supported on vNICs. – Unified fabric port (UFP) • Ethernet or FCoE traffic is supported on UFPs. • Supports up to 256 VLAN for the virtual ports. • Integration with L2 failover. – 802.1Qbg Edge Virtual Bridging (EVB) is an emerging IEEE standard for allowing networks to become virtual machine (VM)-aware. • Virtual Ethernet Bridging (VEB) and Virtual Ethernet Port Aggregator (VEPA) are mechanisms for switching between VMs on the same hypervisor. • Edge Control Protocol (ECP) is a transport protocol that operates between two peers over an IEEE 802 LAN providing reliable, in-order delivery of upper layer protocol data units. • Virtual Station Interface (VSI) Discovery and Configuration Protocol (VDP) allows centralized configuration of network policies that will persist with the VM, independent of its location. • EVB Type-Length-Value (TLV) is used to discover and configure VEPA, ECP, VDP. – VMready – Switch partitioning (SPAR) • SPAR forms separate virtual switching contexts by segmenting the data plane of the module. Data plane traffic is not shared between SPARs on the same module. • SPAR operates as a Layer 2 broadcast network. Hosts on the same VLAN attached to a SPAR can communicate with each other and with the upstream switch. Hosts on the same VLAN but attached to different SPARs communicate through the upstream switch. • SPAR is implemented as a dedicated VLAN with a set of internal compute node ports and a single external port or link aggregation (LAG). Multiple external ports or LAGs are not allowed in SPAR. A port can be a member of only one SPAR. Converged Enhanced Ethernet – Priority-Based Flow Control (PFC) (IEEE 802.1Qbb) extends 802.3x standard flow control to allow the module to pause traffic based on the 802.1p priority value in each packet’s VLAN tag. – Enhanced Transmission Selection (ETS) (IEEE 802.1Qaz) provides a method for allocating link bandwidth based on the 802.1p priority value in each packet’s VLAN tag. – Data Center Bridging Capability Exchange Protocol (DCBX) (IEEE 802.1AB) allows neighboring network devices to exchange information about their capabilities.
  • 52. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 41 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm Fibre Channel over Ethernet (FCoE) – FC-BB5 FCoE specification compliant. – FCoE transit switch operations. – FCoE Initialization Protocol (FIP) support. Manageability – IPv4 and IPv6 host management. – Simple Network Management Protocol (SNMP V1, V2, and V3). – Industry standard command-line interface (IS-CLI) through Telnet, SSH, and serial port. – Secure FTP (sFTP). – Service Location Protocol (SLP). – Firmware image update (TFTP and FTP/sFTP). – Network Time Protocol (NTP) for clock synchronization. – IBM System Networking Switch Center (SNSC) support. Monitoring – LEDs for external port status and module status indication. – Change tracking and remote logging with syslog feature. – POST diagnostic tests. For more information, see IBM Flex System Fabric EN4093 and EN4093R 10Gb Scalable Switches, TIPS0864, which is available at this website: http://www.redbooks.ibm.com/abstracts/tips0864.html?Open 2.2.4 I/O modules and cables The Ethernet I/O modules support for interface modules and cables is shown in Table 2-8. Table 2-8 Modules and cables supported in Ethernet I/O modules Part number Description EN4093R CN4093 SI4093 10 Gb Ethernet SFP+ transceivers 44W4408 10GbE 850 nm Fiber SFP+ Transceiver (SR) Yes Yes Yes 46C3447 IBM SFP+ SR Transceiver Yes Yes Yes 90Y9412 IBM SFP+ LR Transceiver Yes Yes Yes 1 Gb Ethernet SFP transceivers 81Y1622 IBM SFP SX Transceiver Yes Yes Yes 81Y1618 IBM SFP RJ45 Transceiver Yes Yes Yes 90Y9424 IBM SFP LX Transceiver Yes Yes Yes 40 Gb Ethernet QSFP+ transceivers 49Y7884 IBM QSFP+ SR Transceiver Yes Yes Yes 90Y3519 10m IBM QSFP+ MTP Optical cable Yes Yes Yes 90Y3521 30m IBM QSFP+ MTP Optical cable Yes Yes Yes
  • 53. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 42 NIC Virtualization in IBM Flex System Fabric Solutions All Fabric I/O modules are restricted to the use of the SFP/SFP+/QSFP+ modules and DAC cables that are listed in Table 2-8 on page 41. 2.3 IBM Flex System Virtual Fabric adapters The IBM Flex System portfolio contains a number of Virtual Fabric I/O adapters. The cards are a combination of 10 Gb ports and advanced function support that includes converged networks and virtual NICs. The following Virtual Fabric I/O adapters are described: 2.3.1, “Embedded 10Gb Virtual Fabric Adapter” 2.3.2, “IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters” on page 44 2.3.1 Embedded 10Gb Virtual Fabric Adapter Some models of the IBM Flex System x240 and x440 Compute Nodes include an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard (LOM)) built into the system board. Each x240 model that includes the embedded 10 Gb VFA also has the Compute Node Fabric Connector installed in I/O connector 1 (and physically screwed onto the system board) to provide connectivity to the Enterprise Chassis midplane. Each x440 model that includes two embedded 10 Gb VFAs also has the Compute Node Fabric Connectors installed in each of I/O connectors 1 and 3 (and physically screwed onto the system board) to provide 10 Gb Ethernet SFP+ DAC cables 90Y9427 1m IBM Passive DAC SFP+ Cable Yes Yes Yes 00AY764 1.5m IBM Passive DAC SFP+ Cable Yes Yes Yes 00AY765 2m IBM Passive DAC SFP+ Cable Yes Yes Yes 90Y9430 3m IBM Passive DAC SFP+ Cable Yes Yes Yes 90Y9433 5m IBM Passive DAC SFP+ Cable Yes Yes Yes 00D6151 7m IBM Passive DAC SFP+ Cable Yes Yes Yes 40 Gb Ethernet QSFP+ to 4x SFP+ DAC break out cables 49Y7886 1m IBM QSFP+ DAC Break Out Cable Yes Yes Yes 49Y7887 3m IBM QSFP+ DAC Break Out Cable Yes Yes Yes 49Y7888 5m IBM QSFP+ DAC Break Out Cable Yes Yes Yes 40 Gb Ethernet QSFP+ DAC cables 49Y7890 1m IBM QSFP+-to-QSFP+ Cable Yes Yes Yes 49Y7891 3m IBM QSFP+-to-QSFP+ Cable Yes Yes Yes 00D5810 5m IBM QSFP+ to QSFP+ Cable Yes Yes Yes 00D5813 7m IBM QSFP+ to QSFP+ Cable Yes Yes Yes Part number Description EN4093R CN4093 SI4093
  • 54. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 43 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm connectivity to the Enterprise Chassis midplane. The Fabric Connector enables port 1 on the embedded 10Gb VFA to be routed to I/O module bay 1 and port 2 to be routed to I/O bay 2. Each server in the x222 compute node includes an Embedded two-port 10Gb Virtual Fabric Adapter that is built in to the system board. The x222 has one Fabric Connector (which is physically on the lower server) and the Ethernet connections from both Embedded 10 Gb VFAs are routed through it to I/O module bays 1 and 2. Table 2-9 lists the ordering information for the IBM Virtual Fabric Advanced Software Upgrade (LOM), which enables the iSCSI and FCoE support on the Embedded 10Gb Virtual Fabric Adapter. Table 2-9 Feature on Demand upgrade for FCoE and iSCSI support The IBM Flex System Embedded 10 Gb VFA has the following features and specifications: Models with Intel Xeon E5-2400, E5-2600, and 4600 processors: Emulex BladeEngine 3 (BE3) ASIC Models with Intel Xeon E5-2600 v2 processors: Emulex BladeEngine 3R (BE3R) ASIC Operates as a 2-port 1/10 Gb Ethernet adapter or supports up to eight Virtual Network Interface Controllers (virtual NICs) Supports NIC virtualization – Modes of operation: • IBM Unified Fabric Port (UFP) • IBM Virtual Fabric mode vNIC • Switch Independent mode vNIC – Virtual port bandwidth allocation in 100 Mbps increments – Up to eight virtual ports per adapter (four per port) – With the optional Advanced Upgrade, two of the eight virtual NICs (one per port) are transformed into iSCSI or FCoE HBAs Wake On LAN support FCoE and iSCSI HBA function support with the optional Advanced Upgrade PCI Express 2.0 x8 host interface Full-duplex capability DMA support PXE support IPv4/IPv6 TCP, UDP checksum offload: – Large send offload – Large receive offload – RSS – IPv4 TCP Chimney offload – TCP Segmentation offload VLAN insertion and extraction Jumbo frames up to 9000 bytes Load balancing and failover support, including AFT, SFT, ALB, and LACP Part number Feature code Description 90Y9310 A2TD IBM Virtual Fabric Advanced Software Upgrade (LOM)
  • 55. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 44 NIC Virtualization in IBM Flex System Fabric Solutions Converged Enhanced Ethernet (draft): – Enhanced Transmission Selection (ETS) (P802.1Qaz) – Priority-based Flow Control (PFC) (P802.1Qbb) – Data Center Bridging eXchange protocol (DCBX) (P802.1Qaz) Support Serial over LAN (SoL) 2.3.2 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters The IBM Flex System CN4054 and CN4054R 10Gb Virtual Fabric Adapters from Emulex are 4-port 10 Gb converged network adapters. They can scale to up to 16 virtual ports and support multiple protocols, such as Ethernet, iSCSI, and FCoE. The CN4054R adds support for compute nodes with the Intel Xeon E5-2600 v2 processors. Figure 2-7 shows the IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapters. Figure 2-7 The CN4054/CN4054R 10Gb Virtual Fabric Adapter for IBM Flex System Table 2-10 lists the ordering part numbers and feature codes. Table 2-10 IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter ordering information The IBM Flex System CN4054 and CN4054R 10Gb Virtual Fabric Adapters have the following features and specifications: CN4054: Dual-ASIC Emulex BladeEngine 3 (BE3) controller CN4054R: Dual-ASIC Emulex BladeEngine 3R (BE3R) controller Operates as a 4-port 1/10 Gb Ethernet adapter or supports up to 16 Virtual Network Interface Controllers (vNICs) Part number Feature code Description 90Y3554 A1R1 CN4054 10Gb Virtual Fabric Adapter 00Y3306 A4K2 CN4054R 10Gb Virtual Fabric Adapter 90Y3558 A1R0 CN4054 Virtual Fabric Adapter Upgrade
  • 56. Chapter 2. IBM Flex System networking architecture and Fabric portfolio 45 Draft Document for Review July 18, 2014 10:18 pm Flex System networking offerings.fm Supports NIC virtualization – Modes of operation: • IBM Unified Fabric Port (UFP) • IBM Virtual Fabric mode vNIC • Switch Independent mode vNIC – Virtual port bandwidth allocation in 100 Mbps increments – Up to 16 virtual ports per adapter (four per port) – With the optional Advanced Upgrade, four of the 16 vNICs (one per port) are transformed into iSCSI or FCoE HBAs Wake On LAN support FCoE and iSCSI HBA function support with the optional Advanced Upgrade PCI Express 3.0 x8 host interface Full-duplex capability DMA support PXE support IPv4/IPv6 TCP, UDP checksum offload: – Large send offload – Large receive offload – RSS – IPv4 TCP Chimney offload – TCP Segmentation offload VLAN insertion and extraction Jumbo frames up to 9000 bytes Load balancing and failover support, including AFT, SFT, ALB, and LACP Converged Enhanced Ethernet (draft): – Enhanced Transmission Selection (ETS) (P802.1Qaz) – Priority-based Flow Control (PFC) (P802.1Qbb) – Data Center Bridging eXchange protocol (DCBX) (P802.1Qaz) Support Serial over LAN (SoL) For more information, see IBM Flex System CN4054/CN4054R 10Gb Virtual Fabric Adapter and EN4054 4-port 10Gb Ethernet Adapter, TIPS0868, which can be found at this website: http://www.redbooks.ibm.com/abstracts/tips0868.html
  • 57. Flex System networking offerings.fm Draft Document for Review July 18, 2014 10:18 pm 46 NIC Virtualization in IBM Flex System Fabric Solutions
  • 58. © Copyright IBM Corp. 2014. All rights reserved. 47 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Chapter 3. NIC virtualization considerations on the switch side This paper is primarily focused on the various options to virtualize NIC technology. This section introduces the two primary types of NIC virtualization (vNIC and UFP) available on the Flex System switches, as well as introduces and discusses considerations of the various sub-elements of these virtual NIC technologies. At the core of all virtual NICs discussed in this section, is the ability to take a single physical 10 GbE NIC, and carve it up into up to four virtual NICs, for use in the attaching host. This chapter focuses on various deployment considerations when looking at making the right choice in NIC virtualization within a PureFlex System environment. The following topics are covered: 3.1, “Virtual Fabric vNIC solution capabilities” on page 48 3.2, “Unified Fabric Port feature” on page 55 3.3, “Compute node NIC to I/O module connectivity mapping” on page 61 3
  • 59. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 48 NIC Virtualization in IBM Flex System Fabric Solutions 3.1 Virtual Fabric vNIC solution capabilities Virtual Network Interface Controller (called vNIC in this paper) was the original way IBM switches provided the ability to divide a physical NIC into smaller logical NICs, so that the OS has more ways to logically connect to the infrastructure. The vNIC feature is supported only on 10 Gb ports that face the compute nodes within the chassis, and only on certain Ethernet I/O modules. These currently include the EN4093R 10Gb Scalable Switch and CN4093 10Gb Converged Scalable Switch. vNIC also requires a node adapter that also supports this functionality. As of this writing, there are two primary forms of vNIC available: Virtual Fabric mode (or switch dependent mode) and Switch Independent mode. The Virtual Fabric mode of vNIC also is subdivided into two sub-modes: Dedicated uplink vNIC mode and Shared uplink vNIC mode. All vNIC modes share the following common elements: They are supported only on 10 Gb connections. Each vNIC mode allows a NIC to be divided into up to four vNICs per physical NIC (can be less than four, but not more). They all require an adapter that has support for one or more of the vNIC modes. When vNICs are created, the default bandwidth is 2.5 Gb for each vNIC, but they can be configured to be anywhere from 100 Mb up to the full bandwidth of the NIC. The bandwidth of all configured vNICs on a physical NIC cannot exceed 10 Gb. All modes support FCoE. A summary of some of the differences and similarities of these modes is shown in Table 3-1. These differences and similarities are covered in more detail next. Table 3-1 Attributes of vNIC modes Tip: It will occasionally be seen in other documentation that these modes are called vNIC 1 (Virtual Fabric mode vNIC) and vNIC 2 (Switch Independent mode vNIC). Capability IBM Virtual Fabric mode Switch independent modeDedicated uplink Shared uplink Requires support in the I/O module Yes Yes No Requires support in the NIC/CNA Yes Yes Yes Supports adapter transmit rate control Yes Yes Yes Support I/O module transmit rate control Yes Yes No Supports changing rate without restart of node Yes Yes No Requires a dedicated uplink per vNIC group Yes No No Support for node OS-based tagging Yes No Yes Support for more than one uplink path per vNIC No No Yes
  • 60. Chapter 3. NIC virtualization considerations on the switch side 49 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm 3.1.1 Virtual Fabric mode vNIC Virtual Fabric mode vNIC depends on the switch in the I/O module bay to participate in the vNIC process. Specifically, the IBM Flex System Fabric EN4093R 10Gb Scalable Switch and the CN4093 10Gb Converged Scalable Switch support this mode. It also requires an adapter on the compute node that supports the vNIC Virtual Fabric mode feature. In Virtual Fabric mode vNIC, configuration is performed on the switch and the configuration information is communicated between the switch and the adapter so that both sides agree on and enforce bandwidth controls. The mode can be changed to different speeds at any time without rebooting the OS or the I/O module. As noted, there are two types of Virtual Fabric vNIC modes: Dedicated uplink mode and Shared uplink mode. Both modes incorporate the concept of a vNIC group on the switch that is used to associate vNICs and physical ports into virtual switches within the chassis. How these vNIC groups are used is the primary difference between dedicated uplink mode and shared uplink mode. Virtual Fabric vNIC modes share the following common attributes: They conceptually are a vNIC group that must be created on the I/O module. Similar vNICs are bundled together into common vNIC groups. Each vNIC group is treated as a virtual switch within the I/O module. Packets in one vNIC group can get only to a different vNIC group by going to an external switch/router. For the purposes of Spanning tree and packet flow, each vNIC group is treated as a unique switch by upstream connecting switches/routers. Both modes support the addition of physical NICs (pNIC) (the NICs from nodes that are not using vNIC) to vNIC groups for internal communication to other pNICs and vNICs in that vNIC group, and share any uplink that is associated with that vNIC group. Dedicated uplink mode Dedicated uplink mode is the default mode when vNIC is enabled on the I/O module. In dedicated uplink mode, each vNIC group must have its own dedicated physical or logical (aggregation) uplink. In this mode, no more than one physical or logical uplink to a vNIC group can be assigned and it assumed that high availability is achieved by some combination of aggregation on the uplink or NIC teaming on the server. In dedicated uplink mode, vNIC groups are VLAN-independent to the nodes and the rest of the network, which means that you do not need to create VLANs for each VLAN that is used by the nodes. The vNIC group takes each packet (tagged or untagged) and moves it through the switch. This mode is accomplished by the use of a form of Q-in-Q tagging. Each vNIC group is assigned some VLAN that is unique to each vNIC group. Any packet (tagged or untagged) that comes in on a downstream or upstream port in that vNIC group has a tag placed on it equal to the vNIC group VLAN. As that packet leaves the vNIC into the node or out an uplink, that tag is removed and the original tag (or no tag, depending on the original packet) is revealed. Example 3-1 on page 50 shows an example Virtual Fabric vNIC - Dedicated Uplink mode configuration. The below example enables VLAN 4091 as the Outer Q-n-Q VLAN ID on vNIC port 1 the first Index ID. By default the bandwidth configuration is set to 25% on all four index numbers equating to 100%. 
As noted above, these values can be adjusted as needed, but the total across all four indexes cannot exceed 100%.
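For example, the following is a minimal sketch of an uneven bandwidth split across the four indexes of a single physical port; port INTA1 and the 10/50/20/20 split are assumptions for illustration, using the same syntax as Example 3-1:

vnic port INTA1 index 1
 bandwidth 10
 enable
 exit
!
vnic port INTA1 index 2
 bandwidth 50
 enable
 exit
!
vnic port INTA1 index 3
 bandwidth 20
 enable
 exit
!
vnic port INTA1 index 4
 bandwidth 20
 enable
 exit

The four values total 100%, which keeps the configuration within the limit described above.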
  • 61. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 50 NIC Virtualization in IBM Flex System Fabric Solutions The previous paragraphs discussed the INT vNIC port settings, but how do they relate to the EXT port for network access? Within the vnic vnicgroup 1 configuration, one of three options can be chosen to provide network access: port (a single physical port), trunk (a static port channel), or key (an LACP (802.3ad) port channel). The failover command, also located within the vnic vnicgroup section, allows for the monitoring of an EXT port or port channel. In the event of a link failure on the EXT port or port channel, the I/O module disables all related members within that vnicgroup. Example 3-1 Virtual Fabric mode example configuration vnic enable vnic port INTA1 index 1 bandwidth 25 enable exit ! vnic vnicgroup 1 vlan 4091 enable failover member INTA1.1 port EXT1 exit In Figure 3-1 on page 51, Virtual Fabric vNIC Dedicated Uplink mode uses vNIC groups to partition the vSwitch within the ESXi host. Note that this is not specific to VMware and is supported on all Intel platform operating systems with the Emulex Virtual Fabric Adapters. In this example, vNIC Groups 1, 2, 3, and 4 utilize separate uplinks because normal VLAN traffic is transparently switched within each group using Q-in-Q. Because all traffic is transparent and is contained within its own vNIC group and I/O module, it is possible to run the same VLAN or VLANs within multiple vNIC groups and still maintain VLAN isolation. For instance, in Figure 3-1 on page 51, VLAN 20 is utilized within two separate ESXi vSwitches. However, because each vSwitch has its own physical uplink and the I/O module is running Virtual Fabric vNIC Dedicated Uplink mode, VLAN 20 remains isolated between the two vSwitches.
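Building on Example 3-1, the following is a minimal sketch of the LACP port channel uplink variant mentioned above; the external ports EXT1-EXT2, the LACP admin key of 1000, and vNIC group 2 are assumptions for illustration, not a definitive implementation:

interface port EXT1-EXT2
 lacp mode active
 lacp key 1000
 exit
!
vnic vnicgroup 2
 vlan 4092
 enable
 failover
 member INTA2.1
 key 1000
 exit

Here the vnicgroup references the LACP admin key rather than a single physical port, so the aggregated uplink is monitored as a whole by the failover function.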
  • 62. Chapter 3. NIC virtualization considerations on the switch side 51 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Virtual Fabric vNIC Dedicated Uplink mode is shown in Figure 3-1. Figure 3-1 IBM Virtual Fabric vNIC Dedicated Uplink Mode Shared Uplink mode Shared uplink mode is a global option that can be enabled on an I/O Module that has the vNIC feature enabled. As the name suggests, it allows an uplink to be shared by more than one group, which reduces the possible number of uplinks that are required. It also changes the way that the vNIC groups process packets for tagging. In Shared Uplink mode, it is expected that the servers no longer use tags. Instead, the vNIC group VLAN acts as the tag that is placed on the packet. When a server sends a packet into the vNIC group, it has a tag placed on it equal to the vNIC group VLAN and then sends it out the uplink tagged with that VLAN. Only one VLAN can be assigned to a vNIC Group. Since Shared Uplink mode is a global parameter, Dedicated Uplink mode cannot be utilized on the same I/O Module when enabled. Unlike the restrictions that both Virtual Fabric Dedicated and Shared Uplink mode contains, Unified Fabric Port (UFP) does not contain these restrictions. Example 3-2 on page 52 shows an example of Shared Uplink mode. The following parameters must be set in order for Shared Uplink mode to operate properly. Also note that most parameters below in this example are identical to the settings in Dedicated Uplink mode section above minus the vnic uplink-share command and the vlan number which in Shared Uplink mode is identical to that of the customers vlan. The default VLAN must be set on both the INT and EXT Port or PortChannel participating in the Shared Uplink vNIC mode configuration. TAGGING must be enabled on the EXT Port or PortChannel. All VLAN’s set within the vnicgroup will be TAGGED to the upstream customer network.
  • 63. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 52 NIC Virtualization in IBM Flex System Fabric Solutions Example 3-2 Virtual Fabric vNIC Shared Uplink mode example configuration vnic enable vnic uplink-share vnic port INTA1 index 1 bandwidth 25 enable exit ! vnic vnicgroup 1 vlan 100 enable failover member INTA1.1 port EXT1 exit ! In Figure 3-2 on page 53, Virtual Fabric vNIC Shared Uplink Mode uses vNIC Groups to partition the vSwitch within the ESXi Host. Note that this is not specific to VMware and is supported on all Intel Platform Operating Systems with the Emulex Virtual Fabric Adapter. In this example vNIC Group 1, 2, and 3 all share the same uplink port out of the I/O Module in order to communicate with the network. vNIC Group 4, however, utilizes a separate uplink giving flexibility and control over physical connectivity into the network. The biggest draw back to Virtual Fabric vNIC Shared Uplink Mode is the inability to apply VLANs via the operating system.
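To make the sharing explicit, the following is a minimal sketch that extends Example 3-2 with a second vNIC group on the same uplink; vNIC index 2, VLAN 200, and the reuse of EXT1 are assumptions for illustration:

vnic port INTA1 index 2
 bandwidth 25
 enable
 exit
!
vnic vnicgroup 2
 vlan 200
 enable
 failover
 member INTA1.2
 port EXT1
 exit

Both vNIC groups now place their traffic on EXT1, each tagged with its own vNIC group VLAN (100 and 200), which is what allows a single uplink to be shared.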
  • 64. Chapter 3. NIC virtualization considerations on the switch side 53 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Virtual Fabric vNIC Shared Uplink mode is shown in Figure 3-2 below. Figure 3-2 IBM Virtual Fabric vNIC Shared Uplink Mode 3.1.2 Switch Independent mode vNIC Switch Independent mode vNIC is configured only on the node, and the I/O Module is unaware of this virtualization. The I/O Module acts as a normal switch in all ways (any VLAN that must be carried through the I/O Module must be created on the I/O Module and allowed on the wanted ports). This mode is enabled at the compute node directly (via F1 setup at boot time or via FSM configuration pattern controls), and has similar rules as Virtual Fabric vNIC mode regarding how you can divide the vNIC’s. But any bandwidth settings are limited to how the node sends traffic, not how the I/O Module sends traffic back to the node (since the I/O Module is unaware of the vNIC virtualization taking place on the Compute Node). Also, the bandwidth settings cannot be changed in real time, because they require a reload of the compute node for any speed change to take effect. Switch Independent mode requires setting an LPVID value in the compute node NIC configuration, and this is a catch-all VLAN for the vNIC to which it is assigned. Any untagged packet from the OS sent to the vNIC is sent to the switch with the tag of the LPVID for that vNIC. Any tagged packet sent from the OS to the vNIC is sent to the switch with the tag set by the OS (the LPVID is ignored). Owing to this interaction, most users set the LPVID to some unused VLAN, and then tag all packets in the OS. One exception to this is for a Compute Node that needs PXE to boot the base OS. In that case, the LPVID for the vNIC that is providing the PXE service must be set for the wanted PXE VLAN.
  • 65. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 54 NIC Virtualization in IBM Flex System Fabric Solutions Because all packets that come into the I/O module from a NIC that is configured for Switch Independent mode vNIC are always tagged (by the OS, or by the LPVID setting if the OS is not tagging), all VLANs that are allowed on the port on the I/O module side should be tagged as well. This means setting the PVID/Native VLAN on the switch port to some unused VLAN, or setting it to one that is used and enabling PVID tagging to ensure the port sends and receives PVID and Native VLAN packets as tagged. In most OSs, Switch Independent mode vNIC supports as many VLANs as the OS supports. One exception is with bare metal Windows OS installations, where in Switch Independent mode, only a limited number of VLANs are supported per vNIC (a maximum of 63 VLANs, but fewer in some cases, depending on the version of Windows and which driver is in use). See the documentation for your NIC for details about any limitations for Windows and Switch Independent mode vNIC. In Figure 3-3 on page 55, Switch Independent mode is utilized to present multiple vmnic instances to the hypervisor. Each vmnic can be used to connect to its own vSwitch with multiple Port Groups. In this example, each vmnic is configured to support one or more Port Groups. Port Groups without a VLAN defined utilize the LPVID VLAN ID to communicate with the network. For instance, vmnic 0 has an untagged Port Group defined that is part of the LPVID 200 vNIC, so each VM client in that Port Group ends up on the network tagged with VLAN 200. Port Groups that do contain a VLAN tag utilize their own tag and bypass the LPVID. The same applies to the untagged Port Group connected to vmnic 2, except that those VM clients utilize the LPVID VLAN 300 to communicate with the network. The I/O module, on the other hand, sees these ports as physical 10 Gb ports utilizing traditional network VLAN and switching technology.
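As a switch-side illustration of the guidance above, the following is a minimal sketch that parks the PVID on an otherwise unused VLAN and carries the customer VLANs tagged; the port names and VLAN IDs 200, 300, and 4090 are assumptions for illustration, and VLAN 4090 is assumed to exist or to be created when the PVID is assigned:

interface port INTA1
 tagging
 pvid 4090
 exit
!
interface port EXT1
 tagging
 pvid 4090
 exit
!
vlan 200
 enable
 member INTA1
 member EXT1
!
vlan 300
 enable
 member INTA1
 member EXT1

No vNIC-specific commands are needed on the I/O module in this mode; the switch simply sees a tagged physical port.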
  • 66. Chapter 3. NIC virtualization considerations on the switch side 55 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Figure 3-3 IBM Switch Independent vNIC mode Summary of Virtual Fabric mode vNIC options In this section, we have described the various modes of vNIC. The mode that is best suited for a user depends on the user’s requirements. Virtual Fabric Dedicated Uplink mode offers the most control, and Shared Uplink mode and Switch Independent mode offer the most flexibility with uplink connectivity. 3.2 Unified Fabric Port feature Unified Fabric Port (UFP) is another approach to NIC virtualization. It is similar to Virtual Fabric vNIC but with enhanced flexibility, and it should be considered the direction for future development in the virtual NIC area for IBM switching solutions. UFP is supported today on the EN4093R 10Gb Scalable Switch, CN4093 10Gb Converged Scalable Switch, and SI4093 System Interconnect Module, and it utilizes LLDP TLVs to communicate between the physical switch port and the physical NIC within the compute node. UFP and Virtual Fabric vNIC are mutually exclusive in that you cannot enable UFP and Virtual Fabric vNIC at the same time on the same switch. UFP supports the following modes of operation per virtual NIC (vPort), and a consolidated configuration sketch is shown after this list: 3.2.1, “UFP Access and Trunk modes” on page 56 3.2.2, “UFP Tunnel mode” on page 58 3.2.3, “UFP FCoE mode” on page 59 3.2.4, “UFP Auto mode” on page 60
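As a preview, the following is a minimal sketch of a single physical port carved into four vPorts, one per typical use, similar to the layout discussed next; port INTA1 and the VLAN IDs are assumptions for illustration, each vPort keeps its default 25% minimum bandwidth, and the tunnel and FCoE modes still need the additional uplink and CEE/FIP snooping settings covered in the sections that follow:

ufp enable
ufp port INTA1 enable
!
ufp port INTA1 vport 1
 network mode access
 network default-vlan 10
 enable
 exit
!
ufp port INTA1 vport 2
 network mode fcoe
 network default-vlan 1001
 enable
 exit
!
ufp port INTA1 vport 3
 network mode access
 network default-vlan 30
 enable
 exit
!
ufp port INTA1 vport 4
 network mode tunnel
 network default-vlan 4091
 enable
 exit

Note that each vPort on the same physical port must use a different default VLAN, which is why four distinct VLAN IDs appear in the sketch.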
  • 67. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 56 NIC Virtualization in IBM Flex System Fabric Solutions IBM Unified Fabric Port utilizes vPorts to create isolation between virtual NICs within the compute node and maintains that isolation within the I/O module. vmNIC’s within the compute node are created (up to 4 per 10 GB NIC) that can be assigned to separate vSwitches or be seen as a virtual HBA within the hypervisor or bare bone operating system. In the example shown in Figure 3-4, vPort (.1) is utilized for ESXi Management for connectivity to vCenter and vPort (.3) is utilized for vMotion both of which are set to Access mode. vPort (.2) has been enabled for FCoE mode. vPort (.4), which is set to Tunnel mode, is utilized to Tunnel VM Data between the hypervisor and the upstream network. Figure 3-4 IBM Unified Fabric Port Mode 3.2.1 UFP Access and Trunk modes Access: The vPort only allows the default VLAN, which is similar to a physical port in access mode. Trunk: The vPort permits host side tagging and supports up to 32 customer-defined VLANs on each vPort. Example configuration Example 3-3 on page 57 shows one vPort configured for Access mode and another vPort, within the same physical port, configured for Trunk mode. VLAN 10 on vPort 1 is set to be an access port allowing only a single un-tagged VLAN for this vPort. VLAN 20 on vPort 2 is set to be the native VLAN for that vPort with VLAN 30 and 40 set to be tagged over that same vPort.
  • 68. Chapter 3. NIC virtualization considerations on the switch side 57 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Example 3-3 vPort Access and Trunk mode example configuration ufp port INTA1 vport 1 network mode access network default-vlan 10 enable exit ! ufp port INTA1 vport 2 network mode trunk network default-vlan 20 enable exit ! vlan 30,40 enable vmember INTA1.2 Optionally, Example 3-4 shows adding the ability to detect uplink failures referred to as failover. Failover is a feature used to monitor an up uplink port or port channel and upon detection of a failed link or port channel the I/O Module will disable any associated members (INT Ports) or vmembers (UFP vPorts). Example 3-4 UFP Failover of a vmembers failover trigger 1 mmon monitor member EXT1 failover trigger 1 mmon control vmember INTA1.1 failover trigger 1 enable Configuration validation and state of a UFP vPort While it’s easy enough to read and understanding how to configure an I/O module for UFP, there are several troubleshooting commands that can be utilized to validate the configuration and the state of a vPort as seen below in Example 3-5 on page 57. Example 3-5 below shows the results of a successfully configured vPort with UFP selected and running on the Compute Node. Example 3-5 display’s individual ufp vPort configuration and status PF_CN4093a#show ufp information vport port 3 vport 1 ------------------------------------------------------------------- vPort state evbprof mode svid defvlan deftag VLANs --------- ----- ------- ---- ---- ------- ------ --------- INTA3.1 up dis trunk 4002 10 dis 10 20 30 Below is an understanding of each of the states taking from the above Example 3-5. vPort = is the Virtual Port ID [port.vport] state = the state of the vPort (up, down or disabled) evbprof = only used when Edge Virtual Bridge Profile is being utilized, i.e. 5000v mode = vPort mode type, e.g. access, trunk, tunnel, fcoe, auto Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable command) and on the port (ufp port <port identifier> enable).
  • 69. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 58 NIC Virtualization in IBM Flex System Fabric Solutions svid = Reserved VLAN 4001-4004 for UFP vPort communication with Emulex NIC defvlan = default VLAN is the PVID/Native VLAN for that vPort (untagged) deftag = default TAG, disabled by default, allows for option to tag the defvlan VLANs = list of VLAN’s assigned to that vPort Some other useful UFP vPort troubleshooting commands can be seen below in Example 3-6. Example 3-6 display's multiple ufp vPort configuration and status PF_CN4093a(config)#show ufp information port ----------------------------------------------------------------- Alias Port state vPorts chan 1 chan 2 chan 3 chan 4 ------- ---- ----- ------ --------- --------- --------- --------- INTA1 1 dis 0 disabled disabled disabled disabled INTA2 2 dis 0 disabled disabled disabled disabled INTA3 3 ena 1 up disabled disabled disabled . . . PF_CN4093a(config)#show ufp information vport ------------------------------------------------------------------- vPort state evbprof mode svid defvlan deftag VLANs --------- ----- ------- ---- ---- ------- ------ --------- INTA1.1 dis dis tunnel 0 0 dis INTA1.2 dis dis tunnel 0 0 dis INTA1.3 dis dis tunnel 0 0 dis INTA1.4 dis dis tunnel 0 0 dis INTA2.1 dis dis tunnel 0 0 dis INTA2.2 dis dis tunnel 0 0 dis INTA2.3 dis dis tunnel 0 0 dis INTA2.4 dis dis tunnel 0 0 dis INTA3.1 up dis trunk 4002 10 dis 10 20 30 . 3.2.2 UFP Tunnel mode Q-in-Q mode, where the vPort is customer VLAN-independent (this is the closest to vNIC Virtual Fabric dedicated uplink mode). Tunnel mode is the default mode for a vPort. Example configuration Example 3-7 on page 59 shows port INTA1 vPort 3 configured in Tunnel mode, Q-n-Q, which can carry multiple VLANs through a single outer tagged VLAN ID. In this example we are using VLAN 4091 as the Tunnel VLAN. When configuring UFP Tunnel mode at least one EXT port must be configured to support the Outer VLAN ID as seen in the below example. Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable command) and on the port (ufp port <port identifier> enable).
  • 70. Chapter 3. NIC virtualization considerations on the switch side 59 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Example 3-7 vPort Tunnel mode example configuration ufp port INTA1 vport 3 network mode tunnel network default-vlan 4091 enable exit ! interface port EXT1 tagpvid-ingress pvid 4091 exit Configuration validation and state of a UFP vPort - Tunnel mode While it’s easy enough to read and understanding how to configure an I/O module for UFP, there are several troubleshooting commands that can be utilized to validate the configuration and the state of a vPort. (See Example 3-6 on page 58.) 3.2.3 UFP FCoE mode UFP FCoE mode dedicates the specific vPort (vPort 2 only) for FCoE traffic when enabled within the UEFI. See Chapter 4, “NIC virtualization considerations on the server side” on page 65 on how to enable FCoE within a compute node. Example configuration Example 3-8 on page 59 shows vPort 2 set in FCoE mode utilizing VLAN 1001. QoS minimum bandwidth is set to 50% of a 10 GbE port with the default max burst set of 100%. Example 3-8 vPort FCoE Mode example configuration ufp port INTA1 vport 2 network mode fcoe network default-vlan 1001 qos bandwidth min 50 enable exit Configuration validation and state of a UFP vPort While it’s easy enough to read and understanding how to configure an I/O module for UFP, there are several troubleshooting commands that can be utilized to validate the configuration and the state of a vPort. (See Example 3-6 on page 58). Note: This is only the vPort setting required to carry FCoE. CEE, FCoE FIPS Snooping and other settings are required to be enabled that can be seen in Chapter 5, “Flex System NIC virtulization deployment scenarios” on page 123. Note: Before configuring vPort mode, UFP must be enabled globally (ufp enable command) and on the port (ufp port <port identifier> enable).
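As a reminder of the note above, the following is a minimal sketch of the global settings that typically accompany an FCoE vPort on these modules; the FCoE VLAN 1001 matches Example 3-8, and the complete deployment steps are shown in Chapter 5:

cee enable
fcoe fips enable
!
vlan 1001
 enable

With CEE enabled, DCBX, PFC, and ETS negotiate the lossless behavior that FCoE requires, and FIP snooping keeps the FCoE sessions restricted to the proper FCF.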
  • 71. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 60 NIC Virtualization in IBM Flex System Fabric Solutions 3.2.4 UFP Auto mode The UFP vPort Auto mode feature is based on IBM VMready and IEEE 802.1Qbg implementations. IBM VMready and IEEE 802.1Qbg Edge Virtual Bridging are software solutions that supports open standards virtualization. They allow administrators to create groups of virtual machine port groups allowing the ability to administer and migrate from a central location. VMready works with all major hypervisor software, including VMware, Microsoft Hyper-V, Linux Kernel-based Virtual Machine (KVM) or, Citrix XenServer. Although IBM PowerVM® is supported with VMready, UFP is specific to x86-based compute nodes. It requires no proprietary tagging or changes to the hypervisor software. UFP vPort Auto Mode works to dynamically create and remove VLANs learned from the vPort. When a VLAN is created and added to a vPort that same VLAN ID is also added to the Uplink associated with that vPort. This, however, can be intrusive to a network if having more than one uplink path out of a Switch, not a PortChannel, to a single destination running the same VLAN. Caution should be taken when implementing VMready. More information can be found on implementing VMready within the following IBM Redbooks publication: http://www.redbooks.ibm.com/abstracts/sg247985.html 3.2.5 UFP vPort considerations The following rules and attributes are associated with UFP vPorts They are supported only on 10 Gb internal interfaces. UFP allows a NIC to be divided into up to four virtual NICs called vPorts per physical NIC (can be less than 4, but not more). Each vPort can be set for a different mode or same mode (with the exception of the FCoE mode, which is limited only to a single vPort on a UFP port, and specifically only vPort 2). UFP requires the proper support in the Compute Node for any port using UFP. By default, each vPort is ensured 2.5 Gb and can burst up to the full 10G if other vPorts do not need the bandwidth. The ensured minimum bandwidth and maximum bandwidth for each vPort are configurable. The minimum bandwidth settings of all configured vPorts on a physical NIC cannot exceed 10 Gb. Each vPort must have a default VLAN assigned. This default VLAN is used for different purposes in different modes. This default VLAN must be unique across the other three vPorts for this physical port, which means that vPort 1.1 must have a different default VLAN assigned than vPort 1.2, 1.3 or 1.4. When in trunk or access mode, this default VLAN is untagged by default, but it can be configured for tagging if desired. This configuration is similar to tagging the native or PVID VLAN on a physical port. In tunnel mode, the default VLAN is the outer tag for the Q-in-Q tunnel through the switch and is not seen by the end hosts and upstream network. vPort 2 is the only vPort that supports the FCoE setting. vPort 2 can also be used for other modes (for example, access, trunk or tunnel). However, if you want the physical port to support FCoE, this function can only be defined on vPort 2
  • 72. Chapter 3. NIC virtualization considerations on the switch side 61 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm The physical port must be set to VLAN 1 as the PVID with tagging enabled and no other VLANs defined for that port. Table 3-2 offers some checkpoints to help select a UFP mode. Table 3-2 Attributes of UFP modes Summary of whether Virtual Fabric or UFP should be considered What are some of the criteria to decide whether a UFP or vNIC solution should be implemented to provide the virtual NIC capability? In an environment that has not standardized on any specific virtual NIC technology, UFP is the way to go. As noted, all future virtual NIC development will be on UFP. UFP has the advantage of being able to emulate the vNIC Virtual Fabric modes (via tunnel mode for dedicated uplink vNIC and access mode for shared uplink vNIC), but it can also offer virtual NIC support with customer VLAN awareness (trunk mode) and shared virtual group uplinks for access and trunk mode vPorts. If an environment has already standardized on Virtual Fabric mode vNIC and plans to stay with it, Virtual Fabric mode vNIC is recommended. Note that Switch Independent mode vNIC is actually outside of the above decision-making process. Switch Independent mode has its own unique attributes, one being that it is truly switch independent, which allows a user to configure the switch without restrictions related to the virtual NIC technology, other than allowing the proper VLANs. UFP and Virtual Fabric mode vNIC each have a number of unique switch-side requirements and configurations. The downside to Switch Independent mode vNIC is the inability to make changes to the vNICs without first rebooting the server, and the lack of support for bidirectional bandwidth allocation. 3.3 Compute node NIC to I/O module connectivity mapping Port mapping between VFA NICs and I/O module slots is often misunderstood and confusing to explain. Different mezzanine card options can connect to the I/O module slots in similar or completely different ways, depending on the number of ports and the number of ports per ASIC. One thing is always the same: each mezzanine slot consists of four lanes. Each lane can drive either 1 Gb or 10 Gb Ethernet speeds. In total, a single mezzanine slot is capable of driving up to 40 Gb of Ethernet bandwidth to each I/O module. Capability IBM UFP vPort mode options Access Trunk Tunnel FCoE Support for a single untagged VLAN on the vPorta Yes Yes Yes No Support for VLAN restrictions on vPortb Yes Yes No Yes VLAN-independent pass-through for customer VLANs No No Yes No Support for FCoE on vPort No No No Yes Support to carry more than 256 VLANs on a vPort No No Yes No a. Typically a user sets the vPort for access mode if the OS uses this vPort as a simple untagged link. Both trunk and tunnel mode can also support this, but are not necessary to carry only a single untagged VLAN. b. Access and FCoE mode restrict VLANs to only the default VLAN that is set on the vPort. Trunk mode restricts VLANs to ones that are specifically allowed per VLAN on the switch (up to 32).
  • 73. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 62 NIC Virtualization in IBM Flex System Fabric Solutions 3.3.1 Embedded 10 Gb VFA (LOM) - Mezzanine 1 Figure 3-5 shows an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM), specifically for the x86 compute nodes that can be replaced with another option card by removing the riser card from Mezzanine Slot 1. The 2-port LoM types are capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. The dual-port LoM consists of a single ASIC with two ports of 10 GbE that has physical direct wiring through the midplane to the I/O Module Slot 1 and 2 for port redundancy. Figure 3-5 2 port LoM 10G VFA Mezz 1 connectivity to I/O Modules 1 and 2 3.3.2 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 Figure 3-6 on page 63 shows a CN4054 4 port 10Gb Virtual Fabric Adapter specifically for the x86 compute nodes that can be placed into either Mezzanine Slot 1 or 2. The 4-port CNA type is capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. The four-port CNA Card consists of dual ASICs with two ports of 10 GbE each that has physical direct wiring through the midplane to the I/O Module Slot 1 and 2 for port redundancy when placed into Mezzanine Slot 1. Switch upgrade for CN4054/CN4054R: Prior to IBM Networking OS version 7.8, for EN4093R, CN4093 and SI4093 modules, you must have Upgrade 1 applied on these modules to enable network connectivity for the compute nodes with 4-port expansion cards installed. With the introduction of flexible port mapping in IBM Networking OS 7.8, if the Flex System chassis is not fully populated with the compute nodes that have four network ports, there might be no need to buy Upgrade 1.
  • 74. Chapter 3. NIC virtualization considerations on the switch side 63 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Switch side.fm Figure 3-6 4 port CN4054/R 10G VFA Mezz 1 connectivity to I/O Modules 1 and 2 3.3.3 IBM Flex System CN4054/CN4054R 10Gb VFA - Mezzanine 1 and 2 Figure 3-7 on page 63 shows two 4-port CN4054 10Gb Virtual Fabric Adapters, specifically for the x86 compute nodes, that are installed in both Mezzanine Slots 1 and 2. The 4-port CNA type is capable of pNIC, FCoE, and iSCSI operation (a license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. The four-port CNA card consists of dual ASICs with two ports of 10 GbE each, with physical direct wiring through the midplane to I/O Module Slots 1 and 2 for Mezzanine 1 and I/O Modules 3 and 4 for Mezzanine 2. This provides a highly redundant environment in which up to 80 Gb of bandwidth can be delivered to each half-width compute node. Figure 3-7 Two 4-port CN4054/CN4054R 10Gb VFA Mezz 1 and 2 connectivity to I/O Modules
  • 75. NIC virtualization considerations - Switch side.fm Draft Document for Review July 18, 2014 10:18 pm 64 NIC Virtualization in IBM Flex System Fabric Solutions 3.3.4 IBM Flex System x222 Compute Node Each server in the x222 includes an Embedded 10Gb Virtual Fabric Adapter (VFA, also known as LAN on Motherboard or LOM) built in to the system board. The x222 has one Fabric Connector (which is physically on the lower server) and the Ethernet connections from both Embedded 10 Gb VFAs are routed through it. Figure 3-8 shows how each server connects to the I/O module. Each 2-port CNA type is capable of pNIC, FCoE and iSCSI (license key may be required). The virtualization options are Virtual Fabric Mode, Switch Independent Mode and Unified Fabric Protocol. Figure 3-8 x222 Node Server connectivity to I/O Module Switch upgrade for CN4054/CN4054R: Prior to IBM Networking OS version 7.8, for EN4093R, CN4093 and SI4093 modules, you must have Upgrade 1 applied on these modules to enable network connectivity for the compute nodes with 4-port expansion cards installed. With the introduction of flexible port mapping in IBM Networking OS 7.8, if the Flex System chassis is not fully populated with the compute nodes that have four network ports, there might be no need to buy Upgrade 1. Switch upgrade for x222: Prior to IBM Networking OS version 7.8, for EN4093R, CN4093 and SI4093 modules, you must have Upgrade 1 applied on these modules to enable x222 network connectivity. With the introduction of flexible port mapping in IBM Networking OS 7.8, if the Flex System chassis is not fully populated with the x222 compute nodes that have four network ports, there might be no need to buy Upgrade 1.
  • 76. © Copyright IBM Corp. 2014. All rights reserved. 65 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Server side.fm Chapter 4. NIC virtualization considerations on the server side In 2.3, “IBM Flex System Virtual Fabric adapters” on page 42 we introduced the physical Emulex NICs that support virtual NIC functionality in the PureFlex System environment and in Chapter 3, “NIC virtualization considerations on the switch side” on page 47 we discussed the I/O module virtualization features. In this chapter we go into detail on how to enable the NIC virtualization from the server side, as well as some design considerations for utilizing these NICs within various operating systems. The following topics are covered: 4.1, “Enabling virtual NICs on the server via UEFI” on page 66 4.2, “Enabling virtual NICs via Configuration Patterns” on page 82 4.3, “Utilizing physical and virtual NICs in the OSes” on page 105 4
  • 77. NIC virtualization considerations - Server side.fm Draft Document for Review July 18, 2014 10:18 pm 66 NIC Virtualization in IBM Flex System Fabric Solutions 4.1 Enabling virtual NICs on the server via UEFI Regardless of what mode of virtual NIC is desired, all modes have at least some small element of low level configuration that must be performed on the server side. Some Emulex NICs may ship pre-configured for vNIC Virtual Fabric mode already enabled, but even those can be changed to a different mode, or have vNIC disabled all together if desired. Exactly how to enable and/or change the virtual NIC function on the Emulex NICs has varied over the years, but for the most part it can always be done via the UEFI configuration from the F1 setup on the server. It is also possible to control and automate setting virtual NIC options via certain tools, such as using Configuration Patterns in the FSM, and this will also be introduced in this section as well, but we will primarily focus on using the F1 setup method for configuring the virtual NIC on the server side. 4.1.1 Getting in to the virtual NIC configuration section of UEFI When manually performing the virtual NIC configuration on the server, it is necessary to enter UEFI via the F1 setup option during server boot. Once you are into F1 setup you need to drill into the section that permits enabling and changing the desired virtual NIC mode and perform any changes and then save those changes. Important: The steps to get into UEFI in this section assume the reader knows how to get to the console of a Compute Node. For reference, this is commonly done by connecting via browser to the IMM IP address of that host, and clicking on the Remote Control button, and the clicking on the option to start remote control in either single-user or multi-user mode.
  • 78. Chapter 4. NIC virtualization considerations on the server side 67 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Server side.fm The following are the exact steps on how to get to the virtual NIC configuration screens when utilizing version 4.6.281.26 of the Emulex firmware 1. Power on the server, and when the screen shown in Figure 4-1 is present, press the F1 key to enter in to UEFI setup. Figure 4-1 Example of screen to press the F1 key to enter UEFI setup
  • 79. NIC virtualization considerations - Server side.fm Draft Document for Review July 18, 2014 10:18 pm 68 NIC Virtualization in IBM Flex System Fabric Solutions 2. On the main System Configuration and Boot Management screen as seen in Figure 4-2 on page 68 use the arrow keys to scroll down to System Settings option and press Enter. Figure 4-2 Example of first screen viewed after pressing the F1 key to enter UEFI setup
  • 80. Chapter 4. NIC virtualization considerations on the server side 69 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Server side.fm 3. On the System Settings screen as seen in Figure 4-3, scroll down to the Network option and press Enter. Figure 4-3 Example of screen to enter network set up
  • 81. NIC virtualization considerations - Server side.fm Draft Document for Review July 18, 2014 10:18 pm 70 NIC Virtualization in IBM Flex System Fabric Solutions 4. On the Network screen, scroll down to the desired NIC and press Enter – Exactly how many NICs you see on the Network screen will vary, depending on what model NIC is installed (dual port, quad port and so on), how many of these NICs are installed (LoM only, MEZZ1 and/or MEZZ2 slots used), and if a virtual NIC mode is already enabled or not. For example, if this were a Compute Node with only the LoM dual port NIC, and no virtual NIC had previously been enabled, you would only see the two physical NICs on this screen, as seen in Figure 4-4. – If this were the same dual port NIC and virtual NIC had already been enabled, you would see between six and eight NICs on this screen (depending on if FCoE/iSCSI had also been previously enabled or not). Figure 4-4 Example of Network screen with dual port LoM, before any virtual NIC has been enabled
  • 82. Chapter 4. NIC virtualization considerations on the server side 71 Draft Document for Review July 18, 2014 10:18 pm NIC virtualization considerations - Server side.fm – Figure 4-5 shows how the Network screen might look on a dual port NIC after some form of virtual NIC had been enabled, and the system restarted. Figure 4-5 Example of Network screen after vNIC has been enabled and the system restarted – The images in Figure 4-4 on page 70 and Figure 4-5 on page 71 also illustrate an important concept, once a NIC has been placed into a virtual NIC mode and reloaded, and a user comes back into this Network screen, if it is desired to drill back into the NICs to review or change the virtual NIC settings, the two top NICs (in this example of a dual NIC solution) are the only ones that will let you make those changes. If you drill into the third through eight NICs in this list, the user will not be presented with an option to drill in to make changes to the virtual NIC settings. Only the first two NICs in the list of 8 NICs in this example will let you make those changes.
5. Once a user highlights the desired NIC in the Network screen and presses the Enter key, a screen for just that one NIC will be shown, something like what is shown in Figure 4-6.
Figure 4-6 Example of the individual NIC screen
6. On the screen shown in Figure 4-6 on page 72, highlight the NIC itself and press Enter to drill one step deeper into that NIC's configuration, which brings up a screen called Emulex NIC Selection that looks something like Figure 4-7 (it may vary depending on the firmware version of the NIC).
Figure 4-7 Example of the Emulex NIC Selection screen (virtual NIC disabled)
Some important items with regard to Figure 4-7:
– If Multichannel mode is disabled, then regardless of the Personality setting (NIC, FCoE, or iSCSI), the OS will be presented with just the physical NICs.
– If Multichannel mode is set to any form of virtual NIC mode, then the Personality setting impacts how many virtual NICs are presented to the OS.
• If NIC is selected in Personality, 4 NICs will be presented to the OS for each 10G NIC set to a form of virtual NIC.
• If FCoE or iSCSI is selected in Personality, 3 NICs will be presented to the OS for each 10G NIC set to a form of virtual NIC. An example of 3 ports on each NIC on a dual port NIC (6 ports total) can be seen in Figure 4-8 on page 74.
Figure 4-8 NICs available on dual port NIC with virtual NIC enabled, and iSCSI or FCoE Personality enabled
– The Multichannel mode setting is how the virtual NIC feature is enabled. Selecting Multichannel and pressing the Enter key should bring up a window as shown in Figure 4-9.
Figure 4-9 Emulex Multichannel (virtual NIC) mode options
The window should list these four options:
• Switch Independent Mode (this is Switch Independent Mode vNIC)
• IBM Virtual Fabric Mode (this is vNIC Virtual Fabric mode)
• IBM Unified Fabric Protocol Mode (this is UFP)
• Disable (when selected, turns off all NIC virtualization on this ASIC)
– Controller configuration is where you can make some additional changes for the vNIC modes of virtual NIC (once UFP is enabled and saved in UEFI, all remaining configuration for the UFP mode of virtual NIC is done via the I/O Module).
Important: If you do not see all three virtual NIC options (Switch Independent Mode, IBM Virtual Fabric Mode, and IBM Unified Fabric Protocol Mode), more than likely the NIC is running down-level firmware, and it should be upgraded before going any further.
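To summarize how the Multichannel and Personality settings combine to determine what the OS ultimately sees, the following minimal Python sketch models the counts described above (the function and names here are purely illustrative and are not part of any Emulex or IBM tool):

def os_visible_functions(multichannel_enabled, personality):
    """Model how many functions each 10Gb physical port presents to the OS.

    personality is one of "NIC", "FCoE", or "iSCSI".
    """
    if not multichannel_enabled:
        # Multichannel disabled: only the physical NIC is presented,
        # regardless of the Personality setting
        return {"ethernet": 1, "storage": 0}
    if personality == "NIC":
        # Virtual NIC enabled with Ethernet-only personality: 4 vNICs per port
        return {"ethernet": 4, "storage": 0}
    # FCoE or iSCSI personality: 3 Ethernet vNICs plus a storage function per port
    return {"ethernet": 3, "storage": 1}

# Example: dual port LoM (single ASIC), virtual NIC enabled, FCoE personality
per_port = os_visible_functions(True, "FCoE")
print(per_port["ethernet"] * 2)  # 6 Ethernet ports total, as seen in Figure 4-8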
4.1.2 Initially enabling virtual NIC functionality via UEFI
Starting from the Emulex NIC Selection screen, perform the following steps to select a virtual NIC mode:
1. Scroll down to the Multichannel Mode option and press Enter to see the selections shown in Figure 4-10.
Figure 4-10 Selecting a multichannel mode
2. In the screen shown in Figure 4-10, scroll to the desired virtual NIC mode and press the Enter key to enable the version of virtual NIC to be used (or disable it if the Disable option is selected).
3. What needs to happen next depends on which mode is selected:
– If Switch Independent Mode is selected, you must now go into the Controller Configuration portion of the Emulex NIC Selection screen, and set the LPVID (Logical Port VLAN Identifier) and the Bandwidth (in older firmware you also had to enable or disable each virtual NIC individually, but that is not necessary in newer firmware). See the "Special settings for vNIC Switch Independent Mode" section for details. With this mode of virtual NIC, there are no special settings that need to be performed on the I/O Modules.
– If IBM Virtual Fabric Mode is selected, you can optionally go into the Controller Configuration section and set the LPVID (as seen in the "Special settings for vNIC Virtual Fabric mode" section), but you must perform specific configuration steps on the I/O Modules to complete this mode of virtual NIC. See Chapter 5, "Flex System NIC virtualization deployment scenarios" on page 123 for details on the necessary settings on the I/O Modules to complete this configuration.
– If IBM Unified Fabric Protocol Mode is selected, no other configuration in the UEFI is permitted, but you must perform specific configuration on the I/O Modules themselves to complete this mode of virtual NIC. See Chapter 5, "Flex System NIC virtualization deployment scenarios" on page 123 for details on the necessary settings on the I/O Modules to complete this configuration.
Regardless of the mode selected, it is necessary to eventually exit out of UEFI and save the changes before any of these options take effect.
It is important to note that enabling a type of virtual NIC in the Multichannel mode section of the Emulex NIC Selection screen affects all NICs on an ASIC, not just that single NIC. If you are working with the dual port NIC (a single ASIC solution), enabling a virtual NIC mode on one NIC enables the feature on both NICs. If you are working with the 4 or 8 port Emulex NIC (both dual ASIC solutions) and want virtual NICs on all NICs, you must enable it twice, once for each ASIC (in the case of the 8 port NIC, when you enable it on a single port on an ASIC, the other 3 ports on that same ASIC are also enabled for this function). See Chapter 4 for details on ASIC NIC mapping in relationship to I/O Module connectivity.
4.1.3 Special settings for the different modes of virtual NIC via UEFI
As noted, when UFP is enabled there are no other settings necessary in UEFI, but both vNIC modes of virtual NIC have more settings that can be performed within UEFI. These extra settings are mandatory with Switch Independent Mode vNIC, and optional for Virtual Fabric Mode vNIC. The following are the extra settings for these modes.
Important: Unlike enabling the virtual NIC feature itself, which affects all ports on the same ASIC, you must complete these extra settings on a per physical port basis. So if this is a dual port NIC, once you have set and saved the first NIC, you must exit back to the Network screen, select the second physical NIC, and repeat the process.
Special settings for vNIC Switch Independent Mode
After the Multichannel Mode has been set to Switch Independent Mode, it is mandatory to scroll down to the Controller Configuration option and complete other steps to make these virtual NICs fully operational. After selecting the Controller Configuration option and pressing Enter you will be taken to a screen similar to that seen in Figure 4-11.
Figure 4-11 Example options available in Switch Independent Mode
As can be seen, the Controller Configuration screen for Switch Independent Mode offers four options:
1. View Configuration - Views the most recently saved configuration (changes that have been made but not yet saved via the Save Current Configurations option on this screen will not be seen here)
2. Configure Bandwidth - Defaults to 0G per vNIC, and must be set and saved before the vNICs become operational in the OS
3. Configure LPVID - Must be set and saved before these vNICs will become operational in the OS
4. Save Current Configuration - Configuration changes must be saved before leaving this screen or the changes will be lost
Important: One of the most common issues noted in the field is changes not being saved in this screen before exiting. Remember to always save here if any changes are made in this area. It may be a good idea, after saving changes and exiting this screen, to go back into this screen and reconfirm that the configurations for LPVID and Bandwidth were truly saved.
The following provides more details on these specific options.
Configure Bandwidth
After scrolling to the Configure Bandwidth option and pressing the Enter key, a screen similar to Figure 4-12 will be shown:
Figure 4-12 Example of Bandwidth settings in Switch Independent Mode showing default settings
Users must properly set the desired minimum and maximum bandwidths before this configuration can be saved. The following are some guidelines with regard to these Bandwidth settings:
– All values are in percentages of 10G (for example, a setting of 10 here represents 10% of 10G, meaning it is set for 1G).
– All values are between 0 and 100 in increments of 1 (1% of 10G = 100M).
– The total of all the minimums must equal 100%, or the save will not be allowed.
– The value of any given vNIC maximum must be equal to or greater than the minimum for that vNIC.
– If hard enforcement of bandwidth is desired, set the minimum and maximum values the same for each vNIC. An example of this would be setting both the minimum and maximum values all to 25, which would hard lock the values to 2.5G per vNIC.
– If it is desired to allow vNICs to use excess bandwidth not in use by other vNICs, set the maximum to a higher value than the minimum. An example of this would be setting all of the minimums to 25, and all of the maximums to some higher value, in which case each vNIC is guaranteed 25%, but can use up to its maximum percentage if other vNICs are not using their full minimum allotment.
– It is possible to set the maximum for all vNICs to 100%, meaning each vNIC is guaranteed the minimum set, but can use up to 100% of the remaining bandwidth if it is not in use by other vNICs.
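These rules lend themselves to a simple sanity check. The following hedged Python sketch (an illustration of the arithmetic only, not an Emulex or IBM utility) validates a proposed set of minimum and maximum values for the four vNICs on one physical port:

def validate_bandwidth(minimums, maximums):
    """Check Switch Independent Mode bandwidth values for one physical port.

    minimums and maximums are lists of percentages (0-100), one entry per vNIC.
    """
    if sum(minimums) != 100:
        return "Invalid: the minimum values must total exactly 100%"
    for vnic, (mn, mx) in enumerate(zip(minimums, maximums)):
        if not (0 <= mn <= 100 and 0 <= mx <= 100):
            return f"Invalid: vNIC {vnic} values must be between 0 and 100"
        if mx < mn:
            return f"Invalid: vNIC {vnic} maximum is less than its minimum"
    return "OK"

# Hard enforcement: each vNIC locked to 2.5G (25% of 10G)
print(validate_bandwidth([25, 25, 25, 25], [25, 25, 25, 25]))
# Guaranteed 25% each, allowed to borrow unused bandwidth up to 100%
print(validate_bandwidth([25, 25, 25, 25], [100, 100, 100, 100]))
# Rejected: minimums only total 90%
print(validate_bandwidth([10, 20, 30, 30], [50, 50, 50, 50]))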
Configure LPVID
After scrolling to the Configure LPVID option and pressing Enter, a screen similar to Figure 4-13 will be shown.
Figure 4-13 Example of default LPVID settings in Switch Independent Mode
The LPVID is a concept unique to the vNIC based options (both Virtual Fabric mode and Switch Independent Mode). From an end user perspective, the LPVID value can be considered the default VLAN for that vNIC. The LPVID value is only used if the OS is sending untagged packets. If the OS sends an untagged packet toward the I/O Module, that packet gets a tag equal to the LPVID for that vNIC before being sent on its way to the I/O Module (return packets have the LPVID VLAN stripped off before being sent back to the OS). If the OS sends tagged packets, the LPVID is ignored and the OS VLAN tag is passed to the upstream I/O Module unchanged (a short sketch illustrating this tagging behavior follows at the end of this section).
One side effect of this LPVID usage is that all packets coming from a host running Switch Independent Mode will be delivered to the upstream I/O Module tagged (if the OS sends an untagged packet, it is sent to the I/O Module tagged with the value of the LPVID setting for that vNIC, and if the OS sends the packet tagged, it is sent to the I/O Module with whatever tag the OS put on the packet).
The following are some guidelines with regard to the LPVID settings:
– Valid LPVID values are 2-4094.
– For Switch Independent Mode, you must set the LPVID on all vNICs before a save will be allowed (this is an optional setting in Virtual Fabric vNIC mode).
– Each vNIC on a given physical port must use a unique LPVID. In most cases, the partner NIC LPVIDs are set to the same value, but they can be set differently.
– Because all packets arrive tagged at the I/O Module, the interface on the I/O Module side must be set for tagging, and if the host needs to use the currently assigned PVID/Native VLAN on the I/O Module side, then the tag-pvid option must be configured on this interface on the I/O Module. Another solution is to set the PVID/Native VLAN on the I/O Module for this port to some unused value and not use the PVID/Native VLAN.
– If bare metal PXE boot is not required on the host, one option is to set the LPVID values to some unused VLANs, and then only send tagged packets from the OS. The same restriction from the previous bullet (all packets tagged) still applies, but the end user no longer needs to keep track of which VLANs need to be tagged in the OS and which do not (just tag them all at all times).
– If bare metal PXE boot is required, then the LPVID for the vNIC that needs to PXE boot must be set to the VLAN that the PXE packet is expected to arrive on.
Once the LPVID and Bandwidth settings are properly set, and before exiting the Controller Configuration screen, the user must perform a save. Older versions of firmware would allow a user to escape out of this screen without saving and not provide any warning. The version of firmware used during the writing of this paper (and hopefully all newer versions) puts up a warning, as seen in Figure 4-14, if the changes have not been saved.
Figure 4-14 Exiting Switch Independent Mode vNIC Controller Configuration screen without saving
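As a quick illustration of the LPVID behavior described above, the following Python sketch models which VLAN tag a Switch Independent Mode vNIC places on an outbound frame (a simplified conceptual model only; the names are not from any real driver or API):

def outbound_vlan_tag(os_tag, lpvid):
    """Return the VLAN tag the vNIC sends toward the I/O module.

    os_tag is the 802.1Q tag applied by the OS (None if the OS sends untagged);
    lpvid is the LPVID configured for this vNIC in UEFI.
    """
    if os_tag is None:
        # Untagged OS traffic is tagged with the LPVID before leaving the
        # adapter; return traffic on that VLAN is stripped back to untagged
        return lpvid
    # Tagged OS traffic passes through unchanged; the LPVID is ignored
    return os_tag

print(outbound_vlan_tag(None, 4001))  # 4001: untagged OS traffic gets the LPVID
print(outbound_vlan_tag(10, 4001))    # 10: the OS tag is preserved

Either way the frame arrives at the I/O module tagged, which is why the I/O module port must be configured for tagging (and tag-pvid if the host must use the port's PVID/Native VLAN), as noted in the guidelines above.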
Special settings for vNIC Virtual Fabric mode
When the Multichannel mode of IBM Virtual Fabric Mode vNIC is enabled, the only UEFI option is to configure the LPVID value (Bandwidth is controlled from the I/O Module). Unlike Switch Independent Mode, this setting is strictly optional. Also unlike Switch Independent Mode, it is not necessary to set the LPVID value on every vNIC in order to save the configuration. If desired, a single vNIC, some vNICs, or all vNICs can have an LPVID assigned or remain at 0 (0 meaning the vNIC passes untagged traffic untagged to the upstream I/O module), and the configuration will still be allowed. For any vNICs that do have an LPVID assigned, the operation is the same as for Switch Independent Mode (if the host sends an untagged packet, that packet is sent to the I/O Module tagged with the value of the LPVID; if the host sends a tagged packet, the LPVID is ignored and the tag the host set is sent to the I/O Module). As noted, if no LPVID value is assigned (the default for Virtual Fabric vNIC mode), any untagged packet sent from the OS is sent to the I/O Module untagged, and arrives on the Native/PVID VLAN assigned to the I/O Module port connecting to this host.
4.1.4 Setting the Emulex virtual NIC settings back to factory default
If necessary, it is possible to reset the Emulex NICs back to factory default. This not only resets all of the Bandwidth and LPVID settings, but also disables Multichannel for this ASIC, returning it to factory default. The option to perform this factory default reset can be found by scrolling to the bottom of the Emulex NIC Selection screen, selecting Erase Configuration, and pressing the Enter key. An example of the results of pressing Enter on this selection is shown in Figure 4-15 on page 82.
Important: As noted previously, after the Bandwidth and LPVID values are configured and saved on one NIC, this process must be completed for the other physical NIC of this pair (you must exit back to the Network screen, select the other NIC, drill back in to the LPVID and Bandwidth settings, and make and save the changes). This is different from the settings in the Emulex NIC Selection screen, where changes to things like Multichannel mode and Personality are carried to all NICs on the common ASIC.
Important: Until both the LPVID and Bandwidth values are properly set and saved, the vNICs will show as disconnected in the OS. Be sure to complete these operations on all Switch Independent Mode configured NICs before attempting to utilize these NICs in the OS.
Important: Regardless of whether you set any LPVID values, the Virtual Fabric mode of vNIC requires you to go into the I/O module to complete the configuration process (enable vNIC, create vNIC groups, and assign other variables). Until the I/O module step is done, the OS will report the vNIC as not connected. See Chapter 5, "Flex System NIC virtualization deployment scenarios" on page 123 for examples of configuring the I/O module side for Virtual Fabric vNIC.
Figure 4-15 Example of setting Emulex NIC back to factory default
4.2 Enabling virtual NICs via Configuration Patterns
Although the primary method used in this document for enabling virtual NICs on the server is via the UEFI F1 setup path, there are other tools available to help automate this process. This section introduces one such tool: FSM Configuration Patterns.
With certain Emulex NICs it is possible to automate the deployment of the NIC settings via the FSM. Some examples of items that can be automated via the FSM:
– Change the personality between NIC, FCoE, or iSCSI (assuming the FoDs are installed)
– Enable a desired mode of virtual NIC, or disable it
– For the vNIC modes of virtual NIC that offer other configuration options, change those options, such as LPVID or Bandwidth
Currently the Embedded 10Gb Virtual Fabric Ethernet Controller (LOM) and IBM Flex System CN4054 10Gb Virtual Fabric Adapter are supported with FSM configuration patterns.
The most important aspect of utilizing configuration patterns is the ability to push out changes to many servers, without having to perform the tedious process of manually going into F1 setup on every server on which virtual NICs need to be changed. After making any such changes with FSM Configuration Patterns, the server must be reloaded for those changes to take effect.
The process of configuring NIC settings via configuration patterns includes the following steps:
– Creating port patterns that describe the desired vNIC mode, protocols, and port settings
– Creating adapter patterns that describe adapter types and desired protocols
– Creating server patterns that describe the node configuration, including I/O adapter settings
– Deploying server patterns on x86 compute node targets
Consider the following hypothetical example. You need to configure vNIC Switch Independent mode with Ethernet only vNICs on the integrated LOM, and vNIC UFP mode on the CN4054 adapter installed in slot 2 of the x240 compute node. The first ASIC of the CN4054 adapter needs to be configured with Ethernet only vNICs, and the second ASIC requires both Ethernet and FCoE vNICs.
By default, both the LOM and CN4054 adapters are not configured with any vNICs, as shown in Figure 4-16.
Figure 4-16 Initial NIC configuration
PFA 12:0:0 and PFA 12:0:1 represent the two physical LOM ports, PFA 22:0:0 and PFA 22:0:1 represent the two physical ports on the first ASIC of the CN4054, and PFA 27:0:0 and PFA 27:0:1 represent the two physical ports on the second ASIC of the CN4054, for a total of six network ports.
Opening server configuration patterns
Perform the following steps to open server configuration patterns:
1. Launch FSM Explorer from the Home tab of the FSM interface, as shown in Figure 4-17.
Figure 4-17 Launch FSM Explorer
2. Open Configuration Patterns in the FSM Explorer interface by selecting Systems → Configuration Patterns, as shown in Figure 4-18.
Figure 4-18 Open Configuration Patterns
3. Select Server Patterns to manage server configuration patterns, as shown in Figure 4-19.
Figure 4-19 Server Patterns
Creating port patterns
In our example, we are creating three port patterns:
– vNIC switch independent mode with Ethernet only ports
– Universal fabric port (UFP) mode with Ethernet only ports
– Universal fabric port (UFP) mode with Ethernet and FCoE ports
Perform the following steps to create the desired port patterns:
1. Click the New icon and select New Port Pattern, as shown in Figure 4-20.
Figure 4-20 New Port Pattern
2. In the New Port Pattern window shown in Figure 4-21, specify the port pattern name, select the desired parameters, and click Create. In our example, we are creating switch independent vNIC mode with Ethernet only network ports. For switch independent vNIC, we must also assign bandwidth parameters and VLAN tags (the VLAN tags represent the LPVID setting as seen in F1 setup for the NICs, as shown in Figure 4-13 on page 79).
Figure 4-21 Port pattern: Configuring vNIC switch independent mode
3. Repeat steps 1 and 2 for the remaining port configurations. In our example, we are creating two more port patterns: UFP mode with Ethernet only ports, and UFP mode with Ethernet and FCoE ports, as shown in Figure 4-22 on page 88 and Figure 4-23 on page 89.
Figure 4-22 Port Pattern: Configuring vNIC UFP mode
Figure 4-23 Port Pattern: Configuring UFP mode with FCoE
4. The configured patterns are displayed in the Server Patterns window, as shown in Figure 4-24.
Figure 4-24 List of configured port patterns
Creating adapter patterns
We are creating two adapter patterns:
– vNIC switch independent mode with Ethernet only ports for the integrated LOM
– vNIC UFP mode with Ethernet only ports for the first ASIC of the CN4054, and Ethernet and FCoE ports for the second ASIC of the CN4054
Perform the following steps to create adapter patterns:
1. Select New Adapter Pattern from the New Patterns drop-down menu, as shown in Figure 4-25.
Figure 4-25 New Adapter Pattern
2. In the New Adapter Pattern window, specify the adapter pattern name, adapter type, operational mode, and protocols, as shown in Figure 4-26. We are creating the pattern for the integrated LOM in vNIC switch independent mode with Ethernet only ports. Click Create.
Figure 4-26 LOM adapter pattern settings
3. Repeat steps 1 and 2 for the remaining patterns. In our example, we are configuring the pattern for the CN4054 in UFP mode with Ethernet only ports on the first ASIC (Configuration port group 1) and Ethernet and FCoE ports on the second ASIC (Configuration port group 2), as shown in Figure 4-27. Click Create.
Figure 4-27 CN4054 adapter pattern settings
Creating a new server pattern
We are creating a new server pattern that configures the x240 compute node networking components as follows:
– The integrated LOM is set to vNIC switch independent mode with Ethernet only ports.
– The first ASIC of the CN4054 expansion card installed in slot 2 is set to UFP mode with Ethernet only ports.
– The second ASIC of the CN4054 expansion card installed in slot 2 is set to UFP mode with Ethernet and FCoE ports.
Perform the following steps to create server patterns:
1. Select New Server Pattern from the drop-down menu, as shown in Figure 4-28.
Figure 4-28 Creating a new server pattern
2. Select Create a new pattern from scratch as shown in Figure 4-29 and click Next.
Figure 4-29 Creating a new pattern from scratch
3. Specify the pattern name and form factor as shown in Figure 4-30 and click Next.
Figure 4-30 New Server Pattern Wizard: General
4. Leave Keep existing storage configuration selected as shown in Figure 4-31 and click Next.
Figure 4-31 New Server Pattern Wizard: Local Storage
5. Expand the Compute Node twistie, then click Add I/O Adapter 1 or LOM, as shown in Figure 4-32.
Figure 4-32 Adding I/O adapter 1 or LOM
6. In the Add I/O Adapter window, select the adapter type (LOM) from the adapter list as shown in Figure 4-33, then click Add.
Figure 4-33 Selecting the adapter type: LOM
7. On the next screen, select the previously configured adapter and port patterns, as shown in Figure 4-34, and click Add. In our example, we choose the vNIC Switch Independent LOM adapter pattern and the vNIC switch independent port pattern that we created earlier.
Figure 4-34 Selecting adapter and port patterns
8. From the I/O Adapters screen (see Figure 4-32 on page 96) click Add I/O Adapter 2.
9. In the Add I/O Adapter window, select the adapter type (CN4054) from the adapter list as shown in Figure 4-35, then click Add.
Figure 4-35 Selecting the adapter type: CN4054
10. On the next screen, select the previously configured adapter and port patterns, as shown in Figure 4-36. In our example, we select the previously configured vNIC UFP FCoE CN4054 adapter pattern and the vNIC UFP and vNIC UFP FCoE port patterns. Click Add.
Figure 4-36 Selecting adapter and port patterns: CN4054
11. The configured adapter settings are summarized in Figure 4-37. Click Next.
Figure 4-37 New Server Pattern Wizard: I/O Adapters summary
12. Leave Keep existing boot mode selected as shown in Figure 4-38 and click Save.
Figure 4-38 New Server Pattern Wizard: Save
13. You can see the created server pattern in the list of patterns, as shown in Figure 4-39.
Figure 4-39 Newly created server pattern
Deploying server pattern
Perform the following steps to deploy a server pattern:
1. Right-click the server pattern that you are going to deploy and select Deploy from the context menu, as shown in Figure 4-40.
Figure 4-40 Deploying server pattern
2. Select the target nodes (we selected x240_03) as shown in Figure 4-41, then click Deploy.
Figure 4-41 Selecting target compute nodes
3. Click Deploy in the confirmation window that appears. A new job is started and the confirmation is displayed as shown in Figure 4-42. Click Close.
Figure 4-42 Deployment job start confirmation
4. You can check the job status in the Jobs pod by clicking Jobs → Active and moving the mouse pointer over the job name, as shown in Figure 4-43.
Figure 4-43 Server Profile activation job status
5. Click Server Profiles on the left side of the Configuration Patterns window (see Figure 4-43 on page 101). You see the profile deployment status in the Profile column, as shown in Figure 4-44.
Figure 4-44 Profile activation status
6. When profile activation completes successfully, the profile status changes to Profile assigned, as shown in Figure 4-45.
Figure 4-45 Profile assigned
Server NICs are now configured. Now, let's have a look at what changed in the UEFI network setup.
Go to UEFI by pressing F1 during the compute node boot phase, then select System Settings → Network. Figure 4-46 and Figure 4-47 on page 103 show the vNICs configured on the LOM and the CN4054 adapter using configuration patterns.
Figure 4-46 Network Device List (Part 1)
Figure 4-47 Network Device List (Part 2)
Select Onboard PFA 12:0:0 (Integrated LOM) from the device list, press Enter two times, and verify the vNIC parameters, as shown in Figure 4-48. The LOM is configured with vNIC Switch Independent mode and the NIC personality (Ethernet only ports).
Figure 4-48 LOM vNIC configuration
Go back to the network device list by pressing Esc two times, select Slot PFA 22:0:0 (the first ASIC of the CN4054) from the device list, press Enter two times, and verify the vNIC parameters, as shown in Figure 4-49. The first ASIC is configured with vNIC UFP mode and the NIC personality (Ethernet only ports).
Figure 4-49 CN4054 vNIC configuration: First ASIC
Go back to the network device list by pressing Esc two times, select Slot PFA 27:0:0 (the second ASIC of the CN4054) from the device list, press Enter two times, and verify the vNIC parameters, as shown in Figure 4-50. The second ASIC is configured with vNIC UFP mode and the FCoE personality (Ethernet and FCoE ports).
Figure 4-50 CN4054 vNIC configuration: Second ASIC
See the following link for more details on utilizing FSM configuration patterns:
http://www.redbooks.ibm.com/abstracts/sg248060.html
4.3 Utilizing physical and virtual NICs in the OSes
Regardless of whether the user is using virtual NICs or physical NICs, most operating systems (OSes) have various ways to utilize those NICs, either as individual links or in teamed/bonded modes for better performance or high availability (or both), as shown in Figure 4-51 on page 106.
Figure 4-51 NIC teaming/bonding examples
This section provides guidance on various aspects of NIC teaming/bonding usage by the operating system.
4.3.1 Introduction to teaming/bonding on the server
The terms bonding and teaming are different words for the same thing. In general, in Linux it is referred to as bonding; in Windows and VMware it is referred to as teaming. Regardless of the term, these technologies provide a way to allow two or more NICs to appear and operate as a single logical interface, for the purpose of either high availability or increased performance (all modes of teaming/bonding provide high availability, and some modes also provide increased performance via load balancing). Each OS has its own way of providing these services, with most having native built-in support, but some older operating systems still require a third-party application to provide this functionality.
All teaming/bonding modes come in two primary types, Switch Dependent mode and Switch Independent mode, discussed here in more detail.
Switch Dependent modes of teaming/bonding
These are any teaming/bonding modes in the OS that also require a specific architecture in the connecting switches, and special configurations in those upstream switches (in other words, they are dependent on the upstream switch design and configuration to operate correctly). Some comments on these modes:
– All of these modes are some form of link aggregation, either static aggregation or dynamic aggregation (Link Aggregation Control Protocol, or LACP). Most OSes support both an LACP and a static form of teaming/bonding, and these are all forms of active/active teaming/bonding, usually load balancing traffic on a per-session basis (what constitutes a session is usually controlled by settings on each side of the device supporting this mode of teaming/bonding and is beyond the scope of this document).
– Any teaming/bonding mode that utilizes either static or LACP aggregation requires that all ports in that team/bond go to a single upstream switch, or a group of switches that can appear as a single switch to the NIC teaming (for example, switches running Cisco vPC or IBM vLAG, or stacked switches).
– Any of these modes also must have a corresponding mode of aggregation configured on the upstream I/O Modules to work properly - this is what makes them Switch Dependent.
Figure 4-52 shows some examples of Switch Dependent mode teaming/bonding and their relationship to the upstream network connections.
Figure 4-52 Examples of architectures with Switch Dependent modes of teaming/bonding
Important: Currently, using any aggregation based mode of teaming/bonding is not supported on a server if any of the virtual NIC options (Switch Independent mode, Virtual Fabric vNIC mode, or UFP) have been implemented. This is based on the current limitation that aggregation on IBM switches is on the physical port, not the logical port. An upcoming release of code should permit aggregations on UFP vPorts.
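To make the per-session load balancing of these aggregation-based modes concrete, the following Python sketch shows a generic hash-based link selection (real switches and OSes use their own configurable hash fields; this is a conceptual illustration only, not the exact algorithm of any product):

def pick_link(src_mac, dst_mac, num_links):
    """Pick one member link of an aggregation for a given conversation.

    Here the hash is simply the XOR of the last byte of the source and
    destination MAC addresses, modulo the number of links; every frame of
    the same conversation therefore lands on the same link.
    """
    src_last = int(src_mac.split(":")[-1], 16)
    dst_last = int(dst_mac.split(":")[-1], 16)
    return (src_last ^ dst_last) % num_links

# A single large file copy between two hosts always hashes to one link,
# which is why one session cannot exceed the bandwidth of a single NIC
print(pick_link("00:1a:64:11:22:01", "00:1a:64:33:44:0a", 2))

Note that each side of the aggregation runs its own hash, so outbound and return traffic for the same session may well use different member links.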
Switch Independent modes of teaming/bonding
These are teaming/bonding modes that do not require any form of aggregation to be configured on the switch and thus are not dependent on any special switch side design or configuration (just ensure all ports connecting to the team carry a common set of VLANs and any other normal switch settings the host requires). Some comments on these modes:
– Some Switch Independent modes offer simple Active/Standby NIC teaming, where only the active NIC is used, and the standby NIC comes into play only if the active NIC fails.
– All operating systems offer more advanced kinds of server side teaming that deliver Active/Active NIC usage by attempting to load balance the NICs in the team in such a way that only the server knows or cares about this load balancing (in turn, the switch side of this team/bond can load balance the return traffic based on how the host uses MACs to send traffic out).
– Attempting to configure some form of aggregation on the I/O Module ports facing the NICs in Switch Independent mode will almost always not work and will lead to issues.
Figure 4-53 shows examples of Switch Independent mode teaming/bonding and their relationship to the upstream network connections.
Figure 4-53 Examples of architectures with Switch Independent modes of teaming/bonding
Important: Figure 4-53 always shows some sort of path between the pair of upstream switches, and never two switches isolated from one another. Although that path may be directly between the upstream pair (as shown here), or may be somewhere further up in the architecture, it must be present to ensure a failover path between points in the event of a path fault. See the section titled "The need for end to end paths between NICs in a team" later in this chapter for more detail.
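By contrast, the following sketch models the server-side load balancing described above for a switch independent active/active team (for example, the default ESX policy of routing based on the originating virtual port ID): the server pins each VM or MAC to one NIC, and the I/O module simply learns that MAC on whichever uplink the server chose. Again, this is an illustration only, not vendor code:

def assign_uplink(vm_id, num_nics):
    """Pin each VM (or source MAC) to one physical NIC in the team."""
    return vm_id % num_nics

# Four VMs spread across a two-NIC team; no aggregation is configured on the
# I/O modules, and each module returns a VM's traffic on the NIC where it
# learned that VM's MAC address
for vm in range(4):
    print(f"VM{vm} -> NIC{assign_uplink(vm, 2)}")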
Understanding the terms Active and Standby with teaming/bonding
The use of the phrases Active/Standby, Active/Passive, Active/Backup, and Active/Active can occasionally be misunderstood and confusing. This section attempts to clarify these terms.
Active/Standby, Active/Passive, and Active/Backup
These are all different names for the same thing: one NIC in a team/bond is selected to be active (passing traffic), and the other NIC is put into a standby state (not passing traffic), used only in the event the active NIC goes down. In some cases the team/bond might have multiple active NICs and only a single standby NIC, or the reverse (one active NIC and multiple standby NICs), the point being that one or more NICs in this mode are unused for any traffic until needed.
Most users understand the operation of these modes of teaming, but there is occasionally some confusion in the context of how the connecting I/O modules are utilized. The I/O modules themselves are not in any sort of special Active/Standby configuration. I/O modules supporting servers running Active/Standby will both be active, simply forwarding traffic as it is received from the server, following the rules of that I/O module (usually Layer 2 switching based on MAC addresses). So the I/O modules are not in any sort of Active/Standby mode and depend on the servers to decide which I/O module to utilize (based on the NIC selected as active in the OS team/bond).
Since the server admin can control which NICs are active or standby, it is possible to configure some servers to use a NIC going to I/O module bay 1 as the active NIC, and other servers in the same Flex System chassis to use a NIC pointing to the other I/O module in bay 2, and in doing so achieve some form of load balancing (albeit a chassis-based form of load balancing). For example, the server admin could configure half of the servers to utilize the NIC going to I/O module bay 1 as the active NIC, and the other servers to use the NIC going to I/O module bay 2 as the active NIC. One possible downside of this type of Active/Standby per-chassis load balancing is that any server within this Flex System chassis that is using I/O module bay 1, and has to talk to another server in the same chassis using bay 2 as the active path, usually must have that traffic travel to the upstream network and back down to get between the two I/O bays and their associated active server NICs.
Overall, these Active/Standby modes tend to be the simplest to implement, and require no special switch side configuration, but they provide only high availability (no load balancing for a single server), and thus are wasteful of the overall bandwidth available to a given server.
Active/Active
While most agree on the meaning of the phrase Active/Standby, the phrase Active/Active is frequently a point of contention when parties do not define what the term Active means. In this document, the term active means the OS is free to actively use the NIC in any way that agrees with the teaming/bonding mode selected in the OS, and does not leave that NIC in some sort of standby mode. Within the term Active/Active, there are both Switch Dependent modes and Switch Independent modes of teaming/bonding.
Important: The term Switch Independent has been used in this document in relation to a form of virtual NIC that operates independently of the I/O Module, and now as a mode of teaming/bonding on the server that is also independent of the I/O Module. Although they are both independent of the I/O Module, other than the name and this independence, they are unrelated features.
The following are some comments on Switch Dependent modes of Active/Active and some examples of these modes:
– Like all Switch Dependent modes of teaming/bonding, any of these Active/Active modes use some form of aggregation and require an accompanying upstream network architecture and I/O Module configuration to support this aggregation on the server side NICs. Today these aggregation modes are exclusively either LACP or static aggregation.
– These modes use the aggregation hash algorithm to determine which NIC is used for a session of traffic, and a session of traffic may be based on MAC address, IP address, and/or other components of the packets being transferred.
– The outbound path used for this mode of teaming/bonding is decoupled from the return traffic, in that each side of the aggregation decides on its own hash which NIC to use for a given session.
– These modes provide a higher chance of better overall load balance, but do not guarantee any load balancing. For example, if all traffic of a given session is between just two hosts (for instance, a large file copy from one host to another), that traffic will generally only use a single NIC in the team. The return traffic will use whatever link the switch side hash selects, but will also only pick a single link for this single session. This means that for a given session, a sending device can only utilize the bandwidth of a single NIC in the team.
– As noted previously, these aggregation based modes of teaming/bonding are not supported today when using any of the virtual NIC features available from the Emulex NIC. That means that if the server has been configured for UFP, VF mode vNIC, or Switch Independent mode vNIC, these teaming/bonding modes should not be implemented.
Some examples of Active/Active teaming modes in this category for various OSes are:
– Linux: Bonding mode 2 – Static aggregation
– Linux: Bonding mode 4 – LACP aggregation
– ESX vSwitch teaming mode Route based on IP hash – Static aggregation
– ESX dvSwitch teaming mode Route based on IP hash – Static or LACP, depending on whether LACP is enabled or disabled in the dvSwitch
The following are some comments on Switch Independent modes of Active/Active, along with some examples:
– Like all Switch Independent modes of teaming/bonding, there are no special switch side architectures or configurations, and the switch should not be configured for any form of aggregation.
– These modes use some server side decision making process to select which NIC to use for which session. In this case, a session is often all traffic from a given VM, or a given process in a bare metal OS, or a destination IP or MAC, and so on. The point being that the server decides how it will load balance the traffic over the NICs.
– The outbound path used for this mode of teaming/bonding is usually not decoupled from the inbound traffic, in that in most cases, whatever NIC is used to send outgoing traffic from the host, the switch side will use the same NIC/link for any return traffic (the switch bases its decision on the MAC learned when the host sent a packet, using that MAC to return the traffic on the link where it was learned).
– These modes can provide quite satisfactory load balancing, are not dependent on having a specific switch architecture or configuration above the host as the Switch Dependent modes are, and are available in all major operating systems.
– Unlike the aggregation based modes of active/active teaming/bonding, these switch independent modes of active/active teaming/bonding work fine with any of the virtual NIC functions available in the Emulex adapters.
Some examples of Active/Active Switch Independent modes of teaming/bonding in various OSes are:
– Linux: Bonding mode 5 – Adaptive transmit load balance
– Linux: Bonding mode 6 – Adaptive load balance
– ESX vSwitch teaming mode Route based on originating virtual port ID
– ESX dvSwitch teaming mode Route based on source MAC hash
In general, the Switch Dependent modes of Active/Active bonding/teaming have a greater potential (but no guarantee) of better overall load balance in the team/bond, but have added complexity, only support certain upstream network architectures, and require the server team to coordinate with the network team to match the aggregation configurations correctly. The Switch Independent modes of Active/Active do not require any special upstream architecture or switch configuration, and can be completely controlled and configured from the server side, with no need for the server team to coordinate with the network team (except for, of course, which VLANs to utilize and how (tagged or untagged), which is always necessary, with or without any sort of teaming/bonding).
Link and path fault detection in teaming/bonding
All teaming/bonding solutions need a way to know if a NIC in the team/bond is available for use. Most use simple link up/down as the primary method. Some add a layer beyond simple link up/down to attempt to detect remote failures beyond the direct link (upstream path failures). In general, most of these remote fault methods use some sort of ARP, ping, or probe packet to determine if the path to the other NIC or some upstream device is available, and if not, take that NIC out of service. Some examples of non-link fault detection technologies:
– Linux ARP monitoring
– VMware Beacon probing
– Broadcom LiveLink (third party teaming tool)
All of these remote fault methods have their limitations and can be prone to false positives (reporting a NIC unavailable when it can still service packets). Some examples of issues with these remote fault detection methods:
– In a large data center, potentially thousands of hosts using Linux ARP monitoring and constantly ARPing the default gateway could eventually become (or at least be perceived as) a denial of service attack on the default gateway.
– If Beacon probing in ESX is used on a two-NIC team and it fails with both NICs still in an up state (for example, a path fault not directly at the host, but somewhere in the upstream L2 network), it will not know which NIC is having a path issue and will begin to blast all packets out both ports, potentially overloading the network and creating new issues (owing to this, VMware does not recommend using Beacon probing with two NIC teams, but it will let you configure it on a two NIC team).
Rather than using any of these OS-based remote fault detection methods, it is usually preferred to utilize the Failover feature of IBM switches. Other vendors often support a similar failover feature, such as Cisco's Link State Tracking. See Chapter 5, "Flex System NIC virtualization deployment scenarios" on page 123 for some examples of Failover configurations in a PureFlex System environment.
The need for end to end paths between NICs in a team
For teaming to work properly, there must be an end to end layer 2 path between the two (or more) NICs in the team.
In other words, if you have a pair of teamed NICs and a host needs to use VLAN 10, then VLAN 10 must be carried to both NICs, and that VLAN 10 must have an external path in the upstream network to connect these two NICs together. This is required for failover and, in some configurations, for load balancing and normal traffic, and it is true regardless of teaming type (switch dependent or switch independent modes). This also has implications when using multi-switch aggregations (that is, vPC or vLAG).
In a typical vLAG/vPC environment, a user might have a pair of enclosure switches running a vLAG aggregation toward the upstream network. Since the upstream switch thinks this pair of enclosure based switches is one switch, a host on the enclosure might send a packet that goes up on a port on one enclosure switch, but the response comes down on a port on the other enclosure switch (based on the other side's load balancing of transmitted packets). Owing to this, you must ensure that not only is that VLAN carried on all ports to the server team, and all ports to the upstream aggregation, but it is also carried on the ISL links of the vLAG/vPC. If this is a switch dependent mode of teaming (that is, aggregation), this VLAN on the ISL is needed in the event of failover. If this is a switch independent mode of teaming, then this VLAN on the ISL is required for both failover and normal communications.
4.3.2 OS side teaming/bonding and upstream network requirements
This section looks at the most common NIC teaming and bonding modes for various OSes and relates them to requirements for the upstream connecting network.
Linux bonding
Linux bonding has evolved over the years to become easier to deploy and more robust. This section discusses the various modes of bonding available on most Linux implementations. Most flavors of Linux today come with the bonding module prepackaged, but some versions still require it to be installed before bonding can be implemented. Linux offers many different modes of bonding, and not all modes of bonding exist in all flavors of Linux, but most implementations of Linux support bonding modes 0 through 6, which are discussed here.
Linux bonding offers two primary ways to determine if a link is available for server use:
– mii-mon – simple link status up/down; this is the default for bonding
– ARP monitor – sends an ARP packet to a specified device and expects a response
There are some helpful documents available on the web that explain bonding, but it is important to note that much of the Linux bonding documentation has been written by server admins, not network admins, so some of the terms used in these documents and help files can be confusing to a network admin. One of the better places to learn about Linux bonding is the following link:
https://www.kernel.org/doc/Documentation/networking/bonding.txt
Table 4-1 provides a cross reference between the Linux OS side modes of bonding and their associated switch side requirements.
Table 4-1 Linux bonding modes and their associated switch side dependencies, if any
– Mode 0 – Round Robin Transmit (balance-rr): transmit load balance per packet. Type D; switch side: static aggregation, with transmit load balance based on the hash setting of the switch.
– Mode 1 – Active/Standby: no load balancing, just fault tolerance. Type I; switch side: no aggregation, no load balancing of traffic.
– Mode 2 – XOR of hash (balance-xor): transmit load balance based on the setting of xmit_hash_policy, per-session transmit load balance. Type D; switch side: static aggregation, with transmit load balance based on the hash setting of the switch.
– Mode 3 – Broadcast: transmits everything out all member interfaces; no load balancing, just fault tolerance. Type D; switch side: static aggregation, with transmit load balance based on the hash setting of the switch (can work without switch side aggregation support; see the note below).
– Mode 4 – LACP (802.3ad): transmit load balance based on the setting of xmit_hash_policy, per-session transmit load balance. Type D; switch side: LACP aggregation, with transmit load balance based on the hash setting of the switch.
– Mode 5 – Adaptive Transmit Load Balance (balance-tlb): transmits based on the current load of the NICs in the bond. Type I; switch side: no aggregation. According to the Linux documentation, return traffic is not load balanced (it goes only to the slave NIC).
– Mode 6 – Adaptive Load Balance (balance-alb): per-session transmit load balance. Type I; switch side: no aggregation; return traffic to the host is load balanced based on the MAC usage of the host side.
Some comments on Table 4-1:
– Type I = a Switch Independent mode of bonding
– Type D = a Switch Dependent mode of bonding
– Mode 0 may lead to out of order packet reception on the receiving device (this mode is usually only used in some very specific environments, for example, where out of order packet reception is not an issue)
– Mode 2 is most aligned with the policies of typical static aggregation on a switch
– Mode 3 duplicates all packets on each port (this is not a common selection and is rarely utilized). It could also potentially be used without static aggregation, if each NIC in the bond went to different physical networks or devices upstream
– Mode 4 is aligned with the policies of LACP aggregation on a switch
– Modes 1, 5, and 6 do not require any sort of aggregation configured on the switch side
VMware ESX teaming
VMware ESX offers teaming on its virtual switches, including both the stand alone vSwitch and the distributed vSwitch (dvSwitch). The forms of teaming available vary slightly between an ESX stand alone vSwitch and the distributed dvSwitch, with the stand alone vSwitch offering the following four options:
– Route based on originating virtual port ID (this is the default; load balances on a per-VM basis)
– Route based on IP hash (this is a static aggregation)
– Route based on source MAC hash (similar to the default)
– Use explicit failover order (high availability only, no load balancing)
The dvSwitch offers some of the same modes, but with more options. The following is the list of teaming options available on the dvSwitch:
– Route based on originating virtual port (same as the stand alone vSwitch)
– Route based on IP hash, defaulting to static aggregation (same as the stand alone vSwitch)
– Route based on IP hash, optionally configured for LACP
– Route based on source MAC hash (same as the stand alone vSwitch)
– Route based on physical NIC load (attempts to take into account the load on each NIC as NICs are allocated to the VMs)
– Use explicit failover order (same as the stand alone vSwitch)
VMware offers two modes of detecting when a path is down:
– Link Status – simple link up/link down; this is the default
– Beacon Probing – only useful in vSwitches/dvSwitches with more than two NICs. Do not use Beacon Probing on vSwitches/dvSwitches with only two NICs. If the upstream switch offers a failover option (as all of the 4093 models do), it is encouraged to use that over Beacon Probing.
An older document that does a very good job of explaining VMware ESX networking and the kinds of teams supported can be found at the following link (it does not include the modes available in the dvSwitch):
http://www.vmware.com/files/pdf/virtual_networking_concepts.pdf
Some good information specific to the dvSwitch can be found at the following link:
http://www.vmware.com/files/pdf/vsphere-vnetwork-ds-migration-configuration-wp.pdf
Table 4-2 provides a cross-reference between the VMware ESX teaming modes and their associated switch-side requirements.

Table 4-2   VMware teaming modes and their associated switch-side dependencies, if any

Teaming mode | VMware-side comments | Type | Switch-side aggregation | Switch-side comments
Route based on originating virtual port ID | Load balances the NICs in the vSwitch on a per-VM basis; this is the default teaming mode for an ESX vSwitch | I | None | Load balances return traffic to the host based on the MAC usage of the host side
Route based on IP hash | A static aggregation on the ESX links in the standalone vSwitch. When used on a dvSwitch port group with the uplinks configured for LACP, this is an LACP aggregation (see below) | D | Static | Transmit load balance based on the hash setting of the switch
Route based on source MAC hash | Similar to the default teaming mode (per-VM), except that the outbound NIC is selected by source MAC rather than by originating virtual port ID | I | None | Load balances return traffic to the host based on the MAC usage of the host side
Use explicit failover order | Always uses the highest-order uplink from the list of active adapters that is up; no load balancing | I | None | No load balancing
LACP | Only available on the distributed vSwitch, and can only be configured from the vSphere Web Client (not the traditional vSphere Client). When configured, all port groups using this uplink pair must be set to Route based on IP hash | D | LACP | Transmit load balance based on the hash setting of the switch
Route based on physical NIC load | Chooses the path based on physical NIC load; only available on the distributed vSwitch (dvSwitch) | I | None | Load balances return traffic to the host based on the MAC usage of the host side

Some comments on Table 4-2:
- Type I = a switch-independent mode of teaming
- Type D = a switch-dependent mode of teaming
- When NICs are added to a vSwitch, they can be assigned active or standby roles independently of the teaming mode selected.
- vSwitch teaming modes can be overridden by vSwitch port group teaming settings.

Windows Server teaming

Teaming in a Windows Server environment can be quite varied. For Windows Server 2008 and 2003, teaming was provided only by a third-party application supplied by the NIC vendor. Starting with Windows Server 2012, there is a choice of using either a vendor's third-party application or the teaming built into Windows Server 2012. For Windows versions with native teaming (2012 and later), it is usually best to use the built-in teaming, and only install a third-party vendor tool if a specific feature is needed that the built-in teaming does not provide.
Teaming using the native modes available in Windows Server 2012

As noted, Windows Server 2012 offers built-in NIC teaming, also referred to as LBFO (Load Balancing/Failover) in some of the Microsoft documentation. Microsoft refers to its teaming options as either switch-independent mode or switch-dependent mode, with the same meanings we have been applying in this chapter. When selecting the teaming mode in Windows Server 2012, the user is presented with three options:

- Static Teaming: also referred to as generic aggregation in some Microsoft documentation. This represents a static aggregation and is switch dependent, requiring a static aggregation to be configured on the switch.
- Switch Independent: as the name implies, a switch-independent mode of teaming (no aggregation configuration is needed on the switch). How it utilizes the NICs for load balancing is a separate setting.
- LACP: also referred to as 802.1AX in some Microsoft documentation (802.1AX being the current IEEE standard for LACP, replacing the older 802.3ad standard). This is a switch-dependent mode of teaming that requires LACP to be configured on the upstream switch.

Separate from the teaming mode, the user then selects a load balancing method. In the initial version of Windows Server 2012, two load balancing options existed: Address Hash and Hyper-V Port. Address Hash uses information from the addresses in the packets to determine the load balance. Hyper-V Port attempts to load balance on a per-vPort basis (not related to the term vPort as used in IBM UFP virtual NIC settings). As of Windows Server 2012 R2, Microsoft added a third load balancing option, Dynamic, which also factors in NIC utilization to distribute the load. Details on this and other aspects of teaming load balancing for Windows Server 2012 can be found in a document available from the following location:

http://www.microsoft.com/en-us/download/confirmation.aspx?id=40319

As noted, Windows Server 2012 also still allows third-party NIC vendor teaming applications, but Microsoft strongly recommends never running two teaming solutions (built-in Windows teaming and third-party vendor teaming) at the same time on the same server. Use the built-in teaming or a third-party tool, but never both at the same time.

Another good Microsoft document explaining Windows Server 2012 NIC teaming can be found at the following link:

http://www.microsoft.com/en-us/download/details.aspx?id=30160
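The following is a small, hypothetical sketch (not from the original text) of how these choices look when driven from PowerShell with the built-in NetLbfo cmdlets; the team name and adapter names are placeholders for this environment.

# Create a switch-independent team with Hyper-V Port load balancing
New-NetLbfoTeam -Name "Team1" -TeamMembers "Ethernet 1","Ethernet 2" -TeamingMode SwitchIndependent -LoadBalancingAlgorithm HyperVPort

# For a switch-dependent LACP team instead (requires LACP on the I/O module ports):
#   New-NetLbfoTeam -Name "Team1" -TeamMembers "Ethernet 1","Ethernet 2" -TeamingMode Lacp -LoadBalancingAlgorithm Dynamic
#   (the Dynamic algorithm requires Windows Server 2012 R2)

# Verify the resulting team and its members
Get-NetLbfoTeam
Get-NetLbfoTeamMember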
Table 4-3 provides a cross-reference between the Windows Server 2012 teaming modes and their associated switch-side requirements.

Table 4-3   Windows Server 2012 teaming modes and their associated switch-side dependencies, if any

Teaming mode | Windows 2012-side comments | Type | Switch-side aggregation | Switch-side comments
Switch Independent | All load balancing is controlled by the server side. The load balancing options are set independently of the teaming mode selection: Address Hash (attempts to load balance based on the IP addressing information in the packets), Hyper-V Port (load balances the NICs on a per-VM basis), and Dynamic (R2 or later only; attempts to assign outbound flows based on IP addresses, TCP ports, and NIC utilization) | I | None | Load balances return traffic to the host based on the MAC usage of the host side
Static Teaming | Microsoft uses the names Generic Trunking and IEEE 802.3ad draft v1 in some of its documentation to refer to a static aggregation | D | Static | Transmit load balance based on the hash setting of the switch
LACP | Microsoft uses the name IEEE 802.1AX LACP in some of its documentation to mean an LACP aggregation | D | LACP | Transmit load balance based on the hash setting of the switch

Some comments on Table 4-3:
- Type I = a switch-independent mode of teaming
- Type D = a switch-dependent mode of teaming
- Both the Static Teaming and LACP modes can also be set to use any of the three available load balancing methods (Address Hash, Hyper-V Port, and, with 2012 R2, Dynamic).
- Active/Standby teaming is achieved by building a team in one of the above modes and then placing a member of the team into standby.

Teaming using third-party vendor applications for Windows

As noted, for Windows Server 2008 or Windows Server 2003, a vendor-supplied application is required to implement any form of NIC teaming. Which vendor application you choose is mostly driven by the vendor of the NIC in use on the server. This section discusses two of the more common NIC vendors, Broadcom and Emulex, and their tools.

Broadcom provides an application named Broadcom Advanced Server Program (BASP) that runs inside the Broadcom Advanced Control Suite (BACS) to provide teaming services in Windows 2003/2008. It supports many Broadcom NICs as well as some Intel NICs. For a list of supported NICs and an introduction to this product, see the following link:

http://www.broadcom.com/support/ethernet_nic/management_applications.php
Broadcom BASP supports four primary teaming modes, as noted in Table 4-4, and also has a form of remote path failure detection known as LiveLink. LiveLink requires an IP address on the team interface and separate IP addresses on each of the physical NICs. As with all forms of NIC teaming remote path detection discussed in this document, a more robust choice is usually to use the switch-side Failover feature. A good document on using BASP can be found at the following link:

http://www.broadcom.com/docs/support/ethernet_nic/Broadcom_NetXtremeII_Server_T7.8.pdf

Table 4-4 provides a cross-reference between the teaming modes and their associated switch-side requirements when using Windows and the Broadcom Advanced Server Program.

Table 4-4   Broadcom third-party teaming modes and their associated switch-side dependencies, if any

Teaming mode | Windows/Broadcom-side comments | Type | Switch-side aggregation | Switch-side comments
Active/Standby | The active NIC carries all traffic until it fails, then the standby NIC takes over; no load balancing | I | None | No load balancing
Smart Load Balance (SLB), with or without auto-failback | Attempts to load balance based on IP flows. With failback enabled, if a NIC that had failed comes back up, teaming attempts to switch traffic back to that NIC | I | None | Load balances return traffic to the host based on the MAC usage of the host side
Generic Trunking (FEC/GEC)/802.3ad-Draft Static | A typical static aggregation implementation; Broadcom also refers to this as (FEC/GEC)-802.3ad-Draft Static | D | Static | Transmit load balance based on the hash setting of the switch
Link Aggregation (802.3ad) | Works with LACP aggregations | D | LACP | Transmit load balance based on the hash setting of the switch

Some comments on Table 4-4:
- Type I = a switch-independent mode of teaming
- Type D = a switch-dependent mode of teaming
- The BASP tool can also be used to create VLAN tagged interfaces.

Emulex is another vendor that offers third-party teaming for the Windows Server 2003 and 2008 platforms. Emulex refers to its teaming application as OneCommand NIC Teaming and VLAN Manager, and it also offers four primary modes of teaming, as noted in Table 4-5. Emulex also uses the terms switch independent and switch dependent modes of teaming in its documentation, which can be found at the following link:

http://www-dl.emulex.com/support/windows/windows/240005/nic_teaming_manager.pdf
Table 4-5 provides a cross-reference between the teaming modes and their associated switch-side requirements when using Windows and the Emulex OneCommand application.

Table 4-5   Emulex third-party teaming modes and their associated switch-side dependencies, if any

Teaming mode | Windows/Emulex-side comments | Type | Switch-side aggregation | Switch-side comments
Failover (FO) | Simple Active/Standby; no load balancing | I | None | No load balancing
Smart Load Balance (SLB), also called simply "Load Balance" | Attempts to load balance based on the IP hash setting | I | None | Load balances return traffic to the host based on the MAC usage of the host side
Generic Trunking - Link Aggregation static mode (802.3ad static aggregation) | A typical static aggregation implementation | D | Static | Transmit load balance based on the hash setting of the switch
Link Aggregation Control Protocol (LACP) | Works with LACP aggregations | D | LACP | Transmit load balance based on the hash setting of the switch

Some comments on Table 4-5:
- Type I = a switch-independent mode of teaming
- Type D = a switch-dependent mode of teaming
- The Emulex tool can also be used to create VLAN tagged interfaces.

4.3.3 Discussion of physical NIC connections and logical enumeration

From a physical perspective, all physical NICs are hard-wired to a specific I/O module bay and a specific port on that I/O module in the Flex System chassis. Examples of these fixed physical connections can be seen in 2.1, “Enterprise Chassis I/O architecture” on page 20. Any virtual NIC created on top of a physical NIC can naturally only connect to wherever its underlying physical NIC connects.

Although a given physical NIC always connects to a specific physical I/O module and port, how the OS enumerates (names) these NICs can be confusing and at times downright illogical. Knowing which OS-enumerated NIC rides on top of which physical NIC (and thus which I/O module in the Flex System it connects to) is important for the server administrator. Understanding this logical-to-physical mapping allows proper NIC selection when building teamed/bonded designs. Without this understanding, a team or bond might be built from two NICs that happen to connect to the same switch; although this provides increased bandwidth and NIC redundancy, it does not provide redundancy in the event of an I/O module failure.

As an example of OS enumeration, Figure 4-54 on page 120 represents a Compute Node in a PureFlex System environment, not configured for any virtual NIC technology, and how VMware ESX might typically enumerate those physical NICs.
Figure 4-54   Dual-port physical NIC enumerated in a VMware ESX host
(Figure: without vNIC or UFP enabled, the OS sees the physical NICs directly; vmnic0 maps to physical NIC 0 connected to I/O module 1, and vmnic1 maps to physical NIC 1 connected to I/O module 2 over the physical 10G links.)

As can be seen, the OS-enumerated NIC vmnic0 has been associated with physical NIC 0, which connects to the I/O module in bay 1, and the OS-enumerated vmnic1 has been associated with physical NIC 1, which connects to I/O module bay 2. In this case, putting these two NICs in a team/bond would provide full redundancy; straightforward and orderly.

If we then look at an ESX host that was installed while the NICs were set for one of the virtual NIC modes, we might see what is represented in Figure 4-55 (NICs configured for Virtual Fabric mode with no iSCSI or FCoE personality selected).

Figure 4-55   Dual-port physical NIC in a virtual NIC mode enumerated in a VMware ESX host
(Figure: with vNIC or UFP enabled, the OS sees virtual NICs; vmnic0, vmnic2, vmnic4, and vmnic6 ride on physical NIC 0 to I/O module 1, while vmnic1, vmnic3, vmnic5, and vmnic7 ride on physical NIC 1 to I/O module 2.)
Notice that the enumeration sequence seen in Figure 4-55 on page 120 is also very orderly and could readily be used to determine the best pairs of NICs for teaming/bonding (for example, vmnic0 and vmnic1 in a team/bond, vmnic2 and vmnic3 in a team/bond, and so on) to provide I/O module redundancy.

Although this orderly enumeration is frequently the case, it is not always how it works out (true of all operating systems, not just the ESX shown in this example). In some cases, the enumeration may be in a completely different order than expected. For example, if a user installed VMware while virtual NICs were enabled, then disabled the virtual NICs and booted back into the OS, the remaining physical NICs may not be enumerated sequentially or logically. In the case of underlying NIC configuration changes, one way (although disruptive) to force the OS to re-enumerate the NICs in the proper order is to reinstall the OS and let it rediscover the current NIC structure. Perhaps simpler is to rename the NICs in the OS (some operating systems provide this ability). Even with a reinstall, there are times when the OS provides a less than obvious enumeration of the NICs, and this can be problematic.

How can a user determine which OS-named NIC is mapped to which physical NIC and I/O module? There are several ways:

- One of the simplest is to go into the I/O module, shut down one of the physical ports toward the Compute Node, and see which NIC the OS then reports as disconnected. Of course, this is a disruptive operation, so it is not necessarily a good choice in a production environment.
- A less disruptive way is to make note of the MAC addresses in the OS and look in the I/O module MAC address table to determine which physical port they came in on. This can be a little more complicated with operating systems that do not use the physical NIC MACs.
- One fairly accurate, if time-consuming, method is to go into the UEFI F1 setup, into the Network screen for the NICs, and make note of the information there to compare with the information related to each logical NIC in the OS.

Figure 4-56 represents an example of what might be seen on this Network screen.

Figure 4-56   Example of MAC and PCI Function Address numbering of virtual NICs

This screen provides both the MAC address and the PCI Function Address (PFA) information for each physical or logical NIC, which can then be used in the server OS to figure out which OS-enumerated names relate to the physical (or logical) NICs in hardware. The following two examples show the MAC and PFA information for comparison and contrast between the physical and converted NICs of a dual-port LOM NIC. Example 4-1 on page 122 shows the values seen for the onboard NICs when not in any virtual NIC mode, along with the physical I/O module bays those physical NICs connect to. Example 4-2 on page 122 shows that same onboard NIC after conversion to a virtual NIC mode.
Example 4-1   Onboard dual-port NIC not in any virtual NIC mode

MAC: 34:40:B5:BE:83:D0  Onboard PFA 12:0:0  <<< physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D4  Onboard PFA 12:0:1  <<< physical NIC-1 to I/O Module bay 2

As can be seen in Example 4-2, the original physical MAC and PFA information has been inherited by the first two virtual NICs, followed by the other six virtual NICs with their associated MAC and PFA information and the I/O module they connect to (based on the underlying physical connections of the physical NIC).

Example 4-2   The same onboard dual-port NIC after conversion to a virtual NIC mode

MAC: 34:40:B5:BE:83:D0  Onboard PFA 12:0:0  <<< physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D4  Onboard PFA 12:0:1  <<< physical NIC-1 to I/O Module bay 2
MAC: 34:40:B5:BE:83:D1  Onboard PFA 12:0:2  <<< physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D5  Onboard PFA 12:0:3  <<< physical NIC-1 to I/O Module bay 2
MAC: 34:40:B5:BE:83:D2  Onboard PFA 12:0:4  <<< physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D6  Onboard PFA 12:0:5  <<< physical NIC-1 to I/O Module bay 2
MAC: 34:40:B5:BE:83:D3  Onboard PFA 12:0:6  <<< physical NIC-0 to I/O Module bay 1
MAC: 34:40:B5:BE:83:D7  Onboard PFA 12:0:7  <<< physical NIC-1 to I/O Module bay 2

As noted, now that we know this MAC and PFA information (as well as its relationship to the underlying physical NIC and where it connects), it is usually possible to go into the OS and locate either the MAC or the PFA information associated with the OS-enumerated name (for example, in Device Manager in Windows Server 2012), and thus, regardless of the enumerated name, know where each vNIC connects. Regardless of how it is determined, getting the proper pair of NICs into a team/bond is always important to ensure that the desired high availability is achieved.
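On an ESXi host, for example, one low-impact way to correlate the UEFI values with the OS-enumerated names is the esxcli NIC listing, which reports the PCI device address and MAC address next to each vmnic name. This is shown here only as a sketch; vmnic0 is just an example name.

# List all NICs with their PCI device addresses and MAC addresses;
# match the PCI function (for example 12:00.0) and the MAC against the UEFI values
esxcli network nic list

# Show the details of a single NIC
esxcli network nic get -n vmnic0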
Chapter 5. Flex System NIC virtualization deployment scenarios

This chapter provides details on various aspects of NIC virtualization as well as their interactions with a number of I/O module features. The following topics are covered:
- 5.1, “Introduction to deployment examples” on page 124
- 5.2, “UFP mode virtual NIC and Layer 2 Failover” on page 125
- 5.3, “UFP mode virtual NIC with vLAG and FCoE” on page 135
- 5.4, “pNIC and vNIC Virtual Fabric modes with Layer 2 Failover” on page 149
- 5.5, “Switch Independent mode with SPAR” on page 174
5.1 Introduction to deployment examples

This chapter provides examples for deploying the Flex System I/O modules and the NIC virtualization functionality in a number of different scenarios. Also provided are helpful commands to confirm that the environment is operating as designed. Note that the examples provided may or may not reflect an exact combination of features an average environment might include; they were chosen primarily to demonstrate the interoperation of features and their associated configurations.

The following combinations of features are presented in this chapter:
- UFP mode virtual NIC and Layer 2 Failover
- UFP mode virtual NIC with FCoE and vLAG
- Virtual Fabric mode vNIC and physical NIC with Layer 2 Failover
- Switch Independent mode vNIC with SPAR

These combinations are not necessarily indicative of any specific restriction as to what works with what, or on what model of I/O module, but some features and combinations of features do not interoperate with others, or on all I/O modules, as follows:

NIC virtualization features:
- All forms of vNIC are mutually exclusive of each other on the server side. In other words, a given server can be set for UFP, Virtual Fabric mode vNIC, or Switch Independent mode (or disabled for virtual NIC), but not more than one of these at a time.
- On the switch side, UFP and Virtual Fabric mode vNIC are also mutually exclusive of each other: you can enable one or the other, but not both at the same time. Switch Independent mode vNIC can be enabled on a host connected to an I/O module that is configured for UFP or Virtual Fabric mode vNIC, but only if the I/O module ports facing this host are in physical mode (not configured for UFP or Virtual Fabric mode vNIC).

Switch virtualization features:
- SPAR, vLAG, and Stacking are all mutually exclusive on a given I/O module.
- The SI4093 does not support vLAG or Stacking, but it does support SPAR. The SI4093 also does not support Virtual Fabric vNIC, but it supports UFP starting with IBM Networking OS version 7.8, and it supports Switch Independent mode vNIC running on the host.
- The I/O module-based Failover feature is supported with all modes of virtual NIC, but it is implemented differently depending on the mode (Virtual Fabric vNIC is configured on a per vNIC group basis, Switch Independent mode vNIC is configured using global failover per physical port, and UFP is configured using global failover on a per vPort basis).

Other restrictions may apply, as noted in more depth in Chapter 3, “NIC virtualization considerations on the switch side” on page 47 and Chapter 4, “NIC virtualization considerations on the server side” on page 65.

Important: Unless otherwise noted, all configuration examples and commands in this document are based on the industry-standard CLI (isCLI) of the Flex System I/O modules running IBM Networking OS. By default, the EN4093R and CN4093 use the menu-driven CLI (this may change in the future), so it is necessary to change the I/O module to isCLI mode to use these examples. The simplest way to get into the isCLI from the menu CLI is to issue the menu CLI command /boot/prompt ena, then exit and log back in. Upon logging back in, you are offered the option to select the desired CLI.
5.2 UFP mode virtual NIC and Layer 2 Failover

Unified Fabric Port (UFP) provides the ability to carve up 10 Gb ports into virtual NICs, as seen in Chapter 4, “NIC virtualization considerations on the server side” on page 65. Layer 2 Failover, seen in other chapters throughout this book, provides the ability to detect uplink failures and systematically disable the INT ports. Layer 2 Failover with UFP takes that process to the next level and automates the shutdown not only of a physical NIC but of a UFP vPort virtual NIC. This section provides diagrams and configuration examples for setting up UFP and Layer 2 Failover.

The following topics are covered:
- 5.2.1, “Components”
- 5.2.2, “Topology”
- 5.2.3, “Use Cases” on page 127
- 5.2.4, “Configuration” on page 127

5.2.1 Components

This deployment scenario uses the following equipment:
- Flex System Enterprise Chassis
- x240 Compute Node (in bay 3)
  – ESXi 5.5 embedded hypervisor
  – Quad-port CN4054 NIC in Mezz slot 2
    • First two physical NICs have UFP configured and the FCoE personality enabled
    • Second two NICs have virtual NIC disabled (in physical NIC mode)
- Two CN4093s in switch bays 3 and 4
- Two G8264s running vLAG to act as upstream Ethernet connectivity

5.2.2 Topology

The x240 Compute Node running ESXi uses vSwitch0 with its default NIC teaming setting, route based on originating virtual port, toward the pair of CN4093s. The first two ports of the CN4054R Emulex quad-port NIC are set to UFP mode in UEFI. The CN4093s run as independent I/O modules with UFP enabled, vPort (.1) in Tunnel mode and vPort (.2) in FCoE mode. Tunnel mode uses EXT1 and EXT2, which are in an IEEE 802.3ad LACP PortChannel with adminkey 4344. That PortChannel, along with INT port 4 UFP vPort (.1), is a member of a failover trigger.
Figure 5-1 shows a single I/O module to illustrate the connectivity between the Compute Node and the external network.

Figure 5-1   Failover trigger with an active failure

In Figure 5-1, EXT1 and EXT2 form a PortChannel and are also members of a failover trigger. The failover trigger is configured to allow only a single port to fail before it fails the associated INT vPorts. In this example, we are using Auto Monitor with VLAN awareness.

Two forms of failover triggers can be configured:
- AMON (Auto Monitor): allows tracking of a physical uplink, a static PortChannel, or an LACP PortChannel. When the uplink fails, the I/O module automatically disables any INT ports or vPorts that are associated with any of the VLANs also assigned to the monitor port.
- MMON (Manual Monitor): allows tracking of the same uplink types as AMON; upon failure, it disables any manually configured INT ports or vPorts associated with that trigger.

Limit is a mechanism that is part of failover and can be applied on a per-trigger basis. In this example, the limit is set to 1 within trigger 1. The limit specifies the threshold of monitored ports that must remain up and forwarding; when the number of forwarding ports drops to the limit, failover triggers an event and disables all INT ports or vPorts associated with that trigger.

A failure can occur in a couple of different ways. The most clearly understood is a loss of link on the physical port. The second is through spanning-tree state: when a VLAN that has spanning tree enabled on the uplink or PortChannel enters a non-forwarding state, the I/O module treats this as a failure and triggers a failover event, disabling the INT ports and/or vPorts associated with that trigger. A spanning-tree (non-forwarding) failure event can occur with either the AMON or the MMON type of failover.
5.2.3 Use Cases

Failover can be extremely useful when NIC teaming/bonding is used on the Compute Nodes. Because the I/O module sits between the Compute Node and the upstream network, a Compute Node has no way of detecting an outage beyond its physical connection and can end up sending traffic to a black-holed I/O module. For this reason, failover is a significant feature that allows customers to implement an HA environment with the peace of mind that, if a failure does occur, their applications can survive with full access through the redundant connection to the network.

5.2.4 Configuration

This section includes the configurations and steps necessary to configure the various components. It does not include the upstream G8264s, as they are not the focus of this section (but it does include the configuration of the uplinks in the CN4093s toward the G8264s).

Host side configuration (OS/UEFI)

The process of configuring UEFI is the same for any operating system that resides on an Intel-based Compute Node. The UEFI Emulex NIC Selection page shown in Figure 5-2 is found under System Settings → Network → Network Device List → {the NIC on which to enable UFP}. Once there, select Multichannel Mode → IBM Unified Fabric Protocol Mode. After making the change, step all the way back out to System Configuration and Boot Management by pressing ESC, and select Save Settings. Once enabled on one port of a two-port ASIC, the setting is automatically applied to the other port(s) on that ASIC.

Figure 5-2   UEFI Emulex NIC Selection settings
In Figure 5-3, vmnic2 and vmnic3 are associated with UFP port 4 vPort (.1) on each of the CN4093s. This represents a healthy management network, as both vmnics are listed as Connected.

Figure 5-3   ESXi management network with both redundant ports showing Connected

Figure 5-4 shows the associated vSwitch and its redundant vmnics, which are also showing as connected.

Figure 5-4   ESXi vSwitch with redundant vmnics

Switch side configuration

This subsection explains the switch side configuration. The following options are covered:
- “Base Configuration of I/O Module”
- “Auto Monitor (AMON)” on page 130
- “Manual Monitor (MMON)” on page 131
- “View from Flex System Chassis with 2x CN4093s” on page 132

Base Configuration of I/O Module

Note: Although the base configuration and the following failover configurations all use a pair of CN4093 I/O modules, the steps below also apply to the EN4093R, with potentially minor EXT port reassignments, because the CN4093 has a different EXT port alignment than either of the EN4093 I/O modules.
Perform the following steps to configure the I/O module:

1. The first step, if using a PortChannel as the uplink, is to create an LACP 802.3ad PortChannel. In Example 5-1, four ports are used as the uplink, providing 40 Gb of unidirectional bandwidth. The tagpvid-ingress setting is also configured, as the vPort will be running in Tunnel mode.

Example 5-1   Setting up LACP as the uplink

interface port EXT11-EXT14
 lacp mode active
 lacp key 5356
 tagpvid-ingress

2. The second step is to create the UFP vPorts that will be used as the vmembers within the failover trigger. Example 5-2 shows how to set up UFP with a vPort running in Tunnel mode.

Example 5-2   Setting up UFP vPort 1 in Tunnel mode

ufp port INTA3,INTA4 vport 1
 network mode tunnel
 network default-vlan 4091
 qos bandwidth min 50
 enable
 exit
ufp port INTA3 enable
ufp port INTA4 enable
ufp enable

3. Because the I/O modules will be running in UFP Tunnel mode and not participating in spanning tree, the option of disabling spanning tree globally is shown in Example 5-3.

Example 5-3   Globally disabling spanning tree

spanning-tree mode disable

Now that the I/O module has been completely set up to support both the uplink PortChannel and the UFP INT ports, the next step is to decide whether to use Auto Monitor (AMON) or Manual Monitor (MMON). Both AMON and MMON have their advantages.

With AMON, in combination with UFP globally enabled, VLAN monitoring must be enabled before you can enable a failover trigger. VLAN monitoring allows the I/O module to disable only those vPorts that carry the same VLAN ID as the uplink or PortChannel assigned to that failover trigger. All other vPorts remain unaffected, even those on the same physical INT port as the failed vPort.

With MMON, the meaning of the word “Manual” is exactly that: the I/O module must be configured with both the monitor port or PortChannel (EXT ports) and the control members and/or vmembers (INT ports). MMON is perhaps the more commonly used option, as it provides greater control over what gets disabled during an uplink outage.
Auto Monitor (AMON)

In Figure 5-5, two of the four uplinks within the LACP PortChannel have failed. Because the limit is set to 2 (that is, two ports left up), a failover event occurs in that I/O module, causing all vPorts associated with the same VLANs carried on the PortChannel to also fail.

Figure 5-5   Auto Monitor failure

The I/O module configuration in Example 5-4 consists of a trigger with Auto Monitor enabled. With a limit of 2, this trigger fails all control members and/or vmembers when the number of forwarding ports drops to the specified failover limit.

Example 5-4   Failover trigger with AMON configuration

failover enable
failover vlan
failover trigger 1 limit 2
failover trigger 1 amon admin-key 5356
failover trigger 1 enable

Note: The VLAN tracking requirement (failover vlan) with AMON is only necessary if UFP is enabled. AMON failover also works without VLAN tracking when UFP is not enabled.
Manual Monitor (MMON)

In Figure 5-6, two of the four uplinks within the LACP PortChannel have failed. Because the limit is set to 2 (that is, two ports left up), a failover event occurs in that I/O module, causing all vPorts manually defined as control members to also fail.

Figure 5-6   Manual Monitor failure

The I/O module configuration in Example 5-5 consists of a trigger with MMON enabled. With a limit of 2, this trigger fails all control members and/or vmembers when the number of forwarding ports drops to the specified failover limit.

Example 5-5   Failover trigger with MMON configuration

failover enable
failover trigger 1 limit 2
failover trigger 1 mmon monitor admin-key 5356
failover trigger 1 mmon control vmember INTA3.1
failover trigger 1 mmon control vmember INTA4.1
failover trigger 1 enable

The biggest difference between AMON and MMON is that AMON uses the VLANs associated with the EXT port, and a failure event disables only those vPorts associated with the same VLANs as the uplink defined within the trigger, whereas MMON disables exactly the control members and vmembers defined in the trigger. Verification of a proper configuration, with show commands, is covered in “Confirming operation of the environment” on page 132.
View from Flex System Chassis with 2x CN4093s

Figure 5-7 shows a view of two CN4093s with UFP and Failover enabled. This scenario is identical to the two scenarios above, allowing the redundant link to take 100% of the bandwidth after a failure of the primary ESXi vmnic.

Figure 5-7   Flex System chassis with two CN4093s with failover enabled

5.2.5 Confirming operation of the environment

Upon completion of the above steps, several show commands can display whether failover is working as expected with the desired configuration. The first and easiest, as seen in Example 5-6 on page 133, is to display the status of the vPorts. Issuing the show ufp information port command displays the health of each vPort. INTA3 and INTA4 channel 1 (that is, vPort (.1)) are both showing disabled; however, notice the asterisk next to the word disabled. This indicates, as also noted at the bottom of the example, that the vPort has been disabled due to a UFP failover trigger, meaning that the failover condition (either the loss of the entire uplink or the configured limit) has been reached.

Important: Channel 2 (that is, vPort (.2)) is still up and forwarding, as those vPorts were not members of a trigger with a failed event.
Example 5-6   UFP vPort status

CN4093a(config)#show ufp information port
-----------------------------------------------------------------
Alias   Port state vPorts  chan 1    chan 2    chan 3    chan 4
------- ---- ----- ------  --------- --------- --------- ---------
INTA1   1    dis   0       disabled  disabled  disabled  disabled
INTA2   2    dis   0       disabled  disabled  disabled  disabled
INTA3   3    ena   2       disabled* up        disabled  disabled
INTA4   4    ena   2       disabled* up        disabled  disabled
INTA5   5    dis   0       disabled  disabled  disabled  disabled
. . .
* = vPort disabled due to UFP teaming failover

Example 5-7 shows the results of the command show portchannel information, which displays the number of ports that have failed. As can be seen below, the number of ports left up and forwarding is two. The failover trigger limit is also set to 2, so the limit has been reached, which forced a failure event and disabled all INT vPorts. This command is especially useful for determining whether the failure event was caused by link status or by a spanning-tree blocking state.

Example 5-7   Displaying which ports within a PortChannel are still forwarding

CN4093a(config)#show portchannel information
PortChannel 65: Enabled
Protocol - LACP
Port State:
  EXT13: STG 1 forwarding
  EXT14: STG 1 forwarding

The next two examples, Example 5-8 and Example 5-9 on page 134, display the full status of a failover trigger. This may well be the easiest command to run to find out whether a trigger has been activated. In Example 5-8, notice that the limit is set to 2 with three of the four ports still remaining in Operational status. Because the limit has not been met, the failover trigger has not kicked in.

Example 5-8   Healthy trigger state

CN4093a(config)#show failover trigger 1 information
Trigger 1 Manual Monitor: Enabled
Trigger 1 limit: 2

Monitor State: Up
Member        Status
---------     -----------
adminkey 5356
 EXT11        Operational
 EXT12        Failed
 EXT13        Operational
 EXT14        Operational

Control State: Auto Controlled
Member        Status
---------     -----------
Virtual ports
 INTA3.1      Operational
 INTA4.1      Operational

In Example 5-9, notice that the limit is set to 2 and only two ports remain in Operational status. Because the limit has now been met, the failover trigger has kicked in and put the associated vPorts into a Failed state.

Example 5-9   Failed trigger state

CN4093a(config)#show failover trigger 1 information
Trigger 1 Manual Monitor: Enabled
Trigger 1 limit: 2

Monitor State: Down
Member        Status
---------     -----------
adminkey 5356
 EXT11        Failed
 EXT12        Failed
 EXT13        Operational
 EXT14        Operational

Control State: Auto Disabled
Member        Status
---------     -----------
Virtual ports
 INTA3.1      Failed
 INTA4.1      Failed

We can also see disconnects from the host side, indicating that a physical or logical connection has been terminated. In Figure 5-8, vmnic2 shows Disconnected because the uplinks from the I/O module to the network were severed (or spanning-tree blocked), causing a failover trigger response on the associated vPorts.

Figure 5-8   vmnic2 failure - VMware Management
In Figure 5-9, vSwitch0 now displays vmnic2 as disconnected and has failed over to its redundant (standby) vmnic3. When this happens, the traffic that was originally on vmnic2 runs over vmnic3 and up through I/O module 4.

Figure 5-9   vmnic2 failure - vSwitch

In Example 5-10, using a Linux command line, a failure of 2 seconds (that is, the loss of two ICMP pings) was experienced during a failover trigger event.

Example 5-10   ICMP ping loss due to a failover trigger between I/O modules

64 bytes from 9.42.171.170: icmp_seq=580 ttl=64 time=0.580 ms
Request timeout for icmp_seq 581
Request timeout for icmp_seq 582
64 bytes from 9.42.171.170: icmp_seq=583 ttl=64 time=0.468 ms

Important: During a failure event between I/O modules, it is normal to experience up to 3 seconds of packet loss due to network reconvergence.

5.3 UFP mode virtual NIC with vLAG and FCoE

This section discusses the implementation of UFP virtual NIC with FCoE, and vLAG aggregations on the uplinks of a pair of CN4093s.

5.3.1 Components

This deployment scenario makes use of the following equipment:
- Flex System Enterprise Chassis
- x240 Compute Node in bay 3
  – Running ESX 5.5
  – Quad-port CN4054 CNA in Mezz slot 2
    • First two physical CNA ports have UFP configured and the FCoE personality enabled
    • Second two CNA ports have virtual NIC disabled; not used in this scenario
- V7000 Storage Node in bays 11 - 14 of the Flex System chassis
  – Provides remote storage for the Compute Node in bay 3
- Two CN4093 I/O modules
  – Installed in I/O module bays 3 and 4 for this scenario
  – Provide the FCF function between the Compute Node in bay 3 and the storage array in bays 11-14
- Two G8264 switches to act as upstream Ethernet connectivity out of the vLAG pair of CN4093s

5.3.2 Topology

This scenario takes advantage of the vLAG feature available on the CN4093 to virtualize the data plane and support cross-switch aggregation, as well as UFP to provide virtual NIC support to the Compute Node, and FCoE within UFP to offer FCoE-attached storage to the Compute Node in bay 3.

Some comments on what is being demonstrated:
- We are using vLAG to provide cross-switch aggregation out of the CN4093s toward the upstream Top of Rack switches. This provides both HA and improved performance for these connections to the upstream network.
  – We are not doing any vLAG aggregations from the CN4093s toward the Compute Node in bay 3 (aggregations toward servers running any form of virtual NIC are not supported at this time).
- For UFP, we demonstrate four different vPorts:
  – vPort1 in Tunnel mode, using the vLAG aggregation of EXT11 on both CN4093s for the tunnel uplinks out of the I/O modules.
    • The uplink for vPorts using Tunnel mode should use the tagpvid-ingress command to strip the outer tag from tunnel packets toward the upstream network and to re-add the outer tag on inbound packets back into the tunnel.
  – vPort2 in FCoE mode.
    • If FCoE is desired, only vPort2 can provide that function. All other vPorts can be in any mode except FCoE.
  – vPort3 in Access mode, allowing only VLAN 40, untagged.
  – vPort4 in Trunk mode, allowing VLANs 50 and 60 (VLAN 50 untagged).
    • vPort3 and vPort4 share vLAG aggregations on ports EXT12 and EXT13 on both CN4093s for their uplinks.

Figure 5-10 shows how the components of this design come together.
Figure 5-10   Example of vLAG aggregations upstream, UFP, and FCoE using CN4093s

5.3.3 Use cases

This scenario is for customers desiring highly available upstream connections (vLAG), virtual NICs on the servers (UFP), and converged storage access (FCoE). As noted previously, none of these features directly requires the others (we can have vLAG without UFP, or UFP without FCoE, and so forth). They are demonstrated together here to show a potentially flexible and robust design.

5.3.4 Configuration

This section includes the configurations and steps necessary to configure the various components. The examples here do not include the upstream G8264 configurations, as they are not the focus of this document (but they do include the configuration of the uplinks in the CN4093s toward the G8264s). Also not included here is the creation of the LUNs used in this scenario; it is assumed that they already exist at the time the scenario is built.

The steps required to complete this scenario are broken up into five primary sections:
- “Host side enablement (UEFI Setup)”
- “Miscellaneous I/O Module settings”
- “vLAG and aggregation configurations” on page 139
- “UFP configuration” on page 141
- “FCoE configuration” on page 143

Host side enablement (UEFI Setup)

For this example, we need to go into UEFI, configure the desired virtual NIC type (UFP), and set the personality to FCoE. Not shown are the installation of ESX and the configuration of the vSwitches and a test VM (the images in Figure 5-10 on page 137 represent the final vSwitch vmnic usage).

To configure the host to support UFP and FCoE, reboot the server and, when prompted, press the F1 key to enter Setup. In Setup, go to System Settings → Network, then highlight the desired NIC and press Enter twice. This takes you to the Emulex NIC Selection menu. Change Personality to FCoE (this assumes the FCoE FoD key is already installed) and change Multichannel Mode to Unified Fabric Port (UFP). After setting FCoE and UFP virtual NIC in UEFI, escape back out of UEFI setup, save the configuration when prompted, and reboot the Compute Node. Detailed instructions and screen shots of this process can be found in Chapter 4, “NIC virtualization considerations on the server side” on page 65.

Important: Changing the Personality and Multichannel modes affects all CNA ports on the ASIC associated with the one being changed. This means it is only necessary to set this in one place to enable two CNA ports if this is the onboard Emulex, or in two places for the quad-port CN4054 NIC (the CN4054 and CN4054R have two ASICs).

Miscellaneous I/O Module settings

The following are some preparatory steps before configuring the main features of this scenario. Some comments on these commands:
- In this example we use only a limited subset of ports (for example, INTA3 and INTA13-INTA14), but in most cases many ports would perform the same roles, so some of the commands shown here affect both the ports we use to demonstrate this scenario and ports that we do not use in this specific scenario.
- Tagging needs to be enabled on all ports carrying FCoE VLANs, as well as on any ports carrying more than a single VLAN.
- A host name is configured (for clarity).
- An idle logout timer is configured (for reference).
- A name is applied to the ports going to the internal V7000 (for clarity).

Important: While performing the configurations on the I/O modules, all uplinks should be disconnected or disabled until instructed to bring the links up. Making certain configuration changes on an I/O module with live connections to an upstream network can cause instability in the network.

Important: All switch configuration instructions assume a factory default configuration on the I/O modules. All configuration commands shown are executed from the conf t mode of the isCLI interface of the I/O module.
The commands used to perform these miscellaneous tasks can be seen in Example 5-11.

Example 5-11   Preparing the switch with base commands

! Enable tagging on all desired ports
! 1-28 = INTA1-INTB14, 43-44 = EXT1-EXT2 (vLAG ISL)
! 54-55 = EXT12-EXT13 (uplink)
int port 1-28,43-44,54-55
 tagging
!
! Add host name and set idle time out to 60 minutes
hostname "PF_CN4093a"
system idle 60
!
! Add port names on INTA13 and INTA14
int port 13-14
 name "v7000_Storage"

Repeat the above steps for the second switch, changing the hostname to PF_CN4093b. Once these base commands are applied, we can proceed to creating the vLAG and aggregations.

vLAG and aggregation configurations

Configuring vLAG and aggregation is a multistep process that includes the following steps:
1. Create the aggregation for the vLAG ISL and set the PVID to an unused VLAN (using an unused VLAN for the PVID of the vLAG ISL helps increase the stability of the ISL). We use LACP for all aggregations, but static aggregations could also have been used. All LACP keys are chosen to be unique for each aggregation; the specific LACP key numbers do not denote anything else special.
2. Disable spanning tree on the PVID VLAN of the ISL (this also helps ensure the stability of the ISL).
3. Create the local aggregations on the uplinks.
4. Configure the health check (in this example, the EXTM ports are connected back to back to provide the vLAG health check), using an unused IP subnet (1.1.1.x/30) for this health check connection.
5. Configure and enable vLAG.
6. The vLAG tier ID must be unique from any upstream connecting vLAG pair and must be the same on both CN4093 I/O modules in the same vLAG pair.
7. Once all configurations are complete, plug in the back-to-back health check cable between the EXTM ports, and plug in the ISL links.
8. Once the ISL is up, plug in the uplinks to the upstream networks to complete the physical steps.

The commands used to perform these tasks are provided in Example 5-12.

Important: All of the examples provided here can be directly cut and pasted into the I/O module.

Example 5-12   Configuring vLAG and aggregations

! Create the ISL aggregation and set the PVID to an unused VLAN
int port 43-44
 lacp mode active
 lacp key 4344
 pvid 4090
!
! Exit from interface config mode and then globally disable the instance of STP
! for the ISL PVID VLAN
exit
no spanning-tree stp 26 enable
spanning-tree stp 26 vlan 4090
!
! Configure upstream aggregations (using EXT11 (53) for the UFP tunnel uplink,
! and EXT12-EXT13 (54-55) for the UFP trunk and access uplinks)
int port 53
 lacp mode active
 lacp key 1111
!
int port 54-55
 lacp mode active
 lacp key 1213
!
! Configure the EXTM port for use as the vLAG health check
! Interface IP 127 is tied to EXTM
int ip 127
 ip address 1.1.1.1 255.255.255.252 enable
!
! Configure vLAG
! hlthchk points to the IP of the other CN4093 in this vLAG pair
! The ISL adminkey is the admin key on ports EXT1-EXT2
! The other adminkeys are for the uplink aggregations previously configured
vlag enable
vlag tier-id 11
vlag hlthchk peer-ip 1.1.1.2
vlag isl adminkey 4344
vlag adminkey 1111 enable
vlag adminkey 1213 enable
!

Once the above steps are complete, repeat them for the second I/O module, changing the following two lines in the above configuration:
- Change ip address 1.1.1.1 255.255.255.252 enable to ip address 1.1.1.2 255.255.255.252 enable
- Change vlag hlthchk peer-ip 1.1.1.2 to vlag hlthchk peer-ip 1.1.1.1

Once both switches are configured per the above, perform the following steps:
1. Bring up the ISL links between the pair of CN4093s (no shut EXT1-EXT2 and/or plug in the cables as necessary).
2. Bring up the management ports on both CN4093s (no shut EXTM and/or plug in the cable as necessary).

Important: It is assumed that the upstream connecting switches have already been properly configured for any necessary aggregations and vLAG/vPC before the links to the upstream network are brought up. Failure to ensure that the upstream configuration is complete before plugging in cables can lead to a network down situation.
3. Confirm that the links on the EXT1, EXT2, and EXTM ports on both CN4093s are up (show int status).
4. Confirm that the aggregation on EXT1-EXT2 is up (show lacp info).
5. Confirm that the vLAG ISL and health check are up using the command show vlag info; confirm that the health check is Up and the ISL state is Up.
6. Once the vLAG and health checks are confirmed operational, bring up the uplink aggregations on EXT11-EXT13 on both I/O modules (no shut EXT11-EXT13 and/or plug in the cables as necessary).
7. Confirm that the links are up (show int status), the aggregations are up (show lacp info), and vLAG shows the state formed (show vlag info) for both upstream aggregations.

Details on the output of the above commands for correctly functioning I/O modules are provided in 5.3.5, “Confirming operation of the environment” on page 145.

UFP configuration

In this step, we enable and configure UFP on the INTA3 interface and add the desired VLANs to the uplink ports to complete the path out for the UFP vPorts. Some comments on these steps:

- Before we start configuring vPorts, we enable CEE:
  – If a vPort is configured for FCoE, UFP cannot be enabled until CEE is enabled.
  – Enabling CEE automatically turns off standard flow control on all internal ports, switching to the Per Priority Flow Control used by CEE.
  – When changing flow control states, the ports are automatically and briefly shut/no shut to force the new flow control state.
- In this example, we configure four vPorts:
  – vPort 1 is in UFP tunnel mode and uses tunnel VLAN 4091. VLAN 4091 is the outer tag used on packets flowing on this tunnel and is stripped off on the uplink EXT11 interface using the tagpvid-ingress command.
  – vPort 2 is used for FCoE traffic and is set for VLAN 1001 or 1002, depending on the switch. The FCoE VLANs should be different on the two switches to reduce the likelihood of a fabric merge.
  – vPort 3 is configured as a simple access vPort, using an access/untagged VLAN 40.
  – vPort 4 is configured as an 802.1Q trunk vPort, using an untagged VLAN 50 and allowing a tagged VLAN 60.
- The vPort bandwidths used in this example can be changed if desired, but it is recommended not to set the FCoE vPort 2 minimum bandwidth lower than 40%, to ensure that FCoE traffic is guaranteed the necessary bandwidth.
- While this example shows four different types of vPorts being used (tunnel, FCoE, access, and trunk), different arrangements of types could have been used (for example, all trunk vPorts, or all tunnel or access vPorts), except that if FCoE is in use, vPort 2 must be the FCoE vPort.
  – For each tunnel mode vPort, assuming the tunnel is being broken out (outer tag stripped off) on the uplink, that tunnel must have a separate uplink path (it cannot share uplink paths with other tunnel mode vPorts, or even with access or trunk mode vPorts).
  – All vPorts on a physical port must use unique VLANs.
The commands used to configure UFP and some associated VLAN and tunnel parameters can be seen in Example 5-13.

Example 5-13   Configuring UFP and vPorts on INTA3

! Enabling CEE at this point, as it must be enabled before enabling a UFP vPort
! that has FCoE configured
cee enable
! Create and configure all of the vPorts on INTA3 and enable UFP
ufp port INTA3 vport 1
 network mode tunnel
 network default-vlan 4091
 qos bandwidth min 10
 enable
ufp port INTA3 vport 2
 network mode fcoe
 network default-vlan 1001
 qos bandwidth min 40
 enable
ufp port INTA3 vport 3
 network mode access
 network default-vlan 40
 qos bandwidth min 20
 enable
ufp port INTA3 vport 4
 network mode trunk
 network default-vlan 50
 qos bandwidth min 30
 enable
ufp port INTA3 enable
ufp enable
! When UFP is enabled, it automatically creates and enables the assigned
! default-vlan for each vPort, and adds the vPort as a member of that default VLAN
! Create any extra VLANs and assign VLANs to uplinks and ISL for failover paths
! VLANs 40 and 50 will have automatically been assigned to the vPorts with
! the matching default-vlan. We now need to add the ISL and uplink ports
vlan 40
 enable
 member EXT1-EXT2,EXT12-EXT13
!
vlan 50
 enable
 member EXT1-EXT2,EXT12-EXT13
!
! VLAN 60 is the only non-default-vlan VLAN we will be using,
! so we must also manually add the vPort to this VLAN using the vmember command
vlan 60
 enable
 member EXT1-EXT2,EXT12-EXT13
vmember INTA3.4 ! ! VLAN 4091 is our tunnel mode VLAN, and vPort 1 is automatically a member, but we ! must add the ISL links and desired uplink as members to carry traffic in and out vlan 4091 enable member EXT1-EXT2,EXT11 ! We will add the FCoE VLAN to desired ports in the next step. ! Set tagpvid-ingress on upstream port EXT11 to act as tunnel endpoint for vPort 1 ! Will remove tunnel VLAN for outbound packets ! Will add tunnel VLAN for inbound packets int port 53 tagpvid-ingress Repeat the above steps for the second switch, changing the following line in the above config: Change the vPort 2 command network default-vlan 1001 to network default-vlan 1002. Once both switches are configured per the above, perform the following checks: 1. Run the command show run | section ufp and confirm the UFP config is in place 2. Run the command show ufp info vport port inta3 and confirm all vPorts are up, are carrying the desired VLANs, and are in the desired modes 3. Run the command show int trunk and confirm VLANs are correct and tagpvid-ingress is on upstream EXT11 Details on proper output of the above commands for correctly functioning I/O Modules are provided in 5.3.5, “Confirming operation of the environment” on page 145. Once these UFP commands are applied we can proceed to configuring FCoE. FCoE configuration In this section we run the commands necessary to enable FCoE. It is assumed the above steps have already been completed; most importantly, that CEE has already been enabled in a previous step. The steps we will be performing, and some comments: 1. CEE was enabled in a previous step, but if it had not been, it must be enabled now 2. We will be using EXT15-EXT16 as our FCF ports – We will not be attaching any cables to these ports in our example, as all FCoE traffic will stay internal to the CN4093, between the host on INTA3 and the FCoE attached storage on ports INTA13-INTA14 - but we still must assign FC ports to communicate with the FC component of the CN4093 – Assigning a minimum of 2 FC ports is mandatory for any FCF function to work – Assigning more FC ports provides higher bandwidth Important: It is assumed that the UEFI steps on Compute Node 3 to set the multichannel mode to UFP and the personality to FCoE have already been completed.
  • 155. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 144 NIC Virtualization in IBM Flex System Fabric Solutions – FC ports are always assigned in pairs (even numbers used) – Only the 12 omni ports (EXT11-EXT22) can be assigned as FC ports 3. Configure desired FCoE ports to carry vlan 1001 or 1002 tagged – FCoE VLAN must be a tagged VLAN on any ports that are carrying it 4. Enabling VLAN 1001 or 1002 for FCF functionality – VLAN 1002 is considered an industry default FCoE VLAN, but almost any VLAN can be used for FCoE (can not use VLAN 1 and a few other reserved VLANs) – Although it is possible to use the same FCoE VLAN on both switches (as long as that VLAN is not carried between the two switches), it is not recommended, to ensure a fabric merge does not occur if the FCoE VLAN did accidently get bridged between the I/O Modules 5. Disable STP instance of spanning-tree associated with the FCoE VLAN 6. Configure any desired zoning – We will be applying zoning that lets all hosts see all available LUNs. This is not what most production designs will incorporate and is only used here for simplified operation – In normal zoning, whenever changes are made to zoning, the zoneset activate name xxxxx command (where xxxxx is the name of the zone to be activated) must be executed before any zoning changes take effect. The zonset activate command is not necessary with the zoning syntax we are using in this scenario 7. Save the configuration to NVRAM when completed The commands used to perform these tasks can be seen in Example 5-14: Example 5-14 Example of configuring FCoE ! Enable FIP Snooping to ensure FCoE end to end security fcoe fips enable ! Designate the desired omni ports as FC ports system port EXT15,EXT16 type fc ! Name FCoE VLAN, add v7000 facing ports and FC ports and enable the FCF support vlan 1001 enable name "FCoE_FAB-A" member INTA13-INTA14,EXT15-EXT16 fcf enable ! Disable STP on instance of STP associated with FCoE VLAN no spanning-tree stp 112 enable spanning-tree stp 112 vlan 1001 ! Add catch-all zoning (not suitable for most production environments) zone default-zone permit zone name allow-all zoneset name default ! Save the configuration changes made to NVRAM copy running startup ! If prompted to save to flash press the y key ! If prompted to change to active config block, press the y key
  • 156. Chapter 5. Flex System NIC virtulization deployment scenarios 145 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Repeat the above steps for the second switch, changing the following lines: Change vlan 1001 to vlan 1002 Change name "FCoE_FAB-A" to name "FCoE_FAB-B" Change no spanning-tree stp 112 enable to no spanning-tree stp 113 enable Change spanning-tree stp 112 vlan 1001 to spanning-tree stp 113 vlan 1002 Once both switches are configured per the above, perform the following checks: 1. Run the command show fcoe fips fcf and confirm we see an FCF entry for each FC port that was configured. The FCF function should come up regardless of FCoE sessions. 2. Run the command show fcoe fips fcoe and confirm we see an FCoE session for each V7000 port on INTA13 and INTA14, and one for the server on INTA3. 3. Run the command show fcoe fips vlan and confirm desired interfaces are present for the FCoE VLAN. Details on proper output of above commands, along with other helpful troubleshooting commands for this environment are provided in “Confirming operation of the environment” 5.3.5 Confirming operation of the environment This section contains helpful commands and their associated output to ensure the scenario demonstrated is healthy and operating as expected. Note there are many helpful commands for many tasks, but this section is focused on the specific commands for this environment. Also note that the output for most of this information can also be obtained from a show tech command. Details on confirming the health of vLAG and aggregations The examples provided in Example 5-15 on page 145 represent truncated output and added embedded comments on that command output: The examples here are all run on the I/O module in bay 3. When troubleshooting, one should always look at both I/O modules in the design. Example 5-15 Example of commands to check the health of vLAG (after all configs applied) ! First check the link status. Make sure ISL ports (EXT1-EXT2), INTA3, INTA13, ! INTA14, EXT1, are Link up, as well as EXTM Link up for the vLAG health check PF_CN4093a#show int status ------------------------------------------------------------------ Alias Port Speed Duplex Flow Ctrl Link Name ------- ---- ----- -------- --TX-----RX-- ------ ------ INTA3 3 10000 full no no up INTA3 ... INTA13 13 10000 full no no up v7000_Storage INTA14 14 10000 full no no up v7000_Storage Important: It is assumed that the OS has already been installed on the Compute Node and proper FCoE drivers are operational within the OS. It is also assumed the V7000 storage has been configured and is presenting storage to the host. Important: In an effort to reduce extraneous output, many non-essential lines have been removed from the output of the commands executed in this section. Where removed, they have been replaced by an ellipsis (...)
  • 157. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 146 NIC Virtualization in IBM Flex System Fabric Solutions ... EXT1 43 10000 full no no up EXT1 EXT2 44 10000 full no no up EXT2 ... EXT11 53 10000 full no no up EXT11 EXT12 54 10000 full no no up EXT12 EXT13 55 10000 full no no up EXT13 ... EXTM 65 1000 full no no up EXTM ... ! Confirm aggregation is now up not only for ISL but each one of the upsteam ! aggeregations PF_CN4093a#sho lacp info ------------------------------------------------------------------ port mode adminkey operkey selected prio aggr trunk status minlinks --------------------------------------------------------------------------------- ... EXT1 active 4344 4344 yes 32768 43 65 up 1 EXT2 active 4344 4344 yes 32768 43 65 up 1 ... EXT11 active 1111 1111 yes 32768 53 66 up 1 EXT12 active 1213 1213 yes 32768 54 67 up 1 EXT13 active 1213 1213 yes 32768 54 67 up 1 ... ! Confirm vLAG is fully healthy and both upstream vLAGed aggregations show state ! formed (formed = at least one uplink from each switch in a vLAGed aggregation is ! up and operationsl) PF_CN4093a#sho vlag info vLAG system MAC: 08:17:f4:c3:dd:0a Local MAC 74:99:75:5d:dc:00 Priority 0 Admin Role PRIMARY (Operational Role PRIMARY) Peer MAC a8:97:dc:10:44:00 Priority 0 Health local 1.1.1.1 peer 1.1.1.2 State UP ISL trunk id 65 ISL state Up Auto Recovery Interval: 300s (Finished) Startup Delay Interval: 120s (Finished) vLAG 65: config with admin key 1111, associated trunk down, state formed vLAG 66: config with admin key 1213, associated trunk down, state formed ! For reference, aside from state formed, there are three possible other states ! state local up = At least one link from the vLAG agg is up on this switch, but ! no links for this vLAG agg are up on the other switch ! state remote up = the reverse of local up, in other words, there is port up for ! this vLAG agg on the other switch, but none on this switch ! state down = no links on either switch are up for this vLAG agg Details on confirming the health of UFP The examples provided in Example 5-16 represent truncated output and added embedded comments on that command output for checking the health of UFP:
  • 158. Chapter 5. Flex System NIC virtulization deployment scenarios 147 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Example 5-16 Example of commands to check the health of UFP (after all configs applied) ! First check that the desired UFP commands are present in the running config ! by filering on just showing the UFP sections PF_CN4093a#show run | section ufp ufp port INTA3 vport 1 network mode tunnel network default-vlan 4091 qos bandwidth min 10 enable exit ! ufp port INTA3 vport 2 network mode fcoe network default-vlan 1001 qos bandwidth min 40 enable exit ! ufp port INTA3 vport 3 network mode access network default-vlan 40 qos bandwidth min 20 enable exit ! ufp port INTA3 vport 4 network mode trunk network default-vlan 50 qos bandwidth min 30 enable exit ! ufp port INTA3 enable ! ufp enable ! ! Get a real time snapshot of vPort state and VLANs in use, as well as the mode ! configured for each vPort. PF_CN4093a#show ufp info vport port inta3 ------------------------------------------------------------------------------- vPort state evbprof mode svid defvlan deftag VLANs --------- ----- ------- ---- ---- ------- ------ ---------------------- INTA3.1 up dis tunnel 4091 4091 dis 4091 INTA3.2 up dis fcoe 1001 1001 dis 1001 INTA3.3 up dis access 4004 40 dis 40 INTA3.4 up dis trunk 4005 50 dis 50 60 ! Get real time infomation on VLAN allowed status as well as the status of ! tagpvid-ingress on the uplink for the tunnel vPort (EXT11) as seen by the ! accompanying # symbol. PF_CN4093a#show int trunk Alias Port Tag Type RMON Lrn Fld PVID NAME VLAN(s) ------- ---- --- ---------- ---- --- --- ------ -------------- ------------------ ...
  • 159. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 148 NIC Virtualization in IBM Flex System Fabric Solutions INTA3 3 y Internal d e e 1 INTA3 1 40 50 60 1001 4091 ... EXT1 43 y External d e e 4090 EXT1 1 40 50 60 4090 4091 EXT2 44 y External d e e 4090 EXT2 1 40 50 60 4090 4091 ... EXT11 53 n External d e e 4091# EXT11 4091 EXT12 54 y External d e e 1 EXT12 1 40 50 60 EXT13 55 y External d e e 1 EXT13 1 40 50 60 ... * = PVID is tagged. # = PVID is ingress tagged. Details on confirming the health of FCoE The examples provided in Example 5-17 represent truncated output and added embedded comments on that command output for checking the health of FCoE: Example 5-17 Example of commands to check health of FCoE (after all configs applied) ! Confirm the FCF is detected and has an entry for each of the FC ports assigned ! to this purpose PF_CN4093a#show fcoe fips fcf Total number of FCFs detected: 2 FCF MAC Port Vlan ----------------------------------- a8:97:dc:10:44:c7 EXT15 1001 a8:97:dc:10:44:c8 EXT16 1001 ! Confirm the FCoE sessions have been establised for each device that is using ! FCoE (the host on INTA3 and the ports toward the v7000 storage (INTA13 and ! INTA14) PF_CN4093a#show fcoe fips fcoe Total number of FCoE connections: 3 VN_PORT MAC FCF MAC Port Vlan ------------------------------------------------------ 0e:fc:00:01:11:00 a8:97:dc:10:44:c8 INTA3 1001 0e:fc:00:01:10:00 a8:97:dc:10:44:c7 INTA13 1001 0e:fc:00:01:10:01 a8:97:dc:10:44:c7 INTA14 1001 ! Check that all ports that need access to the FCoE VLAN are included: PF_CN4093a#show fcoe fips vlan Vlan App creator Ports ---- ----------------- ------------------------------------------------------- 1001 UFP INTA3 INTA13 INTA14 EXT15 EXT16 ! The following commands are only available when in full fabric mode (FCF enabled) ! and can be helpful when troubleshooting ! Make sure the FCoE database is populated with all hosts PF_CN4093a#show fcoe database -----------------------------------------------------------------------
  • 160. Chapter 5. Flex System NIC virtulization deployment scenarios 149 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm VLAN FCID WWN MAC Port ----------------------------------------------------------------------- 1001 011100 10:00:00:00:c9:f8:0a:59 0e:fc:00:01:11:00 INTA3 1001 011000 50:05:07:68:05:08:03:70 0e:fc:00:01:10:00 INTA13 1001 011001 50:05:07:68:05:08:03:71 0e:fc:00:01:10:01 INTA14 Total number of entries = 3. ----------------------------------------------------------------------- ! Make sure we see a fabric login for each device: PF_CN4093a#show flogi database ----------------------------------------------------------------------- Port FCID Port-WWN Node-WWN ----------------------------------------------------------------------- INTA13 011000 50:05:07:68:05:08:03:70 50:05:07:68:05:00:03:70 INTA14 011001 50:05:07:68:05:08:03:71 50:05:07:68:05:00:03:71 INTA3 011100 10:00:00:00:c9:f8:0a:59 20:00:00:00:c9:f8:0a:59 Total number of entries = 3. ----------------------------------------------------------------------- For further commands on reviewing the health of an I/O Module see the appropriate Application Guide for that product. A good source for guides for PureFlex I/O Modules is the following link: http://publib.boulder.ibm.com/infocenter/flexsys/information/topic/com.ibm.acc.net workdevices.doc/network_iomodule.html 5.4 pNIC and vNIC Virtual Fabric modes with Layer 2 Failover This section presents several scenarios for use of the Emulex LOM’s and mezzanine adapters in Flex System compute nodes. The presented scenarios are: Physical NIC mode with Layer 2 failover Physical NIC mode with Layer 2 failover and FCoE storage Virtual Fabric vNIC mode with failover Virtual Fabric vNIC mode with failover and FCoE storage Physical NIC mode presents each port of the Emulex LOM or card as a single 10Gb physical port. A two-port card would be seen by the OS of the compute node as two 10Gb NICs, each of which would go to a different embedded I/O Module in the Flex chassis. A four-port mezzanine card would be seen as four 10Gb ports; two ports would go to one I/O Module (for example bay 1) and two to another (bay 2), using internal ports INTAx and INTBx on the switches. To make full use of a four port card such as the CN4054, an upgrade would be required on embedded switch modules (EN4093R, CN4093, or SI4093). Physical NIC mode with FCoE changes the presentation of the card so that each physical port is seen as a NIC and a corresponding FCoE HBA. (It is also possible to select the iSCSI personality on the card, and the storage side would be seen as an iSCSI HBA. This scenario is not tested here.) Virtual Fabric vNIC mode, also known as IBM Virtual Fabric mode presents each port of an Emulex LOM or card as up to four virtualized ports. The bandwidth of these ports is configurable with both a minimum guaranteed bandwidth allocation and a maximum limit on
  • 161. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 150 NIC Virtualization in IBM Flex System Fabric Solutions bandwidth usage. The OS of the compute node will see up to eight NICs, with bandwidth equal to the maximum limit configured on the Emulex hardware. Even though the OS might see eight NICs, each with a bandwidth of 10Gb, there are still only two 10Gb physical ports behind them. Four of the vNICs will share the 10Gb bandwidth of each physical port. (If a four port card such as the EN4054 is used, vNIC will present up to sixteen virtualized NICs to the OS from each EN4054, but there are still only four 10Gb physical ports and the total available bandwidth is 40Gb.) Virtual Fabric vNIC mode with FCoE reserves one of the four vNIC instances for each physical port for storage networking. In this case, the OS will see fewer virtualized NIC instances but will see the storage functionality reflected as an HBA. For example, a two port LOM configured in this way would be seen by the OS as six virtualized NIC instances and a two port HBA. The two port LOM still has only two physical 10Gb ports, and each one would be shared by three vNIC instances and one HBA. As in Physical NIC mode, an iSCSI personality is also available. Layer 2 Failover is a configurable function of most of the embedded switch I/O Modules on the Flex System chassis. It allows the state of a set of ports - typically external ports which connect to an upstream network - to control the state of other ports, typically internal facing ports which connect across the chassis backplane to compute nodes. This feature is typically used to protect against a specific type of network failure which can occur in chassis-based systems, where an embedded switch is operational but disconnected from the remainder of the network. Layer 2 failover can administratively disable server-facing ports when such a failure occurs, triggering the servers’ NIC teaming (or bonding) capability to use a surviving port which still has a viable connection to the network. 5.4.1 Components The testing in this chapter was done using the following hardware and software: Flex System Enterprise Chassis x240 Compute Node in bay 1 – Running ESX 5.1 – Dual port Emulex LOM CNA – DS4800 external storage attached via FC ports on G8264CS switches Two EN4093s in I/O module bays 1 and 2 Two G8264CS switches to act as upstream Ethernet connectivity out of the vLAG pair of EN4093s – Provide FCF function and physical connectivity to DS4800 on Fibre Channel port 53 Notes: There are two distinct ways to configure L2 failover on the EN4093R and CN4093 switches. The failover command and associated subcommands and operands operates on full physical ports, and has been enhanced to function for UFP vports as well. There is also a failover option within the configuration of a vnic group; this option allows a failure in the uplink associated with a vnic group to cause the vnic members of that group to be administratively disabled. The vmember option of the failover command, which is intended for UFP vports, will allow a vNIC instance to be specified but it will not provide the desired failover function.
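To illustrate the distinction made in the note above, the two forms are sketched below side by side. This is only a sketch: the port names, VLAN number, and LACP key are placeholders rather than values from the tested configuration, and the exact syntax should be checked against the Application Guide for the firmware level in use.
! Standard failover trigger; the control list names physical ports
! (the vmember form of the control command is the option intended for UFP vPorts)
failover enable
failover trigger 1 mmon monitor admin-key 5757
failover trigger 1 mmon control member INTA1,INTA2
failover trigger 1 enable
!
! Failover option configured inside a Virtual Fabric vNIC group
vnic vnicgroup 1
 vlan 3001
 member INTA1.1
 port EXT5
 failover
 enable
Both forms appear again in the configuration examples later in this section; the trigger form is used with pNIC and UFP, while the vnicgroup form applies only to Virtual Fabric vNIC mode.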
5.4.2 Topologies The base topology for the scenarios presented in this section is shown in Figure 5-11 on page 151, which shows the connections between the components listed above. Specific topology diagrams will be included in the sections below for specific scenarios. Figure 5-11 Base topology for scenarios 5.4.3 Use cases Physical NIC mode (pNIC) is the default for the Flex environment. It presents the LOM and NIC mezzanine cards to the server’s OS with the same number of ports as the card actually has (2-port or 4-port). In this mode, converged networking can be enabled, so that these cards present two or four NIC ports and two or four HBA ports for storage (FCoE or iSCSI). Redundancy can be achieved in pNIC mode for data networking through the use of NIC teaming options on the various operating systems. The storage protocols each have their own multi-pathing options, which provide a similar capability as long as both HBA ports have access to the storage LUNs. The embedded and top-of-rack switches can be configured with the failover command, which works in concert with NIC teaming. This scenario would use active-standby teaming on
  • 163. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 152 NIC Virtualization in IBM Flex System Fabric Solutions Windows and Linux; it could use a form of active-active teaming with VMware. These options are discussed in chapter 4.3, “Utilizing physical and virtual NICs in the OSes” on page 105. In addition, with Virtual Link-aggregation (vLAG) on the switches, active-active NIC teaming modes can be supported. This will typically provide a more rapid failover and fail-back. Virtual Fabric vNIC mode was the first virtualization option available from Emulex and IBM. It allows the Emulex converged NIC to be seen by operating systems as four NIC ports per physical port, or three NIC ports and one HBA per physical port. There are topology constraints in Virtual Fabric vNIC mode which are largely relaxed in the newer UFP virtualization mode which is recommended for new implementations. UFP is discussed in section 5.3, “UFP mode virtual NIC with vLAG and FCoE” on page 135. 5.4.4 Configurations The following configuration options are covered: “Physical NIC mode” “Use of vLAG with failover” on page 153 “Physical NIC mode with FCoE storage” on page 154 “Virtual Fabric vNIC mode” on page 160 “Virtual Fabric vNIC mode with FCoE” on page 162 Physical NIC mode The failover function on the EN4093R switches can be configured on static or dynamic (LACP) aggregations. If it is desired to use auto monitoring (amon) then a single port can be configured as an aggregation and then configured into a failover trigger. The configuration would be done as follows, assuming that the uplink ports to be monitored are EXT5 and EXT7. (The upstream switch would have to configure LACP on the corresponding ports.) With this configuration, when both EXT5 and EXT7 fail, internally facing ports with the same VLANs will be administratively brought down. The limit option shown can be used to cause the internal ports to be brought down when either EXT5 or EXT7 fails - that is, when there are one or fewer ports active. The commands are shown in Example 5-18. Example 5-18 Failover configuration - pNIC mode - Auto monitor interface port EXT5,EXT7 lacp key 5757 lacp mode active failover enable failover trigger 1 amon admin-key 5757 failover trigger 1 enable failover trigger 1 limit 1 (optional) It is sometimes desirable to configure failover with more flexibility than the amon option provides. This can be done with manual configuration, also known as mmon. A configuration to do the same failover as is shown above using manual monitoring is shown below. Note that the controlled ports are explicitly specified, and can be a subset of the internal facing ports, or can include external ports such as when a server is connected to them. In Example 5-19, only ports INTA1 and INTA2 are to be disabled in the event of an uplink failure.
  • 164. Chapter 5. Flex System NIC virtulization deployment scenarios 153 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Example 5-19 Failover configuration - pNIC mode - Manual monitor interface port EXT5,EXT7 lacp key 5757 lacp mode active failover enable failover trigger 1 mmon monitor admin-key 5757 failover trigger 1 mmon control member INTA1,INTA2 failover trigger 1 enable failover trigger 1 limit 1 (optional) Manual monitor failover can also be configured to monitor individual ports with the following command syntax: failover trigger 1 mmon monitor member EXT5,EXT7. Multiple triggers can be configured but a given resource - one or more ports - can only be controlled by one trigger at a time. A given trigger instance number can be either in amon or mmon mode. Example 5-20 shows manual monitoring of a static Port Channel. Example 5-20 Failover configuration - manual with static PortChannel portchannel 10 port EXT5,EXT7 portchannel 10 enable failover enable failover trigger 2 mmon monitor portchannel 10 failover trigger 2 mmon control member INTA1,INTA2 failover trigger 2 enable failover trigger 2 limit 1 (optional) Use of vLAG with failover The vLAG feature allows a port aggregation to be connected from a switch, including an EN4093 switch, to a pair of upstream switches which are connected and configured appropriately. This function is supported for both static and dynamic link aggregations. Since the failover feature is intended for failures where a server NIC is connected to a switch which has no uplink path, it is less useful when vLAG is used between a pair of 4093’s. This is because if the uplink from a 4093 fails in such a topology, traffic will cross the inter-switch link (ISL) configured as part of vLAG and use the uplink from the other 4093. If both 4093’s uplink ports fail at the same time, then there is no uplink path available from the chassis, and the failover feature will not help. However, failover can be configured to bring down an internal port when both the uplinks and the ISL ports fail (which is likely to be a very rare event); this is shown in Example 5-21. Example 5-21 Failover configuration when vLAG is in use !*** Uplink ports *** int port EXT5,EXT7 lacp key 5757 lacp mode active ! *** ISL ports *** int port ext9,ext10 lacp key 910
  • 165. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 154 NIC Virtualization in IBM Flex System Fabric Solutions lacp mode active !*** vLAG configuration *** vlag enable vlag tier-id 20 vlag isl adminkey 910 !vlag hlthchk ... typically uses EXTM port and interface 127 on embedded switches vlag adminkey 5757 enable failover enable failover trigger 3 mmon monitor admin-key 5757 failover trigger 3 mmon monitor admin-key 910 failover trigger 3 mmon control member INTA1,INTA2.... failover trigger 3 enable Physical NIC mode with FCoE storage Physical NIC mode with storage is not very different from pNIC with no storage; the difference is that there is a dedicated VLAN for the storage traffic which must be carried to a Fibre-Channel Forwarder (FCF), which is where FC and Ethernet addressing is correlated and where FCoE traffic can be converted to standard FC traffic if the topology calls for this. Failover is configured in the same way with FCoE in use as it is without it. Uplink and downlink (server-facing) ports should be configured to carry the FCoE VLAN and the cee enable and fips enable command need to be part of the configuration. On a CN4093 or G8264CS, additional configuration is necessary to configure the Omniports and the FCF function; this is discussed under “FCoE configuration” on page 143. Design choices For pNIC mode (or vNIC) with storage, it is generally suggested that the two HBA ports and the associated switches use different FCoE VLANs, and if vLAG is in use in such a topology, then the FCoE VLANs should not cross the ISL between the vLAG partner switches. This works well with the typical SAN design where redundancy is provided by having two distinct SAN networks (SAN-A, SAN-B) which can both reach the physical storage but which share few or no components between the servers and the storage. It is possible to either send FCoE traffic on the same uplinks as data traffic, or to use separate uplinks for the different types of traffic. In the tested scenario, storage and data traffic were both forwarded to the same upstream switches, but this is not required. Even when the traffic is sent to the same upstream switches, the option to segregate the two types of traffic is available. Topologies which show this and relevant parts of the switch configurations are shown in Figure 5-12 on page 155 and Figure 5-13 on page 156. In the configuration examples shown in Example 5-22 on page 156 and Example 5-23 on page 158, VLANs 1001 and 1002 (on the second EN4093) are used to carry FCoE traffic and VLANs 1 and 2 are carrying data traffic. The traffic could be segregated by changing the configuration in the following ways: On the EN4093 switches: – Breaking the aggregation between links EXT5 and EXT7 which uses LACP key 5757 – Assigning VLAN 1001 (or 1002) to EXT5 and VLAN 1 and 2 to EXT7 (or vice-versa). On the G8264CS switches: – Breaking the aggregation between links 42 and 52 which uses LACP key 4252 – Assigning the VLANs to the links to match what was done on the EN4093s
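As a concrete illustration of the segregated option on the first EN4093, the VLAN membership might be adjusted as sketched below, assuming EXT5 is dedicated to the FCoE VLAN and EXT7 to the data VLANs (the reverse assignment is equally valid). The LACP key that binds EXT5 and EXT7 into one aggregation would also be removed, and the second EN4093 would use VLAN 1002 in place of 1001. This sketch is not the tested configuration shown in Example 5-22.
! Data VLAN leaves only through EXT7 (plus the vLAG ISL on EXT9-EXT10)
vlan 2
 enable
 name "VLAN 2"
 member INTA1,EXT7,EXT9-EXT10
!
! FCoE VLAN leaves only through EXT5 (the SAN-B switch would use VLAN 1002)
vlan 1001
 enable
 name "FCoE SAN-A"
 member INTA1,EXT5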
  • 166. Chapter 5. Flex System NIC virtulization deployment scenarios 155 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Figure 5-12 pNIC with FCoE: single shared uplink aggregation
  • 167. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 156 NIC Virtualization in IBM Flex System Fabric Solutions Figure 5-13 pNIC + FCoE - with FCoE traffic on segregated uplink Example 5-22 EN4093 config excerpts for vLAG topology with pNIC and FCoE version "7.7.9" switch-type "IBM Flex System Fabric EN4093R 10Gb Scalable Switch(Upgrade1)" ... interface port INTA1 tagging no flowcontrol exit ... ! interface port EXT5 tagging exit ... ! interface port EXT7 tagging exit ...
  • 168. Chapter 5. Flex System NIC virtulization deployment scenarios 157 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm interface port EXT9 tagging pvid 4090 exit ! interface port EXT10 tagging pvid 4090 exit ! vlan 2 enable name "VLAN 2" member INTA1,EXT5,EXT7,EXT9-EXT10 ... ! Note: SAN-B will use VLAN 1002 here vlan 1001 enable name "FCoE SAN-A" member INTA1,EXT5,EXT7 ! vlan 4090 enable name "ISL" member EXT9-EXT10 ! ! portchannel 10 port EXT9 portchannel 10 port EXT10 portchannel 10 enable ! ! ! interface port EXT5 no spanning-tree stp 112 enable exit ! interface port EXT7 no spanning-tree stp 112 enable exit ... ! interface port EXT5 lacp mode active lacp key 5757 ! ... ! interface port EXT7 lacp mode active lacp key 5757 ! vlag enable vlag tier-id 20 vlag isl portchannel 10
  • 169. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 158 NIC Virtualization in IBM Flex System Fabric Solutions vlag hlthchk peer-ip 1.1.1.22 vlag adminkey 5757 enable no fcoe fips automatic-vlan ! fcoe fips enable cee enable ! interface ip 127 ip address 1.1.1.11 255.255.255.0 enable exit Example 5-23 G8264CS config for vLAG topology with pNIC and FCoE version "7.8.1" switch-type "IBM Networking Operating System RackSwitch G8264CS" ... system port 53,54 type fc interface fc 53 switchport trunk allowed vlan 1,1001 interface fc 54 switchport trunk allowed vlan 1,1001 ! ... interface port 17 description "ISL" switchport mode trunk switchport trunk allowed vlan 1-2,10,4090 switchport trunk native vlan 4090 exit ! interface port 18 description "ISL" switchport mode trunk switchport trunk allowed vlan 1-2,10,4090 switchport trunk native vlan 4090 exit ... interface port 42 description "4093 downlink" switchport mode trunk switchport trunk allowed vlan 1-2,1001 exit ! interface port 52 description "4093 downlink" switchport mode trunk switchport trunk allowed vlan 1-2,1001 exit ! vlan 2 name "VLAN 2"
  • 170. Chapter 5. Flex System NIC virtulization deployment scenarios 159 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm ! ! ! note that SAN-B (8264-2) will use vlan 1002 here and in the "allowed vlan" statements vlan 1001 name "FCoE SAN-A" fcf enable ! vlan 4090 name "ISL" ... ! interface port 17 lacp mode active lacp key 1718 ! interface port 18 lacp mode active lacp key 1718 ! interface port 42 lacp mode active lacp key 4252 ! interface port 52 lacp mode active lacp key 4252 ! ! ! vlag enable vlag tier-id 10 vlag hlthchk peer-ip 9.42.171.24 vlag isl adminkey 1718 vlag adminkey 4252 enable ! fcoe fips enable cee enable! ! zone default-zone permit ! ! ! ! ! interface ip 128 ip address 9.42.171.23 255.255.254.0 enable exit ! ip gateway 4 address 9.42.170.1 ip gateway 4 enable
  • 171. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 160 NIC Virtualization in IBM Flex System Fabric Solutions Virtual Fabric vNIC mode Virtual Fabric vNIC (or vNIC1) mode is the first NIC virtualization mode developed for use with Emulex adapters on IBM servers. It has largely been supplanted by UFP mode, which is more versatile. However, Virtual Fabric vNIC has its own failover configuration commands which are part of the vNIC group configuration. vNICs, vNIC groups, and uplinks An overall discussion of the available options for vNIC and their initial configuration can be found starting in Section 4.1, “Enabling virtual NICs on the server via UEFI” on page 66. Virtual Fabric vNIC mode introduces the following concepts: vNIC - an instance of a virtualized NIC which is associated with a specific physical port and which appears as a NIC or as an HBA as seen by a server’s OS or hypervisor vNIC group - a set of vNIC’s which are used together and which are each associated with a different physical port vNIC group uplink - a single port or a static or LACP port aggregation associated with a vNIC group vNIC group VLAN - a VLAN used for tunneling traffic from the vNICs and any non-virtualized internal ports associated the group through the group’s uplink to the wider network. Configuration of the Virtual Fabric vNIC feature is done according to the following requirements: A physical port can have up to four vNICs activated. No more than one can be for FCoE traffic and it will always be vNIC instance 2. Bandwidth of vNIC’s is specified in 100 Mb increments; each increment is also one percent of the bandwidth of a 10 Gb port. Minimum bandwidth is 1 Gb which is specified as 10 in the configuration. Each data vNIC must be associated with a vNIC group. FCoE vNIC instances can not be associated with a vNIC group. A vNIC group can have a single logical uplink, as discussed above. If there is no requirement for traffic from the group to be forwarded outside of the chassis, then an uplink is not needed. Each vNIC group must be configured with a vNIC VLAN. This VLAN is never seen outside of the embedded switch in the chassis, and is used as an outer tag for 802-1q double-tagging by the switching ASIC. vNIC group VLAN numbers are not strictly required to be unique within the network, but making them unique may avoid confusion when troubleshooting. Note: The command “zone default-zone permit” allows any server to access any storage where the LUN is made accessible. However, the default zoning configuration when FCF mode is used on a G8264CS or a CN4093 is to deny all access. Therefore either explicit zoning or the default-zone option is necessary. The status of zoning can be seen with the show zone command on converged switches. This does not apply when NPV mode is used; in that case, zoning is configured on an upstream SAN switch.
  • 172. Chapter 5. Flex System NIC virtulization deployment scenarios 161 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm NIC teaming configuration on servers NIC teaming is a feature included in current operating systems which allows multiple physical or virtual NICs to be treated as a single logical interface. Teaming can be active/active or active/standby, and the capabilities of the various teaming modes differ across the various operating systems. A discussion of teaming features and their configuration can be found in section 4.3.2, “OS side teaming/bonding and upstream network requirements” on page 112. vNIC sample failover configuration A sample failover configuration is shown in Example 5-24, including the associated vNIC and vNIC group configuration commands. In this configuration, ports EXT5 and EXT7 are uplink ports. Only one server (in slot 1 and reached via port INTA1) is shown; the configuration would be similar for other servers but the bandwidth allocations need not be identical. This configuration fragment would typically be used identically in each of a pair of 4093 switches in a chassis, especially when failover is used. Example 5-24 vNIC configuration with failover configured as part of the vNIC group vnic enable vnic port INTA1 index 1 enable bandwidth 40 vnic port INTA1 index 2 enable bandwidth 30 vnic port INTA1 index 3 enable bandwidth 20 vnic port INTA1 index 4 enable bandwidth 10 vnic vnicgroup 1 vlan 3001 member INTA1.1 (additional server vnics can go here) port ext5 failover enable vnic vnicgroup 2 vlan 3002 member INTA1.2 (additional server vnics can go here) failover enable (vnic groups 3 and 4 would be configured similarly and would need additional uplink ports to carry traffic outside the chassis)
  • 173. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 162 NIC Virtualization in IBM Flex System Fabric Solutions The failover command in the above example is used instead of the failover configuration shown elsewhere in this section when Virtual Fabric vNIC is used. vNIC failover would function as follows for vNIC groups where it is configured: The uplink port - which can also be a static portchannel or LACP portchannel specified by an adminkey - is monitored. If the uplink for a vnic group fails or is blocked due to spanning tree, then the vnic members of the group would be administratively brought down. If the other switch in the chassis is configured with the same vnic and vnic group configuration, and if the corresponding uplink in that switch is up, and if NIC teaming is configured appropriately on the servers, then traffic will use the path through the other switch. Options which are available in the standard failover trigger configuration, such as the limit option, VLAN sensitivity, and the manual monitoring options are not available in the vNIC failover feature. However, UFP uses standard failover triggers. vLAG can not be used with Virtual Fabric vNIC mode. vNIC failover and shared uplink mode Shared uplink mode with Virtual Fabric vNIC allows multiple vnic groups to share an uplink port. This mode is enabled with the vnic uplink-share command, and by specifying the uplink port (or aggregation) in those vNIC groups where it is desired. The vnic failover command is specified in the same way when shared uplink mode is in use. Shared uplink mode, like dedicated uplink mode, does not allow multiple uplinks to be specified in a given vnic group. A fuller discussion of shared uplink mode and a comparison with the default dedicated uplink mode can be found in section 3.1.1, “Virtual Fabric mode vNIC” on page 49. vLAG considerations vLAG cannot be used on ports or vNIC instances which are members of a vNIC group. A vNIC group can have only one uplink, and so it would not be possible to configure both an uplink and an ISL to connect to a vLAG peer switch. A pair of upstream switches such as the G8264s used in our testing can run vLAG between them and connect to the uplink PortChannels of a pair of vNIC groups on different switches such as EN4093’s. The EN4093’s cannot detect that vLAG is in use at the other end of their uplinks. For this to work, the servers supported by the vNIC groups must configure the same VLANS on corresponding vNIC’s connecting to each physical port. Virtual Fabric vNIC mode with FCoE FCoE traffic is configured in Virtual Fabric vNIC mode as follows: • FCoE traffic, if enabled, is always on vNIC instance 2. • When instance 2 is used for FCoE, it is not included in any vNIC group. • Since the FCoE instance is not configured in a vNIC group, failover for FCoE traffic is not configured with the vnic group failover option. • FCoE traffic does not flow over an uplink configured for a vnic group. It can flow over an uplink in shared uplink mode. • The standard failover trigger commands can be used to implement failover for FCoE traffic if desired, but if this is done the entire server-facing port will be brought down, not only the FCoE vNIC.
An example config of Virtual Fabric vNIC with FCoE is shown in Example 5-25. In this example, port EXT7 is used to carry FCoE traffic upstream to the 8264CS switch where the FCF is. Example 5-25 Virtual Fabric vNIC with FCoE vnic enable vnic port INTA1 index 1 enable bandwidth 40 vnic port INTA1 index 2 enable bandwidth 30 vnic port INTA1 index 3 enable bandwidth 20 vnic port INTA1 index 4 enable bandwidth 10 vnic vnicgroup 1 vlan 3001 member INTA1.1 (additional server vnics can go here) port ext5 failover enable .... the FCoE vnic cannot be added to a vNIC group .... additional groups for data vNICs would be configured here failover trigger 3 mmon monitor member EXT7 failover trigger 3 mmon control member INTA1[,INTA2 ... etc.] failover trigger 3 enable ... configuration for FCoE and for FCoE uplink to G8264CS.... cee enable fcoe fips enable int port ext7 vlan 1002 member ext7 The above configuration implements failover for both the data and FCoE vNIC instances, but it behaves in the following ways: If port EXT5 fails, vNIC INTA1.1 and others configured in vnic group 1 (which would be on other servers) would be administratively down. The same would happen if an uplink port configured in vnic groups 3 or 4 should fail; the vNICs associated with those groups would be disabled. If the FCoE uplink, port EXT7, fails, then port INTA1 and other ports specified in the failover trigger would be administratively down. This would include all of the vNIC instances configured on those ports, even though they might still have a working path to the upstream network.
  • 175. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 164 NIC Virtualization in IBM Flex System Fabric Solutions Because a failure on the FCoE uplink port would bring down all of the vNIC instances rather than just the FCoE instance on vNIC 2, this configuration might not be desirable. Our testing on ESX showed that FCoE has failover mechanisms of its own on the server. If the HBA ports are configured so that both of them have access to the storage LUNs, and one of them loses connectivity to the storage, such as due to an uplink failure, storage access will fail over to the other HBA. The tests performed showed that there might be a slight advantage in how long it takes to detect that storage connectivity is lost if the server-facing port (e.g. INTA1) is brought down, but it did not appear to be a significant advantage. A diagram of the topology in dedicated uplink mode is shown in Figure 5-14. Figure 5-14 vNIC with FCoE: dedicated uplink mode vNIC with FCoE and shared uplink mode The configuration above would be changed in the following ways to use shared-uplink vNIC: On the EN4093’s – The vnic uplink-share command would be used to enable shared uplink mode – The VLAN for vnic group 1 would be set to VLAN 2. All vnic instances which are assigned to group 1 would only carry VLAN 2. – Ports EXT5 and EXT7 could optionally be aggregated together.
  • 176. Chapter 5. Flex System NIC virtulization deployment scenarios 165 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm – The ports used to uplink vnic group 1 could also carry traffic from other vnic groups, on their group VLANs. – The uplink ports or aggregations for group 1 must be configured to include the FCoE VLAN, 1001 or 1002. On the G8264CS’s: – The port or aggregation used to downlink to the EN4093’s must match its aggregation type and status and its VLAN membership, including VLAN 1001 or 1002. A topology diagram with shared uplink mode is shown in Figure 5-15. Figure 5-15 vNIC with FCoE: shared uplink mode Design Choices The choice to use shared uplink mode or dedicated uplink mode is similar to the choice between a single uplink and uplinks which segregate data and FCoE traffic discussed in the section on pNIC mode. Shared uplink mode allows data and FCoE traffic to traverse the same uplink, shared uplink mode restricts each data bearing vNIC connected to a server to carry only a single VLAN. UFP allows either shared uplinks or distinct uplinks without these restrictions.
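Pulling the shared-uplink changes listed above together, the resulting configuration on the first EN4093 might look like the following sketch. It is not the configuration that was tested: only one data vNIC is shown, the bandwidth value and LACP key are assumptions, and the second EN4093 would use FCoE VLAN 1002 instead of 1001.
vnic enable
vnic uplink-share
!
vnic port INTA1 index 1
 bandwidth 25
 enable
!
! In shared uplink mode, vNIC group 1 carries only VLAN 2 and uses the shared
! LACP aggregation (key 5757 on EXT5,EXT7) as its uplink
vnic vnicgroup 1
 vlan 2
 member INTA1.1
 key 5757
 failover
 enable
!
! The shared uplink ports must also carry the FCoE VLAN
vlan 1001
 enable
 member EXT5,EXT7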
  • 177. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 166 NIC Virtualization in IBM Flex System Fabric Solutions 5.4.5 Verifying operation This section discusses commands that help verify correct operations. Failover in pNIC mode The failover trigger commands can be checked using the show failover command, as shown in Example 5-26 and Example 5-27. Example 5-26 Show Failover command output - Manual Monitor slot-1#sho failover trigger 1 Current Trigger 1 setting: enabled limit 1 Auto Monitor settings: Manual Monitor settings: LACP port adminkey 7575 Manual Control settings: ports INTA1 INTA2 Example 5-27 Show Failover command output - Auto Monitor slot-2#show failover trigger 1 Current Trigger 1 setting: enabled limit 1 Auto Monitor settings: LACP port adminkey 5757 Manual Monitor settings: Manual Control settings: When a failover occurs, the following messages are seen. Note that in this case, FCoE was part of the configuration and the FCoE session failure also resulted in a message shown in Example 5-28: Example 5-28 Messages resulting from a failover event slot-2(config)#int port ext7 slot-2(config-if)#shut slot-2(config-if)# Apr 15 16:02:26 slot-2 NOTICE link: link down on port EXT7 Apr 15 16:02:26 slot-2 NOTICE lacp: LACP is down on port EXT7 Apr 15 16:02:26 slot-2 WARNING failover: Trigger 1 is down, control ports are auto disabled. Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA1 Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA3 Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA4 Apr 15 16:02:26 slot-2 NOTICE server: link down on port INTA10 Apr 15 16:02:45 slot-2 NOTICE fcoe: FCOE connection between VN_PORT 0e:fc:00:01:0c:00 and FCF a8:97:dc:44:eb:c3 is down.
  • 178. Chapter 5. Flex System NIC virtulization deployment scenarios 167 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm When internal (or other) ports are down due to a failover, they appear as disabled in a show interface link command, as shown in Figure 5-16. Figure 5-16 Links disabled after failover When a failed link recovers, messages such as the following shown in Figure 5-17 are seen. Figure 5-17 Messages resulting from failover recovery slot-2(config-if)#sho int link ------------------------------------------------------------------ Alias Port Speed Duplex Flow Ctrl Link Name ------- ---- ----- -------- --TX-----RX-- ------ ------ INTA1 1 1G/10G full no no disabled INTA1 INTA2 2 1G/10G full no no disabled INTA2 ..... INTA14 14 1G/10G full no no disabled INTA14 slot-2(config-if)#int port ext7 slot-2(config-if)#no shut slot-2(config-if)# Apr 15 16:07:35 slot-2 NOTICE link: link up on port EXT7 Apr 15 16:07:35 slot-2 NOTICE dcbx: Detected DCBX peer on port EXT7 Apr 15 16:07:39 slot-2 NOTICE lacp: LACP is up on port EXT7 Apr 15 16:07:39 slot-2 NOTICE failover: Trigger 1 is up, control ports are auto controlled. Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA1 Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA3 Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA4 Apr 15 16:07:39 slot-2 NOTICE server: link up on port INTA10 Apr 15 16:07:42 slot-2 NOTICE fcoe: FCOE connection between VN_PORT 0e:fc:00:01:0c:00 and FCF a8:97:dc:44:eb:c3 has been established.
  • 179. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 168 NIC Virtualization in IBM Flex System Fabric Solutions The status of a disabled port can also be seen on the server as both a NIC and HBA. Port vmnic0 is still active and carrying traffic, and has paths to the storage array, as shown in Figure 5-18 and Figure 5-19 Figure 5-18 VMware display showing port down Figure 5-19 VMware storage adapter showing no paths to storage Failover in vNIC mode The FCoE vNIC instance (INTA1.2) still requires a dedicated uplink unless shared-uplink mode is used. The failover status of a non-FCoE vNIC is shown in the show vnic vnicgroup command. However, there is no console message that shows that the associated vnic(s) have been brought down; this can also be seen by entering the same command, as shown in Figure 5-20 on page 169.
  • 180. Chapter 5. Flex System NIC virtulization deployment scenarios 169 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Figure 5-20 Show vNIC vnicgroup before failover As shown in Figure 5-21, there is no message showing that INTA1.1 has been brought down. Figure 5-21 Messages resulting from shutting down vnic group’s uplink ports slot-2#sho vnic vnicg 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 3901 Failover : enabled vNIC Link ---------- --------- INTA1.1 up Port Link ---------- --------- UplinkPort Link ---------- --------- EXT5* up * = The uplink port has LACP admin key 555 slot-2(config)#int port ext5,ext6 slot-2(config-if)#shut slot-2(config-if)# Apr 15 18:51:55 slot-2 NOTICE link: link down on port EXT5 Apr 15 18:51:55 slot-2 NOTICE lacp: LACP is down on port EXT5
  • 181. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 170 NIC Virtualization in IBM Flex System Fabric Solutions The command output, however, does show that the uplink port, EXT5, is down and that the associated vNIC members of the group have been disabled, as shown in Figure 5-22. Figure 5-22 show vnic vnicgroup after failover slot-1(config-if)#sho vnic vnicg 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 3901 Failover : enabled vNIC Link ---------- --------- INTA1.1 disabled Port Link ---------- --------- UplinkPort Link ---------- --------- EXT5* down * = The uplink port has LACP admin key 555
  • 182. Chapter 5. Flex System NIC virtulization deployment scenarios 171 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm The FCoE vnic instance can not be configured into a vNIC group and is managed by the failover trigger commands. In the configuration shown in Figure 5-23, the uplink for FCoE traffic is on port EXT7 and FCoE uses VLAN 1001. vNIC and pNIC modes are very similar in this regard. Figure 5-23 Failover configuration for FCoE vnic When EXT7 is brought down, INTA1 (and other ports if so configured) are brought down as shown in the messages. This brings down all the vNIC instances associated with INTA1, so INTA1.1 is down and it is shown in Figure 5-24 on page 172 as down rather than disabled as is the case above. Since the uplinks associated with vnic group 1 are still up, the remaining vNIC instances still have a viable path to the network. slot-1#sho run | section failover failover enable failover trigger 1 mmon monitor member EXT7 failover trigger 1 mmon control member INTA1 failover trigger 1 enable vnic enable vnic uplink-share vnic port INTA1 index 1 bandwidth 25 enable exit ! vnic port INTA1 index 2 bandwidth 25 enable exit ! vnic port INTA1 index 3 bandwidth 25 enable exit ! vnic port INTA1 index 4 bandwidth 25 enable exit ! vnic vnicgroup 1 vlan 2 enable failover member INTA1.1 key 555 exit
  • 183. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 172 NIC Virtualization in IBM Flex System Fabric Solutions Figure 5-24 Failover message flow from FCoE uplink failure slot-1#sho vnic vnicgroup 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 2 Failover : enabled vNIC Link ---------- --------- INTA1.1 up Port Link ---------- --------- UplinkPort Link ---------- --------- EXT5* up EXT6* up * = The uplink port has LACP admin key 555 slot-1#config t Enter configuration commands, one per line. End with Ctrl/Z. slot-1(config)#int port ext7 slot-1(config-if)#shut Apr 15 20:27:04 slot-1 NOTICE link: link down on port EXT7 Apr 15 20:27:04 slot-1 WARNING failover: Trigger 1 is down, control ports are auto disabled. Apr 15 20:27:04 slot-1 NOTICE server: link down on port INTA1 Apr 15 20:27:43 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c3 has been removed because it had timed out. Apr 15 20:27:43 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c4 has been removed because it had timed out. slot-1#sho vnic vnicg 1 (after EXT7 shut down) ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 2 Failover : enabled vNIC Link ---------- --------- INTA1.1 down UplinkPort Link ---------- --------- EXT5* up EXT6* up * = The uplink port has LACP admin key 555
  • 184. Chapter 5. Flex System NIC virtulization deployment scenarios 173 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Failover with FCoE and shared-uplink vNIC Failover in this mode is similar to the previous scenarios presented; the difference is that FCoE traffic and other data traffic share the same uplink(s). It is still appropriate to use both the failover trigger command and the vnic group failover option. The failover trigger can be used to bring down those internal facing ports that depend specifically on the uplink while the vnic group failover will bring down vnic’s (and not entire internal ports) which depend on the uplink. As in previous cases, FCoE and the HBA drivers that support it have their own failover capabilities on the servers so that if one HBA fails, the surviving HBA can continue to provide storage access if properly configured to do so. From the testing performed on VMware, this failover happens quickly. The messages that result from an uplink failure in this scenario are similar to those in the non-shared scenario presented above but they are shown in Figure 5-25. Figure 5-25 Message flow from uplink failure - shared uplink mode with FCoE slot-1#config t Enter configuration commands, one per line. End with Ctrl/Z. slot-1(config)#int port ext5 slot-1(config-if)#shut Apr 16 12:52:30 slot-1 NOTICE link: link down on port EXT5 Apr 16 12:52:30 slot-1 WARNING failover: Trigger 1 is down, control ports are auto disabled. Apr 16 12:52:30 slot-1 NOTICE lacp: LACP is down on port EXT5 Apr 16 12:52:30 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c4 has been removed because trunk configuration on the fcf changed. Apr 16 12:52:30 slot-1 NOTICE fcoe: FCF a8:97:dc:0f:ed:c3 has been removed because trunk configuration on the fcf changed. Apr 16 12:52:30 slot-1 NOTICE server: link down on port INTA1 Apr 16 12:52:31 slot-1 NOTICE dcbx: Feature "VNIC" not supported by peer on port INTA2 Apr 16 12:52:31 slot-1 NOTICE dcbx: Feature "VNIC" not supported by peer on port INTA10 sho vnic vnicgroup 1 ------------------------------------------------------------------------ vNIC Group 1: enabled ------------------------------------------------------------------------ VLAN : 2 Failover : enabled vNIC Link ---------- --------- INTA1.1 disabled UplinkPort Link ---------- --------- EXT5* down * = The uplink port has LACP admin key 555
5.5 Switch Independent mode with SPAR This section shows deployment examples using vNIC Switch Independent mode with SPAR pass-thru mode. The combination of these features - Switch Independent mode on the Emulex adapter and SPAR pass-thru mode on the embedded switches (EN4093R, CN4093, SI4093) - allows for minimal configuration effort on the embedded switches. Little to no embedded switch configuration effort is required when a new VLAN or new compute node is added to a Flex chassis in this scenario. 5.5.1 Components The following hardware and software were used in the examples in this chapter. Flex System Enterprise Chassis x240 Compute Node in bay 1 – Running ESX 5.1 – Dual port Emulex LOM CNA – DS4800 external storage attached via FC ports on G8264CS switches Two EN4093s in I/O module bays 1 and 2 Two G8264CS switches to act as upstream Ethernet connectivity out of the vLAG pair of EN4093s – Provide FCF function and physical connectivity to DS4800 on Fibre Channel port 53 5.5.2 Topology Figure 5-26 on page 175 and Figure 5-27 on page 176 describe topologies that are used with SPAR.
Figure 5-26 Topology with SPAR passthru mode
Figure 5-27 Local SPAR domain with Switch Independent vNIC and FCoE

5.5.3 Use cases
The following configuration scenarios are discussed:
"SPAR Local and Passthru mode"
"vNIC Switch Independent Mode" on page 177
"vNIC Switch Independent Mode with SPAR Passthru mode" on page 177
"vLAG topology considerations" on page 178

SPAR Local and Passthru mode
SPAR (Switch Partition) is an option on the EN4093R, CN4093, and SI4093 IBM embedded switches. The implementation of SPAR on the SI4093 is different from that on the other switches and is not dealt with in this book. SPAR allows the switches listed above to logically partition their available ports into multiple domains. In other words, there are multiple segments of the data plane of the switch that do not communicate with each other (unless via an external device).

SPAR pass-thru mode is an option that uses 802.1Q-in-Q double tagging to allow customer VLANs to pass through a SPAR instance on a switch without any explicit configuration.
This allows new VLANs to be added without any additional configuration on the embedded switch. It is possible to use the same VLAN number in multiple domains, but devices on a given VLAN in SPAR 1 will not be able to communicate with devices on that same VLAN in a different SPAR domain unless the domains are interconnected elsewhere in the network.

SPAR local domain mode provides the logical partitioning mentioned above, but does not tunnel customer VLANs through the switch. Instead, each VLAN that is to be used in a domain must be explicitly configured in that domain. However, it is still possible to define the same VLAN number in multiple different domains; a device connected to a given VLAN (for example, 10) in SPAR 1 will not be able to communicate with a device on that same VLAN in SPAR 2 or SPAR 3 within the switch.

vNIC Switch Independent Mode
Switch Independent Mode is an option on the Emulex adapters, including the LOM included on several of the available Flex compute nodes and the EN4054 mezzanine card. This feature allows the Emulex chip to present up to four vNIC instances to a server based on configuration options in the server's UEFI rather than those learned from an IBM switch. This mode can therefore be used with a variety of embedded I/O modules, including the 4091 Pass-thru module, the SI4093 System Interconnect, and I/O modules from companies other than IBM. The testing that is outlined in this section was all done with IBM embedded switches, but the commands for vNIC and UFP functionality are not used.

In Switch Independent Mode, each vNIC associated with a port is assigned a default VLAN in UEFI, referred to as an LPVID (Local Port VLAN ID). Untagged traffic originating from the server on a vNIC will be tagged by the Emulex adapter with the configured LPVID VLAN. One consequence of this is that all server traffic entering the embedded switch from a server using this mode will be tagged.

vNIC Switch Independent Mode with SPAR Passthru mode
Using these features together allows new VLANs to be created and used on servers (including guest OS's running under a hypervisor) without being configured on the embedded switches at all. How VLANs are created on the servers is covered in more detail in section 4.3, "Utilizing physical and virtual NICs in the OSes" on page 105. Here are some considerations regarding the creation of tagged VLANs for different operating systems:

Windows Server 2012 - has network configuration tools that allow the creation of tagged VLANs. When this is done, an additional item is created in the Network Connections folder. The default (untagged) Network Connection would use the LPVID for the associated vNIC. Other versions of Windows would need to use the Emulex utility that provides the ability to create tagged VLANs.

VMware - port groups that are attached to a vSwitch can have a specific VLAN associated with them; these VLANs are transmitted with tags. A port group configured with no VLAN (VLAN 0) will use the LPVID for the associated vNIC. VMware also allows a port group to be associated with VLAN 4095; when this is done, VLAN tagging is delegated to the OS's of the guest systems.

Linux - the vconfig command can create tagged VLAN interfaces attached to a specific NIC (or vNIC) as seen by the Linux OS.
These interfaces default to names of the form eth<x>.<vlan#>; for example, eth0.10. The ifconfig command can be used to set the attributes of these interfaces once they are created. Various Linux distributions also have graphical tools which provide the same capabilities.
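As an illustration only (the interface name, VLAN number, and IP address here are examples and not taken from the test environment), a tagged interface for VLAN 10 on eth0 could be created as follows; newer distributions replace vconfig and ifconfig with the iproute2 ip command:

# Legacy tools, as described above (VLAN 10 on eth0 creates eth0.10)
vconfig add eth0 10
ifconfig eth0.10 192.168.10.21 netmask 255.255.255.0 up

# Equivalent configuration with the iproute2 tools on newer distributions
ip link add link eth0 name eth0.10 type vlan id 10
ip addr add 192.168.10.21/24 dev eth0.10
ip link set dev eth0.10 up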
vLAG topology considerations
vLAG cannot be used in concert with SPAR. Since each SPAR domain can only have one uplink, it would not be possible to successfully configure links to upstream switches and also an ISL to a vLAG peer switch. Therefore, vLAG cannot be used on uplink or downlink (server-facing) ports that are included in a SPAR domain.

A switch running SPAR could be an access switch using a PortChannel to connect to two upstream vLAG switches if desired. The SPAR domains would have to use the same VLANs, whether or not they were explicitly configured (passthru vs. local mode), and could include an FCoE VLAN in either case. A topology such as this would be more robust than one which did not include the use of vLAG.

5.5.4 Configuration
This section describes the following configuration steps:
"vNIC Switch Independent Mode"
"Switch side configuration - FCoE" on page 181
"SPAR (Switch Partition) configuration" on page 182

vNIC Switch Independent Mode
Server side - UEFI configuration
This topic is covered in detail in section 4.1.3, "Special settings for the different modes of virtual NIC via UEFI" on page 76. For the examples in this section, the configuration on each port of the Emulex card is as follows, and is shown in Figure 5-28 on page 179, Figure 5-29 on page 180, and Figure 5-30 on page 180:
vnic instance 1 - LPVID 3001, min. bandwidth 10%, max bandwidth 100%
vnic instance 2 - FCoE vNIC, no LPVID, min. bandwidth 40%, max bandwidth 100%
vnic instance 3 - LPVID 3003, min. bandwidth 20%, max bandwidth 100%
vnic instance 4 - LPVID 3004, min. bandwidth 30%, max bandwidth 100%
Figure 5-28 UEFI Configuration for Switch Independent Mode
Figure 5-29 UEFI Configuration - Bandwidth for Switch Independent Mode
Figure 5-30 Configuration display with Bandwidth and LPVID (2 of 4 vNICs shown)
Server Side - Operating System Configuration (VMware)
The host was configured with a port group for VLAN 2 and an additional port group to test guest tagging, assigned to VLAN 4095. Guests can be moved from one port group to another via the settings menu. For a deeper discussion of networking configuration on VMware and other operating systems, see section 4.3, "Utilizing physical and virtual NICs in the OSes" on page 105.

Figure 5-31 VMware network configuration with two port groups

Switch side configuration - FCoE
There is a group of commands required to enable FCoE on an embedded switch with SPAR and Switch Independent mode. The requirements differ depending on whether the switch is an FCoE transit switch, such as the EN4093R used in testing for this chapter, or a converged switch, such as the CN4093 used in testing for UFP. The transit switch requirements are below; for a discussion of the configuration of the CN4093, see section 5.3, "UFP mode virtual NIC with vLAG and FCoE" on page 135.

To configure an EN4093R as an FCoE transit switch, the requirements are as follows:
Enable lossless Ethernet (or Converged Enhanced Ethernet) functionality with the cee enable command.
Enable FIP snooping with the fcoe fips enable command. This allows the switch to become aware of FCoE initialization traffic and be ready to carry FCoE traffic.
Define the VLAN(s) which will carry FCoE traffic and ensure that the appropriate server-facing ports and uplink ports are members of those VLAN(s).
– FCoE VLANs should not be the native VLAN on server-facing ports. If vLAG is used, in general the vLAG ISL should not carry the FCoE VLANs.
– It is common, but not required, to use two distinct VLANs for FCoE. This is usually done when connecting to a redundant storage networking environment. In such an environment, there are two SAN fabrics, usually referred to as SAN-A and SAN-B. Each of the fabrics would connect to its own FCoE VLAN. Typically, two FCoE transit switches in a Flex chassis would each use a different VLAN for FCoE.
An example of the configuration commands required for FCoE transit is shown in Example 5-29. It uses VLAN 1002 for FCoE traffic, which is the default:

Example 5-29 FCoE Transit Configuration
cee enable
fcoe fips enable
vlan 1002
 enable
 member INTA1-INTA14,EXT5,EXT7

There are no changes to the configuration above if Switch Independent mode is used. The differences when SPAR passthru mode or SPAR local mode are used are shown in the remainder of this section.

SPAR (Switch Partition) configuration
SPAR configuration is performed exclusively on switches; the servers are unaware of it. In SPAR local mode, the VLANs configured on the server must be explicitly configured on the switches, but this is also true when the SPAR feature is not used.

SPAR pass-through mode - Switch side
For the examples in this section, the configuration is as follows:
SPAR 2 has at least the necessary ports (INTA1, EXT5, and EXT7) configured as members of the SPAR domain. (Additional internal ports are added to the domain but were not used in testing.) The two uplink ports are aggregated together using LACP key 5757.
The VLAN associated with SPAR 2 is 3992; note that this is an outer-tag or tunnel VLAN which never leaves the embedded switch on either server-facing or external-facing ports.
The remaining internal and external ports on the embedded switches were not configured in a SPAR domain and continue to be configured and to operate normally.
The VLANs configured on the VMware server flow through the SPAR domain as a tunnel and do not appear in its configuration. Those VLANs, along with the FCoE VLAN, are configured on the upstream 8264's.
When FCoE is used with SPAR passthru mode, the only command that is used is the cee enable command. FIP snooping is performed on the switch upstream from the one where SPAR is used, which in our testing would be one of the upstream 8264CS switches.

Example 5-30 shows SPAR configuration for the pass-thru mode.

Example 5-30 SPAR Pass-through Mode - Switch Configuration for Embedded 4093 switches
spar 2
 uplink adminkey 5757
 domain default vlan 3992
 domain default member INTA1,INTA12-INTA14
 enable
 exit
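Note that the spar 2 uplink adminkey 5757 statement assumes that the uplink ports already share LACP admin key 5757. A minimal sketch of that port-level aggregation, assuming EXT5 and EXT7 are the SPAR uplinks as in the topology shown earlier, would look similar to the following; the exact syntax can vary by Networking OS release:

! Place the SPAR uplink ports into a single LACP aggregation with admin key 5757
interface port EXT5,EXT7
 lacp mode active
 lacp key 5757
 exit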
  • 194. Chapter 5. Flex System NIC virtulization deployment scenarios 183 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm SPAR local mode - Switch side The same server configuration was used to test a local SPAR domain in concert with Switch Independent mode. The local SPAR domain was configured as follows: Ports INTA1 and EXT5 and 7 are included in the domain. The two external ports are configured to use LACP key 5757. The default VLAN for the domain is 3001, which matches the LPVID for vnic 1. Local VLANs 3002, 3003, and 3004 are defined in the domain and associated with the INTA1 and EXT6 ports. These VLANs would carry untagged traffic originating on the server and sent via the vNIC instances. Local VLAN 2 is also defined on the server; it is used to carry the traffic from the guest VM’s which are attached to the port groups discussed in 5.4, “pNIC and vNIC Virtual Fabric modes with Layer 2 Failover” on page 149. The intended FCoE VLAN(s), 1001 or 1002, also need to be configured here if they are to pass through the SPAR domain. When those VLANs are configured in the SPAR configuration, the usual commands to create the VLANs and assign their members are not used. Different server facing ports within the SPAR domain can have different VLAN membership by specifying the ports desired for a specific VLAN in the domain local <n> commands. This mirrors the ability to configure VLANs on a port with the usual switchport allowed vlan or VLAN member commands. There is only a single uplink per SPAR domain, which can be an individual port, a static portchannel, or a LACP portchannel. The uplink is always a member of all of the VLANs defined within the SPAR local domain. Example 5-31 shows SPAR configuration for the local mode. Example 5-31 SPAR Local Mode - Switch Configuration for Embedded 4093 switches slot-1#sho run | section spar spar 2 uplink adminkey 5757 domain mode local domain default vlan 3001 domain default member INTA1 domain local 1 vlan 3003 domain local 1 member INTA1 domain local 1 enable domain local 2 vlan 3004 domain local 2 member INTA1 domain local 2 enable domain local 3 vlan 1001 (1002 on second switch) domain local 3 member INTA1 domain local 3 enable domain local 4 vlan 2 domain local 4 member INTA1 domain local 4 enable Upstream G8264CS configuration for SPAR The G8264CS switches have no special configuration requirements when SPAR is used on the downstream EN4093 switches. VLANs used on the servers must be configured on the G8264 switches, whether they are configured on the Emulex UEFI, the server operating system, or learned by the servers as part of FCoE initialization. The configuration on the
G8264CS switches for their side of the uplinks from the EN4093s also must be configured to match the configuration specified on the EN4093 switches. If the upstream switches are to provide FCoE functions such as FCF or NPV, then those functions would be part of their configuration in the usual way.

5.5.5 Verifying operation
In summary, it is possible to do the following if desired:
Switch Independent mode with a SPAR passthru domain
Switch Independent mode with a SPAR local domain
It is also possible to use these features separately from each other if desired. Switch Independent mode allows servers to see more NIC interfaces than are physically available and to allocate their bandwidth (outbound only). SPAR provides a way to partition the switches on which it is available and tunnel VLANs through them with no additional configuration if passthru mode is used.

VLAN numbering considerations
There are several different categories of VLANs that need to have assigned numbers with these features, whether used separately or in concert. They are summarized below:

Data-bearing VLANs - these are the VLANs that are defined both on the compute node and in the upstream network and which actually carry data. They are typically assigned and managed by the networking team in a customer environment. They are configured on compute nodes in the Flex chassis and also on Top-of-Rack or other aggregation switches, which typically are immediately upstream of the embedded I/O modules.

Switch Independent Mode LPVIDs - these are the VLANs that are configured in the UEFI page for the Emulex adapter(s) on compute nodes. They are used as the VLANs for untagged traffic sent from a compute node on a vNIC instance, so they are similar to a native VLAN on a switch. LPVIDs can be actual data-bearing VLANs, which allows host or guest OS's to send untagged traffic. One common approach, however, is to use numbers for these VLANs that are unlikely to be used for data-bearing VLANs, such as numbers in the 4000 range, and to then always send tagged traffic from hypervisors or guest OS's. This approach allows VLAN assignments to be changed without the need to reboot the compute node and go through the UEFI configuration.

SPAR domain default VLANs - these are used for the outer tag when traffic passes through a SPAR passthru domain. They never leave the switch where they are configured. They can be assigned the same number as a data-bearing VLAN number or an LPVID number, although this may result in confusion when troubleshooting the environment.

VLANs used in SPAR local domains - if a SPAR local domain is used, then any data-bearing VLANs, including the LPVID VLANs and others defined on OS's, must be explicitly configured as the domain default VLAN or as local VLANs within the domain. Use of SPAR local domains therefore does not avoid the need to configure VLANs on the embedded I/O modules; avoiding that configuration is one of the key benefits of using a SPAR passthru domain.

Verifying Operations: SPAR Passthru Mode
The status of the SPAR is shown through the show spar command. To verify that traffic is flowing to the upstream switch, the show mac-address-table command is used on the downlink ports and/or the desired VLANs.
In our test bed, addresses from the VMware management network, the virtual guest machines, and FCoE appear on the SPAR VLAN on the embedded switch but on their proper VLANs on the upstream 8264 switch. If the MAC
addresses do not appear in both places, traffic is not flowing properly. The SPAR VLAN, 3992, is not seen at all on the upstream switch. The commands to verify SPAR operations and their output are listed in Example 5-32, Example 5-33, Example 5-34, and Example 5-35.

Example 5-32 Show SPAR command output
slot-1#sho spar ?
<1-8>   Show SPAR ID information
slot-1#sho spar 2
Current SPAR 2 Settings: enabled, name "SPAR 2"
Current SPAR 2 Uplink Settings: port 0, PortChannel 0, adminkey 5757
Current SPAR 2 Domain Settings: mode passthrough
Current SPAR 2 Default VLAN Domain Settings:
 sparvid 3992
 server port list: INTA1,INTA12-INTA14

Example 5-33 MAC address display on embedded switch
slot-1#sho mac int port inta1
MAC address       VLAN     Port    Trnk State Permanent Openflow
----------------- -------- ------- ---- ----- --------- --------
00:0c:29:4a:60:ae 3992     INTA1        FWD             N
00:0c:29:54:38:d8 3992     INTA1        FWD             N
0e:fc:00:01:0c:00 3992     INTA1        FWD             N
34:40:b5:be:8e:91 3992     INTA1        FWD             N

Example 5-34 MAC address display from 8264 switch - downlinks to 4093
8264cs-1#sho mac portchannel 67
MAC address       VLAN     Port    Trnk State Permanent
----------------- -------- ------- ---- ----- ---------
00:0c:29:4a:60:ae 2                67   TRK
00:0c:29:54:38:d8 1                67   TRK
0e:fc:00:01:0c:00 1001             67   TRK   P
34:40:b5:be:8e:91 1                67   TRK
34:40:b5:be:8e:91 1001             67   TRK

Example 5-35 SPAR VLAN on upstream 8264
8264cs-1#sho mac vlan 3992
No FDB entries for VLAN 3992.
8264cs-1#sho vlan 3992
VLAN Name                             Status Ports
---- -------------------------------- ------ -------------------------
VLAN 3992 doesn't exist.
8264cs-1#
Verifying Operations: SPAR Local Mode
SPAR local mode requires explicit VLAN configuration for every VLAN that will flow through the SPAR domain. These VLANs do appear in the MAC address table of the switch but, as shown in the configuration section above, they are configured using the domain local <n> vlan command rather than the usual VLAN membership commands. In addition to the steps shown in the section on verifying SPAR pass-through mode, the MAC address display on both the embedded and upstream switches should show all of the VLANs which are to be used. An example of a MAC display from a SPAR local server is shown in Figure 5-32. It includes the SPAR domain default VLAN, which is also the LPVID for vNIC 1, as well as addresses and VLANs used by FCoE.

Figure 5-32 MAC addresses for server in SPAR local domain
show mac int port inta1
MAC address       VLAN     Port    Trnk State Permanent Openflow
----------------- -------- ------- ---- ----- --------- --------
00:0c:29:4a:60:ae 2        INTA1        FWD             N
00:0c:29:54:38:ce 2        INTA1        FWD             N
0e:fc:00:01:0c:00 1001     INTA1        FWD   P         N
34:40:b5:be:8e:90 2        INTA1        FWD             N
34:40:b5:be:8e:91 1001     INTA1        FWD             N
34:40:b5:be:8e:91 3001     INTA1        FWD             N

The SPAR local VLANs would also need to be configured on the upstream switch(es). In this test case, they are the same as the vNIC LPVID VLANs. Unlike SPAR pass-through mode, FIP snooping is configured in a SPAR local domain, and the show fcoe commands do work and would need to be checked to verify proper operations, as shown in Figure 5-33.

Figure 5-33 FCoE information - 4093 switch - SPAR local domain mode
slot-1#sho fcoe fips fcoe
Total number of FCoE connections: 1
VN_PORT MAC       FCF MAC           Port    Vlan
------------------------------------------------------
0e:fc:00:01:0c:00 a8:97:dc:0f:ed:c3 INTA1   1001
slot-1#sho fcoe fips fcf
Total number of FCFs detected: 2
FCF MAC           Port    Vlan
-----------------------------------
a8:97:dc:0f:ed:c3 PCH65   1001
a8:97:dc:0f:ed:c4 PCH65   1001

Verifying Operations: Switch Independent Mode
The status of the network can be seen from the presence of MAC address entries in the embedded and upstream switches as well as from the tools included in the operating system.
  • 198. Chapter 5. Flex System NIC virtulization deployment scenarios 187 Draft Document for Review July 18, 2014 10:18 pm Deployment scenarios.fm Examples of the MAC displays can be seen in Figure 5-32 on page 186 and Figure 5-33 on page 186. The network adapter display from VMware is shown in Figure 5-34 and Figure 5-35. VLAN 2 is configured on multiple vSwitches and this works as intended, but uses different vNIC’s as seen by the OS. The active vNIC instances can be seen below followed by a display of all of the NIC’s known to the OS. The differing bandwidth configurations for the different vNICs on the two physical ports are reflected in the display below, except for the FCoE vNIC’s which do not appear in the network adapter display. Figure 5-34 VMware vSwitches with multiple vNIC instances Figure 5-35 VMware Network Adapter display showing all six vNIC’s Verifying Operations: Storage Access Because FCoE traffic, whichever VLAN it is using, is not detected as such on the embedded switches in this mode, the commands to display its status will not show anything when issued on the embedded 4093’s. To determine the status of FCoE, appropriate commands need to be issued on the upstream G8264 switch, as shown in Example 5-36 on page 188 and Example 5-37 on page 188.
  • 199. Deployment scenarios.fm Draft Document for Review July 18, 2014 10:18 pm 188 NIC Virtualization in IBM Flex System Fabric Solutions Example 5-36 FCoE query on embedded 4093 using SPAR Pass-through slot-1#sho fcoe fips fcoe FIP snooping is currently disabled. Example 5-37 FCoE query on upstream 8264 8264cs-1#sho fcoe fips fcoe Total number of FCoE connections: 1 VN_PORT MAC FCF MAC Port Vlan ------------------------------------------------------ 0e:fc:00:01:0c:00 a8:97:dc:0f:ed:c3 PCH67 1001 Access to network storage also needs to be verified from the servers accessing it. Three LUNs are shown as visible to the server (see Figure 5-36); when there is a configuration error or a failure on either adapter, the number of LUNs and paths drops to zero on that adapter. Figure 5-36 Storage Adapter status from VMware host
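The same check can also be made from the ESXi command line instead of the vSphere Client. As an illustration (the adapter names and output depend on the environment), commands such as the following list the storage adapters on the host and the paths available through each of them:

# List the storage adapters (the Emulex FCoE vNICs appear as vmhba adapters)
esxcli storage core adapter list
# List the paths seen by the host; each LUN should be reachable through both adapters
esxcli storage core path list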
  • 200. © Copyright IBM Corp. 2014. All rights reserved. 189 Draft Document for Review July 18, 2014 10:18 pm 8223abrv.fm 10GbE 10 Gigabit Ethernet ACLs access control lists AMON Auto Monitor BACS Broadcom Advanced Control Suite BASP Broadcom Advanced Server Program BE3 BladeEngine 3 BE3R BladeEngine 3R BNT Blade Network Technologies CEE Converged Enhanced Ethernet CIFS Common Internet File System CNAs converged network adapters CSE Consulting System Engineer DAC direct-attach cables DACs direct-attach cables DCB Data Center Bridging DCE Data Center Ethernet ECP Edge Control Protocol ETS Enhanced Transmission Selection EVB Edge Virtual Bridging FC Fibre Channel FCF Fibre Channel Forwarder FCoE Fibre Channel over Ethernet FIP FCoE Initialization Protocol FO Failover FoD Feature on Demand HBA host bus adapter HBAs host bus adapters IBM International Business Machines Corporation ISL inter-switch link ITSO International Technical Support Organization KVM Kernel-based Virtual Machine LACP Link Aggregation Control Protocol LAG Link Aggregation Group LANs local area networks LOM LAN on system board MAC Media access control MMON Manual Monitor MSTP Multiple STP Abbreviations and acronyms NAS network-attached storage NFS Network File System NIC Network Interface Card NPIV N_Port ID Virtualization NPV N_Port Virtualization NTP Network Time Protocol PDUs protocol data units PFA PCI Function Address PFC Priority-based Flow Control PIM Protocol Independent Multicast PVRST Per-VLAN Rapid STP RMON Remote Monitoring ROI return on investment RSCN Registered State Change Notification RSTP Rapid STP RoCE RDMA over Converged Ethernet SAN storage area network SANs storage area networks SAS serial-attached SCSI SLB Smart Load Balance SLP Service Location Protocol SNSC System Networking Switch Center SPAR Switch Partitioning SR SFP+ Transceiver SoL Supports Serial over LAN TLV Type-Length-Value TOE TCP offload Engine Tb terabit ToR Top of Rack UFP Unified Fabric Port UFPs Unified fabric ports VEB Virtual Ethernet Bridging VEPA Virtual Ethernet Port Aggregator VM virtual machine VMs virtual machines VSI Virtual Station Interface VSS Virtual Switch System iSCSI Internet Small Computer System Interface isCLI industry standard CLI
  • 201. 8223abrv.fm Draft Document for Review July 18, 2014 10:18 pm 190 NIC Virtualization in IBM Flex System Fabric Solutions pNIC Physical NIC mode sFTP Secure FTP vLAG virtual Link Aggregation vLAGs Virtual link aggregation groups vNIC virtual Network Interface Card vNICs Virtual NICs vPC virtual Port Channel vPort virtual port
  • 202. © Copyright IBM Corp. 2014. All rights reserved. 191 Draft Document for Review July 18, 2014 10:18 pm 8223bibl.fm Related publications The publications listed in this section are considered particularly suitable for a more detailed discussion of the topics covered in this book. IBM Redbooks The following IBM Redbooks publications provide additional information about the topic in this document. Note that some publications referenced in this list might be available in softcopy only. IBM Flex System Networking in an Enterprise Data Center, 2nd Edition, REDP-4834 IBM Flex System and PureFlex System Network Implementation, SG24-8089 Storage and Network Convergence Using FCoE and iSCSI, SG24-7986 Implementing Systems Management of IBM PureFlex System, SG24-8060 IBM PureFlex System and IBM Flex System Products and Technology, SG24-7984 IBM Flex System Interconnect Fabric Technical Overview and Planning Considerations, REDP-5106 You can search for, view, download or order these documents and other Redbooks, Redpapers, Web Docs, draft and additional materials, at the following website: ibm.com/redbooks Help from IBM IBM Support and downloads ibm.com/support IBM Global Services ibm.com/services
SG24-8223-00 ISBN

INTERNATIONAL TECHNICAL SUPPORT ORGANIZATION
BUILDING TECHNICAL INFORMATION BASED ON PRACTICAL EXPERIENCE
IBM Redbooks are developed by the IBM International Technical Support Organization. Experts from IBM, Customers and Partners from around the world create timely technical information based on realistic scenarios. Specific recommendations are provided to help you implement IT solutions more effectively in your environment. For more information: ibm.com/redbooks

NIC Virtualization in IBM Flex System Fabric Solutions
Introduces NIC virtualization concepts and technologies
Discusses UFP and vNIC deployment scenarios
Provides UFP and vNIC configuration examples

The deployment of server virtualization technologies in data centers requires significant efforts in providing sufficient network I/O bandwidth to satisfy the demand of virtualized applications and services. For example, every virtualized system can host several dozen applications and services. Each of these services requires certain bandwidth (or speed) to function properly. Furthermore, because of different network traffic patterns that are relevant to different service types, these traffic flows can interfere with each other. They can lead to serious network problems, including the inability of the service to perform its functions.

The NIC virtualization in IBM® Flex System Fabric solutions addresses these issues. The solutions are based on the IBM Flex System® Enterprise Chassis with a 10 Gbps Converged Enhanced Ethernet infrastructure. This infrastructure is built on IBM Flex System Fabric CN4093 and EN4093R 10 Gbps Ethernet switch modules, and IBM Flex System Fabric SI4093 Switch Interconnect modules in the chassis and the Emulex Virtual Fabric Adapters in each compute node.

This IBM Redbooks® publication introduces NIC virtualization concepts and technologies, discusses their deployment scenarios, and provides configuration examples that use IBM Networking OS technologies combined with the Emulex Virtual Fabric adapters.

This book is for IBM, IBM Business Partner and client networking professionals who want to learn how to implement NIC virtualization solutions and switch interconnect technologies on IBM Flex System by using the IBM Unified Fabric Port (UFP) mode, Switch Independent mode, and IBM Virtual Fabric mode.

Back cover