
Trill and Datacenter Alternatives



Rajesh Kumar Sundararajan, Assistant VP of Product Management at Aricent, gave a talk about TRILL and Datacenter technologies at the Interop Show in Las Vegas, May 2012.


  1. 1. TRILL & Datacenter technologies – their importance, and alternatives to datacenter network convergence
     Rajesh Kumar Sundararajan, Assistant VP Product Management, Aricent
     May 10, Interop Las Vegas, 2012
  2. 2. About Me
     ARICENT Group
       • Global innovation, technology, and services company
       • Focused exclusively on communications
       • Co-creating most innovative communications products and applications with customers
       • Complete lifecycle solutions for networks, management, applications
     Me
       • Rajesh Kumar Sundararajan
       • Assistant Vice President – Product Line Management
       • Ethernet, IP and Datacenter offerings
  3. 3. Agenda
       • Datacenter imperatives
       • Solutions proposed to datacenter imperatives
       • Categorization of the solutions
       • Technological overview of each of the solutions
       • TRILL and alternatives
       • Summary comparison
       • Conclusion
  4. 4. 3 Famous C’s of Datacenter Operations
     Operational trends in the Datacenter driven by Virtualization, Convergence, & Cloud Services
     COST
       • Increasing amount of physical space to accommodate new hardware
       • Additional CAPEX for hardware and software
       • Increased OPEX for staffing and power
     COMPLEXITY
       • Physical separation of users from applications
       • Latency-related concerns due to geographical distribution of applications
       • High performance demands of Cloud Services applications
     CAPACITY
       • Ever-increasing amounts of bandwidth demanded by consumers and enterprise applications
       • Increasing proliferation of video to deliver both consumer and business-related content
  5. 5. Virtualization, Convergence, & Cloud Services
     VIRTUALIZATION
       Improved efficiencies in the Datacenter from
       • Increased utilization of individual servers
       • Consolidation of servers and network ports
       • Simplified management and operations
       • Network virtualization (beyond storage and server virtualization)
     CONVERGENCE
       • Convergence of equipment and network architectures
       • Simplified design and increased flexibility of network architecture
       • LAN/SAN convergence enables the ubiquity and extends the reach of Ethernet in the Datacenter
     CLOUD SERVICES
       • Ability to push hardware (storage, server) and software (SaaS) to a 3rd party provider
       • Eliminates need to procure, install, update and upgrade hardware & software → resources can be obtained on as-needed basis
       • Drive performance / load across datacenters
       • Remote datacenter backup
  6. 6. Imperatives and Solutions
     Increase Price/Performance Ratio
       • 10Gb Ethernet support required for all new Datacenter Ethernet equipment
       • Migration to 40GbE/100GbE needs to be planned
       • Next gen products need to balance high performance with low CAPEX & OPEX
     Improve Energy Efficiency
       • Networking equipment needs to lower energy consumption costs
       • New server chipsets, architectures and software needed to improve overall energy efficiency
     Support Multiple Migration Options
       • Multiple migration options available for evolution to a converged network
       • Companies may start by migrating to converged network adaptors, then to top-of-rack switches, then to a converged core, or vice versa
       • Equipment vendors need to support different migration options in products
     Evolve Standards
       • DCBX, ETS, PFC, QCN
       • FCoE, FIP, FIP Snooping
       • OpenFlow, SDN, TRILL, SPB, MC-LAG
       • VxLAN, NVGRE
  7. 7. Supporting Technologies
     Goals: Fabric scalability & performance; Network Virtualization; Interconnecting datacenters; Uncomplicate Switching
       • SPB, VEPA, TRILL, OpenFlow, MC-LAG, SDN
     Endpoint Virtualization
       • NVGRE, VxLAN, NPIV, NPV
     Lossless Ethernet
       • QCN, DCBX, PFC, ETS
     Convergence
       • FCoE, FIP, FIP Snooping
  8. 8. Lossless Ethernet – Why existing QoS and Flow Control are not enough
     QoS techniques are primarily
       – Flow control – pause frames between switches to control sender’s rate
       – 802.1p and DSCP based queuing – flat or hierarchical queuing
       – Congestion avoidance methods – WRED, etc.
     Issues
       – Flow control – no distinction between different applications or frame priorities
       – 802.1p and DSCP based QoS methods
         • Need to differentiate classes of applications (LAN, SAN, IPC, Management)
         • Need to allocate deterministic bandwidth to classes of applications
       – Congestion avoidance methods
         • Rely on dropping frames at the switches
         • Source may continue to transmit at same rate
         • Fine for IP based applications which assume channel loss
         • Don’t work for storage applications which are loss intolerant
  9. 9. Lossless Ethernet
     Priority Flow Control (PFC)
       • Flow control – all traffic on port is affected: internal back pressure on every queue (Q1 – CoS 1 to Q4 – CoS 4) results in PAUSE frames for the whole port
       • Priority Flow Control – traffic for specific CoS on port is affected: internal back pressure on one queue (for example Q3 – CoS 3) results in PAUSE (CoS = 3) frames only
     DCBX
       • TLVs in LLDP messages
       • Advertise own capabilities: Priority groups = x; PFC = Yes, which priorities; Congestion notification = Yes
       • Peer replies Accept or No
       • Switches advertise and know capabilities to use on links
     Quantized Congestion Notification (QCN)
       • Congestion point (switch facing congestion on egress port) sends a congestion notification message
       • Reaction point (source / end-point) throttles Tx rate
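
The difference between port-level flow control and PFC on this slide can be sketched in a few lines of Python. This is only an illustrative model, not switch code: the `Port` class and its method names are hypothetical, and the only facts taken from outside the slide are that 802.1p defines eight priority values.

```python
from dataclasses import dataclass, field

NUM_PRIORITIES = 8  # 802.1p defines eight priority (CoS) values, 0-7


@dataclass
class Port:
    """Hypothetical egress port with one paused flag per CoS queue."""
    paused: list = field(default_factory=lambda: [False] * NUM_PRIORITIES)

    def receive_pause(self):
        # Classic flow control: a PAUSE frame stops every queue on the port.
        self.paused = [True] * NUM_PRIORITIES

    def receive_pfc(self, priorities):
        # Priority Flow Control: only the listed CoS queues are stopped.
        for priority in priorities:
            self.paused[priority] = True

    def can_send(self, cos):
        return not self.paused[cos]


def sendable(port):
    """Return the CoS values that may still transmit on this port."""
    return [cos for cos in range(NUM_PRIORITIES) if port.can_send(cos)]
```

With `receive_pfc([3])` only CoS 3 (for example the storage class) stops, while `receive_pause()` silences LAN, IPC and management traffic as well.
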
  10. 10. NPIV & NPV
     • NPIV = N_PortID_Virtualization – host based technology
     • NPV = N_Port_Virtualization – switch based technology
     • Technology for the storage side
     • Relevant to Datacenter Ethernet because of the virtualization capabilities and bearing on FCoE
     • NPIV – virtualization of storage device port to support VMs, multiple zones on same link
     • Requires support on storage endpoint and connected switch as well
     • NPV – endpoint is unaware; switch proxies for multiple endpoints (N_Ports) using a single NP_Port
     • Reduces number of switches required
     (Diagram: a storage node N_Port (N_PortID) attached to the F_Port of a Fibre Channel switch, with switches interconnected by E_Ports. With NPIV, one physical port carries several logical N_Ports, each with its own N_PortID. With NPV, a switch NP_Port presents multiple N_PortIDs to the upstream F_Port.)
  11. 11. FCoE (Fibre Channel over Ethernet)
     • Means to carry Fibre Channel frames within Ethernet frames
     • Interconnects Fibre Channel endpoints or switches across an Ethernet (DataCenter Bridged Ethernet) network
     • FC frames encapsulated in Ethernet header
     • New EtherType to transport FC frames
     • FCoE can be enabled on – FC endpoint devices / FC switches / Ethernet switches
     (Diagram: FC endpoints and FC switches attach over FC links to FCoE switches, which are interconnected by Ethernet switches. An FC frame on the FC link becomes an FCoE frame (Eth header + FC frame) across the Ethernet and is an FC frame again on the far FC link.)
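
As a rough illustration of "FC frames encapsulated in Ethernet header", the sketch below wraps an opaque FC frame in an Ethernet frame carrying the FCoE EtherType (0x8906). The field layout (a 14-byte FCoE header ending in a start-of-frame byte, and a 4-byte trailer starting with an end-of-frame byte) follows my reading of FC-BB-5 and should be checked against the standard; the default `sof` and `eof` code points, the helper names and the MAC addresses used with them are illustrative assumptions. VLAN tag and Ethernet FCS are left out.

```python
import struct

FCOE_ETHERTYPE = 0x8906  # EtherType assigned to FCoE data frames


def mac(text):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in text.split(":"))


def encapsulate_fcoe(dst_mac, src_mac, fc_frame, sof=0x2E, eof=0x42):
    """Wrap an FC frame in an Ethernet/FCoE frame (no VLAN tag, no FCS).

    sof/eof defaults are illustrative code points, not taken from the slides.
    """
    eth_header = mac(dst_mac) + mac(src_mac) + struct.pack("!H", FCOE_ETHERTYPE)
    fcoe_header = bytes(13) + bytes([sof])  # version 0 and reserved bits, then SOF
    trailer = bytes([eof]) + bytes(3)       # EOF, then reserved bytes
    return eth_header + fcoe_header + fc_frame + trailer


def decapsulate_fcoe(frame):
    """Return the FC frame carried inside a frame built by encapsulate_fcoe."""
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != FCOE_ETHERTYPE:
        raise ValueError("not an FCoE frame")
    return frame[14 + 14:-4]  # skip Ethernet + FCoE headers, drop the trailer
```

The FC frame itself is carried unchanged, which is why an FCoE switch can hand it back to a native FC link on the far side.
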
  12. 12. FIP (FCoE Initialization Protocol)
     • Protocol between FC devices built on assumption of direct connection
     • Traversing an Ethernet cloud requires additional procedures
     • Addressed by FIP
       – Device discovery
       – Initializing communication
       – Maintaining communication
     • FCoE
       – Control frames – FIP – uses different EtherType than FCoE
       – Data frames – FC frames encapsulated with FCoE EtherType
     (Diagram: two FCoE devices across an Ethernet network performing FCoE device discovery, initializing communication and maintaining communication.)
  13. 13. FIP Snooping
     • Protocol between FC devices built on assumption of direct connection
     • FC switch enforces many configurations, performs validations and access control on attached endpoints
     • Security concerns when this is exposed over non-secure Ethernet
     • Addressed by FIP Snooping
       – Done on transit switches carrying FCoE, on VLANs dedicated to FCoE
       – Switches install firewall filters to protect FCoE ports
       – Filters are based on inspection of the FLOGI procedure
       – Example: (a) deny Enodes using FC-Switch MAC address as source MAC; (b) ensure address assigned to Enode is used only for FCoE traffic
     (Diagram: FIP exchanged between two FCoE devices across Ethernet switches that perform FIP Snooping.)
  14. 14. Fabric Scalability and Performance – Why Spanning Tree (RSTP/MSTP/PVRST) is not enough
     Necessary fundamentals for FCoE to work
       • Multipath through the network
       • Lossless fabric
       • Rapid convergence in fabric
     Spanning tree (with variants like RSTP, MSTP, PVRST) is the universal way to provide redundancy and stability (loop avoidance) in Ethernet networks
     Spanning tree is a distance vector based protocol
       • Routing equivalent of spanning tree (distance vector based) = RIP
       • Limits the size of the network that it can handle
       • Much smaller network size than link state based protocols (OSPF / IS-IS)
     Datacenter networks have got much bigger (and are getting bigger still!)
     Spanning tree blocks redundant links / paths → inefficient capacity utilization
     Does not support multipath, which is important for SAN/LAN convergence
     The TRILL solution
       • Apply link state routing to bridging / Layer 2 Ethernet
       • Use technique like ECMP for alternate paths without blocking any links or paths
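
The idea of combining link state routing with ECMP can be sketched as follows. This is a toy model under stated assumptions, not TRILL itself: every link has unit cost, the link list stands in for a link-state database, and the function names and the CRC-based flow hash are hypothetical choices for illustration.

```python
from collections import deque
import zlib


def equal_cost_next_hops(links, src, dst):
    """Return every neighbor of src that lies on a shortest path to dst."""
    # Build adjacency from the link-state database (unit cost per link).
    adj = {}
    for a, b in links:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    # Breadth-first search from dst gives each node's hop distance to dst.
    dist = {dst: 0}
    queue = deque([dst])
    while queue:
        node = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    if src not in dist:
        return []  # dst is unreachable from src
    return sorted(n for n in adj[src] if dist.get(n) == dist[src] - 1)


def pick_next_hop(next_hops, flow_key):
    """Hash the flow so that one flow always uses the same path."""
    return next_hops[zlib.crc32(flow_key.encode()) % len(next_hops)]
```

In a two-spine fabric, a leaf gets both spines as next hops toward another leaf, so no link has to be blocked; spanning tree would leave one of those uplinks idle.
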
  15. 15. Fabric Scalability and Performance – TRILL (Transparent Interconnection of Lots of Links)
     Focus on problem of dense collection of interconnected clients and switches
     Attempt to:
       • Eliminate limitations of spanning tree centric solutions
       • Bring the benefits of routing technologies to the L2 network (without the need for IP/subnets, etc.)
     Objectives:
       • Zero configuration and zero assumptions
       • Forwarding loop mitigation
       • No changes to spanning tree protocols
     Key components:
       • R-bridges (Routing bridges)
       • Extensions to IS-IS
       • Apply link state routing to VLAN aware bridging problem
     (Diagram: RBridges run the TRILL control protocol (IS-IS extension). The ingress RBridge learns the source MAC, advertises the learnt MAC in a TRILL control frame, and adds a TRILL header to the MAC frame; the egress RBridge removes the TRILL header before delivery to a normal bridge or destination.)
  16. 16. Fabric Scalability and Performance – TRILL: Handling Multicast and Broadcast
     • Create distribution tree with selected root
     • Distribute from root to rest of tree
     • Multiple distribution trees for
       – Multipath and load distribution
       – Alternate paths and resilience
     • All RBridges pre-calculate and maintain the distribution trees
     • Algorithm specified for ensuring identical calculations at all RBridges
     • By default, distribution tree is shared across all VLANs and multicast groups
     • How an ingress node selects a specific tree (from multiple existing trees) is not specified
     • Ingress RBridge receiving multicast encapsulates in TRILL header and sends to root of tree and to downstream branches
     • Frame with TRILL header is distributed down branches of tree
     • RBridges at edges remove TRILL header and send to receivers
     • RBridges listen to IGMP messages
     • RBridges prune trees based on presence of multicast receivers
     • Information from IGMP messages propagated through the TRILL core to prune the distribution trees
     (Diagram: Distribution Tree 1 and Distribution Tree 2, each with its own root.)
  17. 17. Fabric Scalability and Performance – TRILL: Issues and Problems
     • Does not (yet) address different types of virtual networks (VxLAN, NVGRE…)
     • Provides for L2 multipathing (traffic within a VLAN) but L3 (routed traffic across VLANs) is unipath only
     • Initial scope of TRILL defined to address spanning tree limitations
     • IP may be an afterthought; only 1 default router with VRRP = unipath for L3
     • Result of above – forces datacenter operators to provision larger VLANs (more members per VLAN) – so, restricts segmentation using VLANs
     • Requires hardware replacement in switching infrastructure
     • Existing security processes have to be enhanced
       – Existing security processes rely on packet scanning and analysis
       – Encapsulation changes packet headers, existing tools must be modified / enhanced
     • Does not inherently address fault isolation
     • Does not inherently address QoS mapping between edge & core (example – congestion management requires congestion in network to be signaled to source)
     • Does not clearly address source specific multicast; so multicast based on groups only
  18. 18. Alternatives?
     • Network centric approaches
       – SPB
       – EVB – VEPA / VN-Tag
       – MC-LAG
       – OpenFlow / SDN
     • Endpoint / server centric approaches
       – VxLAN
       – NVGRE
  19. 19. Fabric Scalability and Performance – SPB (Shortest Path Bridging)
     • Key components
       – Extend IS-IS to compute paths between shortest path bridges
       – Encapsulate MAC frames in an additional header for transport between shortest path bridges
     • Variations in encapsulation
       – SPB – VID
       – SPB – MAC (reuse 802.1ah encapsulation)
     • Allows reuse of reliable Ethernet OAM technology (802.1ag, Y.1731)
     • Source MAC learning from SPB encapsulated frames at the edge SP-bridges
     (Diagram: SP-Bridges run the control protocol (IS-IS extension). A MAC frame from a normal bridge or destination gains an SPB header at the ingress SP-Bridge and loses it at the egress SP-Bridge; the edge SP-Bridges learn MACs.)
  20. 20. EVB (Edge Virtual Bridging)
     • Addresses interaction between virtual switching environments in a hypervisor and 1st layer of physical switching infrastructure
     • 2 different methods – VEPA (Virtual Ethernet Port Aggregator) & VN-Tag
     • Without VEPA – in virtualized environment, traffic between VMs is switched within the virtualizer
       – Key issues – monitoring of traffic, security policies, etc., between VMs is broken
     • With VEPA – all traffic is pushed out to the switch and then to the appropriate VM
       – Key issue – additional external link bandwidth requirement, additional latency
       – Switch must be prepared to do “hairpin turn”
       – Accomplished by software negotiation between switch and virtualizer
     (Diagram: VM-11 … VM-1n on a virtualizer. Without VEPA, a VEB / vSwitch in the virtualizer connects to the Ethernet switch; with VEPA, the virtualizer negotiates with the Ethernet switch.)
  21. 21. MC-LAG (Multi Chassis LAG)
     • Relies on fact that datacenter network is large but with predictable topology
     • Downstream node has multiple links to different upstream nodes
     • Links are link-aggregated (trunked) into single logical interface
     • Can be used in redundant or load-shared mode
     • Load-shared mode offers multipath
     • Resilience and multipath inherent
     • No hardware changes required
     • Accomplished with software upgrade
     • Switches must have protocol extension to coordinate LAG termination across switches
     • Does nothing about address reuse problem from endpoint virtualization
     (Diagram: typical datacenter network with CORE and AGGREGATION & ACCESS layers; LAGs from downstream nodes terminate on multiple upstream switches, which run a coordination protocol across switches.)
  22. 22. OpenFlow and Software Defined Networking (SDN)
     Paradigm shift to networking
     • SDN Controller – Focus on Service (service creation, flow management, first packet processing, route creation)
     • Flowvisor – responsible for network partitioning based on rules (e.g. bridge IDs, flow IDs, user credentials)
     • OpenFlow – Enabling Network Virtualization
     • Simplify the network (make it dumb?)
     • Move the intelligence outside
     (Diagram, source: ONF. The SDN Controller and Flowvisor reach the OpenFlow enabled switches over a secure connection.)
  23. 23. Fabric Scalability and Performance – OpenFlow and Software Defined Networking (SDN) paradigm
     SDN Controller – Focus on Service
       • Open platform for managing the traffic on “open flow” complying switches
       • Functions – network discovery, network service creation, provisioning, QoS, “flow” management, first packet handling
       • Interoperability with existing networking infrastructure – hybrid networks
       • Overlay networks, application aware routing, performance routing, extensions to existing network behavior
     OpenFlow – Enabling Network Virtualization
       • Light weight software (strong resemblance to client software)
       • Standard interface for access and provisioning
       • Secure access to controller
       • Push/pull support for statistics
       • Unknown flow packet trap and disposition through controller
       • Comply to OpenFlow specifications (current version 1.2)
       • Accessed and managed by multiple controllers
  24. 24. Fabric Scalability and Performance – OpenFlow/SDN based switches
     • OpenFlow enabled switches – 2 types
       – Hybrid (OpenFlow HAL and current network control plane)
       – Pure OpenFlow switches
     • Pure OpenFlow
       – Simpler, low in software content, lower cost
     • Primarily contains
       – SSL (for secure management)
       – Encap/decap
       – Hardware programming layer/driver
       – Event handler
     • OpenFlow switches receive instructions from service controllers
     • Architectural aspects
       – Resource partitioning
       – Packet flow aspects
     Flow table entry: Rule, Action, Stats
       • Rule fields: switch port, MAC src, MAC dst, Eth type, VLAN ID, IP src, IP dst, IP prot, TCP sport, TCP dport (+ mask)
       • Actions: 1. Forward packet to port(s); 2. Encapsulate and forward to controller; 3. Drop packet; 4. Send to normal processing pipeline
       • Stats: packet + byte counters
     (Diagram, hybrid switch software blocks: Management (CLI, SNMP, WEB); Secure Connection (SSL); Routing Block (protocols – RIP, OSPF, IS-IS, BGP, RTM, Multicast); Encap / Decap; Event Handler; DCBX for data centers – PFC, ETS, LLDP, Congestion Notification; IP forwarding; Chassis Management; Infrastructure software; Master Policy Engine; System Monitoring; Layer-2 Block – VLAN, STP, LACP, IGMP; QoS (hierarchical, multiple scheduling schemes) & ACL; HAL layer for OpenFlow.)
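
The Rule / Action / Stats structure on this slide can be modelled with a small lookup table. The sketch below is a simplified illustration, not the OpenFlow wire protocol or any controller API: the `FlowEntry` and `FlowTable` names, the dictionary-based packets and the action strings are all hypothetical, and masks are reduced to "field absent from the rule means wildcard".

```python
from dataclasses import dataclass


@dataclass
class FlowEntry:
    match: dict          # header field -> required value; absent field = wildcard
    action: str          # "forward:<port>", "controller", "drop" or "normal"
    packet_count: int = 0
    byte_count: int = 0


class FlowTable:
    """Hypothetical flow table: first matching rule wins."""

    def __init__(self):
        self.entries = []

    def add(self, match, action):
        self.entries.append(FlowEntry(match, action))

    def process(self, packet, length):
        """Return the action for a packet and update the rule's counters."""
        for entry in self.entries:
            if all(packet.get(k) == v for k, v in entry.match.items()):
                entry.packet_count += 1
                entry.byte_count += length
                return entry.action
        # Unknown flow: trap the packet to the controller for disposition.
        return "controller"
```

A packet that matches no rule is sent to the controller, which corresponds to the "first packet handling" role the slides assign to the SDN controller.
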
  25. 25. OpenFlow – key issues
     • Enormous amount of provisioning for rules in each switch
     • In today’s switches – must rely on setting up ACLs
     • Switches typically have low limits on number of ACLs that can be set up
     • Will need hardware upgrade to use switches with large amounts of ACLs
     • Ensuring consistency of ACLs across all switches in the network? – troubleshooting challenges
  26. 26. NVGRE and VXLAN – Background
     Challenges with Virtualization:
       • More VMs = more MAC addresses and more IP addresses
       • Multi-user datacenter + VMs = need to reuse MAC and IP addresses across users
       • Moving applications to cloud = avoid having to renumber all client applications = need to reuse MAC addresses, IP addresses and VLAN-IDs across users
       • More MAC addresses and IP addresses = larger table sizes in switches = larger network, more links and paths
     Necessity = Create a virtual network for each user
     Possible Solutions:
       • VLANs per user – limitations of VLAN-ID range
       • Provider bridging (Q-in-Q)
         – Limitations in number of users (limited by VLAN-ID range)
         – Proliferation of VM MAC addresses in switches in the network (requiring larger table sizes in switches)
         – Switches must support use of same MAC address in multiple VLANs (independent VLAN learning)
       • VXLAN, NVGRE – new methods
  27. 27. VxLAN (Virtual eXtensible LAN) – How it Works
     Setup: SERVER – A (ip-A, mac-A) runs a hypervisor hosting VM-11 … VM-1n (ip-11/mac-11 … ip-1n/mac-1n); SERVER – B (ip-B, mac-B) runs a hypervisor hosting VM-21 … VM-2n (ip-21/mac-21 … ip-2n/mac-2n).
     Without VxLAN – Ethernet frame from VM-11 to VM-21 (sent to mac-21, to ip-21):
       • Ethernet header: D-MAC = mac-21, S-MAC = mac-11
       • IP header: D-IP = ip-21, S-IP = ip-11
       • payload
     Using VxLAN (tunneling in UDP/IP) – Ethernet frame from VM-11 to VM-21 (sent to mac-B, to ip-B):
       • Outer Ethernet header: D-MAC = mac-B, S-MAC = mac-A
       • Outer IP header: D-IP = ip-B, S-IP = ip-A
       • VNI
       • Inner Ethernet header: D-MAC = mac-21, S-MAC = mac-11
       • Inner IP header: D-IP = ip-21, S-IP = ip-11
       • payload
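
The VNI that sits between the outer and inner headers is carried in an 8-byte VXLAN header: one flags byte with the "VNI valid" bit set, three reserved bytes, a 24-bit VNI and one more reserved byte. The sketch below packs and unpacks only that header; the outer Ethernet, IP and UDP headers are left out for brevity, and the function names are hypothetical.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # "I" flag: the VNI field carries a valid value


def vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI is a 24-bit value")
    # Word 1: flags byte + 24 reserved bits. Word 2: 24-bit VNI + 8 reserved bits.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni, inner_frame):
    """Prefix the inner Ethernet frame with a VXLAN header (outer headers omitted)."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload):
    """Return (vni, inner_frame) from a payload built by encapsulate."""
    flags_word, vni_word = struct.unpack("!II", payload[:8])
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return vni_word >> 8, payload[8:]
```

Because the VNI is 24 bits wide, it can distinguish far more virtual networks than the 12-bit VLAN-ID range that limits the per-user VLAN and Q-in-Q approaches.
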
  28. 28. VXLAN – Internals
     Setup: SERVER – A (ip-A, mac-A) hosts VM-11 … VM-1n; SERVER – B (ip-B, mac-B) hosts VM-21 … VM-2n; VMs belong to virtual networks VN-X (User X) and VN-Y (User Y).
     Self table (on Server A):
       VM-11    VNI-X
       VM-12    VNI-Y
       VM-13    VNI-Z
     Remote table (on Server A):
       ip-21    VNI-X    ip-B
       ip-22    VNI-X    ip-B
       ip-2n    VNI-Y    ip-B
     Remote table – learnt and aged continuously based on actual traffic
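
A remote table that is "learnt and aged continuously" could look roughly like the following. This is a sketch under stated assumptions: the `RemoteTable` class, the 300-second `max_age` default and the explicit `now` timestamps are all hypothetical, and the table is keyed by (VNI, inner IP) only because that is how the slide draws it.

```python
class RemoteTable:
    """Maps (VNI, inner address) to the IP of the remote server hosting that VM."""

    def __init__(self, max_age=300.0):
        self.max_age = max_age  # seconds before an idle entry expires (assumed value)
        self.entries = {}       # (vni, inner_addr) -> (remote_server_ip, last_seen)

    def learn(self, vni, inner_addr, remote_ip, now):
        # Called for each decapsulated packet: refresh or create the mapping.
        self.entries[(vni, inner_addr)] = (remote_ip, now)

    def lookup(self, vni, inner_addr, now):
        """Return the remote server IP, or None if unknown or aged out."""
        entry = self.entries.get((vni, inner_addr))
        if entry is None:
            return None
        remote_ip, last_seen = entry
        if now - last_seen > self.max_age:
            del self.entries[(vni, inner_addr)]
            return None
        return remote_ip
```

Keying on the VNI as well as the address is what lets two users reuse the same inner address: `("VNI-X", "ip-21")` and `("VNI-Y", "ip-21")` are separate entries.
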
  29. 29. NVGRE
     • NVGRE = Network Virtualization using Generic Routing Encapsulation
     • Transport Ethernet frames from VMs by tunneling in GRE (Generic Routing Encapsulation)
     • Tunneling involves GRE header + outer IP header + outer Ethernet header
     • Relies on existing standardized GRE protocol – avoids new protocol, new Assigned Number, etc.
     • Use of GRE (as opposed to UDP/IP) → loss of multipath capability
     VXLAN compared with NVGRE:
       • Identifier – VXLAN: VNI, VXLAN Network Identifier (or VXLAN Segment ID); NVGRE: TNI, Tenant Network Identifier
       • Overhead – VXLAN: VxLAN header + UDP header + IP header + Ethernet header = 8+8+40+16 = 72 bytes addition per Ethernet frame; NVGRE: GRE header + IP header + Ethernet header = 8+40+16 = 64 bytes addition per Ethernet frame
       • Endpoint – VXLAN: VTEP, VXLAN Tunnel End Point, originates or terminates VXLAN tunnels; NVGRE: NVGRE endpoint
       • Gateway – VXLAN: VXLAN Gateway, forwards traffic between VXLAN and non-VXLAN environments; NVGRE: NVGRE gateway
       • Protocol – VXLAN: new protocol; NVGRE: extends existing protocol for new usage
       • Multipath – VXLAN: multipath using different UDP ports; NVGRE: no multipath since GRE header is same
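
The overhead figures and the multipath remark on this slide can be worked through in a short sketch. The header sizes below are simply the slide's own numbers; the 1500-byte frame size, the function names, the CRC-based hash and the 49152–65535 source port range are illustrative assumptions.

```python
import zlib

# Per-frame additions, using the header sizes quoted on the slide (bytes).
VXLAN_OVERHEAD = {"VXLAN header": 8, "UDP header": 8, "IP header": 40, "Ethernet header": 16}
NVGRE_OVERHEAD = {"GRE header": 8, "IP header": 40, "Ethernet header": 16}


def overhead_bytes(parts):
    """Total bytes added to each tunneled Ethernet frame."""
    return sum(parts.values())


def overhead_percent(parts, frame_size):
    """Added bytes as a percentage of the original frame size."""
    return 100.0 * overhead_bytes(parts) / frame_size


def vxlan_source_port(inner_flow_key):
    """Derive the outer UDP source port from a hash of the inner flow.

    Different inner flows tend to get different outer ports, so ECMP
    hashing in the underlay can spread them over multiple paths.
    """
    return 49152 + zlib.crc32(inner_flow_key.encode()) % 16384
```

On a 1500-byte frame the slide's numbers give 72 / 1500 = 4.8% added bytes for VXLAN and 64 / 1500 ≈ 4.3% for NVGRE. The port function illustrates why VXLAN keeps multipath: the outer UDP source port varies per inner flow, whereas the slide notes the GRE header is the same for all flows.
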
  30. 30. Problems with NVGRE and VXLAN
     Configuration and Management:
       • Controlled multicast (with the use of, say, IGMPv3) within tenant network now gets broadcast to all endpoints in the tenant’s virtual network – since broadcast and multicast will get mapped to one multicast address for the entire VXLAN/VNI
       • Requires configuration of the virtual network mapping consistently on all the virtual machines – management nightmare without tools to debug and isolate misconfiguration
       • These may be just the tip of the iceberg – will we need virtualized DHCP, virtualized DNS, etc.?
     Security:
       • Existing security processes are broken
         – Existing security processes rely on packet scanning and analysis
         – Encapsulating MAC in IP changes packet headers, existing tools must be modified / enhanced
       • Starts to put more emphasis on firewalls and IPS in virtualizers – redoing Linux network stack!
     Partial Support:
       • Does not address QoS
         – Encapsulation / tunneling techniques like Provider Bridging or PBB clearly addressed QoS by mapping the internal “marking” to external “marking”
       • What is tunneled may be already tunneled – questions of backward compatibility with existing apps
     New Ecosystem:
       • Existing network analysis tools won’t work – partner ecosystem for technology has to be developed
       • Existing ACLs installed in network infrastructure are broken
       • Needs additional gateway to communicate outside the virtual network
  31. 31. TRILL vs. VXLAN (or) TRILL & VXLAN?
     TRILL
       • Addresses network; tries to optimize network
       • Technology to be implemented in network infrastructure (switches/routers)
       • Needs hardware replacement of switch infrastructure
       • Restricted to handling VLANs; no optimizations for VXLAN/NVGRE
       • No changes required in end-points
       • Need is for large datacenters (lots of links and switches)
     VxLAN
       • Addresses end points (like servers)
       • Technology to be implemented in virtualizers
       • Needs software upgrade in virtualizers (assuming virtualizer supplier supports this)
       • Agnostic about switches/routers between end-points (leaves them as good/bad as they are)
       • More computing power requirement from virtualizers (additional packet header handling, additional table maintenance, associated timers)
       • Need is primarily for multi tenant datacenters
     Need for both – depending on what requirement a datacenter addresses
  32. 32. TRILL & the other networking alternatives
     • TRILL – Multipath; requires h/w change; more complex switch; does not address network troubleshooting; lesser provisioning
     • SPB – Multipath; requires h/w change; more complex switch; tools available from Mac-in-Mac technology; lesser provisioning
     • VEPA – No / NA multipath; no h/w change; negligible additional complexity; existing tools continue to work; least provisioning
     • VN-Tag – No / NA multipath; h/w change at analyzer point; more complex endpoint; requires enhancement to existing tools at analyzer point; lesser provisioning
     • MC-LAG – Multipath; no h/w change; negligible additional complexity; existing tools continue to work; least provisioning
     • SDN / OpenFlow – Multipath; no h/w change (for small number of flows), requires h/w change for large numbers; simpler switch, complex controller; existing tools continue to work; high amount of provisioning
  33. 33. Aricent’s Datacenter Ethernet Software Frameworks
     • Aricent IPRs
     • Intelligent Switching Solution (ISS) with Datacenter Switching Extensions
     • Services
  34. 34. Feel free to meet me at booth #2626 or get in touch with me for any questions at: Rajesh Kumar Sundararajan:
  35. 35. Thank you.