  1. 1. Networking for DevOps
  2. 2. Application Architect and Networking — Traditionally, the application architect's foray into networking dealt with solving the server I/O bottleneck and offloading the CPU. Virtualization did not change this focus: architects continued to attack the I/O bottleneck to minimize wasted CPU cycles. Technologies such as RSS, LSO and TSO were incorporated into intelligent NICs to load-balance traffic across multiple cores in the server and thereby avoid CPU starvation. A parallel focus, driven by the cost savings of converging storage and Ethernet traffic, was the converged NIC (CNIC), which carries both storage and Ethernet traffic on a single wire. PCIe innovations such as SR-IOV and MR-IOV were incorporated into CNICs; these I/O virtualization technologies enabled vNICs and VM-specific offload services such as hypervisor bypass. The scale of applications in the Web 1.0 world did not require application architects to focus on network topology, segmentation or control-plane protocols; a 3-tiered datacenter network was sufficient. [Diagram: networking focus of the application architect in Web 1.0 — intelligent NIC with TCP offload, converged wire, RSS/LSO and flow classification, plus VSM/N1Kv virtual switching.]
  3. 3. Why Care for Network Topology — Today, the network plays a critical role in distributed application execution. Two key service assurances, latency and bandwidth, are influenced by the network. • Today's programming frameworks widely use asynchronous I/O, latency shifting (caching) and message-based communication. These frameworks enable application logic and data to be distributed among tens of thousands of servers across multiple tiers; the nodes within a tier and across tiers communicate synchronously or asynchronously over a routed IP network. • A distributed application execution environment has to arbitrate the trade-offs between latency and bandwidth, both of which are greatly influenced by the underlying network topology and routing control plane. [Diagram: distributed-system latency hierarchy — local system: memory ~80 ns, disk ~10 ms; local rack: memory ~200 µs, disk ~28 ms; remote rack: memory ~500 µs, disk ~30 ms.]
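The latency hierarchy above can be turned into a small lookup that makes the slide's point concrete. This is an illustrative sketch: the tier names and the `access_cost` helper are my own, while the latency figures come from the slide.

```python
# Latency figures from the slide's distributed-system hierarchy, in seconds.
# Tier names are illustrative labels, not standard terminology.
LATENCY = {
    ("local", "mem"): 80e-9,         # 80 ns
    ("local", "disk"): 10e-3,        # 10 ms
    ("rack", "mem"): 200e-6,         # 200 us
    ("rack", "disk"): 28e-3,         # 28 ms
    ("remote_rack", "mem"): 500e-6,  # 500 us
    ("remote_rack", "disk"): 30e-3,  # 30 ms
}

def access_cost(location: str, medium: str, accesses: int) -> float:
    """Total time spent on `accesses` sequential reads at the given tier."""
    return LATENCY[(location, medium)] * accesses

# Remote-rack memory is still ~20x faster than a local disk read, which is
# why caching tiers favor remote DRAM over local spindles.
ratio = LATENCY[("local", "disk")] / LATENCY[("remote_rack", "mem")]
```

The ratio is what drives cache-tier design: crossing the network to another rack's DRAM beats touching any disk in the hierarchy.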
  4. 4. Network Topology – Graph Model — A network layout is a combination of a chosen topology (design decision) and a chosen technology (architecture decision). A graph is a concise and precise notation for describing a network topology. • Crossbar – Good for a small number of inputs/outputs. – Complexity is N², where N is the number of inputs/outputs; needing N² switches is a problem when N is large. • Fat-tree (Clos) – Can be non-blocking (1:1) or blocking (x:1). – Characterized as Clos(m, n, r). – Complexity: (2n + r) · rn switches. • Torus – A blocking network, but great at scale. – Optimized for data locality; good for growth and hybrid networks. – Complexity increases with switch port count, scaling as k / (2 · log_{k/2}(N)), where k = port count and N = number of servers. – High-port-count switches are better suited to Clos than to tori. • Direct and indirect topologies – Crossbar and fat-tree Clos are indirect networks, i.e. nodes are not part of the network topology; a torus is a direct network. [Diagram: an n × n crossbar; a three-stage fat-tree Clos with r ingress (n × m), m middle and r egress switches; a 2-D torus.]
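The switch-count trade-off between a crossbar and a Clos can be sketched numerically. A minimal sketch, assuming the standard three-stage Clos(m, n, r) definition (r ingress n × m switches, m middle r × r switches, r egress m × n switches) and Clos's classic strict-sense non-blocking condition m ≥ 2n − 1; the function names are my own.

```python
def crossbar_switches(n_ports: int) -> int:
    """An N x N crossbar needs N^2 crosspoint switches."""
    return n_ports ** 2

def clos_switches(m: int, n: int, r: int) -> int:
    """Switch-chip count for a three-stage Clos(m, n, r):
    r ingress (n x m), m middle (r x r), r egress (m x n)."""
    return 2 * r + m

def clos_nonblocking(m: int, n: int) -> bool:
    """Clos's strict-sense non-blocking condition: m >= 2n - 1."""
    return m >= 2 * n - 1
```

For example, 64 endpoints need a 64 × 64 crossbar (4096 crosspoints), while Clos(15, 8, 8) serves 64 ports with 31 switch chips and is strictly non-blocking, which is why folded-Clos designs dominate at scale.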
  5. 5. Characterizing Network Performance — Latency = SendingOverhead + T_LinkProp × (d + 1) + (T_r + T_s + T_a) × d + PacketSize/BW × (d + 1) + ReceivingOverhead, where d = number of hops (switches traversed), T_r = switch routing delay, T_a = switch arbitration delay, T_s = switch switching delay (pin to pin), and T_LinkProp = per-link propagation delay. Effective Bandwidth = min( N × BW_Ingress, s × N × BW_Ingress, r × (BW_Bisection / g), s × N × BW_Egress ), where s is the fraction of traffic that is accepted, r is the network efficiency, and g is the fraction of traffic that crosses the bisection. • Port buffers directly affect s; port buffers sized to the length of the link optimize s, and s = 1 can then be assumed. • g is directly correlated to the application traffic pattern; a well-distributed application will max out BW_Bisection. • r, the network efficiency, is a function of multiple factors, the most prominent being link and routing efficiency, i.e. the control plane. • Effective bandwidth is the bandwidth between user and application, i.e. north-south; bisection bandwidth is the minimum bandwidth between the two halves of the network, i.e. east-west. Network topology affects the hop count, i.e. the paths through the network, and therefore bisection bandwidth and latency; application traffic patterns drive the rest of the performance metrics.
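The two formulas above can be written directly as functions. A sketch following the slide's expressions; the parameter names are mine, and the second min-term is read as s × N × BW_Ingress (the slide's "s × N" with the ingress bandwidth made explicit).

```python
def network_latency(send_oh, recv_oh, t_link, t_r, t_s, t_a,
                    packet_size, bw, d):
    """Latency model from the slide: a path over d hops (switches)
    crosses d + 1 links, paying serialization on each link and
    routing + switching + arbitration delay at each switch."""
    return (send_oh
            + t_link * (d + 1)
            + (t_r + t_s + t_a) * d
            + packet_size / bw * (d + 1)
            + recv_oh)

def effective_bandwidth(n, bw_ingress, bw_egress, bw_bisection, s, r, g):
    """Effective bandwidth = min over injection, acceptance, bisection
    and ejection limits, per the slide's expression (s = accepted
    fraction, r = network efficiency, g = fraction crossing bisection)."""
    return min(n * bw_ingress,
               s * n * bw_ingress,
               r * bw_bisection / g,
               s * n * bw_egress)
```

With d = 0 and zero overheads the latency collapses to pure serialization (PacketSize/BW), a quick sanity check on the hop accounting.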
  6. 6. Traditional Datacenter Network — Traditionally, datacenter networks were optimized to remove bottlenecks in north-south traffic, i.e. to optimize effective bandwidth. However, that architecture is not suitable for a distributed application whose dominant flows traverse east-west. Main issues with this architecture: • The topology is a single-rooted tree with a single span/path between source and destination, which causes bisection bandwidth to be much lower than effective bandwidth, i.e. no multipathing. • Traffic among servers is 4x or higher compared to traffic in/out of the datacenter. • Not optimized for small flows; observed flows inside the datacenter are short, with 10-20 flows per server. • Adaptive routing is not fast enough, and optimization requires complex L2/L3 configuration. • The ratio of memory/disk-to-CPU bandwidth to server-to-server bandwidth is at an all-time high, which hurts distributed computing that relies on inter-server bandwidth. [Diagram: traditional datacenter with access, aggregation and core tiers — a single L2 domain below the L3 boundary.]
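The gap between host-facing and uplink bandwidth in such a tree is usually expressed as an oversubscription ratio. A minimal sketch with an illustrative ToR configuration (the port counts and speeds are assumed numbers, not from the slide):

```python
def oversubscription(downlink_count: int, downlink_gbps: float,
                     uplink_count: int, uplink_gbps: float) -> float:
    """Ratio of aggregate host-facing bandwidth to aggregate uplink
    bandwidth at a switch tier: 1.0 means non-blocking; higher values
    mean east-west flows contend for the uplinks."""
    return (downlink_count * downlink_gbps) / (uplink_count * uplink_gbps)

# Illustrative ToR: 40 x 10G server ports against 4 x 40G uplinks.
ratio = oversubscription(40, 10, 4, 40)  # 400G down vs 160G up
```

Multiplying the per-tier ratios up the tree gives the worst-case east-west contention, which is exactly what the fabric designs in the following slides try to drive toward 1:1.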
  7. 7. Changing Traffic Pattern in Datacenter — The ratio of north-south traffic coming into a web application to the traffic generated inside the datacenter to serve the incoming session is observed to be 1:80 and higher. [Diagram: a public-profile web app with a GUI layer, business-logic layer and session cache; north-south traffic arrives from users and an external ad server, while http-rpc/JMS calls fan out east-west to profile, messenger, groups, news and search services backed by replicated r/o and r/w databases, with an update server applying graph and profile updates to the core DB over JDBC.]
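The 1:80 ratio implies a simple capacity-planning rule of thumb. A sketch, with the fan-out factor parameterized since the slide stresses it is an observed value ("and higher") that should be measured per application:

```python
def east_west_traffic(north_south_gbps: float, fanout: float = 80.0) -> float:
    """East-west traffic generated inside the datacenter to serve a given
    amount of ingress traffic, using the slide's observed 1:80 ratio as
    the default fan-out (an assumption to be replaced by measurement)."""
    return north_south_gbps * fanout

# 1 Gb/s of user-facing ingress implies on the order of 80 Gb/s of
# internal service-to-service traffic.
internal = east_west_traffic(1.0)
```

This is why sizing the fabric from the internet-facing link alone underestimates the required bisection bandwidth by nearly two orders of magnitude.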
  8. 8. Datacenter Fabric — The industry took two approaches to scaling the datacenter network: overlays and interconnects. • Issues that overlays address: – Multi-tenant scalability – VM mobility – Virtual network scalability – VM placement – Virtual-to-physical and virtual-to-virtual communication scalability – Asymmetry of network innovation between the physical and virtual worlds. • What is not addressed by overlays: – A standard way to terminate a tunnel on the hypervisor and on the physical switch – Mapping between virtual and physical addresses (who fills that table at the border gateway?) – Network flooding (ARP and L2 multicast) – Topology-unaware and unoptimized forwarding – Compatibility with ECMP – Inter-datacenter traffic mobility – Traffic tromboning, due to the L2 focus of overlays – Future-proofing with SDN. Overlays should address the challenges presented by: a. Highly distributed virtual applications such as Hadoop/big data, where an application can span multiple physical and virtual switches; any overlay tunnel should support both virtual and physical endpoints. b. Sparse and intermittent connectivity of virtual machines; the access switch may drop in and out of participating in the virtual network. c. VM dynamism; VM creation, deletion and suspend/resume cycles present a challenge for the network. d. Working with existing physical switches without a software upgrade; only the first-hop switch that adds/removes packet markings should require a new purchase. e. Limiting failure domains to the tunnel endpoints. f. Defining multiple administrative domains.
  9. 9. Datacenter Overlay Landscape — There are multiple competing standards for overlays, i.e. using L3 network infrastructure to solve L2 scalability problems. • FabricPath (L2 adjacency) – Pros: vPC support; ECMP up to 256 ways; faster convergence; multiple L2 VLANs. Cons: no inter-DC; needs ASIC support; not VM-aware; no support for FCoE. • TRILL (L2) – Pros: unlimited ECMP; shortest-path delivery of unicast; fast convergence. Cons: no inter-DC; needs ASIC support; new OA&M tools required; not VM-aware. • Shortest Path Bridging, 802.1aq (L2) – Pros: supports existing Ethernet data-plane standards (.ah and .ad); unicast/multicast; faster convergence. Cons: 16-way ECMP only; limited market traction; not VM-aware. • VXLAN (L2) – Pros: MAC-in-UDP with a 24-bit VNI; scalable; enables virtual L2 segments. Cons: lacks an explicit control plane; requires IP multicast; needs ASIC support; virtual tunnel endpoints only. • NVGRE, from Microsoft (L2) – Pros: GRE tunnels; most ASICs already support GRE. Cons: does not leverage UDP, so the outer packet headers cannot be leveraged for load balancing. • OTV/LISP (L2) – Pros: datacenter interconnect. Cons: limited platform support. • VPN4DC (L3) – Pros: proposed by service providers. Cons: not much vendor support.
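The "MAC-in-UDP with 24-bit VNI" entry for VXLAN refers to the 8-byte VXLAN header that follows the outer UDP header (RFC 7348). A minimal sketch of packing just that header; the function name is mine, and real encapsulation would also build the outer Ethernet/IP/UDP headers and append the inner frame.

```python
import struct

def vxlan_header(vni: int) -> bytes:
    """Pack the 8-byte VXLAN header per RFC 7348: a flags byte with
    the I bit set (0x08, meaning the VNI field is valid), 24 reserved
    bits, the 24-bit VNI, and a final reserved byte."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    return struct.pack("!II", 0x08 << 24, vni << 8)

hdr = vxlan_header(5000)  # segment 5000 of up to ~16M virtual networks
```

The 24-bit VNI is the whole point: 2^24 ≈ 16M segments versus the 4094 usable IDs of an 802.1Q VLAN, which is the "scalable" pro in the table above.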
  10. 10. Datacenter Fabric – Programmatic View — The management plane offers DevOps the opportunity to influence the path of their application data over the network. It is also the plane used by cloud controllers to provision resources along that path. • Thus far, applications adapted to the network; with the new management plane, the network can adapt to the application. • Intelligence shifts to the edge of the network; applications can use APIs to probe the network and alter their consumption and constraints. • Policy definition points can analyze network data to create patterns that drive policy-creation tools, e.g. triangulating privacy zones, sampling at 100 Gb/s rates, etc. • The network comes under pressure to scale up/down to application needs; all the datacenter fabric technologies aim to enable this elasticity in the network. [Diagram: DevOps and cloud controllers driving compute, storage and network services through an OpenStack API, down to virtual and physical switches — network virtualization technologies such as FabricPath, TRILL, VXLAN, NVGRE and SPB play here.]
  11. 11. Virtual Networking — The industry has a few competing virtualization stacks; the components may differ, but the networking issues are similar for DevOps, who need to be aware of this embedded networking functionality. • Hypervisor – Implements the v-switch; examples of virtual switches include Cisco N1Kv and Open vSwitch. – Initiates vMotion, which requires L2 adjacency, i.e. within a VLAN. – Challenges in scaling L2 across datacenters (DCI). • Virtual switches – VLAN-capable; port groups are associated with VLANs. – The host processor does the packet processing. – Challenges include trunking of the links between switch and server, and mapping server VLANs (in the hypervisor) to physical-switch VLANs. – The size of the VLAN space is increasingly becoming an issue, being resolved through additional encapsulation of L2 frames (VXLAN, NVGRE, FabricPath, TRILL). • Virtual NICs – Increasingly intelligent, with hardware-assisted vNICs. – Offloading to assist with TCP latency. – Teaming to increase bandwidth into the server. – Multi-tenancy with FEX (adapter and VM). • Cloud orchestration directors – What changed is scalability and integration with external orchestration systems. – Distributed virtual switches (spanning servers) presented coordination challenges; the single control points are called directors. – Each hypervisor in a cluster continues to switch at L2 independently, i.e. the data paths are not centralized. [Diagram: virtual networking basics — VMs and virtual services (vFW, vSLB, vWAAS) on a virtual switch, connected through virtual servers and a management center to the physical network.]
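The "size of the VLAN space" issue above comes from the 802.1Q tag itself: the VLAN ID is only 12 bits wide. A minimal sketch packing the 4-byte tag (TPID 0x8100, then PCP, DEI and VID per IEEE 802.1Q); the function name is mine, and DEI is left at zero for simplicity.

```python
import struct

def dot1q_tag(vlan_id: int, pcp: int = 0) -> bytes:
    """Build the 4-byte 802.1Q tag: TPID 0x8100 followed by the TCI,
    which packs PCP (3 bits), DEI (1 bit, zero here) and the
    12-bit VLAN ID."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID is 12 bits (0-4095)")
    if not 0 <= pcp < 8:
        raise ValueError("PCP is 3 bits (0-7)")
    tci = (pcp << 13) | vlan_id
    return struct.pack("!HH", 0x8100, tci)
```

Twelve bits give at most 4094 usable segments per L2 domain, which is exactly the ceiling that the 24-bit VNIs of the encapsulation schemes listed above (VXLAN, NVGRE) are designed to lift.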
  12. 12. Software Defined Networking — SDN decouples the control plane from the data plane, on the yet-to-be-proven assumption that the economics of the two planes are distinct. • Directors – Directors for orchestration and topology need to scale; the topology graph must scale to MSDC-size datacenters. What is the storage model (asset inventory, configuration, etc.)? – No explicit DevOps support, i.e. no server and tooling for developers. • Controller – A centralized controller has yet to be proven for datacenter-class deployment; issues remain around scalability, redundancy, security, etc. – Theoretically good for large-scale tables, but does not solve per-device table overflow. – Programmability comes at the cost of configuration latency. • Physical network – The existing network, with support for OpenFlow. [Diagram: a host-based centralized controller sitting between the orchestration/topology directors and a physical network of switches, each with management, control and data planes (features and forwarding).] Note: a software-defined network is different from a software-driven network; the latter is applications using available APIs to provision network services for higher-level SLAs such as reservation, security, etc.
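The control/data-plane split can be sketched as a match/action flow table: the controller installs entries, the data plane only does lookups, and a table miss is punted back to the controller. This is an illustrative model in the spirit of OpenFlow, not the OpenFlow wire protocol; the class and field names are mine.

```python
class FlowTable:
    """Toy data-plane table: entries are (match, action) pairs kept in
    install order; the first matching entry wins."""

    def __init__(self):
        self.entries = []

    def install(self, match: dict, action: str) -> None:
        """Controller-side call: push a flow entry down to the switch."""
        self.entries.append((match, action))

    def forward(self, packet: dict) -> str:
        """Data-plane lookup: return the action of the first entry whose
        fields all match the packet; a miss goes to the controller."""
        for match, action in self.entries:
            if all(packet.get(k) == v for k, v in match.items()):
                return action
        return "send_to_controller"

table = FlowTable()
table.install({"dst_ip": "10.0.0.5"}, "output:port2")
```

The "per-device overflow" and "configuration latency" concerns in the table above map directly onto this model: `entries` lives in finite switch TCAM, and every `send_to_controller` miss pays a round trip to the control plane.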
  13. 13. Hyperscale Datacenter — HSDC addresses the scale-out networking requirements of very large datacenters with 100K+ hosts. Innovations are targeting four key areas: • Topology – To overcome the limitations of the traditional tree, folded-Clos-inspired topologies are used. – Some topologies use the ToR as the leaf node, while others, like BCube, use host-based software switches as leaves. • CPU vs. ASIC – Switch microarchitectures based on merchant silicon implement a Clos inside the switch; InfiniBand started this trend in the early 2000s. – MSDC is biased toward merchant silicon, even though no compelling feature has been identified. • Layer 2 vs. Layer 3 – FabricPath and TRILL scale the layer-2 network through additional encapsulation of the original MAC frame; other protocols prefer IP-in-IP to scale the network, e.g. Cisco's Vinci. • Multipath forwarding – ECMP's static hash-based load balancing has increased TCP-layer latency; new proposals to introduce dynamic traffic engineering are being discussed.
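The static hash-based ECMP mentioned above can be sketched in a few lines: hash the flow's 5-tuple and use it to pick one of the equal-cost paths. This is an illustrative model (real switches hash in hardware, not with MD5); the function name is mine.

```python
import hashlib

def ecmp_path(src_ip: str, dst_ip: str, src_port: int,
              dst_port: int, proto: str, n_paths: int) -> int:
    """Static ECMP: hash the 5-tuple and select one of n equal-cost
    paths. Every packet of a flow hashes to the same path (no
    reordering), but large 'elephant' flows can collide on one link,
    which is what motivates the dynamic traffic-engineering proposals."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    return int(hashlib.md5(key).hexdigest(), 16) % n_paths
```

Because the selection is oblivious to load, two long-lived flows that hash to the same index share one uplink while others sit idle; dynamic schemes instead re-place flows based on measured utilization.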