Network Control and Management in the 100x100 Architecture: Presentation Transcript

  • 100x100 Project Themes (Holistic Design, Structure, Clean Slate)
    [Diagram: project theme map. Control/management: incentives vs. structure, information hiding, economics, time/space correlation, end-system/infrastructure coordination, exploiting structure to achieve efficiency, fundamental primitives, security, network-wide abstractions, explicitly modeled re-factoring of control/management. Access/backbone: optimized logical topology, 2-hop backbone routing paradigm, integrated L1-L3 approach. Wireless: aggregation trees, high-rate multi-hop aggregation. End-to-end: lossless E2E flow control.]
  • Broader Impacts
    • Clean slate design as a research trend
    • Control & Management
      • Research (Francis, Lucent, AT&T)
      • Experimental/practical (AT&T)
  • Network Control and Management in the 100x100 Architecture
  • Control/Management Needs of 100x100 Network Architecture
    • Control/Management creates logical network from physical network
      • Supports architecture and end-to-end view of 100x100 network
    • Access Network
      • Logical level: aggregation tree between CPE and Regional Node
      • Physical level: network with redundant links and multiple Regional Nodes
    • Backbone Network
      • Logical level: full mesh of links among Regional Nodes
      • Physical level: sparse graph of fiber routes constrained by geography
  • Themes of Network Control & Management
    • Holistic Design
      • Many different technologies – a few common problems
      • Find the right abstractions: exploit commonality
    • Clean Slate
      • How much autonomy do routers/switches need?
      • New principles for controlling networks
      • Eliminate duplicate logic
    • Leverage Network Structure
      • Many different types of networks exist – each with different objectives and topologies
      • Separate networking issues from distributed system issues
  • A Clean-slate Design
    • What are the fundamental causes of outages?
    • How to reduce/simplify the software in networks?
      • Control logic is software – there is no reason it should be hard to update; the challenge is avoiding complexity pitfalls
    • What functionality needs to be distributed – what can be centralized?
      • What would a “RISC” router look like?
    • Leverage technology trends
      • CPU and link-speed growing faster than # of switches
  • Control Plane: The Key Leverage Point
    • Great Potential: control plane determines the behavior of the network
      • Reaction to events, reachability, services
    • Great Opportunities
      • A radical clean-slate control plane can be deployed
        • No changes to packet formats, IPv4/v6 agnostic
        • No changes to end-system software
      • Control plane is the nexus of network evolution
        • Changing the control plane logic can smooth transitions in network technologies and architectures
  • Three Principles for Network Control & Management
    • Network-level Objectives:
    • Express goals explicitly
      • Security policies, QoS, egress point selection
    • Do not bury goals in box-specific configuration
    [Diagram: management logic takes network-level objectives, such as a reachability matrix and traffic engineering rules, as explicit inputs]
  • Three Principles for Network Control & Management
    • Network-wide Views:
    • Design network to provide timely, accurate info
      • Topology, traffic, resource limitations
    • Give logic the inputs it needs
    [Diagram: the management logic now also reads network-wide state information from the routers]
  • Three Principles for Network Control & Management
    • Direct Control:
    • Allow logic to directly set forwarding state
      • FIB entries, packet filters, queuing parameters
    • Logic computes desired network state, let it implement it
    [Diagram: the management logic reads network-wide state and directly writes forwarding state back to the routers; a sketch of the full loop follows below]
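  The three principles compose into a single control loop. Below is a minimal Python sketch of that loop, not the 4D prototype's actual code; the class and function names (Objectives, View, control_loop, and the callbacks) are hypothetical renderings of the slide's terms.

      # Minimal sketch of the three principles as one loop (illustrative names):
      # objectives are explicit inputs, the logic reads a network-wide view,
      # and it writes forwarding state directly instead of emitting configs.
      from dataclasses import dataclass

      @dataclass
      class Objectives:          # network-level objectives, expressed explicitly
          reachability: dict     # (src_subnet, dst_subnet) -> allow / deny
          te_rules: dict         # traffic engineering rules, e.g. link weights

      @dataclass
      class View:                # network-wide view, read from the network
          topology: dict         # router -> list of (neighbor, link_is_up)
          traffic: dict          # link -> measured load

      def control_loop(objectives: Objectives, read_view, compute_state, write_state):
          """Read the view, compute desired per-router state, write it directly."""
          view = read_view()                            # network-wide views
          desired = compute_state(objectives, view)     # router -> FIB entries, filters
          for router, state in desired.items():
              write_state(router, state)                # direct control: no config files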
  • Overview of the 4D Architecture
    • Decision Plane:
    • All management logic implemented on centralized servers making all decisions
    • Decision Elements use views to compute data plane state that meets objectives, then directly write this state to routers
    [Diagram: the four planes – decision, dissemination, discovery, data – realizing network-level objectives, network-wide views, and direct control]
  • Overview of the 4D Architecture
    • Dissemination Plane:
    • Provides a robust communication channel to each router – and robustness is the only goal!
    • May run over same links as user data, but logically separate and independently controlled
  • Overview of the 4D Architecture
    • Discovery Plane:
    • Each router discovers its own resources and its local environment
    • E.g., the identity of its immediate neighbors
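  As a concrete illustration of the discovery plane, here is a hedged Python sketch of per-link hello beaconing. The interval and timeout constants and all function names are assumptions; the slides only require that each router learn its immediate neighbors.

      import time

      HELLO_INTERVAL = 1.0   # seconds between beacons (assumed value)
      DEAD_INTERVAL = 3.0    # declare a neighbor gone after this much silence (assumed)

      def discovery_round(my_id, interfaces, send, recv_pending, neighbors):
          """One beacon round. send(iface, msg) transmits on one link;
          recv_pending() returns (iface, msg) pairs heard since the last round;
          neighbors maps iface -> (neighbor_id, last_heard)."""
          for iface in interfaces:                      # announce ourselves on every link
              send(iface, {"hello_from": my_id})
          now = time.time()
          for iface, msg in recv_pending():             # record who we hear from
              if "hello_from" in msg:
                  neighbors[iface] = (msg["hello_from"], now)
          # age out neighbors that have fallen silent, i.e. failed links
          return {i: v for i, v in neighbors.items() if now - v[1] < DEAD_INTERVAL}

  The resulting neighbor map is the kind of local-environment report the discovery plane sends upward; a real implementation would also report link capacities and local resources.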
  • Overview of the 4D Architecture
    • Data Plane:
    • Spatially distributed routers/switches
    • Ideally exposes interface to tables in hardware
    • Can deploy with today’s technology
  • Concerns and Challenges
    • How does the 4D simplify the problem?
    • How will communication between routers and DEs survive failures in the network?
      • Can a robust dissemination plane be built?
    • Latency means DE’s view of network is behind reality. Will the control loop be stable?
    • What is the overhead to/from the DEs?
    • What happens in a network partition?
  • Fundamental Problem: Wrong Abstractions
    interface Ethernet0
      ip address 6.2.5.14 255.255.255.128
    interface Serial1/0.5 point-to-point
      ip address 6.2.2.85 255.255.255.252
      ip access-group 143 in
      frame-relay interface-dlci 28
    router ospf 64
      redistribute connected subnets
      redistribute bgp 64780 metric 1 subnets
      network 66.251.75.128 0.0.0.127 area 0
    router bgp 64780
      redistribute ospf 64 match route-map 8aTzlvBrbaW
      neighbor 66.253.160.68 remote-as 12762
      neighbor 66.253.160.68 distribute-list 4 in
    access-list 143 deny 1.1.0.0/16
    access-list 143 permit any
    route-map 8aTzlvBrbaW deny 10
      match ip address 4
    route-map 8aTzlvBrbaW permit 20
      match ip address 7
    ip route 10.2.2.1/16 10.2.1.7
  • Fundamental Problem: Wrong Abstractions
    [Figure: configuration file sizes in a single enterprise network of 881 routers; routers sorted by file size, with config files ranging up to roughly 2000 lines]
  • Fundamental Problem: Wrong Abstractions
    • Management Plane
      • Figure out what is happening in network
      • Decide how to change it
    [Diagram: shell scripts, traffic engineering databases, and planning tools reaching the network via OSPF, SNMP, netflow, and modems, and writing configs and link metrics]
    • Control Plane
      • Multiple routing processes on each router
      • Each router with different configuration program
      • Huge number of control knobs: metrics, ACLs, policy
    [Diagram: routing policies, packet filters, and per-router FIBs]
    • Data Plane
      • Distributed routers
      • Forwarding, filtering, queueing
      • Based on FIB or labels
    [Diagram: OSPF and BGP processes running on every router]
  • Good Abstractions Reduce Complexity
    • All decision making logic lifted out of control plane
    • Eliminates duplicate logic in management plane
    • Dissemination plane provides robust communication to/from data plane switches
    [Diagram: the management, control, and data planes replaced by a decision plane and a dissemination service over the data plane; configs give way to directly written FIBs and ACLs]
  • Fundamental Problem: Conflates Distributed Systems Issues with Networking Issues
    • Distributed Systems Concern: resiliency to link failures
      • Solution: multiple paths through routing process graph
    [Diagram: destination D advertised through a graph of routing processes; when the left path fails, announcements flow through the right path]
  • Fundamental Problem: Conflates Distributed Systems Issues with Networking Issues
    • Networking Concern: implement resource or security policy
      • Solution: restrict flow of routing information, filter routes, summarize/aggregate routes
    [Diagram: the same routing-process graph, with policy implemented by restricting which routes flow across which links]
  • 4D Separates Distributed Computing Issues from Networking Issues
    • Distributed computing issues → protocols and network architecture
      • Overhead
      • Resiliency
      • Scalability
    • Networking issues → management logic
      • Traffic engineering and service provisioning
      • Egress point selection
      • Reachability control (VPNs)
      • Precomputation of backup paths
  • 4D Can Leverage Network Structure
    • Decision plane logic can be specialized for structure of each physical network
      • Distributed protocols must be prepared for arbitrary topology graphs
      • 4D enables network logic specialized differently for access and for backbone
    • Advantages
      • Faster route computations
      • Retain flexibility to evolve network as needed
      • Support transition to 100x100 architecture
  • Fundamental Problem: Computing Configurations is Intractable
    • Computing configuration files that cause control plane to compute desired forwarding states is intractable
      • NP-hard in many cases
      • Requires predictive model of control plane behavior
    • Configuration files form a program that defines a set of forwarding states
      • Very hard to create program that permits only desired states, and doesn’t transit through bad ones
    [Diagram: the space of forwarding states allowed by the configs; auto-adaptation can lead to or through bad states, while planned responses avoid them]
  • Direct Control Provides Complete Control
    • Zero device-specific configuration
    • Supports many models for “pushing” routes
      • Trivial push – convergence requires time for all updates to be received and applied – same as today
      • Synchronized update – updates propagated, but not applied till agreed time in the future – clock skew defines convergence time
      • Controlled state trajectory – DE serializes updates to avoid all incorrect transient states
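  The synchronized-update model above can be made concrete with a short sketch. This is an illustration under assumed skew and propagation bounds, not the prototype's protocol; all names and constants are hypothetical.

      import time

      MAX_PROP = 0.100   # assumed bound on dissemination delay, seconds
      MAX_SKEW = 0.050   # assumed bound on clock skew between routers, seconds

      def synchronized_push(updates, send_to_router):
          """DE side: propagate updates now, to be applied at an agreed future time.
          updates maps router -> new forwarding state."""
          activate_at = time.time() + MAX_PROP + MAX_SKEW
          for router, state in updates.items():
              send_to_router(router, {"state": state, "activate_at": activate_at})

      def on_router_receive(msg, install):
          """Router side: hold the update, then apply it at the agreed instant,
          so all routers flip together and convergence time is set by clock skew."""
          delay = msg["activate_at"] - time.time()
          if delay > 0:
              time.sleep(delay)
          install(msg["state"])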
  • The Feasibility of the 4D Architecture
    • We designed and built a prototype of the 4D Architecture
    • 4D Architecture permits many designs – prototype is a single, simple design point
    • Decision plane
      • Contains logic to simultaneously compute routes and enforce reachability matrix
      • Multiple Decision Elements per network, using a simple election protocol to pick a master
    • Dissemination plane
      • Uses source routes to direct control messages
      • Extremely simple, but can route around failed data links
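  The source-routed dissemination plane lends itself to a compact sketch. The following Python is an assumed rendering of the idea (BFS for path computation, a hop list popped at each switch), not the prototype's wire format.

      from collections import deque

      def shortest_path(topology, src, dst):
          """BFS over topology (node -> list of neighbors); None if unreachable.
          The DE runs this over its own view, so switches need no routing state."""
          parent, frontier = {src: None}, deque([src])
          while frontier:
              u = frontier.popleft()
              for v in topology.get(u, []):
                  if v not in parent:
                      parent[v] = u
                      frontier.append(v)
          if dst not in parent:
              return None
          path, node = [], dst
          while node is not None:
              path.append(node)
              node = parent[node]
          return path[::-1]

      def de_send(topology, my_id, dst, payload, forward):
          """DE side: embed the explicit hop list in the control message."""
          path = shortest_path(topology, my_id, dst)
          if path and len(path) > 1:
              forward(path[1], {"route": path[2:], "payload": payload})

      def on_switch_receive(msg, deliver, forward):
          """Switch side: pop the next hop off the route. Routing around a
          failed data link is just the DE choosing a different hop list."""
          if not msg["route"]:
              deliver(msg["payload"])
          else:
              forward(msg["route"][0],
                      {"route": msg["route"][1:], "payload": msg["payload"]})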
  • Evaluation of the 4D Prototype
    • Evaluated using Emulab (www.emulab.net)
      • Linux PCs used as routers (650-800 MHz)
      • Tested on 9 enterprise network topologies (10-100 routers each)
    [Figure: example network with 49 switches and 5 DEs]
  • Performance of the 4D Prototype
    • Trivial prototype has performance comparable to well-tuned production networks
    • Recovers from single link failure in < 300 ms
      • < 1 s response considered “excellent”
    • Survives failure of master Decision Element
      • New DE takes control within 1 s
      • No disruption unless second fault occurs
    • Gracefully handles complete network partitions
      • Less than 1.5 s of outage
  • Future Work
    • Scalability
      • Evaluate over 1-10K switches, 10-100K routes
      • Networks with backbone-like propagation delays
    • Protocol improvements
      • Better dissemination and discovery planes
    • Deployment in today’s networks
      • Data center, enterprise, campus, backbone (RCP)
  • Recent Results
    • G. Xie, J. Zhan, D. A. Maltz, H. Zhang, A. Greenberg, G. Hjalmtysson, J. Rexford, “On Static Reachability Analysis of IP Networks,” IEEE INFOCOM 2005, Orlando, FL, March 2005.
    • J. Rexford, A. Greenberg, G. Hjalmtysson, D. A. Maltz, A. Myers, G. Xie, J. Zhan, H. Zhang, “Network-Wide Decision Making: Toward a Wafer-Thin Control Plane,” Proceedings of ACM HotNets-III, San Diego, CA, November 2004.
    • D. A. Maltz, J. Zhan, G. Xie, G. Hjalmtysson, A. Greenberg, H. Zhang, “Routing Design in Operational Networks: A Look from the Inside,” Proceedings of ACM SIGCOMM 2004, Portland, OR, 2004.
    • D. A. Maltz, J. Zhan, G. Xie, H. Zhang, G. Hjalmtysson, A. Greenberg, J. Rexford, “Structure Preserving Anonymization of Router Configuration Data,” Proceedings of the ACM/USENIX Internet Measurement Conference (IMC 2004), Sicily, Italy, 2004.
  • Questions?
  • 4D Supports Network Evolution & Expansion
    • Decision logic can be upgraded as needed
      • No need to update distributed protocols implemented in software on every switch
    • Decision Elements can be upgraded as needed
      • Network expansion requires upgrades only to DEs, not every switch
  • 4D and Today’s Networks
    • 4D architecture and principles apply to today’s networks as well as 100x100
      • Enterprise/campus/university networks
      • Data center networks
      • Access/backbone networks
    • Greater expressivity in determining behavior
      • Behavior of butterfly graph gadgets under failure
      • Selection of traffic egress points
  • Three Key Questions
    • Is there any transition path to deploy the 4D architecture?
    • Is the 4D architecture feasible?
    • Does the 4D architecture have more expressive power than today’s approaches to network control and management?
  • Deployment of the 4D Architecture
    • Pre-existing industry trend towards separating router hardware from software
      • IETF: FORCES, GSMP, GMPLS
      • SoftRouter [Lakshman, HotNets’04]
    • Incremental deployment path exists
      • Individual networks can upgrade to 4D and gain benefits
      • Small enterprise networks have most to gain
      • No changes to end-systems required
  • Reachability Example
    • Two locations, each with data center & front office
    • All routers exchange routes over all links
    [Diagram: routers R1-R5 linking the data center and front office sites in Chicago (chi) and New York (nyc)]
  • Reachability Example
    [Diagram: the same topology; each site learns routes to all four subnets (chi-DC, chi-FO, nyc-DC, nyc-FO)]
  • Reachability Example
    [Diagram: packet filters installed at the data centers – at chi: drop nyc-FO -> *, permit *; at nyc: drop chi-FO -> *, permit *]
  • Reachability Example
    • A new short-cut link added between data centers
    • Intended for backup traffic between centers
    [Diagram: the new shortcut link between the two data centers; the existing packet filters are unchanged]
  • Reachability Example
    • Oops – the new link lets packets violate the security policy!
    • Routing changed, but packet filters don't update automatically
    [Diagram: the shortcut link creates a path from chi-FO to nyc-DC that bypasses the packet filters]
  • Prohibiting Packets from chi-FO to nyc-DC
  • Reachability Example
    • Typical response – add more packet filters to plug the holes in security policy
    [Diagram: additional packet filters (drop nyc-FO -> *, drop chi-FO -> *) placed to plug the hole opened by the shortcut link]
  • Reachability Example
    • Packet filters have surprising consequences
    • Consider a link failure
    • chi-FO and nyc-FO still connected
    [Diagram: a link failure; the only remaining path between chi-FO and nyc-FO now crosses links carrying the drop filters]
  • Reachability Example
    • Network has less survivability than topology suggests
    • chi-FO and nyc-FO still connected
    • But packet filter means no data can flow!
    • Probing the network won’t predict this problem
    [Diagram: chi-FO and nyc-FO remain physically connected, but the packet filters on the surviving path drop their traffic]
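  In the 4D framing, this failure mode disappears because filters are regenerated from the reachability matrix whenever the view changes. A hedged sketch of that recomputation follows; the placement policy (drop at the first hop) and all names are assumptions for illustration.

      def recompute_filters(view, reachability, compute_path):
          """reachability: (src_subnet, dst_subnet) -> True (allow) / False (deny).
          Returns router -> drop rules, rebuilt from scratch on every view change."""
          filters = {}
          for (src, dst), allowed in reachability.items():
              if allowed:
                  continue
              path = compute_path(view, src, dst)   # path routing would use *now*
              if not path:
                  continue                          # pair is already unreachable
              first_router = path[0]                # drop as early as possible (assumed policy)
              filters.setdefault(first_router, []).append(
                  {"action": "drop", "src": src, "dst": dst})
          return filters                            # the DE writes these directly

  Because the rules are recomputed on every topology change, a new shortcut link cannot silently open a path around a stale filter, and a failed link cannot leave a stale filter blocking the only surviving path.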
  • Allowing Packets from chi-FO to nyc-FO
  • Packet Filters Implement Policy
    • Packet filters used extensively throughout networks
    • Protect routers from attack
    • Implement reachability matrix
      • Define which hosts can communicate
      • Localize traffic, particularly multicast
  • Mechanisms for Action at a Distance
    • Policy often implemented by tagging routes on one router …
    • … And testing for tag at another router
    [Diagram: route A tagged tag=12 as it leaves R1; R3's routing process tests for the tag before acting on the route]
  • Multiple Interacting Routing Processes
    [Diagram: a client-to-Internet path crossing several OSPF instances and BGP/EBGP sessions, each feeding FIBs and glued together by policies]
  • The Routing Instance Graph of an 881-Router Network
  • Reconvergence Time Under Single Link Failure
  • Reconvergence Time When Master DE Crashes
  • Reconvergence Time When Network Partitions
  • Systems of Systems
    • Systems are designed as components to be used in larger systems in different contexts, for different purposes, interacting with different components
      • Example: OSPF and BGP are each complex systems in their own right, yet they are components of a network's routing system, interacting with each other, with packet filters, and with management tools
    • Complex configuration to enable flexibility
      • The glue has tremendous impact on network performance
      • State of the art: multiple interacting distributed programs written in assembly language
    • Lack of intellectual framework to understand global behavior
  • Many Implementations Possible
    • Single redundant decision engine
      • Hot stand-by
    • Multiple decision engines
      • Divide network & load share
    • Distributed decision engines
      • Up to one per router
    • Choice can be based on reliability requirements
    • Dissemination plane can be in-band, or leverage out-of-band links
    • Less need for distributed solutions (harder to reason about)
    • More focus on network issues, less on distributed protocols
  • Direct Expression Enables New Algorithms
    • OSPF normally calculates a single path to each destination D
    • OSPF allows load-balancing only for equal-cost paths to avoid loops
    • Using ECMP requires careful engineering of link weights
    [Diagram: two distinct next hops toward destination D]
    • Decision Plane with network-wide view can compute multiple paths
    • “Backup paths” installed for free!
    • Bounded stretch, bounded fan-in
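  One way to realize the bounded-stretch idea is a downstream-neighbor test over the network-wide view: any neighbor strictly closer to the destination is loop-free, so it can serve as a backup next hop if its detour stays within the stretch bound. The sketch below is an assumed rendering of that idea, not the algorithm from the paper; the stretch and alternate-count defaults are invented.

      import heapq

      def dijkstra(graph, src):
          """graph: node -> {neighbor: cost}; returns node -> distance from src."""
          dist, heap = {src: 0}, [(0, src)]
          while heap:
              d, u = heapq.heappop(heap)
              if d > dist.get(u, float("inf")):
                  continue
              for v, w in graph.get(u, {}).items():
                  if d + w < dist.get(v, float("inf")):
                      dist[v] = d + w
                      heapq.heappush(heap, (d + w, v))
          return dist

      def reverse(graph):
          """Flip edge directions so dijkstra(reverse(g), dest) gives
          the distance from every node *to* dest."""
          rev = {}
          for u, nbrs in graph.items():
              for v, w in nbrs.items():
                  rev.setdefault(v, {})[u] = w
          return rev

      def next_hops(graph, u, dest, stretch=1.5, max_alternates=2):
          """Primary and backup next hops at u toward dest. A neighbor qualifies
          if it is strictly closer to dest (hence loop-free) and the detour via
          it is at most stretch times the shortest path; the count is capped."""
          to_dest = dijkstra(reverse(graph), dest)
          if u not in to_dest:
              return []
          best = to_dest[u]
          cands = [(w + to_dest[v], v) for v, w in graph.get(u, {}).items()
                   if v in to_dest and to_dest[v] < best
                   and w + to_dest[v] <= stretch * best]
          return [v for _, v in sorted(cands)[:max_alternates]]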
  • Slides under Development
  • Supporting Network Evolution
    • Logic for controlling the network needs to change over time
      • Traffic engineering rules
      • Interactions with other networks
      • Service characteristics
    • Upgrades to field-deployed network equipment must be avoided
      • Very high cost
      • Software upgrades often require hardware upgrades (more CPU or memory)
  • Supporting Network Evolution Today
    • Today’s “Solution”
      • Vendors stuff their routers with software implementing all possible “features”
        • Multiple routing protocols
        • Multiple signaling protocols (RSVP, CR-LDP)
        • Each feature controlled by parameters set at configuration time to achieve late binding
      • Feature-creep creates configuration nightmare
        • Tremendous complexity for syntax & semantics
        • Mis-interactions between features are common
    • Our Goal: Separate decision making logic from the field-deployed devices
  • Supporting Network Expansion
    • Networks are constantly growing
      • New routers/switches/links added
      • Old equipment rarely removed
    • Adding a new switch can cause old equipment to become overloaded
      • CPU/Memory demands on each device should not scale up with network size
  • Supporting Network Expansion Today
    • Routers run a link-state routing protocol
      • Size of link-state database scales with # of routers
      • Expanding network can exceed memory limits of old routers
    • Today’s “Solution”
      • Monitor resources on all routers
      • Predict approach of exhaustion and then:
        • Global upgrade
        • Rearchitecture of routing design to add summarization, route aggregation, information hiding
    • Our Goal: make demands scale with hardware (e.g., # of interfaces)
  • Supporting Remote Devices
    • Maintaining communication with all network devices is critical for network management
      • Diagnosis of problems
      • Monitoring status and network health
      • Updating configuration or software
    • “The chicken or the egg…”
      • Cannot send device configuration/management information until it can communicate
      • Device cannot communicate until it is correctly configured
  • Supporting Remote Devices Today
    • Today’s “Solution”
      • Use PSTN as management network of last resort
      • Connect console of remote routers to phone modem
      • Can’t be used for customer premise equipment (CPE): DSL/cable modems, integrated access devices (IADs)
      • In a converged network, PSTN is decommissioned
    • Our Goal: Preserve management communication to any device that is not physically partitioned, regardless of configuration state
  • Network Control and Management Today
    • Data Plane
      • Distributed routers
      • Forwarding, filtering, queueing
      • Based on FIBs or labels
    • Control Plane
      • Multiple routing processes on each router
      • Each router with different configuration program
      • Huge number of control knobs: metrics, ACLs, policy
    • Management Plane
      • Figure out what is happening in network
      • Decide how to change it
    • State everywhere!
      • Dynamic state in FIBs
      • Configured state in settings, policies, packet filters
      • Programmed state in magic constants, timers
      • Many dependencies between bits of state
      • State updated in uncoordinated, decentralized way!
    • Logic everywhere!
      • Path computation built into routing protocols
      • Routing policy distributed across the routers
      • Packet filters placed by tools in the management plane
      • No way to arbitrate inconsistencies between logic!
    [Diagram: management-plane tools (shell scripts, traffic engineering databases, planning tools) reaching the routers via OSPF, SNMP, netflow, and modems; per-router OSPF/BGP processes, configs, link metrics, routing policies, packet filters, and FIBs]
  • A Study of Operational Production Networks
    • How complicated/simple are real control planes?
    • What is the structure of the distributed system?
    • Use reverse-engineering methodology
      • There are few or no documents; the ones that exist are out-of-date
    • Anonymized configuration files for 31 active networks (>8,000 configuration files)
      • 6 Tier-1 and Tier-2 Internet backbone networks
      • 25 enterprise networks
      • Sizes between 10 and 1,200 routers
      • 4 enterprise networks significantly larger than the backbone networks
  • Learning from Ethernet Evolution Experience
    • Early implementations: Ethernet or 802.3
      • Bus-based Local Area Network
      • Collision domain, CSMA/CD
      • Bridges and repeaters for distance/capacity extension
      • 1-10 Mbps: coax, twisted pair (10BaseT)
    • Current implementations: everything changed except the name and framing
      • Switched solution
      • Little use for collision domains
      • Servers and routers at 10x station speed
      • 10/100/1000 Mbps, 10 Gbps coming: copper, fiber
    [Diagram: an early bus LAN with bridges/repeaters vs. a modern switched LAN of hubs, switches, routers, and servers connected to the WAN]
  • Ethernet: Re-inventing the Wheel
    • Becoming as service-rich and complex as IP
      • Traffic engineering
      • Reachability control and traffic isolation (VLANs)
      • QoS (802.1p priority, carried in the 802.1Q tag)
    • Ethernet networks rediscovering the problems and solutions faced by IP networks
    • Is there commonality to exploit?
      • Switch/routers are all fundamentally table-driven
      • Destination addr, MPLS labels, VLANs, Circuit IDs
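  The table-driven commonality suggests a single write interface could serve all of these technologies. The sketch below is purely illustrative (the class, table names, and actions are invented), showing how a decision plane might address an IP FIB and a VLAN table through the same call.

      from dataclasses import dataclass

      @dataclass(frozen=True)
      class Entry:
          table: str      # "ipv4_fib", "mpls", "vlan", "circuit", ... (assumed names)
          key: object     # destination prefix, label, VLAN ID, or circuit ID
          action: object  # next hop, label operation, output ports, drop, ...

      class TableDriver:
          """Minimal device-side write interface for a decision plane."""
          def __init__(self):
              self.tables = {}
          def write(self, entry: Entry):
              self.tables.setdefault(entry.table, {})[entry.key] = entry.action
          def lookup(self, table, key):
              return self.tables.get(table, {}).get(key)

      # The same driver serves an IP forwarding entry and an Ethernet VLAN entry,
      # with no per-technology control protocol.
      d = TableDriver()
      d.write(Entry("ipv4_fib", "10.2.0.0/16", ("fwd", "eth1")))
      d.write(Entry("vlan", 42, ("flood_ports", [1, 3, 7])))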