vroom_sigcomm.ppt


Speaker notes
  • The key idea of VROOM is that routers should be free to roam around, instead of being permanently attached to a specific piece of hardware. In this talk, I’ll first show that VROOM is useful for many different network-management applications, such as simplifying network maintenance and simplifying service deployment and evolution; it can also save power. I will then show, through our prototype implementation and evaluation, that VROOM is feasible in practice, with no performance impact on data traffic and no visible impact on routing protocols.
  • Here is the basic idea of VROOM: virtual router instances running on top of physical routers form the logical topology of the network. The physical routers only provide shared hardware resources and the necessary virtualization support; it is the virtual routers that run the routing protocols and forward the actual traffic.
  • Today the two notions are basically the same thing in people’s minds: when we think of a node in a topology, we also have the physical box in mind, and vice versa.
  • With VROOM, the mapping can change while the IP-layer logical topology and configuration stay intact. Each virtual router has its own routing-protocol instances and its own forwarding tables, and a physical router can support multiple virtual router instances through virtualization.
  • In ISPs, as a service grows it may need to be moved to a more powerful router. Today, this process usually involves a period of downtime.
  • We need to migrate all the links associated with the virtual router as well. To do this seamlessly, we leverage the fact that in ISPs a point-to-point link at the IP layer is usually a multi-hop path in the underlying transport network. Advances in transport networks now make it possible to dynamically set up a new optical path and switch from the old path to the new one virtually instantaneously, which allows us to realize link migration by reconfiguring the optical transport network.
  • During the first iteration, all memory pages are transferred from A to B. Subsequent iterations copy only those pages dirtied during the previous transfer phase.
  • The double data planes simplify link migration because they allow the links to be migrated independently. In this simple example, we first set up links from the new physical node to the two adjacent nodes, and then migrate the traffic in each direction separately.
Slide transcript

    1. Virtual ROuters On the Move (VROOM): Live Router Migration as a Network-Management Primitive. Yi Wang, Eric Keller, Brian Biskeborn, Kobus van der Merwe, Jennifer Rexford
    2. Virtual ROuters On the Move (VROOM)
       • Key idea: routers should be free to roam around
       • Useful for many different applications: simplify network maintenance, simplify service deployment and evolution, reduce power consumption, …
       • Feasible in practice: no performance impact on data traffic, no visible impact on control-plane protocols
    3. The Two Notions of “Router”
       • The IP-layer logical functionality, and the physical equipment
       (Diagram: Logical (IP layer) vs. Physical)
    4. The Tight Coupling of Physical & Logical
       • Root of many network-management challenges (and “point solutions”)
       (Diagram: Logical (IP layer) vs. Physical)
    5. VROOM: Breaking the Coupling
       • Re-mapping the logical node to another physical node
       • VROOM enables this re-mapping of logical to physical through virtual router migration
       (Diagram: Logical (IP layer) vs. Physical)
    6-8. Case 1: Planned Maintenance
       • NO reconfiguration of VRs, NO reconvergence
       (Animation: virtual router VR-1 moves from physical router A to physical router B)
    9. Case 2: Service Deployment & Evolution
       • Move a (logical) router to more powerful hardware
    10. Case 2: Service Deployment & Evolution
       • VROOM guarantees seamless service to existing customers during the migration
    11. Case 3: Power Savings
       • Hundreds of millions of dollars per year in electricity bills
    12-14. Case 3: Power Savings
       • Contract and expand the physical network according to the traffic volume
    15. Virtual Router Migration: the Challenges
       • Migrate an entire virtual router instance: all control-plane & data-plane processes / states
    16. Virtual Router Migration: the Challenges
       • Migrate an entire virtual router instance
       • Minimize disruption
         - Data plane: millions of packets per second on a 10 Gbps link
         - Control plane: less strict (routing messages are retransmitted)
    17-18. Virtual Router Migration: the Challenges
       • Migrate an entire virtual router instance
       • Minimize disruption
       • Link migration
    19. VROOM Architecture
       (Diagram: Dynamic Interface Binding, Data-Plane Hypervisor)
    20. VROOM’s Migration Process
       • Key idea: separate the migration of the control and data planes
       • Migrate the control plane
       • Clone the data plane
       • Migrate the links
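    The three steps above compose into a single migration operation. Below is a minimal, hypothetical sketch of that composition in Python; the class and function names are illustrative placeholders, not VROOM’s actual interfaces.

        # Hypothetical sketch of the overall migration flow (names are illustrative).

        class VirtualRouter:
            def __init__(self, name, links):
                self.name = name
                self.links = links            # IP-layer links attached to this VR

        def migrate_control_plane(vr, src, dst):
            print(f"step 1: move {vr.name}'s routing processes from {src} to {dst}")

        def clone_data_plane(vr, dst):
            print(f"step 2: repopulate a fresh FIB for {vr.name} on {dst}")

        def migrate_link(link, src, dst):
            print(f"step 3: switch link {link} from {src} to {dst}")

        def migrate(vr, src, dst):
            migrate_control_plane(vr, src, dst)   # routing processes, config, memory
            clone_data_plane(vr, dst)             # both data planes can now forward
            for link in vr.links:                 # links move one at a time
                migrate_link(link, src, dst)

        migrate(VirtualRouter("VR-1", ["link-to-A", "link-to-B"]), "router-A", "router-B")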
    21. Control-Plane Migration
       • Leverage virtual server migration techniques
       • Router image: binaries, configuration files, etc.
    22. Control-Plane Migration
       • Leverage virtual server migration techniques
       • Router image
       • Memory
         - 1st stage: iterative pre-copy
         - 2nd stage: stall-and-copy (when the control plane is “frozen”)
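    The two memory-copy stages can be sketched as a short loop. The following is a simplified illustration under stated assumptions: page tracking and the freeze/resume hooks are stand-in stubs, not the actual OpenVZ mechanism.

        # Simplified sketch of two-stage memory migration: iterative pre-copy
        # followed by stall-and-copy. Page tracking and freeze/resume are stubs.

        def freeze_control_plane():
            print("control plane frozen")

        def resume_on_destination():
            print("control plane resumed on the destination")

        def migrate_memory(pages, dirty_since_last_call, send,
                           max_rounds=30, small_enough=64):
            """pages: {page_id: bytes}; dirty_since_last_call(): ids dirtied meanwhile."""
            # Stage 1: iterative pre-copy. Round 1 sends every page; each later
            # round resends only pages dirtied while the previous round ran.
            to_send = set(pages)
            for _ in range(max_rounds):
                for pid in to_send:
                    send(pid, pages[pid])
                to_send = dirty_since_last_call()
                if len(to_send) <= small_enough:      # dirtying has converged
                    break

            # Stage 2: stall-and-copy. Briefly freeze the control plane and copy
            # the remaining dirty pages so the destination image is consistent.
            freeze_control_plane()
            for pid in to_send | dirty_since_last_call():
                send(pid, pages[pid])
            resume_on_destination()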
    23. Control-Plane Migration
       • Leverage virtual server migration techniques
       • Router image
       • Memory
       (Diagram: Physical router A, Physical router B, CP, DP)
    24. Data-Plane Cloning
       • Clone the data plane by repopulation
         - Enables migration across different data planes
         - Eliminates the synchronization issue between the control & data planes
       (Diagram: Physical router A, Physical router B, CP, DP-old, DP-new)
    25-27. Remote Control Plane
       • Data-plane cloning takes time: installing 250k routes takes over 20 seconds*
       • The control plane & old data plane need to be kept “online”
       • Solution: redirect routing messages through tunnels
       (Diagram: Physical router A, Physical router B, CP, DP-old, DP-new)
       *: P. Francois et al., “Achieving sub-second IGP convergence in large IP networks,” ACM SIGCOMM CCR, no. 3, 2005.
    28. Double Data Planes
       • At the end of data-plane cloning, both data planes are ready to forward traffic
       (Diagram: CP, DP-old, DP-new)
    29. Asynchronous Link Migration
       • With the double data planes, links can be migrated independently
       (Diagram: A, CP, DP-old, DP-new, B)
    30. Prototype Implementation
       • Control plane: OpenVZ + Quagga
       • Data plane: two prototypes
         - Software-based data plane (SD): Linux kernel
         - Hardware-based data plane (HD): NetFPGA
       • Why two prototypes? To validate the data-plane hypervisor design (e.g., migration between SD and HD)
    31-32. Evaluation
       • Performance of individual migration steps
       • Impact on data traffic
       • Impact on routing protocols
       • Experiments on Emulab
    33. Impact on Data Traffic
       • The diamond testbed (nodes n0, n1, n2, n3, VR)
    34. Impact on Data Traffic
       • SD router with separate migration bandwidth: slight delay increase due to CPU contention
       • HD router with separate migration bandwidth: no delay increase or packet loss
    35. Impact on Routing Protocols
       • The Abilene-topology testbed
    36. Core Router Migration: OSPF Only
       • Introduce LSAs by flapping link VR2-VR3
         - Miss at most one LSA
         - Get the retransmission 5 seconds later (the default LSA retransmission timer)
         - Can use a smaller LSA retransmission interval (e.g., 1 second)
    37. Edge Router Migration: OSPF + BGP
       • Average control-plane downtime: 3.56 seconds (a performance lower bound)
       • OSPF and BGP adjacencies stay up
       • Default timer values: OSPF hello interval 10 seconds; BGP keep-alive interval 60 seconds
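    A quick back-of-the-envelope check of why the adjacencies survive: the measured downtime is well under one hello/keep-alive interval. The 4x dead-interval and 3x hold-time multipliers below are the conventional protocol defaults, stated here as assumptions rather than taken from the slides.

        # Why OSPF/BGP adjacencies survive a 3.56 s control-plane outage.
        # The 4x and 3x multipliers are conventional defaults (assumption).

        downtime = 3.56                 # s, measured average control-plane downtime
        ospf_hello = 10                 # s, default OSPF hello interval
        bgp_keepalive = 60              # s, default BGP keep-alive interval

        ospf_dead = 4 * ospf_hello      # 40 s before a neighbor is declared down
        bgp_hold = 3 * bgp_keepalive    # 180 s before the BGP session is torn down

        # At most one hello/keep-alive can fall inside the outage, far short of
        # the consecutive misses needed to drop the adjacency.
        assert downtime < ospf_hello and downtime < ospf_dead
        assert downtime < bgp_keepalive and downtime < bgp_hold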
    38. Where To Migrate
       • Physical constraints
         - Latency (e.g., NYC to Washington D.C.: 2 msec)
         - Link capacity: enough remaining capacity for the extra traffic
         - Platform compatibility: routers from different vendors
         - Router capability (e.g., number of access control lists (ACLs) supported)
       • The constraints simplify the placement problem
    39. Conclusions & Future Work
       • VROOM: a useful network-management primitive
         - Breaks the tight coupling between physical and logical
         - Simplifies network management, enables new applications
         - No data-plane or control-plane disruption
       • Future work
         - Migration scheduling as an optimization problem
         - Other applications of router migration: handling unplanned failures, traffic engineering
    40. Thanks! Questions & Comments? [email_address]
    41. Packet-aware Access Network
       (Diagram)
    42. Packet-aware Access Network
       • Pseudo-wires (virtual circuits) from CE to PE
       (Diagram legend: PE, CE; P/G-MSS: Packet-aware/Gateway Multi-Service Switch; MSE: Multi-Service Edge)
    43. Events During Migration
       • Network failure during migration: the old VR image is not deleted until the migration is confirmed successful
       • Routing messages arrive during control-plane migration: BGP relies on TCP retransmission; OSPF relies on LSA retransmission
    44. Flexible Transport Networks
       • Migrate links affixed to the virtual routers
       • Enabled by programmable transport networks
         - Long-haul links are reconfigurable: layer-3 point-to-point links are multi-hop at layer 1/2
       (Diagram: Chicago, New York, Washington D.C.; programmable transport network of multi-service optical switches, e.g., Ciena CoreDirector)
    45. Requirements & Enabling Technologies
       • Migrate links affixed to the virtual routers
       • Enabled by programmable transport networks
         - Long-haul links are reconfigurable: layer-3 point-to-point links are multi-hop at layer 1/2
       (Diagram: Chicago, New York, Washington D.C.; programmable transport network of multi-service optical switches, e.g., Ciena CoreDirector)
    46. Requirements & Enabling Technologies
       • Enable edge router migration
       • Enabled by packet-aware access networks
         - Access links are becoming inherently virtualized
         - Customers connect to provider edge (PE) routers via pseudo-wires (virtual circuits)
         - Physical interfaces on PE routers can be shared by multiple customers (shared vs. dedicated physical interface per customer)
    47. Link Migration in Transport Networks
       • With programmable transport networks, long-haul links are reconfigurable: IP-layer point-to-point links are multi-hop at the transport layer
       • VROOM leverages this capability in a new way to enable link migration
    48. Link Migration in Flexible Transport Networks
       • With packet-aware transport networks, logical links can share the same physical port
         - Packet-aware access network (pseudo-wires)
         - Packet-aware IP transport network (tunnels)
    49. The Out-of-the-box OpenVZ Approach
       • Packets are forwarded inside each VE
       • When a VE is being migrated, packets are dropped
    50. Putting It All Together: Realizing Migration
       1. The migration program notifies shadowd of the completion of the control-plane migration
    51. Putting It All Together: Realizing Migration
       2. shadowd requests zebra to resend all the routes, and pushes them down to virtd
    52. Putting It All Together: Realizing Migration
       3. virtd installs routes into the new FIB, while continuing to update the old FIB
    53. Putting It All Together: Realizing Migration
       4. virtd notifies the migration program to start link migration after it finishes populating the new FIB
       5. After link migration completes, the migration program notifies virtd to stop updating the old FIB
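    The five steps across slides 50-53 can be summarized as one short control flow. The sketch below is only a schematic replay of that message sequence in Python; it is not the actual shadowd/zebra/virtd code, and the argument names are illustrative.

        # Schematic replay of migration steps 1-5 (not the real shadowd/zebra/virtd).

        def realize_migration(zebra_routes, old_fib, new_fib, migrate_links):
            # 1. The migration program has told shadowd that the control plane
            #    finished migrating.
            # 2. shadowd asks zebra to resend every route and pushes each to virtd.
            for prefix, next_hop in zebra_routes:
                # 3. virtd installs the route into the new FIB while continuing to
                #    update the old FIB, so both data planes keep forwarding.
                new_fib[prefix] = next_hop
                old_fib[prefix] = next_hop
            # 4. With the new FIB fully populated, virtd tells the migration
            #    program to start link migration.
            migrate_links()
            # 5. Once link migration completes, updates to the old FIB stop and
            #    the old data plane can be retired.
            return new_fib

        # Example usage with toy data:
        routes = [("10.0.0.0/8", "if-1"), ("192.168.1.0/24", "if-2")]
        print(realize_migration(routes, old_fib={}, new_fib={}, migrate_links=lambda: None))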
    54. Power Consumption of Routers
       • A synthetic large tier-1 ISP backbone
         - 50 POPs (Points of Presence)
         - 20 major POPs, each with 6 backbone routers, 6 peering routers, and 30 access routers
         - 30 smaller POPs, each with 6 access routers
       • Typical router power draw:
         Vendor    Model    Power (watts)
         Cisco     CRS-1    10,920
         Cisco     12416     4,212
         Cisco     7613      4,000
         Juniper   T1600     9,100
         Juniper   T640      6,500
         Juniper   M320      3,150
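    As a quick sanity check on the scale of this synthetic topology, the router count follows directly from the POP breakdown above (the total wattage depends on which platform mix is assumed, so only the count is computed here).

        # Router count for the synthetic tier-1 backbone described on slide 54.

        major_pops, smaller_pops = 20, 30
        routers_per_major_pop = 6 + 6 + 30      # backbone + peering + access
        routers_per_smaller_pop = 6             # access routers only

        total_routers = (major_pops * routers_per_major_pop
                         + smaller_pops * routers_per_smaller_pop)
        print(total_routers)                    # 1020 routers in total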
    55. Future Work
       • Algorithms that solve the constrained optimization problems
       • A control-plane hypervisor to enable cross-vendor migration
    56. Performance of Migration Steps
       • Memory copy time, with different numbers of routes (dump file sizes)
    57. Performance of Migration Steps
       • FIB population time grows linearly with the number of route entries
         - Installing a FIB entry into the NetFPGA: 7.4 microseconds
         - Installing a FIB entry into the Linux kernel: 1.94 milliseconds
       • FIB update time: time for virtd to install entries into the FIB
       • Total time: FIB update time + time for shadowd to send routes to virtd
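    Because population time is linear in the number of entries, the per-entry costs above translate into rough end-to-end estimates by simple multiplication. The figures below are a naive extrapolation that ignores any batching of installs, so treat them only as an illustration of the SD/HD gap.

        # Naive linear extrapolation of FIB population time from per-entry costs.

        def fib_population_time(num_entries, seconds_per_entry):
            return num_entries * seconds_per_entry

        routes = 250_000
        print(fib_population_time(routes, 7.4e-6))    # NetFPGA (HD): ~1.85 s
        print(fib_population_time(routes, 1.94e-3))   # Linux kernel (SD): ~485 s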
    58. The Importance of Separate Migration Bandwidth
       • The dumbbell testbed
       • 250k routes in the RIB
    59. Separate Migration Bandwidth is Important
       • Throughput of the migration traffic
    60. Separate Migration Bandwidth is Important
       • Delay increase of the data traffic
    61. Separate Migration Bandwidth is Important
       • Loss rate of the data traffic
