MULTIMEDIA COMMUNICATION & NETWORKS

ADVANCED ROUTING

Published in: Education, Technology

MULTIMEDIA COMMUNICATION & NETWORKS

  1. 1. 13PIT101 Multimedia Communication & Networks UNIT – II Dr.A.Kathirvel Professor & Head/IT - VCEW
  2. 2. Unit - II Intra AS routing – Inter AS routing – Router Architecture – Switch Fabric – Active Queue Management – Head of Line blocking – Transition from IPv4 to IPv6 – Multicasting – Abstraction of Multicast groups – Group Management – IGMP – Group Shared Multicast Tree – Source based Multicast Tree – Multicast routing in Internet – DVMRP and MOSPF – PIM – Sparse mode and Dense mode
  3. 3. INTRA AS ROUTING
  4. 4. The Internet Network layer Host, router network layer functions: Transport layer: TCP, UDP Network layer IP protocol •addressing conventions •datagram format •packet handling conventions Routing protocols •path selection •RIP, OSPF, BGP routing table ICMP protocol •error reporting •router “signaling” Link layer physical layer #4
  5. 5. Hierarchical Routing Our routing study thus far - idealization  all routers identical  network “flat” … not true in practice scale: with 50 million destinations: • can’t store all dest’s in routing tables! • routing table exchange would swamp links! administrative autonomy • internet = network of networks • each network admin may want to control routing in its own network #5
  6. 6. Hierarchical Routing • aggregate routers into regions, “autonomous systems” (AS) • routers in same AS run same routing protocol – “intra-AS” routing protocol – routers in different AS can run different intra-AS routing protocol gateway routers • special routers in AS • run intra-AS routing protocol with all other routers in AS • also responsible for routing to destinations outside AS – run inter-AS routing protocol with other gateway routers
  7. 7. Intra-AS and Inter-AS routing Gateways: •perform inter-AS routing amongst themselves •perform intra-AS routing with other routers in their AS [Figure: three ASs A, B and C with internal routers a–d and gateway routers A.a, A.c, B.a, C.b; gateway A.c runs both inter-AS and intra-AS routing in its network layer, above the link layer and physical layer]
  8. 8. Intra-AS and Inter-AS routing [Figure: Host h1 in AS A sends to Host h2 in AS B: intra-AS routing within AS A, inter-AS routing between A and B, intra-AS routing within AS B]  We’ll examine specific inter-AS and intra-AS Internet routing protocols shortly
  9. 9. Routing: Example E AS D d AS A (OSPF) a2 F No Export to F a1 i AS C AS B (OSPF intra routing) i2 b AS I #9
  10. 10. Routing: Example E AS D d d1 d2 AS A (OSPF) a2 i F AS C a1 How to specify? AS B (OSPF intra routing) b AS I #10
  11. 11. IP Addressing Scheme • We need an address to uniquely identify each destination • Routing scalability needs flexibility in aggregation of destination addresses – we should be able to aggregate a set of destinations as a single routing unit • Preview: the unit of routing in the Internet is a network---the destinations in the routing protocols are networks #11
  12. 12. IP Addressing: introduction • IP address: 32-bit identifier for host, router interface • interface: connection between host, router and physical link – routers typically have multiple interfaces – host may have multiple interfaces – IP addresses associated with interface, not host, or router • Example: 223.1.1.1 = 11011111 00000001 00000001 00000001 (one byte per dotted-decimal field: 223, 1, 1, 1) [Figure: router joining three networks, with interfaces 223.1.1.1–223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]
  13. 13. IP Addressing • IP address: – network part • high order bits – host part • low order bits 223.1.1.1 223.1.1.2 223.1.1.4 223.1.1.3 • What’s a network ? (from IP address perspective) – device interfaces with same network part of IP address – can physically reach each other without intervening router 223.1.2.1 223.1.2.9 223.1.3.27 223.1.2.2 LAN 223.1.3.1 223.1.3.2 network consisting of 3 IP networks (for IP addresses starting with 223, first 24 bits are network address) #13
  14. 14. IP Addressing How to find the networks? • Detach each interface from router, host • create “islands of isolated networks 223.1.1.2 223.1.1.1 223.1.1.4 223.1.1.3 223.1.9.2 223.1.7.0 223.1.9.1 223.1.7.1 223.1.8.1 223.1.8.0 223.1.2.6 Interconnected system consisting of six networks 223.1.2.1 223.1.3.27 223.1.2.2 223.1.3.1 223.1.3.2 #14
  15. 15. IP Addresses given notion of “network”, let’s re-examine IP addresses: “classful” addressing (32-bit addresses): • class A: leading bit 0, then network and host parts, 1.0.0.0 to 127.255.255.255 • class B: leading bits 10, then network and host parts, 128.0.0.0 to 191.255.255.255 • class C: leading bits 110, then network and host parts, 192.0.0.0 to 223.255.255.255 • class D: leading bits 1110, then multicast address, 224.0.0.0 to 239.255.255.255
  16. 16. IP addressing: CIDR • classful addressing: – inefficient use of address space, address space exhaustion – e.g., class B net allocated enough addresses for 65K hosts, even if only 2K hosts in that network • CIDR: Classless InterDomain Routing – network portion of address of arbitrary length – address format: a.b.c.d/x, where x is # bits in network portion of address network part host part 11001000 00010111 00010000 00000000 200.23.16.0/23 #16
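The a.b.c.d/x notation on this slide can be explored with Python's standard `ipaddress` module. This is a minimal sketch; the helper name `split_cidr` is an illustrative assumption, not something from the slides.

```python
import ipaddress

def split_cidr(cidr: str):
    """Return (network bits, host bits, number of addresses) for a.b.c.d/x."""
    net = ipaddress.ip_network(cidr)
    host_bits = net.max_prefixlen - net.prefixlen  # 32 - x for IPv4
    return net.prefixlen, host_bits, net.num_addresses

prefix_len, host_bits, size = split_cidr("200.23.16.0/23")

# 32-bit binary view of the network address, as drawn on the slide
bits = format(int(ipaddress.ip_network("200.23.16.0/23").network_address), "032b")
print(prefix_len, host_bits, size)  # 23 network bits, 9 host bits, 512 addresses
print(bits[:23], bits[23:])         # network part, host part
```

With x = 23 the host part is 9 bits, so the block holds 2^9 = 512 addresses, far closer to a 2K-host site's needs than a 65K-address class B allocation.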
  17. 17. CIDR Address Aggregation dAS D d1 AS A (OSPF) a2 130.132.1/24 i i->a1: I can reach 130.132/16; my path: I a1 130.132.2/24 AS I intradomain routing uses /24 130.132.3/24 #17
  18. 18. CIDR Address Aggregation B x00/24: B A x01/24: C C x10/24: E E x11/24: F G F #18
  19. 19. IP addresses: how to get one? Hosts (host portion): • hard-coded by system admin in a file • DHCP: Dynamic Host Configuration Protocol: dynamically get address: “plug-and-play” – host broadcasts “DHCP discover” msg – DHCP server responds with “DHCP offer” msg – host requests IP address: “DHCP request” msg – DHCP server sends address: “DHCP ack” msg – The common practice in LAN and home access (why?)
  20. 20. IP addresses: how to get one? Network (network portion): • get allocated portion of ISP’s address space: ISP's block 11001000 00010111 00010000 00000000 200.23.16.0/20 Organization 0 11001000 00010111 00010000 00000000 200.23.16.0/23 Organization 1 11001000 00010111 00010010 00000000 200.23.18.0/23 Organization 2 11001000 00010111 00010100 00000000 200.23.20.0/23 ... ….. …. …. Organization 7 11001000 00010111 00011110 00000000 200.23.30.0/23 #20
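The split of the ISP's /20 block into eight /23 organization blocks can be reproduced with the standard `ipaddress` module. A small sketch; the variable names are illustrative assumptions.

```python
import ipaddress

# The ISP's block from the slide
isp_block = ipaddress.ip_network("200.23.16.0/20")

# Carve it into /23 blocks: 3 extra prefix bits give 2**3 = 8 organizations
org_blocks = list(isp_block.subnets(new_prefix=23))

for number, block in enumerate(org_blocks):
    print(f"Organization {number}: {block}")
```

Organization 0 receives 200.23.16.0/23 and Organization 7 receives 200.23.30.0/23, matching the slide.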
  21. 21. Hierarchical addressing: route aggregation Hierarchical addressing allows efficient advertisement of routing information: Organization 0 200.23.16.0/23 Organization 1 200.23.18.0/23 Organization 2 200.23.20.0/23 Organization 7 . . . . . . Fly-By-Night-ISP “Send me anything with addresses beginning 200.23.16.0/20” Internet 200.23.30.0/23 ISPs-R-Us “Send me anything with addresses beginning 199.31.0.0/16” #21
  22. 22. Hierarchical addressing: more specific routes ISPs-R-Us has a more specific route to Organization 1 Organization 0 200.23.16.0/23 Organization 2 200.23.20.0/23 Organization 7 . . . . . . Fly-By-Night-ISP “Send me anything with addresses beginning 200.23.16.0/20” Internet 200.23.30.0/23 ISPs-R-Us Organization 1 200.23.18.0/23 “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23” #22
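When both ISPs advertise a prefix that covers Organization 1, routers pick the longest (most specific) matching prefix. The sketch below illustrates that rule with the advertisements from this slide; the `ROUTES` table and the `next_hop` function are hypothetical names, and a real router would use a trie rather than a linear scan.

```python
import ipaddress

# Advertisements seen by the rest of the Internet (prefix -> advertising ISP)
ROUTES = {
    "200.23.16.0/20": "Fly-By-Night-ISP",
    "199.31.0.0/16": "ISPs-R-Us",
    "200.23.18.0/23": "ISPs-R-Us",  # more specific route to Organization 1
}

def next_hop(dest: str):
    """Return the ISP chosen by longest-prefix match, or None if nothing matches."""
    addr = ipaddress.ip_address(dest)
    matches = [ipaddress.ip_network(p) for p in ROUTES
               if addr in ipaddress.ip_network(p)]
    if not matches:
        return None
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return ROUTES[str(best)]

print(next_hop("200.23.18.5"))  # Organization 1: matches the /20 and the /23
print(next_hop("200.23.20.1"))  # Organization 2: matches only the /20
```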
  23. 23. Network Address Translation: Motivation  A local network uses just one public IP address as far as outside world is concerned  Each device on the local network is assigned a private IP address rest of Internet local network (e.g., home network) 192.168.1.0/24 192.168.1.1 192.168.1.2 192.168.1.3 138.76.29.7 192.168.1.4 All datagrams leaving local network have same single source NAT IP address: 138.76.29.7, different source port numbers Datagrams with source or destination in this network have 192.168.1/24 address for source, destination (as usual) #23
  24. 24. NAT: Network Address Translation Implementation: NAT router must: – outgoing datagrams: replace (source IP address, port #) of every outgoing datagram to (NAT IP address, new port #) . . . remote clients/servers will respond using (NAT IP address, new port #) as destination addr. – remember (in NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair – incoming datagrams: replace (NAT IP address, new port #) in dest fields of every incoming datagram with corresponding (source IP address, port #) stored in NAT table #24
  25. 25. NAT: Network Address Translation 2: NAT router changes datagram source addr from 192.168.1.2, 3345 to 138.76.29.7, 5001, updates table NAT translation table WAN side addr LAN side addr 1: host 192.168.1.2 sends datagram to 128.119.40.186, 80 138.76.29.7, 5001 192.168.1.2, 3345 …… …… S: 192.168.1.2, 3345 D: 128.119.40.186, 80 192.168.1.2 1 2 S: 138.76.29.7, 5001 D: 128.119.40.186, 80 192.168.1.1 192.168.1.3 138.76.29.7 S: 128.119.40.186, 80 D: 138.76.29.7, 5001 3: Reply arrives dest. address: 138.76.29.7, 5001 3 S: 128.119.40.186, 80 D: 192.168.1.2, 3345 4 192.168.1.4 4: NAT router changes datagram dest addr from 138.76.29.7, 5001 to 192.168.1.2, 3345 #25
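The four numbered steps on this slide can be sketched as a tiny translation table. The `NatRouter` class, its method names, and the choice to hand out WAN ports sequentially from 5001 are assumptions made for illustration; real NAT implementations also track protocol and expire idle entries.

```python
class NatRouter:
    """Toy NAT: maps (LAN IP, LAN port) to a port on one public IP address."""

    def __init__(self, public_ip, first_port=5001):
        self.public_ip = public_ip
        self.next_port = first_port
        self.lan_to_wan = {}  # (lan_ip, lan_port) -> wan_port
        self.wan_to_lan = {}  # wan_port -> (lan_ip, lan_port)

    def outgoing(self, src, dst):
        """Rewrite the source of a datagram leaving the local network."""
        if src not in self.lan_to_wan:
            self.lan_to_wan[src] = self.next_port
            self.wan_to_lan[self.next_port] = src
            self.next_port += 1
        return (self.public_ip, self.lan_to_wan[src]), dst

    def incoming(self, src, dst):
        """Rewrite the destination of a reply, or return None if unmapped."""
        ip, port = dst
        if ip != self.public_ip or port not in self.wan_to_lan:
            return None  # no table entry: the datagram cannot be delivered
        return src, self.wan_to_lan[port]

nat = NatRouter("138.76.29.7")
sent = nat.outgoing(("192.168.1.2", 3345), ("128.119.40.186", 80))
reply = nat.incoming(("128.119.40.186", 80), ("138.76.29.7", 5001))
print(sent)   # source rewritten to 138.76.29.7, 5001
print(reply)  # destination rewritten back to 192.168.1.2, 3345
```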
  26. 26. Network Address Translation: Advantages • No need to be allocated range of addresses from ISP: - just one public IP address is used for all devices – 16-bit port-number field allows 60,000 simultaneous connections with a single LAN-side address ! – can change ISP without changing addresses of devices in local network – can change addresses of devices in local network without notifying outside world • Devices inside local net not explicitly addressable, visible by outside world (a security plus) #26
  27. 27. NAT: Network Address Translation • If both hosts are behind different NATs, they will have difficulty establishing a connection • NAT is controversial: – routers should process up to only layer 3 – violates end-to-end argument – address shortage should instead be solved by having more addresses: IPv6 • NAT possibility must be taken into account by app designers, e.g., P2P applications
  28. 28. IP addressing: the last word... Q: How does an ISP get block of addresses? A: ICANN: Internet Corporation for Assigned Names and Numbers – allocates addresses – manages DNS – assigns domain names, resolves disputes #28
  29. 29. Getting a datagram from source to dest. IP datagram: misc fields | source IP addr | dest IP addr | data  datagram remains unchanged, as it travels source to destination  addr fields of interest here  mainly dest. IP addr Routing table in A (dest. net., next router, Nhops): 223.1.1, (none, directly attached), 1; 223.1.2, 223.1.1.4, 2; 223.1.3, 223.1.1.4, 2 [Figure: hosts A (223.1.1.1), B (223.1.1.3) and E (223.1.2.2) joined by a router with interfaces 223.1.1.4, 223.1.2.9 and 223.1.3.27]
  30. 30. Getting a datagram from source to dest. IP datagram: misc fields | 223.1.1.1 | 223.1.1.3 | data Starting at A, given IP datagram addressed to B:  look up net. address of B  find B is on same net. as A  link layer will send datagram directly to B inside link-layer frame  B and A are directly connected Routing table in A (dest. net., next router, Nhops): 223.1.1, (none, directly attached), 1; 223.1.2, 223.1.1.4, 2; 223.1.3, 223.1.1.4, 2
  31. 31. Getting a datagram from source to dest. IP datagram: misc fields | 223.1.1.1 | 223.1.2.2 | data Starting at A, dest. E:  look up network address of E  E on different network  A, E not directly attached  routing table: next hop router to E is 223.1.1.4  link layer sends datagram to router 223.1.1.4 inside link-layer frame  datagram arrives at 223.1.1.4  continued….. Routing table in A (dest. net., next router, Nhops): 223.1.1, (none, directly attached), 1; 223.1.2, 223.1.1.4, 2; 223.1.3, 223.1.1.4, 2
  32. 32. Getting a datagram from source to dest. IP datagram: misc fields | 223.1.1.1 | 223.1.2.2 | data Arriving at 223.1.1.4, destined for 223.1.2.2  look up network address of E  E on same network as router’s interface 223.1.2.9  router, E directly attached  link layer sends datagram to 223.1.2.2 inside link-layer frame via interface 223.1.2.9  datagram arrives at 223.1.2.2!!! (hooray!) Routing table in router (dest. network, next router, Nhops, interface): 223.1.1, -, 1, 223.1.1.4; 223.1.2, -, 1, 223.1.2.9; 223.1.3, -, 1, 223.1.3.27
  33. 33. IP datagram format (each row 32 bits wide) • ver: IP protocol version number • head. len: header length (bytes) • type of service: “type” of data • length: total datagram length (bytes) • 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly • time to live: max number remaining hops (decremented at each router) • upper layer: upper layer protocol to deliver payload to • Internet checksum • 32 bit source IP address • 32 bit destination IP address • Options (if any): e.g. timestamp, record route taken, specify list of routers to visit • data (variable length, typically a TCP or UDP segment)
  34. 34. IP Fragmentation & Reassembly • network links have MTU (max. transfer size) - largest possible link-level frame – different link types, different MTUs • large IP datagram divided (“fragmented”) within net – one datagram becomes several datagrams – “reassembled” only at final destination – IP header bits used to identify, order related fragments [Figure: fragmentation: in: one large datagram, out: 3 smaller datagrams; reassembly at the destination]
  35. 35. IP Fragmentation and Reassembly Example  4000 byte datagram  MTU = 1500 bytes One large datagram (length=4000, ID=x, fragflag=0, offset=0) becomes several smaller datagrams: • length=1500, ID=x, fragflag=1, offset=0 (1480 bytes in data field) • length=1500, ID=x, fragflag=1, offset=185 (offset = 1480/8) • length=1040, ID=x, fragflag=0, offset=370 (the remaining 1020 data bytes plus a 20-byte header)
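The arithmetic behind this example can be checked with a short sketch. It assumes a 20-byte header with no options; the function name `fragment` and the tuple layout are illustrative choices, not part of any real IP stack.

```python
def fragment(total_len, mtu, header_len=20):
    """Split one datagram into fragments.

    Returns a list of (fragment length, more-fragments flag, offset in 8-byte units).
    """
    payload = total_len - header_len
    # Every fragment except the last must carry a multiple of 8 data bytes
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    sent = 0
    while sent < payload:
        data = min(max_data, payload - sent)
        more = 1 if sent + data < payload else 0
        fragments.append((data + header_len, more, sent // 8))
        sent += data
    return fragments

for length, flag, offset in fragment(4000, 1500):
    print(f"length={length} fragflag={flag} offset={offset}")
```

The 4000-byte datagram carries 3980 data bytes, which split into 1480 + 1480 + 1020; adding a 20-byte header to the last piece gives a final fragment of 1040 bytes at offset 370.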
  36. 36. Routing in the Internet • The Global Internet consists of Autonomous Systems (AS) interconnected with each other: – Stub AS: small corporation – Multihomed AS: large corporation (no transit) – Transit AS: provider • Two-level routing: – Intra-AS: administrator is responsible for choice – Inter-AS: unique standard Lecture 6: Network Layer #36
  37. 37. Internet AS Hierarchy Inter-AS border (exterior gateway) routers Intra-AS interior (gateway) routers Lecture 6: Network Layer #37
  38. 38. Intra-AS Routing • Also known as Interior Gateway Protocols (IGP) • Most common IGPs: – RIP: Routing Information Protocol – OSPF: Open Shortest Path First – IGRP: Interior Gateway Routing Protocol (Cisco propr.) Lecture 6: Network Layer #38
  39. 39. RIP ( Routing Information Protocol) • Distance vector algorithm • Included in BSD-UNIX Distribution in 1982 • Distance metric: # of hops (max = 15 hops) – why? • Distance vectors: exchanged every 30 sec via Response Message (also called advertisement) • Each advertisement: route to up to 25 destination nets Lecture 6: Network Layer #39
  40. 40. RIP (Routing Information Protocol) [Figure: routers A, B, C, D interconnecting networks w, x, y, z] Routing table in D (destination network, next router, num. of hops to dest.): w, A, 2; y, B, 2; z, B, 7; x, --, 1; ….
  41. 41. RIP: Link Failure and Recovery If no advertisement heard after 180 sec --> neighbor/link declared dead – routes via neighbor invalidated – new advertisements sent to neighbors – neighbors in turn send out new advertisements (if tables changed) – link failure info quickly propagates to entire net – poison reverse used to prevent ping-pong loops (infinite distance = 16 hops) Lecture 6: Network Layer #41
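How a RIP router folds a neighbor's advertisement into its table, including the 16-hop "infinity" used to invalidate routes, can be sketched as follows. The function name `merge_advertisement` and the table layout are assumptions for illustration; timers, split horizon and the message format are omitted.

```python
INFINITY = 16  # RIP treats 16 hops as unreachable

def merge_advertisement(table, neighbor, advert):
    """Fold one neighbor's advertisement into `table`.

    table:  dest -> (next_router, hops); next_router None means directly attached
    advert: dest -> hops as reported by `neighbor`
    Returns True if any entry changed (so new advertisements should be sent).
    """
    changed = False
    for dest, hops in advert.items():
        cost = min(hops + 1, INFINITY)  # one extra hop to reach the neighbor
        current = table.get(dest)
        if current is None:
            better = cost < INFINITY    # learn only reachable new destinations
        else:
            via_same = current[0] == neighbor
            # Take a shorter route, or follow any change on the route in use
            better = cost < current[1] or (via_same and cost != current[1])
        if better:
            table[dest] = (neighbor, cost)
            changed = True
    return changed

# Routing table in D from the earlier RIP slide
table_d = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7), "x": (None, 1)}
merge_advertisement(table_d, "A", {"w": 1, "x": 1, "z": 4})  # A now reaches z in 4 hops
merge_advertisement(table_d, "B", {"y": 16})                 # B reports y unreachable
print(table_d)
```

After the first advertisement D reroutes z through A at 5 hops; after the second, the route to y through B is marked with distance 16, which is how an invalidated route propagates.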
  42. 42. OSPF (Open Shortest Path First) • “open”: publicly available • Uses Link State algorithm – LS packet dissemination – Topology map at each node – Route computation using Dijkstra’s algorithm • OSPF advertisement carries one entry per neighbor router • Advertisements disseminated to entire AS (via flooding) Lecture 6: Network Layer #42
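Once flooding has given every router the same topology map, each router runs Dijkstra's algorithm locally. The sketch below shows that computation on a small made-up graph; the topology, link costs and function name are hypothetical, not taken from the slides.

```python
import heapq

def dijkstra(graph, source):
    """Return (dist, prev): least cost to each node and its predecessor on that path."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry; a shorter path to u was already found
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev

# Hypothetical link-state database: router -> {neighbor: link cost}
topology = {
    "A": {"B": 1, "C": 4},
    "B": {"A": 1, "C": 2, "D": 5},
    "C": {"A": 4, "B": 2, "D": 1},
    "D": {"B": 5, "C": 1},
}
dist, prev = dijkstra(topology, "A")
print(dist)  # least-cost distances from A
print(prev)  # predecessor of each node on its shortest path
```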
  43. 43. OSPF “advanced” features (not in RIP) • Security: all OSPF messages authenticated (to prevent malicious intrusion); messages are carried directly in IP datagrams (protocol 89), not over TCP • Multiple same-cost paths allowed – only one path in RIP • For each link, multiple cost metrics for different ToS (e.g., satellite link cost set “low” for best effort; high for real time) • Integrated uni- and multicast support: – Multicast OSPF (MOSPF) uses same topology data base as OSPF • Hierarchical OSPF in large domains.
  44. 44. Hierarchical OSPF Lecture 6: Network Layer #44
  45. 45. Hierarchical OSPF • Two-level hierarchy: local area, backbone. – Link-state advertisements only in area – each nodes has detailed area topology; only know direction (shortest path) to nets in other areas. • Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers. • Backbone routers: run OSPF routing limited to backbone. • Boundary routers: connect to other ASs. Lecture 6: Network Layer #45
  46. 46. IGRP (Interior Gateway Routing Protocol) • CISCO proprietary; successor of RIP (mid 80s) • Distance Vector, like RIP • several cost metrics (delay, bandwidth, reliability, load etc) • routing updates carried directly in IP datagrams (protocol 9), not over TCP • loop-free routing via the Distributed Updating Alg. (DUAL), based on diffused computation, belongs to its successor EIGRP
  47. 47. Inter-AS routing Lecture 6: Network Layer #47
  48. 48. Internet inter-AS routing: BGP • BGP (Border Gateway Protocol): the de facto standard • Path Vector protocol: – similar to Distance Vector protocol – each Border Gateway broadcasts to neighbors (peers) entire path (i.e., sequence of ASs) to destination – E.g., Gateway X may send its path to dest. Z: Path (X,Z) = X,Y1,Y2,Y3,…,Z
  49. 49. Internet inter-AS routing: BGP Suppose: gateway X sends its path to peer gateway W • W may or may not select path offered by X – cost, policy (don’t route via competitor’s AS), loop prevention reasons. • If W selects path advertised by X, then: Path (W,Z) = W, Path (X,Z) • Note: X can control incoming traffic by controlling its route advertisements to peers: – e.g., don’t want to route traffic to Z -> don’t advertise any routes to Z
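W's decision can be sketched as a small function: reject a path that already contains W (loop prevention) or that crosses an AS excluded by policy, otherwise prepend W. The function name `accept_path` and the `banned` parameter are illustrative assumptions; real BGP route selection weighs many more attributes.

```python
def accept_path(my_as, advertised_path, banned=frozenset()):
    """Return Path(my_as, Z) built from a peer's advertised path, or None if rejected."""
    if my_as in advertised_path:
        return None  # loop prevention: our own AS is already on the path
    if set(banned) & set(advertised_path):
        return None  # policy: do not route via these ASs
    return [my_as] + list(advertised_path)  # Path(W,Z) = W, Path(X,Z)

print(accept_path("W", ["X", "Y1", "Y2", "Z"]))            # accepted, W prepended
print(accept_path("W", ["X", "W", "Z"]))                   # rejected: loop
print(accept_path("W", ["X", "Y1", "Z"], banned={"Y1"}))   # rejected: policy
```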
  50. 50. Internet inter-AS routing: BGP • BGP messages exchanged using TCP. • BGP messages: – OPEN: opens TCP connection to peer and authenticates sender – UPDATE: advertises new path (or withdraws old) – KEEPALIVE keeps connection alive in absence of UPDATES; also ACKs OPEN request – NOTIFICATION: reports errors in previous msg; also used to close connection Lecture 6: Network Layer #50
  51. 51. Why different Intra- and Inter-AS routing ? Policy: • Inter-AS: admin wants control over how its traffic routed, who routes through its net. • Intra-AS: single admin, so no policy decisions needed Scale: • hierarchical routing saves table size, reduced update traffic Performance: • Intra-AS: can focus on performance • Inter-AS: policy may dominate over performance Lecture 6: Network Layer #51
  52. 52. Extra Lecture 6: Network Layer #52
  53. 53. ICMP: Internet Control Message Protocol • used by hosts & routers to communicate network-level information – error reporting: unreachable host, network, port, protocol – echo request/reply (used by ping) • network-layer “above” IP: – ICMP msgs carried in IP datagrams • ICMP message: type, code plus first 8 bytes of IP datagram causing error • Type/Code and description: 0/0 echo reply (ping); 3/0 dest. network unreachable; 3/1 dest host unreachable; 3/2 dest protocol unreachable; 3/3 dest port unreachable; 3/6 dest network unknown; 3/7 dest host unknown; 4/0 source quench (congestion control - not used); 8/0 echo request (ping); 9/0 route advertisement; 10/0 router discovery; 11/0 TTL expired; 12/0 bad IP header
  54. 54. Traceroute and ICMP • Source sends series of UDP segments to dest – First has TTL=1 – Second has TTL=2, etc. – Unlikely port number • When nth datagram arrives to nth router: – Router discards datagram – And sends to source an ICMP message (type 11, code 0) – Message includes name of router & IP address • When ICMP message arrives, source calculates RTT • Traceroute does this 3 times Stopping criterion: • UDP segment eventually arrives at destination host • Destination returns ICMP “dest port unreachable” packet (type 3, code 3) • When source gets this ICMP, stops.
  55. 55. Example: tracert www.yahoo.com Tracing route to www-real.wa1.b.yahoo.com [69.147.76.15] over a maximum of 30 hops:
       1   <1 ms   <1 ms   <1 ms  132.67.250.1
       2   <1 ms    1 ms   <1 ms  dmz-cc-gw.math.tau.ac.il [132.67.252.2]
       3   <1 ms   <1 ms   <1 ms  tel-aviv.tau.ac.il [132.66.4.1]
       4    1 ms   <1 ms   <1 ms  gp1-tau-ge.ilan.net.il [128.139.191.70]
       5    1 ms     *      1 ms  gp0-gp1-te.ilan.net.il [128.139.188.2]
       6   87 ms   86 ms   87 ms  iucc.rt1.fra.de.geant2.net [62.40.125.121]
       7   87 ms   87 ms   87 ms  TenGigabitEthernet7-3.ar1.FRA4.gblx.net [207.138.144.45]
       8  177 ms  177 ms  177 ms  204.245.39.226
       9  180 ms  177 ms  265 ms  ae1-p151.msr2.re1.yahoo.com [216.115.108.23]
      10  177 ms  177 ms  177 ms  te-9-4.bas-a2.re1.yahoo.com [66.196.112.203]
      11  177 ms  177 ms  177 ms  f1.www.vip.re1.yahoo.com [69.147.76.15]
      Trace complete.
  56. 56. IPv6 • Initial motivation: 32-bit address space soon to be completely allocated. • Additional motivation: – header format helps speed processing/forwarding – header changes to facilitate QoS IPv6 datagram format: – fixed-length 40 byte header – no fragmentation allowed Network Layer 4-56
  57. 57. IPv6 Header (Cont) Priority: identify priority among datagrams in flow Flow Label: identify datagrams in same “flow.” (concept of“flow” not well defined). Next header: identify upper layer protocol for data Network Layer 4-57
  58. 58. Other Changes from IPv4 • Checksum: removed entirely to reduce processing time at each hop • Options: allowed, but outside of header, indicated by “Next Header” field • ICMPv6: new version of ICMP – additional message types, e.g. “Packet Too Big” – multicast group management functions Network Layer 4-58
  59. 59. Transition From IPv4 To IPv6 • Not all routers can be upgraded simultaneously – no “flag days” – How will the network operate with mixed IPv4 and IPv6 routers? • Tunneling: IPv6 carried as payload in IPv4 datagram among IPv4 routers
  60. 60. Tunneling Logical view: A (IPv6), B (IPv6), tunnel, E (IPv6), F (IPv6). Physical view: A (IPv6), B (IPv6), C (IPv4), D (IPv4), E (IPv6), F (IPv6). • A-to-B: IPv6 datagram (Flow: X, Src: A, Dest: F, data) • B-to-C (and onward through the tunnel to E): IPv6 inside IPv4: outer IPv4 header (Src: B, Dest: E) carrying the IPv6 datagram (Flow: X, Src: A, Dest: F, data) • E-to-F: IPv6 datagram (Flow: X, Src: A, Dest: F, data)
  61. 61. IPv6 status report • Operating systems – wide support – early 2000 – Windows (2000, XP, Vista), BSD, Linux, Apple • Networking infrastructure – Cisco • Deployment – Slow • Penetration – Host - minor (less than 1%) – Used in 2008 in China Olympic games • Motivation: CIDR & NAT
  62. 62. Active Queue Management
  63. 63. Queuing Disciplines • Each router must implement some queuing discipline • Queuing allocates both bandwidth and buffer space: – Bandwidth: which packet to serve (transmit) next – Buffer space: which packet to drop next (when required) • Queuing also affects latency
  64. 64. Typical Internet Queuing • FIFO + drop-tail – Simplest choice – Used widely in the Internet • FIFO (first-in-first-out) – Implies single class of traffic • Drop-tail – Arriving packets get dropped when queue is full regardless of flow or importance • Important distinction: – FIFO: scheduling discipline – Drop-tail: drop policy
  65. 65. FIFO + Drop-tail Problems • Leaves responsibility of congestion control to edges (e.g., TCP) • Does not separate between different flows • No policing: send more packets  get more service • Synchronization: end hosts react to same events
  66. 66. Active Queue Management • Design active router queue management to aid congestion control • Why? – Routers can distinguish between propagation and persistent queuing delays – Routers can decide on transient congestion, based on workload
  67. 67. Active Queue Designs • Modify both router and hosts – DECbit – congestion bit in packet header • Modify router, hosts use TCP – Fair queuing • Per-connection buffer allocation – RED (Random Early Detection) • Drop packet or set bit in packet header as soon as congestion is starting
  68. 68. Internet Problems • Full queues – Routers are forced to have large queues to maintain high utilizations – TCP detects congestion from loss • Forces network to have long-standing queues in steady state • Lock-out problem – Drop-tail routers treat bursty traffic poorly – Traffic gets synchronized easily  allows a few flows to monopolize the queue space
  69. 69. Design Objectives • Keep throughput high and delay low • Accommodate bursts • Queue size should reflect ability to accept bursts rather than steady-state queuing • Improve TCP performance with minimal hardware changes
  70. 70. Lock-out Problem • Random drop – Packet arriving when queue is full causes some random packet to be dropped • Drop front – On full queue, drop packet at head of queue • Random drop and drop front solve the lock-out problem but not the full-queues problem
  71. 71. Full Queues Problem • Drop packets before queue becomes full (early drop) • Intuition: notify senders of incipient congestion – Example: early random drop (ERD): • If qlen > drop level, drop each new packet with fixed probability p • Does not control misbehaving users
  72. 72. Random Early Detection (RED) • Detect incipient congestion, allow bursts • Keep power (throughput/delay) high – Keep average queue size low – Assume hosts respond to lost packets • Avoid window synchronization – Randomly mark packets • Avoid bias against bursty traffic • Some protection against ill-behaved users
  73. 73. RED Algorithm • Maintain running average of queue length • If avgq < minth do nothing – Low queuing, send packets through • If avgq > maxth, drop packet – Protection from misbehaving sources • Else mark packet in a manner proportional to queue length – Notify sources of incipient congestion
  74. 74. RED Operation Min thresh Max thresh P(drop) Average Queue Length 1.0 maxP minth maxth Avg queue length
  75. 75. RED Algorithm • Maintain running average of queue length – Byte mode vs. packet mode – why? • For each packet arrival – Calculate average queue size (avg) – If minth ≤ avgq < maxth • Calculate probability Pa • With probability Pa – Mark the arriving packet • Else if maxth ≤ avg – Mark the arriving packet
  76. 76. Queue Estimation • Standard EWMA: avgq ← (1 − wq) × avgq + wq × qlen – Special fix for idle periods – why? • Upper bound on wq depends on minth – Want to ignore transient congestion – Can calculate the queue average if a burst arrives • Set wq such that certain burst size does not exceed minth • Lower bound on wq to detect congestion relatively quickly • Typical wq = 0.002
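The queue average and the RED marking decision can be sketched in a few lines. This is a simplified illustration: the marking probability rises linearly from 0 at minth to maxP at maxth, and the per-packet count correction and idle-period fix of the full algorithm are left out. The function names and the threshold values in the example are assumptions.

```python
def update_avg(avgq, qlen, wq=0.002):
    """EWMA of the queue length: avgq <- (1 - wq) * avgq + wq * qlen."""
    return (1 - wq) * avgq + wq * qlen

def mark_probability(avgq, minth, maxth, maxp):
    """Probability of marking (or dropping) an arriving packet under RED."""
    if avgq < minth:
        return 0.0  # low queuing: send packets through
    if avgq >= maxth:
        return 1.0  # persistent congestion: mark every arrival
    # In between, probability grows linearly with the average queue length
    return maxp * (avgq - minth) / (maxth - minth)

# A burst of 110 packets barely moves the average when wq is small
print(update_avg(10.0, 110))
print(mark_probability(10, minth=5, maxth=15, maxp=0.1))
```

Because wq is small, a short burst raises the average only slightly (here from 10 to about 10.2), which is how RED accommodates bursts while still reacting to sustained congestion.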
  77. 77. Extending RED for Flow Isolation • Problem: what to do with non-cooperative flows? • Fair queuing achieves isolation using per-flow state – expensive at backbone routers – How can we isolate unresponsive flows without per-flow state? • RED penalty box – Monitor history for packet drops, identify flows that use disproportionate bandwidth – Isolate and punish those flows
  78. 78. FRED • Flow Random Early Drop (Sigcomm, 1997) • Maintain per flow state only for active flows (ones having packets in the buffer) • minq and maxq  min and max number of buffers a flow is allowed to occupy • avgcq = average buffers per flow • Strike count of number of times flow has exceeded maxq
  79. 79. FRED – Fragile Flows • Flows that send little data and want to avoid loss • minq is meant to protect these • What should minq be? – When large number of flows  2-4 packets • Needed for TCP behavior – When small number of flows  increase to avgcq
  80. 80. FRED • Non-adaptive flows – Flows with high strike count are not allowed more than avgcq buffers – Allows adaptive flows to occasionally burst to maxq but repeated attempts incur penalty
  81. 81. Stochastic Fair Blue • Same objective as RED Penalty Box – Identify and penalize misbehaving flows • Create L hashes with N bins each – Each bin keeps track of separate marking rate (pm) – Rate is updated using standard technique and a bin size – Flow uses minimum pm of all L bins it belongs to – Non-misbehaving flows hopefully belong to at least one bin without a bad flow • Large numbers of bad flows may cause false positives
  82. 82. Stochastic Fair Blue • False positives can continuously penalize same flow • Solution: moving hash function over time – Bad flow no longer shares bin with same flows – If history is reset, does a bad flow get to make trouble until detected again? • No, can perform hash warmup in background
  83. 83. Head of Line blocking # 83
  84. 84. Buffer locations • Input ports • Output ports • Inside fabric • Shared Memory • Combination of all [Figure: buffers placed around the switch fabric]
  85. 85. fabric Outputs Inputs Input Queuing # 85
  86. 86. Input Buffer : properties • Input speed of queue – no more than input line • Need arbiter (running N times faster than input) • FIFO queue • Head of Line (HoL) blocking • Utilization: – Random destination – 2 − √2 ≈ 58.6% utilization (for large N) – due to HoL blocking
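The utilization limit of FIFO input queuing can be checked with a small simulation. The sketch assumes a saturated switch (every input always has a packet waiting), uniformly random destinations, and a random winner when several head-of-line packets want the same output; the function name and parameters are illustrative.

```python
import math
import random

def hol_throughput(n_ports, slots, seed=0):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line packet at each input
    heads = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(heads):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)           # one packet per output per slot
            heads[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
        # Losing inputs keep the same head packet: this is head-of-line blocking
    return delivered / (n_ports * slots)

print(hol_throughput(2, 20000))   # close to 0.75 for a 2x2 switch
print(hol_throughput(32, 2000))   # approaches the large-N limit
print(2 - math.sqrt(2))           # theoretical limit, about 0.586
```

As the number of ports grows, the measured throughput falls toward 2 − √2, the 58.6% figure reported by Karol, Hluchyj and Morgan.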
  87. 87. Head of Line Blocking # 87
  88. 88. # 88
  89. 89. # 89
  90. 90. Head of Line Blocking Stadium Beer/Soda/Chips Kwiky Mart # 90
  91. 91. Output Queuing Stadium Beer/Soda/Chips Kwiky Mart # 91
  92. 92. Head of Line Blocking A B C A C B B C # 92
  93. 93. Head of Line Blocking A B A C B C ACB B C # 93
  94. 94. Head of Line Blocking A B B C C B ACBCA B C # 94
  95. 95. VOQ—Virtual Output Queues A ARB B C A C B B C # 95
  96. 96. VOQ—Virtual Output Queues A ARB AA B A C B C B B C C # 96
  97. 97. VOQ—Virtual Output Queues A ARB AAAA B C C A B B B C C # 97
  98. 98. Performance Issue with Cross-Bars 58.6% Source: M. J. Karol, M.G. Hluchyj, S. P. Morgan, “Input Versus Output Queueing [sic] on a Space-Division Packet Switch”, IEEE Transactions on Communications, Vol COM-35, No 12, December 1987, page 1353 # 98
  99. 99. Overcoming HoL blocking: look-ahead  The fabric looks ahead into the input buffer for packets that may be transferred if they were not blocked by the head of line.  Improvement depends on the depth of the look ahead.  This corresponds to virtual output queues where each input port has buffer for each output port. # 99
  100. 100. Input Queuing Virtual output queues # 100
  101. 101. Overcoming HoL blocking: output expansion  Each output port is expanded to L output ports  The fabric can transfer up to L packets to the same output instead of one cell. Karol and Morgan, IEEE transaction on communication, 1987: 1347-1356 # 101
  102. 102. Input Queuing Output Expansion L fabric # 102
  103. 103. Output Queuing The “ideal” 2 1 1 2 1 2 1 2 1 1 2 2 1 # 103
  104. 104. Output Buffer : properties • No HoL problem • Output queue needs to run faster than input lines • Need to provide for N packets arriving to same queue • solution: limit the number of input lines that can be destined to the output.
  105. 105. MEMORY FABRIC FABRIC Shared Memory a common pool of buffers divided into linked lists indexed by output port number # 105
  106. 106. Shared Memory: properties • Packets stored in memory as they arrive • Resource sharing • Easy to implement priorities • Memory is accessed at speed equal to sum of the input or output speeds • How to divide the space between the sessions
  107. 107. Multicast: one sender to many receivers • Multicast: one sender to many receivers – analogy: one teacher to many students • Question: how to achieve multicast
  108. 108. Internet Multicast Service Model multicast group concept: – hosts send IP datagram pkts to multicast group – hosts that have “joined” that multicast group will receive pkts sent to that group
  109. 109. Multicast groups • host group semantics: – anyone can “join” (receive) multicast group – anyone can send to multicast group – no network layer identification to hosts of members • session/application-level mechanisms needed for membership identification, privacy • needed: infrastructure to deliver mcast-addressed packets to all hosts that have joined that multicast group
  110. 110. Internet Multicast Addressing • class D Internet addresses reserved for multicast • indirection: mcast address does not name a destination, but the host group to receive the packet • example packet addr: 226.17.30.197
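A class D address is one whose four high-order bits are 1110, i.e. the range 224.0.0.0 to 239.255.255.255. A minimal check, using Python's standard ipaddress module (the helper name is_class_d is just for illustration):

```python
import ipaddress

def is_class_d(addr: str) -> bool:
    """Return True if the IPv4 address starts with the bits 1110 (class D)."""
    first_octet = int(ipaddress.IPv4Address(addr)) >> 24
    return first_octet >> 4 == 0b1110

print(is_class_d("226.17.30.197"))  # the example group address from the slide
print(is_class_d("128.32.75.60"))   # an ordinary unicast host address
```

The standard library offers the same test as the is_multicast property of an IPv4Address object.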
  111. 111. Joining a mcast group: a two-step process • local: host informs local mcast router of desire to join group: IGMP • wide area: local router interacts with other routers to receive mcast packet flow – many protocols (e.g., DVMRP, MOSPF, PIM)
  112. 112. IGMP: Internet Group Management Protocol • host: sends IGMP report when application joins mcast group – IP_ADD_MEMBERSHIP socket option – host need not explicitly “unjoin” group when leaving • router: sends IGMP query at regular intervals – host belonging to a mcast group must reply to query
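From the application's side, joining amounts to setting the IP_ADD_MEMBERSHIP socket option; the host's IP stack then takes care of the IGMP report. A minimal sketch, assuming a UDP socket and the default interface (the helper names build_mreq and join_group are illustrative, not part of any API):

```python
import socket

def build_mreq(group: str, interface: str = "0.0.0.0") -> bytes:
    """Pack the membership request: group address followed by local interface address."""
    return socket.inet_aton(group) + socket.inet_aton(interface)

def join_group(sock: socket.socket, group: str) -> None:
    """Ask the kernel to join `group` on this socket (triggers an IGMP report)."""
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP, build_mreq(group))

# Typical use (needs a multicast-capable interface, so it is left commented out):
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# sock.bind(("", 5000))            # port 5000 is an arbitrary example
# join_group(sock, "226.17.30.197")
```

Passing 0.0.0.0 as the interface lets the operating system pick the interface; a multi-homed host would normally name one explicitly.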
  113. 113. IGMP • IGMP version 1 (RFC 1112) – router: Host Membership Query msg broadcast on LAN to all hosts – host: Host Membership Report msg to indicate group membership – randomized delay before responding – implicit leave via no reply to Query • IGMP v2 (RFC 2236): additions include – group-specific Query – Leave Group msg – last host replying to Query can send explicit Leave Group msg – router performs group-specific query to see if any hosts left in group • IGMP v3: standardized as RFC 3376
  114. 114. Multicast Issues • Naming • Membership Management • Routing
  115. 115. IP Multicast Naming • Class D address represents multicast group – E.g. 226.17.30.197 • Datagram with destination address set to group delivered to all hosts in the group – Indirection – 226.17.30.197 => 65.30.1.2, 66.8.3.53, 128.32.75.60, … – Sender may or may not be in the group • No address hierarchy or subnets – How is routing done?
  116. 116. Membership Management • Some other questions: – Who is part of the group? – How does one join? – How does one leave? – Who decides if it’s OK? • Membership management answers these
  117. 117. IGMP • Internet Group Management Protocol • Runs only between host and router – Multicast routing takes care of communication between routers
  118. 118. IGMP hosts host-to-router protocol (IGMP) routers multicast routing protocols (various)
  119. 119. IGMP query • IGMP membership_query – Router sends query – Find out all groups a host belongs to – Can query a specific group instead – Sent to the “all systems group” (224.0.0.1) with TTL=1
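As a rough illustration of what a query looks like on the wire, the sketch below builds the 8-byte message, assuming the IGMPv2 layout of RFC 2236: type 0x11, maximum response time in tenths of a second, a 16-bit Internet checksum, and the group address (0.0.0.0 for a general query). The function names are made up here, and actually sending the message would additionally need a raw socket with TTL 1.

```python
import socket
import struct

def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of `data`."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def build_query(group: str = "0.0.0.0", max_resp_tenths: int = 100) -> bytes:
    """Build a membership query; group 0.0.0.0 means a general query."""
    addr = socket.inet_aton(group)
    unchecked = struct.pack("!BBH4s", 0x11, max_resp_tenths, 0, addr)
    return struct.pack("!BBH4s", 0x11, max_resp_tenths,
                       internet_checksum(unchecked), addr)

general = build_query()                      # would be sent to 224.0.0.1
specific = build_query("226.17.30.197")      # group-specific query
print(general.hex(), specific.hex())
```

A receiver can validate a message by recomputing the checksum over all eight bytes, which yields zero for an undamaged message.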
  120. 120. IGMP report • IGMP membership_report – Response from host to a query – Can send report unsolicited • Join group this way! • IGMP leave_group – Optional – Router will clean up membership info on next membership_query
  121. 121. IGMP properties • Minimalist semantics – Host controlled membership • No decision about: – Who controls membership – Invitations – How to find groups and join them • Move these decisions to application layer
  122. 122. Soft state • Host is authoritative on group membership • Router maintains “soft state” • A crashed router soon recovers – Sends a new membership_query – Misdelivers packets for a little while • OK by IP service model!
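Soft state can be pictured as a table whose entries expire unless hosts keep refreshing them. The class below is a simplified sketch of that idea, not the timer logic of any real router; the class name and the single timeout parameter are assumptions made for the example.

```python
class MembershipTable:
    """Router-side group membership kept as soft state."""

    def __init__(self, timeout: float):
        self.timeout = timeout
        self.expires = {}                     # group -> time at which the entry lapses

    def report(self, group: str, now: float) -> None:
        """A host reported membership: create or refresh the entry."""
        self.expires[group] = now + self.timeout

    def active_groups(self, now: float) -> set:
        """Drop entries that were not refreshed in time and return the rest."""
        self.expires = {g: t for g, t in self.expires.items() if t > now}
        return set(self.expires)

table = MembershipTable(timeout=10)
table.report("226.17.30.197", now=0)
print(table.active_groups(now=5))    # still a member
print(table.active_groups(now=12))   # no refresh arrived, entry has lapsed
```

Because the table is rebuilt purely from reports, a router that restarts with an empty table recovers the correct state after its next round of queries.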
  123. 123. Protocol types • Dense mode protocols – assumes dense group membership – Source distribution tree and NACK type – DVMRP (Distance Vector Multicast Routing Protocol) – PIM-DM (Protocol Independent Multicast, Dense Mode) – Example: Company-wide announcement • Sparse mode protocol – assumes sparse group membership – Shared distribution tree and ACK type – PIM-SM (Protocol Independent Multicast, Sparse Mode) – Example: a Shuttle Launch
  124. 124. Multicast Routing • A number of routers have hosts that belong to a multicast group • How to connect them (and others) in a tree? – Shared tree: single tree for all – Source-based tree: many trees
  125. 125. Core-Based Tree • Tree rooted at a core • To join a group, send unicast message towards core – Add all links traversed until hit existing tree
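The join procedure can be sketched as a walk along the unicast path toward the core that stops at the first router already on the tree. The data structures here (a next_hop dictionary standing in for each router's unicast routing table, sets for tree nodes and links) are assumptions chosen for the illustration.

```python
def join(tree_nodes, tree_links, next_hop, router):
    """Graft `router` onto the tree by following unicast next hops toward the core."""
    node = router
    while node not in tree_nodes:
        parent = next_hop[node]          # next hop on the unicast path to the core
        tree_links.add((node, parent))   # this link becomes part of the tree
        tree_nodes.add(node)
        node = parent

# Unicast next hops toward the core for three routers.
next_hop = {"A": "B", "B": "core", "C": "B"}
nodes, links = {"core"}, set()

join(nodes, links, next_hop, "A")   # adds A-B and B-core
join(nodes, links, next_hop, "C")   # adds only C-B, since B is already on the tree
print(sorted(links))
```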
  126. 126. Diagram Core
  127. 127. Choice of Core • If core close to source, efficiency is good • If core far from source, efficiency falls – Delay up to twice optimal • Optimal core placement is NP-hard – Use heuristics
  128. 128. Source-based Trees • Different tree for each possible source – Why? • Reverse path forwarding to figure out tree • Pruning to leave out routers
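Reverse path forwarding comes down to one test per packet: was it received on the interface this router would itself use to reach the source? A minimal sketch, assuming a dictionary unicast_iface that maps each source to that interface (names are illustrative):

```python
def rpf_forward(unicast_iface, interfaces, source, arrival_iface):
    """Return the interfaces to forward on, or [] if the RPF check fails."""
    if unicast_iface.get(source) != arrival_iface:
        return []                                   # not on the reverse path: drop
    return [i for i in interfaces if i != arrival_iface]

unicast_iface = {"S": "eth0"}                       # shortest path to S leaves via eth0
interfaces = ["eth0", "eth1", "eth2"]

print(rpf_forward(unicast_iface, interfaces, "S", "eth0"))  # accepted, flooded onward
print(rpf_forward(unicast_iface, interfaces, "S", "eth1"))  # duplicate copy, dropped
```

Applying this rule at every router means the copies that survive trace out a tree rooted at the source, which pruning then trims.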
  129. 129. Pruning • Prune when no attached members or downstream routers • Propagate prune messages upstream [Figure: source S and routers R1–R7; legend: router with attached group member, router with no attached group member, prune message (P), links with multicast forwarding]
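The outcome of pruning can be computed bottom-up: a router stays on the tree only if it has attached members or at least one downstream router that stays. The sketch below works on a static tree given as a children dictionary; the topology and names are invented for the example, and real protocols reach the same result by exchanging prune messages rather than by a central computation.

```python
def prune(children, members, root):
    """Return the tree (node -> kept children) that remains after pruning."""
    kept = {}

    def keep(node):
        kept_children = [c for c in children.get(node, []) if keep(c)]
        if kept_children or node in members:
            kept[node] = kept_children
            return True
        return False                      # this router would send a prune upstream

    keep(root)
    return kept

children = {"S": ["R1", "R2"], "R1": ["R3"], "R2": ["R4", "R5"]}
members = {"R3", "R4"}                    # routers with attached group members
print(prune(children, members, "S"))      # R5 has been pruned away
```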
  130. 130. DVMRP • Distance Vector Multicast Routing Protocol • DV + RPF + Pruning • Distance vector carries distance to multicast sources • Pruning carries a timeout – Afterwards, traffic delivery is resumed • Explicit graft message to reverse pruning – Done upon join
  131. 131. MOSPF • Multicast Extensions to OSPF • Link-state advertisements include multicast group membership – Only report directly connected hosts • Compute shortest-path spanning tree rooted at source – On demand, when receiving packet from source for the first time – Forward multicast traffic along tree
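Since every router holds the full link-state database, each can compute the shortest-path tree rooted at a source by itself. The sketch below uses Dijkstra's algorithm over a small weighted graph; the graph, the costs and the function name are assumptions for illustration, and the step that trims the tree to routers with group members is left out.

```python
import heapq

def shortest_path_tree(graph, source):
    """Return a parent map describing the shortest-path tree rooted at `source`."""
    dist = {source: 0}
    parent = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                               # stale heap entry
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                parent[v] = u                      # v is reached through u
                heapq.heappush(heap, (nd, v))
    return parent

# Link costs as they might appear in the link-state database.
graph = {
    "S": {"A": 1, "B": 4},
    "A": {"S": 1, "B": 1},
    "B": {"S": 4, "A": 1},
}
print(shortest_path_tree(graph, "S"))   # B is reached via A, not directly
```

Caching the result per (source, group) pair matches the "on demand" behaviour on the slide, at the cost of recomputation whenever the topology or membership changes.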
  132. 132. MOSPF performance • Global state allows source-based trees to be used – Faster delivery of messages • Overhead – Joins and leaves flooded to all routers – Any change may cause whole tree to be recomputed
  133. 133. PIM • Protocol Independent Multicast – Uses routing tables, but agnostic of how they are built • Two settings: – Dense: most routers members of a group • Use RPF flooding with pruning – Sparse: most routers not members of a group • Use shared tree or source-based tree based on data characteristics • Uses soft-state
  134. 134. Sparse vs. Dense • Dense Mode – Dense participants – B/W plentiful – Membership assumed until pruned – Data driven • Sparse Mode – Sparse participants – B/W overhead significant – Membership explicitly requested – Receiver driven
  135. 135. Shared vs. Source-based Trees • Shared trees used initially – Tree rooted at rendezvous point (RP) • Can switch to source-based trees when data rate is high – RP sends a Join message to source – Each router independently decides to switch to source-based tree, sends Join to source
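  136. 136. Shared Tree Example [Figure: group members G, rendezvous point RP, source S]
  137. 137. PIM Receiver Join [Figure: group members G, Report for G, Join (*,G) toward RP, source S; annotation: “What if join is here?”]
  138. 138. PIM Shared Tree After Join [Figure: group members G, rendezvous point RP, source S]
  139. 139. PIM Source Based Tree [Figure: group members G, rendezvous point RP, source S, Join (s,g)]
  140. 140. PIM Source Based Tree [Figure: group members G, rendezvous point RP, source S]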
  141. 141. PIM routing tables • Routing entries of the form (s,g) – s - source – g - group • Wildcard entries (*,g) for shared-group trees • Packets are routed using best match
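Best-match lookup means that a specific (s,g) entry wins over the wildcard (*,g) entry for the same group. A minimal sketch, assuming the table is a dictionary keyed by (source, group) with the string "*" as wildcard and a list of outgoing interfaces as value (all illustrative choices):

```python
def lookup(table, source, group):
    """Return the entry for (source, group), falling back to the (*, group) entry."""
    if (source, group) in table:
        return table[(source, group)]       # source-specific tree
    return table.get(("*", group))          # shared tree, or None if no entry

table = {
    ("*", "226.17.30.197"): ["eth1"],               # shared tree via the RP
    ("10.0.0.5", "226.17.30.197"): ["eth2"],        # source-based tree for one sender
}
print(lookup(table, "10.0.0.5", "226.17.30.197"))   # uses the (s,g) entry
print(lookup(table, "10.0.0.9", "226.17.30.197"))   # falls back to (*,g)
```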
  142. 142. Queries
