  1. 1. Some Unsolved Problems in High Speed Packet Switching Shivendra S. Panwar Joint work with: Yihan Li, Yanming Shen and H. Jonathan Chao Polytechnic University, Brooklyn, NY NY State Center for Advanced Technology in Telecommunications http://catt.poly.edu/CATT/panwar.html
  2. 2. <ul><li>Advice to Woodward and Bernstein: </li></ul><ul><li>“Follow the money” </li></ul><ul><li>- Deep Throat </li></ul><ul><li>(aka Mark Felt) </li></ul>
  3. 3. <ul><li>Advice to performance analysts: </li></ul><ul><li>“Find the bottleneck” </li></ul>
  4. 4. Packet Switching
  5. 5. Buffering in a Packet Switch <ul><li>Fixed-size packet switches </li></ul><ul><ul><li>Operate in a time-slotted manner </li></ul></ul><ul><ul><li>The slot duration is equal to the cell transmission time </li></ul></ul><ul><li>Contention occurs when multiple inputs have arrivals destined to the same output </li></ul><ul><li>Buffering is needed to avoid packet loss </li></ul><ul><li>Buffering schemes in a packet switch </li></ul><ul><ul><li>Output queueing (OQ) </li></ul></ul><ul><ul><li>Input queueing (IQ) </li></ul></ul><ul><ul><li>Virtual output queueing (VOQ) / combined input-output queueing (CIOQ) </li></ul></ul>
  6. 6. Output Queuing (OQ) <ul><li>100% throughput </li></ul><ul><li>Internal speedup of N </li></ul><ul><ul><li>Impractical for large N </li></ul></ul>[Figure: 4×4 output-queued switch, inputs 1 to 4 and outputs 1 to 4]
  7. 7. Input Queuing (IQ) <ul><li>Easy to implement </li></ul><ul><li>Head-of-line (HOL) blocking limits throughput to 58.6% (2 − √2) under uniform traffic as N grows large </li></ul>[Figure: head-of-line blocking in a 4×4 input-queued switch]
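The 58.6% figure can be checked numerically. The following is a minimal sketch, assuming a saturated switch (every input always has a cell waiting), destinations drawn uniformly at random, and a random choice among contending inputs; the function name and parameters are illustrative and do not come from the slides.

```python
import random

def hol_saturation_throughput(n_ports, n_slots, seed=0):
    """Estimate per-port throughput of a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Assumption: every input is always backlogged; hol[i] is the output
    # requested by the head-of-line cell of input i.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(n_slots):
        # Group inputs by the output their HOL cell requests.
        contenders = {}
        for i, out in enumerate(hol):
            contenders.setdefault(out, []).append(i)
        # Each requested output serves exactly one contender, chosen at random.
        for inputs in contenders.values():
            winner = rng.choice(inputs)
            hol[winner] = rng.randrange(n_ports)  # the next cell becomes HOL
            delivered += 1
        # Losing inputs keep their HOL cell, which blocks the cells behind it.
    return delivered / (n_ports * n_slots)

print(hol_saturation_throughput(n_ports=16, n_slots=5000))
```

For small N the estimate sits above the limit (a 2×2 switch reaches 75%), and it moves toward 2 − √2 ≈ 0.586 as N grows.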
  8. 8. Virtual Output Queuing (VOQ) <ul><li>Virtual Output Queuing (VOQ) </li></ul><ul><ul><li>Overcomes HOL blocking </li></ul></ul><ul><ul><li>No speedup requirement </li></ul></ul><ul><ul><li>Needs scheduling algorithms to resolve contention </li></ul></ul><ul><ul><ul><li>Complexity </li></ul></ul></ul><ul><ul><ul><li>Performance guarantee </li></ul></ul></ul>[Figure: 4×4 switch with VOQs 1 to 4 at each input]
  9. 9. Challenges in Switch Design <ul><li>Stability </li></ul><ul><ul><li>100% throughput </li></ul></ul><ul><li>Delay performance </li></ul><ul><li>Scalability </li></ul><ul><ul><li>Scale to high number of linecards and to high linecard speeds </li></ul></ul><ul><ul><li>Distributed scheduler is more desirable than a centralized scheduler </li></ul></ul><ul><li>Scheduler complexity </li></ul><ul><li>Pin count </li></ul>
  10. 10. High Speed Packet Switches <ul><li>VOQ switches and scheduling algorithms </li></ul><ul><li>Buffered crossbar switch </li></ul><ul><li>Load Balanced switch </li></ul><ul><li>Multi-stage switch </li></ul>
  11. 11. VOQ Switch Architecture Input Segmentation Module (ISM): Segments packets into fixed-length cells. Output Reassembly Module (ORM): Reassembles cells into packets. [Figure: VOQ switch with inputs 1 to 4, ISMs, VOQs 1 to N at each input, switch fabric, ORMs, and outputs 1 to 4]
  12. 12. Scheduling for VOQ Switch <ul><li>Scheduling is needed to avoid output contention </li></ul><ul><li>A scheduling problem can be modeled as a matching problem in a bipartite graph </li></ul><ul><ul><li>An input and an output are connected by an edge if the corresponding VOQ is not empty </li></ul></ul><ul><ul><li>Each edge may have a weight, which can be </li></ul></ul><ul><ul><ul><li>The length of the VOQ </li></ul></ul></ul><ul><ul><ul><li>The age of the HOL cell </li></ul></ul></ul>
  13. 13. Maximum Weight Matching (MWM) <ul><li>MWM always finds a match with the maximum weight </li></ul><ul><li>Stable under any admissible traffic </li></ul><ul><li>Very high complexity </li></ul><ul><ul><li>O(N<sup>3</sup>), impractical </li></ul></ul>[Figure: weighted bipartite graph with edge weights 7, 4, 3, 7, 8, 5, 6, 10, 5, 2; weight of the match: 25] <ul><li>References </li></ul><ul><li>N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, vol. 47, no. 8, Aug. 1999, pp. 1260-1267. </li></ul><ul><li>J.G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” INFOCOM 2000. </li></ul><ul><li>L. Tassiulas, A. Ephremides, “Stability properties of constrained queueing systems and scheduling for maximum throughput in multihop radio networks,” IEEE Transactions on Automatic Control, Vol. 37, No. 12, pp. 1936-1949, December 1992. </li></ul><ul><li>E. Leonardi, M. Mellia, F. Neri, Marco A. Marsan, “On the stability of Input-Queued Switches with speed-up”, IEEE/ACM Transactions on Networking, Vol.9, No.1, pp.104-118, ISSN: S 1063-6692(01)01313, February 2001 </li></ul>
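To make the definition concrete, here is a minimal brute-force sketch of MWM for small switches. It is not the O(N<sup>3</sup>) method referred to on the slide (that bound corresponds to algorithms such as the Hungarian method); it simply enumerates all N! matches. The names `match_weight` and `max_weight_matching` and the sample queue matrix are illustrative assumptions.

```python
from itertools import permutations

def match_weight(perm, q):
    """Weight of a match: perm[i] is the output matched to input i, q[i][j] is the VOQ length."""
    return sum(q[i][perm[i]] for i in range(len(q)))

def max_weight_matching(q):
    """Brute force over all N! matches; only practical for very small N."""
    n = len(q)
    return max(permutations(range(n)), key=lambda p: match_weight(p, q))

# Hypothetical 3x3 queue-length matrix (rows are inputs, columns are outputs).
q = [[7, 0, 1],
     [0, 2, 5],
     [4, 1, 0]]
best = max_weight_matching(q)
print(best, match_weight(best, q))  # (0, 2, 1) 13
```

Using the age of the HOL cell instead of the queue length only changes the contents of `q`; the search itself is unchanged.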
  14. 14. Maximum Weight Matching <ul><li>The maximum weight matching algorithm is strongly stable under any admissible traffic pattern </li></ul><ul><ul><li>Lyapunov function </li></ul></ul><ul><ul><li>Strongly stable </li></ul></ul><ul><ul><li>Admissible </li></ul></ul><ul><li>References </li></ul><ul><ul><li>Emilio Leonardi, Marco Mellia, Fabio Neri, Marco Ajmone Marsan, “On the stability of Input-Queued Switches with speed-up”, IEEE/ACM Transactions on Networking, Vol.9, No.1, pp.104-118, ISSN: S 1063-6692(01)01313, February 2001 </li></ul></ul><ul><ul><li>N. McKeown, V. Anantharam, and J. Walrand, “Achieving 100% Throughput in an Input-Queued Switch,” IEEE Transactions on Communications, vol. 47, no. 8, Aug. 1999, pp. 1260-1267. </li></ul></ul>
  15. 15. Maximum Weight Matching <ul><li>Fluid model </li></ul><ul><li>The maximum weight matching is rate stable if: </li></ul><ul><ul><li>The arrival processes satisfy a strong law of large numbers (SLLN) with probability one </li></ul></ul><ul><ul><li>No input or output is oversubscribed: Σ<sub>j</sub> λ<sub>ij</sub> ≤ 1 for every input i, and Σ<sub>i</sub> λ<sub>ij</sub> ≤ 1 for every output j, where λ<sub>ij</sub> is the arrival rate from input i to output j </li></ul></ul><ul><li>References </li></ul><ul><ul><li>J.G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” INFOCOM 2000, pp. 556-564. </li></ul></ul>
  16. 16. Approximate MWM <ul><li>1-APRX </li></ul><ul><ul><li>A function f(.) is a sub-linear function if lim<sub>x→∞</sub> f(x)/x = 0 </li></ul></ul><ul><ul><li>Let the weight of a schedule obtained by a scheduling algorithm B be W<sub>B</sub> </li></ul></ul><ul><ul><li>Let the weight of the maximum weight match for the same switch state be W* </li></ul></ul><ul><ul><li>If W<sub>B</sub> ≥ W* - f(W*) </li></ul></ul><ul><ul><ul><li>B is a 1-APRX to MWM </li></ul></ul></ul><ul><li>B is stable if </li></ul><ul><li>Makes it possible to find stable matching algorithms with lower complexity than MWM. </li></ul><ul><li>References </li></ul><ul><ul><li>D. Shah, M. Kopikare, “Delay bounds for approximate Maximum weight matching algorithms for input-queued switches”, IEEE INFOCOM, New York, USA, June 2002. </li></ul></ul>
  17. 17. Average Delay Bound <ul><li>Delay bound for MWM </li></ul><ul><ul><li>Lyapunov function </li></ul></ul><ul><li>References </li></ul><ul><ul><li>E. Leonardi, M. Mellia, F. Neri, and M. Ajmone Marsan, “Bounds on average delays and queue size averages and variances in input-queued cell-based switches,” Proceedings of IEEE INFOCOM, 2001. </li></ul></ul>
  18. 18. Average Delay Bound (contd.) <ul><li>Delay bound for approximate-MWM </li></ul><ul><ul><li>Lyapunov function </li></ul></ul><ul><ul><li>C<sub>b</sub>: weight difference to the MWM matching </li></ul></ul><ul><li>Under uniform traffic, both bounds give the same result </li></ul><ul><li>References </li></ul><ul><ul><li>D. Shah, M. Kopikare, “Delay bounds for approximate Maximum weight matching algorithms for input-queued switches”, IEEE INFOCOM, New York, USA, June 2002. </li></ul></ul>
  19. 19. Open Issues <ul><li>In simulations, MWM has the best delay performance (cell delay) </li></ul><ul><ul><li>Average delay: if the weight of a queue is chosen as Q<sup>a</sup>, the delay increases with a for a>0 </li></ul></ul><ul><li>Is MWM the optimal scheduling scheme for achieving the minimum average cell delay? </li></ul><ul><li>What is the optimal scheduling scheme to achieve the minimum average packet delay (including reassembly delay)? </li></ul>
  20. 20. Maximal Matching <ul><li>Maximal Matching </li></ul><ul><ul><li>Add connections incrementally, without removing connections made earlier </li></ul></ul><ul><ul><li>No more matches can be made trivially by the end of the operation </li></ul></ul><ul><ul><li>Solution may not be unique </li></ul></ul><ul><ul><li>Complexity O(NlogN) </li></ul></ul>[Figure: weighted bipartite graph with edge weights 7, 4, 3, 7, 8, 5, 6, 10, 5, 2; weight of the match: 23]
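The incremental rule can be sketched as a greedy pass over the non-empty VOQs. This is one possible maximal matching, assuming a heaviest-edge-first order (any order yields a maximal matching); sorting all N<sup>2</sup> edges makes this sketch O(N<sup>2</sup> log N), so it does not reproduce the complexity quoted on the slide. The function name is illustrative.

```python
def greedy_maximal_matching(q):
    """Return {input: output}, adding edges heaviest-first and never removing one."""
    n = len(q)
    # One candidate edge per non-empty VOQ, heaviest first.
    edges = sorted(
        ((q[i][j], i, j) for i in range(n) for j in range(n) if q[i][j] > 0),
        reverse=True,
    )
    match, used_outputs = {}, set()
    for _, i, j in edges:
        # Accept the edge only if both endpoints are still free.
        if i not in match and j not in used_outputs:
            match[i] = j
            used_outputs.add(j)
    return match

# Hypothetical 2x2 queue matrix where the maximal match is not the maximum one.
print(greedy_maximal_matching([[5, 4], [4, 0]]))  # {0: 0}
```

In the example the greedy result has weight 5, while matching input 0 to output 1 and input 1 to output 0 has weight 8. No edge can be added to the greedy result, so it is maximal but not maximum weight.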
  21. 21. Maximal Matching <ul><li>A maximal matching achieves 100% throughput with speed-up S≥2 under any admissible traffic pattern </li></ul><ul><ul><li>[Leonardi, ToN 2001] </li></ul></ul><ul><ul><li>100% throughput </li></ul></ul><ul><ul><ul><li>if </li></ul></ul></ul><ul><ul><ul><li>with probability 1 </li></ul></ul></ul><ul><li>A maximal matching algorithm is rate stable with speed-up S≥2 [Dai, Infocom 2000] </li></ul><ul><li>References </li></ul><ul><ul><li>Emilio Leonardi, Marco Mellia, Fabio Neri, Marco Ajmone Marsan, “On the stability of Input-Queued Switches with speed-up”, IEEE/ACM Transactions on Networking, Vol.9, No.1, pp.104-118, ISSN: S 1063-6692(01)01313, February 2001 </li></ul></ul><ul><ul><li>J.G. Dai and B. Prabhakar, “The throughput of data switches with and without speedup,” INFOCOM 2000 , pp. 556-564. </li></ul></ul>
  22. 22. Multiple Iterative Matching <ul><li>Use multiple iterations to converge on a maximal matching </li></ul><ul><li>Parallel Iterative Matching (PIM) </li></ul><ul><li>iSLIP and DRRM </li></ul><ul><ul><li>complexity of each iteration is O(logN) </li></ul></ul><ul><ul><li>O(logN) iterations are needed to converge on a maximal matching (iSLIP) </li></ul></ul><ul><ul><li>100% throughput only under uniform traffic </li></ul></ul>
  23. 23. iSLIP <ul><li>Step 1: Request </li></ul><ul><ul><li>Each input sends a request to every output for which it has a queued cell. </li></ul></ul><ul><li>Step 2: Grant </li></ul><ul><ul><li>If an output receives multiple requests it chooses the one that appears next in a fixed round-robin schedule. </li></ul></ul><ul><ul><li>The output arbiter pointer is incremented by one location beyond the granted input if, and only if, the grant is accepted in step 3. </li></ul></ul><ul><li>Step 3: Accept </li></ul><ul><ul><li>If an input receives multiple grants, it accepts the one that appears next in a fixed round-robin schedule. </li></ul></ul><ul><ul><li>The input arbiter pointer is incremented by one location beyond the accepted output. </li></ul></ul>Input Output Request Grant Accept
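The three steps can be sketched as a single request-grant-accept round. This is a minimal illustration, assuming 0-based port indices and pointers that name the highest-priority port; the function and variable names are not from the slides. A full iSLIP scheduler would repeat the round over the still-unmatched ports.

```python
def islip_iteration(has_cell, grant_ptr, accept_ptr):
    """One request-grant-accept round.

    has_cell[i][j] is True if VOQ (i, j) is non-empty. grant_ptr[j] and
    accept_ptr[i] are the round-robin pointers and are updated in place.
    Returns the accepted matches as {input: output}.
    """
    n = len(has_cell)
    # Step 1: Request. Every input requests every output it has a cell for.
    requests = {j: [i for i in range(n) if has_cell[i][j]] for j in range(n)}
    # Step 2: Grant. Each output picks the first requester at or after its pointer.
    grants = {}  # input -> list of outputs that granted to it
    for j, inputs in requests.items():
        if inputs:
            chosen = min(inputs, key=lambda i: (i - grant_ptr[j]) % n)
            grants.setdefault(chosen, []).append(j)
    # Step 3: Accept. Each input picks the first granting output at or after its pointer.
    match = {}
    for i, outputs in grants.items():
        j = min(outputs, key=lambda o: (o - accept_ptr[i]) % n)
        match[i] = j
        # Pointers move one past the partner, and only for accepted grants.
        accept_ptr[i] = (j + 1) % n
        grant_ptr[j] = (i + 1) % n
    return match

has_cell = [[True, True, False],
            [True, False, False],
            [False, False, True]]
grant_ptr, accept_ptr = [0, 0, 0], [0, 0, 0]
print(islip_iteration(has_cell, grant_ptr, accept_ptr))  # {0: 0, 2: 2}
print(grant_ptr, accept_ptr)                             # [1, 0, 0] [1, 0, 0]
```

Output 1 also granted to input 0, but that grant was not accepted, so its pointer stays where it was. Input 1 is left unmatched in this round.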
  24. 24. Achieving 100% Throughput without Speedup <ul><li>Matching algorithms using memory </li></ul><ul><li>Polling system based matching </li></ul>
  25. 25. Low Complexity Algorithms with 100% Throughput <ul><li>Algorithms with memory </li></ul><ul><ul><li>Use the previous schedule as a candidate </li></ul></ul><ul><ul><li>References </li></ul></ul><ul><ul><ul><li>L. Tassiulas, “Linear complexity algorithms for maximum throughput in radio networks and input queued switches,” IEEE INFOCOM 1998 , vol.2, New York, 1998, pp.533-539. </li></ul></ul></ul><ul><ul><ul><li>P. Giaccone, B. Prabhakar, D. Shah “Toward simple, high-performance schedulers for high-aggregate bandwidth switches”, IEEE INFOCOM 2002 , New York, 2002. </li></ul></ul></ul><ul><li>Polling system based matching algorithms </li></ul><ul><ul><li>Improve the efficiency by using exhaustive service </li></ul></ul><ul><ul><li>References </li></ul></ul><ul><ul><ul><li>Y. Li, S. Panwar, H. J. Chao, “Exhaustive service matching algorithms for input queued switches,” 2004 Workshop on High Performance Switching and Routing (HPSR 2004) , April 2004. </li></ul></ul></ul><ul><ul><ul><li>Y. Li, S. Panwar, H. J. Chao, “ Performance Analysis of a Dual Round Robin Matching Switch with Exhaustive Service,” IEEE GLOBECOM 2002 . </li></ul></ul></ul>
  26. 26. Matching Algorithms with Memory <ul><li>The queue length of each VOQ does not change much during successive time slots </li></ul><ul><ul><li>In each time slot, </li></ul></ul><ul><ul><ul><li>At most one cell arrives at each input </li></ul></ul></ul><ul><ul><ul><li>At most one cell departs from each input </li></ul></ul></ul><ul><li>It is likely that a busy connection will continue to be busy over a few time slots, if the queue length is used as the weight of a connection </li></ul><ul><li>Use the match in the previous time slot as a candidate for the new match </li></ul><ul><li>Important results: </li></ul><ul><ul><li>Randomized algorithm with memory [Tassiulas 98] </li></ul></ul><ul><ul><li>Derandomized algorithm with memory [Giaccone 02] </li></ul></ul><ul><ul><li>With higher complexity: APSARA, LAURA, SERENA [Giaccone 02] </li></ul></ul>
  27. 27. Notations <ul><li>For an N×N switch, there are N! possible matches </li></ul><ul><li>Q(t)=[q<sub>ij</sub>]<sub>N×N</sub>, where q<sub>ij</sub> is the queue length of VOQ<sub>ij</sub> </li></ul><ul><li>M(t), a match at time t </li></ul><ul><li>The weight of M(t) </li></ul><ul><ul><li>W(t)=<M(t),Q(t)> </li></ul></ul><ul><ul><li>the sum of the lengths of all matched VOQs </li></ul></ul>
  28. 28. Randomized algorithm with memory <ul><li>Randomized algorithm with memory </li></ul><ul><ul><li>Let S(t) be the schedule used at time t </li></ul></ul><ul><ul><li>At time t+1, uniformly select a match R(t+1) at random from the set of all N! possible matches </li></ul></ul><ul><ul><li>Let S(t+1) = arg max<sub>S ∈ {S(t), R(t+1)}</sub> <S,Q(t+1)>, where <S,Q(t+1)> is the weight of S at time t+1 </li></ul></ul><ul><li>Stable under any Bernoulli i.i.d. admissible arrival traffic </li></ul><ul><li>Very simple to implement, complexity O(logN) </li></ul><ul><li>Delay performance is very poor </li></ul>
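A minimal sketch of the randomized algorithm with memory follows. It assumes the update keeps whichever of S(t) and R(t+1) has the larger weight <S,Q(t+1)>, which is how this algorithm is usually stated. The names `weight` and `randomized_with_memory` are illustrative, and the queue matrix is held fixed so that the effect of memory is easy to see.

```python
import random

def weight(match, q):
    """<M, Q>: total length of the VOQs served, where match[i] is the output of input i."""
    return sum(q[i][j] for i, j in enumerate(match))

def randomized_with_memory(prev_match, q, rng=random):
    """Keep the previous schedule unless a uniformly random match is heavier."""
    candidate = list(range(len(q)))
    rng.shuffle(candidate)  # uniform over all N! matches
    # On a tie, max() returns its first argument, so the old schedule is kept.
    return max(prev_match, candidate, key=lambda m: weight(m, q))

rng = random.Random(1)
q = [[7, 0, 1], [0, 2, 5], [4, 1, 0]]  # hypothetical queue lengths
schedule = [1, 0, 2]                    # starts with weight 0
for _ in range(200):
    schedule = randomized_with_memory(schedule, q, rng)
print(schedule, weight(schedule, q))
```

With a fixed Q the schedule weight can only go up over time. In a running switch Q changes every slot, so the remembered schedule tracks the heavy queues only loosely, which is consistent with the poor delay noted on the slide.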
  29. 29. Derandomized Algorithm with Memory <ul><li>Hamiltonian walk </li></ul><ul><ul><li>A walk which visits every vertex of a graph exactly once. </li></ul></ul><ul><ul><li>In an N×N switch, </li></ul></ul><ul><ul><ul><li>N! vertices (possible schedules), a Hamiltonian walk visits each vertex once every N! time slots </li></ul></ul></ul><ul><ul><ul><li>H(t): the value of the vertex which is visited at time t </li></ul></ul></ul><ul><ul><ul><li>The complexity of generating H(t+1) when H(t) is known is O(1) </li></ul></ul></ul><ul><li>Derandomized algorithm with memory </li></ul><ul><ul><li>Use the match generated by the Hamiltonian walk instead of the random match </li></ul></ul><ul><ul><li>Performance similar to that of the randomized algorithm </li></ul></ul>
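The derandomized variant can be sketched by replacing the random candidate with the next vertex of a walk over all matches. As a stand-in for a true O(1)-per-step Hamiltonian walk, this sketch cycles through `itertools.permutations`, which also visits every match once per N! slots but stores the whole list and is only usable for small N. The function name and the list of queue snapshots are illustrative assumptions.

```python
from itertools import cycle, permutations

def derandomized_schedules(queue_snapshots, n):
    """Yield S(t) for each queue matrix Q(t), comparing S(t-1) with the walk's H(t)."""
    walk = cycle(permutations(range(n)))  # stand-in walk: every match once per N! slots
    current = next(walk)
    for q in queue_snapshots:
        h = next(walk)
        # Keep the heavier of the remembered schedule and the walk's candidate.
        current = max(current, h, key=lambda m: sum(q[i][m[i]] for i in range(n)))
        yield current

q = [[7, 0, 1], [0, 2, 5], [4, 1, 0]]  # hypothetical queue lengths, held fixed
print(list(derandomized_schedules([q] * 6, 3))[-1])  # (0, 2, 1)
```

Because the walk covers all N! matches, a fixed Q is guaranteed to reach its maximum weight match within N! slots, with no randomness involved.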
  30. 30. Compared to MWM … <ul><li>Simple matching algorithms can achieve stability as MWM does </li></ul><ul><li>Not necessary to find “ the best match ” in each time slot to achieve 100% throughput </li></ul><ul><li>MWM has much better delay performance than randomized and derandomized matching </li></ul><ul><ul><li>“ better” matches lead to better delay performance </li></ul></ul>
  31. 31. With Higher Complexity and Lower Delay <ul><li>Accept higher complexity in exchange for much lower delay than the randomized and derandomized algorithms </li></ul><ul><li>APSARA </li></ul><ul><ul><li>Include the neighbors of the latest match as candidates </li></ul></ul><ul><li>LAURA </li></ul><ul><ul><li>Merge the latest match with a random match to remember the heavy edges </li></ul></ul><ul><li>SERENA </li></ul><ul><ul><li>Merge the latest match with the arrival graph </li></ul></ul><ul><ul><ul><li>Arrival graph: generated from the current arrival pattern </li></ul></ul></ul><ul><ul><li>Complexity O(N) </li></ul></ul>
  32. 32. Polling System Based Matching <ul><li>Exhaustive Service Matching </li></ul><ul><ul><li>Inspired by exhaustive service polling systems </li></ul></ul><ul><ul><li>All the cells in the corresponding VOQ are served after an input and an output are matched </li></ul></ul><ul><ul><li>Slot times wasted to achieve an input-output match are amortized over all the cells waiting in the VOQ instead of only one </li></ul></ul><ul><ul><li>Cells within the same packet are transferred continuously </li></ul></ul><ul><li>Hamiltonian walk is used to guarantee stability </li></ul>
  33. 33. Exhaustive Service Matching with Hamiltonian Walk (EMHW) <ul><li>EMHW </li></ul><ul><ul><li>Let S(t) be the match at time t. </li></ul></ul><ul><ul><li>At time t+1, generate match Z(t+1) by the Exhaustive Service Matching algorithm based on S(t), and H(t+1) by Hamiltonian walk </li></ul></ul><ul><ul><li>Let S(t+1) = arg max<sub>S ∈ {Z(t+1), H(t+1)}</sub> <S,Q(t+1)> </li></ul></ul><ul><ul><ul><li>where <S,Q(t+1)> is the weight of S at time t+1. </li></ul></ul></ul><ul><li>Stable under any admissible traffic </li></ul><ul><li>Analyzed by an exhaustive service polling system </li></ul><ul><li>Implementation complexity </li></ul><ul><ul><li>HE-iSLIP: O(logN) </li></ul></ul>
  34. 34. E-iSLIP Average Delay Analysis <ul><li>Exhaustive random polling system model </li></ul><ul><ul><li>Symmetric system -- only consider one input </li></ul></ul><ul><ul><li>N VOQs per input, exhaustive service policy -- an exhaustive service polling system with N stations </li></ul></ul><ul><ul><li>The service order of the VOQs is not fixed -- random polling system; assume every VOQ has the same probability of being selected for service after a VOQ is served </li></ul></ul><ul><li>Switch-over time S </li></ul><ul><li>Average delay T [Levy and Kleinrock] </li></ul>
  35. 35. Delay Performance of HE-iSLIP <ul><li>Packet delay : the sum of cell delay and reassembly delay </li></ul><ul><li>Cell delay : measured from VOQ to destination output </li></ul><ul><li>Reassembly delay : time spent in an ORM, often ignored in other work </li></ul>[Figure: VOQ switch with inputs 1 to 4, ISMs, VOQs 1 to N at each input, switch fabric, ORMs, and outputs 1 to 4]
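  36. 36. Performance Summary <table><tr><th>Scheme</th><th>Complexity</th><th>Stable</th><th>Packet delay performance</th></tr><tr><td>HE-iSLIP</td><td>O(logN)</td><td>Yes</td><td>Lowest when packet size is larger than 1 cell.</td></tr><tr><td>Derandomized</td><td>O(logN)</td><td>Yes</td><td>Highest for all traffic patterns.</td></tr><tr><td>SERENA</td><td>O(N)</td><td>Yes</td><td>Lower than HE-iSLIP only under nonuniform diagonal traffic.</td></tr><tr><td>MWM</td><td>O(N<sup>3</sup>)</td><td>Yes</td><td>Lowest when packet size is 1 cell.</td></tr><tr><td>iSLIP</td><td>O(logN)</td><td>No</td><td>Always higher than HE-iSLIP.</td></tr></table>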
  37. 37. Packet Delay under Uniform Traffic <ul><li>Pattern 1: packet size is 1 cell. </li></ul>[Figure: packet delay of MWM, HE-iSLIP, SERENA and iSLIP]
  38. 38. Packet Delay under Uniform Traffic <ul><li>Pattern 2: packet length is 10 cells </li></ul><ul><li>Pattern 3: packet length is variable, the average is 10 cells (Internet packet size distribution) </li></ul>[Figure: two panels, one per pattern, each showing packet delay of MWM, HE-iSLIP, SERENA and iSLIP]
  39. 39. When packet length is larger than 1 cell <ul><li>Why does HE-iSLIP have a lower packet delay than MWM? </li></ul><ul><li>For example, when packet length is 10 cells: </li></ul><ul><ul><li>Cell delay </li></ul></ul><ul><ul><li>Reassembly delay </li></ul></ul><ul><li>Low cell delay + low reassembly delay needed for low packet delay </li></ul>[Figure: cell delay and reassembly delay of HE-iSLIP and MWM] Open Problem: Which scheduler minimizes packet delay?
  40. 40. Packet-Based Scheduling <ul><li>Packet-based scheduling algorithm </li></ul><ul><ul><li>Once it starts transmitting the first cell of a packet to an output port, it continues the transmission until the whole packet is completely received at the corresponding output port </li></ul></ul><ul><li>Packet-based MWM is stable for any admissible Bernoulli i.i.d. traffic </li></ul><ul><ul><li>Lyapunov function, M. A. Marsan, A. Bianco, P. Giaccone, E. Leonardi, and F. Neri, “Packet Scheduling in Input-Queued Cell-Based Switches,” INFOCOM 2001, pp. 1085-1094. </li></ul></ul><ul><li>Packet-based MWM is stable under regenerative admissible input traffic </li></ul><ul><ul><li>Fluid model, Y. Ganjali, A. Keshavarzian, D. Shah, “Input Queued Switches: Cell switching v/s Packet switching&quot;, Proceedings of Infocom, 2003. </li></ul></ul><ul><ul><li>Regenerative: let T be the time between two successive occurrences of the event that all ports are free, with E(T) being finite </li></ul></ul><ul><ul><li>Modified waiting PB-MWM algorithm is stable under any admissible traffic </li></ul></ul>
  41. 41. Buffered Crossbar Switch <ul><li>One buffer for each crosspoint </li></ul><ul><li>Distributed arbitration for inputs and outputs </li></ul><ul><ul><li>From each input, one cell can be sent to a crosspoint buffer if it has space </li></ul></ul><ul><ul><li>One cell can be sent to an output if at least one crosspoint buffer to that output is nonempty </li></ul></ul><ul><li>References </li></ul><ul><ul><li>Y. Doi and N. Yamanaka, “A High-Speed ATM Switch with Input and Cross-Point Buffers,” IEICE Trans. Commun., Vol. E76, No. 3, pp. 310-314, March 1993. </li></ul></ul><ul><ul><li>R. Rojas-Cessa, E. Oki, Z. Jing, and H. J. Chao, “CIXB-1: Combined Input-One-Cell-Crosspoint Buffered Switch,” Proceedings of the IEEE Workshop on High Performance Switching and Routing, 2001. </li></ul></ul>
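The two arbitration rules can be sketched as one time slot of a buffered crossbar. This is an illustration only: it assumes fixed-priority arbiters that pick the lowest-numbered eligible port, whereas practical designs typically use round-robin or weighted arbiters. The function name and the `xp_capacity` parameter are assumptions.

```python
def buffered_crossbar_slot(voq, xp, xp_capacity=1):
    """Advance one slot and return the (input, output) cells delivered.

    voq[i][j]: cells waiting at input i for output j.
    xp[i][j]:  cells held in the crosspoint buffer (i, j).
    Both matrices are updated in place.
    """
    n = len(voq)
    delivered = []
    # Output arbitration: each output drains one non-empty crosspoint buffer.
    for j in range(n):
        for i in range(n):
            if xp[i][j] > 0:
                xp[i][j] -= 1
                delivered.append((i, j))
                break
    # Input arbitration: each input forwards one cell to a crosspoint with space.
    for i in range(n):
        for j in range(n):
            if voq[i][j] > 0 and xp[i][j] < xp_capacity:
                voq[i][j] -= 1
                xp[i][j] += 1
                break
    return delivered

voq = [[1, 0], [1, 0]]  # both inputs hold a cell for output 0
xp = [[0, 0], [0, 0]]
for _ in range(3):
    print(buffered_crossbar_slot(voq, xp))  # [], then [(0, 0)], then [(1, 0)]
```

Both inputs move their cell into the fabric in the first slot even though they target the same output. The contention is resolved later by the output arbiter alone, so no input-output matching has to be computed centrally.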
  42. 42. Birkhoff-von Neumann Switch <ul><li>When traffic matrix is known </li></ul><ul><ul><li>Birkhoff-von Neumann decomposition </li></ul></ul><ul><ul><li>Reference </li></ul></ul><ul><ul><ul><li>Cheng-Shang Chang, Wen-Jyh Chen and Hsiang-Yi Huang, &quot;On service guarantees for input buffered crossbar switches: a capacity decomposition approach by Birkhoff and von Neumann,&quot; IEEE IWQoS'99, pp. 79-86, London, U.K., 1999. </li></ul></ul></ul>
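A Birkhoff-von Neumann decomposition writes a doubly stochastic rate matrix as a weighted sum of permutation matrices, and the switch then serves each permutation for a fraction of time equal to its weight. The sketch below assumes the rate matrix is already doubly stochastic (every row and column sums to 1) and finds each permutation by brute force, so it only suits small N. The function name is illustrative.

```python
from fractions import Fraction
from itertools import permutations

def bvn_decompose(rates):
    """Split a doubly stochastic matrix into (coefficient, permutation) terms."""
    n = len(rates)
    r = [[Fraction(x) for x in row] for row in rates]  # exact arithmetic
    terms = []
    while any(x > 0 for row in r for x in row):
        # Birkhoff's theorem guarantees a permutation using only positive entries.
        perm = next(p for p in permutations(range(n))
                    if all(r[i][p[i]] > 0 for i in range(n)))
        coeff = min(r[i][perm[i]] for i in range(n))
        for i in range(n):
            r[i][perm[i]] -= coeff  # at least one entry drops to zero
        terms.append((coeff, perm))
    return terms

half, quarter = Fraction(1, 2), Fraction(1, 4)
rates = [[half, half, 0],
         [quarter, quarter, half],
         [quarter, quarter, half]]
for coeff, perm in bvn_decompose(rates):
    print(coeff, perm)  # perm[i] is the output connected to input i
```

Each pass zeroes at least one entry, so the loop produces at most N<sup>2</sup> terms and the coefficients sum to 1.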
  43. 43. Birkhoff-von Neumann Switch <ul><li>Example </li></ul><ul><li>High complexity, impractical </li></ul>
  44. 44. Load-Balanced Switch <ul><li>Load-balanced switch </li></ul><ul><ul><li>Convert the traffic to uniform, then fixed switching </li></ul></ul><ul><ul><li>100% throughput for a broad class of traffic </li></ul></ul><ul><ul><li>No centralized scheduler needed, scalable </li></ul></ul>[Figure: two-stage switch; a load-balancing stage followed by a switching stage, with intermediate ports 1 to N]
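The load-balancing stage can be sketched with a fixed, periodic connection pattern that needs no scheduler. The sketch assumes a cyclic-shift pattern in which input i connects to intermediate port (i + t) mod N in slot t, and that each arriving cell is forwarded in its arrival slot. The function names are illustrative.

```python
def first_stage_port(i, t, n):
    """Intermediate port reached by input i in slot t under a cyclic-shift pattern."""
    return (i + t) % n

def spread(arrivals, n):
    """arrivals: list of (slot, input, output). Returns load[k][output] per intermediate port k."""
    load = [[0] * n for _ in range(n)]
    for t, i, out in arrivals:
        load[first_stage_port(i, t, n)][out] += 1
    return load

# Hypothetical worst case: input 0 sends every cell to output 2 for 8 slots.
load = spread([(t, 0, 2) for t in range(8)], n=4)
print([load[k][2] for k in range(4)])  # [2, 2, 2, 2]
```

Even this fully concentrated flow reaches the second stage spread evenly over all N intermediate ports, which is what lets the second stage use the same fixed pattern. It also shows why cells of one flow take different paths and can arrive out of sequence.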
  45. 45. Original Work on LB Switch <ul><li>Stability: the load-balanced switch is stable </li></ul><ul><li>Delay: burst reduction </li></ul><ul><li>Problem: unbounded out-of-sequence delays </li></ul><ul><li>Reference </li></ul><ul><ul><li>C.-S. Chang, D.-S. Lee and Y.-S. Jou, “Load balanced Birkhoff-von Neumann switches, Part I: one-stage buffering,” Computer Comm. , Vol. 25, pp. 611-622, 2002. </li></ul></ul>
  46. 46. LB Switch variants <ul><li>Solve the out-of-sequence problem </li></ul><ul><ul><li>FCFS (First come first served) </li></ul></ul><ul><ul><li>Jitter control mechanism </li></ul></ul><ul><ul><ul><li>Increases the average delay </li></ul></ul></ul><ul><ul><li>EDF (Earliest deadline first) </li></ul></ul><ul><ul><ul><li>Reduces the average delay </li></ul></ul></ul><ul><ul><ul><li>High complexity </li></ul></ul></ul><ul><ul><li>Mailbox switch </li></ul></ul><ul><ul><ul><li>Prevents packets from being out-of-sequence </li></ul></ul></ul><ul><ul><ul><li>Not 100% throughput </li></ul></ul></ul><ul><li>References </li></ul><ul><ul><li>C.-S. Chang, D.-S. Lee and C.-M. Lien, “Load balanced Birkhoff-von Neumann switches, Part II: multi-stage buffering,” Computer Comm., Vol. 25, pp. 623-634, 2002. </li></ul></ul><ul><ul><li>C.S. Chang, D. Lee, and Y. J. Shih, “Mailbox switch: A scalable two-stage switch architecture for conflict resolution of ordered packets,” in Proceedings of IEEE INFOCOM, Hong Kong, March 2004. </li></ul></ul>
  47. 47. More LB switch variants <ul><li>FFF (Full frames first) (Infocom 2002, McKeown) </li></ul><ul><ul><li>Frame-based </li></ul></ul><ul><ul><li>No need for resequencing </li></ul></ul><ul><ul><li>Requires communication between buffer stages, high complexity </li></ul></ul><ul><li>FOFF (Full ordered frames first) (Sigcomm 2003, McKeown) </li></ul><ul><ul><li>Frame-based </li></ul></ul><ul><ul><li>Maximum resequencing delay N<sup>2</sup> </li></ul></ul><ul><ul><li>Bandwidth wastage </li></ul></ul><ul><li>References </li></ul><ul><ul><li>I. Keslassy and N. McKeown, “Maintaining packet order in two-stage switches,” Proc. of the IEEE Infocom, June 2002. </li></ul></ul><ul><ul><li>I. Keslassy, S.-T. Chuang, K. Yu, D. Miller, M. Horowitz, O. Solgaard and N. McKeown, “Scaling Internet routers using optics,” ACM SIGCOMM ’03, Karlsruhe, Germany, Aug. 2003. </li></ul></ul>
  48. 48. Byte-Focal Switch Architecture [Figure: Byte-Focal switch with arrivals, input VOQs, 1st stage switch fabric, second-stage VOQs, 2nd stage switch fabric, and re-sequencing buffers]
  49. 49. Byte-Focal Switch <ul><li>Packet-by-packet scheduling </li></ul><ul><ul><li>Improves the average delay performance </li></ul></ul><ul><li>The maximum resequencing delay is N<sup>2</sup> </li></ul><ul><li>The time complexity of the resequencing buffer is O(1) </li></ul><ul><li>Does not need communication between linecards </li></ul><ul><li>References </li></ul><ul><ul><li>Y. Shen, S. Jiang, S.S. Panwar, H.J. Chao, “Byte-Focal: a practical load-balanced switch”, HPSR 2005, Hong Kong. </li></ul></ul>
  50. 50. Multi-Stage Switches <ul><li>Single-stage switches (e.g., cross-point switch) </li></ul><ul><ul><li>Single path between each input-output pair </li></ul></ul><ul><ul><ul><li>Cannot meet the increasing demands of Internet traffic </li></ul></ul></ul><ul><ul><li>No packets out-of-sequence </li></ul></ul><ul><ul><li>Easy to design </li></ul></ul><ul><ul><li>Lack of scalability </li></ul></ul><ul><li>Multi-stage switches (e.g., Clos-network switch) </li></ul><ul><ul><li>Multiple paths between each input-output pair </li></ul></ul><ul><ul><ul><li>Better tradeoff between the switch performance and complexity </li></ul></ul></ul><ul><ul><li>Highly scalable and fault tolerant </li></ul></ul><ul><ul><li>Memory-less multi-stage switches </li></ul></ul><ul><ul><ul><li>No packets out-of-sequence, may encounter internal blocking </li></ul></ul></ul><ul><ul><li>Buffered multi-stage switches </li></ul></ul><ul><ul><ul><li>Packets may be out-of-sequence, easy scheduling </li></ul></ul></ul>
  51. 51. Multi-Stage Architecture
  52. 52. Trueway: A Multi-Plane Multi-Stage Switch
  53. 53. Trueway Switch <ul><li>The switch fabric consists of multiple switching planes, each a three-stage Clos network with m center modules </li></ul><ul><li>Each input/output pair has multiple routing paths </li></ul><ul><li>Highly scalable </li></ul>[Figure: Trueway switch fabric with cross-point buffered memory]
  54. 54. Challenges in Multi-Stage Switching <ul><li>How to efficiently allocate and share the limited on-chip memory? </li></ul><ul><li>How to schedule packets on multiple paths to maximize memory utilization and system performance? </li></ul><ul><ul><li>How to minimize link congestion and prevent buffer overflow (i.e., stage-to-stage flow control)? </li></ul></ul><ul><ul><li>How to maintain cell/packet order if cells are delivered over multiple paths (i.e., port-to-port flow control)? </li></ul></ul><ul><ul><li>How to achieve 100% throughput? </li></ul></ul>
  55. 55. Conclusion <ul><li>Introduced switch architecture trends </li></ul><ul><li>Many open research problems </li></ul><ul><li>Bottleneck keeps changing! </li></ul>
