Cost Effective Centralized Adaptive Routing for Networks on Chip

Ran Manevich, Technion

Published in: Technology, Business

  1. A Cost Effective Centralized Adaptive Routing for Networks on Chip. Ran Manevich*, Israel Cidon*, Avinoam Kolodny*, Isask’har (Zigi) Walter* and Shmuel Wimer#. *Technion – Israel Institute of Technology, #Bar-Ilan University. QNoC Research Group. May 2, 2011
  2. Networks-on-Chip (NoCs)
  3. Global traffic information is essential to make the right decision!
  4. Adaptive Routing in NoCs – Local vs. Global Information. [Figure: a packet routed from the upper-left corner (source) to the bottom-right corner (destination) using local congestion information ("I CAN MAKE IT!!!"), and the same packet routed using global information. Legend: low, medium and high congestion.]
  5. Route Selection – ATDOR. ATDOR: Adaptive Toggle Dimension Ordered Routing. Keep it simple! Centralized selection: the option with the less congested bottleneck link is preferred. Routing tables in the sources, one bit per destination.
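
The selection idea on slide 5 can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `(x, y)` coordinate convention, the `load` dictionary, and the function names `dor_path`, `bottleneck` and `select_route` are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

Node = Tuple[int, int]    # (x, y) router coordinates; hypothetical convention
Link = Tuple[Node, Node]  # directed link between two adjacent routers


def dor_path(src: Node, dst: Node, x_first: bool) -> List[Link]:
    """Links of a dimension-ordered route: XY if x_first, otherwise YX."""
    links: List[Link] = []
    cur = src
    for axis in ((0, 1) if x_first else (1, 0)):
        while cur[axis] != dst[axis]:
            step = 1 if dst[axis] > cur[axis] else -1
            nxt = (cur[0] + step, cur[1]) if axis == 0 else (cur[0], cur[1] + step)
            links.append((cur, nxt))
            cur = nxt
    return links


def bottleneck(path: List[Link], load: Dict[Link, float]) -> float:
    """Load of the most congested link on the path (unlisted links count as 0)."""
    return max((load.get(link, 0.0) for link in path), default=0.0)


def select_route(src: Node, dst: Node, load: Dict[Link, float]) -> str:
    """Prefer the alternative whose bottleneck link is less loaded."""
    xy = bottleneck(dor_path(src, dst, True), load)
    yx = bottleneck(dor_path(src, dst, False), load)
    return "XY" if xy <= yx else "YX"


# Example: the XY route crosses a link loaded with 150, the YX route one with 50.
load_map = {((1, 0), (2, 0)): 150.0, ((0, 1), (1, 1)): 50.0}
choice = select_route((0, 0), (2, 1), load_map)
```

Because only two alternatives exist per source-destination pair, the result fits in the single routing-table bit per destination that the slide mentions.
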
  6. ATDOR Illustration 1. Five identical flows, 100 MB/s each. Initial routing: XY. Links are modeled as M/M/1 queues. Delay of a single link: $D_{LINK} = \frac{Traffic}{Capacity - Traffic}$. Link capacity is 210 MB/s.
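
A small numeric sketch of the link model on slide 6, assuming the slide's delay expression is the M/M/1 form Traffic / (Capacity − Traffic), with the result in the slide's own normalized units. The function name `link_delay` is an illustrative choice, and treating an overloaded link as infinite delay is an assumption that matches the ∞ entries in the later illustrations.

```python
import math


def link_delay(traffic: float, capacity: float) -> float:
    """M/M/1-style link delay; infinite once traffic reaches the capacity."""
    if traffic >= capacity:
        return math.inf
    return traffic / (capacity - traffic)


CAPACITY = 210.0  # MB/s, from the slide
FLOW = 100.0      # MB/s per flow, from the slide

one_flow = link_delay(1 * FLOW, CAPACITY)     # 100 / 110
two_flows = link_delay(2 * FLOW, CAPACITY)    # 200 / 10 = 20.0
three_flows = link_delay(3 * FLOW, CAPACITY)  # saturated link: inf
```

With these numbers a link can carry two flows at a high but finite delay, while a third flow saturates it, which is why spreading the five flows over both XY and YX routes matters.
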
  7. Centralized Routing – How? Option 1: continuous calculation of optimal routing for the active sessions. Criteria: achievable load balancing; speed and computation complexity; system complexity.
  8. Centralized Routing – How? Option 2: iterative serial selection between XY and YX for all source-destination pairs, based on traffic load measurements. Criteria: achievable load balancing; speed and computation complexity; system complexity.
  9. ATDOR Illustration 1. [Animated figure. Columns: Step #, Re-Routed Flow. Steps 1 to 3; re-routed flows shown: 1->15, 2->15, 2->8. Average delay values shown: 22 ns, 37, ∞.]
  10. What did we just see? For each flow we: (1) calculated the better route; (2) updated the routing table of the source; (3) waited for the update to take effect and measured the global traffic load. Performing steps 1-3 for each flow is slow and not scalable. Steps 2 and 3 are unified for all destinations of a single source. Criteria: achievable load balancing; speed and computation complexity; scalability.
  11. Back to Illustration 1. [Animated figure. Columns: Step #, Re-Routed Flow. Steps 1 to 5; re-routed flows shown: 4->15, 1->15, 2->8, 2->15. Average delay values shown: 22 ns, ∞.]
  12. Problem #1: Changing routing may increase congestion and cause fluctuations. Solution: change routing only if the alternative is better by the margin $\alpha$, $0 < \alpha < 1$. If Current Route = XY: NextRoute = YX if $\mathrm{MAX}[Load_{YX}] \le \alpha \cdot \mathrm{MAX}[Load_{XY}]$, and NextRoute = XY if $\mathrm{MAX}[Load_{YX}] > \alpha \cdot \mathrm{MAX}[Load_{XY}]$. Else if Current Route = YX: NextRoute = XY if $\mathrm{MAX}[Load_{XY}] \le \alpha \cdot \mathrm{MAX}[Load_{YX}]$, and NextRoute = YX if $\mathrm{MAX}[Load_{XY}] > \alpha \cdot \mathrm{MAX}[Load_{YX}]$.
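
The margin rule on slide 12 can be written as a short function. This is a sketch under one stated assumption: the comparison for switching is taken to be "less than or equal", the complement of the ">" that keeps the current route. The name `next_route` and its parameters are illustrative.

```python
def next_route(current: str, max_load_xy: float, max_load_yx: float,
               alpha: float) -> str:
    """Toggle between XY and YX only if the alternative's bottleneck load
    is at most alpha times the current route's bottleneck load."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    if current == "XY":
        return "YX" if max_load_yx <= alpha * max_load_xy else "XY"
    if current == "YX":
        return "XY" if max_load_xy <= alpha * max_load_yx else "YX"
    raise ValueError("current must be 'XY' or 'YX'")


# A slightly better alternative is ignored; a clearly better one is taken.
stay = next_route("XY", max_load_xy=200.0, max_load_yx=190.0, alpha=0.75)
switch = next_route("XY", max_load_xy=200.0, max_load_yx=100.0, alpha=0.75)
```

A smaller α demands a larger improvement before a flow is toggled, which damps the fluctuations the slide describes at the cost of leaving some imbalance in place.
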
  13. ATDOR Illustration 2. [Animated figure. Columns: Step #, Re-Routed Flow. Steps 1 to 3; re-routed flows shown: 1->14, 1->15, 1->16. Average delay shown: ∞.]
  14. Problem #2: Coupling among flows sharing the same source. Solution: re-routing counters $C_{I,J}$ count routing changes of flows from source I to destination J ($F_{I,J}$). When $C_{I,J}$ reaches a limit $L_{I,J}$, the routing of $F_{I,J}$ is locked. A possible definition of the limits: $L_{I,J} = (I + J) \bmod 3$.
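
The re-routing counters on slide 14 can be sketched as below. The limit is assumed to be (I + J) mod 3, which agrees with the "changes left" values listed for the flows on the following slides. The class `FlowState` and the helper `reroute_limit` are hypothetical names for this example.

```python
from dataclasses import dataclass


def reroute_limit(src: int, dst: int) -> int:
    """Assumed limit L(I, J) = (I + J) mod 3."""
    return (src + dst) % 3


@dataclass
class FlowState:
    src: int
    dst: int
    route: str = "XY"  # initial routing is XY, as in the illustrations
    changes: int = 0   # the counter C(I, J)

    @property
    def locked(self) -> bool:
        """True once the counter has reached the flow's limit."""
        return self.changes >= reroute_limit(self.src, self.dst)

    def try_set_route(self, new_route: str) -> bool:
        """Apply a route change unless the flow is locked or already there."""
        if new_route == self.route or self.locked:
            return False
        self.route = new_route
        self.changes += 1
        return True


flow_1_16 = FlowState(1, 16)  # limit (1 + 16) mod 3 = 2
flow_1_14 = FlowState(1, 14)  # limit (1 + 14) mod 3 = 0, locked from the start
results = [flow_1_16.try_set_route(r) for r in ("YX", "XY", "YX")]
```

Giving flows from the same source different limits means they stop toggling at different times, which breaks the coupling that otherwise moves them back and forth together.
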
  15. Back to Illustration 2. [Animated figure. Columns: Flows, R. Changes Left. 1->16: 0 1 2; 1->15: 0 1; 1->14: 0; also shown: 2->14. $L_{I,J} = (I + J) \bmod 3$. Average delay values shown: 22, 73 ns, ∞.]
  16. Bring it all together. [Animated figure. Columns: Flows, R. Changes Left. 1->15: 0 1; 2->8: 0 1; 2->15: 0 1 2; 4->15: 0 1. $L_{I,J} = (I + J) \bmod 3$. Average delay values shown: 14, 22 ns, ∞.]
  17. Centralized Adaptive Routing for NoCs – Architecture. Local traffic load measurements inside the routers. Aggregation of traffic load measurements into Traffic Load Maps. Routing control.
  18. Load Measurements Aggregation. An illustration of the aggregation of load values in a 4X4 2D mesh. A congestion value is written to each traffic load map every clock cycle.
  19. ATDOR – Route Selection Circuit. The maximally loaded links of the two alternatives are compared. Next route, with $0 < \alpha < 1$: if Current Route = XY, NextRoute = YX if $\mathrm{MAX}[Load_{YX}] \le \alpha \cdot \mathrm{MAX}[Load_{XY}]$, and NextRoute = XY if $\mathrm{MAX}[Load_{YX}] > \alpha \cdot \mathrm{MAX}[Load_{XY}]$; else if Current Route = YX, NextRoute = XY if $\mathrm{MAX}[Load_{XY}] \le \alpha \cdot \mathrm{MAX}[Load_{YX}]$, and NextRoute = YX if $\mathrm{MAX}[Load_{XY}] > \alpha \cdot \mathrm{MAX}[Load_{YX}]$. Combinatorial pipelined implementation, with a result every ATDOR clock cycle.
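
For a circuit-style view of the comparison on slide 19, the margin can be expressed as a fraction so that only integer multiplies and a comparison are needed. This is an assumption about how such a comparator might be modeled, not a description of the authors' hardware; `switch_allowed`, `alpha_num` and `alpha_den` are illustrative names, and the switching comparison is again assumed to be "less than or equal".

```python
def switch_allowed(alt_max_load: int, cur_max_load: int,
                   alpha_num: int, alpha_den: int) -> bool:
    """Integer form of alt <= alpha * cur with alpha = alpha_num / alpha_den.

    Cross-multiplying avoids division and floating point:
    alt * alpha_den <= alpha_num * cur.
    """
    if not 0 < alpha_num < alpha_den:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return alt_max_load * alpha_den <= alpha_num * cur_max_load


# alpha = 15/16: a bottleneck load of 150 is exactly 15/16 of 160.
at_threshold = switch_allowed(150, 160, 15, 16)
just_above = switch_allowed(151, 160, 15, 16)
```

When the denominator is a power of two, as in 15/16 or 3/4, the multiplication by `alpha_den` reduces to a left shift.
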
  20. Hardware Requirements. The whole mechanism was implemented on an xc5vlx50t Virtex-5 FPGA. Area was estimated for a 45nm technology node. Per-router hardware overheads are given in % for a NoC with typical-size (50 KGates) virtual channel routers.
  21. Average Packet Delay – Uniform Traffic. Average delay vs. average link load, normalized to link capacity. 8X8 2D mesh. Uniform traffic pattern.
  22. Average Packet Delay – Transpose Traffic. Average delay vs. average link load, normalized to link capacity. 8X8 2D mesh. Transpose traffic pattern.
  23. Average Packet Delay – Hotspot Traffic. Average delay vs. average link load, normalized to link capacity. 8X8 2D mesh. Traffic pattern with 4 hotspots.
  24. Control Iteration Duration. Number of re-routed flows vs. time. 8X8 2D mesh, ATDOR clock of 100 MHz. [Plots shown for α = 15/16 and α = 3/4.]
  25. CMP DNUCA – Architecture. 8X8 CMP DNUCA (Dynamic Non Uniform Cache Array) with 8 CPUs and 56 cache banks.
  26. CMP DNUCA – Saturation Throughput. Saturation throughput for Splash 2 and Parsec benchmarks on an 8X8 CMP DNUCA with 8 CPUs and 56 cache banks.
  27. Conclusions. Centralized adaptive routing is feasible for NoCs. ATDOR: centralized selection between XY and YX for each source-destination pair. Hardware overhead: <4% of a typical 8X8 NoC. Average saturation throughput improvement: synthetic patterns, 19.3% vs. O1TURN and 12.1% vs. RCA; Splash 2 and Parsec benchmarks, 22.8% vs. O1TURN and 12.8% vs. RCA.
