A Flexible Router Architecture for 3D Network-on-Chips
1. A Flexible Router Architecture
for 3D Network-on-Chips
Mostafa Khamis¹, Mostafa Said², Ahmed Shalaby³
¹ Mentor Graphics Egypt, Egypt
² SCALE Lab, School of Engineering, Brown University, USA
³ Egypt-Japan University of Science and Technology (E-JUST), Egypt
2. Outline
• Introduction
• Congestion in 2D-NoCs
• Buffering Flexibility Limitations
• Buffering Flexibility in 3D-NoCs
• Deadlock Free 3D-Flex Architecture
• Simulation Platform and Results
• Conclusions
4. Introduction
2D integration limitations
• As technology scales down, traditional 2D integration (2D ICs) faces several limitations and challenges:
Wires grow longer, which increases delay, power consumption, and routing area
Mask costs rise with technology scaling
• For example, the ITRS figure below shows how small the clock-synchronized area inside a 2D chip becomes
5. Introduction
3D Integration Evolution
In everyday life, when land becomes expensive, we build upward rather than outward
By analogy, we move from 2D ICs to 3D ICs
All dies are stacked together using through-silicon vias (TSVs)
6. Introduction
3D integration with NoCs: Moving From 2D to 3D NoCs
• 3D NoCs offer several advantages over their 2D counterparts:
3D NoCs are more scalable
The average hop count between routers is significantly reduced
Energy dissipation is reduced
Performance is improved
[Figure: a 2D NoC and a 3D NoC, with routers and links labeled]
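The hop-count claim can be checked with a small calculation. The sketch below is not from the slides: it assumes regular mesh topologies, minimal routing, uniformly distributed source and destination pairs, and one hop per vertical link. The function name `average_hop_count` is hypothetical.

```python
from itertools import product

def average_hop_count(dims):
    """Mean Manhattan distance over all ordered pairs of distinct mesh nodes."""
    nodes = list(product(*(range(d) for d in dims)))
    total = 0
    pairs = 0
    for a in nodes:
        for b in nodes:
            if a != b:
                total += sum(abs(x - y) for x, y in zip(a, b))
                pairs += 1
    return total / pairs

avg_2d = average_hop_count((8, 8))      # 64 routers in a single layer
avg_3d = average_hop_count((4, 4, 4))   # 64 routers spread over four layers
print(f"2D 8x8 mesh:   {avg_2d:.2f} hops")
print(f"3D 4x4x4 mesh: {avg_3d:.2f} hops")
```

Under these assumptions, 64 routers arranged as a 4x4x4 mesh average about 3.81 hops, against about 5.33 hops for an 8x8 mesh.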
8. Congestion in 2D-NoCs
Problem description
• A packet that requests a busy buffer is blocked
• The blocking can propagate to other packets in the network (back-pressure), causing congestion
Example: packet P1 is blocked, which in turn blocks P2 and P3
10. Deadlock
Full flexibility leads to deadlocks
• Full flexibility, as shown, leads to deadlock because:
• Packets heading in any direction may be stored in any buffer
• All turns are permitted, so complete cycles can form
Example: assume each buffer FIFO holds one packet. When all the buffers of R1 hold packets heading East and all the buffers of R2 hold packets heading West, a deadlock occurs.
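The two-router situation above can be modelled in a few lines. This is a simplified sketch, not the authors' simulator: it assumes four one-packet buffers per router and treats a packet as movable only if the next router has a free buffer. The names `can_any_packet_move` and `neighbor` are hypothetical.

```python
def can_any_packet_move(routers, neighbor):
    """Return True if at least one buffered packet can advance one hop.

    routers:  dict mapping router name -> list of buffer slots; each slot is
              None (free) or the direction the stored packet is heading.
    neighbor: dict mapping (router, direction) -> next router name.
    """
    for name, slots in routers.items():
        for heading in slots:
            if heading is None:
                continue
            nxt = neighbor.get((name, heading))
            # A packet leaving the modelled network is always accepted.
            if nxt is None or any(s is None for s in routers[nxt]):
                return True
    return False

neighbor = {("R1", "E"): "R2", ("R2", "W"): "R1"}

# Fully flexible buffers: every slot of R1 holds an East-bound packet and
# every slot of R2 holds a West-bound packet, so no slot is free anywhere.
deadlocked = {"R1": ["E"] * 4, "R2": ["W"] * 4}

# Same traffic, but one slot in R2 is still free.
not_deadlocked = {"R1": ["E"] * 4, "R2": ["W", "W", "W", None]}

print(can_any_packet_move(deadlocked, neighbor))      # False
print(can_any_packet_move(not_deadlocked, neighbor))  # True
```

In the first configuration every packet waits for a slot that only the opposite router could free, which is the cyclic wait described above.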
12. Deadlock
Full flexibility leads to deadlocks – Cont.
• Deadlock can occur even if deadlock-free XY routing is used
Because of full flexibility, the North and South buffers can now store packets heading East or West. This breaks the restrictions imposed by XY routing, so a cyclic deadlock among four routers can also occur.
[Figure: packets P1 to P4, buffers B1 to B4, and directions N, E, S, W]
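For reference, dimension-order XY routing on a 2D mesh can be sketched as follows. This is a generic illustration rather than the routers' actual logic. The coordinate convention (x grows to the East, y grows to the North) and the names `xy_next_hop` and `xy_path` are assumptions.

```python
def xy_next_hop(current, destination):
    """Return the output direction chosen by dimension-order XY routing.

    Coordinates are (x, y); x grows to the East and y grows to the North
    (an assumed convention). Returns "LOCAL" on arrival.
    """
    cx, cy = current
    dx, dy = destination
    if cx != dx:                       # finish the X dimension first
        return "E" if dx > cx else "W"
    if cy != dy:                       # then travel along Y
        return "N" if dy > cy else "S"
    return "LOCAL"

def xy_path(source, destination):
    """List the directions taken hop by hop from source to destination."""
    step = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}
    path, node = [], source
    while (d := xy_next_hop(node, destination)) != "LOCAL":
        path.append(d)
        node = (node[0] + step[d][0], node[1] + step[d][1])
    return path

print(xy_path((0, 0), (2, 3)))  # ['E', 'E', 'N', 'N', 'N']
```

A packet routed this way never turns from the Y dimension back to the X dimension. That is the property lost when North and South buffers are allowed to hold East-bound or West-bound packets.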
13. Deadlock
Resolving Deadlock
• The same restrictions that the Baseline router obeys under XY routing are applied to the 2D-Flex router
• Packet restrictions for each port:
• North buffer: North, East, or West
• South buffer: South, East, or West
• East buffer: East (not changed)
• West buffer: West (not changed)
In other words, the North and South buffers are not flexible, but the East and West buffers still are.
14. Deadlock
Resolving Deadlock - Cont.
• With the added restrictions, the 2D-Flex router follows the Turn Model with broken cycles, so no deadlock can occur
Before adding restrictions: deadlock arises because all turns are allowed
[Figure: cyclic deadlock situation with packets P1 to P4, buffers B1 to B4, and directions N, E, S, W; the eight possible turns are numbered 1 to 8]
P3 is heading West and P1 is heading East, so deadlock arises
15. Deadlock
Resolving Deadlock - Cont.
• With the added restrictions, the 2D-Flex router follows the Turn Model with broken cycles, so no deadlock can occur
After adding restrictions: only the turns possible under XY routing remain; turns 4, 2, 5, and 7 are prohibited, so no deadlock can arise
[Figure: the same four-packet arrangement with packets P1 to P4, buffers B1 to B4, and directions N, E, S, W; the eight possible turns are numbered 1 to 8]
P3 and P1 are heading anywhere except West and East, so deadlock cannot happen
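The Turn Model argument can be checked mechanically. The sketch below enumerates the eight 90-degree turns of a 2D mesh and confirms that the four turns XY routing prohibits break both abstract cycles. How these turns map onto the figure's numbers 1 to 8 is not recoverable from the slides, so turns are written as (heading before, heading after) pairs. All names are hypothetical.

```python
X_DIRS, Y_DIRS = {"E", "W"}, {"N", "S"}

def is_90_degree_turn(a, b):
    """True when the packet changes dimension (X to Y or Y to X)."""
    return (a in X_DIRS) != (b in X_DIRS)

ALL_TURNS = [(a, b) for a in "EWNS" for b in "EWNS" if is_90_degree_turn(a, b)]

def allowed_by_xy(turn):
    """XY routing lets a packet turn from X to Y, never from Y back to X."""
    came_from, goes_to = turn
    return came_from in X_DIRS and goes_to in Y_DIRS

def all_turns_allowed(turn):
    """Full flexibility: no turn is prohibited."""
    return True

# The two abstract cycles of the Turn Model.
CLOCKWISE = [("E", "S"), ("S", "W"), ("W", "N"), ("N", "E")]
COUNTERCLOCKWISE = [("E", "N"), ("N", "W"), ("W", "S"), ("S", "E")]

def cycle_is_broken(cycle, allowed):
    """A cycle cannot form if at least one of its turns is prohibited."""
    return any(not allowed(t) for t in cycle)

prohibited = [t for t in ALL_TURNS if not allowed_by_xy(t)]
print(len(ALL_TURNS), len(prohibited))                 # 8 4
print(cycle_is_broken(CLOCKWISE, allowed_by_xy))       # True
print(cycle_is_broken(COUNTERCLOCKWISE, allowed_by_xy))  # True
print(cycle_is_broken(CLOCKWISE, all_turns_allowed))   # False
```

With every turn allowed, neither cycle is broken, which corresponds to the deadlock case before the restrictions are added.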
17. Buffering Flexibility in 3D-NoCs
3D Flexible router (3D-Flex)
• The architecture of the 2D Flexible router is extended by adding a few signals
• The new signals are marked in blue in the figure
• These signals serve the Up (U) and Down (D) ports of the 3D router
• The modification looks simple, but is it really?
[Figure: 3D-Flex router port logic. Blocks: routing logic, FIFO Flexibility Controller (FFC), FIFO. Signals: req_US, grant_US, req_int_E, grant_int_E, pkt_int_E, req_FFCE_FIFO_to {W, N, S, U, D}, grant_FFCE_FIFO_to {W, N, S, U, D}, req_FFCE_FIFO_from {W, N, S, U, D}, grant_FFCE_FIFO_from {W, N, S, U, D}, pkt_E, pkt_W, pkt_N, pkt_S, pkt_U, pkt_D. Bus widths shown: 5, 7, and 64. 3D router ports: E, W, N, S, U, D.]
19. Deadlock Free 3D-Flex
Just add restrictions!
• To avoid the 3D deadlock situations, we must avoid turns from the Up and Down buffers to East, West, North, and South
• The following table shows the restrictions of each buffer
Table 1: Buffers' storage restrictions
Buffer    Restrictions
E         E
W         W
N         E, W, N
S         E, W, S
U         E, W, N, S
D         E, W, N, S
20. Deadlock Free 3D-Flex
Deadlock freedom of 3D-Flex
Lemma: "The 3D-Flex router architecture is free from Turn-Model deadlocks."
• To prove this lemma, we only need to show that none of the Turn-Model deadlock situations can occur
• The proof, along with further details on deadlock, is given in the paper
22. Simulation Platform
Performance comparison metrics
• 3D-Flex is compared against 3D-Base (the 3D baseline router)
• The performance metrics studied are average delay and throughput
Average delay: the number of cycles a packet takes to reach its destination, including the queuing delay in the local buffer, averaged over all packets
Average throughput: the average rate at which packets are ejected at their destinations
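The two metrics can be computed from per-packet timestamps, as in the sketch below. The record layout and the per-node, per-cycle normalization of throughput are assumptions made for illustration; the slides do not state how the simulator normalizes the ejection rate. The sample numbers are made up.

```python
from dataclasses import dataclass

@dataclass
class PacketRecord:
    injected_at: int   # cycle the packet entered the source's local buffer
    ejected_at: int    # cycle the packet left the network at its destination

def average_delay(records):
    """Mean cycles from injection to ejection, local queuing included."""
    return sum(r.ejected_at - r.injected_at for r in records) / len(records)

def average_throughput(records, simulated_cycles, num_nodes):
    """Ejected packets per cycle per node (this normalization is an assumption)."""
    return len(records) / (simulated_cycles * num_nodes)

# Illustrative records only, not measured data.
records = [PacketRecord(0, 12), PacketRecord(3, 20), PacketRecord(5, 16)]
delay = average_delay(records)
throughput = average_throughput(records, simulated_cycles=100, num_nodes=4)
print(f"average delay: {delay:.2f} cycles")
print(f"average throughput: {throughput:.4f} packets/cycle/node")
```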
23. Simulation Platform
Evaluation under real benchmark traffic
• We use the dVOPD and MPEG-4 video applications
[Figures: communication task graph of MPEG-4; communication task graph of dVOPD]
24. Simulation Platform
Results - dVOPD
[Figures: average delay; throughput]
In both comparisons, 3D-Flex outperforms 3D-Base
26. Simulation Platform
Evaluation under synthetic traffic
• We simulate two synthetic traffic patterns: uniform random (Uni) and nearest-neighbor (NN)
• In Uni traffic, each node distributes its injected traffic uniformly among all the 3D NoC nodes
• In NN traffic, each node distributes all its traffic uniformly among its one-hop neighbors
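The two traffic patterns can be sketched as destination generators for a 3D mesh. This is an illustration under stated assumptions: a mesh without wrap-around links, and a Uni pattern that excludes the source itself from the candidate destinations. All function names are hypothetical.

```python
import random
from itertools import product

def one_hop_neighbors(node, dims):
    """Mesh neighbors exactly one hop away from node (no wrap-around assumed)."""
    result = []
    for axis in range(len(dims)):
        for delta in (-1, 1):
            coord = node[axis] + delta
            if 0 <= coord < dims[axis]:
                result.append(node[:axis] + (coord,) + node[axis + 1:])
    return result

def uniform_destination(source, dims, rng):
    """Uni traffic: pick any other node with equal probability."""
    others = [n for n in product(*(range(d) for d in dims)) if n != source]
    return rng.choice(others)

def nearest_neighbor_destination(source, dims, rng):
    """NN traffic: pick one of the one-hop neighbors with equal probability."""
    return rng.choice(one_hop_neighbors(source, dims))

dims = (4, 4, 4)
rng = random.Random(0)
print(len(one_hop_neighbors((0, 0, 0), dims)))   # 3 (corner router)
print(len(one_hop_neighbors((1, 1, 1), dims)))   # 6 (interior router)
print(uniform_destination((0, 0, 0), dims, rng))
print(nearest_neighbor_destination((0, 0, 0), dims, rng))
```

Because NN destinations are always one hop away, packets rarely compete for buffers along shared paths, so this pattern puts little congestion pressure on the routers.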
27. Simulation Platform
Results – Uniform traffic
[Figures: average delay; throughput]
Again, 3D-Flex outperforms 3D-Base
28. Simulation Platform
Results – NN traffic
[Figures: average delay; throughput]
Again, 3D-Flex outperforms 3D-Base
However, the improvement is small because congestion under NN traffic is minimal
29. Simulation Platform
Area and power comparisons
• We synthesize the 3D-Flex and 3D-Base Verilog HDL designs with the TSMC 65 nm standard library using Synopsys Design Compiler (DC)
• We study 1×1 (a single router with no connections), 2×2, 4×4, and 8×8 networks; the latter three have 4, 16, and 64 routers per layer, respectively
30. Simulation Platform
Area
Area comparison:
3D-Flex costs more area because of the extra logic added for flexibility
However, this cost is dwarfed by the area of the wiring connections, so the relative overhead shrinks as the 3D NoC size increases
31. Simulation Platform
Power
Power comparison:
3D-Flex shows a slight increase in power consumption
Again, this cost is dwarfed by the other power components, so the relative overhead shrinks as the 3D NoC size increases
33. Conclusions
• In this paper, we introduce the concept of buffering flexibility to 3D NoCs
• The new router architecture (3D-Flex) outperforms the baseline 3D NoC router (3D-Base)
• 3D-Flex remains deadlock free
• 3D-Flex has some overhead in area and power consumption, but it can be neglected for large NoCs