A Flexible Router Architecture
for 3D Network-on-Chips
Mostafa Khamis (1), Mostafa Said (2), Ahmed Shalaby (3)
(1) Mentor Graphics Egypt, Egypt
(2) SCALE Lab, School of Engineering, Brown University, USA
(3) Egypt-Japan University of Science and Technology (E-JUST), Egypt
Outline
• Introduction
• Congestion in 2D-NoCs
• Buffering Flexibility Limitations
• Buffering Flexibility in 3D-NoCs
• Deadlock Free 3D-Flex Architecture
• Simulation Platform and Results
• Conclusions
Introduction
2D integration limitations
• As technology scales down, traditional 2D integration (2D ICs) faces several limitations and challenges:
  - Wires grow longer, which increases delay, power consumption, and routing area
  - Mask costs rise with technology scaling
• For example, the ITRS figure on this slide shows how small the clock-synchronization area inside a 2D chip becomes
3D Integration Evolution
• In everyday life, when land becomes expensive, we build upward rather than outward
  - The same idea moves us from 2D ICs to 3D ICs
  - All dies are stacked together using through-silicon vias (TSVs)
3D integration with NoCs: Moving from 2D to 3D NoCs
• 3D NoCs offer several advantages over their 2D counterparts:
  - 3D NoCs are far more scalable
  - The average hop count between routers is significantly reduced
  - Energy dissipation is reduced
  - Performance improves
Figure: a 2D NoC and a 3D NoC, showing routers and links
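The hop-count claim can be illustrated with a small calculation. The sketch below is not taken from the slides: it assumes mesh topologies with minimal routing and compares an 8x8 2D mesh with a 4x4x4 3D mesh (64 routers each). The function name `average_hop_count` and the chosen sizes are illustrative assumptions.

```python
from itertools import product


def average_hop_count(dims):
    """Mean Manhattan distance over all ordered pairs of distinct mesh nodes."""
    nodes = list(product(*(range(d) for d in dims)))
    total = 0
    pairs = 0
    for a in nodes:
        for b in nodes:
            if a != b:
                total += sum(abs(x - y) for x, y in zip(a, b))
                pairs += 1
    return total / pairs


# Same router count (64), arranged in one layer or in four stacked layers.
hops_2d = average_hop_count((8, 8))
hops_3d = average_hop_count((4, 4, 4))
print(f"2D 8x8 mesh:   {hops_2d:.2f} hops on average")
print(f"3D 4x4x4 mesh: {hops_3d:.2f} hops on average")
```

Under these assumptions, stacking the same number of routers into layers shortens the average path, which is the effect the slide describes.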
Congestion in 2D-NoCs
Problem description
• A packet that requests a busy buffer is blocked
• Blocking can propagate to other packets in the network (back-pressure), causing congestion
• Example from the figure: packet P1 is blocked, which causes P2 and P3 to be blocked as well
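The back-pressure effect can be sketched as a simple propagation over "who waits for whom". This is a minimal illustration rather than the authors' simulator: the buffer names, the stalled occupant `X`, and the function `blocked_packets` are hypothetical, and a packet waiting on a buffer held by a packet that can still move is assumed to get through eventually.

```python
def blocked_packets(wants, holder, stalled):
    """Return every packet that cannot advance.

    wants:   packet -> buffer it requests next
    holder:  buffer -> packet currently occupying it (missing if free)
    stalled: packets that cannot move for some external reason
    """
    blocked = set(stalled)
    changed = True
    while changed:
        changed = False
        for pkt, buf in wants.items():
            occupant = holder.get(buf)
            # A packet is blocked when the buffer it needs is held by a
            # packet that is itself unable to move.
            if pkt not in blocked and occupant in blocked:
                blocked.add(pkt)
                changed = True
    return blocked


# P1 requests a busy buffer; P2 waits for P1's buffer; P3 waits for P2's.
wants = {"P1": "B_busy", "P2": "B1", "P3": "B2", "P4": "B_free"}
holder = {"B_busy": "X", "B1": "P1", "B2": "P2"}
blocked = blocked_packets(wants, holder, stalled={"X"})
print(sorted(blocked))
```

P4 requests a free buffer and is unaffected, while the single busy buffer in front of P1 stalls the whole chain behind it.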
Deadlock
Full flexibility leads to deadlocks
• Full flexibility, as shown, leads to deadlock because:
• Packets heading in any direction may be stored in any buffer
• All turns are permitted, so complete cycles can form
• Example: assume each buffer FIFO holds one packet. When all buffers of R1 hold packets heading East and all buffers of R2 hold packets heading West, a deadlock occurs.
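The two-router example can be checked with a few lines of code. The sketch below assumes that R2 is the East neighbor of R1, that each router has four one-packet buffers, and that with full flexibility a packet may enter any free buffer of the next router; the names `movable_packets` and `neighbor` are illustrative and not from the slides.

```python
def movable_packets(routers, neighbor):
    """List (router, slot) pairs whose packet can advance by one hop.

    routers:  router name -> list of packet headings, None for an empty slot
    neighbor: (router name, heading) -> name of the next router
    """
    movable = []
    for name, slots in routers.items():
        for index, heading in enumerate(slots):
            if heading is None:
                continue  # empty slot, nothing to move
            target = neighbor[(name, heading)]
            # Full flexibility: any free slot in the target router will do.
            if None in routers[target]:
                movable.append((name, index))
    return movable


neighbor = {("R1", "E"): "R2", ("R2", "W"): "R1"}

# Every buffer of R1 heads East and every buffer of R2 heads West.
deadlocked = {"R1": ["E"] * 4, "R2": ["W"] * 4}
print(movable_packets(deadlocked, neighbor))

# With a single free slot in R1, the packets in R2 can make progress.
relaxed = {"R1": ["E", "E", "E", None], "R2": ["W"] * 4}
print(movable_packets(relaxed, neighbor))
```

In the first state no packet can move and nothing will ever free a slot, which is exactly the deadlock described on the slide.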
Full flexibility leads to deadlocks (cont.)
• Deadlock can occur even when deadlock-free XY routing is used
• With full flexibility, the North and South buffers can store packets heading East or West. This breaks the restrictions imposed by XY routing, so a cyclic deadlock between four routers can also occur.
Figure: cyclic deadlock between four routers (labels: packets P1 to P4, buffers B1 to B4, directions E, W, S, N)
Resolving Deadlock
• The same restrictions that apply to the Baseline router under XY routing are applied to the 2D-Flex router
• Packet restrictions in each port:
• North buffer: North, East, or West
• South buffer: South, East, or West
• East buffer: East (not changed)
• West buffer: West (not changed)
• In other words, the North and South buffers are not flexible, but the East and West buffers still are
Resolving Deadlock (cont.)
• The added restrictions make the 2D-Flex router follow the Turn Model with broken cycles, so no deadlock can occur
• Before adding restrictions: deadlock arises because all turns are allowed. In the cyclic deadlock situation, P3 is heading West and P1 is heading East, so deadlock arises.
• After adding restrictions: only the turns possible in XY routing remain. Turns 4, 2, 5, and 7 are prohibited, so no deadlock can arise. P3 and P1 can head anywhere except West and East, so the cycle cannot form.
Figure: the eight turns, numbered 1 to 8, and packets P1 to P4 in buffers B1 to B4 (directions E, W, S, N), shown before and after the restrictions
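The Turn Model argument can be sketched in code. The snippet below relies only on the standard definition of XY routing (all X hops before any Y hop) and checks that each of the two abstract turn cycles contains at least one prohibited turn. It does not try to reproduce the slide's turn numbering (1 to 8), and the names used are illustrative.

```python
X_DIRS = {"E", "W"}
Y_DIRS = {"N", "S"}


def xy_allows(turn):
    """XY routing finishes all X hops first, so only X-to-Y turns occur."""
    frm, to = turn
    return frm in X_DIRS and to in Y_DIRS


# The eight 90-degree turns: one axis to the other.
ALL_TURNS = [(a, b) for a in "EWNS" for b in "EWNS"
             if (a in X_DIRS) != (b in X_DIRS)]

# The two abstract cycles of the Turn Model (North drawn upward).
CLOCKWISE = [("E", "S"), ("S", "W"), ("W", "N"), ("N", "E")]
COUNTER_CLOCKWISE = [("E", "N"), ("N", "W"), ("W", "S"), ("S", "E")]

prohibited = [t for t in ALL_TURNS if not xy_allows(t)]
print("prohibited turns:", prohibited)

for name, cycle in (("clockwise", CLOCKWISE),
                    ("counter-clockwise", COUNTER_CLOCKWISE)):
    broken = [t for t in cycle if not xy_allows(t)]
    print(f"{name} cycle is broken by: {broken}")
```

Four of the eight turns are prohibited, and both cycles lose at least one turn, so neither can close.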
Buffering Flexibility in 3D-NoCs
3D Flexible router (3D-Flex)
• The 2D Flexible router architecture is extended by adding a few signals
• The new signals are marked in blue in the figure
• These signals serve the Up (U) and Down (D) ports of the 3D router
• The modification looks simple, but is it really?
Figure: block diagram of the 3D router with ports E, W, N, S, U, and D. It shows the routing logic, the FIFO, and the FIFO Flexibility Controller (FFC), connected by request/grant signals (req_US, grant_US, req_int_E, grant_int_E, req_FFCE_FIFO_to and grant_FFCE_FIFO_to {W, N, S, U, D}, req_FFCE_FIFO_from and grant_FFCE_FIFO_from {W, N, S, U, D}) and packet buses (pkt_int_E, pkt_E, pkt_W, pkt_N, pkt_S, pkt_U, pkt_D). Widths of 5, 7, and 64 are annotated on the signals.
Deadlock Free 3D-Flex
Just add restrictions!
• To avoid the 3D deadlock situations, turns from the Up and Down buffers to East, West, North, and South must be avoided
• The following table shows the restrictions of each buffer
Table 1: Buffers' storage restrictions
Buffer   Restrictions
E        E
W        W
N        E, W, N
S        E, W, S
U        E, W, N, S
D        E, W, N, S
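The rule of never turning from Up or Down back into the plane matches dimension-ordered XYZ routing, where vertical hops come last. The slides do not name the 3D routing algorithm, so XYZ routing is an assumption in the sketch below, as are the direction convention (E/W along x, N/S along y, U/D along z) and the helper names.

```python
from itertools import product


def xyz_route(src, dst):
    """Hop directions from src to dst: all X hops, then Y hops, then Z hops."""
    hops = []
    # Assumed convention: E = +x, W = -x, N = +y, S = -y, U = +z, D = -z.
    for axis, (pos, neg) in enumerate((("E", "W"), ("N", "S"), ("U", "D"))):
        delta = dst[axis] - src[axis]
        hops += [pos if delta > 0 else neg] * abs(delta)
    return hops


def has_vertical_to_planar_turn(hops):
    """True if a U or D hop is ever followed by an E, W, N, or S hop."""
    return any(a in "UD" and b in "EWNS" for a, b in zip(hops, hops[1:]))


# Exhaustively check every source/destination pair of a small 3x3x3 mesh.
nodes = list(product(range(3), repeat=3))
violations = [(s, d) for s in nodes for d in nodes
              if has_vertical_to_planar_turn(xyz_route(s, d))]
print("routes with a forbidden turn:", len(violations))
print(xyz_route((0, 0, 0), (1, 2, 1)))
```

Under this assumption, no route ever needs a turn out of the Up or Down direction, so prohibiting those turns costs no reachability.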
Deadlock freedom of 3D-Flex
Lemma: "The 3D-Flex router architecture is free from the Turn-Model deadlocks."
• To prove this lemma, it suffices to show that none of the deadlock situations of the Turn Model can occur
• The proof, along with further details on deadlock, is in the paper
Simulation Platform
Performance comparison metrics
• 3D-Flex is compared against 3D-Base (the 3D baseline router)
• The performance metrics studied are average delay and throughput
  - Average delay: the total number of cycles a packet takes to reach its destination, including the queuing delay in the local buffer
  - Average throughput: the average ejection rate of packets at their destinations
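The two metrics can be computed from per-packet timestamps, as in the minimal sketch below. The record layout (`PacketRecord`) and the per-node, per-cycle normalization of throughput are assumptions made for illustration; the slides do not state the exact units.

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    created_cycle: int  # cycle the packet entered the source's local queue
    ejected_cycle: int  # cycle the packet left the network at its destination


def average_delay(records):
    """Mean cycles from creation to ejection, local queuing included."""
    return sum(r.ejected_cycle - r.created_cycle for r in records) / len(records)


def average_throughput(records, simulated_cycles, num_nodes):
    """Ejected packets per cycle per node (assumed normalization)."""
    return len(records) / (simulated_cycles * num_nodes)


records = [PacketRecord(0, 10), PacketRecord(5, 25), PacketRecord(8, 20)]
delay = average_delay(records)
throughput = average_throughput(records, simulated_cycles=100, num_nodes=4)
print(f"average delay: {delay} cycles")
print(f"average throughput: {throughput} packets/cycle/node")
```

Measuring delay from the creation time rather than the injection time is what makes the local buffer queuing delay part of the result.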
Evaluation under real benchmark traffic
• We chose the dVOPD and MPEG-4 video applications
Figure: communication task graphs of MPEG-4 and dVOPD
Results - dVOPD
Figure: average delay and throughput for dVOPD
• In both comparisons, 3D-Flex outperforms 3D-Base
Results - MPEG-4
Figure: average delay and throughput for MPEG-4
• 3D-Flex again outperforms 3D-Base
Evaluation under synthetic traffic
• We simulate two traffic patterns: uniform random (Uni) and nearest-neighbor (NN)
• In Uni traffic, each node distributes its injected traffic uniformly among all nodes of the 3D NoC
• In NN traffic, each node distributes all of its traffic uniformly among its one-hop neighbors
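The two synthetic patterns differ only in how a source picks its destination. The sketch below assumes a 3D mesh with coordinates as node identifiers and excludes the source itself from the uniform choice; both are assumptions, and the function names are illustrative.

```python
import random
from itertools import product


def uniform_destination(src, dims, rng):
    """Uni traffic: any node other than src, each with equal probability."""
    candidates = [n for n in product(*(range(d) for d in dims)) if n != src]
    return rng.choice(candidates)


def neighbors(src, dims):
    """All nodes exactly one hop away from src inside the mesh."""
    result = []
    for axis in range(len(dims)):
        for step in (-1, 1):
            coord = src[axis] + step
            if 0 <= coord < dims[axis]:
                result.append(src[:axis] + (coord,) + src[axis + 1:])
    return result


def nearest_neighbor_destination(src, dims, rng):
    """NN traffic: one of the one-hop neighbors, each with equal probability."""
    return rng.choice(neighbors(src, dims))


rng = random.Random(0)  # fixed seed so runs are repeatable
dims = (4, 4, 4)
src = (1, 1, 1)
print("Uni destination:", uniform_destination(src, dims, rng))
print("NN destination: ", nearest_neighbor_destination(src, dims, rng))
```

Because NN packets travel a single hop, they rarely compete for the same buffers, which fits the small improvement reported for NN traffic.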
Results - Uniform traffic
Figure: average delay and throughput under uniform traffic
• 3D-Flex again outperforms 3D-Base
Results - NN traffic
Figure: average delay and throughput under NN traffic
• 3D-Flex again outperforms 3D-Base, but the improvement is small because congestion under NN traffic is minimal
Area and power comparisons
• We synthesize the 3D-Flex and 3D-Base Verilog HDL designs with the TSMC 65 nm standard library using Synopsys Design Compiler (DC)
• We study 1×1 (a single router with no connections), 2×2, 4×4, and 8×8 networks; the 2×2, 4×4, and 8×8 sizes have 4, 16, and 64 routers per layer, respectively
Area
Figure: area comparison
• 3D-Flex costs more area because of the extra logic added for flexibility
• This cost is outweighed by the wiring connection area, so the overhead shrinks as the 3D NoC size increases
Power
Figure: power comparison
• 3D-Flex shows a slight increase in power consumption
• Again, this cost is outweighed by other power components, so the overhead shrinks as the 3D NoC size increases
Conclusions
• In this paper, we introduce the concept of buffering flexibility to 3D NoCs
• The new router architecture (3D-Flex) outperforms the baseline 3D NoC router (3D-Base)
• 3D-Flex remains deadlock free
• 3D-Flex has some overhead in area and power consumption, but it can be ignored for large NoCs
Please forward any questions by email to
mostafa.saied@ejust.edu.eg

More Related Content

What's hot

VLSI-Physical Design- Tool Terminalogy
VLSI-Physical Design- Tool TerminalogyVLSI-Physical Design- Tool Terminalogy
VLSI-Physical Design- Tool TerminalogyMurali Rai
 
Vhdl Project List - Verilog Projects
Vhdl Project List - Verilog Projects Vhdl Project List - Verilog Projects
Vhdl Project List - Verilog Projects E2MATRIX
 
ASIC design Flow (Digital Design)
ASIC design Flow (Digital Design)ASIC design Flow (Digital Design)
ASIC design Flow (Digital Design)Sudhanshu Janwadkar
 
Semi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V coresSemi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V coresRISC-V International
 
FPGA in outer space
FPGA in outer spaceFPGA in outer space
FPGA in outer spaceAgradeepSett
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)vaidehi87
 
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Michelle Holley
 
Making of an Application Specific Integrated Circuit
Making of an Application Specific Integrated CircuitMaking of an Application Specific Integrated Circuit
Making of an Application Specific Integrated CircuitSWINDONSilicon
 
vlsi design flow
vlsi design flowvlsi design flow
vlsi design flowAnish Gupta
 
MIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHYMIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHYMIPI Alliance
 
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGNFUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGNPankaj Singh
 
Enabling Multi-access Edge Computing (MEC) Platform-as-a-Service for Enterprises
Enabling Multi-access Edge Computing (MEC) Platform-as-a-Service for EnterprisesEnabling Multi-access Edge Computing (MEC) Platform-as-a-Service for Enterprises
Enabling Multi-access Edge Computing (MEC) Platform-as-a-Service for EnterprisesMichelle Holley
 
Field-programmable gate array
Field-programmable gate arrayField-programmable gate array
Field-programmable gate arrayPrinceArjun1999
 

What's hot (20)

VLSI-Physical Design- Tool Terminalogy
VLSI-Physical Design- Tool TerminalogyVLSI-Physical Design- Tool Terminalogy
VLSI-Physical Design- Tool Terminalogy
 
Jeremy
JeremyJeremy
Jeremy
 
Fpga design flow
Fpga design flowFpga design flow
Fpga design flow
 
Vhdl Project List - Verilog Projects
Vhdl Project List - Verilog Projects Vhdl Project List - Verilog Projects
Vhdl Project List - Verilog Projects
 
ASIC DESIGN FLOW
ASIC DESIGN FLOWASIC DESIGN FLOW
ASIC DESIGN FLOW
 
Lect01 flow
Lect01 flowLect01 flow
Lect01 flow
 
Vlsi design flow
Vlsi design flowVlsi design flow
Vlsi design flow
 
ASIC design Flow (Digital Design)
ASIC design Flow (Digital Design)ASIC design Flow (Digital Design)
ASIC design Flow (Digital Design)
 
Semi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V coresSemi dynamics high bandwidth vector capable RISC-V cores
Semi dynamics high bandwidth vector capable RISC-V cores
 
FPGA in outer space
FPGA in outer spaceFPGA in outer space
FPGA in outer space
 
186 devlin p-poster(2)
186 devlin p-poster(2)186 devlin p-poster(2)
186 devlin p-poster(2)
 
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
Building efficient 5G NR base stations with Intel® Xeon® Scalable Processors
 
Making of an Application Specific Integrated Circuit
Making of an Application Specific Integrated CircuitMaking of an Application Specific Integrated Circuit
Making of an Application Specific Integrated Circuit
 
vlsi design flow
vlsi design flowvlsi design flow
vlsi design flow
 
EMC2 Xilinx SDSoC presentation
EMC2 Xilinx SDSoC presentationEMC2 Xilinx SDSoC presentation
EMC2 Xilinx SDSoC presentation
 
MIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHYMIPI DevCon 2016: Implementing MIPI C-PHY
MIPI DevCon 2016: Implementing MIPI C-PHY
 
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGNFUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
FUSION APU & TRENDS/ CHALLENGES IN FUTURE SoC DESIGN
 
Enabling Multi-access Edge Computing (MEC) Platform-as-a-Service for Enterprises
Enabling Multi-access Edge Computing (MEC) Platform-as-a-Service for EnterprisesEnabling Multi-access Edge Computing (MEC) Platform-as-a-Service for Enterprises
Enabling Multi-access Edge Computing (MEC) Platform-as-a-Service for Enterprises
 
ASIC
ASICASIC
ASIC
 
Field-programmable gate array
Field-programmable gate arrayField-programmable gate array
Field-programmable gate array
 

Similar to A Flexible Router Architecture for 3D Network-on-Chips

Practical Use Cases for Ethernet Redundancy
Practical Use Cases for Ethernet RedundancyPractical Use Cases for Ethernet Redundancy
Practical Use Cases for Ethernet RedundancyRealTime-at-Work (RTaW)
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Vincenzo Gulisano
 
PLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routing
PLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routingPLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routing
PLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routingPROIDEA
 
3.3 gpp NR USER Plane introduction
3.3 gpp NR USER Plane introduction3.3 gpp NR USER Plane introduction
3.3 gpp NR USER Plane introductionSaurabh Verma
 
Cisco DWDM Chromatic Dispertion Calculation in CTP\XLS
Cisco DWDM Chromatic Dispertion Calculation in CTP\XLSCisco DWDM Chromatic Dispertion Calculation in CTP\XLS
Cisco DWDM Chromatic Dispertion Calculation in CTP\XLSValery Kayukov
 
crosstalk minimisation using vlsi
crosstalk minimisation using vlsicrosstalk minimisation using vlsi
crosstalk minimisation using vlsisubhradeep mitra
 
Term paper presentation
Term paper presentationTerm paper presentation
Term paper presentationmariam mehreen
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptxasdq4
 
Embedded Logic Flip-Flops: A Conceptual Review
Embedded Logic Flip-Flops: A Conceptual ReviewEmbedded Logic Flip-Flops: A Conceptual Review
Embedded Logic Flip-Flops: A Conceptual ReviewSudhanshu Janwadkar
 
computer networks_fundamentals.pptx
computer networks_fundamentals.pptxcomputer networks_fundamentals.pptx
computer networks_fundamentals.pptxssuser5cb8d3
 
SOC Chip Basics
SOC Chip BasicsSOC Chip Basics
SOC Chip BasicsA B Shinde
 
A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...
A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...
A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...Communication Systems & Networks
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph Community
 
Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)Ashwin Shroff
 
Review of Network switches and Routers- 2021.pptx
Review of Network switches and Routers-  2021.pptxReview of Network switches and Routers-  2021.pptx
Review of Network switches and Routers- 2021.pptxShawW2
 
Short.course.introduction.to.vhdl
Short.course.introduction.to.vhdlShort.course.introduction.to.vhdl
Short.course.introduction.to.vhdlRavi Sony
 
campus_design_eng1.ppt
campus_design_eng1.pptcampus_design_eng1.ppt
campus_design_eng1.pptchali100
 

Similar to A Flexible Router Architecture for 3D Network-on-Chips (20)

Chapter 10.pptx
Chapter 10.pptxChapter 10.pptx
Chapter 10.pptx
 
Practical Use Cases for Ethernet Redundancy
Practical Use Cases for Ethernet RedundancyPractical Use Cases for Ethernet Redundancy
Practical Use Cases for Ethernet Redundancy
 
Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)Crash course on data streaming (with examples using Apache Flink)
Crash course on data streaming (with examples using Apache Flink)
 
PLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routing
PLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routingPLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routing
PLNOG19 - Krzysztof Szarkowicz - RIFT i nowe pomysły na routing
 
3.3 gpp NR USER Plane introduction
3.3 gpp NR USER Plane introduction3.3 gpp NR USER Plane introduction
3.3 gpp NR USER Plane introduction
 
Cisco DWDM Chromatic Dispertion Calculation in CTP\XLS
Cisco DWDM Chromatic Dispertion Calculation in CTP\XLSCisco DWDM Chromatic Dispertion Calculation in CTP\XLS
Cisco DWDM Chromatic Dispertion Calculation in CTP\XLS
 
crosstalk minimisation using vlsi
crosstalk minimisation using vlsicrosstalk minimisation using vlsi
crosstalk minimisation using vlsi
 
Term paper presentation
Term paper presentationTerm paper presentation
Term paper presentation
 
L3-.pptx
L3-.pptxL3-.pptx
L3-.pptx
 
Embedded Logic Flip-Flops: A Conceptual Review
Embedded Logic Flip-Flops: A Conceptual ReviewEmbedded Logic Flip-Flops: A Conceptual Review
Embedded Logic Flip-Flops: A Conceptual Review
 
Sdh total final
Sdh total finalSdh total final
Sdh total final
 
computer networks_fundamentals.pptx
computer networks_fundamentals.pptxcomputer networks_fundamentals.pptx
computer networks_fundamentals.pptx
 
Manja ppt
Manja pptManja ppt
Manja ppt
 
SOC Chip Basics
SOC Chip BasicsSOC Chip Basics
SOC Chip Basics
 
A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...
A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...
A Study on MPTCP for Tolerating Packet Reordering and Path Heterogeneity in W...
 
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong KimCeph QoS: How to support QoS in distributed storage system - Taewoong Kim
Ceph QoS: How to support QoS in distributed storage system - Taewoong Kim
 
Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)Gate Diffusion Input Technology (Very Large Scale Integration)
Gate Diffusion Input Technology (Very Large Scale Integration)
 
Review of Network switches and Routers- 2021.pptx
Review of Network switches and Routers-  2021.pptxReview of Network switches and Routers-  2021.pptx
Review of Network switches and Routers- 2021.pptx
 
Short.course.introduction.to.vhdl
Short.course.introduction.to.vhdlShort.course.introduction.to.vhdl
Short.course.introduction.to.vhdl
 
campus_design_eng1.ppt
campus_design_eng1.pptcampus_design_eng1.ppt
campus_design_eng1.ppt
 

Recently uploaded

💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...vershagrag
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsvanyagupta248
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxkalpana413121
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxNANDHAKUMARA10
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxpritamlangde
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaOmar Fathy
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityMorshed Ahmed Rahath
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesRashidFaridChishti
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesMayuraD1
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayEpec Engineered Technologies
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startQuintin Balsdon
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptAfnanAhmad53
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...jabtakhaidam7
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...ppkakm
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfsumitt6_25730773
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiessarkmank1
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Ramkumar k
 

Recently uploaded (20)

💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
💚Trustworthy Call Girls Pune Call Girls Service Just Call 🍑👄6378878445 🍑👄 Top...
 
AIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech studentsAIRCANVAS[1].pdf mini project for btech students
AIRCANVAS[1].pdf mini project for btech students
 
UNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptxUNIT 4 PTRP final Convergence in probability.pptx
UNIT 4 PTRP final Convergence in probability.pptx
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Digital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptxDigital Communication Essentials: DPCM, DM, and ADM .pptx
Digital Communication Essentials: DPCM, DM, and ADM .pptx
 
Introduction to Serverless with AWS Lambda
Introduction to Serverless with AWS LambdaIntroduction to Serverless with AWS Lambda
Introduction to Serverless with AWS Lambda
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
A Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna MunicipalityA Study of Urban Area Plan for Pabna Municipality
A Study of Urban Area Plan for Pabna Municipality
 
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using PipesLinux Systems Programming: Inter Process Communication (IPC) using Pipes
Linux Systems Programming: Inter Process Communication (IPC) using Pipes
 
DeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakesDeepFakes presentation : brief idea of DeepFakes
DeepFakes presentation : brief idea of DeepFakes
 
Standard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power PlayStandard vs Custom Battery Packs - Decoding the Power Play
Standard vs Custom Battery Packs - Decoding the Power Play
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
fitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .pptfitting shop and tools used in fitting shop .ppt
fitting shop and tools used in fitting shop .ppt
 
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
Jaipur ❤CALL GIRL 0000000000❤CALL GIRLS IN Jaipur ESCORT SERVICE❤CALL GIRL IN...
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
Introduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdfIntroduction to Data Visualization,Matplotlib.pdf
Introduction to Data Visualization,Matplotlib.pdf
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 

A Flexible Router Architecture for 3D Network-on-Chips

  • 1. A Flexible Router Architecture for 3D Network-on-Chips Mostafa Khamis1, Mostafa Said2, Ahmed Shalaby3 1Mentor Graphics Egypt, Egypt 2 SCALE Lab, School of Engineering, Brown University, USA 3Egypt-Japan University of Science and Technology (E-JUST), Egypt 1
  • 2. Outlines • Introduction • Congestion in 2D-NoCs • Buffering Flexibility Limitations • Buffering Flexibility in 3D-NoCs • Deadlock Free 3D-Flex Architecture • Simulation Platform and Results • Conclusions 2
  • 3. Outlines • Introduction • Congestion in 2D-NoCs • Buffering Flexibility Limitations • Buffering Flexibility in 3D-NoCs • Deadlock Free 3D-Flex Architecture • Simulation Platform and Results • Conclusions 3
  • 4. Introduction 2D integration limitations • As technology shrinks many limitations and challenges appear for traditional integration technology (2D ICs) :  Increasing length of wires, increasing the delay, power consumption, and routing area  Mask costs become higher with technology scaling • For example, the ITRS figure bellow shows how small becomes the area of clock synchronization inside a 2D chip!! 4
  • 5. Introduction 3D Integration Evolution 5TSV In normal life if land become expensive we build upward rather than outward  Then move from 2D-IC to 3D-IC  Stack all dies together using TSVs
  • 6. Introduction 3D integration with NoCs: Moving From 2D to 3D NoCs • 3D NoCs offer far more advantages than their 2D counterparts 3D NoCs are far more scalable The average hop counts between routers is reduced significantly The energy dissipation is reduced Performance gain is achieved 6 router 2D NoC 3D NoC Link
  • 8. Congestion in 2D-NoCs: problem description. A packet that requests a busy buffer is blocked. Through back-pressure, the blocking can spread to other packets in the network and cause congestion. [Figure: packet P1 is blocked, which causes P2 and P3 to be blocked as well]
  • 10. Deadlock: full flexibility leads to deadlocks. Full flexibility, as shown, leads to deadlock because (a) a buffer may hold packets heading in any direction, and (b) all turns are permitted, so complete cycles can form. Example: assume each buffer FIFO holds one packet. When all the buffers of R1 hold packets heading East and all the buffers of R2 hold packets heading West, a deadlock occurs.
  • 11. Deadlock: full flexibility leads to deadlocks (cont.). Deadlock can occur even when deadlock-free XY routing is used. [Figure labels: packets P1 to P4; buffers B1 to B4; ports E, W, S, N]
  • 12. Deadlock: full flexibility leads to deadlocks (cont.). Deadlock can occur even when deadlock-free XY routing is used. With full flexibility, the North and South buffers can store packets heading East or West, which breaks the restrictions imposed by XY routing, so a cyclic deadlock among four routers can occur. [Figure labels: packets P1 to P4; buffers B1 to B4; ports E, W, S, N]
  • 13. Deadlock: resolving deadlock. The restrictions that the baseline router obeys under XY routing are applied to the 2D-Flex router. Packet restrictions in each port: North buffer: North, East, or West; South buffer: South, East, or West; East buffer: East (unchanged); West buffer: West (unchanged). In other words, the North and South buffers are not flexible, but the East and West buffers still are.
  • 14. Deadlock: resolving deadlock (cont.). The added restrictions make 2D-Flex follow the Turn Model with broken cycles, so no deadlock can occur. Before adding restrictions (cyclic deadlock situation): deadlock arises because all turns, numbered 1 to 8 in the figure, are allowed; P3 is heading West and P1 is heading East, so a deadlock forms. [Figure labels: packets P1 to P4; buffers B1 to B4; ports E, W, S, N]
  • 15. Deadlock: resolving deadlock (cont.). The added restrictions make 2D-Flex follow the Turn Model with broken cycles, so no deadlock can occur. After adding restrictions (no deadlock): of the turns numbered 1 to 8 in the figure, XY routing prohibits turns 2, 4, 5, and 7, so no cycle can close; P3 and P1 may head anywhere except West and East, so deadlock cannot happen. [Figure labels: packets P1 to P4; buffers B1 to B4; ports E, W, S, N]
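The count of four prohibited turns can be cross-checked with a short sketch. It assumes the standard reading of XY (dimension-order) routing, in which a packet finishes its X travel (E/W) before its Y travel (N/S), so every turn from the Y dimension back into the X dimension is disallowed. The function names are illustrative, and the sketch does not try to map turns onto the figure's numbers 1 to 8, which the transcript does not preserve.

```python
from itertools import product

X_DIRS = {"E", "W"}          # travel directions along the X dimension
Y_DIRS = {"N", "S"}          # travel directions along the Y dimension
OPPOSITE = {"E": "W", "W": "E", "N": "S", "S": "N"}


def is_turn(src, dst):
    """A 90-degree turn changes dimension (X to Y, or Y to X)."""
    return (src in X_DIRS) != (dst in X_DIRS)


def xy_allows(src, dst):
    """True if a packet travelling in direction src may continue in dst.

    Assumption: plain XY routing, i.e. X first, then Y, no reversals.
    """
    if src == dst:
        return True                      # going straight is always fine
    if dst == OPPOSITE[src]:
        return False                     # 180-degree reversal
    return src in X_DIRS and dst in Y_DIRS   # only X-to-Y turns survive


all_turns = [(s, d) for s, d in product("EWNS", repeat=2) if is_turn(s, d)]
allowed = [t for t in all_turns if xy_allows(*t)]
prohibited = [t for t in all_turns if not xy_allows(*t)]

if __name__ == "__main__":
    print("allowed:", allowed)
    print("prohibited:", prohibited)
```

Under these assumptions the eight possible turns split into four allowed and four prohibited ones, which matches the count on the slide.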
  • 17. Buffering Flexibility in 3D-NoCs: 3D Flexible router (3D-Flex). The 2D Flexible router is extended to 3D by adding a few signals, marked blue in the figure, that serve the Up (U) and Down (D) ports of the 3D router. As shown, the modification looks simple, but is it really? [Figure: 3D router with ports E, W, N, S, U, D, showing the routing logic, a FIFO, and the FIFO Flexibility Controller (FFC). Labeled signals: req_US, grant_US, req_int_E, grant_int_E, pkt_int_E, pkt_E, pkt_W, pkt_N, pkt_S, pkt_U, pkt_D, req_FFCE_FIFO_to {W, N, S, U, D}, grant_FFCE_FIFO_to {W, N, S, U, D}, req_FFCE_FIFO_from {W, N, S, U, D}, grant_FFCE_FIFO_from {W, N, S, U, D}. Bus widths of 5, 7, and 64 are annotated.]
  • 19. Deadlock-Free 3D-Flex: just add restrictions. To avoid the previous 3D deadlock situations, turns from the Up and Down buffers to East, West, North, and South must be avoided. The following table shows the restrictions of each buffer.
    Table 1: buffers' storage restrictions
    Buffer    Restrictions
    E         E
    W         W
    N         E, W, N
    S         E, W, S
    U         E, W, N, S
    D         E, W, N, S
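Table 1 can be held as a small lookup structure. The sketch below copies the rows exactly as they appear on the slide and deliberately does not interpret what the right-hand column means in hardware terms; the names `TABLE_1`, `is_listed`, and `rows_listing` are hypothetical and only serve the illustration.

```python
# Table 1 from the slide, copied row by row (left column -> right column).
TABLE_1 = {
    "E": {"E"},
    "W": {"W"},
    "N": {"E", "W", "N"},
    "S": {"E", "W", "S"},
    "U": {"E", "W", "N", "S"},
    "D": {"E", "W", "N", "S"},
}


def is_listed(buffer, direction):
    """True if `direction` appears in the Table 1 row for `buffer`."""
    return direction in TABLE_1[buffer]


def rows_listing(direction):
    """All buffers whose Table 1 row contains `direction`, sorted by name."""
    return sorted(b for b, dirs in TABLE_1.items() if direction in dirs)


if __name__ == "__main__":
    for direction in "EWNSUD":
        print(direction, "is listed in rows", rows_listing(direction))
```

One observation that follows directly from the rows as printed: U and D never appear in the right-hand column, while E and W appear in every row except each other's.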
  • 20. Deadlock-Free 3D-Flex: deadlock freedom of 3D-Flex. Lemma: “The 3D-Flex router architecture is free from the Turn-Model Deadlocks.” To prove this lemma, it suffices to show that none of the deadlock situations of the Turn Model can occur. The proof, along with further details on deadlock, is in the paper.
  • 22. Simulation Platform: performance comparison metrics. The comparison is 3D-Flex vs. 3D-Base (the 3D baseline router). The metrics studied are average delay and average throughput. Average delay: the total number of cycles a packet takes to reach its destination, including the queuing delay in the local buffer. Average throughput: the average ejection rate of packets at their destinations.
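The two metrics can be computed from per-packet records roughly as follows. This is a minimal sketch, not the authors' simulator: the `PacketRecord` fields are hypothetical, and normalizing throughput per node per cycle is an assumption, since the slide only says "average ejection rate".

```python
from dataclasses import dataclass


@dataclass
class PacketRecord:
    created_cycle: int   # cycle the packet entered the source's local buffer
    ejected_cycle: int   # cycle the packet was ejected at its destination


def average_delay(records):
    """Mean cycles from creation to ejection, so local queuing is included."""
    if not records:
        raise ValueError("no delivered packets to average over")
    total = sum(r.ejected_cycle - r.created_cycle for r in records)
    return total / len(records)


def average_throughput(records, simulated_cycles, num_nodes):
    """Ejected packets per cycle per node (the normalization is assumed)."""
    if simulated_cycles <= 0 or num_nodes <= 0:
        raise ValueError("simulated_cycles and num_nodes must be positive")
    return len(records) / (simulated_cycles * num_nodes)


if __name__ == "__main__":
    demo = [PacketRecord(0, 10), PacketRecord(5, 25)]
    print(average_delay(demo))
    print(average_throughput(demo, simulated_cycles=100, num_nodes=4))
```

Measuring delay from the cycle a packet is created, rather than the cycle it leaves the source, is what makes the local buffer queuing delay part of the figure, as the slide's definition requires.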
  • 23. Simulation Platform: evaluation under real benchmark traffic. We choose the dVOPD and MPEG-4 video applications. [Figures: communication task graph of MPEG-4; communication task graph of dVOPD]
  • 24. Simulation Platform: results, dVOPD. [Plots: average delay; throughput] In both comparisons, 3D-Flex outperforms 3D-Base.
  • 25. Simulation Platform: results, MPEG-4. [Plots: average delay; throughput] Again, 3D-Flex outperforms 3D-Base.
  • 26. Simulation Platform: evaluation under synthetic traffic. We simulate two traffic patterns: uniform random (Uni) and nearest-neighbor (NN). Under Uni traffic, a node distributes its injected traffic uniformly among all nodes of the 3D NoC. Under NN traffic, a node distributes all its traffic uniformly among its one-hop neighbors.
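Destination selection for the two synthetic patterns could look like the sketch below. It assumes a 3D mesh without wrap-around links and excludes the source from the uniform destinations; neither detail is stated on the slide, and all function names are illustrative.

```python
import random


def neighbors(node, dims):
    """One-hop neighbors of `node` in a 3D mesh of size dims = (X, Y, Z)."""
    x, y, z = node
    size_x, size_y, size_z = dims
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    result = []
    for dx, dy, dz in steps:
        nx, ny, nz = x + dx, y + dy, z + dz
        if 0 <= nx < size_x and 0 <= ny < size_y and 0 <= nz < size_z:
            result.append((nx, ny, nz))
    return result


def uniform_destination(src, dims, rng):
    """Uni traffic: any node other than the source, with equal probability."""
    size_x, size_y, size_z = dims
    if size_x * size_y * size_z < 2:
        raise ValueError("uniform traffic needs at least two nodes")
    while True:  # rejection sampling keeps the remaining nodes equally likely
        dst = (rng.randrange(size_x), rng.randrange(size_y), rng.randrange(size_z))
        if dst != src:
            return dst


def nearest_neighbor_destination(src, dims, rng):
    """NN traffic: one of the source's one-hop neighbors, chosen uniformly."""
    candidates = neighbors(src, dims)
    if not candidates:
        raise ValueError("node has no neighbors")
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(0)
    print(uniform_destination((0, 0, 0), (4, 4, 4), rng))
    print(nearest_neighbor_destination((0, 0, 0), (4, 4, 4), rng))
```

Because NN packets travel a single hop, they rarely compete for the same buffers, which is consistent with the slides' remark that congestion under NN traffic is minimal.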
  • 27. Simulation Platform: results, uniform traffic. [Plots: average delay; throughput] Again, 3D-Flex outperforms 3D-Base.
  • 28. Simulation Platform: results, NN traffic. [Plots: average delay; throughput] Again, 3D-Flex outperforms 3D-Base, but the improvement is small because congestion under NN traffic is minimal.
  • 29. Simulation Platform: area and power comparisons. We synthesize the 3D-Flex and 3D-Base Verilog HDL designs with Synopsys DC on the TSMC 65 nm standard library. We study 1×1 (a single router with no connections), 2×2, 4×4, and 8×8 configurations; the last three have 4, 16, and 64 routers per layer, respectively.
  • 30. Simulation Platform: area. Area comparison: 3D-Flex costs more area because of the extra logic added for flexibility. This cost is dwarfed by the area of the wiring connections, however, so the relative overhead shrinks as the 3D NoC size increases.
  • 31. Simulation Platform: power. Power comparison: 3D-Flex shows a slight increase in power consumption. Again, this cost is dwarfed by the other power components, so the relative overhead shrinks as the 3D NoC size increases.
  • 33. Conclusions. In this paper we introduce the concept of buffering flexibility to 3D NoCs. The new router architecture (3D-Flex) outperforms the baseline 3D NoC router (3D-Base) and is still deadlock free. 3D-Flex does carry some extra overhead in area and power consumption, but this overhead can be ignored for large NoCs.
  • 34. Please send any questions by email to mostafa.saied@ejust.edu.eg