Term Paper Submission ECE 562 – Fall 2013 
ISBs: Bidirectional Buffer-less Router with Intelligent Space Buffers 
Dhiraj Chaudhary and Ahmed Louri 
Dept. of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721 
{dhirajchaudhary,louri}@ece.arizona.edu 
ABSTRACT 
Buffers in routers consume significant power and area. Buffer-less router designs reduce this cost but show a significant degradation in performance at high injection rates. We propose a novel NOC architecture based on intelligent space buffers (ISBs) that is intended to mitigate both the power and the performance issues. We make a case for a new approach to the power-efficient design of Networks-on-Chip that uses buffer-less routers with improved performance. 
General Terms: Architecture, Algorithm, Design. 
Keywords: routing, network on chip, control, buffers, Channels. 
1. INTRODUCTION 
Today, high performance and low power are very tight constraints for Network on Chip (NOC) design. The NOC reportedly consumes up to 30% of chip power in the Intel 80-core Terascale chip [1] and about 40% in the MIT RAW chip [2]. A lot of work has been done, and is still in progress, to balance power and performance. As the number of cores increases, latency dominates, and power control mechanisms further worsen the situation. It is essential to design a low-power NOC while keeping performance within acceptable limits. This paper discusses a new low-power design that can be thought of as a balanced implementation for future NOC designs. 
Buffers are power hungry. Moscibroda and Mutlu [3] suggest that removing buffers can save up to 60% of total power in the NOC. But removing buffers has a potential negative impact on performance and bandwidth efficiency. Their buffer-less design (BLESS) works well at low injection rates, but at high injection rates it consumes a substantial percentage of chip power and its performance degrades. Latif et al. [4] discuss a very straightforward approach: utilize idle buffers. Storing packets requires more power than transmitting them, so it is better to transmit packets [9]. Sharing buffers among ports or virtual channels can significantly decrease the buffer count. This design comes with additional computational complexity, which impacts area and, in certain cases, possibly power. Kodi et al. [5] introduced adaptive dual-function links: links can be dynamically configured as repeaters or, in case of congestion, as storage units. This can save ~40% of buffer power and is area efficient as well. 
In this paper, we propose intelligent space buffers (ISBs), which aim to achieve high performance with buffer-less routers while keeping power consumption within certain limits. We deploy buffers in the space around the router. Congestion control is an inherent function of the control unit, which dynamically manages the number of buffers allocated to each channel according to traffic. Bidirectional links [6] are used to utilize the buffers more effectively. 
2. RELATED WORKS 
2.1 BLESS: buffer-less routers 
Buffers are responsible for 60% of total power consumption in a network on chip (NOC) and consume about 64% of static power [7] [8]. Many researchers therefore try to keep buffers out of the router completely. The buffer-less router design BLESS by Moscibroda and Mutlu [3] demonstrates a 60% reduction in area, deadlock avoidance, a simplified router design and freedom from livelock. However, the reported results show that eliminating buffers causes a major degradation in performance. The concept works well at low injection rates, but at high injection rates a significant degradation in both power and performance has been observed [3]. 
In the conventional design, buffers are associated with each virtual channel. In addition, there is a large amount of area-hungry control circuitry, including the VC allocator, the switch allocator and the route computation unit. 
Figure 1. Traditional switch architecture with buffers 
Figure 2. Buffer-less switch architecture 
If we go for a buffer-less router, significant area can be saved. BLESS uses a hot-potato routing protocol: a deflection-based mechanism in which the router, after receiving a packet or flit, deflects it in any direction based on port availability. The flit ranking mechanism illustrated in figure -- takes care of the livelock problem caused by deflection: the oldest packet gets the highest priority, which avoids livelock in the buffer-less network. Because flits are always in motion, deadlock cannot arise, which is one of the major problems in routers with buffers. Another advantage of BLESS is very low router latency because of fewer routing computations. The major drawback is that a buffer-less router does not perform well at high injection rates: as the injection rate at the router increases, its performance degrades drastically. As illustrated in [3], at an injection rate of 0.08 the buffer-less router outperforms the router with buffers, while at an injection rate of 0.28 there is a drastic increase in link and router energy. This is because a packet deflected in the wrong direction takes longer to reach its destination. Pipeline latency is lower in BLESS than in a conventional router with buffers, because the virtual channel allocation and switch allocation stages are eliminated. Experimental results [3] indicate that the buffer-less network breaks down at an injection rate of 0.29, compared to 0.35 for a router with 4 VCs and 4-flit buffers. All the experiments are carried out on an 8*8 network of routers using synthetic traces with 4 different traffic patterns: uniform random (UR), transpose (TR), mesh tornado (TOR) and bit complement (BC). 
The BLESS design works well for networks with light traffic. In NOCs it is applicable to the memory-core interface, because memory and core communicate at low injection rates. But there are still many issues associated with buffer-less routers. The first is flit overhead: every flit must carry its own header. The second is the high latency of each flit reaching its destination. Because flits arrive at different times, a large buffer may be required at the receiver to reassemble flits into packets. Because of these drawbacks, BLESS has not had much success in terms of practical implementation. 
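The oldest-first deflection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the BLESS implementation from [3]: the `Flit` fields, the `allocate_ports` name and the tie-break on `flit_id` are hypothetical, and a four-port router without injection or ejection ports is assumed.

```python
from dataclasses import dataclass

PORTS = ("N", "E", "S", "W")  # assumed four-port mesh router


@dataclass
class Flit:
    flit_id: int
    age: int          # cycles since injection; larger means older
    preferred: tuple  # productive output ports, best first (hypothetical field)


def allocate_ports(flits):
    """Assign every incoming flit an output port, oldest flit first.

    A flit gets its first free preferred port; otherwise it is deflected
    to any remaining free port. No flit is buffered or dropped.
    Returns {flit_id: (port, productive)}.
    """
    if len(flits) > len(PORTS):
        raise ValueError("more flits than output ports")
    free = list(PORTS)
    assignment = {}
    # Oldest first; ties broken by flit_id so the result is deterministic.
    for flit in sorted(flits, key=lambda f: (-f.age, f.flit_id)):
        port = next((p for p in flit.preferred if p in free), free[0])
        free.remove(port)
        assignment[flit.flit_id] = (port, port in flit.preferred)
    return assignment


flits = [
    Flit(flit_id=1, age=3, preferred=("E",)),
    Flit(flit_id=2, age=7, preferred=("E",)),
    Flit(flit_id=3, age=5, preferred=("N", "E")),
]
result = allocate_ports(flits)
for flit_id, (port, productive) in sorted(result.items()):
    print(flit_id, port, "productive" if productive else "deflected")
```

In this example flits 1 and 2 both want the east port; the older flit 2 wins it and flit 1 is deflected, which is the extra hop count that hurts BLESS at high injection rates.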
2.2 Shared buffers 
In this design, Latif et al. [4] propose sharing buffer space among the virtual channels instead of dedicating a separate buffer to each one.
Figure 3. Architecture of input part of router for shared buffers NOC design (Courtesy of Latif, Khalid, Tiberiu Seceleanu, and Hannu Tenhunen. "Power and area efficient design of network-on-chip router through utilization of idle buffers." Engineering of Computer Based Systems (ECBS), 2010 17th IEEE International Conference and Workshops on. IEEE, 2010.) 
Figure 1 describes the conventional router architecture, in which each virtual channel has its own buffer space. Traffic on virtual channel 1 cannot use the buffers of another virtual channel even when they are free. In practice, the buffers are never 100% utilized. The idea is to make use of this idle buffer space. Figure 3 shows the shared buffer architecture. 
The main contribution of that paper lies in the input part, where the channels share a common buffer space. Each packet is divided into flits, the first of which is the head flit, called the beginning of packet (BOP). When a BOP arrives at the buffer allocator unit, the allocator looks for a free buffer slot and allocates it. An allocated signal is then sent to the buffer write controller, which responds with a busy signal. After receiving the busy signal, the buffer allocator sends the allocated-to signal, which sets the select pins of the input buffer multiplexer. After allocation, a grant signal is sent to the port sending the flits; this signal acts as the virtual channel identifier. For every new flit, the port sends the NewFlit_Dx_x signal to the buffer write controller. When two requests compete for one buffer slot, arbitration is done by the priority signal shown in the figure. Status_flag is the logical AND of all the busy signals and indicates that all buffer slots are full. After receiving this signal, the requesting neighboring port decides whether to redirect its flits in some other direction or to hold them until the congestion is resolved. 
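The allocation handshake above can be modeled behaviorally as follows. This is a rough sketch, not the circuit from [4]: the class and method names are hypothetical, the separate allocated, busy and grant signals are collapsed into a single call, and a larger number is assumed to mean higher priority.

```python
class SharedBufferPool:
    """Behavioral model of buffer slots shared by all input channels."""

    def __init__(self, num_slots):
        self.busy = [False] * num_slots  # one busy signal per slot
        self.owner = [None] * num_slots  # port currently granted each slot

    @property
    def status_flag(self):
        # Logical AND of all busy signals: True only when every slot is full.
        return all(self.busy)

    def allocate(self, requests):
        """Serve BOP requests given as (priority, port) pairs.

        Higher priority is served first (an assumption of this sketch).
        Returns {port: slot index, or None if no slot was free}.
        """
        grants = {}
        for _, port in sorted(requests, reverse=True):
            slot = next((i for i, b in enumerate(self.busy) if not b), None)
            if slot is not None:
                self.busy[slot] = True
                self.owner[slot] = port
            grants[port] = slot
        return grants

    def release(self, slot):
        # Called when the tail flit has left the slot.
        self.busy[slot] = False
        self.owner[slot] = None


pool = SharedBufferPool(num_slots=2)
grants = pool.allocate([(1, "north"), (3, "east"), (2, "west")])
print(grants, pool.status_flag)
```

Here the lowest-priority request finds no free slot, so the sender would see Status_flag raised and would either redirect its flits or wait.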
2.3 iDEAL- Inter-router Dual-function Energy and Area-efficient Links for NoC architectures 
With continued improvement in router design, iDEAL [5] introduces a new architectural approach for NOCs that saves up to 40% of buffer power and 41% of router area. The basic idea is to let the repeaters in the links dynamically act as buffers. iDEAL replaces conventional buffers with three-state repeaters. When the control signal is low, a three-state repeater behaves like a conventional repeater. When the control signal is high, it acts as a buffer that holds the bit. 
Figure 1 illustrates the conventional router architecture, in which each virtual channel has 4 buffer slots of 128 bits each. Some of these buffers can be removed from the router and placed on the link, which saves router area and power. Figure 4 shows the router buffer size reduced from v4-r16-c0 to v4-r8-c8. A congestion control signal dynamically configures these adaptive link buffers (ALBs) to act as repeaters or buffers according to the traffic load. iDEAL improves power and area by more than 40% with a 1-2% degradation in performance [5]. 
Figure 4. Dual function links used in iDEAL NOC architecture (Courtesy of Kodi, Avinash Karanth, Ashwini Sarathy, and Ahmed Louri. "iDEAL: Inter-router dual-function energy and area-efficient links for network-on-chip (NoC) architectures." ACM SIGARCH Computer Architecture News. Vol. 36. No. 3. IEEE Computer Society, 2008) 
2.4 BiNoC: A Bidirectional NoC Architecture with Dynamic Self-Reconfigurable Channel 
Bidirectional NoCs allow each communication channel to be dynamically configured in either direction to enhance performance. This design shows a significant increase in performance with some area penalty [6]. The aim is to use the channel bandwidth more effectively. In the BiNoC design, if the outgoing channel carries more traffic than the incoming channel, the direction of the incoming channel can be switched. In this way the load is shared between the two channels. BiNoC can be used in networks where traffic density differs considerably between opposite directions. 
3. DESIGN OF INTELLIGENT SPACE BUFFERS 
3.1 NOC router Architecture 
We use an n * n 2-D mesh architecture. Routers are buffer-less and each is connected to a processing element (PE). Each router is connected to its four adjacent neighbors: north, east, south and west. Packets are divided into head, body and tail flits, as in conventional architectures. A deflection routing algorithm is used in this design. 
3.2 Problem description: 
Buffer-less routers show a significant degradation in performance and power consumption at high injection rates, which defeats the purpose of going buffer-less [3]. 
Figure 5. (a) Dropped packet in case of congestion for BLESS router architecture. 
(b) Redirected packet in case of congestion for BLESS architecture. 
In figure 5, suppose that B and C both send packets to the same output port of router A. Router A then has to drop one of the packets, because there are no buffers to store packets and only one packet at a time can take that output port. Alternatively, if a deflection-based routing algorithm is employed, the losing packet is redirected to any free output port. A deflected packet takes a long time to reach its destination, which degrades the overall performance of the BLESS router design. 
3.1 Intelligent space buffers (ISBs) implementation 
In this section we detail the implementation of intelligent space buffers and associated control unit. 
Figure 6. Proposed intelligent space buffers. 
Figure 6 illustrates the conventional buffers replaced by a stack of buffers placed outside the router. When the decision and control unit's signal is low, the buffers are in power-down mode. In case of congestion, the buffers are activated and hold the data bits; they remain active until the congestion is alleviated. This implementation is intended to enable buffer-less routers to perform well at high injection rates. The control unit is the heart of ISBs and is discussed in the next section. 
3.2 Control Unit Implementation 
The control unit switches the buffers between power-down mode and active mode depending on congestion. A single control unit is responsible for the activation of all space buffers shown in figure 6. The control unit, illustrated in figure 7, contains a counter that counts the number of flits/packets flowing in a particular link. For simplicity only one link is shown, but in a practical implementation the control unit will control 2 links. A comparator compares the count from the counter with the predetermined stored value "P". If the count exceeds this threshold value (P), the decision & control unit sends the activate signal to the respective buffers. The control unit also sends the switching signals to sw1 and sw2, so that all traffic from port A to B traverses the buffer unit. We expect the overhead of the control unit to be small compared with the power saving. 
Figure 7. Proposed control unit implementation for ISBs
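The counter and comparator behaviour of the control unit can be sketched as a small behavioral model. The names below are hypothetical, and the fixed observation window after which the counter is reset is an assumption of this sketch; the paper does not specify when the count is cleared.

```python
class ISBControlUnit:
    """Behavioral model: counter, comparator against P, decision output."""

    def __init__(self, threshold_p):
        self.threshold_p = threshold_p   # predetermined stored value "P"
        self.count = 0                   # flits seen in the current window
        self.buffers_active = False      # False models power-down mode
        self.route_via_buffers = False   # models the sw1/sw2 setting

    def observe_flit(self):
        # Counter: one increment per flit observed on the monitored link.
        self.count += 1

    def end_window(self):
        """Compare the count with P and drive the buffers and switches.

        The window boundary is an assumption; returns True on congestion.
        """
        congested = self.count > self.threshold_p
        self.buffers_active = congested
        self.route_via_buffers = congested
        self.count = 0
        return congested


cu = ISBControlUnit(threshold_p=4)
for _ in range(5):
    cu.observe_flit()
first = cu.end_window()    # 5 flits exceed P = 4: buffers are activated
for _ in range(2):
    cu.observe_flit()
second = cu.end_window()   # 2 flits stay below P: buffers power down
print(first, second)
```

A hysteresis band around P would avoid toggling the buffers on every window when the load sits near the threshold; that choice is left open here, as it is in the paper.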
Figure 8. Proposed algorithm implemented at control unit of ISB architecture 
Figure 8 illustrates the detailed algorithm to be implemented at the control unit. The main issue is how to determine the threshold value. Another issue is how much buffer space to allocate to each channel in case of congestion. We have considered 80% for the prototype, but this still needs improvement. 
3.3 Dynamic space buffers in Bi-Directional links 
The proposed ISB architecture can be further optimized by using bidirectional links [6]. Figure 9 illustrates the behavior of the links when traffic in one direction dominates the other. In figure 9(b), R1 (Router 1) configures both channels and links as outputs when traffic from R1 to R2 exceeds traffic from R2 to R1. Figure 9(c) illustrates the opposite scenario, in which traffic from R2 to R1 is higher. 
The block diagram in figure 10 illustrates the bidirectional channel (link) between routers A and B. 
Introducing bidirectional links can improve performance at high injection rates [6]. 
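The direction rule of figure 9 can be written as a simple comparison. This is only a sketch of the rule as stated in the text: the function name and the traffic measure (for example, queued flits per direction) are assumptions, and starvation of the lighter direction is ignored.

```python
def configure_links(traffic_r1_to_r2, traffic_r2_to_r1):
    """Return the directions of the two links between R1 and R2.

    Both links follow the dominant direction; with equal traffic the
    conventional one-link-each-way configuration is kept.
    """
    if traffic_r1_to_r2 > traffic_r2_to_r1:
        return ("R1->R2", "R1->R2")
    if traffic_r2_to_r1 > traffic_r1_to_r2:
        return ("R2->R1", "R2->R1")
    return ("R1->R2", "R2->R1")


print(configure_links(6, 1))
print(configure_links(2, 2))
```

A practical controller would also need a way to hand a link back to the lighter direction before its traffic stalls.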
Figure 9. 
(a) Conventional unidirectional link between routers R1 and R2. 
(b) Reconfigured links for congestion from R1 to R2 router. 
There is also scope for power reduction in our design by using bidirectional channels instead of unidirectional ones. The algorithm at the router interface works in a similar fashion to that described in [6]. 
Figure 10. Bidirectional links implemented in ISBs 
Suppose that a router cannot process a packet in less than 2 ns, and that a packet is sent from router A to router B at 1 ns, followed by one more packet on the same port interface at 2 ns. Router B cannot process a new request before 3 ns, so it will drop the second packet. Instead, we can use the incoming channel from router B to router A at the same port if it is free. Control circuitry is needed to switch the direction of the port. If 2 or more packets request the same port at 2 ns, the algorithm illustrated in figure 8, running in the control circuitry of the space buffers, starts executing.
3.4 Power gated frame implementation 
Figure 11. Proposed pipelined power gating scheme 
Power gating suffers from wake-up latency, which impacts performance [10] [11]. We use sleep-mode transistors in ISBs for performance optimization: 10% of the total transistors are in sleep mode and 90% remain completely shut off. When the injection rate at any port is high, the control block redirects the traffic through the buffers. When 8 % of the buffers are occupied, 30 % of the remaining buffers are triggered to wake up. This hides the wake-up latency. As shown in figure 11, when traffic falls below the threshold we can start returning buffers to power-down mode. We have assumed a 10% drop in buffer space when the load decreases below some threshold value. State 5 indicates that at most 90% of the buffers are utilized. Beyond this point all packets destined for that port are discarded, which prevents congestion from spreading to other ports. The proposed gating scheme is expected to perform well at high injection rates as well. Because it overcomes the wake-up latency, this scheme should offer higher performance than conventional power gating. Because idle buffers are kept in power-down mode, which is a complete shut-down, static power dissipation will be lower in the pipelined power gating scheme. 
The pipelined power gating scheme is easy to implement and promising in terms of power and performance. The exact performance gain can be calculated only after simulation. We estimate a saving of more than 5 clock cycles, since a saving of 5 clock cycles is reported in [11] and pipelined power gating can further improve on this. 
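One possible reading of the pipelined wake-up scheme is sketched below with integer buffer counts. Several points are assumptions of this sketch rather than statements from the paper: the 8 % trigger is read as 8 % of all buffers, which equals 80 % of the initially ready 10 %, and that 80 % ratio is reused in later states; the low-load threshold `low_pct` is invented for illustration; all class and parameter names are hypothetical.

```python
class PowerGatedBuffers:
    """Staged wake-up of space buffers, in units of whole buffers."""

    def __init__(self, total=100, ready_pct=10, trigger_pct=80,
                 wake_pct=30, drop_pct=10, cap_pct=90, low_pct=20):
        self.total = total
        self.floor = total * ready_pct // 100  # buffers kept ready at idle
        self.cap = total * cap_pct // 100      # at most 90% may be utilized
        self.trigger_pct = trigger_pct         # assumed wake-up trigger ratio
        self.wake_pct = wake_pct               # share of remaining buffers woken
        self.drop = total * drop_pct // 100    # buffers returned per step down
        self.low_pct = low_pct                 # assumed low-load threshold
        self.awake = self.floor

    def update(self, occupied):
        """Adjust the awake count; return True if packets must be discarded."""
        if occupied >= self.cap:
            return True
        if occupied * 100 >= self.trigger_pct * self.awake and self.awake < self.cap:
            remaining = self.total - self.awake
            step = max(1, remaining * self.wake_pct // 100)
            self.awake = min(self.cap, self.awake + step)   # wake up early
        elif occupied * 100 <= self.low_pct * self.awake and self.awake > self.floor:
            self.awake = max(self.floor, self.awake - self.drop)  # power down
        return False


gate = PowerGatedBuffers()
trace = []
for occupied in (8, 30, 5, 90):
    discard = gate.update(occupied)
    trace.append((occupied, gate.awake, discard))
print(trace)
```

Waking buffers before the ready ones are full is what hides the wake-up latency; the price is that some buffers leak power while awake but still empty.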
4. DESIGN COMPLEXITY 
The proposed ISB architecture is not an area-efficient design, because we dynamically control the links as well as the buffers. The control circuitry may take a large percentage of the area. Another issue is the predetermined threshold value used in the control unit; the proposed design needs to be rechecked under real-time traffic. We may implement a learning mechanism to set the threshold, but the area constraint is the major issue that must be addressed for ISBs to succeed. 
5. FUTURE WORK 
While ISBs are an appealing design for their balance of power and performance, a large design space spans the gap between traditional and ISB architectures. First, an area-efficient design for the ISB NOC architecture is not discussed in this paper. Second, permutation and priority schemes need to be implemented at the control block for the case of congestion. Deadlock may also become a problem for ISBs because of the newly introduced buffers. Flow control is currently implemented by a counter, which can be improved to make ISBs more performance- and power-efficient. 
6. CONCLUSION 
In this paper we propose a novel architecture to counter performance and power issues in NOC. ISBs use buffer-less routers and bidirectional links to achieve significant power savings. To counter the performance issue, we provide self-configured intelligent space buffers. The architecture has not yet been evaluated by simulation because of time constraints. It is our hope that the proposed architecture will inspire new ideas for work on NOC. 
7. REFERENCES 
[1] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar. “A 5-ghz mesh interconnect for a teraflops processor”. IEEE Micro, 27(5), 2007. 
[2] Taylor, Michael Bedford, et al. "Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams." ACM SIGARCH Computer Architecture News. Vol. 32. No. 2. IEEE Computer Society, 2004. 
[3] Moscibroda, Thomas, and Onur Mutlu. "A case for bufferless routing in on-chip networks." ACM SIGARCH Computer Architecture News. Vol. 37. No. 3. ACM, 2009. 
[4] Latif, Khalid, Tiberiu Seceleanu, and Hannu Tenhunen. "Power and area efficient design of network-on-chip router through utilization of idle buffers." Engineering of Computer Based Systems (ECBS), 2010 17th IEEE International Conference and Workshops on. IEEE, 2010. 
[5] Kodi, Avinash Karanth, Ashwini Sarathy, and Ahmed Louri. "iDEAL: Inter-router dual-function energy and area-efficient links for network-on-chip (NoC) architectures." ACM SIGARCH Computer Architecture News. Vol. 36. No. 3. IEEE Computer Society, 2008. 
[6] Y.C. Lan, S.H. Lo, Y.C. Lin, Y.H. Hu, and S.J. Chen, "BiNoC: A Bidirectional NoC Architecture with Dynamic Self-Reconfigurable Channel," in Proc. of the 3rd ACM/IEEE International Symposium on Networks-on-Chip, pp. 266-275, 2009. 
[7] W. Hangsheng, L. S. Peh, and S. Malik. “Power driven design of router microarchitectures in on-chip networks,” Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 105-116, 2003. 
[8] Xuning Chen and Li-Shiuan Peh. “Leakage power modeling and optimization of interconnection networks”. Proceedings of International Symposium on Low Power Electronics and Design, pp. 90-95, 2003. 
[9] T. T. Ye, L. Benini, G. De Micheli. “Analysis of power consumption on switch fabrics in network routers,” Proceedings of the 39th Design Automation Conference (DAC), pp. 524-529, 2002. 
[10] Z. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson, and P. Bose, "Microarchitectural techniques for power gating of execution units," in International Symposium on Low Power Electronics and Design (ISLPED), CA, USA, pp. 32-37, 2004. 
[11] H. Matsutani, M. Koibuchi, W. Daihan, and H. Amano, "Run-time power gating of on-chip routers using look-ahead routing," in 13th Asia and South Pacific Design Automation Conference (ASP-DAC), Piscataway, NJ, USA, pp. 55-60, 2008.

Technical Ascension
 
Cosas para hacer en internet
Cosas para hacer en internetCosas para hacer en internet
Cosas para hacer en internet
 
Tahoe Silicon Mountain and the New Tahoe Economy
Tahoe Silicon Mountain and the New Tahoe EconomyTahoe Silicon Mountain and the New Tahoe Economy
Tahoe Silicon Mountain and the New Tahoe Economy
 
Google Glass Project Final
Google Glass Project FinalGoogle Glass Project Final
Google Glass Project Final
 

Similar to Power efficient solution for network on chip

SMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOC SMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOC VLSICS Design
 
SMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOCSMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOCVLSICS Design
 
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPA ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPijaceeejournal
 
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...VLSICS Design
 
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...VLSICS Design
 
Network on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A surveyNetwork on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A surveyIJRES Journal
 
Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427Editor IJARCET
 
Low power network on chip architectures: A survey
Low power network on chip architectures: A surveyLow power network on chip architectures: A survey
Low power network on chip architectures: A surveyCSITiaesprime
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsiigeeks1234
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsiigeeks1234
 
Me,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiMe,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiigeeks1234
 
Optimal State Assignment to Spare Cell inputs for Leakage Recovery
Optimal State Assignment to Spare Cell inputs for Leakage RecoveryOptimal State Assignment to Spare Cell inputs for Leakage Recovery
Optimal State Assignment to Spare Cell inputs for Leakage RecoveryIJASCSE
 
IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...
IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...
IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...IRJET Journal
 
Enhanced low power, fast and area efficient carry select adder
Enhanced low power, fast and area efficient carry select adderEnhanced low power, fast and area efficient carry select adder
Enhanced low power, fast and area efficient carry select addereSAT Publishing House
 
Elastistore flexible elastic buffering for virtual-channel-based networks on...
Elastistore  flexible elastic buffering for virtual-channel-based networks on...Elastistore  flexible elastic buffering for virtual-channel-based networks on...
Elastistore flexible elastic buffering for virtual-channel-based networks on...I3E Technologies
 
Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...
Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...
Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...IJERD Editor
 

Similar to Power efficient solution for network on chip (20)

SMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOC SMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOC
 
SMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOCSMART MULTICROSSBAR ROUTER DESIGN IN NOC
SMART MULTICROSSBAR ROUTER DESIGN IN NOC
 
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIPA ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
A ULTRA-LOW POWER ROUTER DESIGN FOR NETWORK ON CHIP
 
Ijecet 06 08_003
Ijecet 06 08_003Ijecet 06 08_003
Ijecet 06 08_003
 
Ijecet 06 08_003
Ijecet 06 08_003Ijecet 06 08_003
Ijecet 06 08_003
 
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
 
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
Optimized Design of 2D Mesh NOC Router using Custom SRAM & Common Buffer Util...
 
20 24
20 2420 24
20 24
 
Network on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A surveyNetwork on Chip Architecture and Routing Techniques: A survey
Network on Chip Architecture and Routing Techniques: A survey
 
Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427Ijarcet vol-2-issue-4-1420-1427
Ijarcet vol-2-issue-4-1420-1427
 
Low power network on chip architectures: A survey
Low power network on chip architectures: A surveyLow power network on chip architectures: A survey
Low power network on chip architectures: A survey
 
A018120105
A018120105A018120105
A018120105
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsi
 
Ieee 2015 project list_vlsi
Ieee 2015 project list_vlsiIeee 2015 project list_vlsi
Ieee 2015 project list_vlsi
 
Me,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsiMe,be ieee 2015 project list_vlsi
Me,be ieee 2015 project list_vlsi
 
Optimal State Assignment to Spare Cell inputs for Leakage Recovery
Optimal State Assignment to Spare Cell inputs for Leakage RecoveryOptimal State Assignment to Spare Cell inputs for Leakage Recovery
Optimal State Assignment to Spare Cell inputs for Leakage Recovery
 
IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...
IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...
IRJET - Analysis of Different Arbitration Algorithms for Amba Ahb Bus Protoco...
 
Enhanced low power, fast and area efficient carry select adder
Enhanced low power, fast and area efficient carry select adderEnhanced low power, fast and area efficient carry select adder
Enhanced low power, fast and area efficient carry select adder
 
Elastistore flexible elastic buffering for virtual-channel-based networks on...
Elastistore  flexible elastic buffering for virtual-channel-based networks on...Elastistore  flexible elastic buffering for virtual-channel-based networks on...
Elastistore flexible elastic buffering for virtual-channel-based networks on...
 
Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...
Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...
Performance Evaluation of Ant Colony Optimization Based Rendezvous Leach Usin...
 

Power efficient solution for network on chip

Term Paper Submission ECE 562 – Fall 2013

ISBs: Bidirectional Buffer-less Router with Intelligent Space Buffers
Dhiraj Chaudhary and Ahmed Louri
Dept. of Electrical and Computer Engineering, University of Arizona, Tucson, AZ 85721
{dhirajchaudhary,louri}@ece.arizona.edu

ABSTRACT
Buffers in routers consume significant power and area. We propose a novel Network-on-Chip (NOC) architecture, intelligent space buffers (ISBs), that mitigates both power and performance issues. Buffer-less router designs show a significant degradation in performance at high injection rates. We make a case for a new approach to power-efficient NOC design that uses buffer-less routers with improved performance.

General Terms: Architecture, Algorithm, Design.
Keywords: routing, network on chip, control, buffers, channels.

1. INTRODUCTION
High performance and low power are very tight constraints for today's Network on Chip (NOC). The NOC reportedly consumes up to 30% of the power in the Intel 80-core Terascale chip [1] and about 40% in the MIT RAW chip [2]. A lot of work has been done, and is still in progress, to balance power and performance. As the number of cores increases, latency dominates, and power control mechanisms worsen the situation further. It is essential to design a low-power NOC while keeping performance within certain limits. This paper discusses a new low-power design that can be thought of as a balanced implementation for future NOC designs.

Buffers are power hungry. Moscibroda and Mutlu [3] suggest that removing buffers can save up to 60% of total NOC power. However, removing buffers can hurt performance and bandwidth efficiency. Their buffer-less design (BLESS) works well at low injection rates, but at high injection rates it consumes a substantial percentage of chip power and its performance degrades. Latif et al. [4] discuss a very straightforward approach: utilize idle buffers.
Storing packets requires more power than transmitting them, so it is better to transmit packets [9]. Sharing buffers among ports or virtual channels can reduce the buffer count significantly. This design adds computational complexity, which increases area and, in certain cases, possibly power. Kodi et al. [5] introduced adaptive dual-function links, which can be dynamically configured as repeaters or, in case of congestion, as storage units. This approach can save about 40% of buffer power and is area efficient as well.

In this paper, we propose intelligent space buffers (ISBs), which achieve high performance with buffer-less routers while keeping power consumption within certain limits. We deploy buffers in the space around the router. Congestion control is an inherent function of the control unit, which dynamically manages the number of buffers allocated to each channel according to traffic. Bidirectional links [6] are used to utilize the buffers more effectively.

2. RELATED WORKS

2.1 BLESS: buffer-less routers
Buffers are responsible for 60% of total power consumption in a network on chip (NOC) and consume about 64% of static power [7] [8]. Many researchers therefore try to eliminate buffers from the router altogether. The buffer-less router design BLESS by Moscibroda and Mutlu [3] demonstrates a 60% reduction in area, deadlock avoidance, a simplified router design, and freedom from live-lock. However, the reported results show that eliminating buffers causes a major degradation in performance. The concept works well at low injection rates, but at high injection rates significant degradation in both power and performance has been observed [3].

In a conventional design, buffers are associated with each virtual channel. In addition, there is a large amount of area-hungry control circuitry, including the VC allocator, switch allocator, and route computation unit.

Figure 1. Traditional switch architecture with buffers

Figure 2. Buffer-less switch architecture

A buffer-less router saves significant area. BLESS uses the hot-potato routing protocol, a deflection-based mechanism in which a router, after receiving a packet or flit, deflects it in any direction based on port availability. The flit-ranking mechanism illustrated in figure -- handles the live-lock problem caused by deflection: the oldest packet gets the highest priority, which avoids live-lock in the buffer-less design. Because flits are always in motion, deadlock, one of the major problems in routers with buffers, cannot arise. Another advantage of BLESS is very low router latency, because fewer routing computations are needed.

The major drawback is that a buffer-less router does not perform well at high injection rates. As the injection rate at a router increases, its performance degrades drastically. As reported in [3], at an injection rate of 0.08 the buffer-less router outperforms the router with buffers, while at an injection rate of 0.28 there is a drastic increase in link and router energy. This is because deflected packets travel in unproductive directions and take longer to reach their destination. Pipeline latency is lower in BLESS than in a conventional router with buffers, because the virtual channel allocation and switch allocation stages are eliminated. Experimental results [3] clearly indicate that the buffer-less design breaks down at an injection rate of 0.29, compared to 0.35 for a router with 4 VCs and 4-flit buffers. All experiments were carried out on 8×8 routers using synthetic traces with four traffic patterns: uniform random (UR), transpose (TR), mesh tornado (TOR), and bit complement (BC).

The BLESS design works well for networks with light traffic. In NOCs it is applicable to the memory-core interface.
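The oldest-first ranking and deflection behaviour described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the BLESS implementation from [3]: the `Flit` fields, the `route_flits` function, the fixed port order, and the rule of deflecting to the first free port are all hypothetical choices made for the example.

```python
from dataclasses import dataclass

# Output ports of a mesh router (assumed order, used only for deflection choice).
PORTS = ("N", "E", "S", "W")


@dataclass
class Flit:
    flit_id: int
    age: int         # cycles since injection; a larger value means an older flit
    preferred: str   # output port that moves the flit toward its destination


def route_flits(flits):
    """Assign every incoming flit to a distinct output port.

    Older flits choose first. A flit whose preferred port is already
    taken is deflected to the first free port instead of being stored.
    """
    if len(flits) > len(PORTS):
        raise ValueError("more flits than output ports")
    free = list(PORTS)
    assignment = {}
    for flit in sorted(flits, key=lambda f: f.age, reverse=True):
        port = flit.preferred if flit.preferred in free else free[0]
        free.remove(port)
        assignment[flit.flit_id] = port
    return assignment


# Two flits contend for the east port; the older one wins, the other is deflected.
contending = [Flit(1, age=3, preferred="E"), Flit(2, age=7, preferred="E")]
result = route_flits(contending)
```

Because no flit ever waits, every flit leaves on some port each cycle, which is why the deflected flit pays with a longer path rather than with buffer storage.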
Memory and cores communicate at low injection rates, which suits BLESS. However, several issues remain with buffer-less routers. The first is flit overhead: every flit must carry its own header. The second is the high latency before all flits of a packet reach the destination. Because flits arrive at different times, reassembling them into a packet may require a large buffer at the receiver. Because of these drawbacks, BLESS has not had much success in practical implementations.

2.2 Shared buffers
Latif et al. [4] propose sharing the buffers associated with each virtual channel. In the conventional router architecture, each virtual channel has its own dedicated buffer space.
Figure 3. Architecture of input part of router for shared buffers NOC design (Courtesy of Latif, Khalid, Tiberiu Seceleanu, and Hannu Tenhunen. "Power and area efficient design of network-on-chip router through utilization of idle buffers." Engineering of Computer Based Systems (ECBS), 2010 17th IEEE International Conference and Workshops on. IEEE, 2010.)

Figure 1 shows the conventional router architecture, in which each virtual channel has its own buffer space. Traffic on virtual channel 1 cannot use the buffers of another virtual channel even when they are free. In practice, the buffers are never 100% utilized, and the idea is to use this idle channel buffer space. Figure 3 shows the shared buffer architecture. The main contribution of [4] lies in the input part, where the channels share a common buffer space.

Each packet is divided into flits, the first of which is the head flit, called the beginning of packet (BOP). When a BOP arrives at the buffer allocator unit, the allocator looks for a free buffer slot and allocates it. The allocated signal is then sent to the buffer write controller, which responds with a busy signal. After receiving the busy signal, the buffer allocator sends the allocated-to signal, which sets the multiplexer pins of the input buffer. After allocation, a grant signal is sent to the port that is sending flits; this signal acts as the virtual channel identifier. For every new flit, the port sends the NewFlit_Dx_x signal to the buffer write controller. If two requests target the same buffer slot, arbitration is needed, which is done by the priority signal shown in the figure. Status_flag is the logical AND of all the busy signals and indicates that all buffer slots are full. After receiving this signal, the requesting neighboring port decides whether to redirect flits in another direction or to hold them until the congestion is resolved.
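The allocation idea behind shared buffers can be sketched as a small pool with one busy flag per slot. This is a simplified software model under assumed names, not the hardware design of [4]: the `SharedBufferPool` class and its `allocate` and `release` methods are hypothetical, the handshake signals (allocated, busy, grant) are collapsed into a single method call, and priority arbitration between simultaneous requests is left out. Only `status_flag`, computed as the logical AND of the busy flags, follows the description above directly.

```python
class SharedBufferPool:
    """Buffer slots shared by all virtual channels of an input port."""

    def __init__(self, num_slots):
        self.busy = [False] * num_slots    # one busy signal per buffer slot
        self.owner = [None] * num_slots    # virtual channel currently using the slot

    @property
    def status_flag(self):
        # Logical AND of all busy signals: true only when every slot is full.
        return all(self.busy)

    def allocate(self, vc_id):
        """Handle a beginning-of-packet (BOP) request from a virtual channel.

        Returns the index of the allocated slot, or None when the pool is
        full and the sender must redirect or hold its flits.
        """
        for slot, is_busy in enumerate(self.busy):
            if not is_busy:
                self.busy[slot] = True
                self.owner[slot] = vc_id
                return slot
        return None

    def release(self, slot):
        """Free a slot once its packet has left the router."""
        self.busy[slot] = False
        self.owner[slot] = None


pool = SharedBufferPool(num_slots=2)
first = pool.allocate("vc1")
second = pool.allocate("vc1")   # vc1 takes a slot a dedicated design would keep idle
full = pool.status_flag
rejected = pool.allocate("vc2")
```

In a dedicated-buffer router the second request from `vc1` would have to wait even though another channel's slot was empty; with the shared pool it is served, and `status_flag` is raised only when no slot is left at all.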
2.3 iDEAL: Inter-router Dual-function Energy and Area-efficient Links for NoC architectures
Continuing the improvement of router design, Kodi et al. [5] present a new architectural approach for NOCs that saves up to 40% of buffer power and 41% of router area. The basic idea is to use the repeaters in the links as dynamically configurable buffers. iDEAL replaces conventional repeaters with three-state repeaters. When the control signal is low, a three-state repeater behaves like a conventional repeater; when the control signal is high, it acts as a buffer that holds a bit. Figure 1 illustrates the conventional router architecture, in which each virtual channel has 4 buffer slots of 128 bits each. Some of these buffers can be removed from the router and placed on the link, which saves both router area and power. Figure 4 shows the router buffer size reduced from v4-r16-c0 to v4-r8-c8. A congestion control signal dynamically configures these adaptive link buffers (ALBs) to act as repeaters or buffers according to the traffic load. iDEAL improves power and area by more than 40% with a 1-2% degradation in performance [5].

Figure 4. Dual function links used in iDEAL NOC architecture (Courtesy of Kodi, Avinash Karanth, Ashwini Sarathy, and Ahmed Louri. "iDEAL: Inter-router dual-function energy and area-efficient links for network-on-chip (NoC) architectures." ACM SIGARCH Computer Architecture News. Vol. 36. No. 3. IEEE Computer Society, 2008)

2.4 BiNoC: A Bidirectional NoC Architecture with Dynamic Self-Reconfigurable Channel
Bidirectional NoCs allow each communication channel to be dynamically configured in either direction to enhance performance. This design shows a significant increase in performance with some area penalty [6]. The aim is to use the channel bandwidth more effectively. In the BiNoC design, if the outgoing channel carries more traffic than the incoming channel, the direction of the incoming channel can be switched so that the load is shared between the two channels. BiNoC is useful in networks where traffic density differs greatly between opposite directions.

3. DESIGN OF INTELLIGENT SPACE BUFFERS

3.1 NOC router Architecture
We use an n × n 2-D mesh architecture. Routers are buffer-less, and each is connected to a processing element (PE) and to its four adjacent neighbors: north, east, south, and west. Packets are divided into head, body, and tail flits, as in conventional architectures. The design uses a deflection routing algorithm.

3.2 Problem description
Buffer-less routers show a significant degradation in performance and power consumption at high injection rates, which defeats the purpose of going buffer-less [6].
Figure 5. (a) Dropped packet in case of congestion for the BLESS router architecture. (b) Redirected packet in case of congestion for the BLESS architecture.

In figure 5, suppose that routers B and C both send packets to the same output port of router A. Router A must drop one of the packets, because it has no buffers to store them and only one packet can use the output port at a time. Alternatively, if a deflection-based routing algorithm is employed, the losing packet is redirected to any free output port. A deflected packet takes longer to reach its destination, which degrades the overall performance of the BLESS router design.

3.1 Intelligent space buffers (ISBs) implementation
In this section we detail the implementation of intelligent space buffers and the associated control unit.

Figure 6. Proposed intelligent space buffers.

Figure 6 illustrates the conventional buffers replaced by a stack of buffers placed outside the router. When the signal from the decision and control unit is low, the buffers are in power-down mode. In case of congestion, the buffers are activated and hold the data bits, and they remain active until the congestion is alleviated. This implementation enables buffer-less routers to perform well at high injection rates. The control unit is the heart of ISBs and is discussed in the next section.

3.2 Control Unit Implementation
The control unit switches the buffers between power-down mode and active mode, activating them during congestion. A single control unit is responsible for activating all the space buffers shown in figure 6. As illustrated in figure 7, the control unit contains a counter that counts the number of flits or packets flowing on a particular link. For simplicity only one link is shown, but in a practical implementation the control unit controls 2 links. A comparator compares the count obtained from the counter with the predetermined stored value “P”.
If the count exceeds this threshold value (P), the decision and control unit sends the activate signal to the respective buffers. The control unit also sends the switching signals to sw1 and sw2, so that all traffic from port A to B traverses the buffer unit. The overhead of the control unit is negligible compared with the power saving.

Figure 7. Proposed control unit implementation for ISBs

Figure 8. Proposed algorithm implemented at control unit of ISB architecture

Figure 8 illustrates the detailed algorithm to be implemented at the control unit. The main open issue is how to determine the threshold value. Another is how much buffer space to allocate to each channel in case of congestion. We have used 80% for the prototype, but this still needs improvement.

3.3 Dynamic space buffers in Bi-Directional links
The proposed intelligent space buffers architecture can be further optimized by using bidirectional links [6]. Figure 9 illustrates the behavior of the links when traffic in one direction dominates the other. In figure 9(b), R1 (Router 1) configures both channels and links as outputs when traffic from R1 to R2 exceeds traffic from R2 to R1. Figure 9(c) illustrates the opposite scenario, in which traffic from R2 to R1 is greater. The block diagram in figure 10 illustrates the bidirectional channel or link between routers A and B. Introducing bidirectional links can improve performance at high injection rates [6].

Figure 9. (a) Conventional unidirectional link between routers R1 and R2. (b) Reconfigured links for congestion from R1 to R2 router.

There is also scope for power reduction in our design by using bidirectional channels instead of unidirectional ones. The algorithm at the router interface works in a similar fashion to that described in [6].

Figure 10. Bidirectional links implemented in ISBs

Suppose that a router needs 2 ns to process a packet. Router A sends a packet to router B at 1 ns, followed by a second packet on the same port interface at 2 ns. Router B cannot process a new request before 3 ns, so it drops the second packet.
Instead, the incoming channel from router B to router A at the same port can be used to carry the extra packet, provided it is free. Control circuitry is needed to switch the direction of the port. If 2 or more packets request the same port at 2 ns, the algorithm illustrated in figure 8, running in the control circuitry of the space buffers, starts executing.
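The counter-and-comparator behaviour of the ISB control unit described earlier can be sketched as follows. This is only an illustration under assumptions, not the algorithm of figure 8: the `IsbControlUnit` class and its method names are hypothetical, and the idea of counting flits over a fixed observation window that is reset after each comparison is an assumption, since the paper does not say how the counter is cleared. The switching signals to sw1 and sw2 are represented by a single `buffers_active` flag.

```python
class IsbControlUnit:
    """Counts flits on a link and activates the space buffers above threshold P."""

    def __init__(self, threshold):
        self.threshold = threshold     # predetermined stored value P
        self.count = 0                 # flits seen in the current observation window
        self.buffers_active = False    # False models power-down mode

    def observe_flit(self):
        """Counter: called once for every flit that crosses the monitored link."""
        self.count += 1

    def end_window(self):
        """Comparator and decision: compare the count with P, then reset the counter.

        Returns True when the buffers are activated and traffic should be
        switched through the buffer unit, False when they are powered down.
        """
        self.buffers_active = self.count > self.threshold
        self.count = 0
        return self.buffers_active


unit = IsbControlUnit(threshold=3)

# Congested window: 5 flits exceed P = 3, so the buffers are activated.
for _ in range(5):
    unit.observe_flit()
congested = unit.end_window()

# Quiet window: 2 flits stay below P, so the buffers return to power-down mode.
for _ in range(2):
    unit.observe_flit()
quiet = unit.end_window()
```

A fixed threshold keeps the logic to one counter and one comparator, which matches the claim that the control overhead is small; how to choose P remains the open question noted above.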
3.4 Power gated frame implementation

Figure 11. Proposed pipelined power gating scheme

Power gating suffers from wake-up latency, which hurts performance [10] [11]. We use sleep-mode transistors in ISBs to optimize performance: 10% of the total transistors are in sleep mode and 90% remain completely shut off. When the injection rate at a port is high, the control block redirects the traffic through the buffers. When 8% of the buffers are occupied, 30% of the remaining buffers are triggered into wake-up mode, which avoids the wake-up latency. As shown in figure 11, when traffic falls below the threshold, buffers can be sent back to power-down mode. We have assumed a 10% drop in buffer space when the load decreases below some threshold value. State 5 indicates that at most 90% of the buffers are utilized. Beyond this point, all packets for that port are discarded, which prevents the congestion from affecting other ports.

The proposed gating scheme can also perform well at high injection rates. Because it hides the wake-up latency, it offers higher performance than conventional power gating. Buffers in power-down mode are completely shut down, so static power dissipation is lower in the pipelined power gating scheme. The scheme is easy to implement and promising in terms of both power and performance. The exact performance gain can only be calculated after simulations. We estimate a saving of more than 5 clock cycles, since a 5-clock-cycle saving is reported in [11] and pipelined power gating can improve on it further.

4. DESIGN COMPLEXITY
The proposed ISBs architecture is not an area-efficient design, because both links and buffers are controlled dynamically, and the control circuitry may take a large percentage of the area. Another issue is the predetermined threshold value used in the control unit; the proposed design needs to be rechecked under real-time traffic. A learning mechanism could be used to set the threshold, but the area constraint is the major issue that must be addressed for ISBs to succeed.

5. FUTURE WORK
While ISBs is an appealing design for its balance of power and performance, a large design space spans the gap between traditional and ISBs architectures. First, an area-efficient design for the ISBs NOC architecture is not discussed in this paper. Second, permutation and priority schemes need to be implemented at the control block for the congestion case. Deadlock may also be a problem for ISBs because of the newly introduced buffers. Finally, flow control is currently implemented by a counter, which can be improved to make ISBs more performance- and power-efficient.

6. CONCLUSION
In this paper we propose a novel architecture to address performance and power issues in NOCs. ISBs uses buffer-less routers and bidirectional links to achieve significant power savings. To address the performance issue, we provide self-configured intelligent space buffers. The architecture has not yet been evaluated in simulation because of time constraints. We hope that it will inspire new ideas for work on NOCs.

7. REFERENCES
[1] Y. Hoskote, S. Vangal, A. Singh, N. Borkar, and S. Borkar. "A 5-GHz mesh interconnect for a teraflops processor." IEEE Micro, 27(5), 2007.
[2] Taylor, Michael Bedford, et al. "Evaluation of the Raw microprocessor: An exposed-wire-delay architecture for ILP and streams." ACM SIGARCH Computer Architecture News. Vol. 32. No. 2. IEEE Computer Society, 2004.
[3] Moscibroda, Thomas, and Onur Mutlu. "A case for bufferless routing in on-chip networks." ACM SIGARCH Computer Architecture News. Vol. 37. No. 3. ACM, 2009.
[4] Latif, Khalid, Tiberiu Seceleanu, and Hannu Tenhunen. "Power and area efficient design of network-on-chip router through utilization of idle buffers." Engineering of Computer Based Systems (ECBS), 2010 17th IEEE International Conference and Workshops on. IEEE, 2010.
[5] Kodi, Avinash Karanth, Ashwini Sarathy, and Ahmed Louri. "iDEAL: Inter-router dual-function energy and area-efficient links for network-on-chip (NoC) architectures." ACM SIGARCH Computer Architecture News. Vol. 36. No. 3. IEEE Computer Society, 2008.
[6] Y.C. Lan, S.H. Lo, Y.C. Lin, Y.H. Hu, and S.J. Chen. "BiNoC: A Bidirectional NoC Architecture with Dynamic Self-Reconfigurable Channel." Proc. of the 3rd ACM/IEEE International Symposium on Networks-on-Chip, pp. 266-275, 2009.
[7] W. Hangsheng, L. S. Peh, and S. Malik. "Power driven design of router microarchitectures in on-chip networks." Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), pp. 105-116, 2003.
[8] Xuning Chen and Li-Shiuan Peh. "Leakage power modeling and optimization of interconnection networks." Proceedings of International Symposium on Low Power Electronics and Design, pp. 90-95, 2003.
[9] T. T. Ye, L. Benini, and G. De Micheli. "Analysis of power consumption on switch fabrics in network routers." Proceedings of the 39th Design Automation Conference (DAC), pp. 524-529, 2002.
[10] Z. Hu, A. Buyuktosunoglu, V. Srinivasan, V. Zyuban, H. Jacobson, and P. Bose. "Microarchitectural techniques for power gating of execution units." International Symposium on Low Power Electronics and Design (ISLPED), CA, USA, pp. 32-37, 2004.
[11] H. Matsutani, M. Koibuchi, W. Daihan, and H. Amano. "Run-time power gating of on-chip routers using look-ahead routing." 13th Asia and South Pacific Design Automation Conference (ASP-DAC), Piscataway, NJ, USA, pp. 55-60, 2008.