Energy Aware Design Methodologies for Application Specific NoC
Naveen Choudhary, M. S. Gaur, V. Laxmi
Department of Computer Engineering, Malaviya National Institute of Technology, Jaipur, India

Virendra Singh
SERC, Indian Institute of Science, Bangalore, India

978-1-4244-8971-8/10$26.00 © 2010 IEEE

Abstract: Network-on-Chip (NoC) has emerged as a solution for the communication framework of high-performance nanoscale architectures. One important aspect, in addition to deadlock-free routing, is low power consumption. In view of varied communication requirements, application-specific SoC design is increasingly important. Customized NoC architectures are more suitable for a particular application and do not necessarily conform to regular topologies. In this work, a methodology is proposed that uses a priori knowledge of the application's communication characteristics to design a customized, energy-optimized irregular NoC.

Keywords: NP-hard; NoC; Optimization; SoC; Core Graph.

I. INTRODUCTION

Network-on-Chip [1, 2, 7] has been proposed as the solution for the on-chip communication challenges of future SoC architectures. Early works [2, 13] in NoC favored standard topologies such as meshes, tori, k-ary n-cubes, or fat trees, under the assumption that the wires can be well structured in such topologies. However, most application-specific SoCs are heterogeneous, with each core having a different size, functionality, and communication requirements. Thus, standard topologies can have a structure that poorly matches the application traffic, leading to large wiring complexity after floorplanning as well as significant energy and area overhead. Moreover, for most SoCs the system is designed with a static (or semi-static) mapping of tasks to processors and hardware cores, and hence the communication traffic characteristics of the SoC are well characterized at design time. Therefore, networks with an irregular topology tailored to the application's requirements are expected to have an edge over networks with a regular topology. Application-specific custom topology mapping and design have been explored in [8, 9, 10, 19]. In this paper, two genetic algorithm based heuristics are proposed for the design of customized, energy-efficient irregular NoCs based on the applied routing function.

The irregular NoC communication model and architecture are defined in Section II. The proposed energy-efficient design methodologies for customized NoC are presented in Section III. The Genetic Algorithm (GA) used in the proposed methodologies is described in Section IV. Section V presents some experimental results, followed by a brief conclusion in Section VI.

II. IRREGULAR NOC COMMUNICATION MODEL AND ARCHITECTURE

The following paragraphs describe the communication model, the associated NoC architecture, and the routing function applicable to the customized irregular NoC.

Figure 1. Application specific communication model in NoC

A. Communication Model

Task graphs [9, 11] are generally used to model the behavior of complex multi-core SoC applications on an abstract level. The tasks $T_i$ are mapped to a set of IP cores $v_j$, which communicate through unidirectional point-to-point abstract channels. The generic communication model is shown in Figure 1, and the related definitions are as follows.

Definition 1. The Core Graph is a directed graph $G(V, E)$, with each vertex $v_i \in V$ representing an IP core and each directed edge $e_{i,j} \in E$ representing the communication between the cores $v_i$ and $v_j$. The weight of the edge $e_{i,j}$, denoted by $b_{i,j}$, represents the desired average bandwidth of the communication from $v_i$ to $v_j$.

Definition 2. The NoC topology graph is a directed graph $N(U, F)$, with each vertex $\upsilon_i \in U$ representing a node/tile in the topology and each directed edge $f_{i,j} \in F$ representing a direct communication channel between the vertices $\upsilon_i$ and $\upsilon_j$. The weight of the edge $f_{i,j}$, denoted by $Ab_{i,j}$, represents the available link/channel bandwidth across the edge $f_{i,j}$.

B. Chip Layout & NoC Energy Model

Floorplanning can be done using non-slicing floorplanners such as B*-Trees [12]. The energy model [9] for the Network-on-Chip is defined as follows:

$$E_{bit}(t_i, t_j) = n_{hops} \times E_{rbit} + (n_{hops} - 1) \times E_{lbit}$$

where $E_{bit}(t_i, t_j)$ is the average dynamic energy consumed in sending one bit of data from tile $t_i$ to tile $t_j$, $n_{hops}$ is the
number of routers the bit traverses from tile $t_i$ to tile $t_j$, and $E_{rbit}$ and $E_{lbit}$ are the energy consumed by a router and a link, respectively, for transporting one bit of data.

C. Routing in Irregular NoC

Popular routing algorithms for irregular topologies, such as up*/down* routing [5], Left-Right routing [6], and L-turn routing [6], use the turn model [4] to avoid deadlock. In this paper, minimal (shortest) paths are used for communication, and the up*/down* or Left-Right routing function is used to provide deadlock-free escape paths [3].

III. METHODOLOGIES FOR ENERGY EFFICIENT NOC GENERATION

Figure 2. Network construction using proposed methodologies

This section presents two GA-based methodologies, minimum-spanning-tree-first (MSTF) and shortest-paths-first (SPF), for the design of a customized, energy-efficient NoC and the corresponding routing tables for deadlock-free communication. The routing function is implemented as given by Silla et al. [3], with up*/down* and Left-Right routing for the escape paths. Both methodologies take the floorplan information and the Core Graph, which captures the traffic characteristics, as inputs (refer Figure 2). Floorplanning can be done based on Manhattan distance using a floorplanner such as B*-Trees [12], assuming over-the-cell routing [17]. In both methodologies the link length is not allowed to exceed the maximum permitted channel length ($e_{max}$), because of the constraint of physical signaling delay. Moreover, the constraint on the maximum permitted node degree ($nd_{max}$) prevents the algorithm from instantiating slow routers with a large number of I/O channels, which would decrease the achievable clock frequency because of the internal routing and scheduling delay of the router.

A. Minimum Spanning Tree First (MSTF) Methodology

In this methodology, a minimum spanning tree (MST) is first generated on the nodes of the Core Graph, using Manhattan distance as the metric and respecting the constraints on $nd_{max}$ and $e_{max}$. The node with the maximum bandwidth requirement is taken as the root of the MST. This MST helps in classifying all the channels of the topology as "up" ("Left") or "down" ("Right"). Still respecting the constraints on $nd_{max}$ and $e_{max}$, the topology is then extended by laying the shortest energy path for each traffic characteristic. Because of the constraints on $nd_{max}$ and $e_{max}$, the order in which these shortest energy paths are generated largely decides the total communication energy requirement of the generated topology. The optimized order of the application's traffic characteristics is found using a genetic algorithm (refer Section IV). The routing tables of the routers on each discovered shortest energy path are marked with the tag shortest path. Lastly, the methodology uses a modified Dijkstra's algorithm [14], constrained by the up*/down* (Left-Right) rule, to find escape routing paths from each node on the shortest energy path to the corresponding destination in the generated NoC, and tags them as up*/down* (Left-Right). When taking a routing decision, output channels tagged shortest path are selected with higher priority, and up*/down* (Left-Right) tagged channels are selected only when no output channel corresponding to a shortest path is free.

B. Shortest Path First (SPF) Methodology

SPF is similar to the MSTF methodology, except that in SPF topology generation starts by finding the shortest energy paths, and the topology is later extended by constructing the MST. As in MSTF, a genetic algorithm is used to find the energy-optimized order of the application's traffic characteristics. Because MSTF constructs the MST first, a large share of the links at many nodes may be MST links. As the number of links emanating from a node is limited to $nd_{max}$, this can increase the hop count of the shortest energy paths generated later, and hence the communication energy. SPF overcomes this drawback by creating the links of the shortest energy paths before the links of the MST. Because the shortest energy paths are generated first in SPF, there may not be enough free ports left to construct the MST later. In such a case, a minimum number of ports per node needs to be reserved before finding the shortest energy paths. However, experiments showed that if the communication requirements are uniformly distributed in the Core Graph, such problems are rare, if they occur at all. Algorithm 1 briefly presents the proposed methodologies.

Algorithm 1: Energy aware application specific NoC generator

Require:
  1. CG = Core Graph = {E edges (i.e. traffic characteristics), V vertices}
  2. V = {vi | vi is the ith IP core}
  3. E = {eij : vi → vj with weight bwij | vi (source), vj (destination) ∈ V}
  4. NoC = {T (topology), R (set of routing tables), S (set of shortest paths)}
  5. TC_Array = {array of traffic characteristics (i.e. ordered set of E)}
  6. ndmax = maximum permitted node degree in the topology T
  7. emax = maximum permitted length of a link (channel) in topology T
  8. Manhattan distance = ∆ = {dij | dij = |vi - vj|, vi, vj ∈ V, dij < emax}
  9. u = node with maximum communication in CG
Ensure: Energy aware NoC topology for CG

Procedure Minimum-Spanning-Tree-First()
  1. NoCEA.T = Φ; NoCEA.R = Φ; NoCEA.S = Φ
  2. Γ = {MST rooted at u as per ∆ and constraints ndmax & emax}
  3. NoCEA.T = NoCEA.T ∪ {Γ}
  4. (NoCEA, TC_Array) = GeneticAlgo(NoCEA, Γ)
  5. for each path si ∈ NoCEA.S
       N = {set of nodes in path si}
       for nj ∈ N
         NoCEA.R = NoCEA.R ∪ {update routing tables in NoCEA.R for nodes ∈ V in the route followed by the shortest up*/down* (Left-Right) escape path from node nj to the destination node of path si; the routing table entry type tag is set as up*/down* (Left-Right) for these nodes}
       endfor
     endfor
Endprocedure

Procedure Shortest-Paths-First()
  1. NoCEA.T = Φ; NoCEA.R = Φ; NoCEA.S = Φ; Γ = Φ
  2. (NoCEA, TC_Array) = GeneticAlgo(NoCEA, Γ)
  3. Γ = {MST rooted at u as per ∆ and constraints ndmax & emax}
  4. NoCEA.T = NoCEA.T ∪ {Γ}
  5. for each path si ∈ NoCEA.S
       N = {set of nodes in path si}
       for nj ∈ N
         NoCEA.R = NoCEA.R ∪ {update routing tables in NoCEA.R for nodes ∈ V in the route followed by the shortest up*/down* (Left-Right) escape path from node nj to the destination node of path si; the routing table entry type tag is set as up*/down* (Left-Right) for these nodes}
       endfor
     endfor
Endprocedure

IV. GENETIC ALGORITHM

A genetic algorithm [15] based heuristic is used to find the best order in which to process the traffic characteristics when generating the shortest energy paths in the topology, so that the communication energy requirement of the application is optimized. In the proposed formulation, each chromosome is an array of genes, with each gene representing one traffic characteristic of the application. The initial population has 500 chromosomes, and crossover and mutation are applied to 50% and 40% of the population, respectively, in each generation. Crossover intermixes the traffic characteristics of two chromosomes, whereas mutation randomly changes the order of the traffic characteristics in a chromosome. The fitness of a chromosome is regarded as high if its cost approaches 0. The fitness function used is as follows:

$$Cost = E_{c_i} / X$$

where $X$ is the maximum chromosome energy requirement and $E_{c_i}$ is the energy requirement of chromosome $c_i$. Note that the best 10% of the chromosomes (referred to as the Best Class) in any generation are transferred directly to the next generation, so that the solution does not degrade between generations.

V. EXPERIMENTAL RESULTS

Multiple Core Graphs with diverse bandwidth requirements of the IP cores were randomly generated using TGFF [11]. The NoC simulator IrNIRGAM, an extended version of NIRGAM [16] that supports irregular topologies and escape path routing for deadlock avoidance, was used for performance evaluation. IrNIRGAM was run for 10000 clock cycles at each applied packet injection interval to evaluate the network performance under varying traffic load. The router energy consumption is evaluated using the power simulator Orion [18] for 0.18 µm technology. Similarly, the dynamic bit energy consumption of the inter-node links ($E_{lbit}$) can be calculated using the equation:

$$E_{lbit} = \frac{1}{2} \times \alpha \times C_{phy} \times V_{DD}^{2}$$

where $\alpha$ is the average probability of a 1-to-0 or 0-to-1 transition between two successive samples in the stream for a specific bit ($\alpha = 0.5$, assuming the data stream to be purely random), $C_{phy}$ is the physical capacitance of the inter-node wire, and $V_{DD}$ is the supply voltage.

A. SPF and MSTF with Random Benchmarks

The proposed SPF and MSTF methodologies were compared on IrNIRGAM with varying packet injection interval. Figure 3 shows performance results averaged over 50 generated energy-efficient irregular topologies based on the up*/down* routing function. The constraints were $nd_{max} = 4$ and $e_{max}$ equal to 1.5 times the length of the largest core/node among all the cores in the NoC. For SPF, the total dynamic communication energy consumption was on average 18.5% lower than for the MSTF methodology. Moreover, a reduction in latency (in the range of 7.5 clocks to 10 clocks) was observed for comparatively similar throughput.

Figure 3. Performance comparison with varying packet injection interval of (a) dynamic communication energy consumption (in picojoules) and (b) average flit latency (in clock cycles) of the proposed MSTF and SPF methodologies, averaged over 50 generated energy-efficient irregular topologies with the number of cores varying from 16 to 81

B. SPF and Regular NoC with Random Benchmarks

Figure 4 compares the performance of SPF with a 2D-Mesh for equivalent tile sizes and according to the application's traffic characteristics. Here $nd_{max} = 4$ and $e_{max}$ was taken as 2 times the length of the core/node. Values in parentheses are for Left-Right routing. SPF with up*/down* (Left-Right) routing shows a reduction in average flit latency in the range of 10 (9.4) clocks to 20.9 (18.4) clocks and 13.8 (13.2) clocks to 76 (69) clocks, and a reduction in average per-flit communication energy in the range of 18.8% (18.5%) to 29.2% (25.8%) and 25.2% (24.6%) to 54.7% (53%), in comparison to the 2D-Mesh with XY and OE routing, respectively. In most cases SPF with up*/down* routing was found to perform better.

C. SPF and Regular NoC with Intelligent Mapping

The proposed SPF methodology was compared with the intelligent energy-aware mapping technique proposed in [9] for equivalent tile sizes and application-to-core mapping. Figure 5 shows a reduction in average flit latency in the range of 1.7
clocks to 5 clocks and 7.5 clocks to 20.4 clocks, and a reduction in average per-flit communication energy in the range of 1.6% to 10.9% and 17% to 37%, for the SPF methodology at equivalent throughput in comparison to the 2D-Mesh with XY and OE routing, respectively.

Figure 4. SPF (up*/down* & Left-Right routing) performance comparison with 2D-Mesh (XY & OE routing), averaged over 50 generated energy-efficient irregular topologies with the number of cores varying from 16 to 81: (a) average flit latency (in clock cycles) and (b) average communication energy consumption per flit (in picojoules)

Figure 5. SPF and 2D-Mesh performance comparison for intelligent application-to-core mapping, averaged over 50 generated energy-efficient irregular topologies with the number of cores varying from 16 to 81: (a) average flit latency (in clock cycles) and (b) average communication energy consumption per flit (in picojoules)

VI. CONCLUSION AND FUTURE WORK

In this paper, the problem of generating an energy-efficient, customized irregular topology for NoC was addressed. Although up*/down* and Left-Right routing were used in this paper as the escape paths for deadlock prevention, the presented methodologies can be adapted to any topology-agnostic routing algorithm for which generic routing rules based on turn prohibition can be laid down. It is believed that the combined treatment of routing and topology generation offers a huge potential for optimization in future application-specific NoC architectures.

REFERENCES

[1] W. J. Dally, B. Towles, "Route Packets, Not Wires: On-Chip Interconnection Networks," in IEEE Proceedings of the 38th Design Automation Conference (DAC), pp. 684-689, 2001.
[2] S. Kumar, A. Jantsch, J.-P. Soininen, M. Forsell, M. Millberg, J. Oberg, K. Tiensyrja, A. Hemani, "A Network on Chip Architecture and Design Methodology," in Proceedings of VLSI Annual Symposium (ISVLSI 2002), pp. 105-112, 2002.
[3] F. Silla, J. Duato, "High-Performance Routing in Networks of Workstations with Irregular Topology," in IEEE Transactions on Parallel and Distributed Systems, vol. 11, pp. 699-719, July 2000.
[4] C. Glass, L. Ni, "The Turn Model for Adaptive Routing," in Proceedings of the 19th International Symposium on Computer Architecture, pp. 278-287, May 1992.
[5] M. D. Schroeder et al., "Autonet: A High-Speed Self-Configuring Local Area Network Using Point-to-Point Links," Journal on Selected Areas in Communications, 9, 1991.
[6] A. Jouraku, A. Funahashi, H. Amano, M. Koibuchi, "L-turn routing: An Adaptive Routing in Irregular Networks," in Proceedings of the International Conference on Parallel Processing, pp. 374-383, Sep. 2001.
[7] U. Ogras, J. Hu, R. Marculescu, "Key research problems in NoC design: a holistic perspective," in IEEE CODES+ISSS, pp. 69-74, 2005.
[8] S. Murali, G. De Micheli, "SUNMAP: A Tool for Automatic Topology Selection and Generation for NoCs," in Proceedings of DAC, 2004.
[9] J. Hu, R. Marculescu, "Energy-Aware Mapping for Tile-based NOC Architectures Under Performance Constraints," in ASP-DAC 2003, Jan. 2003.
[10] J. Hu, R. Marculescu, "Energy- and performance-aware mapping for regular NoC architectures," in IEEE Trans. on CAD of Integrated Circuits and Systems, 24(4), April 2005.
[11] R. P. Dick, D. L. Rhodes, W. Wolf, "TGFF: task graphs for free," in Proceedings of the International Workshop on Hardware/Software Codesign, March 1998.
[12] Y. C. Chang, Y. W. Chang, G. M. Wu, S. W. Wu, "B*-Trees: A New Representation for Non-Slicing Floorplans," in Proceedings of the 37th Design Automation Conference, pp. 458-463, 2000.
[13] L. Natvig, "High-level Architectural Simulation of the Torus Routing Chip," in Proceedings of the International Verilog HDL Conference, California, pp. 48-55, Mar. 1997.
[14] T. Cormen, C. Leiserson, R. Rivest, Introduction to Algorithms, Prentice Hall International, 1990.
[15] A. E. Eiben, J. E. Smith, Introduction to Evolutionary Computing, Springer-Verlag, Berlin, Heidelberg, 2003.
[16] L. Jain, B. M. Al-Hashimi, M. S. Gaur, V. Laxmi, A. Narayanan, "NIRGAM: A Simulator for NoC Interconnect Routing and Application Modelling," DATE 2007, 2007.
[17] K. Srinivasan, K. S. Chatha, "Layout Aware Design of Mesh based NoC Architectures," in Proceedings of the 4th International Conference on Hardware Software Codesign and System Synthesis, Seoul, Korea, pp. 136-141, 2006.
[18] H-S Wang et al., "Orion: A Power-Performance Simulator for Interconnection Network," in Proc. International Symposium on Microarchitecture, Nov. 2002.
[19] K. Srinivasan et al., "An Automated Technique for Topology and Route Generation of Application Specific On-Chip Interconnection Networks," in Proc. ICCAD 2005.
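
As an illustrative aid, the bit-energy model of Section II.B and the link-energy equation of Section V can be evaluated with a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names and the numeric values in the usage lines are hypothetical, and the per-bit router energy, which the paper obtains from Orion [18], is simply passed in as a number.

```python
def link_bit_energy(c_phy, v_dd, alpha=0.5):
    """Dynamic energy per bit on an inter-node link.

    Implements E_lbit = 1/2 * alpha * C_phy * V_DD^2, where alpha is the
    switching probability (0.5 for a purely random data stream).
    """
    return 0.5 * alpha * c_phy * v_dd ** 2


def path_bit_energy(n_hops, e_rbit, e_lbit):
    """Average dynamic energy for sending one bit across n_hops routers.

    Implements E_bit = n_hops * E_rbit + (n_hops - 1) * E_lbit: every
    router on the path is traversed once, and consecutive routers are
    joined by one link.
    """
    if n_hops < 1:
        raise ValueError("a path traverses at least one router")
    return n_hops * e_rbit + (n_hops - 1) * e_lbit


# Hypothetical values, chosen only to make the arithmetic easy to follow.
e_lbit = link_bit_energy(c_phy=2.0, v_dd=2.0)               # 0.5 * 0.5 * 2 * 4 = 2.0
e_bit = path_bit_energy(n_hops=3, e_rbit=2.0, e_lbit=e_lbit)  # 3*2 + 2*2 = 10.0
```

Because the router term scales with the hop count and the link term with one less than the hop count, any topology decision that shortens a heavily used path reduces both terms at once, which is why the order of path construction matters in MSTF and SPF.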
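
The cost function and the Best Class elitism of Section IV can likewise be sketched in Python. The sketch makes several assumptions that the paper does not state: a chromosome is represented as a list of traffic-characteristic indices, mutation is modeled as a swap of two randomly chosen positions (the paper only says the order is changed randomly), and the names `normalized_costs`, `best_class`, and `mutate` are hypothetical.

```python
import random


def normalized_costs(energies):
    """Cost = E_ci / X, with X the maximum chromosome energy requirement."""
    x = max(energies)
    return [e / x for e in energies]


def best_class(population, energies, percent=10):
    """Return the lowest-energy `percent` of chromosomes (at least one).

    These are the chromosomes copied unchanged into the next generation.
    """
    keep = max(1, len(population) * percent // 100)
    ranked = sorted(range(len(population)), key=lambda i: energies[i])
    return [population[i] for i in ranked[:keep]]


def mutate(order, rng):
    """Assumed mutation operator: swap two distinct positions of the order."""
    i, j = rng.sample(range(len(order)), 2)
    child = list(order)
    child[i], child[j] = child[j], child[i]
    return child


# Hypothetical population of three orderings of four traffic characteristics.
population = [[0, 1, 2, 3], [3, 2, 1, 0], [1, 0, 3, 2]]
energies = [2.0, 4.0, 1.0]
costs = normalized_costs(energies)        # [0.5, 1.0, 0.25]
elite = best_class(population, energies)  # [[1, 0, 3, 2]]
child = mutate(population[0], random.Random(0))
```

Dividing by the maximum energy keeps every cost in the interval (0, 1], so a cost approaching 0 identifies an ordering whose generated topology needs far less communication energy than the worst ordering in the population.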