Traffic Characterization for Multicasting in NoC

V. Laxmi¹, Roopesh Chuggani², M. S. Gaur³, Pankaj Khandelwal⁴, Prateek Bansal⁵
Department of Computer Engineering, National Institute of Technology Jaipur
{vlaxmi | gaurms}, {roopesh.chuggani2 | pankaj1394 | prateekbansal.895}

Abstract—NoC (Network on Chip) is an emerging paradigm for the design of VLSI/ULSI circuits. It overcomes the communication bottleneck of traditional bus-based systems. The NoC communication framework consists of regularly placed routers connected to processing cores. NoC performance is determined by the latency and throughput achieved for the communication requirements. NoC communication traffic modelling plays an important role in the design of NoC simulators and/or prototypes.

This paper presents a framework for modelling source traffic for multipoint communication from one source to different destinations, as required for multicasting. Such a traffic model captures real-world scenarios such as multicasting and the execution of multiple concurrent tasks on a single core, where each task communicates with a different destination. The model proposes how concurrent traffic streams from a single core to different destinations can be mathematically characterized as a single stream at the source end. The model is derived from the statistical behaviour of probabilistically demultiplexing a single traffic stream. In its nascent stage, the method is proposed for one source concurrently communicating with two destinations. This is the situation when two concurrent tasks are mapped to the same core, or when a source broadcasts simultaneously to two destinations.

Index Terms—Network on Chip, Multicasting, Bursty Traffic, Probabilistic Demultiplexing, Exponential Distribution

I. INTRODUCTION

VLSI designs are becoming increasingly complex as the scale of integration grows and more components are fabricated on the same chip. The number of processing cores (CPU, DSP, memory, etc.) rises accordingly. The resulting inter-core communication requirement cannot be satisfied by the traditional bus-based communication architecture [1], [2]. Network on Chip (NoC) has been proposed as an alternative [3].

NoC provides a communication layer of regularly placed, interconnected routers, and inter-core communication takes place through these routers. Decoupling communication from computation simplifies the IC design process. Regularity in the NoC structure results in better scalability and fault tolerance [2], [4]. Because of its modular structure, many components can be reused from previous designs, reducing the time to market for new NoC designs.

NoC design parameters include topology selection, router design and the choice of routing function. A NoC simulator can assist the designer in evaluating different NoC designs. One important aspect of simulator design is the characterization of inter-core traffic, and traffic modelling of the cores is an important step in NoC design [5], [6]. Traffic models are mathematical characterizations of the statistical properties of data flowing from one core to another. Traffic modelling has been identified as an open area of research in recent papers [7].

Most evaluations and analyses of NoC design parameters are still based on basic synthetic traffic patterns such as CBR (Constant Bit Rate), bursty, bit-complement, transpose, etc. These patterns do not capture real-world scenarios, because each of them comprises only point-to-point communication, i.e. each source has only one destination. Traffic modelling of multicast communication for NoC is still at a nascent stage.

Multimedia applications, such as NoC design for the modules of an MPEG encoder/decoder, also need point-to-multipoint communication patterns. The authors experienced this while extending the capability of an NoC simulator. It requires generating multiple traffic streams that originate from the same source but are destined for different cores. A similar traffic pattern is observed when a core runs concurrent tasks, each of which communicates with a different destination.

In this paper, we propose how multicast communication, i.e. multiple traffic streams originating at the source, can be viewed as a single traffic stream without any adverse impact on the statistical characteristics of the destination traffic streams. The model is derived from observations of the statistical behaviour of the streams received at the destinations in a single-source, multiple-destination scenario. To the best of our knowledge, no traffic model has so far been proposed that accurately characterizes this scenario. In this initial work, we present the model for two destinations; it can serve as a basis for n (n > 2) destinations.

The model is based on the observation that probabilistic division of a bursty traffic stream into two separate streams results in both streams being bursty. The burst parameter of each stream is related to that of the original stream. The proposed traffic model has been implemented and tested on the open source NoC simulator NIRGAM [8].

This paper is organized as follows:
- Section II presents the background survey in this field.
- Section III presents the objectives of this work and the motivation for the proposed traffic model.
- Section IV derives how the statistical characteristics of the traffic streams received at the destinations relate to those of the source traffic. These relationships come from observations of the experiments conducted.
- Section V briefly describes the NoC simulator NIRGAM, on which the proposed model is implemented.
- Section VI describes the implementation of the proposed model on NIRGAM.
- Section VII presents experimental results.
- Section VIII gives conclusions and pointers for further extension.

978-1-4244-8971-8/10/$26.00 ©2010 IEEE
II. RELATED WORK

Applications must be mapped onto the underlying NoC architecture by dividing their functionality into smaller tasks. Each task is mapped onto one NoC core, and many algorithms for this mapping have been proposed [9]–[11]. In all of these works, a single task is mapped onto one IP core. Most of the past work also maps a single application onto the underlying network. In [9], the tasks of a process control platform are mapped onto NoC cores in a one-to-one manner. In [11], Hu et al. propose an energy-constrained mapping of a communication task graph onto a NoC; this work also considers a single task per core.

NoC evaluation assumes a single task per core and traditional point-to-point traffic patterns such as bit complement and transpose [3]. This type of communication fits only a few applications, because a node rarely communicates with just a single node or with all the other nodes in the network. A multicast (point-to-multipoint) scenario is usually modelled with uniform random traffic, which selects a random destination for each packet with every destination equally likely. In [12], a new traffic pattern is proposed for the scenario where tasks with heavier inter-task communication are mapped to cores in adjacent regions. In that pattern, communication is still point-to-point, but the traffic is distributed over multiple destinations.

None of these traffic patterns can model the point-to-multipoint traffic generated by multiple tasks executing on a single core. When multiple tasks are mapped onto one core, the core's traffic is composed of the individual traffic generated by each task. Each of these streams can have its own statistical properties and destination pattern. Traditional traffic generators, however, provide no functionality for such communication.

III. MOTIVATION

In this paper, we model the point-to-multipoint source traffic pattern given the statistical behaviour of the traffic received at the destinations. This results in multiple traffic streams emerging from the same core. Each stream may have a different destination and is likely to have different statistical properties.

Figure 1 shows one such scenario in a 4 × 4 NoC whose cores are numbered 0 to 15. Core 0 is multicasting to cores 9 and 10, and core 7 is multicasting to cores 10 and 12. There is also one unicast communication, from core 15 to core 13.

[Fig. 1: NoC Architecture with Multiple Tasks per Core (4 × 4 mesh of cores 0 to 15; legend: IP core, task, data flow)]

IV. PROPOSED MODEL

The main objective of this work is to determine how a point-to-multipoint traffic pattern can be modelled at the source end. That is, we want the statistical characteristics of the source traffic given the traffic characteristics at the destinations. To derive this, we first consider the inverse problem: given the source traffic characteristics, what are the statistical characteristics of the traffic received at the destinations? Our model makes the following assumptions.
1) There is one source and two destinations. This happens when at most two traffic streams emanate from a single core.
2) Each stream (task) generates bursty traffic, and the OFF time of this traffic is modelled using an exponential distribution.
3) The traffic model is independent of burst size (the number of packets in a burst). Experimental results suggest that the traffic statistics do not depend on burst size; details are discussed in Section VII.

We define the following parameters for our traffic model:
1) $m_c$: average (mean) OFF time of the traffic generated by the core node.
2) $p_1$: probability that a packet is destined for the first destination.
3) $p_2$: probability that a packet is destined for the second destination.
4) $m_{t1}$: average (mean) OFF time of the traffic received by the first destination.
5) $m_{t2}$: average (mean) OFF time of the traffic received by the second destination.

The model rests on one observation. Suppose bursty traffic is generated using an exponential distribution with average OFF time $m_c$ and is demultiplexed probabilistically into two streams. The two demultiplexed streams still follow exponential distributions, with average OFF times $m_{t1}$ and $m_{t2}$ respectively.

Probabilistic demultiplexing means that each burst of packets is assigned to one of the streams/tasks according to the probabilities $(p_1, p_2)$. A random number is generated for the burst. If it is less than $p_1$, the burst belongs to the first stream; otherwise it belongs to the second.

We investigate the dependence of $m_{t1}$ and $m_{t2}$ on $m_c$, $p_1$ and $p_2$.

A. Bursty Traffic Model

Bursty traffic is modelled using an exponential distribution [8]. Both the inter-packet interval and the packet size follow exponential distributions; we are concerned only with the inter-packet intervals.
The exponential distribution is parametrized by its average value, denoted by $m$. The probability density function (PDF) of an exponential distribution is

$$ f(x; m) = \begin{cases} \dfrac{1}{m}\, e^{-x/m}, & x \ge 0 \\ 0, & x < 0 \end{cases} \tag{1} $$

where $m$ is also the expected value of the distribution.

B. Observation of Demultiplexed Trace

We generated a traffic trace with an arbitrary average OFF time $m_c$ and divided it into two traces using probabilities $(p_1, p_2)$. The PDF of the original trace was exponential, as expected. The PDF of each demultiplexed trace was observed to follow a similar exponential distribution. This is significant because it means that two different exponential distributions can be generated from a single distribution by probabilistic demultiplexing.

To verify this observation, we generated and demultiplexed traffic for multiple values of $m_c$. One such instance is shown in Figure 2. Figure 2(a) shows the PDF of the original trace with $m = 30$, and Figure 2(b) shows the PDF of one of the demultiplexed traces, obtained with probability 0.6. Both approximate an exponential distribution.

[Fig. 2: (a) PDF for Original Trace, (b) PDF for a demultiplexed trace (probability = 0.6); x-axis: value of inter-packet time, y-axis: frequency]

C. Deriving the relation

To find the relationship between $m_c$ and $m_{t1}$, and between $m_c$ and $m_{t2}$, we generated and demultiplexed traces for various values of $m_c$. We then calculated $m_{t1}$ and $m_{t2}$ from the resulting traces. The average OFF time of each stream was found to be directly proportional to the average core OFF time:

$$ m_{t1} \propto m_c \tag{2a} $$
$$ m_{t2} \propto m_c \tag{2b} $$

[Fig. 3: $m_{t1}$ v/s $m_c$ and $m_{t2}$ v/s $m_c$; x-axis: average OFF time at core; y-axis: average OFF time for tasks; series: OFF time of task 1 ($m_{t1}$) with probability 0.4, OFF time of task 2 ($m_{t2}$) with probability 0.6]

Figure 3 plots the average OFF time of the demultiplexed traffic streams against that of the core. The X-axis is the average OFF time of the core ($m_c$), and the Y-axis is the OFF time of each stream. Both curves are approximately linear, indicating direct proportionality.

Next, we deduced the relationship between $m_{t1}$, $m_{t2}$ and $p_1$, $p_2$. For this, we kept $m_c$ constant and varied the probability from 0.1 to 0.95 (with $p_1 + p_2 = 1$). The average OFF time of each stream was found to be inversely proportional to its probability:

$$ m_{t1} \propto \frac{1}{p_1} \tag{3a} $$
$$ m_{t2} \propto \frac{1}{p_2} \tag{3b} $$

Figure 4 plots $m_{t1}$ against $p_1$ for $m_c = 50$. Probability is on the X-axis and average OFF time is on the Y-axis. The curve clearly shows the inverse relationship.
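The proportionalities (2a) and (3a) are what an idealized model would predict. The following is a supporting sketch, not part of the authors' derivation. Assume that burst durations are negligible and that the core OFF times $X_k$ are independent and exponential with mean $m_c$. The OFF time $T_1$ seen by stream 1 is then the sum of a geometric number $N$ of core OFF times, with $P(N = k) = (1-p_1)^{k-1} p_1$ for $k \ge 1$. Its Laplace transform is:

```latex
\begin{aligned}
\mathbb{E}\left[e^{-sX_k}\right] &= \frac{1}{1 + m_c s}, \\
\mathbb{E}\left[e^{-sT_1}\right]
  &= \sum_{k=1}^{\infty} (1-p_1)^{k-1} p_1 \left(\frac{1}{1 + m_c s}\right)^{k}
   = \frac{p_1}{(1 + m_c s) - (1 - p_1)} \\
  &= \frac{p_1}{p_1 + m_c s}
   = \frac{1}{1 + \dfrac{m_c}{p_1}\, s}.
\end{aligned}
```

This is the transform of an exponential distribution with mean $m_c/p_1$. In this idealized setting, the demultiplexed stream is therefore again exponential. Its mean OFF time is directly proportional to $m_c$ and inversely proportional to $p_1$, and the same argument with $p_2$ applies to stream 2.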
[Fig. 4: Variation of $m_{t1}$ w.r.t. $p_1$ (x-axis: probability $p_1$; y-axis: average OFF time $m_{t1}$)]

As the probability approaches unity, the point-to-multipoint case reduces to a point-to-point case, and $m_{t1}$ approaches $m_c$. Meanwhile, the OFF time for the other destination becomes very large. Combining Equations (2a), (2b), (3a) and (3b) and curve fitting both curves gives the following empirical relationships for the average OFF time of each stream:

$$ m_{t1} = \frac{m_c + \dfrac{1}{p_2} + c_1}{p_1} + c_2 \tag{4} $$
$$ m_{t2} = \frac{m_c + \dfrac{1}{p_1} + c_3}{p_2} + c_4 \tag{5} $$

where $c_1$, $c_2$, $c_3$, $c_4$ are constants. In our case, curve fitting gave $c_1 = c_3 = 6$ and $c_2 = c_4 = -6$.

Equations (4) and (5) are verified in two steps, by calculating the average OFF time of each stream in two ways:
1) $m_{t1}$ and $m_{t2}$ are calculated from the demultiplexed traces obtained with different values of $p_1$, $p_2$ and $m_c$. These values are referred to as 'calculated' or actual OFF times from the trace.
2) For the same values of $p_1$, $p_2$ and $m_c$, $m_{t1}$ and $m_{t2}$ are calculated using Equations (4) and (5). These values are referred to as 'analytical' OFF times.

Figures 5 and 6 plot the analytical and actual values on the same axes to verify Equations (4) and (5). Results are shown for several values of $m_c$, so that the model is checked over a range of core OFF times. The X-axis is the probability of traffic generation and transmission for each stream. The Y-axis is the OFF time of the traffic generated for that stream. As Figures 5 and 6 show, the analytical formula estimates the actual OFF time of the demultiplexed trace very accurately.

[Fig. 5: Analytical v/s actual OFF time of Task 1 for different values of $m_c$ (source OFF time 15, 25, 35; x-axis: probability; y-axis: average OFF time)]

[Fig. 6: Analytical v/s actual OFF time of Task 2 for different values of $m_c$ (source OFF time 15, 25, 35; x-axis: probability; y-axis: average OFF time)]
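Equations (4) and (5) can be evaluated directly with the fitted constants to produce the 'analytical' OFF times used in the verification step. The Python sketch below is illustrative only: the function name `analytical_off_times` and the sample grid of $m_c$ and $p_1$ values are hypothetical choices.

```python
# Fitted constants reported for Equations (4) and (5).
C1 = C3 = 6.0
C2 = C4 = -6.0


def analytical_off_times(mc, p1):
    """Evaluate Eqs. (4) and (5): analytical mean OFF times (mt1, mt2).

    mc is the core mean OFF time and p1 the probability of stream 1
    (p2 = 1 - p1). Both equations divide by p1 and p2, so 0 < p1 < 1.
    """
    if not 0.0 < p1 < 1.0:
        raise ValueError("p1 must lie strictly between 0 and 1")
    p2 = 1.0 - p1
    mt1 = (mc + 1.0 / p2 + C1) / p1 + C2   # Eq. (4)
    mt2 = (mc + 1.0 / p1 + C3) / p2 + C4   # Eq. (5)
    return mt1, mt2


if __name__ == "__main__":
    for mc in (15, 25, 35):
        for p1 in (0.2, 0.5, 0.8):
            mt1, mt2 = analytical_off_times(mc, p1)
            print(f"mc={mc:2d} p1={p1:.1f}: mt1={mt1:6.1f} mt2={mt2:6.1f}")
```

For $m_c = 3$ and $p_1 = 0.5$, both equations give exactly 16. This matches the third row of Table I (input OFF times 16 and 16, $m_c = 3$, $p_1 = p_2 = 0.5$).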
V. NIRGAM

Network-on-chip Interconnect Routing and Application Modelling (NIRGAM) [8] is a discrete-event, cycle-accurate simulator targeted at Network on Chip (NoC) research. NIRGAM is written in SystemC, a C++ class library for hardware modelling. Users can change many options of the NoC simulation at every stage, such as routing algorithm, topology, virtual channels and buffers. The simulation framework reports results in terms of performance metrics such as latency and throughput. Orion [13] has been integrated into NIRGAM, so users can also create and analyse power estimation graphs. NIRGAM additionally supports fault tolerance [14] and QoS [15].

NIRGAM supports 2D mesh and 2D torus topologies. Routing in NIRGAM is done using flits, the units that flow between routers, and NIRGAM supports the wormhole switching mechanism. It currently supports a number of routing algorithms, such as XY, OE, DyAD, source, Q-routing, MaXY and PROM. NIRGAM also offers many options for traffic modelling: it supports traffic patterns such as Hotspot and NED [12], as well as traffic injection models.

Other user-configurable parameters in NIRGAM are the number of virtual channels per physical channel, the buffer size of an input channel and the clock frequency. All of these parameters can be specified in the NIRGAM configuration file before the simulation starts.

VI. IMPLEMENTATION OF PROPOSED MODEL

As discussed in Section IV, $m_{t1}$ and $m_{t2}$ can be calculated from $m_c$, $p_1$ and $p_2$ using Equations (4) and (5). A traffic generator in a simulator, however, should take $m_{t1}$ and $m_{t2}$ as its input parameters. Different values of these average OFF times represent different classes of streams.

To derive $m_c$, $p_1$ and $p_2$ from given $m_{t1}$ and $m_{t2}$, we use Equations (4) and (5), the fact that $p_1 + p_2 = 1$, and the fitted values of $c_1$, $c_2$, $c_3$ and $c_4$. Eliminating $m_c$ between the two equations and substituting $p_2 = 1 - p_1$ gives a cubic equation in $p_1$:

$$ p_1^3 (m_{t1} + m_{t2} + 12) - p_1^2 (m_{t1} + 2 m_{t2} + 18) + p_1 (m_{t2} + 8) - 1 = 0 \tag{6} $$

Equation (6) has three roots. The root between 0 and 1 is selected, since probabilities lie in the range [0, 1]. This root is assigned to $p_1$, $p_2$ is computed as $1 - p_1$, and $m_c$ is then calculated using Equation (4).

In the NIRGAM implementation, the values of $m_{t1}$ and $m_{t2}$ are read from a configuration file. Equation (6) is then solved for $p_1$ using the bisection method [16]. Once $m_c$, $p_1$ and $p_2$ are known, $m_c$ is used to generate bursty traffic. Each time a new burst starts, a random number in the range [0, 1] is generated. If it is less than $p_1$, the first stream is allowed to transmit, i.e. the destination of the current burst is chosen according to the first stream. Otherwise, the destination is chosen according to the second stream.

VII. EXPERIMENTAL RESULTS

We ran the NIRGAM simulator for different values of $m_{t1}$ and $m_{t2}$ on a 4 × 4 mesh topology. The traffic model was attached to core 0, and the two destinations were cores 7 and 10. Traffic was generated for 5000 clock cycles and the simulation was run for 8000 clock cycles, with eight virtual channels.

To verify the traffic model, we compared the input values of $m_{t1}$ and $m_{t2}$ (read from the configuration file as specified by the user) with the values calculated from the demultiplexed trace. Table I shows these values together with the calculated values of $m_c$, $p_1$ and $p_2$. Columns 1 and 2 hold the input values of $m_{t1}$ and $m_{t2}$. The last two columns hold the values calculated from the traces generated by our traffic model. The calculated values closely follow the input values; the largest deviation is in the fifth row (input 10, calculated 12.8).

TABLE I: Calculated vs Input mean OFF time

Input OFF time    Probability     Calculated   Calculated OFF time
Task1   Task2     p1      p2      mc           Task1   Task2
16      25        0.60    0.40    4            15.4    22.2
20      40        0.66    0.34    8            21.3    43.0
16      16        0.50    0.50    3            17.1    18.0
15      20        0.56    0.44    3            15.3    20.6
10      20        0.65    0.35    1            12.8    22.7
30      10        0.26    0.74    1            32.7    10.9

We also ran the simulation for three flit intervals: 2, 4 and 8 clock cycles. Table II shows the results. The mean OFF time calculated from the generated trace does not vary systematically with the flit interval, so the proposed traffic model can be used with different flit intervals.

TABLE II: Calculated vs Input mean OFF time for different Flit Intervals

Input OFF time    Calculated OFF time
                  Flit Interval = 2    Flit Interval = 4    Flit Interval = 8
Task1   Task2     Task1   Task2        Task1   Task2        Task1   Task2
15      20        15.8    20.0         16.2    19.0         15.6    20.2
11      30        11.0    31.4         11.2    29.7         11.4    29.1
8       11        8.7     11.4         8.7     11.5         8.6     11.6
18      18        17.8    18.5         18.5    18.4         18.0    18.2

Finally, the simulation was run with three burst sizes: 4, 8 and 12 packets. Table III shows the results. The mean OFF time calculated from the trace is likewise independent of the burst size of the traffic. Different burst sizes can therefore be used to model different streams/tasks.

TABLE III: Calculated vs Input mean OFF time for different Burst Lengths

Input OFF time    Calculated OFF time
                  Burst size = 4       Burst size = 8       Burst size = 12
Task1   Task2     Task1   Task2        Task1   Task2        Task1   Task2
15      20        14.8    20.2         14.6    19.1         14.5    18.9
11      30        11.4    31.6         11.3    28.6         11.0    30.6
8       11        8.6     11.6         8.3     11.9         8.0     12.0
18      18        18.3    17.2         17.2    18.4         18.4    18.4
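The inversion described in Section VI (solve Equation (6) for $p_1$ by bisection, then obtain $m_c$ from Equation (4)) can be sketched in Python as follows. The function names and the root-selection rule are assumptions, not the NIRGAM code. For some inputs, more than one root of Equation (6) lies in (0, 1): for $m_{t1} = m_{t2} = 16$ the roots are 0.5 and approximately 0.048 and 0.952. This sketch therefore scans [0, 1] for sign changes and refines each bracket by bisection. It then keeps the root that yields a positive $m_c$.

```python
def eq6(p, mt1, mt2):
    """Left-hand side of Eq. (6) for p = p1 (constants c1..c4 substituted)."""
    return (p**3 * (mt1 + mt2 + 12)
            - p**2 * (mt1 + 2 * mt2 + 18)
            + p * (mt2 + 8)
            - 1)


def bisect(f, lo, hi, iters=60):
    """Plain bisection on [lo, hi]; assumes f changes sign on the interval."""
    lo_negative = f(lo) < 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        fmid = f(mid)
        if fmid == 0.0:
            return mid
        if (fmid < 0.0) == lo_negative:
            lo = mid          # same sign as f(lo): root is in the upper half
        else:
            hi = mid          # sign changed: root is in the lower half
    return 0.5 * (lo + hi)


def solve_source_params(mt1, mt2, grid=1000):
    """Hypothetical helper: return (p1, p2, mc) for target OFF times mt1, mt2.

    Brackets every sign change of Eq. (6) on a uniform grid over [0, 1],
    refines each by bisection, and keeps roots for which Eq. (4), rearranged
    as mc = p1 * (mt1 - c2) - 1 / p2 - c1 with c1 = 6, c2 = -6, gives mc > 0.
    """
    def f(p):
        return eq6(p, mt1, mt2)

    admissible = []
    for i in range(grid):
        lo, hi = i / grid, (i + 1) / grid
        if (f(lo) < 0.0) != (f(hi) < 0.0):
            p1 = bisect(f, lo, hi)
            p2 = 1.0 - p1
            mc = p1 * (mt1 + 6.0) - 1.0 / p2 - 6.0
            if mc > 0.0:
                admissible.append((p1, p2, mc))
    if len(admissible) != 1:
        raise ValueError(f"expected one admissible root, found {len(admissible)}")
    return admissible[0]


if __name__ == "__main__":
    for mt1, mt2 in [(16, 25), (20, 40), (16, 16), (30, 10)]:
        p1, p2, mc = solve_source_params(mt1, mt2)
        print(f"mt1={mt1:2d} mt2={mt2:2d} -> p1={p1:.3f} p2={p2:.3f} mc={mc:.2f}")
```

Consider input OFF times 16 and 25 (the first row of Table I). The selected root is $p_1 \approx 0.60$, in line with the table, and $m_c \approx 4.7$.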
VIII. CONCLUSION

This paper presented a traffic model for multicast communication in NoC. The model also covers concurrent tasks mapped to the same core, each task communicating with a different destination. Mapping multiple tasks onto a single NoC core can reduce the size and cost of the NoC chip and make better use of network resources.

To analyse this multicasting/multitasking concept further, we provide a traffic model under the assumption that each task generates bursty traffic. For point-to-multipoint communication, the core can be viewed as generating a single stream with a fixed average OFF time. The bursts of this stream are probabilistically demultiplexed into two streams. The demultiplexing probabilities are calculated from the specified average OFF time of each communication stream.

The traffic model has been implemented and verified on the open source NoC simulator NIRGAM. The multicast traffic model was found to be independent of the inter-flit interval and the burst size. This paper presents a novel model for simultaneous broadcast to two destinations, but the model can be extended to n (n > 2) destinations. In that case, the solution will require a numerical method. Further analysis of the performance of various routing algorithms and topologies under other traffic distributions will be part of our future work.

REFERENCES

[1] L. Carloni, P. Pande, and Y. Xie, "Networks-on-chip in emerging interconnect paradigms: Advantages and challenges," in Networks-on-Chip, 2009. NoCS 2009, May 2009, pp. 93–102.
[2] L. Benini and G. De Micheli, "Networks on chips: A new SoC paradigm," Computer, vol. 35, pp. 70–78, 2002.
[3] W. J. Dally and B. Towles, "Route packets, not wires: on-chip interconnection networks," in DAC '01: Proceedings of the 38th annual Design Automation Conference, 2001, pp. 684–689.
[4] J. Duato, S. Yalamanchili, and L. Ni, Interconnection Networks: An Engineering Approach. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 2002.
[5] M. Ali, M. Welzl, and S. Hellebrand, "A dynamic routing mechanism for network on chip," in NORCHIP Conference, 2005. 23rd, 21-22 2005, pp. 70–73.
[6] L. Tedesco, A. Mello, L. Giacomet, N. Calazans, and F. Moraes, "Application driven traffic modeling for NoCs," in SBCCI '06: Proceedings of the 19th annual symposium on Integrated circuits and systems design, 2006, pp. 62–67.
[7] R. Marculescu and P. Bogdan, "The chip is the network: Toward a science of network-on-chip design," Foundations and Trends in Electronic Design Automation, vol. 2, no. 4, pp. 371–461, 2009.
[8] "NIRGAM," 2009. [Online]. Available:
[9] T. Ahonen, D. A. Sigüenza-Tortosa, H. Bin, and J. Nurmi, "Topology optimization for application-specific networks-on-chip," in SLIP '04: Proceedings of the 2004 international workshop on System level interconnect prediction. New York, NY, USA: ACM, 2004, pp. 53–60.
[10] W. H. Ho and T. M. Pinkston, "A methodology for designing efficient on-chip interconnects on well-behaved communication patterns," in HPCA '03: Proceedings of the 9th International Symposium on High-Performance Computer Architecture. Washington, DC, USA: IEEE Computer Society, 2003, p. 377.
[11] J. Hu and R. Marculescu, "Energy-aware mapping for tile-based NoC architectures under performance constraints," in ASP-DAC '03: Proceedings of the 2003 Asia and South Pacific Design Automation Conference. New York, NY, USA: ACM, 2003, pp. 233–239.
[12] A.-M. Rahmani, I. Kamali, P. Lotfi-Kamran, A. Afzali-Kusha, and S. Safari, "Negative exponential distribution traffic pattern for power/performance analysis of network on chips," in VLSI Design, 2009 22nd International Conference on, 5-9 2009, pp. 157–162.
[13] A. B. Kahng, B. Li, L.-S. Peh, and K. Samadi, "Orion 2.0: A fast and accurate NoC power and area model for early-stage design space exploration," in DATE'09, 2009, pp. 423–428.
[14] C. Grecu, L. Anghel, P. P. Pande, A. Ivanov, and R. Saleh, "Essential fault-tolerance metrics for NoC infrastructures," in IOLTS '07: Proceedings of the 13th IEEE International On-Line Testing Symposium. Washington, DC, USA: IEEE Computer Society, 2007, pp. 37–42.
[15] K. K. Paliwal, J. S. George, N. Rameshan, V. Laxmi, M. S. Gaur, V. Janyani, and R. Narasimhan, "Implementation of QoS aware Q-routing algorithm for network-on-chip," in Communications in Computer and Information Science, 2009.
[16] A. Eiger, K. Sikorski, and F. Stenger, "A bisection method for systems of nonlinear equations," ACM Trans. Math. Softw., vol. 10, no. 4, pp. 367–377, December 1984.