JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY, VOL. 12, NO. 4, DECEMBER 2014, pp. 366–370
Abstract: Network-on-chip (NoC) communication has an increasingly
large impact on system power consumption and performance. Emerging
technologies such as surface wave interconnects are believed to offer
lower transmission latency and power consumption than conventional
wireless NoC. This paper therefore studies how to optimize network
performance and power consumption given the packet-switching fabric
and the traffic pattern of each application. Compared with the
conventional method, which adds wireless transceivers to a wire-linked
NoC by using the genetic algorithm (GA), the proposed maximal
declining sorting algorithm (MDSA) reduces time consumption by 20.4%
to 35.6%. We also evaluate the power consumption and configuration
time to demonstrate the effectiveness of the proposed algorithm.

Index Terms: Maximal declining sorting algorithm,
networks-on-chip, surface wave, network performance.
1. Introduction
A large number of chip multiprocessors (CMPs) have been designed for
different kinds of applications[1], including scientific computing,
big data, and Internet-based services[2]. One of the most important
parts of a CMP is its network-on-chip (NoC)[3], which provides
efficient communication among the processor cores. NoC is a
communication paradigm that has emerged to tackle on-chip challenges
and to satisfy demands for high-performance and economical
interconnect implementations, which largely determine the performance
of a chip.

Manuscript received August 11, 2014; revised November 10, 2014.
This work was supported by the National Natural Science Foundation of
China under Grant No. 61376024 and No. 61306024, the Natural Science
Foundation of Guangdong Province under Grant No. S2013040014366,
and the Basic Research Program of Shenzhen under Grant
No. JCYJ20140417113430642 and No. JCYJ20140901003939020.
Z. Fu, Z.-B. Hu, and C. Gong are with the Department of Computer,
China University of Geosciences (Wuhan), Wuhan 430074, China (e-mail:
zhao.fu.unlv@gmail.com; tgbujm725vic@163.com; edoc1991@gmail.com).
W.-M. Pan is with Guangzhou Institute of Advanced Technology, Chinese
Academy of Sciences, Guangzhou 511400, China (e-mail: wm.pan@giat.ac.cn).
G.-B. Lv is with the Department of Network, China University of
Geosciences (Wuhan), Wuhan 430074, China (Corresponding author
e-mail: gblv@cug.edu.cn).
Digital Object Identifier: 10.3969/j.issn.1674-862X.2014.04.005

NoC in CMP is assumed to use a general-purpose packet-switching
fabric[4], where packets are transmitted through complex router
pipelines in a hop-by-hop manner. Advances in the surface wave
technique[5] make it possible to handle communication that would
otherwise take too many hops. The surface wave technique has been
shown to offer high performance, low latency, and low power
consumption in NoC communication. With this emerging technology, time
and power consumption can be greatly reduced and throughput increased.
Karkar[6] improved the surface wave technique to enhance its
performance when one core communicates with many other cores. Fig. 1
shows the physical implementation of the surface wave technique.
Fig. 1. Surface wave interconnect communication channel with
multiple sub-channels, where the master node transmits through the
shared surface to the slave node(s).
In a mesh network, the packet-switching fabric is the basis for any
communication between two routers. Packets can only travel between
adjacent routers, and each such step is one hop. Communication
between two distant routers therefore takes many hops. The more these
hops can be reduced, the better the NoC performs.
Application Aware Topology Generation for Surface Wave Networks-on-Chip
Zhao Fu, Zheng-Bing Hu, Cheng Gong, Wen-Ming Pan, and Guo-Bin Lv

[Fig. 1 labels: 128-bit flit; sub-channels F0 to Fn; 16-QAM; mixer; combiner; shared surface; TX; RX node.]

In this paper, we propose a maximal declining sorting algorithm
(MDSA) based on the declined hops. Benchmarks are carried out in the
CMP simulator, and the traffic volume of every communication pair is
recorded. The next step is to calculate the hops between the
communication pairs. From these we obtain the product of the hops and
the traffic volume for each pair. Wireless transceivers are then added
based on this product. The larger the product, the more likely the
pair is to be selected, because a wireless link has the strongest
effect there. As an example, Fig. 2 shows a common 3×3 mesh network.
We randomly generate a communication graph, shown in Fig. 3, which
represents an application running on this 3×3 mesh network. We can
then obtain the traffic volume of every communication pair and, as
described above, the hops and the product.
Fig. 2. A 3×3 mesh network (nodes labeled 1,1 to 3,3).
Fig. 3. Communication graph.
The genetic algorithm (GA) is a widely used algorithm that offers one
way to find the communication pairs with a higher traffic volume.
Applying GA and MDSA to the example of Figs. 2 and 3 gives,
respectively, the two possible topologies shown in Fig. 4.
Fig. 4. Possible changed network topologies: (a) topology
mimicked by GA and (b) topology modified by the proposed
MDSA.
The proposed algorithm outperforms the conventional one. Compared
with GA, the time consumption of MDSA is 24.7% less, and the power
consumption is also reduced.
The rest of the paper is arranged as follows. Section 2
introduces related research work. Section 3 describes the
detailed problem. Section 4 presents the proposed MDSA
and Section 5 describes the experiments which show the
proposed algorithm is more effective. Finally, the
conclusions are drawn in Section 6.
2. Related Work
Metal-based NoC offers only limited scalability under relentless
technology scaling, especially for global network communication. It
causes more power consumption and lower throughput than NoC with
wireless links. Imagine two communicating cores that are far apart.
A common NoC in CMP is assumed to use a general-purpose
packet-switching fabric, where packets are transmitted through complex
router pipelines in a hop-by-hop manner. The communication then takes
many hops rather than the single hop of a wireless link. With more and
more cores integrated into a single silicon chip, this situation will
become even worse.
On the other hand, emerging technologies[5] such as the surface wave
technique have been shown to provide low latency and high performance
in network communication. With the surface wave, throughput is much
higher and the average delay is greatly reduced. Further ideas[6] have
been proposed to improve the performance of surface wave in NoC. It
acts as a paradigm to tackle different on-chip challenges and to
satisfy demands for high-performance and economical interconnect
implementations.
To meet the scalability demand, this paper proposes a new hybrid
architecture[1][6] combining metal interconnects and Zenneck surface
wave interconnects (SWIs) in NoC. The proposed architecture reduces
the hop count between communication pairs, which leads to lower
average network latency and higher throughput. Furthermore, SWI can
maintain the quality of the transmitted signal while many wireless
interconnections are active simultaneously. The time consumption can
also be greatly reduced by the proposed hybrid architecture.
3. Problem Formulation
Nowadays, much time and power is spent on network communication[8]
when an application runs on a mobile phone or other electronic
equipment. Speeding up network communication in NoC to meet the
demand of real-time applications brings opportunities as well as
challenges[9]. The target of the proposed algorithm is to mimic the
network topology of a chip and then change the topology of the NoC at
design time based on the algorithm. By running the same application on
the optimized chip, we find that the communication time is greatly
reduced and the performance of the chip is better.
Communication time in NoC is vitally important because it determines
the network performance, which in turn determines the performance of
the chip. Two key factors affect the communication time[10] in NoC:
the traffic volume between two communicating units and the number of
hops from the source unit to the destination unit. The communication
time of running an application on a chip can be expressed as
$$
f(s, d) = \sum_{s,\, d \,\in\, \text{NoC set}} T(s, d)\, \mathrm{hop}(s, d)\, t_0 \tag{1}
$$
where s is the coordinate of the source in a rectangular plane
coordinate system of the NoC and d is the coordinate of the
destination. Each has two components, its values on the x axis and
the y axis. T(s, d) stands for the traffic volume between unit s and
unit d, hop(s, d) is the number of hops from s to d, and t0 is the
number of clock cycles needed for one byte of data to finish one hop.
The sum over all communication pairs is the time needed to run the
application. Our target is to minimize f(s, d).
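As a rough illustration of Eq. (1), the sketch below sums traffic volume times hop count times t0 over all communicating pairs of a plain mesh. The function names, the dictionary layout, and the sample traffic volumes are assumptions made for this example only; they are not taken from the simulator used in this paper.

```python
from typing import Dict, Tuple

Coord = Tuple[int, int]  # (x, y) position of a unit in the mesh


def mesh_hops(s: Coord, d: Coord) -> int:
    """Hop count on a plain mesh, assuming minimal (Manhattan) routing."""
    return abs(s[0] - d[0]) + abs(s[1] - d[1])


def communication_time(traffic: Dict[Tuple[Coord, Coord], int], t0: int) -> int:
    """Eq. (1): sum of T(s, d) * hop(s, d) * t0 over all communication pairs."""
    return sum(vol * mesh_hops(s, d) * t0 for (s, d), vol in traffic.items())


# Hypothetical traffic volumes (in bytes) on a 3x3 mesh
traffic = {
    ((1, 1), (3, 3)): 100,  # 4 hops apart
    ((1, 2), (2, 2)): 50,   # 1 hop apart
    ((2, 1), (2, 3)): 20,   # 2 hops apart
}
total_cycles = communication_time(traffic, t0=2)
print(total_cycles)  # 100*4*2 + 50*1*2 + 20*2*2 = 980
```

Because T(s, d) and t0 are fixed once the application is chosen, only the hop term of each summand can be lowered by a change of topology.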
We now analyze the formulation step by step. The traffic volume
between two units in a chip stays the same once the application is
defined, and the time for one byte of data to travel one hop is also
a constant. Only the number of hops between two units can be changed,
by adding wireless transceivers to the source and the destination.
However, owing to chip size and resource constraints, it is impossible
to add transceivers to all units. The number of transceivers added to
a chip, named “sum”, and the number of transceivers added to a single
unit, named “num”, are strictly constrained. The constraints can be
defined as
$$
\text{sum} \le n \tag{2}
$$
$$
\text{num} \le m. \tag{3}
$$
The value of sum also indicates the number of bridges that can be
built in one NoC. If a unit has too many transceivers, the performance
of the chip may drop, and the chip may even power off because of the
high temperature.
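The two constraints can be checked mechanically for any candidate set of wireless links. The sketch below is only one possible reading: it assumes that "sum" is counted in links (bridges) and that each link places one transceiver at each of its two endpoints. The function name and the data layout are hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple

Coord = Tuple[int, int]
Link = Tuple[Coord, Coord]


def satisfies_constraints(links: Iterable[Link], n: int, m: int) -> bool:
    """Check Eqs. (2) and (3) for a candidate set of wireless links."""
    links = list(links)
    if len(links) > n:  # Eq. (2): chip-wide limit, counted in links here
        return False
    per_unit = Counter()
    for s, d in links:  # assumed: one transceiver at each endpoint
        per_unit[s] += 1
        per_unit[d] += 1
    # Eq. (3): no unit may carry more than m transceivers
    return all(count <= m for count in per_unit.values())


candidate = [((1, 1), (3, 3)), ((1, 1), (2, 3))]
print(satisfies_constraints(candidate, n=2, m=2))  # True
print(satisfies_constraints(candidate, n=2, m=1))  # False: unit (1, 1) has two
```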
4. Proposed Algorithm
The architecture of the experimental CMP is as follows. The
mesh-network CMP simulator has 64 cores and 8 memory controllers. As
the caches in a core have different access speeds, we divide each core
into two parts: L1 and L2. L1 is a faster cache close to the central
processing unit (CPU), and L2 is a common cache. When the data cannot
be found in L1 and an L1 miss is returned, the core accesses L2 for
the data. An L2 miss is also possible, in which case the data fetch
instruction accesses the memory controller. The fetch does not stop
until it receives a hit response or a memory controller miss response.
The architecture is shown in Fig. 5.
The traffic volume between two units stays the same once the
application is defined. Whether a unit sends a data packet to others
or receives one from others, we count the traffic volume at both the
source unit and the destination unit. The traffic volumes of the 64
L1s, 64 L2s, and 8 memory controllers can then be obtained. So given a
benchmark, we can quickly acquire the traffic volume of each router by
running it in the CMP simulator. The result is shown in Fig. 6.
Fig. 5. Architecture of NoC.
Fig. 6. Number of parts at different levels of traffic volume when
the ferret benchmark is run on the NoC.
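The per-unit counting described above can be sketched as follows. The unit labels and volumes are invented for illustration, and the function name is hypothetical; a real run would take these records from the simulator trace.

```python
from collections import Counter
from typing import Dict, Tuple


def unit_traffic(pair_traffic: Dict[Tuple[str, str], int]) -> Counter:
    """Credit each pair's volume to both its source and its destination unit."""
    totals = Counter()
    for (src, dst), volume in pair_traffic.items():
        totals[src] += volume
        totals[dst] += volume
    return totals


# Hypothetical records: (source unit, destination unit) -> traffic volume
pair_traffic = {
    ("L1_0", "L2_5"): 120,
    ("L2_5", "MC_1"): 30,
    ("L1_3", "L2_5"): 50,
}
totals = unit_traffic(pair_traffic)
busiest = totals.most_common(1)[0]
print(busiest)  # ('L2_5', 200)
```

Sorting the units by these totals gives the kind of distribution summarized in Fig. 6 and points to the routers that benefit most from a wireless link.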
The traffic volumes of the L1 caches, L2 caches, and memory
controllers can be counted from the preceding experimental result. The
traffic volume between any two of the 136 units in total can then be
counted. The busy routers in the NoC are identified from the traffic
volume of each pair of parts. In Fig. 7, two units with a higher
workload are joined by a darker and wider line, so the busier routers
are easy to identify. For those routers, a wireless link is the best
option, as it reduces the communication time and power consumption. A
new topology of the NoC is generated according to the traffic volume
pattern of the benchmark.
The time needed for one byte of data to transfer between two adjacent
routers is also a constant. So the only way to minimize the network
communication time is to reduce the hops by adding wireless
transceivers to the NoC. Once a pair of transceivers is placed at the
source unit and the destination unit, the number of hops between them
becomes 1.
[Fig. 6 plot: horizontal axis, weighted degree (×10^4), 0.5 to 5.0; vertical axis, rate of cores in this level, 0 to 0.8.]
Fig. 7. Communication workload.
0( , )
1
m km k
m k
 

(4a)
1 ( , )=1
hop( , )
fabs( . . . . ) ( , )=0
m k
s d
s i s j d i d j m k



  
(4b)
where ( , )m k is used to record whether there is a
wireless transceiver between the two units: 1 means yes and
0 means no, and m is the communication pair, k is a set of
communication pairs. And fabs( . . . . )s i s j d i d j   means
the hops between a communication pair, source s and
destination d. s.i is the x coordinate of a source router, s.j is
the y coordinate of a source router. Likewise, d.i and d.j is
the x and y coordinates of destination.
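A minimal sketch of the hop function of Eq. (4b) follows. It assumes that a wireless link serves both directions of a pair and that only a direct link between s and d shortens the path, so links belonging to other pairs are not used as shortcuts. The names are illustrative.

```python
from typing import Set, Tuple

Coord = Tuple[int, int]
Pair = Tuple[Coord, Coord]


def hop(s: Coord, d: Coord, wireless: Set[Pair]) -> int:
    """Eq. (4b): one hop over a wireless link, mesh distance otherwise."""
    if (s, d) in wireless or (d, s) in wireless:  # delta(m, k) = 1
        return 1
    return abs(s[0] - d[0]) + abs(s[1] - d[1])    # delta(m, k) = 0


links = {((1, 1), (3, 3))}
print(hop((1, 1), (3, 3), set()))   # 4 hops over the wired mesh
print(hop((1, 1), (3, 3), links))   # 1 hop over the wireless link
```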
The proposed algorithm is based on the idea of maximally reducing the
communication time by placing a pair of wireless transceivers between
two units. The time saved can be described as
$$
\mathrm{time}(s, d) = T(s, d)\,(\mathrm{hop}(s, d) - 1)\, t_0. \tag{5}
$$
In this formulation, (s, d) is a pair of source and destination,
time(s, d) records the reduced time, and hop(s, d) − 1 is the number
of hops saved if there is a pair of wireless transceivers between
them.
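A short worked example may make the saving concrete. The numbers are assumed for illustration only: a pair that exchanges 100 bytes, sits 4 hops apart on the mesh, and needs 2 clock cycles per byte per hop.

```latex
% Assumed values: T(s,d) = 100 bytes, hop(s,d) = 4, t_0 = 2 cycles
\[
\begin{aligned}
\text{wired cost}    &= T(s,d)\,\mathrm{hop}(s,d)\,t_0 = 100 \times 4 \times 2 = 800 \text{ cycles},\\
\text{wireless cost} &= T(s,d) \times 1 \times t_0 = 100 \times 1 \times 2 = 200 \text{ cycles},\\
\mathrm{time}(s,d)   &= T(s,d)\,\bigl(\mathrm{hop}(s,d) - 1\bigr)\,t_0 = 100 \times 3 \times 2 = 600 \text{ cycles}.
\end{aligned}
\]
```

The saving is the difference between the wired and wireless costs, and it is zero for adjacent units, where hop(s, d) is already 1.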
Then we sort the two-dimensional array in descending order:
$$
t(s, d) = \mathrm{sort}(\mathrm{time}(s, d)) \tag{6}
$$
where sort() is the ranking function.
Finally, we select the top n source-destination pairs under the
constraint that each router can be linked with at most m transceivers.
The selected pairs are stored in a set, which is the solution
indicating where adding transceivers is most effective:
$$
\mathrm{result}(s, d) = \mathrm{select}(t(s, d)). \tag{7}
$$
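Steps (5) to (7) can be put together as a greedy selection. The following is a sketch under stated assumptions rather than the implementation used in the experiments: hops are taken as the Manhattan distance on a mesh, each selected link uses one transceiver at each endpoint, and pairs with no saving are skipped. The function name `mdsa` and the sample data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

Coord = Tuple[int, int]
Pair = Tuple[Coord, Coord]


def mdsa(traffic: Dict[Pair, int], t0: int, n: int, m: int) -> List[Pair]:
    """Pick up to n wireless links, at most m per unit, by largest time saved."""

    def saved(pair: Pair) -> int:
        (sx, sy), (dx, dy) = pair
        hops = abs(sx - dx) + abs(sy - dy)
        return traffic[pair] * (hops - 1) * t0       # Eq. (5)

    ranked = sorted(traffic, key=saved, reverse=True)  # Eq. (6)

    chosen: List[Pair] = []
    used = Counter()                                   # transceivers per unit
    for s, d in ranked:                                # Eq. (7)
        if len(chosen) == n or saved((s, d)) <= 0:
            break  # budget spent, or no remaining pair gains anything
        if used[s] < m and used[d] < m:
            chosen.append((s, d))
            used[s] += 1
            used[d] += 1
    return chosen


# Hypothetical traffic volumes on a 3x3 mesh
traffic = {
    ((1, 1), (3, 3)): 100,  # saves 100 * 3 = 300
    ((1, 1), (3, 2)): 90,   # saves  90 * 2 = 180
    ((1, 2), (2, 2)): 500,  # adjacent units: saves nothing
    ((2, 1), (2, 3)): 60,   # saves  60 * 1 = 60
}
links = mdsa(traffic, t0=1, n=2, m=1)
print(links)  # [((1, 1), (3, 3)), ((2, 1), (2, 3))]
```

With m = 1 the second-ranked pair is rejected because unit (1, 1) already holds a transceiver, so the third-ranked pair is taken instead. The heavily loaded but adjacent pair is never selected, which is why the ranking uses the product of traffic and saved hops rather than traffic alone.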
The conventional algorithm to mimic the topology is as follows. Given
a mesh network, wireless transceivers are added to sources and
destinations based on GA without any variation, where the son and its
parent belong to the same units set. The metal-based mesh NoC
architecture can then be modified with surface wave links according to
the result.
Compared with the proposed algorithm, the NoC derived from the result
of the conventional algorithm may be only a sub-optimal solution. The
result is helpful but not the best: it may speed up network
communication in some cases, but in most cases it brings little
benefit. The proposed algorithm instead derives the topology from the
principle of maximal declining, which directly targets the effect of
wireless transceivers. This gives a well-defined way to optimize the
topology, after which the performance of the chip is better.
5. Implementation
We consider two identical chips. The NoC in each chip has 64 cores
and 8 memory controllers. We modify the NoC topology of the two chips
with two different algorithms: the conventional GA without any
variation and the proposed MDSA, respectively. As mentioned before,
wireless transceivers are added to the communication pairs. The
simulator parameters used when running the applications are shown in
Table 1.
Eleven benchmarks are then run on the two new chips. The detailed
information of each benchmark is shown in Table 2.
The experimental results are presented in Table 3. The first column
is the name of the benchmark, the second is the number of data packets
in the network communication, and the third is the number of
communication packets. The MDSA and GA columns give the clock cycles
needed to execute the benchmark in the simulator with the topology
produced by MDSA and GA, respectively.
Table 1: Parameters used in the simulation
Items                       Parameters
Number of processors        64 (MIPS ISA 32 compatible)
Fetch/decode/commit size    4/4/4
ROB size                    64
L1 D cache (private)        16 KB, 2-way, 32 B line, 2 cycles, 2 ports, dual tags
L1 I cache (private)        32 KB, 2-way, 64 B line, 2 cycles, 2 ports
L2 I cache (private)        64 KB slice, 64 B line, 6 cycles, 2 ports
Main memory size            2 GB
Tech size                   45 nm
On chip network parameters
NoC flit size               128-bit
Data packet size            8 flits
NoC latency                 router 2 cycles, link 1 cycle
NoC VC number               4
NoC buffer                  5×8 flits
Table 2: Benchmarks
Type                            Information
BFS (different input graphs)    graphs with 500/1000/2000/3000 vertexes
PARSEC                          blackscholes, fluidanimate, freqmine, canneal, dedup, streamcluster, swaptions
SPLASH-2                        barnes, raytrace
Table 3: Experimental results
Benchmark       Data_packet  Meta_packet  MDSA (cycles)  GA (cycles)
Barnes                  907         2106        1425347      1910653
Blackscholes             50           81          51461        71873
Canneal                 998         2502         478897       603906
Dedup                    97          132         922683      1159004
Ferret                  110          158         165494       228899
Fluidanimate           1356         1571         100642       128698
Freqmine                 20           10         465013       671014
Raytrace                517         1161         487533       707595
Streamcluster           388          422         312091       459634
Swaptions                76           38         613663       952893
Vips                    207          144        1257614      1763835
The same eleven benchmarks are executed in the same simulator, and
only the NoC topologies differ. Table 3 shows that the time
consumption of each benchmark is reduced by 20.4% to 35.6% with MDSA
relative to GA. As the time spent on network communication is
effectively reduced, the application runs faster and more reliably.
The power consumption of the chip can then also decline, so the chip
is better able to meet the demand of real-time applications.
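The reported range can be recomputed directly from the clock cycles in Table 3. The short script below does so; the reduction is taken as (GA − MDSA) / GA, which is an assumption about how the percentages were derived, although it reproduces both endpoints of the range.

```python
# Clock cycles from Table 3: benchmark -> (MDSA, GA)
cycles = {
    "Barnes": (1425347, 1910653),
    "Blackscholes": (51461, 71873),
    "Canneal": (478897, 603906),
    "Dedup": (922683, 1159004),
    "Ferret": (165494, 228899),
    "Fluidanimate": (100642, 128698),
    "Freqmine": (465013, 671014),
    "Raytrace": (487533, 707595),
    "Streamcluster": (312091, 459634),
    "Swaptions": (613663, 952893),
    "Vips": (1257614, 1763835),
}

# Relative reduction of MDSA against GA, in percent
reduction = {name: 100.0 * (ga - mdsa) / ga for name, (mdsa, ga) in cycles.items()}

lowest = min(reduction, key=reduction.get)
highest = max(reduction, key=reduction.get)
print(f"{lowest}: {reduction[lowest]:.1f}%")    # Dedup: 20.4%
print(f"{highest}: {reduction[highest]:.1f}%")  # Swaptions: 35.6%
```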
6. Conclusions
This paper proposes the MDSA to optimize network performance. The
experimental results show that the proposed MDSA is more effective
than the conventional GA without any variation. With MDSA, the time
consumption is reduced by 20.4% to 35.6%, and the power consumption
declines accordingly. As more and more cores are integrated into a
chip, the time for network communication in NoC will grow longer, and
the proposed algorithm will play an increasingly important role in
handling network communication in NoC.
References
[1] A. Ganguly, S. Deb, and B. Belzer, “Scalable hybrid wireless network-on-chip architectures for multicore systems,” IEEE Trans. on Computers, vol. 60, no. 10, pp. 1485–1502, 2011.
[2] L. Pejman, “Scale-out processors,” in Proc. of the 39th Annual Int. Symposium on Computer Architecture, New York, 2012, pp. 500–511.
[3] A. Ghiribaldi, D. Ludovici, F. Trivino, A. Strano, J. Flich, J.-L. Sánchez, F. Alfaro, M. Favalli, and D. Bertozzi, “A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs,” ACM Trans. on Embedded Computing Systems, vol. 12, no. 4, pp. 106:1–29, 2013.
[4] H. Bokhari, “DarkNoC: designing energy efficient network-on-chip with multi-VT cells for dark silicon,” in Proc. of Design Automation Conf. 2014, San Francisco, 2014, pp. 354–360.
[5] A. Karkar, J. Turner, K. Tong, T. Mak, A. Yakovlev, and X. Fei, “Novel hybrid wire-surface wave interconnects for next generation networks-on-chip,” IET Computers & Digital Techniques, vol. 7, no. 6, pp. 294–303, 2013.
[6] A. Karkar, “Hybrid wire-surface wave interconnects for one-to-many traffic handling in network-on-chip,” in Proc. of Design, Automation and Test in Europe Conf. and Exhibition, Dresden, 2014, pp. 1–4.
[7] S. Lee, “A scalable micro wireless interconnect structure for CMPs,” in Proc. of the 15th Annual Int. Conf. on Mobile Computing and Networking, Beijing, 2009, pp. C.1.2–C.2.1.
[8] D. Ditomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, “Energy-efficient adaptive wireless NoCs architecture,” in Proc. of the 7th IEEE/ACM Int. Symposium on Networks on Chip, Tempe, 2013, pp. 1–8.
[9] Y.-H. Jin and T. M. Pinkston, “PAIS: parallelism-aware interconnect scheduling in multicores,” ACM Trans. on Embedded Computing Systems, vol. 13, no. 3, pp. 108:1–21, 2014.
[10] A. E. Kiasari, Z. Hi Lu, and A. Jantsch, “An analytical latency model for networks-on-chip,” IEEE Trans. on Very Large Scale Integration Systems, vol. 21, no. 1, pp. 113–123, 2013.
Zhao Fu was born in Hubei, China in 1990. He
received the B.S. degree from the China
University of Geosciences (Wuhan) (CUG) in
2013. He is currently pursuing the M.S. degree
in computer science with the same university.
His research interests include networks-on-chip,
computer architecture, and parallel computing.
Zheng-Bin Hu was born in Henan, China in 1991. He received
the B.S. degree from CUG, Wuhan in 2013. He is currently
pursuing the M.S. degree with the Department of Computer
Technology, CUG. His research interests include networks-on-chip,
computer architecture, and parallel computing.
Cheng Gong was born in Hubei, China in 1989. He received the
B.S. degree from CUG, Wuhan in 2013. He is currently pursuing
the M.S. degree with the Department of Computer Technology,
CUG. His research interests include evolutionary algorithms,
artificial neural networks, networks-on-chip, and chip
multiprocessors.
Wen-Ming Pan was born in Guangdong,
China in 1982. He received the B.S. and M.S.
degrees in electronic engineering from Jinan
University, Guangzhou in 2004 and 2007,
respectively. He works as a research
assistant with Guangzhou Institute of
Advanced Technology, Chinese Academy of
Sciences. He has been engaged in the FPGA
design research for years. His research interests include
networks-on-chip and parallel computing.
Guo-Bin Lv was born in Hubei, China in 1965. He received the
B.S. degree from Xi’an Jiao Tong University (XJTU), Xi’an in
1987, the M.S. degree from CUG, Wuhan in 1996, and the Ph.D.
degree from CUG, Wuhan in 2012. He is currently working as a
professor with the Department of Network, CUG. His research
interests include high performance computing and computer
network.
Zheng-Bin Hu’s, Cheng Gong’s, and Guo-Bin Lv’s photographs
are not available at the time of publication.

Analysis of Link State Resource Reservation Protocol for Congestion Managemen...ijgca
 
Xevgenis_Michael_CI7110_ Data_Communications.DOC
Xevgenis_Michael_CI7110_ Data_Communications.DOCXevgenis_Michael_CI7110_ Data_Communications.DOC
Xevgenis_Michael_CI7110_ Data_Communications.DOCMichael Xevgenis
 
M.Phil Computer Science Networking Projects
M.Phil Computer Science Networking ProjectsM.Phil Computer Science Networking Projects
M.Phil Computer Science Networking ProjectsVijay Karan
 
M phil-computer-science-networking-projects
M phil-computer-science-networking-projectsM phil-computer-science-networking-projects
M phil-computer-science-networking-projectsVijay Karan
 
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...IJCNCJournal
 
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...IJCNCJournal
 
Performing Network Simulators of TCP with E2E Network Model over UMTS Networks
Performing Network Simulators of TCP with E2E Network Model over UMTS NetworksPerforming Network Simulators of TCP with E2E Network Model over UMTS Networks
Performing Network Simulators of TCP with E2E Network Model over UMTS NetworksAM Publications,India
 
Noise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc ModelNoise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc ModelIJMER
 
Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...
Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...
Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...IJECEIAES
 
M.Phil Computer Science Mobile Computing Projects
M.Phil Computer Science Mobile Computing ProjectsM.Phil Computer Science Mobile Computing Projects
M.Phil Computer Science Mobile Computing ProjectsVijay Karan
 
VEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETs
VEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETsVEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETs
VEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETsCSCJournals
 
A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...
A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...
A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...csandit
 
A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...
A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...
A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...cscpconf
 

Similar to Application Aware Topology Generation for Surface Wave Networks-on-Chip (20)

Nocs performance improvement using parallel transmission through wireless links
Nocs performance improvement using parallel transmission through wireless linksNocs performance improvement using parallel transmission through wireless links
Nocs performance improvement using parallel transmission through wireless links
 
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTIONIEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
IEEE BE-BTECH NS2 PROJECT@ DREAMWEB TECHNO SOLUTION
 
Improving the network lifetime of mane ts through cooperative mac protocol de...
Improving the network lifetime of mane ts through cooperative mac protocol de...Improving the network lifetime of mane ts through cooperative mac protocol de...
Improving the network lifetime of mane ts through cooperative mac protocol de...
 
Design of an Efficient Communication Protocol for 3d Interconnection Network
Design of an Efficient Communication Protocol for 3d Interconnection NetworkDesign of an Efficient Communication Protocol for 3d Interconnection Network
Design of an Efficient Communication Protocol for 3d Interconnection Network
 
ANALYSIS OF LINK STATE RESOURCE RESERVATION PROTOCOL FOR CONGESTION MANAGEMEN...
ANALYSIS OF LINK STATE RESOURCE RESERVATION PROTOCOL FOR CONGESTION MANAGEMEN...ANALYSIS OF LINK STATE RESOURCE RESERVATION PROTOCOL FOR CONGESTION MANAGEMEN...
ANALYSIS OF LINK STATE RESOURCE RESERVATION PROTOCOL FOR CONGESTION MANAGEMEN...
 
Analysis of Link State Resource Reservation Protocol for Congestion Managemen...
Analysis of Link State Resource Reservation Protocol for Congestion Managemen...Analysis of Link State Resource Reservation Protocol for Congestion Managemen...
Analysis of Link State Resource Reservation Protocol for Congestion Managemen...
 
Xevgenis_Michael_CI7110_ Data_Communications.DOC
Xevgenis_Michael_CI7110_ Data_Communications.DOCXevgenis_Michael_CI7110_ Data_Communications.DOC
Xevgenis_Michael_CI7110_ Data_Communications.DOC
 
40520130101003
4052013010100340520130101003
40520130101003
 
M.Phil Computer Science Networking Projects
M.Phil Computer Science Networking ProjectsM.Phil Computer Science Networking Projects
M.Phil Computer Science Networking Projects
 
M phil-computer-science-networking-projects
M phil-computer-science-networking-projectsM phil-computer-science-networking-projects
M phil-computer-science-networking-projects
 
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
 
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
Optimize the Network Coding Paths to Enhance the Coding Protection in Wireles...
 
Performing Network Simulators of TCP with E2E Network Model over UMTS Networks
Performing Network Simulators of TCP with E2E Network Model over UMTS NetworksPerforming Network Simulators of TCP with E2E Network Model over UMTS Networks
Performing Network Simulators of TCP with E2E Network Model over UMTS Networks
 
Noise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc ModelNoise Tolerant and Faster On Chip Communication Using Binoc Model
Noise Tolerant and Faster On Chip Communication Using Binoc Model
 
Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...
Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...
Improvement of crankshaft MAC protocol for wireless sensor networks: a simula...
 
M.Phil Computer Science Mobile Computing Projects
M.Phil Computer Science Mobile Computing ProjectsM.Phil Computer Science Mobile Computing Projects
M.Phil Computer Science Mobile Computing Projects
 
VEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETs
VEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETsVEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETs
VEGAS: Better Performance than other TCP Congestion Control Algorithms on MANETs
 
A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...
A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...
A Cross Layer Based Scalable Channel Slot Re-Utilization Technique for Wirele...
 
A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...
A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...
A CROSS-LAYER BASED SCALABLE CHANNEL SLOT RE-UTILIZATION TECHNIQUE FOR WIRELE...
 
1720 1724
1720 17241720 1724
1720 1724
 

Application Aware Topology Generation for Surface Wave Networks-on-Chip

JOURNAL OF ELECTRONIC SCIENCE AND TECHNOLOGY, VOL. 12, NO. 4, DECEMBER 2014, pp. 366-370

Zhao Fu, Zheng-Bing Hu, Cheng Gong, Wen-Ming Pan, and Guo-Bin Lv

Abstract: Networks-on-chip (NoC) communication has an increasingly large impact on system power consumption and performance. Emerging technologies, such as surface wave, are believed to offer lower transmission latency and power consumption than the conventional wireless NoC. This paper therefore studies how to optimize network performance and power consumption given the packet-switching fabric and the traffic pattern of each application. Compared with the conventional method, which adds wireless transceivers to the wire-linked network by using the genetic algorithm (GA), the proposed maximal declining sorting algorithm (MDSA) reduces time consumption by 20.4% to 35.6%. We also evaluate the power consumption and configuration time to show the effectiveness of the proposed algorithm.

Index Terms: Maximal declining sorting algorithm, networks-on-chip, surface wave, network performance.

Manuscript received August 11, 2014; revised November 10, 2014. This work was supported by the National Natural Science Foundation of China under Grant No. 61376024 and No. 61306024, the Natural Science Foundation of Guangdong Province under Grant No. S2013040014366, and the Basic Research Program of Shenzhen under Grant No. JCYJ20140417113430642 and No. JCYJ20140901003939020.
Z. Fu, Z.-B. Hu, and C. Gong are with the Department of Computer, China University of Geosciences (Wuhan), Wuhan 430074, China (e-mail: zhao.fu.unlv@gmail.com; tgbujm725vic@163.com; edoc1991@gmail.com).
W.-M. Pan is with Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences, Guangzhou 511400, China (e-mail: wm.pan@giat.ac.cn).
G.-B. Lv is with the Department of Network, China University of Geosciences (Wuhan), Wuhan 430074, China (Corresponding author e-mail: gblv@cug.edu.cn).
Digital Object Identifier: 10.3969/j.issn.1674-862X.2014.04.005

1. Introduction

A large number of chip multiprocessors (CMPs) have been designed for different kinds of applications[1], including scientific computing, big data, and Internet-based services[2]. One of the most important parts of a CMP is its networks-on-chip (NoC)[3], which provides efficient network communication for the processor cores. NoC is a communication paradigm that has emerged to tackle different on-chip challenges and to satisfy different demands for high-performance and economical interconnect implementations, which largely determine the performance of a chip.

NoC in a CMP is assumed to use a general-purpose packet-switching fabric[4], where packets are transmitted through complex router pipelines in a hop-by-hop manner. Advances in the surface wave technique[5] open a door to handling the problem of too many hops in network communication. The surface wave technique has been shown to deliver high performance, low latency, and low power consumption in NoC communication. With this emerging technology, time and power consumption can be greatly reduced and the throughput increased. Karkar[6] improved the surface wave technique to enhance its performance when one core communicates with many other cores. Fig. 1 shows the physical implementation of the surface wave technique.

Fig. 1. Surface wave interconnect communication channel with multiple sub-channels, where the master node transmits through the shared surface to the slave node(s).

In a mesh network, the packet-switching fabric is the basis for any communication between two routers, and packets can only travel between adjacent routers. Each link that a packet traverses between two adjacent routers is a hop, so two communicating routers that are far apart are separated by many hops. The more these hops can be reduced, the better the NoC performs.

In this paper, we propose a maximal declining sorting algorithm (MDSA) based on the reduction in hops. Benchmarks are run in the CMP simulator, and the traffic volume of every communication pair is recorded. The next step is to calculate the hops between the communication pairs, which gives the product of the hops and the traffic volume for each pair. Wireless transceivers are added according to this product: the larger the product, the more likely the pair is to be selected, because a wireless link has a stronger effect there than anywhere else.

As an example, Fig. 2 shows a common 3×3 mesh network. We randomly generate the communication graph in Fig. 3, which corresponds to running an application on this 3×3 mesh network. From it we obtain the traffic volume of every communication pair and, as described above, the hops and their product.

Fig. 2. Mesh network of 3×3.

Fig. 3. Communication graph.

The genetic algorithm (GA) is a widely used algorithm that can also be used to find the communication pairs with a higher traffic volume. Applying GA and MDSA to the network in Fig. 3 gives one possible topology for each algorithm, as shown in Fig. 4.

Fig. 4. Possible changed network topologies: (a) topology mimicked by GA and (b) topology modified by the proposed MDSA.

The proposed algorithm proves more effective than the conventional one: compared with GA, the time consumption of MDSA is 24.7% lower, and the power consumption is also reduced.

The rest of the paper is arranged as follows. Section 2 introduces related research work. Section 3 describes the problem in detail. Section 4 presents the proposed MDSA, and Section 5 describes the experiments, which show that the proposed algorithm is more effective. Finally, conclusions are drawn in Section 6.

2. Related Work

Metal-based NoC offers only limited scalability under relentless technology scaling, especially for global network communication. It consumes more power and delivers lower throughput than a NoC with wireless links.
Imagine two communicating cores that are far apart on the chip. A common NoC in a CMP is assumed to use a general-purpose packet-switching fabric, where packets are transmitted through complex router pipelines in a hop-by-hop manner. Communication between the two cores then takes many hops rather than the single hop of a wireless link. As more and more cores are integrated into a single silicon chip, this situation will become even worse.

On the other hand, emerging technologies[5], such as the surface wave technique, have been shown to provide low latency and high performance in network communication. With the surface wave, the throughput is much higher and the average delay is greatly reduced. Further ideas[6] have been proposed to improve the performance of the surface wave in NoC. It acts as a paradigm to tackle different on-chip challenges and to satisfy different demands for high-performance and economical interconnect implementations.

To meet the scalability demand, this paper proposes a new hybrid architecture[1][6] combining the metal interconnect and Zenneck surface wave interconnects (SWIs) in NoC. The proposed architecture reduces the hop count between communication pairs, which leads to lower average network latency and higher throughput. Furthermore, SWI can preserve the quality of the transmitted signal while many wireless interconnections are active simultaneously. The time consumption can also be greatly reduced by the proposed hybrid architecture.

3. Problem Formulation

Nowadays, much time and power is spent on network communication[8] when an application runs on a mobile phone or other electronic equipment. Speeding up network communication in NoC to meet the demand of real-time applications brings opportunities as well as challenges[9].
The target of the proposed algorithm is to mimic the network topology of a chip and then change the topology of the NoC at design time based on the algorithm. Running the same application on the optimized chip, we find that the communication time is greatly reduced and the performance of the chip is better. Communication time in NoC is vitally important because it determines the network performance, which in turn determines the quality of the chip.

Two key factors affect the communication time[10] in NoC: the traffic volume between two communicating units and the hops from the source unit to the destination unit. The communication time of running an application on a chip can be expressed as

$$f(s,d)=\sum_{s,d\,\in\,\text{NoC set}} T(s,d)\,\mathrm{hop}(s,d)\,t_0 \tag{1}$$

where s is the coordinate of the source in a rectangular plane coordinate system of the NoC and d is the coordinate of the destination. Each has two components, the values on the x axis and the y axis. T(s, d) stands for the traffic volume between unit s and unit d, hop(s, d) is the number of hops from s to d, and $t_0$ is the number of clock cycles needed for one byte of data to finish one hop. The sum over all pairs is the time needed to run the application. Our target is to minimize f(s, d).

We can analyze the formulation term by term. The traffic volume between two units in a chip stays the same once the application is defined, and the time for one byte of data to travel one hop is also a constant. Only the hops between two units can be changed, by adding wireless transceivers to the source and destination. However, owing to chip size and resource constraints, it is impossible to add transceivers to all units. The number of transceivers that can be added to a chip, named "sum", and the number of transceivers that can be added to a unit, named "num", are strictly constrained. The constraints can be defined as

$$\mathrm{sum}\le n \tag{2}$$

$$\mathrm{num}\le m. \tag{3}$$

The sum also indicates the number of bridges that can be built in one NoC. If there are too many transceivers in a unit, the performance of the chip may drop, and the chip may even power off because of the high temperature.

4. Proposed Algorithm

The architecture of the experimental CMP is as follows. The mesh-network CMP simulator has 64 cores and also contains 8 memory controllers.
As the caches in a core have different access speeds, we divide the core into two parts: L1 and L2. L1 stands for the faster cache close to the central processing unit (CPU), and L2 is a common cache. When the data cannot be fetched from L1 and an L1 miss response is returned, the core accesses L2 for the data. An L2 miss can also occur, in which case the data fetch instruction accesses the memory controller. The fetch does not stop until it gets a hit response or a memory controller miss response. Fig. 5 illustrates this architecture.

Fig. 5. Architecture of NoC.

Recall that the traffic volume between two units stays the same once the application is defined. Whether a unit sends a data packet to others or receives one from others, we count the traffic volume for both the source unit and the destination unit. The traffic volumes of the 64 L1s, 64 L2s, and 8 memory controllers can then be obtained. So, given a benchmark, we can quickly acquire the traffic volume of each router by running the benchmark in the CMP simulator, as shown in Fig. 6.

Fig. 6. Number of parts in different levels of traffic volume when we run the ferret benchmark in NoC.

The traffic volumes of the L1 caches, L2 caches, and memory controllers are counted from the experimental result above, and so is the traffic volume between any two of the 136 units overall. The busy routers in the NoC can be identified from the traffic volume of each pair of parts. In Fig. 7, two units with a higher workload are joined by a darker and wider line, so the busier routers are easily identified. For those routers, a wireless link is the best option, as it reduces the communication time and power consumption. A new topology of the NoC is generated depending on the traffic volume patterns of the benchmark.
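The counting step described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the unit names (`L1_0`, `L2_5`, `MC_1`) and the traffic values are hypothetical, and the simulator's actual trace format is not given in the paper.

```python
from collections import defaultdict


def weighted_degree(traffic):
    """Return the total traffic volume seen by each unit.

    `traffic` maps a (source, destination) pair to its traffic volume.
    A packet is counted for both its source and its destination unit.
    """
    degree = defaultdict(int)
    for (src, dst), volume in traffic.items():
        degree[src] += volume  # traffic sent by the source unit
        degree[dst] += volume  # traffic received by the destination unit
    return dict(degree)


def busiest_units(traffic, top=3):
    """Return the `top` units with the highest traffic volume."""
    degree = weighted_degree(traffic)
    # Sort by descending volume; break ties by unit name for determinism.
    return sorted(degree, key=lambda unit: (-degree[unit], unit))[:top]


# Hypothetical trace: (source, destination) -> traffic volume.
traffic = {
    ("L1_0", "L2_5"): 120,
    ("L1_3", "L2_5"): 80,
    ("L2_5", "MC_1"): 40,
    ("L1_0", "L2_9"): 10,
}

if __name__ == "__main__":
    print(weighted_degree(traffic))
    print(busiest_units(traffic, top=2))
```

Units that rank high in such a count are the natural candidates for wireless links, although the selection itself also depends on the hop count between the two ends of each pair.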
The time needed for one byte of data to travel between two adjacent routers is also a constant. The only way to minimize the network communication time is therefore to reduce the hops by adding wireless transceivers to the NoC. Once a pair of transceivers is placed at the source unit and the destination unit, the hop count between them becomes 1.

[Fig. 6 axes: weighted degree (×10^4) on the horizontal axis, rate of cores in this level on the vertical axis.]
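A small worked example may help to see how a single wireless link changes the communication time in (1). The sketch below assumes that mesh hops are counted as the Manhattan distance between router coordinates, that a wireless link serves both directions of a pair, and that only the two end points of a link benefit from it. The traffic values are made up for illustration.

```python
def mesh_hops(s, d):
    """Hops between routers s and d in a plain mesh (Manhattan distance)."""
    return abs(s[0] - d[0]) + abs(s[1] - d[1])


def hops(s, d, wireless):
    """Hop count when `wireless` holds the pairs joined by transceivers."""
    if frozenset((s, d)) in wireless:
        return 1  # a direct wireless link replaces the whole mesh path
    return mesh_hops(s, d)


def communication_time(traffic, wireless=frozenset(), t0=1):
    """Evaluate Eq. (1): sum of T(s, d) * hop(s, d) * t0 over all pairs."""
    return sum(volume * hops(s, d, wireless) * t0
               for (s, d), volume in traffic.items())


# Hypothetical traffic on a 3x3 mesh: (source, destination) -> volume.
traffic = {
    ((1, 1), (3, 3)): 10,  # opposite corners, 4 hops apart
    ((1, 2), (1, 3)): 5,   # adjacent routers, 1 hop
    ((2, 1), (3, 1)): 2,   # adjacent routers, 1 hop
}

if __name__ == "__main__":
    baseline = communication_time(traffic)
    link = frozenset({frozenset(((1, 1), (3, 3)))})
    improved = communication_time(traffic, wireless=link)
    print(baseline, improved)
```

With these numbers the corner-to-corner pair dominates the total, so placing the one available link there removes most of the cost, while a link between adjacent routers would save nothing.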
Fig. 7. Communication workload.

$$\delta(m,k)=\begin{cases}0, & m\notin k\\ 1, & m\in k\end{cases} \tag{4a}$$

$$\mathrm{hop}(s,d)=\begin{cases}1, & \delta(m,k)=1\\ \mathrm{fabs}(s.i-d.i)+\mathrm{fabs}(s.j-d.j), & \delta(m,k)=0\end{cases} \tag{4b}$$

where $\delta(m, k)$ records whether there is a pair of wireless transceivers between the two units (1 means yes and 0 means no), m is the communication pair, and k is a set of communication pairs. The expression fabs(s.i − d.i) + fabs(s.j − d.j) gives the hops between the source s and the destination d of a communication pair in the mesh: s.i and s.j are the x and y coordinates of the source router, and d.i and d.j are the x and y coordinates of the destination router.

The proposed algorithm is based on maximizing the time saved when a pair of wireless transceivers is placed between two units. The saving can be described as

$$\mathrm{time}(s,d)=T(s,d)\,(\mathrm{hop}(s,d)-1)\,t_0. \tag{5}$$

In this formulation, (s, d) is a source-destination pair, time(s, d) records the reduced time, and hop(s, d) − 1 is the number of hops saved if there is a pair of wireless transceivers between them. We then sort the two-dimensional array in descending order:

$$t(s,d)=\mathrm{sort}(\mathrm{time}(s,d)) \tag{6}$$

where sort() is the ranking function. Finally, we select the top n source-destination pairs under the constraint that each router can be linked with at most m transceivers, and store them in a set, which is the solution indicating where adding transceivers is most effective:

$$\mathrm{result}(s,d)=\mathrm{select}(t(s,d)). \tag{7}$$

The conventional algorithm to mimic the topology is as follows. Given a mesh network, wireless transceivers are added to the source and destination based on GA without any variation, where the child and its parent belong to the same set of units. The metal-based mesh NoC architecture can then be modified with the surface wave based on the result.
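The selection procedure in (5) to (7) can be sketched as a short greedy routine. This is an illustrative reading of the description, not the authors' implementation: it assumes Manhattan hop counts in the mesh, treats n as the number of wireless pairs and m as the per-unit transceiver limit, skips pairs that would save no time, and keeps the original order for ties. The traffic values are hypothetical.

```python
from collections import defaultdict


def mesh_hops(s, d):
    """Hops between routers s and d in a plain mesh (Manhattan distance)."""
    return abs(s[0] - d[0]) + abs(s[1] - d[1])


def mdsa(traffic, n, m, t0=1):
    """Pick at most n pairs for wireless links, at most m links per unit."""
    # Eq. (5): time saved if the pair is joined by a wireless link.
    saved = {pair: volume * (mesh_hops(*pair) - 1) * t0
             for pair, volume in traffic.items()}
    # Eq. (6): rank the pairs by saved time in descending order.
    ranked = sorted(saved, key=lambda pair: saved[pair], reverse=True)
    # Eq. (7): take pairs from the top while respecting both constraints.
    used = defaultdict(int)  # transceivers already placed on each unit
    chosen = []
    for s, d in ranked:
        if len(chosen) == n or saved[(s, d)] <= 0:
            break  # chip budget exhausted, or no remaining pair saves time
        if used[s] < m and used[d] < m:
            chosen.append((s, d))
            used[s] += 1
            used[d] += 1
    return chosen


# Hypothetical traffic on a 3x3 mesh: (source, destination) -> volume.
traffic = {
    ((1, 1), (3, 3)): 10,  # saves 10 * 3 = 30
    ((1, 1), (3, 2)): 9,   # saves 9 * 2 = 18
    ((1, 3), (3, 1)): 4,   # saves 4 * 3 = 12
    ((1, 2), (1, 3)): 50,  # adjacent routers, saves nothing
}

if __name__ == "__main__":
    print(mdsa(traffic, n=2, m=1))
    print(mdsa(traffic, n=2, m=2))
```

With m = 1 the second-ranked pair is skipped because router (1, 1) already carries a transceiver, so the third-ranked pair is chosen instead. The heavily loaded adjacent pair is never selected, since a wireless link cannot shorten a one-hop path.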
Compared with the proposed algorithm, the NoC derived from the result of the conventional algorithm may be a sub-optimal solution. The result is helpful but not the best: it can sometimes speed up the network communication, but most of the time it brings little benefit. In the proposed algorithm, we mimic the topology based on the theory of maximal declining, which directly targets the effect of the wireless transceivers. This gives a well-defined way to optimize the topology, and after the optimization the performance of the chip is better.

5. Implementation

Consider two identical chips. The NoC in each chip has 64 cores and 8 memory controllers. We modify the topology of the NoC in the two chips with two different algorithms: the conventional GA without any variation and the proposed MDSA, respectively. As mentioned before, wireless transceivers are added to the communication pairs. The simulator parameters used while the applications are running are shown in Table 1. The eleven benchmarks are then run on the two new chips. Detailed information on the benchmarks is given in Table 2.

The experimental result is presented in Table 3. The first column is the name of the benchmark, the second (Data_packet) is the number of data packets in the network communication, and the third (Meta_packet) is the number of communication packets. MDSA and GA are the clock cycles needed to execute the benchmarks in the simulator by using MDSA and GA, respectively.
Table 1: Parameters used in the simulation

Items                        Parameters
Number of processors         64 (MIPS ISA 32 compatible)
Fetch/decode/commit size     4/4/4
ROB size                     64
L1 D cache (private)         16 KB, 2-way, 32 B line, 2 cycles, 2 ports, dual tags
L1 I cache (private)         32 KB, 2-way, 64 B line, 2 cycles, 2 ports
L2 I cache (private)         64 KB slice, 64 B line, 6 cycles, 2 ports
Main memory size             2 GB
Tech size                    45 nm
On chip network parameters
NoC flit size                128-bit
Data packet size             8 flits
NoC latency                  router 2 cycles, link 1 cycle
NoC VC number                4
NoC buffer                   5×8 flits

Table 2: Benchmarks

Type                           Information
BFS (different input graphs)   graphs with 500/1000/2000/3000 vertexes
PARSEC                         blackscholes, fluidanimate, freqmine, canneal, dedup, streamcluster, swaptions
SPLASH-2                       barnes, raytrace
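The reduction range quoted for Table 3 can be re-derived from the reported clock cycles. The short script below copies the MDSA and GA columns of Table 3 and computes the relative saving per benchmark; the helper name `reduction_percent` is ours and is not part of the paper.

```python
# Clock cycles reported in Table 3, as (MDSA, GA) per benchmark.
CYCLES = {
    "Barnes": (1425347, 1910653),
    "Blackscholes": (51461, 71873),
    "Canneal": (478897, 603906),
    "Dedup": (922683, 1159004),
    "Ferret": (165494, 228899),
    "Fluidanimate": (100642, 128698),
    "Freqmine": (465013, 671014),
    "Raytrace": (487533, 707595),
    "Streamcluster": (312091, 459634),
    "Swaptions": (613663, 952893),
    "Vips": (1257614, 1763835),
}


def reduction_percent(mdsa_cycles, ga_cycles):
    """Relative time saving of MDSA over GA, in percent."""
    return 100.0 * (ga_cycles - mdsa_cycles) / ga_cycles


reductions = {name: reduction_percent(mdsa_c, ga_c)
              for name, (mdsa_c, ga_c) in CYCLES.items()}

if __name__ == "__main__":
    # Print the benchmarks from the smallest to the largest saving.
    for name, value in sorted(reductions.items(), key=lambda item: item[1]):
        print(f"{name:14s} {value:5.1f}%")
```

Computed this way, the smallest saving is about 20.4% (Dedup) and the largest about 35.6% (Swaptions), which matches the range stated in the text.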
Table 3: Experimental results

Benchmark       Data_packet   Meta_packet   MDSA      GA
Barnes          907           2106          1425347   1910653
Blackscholes    50            81            51461     71873
Canneal         998           2502          478897    603906
Dedup           97            132           922683    1159004
Ferret          110           158           165494    228899
Fluidanimate    1356          1571          100642    128698
Freqmine        20            10            465013    671014
Raytrace        517           1161          487533    707595
Streamcluster   388           422           312091    459634
Swaptions       76            38            613663    952893
Vips            207           144           1257614   1763835

The same eleven benchmarks are executed in the same simulator, and only the topologies of the NoC differ. Table 3 shows that the time consumption of each benchmark is reduced by 20.4% to 35.6%. As the time spent on network communication is effectively reduced, the application runs faster and more reliably, and the power consumption of the chip also declines. The chip is therefore better able to meet the demand of real-time applications.

6. Conclusions

This paper proposes the MDSA to optimize network performance. The experimental results show that the proposed MDSA is more effective than the conventional GA without any variation. By using the MDSA, the time consumption is reduced by 20.4% to 35.6%, and the power consumption declines accordingly. The performance of the chip is also better than that of the previous one. As more and more cores are integrated into a chip, the time for network communication in NoC will grow, and the proposed algorithm will play an increasingly important role in handling network communication in NoC.

References

[1] A. Ganguly, S. Deb, and B. Belzer, “Scalable hybrid wireless network-on-chip architectures for multicore systems,” IEEE Trans. on Computers, vol. 60, no. 10, pp. 1485-1502, 2011.
[2] L. Pejman, “Scale-out processors,” in Proc. of the 39th Annual Int. Symposium on Computer Architecture, New York, 2012, pp. 500-511.
[3] A. Ghiribaldi, D. Ludovici, F. Trivino, A. Strano, J. Flich, J.-L. Sánchez, F. Alfaro, M. Favalli, and D. Bertozzi, “A complete self-testing and self-configuring NoC infrastructure for cost-effective MPSoCs,” ACM Trans. on Embedded Computing Systems, vol. 12, no. 4, pp. 106:1-106:29, 2013.
[4] H. Bokhari, “DarkNoC: designing energy efficient network-on-chip with multi-VT cells for dark silicon,” in Proc. of Design Automation Conf. 2014, San Francisco, 2014, pp. 354-360.
[5] A. Karkar, J. Turner, K. Tong, T. Mak, A. Yakovlev, and X. Fei, “Novel hybrid wire-surface wave interconnects for next generation networks-on-chip,” IET Computers & Digital Techniques, vol. 7, no. 6, pp. 294-303, 2013.
[6] A. Karkar, “Hybrid wire-surface wave interconnects for one-to-many traffic handling in network-on-chip,” in Proc. of Design, Automation and Test in Europe Conf. and Exhibition, Dresden, 2014, pp. 1-4.
[7] S. Lee, “A scalable micro wireless interconnect structure for CMPs,” in Proc. of the 15th Annual Int. Conf. on Mobile Computing and Networking, Beijing, 2009, pp. C.1.2-C.2.1.
[8] D. Ditomaso, A. Kodi, D. Matolak, S. Kaya, S. Laha, and W. Rayess, “Energy-efficient adaptive wireless NoCs architecture,” in Proc. of the 7th IEEE/ACM Int. Symposium on Networks on Chip, Tempe, 2013, pp. 1-8.
[9] Y.-H. Jin and T. M. Pinkston, “PAIS: parallelism-aware interconnect scheduling in multicores,” ACM Trans. on Embedded Computing Systems, vol. 13, no. 3, pp. 108:1-108:21, 2014.
[10] A. E. Kiasari, Z. Hi Lu, and A. Jantsch, “An analytical latency model for networks-on-chip,” IEEE Trans. on Very Large Scale Integration Systems, vol. 21, no. 1, pp. 113-123, 2013.

Zhao Fu was born in Hubei, China in 1990. He received the B.S. degree from the China University of Geosciences (Wuhan) (CUG) in 2013. He is currently pursuing the M.S. degree in computer science with the same university. His research interests include networks-on-chip, computer architecture, and parallel computing.
Zheng-Bin Hu was born in Henan, China in 1991. He received the B.S. degree from CUG, Wuhan in 2013. He is currently pursuing the M.S. degree with the Department of Computer Technology, CUG. His research interests include networks-on-chip, computer architecture, and parallel computing.

Cheng Gong was born in Hubei, China in 1989. He received the B.S. degree from CUG, Wuhan in 2013. He is currently pursuing the M.S. degree with the Department of Computer Technology, CUG. His research interests include evolutionary algorithms, artificial neural networks, networks-on-chip, and chip multiprocessors.

Wen-Ming Pan was born in Guangdong, China in 1982. He received the B.S. and M.S. degrees in electronic engineering from Jinan University, Guangzhou in 2004 and 2007, respectively. He works as a research assistant with Guangzhou Institute of Advanced Technology, Chinese Academy of Sciences. He has been engaged in FPGA design research for years. His research interests include networks-on-chip and parallel computing.

Guo-Bin Lv was born in Hubei, China in 1965. He received the B.S. degree from Xi’an Jiao Tong University (XJTU), Xi’an in 1987, the M.S. degree from CUG, Wuhan in 1996, and the Ph.D. degree from CUG, Wuhan in 2012. He is currently working as a professor with the Department of Network, CUG. His research interests include high performance computing and computer networks.

Zheng-Bin Hu’s, Cheng Gong’s, and Guo-Bin Lv’s photographs are not available at the time of publication.