I studied at the Indian Institute of Technology, Kharagpur, India, where I did my B.Tech and M.Tech in the Department of Electronics and Electrical Communication Engineering. I was a student of the 2018 batch. After that, I joined Schneider Electric Systems India Private Limited as a Software Design Engineer. Currently I am designated as Senior Firmware Engineer in the same company. I have 4+ years of work experience. The uploaded ppt is my MTP thesis. It is about "Temperature-aware application mapping onto mesh-based network-on-chip using Genetic Algorithm".
Kailash(13EC35032)_mtp.pptx
1. By
Kailash Chand Meena
(13EC35032)
under the supervision of
Prof. Santanu Chattopadhyay
Department of Electronics and Electrical Communication
Engineering
IIT Kharagpur
2. 1. Introduction:
• Application mapping is one of the most important dimensions of Network-on-Chip (NoC) research. It affects the overall performance and power requirement of the system.
• Rapid progress in technology scaling makes transistors smaller and faster over successive generations, so the number of IP cores in a system increases, but the power consumption of a transistor no longer scales in proportion.
• The increasing number of IP cores in a multi-processor system-on-chip makes it more challenging to find the optimum core-to-router mapping.
• A significant proportion of the power consumed is dissipated directly as heat. An increase in power density can in turn lead to several other problems.
• Application mapping, with its ability to spread out high-power components, is potentially a good approach to mitigating the looming issue of hotspots in many-core processors.
3. Terminology in Application Mapping
• Application: An application consists of a set of tasks, each of which is implemented by an IP core.
• IP cores: The functional modules of an NoC are known as intellectual property (IP) cores.
• Hop count: The distance a message travels through the router fabric from the source router to the destination router is measured in hops.
• Core graph: An application can be represented as a core graph, with each vertex representing an IP core and each directed edge representing the communication between two cores. For example, the video application VOPD (video object plane decoder) consists of 16 cores and DVOPD (dual video object plane decoder) consists of 32 cores.
6. Mesh Topology:
• The mesh topology is one of the most common network topologies because it provides a regular structure with short interconnects, a high bisection width, and a modular architecture for the NoC with equal-sized links.
7. 2. What is the Application Mapping Problem?
• The core graph of an application is a directed graph $CG(C, E)$, with each vertex $c_i \in C$ representing a core and each directed edge $e_{i,j} \in E$ representing the communication between the cores $c_i$ and $c_j$. The bandwidth requirement of the communication from $c_i$ to $c_j$ is the weight of the edge $e_{i,j}$ and is denoted by $comm_{i,j}$.
• The NoC topology graph is a directed graph $TG(T, G)$, with each vertex $t_i \in T$ representing a node in the topology and each directed edge $g_{i,j} \in G$ representing a physical link between the vertices $t_i$ and $t_j$. The weight of the edge $g_{i,j}$, denoted $bw_{i,j}$, represents the bandwidth available across that edge.
• A mapping of the core graph $CG(C, E)$ onto the topology graph $TG(T, G)$ is defined by the function $H: CG \rightarrow TG$ such that $\forall c_i \in C$, $\exists t_j \in T$ with $map(c_i) = t_j$.
• The quality of such a mapping is defined in terms of the total communication cost of the application under this mapping. The communication between each pair of cores can be treated as the flow of a single commodity $d_k$, $k = 1, 2, \ldots, |E|$.
• The value of the commodity $d_k$ corresponding to the communication between cores $c_i$ and $c_j$ is equal to $comm_{i,j}$, the bandwidth requirement. The quantity $f_k(i, j)$, the value of commodity $d_k$ flowing through the link $(t_i, t_j)$, is given by:
$$f_k(i, j) = \begin{cases} value(d_k), & \text{if link } (t_i, t_j) \in Path(source(d_k), destination(d_k)) \\ 0, & \text{otherwise} \end{cases}$$
8. Contd.
• To ensure that the traffic does not exceed the bandwidth limits of individual links, the following constraint must be satisfied:
$$\sum_{k=1}^{|E|} f_k(i, j) \le bw_{i,j}, \quad \forall i, j \in \{1, 2, \ldots, |T|\}$$
• The communication cost between the cores $c_i$ and $c_j$ is measured by:
$$Commcost_{i,j} = comm_{i,j} \times MD(map(c_i), map(c_j))$$
where $MD(\cdot, \cdot)$ is the hop count between the two routers.
• The total communication cost of a mapping solution is calculated as:
$$CommCost = \sum_{\forall e_{i,j} \in E} Commcost_{i,j}$$
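The cost definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hop count is taken to be the Manhattan distance on a mesh whose routers are numbered row by row from 0, and the function names, the 4-core graph, and its bandwidth values are hypothetical, not taken from the thesis.

```python
def hop_count(router_a, router_b, mesh_cols):
    """Manhattan hop distance between two routers numbered row by row from 0."""
    row_a, col_a = divmod(router_a, mesh_cols)
    row_b, col_b = divmod(router_b, mesh_cols)
    return abs(row_a - row_b) + abs(col_a - col_b)


def communication_cost(core_graph, mapping, mesh_cols):
    """Sum of bandwidth x hop count over all edges of the core graph.

    core_graph: dict {(src_core, dst_core): bandwidth requirement}
    mapping:    dict {core: router it is mapped to}
    """
    return sum(
        bandwidth * hop_count(mapping[src], mapping[dst], mesh_cols)
        for (src, dst), bandwidth in core_graph.items()
    )


# Hypothetical 4-core application mapped onto a 2 x 2 mesh (routers 0..3).
core_graph = {("c0", "c1"): 70, ("c1", "c2"): 362, ("c2", "c3"): 27}
mapping = {"c0": 0, "c1": 1, "c2": 3, "c3": 2}
total = communication_cost(core_graph, mapping, mesh_cols=2)
print(total)
```

Every communicating pair in this toy mapping sits one hop apart, so the total is simply the sum of the three bandwidths.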
9. 3. Problem Statement:
• Given the properties of the application (in terms of its core graph) and of the NoC architecture (in terms of its topology graph), the association between routers and cores has to be determined such that the weighted communication cost (BW × hop count) of the application and the peak temperature of the chip remain minimum under a given routing mechanism.
• The following are the inputs to the problem:
1. A task graph CG, representing the application.
2. A topology graph TG corresponding to the 2D NoC.
3. Power profile of each core.
4. Power profile of each router and link.
5. Floorplan for the NoC.
• A core together with its corresponding router forms a tile. The tiles are identified by the router's ID, so each tile has an associated power profile, governed by the associated IP core, router and links.
• The above-mentioned problem has been solved using a Genetic Algorithm (GA).
10. 4. Why Genetic Algorithm (GA)?
• GA offers several advantages over other stochastic strategies for optimizing the application mapping problem, such as Simulated Annealing (SA) and Ant Colony Optimization (ACO).
• In GA optimization, multiple solutions co-exist at any stage of the process, whereas SA progresses with only one solution. GA generally produces solutions faster than SA and ACO, which use only a limited population and resources.
• The proposed GA-based approach combines a local search method with a global (guided) search method to balance exploration and exploitation.
• In the GA approach, chromosomes (mapping solutions) do not die, because the local best of a chromosome remains attached to that chromosome and is updated whenever the chromosome identifies a better solution.
• In SA, by contrast, the population moves together in an unguided search and some solutions are filtered out by the selection criteria. Similarly, in ACO, random paths are selected for an ant (solution), and because of that the solution takes time to converge.
11. 5. GA Formulation of the Application Mapping Problem:
5.1. Chromosome structure and initial population generation:
• The length of each chromosome is equal to the number of vertices in the core graph, and the chromosome is encoded as a string of integers.
• Each gene (a vertex in the core graph) is assigned a randomly chosen node of the mesh topology, and no two vertices may share a node.
• A chromosome can be represented efficiently as a 1-D array in which the indices represent the router numbers and the cell values represent the core associated with the corresponding router. Thus, a chromosome is a permutation of the core numbers in the core graph.
Router:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
Core:   16  4  3  2 14  5  6  1 13 12  7  9 15 11  8 10
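A minimal sketch of this encoding, assuming cores are numbered from 1 and that list position i (0-based) stands for router i + 1. The function names and the seed are illustrative choices, not part of the original work.

```python
import random


def random_chromosome(num_cores, rng):
    """Return a random permutation of the core numbers 1..num_cores.

    Position i of the list (0-based) stands for router i + 1.
    """
    chromosome = list(range(1, num_cores + 1))
    rng.shuffle(chromosome)
    return chromosome


def initial_population(population_size, num_cores, seed=None):
    """Build a list of random chromosomes, one per population member."""
    rng = random.Random(seed)
    return [random_chromosome(num_cores, rng) for _ in range(population_size)]


population = initial_population(population_size=4, num_cores=16, seed=1)

# The chromosome shown above: router 5 holds core 14.
example = [16, 4, 3, 2, 14, 5, 6, 1, 13, 12, 7, 9, 15, 11, 8, 10]
core_on_router_5 = example[5 - 1]
```

Because every chromosome is a shuffled copy of 1..num_cores, no core can appear on two routers.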
12. Chromosome structure and corresponding NoC mapping
A chromosome can conveniently be viewed as a 1-D array in which chromosome[i] records the core mapped to the $i^{th}$ router or node.
[Figure: the 16-element chromosome (routers 1 to 16 holding cores 16, 4, 3, 2, 14, 5, 6, 1, 13, 12, 7, 9, 15, 11, 8, 10) and the corresponding placement of the cores on the 4 × 4 mesh.]
13. 5.2. Evaluation of the fitness value of a chromosome from the objective function:
• The communication cost between the cores $c_i$ and $c_j$ is measured by:
$$Commcost_{i,j} = comm_{i,j} \times MD(map(c_i), map(c_j))$$
• The total communication cost of a mapping solution is calculated as:
$$CommCost = \sum_{\forall e_{i,j} \in E} Commcost_{i,j}$$
• $F\_obj[i] = CommCost$
• Fitness of the $i^{th}$ chromosome:
$$Fitness[i] = \frac{1}{1 + F\_obj[i]}$$
5.3. Chromosome selection for the next generation using the roulette wheel:
• The fitness probability of the $i^{th}$ chromosome is formulated as:
$$P[i] = \frac{Fitness[i]}{\sum_{j=1}^{N} Fitness[j]}$$
where $N$ is the population size.
• The cumulative probability of the $i^{th}$ chromosome is formulated as:
$$C[i] = \sum_{j=1}^{i} P[j]$$
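As a rough illustration of these three formulas, the sketch below turns a list of objective values into fitness values, selection probabilities and cumulative probabilities. The function names and the two objective values are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def fitness_values(objective_values):
    """Fitness[i] = 1 / (1 + F_obj[i]); a lower cost gives a higher fitness."""
    return [1.0 / (1.0 + f_obj) for f_obj in objective_values]


def selection_probabilities(fitness):
    """P[i] = Fitness[i] / sum of all fitness values."""
    total = sum(fitness)
    return [f / total for f in fitness]


def cumulative_probabilities(probabilities):
    """C[i] = P[0] + ... + P[i] (0-based indices)."""
    cumulative, running = [], 0.0
    for p in probabilities:
        running += p
        cumulative.append(running)
    return cumulative


objective = [1.0, 3.0]  # hypothetical communication costs of two chromosomes
fitness = fitness_values(objective)             # [0.5, 0.25]
probability = selection_probabilities(fitness)  # [2/3, 1/3]
cumulative = cumulative_probabilities(probability)
```

The chromosome with the lower cost receives twice the selection probability of the other, and the last cumulative value reaches 1.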
14. Contd.
• Algorithm for the roulette wheel selection process:
begin
  k ← 0;
  while (k < population size) do
    R[k] ← random(0,1);
    for (i = 0 to population size − 1) do
      if (R[k] < C[i]) then
        new_chromosome[k] ← chromosome[i];
        break;
      end;
    end;
    k = k + 1;
  end;
end;
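The selection loop can be written as the following Python sketch. It assumes the cumulative probabilities C[i] have already been computed, and it copies the selected chromosomes into a new list so that the current population is not overwritten while it is still being sampled. The sample population and cumulative values are hypothetical.

```python
import random


def roulette_select(population, cumulative, rng):
    """Pick len(population) chromosomes using cumulative probabilities C[i]."""
    selected = []
    for _ in range(len(population)):
        r = rng.random()  # R[k], uniform in [0, 1)
        for i, c in enumerate(cumulative):
            if r < c:
                selected.append(list(population[i]))
                break
        else:
            # Guard against round-off leaving the last C[i] slightly below 1.
            selected.append(list(population[-1]))
    return selected


population = [[1, 2, 3, 4], [2, 1, 4, 3], [4, 3, 2, 1]]
cumulative = [0.5, 0.8, 1.0]  # hypothetical C[i] values
next_generation = roulette_select(population, cumulative, random.Random(0))
```

A chromosome with a wider slice of the cumulative range is picked more often, and the same chromosome may be selected more than once.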
15. 5.4. Crossover operation over chromosomes (solutions):
• For the crossover process, a floating-point random number R[k] between 0 and 1 is generated for each chromosome k. Chromosome k is selected as a parent if R[k] < crossover rate.
• After the parents are selected, the position of the crossover point is determined by generating a random integer between 1 and (number of cores in the core graph − 1).
• Algorithm:
begin
  k ← 0;
  while (k < population size) do
    R[k] ← random(0,1);
    if (R[k] < crossover rate) then
      select chromosome[k] as parent;
    end;
    k = k + 1;
  end;
end;
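The parent selection and crossover-point steps might look like the sketch below. The slides do not say how duplicate cores are avoided after a single-point cut, so the repair used here (keep the head of the first parent, then append the missing cores in the order they appear in the second parent) is an assumption made for illustration, as are the function names and sample values.

```python
import random


def select_parents(population_size, crossover_rate, rng):
    """Return the indices k for which R[k] < crossover rate."""
    return [k for k in range(population_size) if rng.random() < crossover_rate]


def one_point_crossover(parent_a, parent_b, point):
    """Keep parent_a[:point], then fill with parent_b's remaining cores in order.

    The fill step is an assumed repair that keeps the child a valid permutation.
    """
    head = parent_a[:point]
    used = set(head)
    tail = [gene for gene in parent_b if gene not in used]
    return head + tail


rng = random.Random(0)
population = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 1, 3]]
parents = select_parents(len(population), crossover_rate=0.25, rng=rng)
point = rng.randint(1, len(population[0]) - 1)  # between 1 and num_cores - 1
child = one_point_crossover(population[0], population[1], point)
```

Whatever cut point is drawn, the child contains each core exactly once.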
16. 5.5. Mutation operation over chromosomes:
• The number of chromosomes that undergo mutation in the population is determined by the mutation rate parameter.
• The mutation process exchanges two randomly selected members (genes) of a chromosome.
• Total_members = number of cores in a chromosome × population size.
• The genes to mutate are chosen by generating random integers between 1 and Total_members; each generated number marks the position of a gene that will be mutated.
• Number of mutations = mutation rate × Total_members.
• Algorithm:
begin
  k ← 0;
  while (k < number of mutations) do
    R[k] ← random integer in [1, total_members];
    a ← quotient of ((R[k] − 1) / core_num);
    select chromosome[a] for mutation;
    b ← remainder of ((R[k] − 1) / core_num);
    select position b in chromosome[a] for mutation;
    k = k + 1;
  end;
end;
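A minimal Python sketch of this mutation step follows. It draws the random position with 0-based indexing so that the quotient and remainder are always valid indices, and it assumes the marked gene is swapped with a second, randomly chosen gene of the same chromosome; the slides do not state how the second gene is picked. Names and sample values are hypothetical.

```python
import random


def mutate(population, mutation_rate, rng):
    """Swap-mutate the population in place; return the number of mutations."""
    core_num = len(population[0])
    total_members = core_num * len(population)
    num_mutations = int(mutation_rate * total_members)
    for _ in range(num_mutations):
        r = rng.randrange(total_members)   # 0-based position among all genes
        a, b = divmod(r, core_num)         # chromosome index, gene index
        other = rng.randrange(core_num)    # assumed partner gene for the swap
        chromosome = population[a]
        chromosome[b], chromosome[other] = chromosome[other], chromosome[b]
    return num_mutations


population = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 1, 3]]
count = mutate(population, mutation_rate=0.25, rng=random.Random(0))
```

Swapping two genes inside one chromosome keeps it a valid permutation, so no repair step is needed after mutation.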
17. 6. Control over GA iterations:
• In this approach, the GA is run several times to improve upon the best solution ($gsuper$) found in previous iterations. At the end of the $n^{th}$ iteration of the GA, let the best solution for the $i^{th}$ chromosome found in this iteration be $lbest_i^n$ and the best solution found in the previous $n$ iterations be $gsuper^n$. The $(n + 1)^{th}$ iteration of the GA starts with a new set of chromosomes. However, the $lbest_i^n$ and $gsuper^n$ solutions are passed on from the $n^{th}$ to the $(n + 1)^{th}$ iteration.
• The GA stops when either of the following conditions holds:
1. The number of GA iterations exceeds a user-defined value. For this work, this limit is set to 1000.
2. The fitness of the solution $gsuper^n$ found in previous iterations has not changed in the last 30 runs.
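The two stopping rules can be sketched as a small driver loop. The callback `run_one_generation` is a hypothetical stand-in for one full GA iteration that returns the fitness of the best solution it found; higher fitness is assumed to be better, in line with Fitness = 1/(1 + F_obj).

```python
def run_ga(run_one_generation, max_iterations=1000, patience=30):
    """Repeat GA iterations until the limit or until the best stops improving.

    run_one_generation(n) is a hypothetical callback returning the fitness of
    the best solution found in iteration n.
    """
    best_fitness = None
    stagnant_runs = 0
    iterations = 0
    for n in range(1, max_iterations + 1):
        iterations = n
        candidate = run_one_generation(n)
        if best_fitness is None or candidate > best_fitness:
            best_fitness = candidate  # the overall best improves
            stagnant_runs = 0
        else:
            stagnant_runs += 1
        if stagnant_runs >= patience:
            break  # no change in the last `patience` runs
    return best_fitness, iterations


# A run that never improves stops 30 iterations after the first one.
flat_result = run_ga(lambda n: 0.5)
# A run that always improves uses the full iteration budget.
improving_result = run_ga(lambda n: float(n), max_iterations=50)
```

With the limits quoted above, a flat run ends after 31 iterations: one to set the best solution and 30 more without any change.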
18. 7. Genetic Algorithm Formulation of Temperature-Aware Mapping:
7.1. Temperature calculation:
• The primary source of heat generation in a chip is the energy dissipation of the tiles present in the silicon layer.
• The heat generated in the silicon layer flows towards the heat sink through the following primary heat transfer path (PHTP): Silicon layer → Thermal Interface layer → Heat Spreader → Heat Sink.
• Each of these layers is divided into several smaller blocks, as in the block model of HotSpot.
• Each block in the Si-layer is taken to correspond to a tile present in the NoC. Thus, if the NoC contains n tiles, the Si-layer is divided into n blocks.
• The other layers in the PHTP, directly below the Si-layer, are likewise divided into n blocks each. Therefore, a total of (4 × n) such blocks are present in the thermal model.
• In addition to those 4n blocks, the Heat Spreader layer contains 4 extra peripheral blocks and the Heat Sink layer contains 8 extra peripheral blocks. Hence the total number of blocks present in the thermal model of the chip (tot_blk) is (4 × n + 12).
• The compact thermal model (CTM) works on the principle of duality between thermal and electrical quantities.
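19. Contd.
• Thermal resistance along the x, y and z directions:
$$Rs_x = \frac{1}{k_{layer}} \left( 0.5 \times \frac{D_x}{D_y \times D_z} \right)$$
$$Rs_y = \frac{1}{k_{layer}} \left( 0.5 \times \frac{D_y}{D_z \times D_x} \right)$$
$$Rs_z = \frac{1}{k_{layer}} \left( 0.5 \times \frac{D_z}{2 D_x \times D_y} \right)$$
• The following equation is solved to determine the temperature matrix ($[T]_{tot\_blk \times 1}$):
$$[C]_{tot\_blk \times tot\_blk} \times [T]_{tot\_blk \times 1} = [P]_{tot\_blk \times 1}$$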
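As a rough sketch, the three resistance expressions can be evaluated for one block as follows. The formulas are transcribed from the slide as extracted, including the factor 2 in the z-direction term, and should be checked against the HotSpot block model before reuse. The function name, argument names and sample numbers are illustrative only and carry no physical meaning.

```python
def block_thermal_resistances(d_x, d_y, d_z, k_layer):
    """Return (Rs_x, Rs_y, Rs_z) for a block of size d_x x d_y x d_z.

    k_layer is the thermal conductivity of the layer the block belongs to.
    The expressions follow the slide text as written.
    """
    rs_x = (1.0 / k_layer) * (0.5 * d_x / (d_y * d_z))
    rs_y = (1.0 / k_layer) * (0.5 * d_y / (d_z * d_x))
    rs_z = (1.0 / k_layer) * (0.5 * d_z / (2.0 * d_x * d_y))
    return rs_x, rs_y, rs_z


# Illustrative numbers only: a 2 x 2 x 1 block with unit conductivity.
rs_x, rs_y, rs_z = block_thermal_resistances(d_x=2.0, d_y=2.0, d_z=1.0, k_layer=1.0)
```

For a block with a square footprint the x and y resistances coincide, which is a quick sanity check on the inputs.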
20. 7.2. Fitness calculation:
• The fitness of each chromosome is evaluated using the following expression:
$$Fitness = w \times \frac{CommCost}{CommCost_{max}} + (1 - w) \times \left( \frac{T_{peakChip}}{T_{max}} \right)$$
• When w = 0, the fitness minimizes the chip temperature; when w = 1, it minimizes the communication cost.
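The weighted expression can be sketched as a single function. The argument names and the sample numbers are hypothetical; the two maxima are assumed to be normalizing constants supplied by the caller, since the slide does not say how they are obtained.

```python
def weighted_fitness(comm_cost, comm_cost_max, peak_temp, temp_max, w):
    """Blend normalized communication cost and normalized peak temperature.

    w = 1 considers only the communication cost, w = 0 only the temperature.
    A lower value means a better mapping.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * (comm_cost / comm_cost_max) + (1.0 - w) * (peak_temp / temp_max)


# Hypothetical values: cost 50 of a maximum 100, peak 300 K of a maximum 400 K.
cost_only = weighted_fitness(50.0, 100.0, 300.0, 400.0, w=1.0)
temp_only = weighted_fitness(50.0, 100.0, 300.0, 400.0, w=0.0)
balanced = weighted_fitness(50.0, 100.0, 300.0, 400.0, w=0.5)
```

Sweeping w between 0 and 1 is one way to trace out the trade-off between the two objectives.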
21. 8. Simulation Results:
8.1. Comparison of communication cost for benchmark applications:
The applications are mapped onto 2-D mesh structures with the mesh sizes noted in Table I.
TABLE I
NoC Benchmarks and Their Mesh Sizes
Benchmark NoC      No. of Cores   2-D Mesh Size
DVOPD              32             8 × 4
VOPD               16             4 × 4
MPEG-4             12             4 × 4
PIP                8              4 × 2
MWD                12             4 × 4
263ENC MP3DEC      12             4 × 4
MP3ENC MP3DEC      13             4 × 4
263DEC MP3DEC      14             4 × 4
23. 8.2. Latency and throughput for benchmark applications:
The SystemC-based Noxim simulator was used to calculate network latency and throughput.
TABLE III
Noxim Settings
Parameter                          Value
Buffer Depth                       6
Minimum and Maximum Packet Size    64 flits (32 bits per flit)
Routing                            Dimension-ordered (XY)
Selection Logic                    Random
Warm-up Time                       10000 clock cycles
Simulation Time                    20000 clock cycles
Traffic                            Table based
28. Contd.
• To check the applicability of the GA-based thermal-aware mapping approach on a larger scale, a few task graphs were generated using the TGFF tool.
TABLE VIII
Communication Cost and Peak Temperature Reduction for Different TGFF Task Graphs
Task Graph   Comm_Cost (Hops × BW)   Peak Temp. Reduction (Kelvin)
Graph111     124732.77               92.55
Graph112     718853.43               92.09
Graph113     876083.87               96.97
Graph114     182443.65               92.56
Graph115     160572.93               94.38
Graph116     20306.87                92.37
Graph117     20306.87                97.57
Graph118     221245.67               90.66
29. 8.6. Trading off communication cost and peak temperature:
• A trade-off is established between NoC peak temperature and communication cost. The figure on this slide shows the trade-off between communication cost and peak temperature for the benchmark application VOPD.
30. 8.7. Imposing thermal safety by temperature constraints:
• In this experiment, thermal safety has been imposed by taking the peak temperature as a constraint. The experiment finds the mapping solution that fits the temperature budget.
TABLE IX
Communication Cost and Peak Temperature Constraints for NoC Benchmark Applications
VOPD
Tcons (Kelvin)   Comm_Cost   Tpeak (Kelvin)
361              3612        358.87
359              4888        356.26
356              4899        351.07
DVOPD
Tcons (Kelvin)   Comm_Cost   Tpeak (Kelvin)
360              9427        356.38
356              10486       354.23
359              10510       357.10
31. 8.8. Dynamic simulation of thermal-aware mapping:
• The Noxim simulator has been used for the simulation. Any NoC is expected to have high throughput and low latency.
TABLE X
Throughput and Latency of NoC Benchmarks
Benchmark NoC    Throughput (Flits/Cycle)   Latency (Cycles)
DVOPD            83735.70                   0.53
VOPD             82398.14                   0.57
MPEG-4           79998.50                   0.59
PIP              89475.20                   0.63
MWD              89963.70                   0.61
263ENC-MP3DEC    81997.10                   0.58
32. 9. Conclusions:
• The proposed mapping approach produces a reasonable improvement in communication cost compared to some of the previously reported strategies.
• The simulation results show that the proposed strategy performs better than NMAP for NoCs with a higher number of cores.
• The communication model used in the proposed approach assumes that a message takes the same amount of time to traverse every router. In practice, this may not be true.
• The proposed thermal-aware mapping approach has been found to improve both the communication cost and the peak temperature of the chip.
• A trade-off has also been established between communication cost and peak temperature, so that designers can choose the solution that best suits their requirements.
• Experimental results show that the proposed thermal-aware mapping approach outperforms many contemporary approaches reported in the literature.
33. 10. Future Scope:
• The proposed mapping strategy can be extended to mapping and routing for NoC architectures with other network topologies, such as ring and torus.
• The proposed thermal-aware mapping approach can be extended to 3-D structured mapping strategies, targeting fault-tolerant and reliability-aware mapping techniques for 2-D as well as 3-D NoC environments.
34. 11. References:
• [1] S. Murali and G. De Micheli, "Bandwidth-constrained mapping of cores onto NoC architectures," in Proc. Design, Automation and Test in Europe Conference and Exhibition, vol. 2, Feb. 2004, pp. 896–901.
• [2] P. K. Sahu, K. Manna, T. Shah and S. Chattopadhyay, "A Constructive Heuristic for Application Mapping onto Mesh Based Network-on-Chip," Journal of Circuits, Systems, and Computers, vol. 24, no. 8, 2015, 1550126 (29 pages).
• [3] P. K. Sahu and S. Chattopadhyay, "A survey on application mapping strategies for network-on-chip design," J. Syst. Archit., vol. 59, 2013, pp. 60–76.
• [4] P. K. Sahu, T. Shah, K. Manna, and S. Chattopadhyay, "Application Mapping Onto Mesh-Based Network-on-Chip Using Discrete Particle Swarm Optimization," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 22, no. 2, Feb. 2014.
• [5] J. Hu and R. Marculescu, "Energy-aware mapping for tile-based NoC architectures under performance constraints," in Proc. Asia South Pacific Des. Autom. Conf., 2003, pp. 233–239.
• [6] M. Moazzen, A. Reza, and M. Reshadi, "CoolMap: A thermal-aware mapping algorithm for application specific networks-on-chip," in Proc. Euromicro Conf. Digital Syst. Des., Sep. 2012, pp. 731–734.
• [7] D. Zhu, L. Chen, T. Pinkston, and M. Pedram, "TAPP: Temperature-aware application mapping for NoC-based many-core processors," in Proc. Des., Autom. Test Eur., 2015, pp. 1241–1244.
• [8] W. Huang, S. Ghosh, S. Velusamy, K. Sankaranarayanan, K. Skadron, and M. Stan, "HotSpot: a compact thermal modeling methodology for early-stage VLSI design," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., 14(5) (2006) 501–513.
• [9] http://mehransoft.ir/wp-content/uploads/2014/05/Noxim_User_Guide.pdf