4. Introduction
The Task Mapping Problem
[Figure: task graph with tasks t0 to t5]
NP-Hard problem!
power consumption
communication profile
execution time
Filipo Novo Mór
5. Introduction
Brute-force algorithms are not feasible for solving
NP-Hard problems
Alternative: to use heuristic methods
They return the best solution they can find, with no
guarantee that the global optimum will be found
Evolutionary Algorithms
Differential Evolution (DE)
6. Introduction
Motivation
Previous works
Considering the DE features of:
Optimization of non-linear problems
Simplicity and flexibility of its code
Try finding a more efficient task mapping solver
using DE
7. Introduction
Objective
Implement a new elitist strategy on Single
Objective DE to efficiently solve the Task Mapping
onto NoC Problem
8. Theoretical Background
Outline: Introduction, Theoretical Background, Related Work, Project Methodology, Experimental Results, Conclusions
13. Theoretical Background
Differential Evolution (DE)
DE steps: vector initialization → mutation → recombination → selection
vector initialization
The population is randomly initialized
Uniform probability distribution
If a preliminary solution is available, the initial population is built by adding randomly distributed deviations to it
Each individual in the population represents a candidate solution
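The initialization step can be sketched as follows. This is a minimal illustration, not the thesis code: the function names, the bounds and the use of a normal distribution for the deviations are assumptions.

```python
import random

def init_population(np_size, dim, low, high, rng=random):
    """Return np_size candidate vectors, each with dim components
    drawn uniformly from [low, high]."""
    return [[rng.uniform(low, high) for _ in range(dim)]
            for _ in range(np_size)]

def init_around(nominal, np_size, sigma, rng=random):
    """Build the population around a preliminary solution by adding
    random deviations (normal distribution assumed here)."""
    return [[x + rng.gauss(0.0, sigma) for x in nominal]
            for _ in range(np_size)]

population = init_population(np_size=20, dim=5, low=0.0, high=8.0)
```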
14. Theoretical Background
$X_{i,G}, \quad i = 1, 2, \ldots, NP$
[Figure: the Population as a list of NP solution candidates: (solution candidate)1, (solution candidate)2, (solution candidate)3, …, (solution candidate)n]
15. Theoretical Background
Differential Evolution (DE)
mutation
generates a new mutant vector
DE generates a new parameter vector by adding the
weighted difference between two population vectors to a third vector
18. Theoretical Background
Differential Evolution (DE)
mutation
generates a new mutant vector
DE generates a new parameter vector by adding the
weighted difference between two population vectors to a third vector
the resulting vector is used as the donor in the next step
mutation keeps the search moving throughout the solution space
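The mutation step can be sketched as below, assuming the classical DE scheme in which three distinct individuals other than the current one are picked at random. The function name and the parameter name `f` (the weighting factor F) are illustrative.

```python
import random

def mutate(population, i, f, rng=random):
    """Build the donor vector for individual i:
    donor = x_r1 + f * (x_r2 - x_r3), with r1, r2, r3 distinct and != i."""
    candidates = [k for k in range(len(population)) if k != i]
    r1, r2, r3 = rng.sample(candidates, 3)
    x1, x2, x3 = population[r1], population[r2], population[r3]
    return [a + f * (b - c) for a, b, c in zip(x1, x2, x3)]
```

The population needs at least four individuals so that three distinct partners exist for every index `i`.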
19. Theoretical Background
Differential Evolution (DE)
recombination
increases the diversity of the Population
carries over good candidate solutions from previous generations
20. Theoretical Background
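[Figure: the donor vector $V_{i,G+1}$ and the target vector $X_{i,G}$, each with D components, are combined into the trial vector $U_{j,i,G+1}$]
$$U_{j,i,G+1} = \begin{cases} V_{j,i,G+1} & \text{if } rand_{j,i} \le CR \\ X_{j,i,G} & \text{if } rand_{j,i} > CR \end{cases}$$
$i = 1, 2, \ldots, NP$
$j = 1, 2, \ldots, D$
$V_{i,G+1} \ne X_{i,G}$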
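The recombination rule can be sketched as a component-wise choice between donor and target. The forced index `j_rand`, which guarantees that at least one donor component reaches the trial vector, comes from classical DE and is an assumption here, since the slide does not show it.

```python
import random

def crossover(target, donor, cr, rng=random):
    """Binomial recombination: take component j from the donor when
    rand_j <= cr (or when j is the forced index), else from the target."""
    j_rand = rng.randrange(len(target))  # assumption: classical DE safeguard
    return [d if (rng.random() <= cr or j == j_rand) else t
            for j, (t, d) in enumerate(zip(target, donor))]
```

With `cr` close to 1 the trial vector is almost a copy of the donor; with `cr` close to 0 it stays close to the target.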
21. Theoretical Background
Differential Evolution (DE)
selection
only the best individuals will be kept in the Population
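A minimal sketch of the selection step, assuming a minimization problem and a strict "better than" comparison. `fitness` is a hypothetical callable that scores a candidate.

```python
def select(target, trial, fitness):
    """Keep the trial vector only if it scores strictly better
    (lower) than the target it competes with."""
    return trial if fitness(trial) < fitness(target) else target
```

Because each trial competes only with its own target, the population size stays constant from one generation to the next.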
22. Theoretical Background
[Figure: selection between the target vector Xi,G in the Population and the trial vector Uj,i,G+1; the remaining slots hold (solution candidate)2, (solution candidate)3, …, (solution candidate)n]
23. Theoretical Background
DE – complete steps
A - Population Initialization
B - Population Evaluation
C - Select xr1,G, xr2,G and xr3,G
D - Mutation
E - Recombination
F - Evaluate ui,G+1
G - Is ui,G+1 better than xi,G? If yes, go to H; if no, xi,G is kept
H - Update Population
I - Select Dominant Solutions from Archive
The steps are repeated for each individual i in the Population, and the whole loop is repeated for n generations
24. Theoretical Background
Population Evaluation on DE
≅ $O(n^2)$
How large would the impact on the overall performance be?
[Figure: plot with axes X1 and X2]
25. Theoretical Background
[Figure: Dominance Algorithms, Execution Time; series M&S, BF Naive and BF Smart; x-axis N (50 to 10000), y-axis milliseconds (0 to 1600)]
[Figure: Mishra & Sandeep Dominance Algorithm, Execution Time; x-axis N (50 to 10000), y-axis milliseconds (0 to 6)]
M&S Dominance Algorithm, Speedup
N        Speedup
50       3
100      5
500      21
1000     32
2000     63
5000     146
7500     210
10000    287
Tested algorithms:
Brute Force “Naïve”: $N^2$, two independent nested loops.
Brute Force “Smart”: $N^2$, two dependent nested loops.
Mishra & Sandeep: heapsort + 1 outer loop with a dynamic variant linked list.
Tested on an i5 CPU with 8 GB RAM, running Kubuntu 14.04. All tests were performed using “nice -20” prioritization.
To generate the data set:
$f_1 = 1 - x^2$, $x = rand48()$
$f_2 = 1 - x^2$, $x = rand48()$
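The two brute-force variants can be sketched as below, assuming both objectives are minimized. The Mishra & Sandeep algorithm is not reproduced, and `random.random()` stands in for `rand48()`.

```python
import random

def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def nondominated_naive(points):
    # Two independent nested loops: each point is checked against all others.
    return [p for i, p in enumerate(points)
            if not any(dominates(q, p)
                       for j, q in enumerate(points) if j != i)]

def nondominated_smart(points):
    # Two dependent nested loops: the inner loop starts at i + 1 and
    # each pair is checked once, in both directions.
    dominated = [False] * len(points)
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dominates(points[i], points[j]):
                dominated[j] = True
            elif dominates(points[j], points[i]):
                dominated[i] = True
    return [p for p, d in zip(points, dominated) if not d]

def make_point(rng=random):
    # Data set from the slide: f1 = 1 - x^2 and f2 = 1 - x^2, fresh x each.
    return (1 - rng.random() ** 2, 1 - rng.random() ** 2)
```

Both variants remain quadratic in N; the second one only halves the number of pairs visited.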
26. Theoretical Background
Managing the DE archive
truncate the archive using
the Crowding Distance metric
Kumar and Kesavan, 2015
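A sketch of archive truncation by crowding distance, using the NSGA-II style definition of the metric. Whether the cited work uses exactly this variant is an assumption, and the function names are illustrative.

```python
def crowding_distance(points):
    """Per-point crowding distance; boundary points get infinity."""
    n = len(points)
    dist = [0.0] * n
    if n == 0:
        return dist
    for m in range(len(points[0])):
        order = sorted(range(n), key=lambda i: points[i][m])
        lo, hi = points[order[0]][m], points[order[-1]][m]
        dist[order[0]] = dist[order[-1]] = float("inf")
        if hi == lo:
            continue  # all values equal: this objective adds nothing
        for k in range(1, n - 1):
            gap = points[order[k + 1]][m] - points[order[k - 1]][m]
            dist[order[k]] += gap / (hi - lo)
    return dist

def truncate_archive(points, max_size):
    """Keep the max_size least crowded points, in their original order."""
    dist = crowding_distance(points)
    keep = sorted(range(len(points)), key=lambda i: dist[i],
                  reverse=True)[:max_size]
    return [points[i] for i in sorted(keep)]
```

Points in sparse regions of the front survive truncation, which helps keep the archive spread out.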
28. Theoretical Background
NASA Numerical Aerodynamic Simulation (NAS)
CG - Conjugate Gradient, irregular memory access and communication
FT - discrete 3D fast Fourier Transform, all-to-all communication
IS - Integer Sort, random memory access
LU - Lower-Upper Gauss-Seidel solver, large number of short messages
MG - Multi-Grid on a sequence of meshes, long- and short-distance communication, memory intensive
These applications were selected because their profiles are based on communication between tasks, which makes them ideal for the purposes of this work.
29. Related Work
30. Related Work
• J. R. Ku and S. G. Ku [34]
  • Two phases:
    • Clustered highly communicating tasks into partitions
      • Used the NSGA-II algorithm
    • Mapped these partitions onto NoC processors
      • Tried to keep highly communicating partitions close to each other
      • Used a second version of the NSGA-II algorithm
  • 15% more efficient than the Physical Mapping Algorithm
• C. Deng et al. [41]
  • Changed the classical DE
  • Included a sorting step before chromosome recombination
  • For high-level task graphs, independent of a target hardware architecture
31. Related Work
• Sen Zhao et al. [45]
  • Proposed a MODE using an adaptive mutation operator
  • The strategy is changed at runtime to try to achieve better solutions on the fly
  • The resulting vector is now compared with the whole population, not only with its ’parent’
  • Tested using the ZDT benchmark functions only
• D. Das, M. Verma and A. Das [58]
  • Hardware/software partitioning problem using DE
  • Objective functions: execution time, area cost and communication cost
  • DE ran 16% faster than PSO
  • The quality of the achieved solutions was not described
• Zhuo Qingqi et al. [51]
  • Solved the Task Mapping problem by combining two evolutionary algorithms (not DE)
  • Parallel approach for searching the solution space
  • MPEG-4 and VOPD (Video Object Plane Decoder) benchmark applications
  • Saves 13% on energy and is 3% more efficient in communication latency
32. Project Methodology
33. Project Methodology
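[Figure: task mapping step. An individual is a vector of chromosomes indexed 0 to 8, one per position of a 3x3 NoC (rows and columns 0 to 2, positions 0 to 8). The example individual holds tasks E, A, C, F, B, D; the resulting task map shows E A C / F B / D. Alongside: the task communication graph for tasks A to F, with edge volumes 5, 5, 3, 2, 5, 3, 4, 1.]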
34. Project Methodology
[Figure: a population of four individuals (0 to 3), each a vector of chromosomes indexed 0 to 8 for a 3x3 NoC (positions 0 to 8):
0: E A C F B D
1: B A C D F E
2: C A B E D F
3: F D A B E C
Alongside: the task communication graph for tasks A to F.]
35. Project Methodology
Data Structures Modelling
Population matrix: one row per individual (Population size, NP), one column per task (Population Dimension, D)
t0  t1  t2  t3  t4
0   0   3   4   2
1   3   2   4   4
4   2   2   1   0
3   4   0   1   1
[Figure: 3x3 NoC, rows and columns 0 to 2, positions 0 to 8]
D = number of existing tasks
Applicable to both SODE and MODE
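One way to read the population matrix is sketched below. It assumes that each gene holds the NoC position assigned to the task in that column, and that positions are numbered row by row as in the 3x3 grid. Both points are assumptions, as is the function name.

```python
def decode(individual, noc_width):
    """Map each task t<j> to the (row, column) of its NoC position,
    assuming row-major position numbering."""
    return {f"t{j}": divmod(position, noc_width)
            for j, position in enumerate(individual)}

# First row of the example matrix, on a 3x3 NoC.
placement = decode([0, 0, 3, 4, 2], noc_width=3)
```

Under this reading the vector length D equals the number of tasks, so the same structure serves any NoC size.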
38. Project Methodology
Modifying DE: rewarding “good” individuals
Identify most communicating tasks
Proposal 1:
Reward individuals that keep the most communicating tasks
near each other
Proposal 2:
Try to generate “good” individuals during the mutation or
recombination operations
39. Project Methodology
Identifying most communicating tasks
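[Figure: task communication graph for tasks A to F, with edge volumes 5, 5, 3, 2, 5, 3, 4, 1]
Step 1, communication volume of each edge:
A, B: 5
A, C: 5
B, D: 3
D, F: 1
F, D: 4
C, E: 5
E, A: 3
E, B: 2
Step 2, edges between the same pair of tasks are merged:
A, B: 5
A, C: 5
B, D: 3
D, F: 1+4
C, E: 5
E, A: 3
E, B: 2
Step 3, for each task, the volumes of its edges are summed:
A, B, C, E: 5+5+3 = 13
B, D, A, E: 3+5+2 = 10
D, F, B: 5+3 = 8
C, E, A: 5+5 = 10
E, A, C, B: 3+5+2 = 10
Most communicating tasks: tA, tB, tC and tE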
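The computation above can be reproduced with a short sketch. The edge list is the one from the slide; the function and variable names are illustrative.

```python
from collections import defaultdict

# Communication volumes from the example graph.
EDGES = [("A", "B", 5), ("A", "C", 5), ("B", "D", 3), ("D", "F", 1),
         ("F", "D", 4), ("C", "E", 5), ("E", "A", 3), ("E", "B", 2)]

def merge_edges(edges):
    """Sum the volumes of edges that join the same pair of tasks."""
    merged = defaultdict(int)
    for src, dst, volume in edges:
        merged[tuple(sorted((src, dst)))] += volume
    return merged

def task_totals(merged):
    """Total volume and neighbor set of every task."""
    totals, neighbors = defaultdict(int), defaultdict(set)
    for (a, b), volume in merged.items():
        totals[a] += volume
        totals[b] += volume
        neighbors[a].add(b)
        neighbors[b].add(a)
    return totals, neighbors

totals, neighbors = task_totals(merge_edges(EDGES))
hub = max(totals, key=totals.get)   # task with the largest total volume
group = {hub} | neighbors[hub]      # the hub plus its direct neighbors
```

For this graph the hub is task A with a total of 13, and the group is A, B, C and E.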
40. Project Methodology
Proposal 1
Ideal bonus value is 10%
Other bonus values tend to stall the evolution
(convergence is no longer reached)
On average, about 14% of the solutions in the final
generation had been rewarded
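A sketch of how such a bonus could be applied, assuming a cost that is minimized. The closeness criterion (every pair of the most communicating tasks within `max_hops` Manhattan hops) and all names are hypothetical; the slides do not define "near".

```python
def rewarded_fitness(cost, mapping, hot_tasks, noc_width,
                     max_hops=2, bonus=0.10):
    """Reduce the cost by `bonus` when all the most communicating tasks
    sit within max_hops of each other (hypothetical criterion)."""
    cells = [divmod(mapping[t], noc_width) for t in hot_tasks]
    close = all(abs(r1 - r2) + abs(c1 - c2) <= max_hops
                for k, (r1, c1) in enumerate(cells)
                for (r2, c2) in cells[k + 1:])
    return cost * (1 - bonus) if close else cost
```

Here `mapping` gives the NoC position of each task, numbered row by row.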
41. Project Methodology
Proposal 2
Proposal 2 was halted:
No more convergence after 4
generations on average
Too few tasks? Too small NoC?
42. Project Methodology
Validating the DE (Single Objective)
SO_Proc36_T36_CR0_50_F0_40_Gen1000_Noc6_6_1_Pop20_Test2016061716017308_ft32x1_v2ap01
46. Experimental Results
47. Experimental Results
Single Objective DE
Parameter   Range
NP          10 and 20
G           100, 300, 500, 1000, 5000 and 10000
CR          0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9
F           0.05, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8 and 0.9
NASA NAS applications: IS, CG, FT, MG, LU
Each test case was executed at least 30 times
Goal: reduce communication volume
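The fitness formula is not shown on the slides. The sketch below uses a cost that is common in NoC mapping work, the communication volume weighted by the Manhattan hop distance, purely as an assumption. The mapping and the edge volumes are illustrative.

```python
def communication_cost(mapping, edges, noc_width):
    """Sum of volume * hop distance over all communicating task pairs
    (assumed cost model; positions are numbered row by row)."""
    total = 0
    for (src, dst), volume in edges.items():
        r1, c1 = divmod(mapping[src], noc_width)
        r2, c2 = divmod(mapping[dst], noc_width)
        total += volume * (abs(r1 - r2) + abs(c1 - c2))
    return total

edges = {("A", "B"): 5, ("A", "C"): 5, ("B", "D"): 3, ("D", "F"): 5,
         ("C", "E"): 5, ("E", "A"): 3, ("E", "B"): 2}
mapping = {"E": 0, "A": 1, "C": 2, "F": 3, "B": 4, "D": 5}  # hypothetical
cost = communication_cost(mapping, edges, noc_width=3)
```

A lower cost means that heavily communicating tasks sit closer together on the NoC.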
50. Experimental Results
Single Objective DE – NASA NAS benchmark
Average Execution Time by benchmark application
Application   Average Execution Time
CG            5120
FT            11377
IS            11556
LU            5040
MG            5147
51. Experimental Results
SODE vs CAFES – NASA NAS benchmark
NASA NAS applications: IS, CG, FT, MG, LU
Each test case was executed at least 30 times
CAFES was set to the best execution parameters found during
preparation tests.
The same formula was used by CAFES and SODE to calculate the
fitness value
The comparison focused on the quality of the best candidate solutions
The comparison considered the five best candidate solutions of each
test case for both tested algorithms
52. Experimental Results
SODE vs CAFES – NASA NAS benchmark
SODE vs CAFES, Top 5 Best Solutions - Mean Values
Application   SODE      CAFES
CG            969114    989330
FT            2616473   3020149
IS            1124121   1109178
LU            3858343   2503478
MG            655376    485965
[Figure: SODE vs CAFES, Top 5 Best Solutions - Standard Deviation, for CG, FT, IS, LU and MG; bar labels in extraction order: 8766, 11537, 9954, 100, 914, 574, 92400, 51726, 14357, 4950]
Mean Values: absolute scalar value of the communication volume
Standard Deviation: how close the best solutions are to each other
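The two summary statistics can be computed as sketched below. The fitness values are made up for illustration, lower values are assumed to be better, and the use of the sample standard deviation is an assumption.

```python
import statistics

def top5_summary(fitness_values):
    """Mean and sample standard deviation of the five best (lowest) values."""
    best = sorted(fitness_values)[:5]
    return statistics.mean(best), statistics.stdev(best)

# Hypothetical fitness values from repeated runs of one test case.
mean, spread = top5_summary([10, 12, 14, 16, 18, 100])
```

A small spread indicates that independent runs reach solutions of similar quality.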
54. Conclusions
A new adaptation of the SODE was proposed,
rewarding individuals that keep related
communicating tasks close to each other
Tests were executed using the NASA NAS
benchmark, showing that our implementation was able
to generate feasible solutions.
Our algorithm was compared to the SA
implementation in the CAFES Framework.
Our implementation reached better solutions on two
of the five benchmark applications and achieved similar
results on one application. CAFES achieved better
solutions on the other two tested applications
Our implementation has proved to be relevant for
solving the Task Mapping onto NoC problem,
especially for applications with message exchange
profiles similar to those of NASA NAS
55. Filipo Novo Mór
Supervisor: Dr. César Augusto Missio Marcon
Co-supervisor: Dr. Andrew Rau-Chaplin
2016, August 18th
www.filipomor.com
master thesis defense
Thank you!