MCSoC-16
Communication-Based Power Modelling for Heterogeneous
Multiprocessor Architectures
Baptiste Roux, Matthieu Gautier, Olivier Sentieys, Steven Derrien
September 22, 2016
INRIA, DGA, IRISA, Univ-Rennes 1
Context
Embedded architectural trend
Embedded market trend
• Computation
• Power consumption
HMpSoC issues
• Complex, hard-to-solve HW/SW partitioning and task mapping
• Estimating power consumption early in the design flow is mandatory
• Huge design space
⇒ Available power modelling tools [1, 2] are not well adapted to complex multicores
⇒ Need for a fast power modelling tool
[1] Li et al., “McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures”.
[2] Binkert et al., “The Gem5 Simulator”.
Outline
Heterogeneous MpSoC
• Definition of HMpSoC families
• Generic representation focused on memory
Communication-based power model
• Fast power modelling approach for task-mapping on HMpSoC
Model parameter extraction
• µBenchmarking methodology to ease extraction of the power model’s parameters
Validation on Xilinx Zynq
• Power model parameter extraction with µBenchmark
• Power model output validation with mutant applications
Heterogeneous MpSoC
Representative architectures
[Figure panels: Kalray MPPA, TILERA Tile-Gx, Xilinx Zynq, Model]
Heterogeneous MpSoC
HMpSoC family
• Distributed HMpSoC: small HW accelerators, fast communications with SW
• Shared HMpSoC: large HW accelerators shared between clusters, slow communications with SW
[Figure: block diagrams of the two families, Distributed HMpSoC and Shared HMpSoC, drawn with processor units (PU), hardware units, memory units, NoC interfaces, GPIO and DDR]
Generic description
How to precisely describe an architecture in those families?
Memory Tree Abstraction
[Figure: memory tree abstraction. Levels: Network class, Cluster class, Core class. Nodes: NoC sublevel0 and sublevel1, Memory sublevel0 and sublevel1, Sw Core 1 ... Sw Core N, Hw Core]
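As a rough illustration of how such a tree could be used, the sketch below stores each node with a link to its parent and lists the memory and NoC nodes found on the path between two cores. The class `Node`, the helper `channels_crossed`, and the two-cluster platform are all hypothetical, and reading "channels crossed" as "nodes on the tree path" is an assumption made here, not something the slides state.

```python
class Node:
    """One node of the memory tree (a core, a memory level or a NoC level)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


def ancestors(node):
    """Nodes strictly above `node`, from its parent up to the root."""
    chain = []
    while node.parent is not None:
        node = node.parent
        chain.append(node)
    return chain


def channels_crossed(src, dst):
    """Names of the nodes on the path src -> lowest common ancestor -> dst."""
    up = ancestors(src)
    down = ancestors(dst)
    shared = next(n for n in up if n in down)  # lowest common ancestor
    path_up = up[: up.index(shared) + 1]       # climb from src, ancestor included
    path_down = down[: down.index(shared)]     # nodes between ancestor and dst
    return [n.name for n in path_up + path_down[::-1]]


# Hypothetical two-cluster platform (names chosen for illustration only).
noc = Node("NoC")
mem_a = Node("cluster A memory", noc)
mem_b = Node("cluster B memory", noc)
sw_a1 = Node("SW A1", mem_a)
hw_a = Node("HW A", mem_a)
sw_b = Node("SW B", mem_b)

intra = channels_crossed(sw_a1, hw_a)
inter = channels_crossed(sw_a1, sw_b)
print(intra)
print(inter)
```

Under this reading, two cores in the same cluster only meet at the cluster memory, while cores in different clusters also cross the NoC.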
Communication-based power model
Motivation
HMpSoC energy consumption
Three main sources:
• Dynamic energy consumption used for computations
• Static energy dissipated during execution time
• Energy used for communications between cores
Assumptions
A parallelisable application can be executed on multiple threads, which reduces its execution time but not its complexity:
• The amount of computation is independent of the chosen degree of parallelism
• The amount of communication and synchronization is directly linked to the number of execution threads
Power Model Structure (1)
Communication energy cost
• Communications are mapped into memory
• $C(Tk_i, Tk_j)$: set of communication channels crossed from task $Tk_i$ to task $Tk_j$
$$E_{com}(Tk_i, Tk_j) = \sum_{c \in C(Tk_i, Tk_j)} \left( e0_c + e1_c \times bytes(Tk_i, Tk_j) \right)$$
Note: synchronization and I/O events are managed as communications.
Computation energy cost
$E_{comp}(Tk_k)$: computed once for each kind of available computational core.
[Figure: example task graph mapped onto a platform with two clusters (Cluster A, Cluster B) connected by a NoC. Each task block carries Load and Store operations around a basic power block of instructions and is either hardware-suitable or not; the tasks are then mapped to SW A, HW A, SW B and HW B. Numeric labels (42), (34), (10), (30), (95), (17) annotate the graph. The platform is drawn as a memory tree with Network, Cluster and Core classes, CLUSTER MEMORY and SW MEMORY.]
Power Model Structure (2)
Static energy cost
$$E_{stat} = T_{exec} \times P_{stat}$$
where $P_{stat}$ is the static power and $T_{exec}$ is the critical path in the mapping graph weighted with computations and communications.
Global energy cost
$$E_t = E_{stat} + \sum_{k \in N_{Tk}} E_{comp}(Tk_k) + \sum_{(i,j) \in N_{Tk}^2} E_{com}(Tk_i, Tk_j)$$
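The communication, static and global energy terms can be evaluated with a few lines of code. The following is a minimal sketch, assuming the per-channel parameters e0 and e1 are already known; the names `Channel`, `e_com` and `total_energy` and all numeric values are illustrative (arbitrary units, not taken from the measurements), and the critical-path time is passed in rather than computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Linear energy model of one communication channel: e0 + e1 * bytes."""
    e0: float  # fixed energy cost per communication
    e1: float  # energy cost per byte


def e_com(channels, nbytes):
    """Energy of one task-to-task communication crossing `channels`."""
    return sum(c.e0 + c.e1 * nbytes for c in channels)


def total_energy(p_stat, t_exec, e_comp, comms):
    """E_t = E_stat + sum of E_comp + sum of E_com.

    e_comp: computation energy of each task
    comms:  (channels crossed, bytes exchanged) for each communicating task pair
    """
    e_stat = t_exec * p_stat
    return e_stat + sum(e_comp) + sum(e_com(ch, n) for ch, n in comms)


# Illustrative values only (hypothetical channels, arbitrary units).
cache = Channel(e0=1.0, e1=0.5)
noc = Channel(e0=2.0, e1=0.25)
e_t = total_energy(
    p_stat=2.0,
    t_exec=3.0,
    e_comp=[1.0, 1.5],
    comms=[([cache, noc], 4), ([cache], 2)],
)
print(e_t)  # 6.0 static + 2.5 computation + 8.0 communication = 16.5
```

Because every term is a closed-form sum, evaluating one candidate mapping costs only a pass over its tasks and communications, which is what makes this kind of model attractive inside a task-mapping loop.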
µBenchmarks
µBenchmarks purpose
Definition
A µBenchmark is a simple and synthetic application that aims at
stressing a specific part of the execution architecture.
Properties
• Selectivity: a µbench stresses only one specific communication channel
• Intensity variability: a µbench stresses its communication channel at different intensities
• Duration variability: µbench duration is adapted to match the timing resolution of the power measurement
µBenchmark structure
General structure
• InterCluster
• IntraCluster
• HwChannel
• SwChannel
Algorithm 1: Generic µBenchmark structure.
Data: scaleFactor, size
initBenchmarkEnv()
startPowerMeasure()
for iteration in scaleFactor do
    openCommunicationChannel()
    producer = spawnProducerThread(size)
    consumer = spawnConsumerThread(size)
    waitThread(producer, consumer)
    closeCommunicationChannel()
end
stopPowerMeasure()
writePowerMeasureToFile()
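The loop of Algorithm 1 can be mimicked on a host machine as follows. This is only a sketch under strong assumptions: a `queue.Queue` stands in for the communication channel, a wall-clock timer stands in for the power measurement calls, and `run_microbenchmark` is a hypothetical name. A real µBenchmark would target one specific hardware channel and read the board's power monitor instead.

```python
import queue
import threading
import time


def producer(channel, size):
    """Push `size` one-byte payloads, then a sentinel marking the end."""
    for _ in range(size):
        channel.put(b"\x00")
    channel.put(None)


def consumer(channel, received):
    """Drain the channel until the sentinel and record the bytes received."""
    total = 0
    while (item := channel.get()) is not None:
        total += len(item)
    received.append(total)


def run_microbenchmark(scale_factor, size):
    """Repeat the producer/consumer exchange; return (bytes moved, seconds)."""
    received = []
    start = time.perf_counter()          # stand-in for startPowerMeasure()
    for _ in range(scale_factor):
        channel = queue.Queue()          # stand-in for openCommunicationChannel()
        p = threading.Thread(target=producer, args=(channel, size))
        c = threading.Thread(target=consumer, args=(channel, received))
        p.start()
        c.start()
        p.join()                         # waitThread(producer, consumer)
        c.join()
    elapsed = time.perf_counter() - start  # stand-in for stopPowerMeasure()
    return sum(received), elapsed


moved, elapsed = run_microbenchmark(scale_factor=3, size=100)
print(moved, elapsed)
```

Here `scale_factor` plays the role of the duration knob (more iterations give a longer run for the power monitor to sample) and `size` the role of the intensity knob.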
[Figure: memory tree (Network, Cluster and Core classes; NoC sublevel0 and sublevel1, Memory sublevel0 and sublevel1, Sw Core 1 ... Sw Core N, Hw Cores) shown next to Algorithm 1. Successive builds of the slide highlight, in turn:
• Memory sublevel1 and NoC sublevel0 and sublevel1
• Memory sublevel1 and Memory sublevel0
• Memory sublevel1, Memory sublevel0, Sw Core N and the Hw Cores
• Sw Core 1 ... Sw Core N and Memory sublevel0]
Power Modelling of Zynq
Experimental infrastructure
Zynq architecture
[Figure: Zynq block diagram. Labels: Virtual memory space; Processing System with PS7_0, PS7_1, two L1 caches, SCU, L2 cache, DDR and AMBA interconnect; Programmable Logic with ext P0 ... ext Pn; ports HP1 to HP4, GP0 to GP3 and ACP (advanced coherency protocol); interrupts IRQ0 ... IRQ15]
Experimental setup
• Board: Xilinx ZC702
• OS: Linux kernel v4.0.0
• Power measurement: TI UCD92xx, PMBus controlled, 5 ms resolution, 7 rails
Parameters Extraction (1): CL2 write
[Figure: CL2write µBenchmark results versus transfer size (x-axis: size, 300 to 1000 bytes).
Power: one plot per power rail, Vcc_1V5_Ps (PS), VccPAux (PS) and VccPInt (PS), labelled rails A, B and C; y-axis [P].
Time: plot of timeIt; y-axis [S].
Energy: CL2write energy curve; x-axis Size [bytes], y-axis Energy [J].]
Parameters Extraction (2)
Energy curves
[Figure: energy versus transfer size (x-axis: Size [bytes], 300 to 1000; y-axis: Energy [J]) for CL1read, CL1read_burst, CL2read, CL2read_burst, CL1write, CL1write_burst, CL2write and CL2write_burst]
Linear fits [3]
Time [s]: f : x → t1·x + t0
Energy [J]: f : x → e1·x + e0
Benchmark          t1         t0          e1         e0
CL1 read           1.82e-08   -2.95e-08   1.52e-09   -2.45e-09
CL1 read burst     1.02e-08    6.68e-09   8.40e-10    5.50e-10
CL1 write          6.03e-08    3.12e-07   4.72e-09    2.44e-08
CL1 write burst    5.05e-08   -3.73e-07   3.73e-09   -2.75e-08
CL2 read           1.76e-08   -2.69e-08   1.51e-09   -2.30e-09
CL2 read burst     9.62e-09    3.19e-07   8.01e-10    2.66e-08
CL2 write          7.19e-08   -5.46e-09   1.70e-08   -1.29e-09
CL2 write burst    5.08e-08   -4.14e-07   3.71e-09   -3.02e-08
[3] Results over 22 benchmarks are available in the paper.
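Each row of the table is a straight-line fit of time or energy against transfer size. A minimal sketch of such a fit, using ordinary least squares in plain Python, is given below. The helper `fit_line` is hypothetical, and the sample points are synthetic: they are generated from the CL1 write coefficients of the table only to check that the fit recovers them, and are not measured data.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic samples built from the CL1 write row (e1, e0); NOT measurements.
E1, E0 = 4.72e-09, 2.44e-08
sizes = [300, 400, 500, 600, 700, 800, 900, 1000]
energies = [E1 * s + E0 for s in sizes]

e1, e0 = fit_line(sizes, energies)
print(e1, e0)

# Estimated energy of a 512-byte transfer on that channel, from the fit.
estimate = e1 * 512 + e0
print(estimate)
```

With real measurements the points scatter around the line, so the fitted e0 can come out slightly negative, as several rows of the table show.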
Validation on mutant applications (1)
Mutant application
• Abstract application automatically generated from pattern functions
• Randomly generates communication traffic
Mutant generation
• (n + 1) Rounds per application
• 3 workers per Round
• 12 Software patterns
• 6 Hardware patterns
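The generation rules above can be sketched as a small random generator. Everything named here is hypothetical: the pattern names, the `max_size` bound and the function `generate_mutant` are placeholders, since the slides do not list the actual pattern functions or the size range. The three slot names follow the slide figure (two software slots and one hardware slot).

```python
import random

# Placeholder names; the slides only give the counts (12 SW, 6 HW patterns).
SW_PATTERNS = [f"sw_pattern_{i}" for i in range(12)]
HW_PATTERNS = [f"hw_pattern_{i}" for i in range(6)]

# Three workers per round: two software slots and one hardware slot.
SLOTS = (
    ("SW slotA", SW_PATTERNS),
    ("SW slotB", SW_PATTERNS),
    ("HW slotA", HW_PATTERNS),
)


def generate_mutant(n, max_size=1000, seed=None):
    """Return n + 1 rounds; each round maps a slot to (pattern, size)."""
    rng = random.Random(seed)
    return [
        {
            slot: (rng.choice(patterns), rng.randint(1, max_size))
            for slot, patterns in SLOTS
        }
        for _ in range(n + 1)
    ]


mutant = generate_mutant(n=4, seed=0)
for round_index, workers in enumerate(mutant):
    print(round_index, workers)
```

Passing a seed makes a mutant reproducible, which helps when the same application has to be both measured on the board and fed to the estimator.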
[Figure: a mutant application as a sequence of rounds (Round 0, Round 1, ..., Round n). Each round runs three workers, SW slotA, SW slotB and HW slotA, each with a random size and pattern function.]
Validation on mutant applications (2)
Table 1: Communications spread over channels in two mutants
Channel names: Cache L1, Cache L2, DDR, HPx, ACP, GPx
Total bytes: 4.56e+07
read 6.6% | read 1.3% | read 1.3%
read 1.2% | read 0.6% | polling 3.5%
read burst 0.6% | read burst 5.5% | read burst 18.0%
write 6.8% | write 2.5% | write 1.1%
write 2.0% | write 0.4% | irq 0.5%
write burst 6.6% | write burst 0.0% | write burst 41.3%
Total bytes: 5.37e+07
read 6.7% | read 2.1% | read 4.7%
read 2.9% | read 4.8% | polling 2.0%
read burst 4.7% | read burst 0.2% | read burst 10.0%
write 5.0% | write 0.6% | write 0.0%
write 7.4% | write 2.0% | irq 0.5%
write burst 4.7% | write burst 6.8% | write burst 35.0%
Table 2: Estimation vs. measurements
Mutant                   Time [s]                Energy [J]              Error
                         measured   estimated    measured   estimated    time    energy
mutant 1                 2.308      2.311        2.949      2.943        0.1%    0.2%
mutant 2                 2.340      2.336        3.031      2.964        0.2%    2.2%
average on 80 mutants    2.974      2.975        3.855      3.861        0.5%    1.0%
Power estimation time for 80 mutants: 0.5 s
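The error columns of Table 2 are consistent with a plain relative error between estimated and measured values. The sketch below recomputes them for the two mutants listed; the function name `relative_error_percent` is hypothetical and the formula is inferred from the table rather than stated on the slide.

```python
def relative_error_percent(measured, estimated):
    """Absolute relative error of an estimate, in percent."""
    return abs(estimated - measured) / measured * 100.0


# (measured, estimated) pairs copied from Table 2.
rows = {
    "mutant 1": {"time": (2.308, 2.311), "energy": (2.949, 2.943)},
    "mutant 2": {"time": (2.340, 2.336), "energy": (3.031, 2.964)},
}

errors = {
    name: {key: round(relative_error_percent(*pair), 1) for key, pair in row.items()}
    for name, row in rows.items()
}
print(errors)
```

Applying the same formula to the averaged values of the last row gives much smaller figures than the 0.5% and 1.0% reported, which suggests that row averages the per-mutant errors rather than comparing the averages.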
Conclusion
Initial issue
Provide a very fast power modelling methodology for task mapping in Heterogeneous MpSoC
Proposals
• Generic model of Heterogeneous MpSoC
• Power modelling approach focused on communication channels
• µBenchmark approach that enables extraction of the architecture’s parameters
⇒ Estimation accuracy and time fit well with task mapping
Ongoing work
• Integrate this methodology into state-of-the-art compiler frameworks [4, 5]
• Towards HW/SW partitioning for HMpSoC under energy-efficiency constraints
[4] Floch et al., “GeCoS: A framework for prototyping custom hardware design flows”.
[5] Ceng et al., “MAPS: An Integrated Framework for MPSoC Application Parallelization”.
Thanks for your attention
Do you have any questions?
Backup slides
Zynq architecture
[Figure: Zynq block diagram (Processing System, Programmable Logic, interconnect ports and interrupt lines)]
Communication patterns

ChatGPT and the Future of Work - Clark Boyd ChatGPT and the Future of Work - Clark Boyd
ChatGPT and the Future of Work - Clark Boyd
 
Getting into the tech field. what next
Getting into the tech field. what next Getting into the tech field. what next
Getting into the tech field. what next
 
Google's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search IntentGoogle's Just Not That Into You: Understanding Core Updates & Search Intent
Google's Just Not That Into You: Understanding Core Updates & Search Intent
 
How to have difficult conversations
How to have difficult conversations How to have difficult conversations
How to have difficult conversations
 
Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Time Management & Productivity - Best Practices
Time Management & Productivity -  Best PracticesTime Management & Productivity -  Best Practices
Time Management & Productivity - Best Practices
 
The six step guide to practical project management
The six step guide to practical project managementThe six step guide to practical project management
The six step guide to practical project management
 
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
Beginners Guide to TikTok for Search - Rachel Pearson - We are Tilt __ Bright...
 

Slides McSoC

  • 1. MCSoC-16 Communication-Based Power Modelling for Heterogeneous Multiprocessor Architectures Baptiste Roux, Matthieu Gautier, Olivier Sentieys, Steven Derrien September 22, 2016 INRIA, DGA, IRISA, Univ-Rennes 1
  • 3-6. Embedded architectural trend
      Embedded market trend
        • Computation
        • Power consumption
  • 7-8. Embedded architectural trend
      Embedded market trend
        • Computation
        • Power consumption
      HMpSoC issues
        • Complex, hard-to-solve HW/SW partitioning and task mapping
        • Estimating power consumption early in the design flow is mandatory
        • Huge design space
      ⇒ Available power modelling tools [1, 2] are not well adapted to complex multicores
      ⇒ Need a fast power modelling tool
      [1] Li et al., “McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures”.
      [2] Binkert et al., “The Gem5 Simulator”.
  • 9. Outline
      Heterogeneous MpSoC
        • Definition of HMpSoC families
        • Generic representation focused on memory
      Communication-based power model
        • Fast power modelling approach for task mapping on HMpSoC
      Model parameter extraction
        • µBenchmarking methodology to ease extraction of the power model's parameters
      Validation on Xilinx Zynq
        • Power model parameter extraction with µBenchmarks
        • Power model output validation with mutant applications
  • 20-21. Heterogeneous MpSoC
      HMpSoC family
        • Distributed HMpSoC: small HW accelerators, fast communications with SW
        • Shared HMpSoC: large HW accelerators shared between clusters, slow communications with SW
      [Figure: block diagrams of a Distributed HMpSoC and a Shared HMpSoC, drawn with processor units (PU), SW and HW blocks, memory units, NoC interfaces (NoC_itf), GPIO and DDR, connected through a NoC.]
      Generic description
        • How can an architecture in those families be described precisely?
  • 22. Memory Tree Abstraction
      [Figure: memory tree organised in three classes (Network, Cluster, Core), with NoC sublevel0 and sublevel1, Memory sublevel0 and sublevel1, Sw Core 1 to Sw Core N, and Hw Cores.]
  • 24. Motivation
      HMpSoC energy consumption: three main sources
        • Dynamic energy consumed by computations
        • Static energy dissipated during execution time
        • Energy used for communications between cores
      Assumptions: a parallelisable application can be executed on multiple threads, which reduces its execution time but not its complexity
        • The amount of computation is independent of the chosen degree of parallelism
        • The amount of communication and synchronization is directly linked to the number of execution threads
  • 25. Power Model Structure (1)
      Communication energy cost
        • Communications are mapped into memory
        • $C(Tk_i, Tk_j)$: set of communication channels crossed from task $Tk_i$ to task $Tk_j$
        $E_{com}(Tk_i, Tk_j) = \sum_{c \in C(Tk_i, Tk_j)} \left( e_{0c} + e_{1c} \times bytes(Tk_i, Tk_j) \right)$
        Note: synchronization and IO events are handled as communications.
      Computation energy cost
        • $E_{comp}(Tk_k)$: computed once for each kind of available computational core.
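The communication cost defined on this slide can be illustrated with a minimal Python sketch. The `Channel` container and the `communication_energy` helper are hypothetical names introduced here, and the two-channel path is an arbitrary example rather than a path taken from the talk; the coefficients are the "CL1 write" and "CL2 read" energy values listed on the Parameters Extraction (2) slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    """Linear energy model of one communication channel (hypothetical container)."""
    name: str
    e0: float  # constant term of the energy fit [J]
    e1: float  # per-byte term of the energy fit [J/byte]


def communication_energy(channels, n_bytes):
    """E_com: sum of (e0 + e1 * bytes) over every channel crossed."""
    return sum(ch.e0 + ch.e1 * n_bytes for ch in channels)


# Assumed path for illustration only: a write through L1 followed by a read from L2.
path = [
    Channel("CL1 write", e0=2.44e-08, e1=4.72e-09),
    Channel("CL2 read", e0=-2.30e-09, e1=1.51e-09),
]
energy_1000_bytes = communication_energy(path, 1000)
print(f"E_com for 1000 bytes: {energy_1000_bytes:.4e} J")
```

Because the model is linear in the number of bytes, the cost of a path is obtained by adding the constant terms and the per-byte terms of the channels it crosses.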
  • 26-30. Power Model Structure (1)
      [Figure: application graph mapped step by step onto a two-cluster memory tree. Graph labels: "hardware suitable" blocks, Load, Store and basic-block instruction nodes for SW A, HW A, SW B and HW B, with weights (42), (34), (10), (30), (95) and (17). Tree labels: NoC (Network class); Cluster A and Cluster B with CLUSTER MEMORY (Cluster class); SW MEMORY, HW A, SW A1, SW B and HW B (Core class).]
  • 32-33. Power Model Structure (2)
      Static energy cost
        $E_{stat} = T_{exec} \times P_{stat}$
        where $P_{stat}$ is the static power and $T_{exec}$ is the length of the critical path in the mapping graph weighted with computations and communications.
      Global energy cost
        $E_t = E_{stat} + \sum_{k \in N_{Tk}} E_{comp}(Tk_k) + \sum_{(i,j) \in N_{Tk}^2} E_{com}(Tk_i, Tk_j)$
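As a rough illustration of the global energy cost, the sketch below computes the critical path of a small task graph and then adds the static, computation and communication terms. The function names, the three-task graph and every numeric value are assumptions made for the example, not data from the talk; `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_path(comp_time, comm_time):
    """Longest path in a task DAG.

    comp_time maps task -> computation time [s];
    comm_time maps (src, dst) -> communication time [s].
    """
    preds = {task: set() for task in comp_time}
    for src, dst in comm_time:
        preds[dst].add(src)
    finish = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(preds).static_order():
        start = max(
            (finish[p] + comm_time[(p, task)] for p in preds[task]),
            default=0.0,
        )
        finish[task] = start + comp_time[task]
    return max(finish.values())


def total_energy(p_stat, comp_time, comm_time, e_comp, e_com):
    """E_t = T_exec * P_stat + sum of E_comp + sum of E_com."""
    e_stat = critical_path(comp_time, comm_time) * p_stat
    return e_stat + sum(e_comp.values()) + sum(e_com.values())


# Illustrative values only (hypothetical three-task mapping).
comp_time = {"A": 1.0, "B": 2.0, "C": 1.0}
comm_time = {("A", "B"): 0.5, ("A", "C"): 0.25, ("B", "C"): 0.5}
e_comp = {"A": 0.25, "B": 0.5, "C": 0.25}
e_com = {("A", "B"): 0.125, ("A", "C"): 0.0625, ("B", "C"): 0.125}

t_exec = critical_path(comp_time, comm_time)
e_total = total_energy(0.5, comp_time, comm_time, e_comp, e_com)
print(t_exec, e_total)
```

Only the static term depends on the shape of the mapping graph; the computation and communication terms are plain sums, which is what keeps the estimate cheap to evaluate for many candidate mappings.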
  • 35. µBenchmarks purpose
      Definition
        A µBenchmark is a simple, synthetic application that stresses one specific part of the execution architecture.
      Properties
        • Selectivity: each µBenchmark stresses only one specific communication channel
        • Intensity variability: a µBenchmark stresses its communication channel at different intensities
        • Duration variability: the µBenchmark duration is adapted to match the timing resolution of the power measurement
  • 36-40. µBenchmark structure
      General structure
        • InterCluster
        • IntraCluster
        • HwChannel
        • SwChannel
      Algorithm 1: Generic µBenchmark structure.
        Data: scaleFactor, size
        initBenchmarkEnv()
        startPowerMeasure()
        for iteration in scaleFactor do
          openCommunicationChannel()
          producer = spawnProducerThread(size)
          consumer = spawnConsumerThread(size)
          waitThread(producer, consumer)
          closeCommunicationChannel()
        end
        stopPowerMeasure()
        writePowerMeasureToFile()
      [Figure: memory tree abstraction (Network, Cluster and Core classes); successive builds highlight a different part of the tree.]
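The control flow of Algorithm 1 can be mimicked in a few lines of Python. This is only a structural sketch under strong assumptions: a `queue.Queue` stands in for the communication channel, a wall-clock timer stands in for the power probe, and `run_microbenchmark` is a hypothetical name. It does not stress any particular hardware channel the way the real µBenchmarks do.

```python
import queue
import threading
import time


def run_microbenchmark(scale_factor, size):
    """Mirror the loop of Algorithm 1; returns (items received, elapsed seconds)."""
    start = time.perf_counter()  # stand-in for startPowerMeasure()
    received = 0
    for _ in range(scale_factor):
        channel = queue.Queue(maxsize=64)  # stand-in for openCommunicationChannel()
        counts = []

        def producer():
            for _ in range(size):
                channel.put(b"\x00")
            channel.put(None)  # end-of-stream marker

        def consumer():
            n = 0
            while channel.get() is not None:
                n += 1
            counts.append(n)

        threads = [
            threading.Thread(target=producer),
            threading.Thread(target=consumer),
        ]
        for thread in threads:
            thread.start()
        for thread in threads:
            thread.join()  # waitThread(producer, consumer)
        received += counts[0]
        # Nothing to close for an in-process queue (closeCommunicationChannel()).
    elapsed = time.perf_counter() - start  # stand-in for stopPowerMeasure()
    return received, elapsed


items, seconds = run_microbenchmark(scale_factor=3, size=100)
print(items, seconds)
```

In this sketch `size` sets the traffic per iteration (intensity) and `scale_factor` stretches the run, which is how the duration can be matched to the resolution of the measurement.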
  • 42. Experimental infrastructure
      Zynq architecture
        [Figure: Zynq block diagram within a virtual memory space. Processing System: PS7_0 and PS7_1 with L1 caches, SCU, L2 cache, AMBA interconnect, DDR. Programmable Logic: ext P0 to ext Pn. Interfaces between the two: HP1 to HP4, GP0 to GP3, ACP, IRQ0 to IRQ15.]
      Experimental setup
        • Board: Xilinx ZC702
        • OS: Linux kernel v4.0.0
        • Power measurement: TI UCD92xx, PMBus controlled, 5 ms resolution, 7 rails
  • 43-45. Parameters Extraction (1): CL2 write
      [Figure: CL2write benchmark plotted against transfer size (300 to 1000). Power panels for three PS rails: Vcc_1V5_Ps, VccPAux and VccPInt. Time panel: timeIt [s]. Energy panel: energy [J] versus size [bytes].]
  • 46-47. Parameters Extraction (2)
      [Figure: energy curves, energy [J] versus size [bytes] from 300 to 1000, for CL1read, CL1read_burst, CL2read, CL2read_burst, CL1write, CL1write_burst, CL2write and CL2write_burst.]
      Linear fits per benchmark [3]: time [s] as $f: x \to t_1 x + t_0$, energy [J] as $f: x \to e_1 x + e_0$
        Benchmark          t1          t0           e1          e0
        CL1 read           1.82e-08    -2.95e-08    1.52e-09    -2.45e-09
        CL1 read burst     1.02e-08    6.68e-09     8.40e-10    5.50e-10
        CL1 write          6.03e-08    3.12e-07     4.72e-09    2.44e-08
        CL1 write burst    5.05e-08    -3.73e-07    3.73e-09    -2.75e-08
        CL2 read           1.76e-08    -2.69e-08    1.51e-09    -2.30e-09
        CL2 read burst     9.62e-09    3.19e-07     8.01e-10    2.66e-08
        CL2 write          7.19e-08    -5.46e-09    1.70e-08    -1.29e-09
        CL2 write burst    5.08e-08    -4.14e-07    3.71e-09    -3.02e-08
      [3] Results over 22 benchmarks are available in the paper.
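The coefficients in this table are those of straight-line fits of time and energy against transfer size. A minimal sketch of such a fit, using ordinary least squares in plain Python, is shown below. The talk does not say which fitting procedure was used, so least squares is an assumption, `fit_line` is a hypothetical helper, and the data points are synthetic values generated from the "CL1 read" energy coefficients rather than measurements.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic energy points built from the "CL1 read" row (not measured data).
E1_CL1_READ = 1.52e-09   # [J/byte]
E0_CL1_READ = -2.45e-09  # [J]
sizes = list(range(300, 1001, 100))  # same size range as the plots
energies = [E1_CL1_READ * s + E0_CL1_READ for s in sizes]

e1, e0 = fit_line(sizes, energies)
print(f"e1 = {e1:.3e} J/byte, e0 = {e0:.3e} J")
```

With real measurements the points would scatter around the line, and the fitted `e1` and `e0` would then serve as the per-channel parameters of the communication energy model.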
  • 48. Validation on mutant applications (1)
      Mutant application
        • Abstract application automatically generated from pattern functions
        • Randomly generates communication traffic
      Mutant generation
        • (n + 1) rounds per application
        • 3 workers per round
        • 12 software patterns
        • 6 hardware patterns
      [Figure: rounds 0 to n; in each round, SW slotA, SW slotB and HW slotA each run a pattern function with a random size and pattern.]
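The generation scheme on this slide (n + 1 rounds, three workers per round, 12 software and 6 hardware patterns) can be sketched as follows. The pattern names, the `generate_mutant` helper and the size range are placeholders invented for the example; the slide only states that size and pattern are drawn at random.

```python
import random

# Placeholder pattern names; the actual pattern functions are not listed on the slide.
SW_PATTERNS = [f"sw_pattern_{i}" for i in range(12)]
HW_PATTERNS = [f"hw_pattern_{i}" for i in range(6)]

# Three workers per round, as drawn on the slide.
SLOTS = (
    ("SW slotA", SW_PATTERNS),
    ("SW slotB", SW_PATTERNS),
    ("HW slotA", HW_PATTERNS),
)


def generate_mutant(n, rng, max_size=1000):
    """Return n + 1 rounds; each worker gets a random pattern and size.

    max_size is an assumed upper bound, not a value from the talk.
    """
    rounds = []
    for _ in range(n + 1):
        rounds.append([
            {
                "slot": slot,
                "pattern": rng.choice(patterns),
                "size": rng.randint(1, max_size),
            }
            for slot, patterns in SLOTS
        ])
    return rounds


mutant = generate_mutant(n=4, rng=random.Random(0))
for index, workers in enumerate(mutant):
    print(index, [(w["slot"], w["pattern"], w["size"]) for w in workers])
```

Passing a seeded `random.Random` instance keeps each generated mutant reproducible, which is convenient when the same application has to be both measured and estimated.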
  • 49. Validation on mutant applications (2)
      Table 1: Communications spread over channels in two mutants
        Mutant with 4.56e+07 total bytes
          Operation      Cache L1   Cache L2   DDR     HPx     ACP     GPx
          read           6.6%       1.3%       1.3%    1.2%    0.6%    polling 3.5%
          read burst     0.6%, 5.5%, 18.0%
          write          6.8%       2.5%       1.1%    2.0%    0.4%    irq 0.5%
          write burst    6.6%, 0.0%, 41.3%
        Mutant with 5.37e+07 total bytes
          Operation      Cache L1   Cache L2   DDR     HPx     ACP     GPx
          read           6.7%       2.1%       4.7%    2.9%    4.8%    polling 2.0%
          read burst     4.7%, 0.2%, 10.0%
          write          5.0%       0.6%       0.0%    7.4%    2.0%    irq 0.5%
          write burst    4.7%, 6.8%, 35.0%
      Table 2: Estimation vs. measures
        Mutant rank              Time [s]                Energy [J]              Error
                                 measured   estimated    measured   estimated    time    energy
        mutant 1                 2.308      2.311        2.949      2.943        0.1%    0.2%
        mutant 2                 2.340      2.336        3.031      2.964        0.2%    2.2%
        average on 80 mutants    2.974      2.975        3.855      3.861        0.5%    1.0%
      Power estimation time for 80 mutants: 0.5 s
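The error columns of Table 2 for mutant 1 and mutant 2 are consistent with a relative error taken against the measured value. The short check below recomputes them; the `relative_error` helper and the dictionary layout are introduced here for illustration, and the formula itself is an assumption, since the slide does not define the error.

```python
def relative_error(measured, estimated):
    """Absolute estimation error relative to the measured value."""
    return abs(estimated - measured) / measured


# (measured, estimated) pairs copied from Table 2.
table2 = {
    "mutant 1": {"time": (2.308, 2.311), "energy": (2.949, 2.943)},
    "mutant 2": {"time": (2.340, 2.336), "energy": (3.031, 2.964)},
}

errors = {
    name: {
        metric: round(100 * relative_error(*pair), 1)
        for metric, pair in metrics.items()
    }
    for name, metrics in table2.items()
}
for name, values in errors.items():
    print(name, values)
```

The "average on 80 mutants" row cannot be reproduced this way from its own columns: its error figures appear to be averages of the per-mutant errors rather than the error between the averaged values.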
  • 50-51. Conclusion
      Initial issue
        Provide a very fast power modelling methodology for task mapping in Heterogeneous MpSoC
      Proposals
        • Generic model of Heterogeneous MpSoC
        • Power modelling approach focused on communication channels
        • µBenchmark approach that enables extraction of the architecture's parameters
      ⇒ Estimation accuracy and time fit well with task mapping
      Ongoing work
        • Integrate this methodology into state-of-the-art compiler frameworks [4, 5]
        • Towards HW/SW partitioning for HMpSoC under an energy efficiency constraint
      [4] Floch et al., “GeCoS: A framework for prototyping custom hardware design flows”.
      [5] Ceng et al., “MAPS: An Integrated Framework for MPSoC Application Parallelization”.
  • 52. Thanks for your attention. Do you have any questions?
  • 54. Zynq architecture
      [Figure: Zynq block diagram within a virtual memory space. Processing System: PS7_0 and PS7_1 with L1 caches, SCU, L2 cache, AMBA interconnect, DDR. Programmable Logic: ext P0 to ext Pn. Interfaces between the two: HP1 to HP4, GP0 to GP3, ACP, IRQ0 to IRQ15.]