1
ALEA: Fine-grain Energy Profiling with Basic Block Sampling
Lev Mukhanov, Dimitrios S. Nikolopoulos and Bronis R. de Supinski
Queen’s University of Belfast
PACT 2015
2
Executive summary
Fine-grain energy profiling is essential for energy
optimization
Contribution:
A probabilistic approach and a tool (ALEA) for fine-grain energy profiling
3
Outline
Introduction
Probabilistic approach and ALEA
implementation
Validation process
Experiments and use cases
4
Energy optimization challenge
[Figure: number of lines (millions) and number of commits over 12 months (thousands), with the values 18 and 68 shown. Energy efficient?]
5
How do workloads affect energy? (kmeans)
Block        Time
I/O blocks   10.2 s
Block 1      0.3 s
Block 2      10 s
6
How do workloads affect energy? (kmeans)
Block        Time     Energy
I/O blocks   10.2 s   83 J
Block 1      0.3 s    2 J
Block 2      10 s     246 J
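The numbers on this slide imply very different average power per block. A minimal sketch of that arithmetic follows; it assumes the three times and three energies pair up in the order listed (I/O blocks, Block 1, Block 2), which is an assumption about the figure's layout, and the names `blocks` and `average_power` are illustrative.

```python
# Times (s) and energies (J) from the kmeans slide.
# Pairing them by listing order is an assumption about the original figure.
blocks = {
    "I/O blocks": (10.2, 83.0),
    "Block 1": (0.3, 2.0),
    "Block 2": (10.0, 246.0),
}

def average_power(time_s, energy_j):
    """Average power in watts over an interval: P = E / t."""
    return energy_j / time_s

avg_power = {name: average_power(t, e) for name, (t, e) in blocks.items()}

for name, watts in avg_power.items():
    print(f"{name}: {watts:.1f} W")
```

Under that pairing, Block 2 runs about as long as the I/O blocks but draws roughly three times the average power, which a time-only profile would not reveal.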
7
Fine-grain energy profiling challenges
Coarse-grained power/energy meters
Any measurement biases the real energy consumption
Overhead introduced by measurements is critical
9
State of the art approaches
Manual instrumentation [PowerPack, R. Ge 2010]
 Low overhead
 Coarse-grain
 What code should be instrumented?
 Source code must be modified
Binary instrumentation
 Fine-grain
 High overhead (PIN: more than 300%)
HPM (Hardware Performance Monitors) [R. Bertran 2013]
EPI (Energy Per Instruction) models [Y. S. Shao 2013]
 Low overhead
 Do not capture the dynamic execution context
 Low accuracy
Sampling [PowerScope, J. Flinn 1999]
 Low overhead
 Is it fine-grain?
10
Performance profiling based on Sampling
Performance profiling model: the period between samples is associated with the sampled object ⇒ coarse-grain
Probabilistic model
11
Probabilistic model
Probability that basic block $bb_m$ is sampled:
\[ p_{bb_m} = P(X_{bb_m} = 1) = \frac{C^1_{t_{bb_m}}}{C^1_{t_{exec}}} = \frac{\sum_{j=1}^{k} latency^j_{bb_m}}{t_{exec}} = \frac{t_{bb_m}}{t_{exec}} \]
Time estimate:
\[ \hat{p}_{bb_m} = \frac{n_{bb_m}}{n} \]
\[ \hat{t}_{bb_m} = \hat{p}_{bb_m} \cdot t_{exec} = \frac{n_{bb_m} \cdot t_{exec}}{n} \]
Power estimate (normal distribution):
\[ \widehat{pow}_{bb_m} = \frac{1}{n_{bb_m}} \cdot \sum_{i=1}^{n_{bb_m}} pow^i_{bb_m} \]
\[ s = \sqrt{\frac{1}{n_{bb_m} - 1} \cdot \sum_{i=1}^{n_{bb_m}} \left( pow^i_{bb_m} - \widehat{pow}_{bb_m} \right)^2} \]
95 % confidence interval:
\[ \widehat{pow}^u_{bb_m} = \widehat{pow}_{bb_m} + z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}} \]
\[ \widehat{pow}^l_{bb_m} = \widehat{pow}_{bb_m} - z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}} \]
Energy estimate:
\[ \hat{e}_{bb_m} = \widehat{pow}_{bb_m} \cdot \hat{t}_{bb_m} \]
12
Probabilistic model
Execution time of a block:
\[ time_{block} = p_{block} \cdot time_{application} \qquad (1) \]
Estimation of $\hat{p}_{block}$ using sampling
Estimation of execution time:
\[ \widehat{time}_{block} = \hat{p}_{block} \cdot time_{application} \qquad (2) \]
Power measurements and a sample are taken simultaneously to estimate $\widehat{power}_{block}$
Estimation of energy consumption:
\[ \widehat{energy}_{block} = \widehat{power}_{block} \cdot \widehat{time}_{block} \qquad (3) \]
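Equations (1) to (3) can be sketched in a few lines. This is only an illustration under stated assumptions, not ALEA's code: the function name `estimate_blocks`, the `(block_id, power)` sample layout and the sample values are all hypothetical.

```python
from collections import defaultdict

def estimate_blocks(samples, t_exec):
    """Estimate time, mean power and energy per block.

    samples: list of (block_id, power_watts) pairs, one per sample.
    t_exec: total execution time of the application in seconds.
    Returns {block_id: (time_s, mean_power_w, energy_j)}.
    """
    powers = defaultdict(list)
    for block_id, power in samples:
        powers[block_id].append(power)
    n = len(samples)
    estimates = {}
    for block_id, readings in powers.items():
        p_hat = len(readings) / n                   # share of samples that hit the block
        time_hat = p_hat * t_exec                   # equation (2)
        power_hat = sum(readings) / len(readings)   # mean of the sampled power readings
        estimates[block_id] = (time_hat, power_hat, power_hat * time_hat)  # equation (3)
    return estimates

# Hypothetical samples: 3 of 4 hit "bb1", 1 hits "bb2".
samples = [("bb1", 20.0), ("bb1", 22.0), ("bb2", 10.0), ("bb1", 24.0)]
result = estimate_blocks(samples, t_exec=8.0)
```

With these made-up samples, "bb1" is attributed three quarters of the 8-second run, and its energy is that time multiplied by the mean of its three power readings.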
13
Random sampling ≈ Systematic sampling
[Figure: two timelines of application execution in ticks. Random sampling: power and the current block are sampled at randomly chosen ticks. Systematic sampling: samples are taken at a fixed interval of 1000 ticks (for example ticks 23 and 1023), starting from a random tick.]
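The difference between the two schemes can be sketched as follows. The function names, the seed, and the use of 5990 ticks with a period of 1000 are illustrative choices taken loosely from the figure, not ALEA's implementation.

```python
import random

def random_sample_ticks(total_ticks, n_samples, rng):
    """Simple random sampling: n distinct ticks drawn uniformly from the run."""
    return sorted(rng.sample(range(1, total_ticks + 1), n_samples))

def systematic_sample_ticks(total_ticks, period, rng):
    """Systematic sampling: random first tick, then one sample every `period` ticks."""
    start = rng.randint(1, period)
    return list(range(start, total_ticks + 1, period))

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
random_ticks = random_sample_ticks(5990, 5, rng)
systematic_ticks = systematic_sample_ticks(5990, 1000, rng)
```

Systematic sampling needs only a periodic timer, which is why it is the practical stand-in for true random sampling here.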
14
Parallel profiling challenge
POWER
15
Parallel profiling challenge
POWER
?
16
Profiling of parallel applications
How to apportion power/energy between threads?
Basic block vector (BBV) $bb_m$:
\[ bb_m = (bb_{thread_1}, bb_{thread_2}, \ldots, bb_{thread_l}) \qquad (4) \]
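One way to read the BBV idea: each sample records which basic block every thread is executing, and the whole tuple is treated as the sampled object. A minimal sketch, assuming hypothetical block names and a helper `bbv_time_estimates` that is not part of ALEA:

```python
from collections import Counter

def bbv_time_estimates(snapshots, t_exec):
    """Estimate the time spent in each basic block vector.

    snapshots: one entry per sample; each is the tuple of basic blocks
    that the threads were executing when the sample was taken.
    Returns {bbv: estimated seconds}, computed as n_bbv / n * t_exec.
    """
    counts = Counter(snapshots)
    n = len(snapshots)
    return {bbv: count / n * t_exec for bbv, count in counts.items()}

# Hypothetical two-thread run: each tuple is (block of thread 1, block of thread 2).
snapshots = [("bb3", "bb7"), ("bb3", "bb7"), ("bb3", "bb9"), ("bb5", "bb7")]
times = bbv_time_estimates(snapshots, t_exec=2.0)
```

Power is then attributed to the combination of blocks that ran together rather than split between individual threads.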
17
Implementation
[Figure: ALEA samples the threads of the application (Thread 1, Thread 2, ..., Thread N) and reads power from RAPL or INA231 sensors]
DWARF is used to assign energy estimates to source code
Architecture-independent implementation (portable)
Low overhead (∼ 1%), suitable for on-line profiling
18
Sampling period and accuracy of the estimates
Accuracy ∼ the number of samples
[Figure: estimated vs. measured values against the number of samples (0 to 10000). Time panel: execution time (s), with the random error marked. Energy panel: energy (J), with the confidence interval marked.]
Sampling period ⇒ Accuracy
19
Sampling period and accuracy of the estimates
Sampling incurs overhead - bias of the estimates
↓ sampling period ↓ random error ↑ overhead
↑ sampling period ↑ random error ↓ overhead
sampling period ↓↑?
[Figure: overhead (%) and error (%) versus sampling period (1 to 100 ms) for sequential and parallel runs, one panel for Sandy Bridge and one for Exynos. On both platforms the optimal sampling period is marked at 10 ms, with overhead ∼ 1%.]
20
Validation
14 benchmarks (SPEC 2000, Parsec, Rodinia, SPEC OMP)
Direct instrumentation
81% coverage
Average error of the energy estimates:
                    Sandy Bridge   Exynos
all blocks          1.4 %          2.6 %
fine-grain blocks   1.6 %          3.7 %
parallel blocks     3.1 %          3.6 %
all benchmarks      1.4 %          1.9 %
21
Effect of cache instructions and pipelining
[Figure: power (W) of the Arithmetic, Cache and Original blocks on Sandy Bridge and Exynos; energy (J) and time (s) of the Arithmetic, Cache, Original and EPI-Original cases on each platform, with 50% annotated on the Sandy Bridge energy panel and 29% on the Exynos energy panel.]
Pipelining hides the latency (⇒ energy) of cache accesses
EPI models could lead to significant errors
22
Use cases
kmeans/Sandy Bridge
profiling: 50 % of the total energy is spent in one block (Euclidean distance)
optimization strategy: align and restrict pointers, force loop unrolling
results: 7x energy reduction
ocean_cp/Exynos
profiling: more than 50% of energy is spent in 6 blocks
optimization strategy: disable the predictive commoning optimization
results: 10 % power reduction
raytrace/Exynos
profiling: 50% of the total energy is spent in 2 blocks (SphPeIntersect)
optimization strategy: remove redundant memory accesses and indirect addressing instructions
results: 6 % energy reduction
23
Conclusion
The proposed probabilistic approach and ALEA provide:
low overhead (∼ 1 %, on-line profiling)
accurate estimates (Intel ∼ 1.4 %, ARM ∼ 2.6 %)
estimates at the fine-grain level
an architecture-independent approach
ALEA can be applied effectively to optimize energy and power consumption
Future work:
improve accuracy of the estimates
port to new architectures (GPUs and Intel Xeon Phi)
profiling of VMs
23
Thank you
This research has been supported by the UK EPSRC and by the EC FP7
24
BackUp
25
Probabilistic model
Random sampling is approximated by systematic sampling
Power and a basic block are sampled simultaneously
For each block, time, energy and power estimates are provided
For each estimate, a confidence interval is provided
See the paper for more details
27
Use cases. Sandy Bridge
56 % of time is spent in one block (Euclidean distance)
Problems: loop unrolling and auto-vectorization are not applied
Optimization strategy: align and restrict pointers, force loop unrolling
Results: 7x energy decrease
[Figure: time (s), power (W) and energy (100 J) of the basic block versus number of threads, compiled with -O3 and with -O3 + hints. The cache sharing effect is marked, and the energy panel carries the annotation 221 Joules (697 %).]
28
Validation.Sandy Bridge results
[Figure: average error (%) of the time estimates and energy estimates for the 1st basic block (more samples) and the 2nd basic block (fewer samples) of each benchmark. Sequential: fft, ocean_cp, ocean_ncp, radix, art, ammp, quake, cfd, heartwall, streamcluster. Parallel: cfd, streamcluster, ammp, quake. Sequential and parallel averages are also shown.]
29
Impact of Memory instructions
[Figure: power (W) of the Nop, Arithm, Mem and Original block variants on Sandy Bridge and on Exynos, ordered by cache access intensity.]
CPU power is primarily affected by cache accesses
30
Probability to sample a basic block
Basic block execution
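Introduce $X_{bb_m}$ associated with each tick:
\[ X_{bb_m} = \begin{cases} 1, & \text{if } bb_m \text{ is the sampled basic block} \\ 0, & \text{otherwise} \end{cases} \qquad (5) \]
Take one random sample. The probability that $bb_m$ is sampled:
\[ p_{bb_m} = P(X_{bb_m} = 1) = \frac{C^1_{t_{bb_m}}}{C^1_{t_{exec}}} = \frac{\sum_{j=1}^{k} latency^j_{bb_m}}{t_{exec}} = \frac{t_{bb_m}}{t_{exec}} \qquad (6) \]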
31
Execution time estimates
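Take samples several times (random sampling)
$X_{bb_m}$ is random and follows the Bernoulli distribution
Estimate $p_{bb_m}$ using the maximum likelihood estimator of the parameter $p_{bb_m}$ of the Bernoulli distribution for $X_{bb_m}$, where $n$ is the total number of samples and $n_{bb_m}$ is the number of samples in which $bb_m$ was sampled:
\[ \hat{p}_{bb_m} = \frac{n_{bb_m}}{n} \qquad (7) \]
$t_{bb_m}$ is estimated as
\[ \hat{t}_{bb_m} = \hat{p}_{bb_m} \cdot t_{exec} = \frac{n_{bb_m} \cdot t_{exec}}{n} \qquad (8) \]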
32
Power and Energy estimates
The same probabilistic approach
Power consumption is a random variable (normal distribution)
A realization of the variable is associated with each tick
The mean power consumption of $bb_m$:
\[ \widehat{pow}_{bb_m} = \frac{1}{n_{bb_m}} \cdot \sum_{i=1}^{n_{bb_m}} pow^i_{bb_m} \qquad (9) \]
Energy consumption of $bb_m$:
\[ \hat{e}_{bb_m} = \widehat{pow}_{bb_m} \cdot \hat{t}_{bb_m} \qquad (10) \]
33
Quality of time estimates
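Confidence interval for $p_{bb_m}$
\[ \hat{p}^u_{bb_m} = \hat{p}_{bb_m} + z_{\alpha/2} \sqrt{\frac{1}{n} \cdot \hat{p}_{bb_m} \cdot (1 - \hat{p}_{bb_m})} \qquad (11) \]
\[ \hat{p}^l_{bb_m} = \hat{p}_{bb_m} - z_{\alpha/2} \sqrt{\frac{1}{n} \cdot \hat{p}_{bb_m} \cdot (1 - \hat{p}_{bb_m})} \qquad (12) \]
\[ \hat{p}^l_{bb_m} \le p_{bb_m} \le \hat{p}^u_{bb_m} \qquad (13) \]
Confidence interval for $t_{bb_m}$
\[ \hat{p}^l_{bb_m} \cdot t_{exec} \le t_{bb_m} \le \hat{p}^u_{bb_m} \cdot t_{exec} \qquad (14) \]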
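The interval in (11) to (14) has the form of the usual normal-approximation interval for a Bernoulli parameter. A minimal sketch under that reading: the function name and inputs are hypothetical, 1.96 is the standard $z_{\alpha/2}$ for a 95 % level, and clamping the bounds to [0, 1] is an addition that the slides do not mention.

```python
import math

Z_95 = 1.96  # z_{alpha/2} for a 95% confidence level

def time_confidence_interval(n_bb, n, t_exec, z=Z_95):
    """Bounds on a block's execution time from n_bb hits out of n samples."""
    p_hat = n_bb / n
    half_width = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    p_low = max(0.0, p_hat - half_width)  # clamping is an addition, not from the slides
    p_up = min(1.0, p_hat + half_width)
    return p_low * t_exec, p_up * t_exec

# Same hit ratio, 100 times more samples: the interval is 10 times narrower.
low_100, up_100 = time_confidence_interval(n_bb=50, n=100, t_exec=10.0)
low_10000, up_10000 = time_confidence_interval(n_bb=5000, n=10000, t_exec=10.0)
```

The half-width shrinks with the square root of the number of samples, which is the "accuracy ∼ number of samples" relation from the main slides.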
34
Bounds and Confidence.Energy
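We can similarly build a confidence interval for power
\[ \widehat{pow}^u_{bb_m} = \widehat{pow}_{bb_m} + z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}} \qquad (15) \]
\[ \widehat{pow}^l_{bb_m} = \widehat{pow}_{bb_m} - z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}} \qquad (16) \]
\[ s = \sqrt{\frac{1}{n_{bb_m} - 1} \cdot \sum_{i=1}^{n_{bb_m}} \left( pow^i_{bb_m} - \widehat{pow}_{bb_m} \right)^2} \qquad (17) \]
\[ \widehat{pow}^l_{bb_m} \le pow_{bb_m} \le \widehat{pow}^u_{bb_m} \qquad (18) \]
Confidence interval for energy consumption
\[ \hat{p}^l_{bb_m} \cdot t_{exec} \cdot \widehat{pow}^l_{bb_m} \le e_{bb_m} \le \hat{p}^u_{bb_m} \cdot t_{exec} \cdot \widehat{pow}^u_{bb_m} \qquad (19) \]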
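Equations (15) to (19) can be sketched as below. The readings, the bounds 0.4 and 0.6 on $p_{bb_m}$, and the function names are made up for illustration; `s` is taken to be the sample standard deviation, and 1.96 is the standard $z_{\alpha/2}$ for a 95 % level.

```python
import math
import statistics

def power_confidence_interval(readings, z=1.96):
    """Mean power and its bounds from the readings sampled in one block."""
    n_bb = len(readings)
    mean = statistics.fmean(readings)
    s = statistics.stdev(readings)  # sample standard deviation (divides by n_bb - 1)
    half_width = z * s / math.sqrt(n_bb)
    return mean - half_width, mean, mean + half_width

def energy_bounds(p_low, p_up, pow_low, pow_up, t_exec):
    """Energy interval as in equation (19): product of the time and power bounds."""
    return p_low * t_exec * pow_low, p_up * t_exec * pow_up

# Hypothetical power readings (W) for one block and hypothetical bounds on p.
pow_low, pow_mean, pow_up = power_confidence_interval([20.0, 22.0, 24.0, 22.0])
e_low, e_up = energy_bounds(0.4, 0.6, pow_low, pow_up, t_exec=10.0)
```

Because the energy bounds multiply the lower bounds together and the upper bounds together, the resulting interval is conservative.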
35
Parallel applications
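Basic block vector (BBV) $bb_m$
\[ bb_m = (bb_{thread_1}, bb_{thread_2}, \ldots, bb_{thread_l}) \qquad (20) \]
\[ \hat{t}_{bb_m} = \hat{p}_{bb_m} \cdot t_{exec} = \frac{n_{bb_m} \cdot t_{exec}}{n} \qquad (21) \]
\[ \widehat{pow}_{bb_m} = \frac{1}{n_{bb_m}} \cdot \sum_{i=1}^{n_{bb_m}} pow^i_{bb_m} \qquad (22) \]
\[ \hat{e}_{bb_m} = \widehat{pow}_{bb_m} \cdot \hat{t}_{bb_m} \qquad (23) \]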
36
Experiments. Impact of Memory instruction
How to optimize energy consumption?
Performance vs. power optimization
How to decrease power consumption? What affects power consumption?
Block           Description
Basic block A   Copy of BBA
Mem             Only memory access instructions of BBA
NoMem           Only arithmetic/logic instructions of BBA
Mem(L2)         Mem block with the size of accessed data limited to 2MB (L2 cache size on Exynos)
Mem(L1)         Mem block with the size of accessed data limited to 2KB (L1 cache size on Exynos)
Mem(load)       Mem block with load instructions only
Mem(store)      Mem block with store instructions only
Mem(L2,load)    Mem(L2) block with loads only
Mem(L2,store)   Mem(L2) block with stores only
Mem(L1,load)    Mem(L1) block with loads only
Mem(L1,store)   Mem(L1) block with stores only
37
Use case (Exynos). Power optimization. ocean_cp
more than 50% of the total execution time is spent in 6 basic blocks
optimization strategy: remove redundant cache accesses
disable prefetch and the predictive commoning optimization (up to 14 % power decrease)
a different strategy should be applied to each basic block
DVFS could also be applied
                        Baseline               Energy-optimal
Basic block             Time (s)  Energy (J)   Time (s)  Energy (J)  Threads     Frequency        Manual optimization
bb1, jacobcalc2.C:301   2.03      8.48         1.87      6.03        4           1500 MHz         No
bb2, slave2.C:641       1.54      6.70         1.31      4.16        2           1600 MHz         Yes
bb3, laplacalc.C:83     2.02      9.53         2.55      7.98        2           1500 MHz         No
bb4, multi.C:253        2.17      7.22         2.62      6.52        2           1500 MHz         No
bb5, multi.C:235        2.36      7.88         3.29      5.56        1           1500 MHz         No
bb6, multi.C:290        2.67      9.23         3.23      5.46        1           1500 MHz         No
program                 29.93     108.64       26.88     72.84       2.0 (avg.)  1516 MHz (avg.)  Yes
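The relative savings implied by the table can be worked out directly from its baseline and energy-optimal energy columns. The sketch below only restates those numbers; the names `energy` and `saving_percent` are illustrative.

```python
# (baseline energy in J, energy-optimal energy in J) for each row of the table.
energy = {
    "bb1": (8.48, 6.03),
    "bb2": (6.70, 4.16),
    "bb3": (9.53, 7.98),
    "bb4": (7.22, 6.52),
    "bb5": (7.88, 5.56),
    "bb6": (9.23, 5.46),
    "program": (108.64, 72.84),
}

def saving_percent(baseline, optimized):
    """Relative energy saving in percent."""
    return 100.0 * (baseline - optimized) / baseline

savings = {name: saving_percent(b, o) for name, (b, o) in energy.items()}

for name, pct in savings.items():
    print(f"{name}: {pct:.1f} % less energy")
```

For the whole program this comes to roughly a 33 % energy reduction. Several blocks (bb3 to bb6) reach their energy-optimal point with a longer execution time than the baseline, so the energy-optimal configuration is not always the fastest one.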
38
Platforms
Intel Sandy Bridge (Intel Xeon E5-2650): 2 CPUs, 8 cores, 32KB/32KB I/D-cache per core, 2MB L2 cache, 20MB L3 cache. OS: CentOS (release 6.5). Frequency: 2 GHz. Energy measurements: RAPL
Samsung Exynos 5 Octa (Odroid-XU+E): ARM big.LITTLE, 4 A15 cores, 4 A7 cores, 32KB/32KB I/D-cache per core, 2MB L2 cache. OS: Ubuntu 14.04 LTS. Frequency: 1.6 GHz. Energy measurements: power meters (INA231)

Chapter 12 - climate change and the energy crisis
tonzsalvador2222
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
RenuJangid3
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Erdal Coalmaker
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
sanjana502982
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Sérgio Sacani
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
Gokturk Mehmet Dilci
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
Sharon Liu
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
University of Maribor
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
Abdul Wali Khan University Mardan,kP,Pakistan
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
MAGOTI ERNEST
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills MN
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
pablovgd
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
Wasswaderrick3
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
moosaasad1975
 

Recently uploaded (20)

SAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdfSAR of Medicinal Chemistry 1st by dk.pdf
SAR of Medicinal Chemistry 1st by dk.pdf
 
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
ISI 2024: Application Form (Extended), Exam Date (Out), EligibilityISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
ISI 2024: Application Form (Extended), Exam Date (Out), Eligibility
 
Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.Nucleic Acid-its structural and functional complexity.
Nucleic Acid-its structural and functional complexity.
 
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdfDMARDs Pharmacolgy Pharm D 5th Semester.pdf
DMARDs Pharmacolgy Pharm D 5th Semester.pdf
 
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
原版制作(carleton毕业证书)卡尔顿大学毕业证硕士文凭原版一模一样
 
ESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptxESR spectroscopy in liquid food and beverages.pptx
ESR spectroscopy in liquid food and beverages.pptx
 
Chapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisisChapter 12 - climate change and the energy crisis
Chapter 12 - climate change and the energy crisis
 
Leaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdfLeaf Initiation, Growth and Differentiation.pdf
Leaf Initiation, Growth and Differentiation.pdf
 
Unveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdfUnveiling the Energy Potential of Marshmallow Deposits.pdf
Unveiling the Energy Potential of Marshmallow Deposits.pdf
 
Toxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and ArsenicToxic effects of heavy metals : Lead and Arsenic
Toxic effects of heavy metals : Lead and Arsenic
 
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
Earliest Galaxies in the JADES Origins Field: Luminosity Function and Cosmic ...
 
Shallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptxShallowest Oil Discovery of Turkiye.pptx
Shallowest Oil Discovery of Turkiye.pptx
 
20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx20240520 Planning a Circuit Simulator in JavaScript.pptx
20240520 Planning a Circuit Simulator in JavaScript.pptx
 
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
Comparing Evolved Extractive Text Summary Scores of Bidirectional Encoder Rep...
 
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...THEMATIC  APPERCEPTION  TEST(TAT) cognitive abilities, creativity, and critic...
THEMATIC APPERCEPTION TEST(TAT) cognitive abilities, creativity, and critic...
 
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptxThe use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
The use of Nauplii and metanauplii artemia in aquaculture (brine shrimp).pptx
 
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
Travis Hills' Endeavors in Minnesota: Fostering Environmental and Economic Pr...
 
NuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyerNuGOweek 2024 Ghent programme overview flyer
NuGOweek 2024 Ghent programme overview flyer
 
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
DERIVATION OF MODIFIED BERNOULLI EQUATION WITH VISCOUS EFFECTS AND TERMINAL V...
 
What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.What is greenhouse gasses and how many gasses are there to affect the Earth.
What is greenhouse gasses and how many gasses are there to affect the Earth.
 

ALEA:Fine-grain Energy Profiling with Basic Block sampling

  • 1. 1 ALEA: Fine-grain Energy Profiling with Basic Block sampling. Lev Mukhanov, Dimitrios S. Nikolopoulos and Bronis R. de Supinski. Queen’s University of Belfast. PACT 2015
  • 2. 2 Executive summary. Fine-grain energy profiling is essential for energy optimization. Contribution: a probabilistic approach and a tool (ALEA) for fine-grain energy profiling
  • 3. 3 Outline Introduction Probabilistic approach and ALEA implementation Validation process Experiments and use cases
  • 5. 5 How workloads affect energy? I/O Blocks Block 1 Block 2 10.2 Seconds 0.3 Seconds 10 Seconds Kmeans
  • 6. 6 How workloads affect energy? I/O Blocks Block 1 Block 2 10.2 Seconds 0.3 Seconds 10 Seconds 83 Joules 2 Joules 246 Joules Kmeans
  • 7. 7 Fine-grain energy profiling challenges Coarse-grained power/energy meters Any measurements bias real energy Overhead introduced by measurements is critical
  • 9. 9 State of the art approaches. Manual instrumentation [PowerPack, R. Ge 2010]: low overhead; coarse-grain; what code should be instrumented?; source code should be modified. Binary instrumentation: fine-grain; high overhead (PIN: overhead of more than 300%). HPM (Hardware Performance Monitors) [R. Bertran 2013] and EPI (Energy Per Instruction) models [Y.S. Shao 2013]: low overhead; do not capture the dynamic execution context; low accuracy. Sampling [PowerScope, J. Flinn 1999]: low overhead; is it fine-grain?
  • 10. 10 Performance profiling based on Sampling. Performance profiling model: a period between samples is associated with the sampled object ⇒ coarse-grain probabilistic model
  • 11. 11 Probabilistic model. Probability of sampling $bb_m$: $p_{bb_m} = P(X_{bb_m} = 1) = \frac{C^1_{t_{bb_m}}}{C^1_{t_{exec}}} = \frac{\sum_{j=1}^{k} latency^j_{bb_m}}{t_{exec}} = \frac{t_{bb_m}}{t_{exec}}$. Estimates: $\hat{p}_{bb_m} = \frac{n_{bb_m}}{n}$; $\hat{t}_{bb_m} = \hat{p}_{bb_m} \cdot t_{exec} = \frac{n_{bb_m} \cdot t_{exec}}{n}$; $\widehat{pow}_{bb_m} = \frac{1}{n_{bb_m}} \cdot \sum_{i=1}^{n_{bb_m}} pow^i_{bb_m}$; $\hat{e}_{bb_m} = \widehat{pow}_{bb_m} \cdot \hat{t}_{bb_m}$. Normal distribution, 95% confidence interval: $\widehat{pow}^{u}_{bb_m} = \widehat{pow}_{bb_m} + z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}}$; $\widehat{pow}^{l}_{bb_m} = \widehat{pow}_{bb_m} - z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}}$; $s = \sqrt{\frac{1}{n_{bb_m} - 1} \cdot \sum_{i=1}^{n_{bb_m}} (pow^i_{bb_m} - \widehat{pow}_{bb_m})^2}$
  • 12. 12 Probabilistic model. Execution time of a block: $time_{block} = p_{block} \cdot time_{application}$ (1). Estimation of $\hat{p}_{block}$ using sampling. Estimation of execution time: $\widehat{time}_{block} = \hat{p}_{block} \cdot time_{application}$ (2). Power measurements and a sample are taken simultaneously to estimate $\widehat{power}_{block}$. Estimation of energy consumption: $\widehat{energy}_{block} = \widehat{power}_{block} \cdot \widehat{time}_{block}$ (3)
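The three estimates on this slide can be sketched in a few lines of Python. This is a minimal illustration, not ALEA's implementation: the function name `estimate_blocks`, the sample format, and the numbers are hypothetical. Each sample is assumed to pair the basic block that was running with the power reading taken at the same instant.

```python
from collections import defaultdict

def estimate_blocks(samples, t_exec):
    """Return {block_id: (time_s, mean_power_w, energy_j)}.

    samples: list of (block_id, power_watts) pairs, one per sampling tick
             (hypothetical format, assumed for this sketch).
    t_exec:  total execution time of the application in seconds.
    """
    n = len(samples)
    powers = defaultdict(list)
    for block_id, power in samples:
        powers[block_id].append(power)

    estimates = {}
    for block_id, values in powers.items():
        p_hat = len(values) / n              # fraction of samples that hit the block
        t_hat = p_hat * t_exec               # equation (2)
        pow_hat = sum(values) / len(values)  # mean of the sampled power readings
        estimates[block_id] = (t_hat, pow_hat, pow_hat * t_hat)  # equation (3)
    return estimates

# Hypothetical data: four samples over an 8-second run.
samples = [("bb1", 10.0), ("bb1", 12.0), ("bb2", 20.0), ("bb1", 11.0)]
est = estimate_blocks(samples, t_exec=8.0)
print(est)
```

With so few samples the estimates are only illustrative; the confidence intervals later in the deck quantify how much they can be trusted.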
  • 13. 13 Random sampling ≈ Systematic sampling [Figure: two timelines of application time in ticks. Random sampling: power and the running basic block are sampled at randomly chosen ticks. Systematic sampling: samples are taken at intervals of 1000 ticks.]
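A small simulation can illustrate why systematic sampling is a reasonable stand-in for random sampling. Everything here is an assumption made for the sketch: the workload is a toy loop in which a hypothetical block "A" runs for 37 of every 123 ticks, and the sampling period of 100 ticks is chosen so that it does not line up with the loop length.

```python
import random

CYCLE, A_TICKS = 123, 37   # hypothetical workload: block A runs 37 of every 123 ticks
TOTAL_TICKS = 100_122      # 814 full cycles

def block_at(tick):
    """Basic block executing at a given tick of the toy workload."""
    return "A" if tick % CYCLE < A_TICKS else "B"

def random_sampling(n, rng):
    """Fraction of n uniformly random ticks at which block A is running."""
    hits = sum(block_at(rng.randrange(TOTAL_TICKS)) == "A" for _ in range(n))
    return hits / n

def systematic_sampling(n, period, rng):
    """Fraction of n evenly spaced ticks (random start) at which block A is running."""
    start = rng.randrange(period)
    hits = sum(block_at(start + k * period) == "A" for k in range(n))
    return hits / n

rng = random.Random(0)
true_p = A_TICKS / CYCLE
p_random = random_sampling(1000, rng)
p_systematic = systematic_sampling(1000, 100, rng)
print(true_p, p_random, p_systematic)
```

Both estimators should land close to the true fraction. Note that a fixed period can alias with a program whose loop length shares a factor with it, which is one reason the approximation should be checked rather than assumed.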
  • 16. 16 Profiling of parallel applications. How to apportion power/energy between threads? Basic block vector (BBV) $bb_m$: $bb_m = (bb_{thread_1}, bb_{thread_2}, \ldots, bb_{thread_l})$ (4)
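One way to read the basic block vector is as a composite key: each sample records which block every thread was executing, and time is attributed to that combination rather than to an individual thread. The sketch below assumes this reading; the function name `bbv_time_estimates` and the sample data are hypothetical.

```python
from collections import Counter

def bbv_time_estimates(bbv_samples, t_exec):
    """Estimate the time spent in each basic block vector.

    bbv_samples: one tuple per sample, holding the block id observed
                 on each thread at that instant (hypothetical format).
    t_exec:      total execution time in seconds.
    """
    n = len(bbv_samples)
    counts = Counter(tuple(bbv) for bbv in bbv_samples)
    return {bbv: count / n * t_exec for bbv, count in counts.items()}

# Hypothetical two-thread run sampled four times over 2 seconds.
bbv_samples = [("bb3", "bb3"), ("bb3", "bb7"), ("bb3", "bb3"), ("bb1", "bb7")]
times = bbv_time_estimates(bbv_samples, t_exec=2.0)
print(times)
```

Keying on the whole vector avoids having to split a single package-level power reading between threads.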
  • 17. 17 Implementation [Diagram: ALEA samples the application's threads (Thread1, Thread2, ..., ThreadN) together with power readings from RAPL or INA231.] DWARF is used to assign energy estimates to source code. Architecture-independent implementation (portable). Low overhead (∼1%), suitable for on-line profiling
  • 18. 18 Sampling period and accuracy of the estimates. Accuracy ∼ the number of samples. [Figure: two plots against the number of samples (0 to 10000). One shows execution time (s) with the random error, the other energy (J) with the confidence interval; each compares estimated and measured time/energy.] Sampling period ⇒ accuracy
  • 19. 19 Sampling period and accuracy of the estimates. Sampling incurs overhead, which biases the estimates. ↓ sampling period: ↓ random error, ↑ overhead. ↑ sampling period: ↑ random error, ↓ overhead. Sampling period: ↓ or ↑? [Figure: overhead (%) and error (%) versus sampling period (1 to 100 ms) on Sandy Bridge and Exynos, for sequential and parallel runs. Optimal on both platforms: 10 ms, overhead ∼1%.]
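The trade-off can be made concrete with a back-of-the-envelope calculation. The sketch below is an assumption layered on the deck, not a result from it: it uses the normal-approximation interval for a Bernoulli proportion to ask how many samples are needed before the relative half-width of a block's time estimate drops below a target, and how long the program must run at a given sampling period to collect them. It ignores the overhead bias discussed above, and the function names are hypothetical.

```python
import math

def samples_needed(p, rel_error, z=1.96):
    """Samples needed so that z * sqrt(p * (1 - p) / n) <= rel_error * p.

    p:         fraction of execution time spent in the block (assumed known).
    rel_error: target relative half-width of the confidence interval.
    z:         normal quantile (1.96 for a 95% interval).
    """
    return math.ceil(z * z * (1.0 - p) / (p * rel_error ** 2))

def runtime_needed(p, rel_error, period_s, z=1.96):
    """Run time in seconds required to collect that many samples."""
    return samples_needed(p, rel_error, z) * period_s

# Hypothetical block taking 30% of the run, 5% target error, 10 ms period.
n_required = samples_needed(p=0.3, rel_error=0.05)
seconds_required = runtime_needed(p=0.3, rel_error=0.05, period_s=0.010)
print(n_required, seconds_required)
```

Halving the target error roughly quadruples the number of samples, which is why shortening the sampling period is tempting even though it raises overhead.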
  • 20. 20 Validation. 14 benchmarks (SPEC 2000, Parsec, Rodinia, SPEC OMP); direct instrumentation; 81% coverage. Average error of energy estimates (Sandy Bridge / Exynos): all blocks 1.4% / 2.6%; fine-grain blocks 1.6% / 3.7%; parallel blocks 3.1% / 3.6%; all bench 1.4% / 1.9%
  • 21. 21 Effect of cache instructions and pipelining [Figure: power (W) of the Arithmetic, Cache and Original blocks on Sandy Bridge and Exynos; energy (J) and time (s) for Arithmetic, Cache, Original and EPI Original on each platform, annotated 50% on Sandy Bridge and 29% on Exynos.] Pipelining hides latency (⇒ energy) of cache accesses. EPI models could lead to significant errors
  • 22. 22 Use cases. kmeans/Sandy Bridge: profiling: 50% of the total energy is spent on one block (Euclidean distance); optimization strategy: align and restrict pointers, forced unroll; results: 7x energy reduction. ocean_cp/Exynos: profiling: more than 50% of energy is spent on 6 blocks; optimization strategy: disable the predictive commoning optimization; results: 10% power reduction. raytrace/Exynos: profiling: 50% of the total energy is spent on 2 blocks (SphPeIntersect); optimization strategy: remove redundant memory accesses and indirect addressing instructions; results: energy reduced by 6%
  • 23. 23 Conclusion. The proposed probabilistic approach and ALEA provide: low overhead (∼1%, on-line profiling); accurate estimates (Intel ∼1.4%, ARM ∼2.6%); estimates at the fine-grain level; an architecture-independent approach. ALEA could be effectively applied to optimize energy and power consumption. Future work: improve accuracy of the estimates; port to new architectures (GPUs and Intel Xeon Phi); profiling of VMs
  • 24. 23 Thank you This research has been supported by the UK EPSRC and by the EC FP7
  • 26. 25 Probabilistic model Random sampling is approximated by systematic sampling Power and a basic block are sampled simultaneously For each block time, energy and power estimates are provided For each estimate a confidence interval is provided See the paper for more details
  • 27. 26 Use cases. kmeans, Sandy Bridge: 50% of energy is spent on one block (Euclidean distance); optimization strategy: align and restrict pointers, forced unroll; results: 7x energy reduction. ocean_cp, Exynos: more than 50% of energy is spent on 6 blocks; optimization strategy: disable the predictive commoning optimization; results: 10% power reduction. raytrace, Exynos: 50% of energy is spent on 2 blocks (SphPeIntersect); optimization strategy: remove redundant memory accesses and indirect addressing instructions; results: energy reduced by 6%
  • 28. 27 Use cases. Sandy Bridge. 56% of time is spent on one block (Euclidean distance). Problems: unrolling and auto-vectorization are not applied. Optimization strategy: align and restrict pointers, forced unroll. Results: 7x energy decrease. [Figure: time (s), power (W) and energy (100 J) versus number of threads for the basic block compiled with -O3 and with -O3 + hints; annotations: cache sharing effect; 221 Joules (697%).]
  • 30. 29 Impact of Memory instructions [Figure: power (W) of block variants ordered by cache access intensity. Sandy Bridge: Nop, Arithm, Mem(L1,store), Mem(L1,load), Mem(store), Mem(L1), Original, Mem(L2,store), Mem(load), Mem(L2,load), Mem, Mem(L2). Exynos: Arithm, Nop, Mem(L1,load), Mem(L1), Mem(L1,store), Mem(L2,load), Mem(L2,store), Mem(L2), Original(L2).] CPU power is primarily affected by cache accesses
  • 31. 30 Probability to sample a basic block. Basic block execution. Introduce $X_{bb_m}$ associated with each tick: $X_{bb_m} = 1$ if $bb_m$ is the sampled basic block, $0$ otherwise (5). Take one random sample. Probability that $bb_m$ is sampled: $p_{bb_m} = P(X_{bb_m} = 1) = \frac{C^1_{t_{bb_m}}}{C^1_{t_{exec}}} = \frac{\sum_{j=1}^{k} latency^j_{bb_m}}{t_{exec}} = \frac{t_{bb_m}}{t_{exec}}$ (6)
  • 32. 31 Execution time estimates. Take samples several times (random sampling). $X_{bb_m}$ is random and follows the Bernoulli distribution. Estimate $p_{bb_m}$ using the maximum likelihood estimator of the parameter $p_{bb_m}$ in the Bernoulli distribution for $X_{bb_m}$: $\hat{p}_{bb_m} = \frac{n_{bb_m}}{n}$ (7). $t_{bb_m}$ is estimated as $\hat{t}_{bb_m} = \hat{p}_{bb_m} \cdot t_{exec} = \frac{n_{bb_m} \cdot t_{exec}}{n}$ (8)
  • 33. 32 Power and Energy estimates. The same probabilistic approach. Power consumption is a random variable (Normal distribution). A realization of the variable is associated with each tick. The mean power consumption of $bb_m$: $\widehat{pow}_{bb_m} = \frac{1}{n_{bb_m}} \cdot \sum_{i=1}^{n_{bb_m}} pow^i_{bb_m}$ (9). Energy consumption of $bb_m$: $\hat{e}_{bb_m} = \widehat{pow}_{bb_m} \cdot \hat{t}_{bb_m}$ (10)
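  • 34. 33 Quality of time estimates. Confidence interval for $p_{bb_m}$: $\hat{p}^{u}_{bb_m} = \hat{p}_{bb_m} + z_{\alpha/2} \sqrt{\frac{1}{n} \cdot \hat{p}_{bb_m} \cdot (1 - \hat{p}_{bb_m})}$ (11); $\hat{p}^{l}_{bb_m} = \hat{p}_{bb_m} - z_{\alpha/2} \sqrt{\frac{1}{n} \cdot \hat{p}_{bb_m} \cdot (1 - \hat{p}_{bb_m})}$ (12); $\hat{p}^{l}_{bb_m} \le p_{bb_m} \le \hat{p}^{u}_{bb_m}$ (13). Confidence interval for $t_{bb_m}$: $\hat{p}^{l}_{bb_m} \cdot t_{exec} \le t_{bb_m} \le \hat{p}^{u}_{bb_m} \cdot t_{exec}$ (14)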
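The interval for a block's execution time follows directly from equations (11) to (14). The sketch below assumes a 95% interval, so the normal quantile is 1.96; the function name `time_interval` and the sample counts are hypothetical.

```python
import math

def time_interval(n_bb, n, t_exec, z=1.96):
    """Confidence interval (seconds) for the execution time of one block.

    n_bb:   number of samples that hit the block.
    n:      total number of samples.
    t_exec: total execution time in seconds.
    z:      normal quantile (1.96 is assumed here for a 95% interval).
    """
    p_hat = n_bb / n
    half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)        # equations (11), (12)
    return (p_hat - half) * t_exec, (p_hat + half) * t_exec  # equation (14)

# Hypothetical run: 300 of 1000 samples hit the block during a 20-second run.
t_low, t_high = time_interval(n_bb=300, n=1000, t_exec=20.0)
print(t_low, t_high)
```

The interval is centred on the point estimate of 6 seconds and narrows as the total number of samples grows.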
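  • 35. 34 Bounds and Confidence. Energy. We can similarly build a confidence interval for power: $\widehat{pow}^{u}_{bb_m} = \widehat{pow}_{bb_m} + z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}}$ (15); $\widehat{pow}^{l}_{bb_m} = \widehat{pow}_{bb_m} - z_{\alpha/2} \frac{s}{\sqrt{n_{bb_m}}}$ (16); $s = \sqrt{\frac{1}{n_{bb_m} - 1} \cdot \sum_{i=1}^{n_{bb_m}} (pow^i_{bb_m} - \widehat{pow}_{bb_m})^2}$ (17); $\widehat{pow}^{l}_{bb_m} \le pow_{bb_m} \le \widehat{pow}^{u}_{bb_m}$ (18). Confidence interval for energy consumption: $\hat{p}^{l}_{bb_m} \cdot t_{exec} \cdot \widehat{pow}^{l}_{bb_m} \le e_{bb_m} \le \hat{p}^{u}_{bb_m} \cdot t_{exec} \cdot \widehat{pow}^{u}_{bb_m}$ (19)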
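Equations (15) to (19) combine the power interval with the time interval to bound a block's energy. The following sketch assumes a 95% interval (quantile 1.96) and uses hypothetical function names and readings. With only four power samples the normal approximation is rough, so the numbers are purely illustrative.

```python
import math
import statistics

def power_interval(powers, z=1.96):
    """Confidence interval (watts) for the mean power of one block."""
    mean = statistics.fmean(powers)
    s = statistics.stdev(powers)              # sample standard deviation, equation (17)
    half = z * s / math.sqrt(len(powers))
    return mean - half, mean + half           # equations (15), (16)

def energy_interval(powers, n, t_exec, z=1.96):
    """Confidence interval (joules) for the energy of one block, equation (19).

    powers: power readings taken when the block was sampled.
    n:      total number of samples across all blocks.
    t_exec: total execution time in seconds.
    """
    p_hat = len(powers) / n
    p_half = z * math.sqrt(p_hat * (1.0 - p_hat) / n)
    pow_low, pow_high = power_interval(powers, z)
    return ((p_hat - p_half) * t_exec * pow_low,
            (p_hat + p_half) * t_exec * pow_high)

# Hypothetical block sampled 4 times out of 16 during an 8-second run.
powers = [10.0, 12.0, 11.0, 13.0]
pow_lo, pow_hi = power_interval(powers)
e_lo, e_hi = energy_interval(powers, n=16, t_exec=8.0)
print(pow_lo, pow_hi, e_lo, e_hi)
```

Because the bound multiplies the two lower limits and the two upper limits, the energy interval is conservative: it is wider than either component interval alone would suggest.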
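  • 36. 35 Parallel applications. Basic block vector (BBV) $bb_m$: $bb_m = (bb_{thread_1}, bb_{thread_2}, \ldots, bb_{thread_l})$ (20); $\hat{t}_{bb_m} = \hat{p}_{bb_m} \cdot t_{exec} = \frac{n_{bb_m} \cdot t_{exec}}{n}$ (21); $\widehat{pow}_{bb_m} = \frac{1}{n_{bb_m}} \cdot \sum_{i=1}^{n_{bb_m}} pow^i_{bb_m}$ (22); $\hat{e}_{bb_m} = \widehat{pow}_{bb_m} \cdot \hat{t}_{bb_m}$ (23)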
  • 37. 36 Experiments. Impact of Memory instruction. How to optimize energy consumption? Performance vs power optimization. How to decrease power consumption? What affects power consumption? Blocks and their descriptions: Basic block A: copy of BBA; Mem: only memory access instructions of BBA; NoMem: only arithmetic/logic instructions of BBA; Mem(L2): Mem block with the size of accessed data limited to 2MB (L2 cache size on Exynos); Mem(L1): Mem block with the size of accessed data limited to 2KB (L1 cache size on Exynos); Mem(load): Mem block with load instructions only; Mem(store): Mem block with store instructions only; Mem(L2,load): Mem(L2) block with loads only; Mem(L2,store): Mem(L2) block with stores only; Mem(L1,load): Mem(L1) block with loads only; Mem(L1,store): Mem(L1) block with stores only
  • 38. 37 Use case (Exynos). Power optimization. ocean_cp. More than 50% of the total execution time is spent in 6 basic blocks. Optimization strategy: remove redundant cache accesses; disable prefetch and the predictive commoning optimization (up to 14% power decrease). A different strategy should be applied for each basic block. DVFS could also be applied. Baseline versus energy-optimal configuration:
      Block                     Baseline           Energy-optimal
                                Time (s) Energy (J) Time (s) Energy (J) Threads     Frequency        Manual optimization
      bb1, jacobcalc2.C:301     2.03     8.48       1.87     6.03       4           1500 MHz         No
      bb2, slave2.C:641         1.54     6.70       1.31     4.16       2           1600 MHz         Yes
      bb3, laplacalc.C:83       2.02     9.53       2.55     7.98       2           1500 MHz         No
      bb4, multi.C:253          2.17     7.22       2.62     6.52       2           1500 MHz         No
      bb5, multi.C:235          2.36     7.88       3.29     5.56       1           1500 MHz         No
      bb6, multi.C:290          2.67     9.23       3.23     5.46       1           1500 MHz         No
      program                   29.93    108.64     26.88    72.84      2.0 (avg.)  1516 MHz (avg.)  Yes
  • 39. 38 Platforms. Intel Sandy Bridge (Intel Xeon E5-2650): 2 CPUs, 8 cores, 32KB/32KB I/D-cache per core, 2MB L2 cache, 20MB L3 cache. OS: CentOS (release 6.5). Frequency: 2 GHz. Energy measurements: RAPL. Samsung Exynos 5 Octa (Odroid-XU+E): ARM big.LITTLE, 4 A15 cores, 4 A7 cores, 32KB/32KB I/D-cache per core, 2MB L2 cache. OS: Ubuntu 14.04 LTS. Frequency: 1.6 GHz. Energy measurements: power meters (INA231)