VLSID_2015_DSE_HMP_v3

VLSI Design & Embedded Systems Conference
January 2015
Bengaluru, India

Cross-Layer Exploration of Heterogeneous Multicore Processor Configurations
Santanu Sarma and N. Dutt

Introduction & Motivation
•  Emerging and future computing systems will be heterogeneous multicore processors (HMPs) [Borkar11]
•  Heterogeneity manifests even in homogeneous architectures due to process variability [Teodorescu08]
•  They will be rich in different types of cores with diverse memories and accelerators [P20 Platform; ARM2013; Angstrom platform, MIT 2014]
•  They are monitor-rich at the lower layers of abstraction [Kornaros13, Lefurgy13, Gupta13]

6/8/15 © VLSI Design & Embedded Systems Conference - 2015

Examples of Existing HMPs
Examples: ARM big.LITTLE, NVIDIA Tegra, and AMD GPGPU
Trend towards heterogeneous multicore processors with different core specializations


Emerging & Future HMPs

[Figure: clusters of A7, A11 and A15 cores with L2 caches, a shared L3, SRAM/SPM, eDRAM, a GPU, on-chip Flash and accelerators, connected through NoC routers]
Futuristic heterogeneous many-core processor with distributed memories, heterogeneous networks and accelerators

Emerging & Future HMPs

Futuristic heterogeneous multicore processors are expected to have shared memories, a coherent bus, multiple networks and accelerators
[Figure: A7, A11 and A15 core clusters with L2 caches on a cache-coherent interconnect with a shared L3, plus a GPU, accelerators, DRAM, SPM, disk, a global interrupt controller, and Bluetooth, GSM, WiFi, 3/4G and 5G interfaces]

HMP Composition Problem

[Figure: (a)-(d) area-power constrained HMP architectures, each a different mix of A7, A11 and A15 cores sharing a last-level cache (LLC), for a set of representative applications]
A configuration = the number of cores of each type
Which HMP configuration is the best for the representative applications?


[Figure: "HMP configuration for Area Budget of 4Ev6": number of cores of each type (EV4, EV5, EV6, EV8) versus HMP configuration #, with the relative core sizes of EV8, EV6, EV5 and EV4]
Large design space of HMP configurations: a 4xEV8 area budget results in 46428 HMP configurations


[Figure: Config# 1 marked on the HMP configuration plot (number of cores of each type EV4, EV5, EV6, EV8 versus HMP configuration #): four EV6 cores sharing an LLC]


[Figure: Config# 2 marked on the HMP configuration plot: three EV6 and five EV5 cores sharing an LLC]


[Figure: Config# 9 marked on the HMP configuration plot: two EV6, five EV5 and eight EV4 cores sharing an LLC]


[Figure: Config# 37 marked on the HMP configuration plot: 34 EV4 cores sharing an LLC]

Goal
•  Explore and configure an HMP for a given system goal under system-level constraints (e.g., area or power)
–  Performance Maximization (PerfMax)
–  Energy Minimization (EnergyMin)
–  Power Minimization (PowerMin)
–  Energy Efficiency Maximization (EEMax)
•  Enable the designer to comparatively evaluate and select the most promising (e.g., energy-efficient) HMP architecture
•  Improve exploration time and resource requirements at a relatively small error
•  Present a holistic cross-layer approach that is more representative of actual HMP systems
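The four system-level goals can each be expressed as one scalar score per configuration. A minimal sketch, assuming the predictive model returns a throughput S and a power P, that a fixed amount of work W is executed (so energy is E = P·W/S), and that energy efficiency means work per joule (S/P). These definitions and the names `Prediction` and `objective` are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    throughput: float  # predicted throughput S (work units per second); assumed metric
    power: float       # predicted power P in watts


def objective(goal: str, pred: Prediction, work: float = 1.0) -> float:
    """Return a score to MAXIMIZE for the given system-level goal (hypothetical helper).

    Minimization goals are negated so that every goal can share one maximizing search.
    """
    runtime = work / pred.throughput      # time to finish the fixed work W
    energy = pred.power * runtime         # E = P * W / S
    if goal == "PerfMax":
        return pred.throughput
    if goal == "EnergyMin":
        return -energy
    if goal == "PowerMin":
        return -pred.power
    if goal == "EEMax":
        return pred.throughput / pred.power  # work per joule
    raise ValueError(f"unknown goal: {goal}")
```

Negating the minimization goals is only a convenience so a single search routine can serve all four goals.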


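Optimization Problem:
$$C^{*} = \arg\max_{C \in \mathcal{C}} J(C), \qquad C = \{n_1, n_2, \ldots, n_K\}$$
$$\text{s.t.} \quad N = \sum_{i=1}^{K} n_i, \qquad A = \sum_{i=1}^{K} n_i \, a_i \le A_{\text{budget}}$$
where $J$ is the platform goal/objective (maximized for PerfMax and EEMax; minimized, i.e. arg min, for EnergyMin and PowerMin); $C$ is an HMP configuration, the set of the number $n_i$ of cores of each type; $N$ is the total number of cores of $K$ types (the processing cores in the HMP configuration); $A$ is the total area of the HMP, with $a_i$ the area of core type $i$; and $\mathcal{C}$ is the set of all feasible configurations.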
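For a toy number of core types, the optimization problem above can be solved by brute force: enumerate every configuration whose total area fits the budget, then take the arg max of J. A minimal sketch, assuming integer relative core areas; the area values used in any example are hypothetical, and the names `feasible_configs` and `best_config` are illustrative.

```python
from itertools import product
from typing import Callable, Iterator, Sequence, Tuple

Config = Tuple[int, ...]  # (n_1, ..., n_K): number of cores of each type


def feasible_configs(areas: Sequence[int], area_budget: int) -> Iterator[Config]:
    """Yield every non-empty configuration whose total core area fits the budget."""
    # Upper bound on n_i if the whole budget went to core type i
    max_counts = [area_budget // a for a in areas]
    for counts in product(*(range(m + 1) for m in max_counts)):
        total_area = sum(n * a for n, a in zip(counts, areas))  # A = sum_i n_i * a_i
        if sum(counts) > 0 and total_area <= area_budget:
            yield counts


def best_config(areas: Sequence[int], area_budget: int,
                J: Callable[[Config], float]) -> Config:
    """Exhaustive arg max of J over the feasible set (only practical for tiny spaces)."""
    return max(feasible_configs(areas, area_budget), key=J)
```

With hypothetical areas (1, 2, 4) and a budget of 4 there are already 9 feasible configurations; the slides report 46428 for a 4xEV8 budget, and evaluating each one by full-system simulation is what makes exhaustive search impractical.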

Challenges in HMP Composition
•  Extremely large design space
–  Large parametric space
–  Huge spatial-temporal dynamics
•  Complex interaction of layers
–  Identification of features and attributes
–  Difficulty in capturing layer-specific attributes
–  Mechanisms to actuate layer-specific features
•  Full-stack model-building challenge
–  Large volume of data (big data) for model building
–  Model composition
–  Accuracy-complexity trade-off


Related Work

•  Existing work focuses on HMP runtime systems [14], [4], [21], [17], [2], [6], [10]
•  Limited work on cross-layer modeling of HMPs and cross-layer DSE, but several pieces of work on DoE [1]
•  Resource allocation [Zidenberg 2012, Zidenberg 2013]
–  Optimal resource allocation to specialized accelerators in an SoC, not to cores in HMPs
–  System objective: improve performance
–  Do not consider the full-system stack and OS
–  Narrowly focused on the hardware layer; not applicable to generic HMPs

HMP Composition Approach
•  Four stages of performing cross-layer design space exploration:
1.  Build a predictive model of each core type
2.  Compose a predictive model of the HMP configuration
3.  Construct an RSM of the system objective (J)
4.  Find/search for the best HMP configuration



Cross-Layer Predictive Model

[Figure: HMP stack operating parameters feed the HMP predictive model. Layers: Applications (SA), Operating System (SO), Instruction Set Architecture (SI), Network/Bus Communication Architecture (SN), Hardware Architecture (SH) and Device/Circuit Architecture (SC), with physical and virtual sensors/monitors and an observer sensing operating conditions at the different layers. Predicted outputs: performance, power, energy, temperature, reliability, error]


Application Layer Features:
•  # of applications
•  Application type
–  memory bound / core bound
–  real-time vs soft
–  exact, approximate
–  fixed vs floating point
•  Application size / memory footprint
•  Application phases
•  Application criticality
•  # of functions, classes, LOC
•  Application complexity
•  Degree of ILP, MLP
•  Accuracy requirement
•  Performance requirement


Operating System Layer Features:
•  Scheduling policy
•  Allocation / balancing policy
•  Scheduling epoch
•  Balancing epoch
•  # of threads, thread types, thread priority
•  Thread location history
•  No. of context switches
•  Migration overhead
•  Busy cycles (cyBusy), idle cycles (cyIdle), sleep cycles (cySleep)
•  Execution Time Matrix
•  Performance Characterization Matrix (S)
•  Power Characterization Matrix (P)
•  Energy Characterization Matrix (E)


Instruction Set Layer Features:
•  ISA type and width (fixed)
•  Committed instructions (Itotal)
•  Committed loads and stores (Imem)
•  Committed branches (Ibranch)
•  Floating-point instructions (IFP)
•  Integer instructions (Iint)
•  Critical instructions (Icr)
•  Non-critical instructions (Incr)
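Raw counters such as these scale with run length, so before regression it is common to turn them into normalized, scale-free features. A minimal sketch, assuming the counters are available per run under the slide notation (Itotal, Imem, Ibranch, IFP, Iint, cyBusy, cyIdle, cySleep); the derived feature names and the helper `isa_os_features` are hypothetical.

```python
from typing import Dict


def isa_os_features(c: Dict[str, float]) -> Dict[str, float]:
    """Turn raw committed-instruction and cycle counters into normalized features."""
    total_insts = c["Itotal"]
    total_cycles = c["cyBusy"] + c["cyIdle"] + c["cySleep"]
    return {
        "mem_frac": c["Imem"] / total_insts,        # loads+stores per committed instruction
        "branch_frac": c["Ibranch"] / total_insts,  # branches per committed instruction
        "fp_frac": c["IFP"] / total_insts,          # floating-point share of the mix
        "int_frac": c["Iint"] / total_insts,        # integer share of the mix
        "utilization": c["cyBusy"] / total_cycles,  # fraction of cycles the core was busy
        "ipc": total_insts / c["cyBusy"],           # instructions per busy cycle
    }
```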


Hardware Layer Features and Properties:
•  Core type
•  Issue width (Iw)
•  LQ/SQ size (LSQ)
•  IQ size (IQ)
•  ROB size (ROB)
•  Int/float registers (IFR)
•  L1$I size (KB) (L1I), L1$D size (KB) (L1D)
•  L2$I size (KB) (L2I), L2$D size (KB) (L2D)
•  Core frequency (MHz) (F)
•  Core voltage (V)
•  Core area (a), uncore area (au)
•  Core power (pw)
•  Floorplan and placement


Network/Bus Layer Features:
Bus Properties:
•  gem5 shared bus model [Binkert11]
•  No. of buses
•  Bus type, bus width, bus frequency, bus mode
•  # L2 buses, # coherence domains
•  Contentions
•  Latency
NoC Properties [Orion 2.0]:
•  Topology
•  Routing policy
•  Flit size, flit width
•  # of VCs
•  Buffer size
•  Frequency & latency
•  Contentions


Circuit and Device Properties:
Imported from CACTI [Thoziyoor08] & McPAT [Shen09]
•  Technology node
•  Technology parameters
•  VDD, VTh, bias voltage
•  Wire model parameters
•  Delay model parameters
•  Memory cell model parameters
•  Cell power model parameters

Building Predictive Model of Core Types
•  Divided into two phases:
–  Training phase: known data (the training set) are used to identify the predictive model configuration; special benchmarks are used for coverage
–  Prediction phase: the predictive model is used to forecast the unknown system response
•  Use regression-based data fitting in the predictive model of each core type for performance (throughput) and power
•  Predictor for each core type:


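$$\hat{S}_k = \boldsymbol{\beta}_k^{\top}\mathbf{x}, \qquad \hat{P}_k = \boldsymbol{\gamma}_k^{\top}\mathbf{x}$$
where $\boldsymbol{\beta}_k$ are the performance predictor coefficients and $\boldsymbol{\gamma}_k$ the power predictor coefficients of core type $k$, and $\mathbf{x}$ is the cross-layer feature vector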
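The training and prediction phases can be sketched with ordinary least squares, assuming a predictor that is linear in its coefficients and that NumPy is available. Each row of `X` is one training run's cross-layer feature vector and `y` the measured throughput (or power) for one core type. The slides only say "regression-based data fitting", so the intercept handling and the helper names `fit_predictor`, `predict` and `pct_error` are illustrative assumptions.

```python
import numpy as np


def fit_predictor(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Training phase: least-squares fit of y ~ b0 + X @ b; returns [b0, b1, ..., bm]."""
    X1 = np.column_stack([np.ones(len(X)), X])  # prepend a constant column for the intercept
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta


def predict(beta: np.ndarray, X: np.ndarray) -> np.ndarray:
    """Prediction phase: apply the fitted coefficients to unseen feature vectors."""
    return beta[0] + X @ beta[1:]


def pct_error(y_true: np.ndarray, y_pred: np.ndarray) -> np.ndarray:
    """Per-sample absolute percentage error of the predictions."""
    return 100.0 * np.abs(y_pred - y_true) / np.abs(y_true)
```

One such fit per core type and per output (performance, power) gives the set of core-type predictors.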

Predictive Model of Core Types

[Figure: the full-system stack (applications with tasks 0..n, Linux kernel, HMP platform with Ev8, Ev6, Ev5 and Ev4 cores, disk and DRAM) runs benchmarks in a heterogeneous platform simulator (gem5 + McPAT, with an HPC/sensing interface reporting performance and power); DoE data are passed to regression fitting to build the predictive model of system performance, power and energy. Cross-layer inputs: app type, size, etc.; task/thread model (task execution time, task throughput); task allocation & scheduling policy/strategy; memory allocation; hardware architecture configurations and performance event counters; bus specifications; circuit/device scaling, technology parameters, power/energy consumption and circuit delay parameters]

Compose Predictive Model of HMP Configuration
•  Use the predictive models of the individual core types to compose the total system model
•  The performance and power of each core are added to obtain the full-system performance and power
•  Core-to-core interference and interactions are captured via the features of the last-level cache and the network
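Composition by summation can be sketched as follows, assuming per-core-type predictors that are linear in their features, and a single hypothetical shared feature (the total core count, standing in for LLC/network contention) appended to every core's own feature vector, so that adding cores can lower each core's predicted throughput. The coefficient layout and the names `linear` and `compose_hmp` are illustrative, not from the slides.

```python
from typing import Dict, List, Sequence, Tuple


def linear(coeffs: Sequence[float], features: Sequence[float]) -> float:
    """Per-core-type linear predictor: coeffs[0] + sum_j coeffs[j+1] * features[j]."""
    return coeffs[0] + sum(c * f for c, f in zip(coeffs[1:], features))


def compose_hmp(config: Dict[str, int],
                perf_coeffs: Dict[str, List[float]],
                power_coeffs: Dict[str, List[float]],
                own_features: Dict[str, List[float]]) -> Tuple[float, float]:
    """Sum per-core performance and power predictions over an HMP configuration."""
    total_cores = sum(config.values())
    shared = [float(total_cores)]  # hypothetical shared LLC/network contention feature
    perf = power = 0.0
    for core_type, n in config.items():
        x = own_features[core_type] + shared  # core's own features + shared features
        perf += n * linear(perf_coeffs[core_type], x)
        power += n * linear(power_coeffs[core_type], x)
    return perf, power
```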


Construct RSM of the System Objective (J)
•  Response Surface Models (RSMs) are approximate analytical expressions of the system objective (J)
•  An RSM is a higher-level predictive model built using the individual core-type predictive models
•  A system-level RSM can include uncore components and core-to-core interaction characteristics
•  RSMs are dominated by
–  core characteristics for computation-centric apps
–  the network for communication-centric apps
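The slides do not give the functional form of the RSM. A common choice in response-surface methodology is a second-order polynomial in the design variables. The sketch below assumes that form over the core counts $n_i$ and fits it by least squares (NumPy assumed) to J values that, in this flow, would come from the cheap HMP predictive model rather than from simulation. The helper names are hypothetical.

```python
import itertools

import numpy as np


def quad_terms(n: np.ndarray) -> np.ndarray:
    """Second-order RSM basis for core counts n = (n_1..n_K): 1, n_i, n_i*n_j (i <= j)."""
    K = len(n)
    cross = [n[i] * n[j] for i, j in itertools.combinations_with_replacement(range(K), 2)]
    return np.concatenate(([1.0], n, cross))


def fit_rsm(configs: np.ndarray, J_values: np.ndarray) -> np.ndarray:
    """Least-squares fit of the quadratic RSM coefficients to sampled (config, J) pairs."""
    A = np.array([quad_terms(np.asarray(c, dtype=float)) for c in configs])
    coeffs, *_ = np.linalg.lstsq(A, J_values, rcond=None)
    return coeffs


def rsm_predict(coeffs: np.ndarray, config) -> float:
    """Evaluate the fitted RSM at one configuration."""
    return float(quad_terms(np.asarray(config, dtype=float)) @ coeffs)
```

Once fitted, the RSM is a closed-form expression, so a search over configurations can evaluate it thousands of times at negligible cost.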


DSE Optimization
•  Formulated as an optimization problem
–  Uses predictive models of the core types and of the HMP
–  Layer-specific goals can be included in the system-level goals
–  Models of individual core types are used to build HMP models
•  Search for the best configuration for the objective using the predictive models / RSM
•  Used a global optimization method (SA) to find the configuration
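A generic simulated annealing (SA) search over integer core counts can be sketched as follows. Assumptions: the neighbour moves (add, remove, or swap one core), the geometric cooling schedule, the starting point and the default parameters are all illustrative choices, not the authors' settings; in the described flow `J` would be the RSM or HMP predictive model rather than a simulation.

```python
import math
import random
from typing import Callable, Sequence, Tuple

Config = Tuple[int, ...]  # number of cores of each type


def anneal(areas: Sequence[float], budget: float, J: Callable[[Config], float],
           steps: int = 5000, t0: float = 1.0, cooling: float = 0.999,
           seed: int = 0) -> Config:
    """Simulated annealing that maximizes J over core counts with sum(n_i * a_i) <= budget."""
    rng = random.Random(seed)
    K = len(areas)

    def feasible(c: Config) -> bool:
        return (all(n >= 0 for n in c) and sum(c) > 0
                and sum(n * a for n, a in zip(c, areas)) <= budget)

    # Start from a single core of the smallest type (assumes it fits the budget)
    smallest = min(range(K), key=lambda k: areas[k])
    current = tuple(int(k == smallest) for k in range(K))
    cur_val = J(current)
    best, best_val = current, cur_val

    for step in range(steps):
        temp = t0 * cooling ** step  # geometric cooling schedule
        cand = list(current)
        i, j = rng.randrange(K), rng.randrange(K)
        move = rng.random()
        if move < 1 / 3:
            cand[i] += 1             # add one core of type i
        elif move < 2 / 3:
            cand[i] -= 1             # remove one core of type i
        else:
            cand[i] -= 1             # swap a type-i core ...
            cand[j] += 1             # ... for a type-j core
        cand_t = tuple(cand)
        if not feasible(cand_t):
            continue
        cand_val = J(cand_t)
        # Metropolis rule: always accept improvements, accept worse moves with prob exp(dJ / T)
        if cand_val >= cur_val or rng.random() < math.exp((cand_val - cur_val) / temp):
            current, cur_val = cand_t, cand_val
            if cur_val > best_val:
                best, best_val = current, cur_val
    return best
```

The swap move matters: under area constraints, some configurations (for example one large core filling the whole budget) cannot be reached by adding or removing single cores alone.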


Experiments & Setup
•  Experiment goal: find the HMP configuration C under a system constraint
•  Given:
–  System-level goal (J):
•  Performance Maximization (PerfMax)
•  Energy Minimization (EnergyMin)
•  Power Minimization (PowerMin)
•  Energy Efficiency Maximization (EEMax)
–  Individual layer-specific goal:
•  E.g., OS allocation objective: minD, minE, minED, minED2
•  Heterogeneity-aware allocator/scheduler
–  Set of representative benchmarks: PARSEC, MediaBench
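The OS allocation objectives can be read as minimizing delay D (minD), energy E (minE), the energy-delay product E·D (minED) or E·D² (minED2) when choosing a core type for a task. A minimal sketch under that reading, assuming per-core-type predictions of a task's delay and power with E = P·D; the `allocate` helper and its data layout are hypothetical.

```python
from typing import Dict, Tuple


def allocate(task_profile: Dict[str, Tuple[float, float]], objective: str) -> str:
    """Pick the core type for one task under an OS allocation objective.

    task_profile maps core type -> (predicted delay D in s, predicted power P in W).
    """
    cost = {
        "minD": lambda d, e: d,
        "minE": lambda d, e: e,
        "minED": lambda d, e: e * d,
        "minED2": lambda d, e: e * d * d,
    }[objective]

    def score(core: str) -> float:
        d, p = task_profile[core]
        return cost(d, p * d)  # energy E = P * D

    return min(task_profile, key=score)
```

With a profile such as {"EV6": (1.0, 4.0), "EV5": (1.6, 1.5), "EV4": (2.5, 1.0)}, minD and minED2 pick EV6 while minE and minED pick EV5, which shows how a layer-specific objective can pull against a system-level goal.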


HMP Platform Setup

Setup:
•  Full-system stack
•  Integrated with gem5 + McPAT + CACTI
•  Supports multiple ISAs
•  Runs Linux OS
•  Linux OS allocator modified for heterogeneity awareness
•  Simulation environment: cluster with 10000 cores; 10 TB storage
[Figure: applications (threads 0..n of apps 0..n) and benchmarks run on a Linux kernel with a heterogeneity-aware scheduler (per-core run queues (RQ) with Schedule(), plus load_balance()) on an extended gem5 platform with Ev4, Ev6, Ev7 and EV8 cores (each with $I, $D and L2), disk and DRAM; McPAT and an HPC/sensing interface report power and performance]

Performance of Predictive Model of Core Types

Full-System Performance and Power Characteristics of Core Types
[Figure: predictor error (% error, 0-6) per benchmark for core types Ev6 and Ev4, showing performance error and power error]
Performance and power prediction errors are within 5% for each core type

Performance of Predictive Model of Core Types

Legend: Huge (EV8); Big (EV6); Medium (EV5); Small (EV4)
Performance and power prediction errors are within 9% for core-to-core types

Performance of HMP System Predictive Model

[Figure: number of cores of each type (EV4, EV5, EV6, EV8) versus HMP configuration #]
Performance and power prediction errors are within 9% for system-level HMP configurations; over 1000x speedup

Experimental DSE Results
•  HMP configurations can have a 2x-3x performance/power difference for the same area resource
•  With increasing load, the EDP increases with heterogeneity awareness
•  Layer-specific objectives can severely interfere with the system objective
•  Some cross-layer features have a dominant impact on HMP power and performance


Lack of heterogeneity awareness will have serious implications for HMP performance and power, and thus for the composition problem

Cross-Layer DSE using Predictive Models

HMP Configurations

[Figure: four selected HMP configurations, each with its cores sharing an LLC]
Config#1 [Performance Centric]: 4 EV6
Config#2 [Power-Throughput efficient]: 3 EV6 + 5 EV5
Config#9 [Energy Efficient]: 2 EV6 + 5 EV5 + 8 EV4
Config#37 [Energy Centric]: 34 EV4

Conclusion
•  We presented a holistic cross-layer approach for HMP composition under system-level constraints (e.g., area or power), formulated as an optimization problem
•  The approach consists of predictive cross-layer models of the core types and of the total system that are computationally efficient for design exploration
•  It enables over two orders of magnitude improvement in exploration time and resource requirements at less than 7% average error
•  We show:
–  HMP configurations can have a 2x-3x performance/power difference for the same area resource
–  With increasing load, the EDP increases with heterogeneity awareness
–  Layer-specific objectives can severely interfere with the system objective
–  Some cross-layer features have a dominant impact on HMP power and performance


References
1.  The Design and Analysis of Computer Experiments. Springer-Verlag, 2003.
2.  G. Palermo et al. ReSPIR: A response surface-based Pareto iterative refinement for application-specific design space exploration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 28(12):1816–1829, Dec. 2009.
3.  A. D. Pimentel et al. A systematic approach to exploring embedded system architectures at multiple abstraction levels. IEEE Transactions on Computers, 55(2):99–112, Feb. 2006.
4.  K. Keutzer et al. System-level design: orthogonalization of concerns and platform-based design. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 19(12):1523–1543, Dec. 2000.
5.  P. Greenhalgh. Big.LITTLE processing with ARM Cortex-A15 & Cortex-A7: Improving energy efficiency in high-performance mobile platforms. 2011.
6.  NVIDIA. Variable SMP: A multi-core CPU architecture for low power and high performance. 2011.
7.  T. Zidenberg, I. Keslassy, and U. Weiser. Optimal resource allocation with MultiAmdahl. Computer, 46(7):70–77, July 2013.
8.  T. Zidenberg, I. Keslassy, and U. Weiser. MultiAmdahl: How should I divide my heterogeneous chip? Computer Architecture Letters, 11(2):65–68, 2012.
9.  S. Li et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures. In Proc. 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-42), pages 469–480, 2009.
10.  S. Thoziyoor et al. CACTI 5.1. HP Laboratories, April 2, 2008.
11.  N. Binkert et al. The gem5 simulator. SIGARCH Computer Architecture News, 39(2):1–7, August 2011.


www.variability.org
www.nsf.gov
www.uci.edu

THANKS


Towards Full System Energy Efficiency Models

[Figure: layered full-system models: application & workload model; OS and scheduling model; hardware, memory & bus architecture; circuit and device models]

HMP Composition Approach
•  Perform cross-layer design space exploration
•  Large design space pruned by using DoE
•  Formulated as an optimization problem
–  Uses predictive models of the HMP
–  System and layer-specific goals evaluated using the predictive models
–  Models of individual core types are used to build HMP models
•  Used global optimization methods (SA or GA) to find the configuration


Algorithm for DSE

Predictive Model Layers | Parametric Features & Attributes
Hardware Architecture Features | Issue width (Iw), LQ/SQ size (LSQ), IQ size (IQ), ROB size (ROB), Int/float registers (IFR), L1$I size (KB) (L1I), L1$D size (KB) (L1D), Freq. (MHz) (F), Voltage (V), Core area (a)
Performance Event Counters | Branch misprediction rate (mB), L1 instruction miss rate (mL1I), L1 data cache miss rate (mL1D), instruction TLB miss rate (mITLB), data TLB miss rate (mDTLB), context switch counters (Cw)
Cycle and Instruction Counters | Busy cycles (cyBusy), idle cycles (cyIdle), sleep cycles (cySleep), committed instructions (Itotal), committed loads and stores (Imem), committed branches (Ibranch)


HMP Composition Approach
•  Perform cross-layer design space exploration in four stages
–  Provides a holistic approach with the complete system
–  Jointly considers features of the application, OS, HW, bus/network, circuit and device layers
–  Avoids the pathological scenarios of single-layer approaches
–  Effectively captures crucial interactions between layers
–  Improves exploration time and resource requirements at small errors
–  Uses computationally efficient predictive models developed from
•  Large design space pruned by using DoE
•  Formulated as an optimization problem
–  Uses predictive models of the HMP
–  System and layer-specific goals evaluated using the predictive models
–  Models of individual core types are used to build HMP models
•  Used global optimization methods (SA or GA) to find the configuration


Related Work

•  Existing work focuses on HMP runtime systems [14], [4], [21], [17], [2], [6], [10]
•  Limited work on cross-layer modeling of HMPs and cross-layer DSE
•  Closest to our work [Zidenberg 2012, Zidenberg 2013]
–  Optimal resource allocation to specialized accelerators in an SoC, not to cores
–  System objective: improve performance
–  Do not consider the full-system stack and OS
–  Focus only on the hardware layer


[Figure: cross-layer predictive model: Applications (SA), Operating System (SO), Instruction Set Architecture (SI), Network/Bus Communication Architecture (SN), Hardware Architecture (SH) and Device/Circuit Architecture (SC) layers, each with sensors, monitors and an observer under operating conditions; predicted outputs: performance, power, energy, temperature, reliability, errors, vulnerability]
