Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems

POWER AND CLOCK GATING
MODELLING
IN
COARSE GRAINED
RECONFIGURABLE SYSTEMS
Francesca Palumbo
University of Sassari
PolComIng
Information Engineering Unit
Tiziana Fanni, Carlo Sau, Paolo Meloni
and Luigi Raffo
University of Cagliari
Dept. of Electrical and Electronics Eng.
tiziana.fanni@diee.unica.it
ACM International Conference on
Computing Frontiers 2016
E
O
L
A
B

OUTLINE
 Problem Statement
 Background
 Dataflow Model of Computation
 Coarse Grain Reconfiguration
 Multi-Dataflow Composer Tool
 Proposed approach
 Parameters Analysis
 Power Estimation Model
 Logic Regions Analysis
 Metodology assesment
 Conclusions
2

PROBLEM STATEMENT
Consumers need:





 3

PROBLEM STATEMENT
Consumers need:
 Integrated complex and fancy
resurce-intensive applications




 3

PROBLEM STATEMENT
Consumers need:
 Long battery life



 3

PROBLEM STATEMENT
Consumers need:




Possible solutions:
3

PROBLEM STATEMENT
Consumers need:
 Dataflow Model of Computation
 Modularity and parallelism 
 INTEGRATION AND RE-USABILITY


Possible solutions:
3

PROBLEM STATEMENT
Consumers need:
 Dataflow Model of Computation
 Modularity and parallelism 
 INTEGRATION AND RE-USABILITY
 Coarse-graine reconfiguration
 Flexibility and resource sharing 
 MULTI-APPLICATION PORTABLE DEVICES
Possible solutions:
3

BACKGROUND
Dataflow Model of Computation
 DATAFLOW FORMALISM
 Directed graph of actors
(functional units).
 Actors exchange tokens
(data packets) through
dedicated channels


 4

BACKGROUND
(functional units).
dedicated channels



actions
state
4

BACKGROUND
(functional units).
dedicated channels
 CHARACTERISTICS
 Explicit the intrinsic application parallelism.
 Modularity favours model
re-usability/adaptivity.
actions
state
4

BACKGROUND
Coarse-Graine Reconfiguration
A
in1
D
in2
F
in3
E
G
B
Sbox_1
Sbox_2
out1
C
Sbox_0
ID net=1
LUTCoarse Grained Reconfigurable Platform
A B C
D E C
A
F
CG
Dataflow Descriptions
N:1
α
β
γ
α
β
γ
0 0
0
5

BACKGROUND
A
in1
D
in2
F
in3
E
G
B
Sbox_1
Sbox_2
out1
C
Sbox_0
ID net=3
A B C
D E C
A
F
CG
N:1
α
β
γ
α
β
γ
1 1
0



6

BACKGROUND
A
in1
D
in2
F
in3
E
G
B
Sbox_1
Sbox_2
out1
C
Sbox_0
ID net=3
A B C
D E C
A
F
CG
N:1
α
β
γ
α
β
γ
1 1
0
 Large Power consumption  Power Saving
Methodologies
 Dynamic  Clock-gating
 Static  Power-gating
6

BACKGROUND
http://sites.unica.it/rpct/Multi-Dataflow Composer Tool
A
in1
D
in2
F
in3
E
G
B
Sbox_1
Sbox_2
out1
C
LR1
LR2: Always ON LR
LR3
LR4
LR5
LR4
RTN
RTN
RTN
RTN
RTN
RTN
ISOL
ISOL
ISOLISOL
Sbox_0
ISOL
VDD
SW
VDD
SW
VDD
SW
ISOL
VDD
SW
ID net
Power
Controller
CG_en1
CG_en4
CG_en5
CG_en3
clock
7

PROPOSED APPROACH
Parameters analysis:










8

PROPOSED APPROACH
 Architectural parameters









8

PROPOSED APPROACH
 Actors
 RAMs
 Combinatorial Logic
 Sequential Logic…





8

PROPOSED APPROACH
 Actors
 RAMs
 Functional parameters




8

PROPOSED APPROACH
 Actors
 RAMs
 Input Data
 LR’s Activation Time


8

PROPOSED APPROACH
 Actors
 RAMs
 Input Data
 Technological parameters

8

PROPOSED APPROACH
 Actors
 RAMs
 Input Data
 Technological parameters
 As transistors get smaller the contribution of
the static contribute gets larger and not negligible.
8

PROPOSED APPROACH
 Power gating.
 Clock gating.
Power Estimation Model
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
TregrtnregregPrtnRCPcmbP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i



 9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Combinational
Logic
Standard Cost
9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Combinational
Logic
Sequential
Logic
Standard Cost
9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Retention CellsCombinational
Logic
Sequential
Logic
Standard Cost Overhead
9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Retention Cells
Isolation Cells
Combinational
Logic
Sequential
Logic
9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Retention Cells
Isolation Cells
Power Controller
Combinational
Logic
Sequential
Logic
9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Retention Cells
Isolation Cells
Power Controller
Clock Gating Cells
Combinational
Logic
Sequential
Logic
9

PROPOSED APPROACH
 Power gating.
 Clock gating.
]*)(*)([
]*)(*)([
]*)*#(*)*#([
*]/#)#(#*)()*#()([
)(_)()(
iOFFOFFiONON
iOFFOFFiONON
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TContrPTContrP
TisoISOPTisoISOP
LROverExtLRPLRP
i






]*)(*)([
]*)()([
)(_)()(
iOFFOFFiONON
iON
LRactors
iiONi
TCGPTCGP
TregPcmbP
LROverExtLRPLRP
i




Retention Cells
Isolation Cells
Power Controller
Clock Gating Cells
Combinational
Logic
Sequential
Logic
9
Term not present
in CG static model!

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
ESTIMATE PG ESTIMATE CG
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
Area < Area_th Evaluate CG
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
Power CG: +1%  Discard
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
Area > Area_th Evaluate PG
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
Power PG: +2%  Evaluate CG
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
Power CG: -5%  Apply CG
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
10

PROPOSED APPROACH
LRs analysis
NO
YES
NO
YES
NO
YES
NO
YES
AREA
THRESHOLDING
Select PDi to be
Implement with PG
Select PDi to be
Implement with CG
START
i=0
Increment i
STOP
1
2 3
∀PDi
i<N+1
Is Above?
Is there
Saving%?
Is there
Saving%?
LRs LR1 LR2 LR3
Area 2% 5.8% 23%
PG --- +2% -18%
CG +1% -5% ---
Area_th: 5%
Power CG: -18%  Apply PG
10

METODOLOGY ASSESMENT
Design Under Test: ASIC 90nm technology
host
processor
FFT
coprocessor
INTERCONNECTION LAYER
APPLICATION # KERNEL #LRs
FFT 4 8
11

host
processor
FFT
coprocessor
FFT 4 8
PDs TiON #reg Area
%
PD1 33% 1024 62.44
PD2 67% 512 0.46
PD3 37% 256 15.84
PD4 4% 0 0.41
PD5 21% 0 0.25
PD6 42% 0 0.43
PD7 58% 128 7.93
PD8 96% 512 1.36 11

host
processor
FFT
coprocessor
FFT 4 8
PDs TiON #reg Area
%
PD1 33% 1024 62.44
PD2 67% 512 0.46
PD3 37% 256 15.84
PD4 4% 0 0.41
PD5 21% 0 0.25
PD6 42% 0 0.43
PD7 58% 128 7.93
PD8 96% 512 1.36
Highly active
11

host
processor
FFT
coprocessor
FFT 4 8
PDs TiON #reg Area
%
PD1 33% 1024 62.44
PD2 67% 512 0.46
PD3 37% 256 15.84
PD4 4% 0 0.41
PD5 21% 0 0.25
PD6 42% 0 0.43
PD7 58% 128 7.93
PD8 96% 512 1.36
CombinatorialHighly active
11

host
processor
FFT
coprocessor
FFT 4 8
PDs TiON #reg Area
%
PD1 33% 1024 62.44
PD2 67% 512 0.46
PD3 37% 256 15.84
PD4 4% 0 0.41
PD5 21% 0 0.25
PD6 42% 0 0.43
PD7 58% 128 7.93
PD8 96% 512 1.36
CombinatorialHighly active Best candidate
11

host
processor
FFT
coprocessor
FFT 4 8
PDs TiON #reg Area
%
PD1 33% 1024 62.44
PD2 67% 512 0.46
PD3 37% 256 15.84
PD4 4% 0 0.41
PD5 21% 0 0.25
PD6 42% 0 0.43
PD7 58% 128 7.93
PD8 96% 512 1.36
Small LRsCombinatorialHighly active Best candidate
11

Estimated Saving
12

Estimated Saving
Area_th: 5% PG_1
12

Estimated Saving
Area_th: 5% PG_1
PG PDs: PD1, PD3, PD7
12

Estimated Saving
Area_th: 5% PG_1
CG PDs: PD2, PD8
12

Estimated Saving
Area_th: 5% PG_1
CG PDs: PD2, PD8
Area_th: 10% PG_2
12

Estimated Saving
Area_th: 5% PG_1
CG PDs: PD2, PD8
Area_th: 10% PG_2
PG PDs: PD1, PD3
12

Estimated Saving
Area_th: 5% PG_1
CG PDs: PD2, PD8
Area_th: 10% PG_2
PG PDs: PD1, PD3
CG PDs: PD2, PD7, PD8
12

Estimated Saving
Area_th: 5% PG_1
CG PDs: PD2, PD8
Area_th: 10% PG_2
PG PDs: PD1, PD3
Err. 0.33% Err. 1.41% Err. 1.08%
Err. 0.88% Err. 11.62% Err. 39.67%
12

Estimated Saving
Area_th: 5% PG_1
CG PDs: PD2, PD8
Area_th: 10% PG_2
PG PDs: PD1, PD3
Err. 0.33% Err. 1.41% Err. 1.08%
Err. 0.88% Err. 11.62% Err. 39.67%
0.16% Estimated Vs
0.11% Real
12

Designs Comparison
13

CONCLUSIONS
Advantages of the proposed approach
 Given a CGR design with n kernel and k LRs






14

CONCLUSIONS
 The proposed approach requires uniquely the
analysis of the baseline CGR design:
 one synthesis
 n simulations (one foreach kernel)



14

CONCLUSIONS
 The proposed approach requires uniquely the
analysis of the baseline CGR design:
 one synthesis
 n simulations (one foreach kernel)
 Otherwise:
 (2*k+1) (one PG and one CG for each LR + one of
Base the design)
 (2*k+1)*n simulation (n simulation for each design) 14

ACKNOWLEDGEMENTS
 The research leading to these results has
received funding from:
 the Region of Sardinia L.R.7/2007 under grant
agreement CRP-18324 [RPCT Project].
 the Region of Sardinia, Young Researchers Grant, POR
Sardegna FSE 2007-2013, L.R.7/2007 “Promotion of the
scientific research and technological innovation in
Sardinia”
15

Francesca Palumbo
University of Sassari
PolComIng
Information Engineering Unit
Tiziana Fanni, Carlo Sau, Paolo Meloni
and Luigi Raffo
University of Cagliari
Dept. of Electrical and Electronics Eng.
E
O
L
A
B
?
ACM International Conference on
Computing Frontiers 2016
tiziana.fanni@diee.unica.it

READ FULL ARTICLE
 ACM Digital Library
http://dl.acm.org/citation.cfm?id=2911713
 RPCT Project Website:
http://sites.unica.it/rpct/references/

REFERENCE
 @inproceedings{Fanni_CF2016,
author = {Fanni, Tiziana and Sau, Carlo and Meloni, Paolo
and Raffo, Luigi and Palumbo, Francesca},
title = {Power and Clock Gating Modelling in Coarse
Grained Reconfigurable Systems},
booktitle = {Proceedings of the ACM International
Conference on Computing Frontiers},
series = {CF '16},
year = {2016},
location = {Como, Italy},
doi = {10.1145/2903150.2911713},
publisher = {ACM},
address = {New York, NY, USA}, }
BibTex

Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems

Recommended

Recommended

More Related Content

What's hot

What's hot (20)

Viewers also liked

Viewers also liked (11)

Similar to Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems

Similar to Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems (20)

More from MDC_UNICA

More from MDC_UNICA (12)

Recently uploaded

Recently uploaded (7)

Power and Clock Gating Modelling in Coarse Grained Reconfigurable Systems