SmartBalance: A Sensing-Driven Linux Load Balancer for Energy Efficiency of Heterogeneous MPSoCs
Santanu Sarma
Computer Science, UC Irvine
http://variability.org
Coauthors: Tiago R. Muck, Danny Bathen, N. Dutt, A. Nicolau
T1: Measurement and Modeling
T2: Design Tools and Testing
T3: Microarchitecture and Compilers
T4: Runtime Support
T5: Applications and Testbeds
T6: Outreach and Education
Current Trends in MPSoC
•  Emerging and future computing systems will be heterogeneous multicore processors (HMPs) [Borkar11]
•  They will be rich in different types of cores, with diverse memories and accelerators [ARM big.LITTLE, 2013; Angstrom platform, MIT 2014; P2012 platform]
•  Heterogeneity manifests even in homogeneous architectures due to process variability [Teodorescu08]
•  They are monitor-rich at the lower layers of abstraction [Kornaros13, Lefurgy13, Gupta13]
6/10/15 © VLSI Design & Embedded Systems Conference - 2015
Heterogeneous Platforms
Examples: ARM big.LITTLE, NVIDIA Tegra, and AMD GPGPU
Clear trend towards heterogeneous many/multi-core architectures with different core types
Emerging & Future HMPs
Futuristic heterogeneous multicore processors are expected to have shared memories, a coherent bus, multiple networks, and accelerators
[Figure: block diagram of a future HMP with A15, A11, and A7 cores and their L2 caches, a cache-coherent interconnect, L3, GPU, accelerators, a global interrupt controller, DRAM, SPM, disk, and radio interfaces (Bluetooth, GSM, WiFi, 3/4G, 5G)]
Smart Load Balancing Problem
•  Standard load balancing: distribute threads (tasks) among cores uniformly and randomly (no awareness at the thread level)
•  Smart load balancing: distribute threads (tasks) among cores with awareness of energy/power at the thread level
Traditional OS Allocator & Scheduler
•  Does not cope jointly with workload variability and heterogeneity
•  Does not expose variability at the OS layer
•  Lacks suitable abstractions
•  Lacks support for generic HMPs
[Figure: tasks (Task 1, Task 2, ..., Task n, Task m) pass through the scheduler and the allocation/balancing stage onto an HMP with A15, A11, and A7 cores sharing an LLC]
Traditional OSes (e.g., Linux) are not yet ready to deal with the dual challenge of heterogeneity and variability
SmartBalance Approach
•  Sensing-driven, closed-loop predictive approach
•  Supports generic HMPs
•  Supports shared and independent task models
[Figure: the scheduler maps tasks (Task 1, Task 2, ..., Task n, Task m) onto an HMP with A15, A11, and A7 cores sharing an LLC, inside a Sense → Predict → Balance loop]
Heterogeneity- and performance-power-aware balancer/allocator for generic HMPs
SmartBalance Approach
[Figure: timeline of Epoch 1, Epoch 2, and Epoch 3. Each epoch runs sensing, estimation & prediction, and allocation, followed by Linux CFS scheduling; TSTA and TEpoch are the timing labels. The Sense → Predict → Balance loop drives the scheduler on an HMP with A15, A11, and A7 cores sharing an LLC]
SmartBalance stages are divided into time slices called epochs
On-Chip Sensing and Measurement
•  Performance sensing
–  Hardware performance counters at each core
•  Power sensing
–  Per-core total power sensing
–  Dynamic power sensing: virtual-sensor based, per core
–  Leakage power sensing:
    •  Per-block leakage sensor
    •  Network of sensors
[Figure: epoch timeline (Epoch 1, Epoch 2, Epoch 3) above the HMP core layout with A15, A11, and A7 cores sharing an LLC]
SmartBalance Sensing and Measurement
[Figure: per-core timelines (Core 1, Core 2, ..., Core n). Each smart balancing epoch TEpoch(k) spans multiple Linux CFS scheduling periods T1k(1) ... T1k(L), in which tasks τ1, τ2, ..., τm run; the sense, estimate & predict, and balance phases are marked on the timeline]
Performance-Power Prediction
•  Performance prediction for each core type, based on profiling or online learning
•  Customized predictors for each different core, with fine/precise prediction
–  For known architectures
•  A generic predictor for coarse prediction
–  For unknown architectures with new core types
[Figure: in each epoch, performance counters, configurations, and power & variability sensing feed the variability-aware performance & power prediction, which outputs a performance matrix and a power matrix]
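One way to picture the performance and power matrices is as a table with one row per thread and one column per core type, filled in by a per-core-type predictor. The sketch below assumes linear predictors over a small feature vector; the slides do not state the model form, and the function name, features, and coefficients are all hypothetical.

```python
from typing import List, Sequence


def predict_matrix(thread_features: Sequence[Sequence[float]],
                   coeffs_per_core_type: Sequence[Sequence[float]]) -> List[List[float]]:
    """Return a matrix whose entry [i][j] is the linear prediction
    for thread i if it were to run on core type j."""
    return [
        [sum(w * x for w, x in zip(weights, feats))
         for weights in coeffs_per_core_type]
        for feats in thread_features
    ]


# Hypothetical features per thread: [bias term, measured IPC, miss rate].
features = [
    [1.0, 2.0, 0.5],
    [1.0, 1.0, 0.0],
]
# Hypothetical coefficients, one weight vector per core type.
perf_coeffs = [
    [0.0, 1.0, 0.0],   # core type 0: predicted IPC equals measured IPC
    [0.5, 0.5, -1.0],  # core type 1: scaled IPC, penalised by misses
]

perf_matrix = predict_matrix(features, perf_coeffs)
```

The same helper could produce a power matrix from a second set of coefficients; a generic predictor for unknown core types would simply be a coarser coefficient set.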
On-line Optimization
•  Problem definition:
•  NP-hard problem
•  Finding a solution requires heuristics
•  Simulated-annealing-based allocation
•  On-line and low-overhead (< 1% for a 100 ms epoch)
[Figure: in each epoch (Epoch 1, Epoch 2, Epoch 3), the performance matrix (ipc00 ... ipcij) and the power matrix (p00 ... pij) feed an SA-based online task-allocation solver with objective $\max_{\Psi} \left( \mathrm{IPS} / \mathrm{Power} \right)$; the resulting allocation places threads t1 to t4 on cores, and CFS schedules them within each epoch]
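A simulated-annealing allocator of the kind named above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' solver: the objective is taken to be total IPS divided by total power, threads sharing a core are assumed to time-share it equally, and the cooling schedule, step count, and toy matrices are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def efficiency(alloc: Sequence[int], ips: Matrix, power: Matrix) -> float:
    """Total IPS over total power, assuming equal time-sharing per core."""
    load = [0] * len(ips[0])
    for core in alloc:
        load[core] += 1
    total_ips = sum(ips[t][c] / load[c] for t, c in enumerate(alloc))
    total_power = sum(power[t][c] / load[c] for t, c in enumerate(alloc))
    return total_ips / total_power


def anneal(ips: Matrix, power: Matrix, steps: int = 2000,
           t_start: float = 1.0, t_end: float = 1e-3,
           seed: int = 0) -> Tuple[List[int], float]:
    """Search thread-to-core allocations with simulated annealing."""
    rng = random.Random(seed)
    n_threads, n_cores = len(ips), len(ips[0])
    current = [rng.randrange(n_cores) for _ in range(n_threads)]
    current_score = efficiency(current, ips, power)
    best, best_score = list(current), current_score
    for step in range(steps):
        # Geometric cooling from t_start down towards t_end.
        temp = t_start * (t_end / t_start) ** (step / steps)
        candidate = list(current)
        candidate[rng.randrange(n_threads)] = rng.randrange(n_cores)
        score = efficiency(candidate, ips, power)
        delta = score - current_score
        # Always accept improvements; accept worse moves with
        # probability exp(delta / temp), which shrinks as temp falls.
        if delta >= 0 or rng.random() < math.exp(delta / temp):
            current, current_score = candidate, score
            if score > best_score:
                best, best_score = list(candidate), score
    return best, best_score


# Toy matrices (hypothetical): rows are threads, columns are cores.
ips = [[4.0, 1.0], [3.0, 0.9], [2.0, 0.8]]
power = [[8.0, 0.1], [7.0, 0.1], [6.0, 0.1]]
best, best_score = anneal(ips, power)
```

A pure efficiency ratio tends to favour the most frugal core, so a practical objective would likely need throughput or fairness constraints as well; that trade-off is outside this sketch.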
SmartBalance Approach
[Figure with three panels: (a) traditional OS allocator/scheduler; (b) sensing-driven predictive OS allocator/scheduler; (c) predictive allocation for epochs (each epoch covers multiple Linux scheduling cycles)]
Experimental Platform
•  Extension of gem5
–  McPAT + power variability
–  Sensing interface
•  Heterogeneous Alpha-based cores:
–  8-way OoO (Huge)
–  4-way OoO (Big)
–  2-way OoO (Medium)
–  In-order (Small)
[Figure: applications (App 0 ... App n, each with Thread 0 ... Thread n) and benchmarks run on the Linux kernel, where each core (Core 1, Core 2, ..., Core n) has a run queue (RQ) and Schedule(), alongside load_balance() and smart_balance(); the extended gem5 platform models Huge, Big, Medium, and Small cores (each with $I, $D, and L2), disk, and DRAM, with McPAT and an HPC/sensing interface supplying power and performance data]
Experimental Goals & Benchmarks
•  Goal: improve energy efficiency
•  Benchmarks: PARSEC and mixes
•  Interactive Benchmarks (IMB)
–  9 IMBs (e.g., High Throughput High Interactivity, HTHI)
–  Ability to control phases, wait periods, etc.
[Table: PARSEC mixes Mix1, Mix2, Mix3, Mix4, Mix5, Mix6; components listed: x264Hcrew, x264Hbow, x264Lcrew, x264Lbow, x264Lcrew, x264Hbow, x264Hcrew, x264Lbow, Bodytrack, x264Hcrew, Bodytrack, x264Hcrew, x264Lbow]
Results w.r.t. Vanilla Linux CFS
Over 50% improvement w.r.t. the vanilla Linux kernel
•  The Linux CFS scheduler distributes threads uniformly, irrespective of core types and features
•  SmartBalance makes workload- and power-aware runtime decisions
  
Results w.r.t. ARM GTS
Over 20% improvement w.r.t. the state-of-the-art ARM GTS
•  ARM GTS makes a binary decision to select either a big core or a small core based on a utilization threshold
•  It is unaware of thread-level power and performance
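The binary, threshold-based decision described for ARM GTS can be illustrated as follows. This is a simplified sketch of the behaviour as the slide states it, not ARM's implementation; the function name and the 0.8 threshold are hypothetical.

```python
def pick_cluster(utilization: float, threshold: float = 0.8) -> str:
    """Choose a cluster from utilization alone (threshold is hypothetical)."""
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be in [0, 1]")
    return "big" if utilization >= threshold else "little"


# Two threads with equal utilization always land on the same cluster,
# regardless of how much power or IPC each would see there.
placements = [pick_cluster(u) for u in (0.9, 0.9, 0.2)]
```

Because the decision ignores per-thread IPC and power, it cannot distinguish threads that would benefit from a big core from those that would merely burn more energy on it.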
  
Overheads
Overhead is < 1% for a 100 ms epoch on a quad-core system
Predictor Performance
Scalability
Related Work
Column groups: scheme generality (columns 2-3), per-thread awareness (columns 4-5), per-core awareness (columns 6-8)
Reference | No. of core types > 2 | Thread-to-core ratio > 1 | Per-thread IPC | Per-thread power | Per-core util. | Per-core IPC | Per-core power | Integrated & implemented in OS
Chen2009 | Yes | No | No | No | No | Yes | Yes | No
Annamalai2013 | No | No | No | No | No | Yes | Yes | No
Liu2013 | Yes | Yes | No | No | No | Yes | Yes | No
Kim2014 | No | Yes | No | No | Yes | No | No | Yes
Linaro IKS 2013 | No | Yes | No | No | Yes | No | No | Yes
ARM GTS 2013 | No | Yes | No | No | Yes | No | No | Yes
SmartBalance | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
Summary and Future Work
•  Performance-power-aware predictive Linux load balancing
•  Over 50% improvement for a quad-core HMP at < 1% overhead
•  Over 20% improvement in energy efficiency w.r.t. the ARM GTS policy
•  Future work: load, priority, and thermal awareness of the balancer
[Figure: Sense → Predict → Balance loop]
6/1/12 © Santanu Sarma, UCI
Thanks
santanus@uci.edu
  
Experimental Setup
(a) [Figure: applications (App 0 ... App n, each with Task 0 ... Task n) and benchmarks run on the Linux kernel, where each core (Core 0, Core 1, ..., Core n) has a run queue (RQ) and Schedule(), alongside load_balance() and smart_balance(); the platform is the gem5 performance simulator extended for heterogeneous MPSoCs, with Huge, Big, Medium, and Small cores, disk, and DRAM, and with McPAT and an HPC/sensing interface supplying power and performance data]
(b) Core features:
CORE FEATURES | Huge | Big | Medium | Small
Issue width | 8 | 4 | 2 | 1
LQ/SQ size | 32/32 | 16/16 | 8/8 | 8/8
IQ size | 64 | 32 | 16 | 16
ROB size | 192 | 128 | 64 | 64
Int/float regs | 256 | 128 | 64 | 64
L1 $I size (KB) | 64 | 32 | 16 | 16
L1 $D size (KB) | 64 | 32 | 16 | 16
Freq. (MHz) | 2000 | 1500 | 1000 | 500
Voltage (V) [1] | 1 | 0.8 | 0.7 | 0.6
Peak throughput [1] | 4.18 | 2.60 | 1.31 | 0.91
Peak power (W) [1] | 8.62 | 1.41 | 0.53 | 0.095
Area (mm²) [1] | 11.99 | 5.08 | 3.04 | 2.27
[1] Estimated using gem5 and McPAT at 22 nm with PARSEC benchmarks
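As a worked example of why core type matters for energy efficiency, the peak throughput and peak power rows of the core-feature table can be divided to give a rough throughput-per-watt figure for each core. The table does not state the throughput unit, so the ratio below is in "table throughput units per watt"; the dictionary layout and variable names are illustrative only.

```python
# Peak throughput and peak power (W) copied from the core-feature table.
CORES = {
    "Huge":   {"peak_throughput": 4.18, "peak_power_w": 8.62},
    "Big":    {"peak_throughput": 2.60, "peak_power_w": 1.41},
    "Medium": {"peak_throughput": 1.31, "peak_power_w": 0.53},
    "Small":  {"peak_throughput": 0.91, "peak_power_w": 0.095},
}

# Throughput per watt at peak, for each core type.
peak_efficiency = {
    name: spec["peak_throughput"] / spec["peak_power_w"]
    for name, spec in CORES.items()
}

# Core types ordered from most to least efficient at peak.
ranking = sorted(peak_efficiency, key=peak_efficiency.get, reverse=True)

for name in ranking:
    print(f"{name:6s} {peak_efficiency[name]:.2f}")
```

At peak, the Small core delivers roughly twenty times the throughput per watt of the Huge core, while the Huge core delivers about 4.6 times the raw throughput, which is the tension a power-aware balancer has to manage.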
Implementation in Linux
•  Modifications for SmartBalance
–  Load balancing replaced by SmartBalance
–  Each phase runs as a kernel thread
–  System does not halt while running SmartBalance
  
Predictor Coefficients
Preliminary results
  
Sensing Overhead
[Figure: bar chart "Leakage Sensing Overhead"; x-axis: sensor type (1, 2); y-axis: % overhead w.r.t. 4 cores (0 to 0.5); series: % area overhead and % power overhead]
CPSoC Computational Platform
[Figure: CPSoC architecture. Chip hardware: a grid of CPS cores, each with CPUs, $I, $D, and $L2 caches, an NIA, and OCSA blocks, connected through adaptive NoC routers. Software stack above the hardware: application layer (App 1, App 2, ..., App N); reflective middleware layer with cross-layer sensors (virtual & physical), decisions & learning (controller), and actuation (software and hardware), forming an Observe → Decide → Act loop; traditional operating system (scheduling, memory manager, file system, device drivers); hypervisor. CPS core detail: CPU(s) with $I, $D, and $L2, scratchpad/on-chip SRAM, NIA, timer & RTC, PLL, GPIO, and on-chip sensing & actuation (OCSA) with DDRO(s), oxide sensor(s), temperature sensor(s), leakage sensor(s), aging sensor(s), reliability sensor(s), performance counters, and an on-chip actuation unit]
Cross-Layer Physical/Virtual Sensing & Actuation
[Figure: five layers (applications; operating system; network/bus communication architecture; hardware architecture; device/circuit architecture), each with a sensor (SA, SO, SN, SH, SC) and an actuator (AA, AO, AN, AH, AC), connected in a Sensors (Observe) → Adaptive Control (Decide) → Actuation (Act) loop]


SmartBalance-DAC-v2

  • 1. SmartBalance:  A  Sensing-­‐Driven  Linux   Load  Balancer  for  Energy  Efficiency  of   Heterogeneous  MPSoCs   Santanu  Sarma     Computer  Science,  UC  Irvine           h3p://variability.org     Coauthors: Tiago R. Muck, Danny Bathen, N. Dutt, A. Nicolau T1: Measurement and Modeling T2: Design Tools and Testing T3: Microarchitecture and Compilers T4: Runtime Support T5: Applications and Testbeds T6: Outreach and Education
  • 2. Current Trends in MPSoC •  Emerging  and  future  compu?ng  systems  will  be   heterogeneous  mul?core  processor(HMP)[Borkar11]   •  They  will  be  rich  in  different  types  of  cores  with   diverse  memories  and  accelerators  [ARM  big.Li3le, 2013;  Angstrom  plaTorm,  MIT  2014,  P2012  PlaTorm]   •  Heterogeneity  manifest  even  in  homogenous   architectures  due  to  process  variability     [Teodorescu08]   •  They  are  monitor–rich  at  lower  layers  of  abstrac?ons   [Kornaros13,  Lefurgy13,  Gupta13]       6/10/15   ©  VLSI  Design  &  Embedded  Systems   Conference  -­‐  2015   2  
  • 3. Heterogeneous  PlaTorms     Examples: ARM (big.Little) , NVidia Tegra, and AMD GPGPU Clear  Trend  Towards  Heterogeneous  Many/mul;   core  Architectures  with  different  core  types   Examples: ARM (big.Little) , NVidia Tegra, and AMD GPGPU
  • 4. Emerging & Future HMPs 6/10/15   4   Futuris;c  heterogeneous  mul;core  processor  are  expected  to  have   shared  memories,  coherent  bus,  mul;ple  networks  and  accelerators   A15   Bluetooth   GSM  WiFi   3/4G   5G   A7   A7   A7   A7   A7   A7   A7   A7   A7   L2   A11   A11   A11   A11   L2   L2   Cache  Coherent  Interconnect   L3   GPU     Accelerator   Disk   Global  Interrupt  Controller     DRAM   SPM   Y   Y   Z   OtherAccelerators
  • 5. Smart  Load  Balancing  Problem   •  Standard  Load  Balance:  Distribute  threads   (tasks)  among  cores  uniformly  and  randomly   (lack  of  awareness  at  thread  level)       •  Smart  Load  Balancing  :    Distribute  threads   (tasks)  among  cores  with  awareness  of   energy/power  at  thread  levels   5  
  • 6. Tradi?onal  OS  Allocator  &  Scheduler   •  Do  not  cope  jointly  with   workload  variability  and   heterogeneity   •  Do  not  expose   Variability  at  OS  layer   •  Lacks  suitable   Abstrac?ons     •  Lacks  support  for   Generic  HMPs   6   Alloca?on  /  Balancing   A7   A11   A15   A11   A7   A7   A7   A7   A7   A7   A7   A7   A11   A11   A7   A11   A15   A11   A7   A7   A7   A7   A7   A7   A7   A7   A11   A11   LLC   Task1   Task2   Task  n  Task  m   Scheduler   Tradi;onal  OS  (eg.  Linux)  not  yet  ready  to  deal  with   DUAL  Challenge  of  Heterogeneity  and  Variability        
  • 7. SmartBalance  Approach   7   A7   A11   A15   A11   A7   A7   A7   A7   A7   A7   A7   A7   A11   A11   A7   A11   A15   A11  A7   A7   A7   A7   A7   A7   A7   A11   A11   LLC   Task1   Task2   Task  n  Task  m   Predict   Balance  Sense   Scheduler   •  Sensing-­‐driven  closed   loop  predic?ve  approach   •  Support  Generic  HMPs     •  Supports  shared  &   independent  task  models   Heterogeneity  and  Performance-­‐Power-­‐Aware   Balancer/Allocator  for  Generic  HMPs  
  • 8. SmartBalance  Approach   8   Sensing Es?ma?on  &  Predic?on Alloca?on Epoch  1 Epoch  2 Epoch  3 Scheduling            (CFS) TSTA Scheduling            (CFS) Scheduling            (CFS) TEpoch A7   A11   A15   A11   A7   A7   A7   A7   A7   A7   A7   A7   A11   A11   A7   A11   A15   A11  A7   A7   A7   A7   A7   A7   A7   A11   A11   LLC   Task1   Task2   Task  n  Task  m   Predict   Balance  Sense   Scheduler   SmartBalance  stages  are  divided  into  ;me  slices   called  EPOCHS  
  • 9. On-­‐Chip  Sensing  and  Measurement   •  Performance  Sensing   –  Hardware  Performance     counters  at  each  core   •  Power  Sensing   –  Per  core  total  power   sensing     –  Dynamic  Power  Sensing:   virtual  sensor  based  per   core   –  Leakage  Power  Sensing:   •  Per  block  leakage  sensor   •  Network  of  sensors     Epoch  1 Epoch  2 Epoch  3 A7   A11   A15   A11   A7   A7   A7   A7   A7   A7   A7   A7   A11   A11   A7   A11   A15   A11  A7   A7   A7   A7   A7   A7   A7   A11   A11   LLC  
  • 10. 10   SmartBalance  Sensing  and  Measurement   …….# Smart#Balancing#Epoch#TEpoch(k)# Linux#CFS# Sched#Period#T1k(1)# T1k(L)# …#τ1 τ2 τ m @me#Core1# TEpoch(kA1)# …….# @me#Core2# …….# @me#Coren# Sense# Es@mate#&#predict# Balance#
  • 11. Performance-­‐Power  Predic?on   •  Performance  predic?on  at   each  core  types  based  on   profiling  or  online  learning     •  Customized  predictors  for   each  different  core  with   fine  /  precise  predic?on   –  For  known  architectures   •  A  generic  predictor  for   coarse  predic?on     –  For  unknown  architectures   with  new  core  types   11   Epoch  1 Epoch  2 Epoch  3 Variability-­‐Aware   Performance  &   Power  Predic?on    Performance     Counters   Configura?ons ….. Perf. Matrix Power Matrix Power  &  Variability     Sensing
  • 12. On-line Optimization • Problem definition: max Ψ(IPS / Power) • NP-hard problem • Finding a solution requires heuristics • Simulated-annealing-based allocation • On-line, low overhead (< 1% for a 100 ms epoch) [Figure: a task allocation block (SA-based online solver) shown with the objective(s), a performance matrix (entries ipc00 ... ipcij), a power matrix (entries p00 ... pij), and the resulting allocation of threads t1 to t4 across Epochs 1 to 3, each followed by CFS]
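A simulated-annealing allocator over a performance matrix and a power matrix can be sketched as follows. This is an illustrative sketch under stated assumptions, not the SmartBalance solver: the objective is taken to be total predicted IPS divided by total predicted power, each core holds at most `capacity` threads, and the function names, move set, and cooling schedule are all hypothetical choices made for the example.

```python
import math
import random


def efficiency(alloc, ips, power):
    """Total predicted IPS divided by total predicted power (assumed objective)."""
    total_ips = sum(ips[t][c] for t, c in enumerate(alloc))
    total_power = sum(power[t][c] for t, c in enumerate(alloc))
    return total_ips / total_power


def core_load(alloc, n_cores):
    """Count how many threads are mapped to each core."""
    load = [0] * n_cores
    for c in alloc:
        load[c] += 1
    return load


def anneal(ips, power, capacity=1, steps=2000, t0=1.0, cooling=0.995, seed=0):
    """Search for a thread-to-core mapping with high IPS per watt.

    ips[t][c] and power[t][c] are the predicted IPS and power of thread t on core c.
    Returns the best allocation seen (alloc[t] = core index) and its score.
    """
    rng = random.Random(seed)
    n_threads, n_cores = len(ips), len(ips[0])
    if n_threads > n_cores * capacity:
        raise ValueError("not enough core capacity for all threads")

    alloc = [t % n_cores for t in range(n_threads)]  # round-robin start, always feasible
    score = efficiency(alloc, ips, power)
    best, best_score = list(alloc), score
    temp = t0

    for _ in range(steps):
        t = rng.randrange(n_threads)
        c = rng.randrange(n_cores)
        old = alloc[t]
        if c == old:
            continue
        cand = list(alloc)
        if core_load(alloc, n_cores)[c] < capacity:
            cand[t] = c  # move thread t to a core with free capacity
        else:
            # Target core is full: swap with one of its threads to stay feasible.
            u = rng.choice([x for x in range(n_threads) if alloc[x] == c])
            cand[t], cand[u] = c, old
        cand_score = efficiency(cand, ips, power)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if cand_score >= score or rng.random() < math.exp((cand_score - score) / temp):
            alloc, score = cand, cand_score
            if score > best_score:
                best, best_score = list(alloc), score
        temp = max(temp * cooling, 1e-12)  # cool down, but never reach zero
    return best, best_score
```

With four threads and four cores, a call such as `anneal(ips, power, capacity=1)` returns a one-thread-per-core mapping. Because the search is a heuristic, the result is a good allocation rather than a guaranteed optimum, which matches the slide's point that the NP-hard problem is solved heuristically within a small time budget.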
  • 13. SmartBalance Approach [Figure: (a) traditional OS allocator/scheduler mapping Task 1 to Task m onto A15 cores with an LLC; (b) sensing-driven predictive OS allocator/scheduler with a Sense, Predict, Balance loop and an allocation/load balancer over A7, A11, and A15 cores; (c) predictive allocation for epochs, built from performance counters, configurations, and power and variability sensing, producing performance and power matrices. Each epoch covers multiple Linux scheduling cycles]
  • 14. Experimental Platform • Extension of gem5 – McPAT + power variability – Sensing interface • Heterogeneous Alpha-based cores: – 8-way OoO (Huge) – 4-way OoO (Big) – 2-way OoO (Medium) – In-order (Small) [Figure: applications (App 0 to App n, each with Thread 0 to Thread n) and benchmarks run on a Linux kernel with per-core run queues (RQ), schedule(), load_balance(), and smart_balance(), on the extended gem5 platform with disk, DRAM, McPAT, an HPC/sensing interface reporting power and performance, and Huge, Big, Medium, and Small cores, each with $I, $D, and L2]
  • 15. Experimental Goals & Benchmarks • Goal: improve energy efficiency • Benchmarks: PARSEC and mixes • Interactive benchmarks (IMB) – 9 IMBs (e.g. High Throughput High Interactivity, HTHI) – Ability to control phases, wait periods, etc. [Table: PARSEC mixes Mix1 to Mix6, composed of the x264Hcrew, x264Hbow, x264Lcrew, x264Lbow, and Bodytrack workloads]
  • 16. Results w.r.t. Vanilla Linux CFS • Over 50% improvement in energy efficiency over the vanilla Linux kernel • The Linux CFS scheduler distributes threads uniformly, irrespective of core types and features • SmartBalance makes workload- and power-aware runtime decisions
  • 17. Results w.r.t. ARM GTS • Over 20% improvement in energy efficiency over the state-of-the-art ARM GTS • ARM GTS makes a binary decision, selecting either a big core or a small core based on a utilization threshold • GTS is unaware of thread-level power and performance
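The binary, threshold-based decision attributed to ARM GTS on this slide can be illustrated with a few lines. This is a deliberately simplified sketch of the idea as the slide states it, not ARM's implementation: the function name `threshold_core_choice` and the default threshold of 0.5 are hypothetical.

```python
def threshold_core_choice(utilization, threshold=0.5):
    """Pick a core class from utilization alone (simplified, assumed policy).

    utilization is a fraction in [0, 1]; threshold is a hypothetical cut-off.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    return "big" if utilization > threshold else "small"
```

Two threads with the same utilization always receive the same answer here, even if one would run far more efficiently on the other core class. That is the limitation the slide points at: the decision has no per-thread IPC or power input.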
  • 18. Overheads • Overhead is < 1% for a 100 ms epoch on a quad-core system
  • 21. Related Work [Table: comparison of schemes. Columns, in source order: (1) number of core types > 2 and (2) thread-to-core ratio > 1, both under "Scheme Generality"; (3) IPC, (4) Power, (5) Util., (6) IPC, (7) Power, spanning "Per-Thread Awareness" and "Per-Core Awareness"; (8) integrated and implemented in OS]
      Reference          (1)  (2)  (3)  (4)  (5)  (6)  (7)  (8)
      Chen2009           Yes  No   No   No   No   Yes  Yes  No
      Annamalai2013      No   No   No   No   No   Yes  Yes  No
      Liu2013            Yes  Yes  No   No   No   Yes  Yes  No
      Kim2014            No   Yes  No   No   Yes  No   No   Yes
      Linaro IKS 2013    No   Yes  No   No   Yes  No   No   Yes
      ARM GTS 2013       No   Yes  No   No   Yes  No   No   Yes
      SmartBalance       Yes  Yes  Yes  Yes  Yes  Yes  Yes  Yes
  • 22. Summary and Future Work • Performance-power-aware predictive Linux load balancing (Sense, Predict, Balance) • Over 50% improvement for a quad-core HMP at < 1% overhead • Over 20% improvement in energy efficiency w.r.t. the ARM GTS policy • Future work: load, priority, and thermal awareness of the balancer
  • 23. Thanks • © Santanu Sarma, UCI • santanus@uci.edu
  • 24. Experimental Setup [Figure (a): applications (App 0 to App n, each with Task 0 to Task n) and benchmarks run on a Linux kernel with per-core run queues (RQ), schedule(), load_balance(), and smart_balance(), on the gem5 performance simulator extended for heterogeneous MPSoCs, with Huge, Big, Medium, and Small cores, disk, DRAM, McPAT, and an HPC/sensing interface reporting power and performance] (b) Core features:
      Core feature         Huge    Big     Medium  Small
      Issue width          8       4       2       1
      LQ/SQ size           32/32   16/16   8/8     8/8
      IQ size              64      32      16      16
      ROB size             192     128     64      64
      Int/float regs       256     128     64      64
      L1 $I size (KB)      64      32      16      16
      L1 $D size (KB)      64      32      16      16
      Freq. (MHz)          2000    1500    1000    500
      Voltage (V) ¹        1       0.8     0.7     0.6
      Peak throughput ¹    4.18    2.60    1.31    0.91
      Peak power (W) ¹     8.62    1.41    0.53    0.095
      Area (mm²) ¹         11.99   5.08    3.04    2.27
      ¹ Estimated using gem5 and McPAT at 22 nm with PARSEC benchmarks
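The peak throughput and peak power rows of the core-feature table on this slide give a rough sense of how far apart the core types sit in efficiency. The short calculation below divides one by the other; it is a back-of-the-envelope illustration only, the slide does not state the unit of peak throughput, and the names `CORES` and `throughput_per_watt` are introduced here for the example.

```python
# (peak throughput, peak power in W) per core type, copied from the slide's table.
CORES = {
    "Huge": (4.18, 8.62),
    "Big": (2.60, 1.41),
    "Medium": (1.31, 0.53),
    "Small": (0.91, 0.095),
}


def throughput_per_watt(cores):
    """Return peak throughput divided by peak power for each core type."""
    return {name: tp / pw for name, (tp, pw) in cores.items()}


ratios = throughput_per_watt(CORES)
# Core types ordered from most to least efficient at peak.
ranking = sorted(ratios, key=ratios.get, reverse=True)
```

At peak, the ratios come out to roughly 9.58 for Small, 2.47 for Medium, 1.84 for Big, and 0.48 for Huge, so the smallest core delivers about twenty times more throughput per watt than the largest. Peak figures ignore how a particular thread actually behaves on each core, which is why the approach relies on per-thread prediction rather than a fixed ranking.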
  • 25. Implementation in Linux • Modifications for SmartBalance – Load balancing is replaced by SmartBalance – Each phase runs as a kernel thread – The system does not halt while SmartBalance runs
  • 28. Sensing Overhead [Figure: "Leakage Sensing Overhead" bar chart; x-axis: sensor type (1, 2); y-axis: % overhead w.r.t. 4 cores, from 0 to 0.5; series: % area overhead and % power overhead]
  • 29. CPSoC Computational Platform [Figure: chip hardware built from a grid of CPS cores, each with CPUs, $I, $D, and $L2, an NIA, on-chip sensing and actuation (OCSA) units, and an adaptive NoC router. Software layers shown: application layer (App 1 to App N); reflective middleware layer with cross-layer sensors, virtual and physical (observe), decisions and learning controller (decide), and software and hardware actuation (act); traditional operating system (scheduling, memory manager, file system, device drivers); hypervisor. CPS core detail: DDRO(s), oxide, temperature, leakage, aging, and reliability sensors, performance counters, CPU(s), $I, $D, $L2, scratchpad/on-chip SRAM, NIA, timer and RTC, PLL, on-chip actuation unit, GPIO]
  • 30. Cross-Layer Physical/Virtual Sensing & Actuation [Figure: five layers (applications, operating system, network/bus communication architecture, hardware architecture, device/circuit architecture), each with a sensor (SA, SO, SN, SH, SC; observe) and an actuator (AA, AO, AN, AH, AC; act), connected through adaptive control (decide)]