SlideShare a Scribd company logo
1 of 145
Politecnico di Milano
From FPGA-based Reconfigurable Systems to
Autonomic Heterogeneous Computing Systems
an enabling technologies perspective and more…
PEoPLe@DEIB
CE4 @ 13 march 2018
Marco D. Santambrogio
<marco.santambrogio@polimi.it>
Politecnico di Milano
2
Computing systems are getting…
3
little…
Computing systems are getting…
4
little…
little+Big
Computing systems are getting…
5
little…
little+Big
little+Big and heterogeneous
Computing systems are getting…
Heterogeneous Complex Systems
• Ryft ONE
– Big Data infrastructure due to an FPGA-accellerated architecture
– http://www.ryft.com/
• IBM Power8
– Introducing the Coherent Accelerator Processor Interface (CAPI) port that is
layered on top of PCI Express 3.0
– http://www-304.ibm.com/webapp/set2/sas/f/capi/home.html
• Microsoft Catapult
– Stratix V (Arria 10 FPGA)
– http://research.microsoft.com/en-us/projects/catapult/
• Amazon EC2 F1 Instances
– Xilinx UltraScale Plus FPGA
– https://aws.amazon.com/about-aws/whats-new/2017/04/amazon-ec2-f1-
instances-customizable-fpgas-for-hardware-acceleration-are-now-generally-
available/
• OpenPower Foundation
– http://openpowerfoundation.org/
6
Reconfiguration in everyday life
Soccer
(Partial – Static)
8
Reconfiguration in everyday life
Soccer
Football
(Complete – Static)
(Partial – Static)
9
Reconfiguration in everyday life
Soccer
Football
(Complete – Static)
(Partial – Static)
10
11
13
14
20 March 2018 Marco D. Santambrogio/ Politecnico di Milano 15
Research Challenge
17
18
Set of Available
Functionalities
FiArea/Time
Legenda:
A2/1
B 1/2
C2/2
D 1/1 E 1/1
F 2/2
RR3RR2RR1
FPGA
Starting Scenario @ 2009
19
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
RC
Implementations
Set of Available
Functionalities
FiArea/Time
Legenda:
A2/1
B 1/2
C2/2
D 1/1 E 1/1
F 2/2
RR3RR2RR1
FPGA
Starting Scenario @ 2009
People Demanding
for Functionalities
Set of Available
Functionalities
FiArea/Time
Legenda:
A2/1
B 1/2
C2/2
D 1/1 E 1/1
F 2/2
RR3RR2RR1
FPGA
20
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
RC
Implementations
Starting Scenario @ 2009
People Demanding
for Functionalities
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
21
RC
Implementations
A
E
D
C
B
F
2/1
2/2
1/2
1/1
1/1
2/2
A possible scenario
FiArea/Time
Legenda:
Time
Starting Scenario @ 2009
The Problem
Time
Area
A
B
Rec. F
F
Rec. E
E
Rec. C
C
Rec. D
D
22
A
E
D
C
B
F
2/1
2/2
1/2
1/1
1/1
2/2
A possible scenario
FiArea/Time
Legenda:
Time
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
RC
Implementations
A Possible Solution
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU Implementations
RR3RR2RR1
A
RR3RR2RR1
C
RR3RR2RR1
B
RR3RR2RR1
B
RR3RR2RR1
D
RR3RR2RR1
D
E
RR3RR2RR1
E
RR3RR2RR1
RR3RR2RR1
F
Time
Area
A
B
Rec. C
C
Rec. F
F
Rec. E
E
D
Rec. D
23
A
E
D
C
B
F
2/1
2/2
1/2
1/1
1/1
2/2
A possible scenario
FiArea/Time
Legenda:
Time
RC Implementations
Relocation
24
A
E
D
C
B
F
2/1
2/2
1/2
1/1
1/1
2/2
A possible scenario
FiArea/Time
Legenda:
Time
Time
Area
A
B
Rec. C
C
R2
F
F
R2
E
E
D
R2
D
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
RC
Implementations
Relocation: Virtual homogeneity
25
Relocation: Virtual homogeneity
26
Set of functionalities
(1) (2)
(3)
1/2
2/2
2/5
Legenda:
func
Area/ExeTime
Relocation: Virtual homogeneity
27
Relocation: Virtual homogeneity
28
(2)
2/5
Starting Scenario @ 2009
Set of Available
Functionalities
FiArea/Time
Legenda:
A2/1
B 1/2
C2/2
D 1/1 E 1/1
F 2/2
RR3RR2RR1
FPGA
29
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
RC
Implementations
People Demanding
for Functionalities
Starting Scenario
30
RR3RR2RR1
A
RR3RR2RR1
F
RR3RR2RR1
D
RR3RR2RR1
B
RR3RR2RR1
C
E
RR3RR2RR1
RFU
Implementations
RC
Implementations
People Demanding
for Functionalities
Starting Scenario
31
People Demanding
for Functionalities
NECST Scenario @ 2017/2018
32
People Demanding
for Functionalities
34
35EXTRA Consortium Proprietary
36
!
Usability
37
Interac( vity
38
Modularity
39
40
The proposed CAOS framework
____
__
____
____
__
___
____
____
Application
(C, C++ OPENCL)
WebUI
CAOSFlowManager
Frontend
IR generation – profiling –
templates applicability check
– HW/SW partitioning
Functions
Optimization
HW resource estimation –
static code analysis –
performance estimation –
code optimization / DSE
Backend
Runtime generation – function
synthesis – floorplanning –
bitstream generation
IR gen.
profiling
…
HW est.
DSE
…
Floorpl.
Bit. gen.
…
…
…
<system>
…
</system>
Profiling
Datasets
System
Description
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
10 1 0
10 1 0
1
10 1 0
10 1 1
01 0
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
System
runtime
FPGAs
bitstreams
41
The proposed CAOS framework
Architectural
Templates
( )
OpenCL
Streaming
…
( )
…
Computa8on Model Technology
SST
____
__
____
____
__
___
____
____
Application
(C, C++ OPENCL)
WebUI
CAOSFlowManager
Frontend
IR generation – profiling –
templates applicability check
– HW/SW partitioning
Functions
Optimization
HW resource estimation –
static code analysis –
performance estimation –
code optimization / DSE
Backend
Runtime generation – function
synthesis – floorplanning –
bitstream generation
IR gen.
profiling
…
HW est.
DSE
…
Floorpl.
Bit. gen.
…
…
…
<system>
…
</system>
Profiling
Datasets
System
Description
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
10 1 0
10 1 0
1
10 1 0
10 1 1
01 0
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
System
runtime
FPGAs
bitstreams
42
CAOS Frontend
____
__
____
____
__
___
____
____
Application
(C, C++ OPENCL)
WebUI
CAOSFlowManager
Frontend
IR generation – profiling –
templates applicability check
– HW/SW partitioning
Functions
Optimization
HW resource estimation –
static code analysis –
performance estimation –
code optimization / DSE
Backend
Runtime generation – function
synthesis – floorplanning –
bitstream generation
IR gen.
profiling
…
HW est.
DSE
…
Floorpl.
Bit. gen.
…
…
…
<system>
…
</system>
Profiling
Datasets
System
Description
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
10 1 0
10 1 0
1
10 1 0
10 1 1
01 0
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
System
runtime
FPGAs
bitstreams
43
CAOS Functions Optimization
____
__
____
____
__
___
____
____
Application
(C, C++ OPENCL)
WebUI
CAOSFlowManager
Frontend
IR generation – profiling –
templates applicability check
– HW/SW partitioning
Functions
Optimization
HW resource estimation –
static code analysis –
performance estimation –
code optimization / DSE
Backend
Runtime generation – function
synthesis – floorplanning –
bitstream generation
IR gen.
profiling
…
HW est.
DSE
…
Floorpl.
Bit. gen.
…
…
…
<system>
…
</system>
Profiling
Datasets
System
Description
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
10 1 0
10 1 0
1
10 1 0
10 1 1
01 0
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
System
runtime
FPGAs
bitstreams
44
CAOS Backend
____
__
____
____
__
___
____
____
Application
(C, C++ OPENCL)
WebUI
CAOSFlowManager
Frontend
IR generation – profiling –
templates applicability check
– HW/SW partitioning
Functions
Optimization
HW resource estimation –
static code analysis –
performance estimation –
code optimization / DSE
Backend
Runtime generation – function
synthesis – floorplanning –
bitstream generation
IR gen.
profiling
…
HW est.
DSE
…
Floorpl.
Bit. gen.
…
…
…
<system>
…
</system>
Profiling
Datasets
System
Description
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
10 1 0
10 1 0
1
10 1 0
10 1 1
01 0
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
System
runtime
FPGAs
bitstreams
45
CAOS Backend
____
__
____
____
__
___
____
____
Application
(C, C++ OPENCL)
WebUI
CAOSFlowManager
Frontend
IR generation – profiling –
templates applicability check
– HW/SW partitioning
Functions
Optimization
HW resource estimation –
static code analysis –
performance estimation –
code optimization / DSE
Backend
Runtime generation – function
synthesis – floorplanning –
bitstream generation
IR gen.
profiling
…
HW est.
DSE
…
Floorpl.
Bit. gen.
…
…
…
<system>
…
</system>
Profiling
Datasets
System
Description
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
10 1 0
10 1 0
1
10 1 0
10 1 1
01 0
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
1 01 0
1 01 0
1
1 01 0
1 01 1
0 10
System
runtime
FPGAs
bitstreams
SST
MaxCompiler
Output Genera- on
46
CAOS: OpenCL and SDAccel
• CAOS Frontend supports OpenCL code:
– Intermediate representation support
– Template applicability check for SDA
– Code profiling through LTPV (OpenCL profiler)
– Function optimization:
• Static code analysis and HW resource estimation within
SDA
– Backend support for SDAccell
</>
</>
47
CAOS Backend for SDAccel
SDAccel generates &
provides:
- XCLBIN containing the
bitstream
- OpenCL Runtime to
manage kernel
execution
CAOS Integrates SDAccel:
- Identifying I/O
Variables
- Generating a specific
OpenCL Host code for
the application
48
Evaluations: Evaluations
[1, 2] Streaming Stencil Time-step (SST)
[3] Pearson Correlation Coefficient, Asian Option Pricing
[5] Protein Folding
[4] Smith Waterman and Vessels Segmentation
49
[1]
[2]
[3]
[5]
[4]
[4]
[*] intel Xeon E5 1410
32 GB RAM
[*]
50EXTRA Consortium Proprietary
Hints on the problem…
51
*
[*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC,
pages 13-25, 2012.
Heuristic-Optimal Floorplanner
52
Reconfigurable regions +
Resource requirements FPGA Heuristic
solution
MILP model
MILP Solver
(Gurobi, Cplex, GLPK, …)
Improved heuristic solution
User-defined linear
Objective Function
Geometrical constraints
• Non overlapping
guaranteed by the
geometrical constraints
Objective function
• Cost function can be defined starting from the
variables and parameters of the MILP model
• Implemented metrics:
– Global wirelength measured using HPWL ( )
– Regions perimeter ( )
– Wasted resources ( )
53
Hints on the problem…
54
*
[*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC,
pages 13-25, 2012.
Hints on the problem…
• Optimal solution in 29s
• 34% wasted frames
reduction
– No DSP and CLB wasted
by the Video Decoder RR
– No BRAM wasted by the
Signal Decoder RR
• Approximately same
wirelength
55
*
[*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC,
pages 13-25, 2012.
20 March 2018 your name / affiliation here 56
Runtime Reconfiguration Management
• Reconfigurable architecture
– Static Area: used to control the reconfiguration process
– Reconfigurable Area: used to swap at runtime different cores
– Reconfiguration- oriented communication infrastructure
• Runtime reconfiguration managed via SW
– Standalone, Operating System
• Increased portability of user applications
• Inherited multitasking capabilities
• Simplified software development process
• Bitstreams relocation technique to
– speedup the overall system execution
– achieve a core preemptive execution
– assign at runtime the bitstreams placement
– reduce the amount of memory used to store partial bitstreams
58
Runtime Reconfiguration Management
• Reconfigurable architecture
– Static Area: used to control the reconfiguration process
– Reconfigurable Area: used to swap at runtime different cores
– Reconfiguration- oriented communication infrastructure
• Runtime reconfiguration managed via SW
– Standalone, Operating System
• Increased portability of user applications
• Inherited multitasking capabilities
• Simplified software development process
• Bitstreams relocation technique to
– speedup the overall system execution
– achieve a core preemptive execution
– assign at runtime the bitstreams placement
– reduce the amount of memory used to store partial bitstreams
59
OS-based management of dynamic
reconfiguration
• Provide software support for dynamic partial reconfiguration
on Systems-on-Chip running an operating system (i.e.,
LINUX).
– OS customization for specific architectures
– Rec. Functional Unit caching policies to improve the performance
– Partial reconfiguration process management from the OS
– Addition and removal of reconfigurable components
– Automatic loading and unloading of specific drivers for the IP-Cores
upon components configuration and/or deconfiguration
• Hardware-independent interface for software developers
based on the GNU/Linux
• Easier programming interface for specific drivers
60
IP-Core Devices Access
• Interaction with configured IP-Cores implemented by
means of the standard Linux device access
– Open, Close, Read, write, ioctl operations
/dev/device_1
/dev/device_2a
/dev/device_2b
Software
Application 1
Software
Application 2
device_1.o
device_2.o
/dev
Device
Drivers
Userspace
Devices
IP-Core_1
IP-Core_2a
IP-Core_2b
FPGA
Multiple instances
of the same
hardware module
61
Reconfigurable Process Control Block
• Reconfigurable process: an RFU object code in execution
• Each reconfigurable process is represented in the system by a
Reconfigurable Process Control Block (RPCB).
• A RPCB contains all the information associated with a specific
reconfigurable process
– State: the state in which the reconfigurable process control is at
the current time
– Position: the placement position on the device
– Object Code Accounting Information:
• Object Code
• Configuration Priority
• Resources
• Position
Configured
Addressing Space
Assignment
Ready
Positioning
Configuring
Configuring
Cached
Computing
Removed
Waiting
Executing
Preempted
62
The Centralized Manager
• Userspace applications are not allowed to explicitly
request a bitstream
– They request high-level functionalities
• Userspace requests are collected and served by a
centralized manager (Linux Reconfiguration Manager)
– The OS chooses the configuration code
– A new reconfigurable process is created
• Only the LRM can ask for a bitstream to be configured
on the FPGA
– Centralized knowledge of the device status
– Area management and module caching
63
63
Reconfiguration: Online Static
64
Reconfiguration: Online Static
65
Reconfiguration: Online Static
66
Reconfiguration: Online Static
67
Reconfiguration: Online Static
68
Reconfiguration: Online Static
69
70
Where to go necst?
Trying to raise the bar
• Towards the design and implementaion of Self-
adaptive and autonomic systems
71
Trying to raise the bar
• Towards the design and implementaion of Self-
adaptive and autonomic systems
• We need to include 2 more features
– Goal-oriented
• Tell the system what you want
• System’s job to figure out how to get there
– Approximate
• Does not expend any more effort than necessary to
meet goals
72
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
73
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
74
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
75
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
76
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
77
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
78
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
79
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
80
Data DataEnvironment Environment
Output - Environment Update
Output - Environment Update
Online Static Solution Adaptive Solution
81
Nice idea, but
82
Nice idea, but how to use it!
83
Umh…
Autonomic Operating System
• The AcOS project aims at
– designing and prototyping a patch for commodity
operating systems (e.g. Linux, FreeBSD)
– being capable to observe its own execution and
optimize, in a self-aware manner, its behavior with
respect to the external environment, to user needs
and to applications demands
[integrated/used in different research projects]
84
The research effort
• K42
– http://researcher.watson.ibm.com/researcher/view_
project.php?id=2078
• The SElf-awarE Computing (SEEC) Model
– http://groups.csail.mit.edu/carbon/seec/
• Angstrom
– http://projects.csail.mit.edu/angstrom/
• The Swarm project and Tessellation OS
– http://tessellation.cs.berkeley.edu
85
Reconfiguration: Self-Aware
86
Reconfiguration: Self-Aware
87
Reconfiguration: Self-Aware
88
89
Reconfiguration: Self-Aware
90
AcOS: via an intelligent ODA loop
91
Monitoring
Infrastructures
Adaptation
Policies
Applications
System
Goals
Measurements
Decide
Act Observe
The Heart Rate Monitor
• Heartbeats signal[1] either progresses or availability
– video encoder: 1 heartbeat = 1 frame
– web server: 1 heartbeat = 1 request
– database server: 1 heartbeat = transaction
• Heart rate as a performance measure and goal
– High-level, application-specific performance
measurements and goals (e.g., video encoder: 30
heartbeats/s = 30 frames/s)
• Compact API, user/kernel-space partitioned
implementation[2]
– User-mode fast-path heartbeats issue
– User and kernel-mode low-latency heart rate access
92
[1] Hoffmann et al., Application Heartbeats for Software Performance and Health,
Proc. of the 15th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2010
[2] Sironi et al., Metronome: Operating System Level Performance Management via Self-Adaptive Computing,
Proc. of the 49th Annual Design Automation Conference 2012
The Heart Rate Monitor
93
The Heart Rate Monitor
• Set performance goal
94
The Heart Rate Monitor
• Set performance goal
95
e.g.:
min: 25hb/sec
max: 35hb/sec
The Heart Rate Monitor
• Set performance goal
• Run the app and update progress
96
Issue an heartbeat
e.g.:
min: 25hb/sec
max: 35hb/sec
The Heart Rate Monitor
• Set performance goal
• Run the app and update progress
97
Issue an heartbeat
Statistics automatically
updated
e.g.:
min: 25hb/sec
max: 35hb/sec
The Heart Rate Monitor
• Set performance goal
• Run the app and update progress
• Check heart rate and, if necessary, react
98
Issue an heartbeat
Statistics automatically
updated
e.g.:
min: 25hb/sec
max: 35hb/sec
A first evaluation
• Hardware
• Quad-core: Intel Core i7-870 @ 2.97 GHz with 8 MB of
LLC
• Hyper-Threading, Turbo Boost, etc. disabled
• Memory: 4 GB of DDR3-1333
• Software
• Debian GNU/Linux 2.6.35
• PARSEC Benchmark Suite 2.1
99
x86 without any control
100
The controlled x86
101
Consensus
object
core allocator
x264
Two controlled x86, different goals
102
Consensus
object
core allocator
x264 x264
Autonomic Scheduling
• In a scenario were applications
– are competing for the same set of resources
– require predictable performance, expressed
through high-level, application-specific metrics
• The scheduler has to become Performance-
Aware to automatically allocate resources to
match performance goals
– With Metronome, we introduce performance-
awareness by means of a non-invasive
modifications to the Completely Fair Scheduler
103
Metronome
104
Metronome
105
Metronome
Sironi et al., Metronome: Operating System Level Performance Management via Self-Adaptive Computing,
Proc. of the 49th Annual Design Automation Conference 2012
106
From Metronome to Metronome++
• Metronome demonstrates
how a simple heuristic can be
enough to enable goal-
oriented resource allocation
by exploiting runtime
performance feedback
• Metronome++ has been
designed with more advanced
adaptation policy to
dynamically allocate CPUs to
SLO-bound
– E.g. Through tasks migration
among run queues
107
(1) Linux kernel vanilla
(2) Linux kernel enhanced with Metronome++
Trying to raise the bar (again)
• AcOS took into consideration performance…
• Any other HOT topic?
108
Trying to raise the bar (again)
• AcOS took into consideration performance…
• Any other HOT topic?
– What about temperature Control/Management!
109
110
swaptions @ 2.80 GHz
ab @ 2.80 GHz
temperatureincrease(°C)
0
10
20
time (s)
0 100 200 300 400 500 600
Temperature Control/Management
starting scenario
111
swaptions @ 2.80 GHz
ab @ 2.80 GHz
temperatureincrease(°C)
0
10
20
time (s)
0 100 200 300 400 500 600
Temperature Control/Management
set a temperature cap
Temperature Control/Management
DVFS is dangerous
112
113
swaptions @ 2.80 GHz
ab @ 2.80 GHz
temperatureincrease(°C)
0
10
20
time (s)
0 100 200 300 400 500 600
Temperature Control/Management
back to the starting scenario
114
swaptions @ 2.80 GHz
ab @ 2.80 GHz
temperatureincrease(°C)
0
10
20
time (s)
0 100 200 300 400 500 600
Temperature Control/Management
set the same temperature cap
Temperature Control/Management
The AcOS refreshement: ThermOS
115
116
Where to go necst?
A vision
• To discover/understand the future, sometimes
you have to look back at the past…
117
Heterogeneous System Architecture
118
More information available at: http://hsafoundation.com/
A global property
• Being able to adapt is not a specific domain
property
– Operating system, computer architecture, etc…
– HPC
– Exascale computing systems
– Embedded systems
– IoT
– …
119
A global property
• Being able to adapt is not a specific domain
property
– Operating system, computer architecture, etc…
– HPC
– Exascale computing systems
– Embedded systems
– IoT
embedded/mobile devices and
HPC/distributed/exascale computing
120
A self-adaptive, context-aware operating system to enhance users
experience via smart devices and environments
The CHANGE project
Enabling Technologies For Self-Aware
Adaptive Computing Systems
121
The mobile context
• What about
– Taking into consideration users’ habits and needs
– Considering very constrained and fluctuating scenarios
122
The mobile context
• What about
– Taking into consideration users’ habits and needs
– Considering very constrained and fluctuating scenarios
• Going Mobile!
– Constrained in terms of available resources
• Multi-core, reduced CPU performance, few memory
• Battery constrained
– Internal and external conditions are fluctuating
• Resources availability varies over time
• External conditions are unpredictable
123
Mobile context: some examples
• Tegra 2
– Dual-Core ARM Cortex-A9
– ULP GeForce, 8 cores
• A6
– Dual-Core based on ARMv7
– Triple-core PowerVR SGX 543MP3 GPU
• Tegra 3
– Quad-Core
– ULP GeForce, 12 cores
124
Motorola Photon 4G
HTC One X
The morphone approach
• To react to different situations all the information
from different components should not be
disregarded
• Our approach uses all the information that it can
retrieve to create a model of the context
– Environment
– Self
– User behavior
125
The Mpower example
126
Public Available for Free
SEPTEMBER 2013
mpower.necst.it
https://www.facebook.com/morphone.os
Gain back your Android battery life!
Configuration A
Configuration B
The concept of configuration
• Domain-specific feature: hardware modules currently used
• We defined the concept of configuration:
• “Given the controllable hardware modules on a device, a
configuration is a combination of their internal state”
• We compute a single ARX model for each configuration
Configuration
A
Configuration
B
Configuration
C
127
Modelling Approach
128
Modeling approach: “divide et impera”
129
We experienced a piecewise linear behavior and tried to attribute this to
domain-specific features
Modelling Approach
129
Modeling approach: “divide et impera”
130
We experienced a piecewise linear behavior and tried to attribute this to
domain-specific features
Configuration
A
Configuration
B
Configuration
C
Modelling Approach
130
Modeling approach: “divide et impera”
131
We experienced a piecewise linear behavior and tried to attribute this to
domain-specific features
Configuration
A
Configuration
B
Configuration
C
= actions on controllable
variables
Modelling Approach
131
Modeling approach: “divide et impera”
132
We experienced a piecewise linear behavior and tried to attribute this to
domain-specific features
Configuration
A
Configuration
B
Configuration
C
= actions on controllable
variables
Exogenous input
(uncontrollable)
Modelling Approach
132
Actions on controllable variables
• They are determined by the user’s behavior
• We model the evolution of the smartphone’s configuration as
a Markov Decision Process
• A state for every configuration
• Transitions’ weights represent the
probability to go from
a configuration to another
Configuration
A
Configuration
B
Configuration
C
133
Move Decision Phase on Device
134
• To perform the computation needed for the
decision phase on the device
– We need low energy computing systems
• The battery is always the bottleneck
– We need fast computing system
• The system has to be responsive
– We need reconfigurable computing systems
• The analysis to be performed on the input data
changes with respect to
135EXTRA Consortium Proprietary
136EXTRA Consortium Proprietary
137EXTRA Consortium Proprietary
138EXTRA Consortium Proprietary
139EXTRA Consortium Proprietary
140EXTRA Consortium Proprietary
141EXTRA Consortium Proprietary
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
159
(b) Working NERA Cluster
gure 25: NERA Cluster 159
NQ-Z1 Development (b) Working NERA Cluster
Figure 25: NERA Cluster
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
159
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
159
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
159
(a) PYNQ-Z1 Development
board
(b) Working NERA Cluster
Figure 25: NERA Cluster
142EXTRA Consortium Proprietary
143EXTRA Consortium Proprietary
x
144
Politecnico di Milano
From FPGA-based Reconfigurable Systems to
Autonomic Heterogeneous Computing Systems
an enabling technologies perspective and more…
International Center for Theoretical Physics
Trieste @ 24 November 2017
Marco D. Santambrogio
<marco.santambrogio@polimi.it>
Politecnico di Milano

More Related Content

What's hot

Hari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna
 
The Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorThe Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorDeepak Tomar
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processorsRISC-V International
 
Verification Automation Using IPXACT
Verification Automation Using IPXACTVerification Automation Using IPXACT
Verification Automation Using IPXACTDVClub
 
Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Roy Messinger
 
Deep submicron-backdoors-ortega-syscan-2014-slides
Deep submicron-backdoors-ortega-syscan-2014-slidesDeep submicron-backdoors-ortega-syscan-2014-slides
Deep submicron-backdoors-ortega-syscan-2014-slidesortegaalfredo
 
Xilinx fpga cores
Xilinx fpga coresXilinx fpga cores
Xilinx fpga coressanaz nouri
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V International
 
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATION
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATIONFROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATION
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATIONieijjournal
 
SDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeSDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeIntel® Software
 
Making of an Application Specific Integrated Circuit
Making of an Application Specific Integrated CircuitMaking of an Application Specific Integrated Circuit
Making of an Application Specific Integrated CircuitSWINDONSilicon
 
Nios2 and ip core
Nios2 and ip coreNios2 and ip core
Nios2 and ip coreanishgoel
 
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemSynopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemMostafa Khamis
 

What's hot (20)

Hari Krishna Vetsa Resume
Hari Krishna Vetsa ResumeHari Krishna Vetsa Resume
Hari Krishna Vetsa Resume
 
An introduction to FPGAs and Their MPSOCs
An introduction to FPGAs and Their MPSOCs  An introduction to FPGAs and Their MPSOCs
An introduction to FPGAs and Their MPSOCs
 
The Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft ProcessorThe Microarchitecure Of FPGA Based Soft Processor
The Microarchitecure Of FPGA Based Soft Processor
 
Smart logic
Smart logicSmart logic
Smart logic
 
Vlsi design flow
Vlsi design flowVlsi design flow
Vlsi design flow
 
Online test program generator for RISC-V processors
Online test program generator for RISC-V processorsOnline test program generator for RISC-V processors
Online test program generator for RISC-V processors
 
Verification Automation Using IPXACT
Verification Automation Using IPXACTVerification Automation Using IPXACT
Verification Automation Using IPXACT
 
3rd Lecture
3rd Lecture3rd Lecture
3rd Lecture
 
Asic &fpga
Asic &fpgaAsic &fpga
Asic &fpga
 
Asic
AsicAsic
Asic
 
Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison Xilinx vs Intel (Altera) FPGA performance comparison
Xilinx vs Intel (Altera) FPGA performance comparison
 
Deep submicron-backdoors-ortega-syscan-2014-slides
Deep submicron-backdoors-ortega-syscan-2014-slidesDeep submicron-backdoors-ortega-syscan-2014-slides
Deep submicron-backdoors-ortega-syscan-2014-slides
 
Xilinx fpga cores
Xilinx fpga coresXilinx fpga cores
Xilinx fpga cores
 
Asic design
Asic designAsic design
Asic design
 
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML AcceleratorsRISC-V & SoC Architectural Exploration for AI and ML Accelerators
RISC-V & SoC Architectural Exploration for AI and ML Accelerators
 
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATION
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATIONFROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATION
FROM FPGA TO ASIC IMPLEMENTATION OF AN OPENRISC BASED SOC FOR VOIP APPLICATION
 
SDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeSDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's Stampede
 
Making of an Application Specific Integrated Circuit
Making of an Application Specific Integrated CircuitMaking of an Application Specific Integrated Circuit
Making of an Application Specific Integrated Circuit
 
Nios2 and ip core
Nios2 and ip coreNios2 and ip core
Nios2 and ip core
 
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation SystemSynopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
Synopsys Fusion Compiler-Comprehensive RTL-to-GDSII Implementation System
 

Similar to From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing Systems

From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...
From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...
From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...NECST Lab @ Politecnico di Milano
 
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationHiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationVEDLIoT Project
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryDeepak Shankar
 
FPGA_prototyping proccesing with conclusion
FPGA_prototyping proccesing with conclusionFPGA_prototyping proccesing with conclusion
FPGA_prototyping proccesing with conclusionPersiPersi1
 
AXONIM 2018 industrial automation technical support
AXONIM 2018 industrial automation technical supportAXONIM 2018 industrial automation technical support
AXONIM 2018 industrial automation technical supportVitaliy Bozhkov ✔
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...NECST Lab @ Politecnico di Milano
 
HiPEAC 2022_Marco Tassemeier presentation
HiPEAC 2022_Marco Tassemeier presentationHiPEAC 2022_Marco Tassemeier presentation
HiPEAC 2022_Marco Tassemeier presentationVEDLIoT Project
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream csching
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectSaltlux Inc.
 
Resume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - CopyResume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - CopyVenkata Rakesh Gudipalli
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 

Similar to From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing Systems (20)

From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...
From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...
From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing S...
 
The NECSTLab Multi-Faceted Experience with AWS F1
The NECSTLab Multi-Faceted Experience with AWS F1The NECSTLab Multi-Faceted Experience with AWS F1
The NECSTLab Multi-Faceted Experience with AWS F1
 
CAOS @ ICCD2017
CAOS @ ICCD2017CAOS @ ICCD2017
CAOS @ ICCD2017
 
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentationHiPEAC Computing Systems Week 2022_Mario Porrmann presentation
HiPEAC Computing Systems Week 2022_Mario Porrmann presentation
 
The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...The CAOS framework: democratize the acceleration of compute intensive applica...
The CAOS framework: democratize the acceleration of compute intensive applica...
 
CAOS: A CAD Framework for FPGA-Based Systems
CAOS: A CAD Framework for FPGA-Based SystemsCAOS: A CAD Framework for FPGA-Based Systems
CAOS: A CAD Framework for FPGA-Based Systems
 
Mirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP LibraryMirabilis_Design AMD Versal System-Level IP Library
Mirabilis_Design AMD Versal System-Level IP Library
 
FPGA_prototyping proccesing with conclusion
FPGA_prototyping proccesing with conclusionFPGA_prototyping proccesing with conclusion
FPGA_prototyping proccesing with conclusion
 
AXONIM 2018 industrial automation technical support
AXONIM 2018 industrial automation technical supportAXONIM 2018 industrial automation technical support
AXONIM 2018 industrial automation technical support
 
The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...The CAOS framework: Democratize the acceleration of compute intensive applica...
The CAOS framework: Democratize the acceleration of compute intensive applica...
 
Shantanu's Resume
Shantanu's ResumeShantanu's Resume
Shantanu's Resume
 
No[1][1]
No[1][1]No[1][1]
No[1][1]
 
Alex16_ic
Alex16_icAlex16_ic
Alex16_ic
 
HiPEAC 2022_Marco Tassemeier presentation
HiPEAC 2022_Marco Tassemeier presentationHiPEAC 2022_Marco Tassemeier presentation
HiPEAC 2022_Marco Tassemeier presentation
 
Splunk app for stream
Splunk app for stream Splunk app for stream
Splunk app for stream
 
IJCRT2006062.pdf
IJCRT2006062.pdfIJCRT2006062.pdf
IJCRT2006062.pdf
 
Web Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC ProjectWeb Scale Reasoning and the LarKC Project
Web Scale Reasoning and the LarKC Project
 
Resume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - CopyResume_VenkataRakeshGudipalli Master - Copy
Resume_VenkataRakeshGudipalli Master - Copy
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Shantanu's Resume
Shantanu's ResumeShantanu's Resume
Shantanu's Resume
 

More from NECST Lab @ Politecnico di Milano

Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingNECST Lab @ Politecnico di Milano
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...NECST Lab @ Politecnico di Milano
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification SystemNECST Lab @ Politecnico di Milano
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingNECST Lab @ Politecnico di Milano
 

More from NECST Lab @ Politecnico di Milano (20)

Mesticheria Team - WiiReflex
Mesticheria Team - WiiReflexMesticheria Team - WiiReflex
Mesticheria Team - WiiReflex
 
Punto e virgola Team - Stressometro
Punto e virgola Team - StressometroPunto e virgola Team - Stressometro
Punto e virgola Team - Stressometro
 
BitIt Team - Stay.straight
BitIt Team - Stay.straight BitIt Team - Stay.straight
BitIt Team - Stay.straight
 
BabYodini Team - Talking Gloves
BabYodini Team - Talking GlovesBabYodini Team - Talking Gloves
BabYodini Team - Talking Gloves
 
printf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTonprintf("Nome Squadra"); Team - NeoTon
printf("Nome Squadra"); Team - NeoTon
 
BlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking PlatformBlackBoard Team - Motion Tracking Platform
BlackBoard Team - Motion Tracking Platform
 
#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome#include<brain.h> Team - HomeBeatHome
#include<brain.h> Team - HomeBeatHome
 
Flipflops Team - Wave U
Flipflops Team - Wave UFlipflops Team - Wave U
Flipflops Team - Wave U
 
Bug(atta) Team - Little Brother
Bug(atta) Team - Little BrotherBug(atta) Team - Little Brother
Bug(atta) Team - Little Brother
 
#NECSTCamp: come partecipare
#NECSTCamp: come partecipare#NECSTCamp: come partecipare
#NECSTCamp: come partecipare
 
NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1NECSTCamp101@2020.10.1
NECSTCamp101@2020.10.1
 
NECSTLab101 2020.2021
NECSTLab101 2020.2021NECSTLab101 2020.2021
NECSTLab101 2020.2021
 
TreeHouse, nourish your community
TreeHouse, nourish your communityTreeHouse, nourish your community
TreeHouse, nourish your community
 
TiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architectureTiReX: Tiled Regular eXpressionsmatching architecture
TiReX: Tiled Regular eXpressionsmatching architecture
 
Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposing
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification System
 
Luns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural networkLuns - Automatic lungs segmentation through neural network
Luns - Automatic lungs segmentation through neural network
 
BlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAsBlastFunction: How to combine Serverless and FPGAs
BlastFunction: How to combine Serverless and FPGAs
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matching
 

Recently uploaded

Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfMahmoud M. Sallam
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...M56BOOKSTORE PRODUCT/SERVICE
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTiammrhaywood
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxsocialsciencegdgrohi
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)eniolaolutunde
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsanshu789521
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupJonathanParaisoCruz
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxGaneshChakor2
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptxVS Mahajan Coaching Centre
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfUjwalaBharambe
 

Recently uploaded (20)

9953330565 Low Rate Call Girls In Rohini Delhi NCR
9953330565 Low Rate Call Girls In Rohini  Delhi NCR9953330565 Low Rate Call Girls In Rohini  Delhi NCR
9953330565 Low Rate Call Girls In Rohini Delhi NCR
 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
 
Pharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdfPharmacognosy Flower 3. Compositae 2023.pdf
Pharmacognosy Flower 3. Compositae 2023.pdf
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
 
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPTECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
ECONOMIC CONTEXT - LONG FORM TV DRAMA - PPT
 
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptxHistory Class XII Ch. 3 Kinship, Caste and Class (1).pptx
History Class XII Ch. 3 Kinship, Caste and Class (1).pptx
 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
 
Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)Software Engineering Methodologies (overview)
Software Engineering Methodologies (overview)
 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Presiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha electionsPresiding Officer Training module 2024 lok sabha elections
Presiding Officer Training module 2024 lok sabha elections
 
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
call girls in Kamla Market (DELHI) 🔝 >༒9953330565🔝 genuine Escort Service 🔝✔️✔️
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
MARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized GroupMARGINALIZATION (Different learners in Marginalized Group
MARGINALIZATION (Different learners in Marginalized Group
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
 
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions  for the students and aspirants of Chemistry12th.pptxOrganic Name Reactions  for the students and aspirants of Chemistry12th.pptx
Organic Name Reactions for the students and aspirants of Chemistry12th.pptx
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdfFraming an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
Framing an Appropriate Research Question 6b9b26d93da94caf993c038d9efcdedb.pdf
 

From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing Systems

  • 1. Politecnico di Milano From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing Systems an enabling technologies perspective and more… PEoPLe@DEIB CE4 @ 13 march 2018 Marco D. Santambrogio <marco.santambrogio@polimi.it> Politecnico di Milano
  • 6. Heterogeneous Complex Systems • Ryft ONE – Big Data infrastructure due to an FPGA-accellerated architecture – http://www.ryft.com/ • IBM Power8 – Introducing the Coherent Accelerator Processor Interface (CAPI) port that is layered on top of PCI Express 3.0 – http://www-304.ibm.com/webapp/set2/sas/f/capi/home.html • Microsoft Catapult – Stratix V (Arria 10 FPGA) – http://research.microsoft.com/en-us/projects/catapult/ • Amazon EC2 F1 Instances – Xilinx UltraScale Plus FPGA – https://aws.amazon.com/about-aws/whats-new/2017/04/amazon-ec2-f1- instances-customizable-fpgas-for-hardware-acceleration-are-now-generally- available/ • OpenPower Foundation – http://openpowerfoundation.org/ 6
  • 7.
  • 8. Reconfiguration in everyday life Soccer (Partial – Static) 8
  • 9. Reconfiguration in everyday life Soccer Football (Complete – Static) (Partial – Static) 9
  • 10. Reconfiguration in everyday life Soccer Football (Complete – Static) (Partial – Static) 10
  • 11. 11
  • 12.
  • 13. 13
  • 14. 14
  • 15. 20 March 2018 Marco D. Santambrogio/ Politecnico di Milano 15
  • 16.
  • 18. 18 Set of Available Functionalities FiArea/Time Legenda: A2/1 B 1/2 C2/2 D 1/1 E 1/1 F 2/2 RR3RR2RR1 FPGA Starting Scenario @ 2009
  • 20. People Demanding for Functionalities Set of Available Functionalities FiArea/Time Legenda: A2/1 B 1/2 C2/2 D 1/1 E 1/1 F 2/2 RR3RR2RR1 FPGA 20 RR3RR2RR1 A RR3RR2RR1 F RR3RR2RR1 D RR3RR2RR1 B RR3RR2RR1 C E RR3RR2RR1 RFU Implementations RC Implementations Starting Scenario @ 2009
  • 22. The Problem Time Area A B Rec. F F Rec. E E Rec. C C Rec. D D 22 A E D C B F 2/1 2/2 1/2 1/1 1/1 2/2 A possible scenario FiArea/Time Legenda: Time RR3RR2RR1 A RR3RR2RR1 F RR3RR2RR1 D RR3RR2RR1 B RR3RR2RR1 C E RR3RR2RR1 RFU Implementations RC Implementations
  • 23. A Possible Solution RR3RR2RR1 A RR3RR2RR1 F RR3RR2RR1 D RR3RR2RR1 B RR3RR2RR1 C E RR3RR2RR1 RFU Implementations RR3RR2RR1 A RR3RR2RR1 C RR3RR2RR1 B RR3RR2RR1 B RR3RR2RR1 D RR3RR2RR1 D E RR3RR2RR1 E RR3RR2RR1 RR3RR2RR1 F Time Area A B Rec. C C Rec. F F Rec. E E D Rec. D 23 A E D C B F 2/1 2/2 1/2 1/1 1/1 2/2 A possible scenario FiArea/Time Legenda: Time RC Implementations
  • 24. Relocation 24 A E D C B F 2/1 2/2 1/2 1/1 1/1 2/2 A possible scenario FiArea/Time Legenda: Time Time Area A B Rec. C C R2 F F R2 E E D R2 D RR3RR2RR1 A RR3RR2RR1 F RR3RR2RR1 D RR3RR2RR1 B RR3RR2RR1 C E RR3RR2RR1 RFU Implementations RC Implementations
  • 26. Relocation: Virtual homogeneity 26 Set of functionalities (1) (2) (3) 1/2 2/2 2/5 Legenda: func Area/ExeTime
  • 29. Starting Scenario @ 2009 Set of Available Functionalities FiArea/Time Legenda: A2/1 B 1/2 C2/2 D 1/1 E 1/1 F 2/2 RR3RR2RR1 FPGA 29 RR3RR2RR1 A RR3RR2RR1 F RR3RR2RR1 D RR3RR2RR1 B RR3RR2RR1 C E RR3RR2RR1 RFU Implementations RC Implementations People Demanding for Functionalities
  • 32. NECST Scenario @ 2017/2018 32 People Demanding for Functionalities
  • 33.
  • 34. 34
  • 36. 36
  • 40. 40
  • 41. The proposed CAOS framework ____ __ ____ ____ __ ___ ____ ____ Application (C, C++ OPENCL) WebUI CAOSFlowManager Frontend IR generation – profiling – templates applicability check – HW/SW partitioning Functions Optimization HW resource estimation – static code analysis – performance estimation – code optimization / DSE Backend Runtime generation – function synthesis – floorplanning – bitstream generation IR gen. profiling … HW est. DSE … Floorpl. Bit. gen. … … … <system> … </system> Profiling Datasets System Description 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 10 1 0 10 1 0 1 10 1 0 10 1 1 01 0 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 System runtime FPGAs bitstreams 41
  • 42. The proposed CAOS framework Architectural Templates ( ) OpenCL Streaming … ( ) … Computa8on Model Technology SST ____ __ ____ ____ __ ___ ____ ____ Application (C, C++ OPENCL) WebUI CAOSFlowManager Frontend IR generation – profiling – templates applicability check – HW/SW partitioning Functions Optimization HW resource estimation – static code analysis – performance estimation – code optimization / DSE Backend Runtime generation – function synthesis – floorplanning – bitstream generation IR gen. profiling … HW est. DSE … Floorpl. Bit. gen. … … … <system> … </system> Profiling Datasets System Description 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 10 1 0 10 1 0 1 10 1 0 10 1 1 01 0 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 System runtime FPGAs bitstreams 42
  • 43. CAOS Frontend ____ __ ____ ____ __ ___ ____ ____ Application (C, C++ OPENCL) WebUI CAOSFlowManager Frontend IR generation – profiling – templates applicability check – HW/SW partitioning Functions Optimization HW resource estimation – static code analysis – performance estimation – code optimization / DSE Backend Runtime generation – function synthesis – floorplanning – bitstream generation IR gen. profiling … HW est. DSE … Floorpl. Bit. gen. … … … <system> … </system> Profiling Datasets System Description 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 10 1 0 10 1 0 1 10 1 0 10 1 1 01 0 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 System runtime FPGAs bitstreams 43
  • 44. CAOS Functions Optimization ____ __ ____ ____ __ ___ ____ ____ Application (C, C++ OPENCL) WebUI CAOSFlowManager Frontend IR generation – profiling – templates applicability check – HW/SW partitioning Functions Optimization HW resource estimation – static code analysis – performance estimation – code optimization / DSE Backend Runtime generation – function synthesis – floorplanning – bitstream generation IR gen. profiling … HW est. DSE … Floorpl. Bit. gen. … … … <system> … </system> Profiling Datasets System Description 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 10 1 0 10 1 0 1 10 1 0 10 1 1 01 0 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 System runtime FPGAs bitstreams 44
  • 45. CAOS Backend ____ __ ____ ____ __ ___ ____ ____ Application (C, C++ OPENCL) WebUI CAOSFlowManager Frontend IR generation – profiling – templates applicability check – HW/SW partitioning Functions Optimization HW resource estimation – static code analysis – performance estimation – code optimization / DSE Backend Runtime generation – function synthesis – floorplanning – bitstream generation IR gen. profiling … HW est. DSE … Floorpl. Bit. gen. … … … <system> … </system> Profiling Datasets System Description 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 10 1 0 10 1 0 1 10 1 0 10 1 1 01 0 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 System runtime FPGAs bitstreams 45
  • 46. CAOS Backend ____ __ ____ ____ __ ___ ____ ____ Application (C, C++ OPENCL) WebUI CAOSFlowManager Frontend IR generation – profiling – templates applicability check – HW/SW partitioning Functions Optimization HW resource estimation – static code analysis – performance estimation – code optimization / DSE Backend Runtime generation – function synthesis – floorplanning – bitstream generation IR gen. profiling … HW est. DSE … Floorpl. Bit. gen. … … … <system> … </system> Profiling Datasets System Description 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 10 1 0 10 1 0 1 10 1 0 10 1 1 01 0 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 1 01 0 1 01 0 1 1 01 0 1 01 1 0 10 System runtime FPGAs bitstreams SST MaxCompiler Output Genera- on 46
  • 47. CAOS: OpenCL and SDAccel • CAOS Frontend supports OpenCL code: – Intermediate representation support – Template applicability check for SDA – Code profiling through LTPV (OpenCL profiler) – Function optimization: • Static code analysis and HW resource estimation within SDA – Backend support for SDAccell </> </> 47
  • 48. CAOS Backend for SDAccel SDAccel generates & provides: - XCLBIN containing the bitstream - OpenCL Runtime to manage kernel execution CAOS Integrates SDAccel: - Identifying I/O Variables - Generating a specific OpenCL Host code for the application 48
  • 49. Evaluations: Evaluations [1, 2] Streaming Stencil Time-step (SST) [3] Pearson Correlation Coefficient, Asian Option Pricing [5] Protein Folding [4] Smith Waterman and Vessels Segmentation 49 [1] [2] [3] [5] [4] [4] [*] intel Xeon E5 1410 32 GB RAM [*]
  • 51. Hints on the problem… 51 * [*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC, pages 13-25, 2012.
  • 52. Heuristic-Optimal Floorplanner 52 Reconfigurable regions + Resource requirements FPGA Heuristic solution MILP model MILP Solver (Gurobi, Cplex, GLPK, …) Improved heuristic solution User-defined linear Objective Function Geometrical constraints • Non overlapping guaranteed by the geometrical constraints
  • 53. Objective function • Cost function can be defined starting from the variables and parameters of the MILP model • Implemented metrics: – Global wirelength measured using HPWL ( ) – Regions perimeter ( ) – Wasted resources ( ) 53
  • 54. Hints on the problem… 54 * [*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC, pages 13-25, 2012.
  • 55. Hints on the problem… • Optimal solution in 29s • 34% wasted frames reduction – No DSP and CLB wasted by the Video Decoder RR – No BRAM wasted by the Signal Decoder RR • Approximately same wirelength 55 * [*] Vipin, K. and Fahmy, S. A.: Architecture-aware reconfiguration-centric floorplanning for partial reconfiguration. In ARC, pages 13-25, 2012.
  • 56. 20 March 2018 your name / affiliation here 56
  • 57.
  • 58. Runtime Reconfiguration Management • Reconfigurable architecture – Static Area: used to control the reconfiguration process – Reconfigurable Area: used to swap at runtime different cores – Reconfiguration- oriented communication infrastructure • Runtime reconfiguration managed via SW – Standalone, Operating System • Increased portability of user applications • Inherited multitasking capabilities • Simplified software development process • Bitstreams relocation technique to – speedup the overall system execution – achieve a core preemptive execution – assign at runtime the bitstreams placement – reduce the amount of memory used to store partial bitstreams 58
  • 59. Runtime Reconfiguration Management • Reconfigurable architecture – Static Area: used to control the reconfiguration process – Reconfigurable Area: used to swap at runtime different cores – Reconfiguration- oriented communication infrastructure • Runtime reconfiguration managed via SW – Standalone, Operating System • Increased portability of user applications • Inherited multitasking capabilities • Simplified software development process • Bitstreams relocation technique to – speedup the overall system execution – achieve a core preemptive execution – assign at runtime the bitstreams placement – reduce the amount of memory used to store partial bitstreams 59
  • 60. OS-based management of dynamic reconfiguration • Provide software support for dynamic partial reconfiguration on Systems-on-Chip running an operating system (i.e., LINUX). – OS customization for specific architectures – Rec. Functional Unit caching policies to improve the performance – Partial reconfiguration process management from the OS – Addition and removal of reconfigurable components – Automatic loading and unloading of specific drivers for the IP-Cores upon components configuration and/or deconfiguration • Hardware-independent interface for software developers based on the GNU/Linux • Easier programming interface for specific drivers 60
  • 61. IP-Core Devices Access • Interaction with configured IP-Cores implemented by means of the standard Linux device access – Open, Close, Read, write, ioctl operations /dev/device_1 /dev/device_2a /dev/device_2b Software Application 1 Software Application 2 device_1.o device_2.o /dev Device Drivers Userspace Devices IP-Core_1 IP-Core_2a IP-Core_2b FPGA Multiple instances of the same hardware module 61
  • 62. Reconfigurable Process Control Block • Reconfigurable process: an RFU object code in execution • Each reconfigurable process is represented in the system by a Reconfigurable Process Control Block (RPCB). • A RPCB contains all the information associated with a specific reconfigurable process – State: the state in which the reconfigurable process control is at the current time – Position: the placement position on the device – Object Code Accounting Information: • Object Code • Configuration Priority • Resources • Position Configured Addressing Space Assignment Ready Positioning Configuring Configuring Cached Computing Removed Waiting Executing Preempted 62
  • 63. The Centralized Manager • Userspace applications are not allowed to explicitly request a bitstream – They request high-level functionalities • Userspace requests are collected and served by a centralized manager (Linux Reconfiguration Manager) – The OS chooses the configuration code – A new reconfigurable process is created • Only the LRM can ask for a bitstream to be configured on the FPGA – Centralized knowledge of the device status – Area management and module caching 63 63
  • 70. 70 Where to go necst?
  • 71. Trying to raise the bar • Towards the design and implementaion of Self- adaptive and autonomic systems 71
  • 72. Trying to raise the bar • Towards the design and implementaion of Self- adaptive and autonomic systems • We need to include 2 more features – Goal-oriented • Tell the system what you want • System’s job to figure out how to get there – Approximate • Does not expend any more effort than necessary to meet goals 72
  • 73. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 73
  • 74. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 74
  • 75. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 75
  • 76. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 76
  • 77. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 77
  • 78. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 78
  • 79. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 79
  • 80. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 80
  • 81. Data DataEnvironment Environment Output - Environment Update Output - Environment Update Online Static Solution Adaptive Solution 81
  • 83. Nice idea, but how to use it! 83 Umh…
  • 84. Autonomic Operating System • The AcOS project aims at – designing and prototyping a patch for commodity operating systems (e.g. Linux, FreeBSD) – being capable to observe its own execution and optimize, in a self-aware manner, its behavior with respect to the external environment, to user needs and to applications demands [integrated/used in different research projects] 84
  • 85. The research effort • K42 – http://researcher.watson.ibm.com/researcher/view_ project.php?id=2078 • The SElf-awarE Computing (SEEC) Model – http://groups.csail.mit.edu/carbon/seec/ • Angstrom – http://projects.csail.mit.edu/angstrom/ • The Swarm project and Tessellation OS – http://tessellation.cs.berkeley.edu 85
  • 89. 89
  • 91. AcOS: via an intelligent ODA loop 91 Monitoring Infrastructures Adaptation Policies Applications System Goals Measurements Decide Act Observe
  • 92. The Heart Rate Monitor • Heartbeats signal[1] either progresses or availability – video encoder: 1 heartbeat = 1 frame – web server: 1 heartbeat = 1 request – database server: 1 heartbeat = transaction • Heart rate as a performance measure and goal – High-level, application-specific performance measurements and goals (e.g., video encoder: 30 heartbeats/s = 30 frames/s) • Compact API, user/kernel-space partitioned implementation[2] – User-mode fast-path heartbeats issue – User and kernel-mode low-latency heart rate access 92 [1] Hoffmann et al., Application Heartbeats for Software Performance and Health, Proc. of the 15th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 2010 [2] Sironi et al., Metronome: Operating System Level Performance Management via Self-Adaptive Computing, Proc. of the 49th Annual Design Automation Conference 2012
  • 93. The Heart Rate Monitor 93
  • 94. The Heart Rate Monitor • Set performance goal 94
  • 95. The Heart Rate Monitor • Set performance goal 95 e.g.: min: 25hb/sec max: 35hb/sec
  • 96. The Heart Rate Monitor • Set performance goal • Run the app and update progress 96 Issue an heartbeat e.g.: min: 25hb/sec max: 35hb/sec
  • 97. The Heart Rate Monitor • Set performance goal • Run the app and update progress 97 Issue an heartbeat Statistics automatically updated e.g.: min: 25hb/sec max: 35hb/sec
  • 98. The Heart Rate Monitor • Set performance goal • Run the app and update progress • Check heart rate and, if necessary, react 98 Issue an heartbeat Statistics automatically updated e.g.: min: 25hb/sec max: 35hb/sec
  • 99. A first evaluation • Hardware • Quad-core: Intel Core i7-870 @ 2.97 GHz with 8 MB of LLC • Hyper-Threading, Turbo Boost, etc. disabled • Memory: 4 GB of DDR3-1333 • Software • Debian GNU/Linux 2.6.35 • PARSEC Benchmark Suite 2.1 99
  • 100. x86 without any control 100
  • 102. Two controlled x86, different goals 102 Consensus object core allocator x264 x264
  • 103. Autonomic Scheduling • In a scenario were applications – are competing for the same set of resources – require predictable performance, expressed through high-level, application-specific metrics • The scheduler has to become Performance- Aware to automatically allocate resources to match performance goals – With Metronome, we introduce performance- awareness by means of a non-invasive modifications to the Completely Fair Scheduler 103
  • 106. Metronome Sironi et al., Metronome: Operating System Level Performance Management via Self-Adaptive Computing, Proc. of the 49th Annual Design Automation Conference 2012 106
  • 107. From Metronome to Metronome++ • Metronome demonstrates how a simple heuristic can be enough to enable goal- oriented resource allocation by exploiting runtime performance feedback • Metronome++ has been designed with more advanced adaptation policy to dynamically allocate CPUs to SLO-bound – E.g. Through tasks migration among run queues 107 (1) Linux kernel vanilla (2) Linux kernel enhanced with Metronome++
  • 108. Trying to raise the bar (again) • AcOS took into consideration performance… • Any other HOT topic? 108
  • 109. Trying to raise the bar (again) • AcOS took into consideration performance… • Any other HOT topic? – What about temperature Control/Management! 109
  • 110. 110 swaptions @ 2.80 GHz ab @ 2.80 GHz temperatureincrease(°C) 0 10 20 time (s) 0 100 200 300 400 500 600 Temperature Control/Management starting scenario
  • 111. 111 swaptions @ 2.80 GHz ab @ 2.80 GHz temperatureincrease(°C) 0 10 20 time (s) 0 100 200 300 400 500 600 Temperature Control/Management set a temperature cap
  • 113. 113 swaptions @ 2.80 GHz ab @ 2.80 GHz temperatureincrease(°C) 0 10 20 time (s) 0 100 200 300 400 500 600 Temperature Control/Management back to the starting scenario
  • 114. 114 swaptions @ 2.80 GHz ab @ 2.80 GHz temperatureincrease(°C) 0 10 20 time (s) 0 100 200 300 400 500 600 Temperature Control/Management set the same temperature cap
  • 115. Temperature Control/Management The AcOS refreshement: ThermOS 115
  • 116. 116 Where to go necst?
  • 117. A vision • To discover/understand the future, sometimes you have to look back at the past… 117
  • 118. Heterogeneous System Architecture 118 More information available at: http://hsafoundation.com/
  • 119. A global property • Being able to adapt is not a specific domain property – Operating system, computer architecture, etc… – HPC – Exascale computing systems – Embedded systems – IoT – … 119
  • 120. A global property • Being able to adapt is not a specific domain property – Operating system, computer architecture, etc… – HPC – Exascale computing systems – Embedded systems – IoT embedded/mobile devices and HPC/distributed/exascale computing 120
  • 121. A self-adaptive, context-aware operating system to enhance users experience via smart devices and environments The CHANGE project Enabling Technologies For Self-Aware Adaptive Computing Systems 121
  • 122. The mobile context • What about – Taking into consideration users’ habits and needs – Considering very constrained and fluctuating scenarios 122
  • 123. The mobile context • What about – Taking into consideration users’ habits and needs – Considering very constrained and fluctuating scenarios • Going Mobile! – Constrained in terms of available resources • Multi-core, reduced CPU performance, few memory • Battery constrained – Internal and external conditions are fluctuating • Resources availability varies over time • External conditions are unpredictable 123
  • 124. Mobile context: some examples • Tegra 2 – Dual-Core ARM Cortex-A9 – ULP GeForce, 8 cores • A6 – Dual-Core based on ARMv7 – Triple-core PowerVR SGX 543MP3 GPU • Tegra 3 – Quad-Core – ULP GeForce, 12 cores 124 Motorola Photon 4G HTC One X
  • 125. The morphone approach • To react to different situations all the information from different components should not be disregarded • Our approach uses all the information that it can retrieve to create a model of the context – Environment – Self – User behavior 125
  • 126. The Mpower example 126 Public Available for Free SEPTEMBER 2013 mpower.necst.it https://www.facebook.com/morphone.os Gain back your Android battery life! Configuration A Configuration B
  • 127. The concept of configuration • Domain-specific feature: hardware modules currently used • We defined the concept of configuration: • “Given the controllable hardware modules on a device, a configuration is a combination of their internal state” • We compute a single ARX model for each configuration Configuration A Configuration B Configuration C 127
  • 129. Modeling approach: “divide et impera” 129 We experienced a piecewise linear behavior and tried to attribute this to domain-specific features Modelling Approach 129
  • 130. Modeling approach: “divide et impera” 130 We experienced a piecewise linear behavior and tried to attribute this to domain-specific features Configuration A Configuration B Configuration C Modelling Approach 130
  • 131. Modeling approach: “divide et impera” 131 We experienced a piecewise linear behavior and tried to attribute this to domain-specific features Configuration A Configuration B Configuration C = actions on controllable variables Modelling Approach 131
  • 132. Modeling approach: “divide et impera” 132 We experienced a piecewise linear behavior and tried to attribute this to domain-specific features Configuration A Configuration B Configuration C = actions on controllable variables Exogenous input (uncontrollable) Modelling Approach 132
  • 133. Actions on controllable variables • They are determined by the user’s behavior • We model the evolution of the smartphone’s configuration as a Markov Decision Process • A state for every configuration • Transitions’ weights represent the probability to go from a configuration to another Configuration A Configuration B Configuration C 133
  • 134. Move Decision Phase on Device 134 • To perform the computation needed for the decision phase on the device – We need low energy computing systems • The battery is always the bottleneck – We need fast computing system • The system has to be responsive – We need reconfigurable computing systems • The analysis to be performed on the input data changes with respect to
  • 141. 141EXTRA Consortium Proprietary (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster 159 (b) Working NERA Cluster gure 25: NERA Cluster 159 NQ-Z1 Development (b) Working NERA Cluster Figure 25: NERA Cluster (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster 159 (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster 159 (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster 159 (a) PYNQ-Z1 Development board (b) Working NERA Cluster Figure 25: NERA Cluster
  • 144. 144
  • 145. Politecnico di Milano From FPGA-based Reconfigurable Systems to Autonomic Heterogeneous Computing Systems an enabling technologies perspective and more… International Center for Theoretical Physics Trieste @ 24 November 2017 Marco D. Santambrogio <marco.santambrogio@polimi.it> Politecnico di Milano

Editor's Notes

  1. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  2. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  3. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  4. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  5. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  6. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  7. The framework has been designed in order to ensure usability, so that users with low-expertise on FPGA are able to use it, interactivity during the design phases to guide the user through the optimization process, modularity, so that users can upload their custom modules.
  8. [AGGIUNGI QUI NOTE SULLA PROPRIETA’ DI MARKOVIANITA’]