Emerging multi/many-core architectures, targeting both High Performance Computing (HPC) and mobile devices, increase the interest for self-adaptive systems, where both applications and computational resources could smoothly adapt to the changing of the working conditions. In these scenarios, an efficient Run-Time Resource Manager (RTRM) framework can provide a valuable support to identify the optimal tradeoff between the Quality-of-Service (QoS) requirements of the applications and the time varying resources availability.
This paper introduces a new approach to the development of a system-wide RTRM featuring: a) a hierarchical and distributed control, b) the exploitation of design-time information, c) a rich multi-objective optimization strategy and d) a portable and modular design based on a set of tunable policies. The framework is already available as an Open Source project, targeting a NUMA architecture and a new generation multi/many-core research platform. First tests show benefits for the execution of parallel applications, the scalability of the proposed multiobjective resources partitioning strategy, and the sustainability of the overheads introduced by the framework.
TeamStation AI System Report LATAM IT Salaries 2024
A RTRM Proposal for Multi/Many-Core Platforms and Reconfigurable Applications
1. A RTRM Proposal for Multi/Many-Core Platforms
and Reconfigurable Applications
P. Bellasi, G. Massari and W. Fornaciari
{bellasi, massari, fornacia}@elet.polimi.it
ReCoSoC 2012
Dipartimento di Elettronica e Informazione
Politecnico di Milano
Last revision Jul, 2 2012
2. Introduction
Why Run-Time Resource Management?
Run-Time Resources Management (RTRM) is about
finding the optimal tradeoff between
QoS requirements and resources availability
Target scenario
Shared HW resources
upcoming many-core devices are complex systems
process variations and run-time issues
Mixed SW workloads
resources sharing and competition
among applications with different and time-varying requirements
Simple solutions are required
support for frequently changing use-cases
suitable for both critical and best-effort applications
The BarbequeRTRM Framework 2
3. Introduction
Paper Contribution
Methodology to support system-wide run-time
resource management
exploiting design-time information
hierarchical and distributed control
http://www.2parma.eu
BarbequeRTRM Framework
multi-objective optimization strategy
easily portable and modular design
run-time tunable and scalable policies
open source project http://bosp.dei.polimi.it
The BarbequeRTRM Framework 3
4. Introduction
How we compare?
Desirable Properties
De
sig
Co
Clu
ntr
n-T
Ho
ste
Mu
ol-
He
im
Mu
mo
red
Re
Th
ter
eE
lt.
lti-
g.
Resources
co
eo
.P
Re
Re
xp
Po
Ob
Pla
nf.
ry
lat
so
Managers
so
loi
r
jec
tab
/ Ad
tfo
Mo
for
urc
urc
tat
Proposals
ti v
rm
ilit
ap
ion
ms
de
es
es
e
s
y
t
l
StarPU
Binotto et al.
Fu et al.
ACTORS
SEEC
DistRM
BarbequeRTRM
The BarbequeRTRM Framework 4
5. The BarbequeRTRM
System-Wide RTRM: Overall Framework
user-space Application-Specific RTM
Critical Apps Best-Effort Apps Fine grained control on application
a b allocated resources:
A B - task ordering
- virtual processor assignment
RTLib RTLib - DVFS
Dynamic Code - application parameters monitoring
Generation C C
c System-Wide RTRM
Res Accounting Res Partitioning Coarse grained control on platform
G available resources:
Res Abstraction - resource accounting, partitioning
and abstraction
- high-level HW events handling
MRAPI Platform Proxy e.g., critical conditions, faults...
- manage applications priorities
D E - power/thermal “coarse tuning”
kernel
Platform DRV
Platform DRV d
Platform Driver
BarbequeRTRM
supported platforms F
f
Task Mapping I Platform Firmware
Legend
DDM H X SW Interface (API)
e
Y SW/HW Meta-data
The BarbequeRTRM Framework 5
6. The BarbequeRTRM
System-Wide RTRM: Presentation Outline
1
user-space Application-Specific RTM
Critical Apps Best-Effort Apps Fine grained control on application
a b allocated resources:
A B - task ordering
- virtual processor assignment
RTLib RTLib - DVFS
Dynamic Code - application parameters monitoring
Generation C C
c 2 System-Wide RTRM
Res Accounting Res Partitioning Coarse grained control on platform
G available resources:
Res Abstraction - resource accounting, partitioning
and abstraction
- high-level HW events handling
MRAPI Platform Proxy e.g., critical conditions, faults...
3 - manage applications priorities
D E - power/thermal “coarse tuning”
kernel
Platform DRV
Platform DRV d
Platform Driver
BarbequeRTRM
supported platforms F
f
Task Mapping I Platform Firmware
Legend
DDM H X SW Interface (API)
e
Y SW/HW Meta-data
The BarbequeRTRM Framework 6
7. The BarbequeRTRM
System-Wide RTRM: Distributed Hierarchical Control
1
user-space Application-Specific RTM
Critical Apps Best-Effort Apps Fine grained control on application
a b allocated resources:
A B - task ordering
- virtual processor assignment
RTLib RTLib - DVFS
Dynamic Code - application parameters monitoring
Generation C C
c System-Wide RTRM
Res Accounting Res Partitioning Coarse grained control on platform
G available resources:
Res Abstraction - resource accounting, partitioning
and abstraction
- high-level HW events handling
MRAPI Platform Proxy e.g., critical conditions, faults...
- manage applications priorities
D E - power/thermal “coarse tuning”
kernel
Platform DRV
Platform DRV d
Platform Driver
BarbequeRTRM
supported platforms F
f
Task Mapping I Platform Firmware
Legend
DDM H X SW Interface (API)
e
Y SW/HW Meta-data
The BarbequeRTRM Framework 7
8. The Proposed Control Solution
Distributed Hierarchical Control
Different subsystems have their own control loop (CL)
System-wide level (resources partitioning, system-wide optimization, ...)
Application specific (application tuning, dynamic memory management, ...)
Firmware/OS level (F/V control, thermal alarms, resource availability, ...)
FF closed CL
using OP and AWM
Optimal
user defined goal functions
including overheads
Robust
Adaptive
The BarbequeRTRM Framework 8
9. The BarbequeRTRM
System-Wide RTRM: Resource Partitioning Strategy
user-space Application-Specific RTRM
Critical Apps Best-Effort Apps Fine grained control on application
a b allocated resources:
A B - task ordering
- virtual processor assignment
RTLib RTLib - DVFS
Dynamic Code - application parameters monitoring
Generation C C
c 2 System-Wide RTRM
Res Accounting Res Partitioning Coarse grained control on platform
G available resources:
Res Abstraction - resource accounting, partitioning
and abstraction
- high-level HW events handling
MRAPI Platform Proxy e.g., critical conditions, faults...
- manage applications priorities
D E - power/thermal “coarse tuning”
kernel
Platform DRV
Platform DRV d
Platform Driver
BarbequeRTRM
supported platforms F
f
Task Mapping I Platform Firmware
Legend
DDM H X SW Interface (API)
e
Y SW/HW Meta-data
The BarbequeRTRM Framework 9
10. Scheduling Policy
YaMS - A modular multi-objective scheduler
Introduction of a new modular policy (YaMS)
partition available resources (R) on applications (A)
considering A priorities and R “residual” availabilities
multi-objective optimization
support a set of tunable goals
DONE: performances, overheads,
congestion, fairness
WIP: stability, robustness,
thermal and power
increase overall system value
considering discrete and tunable
improvements
LP theory, MMKP heuristic
promote scheduling of some AWMs
which improve optimization goals
demote scheduling of others AWMs
which degrade solution metrics
e.g. stability and robustness
The BarbequeRTRM Framework 10
12. Scheduling Policy
System-Wide Controller – Overall View
BBQ Validation Policy
- enforce certain control properties
energy budget, stability and robustness
- authorize resources synchronization
The BarbequeRTRM Framework 12
13. Scheduling Policy
System-Wide Controller – Inner-Loop “Scheduling”
BBQ Validation Policy
- enforce certain control properties
energy budget, stability and robustness
- authorize resources synchronization
The BarbequeRTRM Framework 13
14. Scheduling Policy
System-Wide Controller – Inner-Loop Overheads
Apps with 3 AWM, 3 Clusters => 9 configuration per application
BBQ running on NSJ, 4 CPUs @ 2.5GHz (max)
+
+
+
The BarbequeRTRM Framework 14
15. The BarbequeRTRM
System-Wide RTRM: Platform Integration
user-space Application-Specific RTRM
Critical Apps Best-Effort Apps Fine grained control on application
a b allocated resources:
A B - task ordering
- virtual processor assignment
RTLib RTLib - DVFS
Dynamic Code - application parameters monitoring
Generation C C
c System-Wide RTRM
Res Accounting Res Partitioning Coarse grained control on platform
G available resources:
Res Abstraction - resource accounting, partitioning
and abstraction
- high-level HW events handling
MRAPI Platform Proxy e.g., critical conditions, faults...
3 - manage applications priorities
D E - power/thermal “coarse tuning”
kernel
Platform DRV
Platform DRV d
Platform Driver
BarbequeRTRM
supported platforms F
f
Task Mapping I Platform Firmware
Legend
DDM H X SW Interface (API)
e
Y SW/HW Meta-data
The BarbequeRTRM Framework 15
16. The BarbequeRTRM
Platform Integration Layer
Support for generic Linux SMP/NUMA machines
portable solution, based on Linux Control Group
CPUs and Memory nodes assignments
Memory amount assignment
CPU bandwidth quota assignment (requires kernel 3.2)
Support both resources monitoring and control
pre-configured CGroup to define BBQ controlled resources
at Barbeque start, by parsing a pre-configured cgroups
… than Barbeque takes control over these resources
tun-time generation of new CGroup to control applications
requires RTLib integration (of course)
Working on P2012 integration
new genration many-core platform from STMicroelectronics
first version: 4 cluster with 16 Processing Element each one
The BarbequeRTRM Framework 16
17. Synchronization Policy
System-Wide Controller – Outer-Loop “Synchronization”
BBQ Validation Policy
- enforce certain control properties
energy budget, stability and robustness
- authorize resources synchronization
The BarbequeRTRM Framework 17
18. Synchronization Policy
System-Wide Controller – Outer-Loop Overheads
min AWM 25% CPU Time, 3 Clusters x 4CPUs => max 48 syncs
BBQ running on NSJ, 4 CPUs @ 2.5GHz (max)
+
Linux kernel 3.2
Creation overheads: ~500ms
Update overheads: ~100ms
+
(1/3 on quadcore i7)
+
+
+
Application dependent
CGroups
PIL
The BarbequeRTRM Framework 18
19. The BarbequeRTRM Framework
Power Optimizations
Initial experiments on congested workloads
increasing running instances of Bodytrack (PARSEC)
3AWM: [1,2,4] Threads
system-wide power measurements
via the standard IPMI interface
Power Gains
2,3-3,7%
Time Gains
338-625%
X86_64 NUMA machine: 3 Clusters x 4CPUs
BBQ running on NSJ, 4 CPUs @ 800MHz
The BarbequeRTRM Framework 19
20. The BarbequeRTRM Framework
Conclusions
Framework for System-Wide RTRM
flexibility and scalability of the RTRM strategy
thanks to its hierarchical and distributed control structure
acceptable overheads for real usage scenarios
including those with variable workload
tunable multi-objective optimization policies
to cope with several design constraints and goals
e.g., performance, power, thermal and reliability, ...
promising results in terms of performance improving
and power consumption reduction
for a highly parallel workload, on a NUMA multi-core architecture
availability of a simple API interface
making straightforward for the programmers to take full advantages
from framework services
The BarbequeRTRM Framework 20
21. The BarbequeRTRM Framework
Future Works
Optimization policies
robustness and stability assessment
improving power/energy optimizations
thermal and reliability management
Platform integration
extended support for Android targets
complete the integration with the P2012 many-core platform
The BarbequeRTRM Framework 21
22. Thanks for your attention!
If you are interested, please check
the project website for further information
and keep update with the developments
http://bosp.dei.polimi.it