D1.2 Analysis and selection of low power
techniques, services and patterns
V1.0
Document information
Contract number 687902
Project website www.safepower-project.eu
Contractual deadline 01/07/2016
Dissemination Level PU
Nature R
Author OFF
Contributors IKL, USI, FEN, IMP, KTH
Reviewer
IAB members:
• Gebhard Bouwer (TÜV Rheinland)
• Christophe Honvault (ESA)
• Joaquín Autrán (GMV)
• Daniel Gracia (Thales)
• Giulio Corradi (Xilinx)
Internal reviewer:
• SAA
Keywords low power techniques, system-level services
Notices:
This project and the research leading to these results have received funding from the European Community's
H2020 programme [H2020-ICT-2015] under grant agreement 687902.
© 2016 SAFEPOWER Consortium Partners. All rights reserved.
Change log
VERSION DESCRIPTION OF CHANGE
V0.1 First draft OFF (Maher Fakih)
V0.2 Planned contributions by IKL, USI, OFF, FEN, KTH
V0.5 Contribution by IKL, USI, OFF, FEN, KTH
V0.6 Consolidated and Reviewed by OFF (Maher Fakih)
V1.0 Consolidation of IAB review comments; SAAB review by OFF
Table of contents
1. EXECUTIVE SUMMARY
2. STATE OF THE ART OF LOW-POWER TECHNOLOGIES
2.1 OPERATING SYSTEM AND FIRMWARE LOW-POWER SUPPORT
2.2 HARDWARE LOW-POWER SUPPORT
2.3 POWER MEASUREMENT AND MONITORING SUPPORT
3. IMPACT OF THE TECHNIQUES ON SAFETY (ANALYSIS)
3.1 SAFETY, LOW-POWER AND MIXED CRITICALITY
3.2 STATE OF THE ART IN SECURITY, MIXED-CRITICALITY AND LOW-POWER
4. ZYNQ PLATFORM POWER MANAGEMENT CAPABILITIES
4.1 OVERVIEW OF LOW-POWER FEATURES
4.2 POWER RAILS/DOMAINS
4.3 CLOCK CONTROL
4.4 PROCESSOR SYSTEM (PS): POWER MANAGEMENT
4.5 PROGRAMMABLE LOGIC (PL): POWER MANAGEMENT
4.6 MONITORING
4.7 SUMMARY OF LOW POWER FEATURES
4.8 OUTLOOK TO ADVANCED FEATURES
5. LOW-POWER SERVICES
5.1 LOW POWER SERVICES OF THE HYPERVISOR/OS
5.2 GENERATION OF LOW-POWER NOCS WITH NOC MPSOC SYSTEM GENERATORS
5.3 UTILIZATION OF LOW-POWER SERVICES AT HIGHER ABSTRACTION LEVELS
5.4 DEFINITION OF THE LOW POWER SERVICES OF THE GENERIC PLATFORM W.R.T. THE ON-CHIP COMMUNICATION
6. LITERATURE
1. EXECUTIVE SUMMARY
In this deliverable, different aspects of the state of the art of low-power technologies are
analysed and weighed against mixed-criticality needs and requirements, in order to select
the technologies that are feasible for the SAFEPOWER needs. In particular, all candidate
technologies and their combinations had to be investigated from a safety-standard
perspective. The following technologies and techniques have been taken into account:
• Hardware- and software-level support and techniques for power management (e.g.,
Dynamic Voltage and Frequency Scaling (DVFS), power gating, clock gating),
• Architectural services (e.g., fault tolerance, communication, diagnostics) taking
hardware-level power into account.
2. STATE OF THE ART OF LOW-POWER TECHNOLOGIES
This section presents a short overview of existing approaches covering low-power
management technologies at different service levels, e.g. at the level supported by
operating systems, by firmware and by hardware. It then elaborates on power
monitoring at each of these levels.
2.1 Operating system and firmware low-power support
The operating system and the hypervisor are the underlying software that manage and
control the hardware devices on behalf of the applications and execute the application
activities. Although the literature contains hardly any references to hypervisors, due to
their recent appearance, some of the traditional techniques used by operating systems can
be moved from the OS layer to the hypervisor layer in a partitioned system. Other
techniques, such as device management, can still be considered at OS level, because the
hypervisor allows the OS in a partition to manage some devices directly.
The next subsections review some of the main techniques involving the OS:
2.1.1 Memory management
From the operating system point of view, memory management can affect energy
consumption in two main ways: the allocation of applications and the management of
memory types.
A group of works has analysed and proposed solutions to reduce the memory footprint of
applications. The work in [89] focuses on the amount of memory used and on saving
memory by compressing memory pages. It requires support from the OS's virtual memory
management unit (MMU) to store and load compressed pages. The authors report a
significant reduction in memory size and, consequently, in energy. In [90], hardware
mechanisms for compressing data between the cache and RAM are proposed, and other
papers have proposed further schemes for this purpose. The compression/decompression
process requires extra time; however, this approach is intended to be performed in
hardware and only at specific points of the execution.
Dynamic memory allocation, or dynamic storage allocation (DSA), has long been a relevant
part of OSs for allocating memory to applications. Applications either request memory
blocks from the OS or release already allocated blocks. The allocator algorithm is crucial,
and two main problems arise: the temporal cost of allocation and the space usage. [91]
presents a survey of these techniques. In [92] the TLSF allocator is proposed, which
performs allocation and deallocation in constant time and achieves bounded fragmentation
in the system.
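The constant-time behaviour of allocators such as TLSF rests on segregated free lists indexed by bit operations rather than list searches. The following sketch is a simplified, single-level variant (the real TLSF uses a two-level index and real block headers); it only illustrates how a fitting size class can be located in O(1) with a bitmap:

```python
# Simplified single-level segregated-list allocator in the spirit of TLSF.
# Free blocks are grouped into power-of-two size classes; a bitmap records
# which classes are non-empty, so finding a fitting block needs no search.

class SegregatedFreeLists:
    def __init__(self, num_classes=32):
        self.free_lists = [[] for _ in range(num_classes)]
        self.bitmap = 0  # bit i set <=> class i has at least one free block

    @staticmethod
    def size_class(block_size):
        # class index = position of the most significant bit:
        # class i holds blocks of size in [2**i, 2**(i+1))
        return block_size.bit_length() - 1

    def insert(self, block_size):
        i = self.size_class(block_size)
        self.free_lists[i].append(block_size)
        self.bitmap |= 1 << i

    def allocate(self, size):
        # Classes whose *smallest* block is >= size have index >= ceil(log2(size)).
        # Masking the bitmap and isolating the lowest set bit are both O(1)
        # operations, independent of the number of free blocks -- the key to
        # TLSF's bounded response time.
        wanted = (size - 1).bit_length()
        candidates = self.bitmap & ~((1 << wanted) - 1)
        if candidates == 0:
            return None  # no fitting block in this sketch
        i = (candidates & -candidates).bit_length() - 1  # lowest non-empty class
        block = self.free_lists[i].pop()
        if not self.free_lists[i]:
            self.bitmap &= ~(1 << i)
        return block
```

Real allocators additionally split oversized blocks and coalesce freed neighbours; both operations also run in constant time in TLSF.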
2.1.1.1 Cache and scratchpad memories
Scratchpad memory has been used as a partial or entire replacement for cache memory due
to its better energy efficiency and predictability.
Scratchpad memory (SPM) is intended to avoid the main drawbacks of caches. SPMs consist
of small, fast memory areas (typically SRAM), much like caches, but they are directly and
explicitly managed at the software level, either by the developer or by the compiler. Hence, no
dedicated circuit is required for SPM management. This also yields deterministic
behaviour, which typical cache implementations do not provide. Deterministic behaviour
is a major benefit for safety-related applications.
In [93], a comparison of several SPMs and their advantages is presented. One of these
advantages is a significant reduction in energy (up to 40% less energy than caches). In
[94] a survey of techniques for SPM management is given.
2.1.2 Basic device management (hypervisor)
2.1.2.1 Processor management
Processor management at OS level can influence energy consumption by:
• Setting the processor to a low-power state, or switching it off, during intervals
of no activity.
• Scaling the CPU voltage.
• Scaling the CPU frequency.
• Scaling both CPU voltage and frequency.
These techniques can reduce energy consumption, but the main problem in real-time
systems is meeting task deadlines. When the voltage or frequency of the CPU decreases,
the CPU speed decreases and the time required to complete a task increases. Deciding
which conditions must be satisfied for a task, or a set of real-time tasks, to meet its
temporal constraints is an NP-hard problem [95]. In recent years, techniques for deciding
an on-line or off-line schedule that guarantees the task deadlines while minimizing CPU
energy consumption have proliferated in journals and conference papers. A review of
them can be found in [[96], [97], [98], [99], [100], [101], [102]].
2.1.2.2 Scheduling techniques for low power consumption
Dynamic voltage and frequency scaling (DVFS) is a technique that modifies the voltage
and/or frequency of the CPU based on performance and power requirements. Several
commercial processors support this technology for saving power. The limitation of DVFS is
that it increases the execution time of the tasks and can lead to missed deadlines. The
appropriate selection of the scaling factor is therefore fundamental to guarantee deadlines
while reducing energy consumption.
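A classical static result for periodic tasks under EDF illustrates such a selection (this is a textbook illustration, not taken from any of the works cited above): the lowest constant normalized frequency that keeps the task set schedulable equals the task-set utilization, since EDF remains feasible as long as the effective utilization does not exceed one:

```python
# Static DVFS sketch for periodic tasks under EDF: the lowest constant
# normalized frequency that keeps the task set schedulable equals the
# task-set utilization (WCETs are assumed measured at maximum frequency).

def min_edf_frequency(tasks):
    """tasks: list of (wcet_at_fmax, period) pairs. Returns the normalized
    frequency in (0, 1], or None if the set is infeasible even at full speed."""
    utilization = sum(c / t for c, t in tasks)
    if utilization > 1.0:
        return None
    return utilization

# Example: three tasks using 60% of the CPU at full speed can run at 0.6 * fmax.
tasks = [(1, 5), (2, 10), (2, 10)]
print(min_edf_frequency(tasks))  # 0.6
```

In practice the computed factor is rounded up to the nearest frequency step the hardware actually supports.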
Several techniques have been proposed to save energy while guaranteeing task deadlines.
Some of them compute the available slack and adjust the frequency and/or voltage to
consume it. In general, high-priority tasks are executed at higher frequencies to generate
more slack, while lower-priority tasks are adjusted to run at reduced frequency.
Another technique uses non-linear optimization to find the optimal frequency for every task.
This technique, however, has a large complexity and hence is only suitable for off-line use.
On the other hand, dynamic power management (DPM) [98] takes advantage of low-power
states (like sleep or stand-by) whenever the processor is idle. DPM is a mechanism that
dynamically reconfigures a system to provide the requested services and tasks at the same
performance level, but with a minimum number of active components or a minimum load.
DPM must consider the transition time between different power consumption modes.
Switching from the active mode to the sleep mode and then back to the active mode has a
penalty in time and energy. Its impact therefore has to be checked from the point of view
of both scheduling and energy consumption.
The operating system is in charge of the policy for DVFS or DPM decisions, which can be
defined off-line or on-line. Off-line decisions are defined prior to system execution, and
the OS is provided with the information related to the execution conditions of each task.
On-line decisions are taken by the OS according to the execution results. As far as we
know, there are no works involving hypervisors in these approaches.
When a hypervisor deals with partitions that encapsulate an OS (guestOS) and its internal
application tasks, the schedule is hierarchical. Two scheduling levels coexist:
• Hypervisor schedule: a static schedule of partitions, where the temporal windows
for partitions are decided off-line. In multicore systems, the schedule also defines on
which core the partitions (or the temporal windows of a partition) are executed.
• guestOS schedule: within the temporal intervals in which the partition is scheduled by
the hypervisor, the guestOS schedules the internal tasks according to their priority or
deadline.
Under this view, a separation of concerns can be defined for the two scheduling levels:
• Hypervisor level: off-line scheduling using non-linear optimization techniques. The
off-line schedule can decide the allocation of partitions to cores, the allocation of
temporal intervals, and a range of the voltage and/or frequency scale for each temporal
interval. Additionally, it can apply DPM when the core is idle or when a partition
finishes its activity before the allocated time.
• guestOS level: on-line scheduling using execution-time information and the allowed
voltage and/or frequency range to adjust the energy consumption.
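A minimal model of this separation of concerns might look as follows (the structure, names and numbers are purely illustrative, not a specific hypervisor's configuration format): the hypervisor holds a static table of partition windows, each annotated with an allowed frequency range, and the guestOS picks the concrete operating point within that range:

```python
# Two-level scheduling sketch: static hypervisor windows with per-window
# frequency ranges; the guestOS chooses an operating point inside the range.

HYPERVISOR_PLAN = [
    # (start_ms, end_ms, partition, core, (f_min, f_max) normalized frequency)
    (0,   20, "P1", 0, (0.5, 1.0)),
    (20,  50, "P2", 0, (0.3, 0.6)),
    (50, 100, "P1", 1, (1.0, 1.0)),  # safety-critical window: full speed only
]

def window_at(t_ms):
    """Hypervisor-level lookup: which partition owns the CPU at time t_ms."""
    for start, end, partition, core, f_range in HYPERVISOR_PLAN:
        if start <= t_ms < end:
            return partition, core, f_range
    return None

def guest_pick_frequency(f_range, desired):
    """guestOS-level decision, clamped to what the hypervisor allows."""
    f_min, f_max = f_range
    return min(max(desired, f_min), f_max)

partition, core, f_range = window_at(30)
print(partition, guest_pick_frequency(f_range, desired=0.2))  # P2 0.3
```

The clamping step is the enforcement point: the guestOS may optimize freely, but never outside the envelope the off-line analysis certified.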
2.1.2.3 Clock management
The RTOS offers a regular clock service to the applications. In addition, the RTOS has to
program a timer device to be notified of the passage of time. The clock management at
RTOS level maintains the time value and increments it as time advances. Typically, the
RTOS programs the timer to interrupt every time unit (system tick) and maintains a
counter of the increments. In practice, ticks occur periodically, at a rate sufficient for
the most fine-grained timing needed by the application. As a result, most system ticks will
not result in a time-driven function being executed. In energy-efficient applications, it is
clearly undesirable to be woken up from a low-power mode just to service the system tick
timer interrupt and then find there is nothing to do. The system tick is the basis for the
time activities in the OS: when an application is running, it is interrupted by the clock
management every system tick, with the implied change of power mode required to
execute the OS service.
An alternative to this common practice in OSs is to program the timer with a very long
period (n seconds), avoiding counter overflow. In this case, the OS is interrupted every n
seconds and increments an internal counter of the interrupts received. The system time is
built from the value of this internal counter and the value of the timer register. This
mechanism dramatically reduces the number of interrupts due to clock management and
reduces the time the OS spends handling
interrupts and, consequently, the interruptions to the application with their implied
power-mode changes. This management directly impacts the energy consumption of the
system.
Time events are managed directly by the OS by programming a one-shot timer (using a
second timer).
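The time base described above can be sketched as follows (illustrative only; the period and register width are hypothetical): the OS counts long-period interrupts and composes the current time from that counter plus the free-running timer register:

```python
# Sketch of long-period timekeeping: the periodic timer fires only once per
# PERIOD ticks; current time = interrupts_seen * PERIOD + timer register.

PERIOD = 1_000_000  # hypothetical timer period in hardware ticks ("n seconds")

class LongPeriodClock:
    def __init__(self):
        self.interrupts_seen = 0  # incremented once per PERIOD, not per tick
        self.timer_register = 0   # free-running hardware counter (simulated)

    def advance(self, ticks):
        # Simulate the hardware: the register wraps at PERIOD, and each wrap
        # raises one interrupt that the OS counts.
        total = self.timer_register + ticks
        self.interrupts_seen += total // PERIOD
        self.timer_register = total % PERIOD

    def now(self):
        # System time composed from the interrupt counter and the register.
        return self.interrupts_seen * PERIOD + self.timer_register

clock = LongPeriodClock()
clock.advance(2_500_000)
print(clock.now(), clock.interrupts_seen)  # 2500000 ticks after only 2 interrupts
```

A classic one-tick design would have taken 2.5 million interrupts over the same interval; here two suffice, with time events served by a separate one-shot timer.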
2.1.3 Device management (OS)
OSs manage the devices in the system. Devices with the necessary hardware capability can
be put into device power states, and the OS can apply different policies to individual
devices. The OS can:
• Power up a device as soon as it is needed after system start-up or an idle shutdown.
• Power down a device at system shutdown time, or put it to sleep when idle.
• Enable device wake-up, if the device supports wake-up capabilities.
• Manage device performance states, if the device supports decreasing performance
or features to reduce power consumption.
When the system is composed of the OS and its applications, the OS can power down a
device when the system is idle. In a partitioned system based on a hypervisor, the
hypervisor provides virtual CPUs to the partitions, but devices are managed by the OSs.
Explicit resource allocation is specified in the configuration file of the system: devices are
explicitly allocated to the partitions that contain their drivers and manage them.
Consequently, all policies related to device power management should be handled by the
guestOS.
Based on the hypervisor's cyclic scheduler, partitions run in the temporal windows defined
in the configuration file. The hypervisor therefore knows at which time intervals a
partition is not executing and could, in principle, apply specific policies to put devices to
sleep. However, as the device drivers are allocated and managed at OS level, the
hypervisor cannot do so. The task of putting devices into sleep mode must therefore be
delegated to the partitions, based on the functional operation and utilization of the
peripherals.
2.1.3.1 Firmware support
Although power-management techniques may be implemented directly in the application
code, offering them as mature and highly tested operating-system services is more reliable
and less cumbersome.
Typically, a developer specifies, for the application to be deployed, a set of use cases,
where each use case demands a certain operating mode with specific performance and
power requirements. Depending on the current operating mode of the application, the
RTOS applies the appropriate power-management service by setting the entire SoC, or
individual sub-devices, into the corresponding power mode.
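Such a use-case table might be expressed as follows (all names, modes and values are purely illustrative, not a real RTOS API):

```python
# Sketch of an application-level use-case table: each use case names the
# operating mode the RTOS power-management service should apply.

USE_CASES = {
    "camera_streaming": {
        "mode": "performance", "cpu_freq": 1.0, "peripherals": {"cam", "dma"},
    },
    "background_logging": {
        "mode": "low_power", "cpu_freq": 0.4, "peripherals": {"uart"},
    },
    "standby": {
        "mode": "suspend", "cpu_freq": 0.0, "peripherals": set(),
    },
}

def apply_use_case(name):
    # In a real system this would invoke firmware services (DVFS, clock
    # gating, suspend); here it simply returns the requested configuration.
    return USE_CASES[name]

print(apply_use_case("background_logging")["mode"])  # low_power
```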
The main requirement on the firmware support for power management is that its services
should possess complete knowledge of the underlying hardware power capabilities and
should be able to control them by setting different power modes. In order to contain
complexity, firmware is used to support the RTOS by offering services on top of the
low-level hardware technologies (see Figure 1).
Figure 1: Operational phases of Power management services (Source Intel Corporation, 2009) [45]
Modern power-aware real-time operating systems (RTOSs) usually come with built-in
firmware supporting different power services such as DVFS, clock gating and the others
mentioned above. A look at, e.g., the Linux Core Power Management User Guide (v4.1)
shows that the Linux kernel supports a variety of built-in dynamic power-management
services, such as DVFS, CPUIdle and SmartReflex, as well as idle power-management
services such as suspend/resume.
2.2 Hardware low-power support
2.2.1 Low power techniques for predictable architectures and communication
Power consumption can be divided into three parts: dynamic, static and short-circuit
power. All of them depend on physical quantities like voltage and frequency, so hardware
support for low-power techniques can have a significant impact. The next subsections deal
with techniques concerning on-chip communication and standby-sparing as well as
fault-tolerance and safety aspects.
2.2.1.1 Low power on chip communication
On today's systems-on-chip, many different cores are integrated on one chip, and an
important contributor to power consumption is the communication between the individual
cores. The following subsections therefore cover low-power techniques for on-chip
communication.
2.2.1.1.1 Run-time clusterization for energy efficient on-chip communication
System-level exploration of run-time power clusterization, as presented in [68], increases
the energy efficiency of on-chip communication using an adaptive system architecture for
power management called Dynamically-Clustered DVFS (dynamic voltage and frequency
scaling), DCDVFS for short. At runtime, overloaded or idle network regions are identified
and reconfigured with new power schemes. This method improves on voltage/frequency
(V/F) island partitioning, which exploits the spatial locality of communication traffic on a
parallel platform. The benefit of DCDVFS is that clusters are configured at runtime,
whereas in V/F partitioning the islands are defined at design time; spatial variations of the
communication traffic are thus also taken into account.
Simulations on an 8×8 mesh network-on-chip (NoC) with a 65 nm power model extracted
from Orion 2.0 show that the approach achieves much lower energy consumption for
traffic with spatial variations than existing approaches (9% to 42%). Moreover, the
approach incurs only a moderate and predictable latency and a minimal area overhead.
2.2.1.1.2 Adaptive SoC
System-on-chip (SoC) implementations integrate different intellectual-property (IP) cores
or exploit application parallelism to improve performance. One disadvantage is
inefficiencies such as hot-spot bottlenecks, which may introduce additional power
consumption. Therefore, in aSoC [69] a statically scheduled interconnect structure
increases system throughput, since unnecessary interconnect switching activity is
eliminated. Additionally, application mapping tools balance the load across all cores, and
unused cores can be dynamically reconfigured to low power.
The authors show that the interconnect power consumption is low and that the overhead
due to configuration streams is less than 10% for both bandwidth and power.
In another approach [70], energy consumption is minimized by mapping application tasks
to heterogeneous processing elements (PEs) on a NoC, which may operate at different
voltage levels. To this end, the tasks must be mapped to PEs and the PEs to routers;
additionally, operating voltages must be assigned to the PEs and the data paths must be
routed. These steps can be solved sequentially or in a unified manner. The unified
approach yields better performance and energy efficiency, as shown in evaluations using
the E3S benchmark suite. The authors also show that their heuristic approach achieves
near-optimal solutions while being much faster than the optimal algorithm.
2.2.1.1.3 DVFS in NoCs
A NoC is a high-performance and scalable alternative to bus-based architectures [72] but
consumes considerable power (up to 39% of chip power in [71]). Reducing the power
consumed by the NoC therefore leads to significant system-wide energy savings. An
effective hardware technique is DVFS, which adjusts the processor frequency to the
workload by reducing the supply voltage. Since power depends quadratically on voltage
and frequency depends linearly on voltage, reducing the voltage (and with it the
frequency) results in a cubic power saving [72]. The idea is to lower the voltage in
low-utilization phases such that the circuit operates at just the speed required to process
the workload.
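The cubic relationship can be made concrete with the standard dynamic-power model P = C·V²·f (a generic model, with symbols and values chosen for illustration rather than taken from [72]): if frequency scales linearly with voltage, halving both reduces dynamic power by roughly a factor of eight:

```python
# Dynamic power model: P = C_eff * V^2 * f. With f proportional to V,
# scaling the voltage by s scales power by s^3 (the "cubic saving" of DVFS).

def dynamic_power(c_eff, voltage, frequency):
    return c_eff * voltage ** 2 * frequency

p_full = dynamic_power(1e-9, 1.0, 1e9)    # hypothetical: 1 nF, 1.0 V, 1 GHz
p_half = dynamic_power(1e-9, 0.5, 0.5e9)  # halve V and (proportionally) f
print(p_full / p_half)  # ~8.0 -> roughly one eighth of the power
```

Note that energy per operation falls only quadratically, since execution time grows as frequency drops; this is exactly why leakage over the longer runtime (discussed below) can eat into the savings.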
Much research has been done on improving DVFS; some of it is described and compared in
[72]. Solutions exploit, e.g., CPU idle states during memory accesses, apply adaptive design
techniques to local NoC units to reduce energy consumption globally, or use per-core DVFS
rather than varying the whole chip's voltage. Additional hardware is also required, such as
a power management unit (PMU) that controls the generation of the supply voltage and
clock.
One disadvantage of DVFS is that leakage energy rises due to the increased execution time
[73]. The authors of [73] introduce enhanced race-to-halt to resolve this, and show in
simulations an improvement of up to 8% over the existing Leakage Control Earliest
Deadline First schedule [74].
DVFS is often combined with other techniques. For example, in most cases memory limits
the reduction of frequency and voltage in the whole system. Using voltage islands is
lucrative, since communication and memory can run at different voltages such that both
are safe and meet their throughput requirements [80].
2.2.1.1.4 Slack optimization
The unused processing time in a system is called slack and falls into two categories: static
and dynamic. Static slack exists due to spare capacity, because the system is loaded less
than what can be guaranteed by the schedulability tests. Differences between the
worst-case assumptions and the actual behaviour result in dynamic slack [73].
Lin et al. [75] presented four principles for effective slack management and developed four
slack-scheduling algorithms for Earliest Deadline First (EDF)-based systems that support
mixed criticality. The principles are (1) to allocate slack as early as possible, with the
priority of the donating task; (2) to allocate slack to the task with the highest (earliest
original) priority; (3) to allow tasks to borrow against future resource reservations (with
the priority of the job from which the resources are borrowed) to complete their current
job; and (4) to retroactively allocate slack to tasks that have borrowed from their current
budget to complete a previous job. Using these principles can reduce the average
deadline-miss ratio by up to 100%.
The authors of [73] improved these algorithms to reduce power consumption in
combination with DVFS such that the system can consume available slack in idle mode.
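The interplay between slack and DVFS can be illustrated with a minimal dynamic-slack reclamation step (a generic sketch, not the algorithm of [73] or [75]): when a job finishes before its worst-case budget, the next job may stretch into the freed time by lowering its frequency:

```python
# Dynamic slack reclamation sketch: a job that finishes early donates its
# unused budget to the next job, which can then run slower (and cheaper).

def reclaim_frequency(next_wcet, slack, f_nominal=1.0):
    """Normalized frequency at which the next job can run if it may use its
    own WCET budget plus the donated slack (all times at nominal frequency)."""
    return f_nominal * next_wcet / (next_wcet + slack)

# Job A was budgeted 10 ms but ran 6 ms -> 4 ms of dynamic slack; job B's
# 8 ms budget can now be spread over 12 ms.
f = reclaim_frequency(next_wcet=8.0, slack=4.0)
print(f)  # 2/3 of nominal frequency
```

Combined with the cubic power relationship of DVFS, even modest amounts of reclaimed slack translate into substantial energy savings.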
2.2.1.1.5 Variable channel width
To improve throughput and latency, existing NoC implementations use wide channels of
128 bits or more. While such channel widths are beneficial for large messages (512 bits or
more), short control messages only use 64 bits. Since control messages account for a
significant portion of the NoC traffic, not using the remaining bits for other messages
wastes energy [71].
The approach in [71] splits the 128-bit channel into two narrower 64-bit channels, which
allows transmitting a short 64-bit message on one link while shutting down the other. If
no congestion occurs, transmitting two short flits over the wide link is nearly as fast as
sending them one by one over a single narrow channel; the latter, however, enables clock
gating at no performance penalty and low hardware overhead. The approach thereby
reduces the NoC's power consumption by up to 25% with workloads from the PARSEC 2.1
benchmark.
2.2.1.1.6 Router Power Gating
Router power gating is an effective way to reduce power consumption in NoCs by
switching off routers, but it may introduce wake-up delay and energy overhead caused by
frequent mode transitions [76].
A major power-consuming operation in a NoC is memory access and data movement. Yuho
Jin [76] presents a combination of router power gating with region-based data placement.
The idea is to reduce data traffic by localizing private data and concentrating shared data
in one region of the NoC, which increases the power-gating opportunity. To this end, a
dimensionally power-gated router with a region-based routing algorithm is introduced,
which reduces router static power and the performance/energy overheads of power
gating. On the SPEComp benchmark, power savings of 46% with dynamic power-gating
management, and 20% with static management, are shown for an 8×8 mesh NoC with 64
cores [76].
2.2.1.2 Scheduling for low power CRTES architectures
Hardware capabilities like DVFS are insufficient on their own and must be paired with
software to control them. Placing this logic in the OS scheduler is attractive because of the
simplicity, low cost and low risk associated with modifying the scheduler part of the OS.
In [83] the authors describe three types of scheduling techniques. The first type controls
DVFS and DPM to dynamically throttle the voltage and frequency of the CPU or to
temporarily suspend its operation. The second performs thermal management: it primarily
relies on the placement of threads on cores to avoid thermal hotspots and temperature
gradients. Finally, asymmetric systems are described. These systems are built with
low-power and high-power cores on the same chip executing the same binary; the goal is
to assign threads to cores according to their resource requirements.
All the algorithms discussed need to dynamically monitor properties of the workload, to
make decisions that consider the interplay between hardware and workload, and to
control the configuration and allocation of CPU cores so as to achieve the best trade-off
between performance and power consumption.
2.2.1.3 Standby-sparing
In real-time systems, redundancy is commonly used to achieve fault tolerance. While time
redundancy does not incur a high energy overhead, it cannot achieve the reliability
required by safety-critical applications. Standby-sparing, as a hardware-redundancy
technique, can meet those reliability constraints, but existing schemes are not suitable for
low-energy systems, since they either impose considerable energy overheads or cannot
handle hard timing constraints [77].
Ejlali et al. [77] developed an online energy-management technique for standby-sparing in
hard real-time applications called low-energy standby-sparing (LESS). It exploits the slack
available at runtime to reduce energy consumption while guaranteeing hard deadlines,
and it uses dynamic power management (DPM), which shuts down idle system
components [78]. Compared to an existing low-energy time-redundancy system, LESS is
more reliable and provides about 26% energy saving under relaxed time constraints. For
tight deadlines, LESS preserves the reliability but consumes 49% more energy. Compared
to well-known hardware-redundancy techniques such as triple modular redundancy or
duplication, this increase is much lower, since those methods have overheads of 200% and
100%, respectively [77].
2.2.1.4 Low energy methods and safety
In many application domains, such as automotive and avionics, tasks with different
criticality levels are integrated on one chip, forming mixed-criticality systems. Due to the
complexity of modern computing platforms, obtaining accurate WCETs is hard, and
uncertainty in the WCETs can lead to task overruns, which must be avoided for
safety-critical tasks. Typical solutions are the termination of low-criticality tasks or a
degradation of the service provided. The disadvantage is that removing tasks with a low
criticality level also removes the safety functions associated with those levels [86].
The approach in [86] prevents this by using DVFS not to slow the system down to save
energy, but to speed it up. Fast recovery then becomes possible, as is handling a higher
workload. Due to the higher frequencies, energy dissipation also increases. If the system
still misses deadlines despite running at increased speed, tasks can additionally be
terminated. The proposed technique is evaluated on an industrial flight-management
system.
Another example concerns DVFS, since reducing the supply voltage can increase transient
fault rates [85]. Zhao et al. use dynamically allocated recoveries: they show that providing
a recovery allowance for a given periodic task achieves high reliability levels as long as the
allowance can be reclaimed on demand. To determine the recovery allowance and the
frequency assignments, they use a feasibility test that minimizes energy consumption
while satisfying timing and reliability constraints [85].
2.2.1.5 Fault tolerance and low power
Low-power techniques may negatively affect a system's reliability. For example, studies
have shown that DVFS comes at the cost of significantly increased transient fault rates
[85], and reducing the supply voltage of caches also has a negative impact on their
reliability [79]. This section describes techniques that provide both fault tolerance and low
power.
2.2.1.5.1 Energy saving with fault tolerant caches
There are many mechanisms for leakage reduction or fault tolerance in deep-submicron
memories, but they often do not address both aspects, and former approaches for
fault-tolerant voltage-scalable (FTVS) SRAM cache architectures can suffer from high
overheads. The authors of [79] therefore introduce static (SPCS) and dynamic (DPCS)
variants of power/capacity scaling, a simple and low-overhead fault-tolerant cache
architecture. The mechanism combines global multi-level voltage scaling of the data-array
SRAM cells with power gating of the blocks that become faulty at each voltage level. SPCS
sets the runtime cache VDD statically, such that almost all of the cache blocks are not
faulty, while DPCS reduces the voltage further to save more power than SPCS, limiting the
performance impact caused by the additional faulty blocks.
Thanks to its significantly lower overheads, the approach achieves lower static power at all
effective cache capacities than a recent, more complex FTVS scheme. Architectural
simulations show energy savings of 55% (SPCS) and 69% (DPCS) with respect to baseline
caches at 1 V. Even the worst-case cache configuration incurs no more than 4% performance
and 5% area penalties while maintaining high yield.
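The voltage-selection step of power/capacity scaling can be sketched as follows. The numbers are hypothetical, not taken from [79]; the sketch only captures the core trade-off of choosing the lowest-power VDD level whose expected fraction of faulty, power-gated blocks stays within an allowed cache-capacity loss.

```python
# Hypothetical per-level data: (vdd_volts, per_block_fault_probability,
# relative_static_power).  Lower VDD saves static power but makes more
# SRAM blocks faulty, which must then be power-gated (lost capacity).
LEVELS = [
    (1.0, 0.000, 1.00),
    (0.9, 0.002, 0.78),
    (0.8, 0.010, 0.60),
    (0.7, 0.060, 0.45),
]

def select_vdd(max_capacity_loss):
    """Pick the feasible level with minimum static power."""
    feasible = [(p_stat, vdd) for vdd, p_fault, p_stat in LEVELS
                if p_fault <= max_capacity_loss]
    p_stat, vdd = min(feasible)   # lowest static power among feasible levels
    return vdd, p_stat

vdd, rel_power = select_vdd(max_capacity_loss=0.02)   # tolerate 2% capacity loss
```

With these assumed numbers, tolerating a 2% capacity loss allows dropping to 0.8 V for a 40% static power reduction; the dynamic (DPCS) variant would re-evaluate this choice at run time.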
2.2.1.5.2 Fault tolerant low power communication
Communication systems face several challenges: power consumption, caused by the complex
algorithms that enable broadband communication; technology scaling, which enables
unprecedented integration but introduces leakage power and reliability penalties; and cost.
Producing high-yield architectures requires built-in self-test and self-repair, so 100%
error-free chips are very expensive and will soon become impractical [80].
One solution is a system with built-in inherent redundancy, and communication systems are
a perfect fit due to their high level of redundancy. The authors of [80] examined the
relationships between the components and their vulnerability in terms of power
consumption and reliability, and between the needs, assumptions and requirements of the
algorithm executed on the design. If the algorithm can accept and possibly correct
hardware-induced errors, it becomes possible to co-design the algorithm and the hardware
and thereby reduce the disadvantages of technology scaling. The authors intentionally vary
the operating conditions to the point where errors occur and exploit the extended design
space to optimize other aspects, which leads to a design that is optimal in power
consumption and robustness. The example of a WCDMA modem shows savings of 23% in
embedded memory power consumption and 13% for the whole system.
2.2.1.5.3 Low overhead two state check pointing
Transient faults are a major reliability concern; they can be tolerated by triple modular
redundancy or standby sparing. Checkpointing with rollback recovery is another well-
established technique, but it incurs significant time and energy overhead that may not be
feasible in hard real-time systems [81]. A low-overhead alternative is two-state
checkpointing (TsCp), which differentiates between fault-free and faulty execution and
leverages two types of checkpoints. Non-uniform intervals, determined by postponing
checkpoint insertions in fault-free states, decrease the number of checkpoints, while
uniform intervals minimize execution time in faulty states, leaving more time for energy
management in fault-free states. Enabling checkpoints at selected locations curtails the
time and energy overhead while respecting deadlines, execution times and the number of
tolerable faults. Trading off the number of checkpoints against the operating
voltage-frequency settings yields energy-efficient fault tolerance.
An evaluation on an embedded LEON3 processor using non-volatile memory to store the
checkpoints has shown that TsCp reduces the number of checkpoints by 62% on average,
which results in 14% and 13% reductions in execution time and energy consumption,
respectively. Combined with DVS, it achieves up to 26% (21% on average) energy savings
compared to state-of-the-art checkpointing while providing reasonable reliability.
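The trade-off that TsCp improves on can be made concrete with a textbook uniform-checkpointing sketch (this is deliberately not the TsCp algorithm itself): more checkpoints mean more insertion overhead, while fewer checkpoints mean longer rollback re-execution after a fault. All parameter values below are illustrative.

```python
import math

def worst_case_time(T, C, k, n):
    """Worst-case completion time with n uniform checkpoints:
    base execution T + checkpoint overhead n*C + up to k rollbacks,
    each re-executing one segment of length T/n plus one checkpoint."""
    return T + n * C + k * (T / n + C)

def best_checkpoint_count(T, C, k):
    """Checkpoint count minimizing the worst case (analytic optimum
    n* = sqrt(k*T/C), checked against its integer neighbours)."""
    n_opt = max(1, round(math.sqrt(k * T / C)))
    return min((worst_case_time(T, C, k, n), n)
               for n in (n_opt - 1, n_opt, n_opt + 1) if n >= 1)[1]
```

For example, a task with T = 100 ms, checkpoint cost C = 1 ms and one tolerable fault is best served by ten checkpoints; TsCp's contribution is to cut such counts further by postponing checkpoints while execution remains fault-free.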
2.2.2 Low power multicore embedded systems
Multicore systems are nowadays widely used to increase performance and provide
scalability [87]. They are more complex than single-core systems, e.g. higher-level caches
must be kept coherent [88], so some of the approaches described above only consider
single-core systems; future work must extend these techniques to multicores (e.g. [79]).
Other examples of low-power techniques for multicores are dynamic WCET estimation with
DVFS and worst-case power estimation, which we discuss in the next subsections.
2.2.2.1 Dynamic WCET estimation for real-time multicore embedded systems
supporting DVFS
In real-time systems, the worst-case execution time (WCET) of tasks is required to ensure
system stability. A high estimation accuracy reduces the number of deadline misses and
additionally improves energy savings [82].
The Processor-Memory model (Proc-Mem) proposed in [82] dynamically predicts the
execution time of an application running on a multicore processor when varying the
processor frequency. Instead of analyzing the application's source code or the hardware
platform, Proc-Mem executes the workload during the first hyperperiod at maximum speed
to obtain the input parameters of the model, which then estimates, across the applications'
periods, the most energy-efficient frequency that still meets the deadlines. These values are
used by the scheduler in subsequent periods.
Compared to a typical Constant Memory Access Time (CMAT) model, the deviation of
Proc-Mem from the measured execution time is always below 6%, while the deviation of
CMAT always exceeds 30%. The approach reaches up to 47.8% (22.9% on average) energy
savings for a similar number of deadline misses.
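The premise behind Proc-Mem can be sketched in a few lines: execution time splits into a processor-bound part that scales with frequency and a memory-bound part that, to first order, does not. In the real approach the two parameters are measured during the first hyperperiod at maximum speed; the numbers and frequency set below are made up.

```python
def exec_time(f_ghz, t_proc_at_max, t_mem, f_max_ghz=1.0):
    """Predicted execution time: processor part scales with 1/f,
    memory part is (approximately) frequency-independent."""
    return t_proc_at_max * (f_max_ghz / f_ghz) + t_mem

def lowest_safe_frequency(freqs_ghz, deadline, t_proc_at_max, t_mem):
    """Lowest available frequency (hence lowest dynamic power) that
    still meets the deadline; fall back to maximum speed otherwise."""
    ok = [f for f in freqs_ghz if exec_time(f, t_proc_at_max, t_mem) <= deadline]
    return min(ok) if ok else max(freqs_ghz)

f_sel = lowest_safe_frequency([0.25, 0.5, 0.75, 1.0],
                              deadline=10.0, t_proc_at_max=4.0, t_mem=2.0)
```

Note how the memory-bound share matters: a CMAT-style model that ignores it would either over- or underestimate the slack and pick the wrong frequency, which is exactly the deviation the Proc-Mem evaluation quantifies.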
2.2.2.2 WCP (Worst-Case Power) Estimation
In [43], [44] the authors claim that energy is as important as time in mixed-criticality
systems and demonstrate how an incorrect handling of energy can violate mixed-criticality
guarantees.
To overcome this issue, pioneering work in [40] utilizes static analysis to estimate the
worst-case energy consumption of a program running on complex architectures and to
provide power guarantees.
For the same purpose, a monitoring and control mechanism has been proposed in [41] to
isolate the power consumption of mixed-criticality applications on a many-core platform.
Xilinx also offers an accurate WCP estimation through the Xilinx Power Estimator (XPE)
[42], a spreadsheet-like tool that can be calibrated with design-specific parameters (see the
proposed seven steps in [42]) to obtain an estimate of the WCP.
In addition, novel work was presented in [43] on estimating the worst-case power for
energy-constrained embedded systems. The authors propose to compute upper bounds on
the energy consumption by statically analysing the program code (combining implicit path
enumeration and genetic algorithm methods) based on the energy costs of single
instructions on the target architecture. Where no precise energy costs are available for
single instructions, they propose to determine the WCP by measurement; here they support
finding a set of suitable program inputs to be used as measurement parameters.
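A heavily simplified sketch of the static part of such an analysis: each basic block is assigned an energy cost (the sum of its instructions' costs on the target architecture), and an upper bound on the program's energy is the maximum-cost path through the control-flow graph. Real implicit-path-enumeration formulations use integer linear programming and handle loop bounds; this toy graph is acyclic and the costs are invented.

```python
# Hypothetical control-flow DAG of basic blocks and per-block energy
# costs (nanojoules); neither is taken from [43].
CFG = {
    "entry": ["a", "b"],   # a branch with two alternatives
    "a": ["exit"],
    "b": ["exit"],
    "exit": [],
}
COST_NJ = {"entry": 5, "a": 20, "b": 12, "exit": 3}

def wce(block):
    """Worst-case energy of any path starting at `block`:
    own cost plus the most expensive successor path."""
    succ = CFG[block]
    return COST_NJ[block] + (max(wce(s) for s in succ) if succ else 0)

bound_nj = wce("entry")   # upper bound for the whole program
```

The bound follows the expensive branch (`a`) even if typical executions take the cheap one, which is precisely what makes it a safe worst-case estimate.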
2.2.3 PCB support for low-power
As today’s electronic designs continue to grow in complexity, managing power consumption
and optimizing overall efficiency become ever more important. Accurate monitoring of
power supply voltage and current is crucial to conserving power and guaranteeing reliability
in everything from industrial and telecom applications to automotive and consumer
electronics.
2.2.3.1 Best practices on low-power PCB designs
Measuring power consumption, as well as other critical parameters, and optimizing overall
efficiency can be challenging with discrete solutions. Several tools have therefore been
developed that will help us in this work by analysing a PCB design’s power integrity and
consumption and optimizing it.
HyperLynx Power Integrity – Mentor Graphics Corporation
This tool [1] provides modelling of power distribution networks and noise propagation
mechanisms throughout the PCB design process. It is useful for identifying potential power
integrity distribution issues that can interfere with board design logic, and for investigating
and validating solutions in a “what-if” environment. HyperLynx PI analyzes voltage drop,
identifies areas of excessive current density in the layout, simulates IC switching noise as
it propagates through planes and vias, and facilitates Power Delivery Network (PDN)
impedance validation across the full operating frequency range.
PDN Analyzer – Altium
The PDN Analyzer [2] allows PDN issues to be resolved as they arise during the board layout
process. It offers analysis of complex nets and copper geometries, plots DC voltage and
current density graphics, and provides customized views for DC power analysis, all within
the same unified design and analysis environment.
CR-5000 Lightning – Power Integrity Advance – Zuken
This tool [3] provides power integrity and electromagnetic interference analysis within the
real-time PCB design flow. With EMI, AC and DC power analysis combined in a single
environment, it helps determine the best decoupling and power distribution strategy for
the pre-layout and post-layout stages, with support for a complete what-if environment.
PI Solution – Sigrity
The Sigrity PI Solution [4] offers four different tools related to power integrity:
 PowerSI: Detailed electrical analysis with fast and accurate signal/power integrity
and design-stage EMI analysis, S-parameter model extraction, and frequency
domain simulation.
 PowerDC: Electrical/thermal co-simulation, hotspot detection, and signoff for low-
voltage, high-current PCB and IC package designs.
 OptimizePI: AC frequency analysis of boards and IC packages, with support for pre-
and post-layout decap studies that ensure high performance at system and
component levels.
 PowerSI 3DEM Option: Full-wave and quasi-static solver technology capable of
accurate analysis of complex 3D structures.
2.2.3.2 Low power commercial microprocessors
In this section, different COTS technologies are identified as standard processors families
that are commonly used in the industry for their low-power features.
ARM [5] uses a 32-bit RISC (Reduced Instruction Set Computing) instruction set for its
processors. The processors using this architecture require significantly fewer transistors than
typical CISC (Complex Instruction Set Computing) processors, reducing cost and power use.
They offer three architecture series: Cortex-A (Application), Cortex-R (Real-time) and
Cortex-M (Microcontroller). This last series is the most common in embedded systems for its range
of energy-efficient, scalable and compatible processors [6]. For instance, NXP’s LPC
Cortex-M series microcontrollers [8] are a commercial implementation of this series. There
also exists a VHDL IP implementation of the Cortex-M0, a small, simple and therefore
low-power soft core. One interesting low-power feature of the ARM family is the big.LITTLE
technology [7], a power-optimization approach in which high-performance ARM CPU cores
are combined with more power-efficient ARM CPU cores to deliver peak-performance
capacity, sustained performance and parallel processing performance at lower average
power. It combines, for instance, the ultra-low-power ARM Cortex-A7 core with the fast
Cortex-A15 core; depending on the intensity of the task being processed, it is scheduled on
one or the other core, saving up to 75% of the energy.
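The big.LITTLE rationale can be made concrete with an invented-numbers sketch (this is not ARM's actual scheduling policy): since energy is power multiplied by time, the slower but far lower-power LITTLE core usually wins unless a task's deadline forces the big core.

```python
# Hypothetical core characteristics: name -> (active_power_mW,
# execution speed relative to the big core).
CORES = {
    "LITTLE": (100.0, 0.4),
    "big":    (600.0, 1.0),
}

def run_stats(core, work_s):
    """(energy_mJ, runtime_s) for `work_s` seconds of big-core work."""
    power_mw, speed = CORES[core]
    runtime = work_s / speed
    return power_mw * runtime, runtime

def pick_core(work_s, deadline_s):
    """Prefer the LITTLE core whenever it still meets the deadline."""
    _, t_little = run_stats("LITTLE", work_s)
    return "LITTLE" if t_little <= deadline_s else "big"
```

With these assumed figures the LITTLE core takes 2.5x longer but still consumes well under half the energy per task, which is the effect behind the large average-power savings quoted for the technology.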
The Texas Instruments MSP430 [9] is a family of 16-bit processors based on a von Neumann
architecture. The family consists of more than 500 products, more than 300 of which are
categorized as ultra-low power. This subfamily includes processors running at 4 to 24 MHz
that can consume less than 1 µA in idle mode. They also offer multiple low-power modes
and peripherals that can run autonomously in low-power modes. A specific subcategory
(MSP430FRxx) highlights the FRAM technology, which combines the best of Flash and SRAM
memories: it is non-volatile while offering fast and low-power writes. The minimum
operating voltage can be as low as 0.9 V for some processors. All families of the MSP430
series of microcontrollers have one active mode and five software-selectable low-power
modes of operation. An interrupt event can wake the device from any of the five low-power
modes, service the request, and return to the low-power mode on return from the interrupt
program.
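For such duty-cycled microcontrollers, battery life is dominated by the time-weighted mix of active and low-power-mode currents. The following back-of-the-envelope sketch uses generic numbers, not values from any specific MSP430 datasheet.

```python
def average_current_ua(active_ua, sleep_ua, duty_cycle):
    """Time-weighted average current for a periodically waking MCU."""
    return active_ua * duty_cycle + sleep_ua * (1.0 - duty_cycle)

def battery_life_hours(capacity_mah, avg_ua):
    """Idealized battery life (ignores self-discharge and aging)."""
    return capacity_mah * 1000.0 / avg_ua

# Assumed workload: active 0.1% of the time at 3 mA, sleeping at 1 uA,
# powered from a 225 mAh coin-cell-class battery.
avg = average_current_ua(active_ua=3000.0, sleep_ua=1.0, duty_cycle=0.001)
life = battery_life_hours(225.0, avg)
```

Under these assumptions the average draw is about 4 µA and the idealized battery life exceeds six years, which illustrates why sub-microampere sleep currents and autonomous low-power peripherals matter far more than active-mode efficiency in sensor-style workloads.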
The Renesas RL78 [11] is a family of 16-bit CISC microcontrollers intended for low-power
applications. Depending on the product, the maximum frequency is between 20 and 32 MHz,
while the minimum operating voltage is between 1.6 and 2.7 V. The STOP-mode current
consumption is 0.52 µA. The RL78 family combines the high-performance architecture and
low power consumption of the 78K0R with the peripheral functions of the R8C and 78K.
There are also some x86 processors oriented towards low-power constraints. The Intel Atom
processor family has a subset of processors aimed at the embedded market, some of which
are designed for very low power consumption. The Atom D525 [12] is an ultra-low-voltage
dual-core processor in which each core is 2-way hyper-threaded. The Intel Atom Z3000
processor series delivers leading performance with all-day battery life; it offers a smaller
footprint and lower power usage while doubling the compute performance and tripling the
graphics performance compared to the previous-generation Intel Atom processors. The
series also includes Intel® Burst Technology 2.0 with four cores, four threads and 2 MB of
L2 cache.
The Intel Quark SoC X1000 series [13] is a system-on-chip from Intel Corporation that
allows low-power, thermally constrained, fanless and headless designs [14]. It runs at
frequencies up to 400 MHz and the processor offers three low-power modes. The Intel
Quark SoC X1000 series can also run at half or at a quarter of the maximum CPU frequency
in order to decrease power consumption.
2.2.3.3 ICs for measuring Power/Energy
One can find different integrated circuit components that ease power and energy
measurement and power supply management within a PCB design.
UCD90XXX – Texas Instruments
TI’s UCD90XXX sequencers and monitors are dedicated power management ICs with
I2C/SMBus/PMBus and JTAG interfaces. They allow sequencing, monitoring and resetting of
power supplies at start-up and power-down, when external events occur, or when voltage
or current thresholds are exceeded. They support the ACPI specification by defining up to
8 system states with only 3 GPIs and by defining which rails are on and which are off in
each system state. They also enable fault and peak logging in their flash memory.
UCD92XX – Texas Instruments
The UCD92xx family of digital PWM controllers [15] comprises multi-rail, multi-phase
synchronous buck controllers designed for non-isolated DC/DC power applications. They
integrate dedicated DC/DC loop management circuitry with flash memory and a serial
interface to support configurability, monitoring and management, and they combine
multi-loop management with sequencing, margining, tracking and intelligent phase
management to optimize total system efficiency.
LTC29XX Power Monitors – Linear Technology
LT’s power monitors [16] provide high configurability without compromising performance or
functionality. The LTC2990/LTC2991 quad/octal I²C power monitors can be configured in up
to 35 different ways, making them well suited for 3 V to 5.5 V systems that need simple and
accurate digital monitoring of combinations of temperature, voltage and current. If higher
voltages are required, the LTC2945 with its 0 V to 80 V operating range also allows
monitoring of current, voltage and power via an I²C interface, and the LTC2946 provides
energy readings for rails up to 100 V. For measuring AC or instantaneous power, the LT2940
analog power and current monitor for 4 V to 80 V systems brings together the circuits
necessary to accurately measure, monitor and control power.
Digital Power System Management – Linear Technology
A PSM product is configured and monitored through a PMBus/SMBus/I²C digital interface.
The LTpowerPlay development environment provides control and monitoring of power
supply voltage, current, power and energy use, as well as sequencing, margining and
“black box” fault-log data.
High-current outputs, up to and exceeding 200 A, for FPGAs, ASICs and processors can be
designed with the multiphase extender LTC3870/-1 and LTC3874 devices. These slave
controllers provide a small and cost-effective solution for supplying very large currents by
being cascaded with the master controllers. Up to 12 phases can be paralleled and clocked
out-of-phase, with the LTC3870/-1 operating at up to a 60 V input and the LTC3874 working
with the line of sub-milliohm DCR-sensing current-mode master controllers.
78M6612 Power and Energy Measurement IC – Teridian
The 78M6612 [16] is a single-phase AC power measurement and monitoring (AC-PMON) IC
capable of 0.5% Wh or better accuracy over a 2000:1 current range and over temperature.
Four analog inputs are provided for measuring up to two AC loads or wall outlets. It also
includes an 8-bit MPU core with 32 KB of embedded flash, a UART interface, and a number
of GPIOs for easy integration into any power supply or outlet monitoring unit.
2.3 Power measurement and monitoring support
In general, as indicated in [39], power measurement and monitoring support is
indispensable for circumventing the inaccuracy of computer-aided power analysis tools,
since the estimates obtained by such tools can deviate from the actual power consumption
of the working MPSoC.
Utilizing performance counters (see the survey in [39]) is a run-time monitoring technique
typically used to characterise the power consumption of a running system or of its
subcomponents. With this technique the activity of a certain block is recorded via
performance counters, and these values are fed into empirical mathematical models to
reason about the power consumption at different granularity levels (instruction, block or
software task). Yet design-time estimates, even if based on run-time performance counter
measurements, are not accurate and depend on infrastructure-specific counters.
According to [39], a combination of infrared imaging and electric current measurement
techniques (see Figure 2) can yield high-resolution spatial power maps of the individual
parts of a given circuit. A mathematical foundation that takes the thermal map and the
current measurements (see Figure 2) and outputs the corresponding power maps is given
in [39]. Infrared imaging obtains a thermal profile of the individual circuit components; by
knowing the thermal behaviour of a certain circuit and its heat diffusion to the ambient,
the power can be derived (e.g. using least-squares estimation, see [38]). This method is
considered the most flexible since it is non-invasive and does not require an extra design
setup.
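A toy version of that least-squares step (the full method in [38], [39] estimates an entire spatial power map, not a single value): if the steady-state temperature rise above ambient at each sample point is linear in the dissipated power, t_i = a_i * p + noise, the least-squares estimate of p follows in closed form. The coefficients and readings below are invented.

```python
def least_squares_power(a, t):
    """Closed-form least-squares fit of t_i ~ a_i * p.

    a: per-sample thermal coefficients (K/W), t: measured temperature
    rises above ambient (K).  Returns the estimated power p (W) that
    minimizes sum_i (t_i - a_i * p)^2, i.e. p = sum(a*t) / sum(a*a).
    """
    return sum(ai * ti for ai, ti in zip(a, t)) / sum(ai * ai for ai in a)

coeffs = [2.0, 2.0, 2.0]     # assumed K/W at three thermal-camera pixels
rises  = [1.0, 1.1, 0.9]     # noisy measured temperature rises (K)
p_watt = least_squares_power(coeffs, rises)
```

Averaging over many thermal samples is what makes the infrared approach robust to pixel-level measurement noise.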
Measuring the electrical current consumption of a given system can be performed either
with shunt resistors or with a clamp meter [39]. In the first approach, shunt resistors with
a very low resistance (e.g. 1 mΩ) and a high accuracy (±0.1%) are connected in series with
the positive supply lines [39]. They are deliberately chosen with low resistance so as not to
influence the load supply, while at the same time offering an interface to measure
the supply current. By knowing the current and the supply voltage, the power consumption
can then be easily calculated. In the clamp-meter-based approach, a clamp meter measures
(based on the Hall effect) the magnetic field induced around the supply wire and uses the
measured values to obtain the electric current [39]. While clamp-meter-based measuring is
less intrusive, shunt-based measurements are more accurate and less susceptible to noise.
Both techniques require ADCs or digital multimeters to sample the measured analog
signals.
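The shunt-based calculation described above is a direct application of Ohm's law; the sketch below uses the 1 mΩ shunt value mentioned in the text, with an example voltage reading that is purely illustrative.

```python
def shunt_power(v_supply, v_shunt, r_shunt_ohm=0.001):
    """Load power from a series shunt measurement.

    v_supply: rail voltage (V), v_shunt: sampled voltage drop across
    the shunt (V), r_shunt_ohm: shunt resistance (default 1 mOhm).
    """
    i_load = v_shunt / r_shunt_ohm   # Ohm's law on the shunt
    return v_supply * i_load         # watts drawn by the load

# e.g. 1.25 mV measured across a 1 mOhm shunt on a 1.0 V rail
p = shunt_power(v_supply=1.0, v_shunt=0.00125)
```

Note the practical constraint mentioned above: the tiny resistance keeps the voltage drop in the millivolt range, so the ADC sampling `v_shunt` needs correspondingly fine resolution.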
Electrical current measurements can be utilized by power managers at run-time to monitor
the power consumption of the system (or of certain subcomponents) in order to optimize
it.
Figure 2: Monitoring concept based on infrared and current measurements (taken from [39] )
3. IMPACT OF THE TECHNIQUES ON SAFETY (ANALYSIS)
In this section the impact of low-power capabilities on dependability is analyzed. Within
dependability, which involves several properties (availability, maintainability, safety,
security, etc. [30]), the focus of the SAFEPOWER project is the impact of those features on
safety. This particularly means that all candidate technologies and their combinations have
to be investigated from the perspective of a safety standard (e.g., the generic IEC 61508,
the railway EN 5012x family, and others in the aerospace domain) so that the resulting
low-power services are also “safe”. In the following, a preliminary investigation on safety
and low power is surveyed and, additionally, a brief study on security and low power is
presented.
3.1 Safety, low-power and mixed criticality
In this first section, the impact of the SAFEPOWER low-power and mixed-criticality
technologies on safety is analyzed.
3.1.1 Safety and Low Power
Safety-critical applications have made very limited use of energy and power management
features. Non-safety-related embedded applications (e.g., consumer electronics) can shut
down or slow down hardware features while considering little more than the impact on the
user experience, but safety-critical applications must also carefully consider the impact of
those actions on overall system safety. In the latter case, low-power features must comply
with safety standard requirements (e.g., IEC 61508) in both (1) the product life-cycle or
functional safety management (to avoid systematic design faults) and (2) the techniques
and measures to control failures during operation (to control physical random faults).
Due to the explosion of autonomous systems, enabled by big improvements in energy
storage technologies (e.g., batteries), or purely motivated by energy budget requirements,
power efficiency and power management are also very attractive, cost-competitive features
for safety-critical systems. According to the product life-cycle, this safe power management
cannot rely on online decisions (dynamic reconfiguration is not recommended for SIL 2-4
integrity levels in IEC 61508, and predictable command execution is also mandatory in
other standards, such as in space). This suggests that the adaptation of safety-critical
systems to changing scenarios (e.g., a low-power mode) must be addressed with
precompiled and verified schemes, as in [31] at the operating system level or in [32] at the
network level. Standard power management actions, such as gating actions (peripheral
clock or core turn-off), must follow safe shutdown and start-up protocols. In [33], for
instance, safe start-up and shutdown scenarios are considered for the partitions of an
IEC 61508 compliant hypervisor, but not primarily for power management purposes.
A component's power management (and temperature) is also closely coupled with its
lifetime. A proper (and low) power demand of a specific hardware component can prolong
its lifetime and, directly, the intrinsic reliability of the system. In fact, temperature
monitoring is suggested as a major diagnostic element when using on-chip redundancy for
safety purposes (e.g., IEC 61508-2 Annex E). One can address those requirements with
chip-external monitoring components (see section 2.2.3.3) or, more efficiently if the target
device is an FPGA, with ring oscillators [36].
3.1.2 Safety and Mixed-criticality
A mixed-criticality approach can also benefit from modular certification. This feature is
considered in several domain safety standards under different names: in IEC 61508 each
module is named a “compliant item”, in ISO 26262 it is called a “safety element out of
context (SEooC)” and in EN 50129 a “generic product”. The modular approach reduces the
impact of changes to a subset of the safety case, enabling reusability of its parts [34].
Low-power services must comply with the safety argumentation behind such an approach.
For example, the safety-concept approach within the MULTIPARTES [34], PROXIMA [35]
and DREAMS [33] EU projects proposed an argumentation for the use of multicores in
mixed-criticality systems considering spatial and temporal isolation among partitions
mapped to different cores, but the impact of temperature or power was not explicitly
analyzed. In fact, this safety-concept approach is an effective way to establish a formal
dialogue with a certification authority and to move from academic safety-certification
analysis towards a rigorous safety argumentation. Such early contact with certification
authorities identifies possible conflicts w.r.t. certification standards and paves the way for
the future industrialization of the technology.
In the CONTREX EU project [37], current activities in the area of predictable computing
platforms and segregation mechanisms were complemented with techniques considering
extra-functional properties such as real-time, power and temperature for
safety/certification in mixed-critical systems. In contrast to the SAFEPOWER proposal, while
some safety measures were partly considered, no complete safety process was integrated
into the overall design flow of the CONTREX project.
3.2 State of the art in security, mixed-criticality and low-power
Classically, safety-critical systems have been considered closed or semi-closed systems with
very limited and controlled interactions with their environment. Current embedded systems
and, particularly, mixed-criticality nodes are, through their non-safety-related parts,
increasingly connected to open networks (e.g., local networks, wireless networks, the
cloud). In fact, even the safety standards have started to consider the inclusion of security
aspects in their life cycles.
3.2.1 Security and Mixed-Criticality
In the mixed-criticality area, there are several hardware and software mechanisms to
protect critical parts from non-secure ones. For instance, in software, the same spatial and
temporal separation mechanisms used by hypervisors to isolate partitions from design
faults can prevent an attacker from accessing safe (now also secure) partitions from a
non-safe (or non-secure) partition.
The US Government has a Protection Profile for Separation Kernels in Environments
Requiring High Robustness [22], commonly known as the Separation Kernel Protection
Profile (SKPP). A separation kernel is defined by the SKPP as "hardware and/or firmware
and/or software mechanisms whose primary function is to establish, isolate and separate
multiple partitions and control information flow between the subjects and exported
resources allocated to those partitions" [22]. It has to be proven that there is no
unexpected channel for information between domains.
This protection profile specifies the security evaluation criteria so that a compliant system
can be certified under the Common Criteria (also called IEC 15408) standard. It has to be
mentioned that Common Criteria certification does not assure security; rather, it
guarantees whether the declarations and specifications given about the system are true
[21]. One commercial real-time operating system compliant with this protection profile is
INTEGRITY-178B by Green Hills Software Inc. [23]. This system was used as the baseline to
partly implement a software crypto demonstrator in the separation kernel by J. Frid [24],
who in addition provides a state of the art concerning separation kernels from a historical
and technical perspective.
Similarly, although in hardware, the ARM TrustZone technology [25] is able to separate the
execution environment into two different worlds: secure and normal (non-secure). This
security feature is achieved by dividing all the hardware and software resources of the
system-on-chip so that they exist in one of the two worlds. The system is designed in such
a way that no secure-world resources can be accessed by normal-world elements, whereas
secure-world software has access to the non-secure resources. Thus, employing this
technology, a single physical core is able to securely and efficiently execute code from both
the secure and normal worlds, which removes the need for another dedicated processor
core.
3.2.2 Low-power
Security and low power are coupled in the sense that more secure versions of the same
cryptographic algorithm are also more power-hungry. In [27], [28], [29] one can find
comparisons of several cipher algorithms and of their performance versus power
consumption. The power consumption itself can also serve as a trace allowing an attacker
to obtain information about the encryption algorithm and to recover the secret key. In
[29], dynamic voltage and frequency scaling (“switching”) is used to distort the power
consumption trace and thus further protect the integrity of the secret key. An attack could
also aim to exhaust the system, e.g., by requesting the execution of tasks that increase
consumption and drain the battery, but little literature has been found on this track.
4. ZYNQ PLATFORM POWER MANAGEMENT CAPABILITIES
In this section we elaborate on the Zynq platform from Xilinx and its low-power
capabilities.
4.1 Overview of low-power features
Figure 3 depicts an overview of the Zynq-7000 SoC. The Zynq SoC comprises a Processing
System (PS) part and a Programmable Logic (PL) part, combining the computing power of a
dual-core ARM platform (PS) at 866 MHz and the flexibility of an FPGA (PL) on one SoC. The
ARM dual core is connected to the peripherals via a central interconnect. On the left side,
the available interfaces are shown, which can be connected to the pinout of the MPSoC via
the I/O mux. The Application Processor Unit (APU) has a direct interface to a multiport
DRAM controller, which can also be accessed via the central interconnect; likewise, the
flash controller as well as the Programmable Logic (FPGA) part of the MPSoC are accessed
via the interconnect.
According to [46] the following low-power features are supported by the Zynq SoC:
 PL power-off
 Cortex A9 processor standby mode
 Clock gating for most PS subsystems
 Three PLLs can be programmed to minimize power consumption
 Subsystem clocks can be programmed for optimal clock frequencies
 Programmable voltage for I/O Banks:
 MIO: HSTL 1.8V, LVCMOS 1.8V, 2.5V and 3.3V
 DDR: DDR2 1.8V, DDR3 1.5V and LPDDR2 1.2V
 DDR3 and LPDDR2 low power mode
 DDR 16 or 32-bit data I/O
In the following, we take a look at each feature and describe how it can be used in the
context of power management.
Figure 3: Structural overview of Xilinx Zynq-7000 family [46]
4.2 Power Rails/Domains
Figure 4 shows the different power domains for the PS and PL of the Zynq-7000 SoC devices.
An interesting fact is that the PS and PL have separate power domains, allowing, for example,
the PL to be powered down independently of the PS for power saving purposes.
Figure 4: Power Domains of the Zynq-7000 SoC [47]
A detailed description of the individual power pins for the PS and PL parts is given in
Table 1 [46].
Table 1: Detailed description of Zynq power pins [46]
4.3 Clock Control
Figure 5 shows the main components of the PS clock subsystem of the Zynq SoC. All clocks
generated by the PS are derived from the ARM PLL, the I/O PLL or the DDR PLL, each of which
primarily also clocks its corresponding subsystem [46]. The bypass mode
and the output frequency of each PLL are independently controllable via software. In
addition, the PS clock module provides four frequency-programmable clocks (FCLKs) to the
PL, which can be individually controlled to provide different frequencies.
Similar to the clock control subsystem, the PS also includes a reset subsystem, which
provides four individually programmable reset signals to the PL [46].
Figure 5: Xilinx Zynq PS Clock Subsystem [46]
4.4 Processor System (PS): Power Management
In the following, a brief description of the power management techniques available at the
PS level of the Zynq board is given.
4.4.1 PS Peripherals Clock Gating
As already mentioned, several clock domains are supported in the PS, each with
independent clock-gating control. During runtime, unused clock domains can be shut down,
and clocks for PS peripherals such as timers, DMA, SPI, QSPI, SDIO, and the DDR controller
can be gated to reduce dynamic power dissipation [48].
4.4.2 Caches
Concerning the caches on the PS part, the L2 cache controller supports dynamic clock gating
and a standby mode. With dynamic clock gating, the L2 controller clock stops once the ARM
cores have been idle for a number of delay cycles (32 cycles). Similarly, in standby mode
the internal clocks of the L2 cache controller are stopped, but only in a specific state of
the processor, the Wait-For-Interrupt (WFI) state [46], while processor and RAM power is
still maintained. With dynamic clock gating applied in the WFI state, power consumption is
determined only by leakage currents and by the clocking of the small logic block responsible
for detecting the wake-up condition [48].
4.4.3 On-Chip Memory (OCM)
The OCM, a low-latency memory with 256 KB of RAM and 128 KB of ROM (boot ROM), can
be used to reduce the overall power of the system. When the DDR is in a low-power
mode, the OCM can be used to hold executable code, which clearly reduces power
consumption due to the minimal power dissipation of the OCM [48].
4.4.4 Snoop Control Unit (SCU)
The SCU block manages data coherency between the two ARM CPUs and the L2 cache.
The SCU internal clocks can be stopped if the standby state is enabled, the CPUs are in WFI
mode, there are no pending requests from the Accelerator Coherency Port (ACP), and no
further activity is expected in the SCU. As soon as one of these conditions is no longer
satisfied, the SCU resumes normal operation [48].
4.4.5 PLL
In general, the clocks in the PS can be dynamically slowed down or gated off to reduce
power. According to [46], PLL power consumption depends on the PLL output
frequency; thus, power can be reduced by using a lower PLL output frequency.
Power can also be reduced by powering down unused PLLs. For example, since the DDR PLL
is the only unit that can drive all of the clock generators, the ARM and I/O PLLs can be
disabled if all clock generators are driven by the DDR PLL.
Each clock can be individually disabled when not in use. In some cases,
individual subsystems contain additional clock-disable capabilities and other power
management features.
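The effect of lowering clock and PLL output frequencies can be illustrated with the first-order CMOS dynamic power relation P = α·C·V²·f. The following sketch uses purely illustrative values; the capacitance, voltage and frequencies are assumptions, not measured Zynq figures:

```python
def dynamic_power(c_eff, v_dd, freq_hz, activity=1.0):
    """First-order CMOS dynamic power: P = alpha * C_eff * Vdd^2 * f."""
    return activity * c_eff * v_dd ** 2 * freq_hz

# Illustrative values only (not measured Zynq figures).
p_full = dynamic_power(c_eff=1e-9, v_dd=1.0, freq_hz=666e6)
p_half = dynamic_power(c_eff=1e-9, v_dd=1.0, freq_hz=333e6)
print(p_full / p_half)  # halving the frequency halves dynamic power -> 2.0
```

The quadratic dependence on Vdd is why combining frequency scaling with voltage scaling (DVFS) saves considerably more than frequency scaling alone.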
4.4.6 Physical Memory
Zynq-7000 AP SoCs [46] support different types of physical memory, such as DDR2, DDR3,
and LPDDR2. Minimizing DDR power consumption has a great impact on overall
system power. To this end, the following aspects (listed in [46]) should be taken
into consideration:
 The DDR controller operating speed,
 The choice of DDR width and whether ECC is enabled or disabled,
 The number of DDR chips used,
 The DDR type, such as using LPDDR for significant voltage reductions,
 The use of different DDR modes during low power operation, such as DDR self-
refresh mode.
4.4.7 Firmware support
The Zynq Linux kernel supports the following power management states [48]:
 Frequency scaling: Linux uses the cpufreq framework [47] to scale the CPU
frequency,
 Low-power idle (CPUidle): a low-power state entered when the CPU is idle.
CPUidle drivers manage the CPU idle levels (e.g., setting a CPU to a low-power
state when it is in the WFI state),
 Suspend power management states, used to enter sleep states like the
well-known suspend-to-disk/RAM on laptops, with three different states supported
[48]:
 S0: Freeze or Suspend-To-Idle. This is a generic, pure software, light-weight
system sleep state. It allows more energy to be saved compared to the low-power
idle state by freezing user space and putting all I/O devices into low-power
states, so that the processors can spend more time in their idle states.
 S1: Standby or power-on suspend. If supported, in this state all processor
caches are flushed and instruction execution stops. Power to the processor
and RAM is maintained. In addition to freezing user space and putting all I/O
devices into low-power states, as done for Suspend-To-Idle too, non-
boot CPUs are taken offline and all low-level system functions are suspended
during transitions into this state. For this reason, it should allow more energy
to be saved than Suspend-To-Idle, but the resume latency will generally
be greater than for that state.
 S3: Suspend-to-RAM. In this state, if supported, significant energy savings can
be reached, as every component in the system except memory is put into a
low-power state. System and device state is saved to memory. All devices
are suspended and powered off. RAM remains powered and is put into self-
refresh mode to retain its contents.
In addition, frameworks supporting hardware monitoring are available in the Linux
kernel [47]:
 XADC for temperature and voltage monitoring,
 UCD9248 for voltage and current monitoring of PWM controllers on Xilinx platforms
like ZC702,
 UCD90120 for power supply sequencer and monitors used on Xilinx platforms like
ZC702.
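As a sketch of how the cpufreq sysfs interface can be used from user space, the snippet below parses the space-separated kHz list that cpufreq exposes in `scaling_available_frequencies` and selects the lowest operating point. The frequency values used here are example numbers, not a guaranteed Zynq OPP table:

```python
def lowest_frequency(available: str) -> int:
    """Parse the space-separated kHz list read from
    /sys/devices/system/cpu/cpu0/cpufreq/scaling_available_frequencies
    and return the lowest operating point (in kHz)."""
    return min(int(f) for f in available.split())

# Example contents only; the actual table depends on the board's OPPs.
freqs = "333333 444444 666667"
print(lowest_frequency(freqs))  # 333333
```

Writing the chosen value to `scaling_setspeed` (with the `userspace` governor) would then apply it; in practice a governor such as `ondemand` performs this selection automatically.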
4.5 Programmable Logic (PL): Power Management
The following power-management features are available for the PL part of the Zynq [47]:
 Logic resource utilization: minimizing logic resource utilization contributes
directly to power reduction.
 Managing control sets1: avoiding the use of both a set and a reset on a register or
latch, and using active-high control signals, have proven to be efficient in terms of
power reduction. In addition, reducing the use of sets and resets improves device
utilization, resulting in reduced power.
1 Control sets are signals that control synchronous elements, such as clock, set, reset and clock enable.
 PL frequency scaling: the frequency of the logic can be reduced if it does not need
to run at full speed all the time.
 Clock gating: as the dynamic power consumption of the PL is mainly driven by the
clock frequency (fclk), gating local clock-enables or data paths stops the
switching activity and eliminates unnecessary toggling when the results of these
paths are not used, which in turn reduces power.
 BRAM: to save power, the block RAM enable can be driven low when the block RAM
is not in use.
 PL data retention: after gating the PL clocks, the PL voltage can be reduced
to a retention level (V_DRINT = 0.75V), reducing the static power consumption.
Below this retention level the configuration data may be lost.
 PL power-down control: the power to the PL can be shut down completely to
reduce power consumption, which is possible thanks to the independent power
supplies of PL and PS. The PL supplies that can be turned off are VCCINT,
VCCAUX, VCCBRAM and VCCO. When this technique is used, the configuration of the
PL is lost and reconfiguration is needed.
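Whether PL power-down actually saves energy depends on the idle interval: the static power saved must outweigh the energy spent on reconfiguration. A minimal break-even sketch, with purely hypothetical power and energy figures:

```python
def breakeven_idle_s(p_static_w: float, e_reconfig_j: float) -> float:
    """Idle time above which PL power-down pays off:
    energy saved (p_static * t_idle) must exceed the
    reconfiguration energy e_reconfig."""
    return e_reconfig_j / p_static_w

# Hypothetical figures: 0.2 W PL static power, 0.5 J per reconfiguration.
print(breakeven_idle_s(0.2, 0.5))  # 2.5 (seconds)
```

A run-time power manager would compare the predicted idle interval against this threshold (plus the reconfiguration latency, if deadlines matter) before choosing PL power-down over mere clock gating.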
4.6 Monitoring
To monitor the performance of the Zynq SoC, on-chip performance counters are
provided that observe individual components and can be used to estimate the power
consumption of the system. In addition, special sensors can be connected to the SoC after
implementation for more accurate power consumption measurements.
4.6.1 Performance Counters
Several performance counters are supported on the Zynq SoC to monitor the system
components [48]:
 SCU Global Timer (PS). The SCU global timer can be used to timestamp system
events in a single clock domain. Alternatively, operating systems often provide high-
accuracy timers for software event tracing, such as Linux clock_nanosleep.
 ARM Performance Monitoring Units (PS). Each ARM core has a performance
monitoring unit (PMU) that is used to count micro-architectural events. These
counters can be accessed directly by software, through operating system utilities, or
with chip debuggers such as Linux Perf or ARM Streamline.
 L2 Cache Event Counters (PS). The L2 cache has event counters that can be accessed
to measure cache performance.
 GigE Controller (PS). The gigabit Ethernet controller has statistical counters to track
bytes received and transmitted on its interface.
 AXI Performance Monitor (PL). This core can be added in PL to monitor AXI
performance metrics such as throughput and latency. Trace functions enable time-
stamped AXI traces, such as time-stamped start and end of AXI transactions to
observe per-transaction latency.
 AXI Timer (PL). This core can be added in PL to provide a free-running timer in PL.
This timer is useful for time-stamping events in PL clock domains.
 AXI Traffic Generator (PL). This core can generate a variety of traffic patterns to the
PS interfaces. When used with an AXI performance monitor, the traffic generator
can help provide early system-level performance estimates. The core can be used to
estimate data-movement costs and validate design partition choices.
These performance counters can be used to construct analytical power/temperature
models (see [39]) that provide a rough estimate of the power consumption of the overall
SoC.
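Such a counter-based model is typically a linear combination of event rates with per-event energy coefficients fitted offline against reference measurements (cf. [39]). A minimal sketch of this idea; the event types, coefficients and idle power below are hypothetical, not calibrated values:

```python
def power_estimate(rates, coeffs, p_idle):
    """Linear counter-based power model: P = p_idle + sum(c_i * rate_i).
    rates: event rates (events/s) read from performance counters.
    coeffs: per-event energy coefficients (J/event), fitted offline."""
    assert len(rates) == len(coeffs)
    return p_idle + sum(c * r for c, r in zip(coeffs, rates))

# Hypothetical coefficients for, e.g., L2 misses and AXI data beats.
p = power_estimate(rates=[1e6, 5e6], coeffs=[2e-9, 1e-10], p_idle=0.5)
print(p)  # 0.5025 (W)
```

The accuracy of such a model hinges entirely on the offline calibration against real power measurements, e.g., those obtained via the sensors described in the next subsection.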
4.6.2 Monitoring – Power and Temperature Sensors
4.6.2.1 XADC Monitoring
The Xilinx analog-to-digital converter (XADC) can be used for monitoring applications on the
Zynq SoC. When an XADC AXI4-Lite interface is instantiated in the PL, the PS can access the
XADC through it (see Figure 6). Since the XADC has power supply and temperature
sensors, the corresponding operating conditions of the PS can be monitored. Each of these
sensors can be configured with minimum/maximum thresholds, and alarm signals are issued
when these thresholds are violated at runtime [46]. The XADC can be configured via an
industrial Linux driver (provided by Xilinx) that eases the monitoring process for the end user [49].
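The XADC delivers raw 12-bit codes that must be converted to physical units; for the on-chip temperature sensor, the 7 Series XADC documentation gives the transfer function Temp(°C) = code × 503.975 / 4096 − 273.15. A small conversion sketch:

```python
def xadc_temp_c(raw12: int) -> float:
    """Convert a 12-bit XADC temperature code to degrees Celsius
    (transfer function from the 7 Series XADC documentation)."""
    return raw12 * 503.975 / 4096.0 - 273.15

print(round(xadc_temp_c(0x99A), 1))  # ~29.3 degrees C for code 2458
```

A monitoring partition would read such codes periodically (e.g., via the hwmon/IIO driver mentioned above) and compare the converted values against its configured alarm thresholds.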
4.6.2.2 ZC702 Power Measurement using TI Fusion Power Designer
Several mature approaches exist to measure the power consumption of some Zynq-7000
SoC boards via external sensors. One example is the ZC702 board, where a very accurate
power measurement can be established using the Texas Instruments (TI) Fusion Power
Designer. The main measurement concept relies on the ability to continuously measure and
monitor the three digital power controllers (UCD9248) available on the ZC702 board. For a
detailed setup description with a demo application, refer to [47].
Figure 6: XADC in PL for Monitoring PS [46]
4.7 Summary of Low Power Features
In the context of SAFEPOWER, the Zynq provides rich and promising mechanisms for run-
time power management as well as for resource management. Regarding PS power
management, one of these capabilities is dynamic frequency scaling (DFS). This feature of
the processing system (PS) makes it possible to control the frequency of the ARM dual core
or of the programmable logic. Together with an external voltage regulator that is
configurable on the fly by the operating system, it would be possible to realize a system with
dynamic voltage and frequency scaling (DVFS). The same method can be used for the
external DDR RAM or peripherals. It is also possible to shut down unused clocks to save
power. If asymmetric processing of the ARM cores is used, CPU hotplug can be applied: with
this feature, the secondary processor core can be brought up or shut down depending on
whether it is needed at the time (Vdd and clk for one of the processors set to zero). A core
can be woken up by an interrupt, for example. Regarding resource management, a Linux
kernel can be used on the ARM cores, offering the Linux low-power services. On the PL side,
clock gating and reset per component are possible. In addition, the whole PL can be shut
down (via PL power-down control), with the penalty of reconfiguration latency. The
following table summarizes the basic features:
Table 2: Summary of Zynq low-power features

Static:
 Reduce number of PLLs to minimum
 Find lowest possible operating frequency and supply voltage
 Shutdown (power gate) unused PS components
 DDR memory selection (as part of the PCB design)
 FPGA low-power design and synthesis rules/constraints

Dynamic:
 DVFS of PS ARM cores
 Shutdown of one of the PS ARM cores
 DFS of PL softcores (e.g. Microblaze)
 Clock gating of PS components
 Clock gating of PL softcores (e.g. Microblaze)
 Logic enable/disable of PL components
 PL data retention or power shutdown

Advanced (hibernate):
 Freeze or low-power idle state
 Standby or power-on suspend
 Suspend-to-RAM

Monitoring:
 Performance counters
 XADC on-chip power and temperature monitor
 External (PCB) power sensors
4.8 Outlook to Advanced Features
In the context of SAFEPOWER, the Zynq SoC provides architectural services that form a
stable foundation for power-aware CRTES. Building upon these low-level architectural
services, the following advanced services should be enabled:
1. Power-aware adaptive execution service for CRTES: For the sharing of processor
cores among mixed criticality applications including safety-critical ones,
hypervisors (e.g., XtratuM) will be used, which ensure time/space partitioning as
well as power/energy/temperature partitioning for the computational resources.
The scheduling of computational resources (e.g., processor, memory) in
SAFEPOWER will ensure that each task obtains not only a predefined portion
of the computation and energy resources, but also that execution occurs at
the right time and with a high level of temporal predictability. The execution
services of SAFEPOWER will support the switching between different schedules to
react to dynamically changing computational load and varying
power/energy/temperature constraints. This dynamic adaptation will be tightly
integrated with the underlying low-power mechanisms of the hardware (e.g.,
DVFS, power gating, power monitoring). The execution environments will be
amenable to relevant safety standards and worst-case execution time analysis.
2. Power-aware adaptive communication service for CRTES: SAFEPOWER will provide
services for low-power message-based real-time communication among
components. Based on an intelligent communication system with a priori knowledge
about the allowed behaviour of components in the value and time domain,
SAFEPOWER will ensure partitioning with respect to time, space, power, energy
and temperature. The configuration of the communication system (e.g.,
network interfaces, routers) will be dynamically configurable based on varying
load conditions and resource availability (e.g., remaining energy) in order to
enable clock scaling, power gating in the communication system as well as the
reconfiguration of the application. The shared memory model will be supported on
top of message-based NoCs.
3. Power, energy and temperature extensions of health monitors: SAFEPOWER will
support health monitors for faults such as overuse of shared resources, violation
of power/energy/temperature constraints and deadline misses. The health
monitoring service will use error detection services in the execution and
communication services as well as power/temperature measurement techniques
in the hardware. Both automatic reactions of the platform (e.g., automatic
reconfiguration) as well as the notification of applications for user-defined reactions
will be supported.
4. Power, energy and temperature adaptation services for CRTES: These services
will change the scheduling and allocation of communication and computational
resources at runtime in order to exploit the potential for low power, energy and
temperature awareness, while at the same time considering the requirements of
certification, real-time constraints and time/space partitioning. In particular,
solutions for the dynamic reconfiguration of time-triggered execution and
communication systems will be provided. Branching points in time-triggered
schedules will be analysed and optimized at design time, thereby obtaining temporal
guarantees and maintaining the benefits of time-triggered control for CRTES such as
temporal predictability, implicit synchronization of resource accesses and service
executions.
5. LOW-POWER SERVICES
Here, the required architectural low-power services for the generic platform are defined,
taking into account hardware-level power management techniques: e.g., starting/stopping
a low-power technique, putting the system to sleep, fault tolerance, communications,
diagnostics, etc.
5.1 Low power services of the hypervisor/OS
5.1.1 Monitoring services
The hypervisor will provide services for reading the registers of the power/temperature
measurements. Partitions can invoke these services to analyse their evolution and detect
abnormal situations. A high-level Health Monitor can be in charge of the periodic data
acquisition, analysis and decisions about the system behaviour. For instance:
 Service: <Power/temperature monitor>
 Returns a data structure with the measurements.
 Comment: only partitions with system rights can succeed in the service.
5.1.2 Idle management
Processor idle time can result from early completion of partition execution or from slack
times in the partition scheduling plan.
Concerning slack times in the scheduling plan, the hypervisor schedules partitions under a
cyclic schedule on each core (see Figure 7). The cyclic schedule defines a set of temporal
windows (slots) in a temporal frame (Major Frame, or MAF), specifying the following
parameters:
 offset of the slot with respect to the MAF origin (start time of the slot)
 slot duration
 partition identifier
The cyclic schedule is static and is generated from the requirements of the
applications. As a result of the analysis, the schedule plan for each core can contain empty
slots, which correspond to idle time.
Figure 7: Example of schedule plan for 1 core
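The idle slots of such a cyclic schedule can be derived directly from the slot parameters listed above. A small sketch for a single core; the MAF length and slot values are invented example figures:

```python
def idle_gaps(maf_us, slots):
    """Find idle gaps in a single-core cyclic schedule.
    slots: iterable of (offset_us, duration_us, partition_id).
    Returns a list of (start_us, length_us) idle intervals."""
    gaps, cursor = [], 0
    for offset, duration, _pid in sorted(slots):
        if offset > cursor:                      # hole before this slot
            gaps.append((cursor, offset - cursor))
        cursor = offset + duration
    if cursor < maf_us:                          # trailing hole up to MAF end
        gaps.append((cursor, maf_us - cursor))
    return gaps

# Hypothetical 1000 us MAF with two partition slots.
print(idle_gaps(1000, [(0, 300, "P1"), (500, 200, "P2")]))
# [(300, 200), (700, 300)]
```

These gaps are exactly the intervals to which the hypervisor's idle-management policy, described next, would be applied.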
The behaviour of the hypervisor when the next slot to be executed is an idle slot must be
specified. This behaviour could be the same or different for each core; as an initial
decision, we assume that the behaviour will be the same for all cores.
The hypervisor can define several behaviours for idle management that can be selected
when the hypervisor is configured and compiled. The default hypervisor behaviour when an
idle slot is found in the schedule could be:
1. Do not modify the core behaviour.
2. Use a default low-power mode of the core.
3. Select a low-power mode of the core based on the overhead of the core mode and
the idle time available.
4. Set a default operational frequency (e.g., the lowest frequency available).
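Policy 3 above can be sketched as a simple selection over a table of core low-power modes: pick the lowest-power mode whose transition overhead still fits into the available idle time. The mode names, power and overhead figures below are hypothetical:

```python
def pick_idle_mode(idle_us, modes):
    """Select the lowest-power core mode whose enter+exit overhead
    fits in the available idle time.
    modes: iterable of (name, power_mw, overhead_us)."""
    feasible = [m for m in modes if m[2] <= idle_us]
    return min(feasible, key=lambda m: m[1])[0] if feasible else None

# Hypothetical mode table for one core.
MODES = [("run-idle", 200, 0), ("wfi", 80, 5), ("power-gate", 5, 400)]
print(pick_idle_mode(100, MODES))   # wfi
print(pick_idle_mode(1000, MODES))  # power-gate
```

A refinement would also account for the energy cost of the mode transition itself, analogous to the PL power-down break-even consideration.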
In addition to the default behaviour of the hypervisor, a partition with system rights could
modify the behaviour of the core during execution to accommodate the current energy
conditions. For this purpose, a hypervisor service to set the behaviour at execution time is
defined.
 Service: <Set idle time behaviour>
 Returns nothing.
 Comment: only partitions with system rights can succeed in the service.
5.1.3 Management of the end of partition activity
A second kind of idle time occurs when a partition finishes its internal activities before the
end of the slot. When the schedule is generated, a partition slot is allocated according to
the temporal needs of the partition. These needs may correspond to several internal
periodic tasks, each analysed for its worst-case execution time (WCET). The resulting slot
can thus contain several task instances budgeted at their WCET.
Figure 8: Execution of a slot
Figure 8 shows a possible execution of partition P1 involving three tasks. The slot has been
sized according to the WCET of each task. In an actual execution of the slot, the tasks may
finish in less time than their expected WCET, which means that at the end of
the slot activity there is remaining slot time at partition level. In that case, if the guestOS informs
the hypervisor of the end of the slot activity, the hypervisor can apply the policy for the idle
time.
 Service: <End of the partition activity in the slot>
 Returns nothing.
 Comment: all partitions can invoke this service.
5.1.4 Adaptation of the processor frequency to the partition needs
The off-line scheduling analysis of the system has to generate the cyclic schedule for all cores
in the system. The cyclic schedule for each core can be generated according to the temporal
constraints, but also with respect to the energy constraints. Taking into account the
optimizations the schedule generation can apply, the partitions are allocated to temporal
slots running at a specified frequency. Naturally, the execution time of a partition in a slot
varies depending on the selected CPU frequency.
To cover this functionality, the slot definition is extended with the minimal frequency at
which the partition should be executed. The new slot definition is:
 offset of the slot with respect to the MAF origin (start time of the slot)
 slot duration
 partition identifier
 minimal frequency
When the slot is executed in a core, the hypervisor will adapt the core frequency to the
minimal frequency specified in the configuration file.
In any case, the partition may increase or decrease the frequency during each slot
execution, but it can neither decrease the frequency below the minimum defined nor
increase it beyond the maximum allowed in the configuration file. The former ensures that
the deadline constraints of the tasks are not compromised, while the latter limits the power
consumption of the partition. A service is defined for the partition to change the frequency
during the slot execution.
 Service: <change the processor frequency during this slot execution>
 Returns nothing.
 Comment: all partitions can invoke this service.
 Note: the frequency change affects only the current slot execution. Subsequent
partition activations will be executed at the frequency specified in the configuration file.
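The clamping behaviour of such a frequency-change service can be sketched as follows; the frequency values are illustrative only:

```python
def set_slot_frequency(requested_mhz, slot_min_mhz, system_max_mhz):
    """Clamp a partition's frequency request for the current slot:
    never below the slot's configured minimum (protects deadlines),
    never above the configured maximum (limits power consumption)."""
    return max(slot_min_mhz, min(requested_mhz, system_max_mhz))

# Illustrative values: slot minimum 333 MHz, system maximum 667 MHz.
print(set_slot_frequency(100, 333, 667))  # raised to 333
print(set_slot_frequency(800, 333, 667))  # capped at 667
print(set_slot_frequency(500, 333, 667))  # accepted as-is: 500
```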
5.1.5 Device management
Devices are handled by the guest OSs and not at the hypervisor level. It is the responsibility
of the guestOS to put a device into the appropriate mode at the end of the slot activities.
The hypervisor could offer services to sleep/wake up the devices, but it is the guest OS that
decides when these actions should be performed.
5.1.6 Coordination of OS and Hypervisor
The guestOS and the hypervisor should coordinate their activities to cooperate in the energy
management of the system. This coordination relies on the previously defined services,
which the guestOS invokes on the hypervisor.
5.1.7 System information
In a partitioned system, only partitions with system rights are allowed by the hypervisor
to obtain information or take actions related to the system or other partitions. This is a
security requirement enforced by the hypervisor. Consequently, only partitions with
system rights can access the system information. Partition rights are defined by the
system integrator and are statically set in the configuration file.
A partition with system rights can invoke the get-system-information service to learn the
system status. The system status is a data structure that contains several fields related to
the system execution, including the performance registers in use.
 Service: <Get system information>
 Returns data structure of the system status.
 Comment: only partitions with system rights can succeed in the service.
 Note: Currently this service exists in XtratuM but a redefinition of the data structure
to include the new information is needed.
5.2 Generation of low-power NoCs with NoC MPSoC system generators
5.2.1 State of the art
There have been a multitude of studies on how to make power-efficient on- and off-chip
networks, but they mainly focus on implementation details and the power behaviour of the
NoC itself, which is of limited interest to this study.
Then there are those that aim to generate the NoC itself. These pure NoC generators
typically produce VHDL and/or Verilog code for a certain type of NoC, with a set of
parameters to modify its settings, together with a network interface so that it can be
operated from a test bench. In general, however, they do not provide support for integration
into a working MPSoC system.
The third category comprises those that generate an entire MPSoC system, but are less
flexible when it comes to generating different types of NoCs. Instead, they focus on reducing
the effort of integrating the NoC into the system and on generating a working HW/SW
MPSoC system. This is of high interest to this study, not only because it raises the TRL
considerably, but also because it lets us explore the predictability, and thereby the safety, of
the final system.
The fourth category is the commercial providers, with ready-made NoC chip
solutions ready to be integrated as add-ons to some FPGA boards. However, since these
systems come with a fixed notion of NoC and memory hierarchy, they are of less use, as
they were not designed with predictability and programmability using Models of
Computation in mind.
In the coming subsections, we go through state-of-the-art research that is relevant to this study.
5.2.2 Low-Power Aspects
Reducing power when implementing NoC structures is fairly straightforward: there are only
two parameters to play with, area and frequency. Since power is proportional to the
switched capacitance, which in turn is proportional to the area, the rationale behind
minimizing area is that the smaller the design, the lower its power consumption. This argues
for implementing the NoC structures in a bit-serial manner. However, for constant
throughput, the frequency must then be increased by the same factor as the area was
reduced, gaining very little in the end; a gain remains only if the channel is silent for long
periods of time.
The other option is to reduce the switching frequency, either the operating/clock frequency
of the switches/routers or the rate of the data traffic in the network. Since it is hard to
reduce the clock frequency inside the switch network, asynchronous communication
between the switch/router nodes has been suggested [57]. The rationale behind this is that
the network should only switch, and hence consume power, when it has something to
switch. However, asynchronous NoCs are difficult to make predictable, and how to start
sending once the reset signal is globally released is a non-trivial problem. Thus, most designs
continue to be synchronous implementations. For a comprehensive overview of methods to
reduce power in NoCs, we refer the reader to [56].
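The area/frequency argument above can be made concrete with the proportionality P ∝ C·f: serializing an N-bit link divides the switched capacitance by roughly N but multiplies the clock by N for the same throughput, so to first order nothing is gained at full load. A toy calculation (all values illustrative; Vdd² and activity are folded into the effective capacitance):

```python
def link_dynamic_power(c_eff, freq_hz, utilization):
    """First-order link power: P = C_eff * f * u."""
    return c_eff * freq_hz * utilization

# 32-bit parallel link vs bit-serial link at 32x the clock, equal throughput.
p_par = link_dynamic_power(c_eff=32e-12, freq_hz=100e6, utilization=1.0)
p_ser = link_dynamic_power(c_eff=1e-12, freq_hz=3.2e9, utilization=1.0)
print(round(p_par, 6), round(p_ser, 6))  # both ~0.0032 W at full load

# The serial link only wins when the channel idles (low utilization)
# and its clock can be gated during the silent periods.
p_ser_idle = link_dynamic_power(c_eff=1e-12, freq_hz=3.2e9, utilization=0.1)
print(p_ser_idle < p_par)  # True
```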
5.2.3 Open Source NoC Generators
Many NoC structures have been suggested over the years since the first paper with the
word NoC in the title was published in the year 2000 [58]. A few of them have even been
released as open source, though mainly as an aid for research.
The more interesting of these are the ones that come with a generator, i.e., a method or
program that allows the user to configure the NoC according to his or her needs. For
instance, the CONNECT tool from Carnegie Mellon University [59][60] can generate Verilog
code for various NoC implementations, but the code is released under copyright and cannot
be reused by anyone else to create a commercial product. The Atlas framework, developed
by the GAPH group at PUCRS in Brazil [61][62], can produce different NoC topologies and
generate synthesizable VHDL files.
Another example is Netmaker from the University of Cambridge, a library of fully
synthesizable, parameterized Network-on-Chip (NoC) implementations released under the
GPL license, so it cannot be used for commercial purposes either.
Others generate only parts of the NoC, for instance [64] from Stanford, which presents a
parameterized RTL implementation of a state-of-the-art VC router, or HNOCs, which targets
the simulation of heterogeneous NoCs.
5.2.4 NoC MPSoC System Generators
The most interesting NoC generators are those that come with a complete design flow,
which lets designers compose entire MPSoC systems, including the SW stack and device
drivers. These generators have in common that they use an XML description to specify the
MPSoC system; the XML is then used as input to a generator program that produces the
actual implementation. They are typically limited to a few NoC types and topologies, but
focus instead on ease of use and on producing a working design that is correct by
construction. The two most prominent ones are the NoC System Generator from KTH in
Sweden [50][51][55][66] and the CompSoC platform from Eindhoven and Delft in the
Netherlands [67].
The KTH System Generator is a tool suite based on the Nostrum NoC from KTH. It provides a
GUI frontend for entering the MPSoC system and uses models of computation (MoCs) inspired
by the ForSyDe methodology. The tool suite is also FPGA-vendor agnostic: it renders its
internal representation of the system in the target FPGA vendor's own frontend language,
i.e., it generates sopc or qsys files for Altera, and mhs files or Vivado tcl scripts for
Xilinx implementations. The CompSoC platform is centered around the Aethereal NoC and
targets Xilinx technology. It also has hooks for importing designs generated with the
ForSyDe methodology.
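The XML-driven flow common to these generators can be illustrated with a small sketch. The XML schema and the emitted tcl commands below are hypothetical illustrations of the general pattern (an XML system description fed to a generator program that emits vendor frontend files), not the actual input formats or outputs of the KTH or CompSoC tools.

```python
# Sketch of an XML-driven MPSoC generation flow. Schema and tcl command
# names are invented for illustration only.
import xml.etree.ElementTree as ET

SPEC = """
<mpsoc name="demo" topology="mesh" rows="2" cols="2">
  <tile id="0" cpu="nios2"/>
  <tile id="1" cpu="nios2"/>
  <tile id="2" cpu="microblaze"/>
  <tile id="3" cpu="microblaze"/>
</mpsoc>
"""

def generate_tcl(spec_xml):
    """Emit a (hypothetical) Vivado-style tcl fragment from the XML spec."""
    root = ET.fromstring(spec_xml)
    lines = [f"# auto-generated for MPSoC '{root.get('name')}'"]
    # One command instantiates the NoC with the requested topology ...
    lines.append(f"create_noc -topology {root.get('topology')} "
                 f"-rows {root.get('rows')} -cols {root.get('cols')}")
    # ... and one command per tile instantiates the processing elements.
    for tile in root.findall("tile"):
        lines.append(f"create_tile -id {tile.get('id')} -cpu {tile.get('cpu')}")
    return "\n".join(lines)

print(generate_tcl(SPEC))
```

A real generator would additionally emit the software stack and device-driver configuration, but the principle is the same: the XML is the single source from which the implementation files are derived, which is what makes the result correct by construction.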
5.3 Utilization of low-power services at higher abstraction levels
In order to utilize the techniques presented in this deliverable, it is of key importance that
the properties and services of the architecture can be used at higher levels of abstraction in
the design flow. Designing a power-efficient mixed-criticality system, where several
applications share the same platform, is extremely challenging. The design process should
therefore start at a high level of abstraction and be supported by tools for design space
exploration (DSE) and synthesis.
Given a set of application models with individual design constraints, a set of global
constraints, and a platform model, the objective of the DSE activity is to find an efficient
implementation of all applications on the shared platform that satisfies all individual and
global design constraints. In the context of mixed-criticality systems, it is of utmost
importance that the DSE process can guarantee that all timing and power constraints will be
met by the proposed implementation. The techniques discussed in this deliverable and the
research within the SAFEPOWER project are key prerequisites for a DSE tool because of the
QoS guarantees that can be provided by the platform. The DSE tool developed partially in
the CONTREX project and presented in [52] formulates the DSE problem as a constraint
satisfaction problem, captures applications as synchronous data flow (SDF) graphs [54], and
uses a predictable MPSoC platform with a TDM bus. A solution not only gives a mapping of
SDF actors to processing elements, but also generates the schedule for the set of actors on
each processing element and schedules the communication on the TDM bus. However, the
tool focuses only on timing guarantees and does not take power
into account. The analytical DSE tool [52] has been combined with a simulation-based DSE
tool [53] into a joint analytical and simulation-based framework that analyses typical
scenarios through simulation in addition to the worst case. The simulation tool can also be
equipped with power models to estimate the power consumption of a typical scenario.
However, while good timing models already exist for predictable platforms, good power
models at higher levels of abstraction are still lacking, which so far makes it difficult for
DSE tools to give absolute power guarantees.
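The core of such a DSE formulation can be sketched as a search over mappings of SDF actors to processing elements, keeping only candidates that satisfy both a timing constraint and a power budget. The actor WCETs, PE power figures, and constraint values below are illustrative assumptions, not data from [52] or [53], and a real tool would use a constraint solver and model TDM-bus communication rather than enumerate mappings exhaustively.

```python
# Minimal DSE sketch: enumerate mappings of SDF actors to PEs and keep
# those meeting a deadline and a power budget. All numbers are invented.
from itertools import product

ACTORS = {"src": 2, "filter": 5, "sink": 1}   # actor -> WCET (time units)
PES = {"pe0": 1.0, "pe1": 0.5}                # PE -> active power (W)
DEADLINE = 8                                  # per-iteration timing bound
POWER_BUDGET = 1.2                            # max average power (W)

def evaluate(mapping):
    """Return (makespan, avg_power) for a static mapping, assuming actors
    on the same PE run sequentially, PEs run in parallel, and communication
    cost is neglected."""
    load = {pe: 0 for pe in PES}
    for actor, pe in mapping.items():
        load[pe] += ACTORS[actor]
    makespan = max(load.values())
    # Each PE draws its active power only while it is busy.
    energy = sum(load[pe] * PES[pe] for pe in PES)
    return makespan, energy / makespan

def explore():
    """Exhaustively enumerate mappings; keep those meeting both constraints."""
    feasible = []
    for choice in product(PES, repeat=len(ACTORS)):
        mapping = dict(zip(ACTORS, choice))
        makespan, power = evaluate(mapping)
        if makespan <= DEADLINE and power <= POWER_BUDGET:
            feasible.append((mapping, makespan, power))
    return feasible

for mapping, makespan, power in explore():
    print(mapping, f"makespan={makespan}", f"power={power:.2f}W")
```

Even this toy model shows the timing/power tension the text describes: mapping the heavy actor to the fast, power-hungry PE shortens the makespan but can push the average power over budget, so the DSE must weigh both constraints jointly.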
The NoC system generator [55] can automatically generate both the hardware, in the form of
a configurable NoC and configurable tiles, and the software for Altera and Xilinx FPGAs. The
tool currently uses a heart-beat model [50] to support applications modelled with a
synchronous model of computation, and can generate implementations from Simulink
models [51].
Within the SAFEPOWER project, the NoC system generator will be extended to support
techniques for low-power predictable NoCs. Furthermore, integrating a DSE tool into the
NoC system generator would facilitate system design, because then the designer can
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns
D1.2 analysis and selection of low power techniques, services and patterns

More Related Content

What's hot

Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...
Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...
Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...
guest42b2673
 
1. wcdma rno paging problem analysis guidance 20041101-a-1.0
1. wcdma rno paging problem analysis guidance 20041101-a-1.01. wcdma rno paging problem analysis guidance 20041101-a-1.0
1. wcdma rno paging problem analysis guidance 20041101-a-1.0
mounkid el afendi
 
booting-booster-final-20160420-0700
booting-booster-final-20160420-0700booting-booster-final-20160420-0700
booting-booster-final-20160420-0700
Samsung Electronics
 
Risk Assessments and Reliability, What You Need To Know
Risk Assessments and Reliability, What You Need To KnowRisk Assessments and Reliability, What You Need To Know
Risk Assessments and Reliability, What You Need To Know
Steven Shapiro, PE, ATD
 
Embedded os
Embedded osEmbedded os
Embedded os
chian417
 

What's hot (17)

Rtos slides
Rtos slidesRtos slides
Rtos slides
 
Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...
Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...
Wcdma Rno Handover Algorithm Analysis And Parameter Configurtaion Guidance 20...
 
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
White Paper: Monitoring EMC Greenplum DCA with Nagios - EMC Greenplum Data Co...
 
How to Measure RTOS Performance
How to Measure RTOS Performance How to Measure RTOS Performance
How to Measure RTOS Performance
 
Embedded Intro India05
Embedded Intro India05Embedded Intro India05
Embedded Intro India05
 
1. wcdma rno paging problem analysis guidance 20041101-a-1.0
1. wcdma rno paging problem analysis guidance 20041101-a-1.01. wcdma rno paging problem analysis guidance 20041101-a-1.0
1. wcdma rno paging problem analysis guidance 20041101-a-1.0
 
ICT AIM project Energy Management Device Keletron
ICT AIM project Energy Management Device KeletronICT AIM project Energy Management Device Keletron
ICT AIM project Energy Management Device Keletron
 
booting-booster-final-20160420-0700
booting-booster-final-20160420-0700booting-booster-final-20160420-0700
booting-booster-final-20160420-0700
 
RTOS Basic Concepts
RTOS Basic ConceptsRTOS Basic Concepts
RTOS Basic Concepts
 
Risk Assessments and Reliability, What You Need To Know
Risk Assessments and Reliability, What You Need To KnowRisk Assessments and Reliability, What You Need To Know
Risk Assessments and Reliability, What You Need To Know
 
thesis_SaurabhPanda
thesis_SaurabhPandathesis_SaurabhPanda
thesis_SaurabhPanda
 
Gsm or x10 based scada system for industrial
Gsm or x10 based scada system for industrialGsm or x10 based scada system for industrial
Gsm or x10 based scada system for industrial
 
Gsm or x10 based scada system for industrial automation
Gsm or x10 based scada system for industrial automationGsm or x10 based scada system for industrial automation
Gsm or x10 based scada system for industrial automation
 
Embedded os
Embedded osEmbedded os
Embedded os
 
Protective relaying-philosphy-and-design-guidelines
Protective relaying-philosphy-and-design-guidelinesProtective relaying-philosphy-and-design-guidelines
Protective relaying-philosphy-and-design-guidelines
 
Alcatel Lucent 9500 mpr user manual
Alcatel Lucent 9500 mpr user manualAlcatel Lucent 9500 mpr user manual
Alcatel Lucent 9500 mpr user manual
 
REAL TIME OPERATING SYSTEM PART 1
REAL TIME OPERATING SYSTEM PART 1REAL TIME OPERATING SYSTEM PART 1
REAL TIME OPERATING SYSTEM PART 1
 

Similar to D1.2 analysis and selection of low power techniques, services and patterns

AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
ijesajournal
 
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
ijesajournal
 
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...
IJCSEA Journal
 
PROGRAMMABLE LOGIC by Hites Khatri
PROGRAMMABLE LOGIC by Hites KhatriPROGRAMMABLE LOGIC by Hites Khatri
PROGRAMMABLE LOGIC by Hites Khatri
Hitesh Khatri
 

Similar to D1.2 analysis and selection of low power techniques, services and patterns (20)

AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
 
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
AN EFFICIENT HYBRID SCHEDULER USING DYNAMIC SLACK FOR REAL-TIME CRITICAL TASK...
 
APE-Annotation Programming For Energy Eciency in Android
APE-Annotation Programming For Energy Eciency in AndroidAPE-Annotation Programming For Energy Eciency in Android
APE-Annotation Programming For Energy Eciency in Android
 
Matter new
Matter newMatter new
Matter new
 
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...IRJET-  	  Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
IRJET- Analysis of Micro Inversion to Improve Fault Tolerance in High Spe...
 
Parallex - The Supercomputer
Parallex - The SupercomputerParallex - The Supercomputer
Parallex - The Supercomputer
 
PLC and SCADA summer training report- government engineering college ajmer
PLC and SCADA summer training report- government engineering college ajmerPLC and SCADA summer training report- government engineering college ajmer
PLC and SCADA summer training report- government engineering college ajmer
 
Plc 7 my saminar plc
Plc 7  my saminar plcPlc 7  my saminar plc
Plc 7 my saminar plc
 
Cluster computing report
Cluster computing reportCluster computing report
Cluster computing report
 
An Efficient Approach Towards Mitigating Soft Errors Risks
An Efficient Approach Towards Mitigating Soft Errors RisksAn Efficient Approach Towards Mitigating Soft Errors Risks
An Efficient Approach Towards Mitigating Soft Errors Risks
 
DYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUS
DYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUSDYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUS
DYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUS
 
DYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUS
DYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUSDYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUS
DYNAMIC TASK SCHEDULING ON MULTICORE AUTOMOTIVE ECUS
 
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...
ENERGY EFFICIENT SCHEDULING FOR REAL-TIME EMBEDDED SYSTEMS WITH PRECEDENCE AN...
 
kogatam_swetha
kogatam_swethakogatam_swetha
kogatam_swetha
 
A Virtual Machine Resource Management Method with Millisecond Precision
A Virtual Machine Resource Management Method with Millisecond PrecisionA Virtual Machine Resource Management Method with Millisecond Precision
A Virtual Machine Resource Management Method with Millisecond Precision
 
Microcontroller based automatic engine locking system for drunken drivers
Microcontroller based automatic engine locking system for drunken driversMicrocontroller based automatic engine locking system for drunken drivers
Microcontroller based automatic engine locking system for drunken drivers
 
PLC SCADA report Paras Singhal
PLC SCADA report Paras SinghalPLC SCADA report Paras Singhal
PLC SCADA report Paras Singhal
 
LOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTINGLOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTING
 
PROGRAMMABLE LOGIC by Hites Khatri
PROGRAMMABLE LOGIC by Hites KhatriPROGRAMMABLE LOGIC by Hites Khatri
PROGRAMMABLE LOGIC by Hites Khatri
 
Alcohol report
Alcohol reportAlcohol report
Alcohol report
 

More from Babak Sorkhpour

D7.1 project management handbook
D7.1 project management handbookD7.1 project management handbook
D7.1 project management handbook
Babak Sorkhpour
 
D2.1 definition of reference architecture
D2.1 definition of reference architectureD2.1 definition of reference architecture
D2.1 definition of reference architecture
Babak Sorkhpour
 

More from Babak Sorkhpour (20)

Integration of mixed-criticality subsystems on multicore and manycore processors
Integration of mixed-criticality subsystems on multicore and manycore processorsIntegration of mixed-criticality subsystems on multicore and manycore processors
Integration of mixed-criticality subsystems on multicore and manycore processors
 
Virtualization and hypervisor solutions for mixed-criticality systems based o...
Virtualization and hypervisor solutions for mixed-criticality systems based o...Virtualization and hypervisor solutions for mixed-criticality systems based o...
Virtualization and hypervisor solutions for mixed-criticality systems based o...
 
D7.1 project management handbook
D7.1 project management handbookD7.1 project management handbook
D7.1 project management handbook
 
D2.1 definition of reference architecture
D2.1 definition of reference architectureD2.1 definition of reference architecture
D2.1 definition of reference architecture
 
Babak sorkhpour seminar in 80 8-24
Babak sorkhpour seminar in 80 8-24Babak sorkhpour seminar in 80 8-24
Babak sorkhpour seminar in 80 8-24
 
معرفي پروژه اتوماسيون سيستم هشدار سيل استان گلستان
معرفي پروژه اتوماسيون سيستم هشدار سيل استان گلستانمعرفي پروژه اتوماسيون سيستم هشدار سيل استان گلستان
معرفي پروژه اتوماسيون سيستم هشدار سيل استان گلستان
 
All love.com
All love.comAll love.com
All love.com
 
معرفي سيستم winaura
 معرفي سيستم winaura معرفي سيستم winaura
معرفي سيستم winaura
 
معارفه ریکی و متافیزیک
معارفه ریکی و متافیزیکمعارفه ریکی و متافیزیک
معارفه ریکی و متافیزیک
 
فناوری اطلاعات و تولید نهایی
فناوری اطلاعات و تولید نهایی فناوری اطلاعات و تولید نهایی
فناوری اطلاعات و تولید نهایی
 
شناسایی دانش
شناسایی دانششناسایی دانش
شناسایی دانش
 
مدیریت دانش در سازمانهای ارزیاب زیست محیطی
مدیریت دانش در سازمانهای ارزیاب زیست محیطیمدیریت دانش در سازمانهای ارزیاب زیست محیطی
مدیریت دانش در سازمانهای ارزیاب زیست محیطی
 
کارگاه جامع آموزش مدیریت دانش
کارگاه  جامع آموزش مدیریت دانشکارگاه  جامع آموزش مدیریت دانش
کارگاه جامع آموزش مدیریت دانش
 
کارگاه آموزش مدیریت دانش تبریز
کارگاه آموزش مدیریت دانش تبریزکارگاه آموزش مدیریت دانش تبریز
کارگاه آموزش مدیریت دانش تبریز
 
انواع استراتژی های مدیریت دانش براساس مدل های مدیریت استراتژیک کدامند؟
انواع استراتژی های مدیریت دانش براساس مدل های مدیریت استراتژیک کدامند؟انواع استراتژی های مدیریت دانش براساس مدل های مدیریت استراتژیک کدامند؟
انواع استراتژی های مدیریت دانش براساس مدل های مدیریت استراتژیک کدامند؟
 
مبانی انگیزه کارکنان
مبانی انگیزه کارکنانمبانی انگیزه کارکنان
مبانی انگیزه کارکنان
 
آنتالوژی
آنتالوژیآنتالوژی
آنتالوژی
 
فناوری های مدیریت دانش
 فناوری های مدیریت دانش فناوری های مدیریت دانش
فناوری های مدیریت دانش
 
سیستم خبره خودآموز
سیستم خبره خودآموزسیستم خبره خودآموز
سیستم خبره خودآموز
 
گزارش دفاع نهایی مدیریت دانش
گزارش دفاع نهایی مدیریت دانشگزارش دفاع نهایی مدیریت دانش
گزارش دفاع نهایی مدیریت دانش
 

Recently uploaded

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
Neometrix_Engineering_Pvt_Ltd
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
jaanualu31
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 
Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...
Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...
Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...
dannyijwest
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
HenryBriggs2
 

Recently uploaded (20)

Integrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - NeometrixIntegrated Test Rig For HTFE-25 - Neometrix
Integrated Test Rig For HTFE-25 - Neometrix
 
PE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and propertiesPE 459 LECTURE 2- natural gas basic concepts and properties
PE 459 LECTURE 2- natural gas basic concepts and properties
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptxHOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
HOA1&2 - Module 3 - PREHISTORCI ARCHITECTURE OF KERALA.pptx
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)Theory of Time 2024 (Universal Theory for Everything)
Theory of Time 2024 (Universal Theory for Everything)
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 
Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...
Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...
Cybercrimes in the Darknet and Their Detections: A Comprehensive Analysis and...
 
Path loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata ModelPath loss model, OKUMURA Model, Hata Model
Path loss model, OKUMURA Model, Hata Model
 
8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor8086 Microprocessor Architecture: 16-bit microprocessor
8086 Microprocessor Architecture: 16-bit microprocessor
 
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
scipt v1.pptxcxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx...
 
Hostel management system project report..pdf
Hostel management system project report..pdfHostel management system project report..pdf
Hostel management system project report..pdf
 
Passive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.pptPassive Air Cooling System and Solar Water Heater.ppt
Passive Air Cooling System and Solar Water Heater.ppt
 
Electromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptxElectromagnetic relays used for power system .pptx
Electromagnetic relays used for power system .pptx
 
Databricks Generative AI Fundamentals .pdf
Databricks Generative AI Fundamentals  .pdfDatabricks Generative AI Fundamentals  .pdf
Databricks Generative AI Fundamentals .pdf
 
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...Max. shear stress theory-Maximum Shear Stress Theory ​  Maximum Distortional ...
Max. shear stress theory-Maximum Shear Stress Theory ​ Maximum Distortional ...
 
Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...Basic Electronics for diploma students as per technical education Kerala Syll...
Basic Electronics for diploma students as per technical education Kerala Syll...
 
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptxS1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
S1S2 B.Arch MGU - HOA1&2 Module 3 -Temple Architecture of Kerala.pptx
 
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
Convergence of Robotics and Gen AI offers excellent opportunities for Entrepr...
 

D1.2 analysis and selection of low power techniques, services and patterns

  • 1. D1.2 Analysis and selection of low power techniques, services and patterns V1.0 Document information Contract number 687902 Project website www.safepower-project.eu Contractual deadline 01/07/2016 Dissemination Level PU Nature R Author OFF Contributors IKL, USI, FEN, IMP, KTH Reviewer IAB members:  Gebhard Bouwer (TÜV Rheinland)  Christophe Honvault (ESA)  Joaquín Autrán (GMV)  Daniel Gracia (Thales)  Giulio Corradi (Xilinx) Internal reviewer:  SAA Keywords low power techniques, system-level services Notices: This project and the research leading to these results has received funding from the European Community’s H2020 program [H2020-ICT-2015] under grant agreement 687902  2016 SAFEPOWER Consortium Partners. All rights reserved.
  • 2. Page 2 of 54 D1.2: Analysis and selection of low power techniques, services and patterns Version 1.0 Change log VERSION DESCRIPTION OF CHANGE V0.1 First draft OFF (Maher Fakih) V0.2 Planned contributions by IKL, USI, OFF, FEN, KTH V0.5 Contribution by IKL, USI, OFF, FEN, KTH V0.6 Consolidated and Reviewed by OFF (Maher Fakih) V1.0 Consolidated of IAB Review. SAAB Review by OFF
  • 3. Page 3 of 54 D1.2: Analysis and selection of low power techniques, services and patterns Version 1.0 Table of contents 1. EXECUTIVE SUMMARY ........................................................................................... 4 2. STATE OF THE ART OF LOW-POWER TECHNOLOGIES............................................... 5 2.1 OPERATING SYSTEM AND FIRMWARE LOW-POWER SUPPORT..........................................5 2.2 HARDWARE LOW-POWER SUPPORT...................................................................................9 2.3 POWER MEASUREMENT AND MONITORING SUPPORT....................................................19 3. IMPACT OF THE TECHNIQUES ON SAFETY (ANALYSIS) ........................................... 21 3.1 SAFETY, LOW-POWER AND MIXED CRITICALITY................................................................21 3.2 STATE OF THE ART IN SECURITY, MIXED-CRITICALITY AND LOW-POWER ........................22 4. ZYNQ PLATFORM POWER MANAGEMENT CAPABILITIES....................................... 24 4.1 OVERVIEW OF LOW-POWER FEATURES............................................................................24 4.2 POWER RAILS/DOMAINS...................................................................................................25 4.3 CLOCK CONTROL................................................................................................................26 4.4 PROCESSOR SYSTEM (PS): POWER MANAGEMENT ..........................................................27 4.5 PROGRAMMABLE LOGIC (PL): POWER MANAGEMENT....................................................29 4.6 MONITORING ....................................................................................................................30 4.7 SUMMARY OF LOW POWER FEATURES ............................................................................32 4.8 OUTLOOK TO ADVANCED FEATURES ................................................................................32 5. 
LOW-POWER SERVICES ........................................................................................ 34 5.1 LOW POWER SERVICES OF THE HYPERVISOR/OS..............................................................34 5.2 GENERATION OF LOW-POWER NOCS WITH NOC MPSOC SYSTEM GENERATORS............37 5.3 UTILIZATION OF LOW-POWER SERVICES AT HIGHER ABSTRACTION LEVELS....................39 5.4 DEFINITION OF THE LOW POWER SERVICES OF THE GENERIC PLATFORM W.R.T. THE ON-CHIP COMMUNICATION.............................................................................................................40 6. LITERATURE ......................................................................................................... 47
  • 4. Page 4 of 54 D1.2: Analysis and selection of low power techniques, services and patterns Version 1.0 EXECUTIVE SUMMARY1. In this deliverable different aspects of the state-of-the-art of low-power technologies are analysed and confronted to the mixed criticality needs and requirements, in order to be able to make a selection of the technologies that are feasible for the SAFEPOWER needs. This means in particular that all candidate technologies and their combination had to be investigated from a safety standard perspective. The following technologies and techniques have been taken into account:  Hardware and Software level support/techniques for power management (e.g., Dynamic Voltage and Frequency Scaling (DVFS), Power Gating, Clock Gating),  Architectural services (e.g., fault tolerance, communications, diagnostics) taking into account hardware level power.
  • 5. Page 5 of 54 D1.2: Analysis and selection of low power techniques, services and patterns Version 1.0 STATE OF THE ART OF LOW-POWER TECHNOLOGIES2. In this section a short overview of the current existing approaches is presented which cover low-power management technologies at different service levels e.g. at the level supported by the operating systems, the firmware support and the hardware support. Next we elaborate on the power monitoring at the above mentioned different levels. 2.1 Operating system and firmware low-power support The operating system and the hypervisor are the underlying software that manage and control of the hardware devices for the applications and execute the application activities. Although in the literature there is not any reference to hypervisors due to its recent appearance, some of the traditional techniques used by the operating systems can be moved from the OS layer to the hypervisor layer in a partitioned system. However, other techniques can be still considered at OS level, such as device management, due to the hypervisor allows to the OS in partitions a direct management of some devices. Next subsections review some of the main techniques involving the OS: 2.1.1 Memory management From the operating system point of view, the memory management can impact in two main issues: allocation of applications and management of memory types to reduce the energy consumption. There exist a group of works that have analyzed and proposed solutions to mitigate the amount of memory of the applications. In [89], it is focused on the amount of memory and the need of saving memory by compressing pages of memory. It requires from the OS the virtual memory management unit (MMU) to store and load compressed pages. Authors claim for an important memory size reduction and, consequently, energy. In [90], hardware mechanisms for compression of data between cache and RAM are proposed. Other papers have proposed several schemes for this purpose. 
The process compression/decompression requires extra time. However, this approach is intended to be performed via hardware and only in specific points of the execution. Dynamic memory allocation or dynamic storage allocation (DSA) has been a relevant part of the OSs for allocating memory to applications. Applications either perform memory requests to the OS for memory blocks or release already allocated blocks. The allocator algorithm is crucial for memory allocation and two main problems arise: temporal cost of the allocation and space usage. [91] presents a survey of these techniques. In [92] the TLSF allocator is proposed which performs the allocation and deallocation in constant time and achieve bounded fragmentation in the system. 2.1.1.1 Cache and scratchpad memories Scratchpad memory has been used as a partial or entire replacement for cache memory due to its better energy efficiency and predictability. Scratch-Pad Memory (SPM) is intended to avoid the main drawbacks of caches. They consist of small, fast memory areas (SRAM...), very much like caches, but are directly and explicitly managed at the software level, either by the developer or by the compiler. Hence, no
  • 6. Page 6 of 54 D1.2: Analysis and selection of low power techniques, services and patterns Version 1.0 dedicated circuit is required for SPM management. This would mean that there is even a deterministic behavior which is not provided by typical cache implementations. Deterministic behavior is a major benefit for safety related applications. In [93], a comparison of several SPM with their advantages is presented. One of these advantages is the important reduction of the energy (up to 40% less energy than caches). In [94] a survey of techniques for SPM management is detailed. 2.1.2 Basic device management (hypervisor) 2.1.2.1 Processor management The processor management at OS level can have an impact on the energy consumption via:  Setting the processor in a low power state or switching-off during the time intervals of no activity.  Scaling CPU voltage  Scaling CPU frequency  Scaling both, CPU voltage and frequency These techniques can be used to reduce the energy consumption but the main problem in real-time systems is the deadline of the tasks. When the voltage or frequency of the CPU decreases the CPU speed decreases and the time required to complete a task increases. The decision about which conditions should be satisfied in order for a task or a set of real-time tasks to meet the temporal constraints is an NP-Hard problem [95]. In the recent years, techniques to decide an on-line or off-line schedule to guarantee the task deadlines while minimizing the CPU energy consumption have proliferated in journals and conference papers. A review of them can be found in [[96], [97], [98], [99], [100], [101], [102]]. 2.1.2.2 Scheduling techniques for low power consumption Dynamic voltage and frequency scaling (DVFS) is a technique that allows to modify voltage and/or frequency of the CPU based on performance and power requirements. Several commercial processors support this technology for saving power. 
The limitation of DVFS is that it increases task execution times and can lead to missed deadlines, so the appropriate selection of the scaling level is fundamental to guarantee deadlines while reducing energy consumption. Several techniques have been proposed to save energy while guaranteeing task deadlines. Some of them compute the available slack and adjust the frequency and/or voltage to consume it. In general, high-priority tasks are executed at higher frequencies to generate more slack, and lower-priority tasks are adjusted to reduce the frequency. Another technique uses non-linear optimization to find the optimal frequency for every task; this technique, however, has a high complexity and hence is only suitable for off-line use.

On the other hand, dynamic power management (DPM) [98] takes advantage of the low-power states (like sleep or stand-by) every time the processor is idle. DPM is a mechanism that dynamically reconfigures a system to provide the requested services and tasks at the same performance level but with a minimum number of active components or a minimum load. DPM must consider the transition time between different power-consumption modes: switching from the active mode to the sleep mode and then back to the active mode has a
penalty in time and energy. Therefore, its impact must be checked from both a scheduling and an energy-consumption point of view.

The operating system is in charge of the policy for DVFS or DPM decisions, which can be defined off-line or on-line. Off-line decisions are made prior to system execution, and the OS is provided with the information on the execution conditions of each task. On-line decisions are taken by the OS according to the execution results. As far as we know, there are no works involving hypervisors in these approaches.

When a hypervisor deals with partitions that encapsulate an OS (guestOS) and its internal application tasks, the schedule is hierarchical. Two scheduling levels coexist:
• Hypervisor schedule: a static schedule of partitions in which the temporal windows for partitions are decided off-line. In multicore systems, the schedule also defines on which core the partitions (or the temporal windows of a partition) are executed.
• guestOS schedule: within the temporal intervals in which the partition is scheduled by the hypervisor, the guestOS schedules the internal tasks according to their priority or deadline.

Under this view, a separation of concerns can be defined for the two scheduling levels:
• Hypervisor level: off-line scheduling using non-linear optimization techniques. The off-line schedule can decide the allocation of partitions to cores, the allocation of temporal intervals, and a range of voltage and/or frequency scales for each temporal interval. Additionally, it can apply DPM when the core is idle or the partition finishes its activity before the allocated time.
• guestOS level: on-line scheduling using execution-time information and the allowed voltage and/or frequency range to adjust the energy consumption.
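The DPM transition penalty mentioned above leads to the notion of a break-even time: sleeping only pays off when the predicted idle interval is long enough to amortize the transition overhead. The following sketch (not from the deliverable; all power figures are invented) shows the calculation a DPM policy might perform.

```python
# Illustrative DPM sketch (invented numbers): enter a sleep state only when
# the predicted idle interval exceeds the break-even time, i.e. the point
# where the sleep-mode savings outweigh the transition overhead.

def break_even_time(p_active, p_sleep, e_transition, t_transition):
    """Minimum idle length (s) for which sleeping saves energy."""
    # Energy staying active for t:  p_active * t
    # Energy sleeping for t:        e_transition + p_sleep * (t - t_transition)
    t = (e_transition - p_sleep * t_transition) / (p_active - p_sleep)
    return max(t, t_transition)   # can never be shorter than the transition

def should_sleep(predicted_idle, **params):
    return predicted_idle > break_even_time(**params)

params = dict(p_active=2.0, p_sleep=0.1, e_transition=0.5, t_transition=0.05)
print(break_even_time(**params))    # ~0.26 s
print(should_sleep(0.5, **params))  # True
print(should_sleep(0.1, **params))  # False
```

In a hierarchical schedule, the hypervisor knows the partition windows off-line, so the length of each idle interval is known in advance and this check can be resolved at configuration time.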
2.1.2.3 Clock management

The RTOS offers a regular clock service to the applications. In addition, the RTOS has to program a timer device to be made aware of the passage of time. The clock management at RTOS level maintains the time value and increments it as time advances. To do so, the RTOS can program the timer to interrupt every time unit (the system tick) and maintain a counter of the increments. In practice, ticks occur periodically, at a rate sufficient for the most fine-grained timing needed by the application. As a result, most system ticks will not result in a time-driven function being executed. In energy-efficient applications, it is clearly undesirable to be woken up from a low-power mode just to service the system-tick timer interrupt and then find there is nothing to do.

The system tick is the basis for the time activities in the OS. When an application is running, it is interrupted by the clock management every system tick, with the implied power-mode change required to execute the OS service. An alternative to this common OS practice is to program the timer with very long periods (n seconds), avoiding counter overflow. In this case, the OS is interrupted every n seconds and increments an internal counter of the interrupts received. The system time is built from the value of this internal counter and the value of the timer register. This mechanism dramatically reduces the number of interrupts due to clock management and reduces the time the OS spends handling
interrupts and, consequently, the interruptions of the application with their implied power-mode changes. This management directly impacts the energy consumption of the system. Time events are then managed by the OS programming a one-shot timer (i.e., a second timer).

2.1.3 Device management (OS)

The OS manages the devices in the system. Devices with the appropriate hardware capability can be put into device power states, and the OS can apply different policies to individual devices. The OS can:
• Power up a device as soon as it is needed after system start-up or an idle shutdown.
• Power down a device at system shutdown time, or put it to sleep when idle.
• Enable device wake-up, if the device supports wake-up capabilities.
• Manage device performance states, if the device supports reducing performance or features to lower power consumption.

When the system is composed of the OS and its applications, the OS can power down a device when the system is idle. In a partitioned system based on a hypervisor, the hypervisor provides virtual CPUs to the partitions, but devices are managed by the guest OSs. Explicit resource allocation is specified in the configuration file of the system: devices are explicitly allocated to the partitions that contain their device drivers and manage them. So, all policies related to device power management must be handled by the guestOS. Based on the hypervisor's cyclic scheduler, partitions run in the temporal windows defined in the configuration file. The hypervisor therefore knows at which time intervals a partition is not executing and could, in principle, apply specific policies to put its devices to sleep; however, as the device drivers are allocated and managed at OS level, the hypervisor cannot do so.
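The long-period clock scheme described above can be sketched as follows. This is an illustration, not code from the deliverable; the class name, the 10 s period, and the seconds-based timer register are assumptions.

```python
# Illustrative sketch of the long-period clock scheme: the timer fires only
# every PERIOD seconds, an interrupt counter is incremented, and the current
# time is reconstructed from the counter plus the free-running timer register.

PERIOD = 10.0  # timer period in seconds (the "n seconds" of the text)

class TicklessClock:
    def __init__(self):
        self.interrupts = 0      # incremented by the periodic timer ISR

    def timer_isr(self):
        self.interrupts += 1     # one interrupt every PERIOD seconds

    def now(self, timer_register):
        # timer_register: seconds elapsed within the current period
        return self.interrupts * PERIOD + timer_register

clk = TicklessClock()
clk.timer_isr()                  # 10 s elapsed
clk.timer_isr()                  # 20 s elapsed
print(clk.now(3.5))              # 23.5
```

Two interrupts replace the two thousand a 10 ms tick would have generated over the same interval, which is exactly where the energy saving comes from.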
Therefore, the task of putting devices into sleep mode must be delegated to the partitions, based on the functional operation and utilization of the peripherals.

2.1.3.1 Firmware support

Although power-management techniques may be implemented directly in the application code, offering mature and well-tested operating-system services that support such techniques is more reliable and less cumbersome. Typically, a developer specifies, for the application to be deployed, a set of use cases, where each use case demands a certain operating mode with specific performance and power requirements. Depending on the current operating mode of the application, the RTOS applies the appropriate power-management service by setting the entire SoC, or individual sub-devices, into the corresponding power mode. The main requirement on the firmware support for power management is that its services must have complete knowledge of the power capabilities of the underlying hardware and must be able to control them by setting the hardware into different power modes. In order to keep complexity manageable, firmware is used to support the RTOS by offering services on top of the low-level hardware technologies (see Figure 1).
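The use-case-to-power-mode mapping described above can be pictured as a simple lookup table that a firmware/RTOS service consults. This is only a sketch; the use-case names, modes and frequencies are invented for illustration.

```python
# Illustrative sketch: mapping application use cases to power modes, as a
# firmware power-management service might. All entries are invented.

POWER_MODES = {
    # use case         -> (mode,       cpu_freq_mhz, peripherals_on)
    "video_playback":     ("active",    800,          True),
    "sensor_sampling":    ("low_power", 100,          True),
    "standby":            ("sleep",     0,            False),
}

def apply_power_mode(use_case):
    """Return the power configuration to program for a given use case."""
    try:
        mode, freq, periph = POWER_MODES[use_case]
    except KeyError:
        raise ValueError(f"unknown use case: {use_case}")
    return {"mode": mode, "cpu_freq_mhz": freq, "peripherals_on": periph}

print(apply_power_mode("sensor_sampling"))
# {'mode': 'low_power', 'cpu_freq_mhz': 100, 'peripherals_on': True}
```

The point of routing this through firmware rather than the application is that only the table, not the low-level mode-switching code, is application-specific.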
Figure 1: Operational phases of power-management services (Source: Intel Corporation, 2009) [45]

Modern power-aware real-time operating systems (RTOSs) usually come with built-in firmware supporting different power services such as DVFS, clock gating and the others mentioned above. A look at, e.g., the Linux Core Power Management User Guide (v4.1) shows that the Linux kernel supports a variety of built-in dynamic power-management services, such as DVFS, CPUIdle and SmartReflex, as well as idle-power-management services such as suspend/resume.

2.2 Hardware low-power support

2.2.1 Low power techniques for predictable architectures and communication

Power consumption can be divided into three parts: dynamic, static and short-circuit power. These depend on physical quantities like voltage and frequency, so hardware support for low-power techniques can have a significant impact. The next subsections deal with techniques concerning communication and standby-sparing as well as fault-tolerance and safety aspects.

2.2.1.1 Low power on chip communication

On today's Systems-on-Chip, many different cores can be integrated on one chip, and an important contributor to power consumption is the communication between the individual cores. The following subsections therefore cover low-power techniques for on-chip communication.

2.2.1.1.1 Run-time clusterization for energy efficient on-chip communication

The system-level exploration of run-time power clusterization presented in [68] increases the energy efficiency of on-chip communication using an adaptive system architecture for power management called Dynamically-Clustered DVFS (DCDVFS, for dynamic voltage and frequency scaling). At runtime, overloaded or idle network regions are identified and reconfigured with new power schemes.
This method improves on voltage/frequency island partitioning (V/F partitioning), which exploits the spatial locality of communication traffic on a parallel platform. The benefit of DCDVFS is that clusters are configured at runtime, whereas in V/F partitioning the islands are defined at design time; spatial variations of the communication traffic are thus also taken into account.
Simulations on an 8×8 mesh Network-on-Chip (NoC) with a 65 nm power model extracted from Orion 2.0 show that the approach achieves much lower energy consumption for traffic with spatial variations than existing approaches (9% to 42%), while incurring only a moderate, predictable latency and minimal area overhead.

2.2.1.1.2 Adaptive SoC

System-on-chip (SoC) implementations integrate different intellectual-property (IP) cores or take advantage of application parallelism to improve performance. One disadvantage is inefficiencies such as hot-spot bottlenecks, which may introduce additional power consumption. Therefore, in aSoCs [69] a statically scheduled interconnect structure increases system throughput, since unnecessary interconnect switching activity is eliminated. Additionally, application-mapping tools balance the load across all cores, and unused cores can be dynamically reconfigured to a low-power state. The authors show that the interconnect power consumption is low and that the overhead due to configuration streams is less than 10% for both bandwidth and power.

In another approach [70], energy consumption is minimized by mapping application tasks to heterogeneous processing elements (PEs) on a NoC which may operate at different voltage levels. To this end, the tasks must be mapped to PEs and the PEs to routers; additionally, operating voltages must be assigned to the PEs and the data paths must be routed. These steps can be solved sequentially or in a unified manner. The unified approach yields more performant and energy-efficient results, as shown in evaluations using the E3S benchmark suite. The authors also show that their heuristic achieves near-optimal solutions while being much faster than the optimal algorithm.
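The flavor of such a unified mapping problem can be shown with a toy exhaustive search. This is a sketch in the spirit of [70], not the paper's algorithm: the task sizes, PE voltage/frequency levels and the normalized energy model (energy ∝ cycles · V²) are all invented, and real tools use heuristics instead of enumeration.

```python
# Toy unified mapping sketch (invented instance): choose a voltage/frequency
# level per task's PE so that deadlines hold and total energy is minimal.
from itertools import product

TASK_CYCLES = [4e6, 1e6]                  # cycles per task (invented)
# Each level is (voltage, frequency_hz); energy ~ cycles * V^2 (normalized).
PE_LEVELS = [(1.2, 400e6), (0.9, 200e6)]

def energy(cycles, volt):
    return cycles * volt ** 2             # normalized dynamic energy

def best_mapping(deadline_s):
    best = None
    for levels in product(range(len(PE_LEVELS)), repeat=len(TASK_CYCLES)):
        # each task runs on its own PE at the chosen level, in parallel
        times = [TASK_CYCLES[i] / PE_LEVELS[l][1] for i, l in enumerate(levels)]
        if max(times) > deadline_s:
            continue                      # deadline violated, skip
        e = sum(energy(TASK_CYCLES[i], PE_LEVELS[l][0])
                for i, l in enumerate(levels))
        if best is None or e < best[0]:
            best = (e, levels)
    return best

e, levels = best_mapping(deadline_s=0.015)
print(levels)  # (0, 1): task 0 needs the fast level, task 1 the efficient one
```

Even in this two-task instance the optimum is non-obvious by inspection, which is why the unified formulation (mapping plus voltage assignment together) beats the sequential one.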
2.2.1.1.3 DVFS in NoCs

A NoC is a high-performance and scalable alternative to bus-based architectures [72] but consumes a considerable share of power (up to 39% in [71]). Reducing the power consumed by the NoC therefore leads to significant system-wide energy savings. An effective hardware technique is DVFS, which adjusts the processor frequency to the workload by reducing the supply voltage. Since dynamic power scales quadratically with voltage and linearly with frequency, and frequency scales roughly linearly with voltage, scaling frequency and voltage together yields a cubic reduction in dynamic power [72]. The idea is to lower the voltage in low-utilization phases such that the circuit operates at just the speed required to process the workload.

A lot of research has been done on improving DVFS; some of it is described and compared in [72]. Solutions exploit, e.g., CPU idle states during memory accesses, apply adaptive design techniques to local NoC units to globally reduce energy consumption, or use per-core DVFS rather than varying the whole chip's voltage. Additional hardware is also required, such as a Power Management Unit (PMU) that controls the generation of the supply voltage and clock. One disadvantage of DVFS is that the increased execution time also increases the leakage energy [73]. The authors of [73] introduce enhanced race-to-halt to resolve this, and their simulations show an improvement of up to 8% over the existing Leakage-Control Earliest Deadline First schedule [74].

DVFS is often combined with other techniques. For example, in most cases the memory limits the reduction of frequency and voltage in the whole system. Using voltage islands is
lucrative, since communication and memory can then run at different voltages such that both are safe and meet their throughput requirements [80].

2.2.1.1.4 Slack optimization

The unused processing time in a system is called slack and comes in two types, dynamic and static. The latter exists as spare capacity because the system is loaded less than what the schedulability tests can guarantee; differences between the worst-case assumptions and the actual behaviour result in dynamic slack [73]. Lin et al. [75] presented four principles for effective slack management and developed four slack-scheduling algorithms for Earliest Deadline First (EDF)-based systems that support mixed criticality. The principles are (1) to allocate slack as early as possible with the priority of the donating task, (2) to allocate slack to the task with the highest (earliest original) priority, (3) to allow tasks to borrow against future resource reservations (with the priority of the job from which the resources are borrowed) to complete their current job, and (4) to retroactively allocate slack to tasks that have borrowed from their current budget to complete a previous job. Using these principles can reduce the average deadline-miss ratio by up to 100%. The authors of [73] improved these algorithms to reduce power consumption in combination with DVFS, such that the system can consume the available slack in idle mode.

2.2.1.1.5 Variable channel width

To improve throughput and latency, existing NoC implementations use wide channels of about 128 bits or more. While such channel widths are beneficial for large messages (512 bits or more), short control messages only use 64 bits. Since control messages account for a significant portion of the NoC traffic, it is a waste of energy not to use the remaining bits for other messages [71].
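How dynamic slack feeds into DVFS can be made concrete with a small sketch. This is not the algorithm of [75] or [73]: it assumes execution time scales with 1/f and simply stretches the next job's budget by the slack a finished job donated.

```python
# Illustrative slack-reclamation sketch (assumption: time scales with 1/f):
# when a job finishes early, the unused budget lets the next job run slower.

def reclaim_frequency(wcet, actual, next_wcet, f_max, f_min):
    """Frequency for the next job after a job used `actual` < `wcet`."""
    slack = wcet - actual                  # dynamic slack donated forward
    # Next job may take next_wcet + slack time, so scale frequency down:
    f = f_max * next_wcet / (next_wcet + slack)
    return max(f, f_min)                   # never below the hardware minimum

# A 4 ms job finished in 2 ms; the next 4 ms job gets 6 ms of budget,
# so it can run at about two thirds of the maximum frequency.
print(reclaim_frequency(wcet=4.0, actual=2.0, next_wcet=4.0,
                        f_max=1.0, f_min=0.2))
```

The real algorithms add the priority and borrowing rules listed above, which decide *who* receives the slack; the frequency arithmetic, however, stays this simple.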
The approach in [71] splits the 128-bit channel into two narrower 64-bit channels, which allows a short 64-bit message to be transmitted on one link while the other is shut down. If no congestion occurs, transmitting two short flits on the wide link is nearly as performant as sending them one by one over a single narrow channel; the latter, however, enables clock gating at no performance penalty and low hardware overhead. The approach can thereby reduce the NoC's power consumption by up to 25% with workloads from the PARSEC 2.1 benchmark.

2.2.1.1.6 Router Power Gating

Router power gating is an effective way to reduce power consumption in NoCs by switching off routers, but it may introduce wake-up delay and energy overhead caused by frequent mode transitions [76]. A major power-consuming operation in a NoC is memory access and data movement. Yuho Jin [76] presents a combination of router power gating with region-based data placement: data traffic is reduced by localizing private data and concentrating shared data in one region of the NoC, which increases the power-gating opportunity. To this end, a dimensionally power-gating router with a region-based routing algorithm is introduced, which reduces router static power and the performance/energy overheads of power gating. On the SPEComp benchmark, power savings of 46% with dynamic power-gating management and 20% with static management are shown [76] for an 8×8 mesh NoC with 64 cores.
2.2.1.2 Scheduling for low power CRTES architectures

Hardware capabilities like DVFS are insufficient on their own and must be paired with software that controls them. Placing this logic in the OS scheduler is attractive because of the simplicity, low cost and low risk associated with modifying the scheduler part of the OS. In [83] the authors describe three types of scheduling techniques. The first type controls DVFS and DPM to dynamically throttle the voltage and frequency of the CPU or temporarily suspend its operation. The second performs thermal management; it relies primarily on the placement of threads on cores to avoid thermal hotspots and temperature gradients. Finally, asymmetric systems are described: systems built with low-power and high-power cores on the same chip executing the same binary, where the goal is to assign threads to cores according to their resource requirements. All of the discussed algorithms need to dynamically monitor properties of the workload, make decisions that consider the interplay between hardware and workload, and control the configuration and allocation of CPU cores to achieve the best trade-off between performance and power consumption.

2.2.1.3 Standby-sparing

In real-time systems, redundancy is commonly used to achieve fault tolerance. While time redundancy does not incur a high energy overhead, it cannot reach the reliability required by safety-critical applications. Standby-sparing, as a hardware-redundancy technique, can meet these reliability constraints, but existing schemes are not suitable for low-energy systems, since they either impose considerable energy overheads or cannot handle hard timing constraints [77]. Ejlali et al. [77] developed an online energy-management technique for standby-sparing in hard real-time applications called low-energy standby-sparing (LESS).
LESS exploits the slack available at runtime to reduce energy consumption while guaranteeing hard deadlines, and it uses dynamic power management (DPM), which shuts down idle system components [78]. Compared to an existing low-energy time-redundancy system, LESS is more reliable and provides about 26% energy savings under relaxed time constraints. For tight deadlines, LESS preserves the reliability but consumes 49% more energy. Compared to triple modular redundancy or duplication, the well-known hardware-redundancy techniques, this increase is much lower, since those methods have overheads of 200% and 100%, respectively [77].

2.2.1.4 Low energy methods and safety

In many application domains, such as automotive and avionics, tasks with different criticality levels are integrated on one chip, forming mixed-criticality systems. Due to the complexity of modern computing platforms, obtaining accurate WCETs is hard. Uncertainty in WCETs can lead to task overruns, which must be avoided for safety-critical tasks. Typical solutions are the termination of low-criticality tasks or a degradation of the service provided. The disadvantage is that removing tasks with a low criticality level also removes the safety functions associated with those levels [86].
The approach in [86] avoids this by using DVFS not to slow the system down to save energy, but to speed it up: fast recovery then becomes possible, as does handling a higher workload. Due to the higher frequencies, the energy dissipation also increases. If the system still misses deadlines despite running faster, tasks can additionally be terminated. The proposed technique is evaluated on an industrial flight-management system.

Another example concerns DVFS, since reducing the supply voltage can increase transient-fault rates [85]. Zhao et al. use dynamically allocated recoveries; they show that providing a recovery allowance to a given periodic task achieves high reliability levels as long as the allowance can be reclaimed on demand. To determine the recovery allowance and the frequency assignments, they use a feasibility test that minimizes the energy consumption while satisfying the timing and reliability constraints [85].

2.2.1.5 Fault tolerance and low power

Low-power techniques may negatively affect the system's reliability. For example, studies have shown that DVFS comes at the cost of significantly increased transient-fault rates [85], and reducing the supply voltage of caches also has a negative impact on their reliability [79]. This section describes techniques that provide both fault tolerance and low power.

2.2.1.5.1 Energy saving with fault tolerant caches

There are many mechanisms for leakage reduction or fault tolerance in deep-submicron memories, but they often do not address both aspects, and former fault-tolerant voltage-scalable (FTVS) SRAM cache architectures can suffer from high overheads. The authors of [79] therefore introduce a static (SPCS) and a dynamic (DPCS) variant of power/capacity scaling, a simple and low-overhead fault-tolerant cache architecture.
The mechanism combines global multi-level voltage scaling of the data-array SRAM cells with power gating of the blocks that become faulty at each voltage level. SPCS sets the runtime cache VDD statically, such that almost all of the cache blocks are not faulty. DPCS reduces the voltage further to save more power than SPCS, while limiting the performance impact caused by the additional faulty blocks. Due to its significantly lower overheads, the approach achieves lower static power for all effective cache capacities than a recent, more complex FTVS design. Architectural simulations show energy savings of 55% (SPCS) and 69% (DPCS) with respect to baseline caches at 1 V. In the worst-case cache configuration there is no more than a 4% performance and 5% area penalty, while a high yield is maintained.

2.2.1.5.2 Fault tolerant low power communication

The challenges in communication systems are: power consumption, which stems from the complex algorithms enabling broadband communication; scaling, which enables unprecedented integration but introduces a leakage-power penalty and reliability issues; and cost. Producing high-yield architectures requires built-in self-test and self-repair, so 100% error-free chips are very expensive and will soon become impractical [80]. One solution is a system with built-in inherent redundancy, and communication systems are a perfect fit due to their high level of redundancy. The authors of [80] examined the
relationships between the components and their vulnerability, in terms of power consumption and reliability, and between the needs, assumptions and requirements of the algorithm running on the design. If the algorithm is able to accept, and possibly correct, hardware-induced errors, it becomes possible to co-design the algorithm and the hardware and thereby reduce the disadvantages of technology scaling. They intentionally vary the operating conditions to the point where errors occur and exploit the extended design space to optimize other aspects, which leads to a design that is optimal in power consumption and robustness. The example of a WCDMA modem shows savings of 23% in embedded-memory power consumption and 13% for the whole system.

2.2.1.5.3 Low overhead two-state checkpointing

Transient faults are a major reliability concern and can be tolerated by triple modular redundancy or standby-sparing. Checkpointing with rollback recovery is another well-established technique, but it incurs significant time and energy overheads, which may not be acceptable in hard real-time systems [81]. A low-overhead alternative is two-state checkpointing (TsCp), which differentiates between fault-free and faulty execution and leverages two types of checkpoints. Non-uniform checkpoint intervals, determined by postponing checkpoint insertions during fault-free execution, decrease the number of checkpoints; uniform intervals minimize the execution time in faulty states, leaving more time for energy management in the fault-free case. Enabling checkpoints only at selected locations curtails the time and energy overhead while taking deadlines, execution times and the number of tolerable faults into account. A trade-off between the number of checkpoints and the operating voltage-frequency settings yields energy-efficient fault tolerance.
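The trade-off behind checkpoint placement can be illustrated with the classic textbook model rather than TsCp itself: more checkpoints mean less re-execution after a fault but more overhead in the fault-free case. All values below are invented.

```python
# Classic checkpointing trade-off sketch (not the TsCp algorithm): with n
# evenly spaced checkpoints a fault re-executes at most one interval
# (wcet/(n+1)), but each checkpoint costs `overhead` time.

def worst_case_time(wcet, n, overhead, k_faults):
    """Worst-case completion time with n checkpoints and up to k faults."""
    interval = wcet / (n + 1)
    return wcet + n * overhead + k_faults * (interval + overhead)

def best_checkpoint_count(wcet, overhead, k_faults, n_max=50):
    """Checkpoint count minimizing the worst-case completion time."""
    return min(range(n_max + 1),
               key=lambda n: worst_case_time(wcet, n, overhead, k_faults))

n = best_checkpoint_count(wcet=10.0, overhead=0.2, k_faults=1)
print(n)   # n == 6 for these values
```

TsCp improves on this uniform scheme precisely by using different interval policies for the fault-free and faulty states, but the underlying overhead-versus-re-execution balance is the one computed here.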
An evaluation on an embedded LEON3 processor using non-volatile memory to store the checkpoints has shown that TsCp reduces the number of checkpoints by 62% on average, which results in 14% and 13% reductions of execution time and energy consumption, respectively. The combination with DVS achieves up to 26% (21% on average) energy savings compared to state-of-the-art checkpointing while providing reasonable reliability.

2.2.2 Low power multicore embedded systems

To increase performance and provide scalability, multicore systems are nowadays widely used [87]. Multicore systems are more complex than single-core ones, e.g. higher-level caches must be kept coherent [88], so some of the approaches described above only consider single-core systems; in future work, these techniques must be extended to multicore systems (e.g. [79]). Other examples of low-power techniques for multicores are dynamic WCET estimation with DVFS and worst-case power estimation, which we discuss in the next subsections.

2.2.2.1 Dynamic WCET estimation for real-time multicore embedded systems supporting DVFS

In real-time systems, the worst-case execution time (WCET) of the tasks is required to guarantee the system's stability. A high estimation accuracy reduces the number of deadline misses and additionally improves the energy savings [82]. The Processor-Memory model (Proc-Mem) proposed in [82] dynamically predicts the execution time of an application running on a multicore processor when varying the
processor frequency. Instead of analyzing the application's source code or the hardware platform, Proc-Mem executes the workload during the first hyperperiod at maximum speed to obtain the input parameters of the model, which then estimates the most energy-efficient frequency that meets the deadline across the different application periods. These values are used by the scheduler for the subsequent periods. Compared to a typical Constant Memory Access Time (CMAT) model, the deviation of Proc-Mem from the measured execution time is always lower than 6%, while the deviation of CMAT always exceeds 30%. The approach reaches up to 47.8% (22.9% on average) energy savings for a similar number of deadline misses.

2.2.2.2 WCP (Worst-Case Power) Estimation

In [43], [44] the authors argue that energy is as important as time in mixed-criticality systems and demonstrate how incorrect handling of energy can violate mixed-criticality guarantees. To overcome this issue, pioneering work was done in [40], using static analysis to estimate the worst-case energy consumption of a program running on complex architectures and providing power guarantees. For the same purpose, a monitoring and control mechanism was proposed in [41] to isolate the power consumption of mixed-criticality applications on a many-core platform. Xilinx also offers an accurate WCP estimation through the Xilinx Power Estimator (XPE) [42], a spreadsheet that can be calibrated with design-specific parameters (see the seven steps proposed in [42]) to obtain a WCP estimate. In addition, novel work was presented in [43] on how to estimate the worst-case power for energy-constrained embedded systems.
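As a toy illustration of instruction-level energy bounding (a sketch with an invented cost table and control-flow graph, not the actual method of [40] or [43]): given per-instruction energy costs, an upper bound on a program's energy is the maximum cost over all paths of its control-flow graph.

```python
# Toy worst-case-energy sketch: maximize summed per-instruction energy
# costs over the paths of a tiny acyclic CFG. Costs and CFG are invented.

INSTR_COST_NJ = {"add": 1.0, "mul": 3.0, "load": 6.0, "store": 5.0}

# Basic blocks as instruction lists; EDGES gives the successor blocks.
BLOCKS = {
    "entry": ["load", "add"],
    "then":  ["mul", "store"],
    "else":  ["add", "add", "store"],
    "exit":  ["add"],
}
EDGES = {"entry": ["then", "else"], "then": ["exit"],
         "else": ["exit"], "exit": []}

def block_energy(name):
    return sum(INSTR_COST_NJ[i] for i in BLOCKS[name])

def worst_case_energy(block="entry"):
    """Maximum energy over all paths from `block` to the exit (acyclic CFG)."""
    tail = max((worst_case_energy(s) for s in EDGES[block]), default=0.0)
    return block_energy(block) + tail

print(worst_case_energy())  # 16.0 nJ (entry -> then -> exit dominates)
```

Implicit path enumeration, as used in [43], solves the same maximization as an integer linear program so that loops and infeasible paths can be handled; this recursion only conveys the idea.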
In [43] the authors propose to compute upper bounds for the energy consumption by statically analysing the program code (combining implicit path enumeration and genetic-algorithm methods) based on the energy costs of the single instructions on the target architecture. In case no precise energy costs are available for single instructions, the authors propose to determine the WCP by measurement; here they support finding a set of suitable program inputs to be used as measurement parameters.

2.2.3 PCB support for low-power

As today's electronic designs continue to grow in complexity, managing power consumption and optimizing overall efficiency become ever more important. Accurate monitoring of the power-supply voltage and current is crucial for conserving power and guaranteeing reliability in everything from industrial and telecom applications to automotive and consumer electronics.

2.2.3.1 Best practices on low-power PCB designs

Measuring power consumption, as well as other critical parameters, and optimizing the overall efficiency can be a challenge with discrete solutions. Nevertheless, several tools have been developed that will help in this work by analysing and optimizing a PCB design's power integrity and consumption.
HyperLynx Power Integrity – Mentor Graphics Corporation

This tool [1] provides modelling of power-distribution networks and noise-propagation mechanisms throughout the PCB design process. It is useful for identifying potential power-distribution integrity issues that can interfere with the board's design logic, and for investigating and validating solutions in a "what-if" environment. HyperLynx PI analyzes voltage drop, identifying areas of excessive current density in the layout, simulates IC switching noise as it propagates through planes and vias, and facilitates Power Delivery Network (PDN) impedance validation across the full operating frequency range.

PDN Analyzer – Altium

The PDN Analyzer [2] allows PDN issues to be resolved as they arise in the board-layout process. It offers analysis of complex nets and copper geometries, plots DC voltage and current-density graphics, and provides customized views for DC power analysis, all in the same unified design and analysis environment.

CR-5000 Lightning – Power Integrity Advance – Zuken

This tool [3] provides power-integrity and electromagnetic-interference analysis within the real-time PCB design flow. With EMI, AC and DC power analysis combined in a single environment, it helps determine the best decoupling and power-distribution strategy for the pre-layout and post-layout stages, with support for a complete what-if environment.

PI Solution – Sigrity

The Sigrity PI Solution [4] offers four different tools related to power integrity:
• PowerSI: detailed electrical analysis with fast and accurate signal/power integrity and design-stage EMI analysis, S-parameter model extraction, and frequency-domain simulation.
• PowerDC: electrical/thermal co-simulation, hotspot detection, and signoff for low-voltage, high-current PCB and IC package designs.
• OptimizePI: AC frequency analysis of boards and IC packages, with support for pre- and post-layout decap studies that ensure high performance at system and component levels.
• PowerSI 3DEM Option: Full-wave and quasi-static solver technology capable of accurate analysis of complex 3D structures.

2.2.3.2 Low-power commercial microprocessors

In this section, different COTS technologies are identified as standard processor families that are commonly used in industry for their low-power features.

ARM [5] uses a 32-bit RISC (Reduced Instruction Set Computing) instruction set for its processors. Processors using this architecture require significantly fewer transistors than typical CISC (Complex Instruction Set Computing) processors, reducing cost and power use. ARM offers three architecture series: Cortex-A (Application), Cortex-R (Real-time) and Cortex-M (Microcontroller). The last series is the most common in embedded systems thanks to its range of energy-efficient, scalable and compatible processors [6]. For instance, NXP's LPC Cortex-M series microcontrollers [8] are a commercial implementation of this series. There is also a VHDL IP implementation of the Cortex-M0 family: a small, simple and therefore low-power soft core. One interesting low-power feature of the ARM family is the big.LITTLE technology [7]. big.LITTLE processing is a power-optimization technology in which high-performance ARM CPU cores are combined with more efficient low-power ARM CPU cores to deliver peak-performance capacity, sustained performance and parallel processing performance at lower average power. It combines the ultra-low-power ARM Cortex-A7 core with the fast Cortex-A15 core; depending on the intensity of the task being processed, the task is dispatched to one core type or the other, saving up to 75% of the energy.

The Texas Instruments MSP430 [9] is a family of 16-bit von Neumann architecture based processors. The family consists of more than 500 products, more than 300 of which are categorized as ultra-low power. This subfamily includes processors running from 4 to 24 MHz, which can consume less than 1 µA in idle mode. They also offer multiple low-power modes and peripherals that can run autonomously in low-power modes. A specific subcategory (MSP430FRxx) features FRAM technology, which combines the best of Flash and SRAM memories, since it is non-volatile while offering fast and low-power writes. The minimum operating voltage can be as low as 0.9 V for some processors. All families of the MSP430 series of microcontrollers have one active mode and five software-selectable low-power modes of operation.
An interrupt event can wake the device from any of the five low-power modes, service the request, and restore the low-power mode on return from the interrupt service routine.

The Renesas RL78 [11] is a 16-bit CISC architecture based family of microcontrollers intended for low-power applications. Depending on the product, the maximum frequency is between 20 and 32 MHz, while the minimum operating voltage is between 1.6 and 2.7 V. The STOP mode current consumption is 0.52 µA. The RL78 family combines the high-performance architecture and low power consumption of the 78K0R with the peripheral functions of the R8C and 78K.

There are also some processors with x86 architecture that are oriented towards low-power constraints. The Intel Atom processor family has a subset of processors oriented to the embedded market, some of which are designed for very low power consumption. The Atom D525 [12] is an ultra-low-voltage dual-core processor in which each core is 2-way hyperthreaded. The Intel Atom Z3000 processor series delivers leading performance with all-day battery life. It offers a smaller footprint and lower power usage while also enabling double the compute performance and triple the graphics performance compared to the previous-generation Intel Atom processor. The Intel Atom Z3000 processor series also includes Intel® Burst Technology 2.0 with four cores, four threads and 2 MB L2 cache. The Intel Quark SoC X1000 series [13] is a system on chip from Intel Corporation. This SoC series allows low-power, thermally constrained, fanless and headless designs [14]. These devices can work at frequencies up to 400 MHz, and the processor offers three low-power modes. The Intel Quark SoC X1000 series can run at half or a quarter of the maximum CPU frequency in order to decrease power consumption.
2.2.3.3 ICs for measuring power/energy

One can find different integrated circuit components that ease power/energy measurement and power supply management within a PCB design.

UCD90XXX – Texas Instruments

TI's UCD90XXX sequencers and monitors are dedicated power-management integrated circuits with I2C/SMBus/PMBus and JTAG interfaces. They allow sequencing, monitoring and resetting of power supplies at start-up/power-down, when external events occur or when voltage or current thresholds are surpassed. They support the ACPI specification by defining up to 8 system states with only 3 GPIs and defining which rails are on and which are off in each system state. They also enable fault and peak logging into their FLASH memory.

UCD92XX – Texas Instruments

The UCD92xx family of digital PWM controllers [15] comprises multi-rail, multi-phase synchronous buck controllers designed for non-isolated DC/DC power applications. They integrate dedicated circuitry for DC/DC loop management with flash memory and a serial interface to support configurability, monitoring and management. They integrate multi-loop management with sequencing, margining, tracking and intelligent phase management to optimize the solution for total system efficiency.

LTC29XX Power Monitors – Linear Technology

LT's power monitors [16] provide high configurability without compromising performance or functionality. The LTC2990/LTC2991 quad/octal I²C power monitors can be configured in up to 35 different ways, suited to 3 V to 5.5 V systems that need simple and accurate digital monitoring of combinations of temperature, voltage and current. If higher voltages are required, the LTC2945 with its 0 V to 80 V operating range also allows monitoring current, voltage and power via an I²C interface. The LTC2946 provides energy readings for rails up to 100 V.
For measuring AC or instantaneous power, the LT2940 analog power and current monitor for 4 V to 80 V systems brings together the circuits needed to accurately measure, monitor and control power.

Digital Power System Management – Linear Technology

A PSM product is configured and monitored through a PMBus/SMBus/I²C digital interface. The LTpowerPlay development environment provides control and monitoring of power supply voltage, current, power and energy use, sequencing, margining and "black box" fault-log data. High-current outputs, up to and exceeding 200 A, for FPGAs, ASICs and processors can be designed with the multiphase extender LTC3870/-1 and LTC3874 devices. These slave
controllers provide a small and cost-effective solution for supplying very large currents by being cascaded with the master controllers. Up to 12 phases can be paralleled and clocked out-of-phase, with the LTC3870/-1 operating up to a 60 V input and the LTC3874 working with the line of sub-milliohm DCR sensing current-mode master controllers.

78M6612 Power and Energy Measurement IC – Teridian

The 78M6612 [16] is a single-phase AC power measurement and monitoring (AC-PMON) IC capable of 0.5% Wh or better accuracy over a 2000:1 current range and over temperature. Four analog inputs are provided for measuring up to two AC loads or wall outlets. It also includes an 8-bit MPU core with 32 KB of embedded FLASH, a UART interface, and a number of GPIOs for easy integration into any power supply or outlet monitoring unit.

2.3 Power measurement and monitoring support

In general, as indicated in [39], power measurement and monitoring support is indispensable to circumvent inaccuracy when deploying computer-aided power analysis tools, since estimates obtained by such tools can deviate from the actual power consumption of the working MPSoC. Utilizing performance counters (see the survey in [39]) is a run-time monitoring technique typically used to characterise the power consumption of a running system (or its subcomponents). With this technique the activity of a certain block is recorded via performance counters, and these are used in empirical mathematical models to reason about the power consumption at different granularity levels (instruction, block or software task). Yet the design-time estimates, even if based on run-time performance counter measurements, are not accurate and depend on infrastructure-specific counters.
According to [39], a combination of infrared imaging and electric current measurement techniques (see Figure 2) can yield high-resolution spatial power maps of the individual parts of a given circuit. A mathematical foundation taking the thermal map and the current measurements (see Figure 2) and outputting the corresponding power maps is given in [39]. Infrared imaging is used to obtain a thermal profile of the individual circuit components. By knowing the thermal behaviour of a certain circuit and its heat diffusion to the ambient temperature, the power can be obtained (e.g. using least-squares estimation, see [38]). This method is considered the most flexible since it is non-invasive and does not require an extra design setup. Measuring the electrical current consumption of a given system can be performed either via shunt resistors or using a clamp meter [39]. In the first approach, shunt resistors with very low resistance (e.g. 1 mΩ) and high accuracy (±0.1%) are connected in series with the positive supply lines [39]. They are deliberately chosen with low resistance in order not to influence the load supply, while offering an interface to measure
the current supply. By knowing the current and the supply voltage, the power consumption can then be easily calculated. In the clamp-meter-based approach, a clamp meter measures (based on the Hall effect) the induced magnetic-field variations around the supply wire and uses the measured values to obtain the electric current [39]. While clamp-meter-based measuring is less intrusive, shunt-based measurements are more accurate and less susceptible to noise. Both techniques require either ADCs or a digital multimeter to sample the measured analog signals. Electrical current measurement can be utilized by power managers at run-time to monitor the power consumption of the system (or certain subcomponents) in order to optimize it.

Figure 2: Monitoring concept based on infrared and current measurements (taken from [39])
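The shunt-based power calculation just described amounts to two applications of Ohm's law. The following sketch illustrates it; the function name and sample values are illustrative, while the 1 mΩ shunt resistance is the example value cited above:

```python
def shunt_power(v_supply, v_shunt, r_shunt=0.001):
    """Estimate load power from the voltage drop across a series shunt resistor.

    v_supply: supply voltage in volts (measured at the load)
    v_shunt:  voltage drop across the shunt in volts (sampled by an ADC)
    r_shunt:  shunt resistance in ohms (e.g. the 1 mOhm value mentioned above)
    """
    i_load = v_shunt / r_shunt   # Ohm's law: I = V_drop / R_shunt
    return v_supply * i_load     # P = V * I (drop assumed negligible vs. supply)

# Example: a 2 mV drop across 1 mOhm implies 2 A; at a 5 V supply, about 10 W.
print(shunt_power(5.0, 0.002))
```

Because the shunt resistance is so small, the voltage drop is in the millivolt range, which is why high-resolution ADCs or a digital multimeter are needed to sample it.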
3. IMPACT OF THE TECHNIQUES ON SAFETY (ANALYSIS)

In this section the impact of low-power capabilities on dependability is analyzed. Within dependability, which involves several properties (availability, maintainability, safety, security, etc. [30]), the focus of the SAFEPOWER project is the impact of those features on safety. This means in particular that all candidate technologies and their combinations have to be investigated from a safety standard perspective (e.g., the generic IEC 61508, the railway EN 5012x series and others in the aerospace domain) so that the resulting low-power services are also "safe". In the following, a preliminary investigation on safety and low power is surveyed and, additionally, a brief study on security and low power is also developed.

3.1 Safety, low-power and mixed criticality

In this first section, the impact on safety of the SAFEPOWER low-power and mixed-criticality technologies is analyzed.

3.1.1 Safety and Low Power

Safety-critical applications have made very limited use of energy and power management features. Non-safety-related embedded applications (e.g., consumer electronics) can shut down or slow down hardware features considering not much more than the impact on the user experience, but safety-critical applications must also carefully consider the impact of those actions on overall system safety. In the latter case, low-power features must comply with safety standard requirements (e.g., IEC 61508) in both: (1) the product life-cycle or functional safety management (to avoid systematic design faults) and (2) techniques and measures to control failures during operation (to control physical random faults).
Due to the explosion of autonomous systems, enabled by big improvements in energy storage technologies (e.g., batteries) or purely motivated by energy budget requirements, power efficiency and power management are also very attractive, cost-competitive features for safety-critical systems. According to the product life-cycle, this safe power management cannot rely on online decisions (dynamic reconfiguration is not recommended for SIL 2-4 integrity levels in IEC 61508, and predictable command execution is also mandatory in other standards, such as in the space domain). This suggests that the adaptation to changing scenarios (e.g., a low-power mode) in safety-critical systems must be addressed with precompiled and verified schemes, as in [31] at the operating system level or in [32] at the network level. Standard power management actions, such as gating actions (peripheral clock or core turn-off), must follow safe shutdown and start-up protocols. In [33], for instance, safe start-up and shutdown scenarios are considered for IEC 61508 compliant hypervisor partitions, but not primarily for power management purposes. Component power management (and temperature) is also closely coupled with component lifetime. A proper (and low) power demand of a specific hardware component can prolong its lifetime and, directly, the intrinsic reliability of the system. In fact, temperature monitoring is suggested as a major diagnostic element when using on-chip redundancy for safety purposes (e.g., IEC 61508-2 Annex E). One can address those requirements with external monitoring components (see section 2.2.3.3) or, more efficiently, using ring oscillators if the target device is an FPGA [36].
3.1.2 Safety and Mixed-criticality

A mixed-criticality approach can also benefit from modular certification. This feature is considered in several domain safety standards under different names: in IEC 61508 each module is named a "compliant item", in ISO 26262 it is called a "safety element out of context (SEooC)" and in EN51019 a "generic product". The modular approach reduces the impact of changes to a subset of the safety case, enabling reusability of its parts [34]. Low-power services must comply with the safety argumentation behind such an approach. For example, the safety-concept approach within the MULTIPARTES [34], PROXIMA [35] and DREAMS [33] EU projects proposed an argumentation for the use of multicores in mixed-criticality systems considering spatial and temporal isolation among partitions mapped to different cores, but the impact of temperature or power was not explicitly analyzed. In fact, this safety-concept approach is an effective way to establish a formal dialogue with a certification authority and to move from purely academic safety-certification analysis towards a rigorous safety argumentation. This early contact with certification authorities identifies possible conflicts w.r.t. certification standards and paves the way for the future industrialization of the technology. In the CONTREX EU project [37], current activities in the area of predictable computing platforms and segregation mechanisms were complemented with techniques considering extra-functional properties such as real-time, power and temperature for safety/certification in mixed-critical systems. In contrast to the SAFEPOWER proposal, while some safety measures were partly considered, no complete safety process was integrated into the overall design flow of the CONTREX project.
3.2 State of the art in security, mixed-criticality and low-power

Classically, safety-critical systems have been considered closed or semi-closed systems with very limited and controlled interactions with their environment. Current embedded systems and, particularly, mixed-criticality nodes, through their non-safety-related parts, are more connected to open networks (e.g., local networks, wireless networks, the cloud). In fact, even the safety standards have started considering the inclusion of security aspects in their life cycle.

3.2.1 Security and Mixed-Criticality

In the mixed-criticality area, there are several hardware and software mechanisms to protect critical parts from non-secure ones. For instance, in software, the same spatial and temporal separation mechanisms used in hypervisors to isolate partitions from design faults can prevent an attacker from accessing safe (now also secure) partitions from the non-safe (or non-secure) partition. The US Government has a Protection Profile for Separation Kernels in Environments Requiring High Robustness [22], commonly known as the Separation Kernel Protection Profile (SKPP). A separation kernel is defined by the SKPP as "hardware and/or firmware and/or software mechanisms whose primary function is to establish, isolate and separate multiple partitions and control information flow between the subjects and exported resources allocated to those partitions" [22]. It has to be proved that there is no unexpected channel for information between domains.
This protection profile specifies the security evaluation criteria so that a given system, if compliant with it, can be certified under the Common Criteria (also called IEC 15408) standard. It has to be mentioned that Common Criteria certification does not assure security; it only establishes whether the declarations and specifications given about the system are true [21]. One of the commercial real-time operating systems compliant with this protection profile is INTEGRITY-178B by Green Hills Software Inc. [23]. This system was used as the baseline to partly implement a software crypto demonstrator in the separation kernel by J. Frid [24], who additionally provides a state of the art on separation kernels from a historical and technical perspective. Similarly, although in hardware, the ARM TrustZone technology [25] is able to separate the execution environment into two different worlds: secure and normal (non-secure). This security feature is achieved by dividing all the hardware and software resources of the system on chip so that they exist in those two worlds. The system is designed in such a way that no secure-world resources can be accessed by normal-world elements, while secure-world resources have access to the non-secure ones. Thus, employing this technology, a single physical core is able to securely and efficiently execute code from both the secure and normal worlds, which removes the need for a further dedicated processor core.

3.2.2 Low-power

Security and low power are coupled in the sense that more secure versions of the same cryptographic algorithm are also more power-hungry. In [27] [28] [29] one can find comparisons of several cipher algorithms and their performance with respect to power consumption.
The power consumption itself can also be a trace through which an attacker obtains information on the encryption algorithm and a way to recover the secret key. In [29], dynamic voltage and frequency scaling ("switching") is used to distort the power consumption trace and thus protect the integrity of the secret key. Another kind of attack could aim to compromise the system by, e.g., requesting it to perform a task that increases consumption and drains the battery, but little bibliography has been found on this track.
4. ZYNQ PLATFORM POWER MANAGEMENT CAPABILITIES

In this section we elaborate on the Zynq platform from Xilinx and the low-power capabilities it provides.

4.1 Overview of low-power features

Figure 3 depicts an overview of the Zynq-7000 SoC. The Zynq SoC comprises a Processing System (PS) part and a Programmable Logic (PL) part, combining the computing power of a dual-core ARM platform (PS) at 866 MHz and the flexibility of an FPGA (PL) on one SoC. The ARM dual core is connected to the peripherals via a central interconnect. On the left side the available interfaces are shown, which can be connected to the pinout of the MPSoC by the I/O Mux. The Application Processor Unit (APU) has a direct interface to a multiport DRAM controller, which can also be accessed via the central interconnect; likewise, the Flash controller as well as the Programmable Logic (FPGA) part of the MPSoC are accessed via the interconnect. According to [46], the following low-power features are supported by the Zynq SoC:

• PL power-off
• Cortex-A9 processor standby mode
• Clock gating for most PS subsystems
• Three PLLs that can be programmed to minimize power consumption
• Subsystem clocks that can be programmed for an optimal clock frequency
• Programmable voltage for I/O banks:
  • MIO: HSTL 1.8V, LVCMOS 1.8V, 2.5V and 3.3V
  • DDR: DDR2 1.8V, DDR3 1.5V and LPDDR2 1.2V
• DDR3 and LPDDR2 low-power mode
• DDR 16- or 32-bit data I/O

In the following, we take a look at each feature and describe how it can be used in the context of power management.
Figure 3: Structural overview of the Xilinx Zynq-7000 family [46]

4.2 Power Rails/Domains

Figure 4 shows the different power domains for the PS and PL of the Zynq-7000 SoC devices. An interesting fact is that the PS and PL have separate power domains, allowing e.g. the PL to be powered down independently of the PS for power-saving purposes.

Figure 4: Power domains of the Zynq-7000 SoC [47]

The detailed description of the individual power pins for the PS and PL parts is given in
Table 1 [46].
Table 1: Detailed description of Zynq power pins [46]

4.3 Clock Control

Figure 5 shows the main components of the PS clock subsystem of the Zynq SoC. As shown, all clocks generated by the PS are derived either from the ARM PLL, the I/O PLL or the DDR PLL, each of which also clocks the corresponding peripherals [46]. The bypass mode and the frequency of each PLL are independently controllable via software instructions. In addition, the PS clock module provides four frequency-programmable clocks (FCLKs) to the PL, which can also be individually controlled to provide different frequencies. Similar to the clock control subsystem, the PS includes a reset subsystem, which likewise provides four individually programmable reset signals to the PL [46].
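Such a clock-generation scheme (a PLL output divided down by a programmable integer divisor per subsystem clock) can be sketched as follows. The 1000 MHz example PLL frequency and the 63-divisor limit are illustrative assumptions, not values taken from this document:

```python
def closest_divisor(pll_mhz, target_mhz, max_div=63):
    """Pick the integer clock-generator divisor whose output frequency
    (pll_mhz / divisor) best approximates the requested target frequency.
    Running a subsystem at a lower clock reduces its dynamic power
    roughly in proportion to the frequency."""
    best = min(range(1, max_div + 1),
               key=lambda d: abs(pll_mhz / d - target_mhz))
    return best, pll_mhz / best

# Example: from a 1000 MHz PLL, a divisor of 10 yields exactly 100 MHz.
print(closest_divisor(1000.0, 100.0))   # -> (10, 100.0)
```

On real hardware the divisor would then be written to the corresponding clock-control register; here only the frequency selection logic is shown.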
Figure 5: Xilinx Zynq PS clock subsystem [46]

4.4 Processor System (PS): Power Management

In the following, a brief description of the power management techniques available at the PS level of the Zynq board is given.

4.4.1 PS Peripherals Clock Gating

As already mentioned, several clock domains are supported in the PS, each with independent clock-gating control. During runtime, unused clock domains can be shut down, and clocks for PS peripherals such as timers, DMA, SPI, QSPI, SDIO and the DDR controller can be gated to reduce dynamic power dissipation [48].

4.4.2 Caches

Concerning the caches on the PS part, the L2 cache controller supports dynamic clock gating and standby mode. With dynamic clock gating, the L2 controller clock stops after a number of delay cycles (32 cycles) once the ARM controller becomes idle. Similarly, in standby mode the L2 cache controller's internal clocks are also stopped, but only in a specific processor state, the Wait-For-Interrupt (WFI) state [46], while processor and RAM power are maintained. By applying dynamic clock gating in the WFI state, the power is then only influenced by the leakage currents and the clocking of the small logic responsible for the wake-up condition [48].
4.4.3 On-Chip Memory (OCM)

The OCM, a low-latency memory with 256 KB of RAM and 128 KB of ROM (boot ROM), can be used to reduce the overall power of the system. When the DDR is in a low-power mode, the OCM can store executable code, which clearly reduces power consumption due to the minimal power dissipation of the OCM [48].

4.4.4 Snoop Control Unit (SCU)

The SCU block manages the data coherency between the two ARM CPUs and the L2 cache. The SCU internal clocks can be stopped if the standby state is enabled, the CPUs are in WFI mode, no requests are pending from the Accelerator Coherency Port (ACP) and no remaining activity is expected in the SCU. If one of these conditions is no longer satisfied, the SCU resumes its normal operation [48].

4.4.5 PLL

In general, the clocks in the PS can be dynamically slowed down or gated off to reduce power. According to [46], the PLL power consumption depends on the PLL output frequency; thus power consumption can be reduced by using a lower PLL output frequency. Power can also be reduced by powering down unused PLLs. For example, if all clock generators can be driven by the DDR PLL, then the ARM core and I/O PLLs can be disabled to reduce power consumption. The DDR PLL is the only unit that can drive all of the clock generators. Each clock can be individually disabled when not in use. In some cases, individual subsystems contain additional clock disable capabilities and other power management features.

4.4.6 Physical Memory

Zynq-7000 AP SoCs [46] support different types of physical memory, such as DDR2, DDR3 and LPDDR2. Minimizing DDR power consumption has a great impact on the overall system power.
In order to realize this, the following aspects (listed in [46]) should be taken into consideration:

• the DDR controller operating speed,
• the choice of DDR width and whether ECC is enabled or disabled,
• the number of DDR chips used,
• the DDR type, such as using LPDDR for significant voltage reductions,
• the use of different DDR modes during low-power operation, such as the DDR self-refresh mode.

4.4.7 Firmware support

The Zynq Linux kernel supports the following power management states [48]:

• Frequency scaling: The Linux kernel utilizes the cpufreq framework to scale the CPU frequency [47],
• Low-power idle (CPUidle): This is a low-power state entered when the CPU is idle. CPUidle drivers are used to manage the CPU idle levels (setting the CPU to a low-power state when it is in the WFI state),
• Suspend power management states, which are used to enter sleep states like the well-known suspend-to-disk/RAM on laptops, supporting three different states [48]:
  • S0: Freeze or Suspend-To-Idle. This is a generic, pure-software, lightweight system sleep state. It allows more energy saving than the low-power idle state by freezing the user space and putting all I/O devices into low-power states, such that the processors can spend more time in their idle states.
  • S1: Standby or power-on suspend. If supported, in this state all processor caches are flushed and instruction execution stops. Power to the processor and RAM is maintained. In addition to freezing user space and putting all I/O devices into low-power states, as done for Suspend-To-Idle, non-boot CPUs are taken offline and all low-level system functions are suspended during transitions into this state. For this reason it should allow more energy to be saved than Suspend-To-Idle, but the resume latency will generally be greater.
  • S3: Suspend-to-RAM. In this state, if supported, significant energy savings can be achieved, as every component in the system is put into a low-power state except for memory. System and device state is saved to memory. All devices are suspended and powered off. RAM remains powered and is put into self-refresh mode to retain its contents.
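On a Linux system with cpufreq enabled, frequency scaling is exposed through sysfs. The following minimal sketch uses the standard cpufreq sysfs paths; the helper function name is our own and is not part of any Xilinx BSP:

```python
# Standard cpufreq sysfs directory for the first CPU (Linux-specific).
CPUFREQ_DIR = "/sys/devices/system/cpu/cpu0/cpufreq"

def pick_lowest_khz(available):
    """Parse the space-separated kHz list read from
    scaling_available_frequencies and return the lowest frequency as a
    string, ready to be written to scaling_setspeed (which requires the
    'userspace' governor to be active)."""
    return str(min(int(f) for f in available.split()))

# Usage sketch (needs root privileges and cpufreq support; not run here):
# with open(CPUFREQ_DIR + "/scaling_available_frequencies") as f:
#     lowest = pick_lowest_khz(f.read())
# with open(CPUFREQ_DIR + "/scaling_setspeed", "w") as f:
#     f.write(lowest)

print(pick_lowest_khz("666667 333334 222223"))   # -> 222223
```

Alternatively, the `powersave` governor achieves the same effect automatically by always selecting the lowest available frequency.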
In addition, frameworks supporting hardware monitoring are also available in the Linux kernel [47]:

• XADC for temperature and voltage monitoring,
• UCD9248 for voltage and current monitoring of PWM controllers on Xilinx platforms like the ZC702,
• UCD90120 for the power supply sequencer and monitors used on Xilinx platforms like the ZC702.

4.5 Programmable Logic (PL): Power Management

The following power-management features are possible for the PL part of the Zynq [47]:

• Logic resource utilization: minimizing logic resource utilization contributes positively to power reduction.
• Managing control sets¹: avoiding the use of both a set and a reset on a register or latch, and using active-high control signals, have proven to be efficient in terms of power reduction. In addition, reducing the use of sets and resets improves device utilization, resulting in reduced power.

¹ Control sets are signals that control synchronous elements such as clock, set, reset and clock enable.
• PL frequency scaling: it is possible to reduce the frequency of the logic part if it does not need to run at full speed all the time.
• Clock gating: as the dynamic power consumption of the PL is mainly driven by the clock frequency (fclk), gating the local clock-enable or data paths stops the switching activity and eliminates unnecessary toggling when the results of these paths are not used, which in turn helps to reduce power.
• BRAM: to save power, the block RAM enable can be driven low when the block RAM is not used.
• PL data retention: after gating the PL clocks, the voltage of the PL can be reduced to a retention level (V_DRINT = 0.75 V), reducing the static power consumption. Below this retention voltage the configuration data might be lost.
• PL power-down control: the power to the PL can be completely shut down to reduce power consumption, which is possible due to the independent power supplies of the PL and PS. The PL power supplies that can be turned off are VCCINT, VCCAUX, VCCBRAM and VCCO. When using this technique the configuration of the PL is lost and reconfiguration is needed.

4.6 Monitoring

For monitoring the performance of the Zynq SoC, on-chip performance counters are supported which monitor single components and which can be used to estimate the power consumption of the system. In addition, dedicated sensors can be connected to the SoC after finishing the implementation for a more accurate power consumption measurement.

4.6.1 Performance Counters

Several performance counters are supported on the Zynq SoC to monitor the system components [48]:

• SCU Global Timer (PS). The SCU global timer can be used to timestamp system events in a single clock domain. Alternatively, operating systems often provide high-accuracy timers for software event tracing, such as Linux clock_nanosleep.
• ARM Performance Monitoring Units (PS).
Each ARM core has a performance monitoring unit (PMU) that is used to count micro-architectural events. These counters can be accessed directly by software, through operating system utilities, or with tools such as Linux perf or ARM Streamline.
• L2 Cache Event Counters (PS). The L2 cache has event counters that can be accessed to measure cache performance.
• GigE Controller (PS). The gigabit Ethernet controller has statistical counters that track bytes received and transmitted on its interface.
• AXI Performance Monitor (PL). This core can be added in the PL to monitor AXI performance metrics such as throughput and latency. Trace functions enable time-stamped AXI traces, such as time-stamped starts and ends of AXI transactions, to observe per-transaction latency.
• AXI Timer (PL). This core can be added in the PL to provide a free-running timer, useful for time-stamping events in PL clock domains.
• AXI Traffic Generator (PL). This core can generate a variety of traffic patterns to the PS interfaces. When used with an AXI Performance Monitor, the traffic generator can help provide early system-level performance estimates. The core can be used to estimate data-movement costs and validate design partitioning choices.

These performance counters can be used to construct analytical power/temperature models (see [39]) that give a rough estimate of the power consumption of the overall SoC.

4.6.2 Monitoring – Power and Temperature Sensors
4.6.2.1 XADC Monitoring
The Xilinx analog-to-digital converter (XADC) can be used for monitoring applications on the Zynq SoC. When instantiated in the PL, the PS can establish connectivity to the XADC via an XADC AXI4-Lite interface (see Figure 6). Since the XADC has power supply and temperature sensors, control information of the PS can be monitored. Each of these sensors can be configured with minimum/maximum thresholds; when a threshold is violated at runtime, an alarm signal is issued [46]. The XADC can be configured via an industrial Linux driver (provided by Xilinx) to ease the monitoring process for the end user [49].

4.6.2.2 ZC702 Power Measurement using TI Fusion Power Designer
Several mature approaches exist to measure the power consumption of Zynq-7000 SoC instances via external boards (sensors). One example is the ZC702 board, where a very accurate power measurement can be established using the Texas Instruments (TI) Fusion Power Designer. The main measurement concept relies on the ability to continuously measure and monitor the three digital power controllers (UCD9248) available on the ZC702 board.
For a detailed setup description with a demo application, refer to [47].
Figure 6: XADC in PL for Monitoring PS [46]
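As noted above, the performance counters can feed an analytical power model. A common form is a linear model whose inputs are event rates over a sampling window. The sketch below is illustrative only: the event set and weights are uncalibrated examples, and a real model would have to be fitted against measurements (e.g. from the UCD9248 controllers):

```c
/* Sketch of a linear power model driven by performance-counter deltas:
 *   P = P_static + sum_i(w_i * r_i)
 * where r_i is the rate of event i over the sampling window.
 * The events and weights below are illustrative, not calibrated values. */

enum { EV_CPU_CYCLES, EV_L2_MISSES, EV_AXI_BEATS, EV_COUNT };

typedef struct {
    double p_static_w;        /* static (leakage) power in watts */
    double weight[EV_COUNT];  /* watts per (event/second) */
} power_model;

/* counters: raw event counts accumulated over 'window_s' seconds */
double estimate_power_w(const power_model *m,
                        const unsigned long long counters[EV_COUNT],
                        double window_s)
{
    double p = m->p_static_w;
    for (int i = 0; i < EV_COUNT; i++)
        p += m->weight[i] * ((double)counters[i] / window_s);
    return p;
}
```

Such a model gives only a rough runtime estimate; the external sensors described above remain the reference for accurate measurement.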
4.7 Summary of Low-Power Features
In the context of SAFEPOWER, the Zynq provides rich and promising mechanisms for run-time power management as well as resource management. Regarding PS power management, one of these capabilities is dynamic frequency scaling (DFS). This feature of the processing system (PS) allows the frequency of the ARM dual core or of the programmable logic to be controlled. Together with an external voltage regulator that the operating system can configure on the fly, a system with dynamic voltage and frequency scaling (DVFS) can be realized. The same method can be used for the external DDR RAM or the peripherals. It is also possible to shut down unused clocks to save power. If asymmetric processing of the ARM cores is used, CPU hotplug becomes possible: the secondary processor core can be brought up or shut down depending on whether it is needed at the time (Vdd and clk for that processor set to zero). A core can be woken up by an interrupt, for example. Regarding resource management, a Linux kernel can be used on the ARM cores, offering the Linux low-power services. On the PL side, clock gating and reset per component are possible. In addition, the whole PL can be shut down (via PL power-down control), with the penalty of reconfiguration latency. The following table summarizes the basic foundations:

Table 2: Summary of Zynq low-power features

Static:
• Reduce the number of PLLs to a minimum
• Find the lowest possible operating frequency and supply voltage
• Shut down (power gate) unused PS components
• DDR memory selection (as part of the PCB design)
• FPGA low-power design and synthesis rules/constraints

Dynamic:
• DVFS of the PS ARM cores
• Shutdown of one of the PS ARM cores
• DFS of PL softcores (e.g. MicroBlaze)
• Clock gating of PS components
• Clock gating of PL softcores (e.g. MicroBlaze)
• Logic enable/disable of PL components
• PL data retention or power shutdown

Advanced (hibernate):
• Freeze or low-power idle state
• Standby or power-on suspend
• Suspend-to-RAM

Monitoring:
• Performance counters
• XADC on-chip power and temperature monitor
• External (PCB) power sensors

4.8 Outlook to Advanced Features
In the context of SAFEPOWER, the Zynq SoC provides architectural services that form a stable foundation for power-aware CRTES. Building on these low-level architectural services, the following advanced services should be enabled:
1. Power-aware adaptive execution service for CRTES: For the sharing of processor cores among mixed-criticality applications, including safety-critical ones, hypervisors (e.g., XtratuM) will be used, which ensure time/space partitioning as
well as power/energy/temperature partitioning for the computational resources. The scheduling of computational resources (e.g., processor, memory) in SAFEPOWER will ensure not only that each task obtains a predefined portion of the computation and energy resources, but also that execution occurs at the right time and with a high level of temporal predictability. The execution services of SAFEPOWER will support switching between different schedules to react to dynamically changing computational load and varying power/energy/temperature constraints. This dynamic adaptation will be tightly integrated with the underlying low-power mechanisms of the hardware (e.g., DVFS, power gating, power monitoring). The execution environments will be amenable to relevant safety standards and worst-case execution time analysis.
2. Power-aware adaptive communication service for CRTES: SAFEPOWER will provide services for low-power message-based real-time communication among components. Based on an intelligent communication system with a priori knowledge about the allowed behaviour of components in the value and time domains, SAFEPOWER will ensure partitioning with respect to time, space, power, energy and temperature. The configuration of the communication system (e.g., network interfaces, routers) will be dynamically adjustable based on varying load conditions and resource availability (e.g., remaining energy) in order to enable clock scaling and power gating in the communication system as well as the reconfiguration of the application. The shared memory model will be supported on top of message-based NoCs.
3. Power, energy and temperature extensions of health monitors: SAFEPOWER will support health monitors for faults such as overuse of shared resources, violations of power/energy/temperature constraints and deadline overruns.
The health monitoring service will use error detection services in the execution and communication services as well as power/temperature measurement techniques in the hardware. Both automatic reactions of the platform (e.g., automatic reconfiguration) and the notification of applications for user-defined reactions will be supported.
4. Power, energy and temperature adaptation services for CRTES: These services will change the scheduling and allocation of communication and computational resources at runtime in order to exploit the potential for low power, energy and temperature awareness, while at the same time considering the requirements of certification, real-time constraints and time/space partitioning. In particular, solutions for the dynamic reconfiguration of time-triggered execution and communication systems will be provided. Branching points in time-triggered schedules will be analysed and optimized at design time, thereby obtaining temporal guarantees and maintaining the benefits of time-triggered control for CRTES, such as temporal predictability and implicit synchronization of resource accesses and service executions.
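Services 1 and 4 both rely on switching between precomputed time-triggered schedules at branching points analysed offline. The following is a minimal sketch of such a decision, assuming a hypothetical energy-budget monitor and hysteresis thresholds (none of these names are defined by the deliverable):

```c
/* Illustrative sketch of branching-point schedule switching: two
 * precomputed time-triggered schedules (nominal and low-power) are
 * analysed at design time; at runtime the scheduler may switch only at
 * a branching point (e.g. a MAF boundary), preserving temporal
 * guarantees. Thresholds and the energy monitor are hypothetical. */

typedef enum { SCHED_NOMINAL = 0, SCHED_LOWPOWER = 1 } sched_id;

/* Decide which schedule to run for the next major frame based on the
 * monitored energy budget; the band between the two thresholds gives
 * hysteresis, avoiding oscillation between the two plans. */
sched_id next_schedule(sched_id current, double energy_budget_j,
                       double low_threshold_j, double high_threshold_j)
{
    if (energy_budget_j < low_threshold_j)
        return SCHED_LOWPOWER;   /* degrade to the low-power plan */
    if (energy_budget_j > high_threshold_j)
        return SCHED_NOMINAL;    /* enough energy: full service */
    return current;              /* inside the band: keep current plan */
}
```

The key property preserved by this pattern is that both plans were verified offline, so every runtime decision selects a schedule with known temporal guarantees.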
5. LOW-POWER SERVICES
Here the architectural low-power services needed for the generic platform are defined, taking into account hardware-level power management techniques, e.g. start/stop of any low-power technique, setting the system to sleep, fault tolerance, communications, diagnostics, etc.

5.1 Low-power services of the hypervisor/OS
5.1.1 Monitoring services
The hypervisor will provide services for monitoring the registers of the power/temperature measurements. Partitions can invoke these services to analyse the evolution and detect abnormal situations. A high-level health monitor can be in charge of the periodic data acquisition, analysis and decisions about the system behaviour. For instance:
• Service: <Power/temperature monitor>
• Returns a data structure with the measurements.
• Comment: only partitions with system rights can succeed in invoking the service.

5.1.2 Idle management
Processor idle time can result from early completion of partition execution or from slack times in the scheduling plan of the partitions. Concerning slack times in the scheduling plan, the hypervisor schedules partitions under a cyclic schedule in each core (see Figure 7). The cyclic schedule defines a set of temporal windows (slots) in a temporal frame (Major Frame, or MAF), specifying the following parameters:
• offset of the slot with respect to the MAF origin (start time of the slot)
• slot duration
• partition identifier
The cyclic schedule is static and is generated from the requirements of the applications. As a result of the analysis, the schedule plan for each core can define empty slots that correspond to idle time.
Figure 7: Example of schedule plan for 1 core
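The slot parameters above can be captured directly in a lookup structure. The following sketch shows how a hypervisor could locate the active slot, or detect an idle slot, for a time instant within the MAF; the names and the idle-marker convention are illustrative, not XtratuM's API:

```c
/* Minimal model of the cyclic schedule: a MAF contains a list of slots
 * (offset, duration, partition id); gaps between slots are idle time.
 * IDLE_PARTITION and the lookup function are illustrative names. */

#define IDLE_PARTITION (-1)

typedef struct {
    unsigned offset;    /* start time relative to the MAF origin */
    unsigned duration;  /* slot length */
    int      partition; /* partition identifier */
} slot;

/* Return the partition scheduled at time t (taken modulo the MAF), or
 * IDLE_PARTITION if t falls into an empty slot, where the hypervisor
 * would apply its configured idle policy. */
int partition_at(const slot *plan, unsigned nslots,
                 unsigned maf, unsigned t)
{
    t %= maf;
    for (unsigned i = 0; i < nslots; i++)
        if (t >= plan[i].offset && t < plan[i].offset + plan[i].duration)
            return plan[i].partition;
    return IDLE_PARTITION;
}
```

For example, with a MAF of 40 time units and slots {0,10,P1} and {20,10,P2}, any instant in [10,20) or [30,40) is idle time where the idle policy of the next subsection applies.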
The behaviour of the hypervisor when the next slot to be executed is an idle slot should be specified. This behaviour could be the same or different for each core. As an initial decision, we assume that the behaviour will be the same for all cores. The hypervisor can define several behaviours for idle management that can be selected when the hypervisor is configured and compiled. The default hypervisor behaviour when an idle slot is found in the schedule could be:
1. Do not modify the core behaviour.
2. Use a default low-power consumption mode of the core.
3. Select a low-power consumption mode of the core based on the overhead of the core mode and the idle time available.
4. Use a default operational frequency (e.g. the lowest frequency available).
In addition to the default behaviour of the hypervisor, a partition with system rights could modify the behaviour of the core during execution to accommodate the energy conditions. For this case, a hypervisor service to set the behaviour at execution time is defined.
• Service: <Set idle time behaviour>
• Returns nothing.
• Comment: only partitions with system rights can succeed in invoking the service.

5.1.3 Management of the end of partition activity
A second kind of idle management can occur when a partition finishes its internal activities prior to the end of the slot. When the schedule is generated, a partition slot is allocated according to the temporal needs of the partition. These temporal needs can arise from several internal periodic tasks that are analysed with respect to their worst-case execution time (WCET). So, the result of the scheduling is a slot that can contain several task instantiations at their WCET.
Figure 8: Execution of a slot
In Figure 8 a possible execution of partition P1 involving 3 tasks is shown. The slot has been defined according to the WCET of each task.
In an actual execution of the slot, the tasks may finish their activity in less time than the expected WCET. This means that at the end of the slot activities there is remaining slot time at partition level. In that case, if the guestOS informs the hypervisor of the end of the slot activity, the hypervisor can apply the policy for the idle time.
• Service: <End of the partition activity in the slot>
• Returns nothing.
• Comment: all partitions can invoke this service.

5.1.4 Adaptation of the processor frequency to the partition needs
The off-line scheduling analysis of the system has to generate the cyclic schedule for all cores in the system. This cyclic schedule for each core can be generated according to the temporal constraints but also w.r.t. the energy constraints. Taking into account the optimizations that the schedule generation can produce, the partitions are allocated to temporal slots running at a specified frequency. The execution time of a partition in a slot will, of course, vary depending on the selected CPU frequency. To cover this functionality, the slot definition is extended with the minimal frequency at which the partition should be executed. So, the new slot definition is:
• offset of the slot with respect to the MAF origin (start time of the slot)
• slot duration
• partition identifier
• minimal frequency
When the slot is executed on a core, the hypervisor will adapt the core frequency to the minimal frequency specified in the configuration file. The partition may increase or decrease the frequency in each slot execution, but may neither decrease it below the defined minimum frequency nor increase it above the maximum frequency allowed in the configuration file. The former avoids compromising the deadline constraints of the tasks, and the latter limits the power consumption of the partition. A service is defined for the partition to change the frequency during the slot execution.
• Service: <change the processor frequency during this slot execution>
• Returns nothing.
• Comment: all partitions can invoke this service.
• Note: the frequency change only affects the current slot execution.
Next partition activations will be executed at the frequency specified in the configuration file.

5.1.5 Device management
Devices are handled by the guest OSs and not at the hypervisor level. It is the responsibility of the guestOS to put a device in the appropriate mode at the end of the slot activities. The hypervisor could offer services to sleep/wake up the devices, but the guestOS decides when these actions should be performed.
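The frequency-change service of Section 5.1.4 essentially clamps a requested frequency between the slot's analysed minimum (protecting deadlines) and the system-wide maximum from the configuration file (limiting power). A minimal sketch, with hypothetical names and units:

```c
/* Sketch of the <change the processor frequency during this slot
 * execution> service: the requested frequency is clamped between the
 * slot's configured minimum and the system maximum. The name and Hz
 * units are illustrative, not the hypervisor's actual API. */
unsigned set_slot_frequency(unsigned requested_hz,
                            unsigned slot_min_hz,
                            unsigned system_max_hz)
{
    if (requested_hz < slot_min_hz)
        return slot_min_hz;    /* never below the analysed minimum */
    if (requested_hz > system_max_hz)
        return system_max_hz;  /* never above the configured maximum */
    return requested_hz;       /* applies to the current slot only */
}
```

The clamped value would be applied only for the remainder of the current slot; on the next activation the hypervisor restores the frequency from the configuration file, as noted above.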
5.1.6 Coordination of OS and Hypervisor
The guestOS and the hypervisor should coordinate their activities to cooperate in the energy management of the system. This coordination involves the previously defined services that the guestOS invokes on the hypervisor.

5.1.7 System information
In a partitioned system, only partitions with system rights are allowed by the hypervisor to get information or take actions related to the system or to other partitions. This is a security requirement enforced by the hypervisor. As a consequence, only partitions with system rights can access system information. Partition rights are defined by the system integrator and are statically defined in the configuration file. A partition with system rights can invoke the get-system-information service to learn the system status. The system status is a data structure that contains several fields related to the system execution, including the performance registers that are used.
• Service: <Get system information>
• Returns a data structure with the system status.
• Comment: only partitions with system rights can succeed in invoking the service.
• Note: this service currently exists in XtratuM, but a redefinition of the data structure to include the new information is needed.

5.2 Generation of low-power NoCs with NoC MPSoC system generators
5.2.1 State of the art
There has been a multitude of studies on how to make power-efficient on- and off-chip networks, but they mainly focus on implementation details and the power behaviour of the NoC itself, which is of limited interest to this study. Then there are those that aim to generate the NoC itself. These pure NoC generators typically produce VHDL and/or Verilog code for a certain type of NoC, with a set of parameters to modify its settings, together with a network interface so it can be operated from a test bench.
However, in general they do not provide support for integration into a working MPSoC system. The third category of providers is those that generate an entire MPSoC system, but are less flexible when it comes to generating different types of NoCs. Instead, they focus on reducing the effort of integrating the NoC into the system and on generating a working HW/SW MPSoC system. This is of high interest to this study, not only because it raises the TRL considerably, but also because it lets us explore the predictability, and thereby the safety, of the final system. The fourth category of providers is the commercial ones, with ready-made NoC chip solutions, ready to be integrated as an add-on to some FPGA boards. However, since these systems come with a fixed notion of NoC and memory hierarchy, they are of less use, as they were not designed with predictability and programmability using models of computation in mind. In the coming subsections, we will go through SoA research that is relevant to this study.
5.2.2 Low-Power Aspects
Reducing power when implementing NoC structures is fairly straightforward: there are only two parameters to play with, area and frequency. Since power is proportional to the switched capacitance, which in turn is proportional to the area, the rationale behind minimizing area is that the smaller the design, the lower the power consumption will be. This argues for implementing the NoC structures in a bit-serial manner. However, for constant throughput, the frequency must then be increased by the same factor as the area was reduced, gaining very little in the end. A gain is only obtained if the channel is silent for long periods of time. The other option is to reduce the switching frequency, either the operating/clock frequency of the switches/routers or the frequency of the data traffic in the network. Since it is hard to reduce the clock frequency in the switch network, asynchronous communication between the switch/router nodes has been suggested [57]. The rationale behind this is that the network should only switch and consume power when it has something to switch. However, asynchronous NoCs are difficult to make predictable, and how to start sending once the reset signal is globally released is a genuinely hard problem. Thus, most designs continue to be synchronous implementations. For a comprehensive overview of methods to reduce power in NoCs, we refer the reader to [56].

5.2.3 Open Source NoC Generators
Many NoC structures have been suggested over the years since the first paper with the word NoC in the title was published in the year 2000 [58]. A few of those have even been released as open source, but mainly as an aid for research. The more interesting of these are the ones that come with a generator, i.e., a method or program that allows the user to configure the NoC according to his or her needs.
For instance, the CONNECT tool from Carnegie Mellon University [59][60] can generate Verilog code for various NoC implementations, but the code is released under copyright and cannot be reused by anyone else to create a commercial product. The Atlas framework, developed by the GAPH group at PUCRS in Brazil [61][62], can produce different NoC topologies and generate synthesizable VHDL files. Another example is Netmaker from the University of Cambridge, a library of fully-synthesizable parameterized Network-on-Chip (NoC) implementations released under the GPL license, so it cannot be used for commercial purposes either. Others generate only parts of the NoC: for instance [64], from Stanford, presents a parameterized RTL implementation of a state-of-the-art VC router, while HNOCs targets the simulation of heterogeneous NoCs.

5.2.4 NoC MPSoC System Generators
The most interesting NoC generators are those that come with a complete design flow, which lets designers compose entire MPSoC systems, including the SW stack and device drivers. These generators have in common that they use an XML description to specify the MPSoC system. The XML is then used as input to a generator program that produces the actual implementation. They are typically limited to a few NoC types and topologies, but focus instead on ease of use and on producing a working design that is correct by construction. The two most prominent ones are the NoC System Generator from KTH in Sweden
[50][51][55][66] and the CompSoC platform from Eindhoven and Delft in the Netherlands [67]. The KTH System Generator is a tool suite using the Nostrum NoC from KTH. It has a GUI as a frontend for entering the MPSoC system, and uses MoCs inspired by the ForSyDe methodology. The tool suite is also FPGA-vendor agnostic: it generates an image of its internal representation of the system in the target FPGA vendor's own frontend language, i.e., it generates sopc or qsys files for Altera, and mhs and Vivado tcl scripts for Xilinx implementations. The CompSoC platform is centered around the Aethereal NoC and targets Xilinx technology. It also has hooks for importing designs generated by the ForSyDe methodology.

5.3 Utilization of low-power services at higher abstraction levels
In order to utilize the techniques presented in this deliverable, it is of key importance that the properties and services of the architecture can be used at higher levels of abstraction in the design flow. Designing a power-efficient mixed-criticality system, where several applications share the same platform, is extremely challenging. Thus the design process should start at a high level of abstraction and needs to be supported by tools for design space exploration (DSE) and synthesis. Given a set of application models with individual design constraints, a set of global constraints, and a platform model, the objective of the DSE activity is to find an efficient implementation of all applications on the shared platform that satisfies all individual and global design constraints. In the context of mixed-criticality systems it is of utmost importance that the DSE process can guarantee that all constraints in terms of timing and power will be met by the proposed implementation.
The techniques discussed in this deliverable and the research within the SAFEPOWER project are key prerequisites for a DSE tool because of the QoS guarantees that can be provided by the platform. The DSE tool developed partially in the CONTREX project and presented in [52] formulates the DSE problem as a constraint satisfaction problem, captures applications as synchronous data flow graphs [54], and uses a predictable MPSoC platform with a TDM bus. A solution not only gives a mapping of SDF actors to processing elements, but also generates the schedule for the set of actors on each processing element and schedules the communication on the TDM bus. However, the tool focuses only on timing guarantees and does not take power into account. The analytical DSE tool [52] has been combined with a simulation-based DSE tool in [53] into a joint analytical and simulation-based DSE tool in order to analyse typical scenarios, in addition to the worst case, through simulation. The simulation tool can also be equipped with power models to give the power consumption for a typical scenario. However, there is still a lack of good power models at higher levels of abstraction, whereas good timing models for predictable platforms are already available, which so far makes it difficult for DSE tools to give absolute power guarantees. The NoC system generator [55] is a tool that can automatically generate both the hardware, in the form of a configurable NoC and configurable tiles, and the software for Altera and Xilinx FPGAs. The tool currently uses a heart-beat model [50] to support applications modelled with a synchronous model of computation, and can generate implementations from Simulink models [51]. Inside the SAFEPOWER project, the NoC system generator will be extended to support techniques for low-power predictable NoCs. Furthermore, an integration of a DSE tool into the NoC system generator would facilitate system design, because then the designer can