In modern systems, managing application and hardware complexity tends to conflict with balancing resources and power efficiently. Dedicated, power-aware design frameworks are therefore necessary to implement efficient multi-functional runtime-reconfigurable signal processing platforms. In this work, we adopt dataflow specifications as the starting point for tackling power minimization.
Power-Awarness in Coarse-Grained Reconfigurable Designs: a Dataflow Based Strategy
1. Francesca Palumbo
Università degli Studi di Sassari
PolComIng – Information Engineering Unit
2014 IEEE International Workshop on Signal Processing Systems
October 20 – 22 2014, Belfast, UK
and Luigi Raffo
DIEE – Dept. of Electrical and Electronics Eng.
EOLAB - Microelectronics and Bioeng. Lab.
2. OUTLINE
• Introduction
– Problem statement
– Background
– The power issue
• Automatic Power-Awareness Strategies
– Baseline Multi-Dataflow Composer
– Static Power: Structural Optimization
– Dynamic Power: Behavior Optimization
• Performance Assessment
– Design Under Test
– Structural Evaluation
– Behavior Evaluation
• Final Remarks and Future Directions
SiPS 2014 - 2014 October 22nd - Belfast (United Kingdom) - Carlo Sau
4. PROBLEM STATEMENT
CONSUMER NEEDS:
• HIGH-PERFORMANCE real-time applications:
– Media players, video calling...
• UP-TO-DATE SOLUTIONS
– Support for the latest audio/video codecs, file formats...
• MORE INTEGRATED FEATURES in mobile devices:
– MP3, camera, video, GPS...
• LONG BATTERY LIFE
– Convenient form factor, affordable price...
POSSIBLE SOLUTION:
• DATAFLOW MODEL OF COMPUTATION
– Modularity and parallelism: EASIER INTEGRATION AND FAVOURED RE-USABILITY
• COARSE-GRAINED RECONFIGURABILITY
– Flexibility and resource sharing: MULTI-APPLICATION PORTABLE DEVICES
6. BACKGROUND
DATAFLOW FORMALISM
• Directed graph of actors (functional units).
• Actors exchange tokens (data packets) through dedicated channels.
CHARACTERISTICS
• Makes the intrinsic application parallelism explicit.
• Modularity favours long-term model adaptivity.
[Figure: dataflow graph with actors A, B, C and D; an actor encapsulates actions and state]
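The dataflow formalism above can be illustrated with a minimal Python sketch. It is not taken from the slides: the `Channel`, `Actor` and `run` names, the firing rule (an actor fires when every input channel holds a token) and the two example actors are assumptions chosen for illustration, and actor state is omitted for brevity.

```python
from collections import deque


class Channel:
    """Dedicated FIFO channel carrying tokens between two actors."""

    def __init__(self):
        self.tokens = deque()


class Actor:
    """Functional unit; fires when every input channel holds a token (assumed rule)."""

    def __init__(self, name, action, inputs, outputs):
        self.name = name
        self.action = action
        self.inputs = inputs
        self.outputs = outputs

    def can_fire(self):
        # Actors without inputs never fire in this sketch, so the loop terminates.
        return bool(self.inputs) and all(ch.tokens for ch in self.inputs)

    def fire(self):
        # Consume one token per input, run the action, emit the result on every output.
        args = [ch.tokens.popleft() for ch in self.inputs]
        result = self.action(*args)
        for ch in self.outputs:
            ch.tokens.append(result)


def run(actors):
    """Keep firing actors until none of them is enabled."""
    fired = True
    while fired:
        fired = False
        for actor in actors:
            if actor.can_fire():
                actor.fire()
                fired = True


# Hypothetical two-actor chain: src -> double -> mid -> increment -> out
src, mid, out = Channel(), Channel(), Channel()
src.tokens.extend([1, 2, 3])
double = Actor("double", lambda x: 2 * x, [src], [mid])
increment = Actor("increment", lambda x: x + 1, [mid], [out])
run([double, increment])
result = list(out.tokens)
print(result)
```

Because each actor only touches its own channels, actors can be swapped or reused without changing the rest of the graph, which is the modularity the slide refers to.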
8. BACKGROUND
FINE-GRAINED (FG) RECONFIGURATION
• High flexibility: bit-level reconfiguration
• Slow and memory-expensive configuration phase
• Suitable for applications with high control flow
COARSE-GRAINED (CG) RECONFIGURATION
• Medium flexibility: word-level reconfiguration
• Fast configuration phase
• Suitable for applications with a high level of instruction/data parallelism
[Figure: FG (bit-level) versus CG (word-level) compared on flexibility, reconfiguration speed and configuration storage]
[Figure: performance versus flexibility of ASIC, GPP, DSP and reconfigurable designs]
23. STATIC POWER MANAGEMENT: STRUCTURAL OPTIMIZATION
[Figure: two input networks, (A, B, D, C) and (E, B, F), and the merged datapath (A, B, D, C, E, F) with SBoxes S0 and S1.]
Static power per block:
• Separate networks: A 12 μW, B 3 μW, D 8 μW, C 6 μW, E 15 μW, B 3 μW, F 5 μW = 52 μW
• Merged datapath: A 12 μW, B 3 μW, D 8 μW, C 6 μW, E 15 μW, F 5 μW, S0 2 μW, S1 2 μW = 53 μW
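The two totals on this slide can be reproduced by summing the per-block figures. This is a minimal sketch using only the numbers shown on the slide; the names `ACTOR_POWER_UW`, `SBOX_POWER_UW` and `static_power_uw` are introduced here for illustration.

```python
# Per-block static power from the slide, in microwatts.
ACTOR_POWER_UW = {"A": 12, "B": 3, "C": 6, "D": 8, "E": 15, "F": 5}
SBOX_POWER_UW = 2  # each SBox (S0, S1) is annotated with 2 uW


def static_power_uw(actor_instances, n_sboxes=0):
    """Sum the static power of every instantiated actor and SBox."""
    actors = sum(ACTOR_POWER_UW[name] for name in actor_instances)
    return actors + n_sboxes * SBOX_POWER_UW


# Two separate networks: actor B is instantiated twice.
separate = static_power_uw(["A", "B", "D", "C"]) + static_power_uw(["E", "B", "F"])

# Merged datapath: B is instantiated once, but two SBoxes are added.
merged = static_power_uw(["A", "B", "D", "C", "E", "F"], n_sboxes=2)

print(separate, merged)  # 52 53
```

With these figures, sharing B saves 3 μW while the two SBoxes cost 4 μW, so merging is not automatically a static power gain.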
30. STATIC POWER MANAGEMENT: STRUCTURAL OPTIMIZATION
[Figure: three input networks, α, β and γ, built from actors A, B, C and D, merged in two different orders.]
• Merging order αγβ: SBoxes S0, S1, S2; longest chain 2 SBoxes (S1, S2)
• Merging order βαγ: SBoxes S0 to S5; longest chain 4 SBoxes (S0, S1, S2, S5)
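The "longest chain" metric can be sketched as the maximum number of SBoxes crossed on any path of the merged datapath. That reading is an assumption, and the two topologies below are hypothetical: the slide's actual graphs cannot be recovered from the transcript, so `SHALLOW` and `DEEP` are invented only to give chains of 2 and 4 SBoxes.

```python
def longest_sbox_chain(edges, sboxes):
    """Return the maximum number of SBox nodes on any path of a DAG."""
    successors = {}
    nodes = set()
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
        nodes.update((src, dst))

    memo = {}

    def best_from(node):
        # SBoxes on the best path starting at `node` (memoised).
        if node not in memo:
            own = 1 if node in sboxes else 0
            rest = max((best_from(n) for n in successors.get(node, [])), default=0)
            memo[node] = own + rest
        return memo[node]

    return max(best_from(node) for node in nodes)


# Hypothetical merged datapaths (NOT the topologies drawn on the slide).
SHALLOW = [("A", "S1"), ("S1", "B"), ("B", "S2"), ("S2", "D"),
           ("C", "S0"), ("S0", "D")]
DEEP = [("A", "S0"), ("S0", "S1"), ("S1", "B"), ("B", "S2"),
        ("S2", "S5"), ("S5", "D"), ("C", "S3"), ("S3", "S4"), ("S4", "D")]

shallow_chain = longest_sbox_chain(SHALLOW, {"S0", "S1", "S2"})
deep_chain = longest_sbox_chain(DEEP, {"S0", "S1", "S2", "S3", "S4", "S5"})
```

A metric of this kind lets candidate merging orders be compared before synthesis: more SBoxes add static power, and longer chains lengthen the combinational path.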
41. DYNAMIC POWER MANAGEMENT: BEHAVIORAL OPTIMIZATION
[Figure: input networks (A, B, D, C) and (E, B, F), and the merged datapath (A, B, D, C, E, F) with SBoxes S0 and S1.]
51. DYNAMIC POWER MANAGEMENT: BEHAVIORAL OPTIMIZATION
[Figure: actors A, B, C, D and E shared among Net 1, Net 2 and Net 3, and their partition into logic regions LR1 to LR5.]
52. DYNAMIC POWER MANAGEMENT: BEHAVIORAL OPTIMIZATION
label | actors
LR1   | A
LR2   | C
LR3   | D
LR4   | B
LR5   | E

DPN   | regions
Net1  | LR1, LR2, LR3, LR4
Net2  | LR1, LR3
Net3  | LR3, LR4, LR5
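The two tables can be reproduced with a short sketch, assuming that a logic region collects the actors used by exactly the same set of networks. This grouping rule is an assumption made for illustration and is not necessarily the algorithm implemented in MDC. The actor sets in `NETWORKS` are read off the tables (for instance, Net2 = LR1, LR3 = A, D); the function names are hypothetical.

```python
from collections import defaultdict

# Actors of each network, derived from the two tables on the slide.
NETWORKS = {
    "Net1": {"A", "B", "C", "D"},
    "Net2": {"A", "D"},
    "Net3": {"B", "D", "E"},
}


def logic_regions(networks):
    """Group actors by the exact set of networks that use them."""
    membership = defaultdict(set)
    for net, actors in networks.items():
        for actor in actors:
            membership[actor].add(net)

    regions = defaultdict(set)
    for actor, nets in membership.items():
        regions[frozenset(nets)].add(actor)
    return dict(regions)


def enabled_actors(networks, active_net):
    """Actors whose logic region must be clocked while active_net runs."""
    return {
        actor
        for nets, actors in logic_regions(networks).items()
        if active_net in nets
        for actor in actors
    }


regions = logic_regions(NETWORKS)
```

In this example every actor has a distinct network membership, so five single-actor regions result, matching LR1 to LR5. Actors that always run together would fall into the same region and share one enable.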
55. DYNAMIC POWER MANAGEMENT: BEHAVIORAL OPTIMIZATION
[Figure: merged datapath with actors A, B, C, D, E and SBoxes S0, S1, S3, S4, partitioned into logic regions LR1 to LR5; each region receives the clock ck together with its own enable signal (en1 to en5).]
56. OUTLINE
• Introduction
– Problem statement
– Background
– The power issue
• Automatic Power-Awareness Strategies
– Baseline Multi-Dataflow Composer
– Static Power: Structural Optimization
– Dynamic Power: Behavior Optimization
• Performance Assessment
– Design Under Test
– Structural Evaluation
– Behavior Evaluation
• Final Remarks and Future Directions
62. STRUCTURAL EVALUATION
[MS: Merged Specifications]
• POWER OPTIMAL (TOP.p): all merged
• FREQ OPTIMAL (TOP.f): 5 out of 7 merged

DESIGN         | STATIC POWER ESTIMATION [mW] | STATIC POWER MEASURE [mW] | ESTIMATION ERROR
TOP.f          | 30.175                       | 29.650                    | 1.77%
TOP.p          | 25.833                       | 25.716                    | 0.45%
TOP.p vs TOP.f | -13.75%                      | -14.39%                   | 7.78%

Synthesis trials have been performed through the Cadence SOC Encounter synthesizer targeting a 90 nm CMOS technology.
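The estimation error of the two design rows can be checked with a short sketch. The slides do not state the formula, so it is assumed here to be the absolute difference relative to the measured value; that assumption reproduces the figures reported for TOP.f and TOP.p. The names `ROWS_MW` and `estimation_error_pct` are introduced for illustration.

```python
# (estimated, measured) static power in mW, as reported on the slide.
ROWS_MW = {
    "TOP.f": (30.175, 29.650),
    "TOP.p": (25.833, 25.716),
}


def estimation_error_pct(estimated_mw, measured_mw):
    """Relative error of the estimate with respect to the measurement, in %."""
    return abs(estimated_mw - measured_mw) / measured_mw * 100.0


errors = {
    design: round(estimation_error_pct(est, meas), 2)
    for design, (est, meas) in ROWS_MW.items()
}
```

Both designs stay below 2% error under this definition.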
64. BEHAVIORAL EVALUATION
NOCG = without clock gating implementation
AUTO = with the synthesizer automatic register-level clock gating implementation
CG = with the proposed high-level clock gating implementation

DYNAMIC POWER
DESIGN         | CG vs NOCG | CG vs AUTO
TOP.f          | -74.86%    | -69.06%
TOP.p          | -71.30%    | -63.75%
TOP.p vs TOP.f | -13.75%    | -14.39%

AREA
DESIGN         | # of LRs | NOCG AREA [μm²] | CG AREA [μm²] | CG vs NOCG
TOP.f          | 9        | 135819          | 136076        | +0.19%
TOP.p          | 13       | 124026          | 124579        | +0.25%
TOP.p vs TOP.f | +44.44%  | -8.68%          | -8.45%        | +31.58%

Synthesis trials have been performed through the Cadence SOC Encounter synthesizer targeting a 90 nm CMOS technology.
69. FINAL REMARKS AND FUTURE DIRECTIONS
• Power consumption management is a challenging issue in modern embedded system designs
• The Multi-Dataflow Composer aims at:
– Implementing coarse-grained multi-functional devices
– Providing efficient power-aware architectures
• MDC now integrates high-level power-aware strategies reducing both static and dynamic power consumption
• Future developments
– Power gating on different logic regions
– Improvements in the estimation models
– Heuristics for the profiler design space exploration
70. ACKNOWLEDGEMENTS
The research leading to these results has received funding from:
• the Region of Sardinia, L.R.7/2007, under grant agreement CRP-18324 [RPCT Project].
• the Region of Sardinia, Young Researchers Grant, POR Sardegna FSE 2007-2013, L.R.7/2007 “Promotion of the scientific research and technological innovation in Sardinia”
71. Università degli Studi di Cagliari
DIEE – Dept. of Electrical and Electronics Eng.
EOLAB - Microelectronics and Bioeng. Lab.
2014 IEEE International Workshop on Signal Processing Systems
October 20 – 22 2014, Belfast, UK
72. Read full article
• IEEE Xplore Digital Library
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6986104&tag=1
• RPCT Project Website:
http://sites.unica.it/rpct/references/
73. Reference (BibTeX)
@INPROCEEDINGS{SAU_SIPS2014,
  author={F. Palumbo and C. Sau and L. Raffo},
  booktitle={2014 IEEE Workshop on Signal Processing Systems (SiPS)},
  title={Power-awarness in coarse-grained reconfigurable designs: A dataflow based strategy},
  year={2014},
  pages={1-6},
  doi={10.1109/SiPS.2014.6986104} }