Vittoriano Muttillo, University of L'Aquila, IT
4th Workshop on High-performance and Real-time Embedded Systems (HiRES) in the 11th HiPEAC conference
January 18 - 20, 2016
Multi-core and many-core devices are becoming widespread in the embedded systems area because they can increase application performance while substantially reducing energy consumption. In the reconfigurable logic area, soft-processors make it possible to start from a programmable component and select only the needed, possibly customized, peripherals. With soft-processors in a multi-core scenario, embedded systems can be customized more closely to the application requirements: for example, an SMP (Symmetric Multi-Processor) configuration can be adopted to make better use of shared-memory multi-core architectures on FPGA. Various FPGA vendors provide soft-processors optimized for their reconfigurable logic, while third-party vendors offer soft-processors targeted at specific domains. Moreover, to better exploit a multi-processing system, a parallel programming model should be used. One of the best known is OpenMP: the OpenMP API is a specification for a set of compiler directives, library routines, and environment variables that can be used to specify high-level parallelism, based on a fork-join model. Unfortunately, porting such a library to an embedded target is not straightforward.
This presentation discusses the design flow of a specific embedded platform consisting of a quad-core Leon3 in SMP configuration, enhanced with an unobtrusive profiling mechanism that supports performance evaluation. An SMP-aware Linux distribution runs on the platform, and the software components needed to use the OpenMP library have been ported. A prototype targeting the ML605 Development Board has been implemented and used to evaluate the performance of the proposed platform. A set of benchmarks has been used to show the correctness of the OpenMP port. A performance analysis carried out with the VIPPE simulation tool has been used to compare the simulated speed-up and execution time with the results of the HW profiling system. Since validation and performance evaluation are becoming critically important, integration with Rapita Verification Suite (RVS) has been started to give designers meaningful statistics (WCRT, average execution time, etc.) with which to characterize and improve the performance of software on a dedicated hardware architecture.
In order to test the platform with an industrial application, a Mobile Ad hoc Network (MANET) localization algorithm based on a multi-dimensional scaling approach, provided by Thales Italy, has been executed on the multi-core architecture.
Design and validation of a multi-core embedded platform under high performance requirements
1. Design and validation of a multi-core
embedded platform under high
performance requirements
University of L’Aquila
Center of Excellence DEWS
Department of Information Engineering, Computer
Science and Mathematics DISIM
4th Workshop on
High-performance and Real-time
Embedded Systems (HiRES 2016)
V. Muttillo, G. Valente, F. Federici, L. Pomante, M. Faccio
4. Multi-core Embedded
SoC
On-chip embedded systems are characterized by
several functional and non-functional (F/NF) requirements
• Response time, power consumption, time-to-market etc.
Multi-core embedded systems design
• Suffers from the lack of uniform pathways to system
realization and application deployment
Parallel programming model
• Allows a multi-threaded application to obtain
a speed-up by splitting the workload
Run-time monitoring solutions
• Allow monitoring of system behaviour over its lifetime
5. Proposed Solution
This work presents the development of an
embedded multi-core platform on FPGA with:
• Multi-LEON3 SMP HW architecture
• Non-intrusive distributed HW profiling subsystem
• Integrated customized Linux OS distribution
• OpenMP parallel programming models
• RVS profiling tool support
Final goal of the work
• Development of high-performance multi-core
embedded platform with run-time resource monitoring
components and off-line verification tools support
6. Platform in
development flow
The work is related to the Artemis-JU ASP CRAFTERS
European project
• It has led to a uniform embedded system development
flow in the research and industry domains
• The platform has been proposed to execute and validate
industrial case studies
• Support to embedded system designers
8. HW Architecture
LEON3 32-bit synthesizable soft-processors, multi-core mode, dedicated
FPU, MMU for Linux OS etc.
9. OS and Parallel
Programming Model
Operating System
• A Linux distribution has been customized, starting from
LEON LINUX kernel
• Cross-compiler toolchain, buildroot tool to build user space
application and RAM loader have been provided by Aeroflex
Gaisler
Parallel Programming Model
• The libraries required to implement parallel applications
using OpenMP C/C++ have been added to the
customized Linux distribution
10. HW Profiling System
AIPHS (AdaptIve Profiling Hardware Subsystem)
• Event and Time monitoring functionalities
11. Final Platform
4-core Leon 3 with Linux operating system,
OpenMP libraries and hardware profiling system
ML605 (Virtex 6) Development Board
THE PLATFORM HARDWARE ARCHITECTURE
13. Platform Functionalities
• High performance multi-processing software execution
• Run-time event and time monitoring
• Reconfigurable HW architecture
• Resource monitoring application using MW layer
15. Simulated results
[Figure: bar chart of simulated execution time in clock cycles (axis 0 to 300,000,000) for the benchmarks Reduction, Parallel, SPMD and No false sharing; series 1, 2, 3, 4]
VIPPE-based speed-up evaluation on selected
benchmarks
• Verify whether OpenMP program parallelization makes sense
with a given memory organization (e.g. single
cache, DDR3 interface for external memory etc.)
• Check whether the specific OpenMP library implementation works well
with the proposed memory organization
16. Experimental results
AIPHS-based speed-up evaluation on selected
benchmarks
• Execution time increases with number of threads
• Multi-core architecture, based on LEON3 and one-level
cache, using OpenMP leads to optimal performance
• The false sharing problem has a significant impact in this system
[Figure: bar chart of on-target results (axis 0 to 700,000,000) for the benchmarks Reduction, Parallel, SPMD and No false sharing; series 1, 2, 3, 4]
17. RVS Support
Rapita Verification Suite provides a framework for
on-target verification of embedded software
The use of AIPHS enables the designer to analyze
timing information offline by using Rapita tools
AIPHS reduces the need for code
instrumentation, thus providing information
closer to the real behavior of the considered
application
19. Conclusions
This work has described the design and the
validation of an embedded SoC multi-core platform
• early verification and validation
• enhanced performance in execution time (OpenMP)
• on-chip run-time monitoring (AIPHS)
Support for Rapita Verification Suite (RVS) allows
designers to evaluate meaningful statistics
• WCRT
• Average execution time
• etc…
20. Future developments
Improvement of the profiling system to collect
more data and events while better filtering
overhead due to OS and ISR
Improvement of multi-core monitoring support for
RVS
Preliminary simulation step with VIPPE tool
integrated in the multi-core embedded systems
specific design flow
Good afternoon, I’m Vittoriano Muttillo, a PhD student at the University of L’Aquila, and today I will present “Design and validation of a multi-core embedded platform under high performance requirements”. I will start with a brief introduction, then present the proposed platform and the related evaluation and validation tests, and finally give the conclusions and related future work.
Well, the main problem in on-chip embedded systems design is taking into account both the functional and the non-functional requirements that affect the implementation of a specific system. Response time, power consumption, time to market and so on: these constraints drive designers in realizing their systems. Multi-core embedded design also suffers from the lack of a uniform pathway to system realization and application deployment. To improve performance, a parallel programming model makes it possible to obtain a speed-up, in terms of run-time response or execution time, for a multi-threaded application by splitting the workload across the processors. Finally, run-time monitoring solutions make it possible to monitor system behavior during its lifetime, in terms of memory accesses, bus bottlenecks and processor stalls.
So this work presents the design and development of an embedded multi-core platform on FPGA with specific characteristics driven first by an industrial avionics scenario and, more generally, by industrial environments with demanding requirements. These characteristics are a Leon3 SMP multi-core HW architecture and a set of other components for different functionalities (performance improvement, monitoring and so on). The final goal of this work is the development of a high-performance multi-core embedded platform with run-time resource monitoring components (an embedded distributed monitoring system for memory, buses and so on) and support for off-line verification tools, such as RVS or other profiling tools.
In addition, the work is related to the Artemis CRAFTERS European project, whose aim was a uniform embedded system development flow in the research and industry domains. The platform has been proposed to execute and validate an industrial case study offered by Thales Italia.
The platform realized in this work is based on the Leon3 soft-processor, with a dedicated FPU per processor, a shared memory on the AHB bus and other specific configuration options (cache, MMU and so on).
On this HW architecture, a Linux distribution has been customized, starting from the Linux kernel with the LEON patch applied for the architecture. The toolchain, the buildroot tool for user-space applications and the RAM loader have been provided by Aeroflex Gaisler. As the parallel programming model we have used GOMP, the GNU OpenMP library, as a dynamically linked library loaded at runtime.
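To make the fork-join model concrete, here is a minimal sketch of the kind of OpenMP C code such a port is meant to run. It is not taken from the presented benchmarks: the function name `parallel_sum` and the array-sum workload are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example workload: sum an integer array.
 * The master thread forks a team at the pragma, the loop iterations
 * are split among the threads, and the team joins at the implicit
 * barrier at the end of the loop. */
long parallel_sum(const int *data, size_t n)
{
    assert(data != NULL || n == 0);

    long sum = 0;
    /* Each thread accumulates a private copy of sum; the copies are
     * combined when the team joins. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; ++i) {
        sum += data[i];
    }
    return sum;
}
```

With GCC, the `-fopenmp` flag enables the pragma and links the GOMP runtime; without it the pragma is ignored and the same function runs serially, which is a convenient way to check a port for correctness before looking at speed-up.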
The HW profiling system we implemented, the distributed AIPHS (Adaptive Profiling Hardware Subsystem), is composed of an arbitrary number of sniffers with the architecture shown in the figure. In particular, the adaptability is given by two parts of the sniffer component, the adapter and the bus output interface. This implementation makes it possible to adapt our profiling system to different HW architectures and to distribute it over the specific HW implementation or prototype.
The final platform is shown in these two figures. It is a 4-core LEON3 with the Linux operating system, OpenMP libraries and the distributed hardware profiling system. The development board used in this work is an ML605 (Virtex 6).
So we proposed a framework that allows users to access the HW profiling data from a high-level point of view. This framework is a multi-tier architecture with a HW abstraction layer for accessing the AIPHS system, the kernel space with the specific instance of the operating system and parallel programming model, and the user-space layer with a middleware component that allows user applications and third-party monitoring software to use AIPHS and its profiling data for online and offline analysis. The final goals of this multi-tier framework are to achieve high-performance multi-processing software execution, to provide run-time event and time monitoring with our profiling system, to use a reconfigurable HW architecture and to monitor application resources through the middleware layer.
Finally, we have used a series of benchmark tests to evaluate and validate the proposed platform. As a preliminary step, we ran simulations using VIPPE, a virtual platform simulation tool provided by the University of Cantabria. In this way we have evaluated the speed-up trend on selected benchmarks, to verify whether OpenMP program parallelization makes sense with a specific configuration (single cache, DDR3 interface for external memory and so on) and to check whether the specific OpenMP library implementation works well with the proposed memory organization. The left figure shows the simulation results: the speed-up increases and the execution time, in number of clock cycles, decreases. The right figure shows the VIPPE simulation method, in which we have used a platform model and the C implementation of the software to simulate the run-time code execution of our system.
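For reference, the speed-up metric can be computed from cycle counts as in the following sketch. It uses the usual definition (single-thread time divided by n-thread time); the function names are assumptions, and any numbers passed in are up to the reader, not measurements from this work.

```c
#include <assert.h>

/* Speed-up with n threads: cycles with 1 thread / cycles with n threads. */
double speedup(unsigned long long cycles_1, unsigned long long cycles_n)
{
    assert(cycles_n > 0);
    return (double)cycles_1 / (double)cycles_n;
}

/* Parallel efficiency: speed-up divided by the number of threads.
 * A value of 1.0 would mean ideal linear scaling. */
double efficiency(unsigned long long cycles_1, unsigned long long cycles_n,
                  unsigned threads)
{
    assert(threads > 0);
    return speedup(cycles_1, cycles_n) / (double)threads;
}
```

Computing the same two values from the simulated and from the on-target cycle counts gives a direct way to compare the two speed-up trends.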
After this simulation step, we ran the same benchmarks on-target, in a real execution environment. The results have been collected using the internal AIPHS profiling system. The data show that execution time increases with the number of threads and that the multi-core architecture, based on LEON3 and a one-level cache, using OpenMP leads to optimal performance, with a speed-up trend similar to that of the simulation. In this case we can see that the false sharing problem has a significant impact, unlike in simulation, where this problem was not relevant.
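False sharing arises when threads update distinct variables that happen to share a cache line. The sketch below shows one common way to avoid it in an SPMD-style kernel, by padding each thread's partial sum to its own cache line. The benchmark sources are not part of this presentation, so the pi-integration kernel, the name `pi_spmd_padded`, and the values of `MAX_THREADS` and `CACHE_LINE` are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 4   /* assumption: one slot per core of a quad-core platform */
#define CACHE_LINE  64  /* assumption: line size in bytes; check the real cache configuration */

/* Padding keeps consecutive partial sums at least one cache line apart,
 * so a write by one thread does not invalidate another thread's line. */
typedef struct {
    double value;
    char pad[CACHE_LINE - sizeof(double)];
} padded_sum_t;

/* Approximate pi by midpoint integration of 4/(1+x^2) over [0,1]. */
double pi_spmd_padded(long steps)
{
    assert(steps > 0);

    padded_sum_t partial[MAX_THREADS] = {{0}};
    const double width = 1.0 / (double)steps;
    int used = 1;  /* number of threads actually granted by the runtime */

    #pragma omp parallel num_threads(MAX_THREADS)
    {
        int id = 0, nthreads = 1;
#ifdef _OPENMP
        id = omp_get_thread_num();
        nthreads = omp_get_num_threads();
#endif
        if (id == 0) used = nthreads;

        /* SPMD: each thread takes every nthreads-th iteration. */
        for (long i = id; i < steps; i += nthreads) {
            double x = ((double)i + 0.5) * width;
            partial[id].value += 4.0 / (1.0 + x * x);
        }
    }

    double sum = 0.0;
    for (int t = 0; t < used; ++t) sum += partial[t].value;
    return sum * width;
}
```

Removing the `pad` member turns this into the false-sharing variant: the result is the same, but the partial sums then sit in adjacent memory and the cores contend for the same cache line on every update.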
Finally, we use the HW profiling data to support Rapita Verification Suite, which provides a framework for on-target verification of embedded software, and AIPHS enables the designer to analyze timing information offline using the Rapita tools. The choice of RVS has been driven by the possibility of reducing the need for code instrumentation, thus providing information closer to the real behavior of the application under consideration, together with a series of specific analyses (WCET, bus bottlenecks, code coverage and so on) that help the designer in the development step.
This work has described the design and validation of an embedded multi-core platform on SoC, with early verification and validation tests, enhanced execution-time performance using OpenMP, and an on-chip run-time HW monitoring and profiling system called AIPHS that collects low-level HW data. The AIPHS support for RVS allows designers to evaluate meaningful statistics such as WCRT, average execution time, bus utilization and bottlenecks, processor stalls in multi-core mode and so on.
Future developments involve improving the profiling system to collect more data and events while better filtering the overhead due to the operating system and interrupt service routines. We also plan to improve the multi-core monitoring support for RVS. Another step is to define a design methodology for developing multi-core embedded systems that uses a preliminary simulation step with the VIPPE virtual platform simulation tool, integrated into the specific design flow.
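As a rough illustration of the statistics mentioned here, the sketch below summarizes a set of timing samples. It is not the RVS or AIPHS interface: the type `timing_stats_t` and the function `summarize_timings` are hypothetical, and the maximum it reports is only the largest observed time, not a guaranteed worst-case bound.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of measured execution times, in clock cycles. */
typedef struct {
    uint64_t max_cycles;   /* largest observed time (high-water mark) */
    double   mean_cycles;  /* average execution time */
} timing_stats_t;

timing_stats_t summarize_timings(const uint64_t *samples, size_t count)
{
    assert(samples != NULL && count > 0);

    timing_stats_t stats = { samples[0], 0.0 };
    double total = 0.0;

    for (size_t i = 0; i < count; ++i) {
        if (samples[i] > stats.max_cycles) {
            stats.max_cycles = samples[i];
        }
        total += (double)samples[i];
    }
    stats.mean_cycles = total / (double)count;
    return stats;
}
```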