Presentation by Osman Unsal and Pirah Noor Soomro at the webinar AI4EU WebCafé: 'Energy-efficient AI, a perspective from the LEGaTO project' on 28 October 2020
The integration of industrial technologies, SCADA systems and cloud computing enables users to create an interface between the production plant and the executive board.
In this deck from the Stanford HPC Conference, Peter Dueben from the European Centre for Medium-Range Weather Forecasts (ECMWF) presents: Machine Learning for Weather Forecasts.
"I will present recent studies that use deep learning to learn the equations of motion of the atmosphere, to emulate model components of weather forecast models and to enhance the usability of weather forecasts. I will then talk about the main challenges for the application of deep learning in cutting-edge weather forecasts and suggest approaches to improve usability in the future."
Peter is contributing to the development and optimization of weather and climate models for modern supercomputers. He is focusing on a better understanding of model error and model uncertainty, on the use of reduced numerical precision that is optimised for a given level of model error, on global cloud-resolving simulations with ECMWF's forecast model, and on the use of machine learning, in particular deep learning, to improve workflows and predictions. Peter graduated in physics and wrote his PhD thesis at the Max Planck Institute for Meteorology in Germany. He worked as a postdoc with Tim Palmer at the University of Oxford and took up a position as University Research Fellow of the Royal Society at the European Centre for Medium-Range Weather Forecasts (ECMWF) in 2017.
Watch the video: https://youtu.be/ks3fkRj8Iqc
Learn more: https://www.ecmwf.int/
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The Incorporation of Machine Learning into Scientific Simulations at Lawrence... - inside-BigData.com
In this deck from the Stanford HPC Conference, Katie Lewis from Lawrence Livermore National Laboratory presents: The Incorporation of Machine Learning into Scientific Simulations at Lawrence Livermore National Laboratory.
"Scientific simulations have driven computing at Lawrence Livermore National Laboratory (LLNL) for decades. During that time, we have seen significant changes in hardware, tools, and algorithms. Today, data science, including machine learning, is one of the fastest growing areas of computing, and LLNL is investing in hardware, applications, and algorithms in this space. While the use of simulations to focus and understand experiments is well accepted in our community, machine learning brings new challenges that need to be addressed. I will explore applications for machine learning in scientific simulations that are showing promising results and further investigation that is needed to better understand its usefulness."
Watch the video: https://youtu.be/NVwmvCWpZ6Y
Learn more: https://computing.llnl.gov/research-area/machine-learning
and
http://www.hpcadvisorycouncil.com/events/2020/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
How HPC and large-scale data analytics are transforming experimental science - inside-BigData.com
In this deck from DataTech19, Debbie Bard from NERSC presents: Supercomputing and the scientist: How HPC and large-scale data analytics are transforming experimental science.
"Debbie Bard leads the Data Science Engagement Group at NERSC. NERSC is the mission supercomputing center for the U.S. Department of Energy, and supports over 7000 scientists and 700 projects with supercomputing needs. A native of the UK, her career spans research in particle physics, cosmology and computing on both sides of the Atlantic. She obtained her PhD at Edinburgh University, and worked at Imperial College London as well as the Stanford Linear Accelerator Center (SLAC) in the USA, before joining the Data Department at NERSC, where she focuses on data-intensive computing and research, including supercomputing for experimental science and machine learning at scale."
Watch the video: https://wp.me/p3RLHQ-kLV
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The designed SCADA software system ensured remote monitoring of the positions and advanced system health conditions of all the solar tracking systems, to provide data analytics and reporting. This SCADA solution was designed and developed to co-exist in a remote system that continuously monitors multiple fields consisting of several masters and their respective slave trackers.
Modeling Uncertainty For Middleware-based Streaming Power Grid Applications - Jenny Liu
The power grid is incorporating high-throughput sensor devices into power distribution networks. The future power grid needs to guarantee the accuracy and responsiveness of applications that consume data from multiple sensor streams. The end-to-end performance and overall scalability of cyber-physical energy applications depend on the middleware's ability to handle multi-source sensor data, which exhibits uncertain behavior under highly variable numbers of sensors and middleware topologies. In this paper, we present a parametric approach to model middleware uncertainty and to analyze its effect on distributed power applications. The models encapsulate the entire data flow paths from sensor devices, through network and middleware components, to the power application nodes that utilize sensor data streams. Using the Ptolemy II framework for modeling and simulation, we generate Monte Carlo samples of uncertain parameters, which are used to build parameterized middleware models for end-to-end Discrete-Event (DE) system simulation. The simulation results are further analyzed using regression methods to reveal the parameters that are most influential in limiting middleware behaviour with respect to the temporal requirements of the power applications.
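The Monte Carlo sampling and sensitivity analysis described above can be illustrated with a minimal sketch. The delay model, the parameter ranges and the use of a plain Pearson correlation are all illustrative assumptions; the paper itself drives a Ptolemy II discrete-event simulation and richer regression methods.

```python
import random

def simulate_delay(n_sensors, service_rate, rng):
    """Toy end-to-end delay model: a queueing term plus weak network jitter.
    Purely illustrative -- stands in for a full DE middleware simulation."""
    queueing = n_sensors / service_rate          # dominant term
    jitter = rng.uniform(0.0, 1.0)               # uncorrelated noise
    return queueing + 0.1 * jitter

def pearson(xs, ys):
    """Pearson correlation coefficient, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

def sensitivity(samples=2000, seed=1):
    """Monte Carlo sample the uncertain parameters, simulate the delay,
    then correlate each parameter with the observed delay."""
    rng = random.Random(seed)
    ns, rates, delays = [], [], []
    for _ in range(samples):
        n = rng.uniform(10, 1000)    # number of sensors (uncertain)
        r = rng.uniform(50, 500)     # middleware service rate (uncertain)
        ns.append(n)
        rates.append(r)
        delays.append(simulate_delay(n, r, rng))
    return {"n_sensors": pearson(ns, delays),
            "service_rate": pearson(rates, delays)}
```

Ranking parameters by the magnitude of their correlation with the output is the simplest form of the regression-based influence analysis the abstract refers to.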
Solar panel monitoring solution using IoT - Faststream Technologies - Sudipta Maity
Faststream Technologies offers an automated IoT-based solar panel monitoring and troubleshooting system that allows solar panels to be monitored from anywhere over the internet. As part of our solution, we use several IoT gateways suited to different needs, based on SoCs such as STM32, ESP32, ublox, CC3200 and SiliconLabs, to monitor the solar panel parameters and, in turn, provide solar plant insights.
Our system constantly monitors the solar panel and transmits various parameters to the cloud over the IoT system. Here we use the IoT platform to transmit solar power parameters to an Amazon/Azure cloud/IoT server via the gateway (over WiFi and Ethernet). A powerful web interface allows data to be viewed in meaningful formats, enabling users to make decisions.
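As a rough sketch of the gateway-to-cloud step, the snippet below packages panel readings into a JSON payload such as a gateway might publish. The field names and units are hypothetical, not Faststream's actual schema.

```python
import json
import time

def build_telemetry(panel_id, voltage_v, current_a, temperature_c):
    """Package solar panel readings into a JSON payload for an IoT gateway.
    Field names are illustrative, not a real vendor schema."""
    payload = {
        "panel_id": panel_id,
        "timestamp": int(time.time()),
        "voltage_v": round(voltage_v, 2),
        "current_a": round(current_a, 2),
        "power_w": round(voltage_v * current_a, 2),  # derived on the gateway
        "temperature_c": round(temperature_c, 1),
    }
    return json.dumps(payload)
```

A real gateway would publish this string over MQTT or HTTPS to the cloud endpoint; the payload itself is transport-agnostic.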
Model-Simulation-and-Measurement-Based Systems Engineering of Power System Sy... - Luigi Vanfretti
This talk starts by exploring how electrical power systems are increasingly becoming digitalized, leading to their transformation into a class of cyber-physical systems (a system of systems) where the electrical grid merges with ubiquitous information and communication technologies (ICT).
These complex systems present unprecedented challenges in their operation and control and, due to unknown interactions with ICT, require new concepts, methods and tools to facilitate their operational design, manufacturing (of components), and the testing/verification/validation of their performance.
Inspired by the tremendous advantages of the model-based systems engineering (MBSE) framework developed by the aerospace and military communities, this talk will highlight the challenges of adopting MBSE for electrical power grids. MBSE is not only a framework for dealing with all the phases of putting in place complex systems-of-systems, but also provides a foundation for the democratization of technology - both software and hardware.
The talk will illustrate the foundations that have been built by the presenter's research over the last 7 years, placed within the context of MBSE, with focus on areas of power engineering. Some of these foundations and contributions include the OpenIPSL, RaPId, SD3K, BableFish and Khorjin open source software developed and distributed online by the research group, and available at: https://github.com/ALSETLab
Microgrid & renewable integration at Burbank Water & Power - Schneider Electric
This presentation reviews Schneider Electric's collaboration with Burbank Water and Power, a cutting-edge utility company in Burbank, California, to achieve challenging renewable energy requirements and provide reliable, safe, and affordable power to its customers using advanced technology solutions.
To prepare for increased renewable energy requirements, Burbank Water and Power sought a system to manage load, distributed energy resources, distributed storage systems, generation, and variable renewables in order to balance supply and demand and avoid undesirable voltage, power flow, and power quality problems. Burbank’s Integrated Automated Dispatch System (ADS) includes Schneider Electric’s advanced Power Control System (PCS) - for automatic generator control, load forecasting, and renewable forecasting - integrated with Schneider Electric’s OASyS SCADA and WeatherSentry system. The Integrated ADS will allow Burbank to co-optimize scheduling and dispatch of conventional supply resources, distributed generation, and demand-side resources, enable better control of inadvertent interchanges, and reduce reliance on external generation. Through the Integrated ADS, Burbank’s system operators will be able to manage the available system resources to optimize system reliability while achieving the most economic and sustainable energy supply portfolio.
Energy efficient chaotic whale optimization technique for data gathering in w... - IJECEIAES
A Wireless Sensor Network (WSN) consists of distributed sensor nodes with limited energy that monitor the physical environment and forward data to a sink node. Energy is the key resource in a WSN for increasing network lifetime. Several works have addressed this field, but energy-efficient data gathering still needs improvement. In order to improve data gathering with minimal energy consumption, an efficient technique called chaotic whale metaheuristic energy optimized data gathering (CWMEODG) is introduced. A chaotic tent map is applied to the parameters of the CWMEODG technique to find the global optimum solution with a fast convergence rate. Simulation of the proposed CWMEODG technique is performed with different metrics, such as energy consumption, data packet delivery ratio, data packet loss ratio and delay, with respect to the number of sensor nodes and the number of packets. The results show that the CWMEODG technique improves data gathering and network lifetime, with lower delay and packet loss than state-of-the-art methods.
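A chaotic tent map such as the abstract mentions can be sketched as follows. This uses one common skew tent map form from the chaos-based metaheuristics literature and a hypothetical agent-initialisation helper; it is not the CWMEODG authors' exact formulation.

```python
def tent_map(x):
    """One iteration of a skew tent map on (0, 1); a common chaotic map
    used to replace uniform random draws in metaheuristics."""
    return x / 0.7 if x < 0.7 else (1.0 - x) / 0.3

def chaotic_sequence(x0, n):
    """Generate n chaotic values in (0, 1] from seed x0."""
    seq, x = [], x0
    for _ in range(n):
        x = tent_map(x)
        seq.append(x)
    return seq

def init_positions(n_agents, dim, lower, upper, x0=0.37):
    """Initialise whale (search agent) positions inside [lower, upper]
    using the chaotic sequence instead of uniform random numbers."""
    vals = chaotic_sequence(x0, n_agents * dim)
    return [[lower + vals[i * dim + j] * (upper - lower) for j in range(dim)]
            for i in range(n_agents)]
```

The intuition is that a chaotic sequence covers the search space more evenly than pseudo-random draws, which is what the abstract credits for the faster convergence rate.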
GE Critical Power’s new GP100 Power Supply: Balanced Power. Unparalleled Density. 3-Phase Power. 1RU.
Does energy use in your data center keep you up at night? It should.
Your customers rely on you to keep networks flowing and transactions moving 24 hours a day, seven days a week, 365 days a year. Your data center can’t afford not to be highly reliable and energy efficient. It’s a tall order. And one we don’t take lightly.
Never Balance a Load Again. At GE Critical Power, we understand the issues your data center faces. Chief among these is the burden of balancing the load on the AC grid. Not only does this require a vast amount of resources, but it can add operational costs and the need for additional equipment. Enter the GP100, an innovative new technology that powers the future success of mission critical data centers, telecommunications and supercomputing industries by eliminating single-phase balancing issues and ensuring electrical phases grow in equal increments.
More Power. Less Space. GP100 is also extremely compact: four times smaller than competing products on the market today.
The most efficient, most compact, and first 3-Phase 1RU Power Supply for 19" Rack Mount Applications Ever Created. Great Power, Great Performance, Great Innovation, Great Reliability and Great Value. All from the engineers and experienced design teams at GE Critical Power.
An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural ... - LEGATO project
Full presentation of the paper "An Experimental Study of Reduced-Voltage Operation in Modern FPGAs for Neural Network Acceleration" by Behzad Salami at the 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2020)
THE ENERGY GRID & Integration of IoT
Track 3 Session 3 Moderator: Mark Walker
Quantified results of an Energy Grid Management use case that explores grid performance boundaries in the face of proliferating residential solar array deployments are presented. The use case demonstrates how modern open source IT tools can be integrated into a grid simulation that provides a decision support tool for the utility industry to manage future change. GridLAB-D is used as an agent-based model to simulate energy consumer nodes in a complex interconnected grid using a modern IBM System G graph computing engine. The resulting simulation environment executes the simulated grid network, with structured and unstructured data results stored in the graph database. Big data analytics performed on the resulting simulation data, using IBM Big Data Analytics tools and Sandia National Lab's DAKOTA uncertainty quantification and statistical analysis tools, allow interrogation of the resulting performance database to establish performance characteristics visualized through graphs. The work leverages DoD-sponsored research in uncertainty quantification in complex system-of-systems modeling and simulation environments and demonstrates future model-based techniques for risk management, financial modeling, grid resiliency and critical infrastructure protection.
Low Power High-Performance Computing on the BeagleBoard Platform - a3labdsp
The ever-increasing energy requirements of supercomputers and server farms are driving the scientific and industrial communities to give deeper consideration to the energy efficiency of computing equipment. This contribution addresses the issue by proposing a cluster of ARM processors for high-performance computing. The cluster is composed of five BeagleBoard-xM boards, with one board managing the cluster and the other boards executing the actual processing. The software platform is based on the Angstrom GNU/Linux distribution and is equipped with a distributed file system, to ease sharing data and code among the nodes of the cluster, and with tools for managing tasks and monitoring the status of each node. The computational capabilities of the cluster have been assessed through High-Performance Linpack and a cluster-wide speaker diarization algorithm, while power consumption has been measured using a clamp meter. Experimental results obtained in the speaker diarization task showed that the energy efficiency of the BeagleBoard-xM cluster is comparable to that of a laptop computer equipped with an Intel Core 2 Duo T8300 running at 2.4 GHz. Furthermore, after removing the bottleneck due to the Ethernet interface, the BeagleBoard-xM cluster is able to achieve superior energy efficiency.
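The comparison above rests on two simple metrics, performance per watt and energy to complete a fixed task, which can be made concrete as below. The numbers in the usage note are illustrative, not the paper's measurements.

```python
def energy_efficiency(gflops, watts):
    """Performance per watt (GFLOPS/W): the usual metric for comparing
    the computational efficiency of platforms."""
    return gflops / watts

def energy_per_task(power_w, runtime_s):
    """Energy in joules to finish one fixed task (e.g. a diarization run).
    Lower is better, even when the runtime is longer."""
    return power_w * runtime_s
```

For example, a hypothetical 10 W ARM cluster taking 300 s and a 30 W laptop taking 100 s both spend 3000 J on the same task, i.e. equal energy efficiency at very different power draws, which is the shape of the result the abstract reports.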
Presentation from the EPRI-Sandia Symposium on Secure and Resilient Microgrids: Utility Microgrids: Integrations and Implementation Challenges, presented by Andrew Reid, ConEdison, Baltimore, MD, August 29-31, 2016.
Low-power Innovative techniques for Wearable Computing - Omar Elshal
A presentation I gave for the Ubiquitous and Wearable Computing seminar during my senior year at university.
The presentation introduces many research papers on the field then discusses one of them thoroughly.
4 TeraGrid Sites Have Focal Points:
SDSC – The Data Place
Large-scale and high-performance data analysis/handling
Every Cluster Node is Directly Attached to SAN
NCSA – The Compute Place
Large-scale, Large Flops computation
Argonne – The Viz place
Scalable Viz walls
Caltech – The Applications place
Data and flops for applications – Especially some of the GriPhyN Apps
Specific machine configurations reflect this
The GreenDroid mobile application processor is a 45-nm multicore research prototype that targets the Android mobile-phone software stack. It can execute general-purpose mobile programs with 11 times less energy than today’s most energy-efficient designs, at similar or better performance levels.
IoT Tech Expo 2023 - Micha vor dem Berge presentation - VEDLIoT Project
VEDLIoT Next Generation AIoT Applications. Micha vor dem Berge. VEDLIoT Conference Track co-located with IoT Tech Expo, Amsterdam, Netherlands, September 2023
How to achieve 95%+ accurate power measurement during architecture exploration? - Deepak Shankar
During the conceptualization and architectural exploration phases, it is crucial to assess the power budget.
Would you like to accurately measure the:
1. Power consumed for a proposed embedded software or firmware?
2. Savings of a Power Management Algorithm prior to development?
3. Power impact of hardware configuration change?
4. Trade-off between Power and Performance?
5. Temperature, heat, peak power and cumulative power?
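Questions 1-5 amount to estimating a power budget before any hardware exists. A minimal sketch of such an estimate, using hypothetical component states and duty cycles rather than any real measurement data or vendor tooling:

```python
def average_power(states):
    """states: list of (power_mw, duty_cycle) pairs for one component.
    Duty cycles must sum to 1; returns the duty-cycle-weighted average power."""
    total_duty = sum(d for _, d in states)
    assert abs(total_duty - 1.0) < 1e-9, "duty cycles must sum to 1"
    return sum(p * d for p, d in states)

def power_budget(components):
    """Sum average power (mW) over all components of a candidate architecture."""
    return sum(average_power(s) for s in components.values())

# Hypothetical early-exploration numbers, not measured data:
soc = {
    "cpu":    [(300.0, 0.20), (30.0, 0.80)],   # active vs idle
    "radio":  [(120.0, 0.05), (2.0, 0.95)],    # transmit vs sleep
    "sensor": [(5.0, 1.00)],                   # always on
}
```

Trade-off studies (question 4) then reduce to re-running `power_budget` with different duty cycles or component choices; the 95%+ accuracy claimed in the title would require calibrated per-state power numbers, which this sketch simply assumes as inputs.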
Similar to 'LEGaTO: Low-Energy Heterogeneous Computing - Use of AI in the project':
Scrooge Attack: Undervolting ARM Processors for Profit - LEGATO project
A malicious cloud provider can intentionally undervolt its cloud infrastructure for additional savings on the electricity bill. ARM processors are low-power processors whose use can lead to substantial energy savings for cloud providers. In our scenario we consider a scrooge cloud provider that undervolts its ARM infrastructure for profit. The instances can be undervolted in a stealthy manner by avoiding critical voltage regions. Applications running under critical undervolting conditions can malfunction, and these conditions can be exploited by a cloud user to uncover the undervolted instances. For this novel attack scenario we present a detection method for cloud users. The detection method non-selectively injects faults into processes with the intent to crash the cloud instance. Even if the cloud provider can spoof temperature and voltage readings of the processor, the cloud user is capable of uncovering undervolted instances. By crashing instances simultaneously using the detection method, the cloud user is covered by the service level agreement and exposes the scrooge cloud provider.
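The detection idea, running a workload whose correct result is known in advance and treating any crash or wrong answer as evidence of critical undervolting, can be sketched as below. The kernel and the `fault` test hook are illustrative assumptions, not the authors' actual fault injector.

```python
def stress_kernel(n=10_000, seed=7):
    """Deterministic integer workload with a known-in-advance checksum.
    On a stable instance it always returns the same value; silent data
    corruption under critical undervolting would change it."""
    acc, x = 0, seed
    for _ in range(n):
        x = (x * 1103515245 + 12345) % (1 << 31)   # LCG step
        acc = (acc + x) % (1 << 61)
    return acc

GOLDEN = stress_kernel()   # reference checksum from a trusted machine

def looks_undervolted(runs=5, fault=None):
    """Run the kernel repeatedly; any crash or checksum mismatch is treated
    as evidence the instance operates in a critical voltage region.
    `fault` lets tests emulate a bit flip; real hardware needs no such hook."""
    for _ in range(runs):
        try:
            result = stress_kernel()
            if fault is not None:
                result ^= fault              # emulate silent data corruption
            if result != GOLDEN:
                return True                  # wrong answer: suspicious
        except Exception:
            return True                      # instance crashed: suspicious
    return False
```

Because the check relies only on the computed result, it works even when the provider spoofs the voltage and temperature sensors the user could otherwise query.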
TEEMon: A continuous performance monitoring framework for TEEs - LEGATO project
LEGaTO paper presented at ACM Middleware 2020 by Robert Krahn, Donald Dragoti, Franz Gregor, Do Le Quoc, Valerio Schiavoni, Pascal Felber, Clenimar Souza, Andrey Brito and Christof Fetzer
Presentation given by Jens Hagemeyer (Bielefeld University) at the ‘Low-Energy Heterogeneous Computing Workshop’ on 16 October 2020 within HiPEAC CSW Autumn 2020
TZ4Fabric: Executing Smart Contracts with ARM TrustZone - LEGATO project
Paper presented by Christian Göttel at SRDS'20.
Abstract: Transparency in blockchains can be an advantage and a disadvantage, in particular if confidential information such as assets or business interactions is exposed. There are no confidentiality guarantees in blockchain systems to protect the logic of a smart contract or the data it processes. One solution to this problem is trusted execution environments (TEEs), an emerging technology available, for example, in edge or mobile-grade processors (e.g., ARM TrustZone) and in server-grade processors (e.g., Intel SGX). In this presentation we introduce TZ4Fabric, an extension of Hyperledger Fabric which leverages ARM TrustZone to shield the execution of smart contracts from compromised systems and powerful attackers. TZ4Fabric exploits the open source OP-TEE framework to enable ARM TrustZone features. We evaluate our prototype on the Raspberry Pi platform and highlight energy and performance trade-offs.
Infection Research with Maxeler Dataflow Computing (LEGaTO project)
Presentation given by Tobias Becker (Maxeler) at the LEGaTO Final Event: Low-Energy Heterogeneous Computing Workshop on 4 September 2020
This event was collocated with FPL 2020
Presentation given by Nils Kucza (Bielefeld University) at the LEGaTO Final Event: Low-Energy Heterogeneous Computing Workshop on 4 September 2020
This event was collocated with FPL 2020
FPGA Undervolting and Checkpointing for Energy-Efficiency and Error-Resiliency (LEGaTO project)
Tutorial by Behzad Salami, Osman Unsal and Leonardo Bautista at 30th International Conference on Field-Programmable Logic and Applications (FPL2020), 3 September 2020
Scheduling Task-parallel Applications in Dynamically Asymmetric Environments (LEGaTO project)
Presentation by Jing Chen and Pirah Noor Soomro (Chalmers University of Technology) at the 16th International Workshop on Scheduling and Resource Management for Parallel and Distributed Systems (SRMPDS 2020) on 17 August 2020.
SRMPDS was a virtual event and collocated with ICPP’20 - 2020 International Conference on Parallel Processing.
RECS – Cloud to Edge Microserver Platform for Energy-Efficient Computing (LEGaTO project)
Abstract: Today, application developers and data center operators face the challenging task of achieving high performance while at the same time reducing the total cost of ownership, which is driven in particular by the energy consumption of the server itself.
This poster shows the RECS Microserver platform, developed by Christmann and Bielefeld University. RECS simplifies the combined use of heterogeneous target architectures to achieve high performance and superior energy-efficiency.
Poster presented by Martin Kaiser at the LEGaTO Final Event: 'Low-Energy Heterogeneous Computing Workshop'
This PDF is about schizophrenia. For more details, visit the YouTube channel:
https://www.youtube.com/channel/UCAiarMZDNhe1A3Rnpr_WkzA/videos
Richard's entangled adventures in wonderland (Richard Gill)
Since the loophole-free Bell experiments of 2020 and the Nobel prizes in physics of 2022, critics of Bell's work have retreated to the fortress of super-determinism. Now, super-determinism is a derogatory word - it just means "determinism". Palmer, Hance and Hossenfelder argue that quantum mechanics and determinism are not incompatible, using a sophisticated mathematical construction based on a subtle thinning of allowed states and measurements in quantum mechanics, such that what is left appears to make Bell's argument fail, without altering the empirical predictions of quantum mechanics. I think however that it is a smoke screen, and the slogan "lost in math" comes to my mind. I will discuss some other recent disproofs of Bell's theorem using the language of causality based on causal graphs. Causal thinking is also central to law and justice. I will mention surprising connections to my work on serial killer nurse cases, in particular the Dutch case of Lucia de Berk and the current UK case of Lucy Letby.
Nutraceutical market, scope and growth: Herbal drug technology (Lokesh Patil)
As consumer awareness of health and wellness rises, the nutraceutical market—which includes goods like functional meals, drinks, and dietary supplements that provide health advantages beyond basic nutrition—is growing significantly. As healthcare expenses rise, the population ages, and demand for natural and preventive health solutions keeps increasing, this industry is expanding quickly. Product formulation innovations and the use of cutting-edge technology for customized nutrition further drive market expansion. With its worldwide reach, the nutraceutical industry is expected to keep growing and to offer significant opportunities for research and investment in a number of categories, including vitamins, minerals, probiotics, and herbal supplements.
Seminar on U.V. Spectroscopy (Samir Panda)
Spectroscopy is the branch of science dealing with the study of the interaction of electromagnetic radiation with matter.
Ultraviolet-visible spectroscopy refers to absorption or reflectance spectroscopy in the UV-VIS spectral region.
Ultraviolet-visible spectroscopy is an analytical method that measures the amount of light absorbed by the analyte.
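The absorbance measured in UV-Vis spectroscopy is usually expressed through the Beer-Lambert law, A = log10(I0/I) = ε·l·c. As a quick illustration (the intensities, molar absorptivity, and path length below are made-up values, not data from the seminar):

```python
import math

def absorbance(incident_intensity, transmitted_intensity):
    """Absorbance A = log10(I0 / I) from measured beam intensities."""
    return math.log10(incident_intensity / transmitted_intensity)

def concentration(absorbance_value, molar_absorptivity, path_length_cm):
    """Beer-Lambert law A = epsilon * l * c, solved for concentration c."""
    return absorbance_value / (molar_absorptivity * path_length_cm)

# Hypothetical reading: 90% of the incident UV beam is absorbed by the analyte
A = absorbance(100.0, 10.0)           # log10(100/10) = 1.0
c = concentration(A, 15000.0, 1.0)    # epsilon and path length are assumed
print(A, c)
```

A tenfold drop in transmitted intensity corresponds to exactly one absorbance unit, which is why absorbance rather than raw intensity is the quantity reported.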
LEGaTO: Low-Energy Heterogeneous Computing Use of AI in the project
1. The LEGaTO project has received funding from the European Union's Horizon 2020 research and
innovation programme under grant agreement No 780681
LEGaTO: Low-Energy,
Heterogeneous Computing
Use of AI in the Project
AI4EU Café Webinar
Osman Unsal
Barcelona Supercomputing Center
28/October/2020
2. AI4EU Cafe
The future challenge of computing: MW, not FLOPS
“… without dramatic increases
in efficiency, ICT industry could
use 20% of all electricity and
emit up to 5.5% of the world’s
carbon emissions by 2025.”
“We have a tsunami of data
approaching. Everything which
can be is being digitalised. It is
a perfect storm.”
“ … a single $1bn Apple data
centre planned for Athenry in Co
Galway, expects to eventually
use 300MW of electricity, or
over 8% of the national capacity
and more than the daily entire
usage of Dublin. It will require
144 large diesel generators as
back up for when the wind does
not blow.”
3. AI4EU Cafe
How did we get here?
Decades of exponential growth in performance
End of Dennard scaling
Moore’s Law is slowing down
Explore new architectures & models of computation
Exponential growth in demand & data
Move towards accelerators
4. AI4EU Cafe
FPGAs to the rescue?
• The model of computation is key
• Build ultra-deep, highly efficient pipelines
5. AI4EU Cafe
LEGaTO Ambition
• Create software stack-support for energy-efficient
heterogeneous computing
o Starting with Made-in-Europe mature software stack, and optimizing
this stack to support energy-efficiency
o Computing on a commercial cutting-edge European-developed
heterogeneous hardware substrate with CPU + GPU + FPGA +
FPGA-based Dataflow Engines (DFE)
• Main goal: energy efficiency
9. AI4EU Cafe
Use Cases
• Healthcare: Infection biomarkers
o Statistical search for biomarkers, which often
needs intensive computation. A biomarker is
a measurable value that can indicate the
state of an organism, and is often the
presence, absence or severity of a specific
disease
• Smart Home: Assisted Living
o The ability of the home to learn from the
user's behavior and anticipate future
behavior is still an open task, and necessary
to obtain broad acceptance of
assisted living among the general public
10. AI4EU Cafe
Use Cases
• Smart City: operational urban
pollutant dispersion modelling
o Modeling city landscape + sensor data +
wind prediction to issue a “pollutant
weather prediction”
• Machine Learning: Automated driving
and graphics rendering
o Object detection using CNN networks for
automated driving systems and CNN-
and LSTM-based methods for realistic
rendering of graphics for gaming and
multi-camera systems
• Secure IoT Gateway
o Variety of sensors and actors in an
industrial and private surrounding
11. AI4EU Cafe
LEGaTO Healthcare Use Case and AI
• Leverage tree-based methods and LASSO regression for
infection biomarker research
• Integrated with the LEGaTO SCONE security technology
o Efficient deployment of Intel SGX security extensions
• LEGaTO scheduling techniques help to accelerate one of
the key algorithms, using random forests
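LASSO is well suited to biomarker search because its L1 penalty drives the coefficients of uninformative features to exactly zero, leaving a sparse set of candidate markers. The sketch below is a generic coordinate-descent LASSO on synthetic data, not the project's actual pipeline; the data, λ, and iteration count are illustrative:

```python
import numpy as np

def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize (1/2n)||y - Xw||^2 + lam*||w||_1 via cyclic coordinate descent."""
    n, p = X.shape
    w = np.zeros(p)
    for _ in range(n_iters):
        for j in range(p):
            # Partial residual with feature j's current contribution removed
            r = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ r / n
            z = X[:, j] @ X[:, j] / n
            # Soft-thresholding: weakly correlated features are zeroed out
            w[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z
    return w

# Synthetic cohort: 10 candidate markers, only indices 0 and 3 carry signal
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 0.1 * rng.standard_normal(200)

w = lasso_coordinate_descent(X, y, lam=0.1)
selected = [j for j in range(10) if abs(w[j]) > 1e-6]
print("selected features:", selected)
```

The soft-thresholding step is what produces exact zeros; ordinary least squares would assign every noise feature a small nonzero weight instead.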
28.10.20 11
12. AI4EU Cafe
LEGaTO ML (DNN) Use Case
• Covered in the presentation by Hans Salomonsson (Embedl)
13. AI4EU Cafe
LEGaTO Smart Home Use Case and AI
• Covered in the presentation by Nils Kucza (University of Bielefeld)
14. AI4EU Cafe
LEGaTO Student Research Perspective on AI
• Scheduling VGG across heterogeneous cores in mobile
edge devices
• On Nvidia Jetson TX2
o 4-core ARM A57 and
o 2-core Denver 2
• Covered in the presentation by Pirah Noor Soomro (Chalmers University)
16. Reduced-Voltage Operation in Modern FPGAs
for Neural Network Acceleration
Behzad Salami, Baturay Onural, Ismail Yuksel, Fahrettin Koc, Oguz Ergin, Adrian Cristal, Osman Unsal, Hamid Sarbazi-Azad
17. Executive Summary
• Motivation: Power consumption of neural networks is a main concern
Hardware acceleration: GPUs, FPGAs, and ASICs
• Problem: FPGAs are at least 10X less power-efficient than equivalent ASICs
• Goal: Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by Undervolting below nominal level
• Evaluation Setup
5 Image classification workloads
3 Xilinx UltraScale+ ZCU102 platforms
2 On-chip voltage rails
• Main Results
Large voltage guardband (i.e., 33%)
>3X power-efficiency gain
18. Outline
• Motivation and Background
• Our Goal
• Methodology
• Results
- Overall Voltage Behavior
- Power-Reliability Trade-off
- Environmental Temperature
• Prior Works
• Summary, Conclusion, and Future Works
20. Motivation and Background
• Motivation
Power consumption of neural networks is a main concern
Hardware acceleration: GPUs, FPGAs, and ASICs
FPGAs: Getting popular but less power-efficient than equivalent ASICs
Large voltage guardbands (12-35%) for CPUs, GPUs, DRAMs
Any potential of “Undervolting FPGAs” for power-efficiency of neural networks?
• Background
Neural Networks: Widely deployed with an inherent resilience to errors
FPGAs: Higher throughput than GPUs and better flexibility than ASICs
Undervolting: Reduces power cons., may incur reliability or performance issues
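The reason undervolting pays off can be anticipated from the classic CMOS dynamic-power relation P ≈ C·V²·f. As a back-of-envelope check using the voltage levels reported later in the deck (850 mV nominal, 570 mV minimum safe level):

```python
def dynamic_power_ratio(v_new_mv, v_nom_mv):
    """Dynamic power scales as C * V^2 * f; at fixed frequency the ratio is (V'/V)^2."""
    return (v_new_mv / v_nom_mv) ** 2

# Undervolting from the 850 mV nominal down to the 570 mV minimum safe level
ratio = dynamic_power_ratio(570, 850)
print(f"dynamic power at 570 mV: {ratio:.2f}x of nominal")  # 0.45x
```

The measured 2.6X guardband gain is larger than this dynamic-only estimate of roughly 2.2X because static power also falls with supply voltage.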
22. Our Goal
• Primary Goal
Bridge the power-efficiency gap between ASIC- and FPGA-based
neural networks by:
Undervolting (i.e., underscaling voltage below nominal level)
• Secondary Goals
Study the voltage behavior of real FPGAs (e.g., guardband)
Study the power-efficiency gain of undervolting for neural networks
Study the reliability overhead
Study the frequency underscaling to prevent the accuracy loss
Study the effect of environmental temperature
24. Overall Methodology
• 5 CNN image classification
workloads, i.e., VGGNet, GoogleNet,
AlexNet, ResNet50, Inception.
• Xilinx DNNDK to map CNN into FPGA
By default optimized for INT8
• 3 identical samples of Xilinx ZCU102
ZYNQ UltraScale+ architecture
Hard-core ARM for data orchestration
FPGA for CNN acceleration
• 2 on-chip voltage rails, via PMBus
VCCINT: DSPs, LUTs, buffers, …
VCCBRAM: BRAMs
Vnom = 850mV (set by manufacturer)
Vast majority (>99.9%) of the power is dissipated on VCCINT
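The measurement loop implied by this setup can be sketched as follows. The PMBus voltage write and the inference call are hypothetical stand-ins (no driver API appears in the deck); only the 850/570/540 mV levels come from the slides:

```python
# Sketch of the undervolting sweep. set_vccint and run_inference are
# HYPOTHETICAL stand-ins, not a real driver API.

V_NOM_MV = 850   # nominal VCCINT set by the manufacturer
V_STEP_MV = 10

def set_vccint(mv):
    """Hypothetical PMBus voltage write; False once the board would crash."""
    return mv >= 540                         # average crash point in the study

def run_inference(mv):
    """Hypothetical benchmark run returning (accuracy, power in watts)."""
    accuracy = 0.75 if mv >= 570 else 0.10   # accuracy collapses below Vmin
    power = 9.0 * (mv / V_NOM_MV) ** 2       # toy V^2 dynamic-power model
    return accuracy, power

def sweep():
    results, mv = [], V_NOM_MV
    while set_vccint(mv):                    # undervolt until the FPGA stops
        acc, watts = run_inference(mv)
        results.append((mv, acc, watts))
        mv -= V_STEP_MV
    return results

log = sweep()
v_min = min(mv for mv, acc, _ in log if acc > 0.5)
print(f"guardband extends from {V_NOM_MV} mV down to {v_min} mV")
```

Sweeping until the device stops responding is exactly how the guardband, critical, and crash regions of the next slides are delineated.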
27. Overall Voltage Behavior
• Guardband: large region below the nominal level (Vnom = 850mV)
No performance or reliability loss; added by the vendor to cover worst-case conditions
Large guardband, average of 33%
• Critical: narrower region below the guardband (Vmin = 570mV)
A narrow voltage region in which neural network accuracy collapses
• Crash: FPGA stops operating below the critical region (Vcrash = 540mV)
Slight variation of voltage behavior across platforms and benchmarks
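The three regions can be captured in a small helper. The thresholds are the averages quoted on the slide; the function itself is an illustration, not project code:

```python
V_NOM, V_MIN, V_CRASH = 850, 570, 540  # mV, averages reported in the slides

def voltage_region(v_mv):
    """Map a VCCINT level to the region observed when undervolting."""
    if v_mv > V_NOM:
        return "above nominal"
    if v_mv >= V_MIN:
        return "guardband"   # no performance or reliability loss
    if v_mv >= V_CRASH:
        return "critical"    # FPGA operates, but CNN accuracy collapses
    return "crash"           # FPGA stops operating

print(voltage_region(700), voltage_region(550), voltage_region(500))
# → guardband critical crash
```

Note how narrow the critical band is: only 30 mV separate the minimum safe level from the crash point, against a 280 mV guardband.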
29. Power-Reliability Trade-off
Power-efficiency (GOPs/W) gain
• >3X power saving (2.6X by eliminating the guardband and a further 43% in the critical region)
Reliability overhead (i.e., CNN accuracy loss)
• No accuracy loss in the guardband; accuracy collapse in the critical region
• Slight variation across 3 platforms and 5 workloads (VGGNet, GoogleNet, AlexNet, ResNet, Inception)
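The headline number decomposes multiplicatively: eliminating the guardband gives 2.6X, and undervolting through the critical region adds a further 43%, so:

```python
guardband_gain = 2.6   # from eliminating the 33% voltage guardband
critical_gain = 1.43   # further 43% gained in the critical region

total_gain = guardband_gain * critical_gain
print(f"total power-efficiency gain: {total_gain:.2f}x")  # 3.72x, i.e. >3X
```

The guardband portion is "free" (no accuracy loss); only the final 43% trades power against CNN accuracy.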
31. Environmental Temperature
• Effects of environmental temperature on power and reliability
Fan speed used to test temperatures in [34 ℃, 50 ℃]
On-board temperature monitored via PMBus
• Temperature effect on power consumption
↓Temp → ↓Power (direct relation between power and temperature)
With undervolting, the impact of temperature on power consumption reduces
• Temperature effect on reliability
↓Temp → ↑Accuracy loss (inverse relation between reliability and temperature)
In our temperature range, Vmin and Vcrash do not change significantly (results shown for GoogleNet)
34. Execution of VGG-16 on Nvidia Jetson TX2
[Figure: per-core timeline (Denver 0-1 and A57 2-5, 0-500 s) showing the training phase followed by three pipeline stages processing the convolutional (Conv64), max-pooling (MAXPOOL), and fully-connected (FC) layers]
Best configuration: 3-stage pipeline, 6-5-10 layer partitioning, 2-2-2
core assignment
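Why a balanced partitioning matters: the steady-state throughput of a linear pipeline is set by its slowest stage. A minimal model (the per-stage latencies below are hypothetical, not measurements from the TX2):

```python
def pipeline_throughput(stage_times_s):
    """Steady-state throughput of a linear pipeline = 1 / slowest stage."""
    return 1.0 / max(stage_times_s)

def total_time(stage_times_s, n_frames):
    """Fill the pipeline once, then one frame per bottleneck interval."""
    return sum(stage_times_s) + (n_frames - 1) * max(stage_times_s)

# Hypothetical stage latencies for a 6-5-10 layer split on 2-2-2 cores
stages = [1.4, 1.5, 1.45]
print(f"{pipeline_throughput(stages):.3f} frames/s")
print(f"{total_time(stages, 2000):.1f} s for 2000 frames")
```

A configuration that minimizes the slowest stage (the "bubble") maximizes throughput, which is the selection criterion the deck describes.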
35. Preliminary Results:
Comparison of the Pipe Search algorithm with a brute-force algorithm

| Approach | Number of trials | Training time [s] | Total execution time of 2000 frames [s] | Best configuration* | Seed** |
|---|---|---|---|---|---|
| Exhaustive Search | 1970 | 8129.21 | 8166.9 | 3,7,4,10,2,1,1, …. | |
| Pipe Search Algorithm | 41 | 116.305 | 2915.91 | 3,7,4,10,2,1,1, | 3,6,5,10,2,1,1, |

Experimental Setup:
Hardware: Nvidia Jetson TX2.
Used cores: 2 Denver, 2 A57
Application: VGG-16 (21 layers in total, 16 of which are compute-intensive)
Input frames = 2000
*. Throughput-maximizing pipeline configuration. The sequence contains three distinct sections:
1- Number of pipeline stages
2- Layer distribution among pipeline stages
3- Core placement for each pipeline stage.
**. A seed is a configuration calculated from computational hints. A good seed minimizes the number of trials in search-space exploration.
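The gap between 1970 and 41 trials reflects the size of the configuration space. Counting only the layer splits (ignoring core placement), partitioning n layers into k contiguous, non-empty stages can be done in C(n-1, k-1) ways:

```python
from math import comb

def contiguous_partitions(n_layers, n_stages):
    """Ways to split n_layers into n_stages contiguous, non-empty groups."""
    return comb(n_layers - 1, n_stages - 1)

# Layer splits alone for VGG-16's 21 layers, allowing 1 to 4 pipeline stages
total = sum(contiguous_partitions(21, k) for k in range(1, 5))
print(total)  # → 1351
```

Multiplying in the core-placement choices grows the space further, which is why exhaustive search needs orders of magnitude more trials than the hint-seeded Pipe Search.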
Editor's Notes
processing on massive scale will have a significant energy impact
MW will be new focus, not FLOPS
data centres need to reduce energy !
for large scale compute, parallelism might not be the most efficient
assembly line model
not even a new idea, compute equivalent is dataflow
this view is Maxeler specific, but the solution is not
Maxeler more explicit to model and develop your application this way
here focus is performance but low energy very related
Thank you.
To begin, I will give a brief overview.
[CLICK] Our motivation is that the power consumption of Neural Networks is a first class concern in state-of-the-art applications, due to the massive amount of data movement and computational power.
[CLICK] To alleviate this issue, hardware acceleration using GPUs, FPGAs, and ASICs is a promising solution.
[CLICK] Among these architectures, FPGAs are getting popular, since, they deliver higher throughput than GPUs and provide better flexibility than ASICs.
[CLICK] But the problem is that the power-efficiency of FPGAs is at least 10X less than equivalent ASICs.
[CLICK] Our goal is to alleviate this issue by undervolting off-the-shelf FPGAs running neural network applications. Undervolting means supply voltage underscaling below the default level set by the FPGA vendor.
[CLICK] Our study is based on
[CLICK] 5 image classification workloads
[CLICK] 3 real Xilinx ZCU102 devices, which are based on the Zynq architecture,
[CLICK] and 2 on-chip voltage rails
[CLICK] Among the main results,
[CLICK] we discover a large voltage guardband of 33% of the nominal voltage level. This guardband is set by the vendor to ensure correct functionality under worst-case conditions. Eliminating it does not incur any performance or reliability overhead.
[CLICK] By applying this technique, we achieve more than a 3X power-efficiency gain.
Here is the outline for the talk.
I will first discuss the motivation behind the work and also will briefly provide the necessary background.
[CLICK] First, motivation
[CLICK] The main motivation behind this work is the increasing interest in neural networks, which are limited by their high power consumption.
[CLICK] Using efficient accelerators usually delivers better power-efficiency than general-purpose processors.
[CLICK] Among accelerator frameworks, FPGAs are getting popular thanks to their shorter time to market; however, their power efficiency is at least 10 times lower than that of ASIC-based neural networks.
[CLICK] As a hardware-level technique, undervolting has recently been studied for off-the-shelf CPUs, GPUs, and DRAMs. These studies have shown significant potential for this technique, since vendors usually add large guardbands below the nominal level. This guardband is unnecessary for many real-world applications, and eliminating it delivers power-efficiency without compromising performance or reliability.
[CLICK] In this work, we aim to experimentally study the potential of undervolting of real FPGA devices for neural networks.
[CLICK] now I will give a brief background about
[CLICK] Neural networks first. Neural networks are getting popular since they have shown a significant potential to classify unseen objects. They have an inherent resiliency to errors.
[CLICK] FPGAs second. FPGAs have reconfigurable, massively parallel, and deeply pipelined architectures. They offer the advantages of both GPUs and ASICs in terms of flexibility and efficiency.
[CLICK] Finally, undervolting. We refer to undervolting as supply voltage underscaling below the nominal level set by the vendor. We apply undervolting until the underlying FPGA device stops operating. The direct advantage is power saving; however, it may incur performance or reliability overheads. This trade-off is studied experimentally in this work.
Let me elaborate on our main goals of this work.
[CLICK] Our primary goal is to
[CLICK] Bridge the power-efficiency gap between ASIC- and FPGA-based neural networks by:
[CLICK] Voltage underscaling of real FPGAs below the nominal level
[CLICK] Beside that, we aim to
[CLICK] Study the voltage behavior of real FPGAs such as voltage guardbands
[CLICK] Study the power-efficiency gain of undervolting for neural networks
[CLICK] Study the reliability overhead
[CLICK] Study the frequency underscaling to prevent the accuracy loss below guardband
[CLICK] and finally, study the effect of environmental temperature
I will briefly explain the experimental methodology next.
[CLICK] our experimental methodology is summarized in this figure. Our focus is the classification phase of convolutional neural networks so we start with a pre-trained model.
[CLICK] We selected 5 state-of-the-art image classification benchmarks as listed here. They have different numbers of layers, neurons, and model sizes, from a few KBs to hundreds of MBs.
[CLICK] For mapping CNN models onto the FPGA, we use a tool from Xilinx called DNNDK. By using this tool, we make sure that our study is general enough and not tied to one specific design.
[CLICK] DNNDK supports several Xilinx FPGAs. We perform our experiments on 3 identical samples of the Xilinx ZCU102. This architecture is composed of an ARM processor and an FPGA. The DNN computations are performed in the FPGA part, and the ARM processor is used to orchestrate the data movement.
[CLICK] Lastly, we access the voltage rails on the FPGA platform through the PMBus. Among the different voltage rails, we focus on the on-chip ones, VCCINT and VCCBRAM. VCCINT is shared by DSPs, LUTs, buffers, and routing resources, while VCCBRAM is used exclusively by the on-chip memories. Note that this is a hardware property of the platform set by the vendor. The default voltage level for both rails is 850mV.
[CLICK] Among these voltage rails, we measured the power consumption at the nominal level and observed that the BRAM power is negligible. This may be the result of the efficient on-chip memory design in the Xilinx UltraScale+ family. Hence, we focus on undervolting VCCINT.
I will now discuss the experimental results.
I will start with presenting and discussing the voltage behavior we experimentally observed for FPGAs.
Here, we show the overall voltage behavior.
[CLICK] Undervolting below the nominal level, we observe three voltage regions: guardband, critical, and crash.
[CLICK] The guardband is added by the vendor to ensure correct functionality under worst-case conditions. We measured it at an average of 33%. There is no reliability or performance loss in this region, so eliminating it can deliver significant power savings under normal conditions.
[CLICK] Below the guardband, there is a narrower region in which the FPGA operates but with CNN accuracy loss. We call it the critical region. A minimum safe voltage level, or Vmin, separates the guardband and critical regions.
[CLICK] Finally, below the critical region, at Vcrash, the FPGA does not operate. Vcrash is measured to be 540mV on average.
[CLICK] Note that there is a slight variation of voltage behavior across 3 platforms and 5 benchmarks studied.
I will discuss now the effect of the reduced-voltage operations on the power consumption and DNN accuracy trade-off according to the voltage behavior discussed.
[CLICK] First, power. There is a total gain of more than 3X in power-efficiency when undervolting from the nominal level to the crash point. 2.6X of this is achieved by eliminating the guardband, and a further 43% results from undervolting in the critical region, at the cost of accuracy loss.
[CLICK] There is a slight variation across the 3 platforms and 5 benchmarks.
[CLICK] the CNN accuracy is substantially reduced in the critical region.
[click] and there is an accuracy collapse for all benchmarks. [click] As can be seen, there is a slight variation across the 3 platforms and 5 benchmarks.
Lastly, I will present our experimental analysis about the effect of the environmental temperature in the power-reliability trade-off.
Temperature has significant impact on the power consumption as well as on the reliability.
[CLICK] We studied its impact while undervolting VCCINT. [CLICK] In our tests, we use the speed of the FPGA fan, which can generate temperatures in the range of [34 ℃, 50 ℃]. We also use the PMBus to monitor the on-chip temperature.
[CLICK] First power: as can be seen in this figure, temperature directly impacts the power consumption, meaning that lowering the temperature leads to lower power consumption. This is mainly due to the reduction of static power.
[CLICK] However, by undervolting further, the impact of temperature on the power consumption reduces, and as you can see, in the critical voltage region there is a negligible impact. This can be explained by noting that undervolting also gradually reduces the static power. As a result, the temperature, which mainly impacts the static power, does not show much effect on it.
[CLICK] On the other side, temperature has an inverse relation with reliability: lowering the temperature increases the accuracy loss.
[CLICK] Also, we observe a negligible change in the voltage regions by temperature changes. Although, in wider ranges we may observe differences. This needs further experiments and equipment.
The Framework consists of 2 modules.
In the offline module, we determine the computational hints from the network descriptor. The hints provide a notion of the computational weight of each layer, based on which we model the initial partition of network layers to generate pipeline stages.
In the second, online module, we measure configurations that are expected to be good candidates for a balanced pipeline on the given platform. Training finishes when the algorithm has found the optimal mapping. The rest of the input data is then processed in pipelined fashion.
This figure shows:
1) The training phase of the execution
2) The normal phase of the pipeline processing
During training, after 61 input frames, the algorithm converged to a 3-Staged pipeline.
Pipeline Stage 1 comprises the first 6 layers and executes on 2 Denver cores
Pipeline Stage 2 comprises the next 5 layers and executes on 2 A57 cores
Pipeline Stage 3 comprises the last 10 layers and executes on 2 A57 cores
The throughput-maximizing pipeline is the one with the smallest bubble size, in which the slowest stage takes the minimum possible time. This configuration is chosen because it minimizes the bottleneck of the pipeline. As we can also observe in the figure, there is only a small bubble in the pipeline.