Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #7: FlexTiles Emulation platform
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #6: FlexTiles Embedded FPGA Accelerators
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
FPL'2014 - FlexTiles Workshop - 8 - FlexTiles Demo (FlexTiles Team)
Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #8: FlexTiles Demo
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
FlexTiles Platform integrated in 19" Rack Enclosure (FlexTiles Team)
The FlexTiles Development Platform is suitable for verifying the concept of a single 3D SoC chip design, but with a 19" Rack enclosure it can scale to either a larger 3D chip design or an FPGA-based HPC system.
Conference on Adaptive Hardware and Systems (AHS'14) - The FlexTiles Embedded... (FlexTiles Team)
The FP7 FlexTiles Project will provide tools for building a 3D SoC chip. The chip embeds an FPGA, and these slides explain the ideas and how we will make it a reconfigurable fabric unlike any seen before.
Conference on Adaptive Hardware and Systems (AHS'14) - The DSP for FlexTiles (FlexTiles Team)
The FP7 FlexTiles Project uses DSP accelerators. They are connected with each other, and with the general-purpose processors (GPPs), through a Network-on-Chip (NoC). These slides give the details of the DSP accelerator.
Preparing to program Aurora at Exascale - Early experiences and future directions (inside-BigData.com)
In this deck from IWOCL / SYCLcon 2020, Hal Finkel from Argonne National Laboratory presents: Preparing to program Aurora at Exascale - Early experiences and future directions.
"Argonne National Laboratory’s Leadership Computing Facility will be home to Aurora, our first exascale supercomputer. Aurora promises to take scientific computing to a whole new level, and scientists and engineers from many different fields will take advantage of Aurora’s unprecedented computational capabilities to push the boundaries of human knowledge. In addition, Aurora’s support for advanced machine-learning and big-data computations will enable scientific workflows incorporating these techniques along with traditional HPC algorithms. Programming the state-of-the-art hardware in Aurora will be accomplished using state-of-the-art programming models. Some of these models, such as OpenMP, are long-established in the HPC ecosystem. Other models, such as Intel’s oneAPI, based on SYCL, are relatively-new models constructed with the benefit of significant experience. Many applications will not use these models directly, but rather, will use C++ abstraction libraries such as Kokkos or RAJA. Python will also be a common entry point to high-performance capabilities. As we look toward the future, features in the C++ standard itself will become increasingly relevant for accessing the extreme parallelism of exascale platforms.
This presentation will summarize the experiences of our team as we prepare for Aurora, exploring how to port applications to Aurora’s architecture and programming models, and distilling the challenges and best practices we’ve developed to date. oneAPI/SYCL and OpenMP are both critical models in these efforts, and while the ecosystem for Aurora has yet to mature, we’ve already had a great deal of success. Importantly, we are not passive recipients of programming models developed by others. Our team works not only with vendor-provided compilers and tools, but also develops improved open-source LLVM-based technologies that feed both open-source and vendor-provided capabilities. In addition, we actively participate in the standardization of OpenMP, SYCL, and C++. To conclude, I’ll share our thoughts on how these models can best develop in the future to support exascale-class systems."
Watch the video: https://wp.me/p3RLHQ-lPT
Learn more: https://www.iwocl.org/iwocl-2020/conference-program/
and
https://www.anl.gov/topic/aurora
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
FPL'2014 - FlexTiles Workshop - 1 - FlexTiles Overview (FlexTiles Team)
Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #1: FlexTiles overview
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
In this deck from the Perth HPC Conference, Rob Farber from TechEnablement presents: AI is Impacting HPC Everywhere.
"The convergence of AI and HPC has created a fertile venue that is ripe for imaginative researchers — versed in AI technology — to make a big impact in a variety of scientific fields. From new hardware to new computational approaches, the true impact of deep- and machine learning on HPC is, in a word, “everywhere”. Just as technology changes in the personal computer market brought about a revolution in the design and implementation of the systems and algorithms used in high performance computing (HPC), so are recent technology changes in machine learning bringing about an AI revolution in the HPC community. Expect new HPC analytic techniques including the use of GANs (Generative Adversarial Networks) in physics-based modeling and simulation, as well as reduced precision math libraries such as NLAFET and HiCMA to revolutionize many fields of research. Other benefits of the convergence of AI and HPC include the physical instantiation of data flow architectures in FPGAs and ASICs, plus the development of powerful data analytic services."
Learn more: http://www.techenablement.com/
and
http://hpcadvisorycouncil.com/events/2019/australia-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the Argonne Training Program on Extreme-Scale Computing 2019, Howard Pritchard from LANL and Simon Hammond from Sandia present: NNSA Explorations: ARM for Supercomputing.
"The Arm-based Astra system at Sandia will be used by the National Nuclear Security Administration (NNSA) to run advanced modeling and simulation workloads for addressing areas such as national security, energy and science.
"By introducing Arm processors with the HPE Apollo 70, a purpose-built HPC architecture, we are bringing powerful elements, like optimal memory performance and greater density, to supercomputers that existing technologies in the market cannot match,” said Mike Vildibill, vice president, Advanced Technologies Group, HPE. “Sandia National Laboratories has been an active partner in leveraging our Arm-based platform since its early design, and featuring it in the deployment of the world’s largest Arm-based supercomputer, is a strategic investment for the DOE and the industry as a whole as we race toward achieving exascale computing.”
Watch the video: https://wp.me/p3RLHQ-l29
Learn more: https://insidehpc.com/2018/06/arm-goes-big-hpe-builds-petaflop-supercomputer-sandia/
and
https://extremecomputingtraining.anl.gov/agenda-2019/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Versal Premium ACAP for Network and Cloud Acceleration (inside-BigData.com)
Today Xilinx announced Versal Premium, the third series in the Versal ACAP portfolio. The Versal Premium series features highly integrated, networked and power-optimized cores and the industry’s highest bandwidth and compute density on an adaptable platform. Versal Premium is designed for the highest bandwidth networks operating in thermally and spatially constrained environments, as well as for cloud providers who need scalable, adaptable application acceleration.
Versal is the industry’s first adaptive compute acceleration platform (ACAP), a revolutionary new category of heterogeneous compute devices with capabilities that far exceed those of conventional silicon architectures. Developed on TSMC’s 7-nanometer process technology, Versal Premium combines software programmability with dynamically configurable hardware acceleration and pre-engineered connectivity and security features to enable a faster time-to-market. The Versal Premium series delivers up to 3X higher throughput compared to current generation FPGAs, with built-in Ethernet, Interlaken, and cryptographic engines that enable fast and secure networks. The series doubles the compute density of currently deployed mainstream FPGAs and provides the adaptability to keep pace with increasingly diverse and evolving cloud and networking workloads.
Learn more: https://insidehpc.com/2020/03/xilinx-announces-versal-premium-acap-for-network-and-cloud-acceleration/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck from the UK HPC Conference, Gunter Roeth from NVIDIA presents: Hardware & Software Platforms for HPC, AI and ML.
"Data is driving the transformation of industries around the world and a new generation of AI applications are effectively becoming programs that write software, powered by data, vs by computer programmers. Today, NVIDIA’s tensor core GPU sits at the core of most AI, ML and HPC applications, and NVIDIA software surrounds every level of such a modern application, from CUDA and libraries like cuDNN and NCCL embedded in every deep learning framework and optimized and delivered via the NVIDIA GPU Cloud to reference architectures designed to streamline the deployment of large scale infrastructures."
Watch the video: https://wp.me/p3RLHQ-l2Y
Learn more: http://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
In this deck, Paul Isaacs from Linaro presents: State of ARM-based HPC. This talk provides an overview of applications and infrastructure services successfully ported to AArch64 and benefiting from scale.
"With its debut on the TOP500, the 125,000-core Astra supercomputer at New Mexico’s Sandia Labs uses Cavium ThunderX2 chips to mark Arm’s entry into the petascale world. In Japan, the Fujitsu A64FX Arm-based CPU in the pending Fugaku supercomputer has been optimized to achieve high-level, real-world application performance, anticipating up to one hundred times the application execution performance of the K computer. K was the first computer to top 10 petaflops in 2011."
Watch the video: https://wp.me/p3RLHQ-lIT
Learn more: https://www.linaro.org/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC (inside-BigData.com)
In this deck from the Perth HPC Conference, Werner Scholz from XENON Systems presents: Exceeding the Limits of Air Cooling to Unlock Greater Potential in HPC.
"A decade ago, 100 watts per CPU was devastating to thermal design. Today, Intel’s highest performing CPUs (e.g. Intel Cascade Lake-AP 9282 processor) have a thermal design envelope of 400 watts. There really is no end in sight, and accommodating more power is critical to advancing performance. The ability to dissipate the resulting heat is the hard ceiling that systems face in terms of performance – giving greater importance to liquid cooling breakthroughs. With liquid cooling, less energy is expended to cool systems – a significant savings in HPC deployments with arrays of servers drawing energy and generating heat. Electrical current drives the CPU and enables it to function. This electrical power is converted into thermal energy (heat). To maintain a stable temperature, the CPU needs to be cooled by efficiently removing this heat and releasing it. Liquid cooling is the best way to cool a system because liquid transfers heat much more efficiently than air. From an environmental perspective, liquid cooling reduces both those characteristics to create a smarter and more ecological approach on a grand scale. The cascade of value continues, as ambient heat removed from systems can then be used to heat buildings and augment or replace traditional heating systems. It’s an intelligent approach to thermal management, distributing the economic value of reduced energy use and transforming heat into an enterprise asset."
Watch the video: https://wp.me/p3RLHQ-kZa
Learn more: https://www.xenon.com.au/
and
http://hpcadvisorycouncil.com/events/2019/australia-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
CINECA for HPC and e-infrastructures (Cineca)
Sanzio Bassini, Head of the HPC Department of Cineca. Cineca is the technological partner of the Ministry of Education and takes part in the Italian commitment to developing e-infrastructure in Italy and in Europe for HPC and HPC technologies: scientific data repository and management, cloud computing for industry and public administration, and the development of compute-intensive and data-intensive methods for science and engineering.
Cineca offers: open access to the integrated Tier-0 and Tier-1 national HPC infrastructure; education and training activities under the umbrella of the PRACE Advanced Training Centre action; and an integrated help desk and scale-up process for HPC user support.
Conference on Adaptive Hardware and Systems (AHS'14) - FlexTiles Introductions (FlexTiles Team)
FlexTiles is an FP7 project with the goal of designing a tool-chain for the design of a 3D SoC, prototyped on an FPGA development platform. This presentation covers the "why, how, when and where" of the project, which will complete in 2015.
In this deck from GTC Digital, William Beaudin from DDN presents: HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD.
Enabling high performance computing through the use of GPUs requires an incredible amount of IO to sustain application performance. We'll cover architectures that enable extremely scalable applications through the use of NVIDIA’s SuperPOD and DDN’s A3I systems.
The NVIDIA DGX SuperPOD is a first-of-its-kind artificial intelligence (AI) supercomputing infrastructure. DDN A³I with the EXA5 parallel file system is a turnkey, AI data storage infrastructure for rapid deployment, featuring faster performance, effortless scale, and simplified operations through deeper integration. The combined solution delivers groundbreaking performance, deploys in weeks as a fully integrated system, and is designed to solve the world's most challenging AI problems.
Watch the video: https://wp.me/p3RLHQ-lIV
Learn more: https://www.ddn.com/download/nvidia-superpod-ddn-a3i-ai400-appliance-with-the-exa5-filesystem/
and
https://www.nvidia.com/en-us/gtc/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
InfiniBand In-Network Computing Technology and Roadmap (inside-BigData.com)
In this video from the UK HPC Conference, Richard Graham from Mellanox presents: InfiniBand In-Network Computing Technology and Roadmap.
"In-Network Computing transforms the data center interconnect into a "distributed CPU" and "distributed memory", making it possible to overcome performance barriers and enable faster, more scalable data analysis. HDR 200G InfiniBand In-Network Computing technology includes several elements - Scalable Hierarchical Aggregation and Reduction Protocol (SHARP), smart tag matching, the rendezvous protocol, and more. These technologies are in use at some of the recent large-scale supercomputers around the world, including the top TOP500 platforms. The session will discuss the InfiniBand In-Network Computing technology and performance results, as well as a view of the future roadmap."
Watch the video:
Learn more: http://mellanox.com
and
http://hpcadvisorycouncil.com/events/2019/uk-conference/agenda.php
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Energy Efficient GPS Acquisition with Sparse-GPS (Prasant Misra)
With rising demand for GPS positioning, low-cost receivers are becoming widely available, but their energy demands are still too high. For energy-efficient GPS sensing in delay-tolerant applications, one actively investigated option is to offload a few milliseconds of raw signal samples and leverage the greater processing power of the cloud to obtain a position fix.
In an attempt to reduce the energy cost of this data offloading operation, we propose Sparse-GPS: a new computing framework for GPS acquisition via sparse approximation.
Within the framework, GPS signals can be efficiently compressed by random ensembles. The sparse acquisition information, pertaining to the visible satellites that are embedded within these limited measurements, can subsequently be recovered by our proposed representation dictionary.
Through extensive empirical evaluation, we demonstrate the acquisition quality and energy gains of Sparse-GPS. We show that it is twice as energy efficient as offloading uncompressed data, and has 5-10 times lower energy costs than standalone GPS, with a median positioning accuracy of 40 m.
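In standard compressive-sensing notation (our symbols, not the paper's), the scheme described above amounts to the following: a random ensemble \Phi compresses n raw samples x into m \ll n measurements y, and the acquisition information is then recovered through a sparsifying representation dictionary \Psi built from the satellite signal structure:

    y = \Phi x, \qquad \Phi \in \mathbb{R}^{m \times n}, \quad m \ll n
    x \approx \Psi \alpha, \qquad \alpha \ \text{sparse}
    \hat{\alpha} = \arg\min_{\alpha} \|\alpha\|_1 \quad \text{subject to} \quad \|y - \Phi \Psi \alpha\|_2 \le \epsilon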
In this deck from the HPC User Forum at Argonne, Andrew Siegel from Argonne presents: ECP Application Development.
"The Exascale Computing Project is accelerating delivery of a capable exascale computing ecosystem for breakthroughs in scientific discovery, energy assurance, economic competitiveness, and national security. ECP is chartered with accelerating delivery of a capable exascale computing ecosystem to provide breakthrough modeling and simulation solutions to address the most critical challenges in scientific discovery, energy assurance, economic competitiveness, and national security. This role goes far beyond the limited scope of a physical computing system. ECP’s work encompasses the development of an entire exascale ecosystem: applications, system software, hardware technologies and architectures, along with critical workforce development."
Watch the video: https://wp.me/p3RLHQ-kSL
Learn more: https://www.exascaleproject.org
and
http://hpcuserforum.com
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
The NPOESS Integrated Program Office and its principal contractors -- Raytheon and Northrop Grumman -- are preparing for the launch of the NPP risk-reduction mission in 2006. This presentation will review program status and how HDF will be used to deliver NPOESS products at the domestic weather centrals and worldwide field terminals.
Challenges in Assessing Single Event Upset Impact on Processor Systems (Wojciech Koszek)
Abstract—This paper presents a test methodology developed at Xilinx for real-time soft-error rate testing as well as the software framework in which Device-Under-Test (DUT) and controlling computer are both synchronized with the proton beam controls and run experiments automatically in a predictable manner. The method presented has been successfully used for Zynq®-7000 All Programmable SoC testing at the UC Davis Crocker Nuclear Lab. Presented are the issues and challenges encountered during design and implementation of the framework, as well as lessons learned from the in-house experiments and bootstrapping tests performed with Thorium Foil. The method presented has helped Xilinx to deliver high-quality experimental data and to optimize time spent in the testing facility.
Keywords—Error detection, soft error, architectural vulnerability, statistical error, confidence level, beam facility control
Abstract— During the past year Xilinx, for the first time ever, set out to quantify the soft error rate of a multi-core microprocessor. This work extends Xilinx's 10+ years of heritage in FPGA radiation testing. Built on the 28 nanometer technology node, Xilinx's Zynq™ family of devices integrates a processor subsystem with programmable logic. The processor subsystem includes two 32-bit ARM Cortex™-A9 CPUs, two NEON™ floating point units, two SIMD processing units, L1 and L2 caches, on-chip SRAM memory and various peripherals. The programmable logic is directly connected with the processing subsystem via ARM's AMBA™ 4 AXI interface. This programmable logic is based on the 7 Series FPGA fabric, consisting of 6-input LUTs and DFFs along with Block RAM, DSP slices, multi-gigabit transceivers, and other blocks. Tests were performed using a proton beam to analyze the soft error susceptibility of the new device. Proton beam testing was deemed acceptable since previous neutron beam and proton beam testing had shown virtually identical cross-sections for 7 Series programmable logic. The results are promising and yield a solid baseline for a typical embedded application targeting any of the Zynq SoC devices. As a foray into processor testing, this Zynq work has laid a solid foundation for future Xilinx SoC test campaigns.
Austin Lesea, Wojciech Koszek, Glenn Steiner, Gary Swift, and Dagan White Xilinx, Inc.
Paper: SELSE 2014 @ Stanford University (PDF, 456KB), 2014
Slides: (PDF, 933KB), 2014
In this deck, Gilad Shainer from Mellanox announces the world’s first HDR 200Gb/s data center interconnect solutions. "These 200Gb/s HDR InfiniBand solutions maintain Mellanox’s generation-ahead leadership while enabling customers and users to leverage an open, standards-based technology that maximizes application performance and scalability while minimizing overall data center total cost of ownership. Mellanox 200Gb/s HDR solutions will become generally available in 2017."
Watch the video presentation: http://insidehpc.com/2016/11/hdr-infiniband/
Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #5: FlexTiles Simulation Platform
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
Adaptive Hardware and Systems (AHS'14) - FlexTiles OVP Demo (FlexTiles Team)
The OVP (http://www.ovpworld.org/) tools are used by the FlexTiles team to simulate the multi-core implementation of the GPP. In this example, we have two MicroBlaze CPUs running in parallel.
Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #4: FlexTiles Virtual Platform
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
Moishe House West Hartford Info Session presentation (Derek Holodak)
The slideshow that was meant to accompany the presentation given by Derek and Carly on 10/10/12 at Hebrew High School of New England. To register, fill out an application at http://is.gd/go_moho_weha/
The DM8168 image capture solution provides dual high-resolution image capture and dual streaming to an image store server, without any interruption to the system and without any disruption to the user.
A versatile PC/104 Power Supply with Power-over-Ethernet from Sundance (Flemming Christensen)
The SMT1024 is a dedicated PC/104 power supply that provides rail outputs of 3.3V, 5V, 12V and -12V, delivering up to 60W from a DC input and 24W over Power-over-Ethernet.
Slides presented at the FlexTiles Workshop at FPL'2014.
Presentation #3: FlexTiles DSP Accelerators
FlexTiles is a heterogeneous many-core platform reconfigurable at run-time developed within an FP7 project.
Conference on Adaptive Hardware and Systems (AHS'14) - FlexTiles FPGA Emulation (FlexTiles Team)
The FP7 FlexTiles Project will provide a tool-chain that allows DSPs, CPUs and an FPGA to be implemented on the FlexTiles Development Platform. This slide gives some details about the dynamic reconfiguration of the FPGA by the CPU.
In Network Computing Prototype Using P4 at KSC/KREONET 2019 (Kentaro Ebisawa)
A case study of applying P4 to CAN (Controller Area Network) data pre-processing using an FPGA and the Netcope P4 compiler.
Presented at KSC / KREONET WORKSHOP 2019 | DAY 1 Session 1: SDN/NFV/P4
http://www.ksc2019.re.kr/
NFV and SDN: 4G LTE and 5G Wireless Networks on Intel(r) Architecture (Michelle Holley)
The presentation outlines the KPIs and key optimizations at the platform, NFVI and stack level in implementing a wireless base station stack and telco edge cloud on Intel Architecture. It uses the FlexRAN LTE reference PHY and the NEV SDK for MEC to outline NFV and 5G use cases such as network slicing.
HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integrated Processors (Linaro)
Session ID: HKG18-301
Session Name: HKG18-301 - Dramatically Accelerate 96Board Software via an FPGA with Integrated Processors
Speaker: Glenn Steiner
Track: LITE
★ Session Summary ★
Key Takeaways:
With the drive to increase integration, reduce system costs, accelerate performance, and enhance reliability, software developers are discovering that the processor they would like to target is simply not fast enough. This session will help you, the system architect or software developer, understand how you can architect and develop software on an FPGA-integrated processor, and accelerate software code via FPGA accelerators.
Abstract:
As a software developer, in order to meet system-level performance requirements, you may have realized that your next software project will be targeting a processor inside an FPGA. How will this impact your development process, and what benefits might you gain from this tight integration of processor and FPGA? Starting from the basics of what FPGAs are (in terms of software programming), this session will provide an easy-to-understand primer on what modern FPGAs with embedded processors can do. We will wrap up with examples of how high-level synthesis tools can move software to programmable logic hardware, enabling dramatic software acceleration.
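As a hedged illustration of that last point (our example, not from the session): a function written in plain C can be annotated for a high-level synthesis tool such as Xilinx Vivado HLS, which then maps the loop onto pipelined programmable logic. The function name and fixed trip count are hypothetical; the PIPELINE pragma is a real Vivado HLS directive.

    /* Hypothetical multiply-accumulate kernel for HLS. The pragma asks
     * the tool to start a new loop iteration every clock cycle (II=1)
     * when this loop is synthesized into programmable logic. */
    void mac_kernel(const int a[256], const int b[256], int *result)
    {
        int acc = 0;
        for (int i = 0; i < 256; i++) {
    #pragma HLS PIPELINE II=1
            acc += a[i] * b[i];
        }
        *result = acc;
    }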
---------------------------------------------------
★ Resources ★
Event Page: http://connect.linaro.org/resource/hkg18/hkg18-301/
Presentation: http://connect.linaro.org.s3.amazonaws.com/hkg18/presentations/hkg18-301.pdf
Video: http://connect.linaro.org.s3.amazonaws.com/hkg18/videos/hkg18-301.mp4
---------------------------------------------------
★ Event Details ★
Linaro Connect Hong Kong 2018 (HKG18)
19-23 March 2018
Regal Airport Hotel Hong Kong
---------------------------------------------------
Keyword: LITE
http://www.linaro.org
http://connect.linaro.org
---------------------------------------------------
Follow us on Social Media
https://www.facebook.com/LinaroOrg
https://www.youtube.com/user/linaroorg?sub_confirmation=1
https://www.linkedin.com/company/1026961
Best Practice of Compression/Decompression Codecs in Apache Spark with Sophia... (Databricks)
Nowadays, people are creating, sharing and storing data at a faster pace than ever before, and effective data compression/decompression can significantly reduce the cost of data usage. Apache Spark is a general distributed computing engine for big data analytics, and it stores and shuffles large amounts of data across the cluster at runtime, so the compression/decompression codecs can impact end-to-end application performance in many ways.
However, there is a trade-off between storage size and compression/decompression throughput (CPU computation). Balancing compression speed and ratio is a very interesting topic, particularly while both software algorithms and the CPU instruction set keep evolving. Apache Spark provides a very flexible compression codec interface with default implementations like GZip, Snappy, LZ4 and ZSTD, and the Intel Big Data Technologies team has also implemented more codecs based on the latest Intel platforms, such as ISA-L (igzip), LZ4-IPP, Zlib-IPP and ZSTD, for Apache Spark. In this session, we compare the characteristics of those algorithms and implementations by running different micro workloads as well as end-to-end workloads on different generations of Intel x86 platforms and disks.
The session is intended as a best-practice guide for big data software engineers choosing the proper compression/decompression codecs for their applications, and we will also present methodologies for measuring and tuning the performance bottlenecks of typical Apache Spark workloads.
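As a concrete point of reference (our addition, not from the session): the codec Spark uses for internal data such as shuffle output and RDD blocks is selected by a single configuration property, whose built-in values include lz4, lzf, snappy and zstd, for example:

    spark.io.compression.codec=zstd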
DAIS19: On the Performance of ARM TrustZone (LEGATO project)
Presented by Valerio Schiavoni @vschiavoni at DAIS 19
The TrustZone technology, available in the vast majority of recent Arm processors, allows the execution of code inside a so-called secure world. It effectively provides hardware-isolated areas of the processor for sensitive data and code, i.e., a trusted execution environment (TEE). The OP-TEE framework provides a collection of toolchain, open-source libraries and a secure kernel specifically geared to developing applications for TrustZone. This paper presents an in-depth performance- and energy-wise study of TrustZone using the OP-TEE framework, including secure storage and the cost of switching between the secure and non-secure worlds, using emulated and hardware measurements.
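To make the measured world switch concrete, here is a minimal normal-world client in C against the GlobalPlatform TEE Client API that OP-TEE implements; the TA UUID and command ID are placeholders, and error checking is omitted. Each TEEC_InvokeCommand crosses from the normal world into the secure world and back, which is the switching cost the paper quantifies.

    #include <tee_client_api.h>

    /* Placeholder UUID; a real trusted application defines its own. */
    static const TEEC_UUID ta_uuid = { 0x00000000, 0x0000, 0x0000,
        { 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00 } };

    int main(void)
    {
        TEEC_Context ctx;
        TEEC_Session sess;
        uint32_t origin;

        TEEC_InitializeContext(NULL, &ctx);   /* connect to the TEE driver */
        TEEC_OpenSession(&ctx, &sess, &ta_uuid, TEEC_LOGIN_PUBLIC,
                         NULL, NULL, &origin);

        /* Each invocation enters the secure world and returns; timing a
         * loop of such calls measures the world-switch overhead. */
        TEEC_InvokeCommand(&sess, 0, NULL, &origin);

        TEEC_CloseSession(&sess);
        TEEC_FinalizeContext(&ctx);
        return 0;
    }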
Hardware simulation for exponential blind equal throughput algorithm using System Generator (IJECEIAES)
Scheduling is the process of allocating radio resources to User Equipment (UE) transmitting different flows at the same time. It is performed by the scheduling algorithm implemented in the Long Term Evolution base station, the Evolved Node B. Most proposed algorithms do not handle real-time and non-real-time traffic simultaneously, so a UE with bad channel quality may starve because no resources are allocated to it for a long time. To solve this, the Exponential Blind Equal Throughput (EXP-BET) algorithm is proposed: the user with the highest priority metric, calculated using the EXP-BET metric equation, is allocated resources first. This study investigates the implementation of the EXP-BET scheduling algorithm on an FPGA platform. The metric equation of EXP-BET is modelled and simulated using System Generator. The design utilizes only 10% of the available FPGA resources. Fixed values are used for all inputs to the scheduler. The system is verified through hardware co-simulation of the EXP-BET metric computation; the metric values from the hardware co-simulation match those from the Simulink environment. Thus, the algorithm is ready for prototyping, and the Virtex-6 FPGA is chosen as the platform.
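The abstract does not reproduce the metric equation itself. For orientation only, a common way to combine the two ingredients in the LTE scheduling literature (our notation; the paper's exact form may differ) multiplies the EXP rule's delay-sensitive term by BET's inverse average throughput:

    m_i(t) = \exp\!\left( \frac{a_i D_{\mathrm{HOL},i}(t) - \overline{aD}}{1 + \sqrt{\overline{aD}}} \right) \cdot \frac{1}{\bar{R}_i(t)}

where D_{\mathrm{HOL},i} is user i's head-of-line packet delay, \overline{aD} the average of a_i D_{\mathrm{HOL},i} over active users, and \bar{R}_i(t) the user's past average throughput; the UE with the largest m_i(t) is scheduled first.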
Conference on Adaptive Hardware and Systems (AHS'14) - Why FlexTiles uses OVP... (FlexTiles Team)
The FlexTiles concept integrates DSPs, GPPs/CPUs and an embedded FPGA; the OVP (http://www.ovpworld.org/) tools make it easier to simulate, and these slides explain how.
Conference on Adaptive Hardware and Systems (AHS'14) - The 3D FlexTiles Concept (FlexTiles Team)
The FP7 FlexTiles Project's ultimate goal is to design tools enabling the design of a System-on-Chip that contains CPUs/GPPs, DSPs and FPGA logic. This chip is not an ordinary SoC: it is a 3D chip, and these slides explain why a 3D concept is required.
Understanding Inductive Bias in Machine Learning (SUTEJAS)
This presentation explores the concept of inductive bias in machine learning. It explains how algorithms come with built-in assumptions and preferences that guide the learning process. You'll learn about the different types of inductive bias and how they can impact the performance and generalizability of machine learning models.
The presentation also covers the positive and negative aspects of inductive bias, along with strategies for mitigating potential drawbacks. We'll explore examples of how bias manifests in algorithms like neural networks and decision trees.
By understanding inductive bias, you can gain valuable insights into how machine learning models work and make informed decisions when building and deploying them.
The Internet of Things (IoT) is a revolutionary concept that connects everyday objects and devices to the internet, enabling them to communicate, collect, and exchange data. Imagine a world where your refrigerator notifies you when you’re running low on groceries, or streetlights adjust their brightness based on traffic patterns – that’s the power of IoT. In essence, IoT transforms ordinary objects into smart, interconnected devices, creating a network of endless possibilities.
Here is a blog on the role of electrical and electronics engineers in IoT. Let's dig in!
For more such content visit: https://nttftrg.com/
CW RADAR, FMCW RADAR, FMCW ALTIMETER, AND THEIR PARAMETERS (veerababupersonal22)
It covers CW radar, FMCW radar, range measurement, the IF amplifier and the FMCW altimeter. The CW radar operates using continuous-wave transmission, while the FMCW radar employs frequency-modulated continuous-wave technology. Range measurement is a crucial aspect of radar systems, providing information about the distance to a target. The IF amplifier plays a key role in signal processing, amplifying intermediate-frequency signals for further analysis. The FMCW altimeter utilizes frequency-modulated continuous-wave technology to accurately measure altitude above a reference point.
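As a worked illustration of FMCW range measurement (a standard textbook relation, not taken from these slides; the symbols B, T_s and f_b are ours): a linear frequency sweep of bandwidth B over sweep time T_s returns an echo whose beat frequency is proportional to range,

    f_b = \frac{2R}{c} \cdot \frac{B}{T_s} \qquad \Longrightarrow \qquad R = \frac{c \, f_b \, T_s}{2B}

For example, B = 100 MHz, T_s = 1 ms and a measured f_b of about 667 kHz give R of about 1 km; an FMCW altimeter applies the same relation with R read as altitude above the reference point.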
1. www.flextiles.eu
FlexTiles
Prof. Dr.-Ing. Michael Hübner
Ruhr-University Bochum (RUB)
FlexTiles Workshop at FPL 2014
FPGA-Based Emulation of the FlexTiles Platform
2.
Global View
[Block diagram: GPP nodes, a DSP and an eFPGA (FPGA fabric plus reconfiguration control) sit in tiles, attached through network interfaces (NI), and via accelerator interfaces (AI) for the DSP and eFPGA, to the Network-on-Chip (NoC); a DDR node and I/O are also on the NoC.]
3.
Development Platform
Software Emulator
Based on the Open Virtual Platform
Enables software prototyping
Hardware Emulator
FlexTiles Development Platform
Based on two Virtex-6 FPGAs
Hardware design by TU/e
Enables HW/SW prototyping
4.
Development Platform
System Components
GPP Node
MicroBlaze soft-core processors
Network on Chip
Nodes connected via Network Interfaces
Network Interface
Device Transaction Level (DTL) protocol
Accelerator Interface
Newly developed within this project
Accelerators
Micro-programmed
Data-flow
[Diagram: a GPP node and an accelerator (behind its AI) each attach through an NI to the NoC.]
5.
Network on Chip
Network on Chip: AElite NoC
Network Interface
Main task:
Packs outgoing data into packets for the NoC
Unpacks incoming data from NoC packets
Device Transaction Level (DTL)
Master / Slave
Data flow:
Load / Store
Stream
[Diagram: GPP node, AI, accelerator, NIs and NoC, as on the previous slide.]
6.
Accelerator Interface
Main Tasks
Control accelerators
Provide a unique interface
Components
Accelerator control
Protocol parser
[Diagram: as before, with the accelerator control (AC) block shown inside the AI.]
7.
AI : DTL-To-AI
Device Transaction Level (DTL)
Command (dtl_cmd)
Write (dtl_wr)
Read (dtl_rd)
DTL-To-AI Structure
[Diagram: as before, with the DTL2AI converter and the AC block inside the AI.]
8.
AI : Accelerator Control
Accelerator Control Structure
[Diagram: accelerator control structure; the DTL2AI converter feeds the AC block, whose in/out ports connect to the accelerator.]
9.
Accelerators
Accelerator Types
Micro-programmed: FlexTiles DSP
Data-flow: Accelerators on eFPGA
Lite Accelerator
Example/Test Application
Implements
2 register banks
3 FIFOs
Adding two sequential values
[Diagram: GPP node and accelerator (via AI) on the NoC, as before.]
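A hedged software reference model in C of what the lite accelerator computes (pairwise summation is our reading of "adding two sequential values"; the real block is hardware built from the register banks and FIFOs listed above):

    /* Reference model: consume the input stream pairwise and emit the
     * sum of each two sequential values. */
    int lite_accelerator(const int *in, int n, int *out)
    {
        int written = 0;
        for (int i = 0; i + 1 < n; i += 2)
            out[written++] = in[i] + in[i + 1];
        return written;  /* number of sums produced */
    }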
10.
Software Functions
Accelerator Software Functions
send_rqst(int acc_number, int size, int ch_id, int addr, char type)
send_data(int acc_number, int size, int ch_id, int addr, int* array)
read_data(int acc_number, int size, int addr, int* array)
write_register(int acc_number, int ch_id, int addr, int value)
read_register(int acc_number, char regis)
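A hedged usage sketch in C of the functions above (prototypes as listed on the slide; the return types, constants and call order below are illustrative assumptions, not the project's documented sequence):

    /* Prototypes as listed above; return types are not given on the
     * slide, so void/int here are assumptions. */
    void send_rqst(int acc_number, int size, int ch_id, int addr, char type);
    void send_data(int acc_number, int size, int ch_id, int addr, int *array);
    void read_data(int acc_number, int size, int addr, int *array);
    void write_register(int acc_number, int ch_id, int addr, int value);
    int  read_register(int acc_number, char regis);

    void run_accelerator_example(void)
    {
        int in[64], out[32];

        write_register(1, 0, 0x0, 64);   /* hypothetical: set block size */
        send_data(1, 64, 0, 0x0, in);    /* stream 64 words on channel 0 */
        send_rqst(1, 32, 1, 0x0, 'r');   /* hypothetical read request on
                                            the output channel           */
        read_data(1, 32, 0x0, out);      /* fetch the results            */
    }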
[Diagram: GPP node and accelerator (via AI) on the NoC, as before.]
11.
Development Platform Configuration
Xilinx Platform Studio project
TU/e platform configured via an XML file
Xilinx Microprocessor Project file (XMP)
Platform configuration:
User Constraint File (UCF)
Microprocessor Hardware Specification (MHS)
Microprocessor Software Specification (MSS)
12.
Adding Accelerators
Accelerator Integration
Peripheral Core (PCORE)
Add accelerator files to the pcore directory
Add accelerator entry to the MHS
Configuration files:
Black Box Definition file (BBD)
Peripheral Analyze Order file (PAO)
Microprocessor Peripheral Definition (MPD)
13.
Implementation
XPS Command Line Mode
xps -nw -scr system.tcl system.xmp
[Diagram: simple design with a GPP, an AI plus accelerator, and a debug monitor, all connected via the NoC.]
14.
Implementation Results
Results
Standalone platform
Occupied Slices: 5%
IOB: 1%
RAMB36E1: 13%
RAMB18E1: 0%
BUFG: 9%
Platform with lite accelerator
Occupied Slices: 7%
IOB: 1%
RAMB36E1: 13%
RAMB18E1: 1%
BUFG: 12%
[Diagrams: the standalone platform (GPP node and monitor node on the NoC), and the platform with the lite accelerator attached via AI and NI.]
15.
Implementation Results
Results
Platform with DSP
Occupied Slices: 62%
IOB: 1%
RAMB36E1: 15%
RAMB18E1: 1%
BUFG: 9%
[Diagram: the platform with the DSP attached via AI and NI, plus a monitor node, on the NoC.]
16.
Challenges
Platform with DSP
Unroutable signals (ratsnests), colored by block:
Monitor: yellow
GPP: cyan
NoC: purple
AI: green
DSP: red
17.
Summary
Emulator Goal
Run a demo application such as optical flow
[Diagram: the emulator spans two FPGAs, each carrying several GPP nodes and accelerators (attached via AI and NI) on its own NoC, with the two NoCs linked over an AURORA interface.]
18.
Many Thanks for Your Attention!
www.flextiles.eu
Benedikt.Janssen@rub.de