The document describes the BondMachine project, which aims to build a software ecosystem for dynamically generating computer architectures composed of many small computing cores that can have different instruction sets and interconnect in flexible ways. The core building block is a "connecting processor" with registers, memory, and a set of operations. The goal is to mold architectures specifically to computational problems rather than using static designs.
Slides of the talk at: "Joint ICTP-IAEA School on FPGA-based SoC and its
Applications to Nuclear and Scientific Instrumentation". 14 November - 02 December 2022 ICTP, Trieste, Italy
Course: "Introductory course to HLS FPGA programming"Mirko Mariotti
Slides of the course: "Introductory course to HLS FPGA programming", Nov 27 – 30, 2023. ICSC National research center on HPC, big data and Quantum Computing
A Primer on FPGAs - Field Programmable Gate ArraysTaylor Riggan
A focus on the use of FPGAs by cloud service providers. Includes Microsoft Azure Catapult, Google Tensor Processors, and Amazon EC2 F1 instances. Also includes background info on how to get started with FPGAs
Slides of the talk at: "Joint ICTP-IAEA School on FPGA-based SoC and its
Applications to Nuclear and Scientific Instrumentation". 14 November - 02 December 2022 ICTP, Trieste, Italy
Course: "Introductory course to HLS FPGA programming"Mirko Mariotti
Slides of the course: "Introductory course to HLS FPGA programming", Nov 27 – 30, 2023. ICSC National research center on HPC, big data and Quantum Computing
A Primer on FPGAs - Field Programmable Gate ArraysTaylor Riggan
A focus on the use of FPGAs by cloud service providers. Includes Microsoft Azure Catapult, Google Tensor Processors, and Amazon EC2 F1 instances. Also includes background info on how to get started with FPGAs
This presentation contains the basics of FPGA design, what are HDL [hardware description languages], how VHDL design works on FPGA, what are the high tech applications, FPGA R & D opportunities, latest FPGA tools, resources and the National Activities on FPGA happened at Nepal.
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Databricks
"Project Hydrogen is a major Apache Spark initiative to bring state-of-the-art AI and Big Data solutions together. It contains three major projects: 1) barrier execution mode 2) optimized data exchange and 3) accelerator-aware scheduling. A basic implementation of barrier execution mode was merged into Apache Spark 2.4.0, and the community is working on the latter two. In this talk, we will present progress updates to Project Hydrogen and discuss the next steps.
First, we will review the barrier execution mode implementation from Spark 2.4.0. It enables developers to embed distributed training jobs properly on a Spark cluster. We will demonstrate distributed AI integrations built on top it, e.g., Horovod and Distributed TensorFlow. We will also discuss the technical challenges to implement those integrations and future work. Second, we will outline on-going work for optimized data exchange. Its target scenario is distributed model inference. We will present how we do performance testing/profiling, where the bottlenecks are, and how to improve the overall throughput on Spark. If time allows, we might also give updates on accelerator-aware scheduling.
"
Checkpointing the Un-checkpointable: MANA and the Split-Process Approachinside-BigData.com
In this deck from the MVAPICH User Group, Gene Cooperman from Northeastern University presents: Checkpointing the Un-checkpointable: MANA and the Split-Process Approach.
"Checkpointing is the ability to save the state of a running process to stable storage, and later restarting that process from the point at which it was checkpointed. Transparent checkpointing (also known as system-level checkpointing) refers to the ability to checkpoint a (possibly MPI-parallel or distributed) application, without modifying the binaries of that target application. Traditional wisdom has assumed that the transparent checkpointing approach has some natural restrictions. Examples of long-held restrictions are: (i) the need for a separate network-aware checkpoint-restart module for each network that will be targeted (e.g., one for TCP, one for InfiniBand, one for Intel Omni-Path, etc.); (ii) the impossibility of transparently checkpointing a CUDA-based GPU application that uses NVIDIA UVM (UVM is "unified virtual memory", which allows the host CPU and the GPU device to each access the same virtual address space at the same time.); and (iii) the impossibility of transparently checkpointing an MPI application that was compiled for one MPI library implementation (e.g., for MPICH or for Open MPI), and then restarting under an MPI implementation with targeted optimizations (e.g., MVAPICH2-X or MVAPICH2-EA). This talk breaks free from the restrictions described above, and presents an efficient, new software architecture: split processes. The "MANA for MPI" software demonstrates this split-process architecture. The MPI application code resides in "upper-half memory", and the MPI/network libraries reside in "lower-half memory". The tight coupling of upper and lower half ensures low runtime overhead. And yet, when restarting from a checkpoint, "MANA for MPI" allows one to choose to replace the original lower half with a different MPI library implementation. This different MPI implementation may offer such specialized features as enhanced intra- and inter-node point-to-point performance and enhanced performance of collective communication (e.g., with MVAPICH2-X); or perhaps better energy awareness (e.g., with MVAPICH2-EA). Further, the new lower half MPI may be optimized to run on different hardware, including a different network interconnect, a different number of CPU cores, a different configuration of ranks-per-node, etc. This makes cross-cluster migration both efficient and practical. This talk represents joint work with Rohan Garg and Gregory Price."
Watch the video: https://wp.me/p3RLHQ-kMn
Learn more: http://mug.mvapich.cse.ohio-state.edu/program/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Cygnus - World First Multi-Hybrid Accelerated Cluster with GPU and FPGA CouplingCarlos Reaño González
Paper presented at the 2nd International Workshop on Deployment and Use of Accelerators (DUAC). Co-located with the 51st International Conference on Parallel Processing (ICPP). August 29, 2021 (virtual event). More information at: https://duac2022.wordpress.com/
FPGAs: A considerable surge in data traffic is recognized every single day as now many technologies have evolved, leading to more and more data production.
To know more about FPGAs for Data Center Acceleration, see https://www.logic-fruit.com/blog/fpga/fpgas-for-data-center-acceleration/
About Logic Fruit Technologies
Logic Fruit Technologies is a product engineering R&D & consulting services provider for embedded systems and application development. We provide end-to-end solutions from the conception of the idea and design to the finished product. We have been servicing customers globally for over a decade.
The company has specific experience in various fields, such as
-FPGA Design & hardware design
-RTL IP Design
-A variety of digital protocols
-Communication buses as1G, 10G
-Ethernet
-PCIe
-DIGRF
-STM16/64
-HDMI.
Logic Fruit Technologies is also an expert in developing,
-software-defined radio (SDR) IPs
-Encryption
-Signal generation
-Data analysis, and
-Multiple Image Processing
Techniques.
Recently Logic Fruit technologies are also exploring FPGA acceleration on data centers for real-time data processing.
**Our Social Media Channels**
Facebook: https://www.facebook.com/LogicFruit/
Twitter: https://twitter.com/logicfruit
LinkedIn: https://www.linkedin.com/company/logi…
Website: https://www.logic-fruit.com/
#LFT #LogicFruitTechnologies #LogicFruit
Interested to view more Slide Shares, Click on the below links,
https://www.slideshare.net/LogicFruit/a-designers-practical-guide-to-arinc-429-standard-3pptx
https://www.slideshare.net/LogicFruit/a-swift-introduction-to-milstd
https://www.slideshare.net/LogicFruit/arinc-the-ultimate-guide-to-modern-avionics-protocol/LogicFruit/arinc-the-ultimate-guide-to-modern-avionics-protocol
https://www.slideshare.net/LogicFruit/arinc-629-digital-data-bus-specifications/LogicFruit/arinc-629-digital-data-bus-specifications
https://www.slideshare.net/LogicFruit/afdx
https://www.slideshare.net/LogicFruit/end-system-design-parameters-of-the-arinc-664-part-7
https://www.slideshare.net/LogicFruit/compute-express-link-cxl-everything-you-ought-to-know
https://www.logic-fruit.com/blog/fpga/what-is-fpga/
https://www.slideshare.net/LogicFruit/cxl-vs-pcie-gen-5-the-brief-comparison
https://www.slideshare.net/LogicFruit/fpga-technology-development-and-market-trends-in-the-new-decade
https://www.slideshare.net/LogicFruit/fpga-design-an-ultimate-guide-for-fpga-enthusiasts
https://www.slideshare.net/LogicFruit/fpga-vs-asic-design-comparison
https://www.slideshare.net/LogicFruit/afdx-a-timedeterministic-application-of-arinc-664-part-7
https://www.slideshare.net/LogicFruit/fpgas-expansion-in-adas-autonomous-driving
Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights covers the upcoming OpenACC Summit, a complete schedule of upcoming events, using OpenACC to optimize structural analysis, new resources and more!
With the rise of fog and edge-computing as the basic paradigms for future communication standards such as 6G, new processing requirements are established. On the other hand, new security algorithms appear with the scaling of quantum technology, increasing the complexity of the cryptography applications for IoT devices with a tight standardization timeline. Finally, the integration of satellites as nodes for communication networks includes fault-tolerance and error-correction codes as design parameters.
FPGA-based soft-processors are supported by industry and space agencies as promising candidates to overcome all these challenges, due to their flexibility and power consumption compared to GPUs or multithreading CPUs with co-processors. To optimize these architectures to a wide range of scenarios, common methods, and arithmetic functions need to be integrated into the ISA. This talk will show some examples of the RISC-V EL2 core for both classical and post-quantum cryptography and error correction codes, reducing the latency of standardized solutions at a cost of a small cross-section increase keeping the behavior under radiation effects similar to the original core.
Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights covers a complete schedule of upcoming events, using OpenACC for a biophysics problem, HPC Summit Digital, overview of the SDSC GPU Hackathon, OmpSs-2 programming model, new resources and more!
Implementation of Block Chain Technology In Java PlatformPrudhviRamMaddina
As of now the emerging trend in technology is never complete without having a secure platform for its applications and Block-Chain technology promises this cryptography features by employing secured hashing algorithms. This BlockChain technology uses distributed technology unlike the centralized databases in typical internet model which made it theoretically 51% unhackable in near future. Who knows in future Block-Chain Technology may replace the whole internet model. Don't doubt it for a couple of reasons. Firstly it has potential features which are more than enough to replace the current internet model and secondly a huge number of companies are investing in this Block-Chain Platform. In this paper, we are going to integrate the Oracle database to implement Block Chain Technology for basic student database in java platform and consequently implementing for any relational database. Further with these works, we are willing to monitor big data such as stock market transactions etc.
This presentation contains the basics of FPGA design, what are HDL [hardware description languages], how VHDL design works on FPGA, what are the high tech applications, FPGA R & D opportunities, latest FPGA tools, resources and the National Activities on FPGA happened at Nepal.
Updates from Project Hydrogen: Unifying State-of-the-Art AI and Big Data in A...Databricks
"Project Hydrogen is a major Apache Spark initiative to bring state-of-the-art AI and Big Data solutions together. It contains three major projects: 1) barrier execution mode 2) optimized data exchange and 3) accelerator-aware scheduling. A basic implementation of barrier execution mode was merged into Apache Spark 2.4.0, and the community is working on the latter two. In this talk, we will present progress updates to Project Hydrogen and discuss the next steps.
First, we will review the barrier execution mode implementation from Spark 2.4.0. It enables developers to embed distributed training jobs properly on a Spark cluster. We will demonstrate distributed AI integrations built on top it, e.g., Horovod and Distributed TensorFlow. We will also discuss the technical challenges to implement those integrations and future work. Second, we will outline on-going work for optimized data exchange. Its target scenario is distributed model inference. We will present how we do performance testing/profiling, where the bottlenecks are, and how to improve the overall throughput on Spark. If time allows, we might also give updates on accelerator-aware scheduling.
"
Checkpointing the Un-checkpointable: MANA and the Split-Process Approachinside-BigData.com
In this deck from the MVAPICH User Group, Gene Cooperman from Northeastern University presents: Checkpointing the Un-checkpointable: MANA and the Split-Process Approach.
"Checkpointing is the ability to save the state of a running process to stable storage, and later restarting that process from the point at which it was checkpointed. Transparent checkpointing (also known as system-level checkpointing) refers to the ability to checkpoint a (possibly MPI-parallel or distributed) application, without modifying the binaries of that target application. Traditional wisdom has assumed that the transparent checkpointing approach has some natural restrictions. Examples of long-held restrictions are: (i) the need for a separate network-aware checkpoint-restart module for each network that will be targeted (e.g., one for TCP, one for InfiniBand, one for Intel Omni-Path, etc.); (ii) the impossibility of transparently checkpointing a CUDA-based GPU application that uses NVIDIA UVM (UVM is "unified virtual memory", which allows the host CPU and the GPU device to each access the same virtual address space at the same time.); and (iii) the impossibility of transparently checkpointing an MPI application that was compiled for one MPI library implementation (e.g., for MPICH or for Open MPI), and then restarting under an MPI implementation with targeted optimizations (e.g., MVAPICH2-X or MVAPICH2-EA). This talk breaks free from the restrictions described above, and presents an efficient, new software architecture: split processes. The "MANA for MPI" software demonstrates this split-process architecture. The MPI application code resides in "upper-half memory", and the MPI/network libraries reside in "lower-half memory". The tight coupling of upper and lower half ensures low runtime overhead. And yet, when restarting from a checkpoint, "MANA for MPI" allows one to choose to replace the original lower half with a different MPI library implementation. This different MPI implementation may offer such specialized features as enhanced intra- and inter-node point-to-point performance and enhanced performance of collective communication (e.g., with MVAPICH2-X); or perhaps better energy awareness (e.g., with MVAPICH2-EA). Further, the new lower half MPI may be optimized to run on different hardware, including a different network interconnect, a different number of CPU cores, a different configuration of ranks-per-node, etc. This makes cross-cluster migration both efficient and practical. This talk represents joint work with Rohan Garg and Gregory Price."
Watch the video: https://wp.me/p3RLHQ-kMn
Learn more: http://mug.mvapich.cse.ohio-state.edu/program/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
Cygnus - World First Multi-Hybrid Accelerated Cluster with GPU and FPGA CouplingCarlos Reaño González
Paper presented at the 2nd International Workshop on Deployment and Use of Accelerators (DUAC). Co-located with the 51st International Conference on Parallel Processing (ICPP). August 29, 2021 (virtual event). More information at: https://duac2022.wordpress.com/
FPGAs: A considerable surge in data traffic is recognized every single day as now many technologies have evolved, leading to more and more data production.
To know more about FPGAs for Data Center Acceleration, see https://www.logic-fruit.com/blog/fpga/fpgas-for-data-center-acceleration/
About Logic Fruit Technologies
Logic Fruit Technologies is a product engineering R&D & consulting services provider for embedded systems and application development. We provide end-to-end solutions from the conception of the idea and design to the finished product. We have been servicing customers globally for over a decade.
The company has specific experience in various fields, such as
-FPGA Design & hardware design
-RTL IP Design
-A variety of digital protocols
-Communication buses as1G, 10G
-Ethernet
-PCIe
-DIGRF
-STM16/64
-HDMI.
Logic Fruit Technologies is also an expert in developing,
-software-defined radio (SDR) IPs
-Encryption
-Signal generation
-Data analysis, and
-Multiple Image Processing
Techniques.
Recently Logic Fruit technologies are also exploring FPGA acceleration on data centers for real-time data processing.
**Our Social Media Channels**
Facebook: https://www.facebook.com/LogicFruit/
Twitter: https://twitter.com/logicfruit
LinkedIn: https://www.linkedin.com/company/logi…
Website: https://www.logic-fruit.com/
#LFT #LogicFruitTechnologies #LogicFruit
Interested to view more Slide Shares, Click on the below links,
https://www.slideshare.net/LogicFruit/a-designers-practical-guide-to-arinc-429-standard-3pptx
https://www.slideshare.net/LogicFruit/a-swift-introduction-to-milstd
https://www.slideshare.net/LogicFruit/arinc-the-ultimate-guide-to-modern-avionics-protocol/LogicFruit/arinc-the-ultimate-guide-to-modern-avionics-protocol
https://www.slideshare.net/LogicFruit/arinc-629-digital-data-bus-specifications/LogicFruit/arinc-629-digital-data-bus-specifications
https://www.slideshare.net/LogicFruit/afdx
https://www.slideshare.net/LogicFruit/end-system-design-parameters-of-the-arinc-664-part-7
https://www.slideshare.net/LogicFruit/compute-express-link-cxl-everything-you-ought-to-know
https://www.logic-fruit.com/blog/fpga/what-is-fpga/
https://www.slideshare.net/LogicFruit/cxl-vs-pcie-gen-5-the-brief-comparison
https://www.slideshare.net/LogicFruit/fpga-technology-development-and-market-trends-in-the-new-decade
https://www.slideshare.net/LogicFruit/fpga-design-an-ultimate-guide-for-fpga-enthusiasts
https://www.slideshare.net/LogicFruit/fpga-vs-asic-design-comparison
https://www.slideshare.net/LogicFruit/afdx-a-timedeterministic-application-of-arinc-664-part-7
https://www.slideshare.net/LogicFruit/fpgas-expansion-in-adas-autonomous-driving
Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights covers the upcoming OpenACC Summit, a complete schedule of upcoming events, using OpenACC to optimize structural analysis, new resources and more!
With the rise of fog and edge-computing as the basic paradigms for future communication standards such as 6G, new processing requirements are established. On the other hand, new security algorithms appear with the scaling of quantum technology, increasing the complexity of the cryptography applications for IoT devices with a tight standardization timeline. Finally, the integration of satellites as nodes for communication networks includes fault-tolerance and error-correction codes as design parameters.
FPGA-based soft-processors are supported by industry and space agencies as promising candidates to overcome all these challenges, due to their flexibility and power consumption compared to GPUs or multithreading CPUs with co-processors. To optimize these architectures to a wide range of scenarios, common methods, and arithmetic functions need to be integrated into the ISA. This talk will show some examples of the RISC-V EL2 core for both classical and post-quantum cryptography and error correction codes, reducing the latency of standardized solutions at a cost of a small cross-section increase keeping the behavior under radiation effects similar to the original core.
Stay up-to-date on the latest news, events and resources for the OpenACC community. This month’s highlights covers a complete schedule of upcoming events, using OpenACC for a biophysics problem, HPC Summit Digital, overview of the SDSC GPU Hackathon, OmpSs-2 programming model, new resources and more!
Implementation of Block Chain Technology In Java PlatformPrudhviRamMaddina
As of now the emerging trend in technology is never complete without having a secure platform for its applications and Block-Chain technology promises this cryptography features by employing secured hashing algorithms. This BlockChain technology uses distributed technology unlike the centralized databases in typical internet model which made it theoretically 51% unhackable in near future. Who knows in future Block-Chain Technology may replace the whole internet model. Don't doubt it for a couple of reasons. Firstly it has potential features which are more than enough to replace the current internet model and secondly a huge number of companies are investing in this Block-Chain Platform. In this paper, we are going to integrate the Oracle database to implement Block Chain Technology for basic student database in java platform and consequently implementing for any relational database. Further with these works, we are willing to monitor big data such as stock market transactions etc.
BondMachine slides for the InnovateFPGA 2018 Grand FinalMirko Mariotti
The BondMachine Project reached the InnovateFPGA 2018 Grand Final. These are the slides of the presentation of the project given at the Intel Campus in San Jose on the 15th of August 2018
Harvesting dispersed computational resources with OpenstackMirko Mariotti
Harvesting dispersed computational resources is nowadays an important and strategic topic especially in an environment, like the computational science one, where computing needs constantly increase. On the other hand managing dispersed resources might not be neither an easy task not costly effective. We successfully explored the usage of OpenStack middleware to achieve this objective, aiming not only at harvesting resource but also at providing a modern paradigm of computing and data usage access. The deployment of a multi-sites Cloud implies the management of multiple computing services with the goal of sharing resources between more sites, allowing local data centers to scale out with external assets. The cited scenario immediately points out portability, interoperability and compatibility issues, but on the other side this opens really interesting scenarios and use cases especially in load balancing, disaster recovery and advanced features such as cross-site networks, cross-site storage systems, cross-site scheduling and placement of remote VMs. Moreover, a cloud infrastructure allows the exploitation of Platform as a Service, and software as a server solutions which in the end it is what really matter for the science. The main goal of the present work is to illustrate a real example on how to build a geographically distributed cloud to share and manage computing and storage resources, owned by heterogeneous cooperating entities. The sites are connected via a SDN technology through encrypted point-to-point channels being the “backbone” for the overlay networks. Specifically openVPN and IPsec have been used, depending on specific networking constraints. Over the point-to-point mesh, OSI L2 Ethernet frames of both the OpenStack management and projects VLAN networks have been encapsulated using the VXLAN protocol. The networking structure has been deployed for the specific purpose of creating a distributed overlay L2 Ethernet to be used by OpenStack. Finally, to fine tune the network setup and avoid cross site unnecessary traffic, different sites have been logically separated using different availability zones, with VMs on every zone having the capability of independently accessing the Internet through the site where they are physically running.
The BondMachine: a fully reconfigurable computing ecosystemMirko Mariotti
The aim of the BondMachine (BM) project is to implement a computing system to enable a real and full exploitation of the underlying hardware. This is a key to the success in the era of hybrid computing. In order to achieve this objective the BM has been designed to create a heterogeneous and flexible architecture on top of FPGAs. Moreover the overall vision is based on a reduction of the number of hardware/software layers which by product guarantees a simpler software development. As such, the BM project has been thought as a complete reconfigurable computing ecosystem that, starting from a high-level description, creates both the hardware and the software that runs on it.
The two architectural pillars are computing elements (processors) and non-computing elements (for example memories, channels, barriers). The latter are meant to be shared among processors. Finally, thanks to a custom network protocol, many BondMachines can be interconnected together, therefore building heterogeneous multi-core systems or even clusters of multi-cores.
The flexibility of the BM makes possible the implementation of any computing system ranging from networks of small agents, like IoT (Internet of Things), to high-performance devices for ML (Machine Learning) or real time data analysis, and even systems that mix all this different characteristics together. The BM can interact with standard Linux workstations both as a special purpose hardware accelerator or as part of a computer/BM hybrid clusters.
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'Mirko Mariotti
Slides presentate al Workshop CCR 2016 dell' Istituto Nazionale di Fisica Nucleare, riguardanti la progettazione e lo sviluppo del sistema di gestione dei servizi erogati dal Dipartimento di Fisica e Geologia dell'Universita' di Perugia e dalla locale sezione INFN.
Presentazione delle attivita' svolte utilizzando il sistema di virtualizzazione Xen nel dipartimento di fisica dell' universita' di Perugia fino al 2006.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Saudi Arabia stands as a titan in the global energy landscape, renowned for its abundant oil and gas resources. It's the largest exporter of petroleum and holds some of the world's most significant reserves. Let's delve into the top 10 oil and gas projects shaping Saudi Arabia's energy future in 2024.
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Final project report on grocery store management system..pdfKamal Acharya
In today’s fast-changing business environment, it’s extremely important to be able to respond to client needs in the most effective and timely manner. If your customers wish to see your business online and have instant access to your products or services.
Online Grocery Store is an e-commerce website, which retails various grocery products. This project allows viewing various products available enables registered users to purchase desired products instantly using Paytm, UPI payment processor (Instant Pay) and also can place order by using Cash on Delivery (Pay Later) option. This project provides an easy access to Administrators and Managers to view orders placed using Pay Later and Instant Pay options.
In order to develop an e-commerce website, a number of Technologies must be studied and understood. These include multi-tiered architecture, server and client-side scripting techniques, implementation technologies, programming language (such as PHP, HTML, CSS, JavaScript) and MySQL relational databases. This is a project with the objective to develop a basic website where a consumer is provided with a shopping cart website and also to know about the technologies used to develop such a website.
This document will discuss each of the underlying technologies to create and implement an e- commerce website.
Student information management system project report ii.pdfKamal Acharya
Our project explains about the student management. This project mainly explains the various actions related to student details. This project shows some ease in adding, editing and deleting the student details. It also provides a less time consuming process for viewing, adding, editing and deleting the marks of the students.
Sachpazis:Terzaghi Bearing Capacity Estimation in simple terms with Calculati...Dr.Costas Sachpazis
Terzaghi's soil bearing capacity theory, developed by Karl Terzaghi, is a fundamental principle in geotechnical engineering used to determine the bearing capacity of shallow foundations. This theory provides a method to calculate the ultimate bearing capacity of soil, which is the maximum load per unit area that the soil can support without undergoing shear failure. The Calculation HTML Code included.
CFD Simulation of By-pass Flow in a HRSG module by R&R Consult.pptxR&R Consult
CFD analysis is incredibly effective at solving mysteries and improving the performance of complex systems!
Here's a great example: At a large natural gas-fired power plant, where they use waste heat to generate steam and energy, they were puzzled that their boiler wasn't producing as much steam as expected.
R&R and Tetra Engineering Group Inc. were asked to solve the issue with reduced steam production.
An inspection had shown that a significant amount of hot flue gas was bypassing the boiler tubes, where the heat was supposed to be transferred.
R&R Consult conducted a CFD analysis, which revealed that 6.3% of the flue gas was bypassing the boiler tubes without transferring heat. The analysis also showed that the flue gas was instead being directed along the sides of the boiler and between the modules that were supposed to capture the heat. This was the cause of the reduced performance.
Based on our results, Tetra Engineering installed covering plates to reduce the bypass flow. This improved the boiler's performance and increased electricity production.
It is always satisfying when we can help solve complex challenges like this. Do your systems also need a check-up or optimization? Give us a call!
Work done in cooperation with James Malloy and David Moelling from Tetra Engineering.
More examples of our work https://www.r-r-consult.dk/en/cases-en/
Runway Orientation Based on the Wind Rose Diagram.pptx
INFN SOSC 2022 Talk
1. Machine Learning on FPGA: The BondMachine Project
Mirko Mariotti 1,2 Giulio Bianchini 1 Loriano Storchi 3,2 Giacomo Surace 2
Daniele Spiga 2
1Dipartimento di Fisica e Geologia, Universitá degli Studi di Perugia
2INFN sezione di Perugia
3Dipartimento di Farmacia, Universitá degli Studi G. D’Annunzio
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 1
2. Outline
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 2
3. Introduction
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 3
4. Current challenges in computing
■ Von Neumann Bottleneck:
New computational problems show that current architectural models has to be
improved or changed to address future payloads.
■ Energy Efficient computation:
Not wasting "resources" (silicon, time, energy, instructions).
Using the right resource for the specific case
■ Edge/Fog/Cloud Computing:
Making the computation where it make sense
Avoiding the transfer of unnecessary data
Creating consistent interfaces for distributed systems
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 4
5. Current challenges in computing
■ Von Neumann Bottleneck:
New computational problems show that current architectural models has to be
improved or changed to address future payloads.
■ Energy Efficient computation:
Not wasting "resources" (silicon, time, energy, instructions).
Using the right resource for the specific case
■ Edge/Fog/Cloud Computing:
Making the computation where it make sense
Avoiding the transfer of unnecessary data
Creating consistent interfaces for distributed systems
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 4
6. Current challenges in computing
■ Von Neumann Bottleneck:
New computational problems show that current architectural models has to be
improved or changed to address future payloads.
■ Energy Efficient computation:
Not wasting "resources" (silicon, time, energy, instructions).
Using the right resource for the specific case
■ Edge/Fog/Cloud Computing:
Making the computation where it make sense
Avoiding the transfer of unnecessary data
Creating consistent interfaces for distributed systems
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 4
7. FPGA
A field programmable gate array (FPGA) is an integrated
circuit whose logic is re-programmable.
■ Parallel computing
■ Highly specialized
■ Energy efficient
■ Array of programmable logic blocks
■ Logic blocks configurable
to perform complex functions
■ The configuration is specified
with the hardware description language
FIRMWARE
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 5
10. FPGA
Use in computing
The use of FPGA in computing is growing due several reasons:
■ can potentially deliver great performance via massive parallelism
■ can address payloads which are not performing well on uniprocessors (Neural
Networks, Deep Learning)
■ can handle efficiently non-standard data types
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 8
11. FPGA
Use in computing
The use of FPGA in computing is growing due several reasons:
■ can potentially deliver great performance via massive parallelism
■ can address payloads which are not performing well on uniprocessors (Neural
Networks, Deep Learning)
■ can handle efficiently non-standard data types
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 8
12. FPGA
Use in computing
The use of FPGA in computing is growing due several reasons:
■ can potentially deliver great performance via massive parallelism
■ can address payloads which are not performing well on uniprocessors (Neural
Networks, Deep Learning)
■ can handle efficiently non-standard data types
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 8
13. Integration of neural networks on FPGA
FPGAs are playing an increasingly important role in the industry sampling and data
processing.
Deep Learning
In the industrial field
■ Intelligent vision;
■ Financial services;
■ Scientific simulations;
■ Life science and medical data analysis;
In the scientific field
■ Real time deep learning in particle
physics;
■ Hardware trigger of LHC experiments;
■ And many others ...
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 9
14. CPU vs GPU vs FPGA: Instructions, memory and parallelism
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 10
15. FPGA Performance
■ FPGA Performance is predictable
■ There is no context switch, garbage collector or any background process
■ The bitstream will be executed the same number of clock cycles every time
■ The number of clock cycles needed can be computed easily
Slide credits: M.Barbone
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 11
16. Software point of view
Slide credits: M.Barbone
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 12
17. FPGA
Drawbacks
On the other hand the adoption on FPGA has several drawbacks:
■ Porting of legacy code is usually hard.
■ Interoperability with standard applications is problematic.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 13
18. FPGA
Drawbacks
On the other hand the adoption on FPGA has several drawbacks:
■ Porting of legacy code is usually hard.
■ Interoperability with standard applications is problematic.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 13
19. FPGA Programming workflow
HDL code
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 14
20. FPGA Programming workflow
HDL code
Netlist of elementary gates
Synthesis
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 14
21. FPGA Programming workflow
HDL code
Netlist of elementary gates
Synthesis
Mapped Logic Cells
Implementation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 14
22. FPGA Programming workflow
HDL code
Netlist of elementary gates
Synthesis
Mapped Logic Cells
Implementation
Bitstream
Bitstream creation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 14
23. CPU vs GPU vs FPGA (HDL): Coding difficulty
Slide credits: M.Barbone
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 15
24. HLS Worklow
Slide credits: M.Barbone
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 16
25. CPU vs GPU vs FPGA (HLS): Coding difficulty
Slide credits: M.Barbone
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 17
26. Firmware generation
Many projects have the goal of abstracting the firmware generation and use process.
High Level
Low Level
FIRMWARE
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 18
27. Connections
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 19
28. Latency and Throughput
■ Latency: time passed from inputs to outputs
■ Throughput: quantity of outputs in the unit of time
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 20
29. Latency in FPGA
FPGAs achieve lower latency than software:
■ latency below the microsecond
■ latency constant and predictable
CPUs are not suitable as thread overhead is order of 10 microseconds
GPUs are even worse since PCIe bus sys-call could require milliseconds
Examples:
■ Ultra low latency trading
■ CERN trigger system
Slide credits: M.Barbone
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 21
30. Occupancy
Is the amount of the various resources
used on and FPGA (LUTs, DSP, etc)
Problems:
■ The algorithms has to create a circuit
that can be contained in the available
resources
■ Tools to shrink or grow the occupancy
■ The need for a runtime, platforms
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 22
31. Cloud resources
■ Amazon EC2 F1 use FPGAs to enable delivery of custom hardware accelerations.
F1 instances are easy to program and come with everything you need to develop,
simulate, debug, and compile hardware acceleration code.
■ Microsoft’s Project Brainwave is a deep learning platform for real-time AI inference
in the cloud and on the edge. A soft Neural Processing Unit (NPU), based on a
high-performance field-programmable gate array (FPGA).
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 23
33. Firmware
Hardware Description Language High Level Synthesis
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 24
34. Firmware
Hardware Description Language High Level Synthesis BondMachine
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 24
35. The BondMachine
High level sources: Go, TensorFlow, NN, ...
Building a new kind of computer architecture (multi-core and heterogeneous both in
cores types and interconnections) which dynamically adapt to the specific
computational problem rather than be static.
BM architecture Layer
FPGA
Concurrency
and Specializa-
tion
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 25
36. The BondMachine
High level sources: Go, TensorFlow, NN, ...
Building a new kind of computer architecture (multi-core and heterogeneous both in
cores types and interconnections) which dynamically adapt to the specific
computational problem rather than be static.
BM architecture Layer
FPGA
Concurrency
and Specializa-
tion
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 25
37. The BondMachine
High level sources: Go, TensorFlow, NN, ...
Building a new kind of computer architecture (multi-core and heterogeneous both in
cores types and interconnections) which dynamically adapt to the specific
computational problem rather than be static.
BM architecture Layer
FPGA
Concurrency
and Specializa-
tion
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 25
38. The BondMachine
High level sources: Go, TensorFlow, NN, ...
Building a new kind of computer architecture (multi-core and heterogeneous both in
cores types and interconnections) which dynamically adapt to the specific
computational problem rather than be static.
BM architecture Layer
FPGA
Concurrency
and Specializa-
tion
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 25
39. The BondMachine
High level sources: Go, TensorFlow, NN, ...
Building a new kind of computer architecture (multi-core and heterogeneous both in
cores types and interconnections) which dynamically adapt to the specific
computational problem rather than be static.
BM architecture Layer
FPGA
Concurrency
and Specializa-
tion
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 25
40. The BondMachine project
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 26
41. Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation of computer
architectures that:
■ Are composed by many, possibly hundreds, computing cores.
■ Have very small cores and not necessarily of the same type (different ISA and ABI).
■ Have a not fixed way of interconnecting cores.
■ May have some elements shared among cores (for example channels and shared
memories).
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 27
42. Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation of computer
architectures that:
■ Are composed by many, possibly hundreds, computing cores.
■ Have very small cores and not necessarily of the same type (different ISA and ABI).
■ Have a not fixed way of interconnecting cores.
■ May have some elements shared among cores (for example channels and shared
memories).
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 27
43. Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation of computer
architectures that:
■ Are composed by many, possibly hundreds, computing cores.
■ Have very small cores and not necessarily of the same type (different ISA and ABI).
■ Have a not fixed way of interconnecting cores.
■ May have some elements shared among cores (for example channels and shared
memories).
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 27
44. Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation of computer
architectures that:
■ Are composed by many, possibly hundreds, computing cores.
■ Have very small cores and not necessarily of the same type (different ISA and ABI).
■ Have a not fixed way of interconnecting cores.
■ May have some elements shared among cores (for example channels and shared
memories).
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 27
45. Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation of computer
architectures that:
■ Are composed by many, possibly hundreds, computing cores.
■ Have very small cores and not necessarily of the same type (different ISA and ABI).
■ Have a not fixed way of interconnecting cores.
■ May have some elements shared among cores (for example channels and shared
memories).
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 27
47. Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor” (CP) and has:
■ Some general purpose registers of size Rsize.
■ Some I/O dedicated registers of size Rsize.
■ A set of implemented opcodes chosen among many available.
■ Dedicated ROM and RAM.
■ Three possible operating modes.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 29
48. Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor” (CP) and has:
■ Some general purpose registers of size Rsize.
■ Some I/O dedicated registers of size Rsize.
■ A set of implemented opcodes chosen among many available.
■ Dedicated ROM and RAM.
■ Three possible operating modes.
General purpose registers
2R registers: r0,r1,r2,r3 ... r2R
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 29
49. Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor” (CP) and has:
■ Some general purpose registers of size Rsize.
■ Some I/O dedicated registers of size Rsize.
■ A set of implemented opcodes chosen among many available.
■ Dedicated ROM and RAM.
■ Three possible operating modes.
I/O specialized registers
N input registers: i0,i1 ... iN
M output registers: o0,o1 ... oM
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 29
50. Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor” (CP) and has:
■ Some general purpose registers of size Rsize.
■ Some I/O dedicated registers of size Rsize.
■ A set of implemented opcodes chosen among many available.
■ Dedicated ROM and RAM.
■ Three possible operating modes.
Full set of possible opcodes
adc,add,addf,addi,and,chc,chw,cil,cilc,cir,cirn,clc,clr,cpy,cset,dec,div,divf,dpc,expf,hit
hlt,i2r,i2rw,incc,inc,j,jc,je,jgt0f,jlt,jlte,jr,jz,lfsr82,lfsr162r,m2r,mod,mulc,mult,multf
nand,nop,nor,not,or,r2m,r2o,r2owa,r2owaa,r2s,r2v,r2vri,ro2r,ro2rri,rsc,rset,sic,s2r,saj,sbc
sub,wrd,wwr,xnor,xor
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 29
51. Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor” (CP) and has:
■ Some general purpose registers of size Rsize.
■ Some I/O dedicated registers of size Rsize.
■ A set of implemented opcodes chosen among many available.
■ Dedicated ROM and RAM.
■ Three possible operating modes.
RAM and ROM
■ 2L RAM memory cells.
■ 2O ROM memory cells.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 29
52. Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor” (CP) and has:
■ Some general purpose registers of size Rsize.
■ Some I/O dedicated registers of size Rsize.
■ A set of implemented opcodes chosen among many available.
■ Dedicated ROM and RAM.
■ Three possible operating modes.
Operating modes
■ Full Harvard mode.
■ Full Von Neuman mode.
■ Hybrid mode.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 29
53. Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called “Shared Objects”
(SO).
Examples of their purposes are:
■ Data storage (Memories).
■ Message passing.
■ CP synchronization.
A single SO can be shared among different CPs. To use it CPs have special instructions
(opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the Shared Memory, the
Barrier and a Pseudo Random Numbers Generator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 30
54. Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called “Shared Objects”
(SO).
Examples of their purposes are:
■ Data storage (Memories).
■ Message passing.
■ CP synchronization.
A single SO can be shared among different CPs. To use it CPs have special instructions
(opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the Shared Memory, the
Barrier and a Pseudo Random Numbers Generator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 30
55. Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called “Shared Objects”
(SO).
Examples of their purposes are:
■ Data storage (Memories).
■ Message passing.
■ CP synchronization.
A single SO can be shared among different CPs. To use it CPs have special instructions
(opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the Shared Memory, the
Barrier and a Pseudo Random Numbers Generator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 30
56. Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called “Shared Objects”
(SO).
Examples of their purposes are:
■ Data storage (Memories).
■ Message passing.
■ CP synchronization.
A single SO can be shared among different CPs. To use it CPs have special instructions
(opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the Shared Memory, the
Barrier and a Pseudo Random Numbers Generator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 30
57. Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called “Shared Objects”
(SO).
Examples of their purposes are:
■ Data storage (Memories).
■ Message passing.
■ CP synchronization.
A single SO can be shared among different CPs. To use it CPs have special instructions
(opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the Shared Memory, the
Barrier and a Pseudo Random Numbers Generator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 30
58. Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called “Shared Objects”
(SO).
Examples of their purposes are:
■ Data storage (Memories).
■ Message passing.
■ CP synchronization.
A single SO can be shared among different CPs. To use it CPs have special instructions
(opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the Shared Memory, the
Barrier and a Pseudo Random Numbers Generator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 30
59. Handle BM computer architectures
The BM computer architecture is managed by a set of tools to:
■ build a specify architecture
■ modify a pre-existing architecture
■ simulate or emulate the behavior
■ generate the Hardware Description Language Code (HDL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the HDL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s HDL code
Simulation Framework
Simulates the be-
haviour, emulates a BM
on a standard Linux
workstation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 31
60. Handle BM computer architectures
The BM computer architecture is managed by a set of tools to:
■ build a specify architecture
■ modify a pre-existing architecture
■ simulate or emulate the behavior
■ generate the Hardware Description Language Code (HDL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the HDL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s HDL code
Simulation Framework
Simulates the be-
haviour, emulates a BM
on a standard Linux
workstation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 31
61. Handle BM computer architectures
The BM computer architecture is managed by a set of tools to:
■ build a specify architecture
■ modify a pre-existing architecture
■ simulate or emulate the behavior
■ generate the Hardware Description Language Code (HDL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the HDL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s HDL code
Simulation Framework
Simulates the be-
haviour, emulates a BM
on a standard Linux
workstation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 31
62. Handle BM computer architectures
The BM computer architecture is managed by a set of tools to:
■ build a specify architecture
■ modify a pre-existing architecture
■ simulate or emulate the behavior
■ generate the Hardware Description Language Code (HDL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the HDL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s HDL code
Simulation Framework
Simulates the be-
haviour, emulates a BM
on a standard Linux
workstation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 31
63. Toolchains
A set of toolchains allow the build and the direct deploy to a target device of
BondMachines
Bondgo Toolchain main targets
A file local.mk contains references to the source code as well all the build necessities
make bondmachine creates the JSON representation of the BM and assemble its code
make hdl creates the HDL files of the BM
make show displays a graphical representation of the BM
make simulate [simbatch] start a simulation [batch simulation]
make bitstream [design_bitstream] create the firwware [accelerator firmware]
make program flash the device into the destination target
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 32
64. Simulation
An important feature of the tools is the possibility of simulating BondMachine behavior.
An event input file describes how BondMachines elements has to change during the
simulation timespan and which one has to be be reported.
The simulator can produce results in the form of:
■ Activity log of the BM internal.
■ Graphical representation of the simulation.
■ Report file with quantitative data. Useful to construct metrics
Graphical simulation in action
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 33
65. Simulation
An important feature of the tools is the possibility of simulating BondMachine behavior.
An event input file describes how BondMachines elements has to change during the
simulation timespan and which one has to be be reported.
The simulator can produce results in the form of:
■ Activity log of the BM internal.
■ Graphical representation of the simulation.
■ Report file with quantitative data. Useful to construct metrics
Graphical simulation in action
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 33
66. Simulation
An important feature of the tools is the possibility of simulating BondMachine behavior.
An event input file describes how BondMachines elements has to change during the
simulation timespan and which one has to be be reported.
The simulator can produce results in the form of:
■ Activity log of the BM internal.
■ Graphical representation of the simulation.
■ Report file with quantitative data. Useful to construct metrics
Graphical simulation in action
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 33
67. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
68. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
69. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
70. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
71. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
72. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
73. Molding the BondMachine
As stated before BondMachines are not general purpose architectures, and to be
effective have to be shaped according the specific problem.
Several methods (apart from writing in assembly and building a BondMachine from
scratch) have been developed to do that:
■ bondgo: A new type of compiler that create not only the CPs assembly but also
the architecture itself.
■ basm: The BondMachine Assembler.
■ A set of API to create BondMachine to fit a specific computational problems.
■ An Evolutionary Computation framework to “grow” BondMachines according some
fitness function via simulation.
■ A set of tools to use BondMachine in Machine Learning.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 34
74. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
75. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
76. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
77. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
78. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
79. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
80. Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Basm
The BondMachine
assembler
Bondgo
The architecture
compiler
ML tools
Map computational
graphs to BM
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 35
81. Bondgo
The major innovation of the BondMachine Project is its compiler.
Bondgo is the name chosen for the compiler developed for the BondMachine.
The compiler source language is Go as the name suggest.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 36
82. Bondgo
This is the standard flow when building computer programs
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 37
83. Bondgo
This is the standard flow when building computer programs
high level language source
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 37
84. Bondgo
This is the standard flow when building computer programs
high level language source
assembly file
Compiling
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 37
85. Bondgo
This is the standard flow when building computer programs
high level language source
assembly file
Compiling
machine code
Assembling
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 37
86. Bondgo
Bondgo does something different from standard compilers ...
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
87. Bondgo
Bondgo does something different from standard compilers ...
high level GO source
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
88. Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
89. Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
90. Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
machine code
Assembling
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
91. Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
machine code
Assembling
Processor implementation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
92. Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
machine code
Assembling
Processor implementation
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 38
93. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
94. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
95. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
96. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
97. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
98. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
99. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
100. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
101. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
102. Bondgo workflow example
counter go source
package main
import (
"bondgo"
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 39
103. Bondgo
... bondgo may not only create the binaries, but also the CP architecture, and ...
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 40
104. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
105. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
high level GO source
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
106. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
107. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
108. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Asm and Binaries
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
109. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Asm and Binaries
Interconnections
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
110. Bondgo
... it can do even much more interesting things when compiling concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Asm and Binaries
BondMachine
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 41
111. Bondgo
A multi-core example
multi-core counter
package main
import (
"bondgo"
)
func pong () {
var in0 bondgo.Input
var out0 bondgo.Output
in0 = bondgo.Make(bondgo.Input , 3)
out0 = bondgo.Make(bondgo.Output , 5)
for {
bondgo.IOWrite(out0 , bondgo.IORead(in0)+1)
}
}
func main () {
var in0 bondgo.Input
var out0 bondgo.Output
in0 = bondgo.Make(bondgo.Input , 5)
out0 = bondgo.Make(bondgo.Output , 3)
device_0:
go pong ()
for {
bondgo.IOWrite(out0 , bondgo.IORead(in0))
}
}
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 42
113. Compiling Architectures
One of the most important result
The architecture creation is a part of the compilation process.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 44
114. Basm
The BondMachine assembler Basm is the compiler complementary tools.
The BondMachine "fluid" nature gives the assembler some unique features:
■ Support for template based assembly code
■ Combining and rewriting fragments of assembly code
■ Building hardware from assembly
■ Software/Hardware rearrange capabilities
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 45
115. Basm
The BondMachine assembler Basm is the compiler complementary tools.
The BondMachine "fluid" nature gives the assembler some unique features:
■ Support for template based assembly code
■ Combining and rewriting fragments of assembly code
■ Building hardware from assembly
■ Software/Hardware rearrange capabilities
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 45
116. Basm
The BondMachine assembler Basm is the compiler complementary tools.
The BondMachine "fluid" nature gives the assembler some unique features:
■ Support for template based assembly code
■ Combining and rewriting fragments of assembly code
■ Building hardware from assembly
■ Software/Hardware rearrange capabilities
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 45
117. Basm
The BondMachine assembler Basm is the compiler complementary tools.
The BondMachine "fluid" nature gives the assembler some unique features:
■ Support for template based assembly code
■ Combining and rewriting fragments of assembly code
■ Building hardware from assembly
■ Software/Hardware rearrange capabilities
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 45
118. Basm
The BondMachine assembler Basm is the compiler complementary tools.
The BondMachine "fluid" nature gives the assembler some unique features:
■ Support for template based assembly code
■ Combining and rewriting fragments of assembly code
■ Building hardware from assembly
■ Software/Hardware rearrange capabilities
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 45
119. Abstract Assembly
The Assembly language for the BM has been kept as independent as possible from the
particular CP.
Given a specific piece of assembly code Bondgo has the ability to compute the
“minimum CP” that can execute that code.
These are Building Blocks for complex BondMachines.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 46
120. Builders API
With these Building Blocks
Several libraries have been developed to map specific problems on BondMachines:
■ Symbond, to handle mathematical expression.
■ Boolbond, to map boolean expression.
■ Matrixwork, to perform matrices operations.
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 47
121. Builders API
With these Building Blocks
Several libraries have been developed to map specific problems on BondMachines:
■ Symbond, to handle mathematical expression.
■ Boolbond, to map boolean expression.
■ Matrixwork, to perform matrices operations.
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 47
122. Builders API
With these Building Blocks
Several libraries have been developed to map specific problems on BondMachines:
■ Symbond, to handle mathematical expression.
■ Boolbond, to map boolean expression.
■ Matrixwork, to perform matrices operations.
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 47
123. Builders API
With these Building Blocks
Several libraries have been developed to map specific problems on BondMachines:
■ Symbond, to handle mathematical expression.
■ Boolbond, to map boolean expression.
■ Matrixwork, to perform matrices operations.
more about these tools
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 47
124. BondMachine as accelerators
Talk with details about how the accelerator is build
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 48
126. Misc
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 50
127. BondMachine recap
■ The BondMachine is a software
ecosystem for the dynamical
generation (from several HL types of
origin) of computer architectures that
can be synthesized of FPGA and
■ used as standalone devices,
■ as clustered devices,
■ and as firmware for computing
accelerators.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 51
128. BondMachine recap
■ The BondMachine is a software
ecosystem for the dynamical
generation (from several HL types of
origin) of computer architectures that
can be synthesized of FPGA and
■ used as standalone devices,
■ as clustered devices,
■ and as firmware for computing
accelerators.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 51
129. BondMachine recap
■ The BondMachine is a software
ecosystem for the dynamical
generation (from several HL types of
origin) of computer architectures that
can be synthesized of FPGA and
■ used as standalone devices,
■ as clustered devices,
■ and as firmware for computing
accelerators.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 51
130. BondMachine recap
■ The BondMachine is a software
ecosystem for the dynamical
generation (from several HL types of
origin) of computer architectures that
can be synthesized of FPGA and
■ used as standalone devices,
■ as clustered devices,
■ and as firmware for computing
accelerators.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 51
131. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
132. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
133. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
134. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
135. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
136. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
137. Project timeline
■ CCR 2015 First ideas, 2016 Poster, 2017 Talk
■ InnovateFPGA 2018 Iron Award, Grand Final at
Intel Campus (CA) USA
■ Invited lectures at: "Advanced Workshop on
Modern FPGA Based Technology for Scientific
Computing", ICTP 2019
■ Invited lectures at: "NiPS Summer School 2019
– Architectures and Algorithms for
Energy-Efficient IoT and HPC Applications"
■ Golab 2018 talk and ISGC 2019 PoS
■ Article published on Parallel Computing,
Elsevier 2022
■ PON PHD program
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 52
138. Machine Learning
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 53
139. Machine Learning with BondMachine
Architectures with multiple interconnected processors like the ones produced by the
BondMachine Toolkit are a perfect fit for Neural Networks and Computational Graphs.
Several ways to map this structures to BondMachine has been developed:
■ A native Neural Network library
■ A Tensorflow to BondMachine translator
■ An NNEF based BondMachine composer
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 54
140. Machine Learning with BondMachine
Architectures with multiple interconnected processors like the ones produced by the
BondMachine Toolkit are a perfect fit for Neural Networks and Computational Graphs.
Several ways to map this structures to BondMachine has been developed:
■ A native Neural Network library
■ A Tensorflow to BondMachine translator
■ An NNEF based BondMachine composer
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 54
141. Machine Learning with BondMachine
Native Neural Network library
The tool neuralbond allow the creation of BM-based neural chips from an API go
interface.
■ Neurons are converted to BondMachine connecting processors.
■ Tensors are mapped to CP connections.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 55
142. TensorFlow™ to Bondmachine
tf2bm
TensorFlow™ is an open source software library for numerical computation using data
flow graphs.
Graphs can be converted to BondMachines with the tf2bm tool.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 56
143. Machine Learning with BondMachine
NNEF Composer
Neural Network Exchange Format (NNEF) is a standard from Khronos Group to enable
the easy transfer of trained networks among frameworks, inference engines and devices
The NNEF BM tool approach is to descent NNEF models and build BondMachine
multi-core accordingly
This approch has several advandages over the previous:
■ It is not limited to a single framework
■ NNEF is a textual file, so no complex operations are needed to read models
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 57
144. Specs
FPGA
■ Digilent Zedboard
■ Soc: Zynq XC7Z020-CLG484-1
■ 512 MB DDR3
■ Vivado 2020.2
■ 100MHz
■ PYNQ 2.6 (custom build)
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 58
145. BM inference: A first tentative idea
A neuron of a neural network
can be seen as Connecting Processor of BM
X1
X2
X3
X4
inputs
H1
hidden layer
S1
S2
output layer
Y1
Y2
outputs
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 59
146. From idea to implementation
Starting from High Level Code, a NN model trained with TensorFlow and exported in a
standard interpreted by neuralbond that converts nodes and weights of the network
into a set of heterogeneous processors.
High Level Code
Firmware
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 60
147. A first test
Dataset info:
■ Dataset name: Banknote
Authentication
■ Description: Dataset on the
distinction between genuine and
counterfeit banknotes. The data was
extracted from images taken from
genuine and fake banknote-like
samples.
■ N. features: 4
■ Classification: binary
■ Samples: 1097
Neural network info:
■ Class: Multilayer perceptron fully
connected
■ Layers:
1 An hidden layer with 1 linear neuron
2 One output layer with 2 softmax
neurons
Graphic representation:
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 61
148. BM demo
NN demo N.1: Train
Goals are:
■ To train the simple model and prepare it from the BM
Reference notebook: banknote-train.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 62
149. BM demo
NN demo N.1 outcome
The outcome of this first part of the demo are three files:
■ sample.csv is a test dataset that will be used to feed the inferences of both: the
BM hardware and the BM simulation
■ sw.csv is the software predictions over that dataset and will be used to check the
BM inference probabilities and predictions
■ modelBM.json is the trained network that will use as BM source in the next demo
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 63
150. BM demo
NN demo N.2: BondMachine creation
Goals are:
■ Use modelBM created on the previous step to create a BondMachine
Reference notebook: proj_zedboard_ml_creation /notebook.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 64
151. BM demo
NN demo N.2 outcome
The outcome of this second part of the demo are:
■ bondmachine.json, a representation of the generated abstract machine
■ Al the HDL files needed to build the firmware for the given board
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 65
152. BM demo
NN demo N.3: BondMachine simulation
Goals are:
■ Simulate the test dataset with the BondMachine simulator
■ Compare the results with the keras prediction
Reference notebook: proj_zedboard_ml_simbatch /notebook.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 66
153. Simulation
NN demo N.3 outcome
The outcome of this third part of the demo is:
■ simbatchoutput.csv, a simulated CSV files containing the output probabilities and
the prediction
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 67
154. BM demo
NN demo N.4: Accelerator creation
Goals are:
■ Create the accelerator firmware
Reference notebook: proj_zedboard_ml_accelerator /notebook.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 68
155. Accelerator creation
NN demo N.4 outcome
FIRMWARE
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 69
156. Notebook on the board - predictions and correctness
Thanks to PYNQ we can easily load the bitstream and
program the FPGA in real time.
With their APIs we interact with the memory addresses
of the BM IP to send data into the inputs
and read the outputs (not using BM kernel module)
Dump output results for future analysis
Open the notebook
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 70
157. Benchcore
Benchmark an IP is not an
easy task.
Fortunately we have a
custom design and an
FPGA.
We can put the benchmarks
tool inside the accelerator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 71
158. Benchcore
Benchmark an IP is not an
easy task.
Fortunately we have a
custom design and an
FPGA.
We can put the benchmarks
tool inside the accelerator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 71
159. Benchcore
Benchmark an IP is not an
easy task.
Fortunately we have a
custom design and an
FPGA.
We can put the benchmarks
tool inside the accelerator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 71
160. Benchcore
Benchmark an IP is not an
easy task.
Fortunately we have a
custom design and an
FPGA.
We can put the benchmarks
tool inside the accelerator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 71
161. Benchcore
Benchmark an IP is not an
easy task.
Fortunately we have a
custom design and an
FPGA.
We can put the benchmarks
tool inside the accelerator.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 71
162. Inference evaluation
Evaluation metrics used:
■ Inference speed: time taken to predict a sample i.e. time between the arrival of
the input and the change of the output measured with the benchcore;
■ Resource usage: luts and registers in use;
■ Accuracy: as the average percentage of error on probabilities.
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102.68 µs
Resource usage
resource value occupancy
regs 15122 28.42%
luts 11192 10.51%
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 72
163. Analysis notebook
Another notebook is used to compare runs from different accelerators.
Software
prob0 prob1 class
0.6895 0.3104 0
0.5748 0.4251 0
0.4009 0.5990 1
BondMachine
prob0 prob1 class
0.6895 0.3104 0
0.5748 0.4251 0
0.4009 0.5990 1
The output of the bm corresponds to the software output
Reference notebook: analysis /zedboard_banknote /analysis.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 73
164. Optimizations
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 74
165. A first example of optimization
Remember the softmax function?
σ(zi ) = ezi
PN
j=1 e
zj
ex =
PK
l=0
xl
l!
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 75
166. A first example of optimization
ex =
PK
l=0
xl
l! K can be customize as needed
Improves latency
Decreases accuracy
benefit
drawback
tradeoff
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 76
167. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 16
■ σ: 2106.32
■ Mean: 7946.16
■ Latency: 79 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
168. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 13
■ σ: 1669.88
■ Mean: 6312.26
■ Latency: 63 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
169. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 10
■ σ: 1232.47
■ Mean: 4766.75
■ Latency: 47 µs
■ Prediction: 100%
mean σ
prob0 1.6162E-07 1.1013E-07
prob1 1.6525E-07 1.1831E-07
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
170. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 8
■ σ: 1015.50
■ Mean: 3913.66
■ Latency: 39 µs
■ Prediction: 100%
mean σ
prob0 6.5562E-05 1.7607E-05
prob1 6.6098E-05 1.7609E-05
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
171. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 5
■ σ: 740
■ Mean: 2911
■ Latency: 29 µs
■ Prediction: 100%
mean σ
prob0 3.1070E-05 7.5290E-05
prob1 3.1070E-05 7.5290E-05
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
172. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 3
■ σ: 394.10
■ Mean: 1750.93
■ Latency: 17 µs
■ Prediction: 100%
mean σ
prob0 0.0053 0.0090
prob1 0.0053 0.0090
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
173. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 2
■ σ: 268.69
■ Mean: 1311.11
■ Latency: 13.11 µs
■ Prediction: 100%
mean σ
prob0 0.0193 0.0232
prob1 0.0193 0.0232
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
174. Results of optimization
Changing number of K of the exponential factors in the softmax function...
■ K: 20
■ σ: 2875.94
■ Mean: 10268.45
■ Latency: 102 µs
■ Prediction: 100%
mean σ
prob0 1.6470E-07 1.2332E-07
prob1 1.6623E-07 1.2142E-07
■ K: 1
■ σ: 173.25
■ Mean: 923.71
■ Latency: 9.23 µs
■ Prediction: 100%
mean σ
prob0 0.0990 0.1641
prob1 0.0990 0.1641
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 77
175. Results of optimization
K Inference time
1 9.23 µs
2 13.11 µs
3 17.50 µs
5 29.11 µs
8 39.13 µs
10 47.66 µs
13 63.12 µs
16 79.46 µs
20 102.68 µs
Reduced inference times by a factor of 10 ... only by decreasing the number of
iterations.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 78
176. Fragments composition
■ The tools (neuralbond+basm) create a graph
of relations among fragments of assembly
■ Not necessarily a fragment has to be mapped
to a single CP
■ They can arbitrarily be rearranged into CPs
■ The resulting firmwares are identical in term of
the computing outcome, but differs in
occupancy and latency.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 79
177. Fragments composition
■ The tools (neuralbond+basm) create a graph
of relations among fragments of assembly
■ Not necessarily a fragment has to be mapped
to a single CP
■ They can arbitrarily be rearranged into CPs
■ The resulting firmwares are identical in term of
the computing outcome, but differs in
occupancy and latency.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 79
178. Fragments composition
■ The tools (neuralbond+basm) create a graph
of relations among fragments of assembly
■ Not necessarily a fragment has to be mapped
to a single CP
■ They can arbitrarily be rearranged into CPs
■ The resulting firmwares are identical in term of
the computing outcome, but differs in
occupancy and latency.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 79
179. Fragments composition
■ The tools (neuralbond+basm) create a graph
of relations among fragments of assembly
■ Not necessarily a fragment has to be mapped
to a single CP
■ They can arbitrarily be rearranged into CPs
■ The resulting firmwares are identical in term of
the computing outcome, but differs in
occupancy and latency.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 79
180. BM demo
NN demo N.5: CP pruning
Goals are:
■ Prune a processor and find out the consequences
Reference notebook: proj_zedboard_ml_cp_pruning /notebook.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 80
181. BM demo
NN demo N.6: CP collapsing
Goals are:
■ Collapse processors and find out the consequences
Reference notebook: proj_zedboard_ml_cp_collapsing /notebook.ipynb
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 81
182. BM demo
Goals are:
■ Copy a project directory and try pruning, collapsing, simulating and the assembly
of the neurons
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 82
183. Several ways for customization and optimization
The great control over of the architectures generated by the BondMachine gives several
possible optimizations.
Mixing hardware
and software
optimizations
Fine control over
occupancy vs
latency
HW instructions
swapping
Software based
functions
CP Pruning
and/or collapsing
Fragment
composition
Fabric
independent
HW/SW
Templates
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 83
184. Conclusions and Future directions
1 Introduction
Challenges
FPGA
HDL workflow
HLS Workflow
Concepts
Cloud
2 The BondMachine project
Architectures handling
Architectures molding
Bondgo
Basm
API
3 Misc
Project timeline
4 Machine Learning
Train
BondMachine creation
Simulation
Accelerator
Benchmark
5 Optimizations
6 Conclusions and Future directions
Conclusions
Ongoing
Future
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 84
185. Conclusions
The BondMachine is a new kind of computing device made possible in practice only by the
emerging of new re-programmable hardware technologies such as FPGA.
The result of this process is the construction of a computer architecture that is not anymore a
static constraint where computing occurs but its creation becomes a part of the computing
process, gaining computing power and flexibility.
Over this abstraction is it possible to create a full computing Ecosystem, ranging from small
interconnected IoT devices to Machine Learning accelerators.
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 85
186. Ongoing
The project
■ Move all the code to github
■ Documentation
■ First DAQ use case
■ Complete the inclusion of Intel and Lattice FPGAs
■ ML inference in a cloud workflow
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 86
187. Ongoing
Accelerators
■ Different data types and operations, especially low and trans-precision
■ Different boards support, especially data center accelerator
■ Compare with GPUs
■ Include some real power consumption measures
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 87
188. Ongoing
Machine Learning
With ML we are still at the beginning ...
■ Quantization
■ More datasets: test on other datasets with more features and multiclass
classification
■ Neurons: increase the library of neurons to support other activation functions
■ Evaluate results: compare the results obtained with other technologies (CPU and
GPU) in terms of inference speed and energy efficiency
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 88
189. Future work
■ Include new processor shared objects and currently unsupported opcodes
■ Extend the compiler to include more data structures
■ Assembler improvements, fragments optimization and others
■ Improve the networking including new kind of interconnection firmware
What would an OS for BondMachines look like ?
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 89
190. Future work
■ Include new processor shared objects and currently unsupported opcodes
■ Extend the compiler to include more data structures
■ Assembler improvements, fragments optimization and others
■ Improve the networking including new kind of interconnection firmware
What would an OS for BondMachines look like ?
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 89
191. Future work
■ Include new processor shared objects and currently unsupported opcodes
■ Extend the compiler to include more data structures
■ Assembler improvements, fragments optimization and others
■ Improve the networking including new kind of interconnection firmware
What would an OS for BondMachines look like ?
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 89
192. Future work
■ Include new processor shared objects and currently unsupported opcodes
■ Extend the compiler to include more data structures
■ Assembler improvements, fragments optimization and others
■ Improve the networking including new kind of interconnection firmware
What would an OS for BondMachines look like ?
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 89
193. Future work
■ Include new processor shared objects and currently unsupported opcodes
■ Extend the compiler to include more data structures
■ Assembler improvements, fragments optimization and others
■ Improve the networking including new kind of interconnection firmware
What would an OS for BondMachines look like ?
School on Open Science Cloud 2022 Perugia Machine Learning on FPGA: The BondMachine Project 89