SlideShare a Scribd company logo
1 of 125
Download to read offline
The BondMachine Toolkit
Enabling Machine Learning on FPGA
Mirko Mariotti
Department of Physics and Geology - University of Perugia
INFN Perugia
International Symposium on Grids & Clouds 2019 (ISGC 2019)
April 5, 2019
Academia Sinica - Taipei (TW)
Introduction
Introduction
The BondMachine Toolkit: Enabling Machine Learning on FPGA
In this presentation i will talk about:
 Technological background of the project.
 The BondMachine Project: architecture and tools.
 BondMachine for Machine Learning.
 Building accelerators and their use on the Cloud.
 Conclusion.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 2/37
Introduction Programmable Logic
FPGA
What is it ?
 A field-programmable gate array (FPGA) is an integrated circuit whose
logic is re-programmable. It’s used to build reconfigurable digital circuits.

FPGAs contain an array of programmable
logic blocks, and a hierarchy of reconfig-
urable interconnects that allow the blocks
to be wired together.

Logic blocks can be configured to perform
complex combinational functions.

The FPGA configuration is generally specified using a hardware
description language (HDL).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
Introduction Programmable Logic
FPGA
What is it ?
 A field-programmable gate array (FPGA) is an integrated circuit whose
logic is re-programmable. It’s used to build reconfigurable digital circuits.

FPGAs contain an array of programmable
logic blocks, and a hierarchy of reconfig-
urable interconnects that allow the blocks
to be wired together.

Logic blocks can be configured to perform
complex combinational functions.

The FPGA configuration is generally specified using a hardware
description language (HDL).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
Introduction Programmable Logic
FPGA
What is it ?
 A field-programmable gate array (FPGA) is an integrated circuit whose
logic is re-programmable. It’s used to build reconfigurable digital circuits.

FPGAs contain an array of programmable
logic blocks, and a hierarchy of reconfig-
urable interconnects that allow the blocks
to be wired together.

Logic blocks can be configured to perform
complex combinational functions.

The FPGA configuration is generally specified using a hardware
description language (HDL).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
Introduction Programmable Logic
FPGA
What is it ?
 A field-programmable gate array (FPGA) is an integrated circuit whose
logic is re-programmable. It’s used to build reconfigurable digital circuits.

FPGAs contain an array of programmable
logic blocks, and a hierarchy of reconfig-
urable interconnects that allow the blocks
to be wired together.

Logic blocks can be configured to perform
complex combinational functions.

The FPGA configuration is generally specified using a hardware
description language (HDL).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
Introduction Architectures
Computer Architectures
Multi-core and Heterogeneous
Today’s computer architecture are:
 Multi-core,Two or more independent actual processing units
execute multiple instructions at the same time.
 The power is given by the number of cores.
 Parallelism has to be addressed.
 Heterogeneous, different types of processing units.
 Cell, GPU, Parallela, TPU.
 The power is given by the specialization.
 The units data transfer has to be addressed.
 The scheduling has to be addressed.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
BondMachine The Idea
The BondMachine
The idea
High level sources: Go, TensorFlow, NN
Building a new kind of computer architecture (multi-core and
heterogeneous both in cores types and interconnections) which
dynamically adapt to the specific computational problem rather than
be static.
BM architecture Layer
FPGA
Concurrency
and Special-
ization
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
BondMachine The Idea
The BondMachine
The idea
High level sources: Go, TensorFlow, NN
Building a new kind of computer architecture (multi-core and
heterogeneous both in cores types and interconnections) which
dynamically adapt to the specific computational problem rather than
be static.
BM architecture Layer
FPGA
Concurrency
and Special-
ization
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
BondMachine The Idea
The BondMachine
The idea
High level sources: Go, TensorFlow, NN
Building a new kind of computer architecture (multi-core and
heterogeneous both in cores types and interconnections) which
dynamically adapt to the specific computational problem rather than
be static.
BM architecture Layer
FPGA
Concurrency
and Special-
ization
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
BondMachine The Idea
The BondMachine
The idea
High level sources: Go, TensorFlow, NN
Building a new kind of computer architecture (multi-core and
heterogeneous both in cores types and interconnections) which
dynamically adapt to the specific computational problem rather than
be static.
BM architecture Layer
FPGA
Concurrency
and Special-
ization
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
BondMachine The Idea
The BondMachine
The idea
High level sources: Go, TensorFlow, NN
Building a new kind of computer architecture (multi-core and
heterogeneous both in cores types and interconnections) which
dynamically adapt to the specific computational problem rather than
be static.
BM architecture Layer
FPGA
Concurrency
and Special-
ization
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
BondMachine Architecture
Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation
of computer architectures that:
 Are composed by many, possibly hundreds, computing cores.
 Have very small cores and not necessarily of the same type
(different ISA and ABI).
 Have a not fixed way of interconnecting cores.
 May have some elements shared among cores (for example channels
and shared memories).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
BondMachine Architecture
Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation
of computer architectures that:
 Are composed by many, possibly hundreds, computing cores.
 Have very small cores and not necessarily of the same type
(different ISA and ABI).
 Have a not fixed way of interconnecting cores.
 May have some elements shared among cores (for example channels
and shared memories).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
BondMachine Architecture
Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation
of computer architectures that:
 Are composed by many, possibly hundreds, computing cores.
 Have very small cores and not necessarily of the same type
(different ISA and ABI).
 Have a not fixed way of interconnecting cores.
 May have some elements shared among cores (for example channels
and shared memories).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
BondMachine Architecture
Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation
of computer architectures that:
 Are composed by many, possibly hundreds, computing cores.
 Have very small cores and not necessarily of the same type
(different ISA and ABI).
 Have a not fixed way of interconnecting cores.
 May have some elements shared among cores (for example channels
and shared memories).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
BondMachine Architecture
Introducing the BondMachine (BM)
The BondMachine is a software ecosystem for the dynamic generation
of computer architectures that:
 Are composed by many, possibly hundreds, computing cores.
 Have very small cores and not necessarily of the same type
(different ISA and ABI).
 Have a not fixed way of interconnecting cores.
 May have some elements shared among cores (for example channels
and shared memories).
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
BondMachine Architecture
The BondMachine
An example
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 7/37
BondMachine Connecting Processors
Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor”
(CP) and has:
 Some general purpose registers of size Rsize.
 Some I/O dedicated registers of size Rsize.
 A set of implemented opcodes chosen among many available.
 Dedicated ROM and RAM.
 There possible operating modes.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
BondMachine Connecting Processors
Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor”
(CP) and has:
 Some general purpose registers of size Rsize.
 Some I/O dedicated registers of size Rsize.
 A set of implemented opcodes chosen among many available.
 Dedicated ROM and RAM.
 There possible operating modes.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
BondMachine Connecting Processors
Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor”
(CP) and has:
 Some general purpose registers of size Rsize.
 Some I/O dedicated registers of size Rsize.
 A set of implemented opcodes chosen among many available.
 Dedicated ROM and RAM.
 There possible operating modes.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
BondMachine Connecting Processors
Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor”
(CP) and has:
 Some general purpose registers of size Rsize.
 Some I/O dedicated registers of size Rsize.
 A set of implemented opcodes chosen among many available.
 Dedicated ROM and RAM.
 There possible operating modes.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
BondMachine Connecting Processors
Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor”
(CP) and has:
 Some general purpose registers of size Rsize.
 Some I/O dedicated registers of size Rsize.
 A set of implemented opcodes chosen among many available.
 Dedicated ROM and RAM.
 There possible operating modes.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
BondMachine Connecting Processors
Connecting Processor (CP)
The computational unit of the BM
The atomic computational unit of a BM is the “connecting processor”
(CP) and has:
 Some general purpose registers of size Rsize.
 Some I/O dedicated registers of size Rsize.
 A set of implemented opcodes chosen among many available.
 Dedicated ROM and RAM.
 There possible operating modes.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
BondMachine Shared Objects
Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called
“Shared Objects” (SO).
Examples of their purposes are:
 Data storage (Memories).
 Message passing.
 CP synchronization.
A single SO can be shared among different CPs. To use it CPs have
special instructions (opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the
Shared Memory, the Barrier and a Pseudo Random Numbers Gen-
erator.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
BondMachine Shared Objects
Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called
“Shared Objects” (SO).
Examples of their purposes are:
 Data storage (Memories).
 Message passing.
 CP synchronization.
A single SO can be shared among different CPs. To use it CPs have
special instructions (opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the
Shared Memory, the Barrier and a Pseudo Random Numbers Gen-
erator.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
BondMachine Shared Objects
Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called
“Shared Objects” (SO).
Examples of their purposes are:
 Data storage (Memories).
 Message passing.
 CP synchronization.
A single SO can be shared among different CPs. To use it CPs have
special instructions (opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the
Shared Memory, the Barrier and a Pseudo Random Numbers Gen-
erator.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
BondMachine Shared Objects
Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called
“Shared Objects” (SO).
Examples of their purposes are:
 Data storage (Memories).
 Message passing.
 CP synchronization.
A single SO can be shared among different CPs. To use it CPs have
special instructions (opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the
Shared Memory, the Barrier and a Pseudo Random Numbers Gen-
erator.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
BondMachine Shared Objects
Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called
“Shared Objects” (SO).
Examples of their purposes are:
 Data storage (Memories).
 Message passing.
 CP synchronization.
A single SO can be shared among different CPs. To use it CPs have
special instructions (opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the
Shared Memory, the Barrier and a Pseudo Random Numbers Gen-
erator.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
BondMachine Shared Objects
Shared Objects (SO)
The non-computational element of the BM
Alongside CPs, BondMachines include non-computing units called
“Shared Objects” (SO).
Examples of their purposes are:
 Data storage (Memories).
 Message passing.
 CP synchronization.
A single SO can be shared among different CPs. To use it CPs have
special instructions (opcodes) oriented to the specific SO.
Four kind of SO have been developed so far: the Channel, the
Shared Memory, the Barrier and a Pseudo Random Numbers Gen-
erator.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
BondMachine Tools
Handle the BM computer architecture
The BM computer architecture is managed by a set of tools to:
 build a specify architecture
 modify a pre-existing architecture
 simulate or emulate the behavior
 Generate the Register Tranfer Code (RTL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the RTL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s RTL code
Simulation Framework
Simulates the be-
haviour, emulates a
BM on a standard
Linux workstation
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
BondMachine Tools
Handle the BM computer architecture
The BM computer architecture is managed by a set of tools to:
 build a specify architecture
 modify a pre-existing architecture
 simulate or emulate the behavior
 Generate the Register Tranfer Code (RTL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the RTL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s RTL code
Simulation Framework
Simulates the be-
haviour, emulates a
BM on a standard
Linux workstation
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
BondMachine Tools
Handle the BM computer architecture
The BM computer architecture is managed by a set of tools to:
 build a specify architecture
 modify a pre-existing architecture
 simulate or emulate the behavior
 Generate the Register Tranfer Code (RTL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the RTL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s RTL code
Simulation Framework
Simulates the be-
haviour, emulates a
BM on a standard
Linux workstation
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
BondMachine Tools
Handle the BM computer architecture
The BM computer architecture is managed by a set of tools to:
 build a specify architecture
 modify a pre-existing architecture
 simulate or emulate the behavior
 Generate the Register Tranfer Code (RTL)
Processor Builder
Selects the single pro-
cessor, assembles and
disassembles, saves on
disk as JSON, creates
the RTL code of a CP
BondMachine Builder
Connects CPs and SOs
together in custom
topologies, loads and
saves on disk as JSON,
create BM’s RTL code
Simulation Framework
Simulates the be-
haviour, emulates a
BM on a standard
Linux workstation
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding API and Libraries
Use the BM computer architecture
Mapping specific computational problems to BMs
Symbond
Map symbolic
mathematical
expressions to BM
Boolbond
Map boolean systems
to BM
Matrixwork
Basic matrix
computation
Evolutive BM
Evolutionary
computing to BM
Neuralbond
Map neural networks
to BM
tf2bm  nnef2bm
Map computational
graphs to BM
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
Moulding Bondgo
Bondgo
The major innovation of the BondMachine Project is its compiler.
Bondgo is the name chosen for the compiler developed for the
BondMachine.
The compiler source language is Go as the name suggest.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 12/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
high level GO source
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
machine code
Assembling
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
machine code
Assembling
Processor implementation
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
Bondgo does something different from standard compilers ...
high level GO source
assembly file
Compiling
CP specs
Arch generating
machine code
Assembling
Processor implementation
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
A first example
counter go source
package main
import (
bondgo
)
func main () {
var outgoing bondgo.Output
var reg_int uint8
outgoing =
bondgo.Make(bondgo.Output , 3)
reg_int = 0
for {
bondgo.IOWrite(outgoing ,
reg_int)
reg_int ++
}
}
bondgo –input-file counter.go
-mpm
BM JSON rapresentation
counter asm
clr r0
rset r1 0
cpy r0 r1
cpy r1 r0
r2o r1 o0
inc r0
j 3
bondmachine tool
BM HDL code
FPGA
procbuilder tool
Binary
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
Moulding Bondgo
Bondgo
... bondgo may not only create the binaries, but also the CP
architecture, and ...
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 15/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
high level GO source
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Asm and Binaries
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Asm and Binaries
Interconnections
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
... it can do even much more interesting things when compiling
concurrent programs.
high level GO source
assembly CP 1
assembly CP 2
assembly CP 3
assembly CP 4
assembly CP ...
assembly CP N
Compiling
CP 1 specs
CP 2
CP 3
CP 4
CP ...
CP N
Archs generating
Asm and Binaries
BondMachine
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
Moulding Bondgo
Bondgo
A multi-core example
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 17/37
Moulding Bondgo
Compiling Architectures
One of the most important result
The architecture creation is a part of the compilation process.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 18/37
Machine Learning with BondMachine
Machine Learning with BondMachine
Architectures with multiple interconnected processors like the ones
produced by the BondMachine Toolkit are a perfect fit for Neural
Networks and Computational Graphs.
Several ways to map this structures to BondMachine has been
developed:
 A native Neural Network library
 A Tensorflow to BondMachine translator
 An NNEF based BondMachine composer
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 19/37
Machine Learning with BondMachine
Machine Learning with BondMachine
Architectures with multiple interconnected processors like the ones
produced by the BondMachine Toolkit are a perfect fit for Neural
Networks and Computational Graphs.
Several ways to map this structures to BondMachine has been
developed:
 A native Neural Network library
 A Tensorflow to BondMachine translator
 An NNEF based BondMachine composer
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 19/37
Machine Learning with BondMachine Bondmachine native Neural Network library
Machine Learning with BondMachine
Native Neural Network library
The tool neuralbond allow the creation of BM-based neural chips from
an API go interface.
 Neurons are converted to BondMachine connecting processors.
 Tensors are mapped to CP connections.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 20/37
Machine Learning with BondMachine TensorFlow to Bondmachine
TensorFlow™ to Bondmachine
tf2bm
TensorFlow™ is an open source software library for numerical
computation using data flow graphs.
Graphs can be converted to BondMachines with the tf2bm tool.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 21/37
Machine Learning with BondMachine NNEF Composer
Machine Learning with BondMachine
NNEF Composer
Neural Network Exchange Format (NNEF) is a standard from Khronos
Group to enable the easy transfer of trained networks among
frameworks, inference engines and devices
The NNEF BM tool approach is to descent NNEF models and build
BondMachine multi-core accordingly
This approch has several advandages over the previous:
 It is not limited to a single framework
 NNEF is a textual file, so no complex operations are needed to
read models
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 22/37
Clustering
BondMachine Clustering
So far we saw:
 An user friendly approach to create processors (single core).
 Optimizing a single device to support intricate computational
work-flows (multi-cores) over an heterogeneous layer.
Interconnected BondMachines
What if we could extend the this layer to multiple interconnected
devices ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
Clustering
BondMachine Clustering
So far we saw:
 An user friendly approach to create processors (single core).
 Optimizing a single device to support intricate computational
work-flows (multi-cores) over an heterogeneous layer.
Interconnected BondMachines
What if we could extend the this layer to multiple interconnected
devices ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
Clustering
BondMachine Clustering
So far we saw:
 An user friendly approach to create processors (single core).
 Optimizing a single device to support intricate computational
work-flows (multi-cores) over an heterogeneous layer.
Interconnected BondMachines
What if we could extend the this layer to multiple interconnected
devices ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
Clustering
BondMachine Clustering
So far we saw:
 An user friendly approach to create processors (single core).
 Optimizing a single device to support intricate computational
work-flows (multi-cores) over an heterogeneous layer.
Interconnected BondMachines
What if we could extend the this layer to multiple interconnected
devices ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
Clustering
BondMachine Clustering
The same logic existing among CP have been extended among different
BondMachines organized in clusters.
Protocols, one ethernet called etherbond and one using UDP called
udpbond have been created for the purpose.
FPGA based BondMachines, standard Linux Workstations, Emulated
BondMachines might join a cluster an contribute to a single distributed
computational problem.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 24/37
Clustering
BondMachine Clustering
The same logic existing among CP have been extended among different
BondMachines organized in clusters.
Protocols, one ethernet called etherbond and one using UDP called
udpbond have been created for the purpose.
FPGA based BondMachines, standard Linux Workstations, Emulated
BondMachines might join a cluster an contribute to a single distributed
computational problem.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 24/37
Clustering
BondMachine Clustering
The same logic existing among CP have been extended among different
BondMachines organized in clusters.
Protocols, one ethernet called etherbond and one using UDP called
udpbond have been created for the purpose.
FPGA based BondMachines, standard Linux Workstations, Emulated
BondMachines might join a cluster an contribute to a single distributed
computational problem.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 24/37
Clustering
BondMachine Clustering
A distributed example
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 25/37
Clustering
BondMachine Clustering
Results
Results
 User can deploy an entire HW/SW cluster starting from code
written in a high level description (Go, NNEF, etc)
 Workstation with emulated BondMachines, workstation with
etherbond drivers, standalone BondMachines (FPGA) may join
these clusters.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 26/37
Clustering
BondMachine Clustering
Results
Results
 User can deploy an entire HW/SW cluster starting from code
written in a high level description (Go, NNEF, etc)
 Workstation with emulated BondMachines, workstation with
etherbond drivers, standalone BondMachines (FPGA) may join
these clusters.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 26/37
Uses
Use cases
Two use cases in Physics experiments are currently being developed:
 Real time pulse shape analysis in neutron detectors
 bringing the intelligence to the edge
 Test beam for space experiments (DAMPE, HERD)
 increasing testbed operations efficiency
Computing Accelerator
Our effort is now in enabling the possibility of building computing
accelerators to be used from within standard (Linux) applications.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 27/37
Uses
Use cases
Two use cases in Physics experiments are currently being developed:
 Real time pulse shape analysis in neutron detectors
 bringing the intelligence to the edge
 Test beam for space experiments (DAMPE, HERD)
 increasing testbed operations efficiency
Computing Accelerator
Our effort is now in enabling the possibility of building computing
accelerators to be used from within standard (Linux) applications.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 27/37
Uses
Accelerators
Types
We are currently working to enable the use the BM as accelerator in
two directions:
 Using standard processor/FPGA hybrid chips
 Zynq, Cyclone V
 Using PCI-express FPGA evaluation boards
 Kintek 7 Evaluation board
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 28/37
Uses
Accelerators
Hybrid chips
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
Uses
Accelerators
Hybrid chips
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
Uses
Accelerators
Hybrid chips
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
Uses
Accelerators
Hybrid chips
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
Uses
Accelerators
PCI-express boards
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
Uses
Accelerators
PCI-express boards
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
Uses
Accelerators
PCI-express boards
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
Uses
Accelerators
PCI-express boards
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
Uses
Accelerators
PCI-express boards
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
Uses
Accelerators
Hardware
Hybrid chips
Digilent Zedboard
Zynq-7000 SoC XC7Z020
512 MB DDR3
Up to 667 MHz
Xilinx ZC702
Zynq-7000 SoC XC7Z020
1GB DDR3
85k cells - 220 DSP slices
Terasic DE10Nano
Intel Cyclone V
1GB DDR3 SDRAM
110K LEs
PCI-Express board
Xilinx KC705
Kintex-7 FPGAs
1GB DDR3 SODIM
326k cells - 840 DSP slices
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 31/37
Uses
Accelerators
Cloud
FPGA accelerators can be used in the cloud:
 Several public cloud providers offers solution of VM connected to
FPGAs (Amazon, Nimbix)
 FPGAs can be inserted in private clouds infrastructures
To be used a firmware has to be uploaded to the accelerated VM FPGA
The BondMachine toolkit can be used to build such firmware
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 32/37
Uses
Accelerators
Cloud
FPGA accelerators can be used in the cloud:
 Several public cloud providers offers solution of VM connected to
FPGAs (Amazon, Nimbix)
 FPGAs can be inserted in private clouds infrastructures
To be used a firmware has to be uploaded to the accelerated VM FPGA
The BondMachine toolkit can be used to build such firmware
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 32/37
Uses
Accelerators
Cloud
FPGA accelerators can be used in the cloud:
 Several public cloud providers offers solution of VM connected to
FPGAs (Amazon, Nimbix)
 FPGAs can be inserted in private clouds infrastructures
To be used a firmware has to be uploaded to the accelerated VM FPGA
The BondMachine toolkit can be used to build such firmware
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 32/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Uses
Accelerated ML in the Cloud
Neural Network API TF2BM NNEF
BM firmware
Prebuild BM
firmware
library
FPGA enabled
VM
BM
hw
deploy
BM
sw
deploy
Accelerated VM
Application
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
Project History
Project History
 May 2016 - First tests on the idea.
 October 2016 - Prototype at “Makerfaire 2016 Rome”
 Jul 2018 - InnovateFPGA EMEA Silver Award.
 Aug 2018 - Presented at Intel Campus, Santa Jose (CA) .
 Aug 2018 - InnovateFPGA Iron Award in the Grand Final.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 34/37
Conclusions
Conclusions
The BondMachine is a new kind of computing device made possible in
practice only by the emerging of new re-programmable hardware technologies
such as FPGA.
The result of this process is the construction of a computer architecture that
is not anymore a static constraint where computing occurs but its creation
becomes a part of the computing process, gaining computing power and
flexibility.
Over this abstraction is it possible to create a full computing Ecosystem,
ranging from small interconnected IoT devices to Machine Learning
accelerators.
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 35/37
Future work
Future work
The project is at the stage of a working prototype, so work has to be
done in several areas:
 Include new processor shared objects and currently unsupported
opcodes.
 Extend the compiler to include more data structures.
 Improve the networking including new interconnection firmwares.
 Work on BondMachine as accelerators.
What would an OS for BondMachines look like ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
Future work
Future work
The project is at the stage of a working prototype, so work has to be
done in several areas:
 Include new processor shared objects and currently unsupported
opcodes.
 Extend the compiler to include more data structures.
 Improve the networking including new interconnection firmwares.
 Work on BondMachine as accelerators.
What would an OS for BondMachines look like ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
Future work
Future work
The project is at the stage of a working prototype, so work has to be
done in several areas:
 Include new processor shared objects and currently unsupported
opcodes.
 Extend the compiler to include more data structures.
 Improve the networking including new interconnection firmwares.
 Work on BondMachine as accelerators.
What would an OS for BondMachines look like ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
Future work
Future work
The project is at the stage of a working prototype, so work has to be
done in several areas:
 Include new processor shared objects and currently unsupported
opcodes.
 Extend the compiler to include more data structures.
 Improve the networking including new interconnection firmwares.
 Work on BondMachine as accelerators.
What would an OS for BondMachines look like ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
Future work
Future work
The project is at the stage of a working prototype, so work has to be
done in several areas:
 Include new processor shared objects and currently unsupported
opcodes.
 Extend the compiler to include more data structures.
 Improve the networking including new interconnection firmwares.
 Work on BondMachine as accelerators.
What would an OS for BondMachines look like ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
Future work
Future work
The project is at the stage of a working prototype, so work has to be
done in several areas:
 Include new processor shared objects and currently unsupported
opcodes.
 Extend the compiler to include more data structures.
 Improve the networking including new interconnection firmwares.
 Work on BondMachine as accelerators.
What would an OS for BondMachines look like ?
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
Future work
If you have question/curiosity on the project:
Mirko Mariotti
mirko.mariotti@unipg.it
http://bondmachine.fisica.unipg.it
Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 37/37

More Related Content

What's hot

Microkernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric ComputingMicrokernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric ComputingMartin Děcký
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware LandscapeGrigory Sapunov
 
Design and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGADesign and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGAIJERA Editor
 
Smalltalk-80 : hardware and software
Smalltalk-80 : hardware and softwareSmalltalk-80 : hardware and software
Smalltalk-80 : hardware and softwareESUG
 
FPGA Design Challenges
FPGA Design ChallengesFPGA Design Challenges
FPGA Design ChallengesKrishna Gaihre
 
FPGA Selection Methodology for Real time projects
FPGA Selection Methodology for Real time projectsFPGA Selection Methodology for Real time projects
FPGA Selection Methodology for Real time projectsKrishna Gaihre
 

What's hot (9)

Microkernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric ComputingMicrokernels in the Era of Data-Centric Computing
Microkernels in the Era of Data-Centric Computing
 
J044084349
J044084349J044084349
J044084349
 
Deep learning: Hardware Landscape
Deep learning: Hardware LandscapeDeep learning: Hardware Landscape
Deep learning: Hardware Landscape
 
LPC4300_two_cores
LPC4300_two_coresLPC4300_two_cores
LPC4300_two_cores
 
Design and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGADesign and Implementation of Quintuple Processor Architecture Using FPGA
Design and Implementation of Quintuple Processor Architecture Using FPGA
 
Smalltalk-80 : hardware and software
Smalltalk-80 : hardware and softwareSmalltalk-80 : hardware and software
Smalltalk-80 : hardware and software
 
2020-04-29 SIT Insights in Technology - Bertrand Meyer
2020-04-29 SIT Insights in Technology - Bertrand Meyer2020-04-29 SIT Insights in Technology - Bertrand Meyer
2020-04-29 SIT Insights in Technology - Bertrand Meyer
 
FPGA Design Challenges
FPGA Design ChallengesFPGA Design Challenges
FPGA Design Challenges
 
FPGA Selection Methodology for Real time projects
FPGA Selection Methodology for Real time projectsFPGA Selection Methodology for Real time projects
FPGA Selection Methodology for Real time projects
 

Similar to The BondMachine Toolkit: Enabling ML on FPGA

Course: "Introductory course to HLS FPGA programming"
Course: "Introductory course to HLS FPGA programming"Course: "Introductory course to HLS FPGA programming"
Course: "Introductory course to HLS FPGA programming"Mirko Mariotti
 
Short course on FPGA programming
Short course on FPGA programmingShort course on FPGA programming
Short course on FPGA programmingMirko Mariotti
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) ijceronline
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)ijceronline
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesDr. Fabio Baruffa
 
FPGA Design, Architecture and Applications
FPGA Design, Architecture and ApplicationsFPGA Design, Architecture and Applications
FPGA Design, Architecture and ApplicationsLogic Fruit Technologies
 
Adaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog PotdarAdaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog PotdarSuyog Potdar
 
Automatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapAutomatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapManolis Vavalis
 
Nt1310 Unit 5 Algorithm
Nt1310 Unit 5 AlgorithmNt1310 Unit 5 Algorithm
Nt1310 Unit 5 AlgorithmAngie Lee
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdfRioCarthiis
 
64 Bit Multi Processor Paper Presentation
64 Bit Multi Processor Paper Presentation64 Bit Multi Processor Paper Presentation
64 Bit Multi Processor Paper Presentationguestac67362
 
SFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete Approach
SFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete ApproachSFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete Approach
SFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete ApproachSouth Tyrol Free Software Conference
 
Specialized parallel computing
Specialized parallel computingSpecialized parallel computing
Specialized parallel computingAlaref Abushaala
 
Selective fitting strategy based real time placement algorithm for dynamicall...
Selective fitting strategy based real time placement algorithm for dynamicall...Selective fitting strategy based real time placement algorithm for dynamicall...
Selective fitting strategy based real time placement algorithm for dynamicall...eSAT Publishing House
 

Similar to The BondMachine Toolkit: Enabling ML on FPGA (20)

Course: "Introductory course to HLS FPGA programming"
Course: "Introductory course to HLS FPGA programming"Course: "Introductory course to HLS FPGA programming"
Course: "Introductory course to HLS FPGA programming"
 
Short course on FPGA programming
Short course on FPGA programmingShort course on FPGA programming
Short course on FPGA programming
 
ICTP Workshop 2022
ICTP Workshop 2022ICTP Workshop 2022
ICTP Workshop 2022
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER) International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)International Journal of Computational Engineering Research(IJCER)
International Journal of Computational Engineering Research(IJCER)
 
FPGA In a Nutshell
FPGA In a NutshellFPGA In a Nutshell
FPGA In a Nutshell
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
team12.project_ver_1_(1).pptx
team12.project_ver_1_(1).pptxteam12.project_ver_1_(1).pptx
team12.project_ver_1_(1).pptx
 
FPGA Design, Architecture and Applications
FPGA Design, Architecture and ApplicationsFPGA Design, Architecture and Applications
FPGA Design, Architecture and Applications
 
Adaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog PotdarAdaptive Computing Seminar - Suyog Potdar
Adaptive Computing Seminar - Suyog Potdar
 
Automatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmapAutomatic generation of platform architectures using open cl and fpga roadmap
Automatic generation of platform architectures using open cl and fpga roadmap
 
Nt1310 Unit 5 Algorithm
Nt1310 Unit 5 AlgorithmNt1310 Unit 5 Algorithm
Nt1310 Unit 5 Algorithm
 
electronics-11-03883.pdf
electronics-11-03883.pdfelectronics-11-03883.pdf
electronics-11-03883.pdf
 
Research Paper
Research PaperResearch Paper
Research Paper
 
1
11
1
 
64 Bit Multi Processor Paper Presentation
64 Bit Multi Processor Paper Presentation64 Bit Multi Processor Paper Presentation
64 Bit Multi Processor Paper Presentation
 
chameleon chip
chameleon chipchameleon chip
chameleon chip
 
SFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete Approach
SFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete ApproachSFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete Approach
SFScon 2020 - Roberto Innocenti - 202x Open Hardware Concrete Approach
 
Specialized parallel computing
Specialized parallel computingSpecialized parallel computing
Specialized parallel computing
 
Selective fitting strategy based real time placement algorithm for dynamicall...
Selective fitting strategy based real time placement algorithm for dynamicall...Selective fitting strategy based real time placement algorithm for dynamicall...
Selective fitting strategy based real time placement algorithm for dynamicall...
 

More from Mirko Mariotti

INFN Advanced ML Hackaton 2022 Talk
INFN Advanced ML Hackaton 2022 TalkINFN Advanced ML Hackaton 2022 Talk
INFN Advanced ML Hackaton 2022 TalkMirko Mariotti
 
INFN ML FPGA Course 2022
INFN ML FPGA Course 2022INFN ML FPGA Course 2022
INFN ML FPGA Course 2022Mirko Mariotti
 
Firmware Develpment for hybrid (ARM and FPGA) processors
Firmware Develpment for hybrid (ARM and FPGA) processorsFirmware Develpment for hybrid (ARM and FPGA) processors
Firmware Develpment for hybrid (ARM and FPGA) processorsMirko Mariotti
 
Harvesting dispersed computational resources with Openstack
Harvesting dispersed computational resources with OpenstackHarvesting dispersed computational resources with Openstack
Harvesting dispersed computational resources with OpenstackMirko Mariotti
 
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'Mirko Mariotti
 
Xen - Esperienze a Perugia
Xen - Esperienze a PerugiaXen - Esperienze a Perugia
Xen - Esperienze a PerugiaMirko Mariotti
 

More from Mirko Mariotti (7)

INFN SOSC 2022 Talk
INFN SOSC 2022 TalkINFN SOSC 2022 Talk
INFN SOSC 2022 Talk
 
INFN Advanced ML Hackaton 2022 Talk
INFN Advanced ML Hackaton 2022 TalkINFN Advanced ML Hackaton 2022 Talk
INFN Advanced ML Hackaton 2022 Talk
 
INFN ML FPGA Course 2022
INFN ML FPGA Course 2022INFN ML FPGA Course 2022
INFN ML FPGA Course 2022
 
Firmware Develpment for hybrid (ARM and FPGA) processors
Firmware Develpment for hybrid (ARM and FPGA) processorsFirmware Develpment for hybrid (ARM and FPGA) processors
Firmware Develpment for hybrid (ARM and FPGA) processors
 
Harvesting dispersed computational resources with Openstack
Harvesting dispersed computational resources with OpenstackHarvesting dispersed computational resources with Openstack
Harvesting dispersed computational resources with Openstack
 
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'
Pim - Un Esempio di integrazione AA dei servizi INFN/Universita'
 
Xen - Esperienze a Perugia
Xen - Esperienze a PerugiaXen - Esperienze a Perugia
Xen - Esperienze a Perugia
 

Recently uploaded

Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaPraksha3
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.PraveenaKalaiselvan1
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayZachary Labe
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PPRINCE C P
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxpriyankatabhane
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptArshadWarsi13
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxpriyankatabhane
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...lizamodels9
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxyaramohamed343013
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555kikilily0909
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxEran Akiva Sinbar
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsHajira Mahmood
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxFarihaAbdulRasheed
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)DHURKADEVIBASKAR
 

Recently uploaded (20)

Volatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -IVolatile Oils Pharmacognosy And Phytochemistry -I
Volatile Oils Pharmacognosy And Phytochemistry -I
 
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tantaDashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
Dashanga agada a formulation of Agada tantra dealt in 3 Rd year bams agada tanta
 
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
BIOETHICS IN RECOMBINANT DNA TECHNOLOGY.
 
Welcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work DayWelcome to GFDL for Take Your Child To Work Day
Welcome to GFDL for Take Your Child To Work Day
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 
Speech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptxSpeech, hearing, noise, intelligibility.pptx
Speech, hearing, noise, intelligibility.pptx
 
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Aiims Metro Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Transposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.pptTransposable elements in prokaryotes.ppt
Transposable elements in prokaryotes.ppt
 
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptxMicrophone- characteristics,carbon microphone, dynamic microphone.pptx
Microphone- characteristics,carbon microphone, dynamic microphone.pptx
 
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Munirka Delhi 💯Call Us 🔝8264348440🔝
 
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
Best Call Girls In Sector 29 Gurgaon❤️8860477959 EscorTs Service In 24/7 Delh...
 
Scheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docxScheme-of-Work-Science-Stage-4 cambridge science.docx
Scheme-of-Work-Science-Stage-4 cambridge science.docx
 
‏‏VIRUS - 123455555555555555555555555555555555555555
‏‏VIRUS -  123455555555555555555555555555555555555555‏‏VIRUS -  123455555555555555555555555555555555555555
‏‏VIRUS - 123455555555555555555555555555555555555555
 
zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptxTwin's paradox experiment is a meassurement of the extra dimensions.pptx
Twin's paradox experiment is a meassurement of the extra dimensions.pptx
 
Solution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutionsSolution chemistry, Moral and Normal solutions
Solution chemistry, Moral and Normal solutions
 
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptxRESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
RESPIRATORY ADAPTATIONS TO HYPOXIA IN HUMNAS.pptx
 
Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)Recombinant DNA technology( Transgenic plant and animal)
Recombinant DNA technology( Transgenic plant and animal)
 

The BondMachine Toolkit: Enabling ML on FPGA

  • 1. The BondMachine Toolkit Enabling Machine Learning on FPGA Mirko Mariotti Department of Physics and Geology - University of Perugia INFN Perugia International Symposium on Grids & Clouds 2019 (ISGC 2019) April 5, 2019 Academia Sinica - Taipei (TW)
  • 2. Introduction Introduction The BondMachine Toolkit: Enabling Machine Learning on FPGA In this presentation i will talk about: Technological background of the project. The BondMachine Project: architecture and tools. BondMachine for Machine Learning. Building accelerators and their use on the Cloud. Conclusion. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 2/37
  • 3. Introduction Programmable Logic FPGA What is it ? A field-programmable gate array (FPGA) is an integrated circuit whose logic is re-programmable. It’s used to build reconfigurable digital circuits. FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfig- urable interconnects that allow the blocks to be wired together. Logic blocks can be configured to perform complex combinational functions. The FPGA configuration is generally specified using a hardware description language (HDL). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
  • 4. Introduction Programmable Logic FPGA What is it ? A field-programmable gate array (FPGA) is an integrated circuit whose logic is re-programmable. It’s used to build reconfigurable digital circuits. FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfig- urable interconnects that allow the blocks to be wired together. Logic blocks can be configured to perform complex combinational functions. The FPGA configuration is generally specified using a hardware description language (HDL). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
  • 5. Introduction Programmable Logic FPGA What is it ? A field-programmable gate array (FPGA) is an integrated circuit whose logic is re-programmable. It’s used to build reconfigurable digital circuits. FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfig- urable interconnects that allow the blocks to be wired together. Logic blocks can be configured to perform complex combinational functions. The FPGA configuration is generally specified using a hardware description language (HDL). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
  • 6. Introduction Programmable Logic FPGA What is it ? A field-programmable gate array (FPGA) is an integrated circuit whose logic is re-programmable. It’s used to build reconfigurable digital circuits. FPGAs contain an array of programmable logic blocks, and a hierarchy of reconfig- urable interconnects that allow the blocks to be wired together. Logic blocks can be configured to perform complex combinational functions. The FPGA configuration is generally specified using a hardware description language (HDL). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 3/37
  • 7. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 8. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 9. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 10. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 11. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 12. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 13. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 14. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 15. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 16. Introduction Architectures Computer Architectures Multi-core and Heterogeneous Today’s computer architecture are: Multi-core,Two or more independent actual processing units execute multiple instructions at the same time. The power is given by the number of cores. Parallelism has to be addressed. Heterogeneous, different types of processing units. Cell, GPU, Parallela, TPU. The power is given by the specialization. The units data transfer has to be addressed. The scheduling has to be addressed. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 4/37
  • 17. BondMachine The Idea The BondMachine The idea High level sources: Go, TensorFlow, NN Building a new kind of computer architecture (multi-core and heterogeneous both in cores types and interconnections) which dynamically adapt to the specific computational problem rather than be static. BM architecture Layer FPGA Concurrency and Special- ization Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
  • 18. BondMachine The Idea The BondMachine The idea High level sources: Go, TensorFlow, NN Building a new kind of computer architecture (multi-core and heterogeneous both in cores types and interconnections) which dynamically adapt to the specific computational problem rather than be static. BM architecture Layer FPGA Concurrency and Special- ization Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
  • 19. BondMachine The Idea The BondMachine The idea High level sources: Go, TensorFlow, NN Building a new kind of computer architecture (multi-core and heterogeneous both in cores types and interconnections) which dynamically adapt to the specific computational problem rather than be static. BM architecture Layer FPGA Concurrency and Special- ization Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
  • 20. BondMachine The Idea The BondMachine The idea High level sources: Go, TensorFlow, NN Building a new kind of computer architecture (multi-core and heterogeneous both in cores types and interconnections) which dynamically adapt to the specific computational problem rather than be static. BM architecture Layer FPGA Concurrency and Special- ization Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
  • 21. BondMachine The Idea The BondMachine The idea High level sources: Go, TensorFlow, NN Building a new kind of computer architecture (multi-core and heterogeneous both in cores types and interconnections) which dynamically adapt to the specific computational problem rather than be static. BM architecture Layer FPGA Concurrency and Special- ization Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 5/37
  • 22. BondMachine Architecture Introducing the BondMachine (BM) The BondMachine is a software ecosystem for the dynamic generation of computer architectures that: Are composed by many, possibly hundreds, computing cores. Have very small cores and not necessarily of the same type (different ISA and ABI). Have a not fixed way of interconnecting cores. May have some elements shared among cores (for example channels and shared memories). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
  • 23. BondMachine Architecture Introducing the BondMachine (BM) The BondMachine is a software ecosystem for the dynamic generation of computer architectures that: Are composed by many, possibly hundreds, computing cores. Have very small cores and not necessarily of the same type (different ISA and ABI). Have a not fixed way of interconnecting cores. May have some elements shared among cores (for example channels and shared memories). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
  • 24. BondMachine Architecture Introducing the BondMachine (BM) The BondMachine is a software ecosystem for the dynamic generation of computer architectures that: Are composed by many, possibly hundreds, computing cores. Have very small cores and not necessarily of the same type (different ISA and ABI). Have a not fixed way of interconnecting cores. May have some elements shared among cores (for example channels and shared memories). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
  • 25. BondMachine Architecture Introducing the BondMachine (BM) The BondMachine is a software ecosystem for the dynamic generation of computer architectures that: Are composed by many, possibly hundreds, computing cores. Have very small cores and not necessarily of the same type (different ISA and ABI). Have a not fixed way of interconnecting cores. May have some elements shared among cores (for example channels and shared memories). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
  • 26. BondMachine Architecture Introducing the BondMachine (BM) The BondMachine is a software ecosystem for the dynamic generation of computer architectures that: Are composed by many, possibly hundreds, computing cores. Have very small cores and not necessarily of the same type (different ISA and ABI). Have a not fixed way of interconnecting cores. May have some elements shared among cores (for example channels and shared memories). Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 6/37
  • 27. BondMachine Architecture The BondMachine An example Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 7/37
  • 28. BondMachine Connecting Processors Connecting Processor (CP) The computational unit of the BM The atomic computational unit of a BM is the “connecting processor” (CP) and has: Some general purpose registers of size Rsize. Some I/O dedicated registers of size Rsize. A set of implemented opcodes chosen among many available. Dedicated ROM and RAM. There possible operating modes. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
  • 29. BondMachine Connecting Processors Connecting Processor (CP) The computational unit of the BM The atomic computational unit of a BM is the “connecting processor” (CP) and has: Some general purpose registers of size Rsize. Some I/O dedicated registers of size Rsize. A set of implemented opcodes chosen among many available. Dedicated ROM and RAM. There possible operating modes. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
  • 30. BondMachine Connecting Processors Connecting Processor (CP) The computational unit of the BM The atomic computational unit of a BM is the “connecting processor” (CP) and has: Some general purpose registers of size Rsize. Some I/O dedicated registers of size Rsize. A set of implemented opcodes chosen among many available. Dedicated ROM and RAM. There possible operating modes. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
  • 31. BondMachine Connecting Processors Connecting Processor (CP) The computational unit of the BM The atomic computational unit of a BM is the “connecting processor” (CP) and has: Some general purpose registers of size Rsize. Some I/O dedicated registers of size Rsize. A set of implemented opcodes chosen among many available. Dedicated ROM and RAM. There possible operating modes. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
  • 32. BondMachine Connecting Processors Connecting Processor (CP) The computational unit of the BM The atomic computational unit of a BM is the “connecting processor” (CP) and has: Some general purpose registers of size Rsize. Some I/O dedicated registers of size Rsize. A set of implemented opcodes chosen among many available. Dedicated ROM and RAM. There possible operating modes. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
  • 33. BondMachine Connecting Processors Connecting Processor (CP) The computational unit of the BM The atomic computational unit of a BM is the “connecting processor” (CP) and has: Some general purpose registers of size Rsize. Some I/O dedicated registers of size Rsize. A set of implemented opcodes chosen among many available. Dedicated ROM and RAM. There possible operating modes. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 8/37
  • 34. BondMachine Shared Objects Shared Objects (SO) The non-computational element of the BM Alongside CPs, BondMachines include non-computing units called “Shared Objects” (SO). Examples of their purposes are: Data storage (Memories). Message passing. CP synchronization. A single SO can be shared among different CPs. To use it CPs have special instructions (opcodes) oriented to the specific SO. Four kind of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Numbers Gen- erator. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
  • 35. BondMachine Shared Objects Shared Objects (SO) The non-computational element of the BM Alongside CPs, BondMachines include non-computing units called “Shared Objects” (SO). Examples of their purposes are: Data storage (Memories). Message passing. CP synchronization. A single SO can be shared among different CPs. To use it CPs have special instructions (opcodes) oriented to the specific SO. Four kind of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Numbers Gen- erator. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
  • 36. BondMachine Shared Objects Shared Objects (SO) The non-computational element of the BM Alongside CPs, BondMachines include non-computing units called “Shared Objects” (SO). Examples of their purposes are: Data storage (Memories). Message passing. CP synchronization. A single SO can be shared among different CPs. To use it CPs have special instructions (opcodes) oriented to the specific SO. Four kind of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Numbers Gen- erator. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
  • 37. BondMachine Shared Objects Shared Objects (SO) The non-computational element of the BM Alongside CPs, BondMachines include non-computing units called “Shared Objects” (SO). Examples of their purposes are: Data storage (Memories). Message passing. CP synchronization. A single SO can be shared among different CPs. To use it CPs have special instructions (opcodes) oriented to the specific SO. Four kind of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Numbers Gen- erator. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
  • 38. BondMachine Shared Objects Shared Objects (SO) The non-computational element of the BM Alongside CPs, BondMachines include non-computing units called “Shared Objects” (SO). Examples of their purposes are: Data storage (Memories). Message passing. CP synchronization. A single SO can be shared among different CPs. To use it CPs have special instructions (opcodes) oriented to the specific SO. Four kind of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Numbers Gen- erator. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
  • 39. BondMachine Shared Objects Shared Objects (SO) The non-computational element of the BM Alongside CPs, BondMachines include non-computing units called “Shared Objects” (SO). Examples of their purposes are: Data storage (Memories). Message passing. CP synchronization. A single SO can be shared among different CPs. To use it CPs have special instructions (opcodes) oriented to the specific SO. Four kind of SO have been developed so far: the Channel, the Shared Memory, the Barrier and a Pseudo Random Numbers Gen- erator. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 9/37
  • 40. BondMachine Tools Handle the BM computer architecture The BM computer architecture is managed by a set of tools to: build a specify architecture modify a pre-existing architecture simulate or emulate the behavior Generate the Register Tranfer Code (RTL) Processor Builder Selects the single pro- cessor, assembles and disassembles, saves on disk as JSON, creates the RTL code of a CP BondMachine Builder Connects CPs and SOs together in custom topologies, loads and saves on disk as JSON, create BM’s RTL code Simulation Framework Simulates the be- haviour, emulates a BM on a standard Linux workstation Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
  • 41. BondMachine Tools Handle the BM computer architecture The BM computer architecture is managed by a set of tools to: build a specify architecture modify a pre-existing architecture simulate or emulate the behavior Generate the Register Tranfer Code (RTL) Processor Builder Selects the single pro- cessor, assembles and disassembles, saves on disk as JSON, creates the RTL code of a CP BondMachine Builder Connects CPs and SOs together in custom topologies, loads and saves on disk as JSON, create BM’s RTL code Simulation Framework Simulates the be- haviour, emulates a BM on a standard Linux workstation Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
  • 42. BondMachine Tools Handle the BM computer architecture The BM computer architecture is managed by a set of tools to: build a specify architecture modify a pre-existing architecture simulate or emulate the behavior Generate the Register Tranfer Code (RTL) Processor Builder Selects the single pro- cessor, assembles and disassembles, saves on disk as JSON, creates the RTL code of a CP BondMachine Builder Connects CPs and SOs together in custom topologies, loads and saves on disk as JSON, create BM’s RTL code Simulation Framework Simulates the be- haviour, emulates a BM on a standard Linux workstation Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
  • 43. BondMachine Tools Handle the BM computer architecture The BM computer architecture is managed by a set of tools to: build a specify architecture modify a pre-existing architecture simulate or emulate the behavior Generate the Register Tranfer Code (RTL) Processor Builder Selects the single pro- cessor, assembles and disassembles, saves on disk as JSON, creates the RTL code of a CP BondMachine Builder Connects CPs and SOs together in custom topologies, loads and saves on disk as JSON, create BM’s RTL code Simulation Framework Simulates the be- haviour, emulates a BM on a standard Linux workstation Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 10/37
  • 44. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 45. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 46. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 47. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 48. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 49. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 50. Moulding API and Libraries Use the BM computer architecture Mapping specific computational problems to BMs Symbond Map symbolic mathematical expressions to BM Boolbond Map boolean systems to BM Matrixwork Basic matrix computation Evolutive BM Evolutionary computing to BM Neuralbond Map neural networks to BM tf2bm nnef2bm Map computational graphs to BM Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 11/37
  • 51. Moulding Bondgo Bondgo The major innovation of the BondMachine Project is its compiler. Bondgo is the name chosen for the compiler developed for the BondMachine. The compiler source language is Go as the name suggest. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 12/37
  • 52. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 53. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... high level GO source Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 54. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... high level GO source assembly file Compiling Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 55. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... high level GO source assembly file Compiling CP specs Arch generating Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 56. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... high level GO source assembly file Compiling CP specs Arch generating machine code Assembling Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 57. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... high level GO source assembly file Compiling CP specs Arch generating machine code Assembling Processor implementation Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 58. Moulding Bondgo Bondgo Bondgo does something different from standard compilers ... high level GO source assembly file Compiling CP specs Arch generating machine code Assembling Processor implementation Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 13/37
  • 59. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 60. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 61. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 62. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 63. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 64. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 65. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 66. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 67. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 68. Moulding Bondgo Bondgo A first example counter go source package main import ( bondgo ) func main () { var outgoing bondgo.Output var reg_int uint8 outgoing = bondgo.Make(bondgo.Output , 3) reg_int = 0 for { bondgo.IOWrite(outgoing , reg_int) reg_int ++ } } bondgo –input-file counter.go -mpm BM JSON rapresentation counter asm clr r0 rset r1 0 cpy r0 r1 cpy r1 r0 r2o r1 o0 inc r0 j 3 bondmachine tool BM HDL code FPGA procbuilder tool Binary Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 14/37
  • 69. Moulding Bondgo Bondgo ... bondgo may not only create the binaries, but also the CP architecture, and ... Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 15/37
  • 70. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 71. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. high level GO source Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 72. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. high level GO source assembly CP 1 assembly CP 2 assembly CP 3 assembly CP 4 assembly CP ... assembly CP N Compiling Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 73. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. high level GO source assembly CP 1 assembly CP 2 assembly CP 3 assembly CP 4 assembly CP ... assembly CP N Compiling CP 1 specs CP 2 CP 3 CP 4 CP ... CP N Archs generating Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 74. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. high level GO source assembly CP 1 assembly CP 2 assembly CP 3 assembly CP 4 assembly CP ... assembly CP N Compiling CP 1 specs CP 2 CP 3 CP 4 CP ... CP N Archs generating Asm and Binaries Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 75. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. high level GO source assembly CP 1 assembly CP 2 assembly CP 3 assembly CP 4 assembly CP ... assembly CP N Compiling CP 1 specs CP 2 CP 3 CP 4 CP ... CP N Archs generating Asm and Binaries Interconnections Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 76. Moulding Bondgo Bondgo ... it can do even much more interesting things when compiling concurrent programs. high level GO source assembly CP 1 assembly CP 2 assembly CP 3 assembly CP 4 assembly CP ... assembly CP N Compiling CP 1 specs CP 2 CP 3 CP 4 CP ... CP N Archs generating Asm and Binaries BondMachine Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 16/37
  • 77. Moulding Bondgo Bondgo A multi-core example Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 17/37
  • 78. Moulding Bondgo Compiling Architectures One of the most important result The architecture creation is a part of the compilation process. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 18/37
  • 79. Machine Learning with BondMachine Machine Learning with BondMachine Architectures with multiple interconnected processors like the ones produced by the BondMachine Toolkit are a perfect fit for Neural Networks and Computational Graphs. Several ways to map this structures to BondMachine has been developed: A native Neural Network library A Tensorflow to BondMachine translator An NNEF based BondMachine composer Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 19/37
  • 80. Machine Learning with BondMachine Machine Learning with BondMachine Architectures with multiple interconnected processors like the ones produced by the BondMachine Toolkit are a perfect fit for Neural Networks and Computational Graphs. Several ways to map this structures to BondMachine has been developed: A native Neural Network library A Tensorflow to BondMachine translator An NNEF based BondMachine composer Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 19/37
  • 81. Machine Learning with BondMachine Bondmachine native Neural Network library Machine Learning with BondMachine Native Neural Network library The tool neuralbond allow the creation of BM-based neural chips from an API go interface. Neurons are converted to BondMachine connecting processors. Tensors are mapped to CP connections. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 20/37
  • 82. Machine Learning with BondMachine TensorFlow to Bondmachine TensorFlow™ to Bondmachine tf2bm TensorFlow™ is an open source software library for numerical computation using data flow graphs. Graphs can be converted to BondMachines with the tf2bm tool. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 21/37
  • 83. Machine Learning with BondMachine NNEF Composer Machine Learning with BondMachine NNEF Composer Neural Network Exchange Format (NNEF) is a standard from Khronos Group to enable the easy transfer of trained networks among frameworks, inference engines and devices The NNEF BM tool approach is to descent NNEF models and build BondMachine multi-core accordingly This approch has several advandages over the previous: It is not limited to a single framework NNEF is a textual file, so no complex operations are needed to read models Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 22/37
  • 84. Clustering BondMachine Clustering So far we saw: An user friendly approach to create processors (single core). Optimizing a single device to support intricate computational work-flows (multi-cores) over an heterogeneous layer. Interconnected BondMachines What if we could extend the this layer to multiple interconnected devices ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
  • 85. Clustering BondMachine Clustering So far we saw: An user friendly approach to create processors (single core). Optimizing a single device to support intricate computational work-flows (multi-cores) over an heterogeneous layer. Interconnected BondMachines What if we could extend the this layer to multiple interconnected devices ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
  • 86. Clustering BondMachine Clustering So far we saw: An user friendly approach to create processors (single core). Optimizing a single device to support intricate computational work-flows (multi-cores) over an heterogeneous layer. Interconnected BondMachines What if we could extend the this layer to multiple interconnected devices ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
  • 87. Clustering BondMachine Clustering So far we saw: An user friendly approach to create processors (single core). Optimizing a single device to support intricate computational work-flows (multi-cores) over an heterogeneous layer. Interconnected BondMachines What if we could extend the this layer to multiple interconnected devices ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 23/37
  • 88. Clustering BondMachine Clustering The same logic existing among CP have been extended among different BondMachines organized in clusters. Protocols, one ethernet called etherbond and one using UDP called udpbond have been created for the purpose. FPGA based BondMachines, standard Linux Workstations, Emulated BondMachines might join a cluster an contribute to a single distributed computational problem. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 24/37
  • 89. Clustering BondMachine Clustering The same logic existing among CP have been extended among different BondMachines organized in clusters. Protocols, one ethernet called etherbond and one using UDP called udpbond have been created for the purpose. FPGA based BondMachines, standard Linux Workstations, Emulated BondMachines might join a cluster an contribute to a single distributed computational problem. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 24/37
  • 90. Clustering BondMachine Clustering The same logic existing among CP have been extended among different BondMachines organized in clusters. Protocols, one ethernet called etherbond and one using UDP called udpbond have been created for the purpose. FPGA based BondMachines, standard Linux Workstations, Emulated BondMachines might join a cluster an contribute to a single distributed computational problem. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 24/37
  • 91. Clustering BondMachine Clustering A distributed example Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 25/37
  • 92. Clustering BondMachine Clustering Results Results User can deploy an entire HW/SW cluster starting from code written in a high level description (Go, NNEF, etc) Workstation with emulated BondMachines, workstation with etherbond drivers, standalone BondMachines (FPGA) may join these clusters. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 26/37
  • 93. Clustering BondMachine Clustering Results Results User can deploy an entire HW/SW cluster starting from code written in a high level description (Go, NNEF, etc) Workstation with emulated BondMachines, workstation with etherbond drivers, standalone BondMachines (FPGA) may join these clusters. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 26/37
  • 94. Uses Use cases Two use cases in Physics experiments are currently being developed: Real time pulse shape analysis in neutron detectors bringing the intelligence to the edge Test beam for space experiments (DAMPE, HERD) increasing testbed operations efficiency Computing Accelerator Our effort is now in enabling the possibility of building computing accelerators to be used from within standard (Linux) applications. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 27/37
  • 95. Uses Use cases Two use cases in Physics experiments are currently being developed: Real time pulse shape analysis in neutron detectors bringing the intelligence to the edge Test beam for space experiments (DAMPE, HERD) increasing testbed operations efficiency Computing Accelerator Our effort is now in enabling the possibility of building computing accelerators to be used from within standard (Linux) applications. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 27/37
  • 96. Uses Accelerators Types We are currently working to enable the use the BM as accelerator in two directions: Using standard processor/FPGA hybrid chips Zynq, Cyclone V Using PCI-express FPGA evaluation boards Kintek 7 Evaluation board Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 28/37
  • 97. Uses Accelerators Hybrid chips Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
  • 98. Uses Accelerators Hybrid chips Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
  • 99. Uses Accelerators Hybrid chips Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
  • 100. Uses Accelerators Hybrid chips Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 29/37
  • 101. Uses Accelerators PCI-express boards Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
  • 102. Uses Accelerators PCI-express boards Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
  • 103. Uses Accelerators PCI-express boards Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
  • 104. Uses Accelerators PCI-express boards Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
  • 105. Uses Accelerators PCI-express boards Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 30/37
  • 106. Uses Accelerators Hardware Hybrid chips Digilent Zedboard Zynq-7000 SoC XC7Z020 512 MB DDR3 Up to 667 MHz Xilinx ZC702 Zynq-7000 SoC XC7Z020 1GB DDR3 85k cells - 220 DSP slices Terasic DE10Nano Intel Cyclone V 1GB DDR3 SDRAM 110K LEs PCI-Express board Xilinx KC705 Kintex-7 FPGAs 1GB DDR3 SODIM 326k cells - 840 DSP slices Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 31/37
  • 107. Uses Accelerators Cloud FPGA accelerators can be used in the cloud: Several public cloud providers offers solution of VM connected to FPGAs (Amazon, Nimbix) FPGAs can be inserted in private clouds infrastructures To be used a firmware has to be uploaded to the accelerated VM FPGA The BondMachine toolkit can be used to build such firmware Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 32/37
  • 108. Uses Accelerators Cloud FPGA accelerators can be used in the cloud: Several public cloud providers offers solution of VM connected to FPGAs (Amazon, Nimbix) FPGAs can be inserted in private clouds infrastructures To be used a firmware has to be uploaded to the accelerated VM FPGA The BondMachine toolkit can be used to build such firmware Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 32/37
  • 109. Uses Accelerators Cloud FPGA accelerators can be used in the cloud: Several public cloud providers offers solution of VM connected to FPGAs (Amazon, Nimbix) FPGAs can be inserted in private clouds infrastructures To be used a firmware has to be uploaded to the accelerated VM FPGA The BondMachine toolkit can be used to build such firmware Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 32/37
  • 110. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 111. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 112. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 113. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 114. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 115. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 116. Uses Accelerated ML in the Cloud Neural Network API TF2BM NNEF BM firmware Prebuild BM firmware library FPGA enabled VM BM hw deploy BM sw deploy Accelerated VM Application Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 33/37
  • 117. Project History Project History May 2016 - First tests on the idea. October 2016 - Prototype at “Makerfaire 2016 Rome” Jul 2018 - InnovateFPGA EMEA Silver Award. Aug 2018 - Presented at Intel Campus, Santa Jose (CA) . Aug 2018 - InnovateFPGA Iron Award in the Grand Final. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 34/37
  • 118. Conclusions Conclusions The BondMachine is a new kind of computing device made possible in practice only by the emerging of new re-programmable hardware technologies such as FPGA. The result of this process is the construction of a computer architecture that is not anymore a static constraint where computing occurs but its creation becomes a part of the computing process, gaining computing power and flexibility. Over this abstraction is it possible to create a full computing Ecosystem, ranging from small interconnected IoT devices to Machine Learning accelerators. Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 35/37
  • 119. Future work Future work The project is at the stage of a working prototype, so work has to be done in several areas: Include new processor shared objects and currently unsupported opcodes. Extend the compiler to include more data structures. Improve the networking including new interconnection firmwares. Work on BondMachine as accelerators. What would an OS for BondMachines look like ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
  • 120. Future work Future work The project is at the stage of a working prototype, so work has to be done in several areas: Include new processor shared objects and currently unsupported opcodes. Extend the compiler to include more data structures. Improve the networking including new interconnection firmwares. Work on BondMachine as accelerators. What would an OS for BondMachines look like ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
  • 121. Future work Future work The project is at the stage of a working prototype, so work has to be done in several areas: Include new processor shared objects and currently unsupported opcodes. Extend the compiler to include more data structures. Improve the networking including new interconnection firmwares. Work on BondMachine as accelerators. What would an OS for BondMachines look like ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
  • 122. Future work Future work The project is at the stage of a working prototype, so work has to be done in several areas: Include new processor shared objects and currently unsupported opcodes. Extend the compiler to include more data structures. Improve the networking including new interconnection firmwares. Work on BondMachine as accelerators. What would an OS for BondMachines look like ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
  • 123. Future work Future work The project is at the stage of a working prototype, so work has to be done in several areas: Include new processor shared objects and currently unsupported opcodes. Extend the compiler to include more data structures. Improve the networking including new interconnection firmwares. Work on BondMachine as accelerators. What would an OS for BondMachines look like ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
  • 124. Future work Future work The project is at the stage of a working prototype, so work has to be done in several areas: Include new processor shared objects and currently unsupported opcodes. Extend the compiler to include more data structures. Improve the networking including new interconnection firmwares. Work on BondMachine as accelerators. What would an OS for BondMachines look like ? Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 36/37
  • 125. Future work If you have question/curiosity on the project: Mirko Mariotti mirko.mariotti@unipg.it http://bondmachine.fisica.unipg.it Mirko Mariotti The BondMachine Toolkit ISGC 2019 - Academia Sinica - Taipei 37/37