SlideShare a Scribd company logo
RISC-V IN SPACE
14.12.2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Klepsydra Technologies
Part 1
Pre-existing work
COMPARE AND SWAP
• Compare-and-swap (CAS) is an instruction
used in multithreading to achieve
synchronisation. It compares the contents of
a memory location with a given value and,
only if they are the same, modi
fi
es the
contents of that memory location to a new
given value. This is done as a single
atomic operation.
• Compare-and-Swap has been an integral
part of the IBM 370 architectures since
1970.
• Maurice Herlihy (1991) proved that CAS can
implement more of these algorithms than
atomic read, write, and fetch-and-add
Event Loop
Sensor Multiplexer
Two main data
processing approaches
Producer 1
Consumer 1 Consumer 2
Producer 2
Producer 3
Consumer
Producer 1
4
Lightweight, modular and compatible with most used operating systems
Worldwide
application
Klepsydra
SDK
Klepsydra
GPU
Streaming
Klepsydra
AI
Klepsydra
ROS2
executor
plugin
SDK – Software Development Kit
Boost data processing at the edge for general
applications and processor intensive
algorithms
AI – Artificial Intelligence
High performance deep neural network
(DNN) engine to deploy any AI or
machine learning module at the edge
ROS2 Executor plugin
Executor for ROS2 able to process up
to 10 times more data with up to 50%
reduction in CPU consumption.
GPU (Graphic Processing Unit)
High parallelisation of GPU to increase
the processing data rate and GPU
utilization
THE PRODUCT
LOCK-FREE AS ALTERNATIVE TO
PARALLELISATION
Parallelisation Pipeline
2-DIM THREADING MODEL
Input
Data
Layer
Output
Data
First dimension: pipelining
{
Thread 1 (Core 1)
Layer
Layer
Layer
Layer
Layer
{
Thread 2 (Core 2)
Layer
Layer
Layer
Layer
Layer Layer Layer Layer Layer
Deep Neural Network Structure
2-DIM THREADING MODEL
Input
Data
Output
Data
Second dimension: Matrix
multiplication parallelisation
{
T
hread
1
(Core
1)
Layer
{
T
hread
2
(Core
2)
{
T
hread
3
(Core
3)
2-DIM THREADING MODEL
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Core 1 Core 2
Core 3 Core 4
Layer
Layer
Layer
Layer
Layer
Layer
Layer
Layer
Layer
• Low CPU
• High throughput CPU
• High latency
• Mid CPU
• Mid throughput CPU
• Mid latency
• High CPU
• Mid throughput CPU
• Low latency
Threading model con
fi
guration
ONNX API
class KPSR_API OnnxDNNImporter
{
public:
/**
* @brief import an onnx file and uses a default eventloop factory for all processor cores
* @param onnxFileName
* @param testDNN
* @return a share pointer to a DeepNeuralNetwork object
*
* When log level is debug, dumps the YAML configuration of the default factory.
* It makes use of all processor cores.
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName,
bool testDNN = false);
/**
* @brief importForTest an onnx file and uses a default synchronous factory
* @param onnxFileName
* @param envFileName. Klepsydra AI configuration environment file.
* @return a share pointer to a DeepNeuralNetwork object
*
* This method is intented to be used for testing purposes only.
*
*/
static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName,
const std::string & envFileName);
};
10
Core API
class DeepNeuralNetwork {
public:
/**
* @brief setCallback
* @param callback. Callback function for the prediction result.
*/
virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0;
/**
* @brief predict. Load input matrix as input to network.
* @param inputVector. An F32AlignedVector of floats containing network input.
*
* @return Unique id corresponding to the input vector
*/
virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0;
/**
* @brief predict. Copy-less version of predict.
* @param inputVector. An F32AlignedVector of floats containing network input.
*
* @return Unique id corresponding to the input vector
*/
virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0;
};
11
KLEPSYDRA SDO PROCESS
• Klepsydra Streaming Distribution Optimiser (SDO):
• Runs on a separate computer
• Executes several dry runs on the OBC
• Collect statistics
• Runs a genetic algorithm to
fi
nd the optimal
solution for latency, power or throughput
• The main variable to optimise is the distribution of
layers are the two dimension of the threading model
KLEPSYDRA STREAMING DISTRIBUTION
OPTIMISER (SDO)
Part 2
The KATESU Project
QORIQ® LAYERSCAPE LS1046A
MULTICORE PROCESSOR
QorIQ® Layerscape LS1046A
Klepsydra AI Container
STATUS
• Successful installation of the following setup:
• LS1046 running Yocto Jethro
• Docker Installed on LS1046
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software fully supported (quantised and non-
quantised)
XILINX ZEDBOARD
ZedBoard
Klepsydra AI Container
PetaLinux
Klepsydra AI Container
STATUS
• Successful installation of the following setup:
• ZedBoard running PetaLinux 2019.2
• Docker Installed on ZedBoard
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software with quantised support only
PERFORMANCE RESULTS: CME ON LS1046
0
6,5
13
19,5
26
CPU / Hz
TFLite + NEON Klepsydra
0
45
90
135
180
Latency (ms)
TFLite + NEON Klepsydra
0
4,5
9
13,5
18
Throughput (Hz)
TFLite + NEON Klepsydra
PERFORMANCE RESULTS: CME-Q ON LS1046
0
6,75
13,5
20,25
27
CPU / Hz
TFLite + NEON Klepsydra
0
30
60
90
120
Latency (ms)
TFLite + NEON Klepsydra
0
7,5
15
22,5
30
Throughput (Hz)
TFLite + NEON Klepsydra
PERFORMANCE RESULTS: CME-Q ON ZEDBOARD
0
12,5
25
37,5
50
CPU / Hz
TFLite + NEON Klepsydra
0
250
500
750
1000
Latency (ms)
TFLite + NEON Klepsydra
0
0,65
1,3
1,95
2,6
Throughput (Hz)
TFLite + NEON Klepsydra
PERFORMANCE RESULTS: BSC ON LS1046
0
20
40
60
80
CPU / Hz
TFLite + NEON Klepsydra
0
1250
2500
3750
5000
Latency (ms)
TFLite + NEON Klepsydra
0
0,15
0,3
0,45
0,6
Throughput (Hz)
TFLite + NEON Klepsydra
Part 2
The PATTERN
Project
THE PATTERN PROJECT
PATTERN: Klepsydra AI ported to the GR740 aNd
RISC-V
• Target Processor: GR740, GR765 (Leon5 & Noel-V)
• Target OS: RTMES5
• Development on commercial FPGA board
• Validation on Space quali
fi
ed hardware
THE MULTI-THREADING API
Klepsydra SDK
Multi-threading framework
POSIX Operating System
PTHREAD
Klepsydra AI
Klepsydra SDK
Threading Abstraction Layer
POSIX
POSIX Operating System
PTHREAD
Klepsydra AI
RTEMS5
Multi-threading framework
RTEMS
THE PARALLELISATION FRAMEWORK
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
POSIX Operating System
PTHREAD
Parallelisation Framework
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
Parallelisation Framework
Threading Abstraction Layer
POSIX
RTEMS5
Multi-threading framework
POSIX Operating System
PTHREAD
RTEMS
THE MATHEMATICAL BACKEND
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
ARM x86 ARM x86
Klepsydra AI
Back-ends
Full-backend (Float32, Int8) Quantized-backend (Int8)
ARM x86 ARM x86
RISC-V?
RISC-V Extensions
• Current version of Klepsydra AI supports RV32GV and RV64GC
• Preparation for NOEL-V in three modes:
• ‘Vanilla’
• P-Extension and V-Extension,
• And more….
THE PLAN
Phase 1:
Klepsydra AI for RTEMS5
Phase 2:
Klepsydra AI for GR765/Leon5
Phase 3:
Klepsydra AI for GR765/Noel-V
Phase 4:
Validation of Klepsydra AI on
GR740 and GR765
THE SCHEDULE
Work Package Start Month End Month
Duration in
Months
0 1 2 3 4 5 6 7 8 9 10 11
KOM MTR1 MTR2 FR
WP4.2 11 11 1
WP0 0 17 18
2
11
10
WP4.1
1
10
10
WP3.3
0 0 1
WP2.1
WP2.2 1 4 4
WP1.1 5
0 4
5 8 4
WP1.2
WP2.3 9 10 2
WP3.2 5 8 4
WP3.1 5 5 1
CONCLUSIONS
• Enable real AI for future missions on the GR765/NOEL-V
• Very easy to use, via a simple API and web-based
optimisation tool
• Highly optimised for the GR765/NOEL-V processors
• Lightweight software (current version is 4Mb)
• Deterministic and full control of the dedicated resources
NEXT STEPS
• In-orbit-demonstration:
• OPSSAT OBC: Using Onboard Altera FPGA and NOEL-V
softcore
• Other?
• Health Monitoring (core operation failures, etc).
CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies

More Related Content

Similar to RISC V in Spacer

SDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeSDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeIntel® Software
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...PT Datacomm Diangraha
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageMayaData Inc
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetesTed Jung
 
Dpdk 2019-ipsec-eventdev
Dpdk 2019-ipsec-eventdevDpdk 2019-ipsec-eventdev
Dpdk 2019-ipsec-eventdevHemant Agrawal
 
Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02Narender Kumar
 
Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02Narender Kumar
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Michelle Holley
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0ScyllaDB
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack eurobsdcon
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT Project
 
Using VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersMichelle Holley
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화OpenStack Korea Community
 
Building SuperComputers @ Home
Building SuperComputers @ HomeBuilding SuperComputers @ Home
Building SuperComputers @ HomeAbhishek Parolkar
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networksinside-BigData.com
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...DataWorks Summit/Hadoop Summit
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Cesar Maciel
 

Similar to RISC V in Spacer (20)

SDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's StampedeSDVIs and In-Situ Visualization on TACC's Stampede
SDVIs and In-Situ Visualization on TACC's Stampede
 
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...Seminar Accelerating Business Using Microservices Architecture in Digital Age...
Seminar Accelerating Business Using Microservices Architecture in Digital Age...
 
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storageWebinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
Webinar: OpenEBS - Still Free and now FASTEST Kubernetes storage
 
Container & kubernetes
Container & kubernetesContainer & kubernetes
Container & kubernetes
 
Dpdk 2019-ipsec-eventdev
Dpdk 2019-ipsec-eventdevDpdk 2019-ipsec-eventdev
Dpdk 2019-ipsec-eventdev
 
Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02
 
Ironic
IronicIronic
Ironic
 
Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02Ironic 140622212631-phpapp02
Ironic 140622212631-phpapp02
 
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
Install FD.IO VPP On Intel(r) Architecture & Test with Trex*
 
What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0What’s New in ScyllaDB Open Source 5.0
What’s New in ScyllaDB Open Source 5.0
 
Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack Running Applications on the NetBSD Rump Kernel by Justin Cormack
Running Applications on the NetBSD Rump Kernel by Justin Cormack
 
LSA2 - 02 Namespaces
LSA2 - 02  NamespacesLSA2 - 02  Namespaces
LSA2 - 02 Namespaces
 
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoTVEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
VEDLIoT at FPL'23_Accelerators for Heterogenous Computing in AIoT
 
Using VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear ContainersUsing VPP and SRIO-V with Clear Containers
Using VPP and SRIO-V with Clear Containers
 
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
[OpenStack Days Korea 2016] Track1 - All flash CEPH 구성 및 최적화
 
Building SuperComputers @ Home
Building SuperComputers @ HomeBuilding SuperComputers @ Home
Building SuperComputers @ Home
 
A Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural NetworksA Dataflow Processing Chip for Training Deep Neural Networks
A Dataflow Processing Chip for Training Deep Neural Networks
 
Current Trends in HPC
Current Trends in HPCCurrent Trends in HPC
Current Trends in HPC
 
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
End to End Processing of 3.7 Million Telemetry Events per Second using Lambda...
 
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
Heterogeneous Computing on POWER - IBM and OpenPOWER technologies to accelera...
 

More from klepsydratechnologie (8)

Robotics technical Presentation
Robotics technical PresentationRobotics technical Presentation
Robotics technical Presentation
 
OBDPC 2022
OBDPC 2022OBDPC 2022
OBDPC 2022
 
Dasia 2022
Dasia 2022Dasia 2022
Dasia 2022
 
Klepsydra Company Presentation
Klepsydra Company PresentationKlepsydra Company Presentation
Klepsydra Company Presentation
 
Roscon2021 Executor
Roscon2021 ExecutorRoscon2021 Executor
Roscon2021 Executor
 
IAC 2020
IAC 2020IAC 2020
IAC 2020
 
Smallsat 2021
Smallsat 2021Smallsat 2021
Smallsat 2021
 
IAC 2019
IAC 2019 IAC 2019
IAC 2019
 

Recently uploaded

top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownloadvrstrong314
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Krakówbim.edu.pl
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesKrzysztofKkol1
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyanic lab
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?XfilesPro
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion Clinic
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandIES VE
 
Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfMeon Technology
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...rajkumar669520
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowPeter Caitens
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...Alluxio, Inc.
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...informapgpstrackings
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamtakuyayamamoto1800
 
GraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysisGraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysisNeo4j
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAlluxio, Inc.
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTier1 app
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareinfo611746
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessWSO2
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Anthony Dahanne
 

Recently uploaded (20)

top nidhi software solution freedownload
top nidhi software solution freedownloadtop nidhi software solution freedownload
top nidhi software solution freedownload
 
Agnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in KrakówAgnieszka Andrzejewska - BIM School Course in Kraków
Agnieszka Andrzejewska - BIM School Course in Kraków
 
Designing for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web ServicesDesigning for Privacy in Amazon Web Services
Designing for Privacy in Amazon Web Services
 
Cyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdfCyaniclab : Software Development Agency Portfolio.pdf
Cyaniclab : Software Development Agency Portfolio.pdf
 
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
How Does XfilesPro Ensure Security While Sharing Documents in Salesforce?
 
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
Abortion ^Clinic ^%[+971588192166''] Abortion Pill Al Ain (?@?) Abortion Pill...
 
Using IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New ZealandUsing IESVE for Room Loads Analysis - Australia & New Zealand
Using IESVE for Room Loads Analysis - Australia & New Zealand
 
Breaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdfBreaking the Code : A Guide to WhatsApp Business API.pdf
Breaking the Code : A Guide to WhatsApp Business API.pdf
 
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
Facemoji Keyboard released its 2023 State of Emoji report, outlining the most...
 
Advanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should KnowAdvanced Flow Concepts Every Developer Should Know
Advanced Flow Concepts Every Developer Should Know
 
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
AI/ML Infra Meetup | Improve Speed and GPU Utilization for Model Training & S...
 
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
Field Employee Tracking System| MiTrack App| Best Employee Tracking Solution|...
 
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoamOpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
OpenFOAM solver for Helmholtz equation, helmholtzFoam / helmholtzBubbleFoam
 
GraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysisGraphAware - Transforming policing with graph-based intelligence analysis
GraphAware - Transforming policing with graph-based intelligence analysis
 
AI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning FrameworkAI/ML Infra Meetup | Perspective on Deep Learning Framework
AI/ML Infra Meetup | Perspective on Deep Learning Framework
 
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERRORTROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
TROUBLESHOOTING 9 TYPES OF OUTOFMEMORYERROR
 
Studiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting softwareStudiovity film pre-production and screenwriting software
Studiovity film pre-production and screenwriting software
 
Accelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with PlatformlessAccelerate Enterprise Software Engineering with Platformless
Accelerate Enterprise Software Engineering with Platformless
 
Corporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMSCorporate Management | Session 3 of 3 | Tendenci AMS
Corporate Management | Session 3 of 3 | Tendenci AMS
 
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
Paketo Buildpacks : la meilleure façon de construire des images OCI? DevopsDa...
 

RISC V in Spacer

  • 3. COMPARE AND SWAP • Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modi fi es the contents of that memory location to a new given value. This is done as a single atomic operation. • Compare-and-Swap has been an integral part of the IBM 370 architectures since 1970. • Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add
  • 4. Event Loop Sensor Multiplexer Two main data processing approaches Producer 1 Consumer 1 Consumer 2 Producer 2 Producer 3 Consumer Producer 1 4
  • 5. Lightweight, modular and compatible with most used operating systems Worldwide application Klepsydra SDK Klepsydra GPU Streaming Klepsydra AI Klepsydra ROS2 executor plugin SDK – Software Development Kit Boost data processing at the edge for general applications and processor intensive algorithms AI – Artificial Intelligence High performance deep neural network (DNN) engine to deploy any AI or machine learning module at the edge ROS2 Executor plugin Executor for ROS2 able to process up to 10 times more data with up to 50% reduction in CPU consumption. GPU (Graphic Processing Unit) High parallelisation of GPU to increase the processing data rate and GPU utilization THE PRODUCT
  • 6. LOCK-FREE AS ALTERNATIVE TO PARALLELISATION Parallelisation Pipeline
  • 7. 2-DIM THREADING MODEL Input Data Layer Output Data First dimension: pipelining { Thread 1 (Core 1) Layer Layer Layer Layer Layer { Thread 2 (Core 2) Layer Layer Layer Layer Layer Layer Layer Layer Layer Deep Neural Network Structure
  • 8. 2-DIM THREADING MODEL Input Data Output Data Second dimension: Matrix multiplication parallelisation { T hread 1 (Core 1) Layer { T hread 2 (Core 2) { T hread 3 (Core 3)
  • 9. 2-DIM THREADING MODEL Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Layer Layer Layer • Low CPU • High throughput CPU • High latency • Mid CPU • Mid throughput CPU • Mid latency • High CPU • Mid throughput CPU • Low latency Threading model con fi guration
  • 10. ONNX API class KPSR_API OnnxDNNImporter { public: /** * @brief import an onnx file and uses a default eventloop factory for all processor cores * @param onnxFileName * @param testDNN * @return a share pointer to a DeepNeuralNetwork object * * When log level is debug, dumps the YAML configuration of the default factory. * It makes use of all processor cores. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName, bool testDNN = false); /** * @brief importForTest an onnx file and uses a default synchronous factory * @param onnxFileName * @param envFileName. Klepsydra AI configuration environment file. * @return a share pointer to a DeepNeuralNetwork object * * This method is intented to be used for testing purposes only. * */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName, const std::string & envFileName); }; 10
  • 11. Core API class DeepNeuralNetwork { public: /** * @brief setCallback * @param callback. Callback function for the prediction result. */ virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0; /** * @brief predict. Load input matrix as input to network. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0; /** * @brief predict. Copy-less version of predict. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0; }; 11
  • 12. KLEPSYDRA SDO PROCESS • Klepsydra Streaming Distribution Optimiser (SDO): • Runs on a separate computer • Executes several dry runs on the OBC • Collect statistics • Runs a genetic algorithm to fi nd the optimal solution for latency, power or throughput • The main variable to optimise is the distribution of layers are the two dimension of the threading model
  • 14. Part 2 The KATESU Project
  • 15. QORIQ® LAYERSCAPE LS1046A MULTICORE PROCESSOR QorIQ® Layerscape LS1046A Klepsydra AI Container
  • 16. STATUS • Successful installation of the following setup: • LS1046 running Yocto Jethro • Docker Installed on LS1046 • Container with the following: • Ubuntu 20.04 • Klepsydra AI software fully supported (quantised and non- quantised)
  • 17. XILINX ZEDBOARD ZedBoard Klepsydra AI Container PetaLinux Klepsydra AI Container
  • 18. STATUS • Successful installation of the following setup: • ZedBoard running PetaLinux 2019.2 • Docker Installed on ZedBoard • Container with the following: • Ubuntu 20.04 • Klepsydra AI software with quantised support only
  • 19. PERFORMANCE RESULTS: CME ON LS1046 0 6,5 13 19,5 26 CPU / Hz TFLite + NEON Klepsydra 0 45 90 135 180 Latency (ms) TFLite + NEON Klepsydra 0 4,5 9 13,5 18 Throughput (Hz) TFLite + NEON Klepsydra
  • 20. PERFORMANCE RESULTS: CME-Q ON LS1046 0 6,75 13,5 20,25 27 CPU / Hz TFLite + NEON Klepsydra 0 30 60 90 120 Latency (ms) TFLite + NEON Klepsydra 0 7,5 15 22,5 30 Throughput (Hz) TFLite + NEON Klepsydra
  • 21. PERFORMANCE RESULTS: CME-Q ON ZEDBOARD 0 12,5 25 37,5 50 CPU / Hz TFLite + NEON Klepsydra 0 250 500 750 1000 Latency (ms) TFLite + NEON Klepsydra 0 0,65 1,3 1,95 2,6 Throughput (Hz) TFLite + NEON Klepsydra
  • 22. PERFORMANCE RESULTS: BSC ON LS1046 0 20 40 60 80 CPU / Hz TFLite + NEON Klepsydra 0 1250 2500 3750 5000 Latency (ms) TFLite + NEON Klepsydra 0 0,15 0,3 0,45 0,6 Throughput (Hz) TFLite + NEON Klepsydra
  • 24. THE PATTERN PROJECT PATTERN: Klepsydra AI ported to the GR740 aNd RISC-V • Target Processor: GR740, GR765 (Leon5 & Noel-V) • Target OS: RTMES5 • Development on commercial FPGA board • Validation on Space quali fi ed hardware
  • 25. THE MULTI-THREADING API Klepsydra SDK Multi-threading framework POSIX Operating System PTHREAD Klepsydra AI Klepsydra SDK Threading Abstraction Layer POSIX POSIX Operating System PTHREAD Klepsydra AI RTEMS5 Multi-threading framework RTEMS
  • 26. THE PARALLELISATION FRAMEWORK Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) POSIX Operating System PTHREAD Parallelisation Framework Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) Parallelisation Framework Threading Abstraction Layer POSIX RTEMS5 Multi-threading framework POSIX Operating System PTHREAD RTEMS
  • 27. THE MATHEMATICAL BACKEND Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) ARM x86 ARM x86 Klepsydra AI Back-ends Full-backend (Float32, Int8) Quantized-backend (Int8) ARM x86 ARM x86 RISC-V? RISC-V Extensions • Current version of Klepsydra AI supports RV32GV and RV64GC • Preparation for NOEL-V in three modes: • ‘Vanilla’ • P-Extension and V-Extension, • And more….
  • 28. THE PLAN Phase 1: Klepsydra AI for RTEMS5 Phase 2: Klepsydra AI for GR765/Leon5 Phase 3: Klepsydra AI for GR765/Noel-V Phase 4: Validation of Klepsydra AI on GR740 and GR765
  • 29. THE SCHEDULE Work Package Start Month End Month Duration in Months 0 1 2 3 4 5 6 7 8 9 10 11 KOM MTR1 MTR2 FR WP4.2 11 11 1 WP0 0 17 18 2 11 10 WP4.1 1 10 10 WP3.3 0 0 1 WP2.1 WP2.2 1 4 4 WP1.1 5 0 4 5 8 4 WP1.2 WP2.3 9 10 2 WP3.2 5 8 4 WP3.1 5 5 1
  • 30. CONCLUSIONS • Enable real AI for future missions on the GR765/NOEL-V • Very easy to use, via a simple API and web-based optimisation tool • Highly optimised for the GR765/NOEL-V processors • Lightweight software (current version is 4Mb) • Deterministic and full control of the dedicated resources
  • 31. NEXT STEPS • In-orbit-demonstration: • OPSSAT OBC: Using Onboard Altera FPGA and NOEL-V softcore • Other? • Health Monitoring (core operation failures, etc).
  • 32. CONTACT INFORMATION Dr Pablo Ghiglino pablo.ghiglino@klepsydra.com +41786931544 www.klepsydra.com linkedin.com/company/klepsydra-technologies