Smallsat 2021

SMALLSAT 2021 PRESENTATION
DR PABLO GHIGLINO
pablo.ghiglino@klepsydra.com
www.klepsydra.com
A Low Power And High Performance Artificial Intelligence Approach
To Increase Guidance Navigation And Control Robustness

KLEPSYDRA AI IN ACTION
The demo:
• Pose estimation of comet 67P/Churyumov–Gerasimenko.
• Using an AI deep neural network (DNN).
• Using real and synthetically generated data from the Rosetta mission.
• Comparison of three AI inference engines: Klepsydra AI, TensorFlow Lite and OpenCV-CNN.
• Three identical computers, running the same model with the same input data and FPS.

KLEPSYDRA AI OVERVIEW
[Diagram: Klepsydra AI overview. Labels: Trained Model; Images; Sensor Data; Timeseries; Basic features; Advanced features; Performance analysis; Language bindings.]

TRADING SOFTWARE VS EDGE SOFTWARE
Trading Systems
Edge Systems
• A bigger computer did not solve the problem.
• It can be solved using cutting-edge lock-free programming techniques.
• Top investment banks make billions using these techniques.
• Very few developers have the required skills.
[Chart: computer usage (Low, Medium, Saturation) vs data volume.]

THE TECHNOLOGY
[Diagram: software stack. Labels: Sensors; External Comms; Other Events; Klepsydra SDK; Application; Operating System.]
Patent pending technology
Klepsydra SDK
• 8x more real-time throughput
• 50% less CPU consumption
• No extra hardware or cloud

Two main data processing approaches:
• Event Loop: Producer 1, Producer 2 and Producer 3 feed a single Consumer.
• Sensor Multiplexer: Producer 1 feeds Consumer 1 and Consumer 2.
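The event loop pattern on this slide (several producers feeding one consumer) can be sketched as below. This is only an illustration under assumptions: it uses Python's standard `queue.Queue`, which is lock-based, whereas the slides describe Klepsydra's implementation as lock-free. The function name `run_event_loop` and the producer names are hypothetical and not part of the Klepsydra SDK.

```python
import queue
import threading


def run_event_loop(num_producers=3, events_per_producer=4):
    """Several producers publish into one queue; a single consumer drains it."""
    events = queue.Queue()   # stand-in for the event loop's internal buffer
    handled = []             # events in the order the consumer processed them
    stop = object()          # sentinel telling the consumer to finish

    def producer(name):
        for i in range(events_per_producer):
            events.put((name, i))

    def consumer():
        while True:
            item = events.get()
            if item is stop:
                break
            handled.append(item)   # only the consumer thread touches this list

    consumer_thread = threading.Thread(target=consumer)
    consumer_thread.start()
    producers = [
        threading.Thread(target=producer, args=(f"producer-{n + 1}",))
        for n in range(num_producers)
    ]
    for thread in producers:
        thread.start()
    for thread in producers:
        thread.join()
    events.put(stop)               # all producers are done, so stop the consumer
    consumer_thread.join()
    return handled


if __name__ == "__main__":
    print(len(run_event_loop()))
```

Because a single thread handles every event, the handler itself needs no synchronisation; the only shared structure is the queue between producers and consumer.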

Cobham GR716 Microcontroller
[Chart: CPU (%) vs processing rate (Hz), 8 producers. Series: Safe Queue (traditional concurrent queue) and Klepsydra (Klepsydra’s Eventloop).]
[Chart: Power (%) vs data processing rate (Hz). Series: traditional edge software and Klepsydra.]
Technical Spec:
• Processor: GR716
• OS: RTEMS 5
• Middleware: Memory data sharing
Benchmark Scenario:
• Multi-sensor data processing
• Concurrent Queue and Klepsydra’s processing engine

APPROACHES TO CONCURRENT ALGORITHMIC EXECUTION
[Diagram: two approaches, Parallelisation and Pipeline.]

BENCHMARK DESCRIPTION
Description
• Given an input matrix, a number of sequential multiplications is performed:
• Step 1: B = A x A, Step 2: C = B x B, …
• Matrix A is randomly generated for each new sequence.
Parameters:
• Matrix dimensions: 100x100
• Data type: Float, integer
• Number of multiplications per matrix: [10, 60]
• Processing frequency: [2Hz - 100Hz]
Technical Spec
• Computer: Odroid XU4
• OS: Ubuntu 18.04
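The benchmark workload described above can be sketched as repeated squaring of a randomly generated matrix. This is a minimal pure-Python illustration, not the benchmark code used for the slides; the function names are hypothetical, and a small matrix is used so the demo runs quickly.

```python
import random


def random_matrix(n, rng):
    """Return an n x n matrix of random floats in [0, 1)."""
    return [[rng.random() for _ in range(n)] for _ in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def squaring_sequence(a, steps):
    """Apply `steps` sequential squarings: B = A x A, then C = B x B, and so on."""
    current = a
    for _ in range(steps):
        current = matmul(current, current)
    return current


if __name__ == "__main__":
    rng = random.Random(0)
    a = random_matrix(4, rng)   # the slides use 100x100; 4x4 keeps the demo fast
    result = squaring_sequence(a, steps=3)
    print(len(result), len(result[0]))
```

Each step depends on the output of the previous one, so a single sequence cannot be split across steps; any concurrency has to come from within one multiplication or from overlapping different input matrices.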

TESTING SCENARIOS
[Diagram: two setups, each processing Input Matrix => B = A x A => C = B x B => Output Matrix. One is the Klepsydra Parallel Streaming Setup, the other the OpenMP Sequential Setup. Brace labels: Thread 1, Thread 2, Vectorised, Vectorised.]
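One way to read the streaming setup is as a two-stage pipeline in which each multiplication step runs on its own thread, so that step 2 for one matrix can overlap with step 1 for the next. The sketch below makes that assumption explicit. It uses standard-library queues rather than any Klepsydra API, all names are hypothetical, and in CPython the global interpreter lock prevents true parallel execution of this pure-Python arithmetic, so it shows the structure only, not the speed-up.

```python
import queue
import threading


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def square(m):
    return matmul(m, m)


def run_pipeline(matrices):
    """Stage 1 computes B = A x A, stage 2 computes C = B x B, each on its own thread."""
    stage1_in = queue.Queue()
    stage2_in = queue.Queue()
    outputs = []
    done = object()              # sentinel marking the end of the stream

    def stage(inbox, emit):
        while True:
            item = inbox.get()
            if item is done:
                emit(done)       # pass the sentinel downstream and stop
                break
            emit(square(item))

    def collect(item):
        if item is not done:
            outputs.append(item)

    thread1 = threading.Thread(target=stage, args=(stage1_in, stage2_in.put))
    thread2 = threading.Thread(target=stage, args=(stage2_in, collect))
    thread1.start()
    thread2.start()
    for m in matrices:
        stage1_in.put(m)
    stage1_in.put(done)
    thread1.join()
    thread2.join()
    return outputs


if __name__ == "__main__":
    print(run_pipeline([[[1, 1], [0, 1]]]))
```

Results come out in input order because each stage is a single thread reading from a FIFO queue.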

FLOAT PERFORMANCE RESULTS I
[Charts, OpenMP vs Klepsydra, plotted against publishing rate (Hz).
10 steps (2 Hz to 100 Hz): CPU usage, throughput, latency.
20 steps (2 Hz to 40 Hz): CPU usage, latency, and one untitled panel.]

FLOAT PERFORMANCE RESULTS II
[Charts, OpenMP vs Klepsydra.
30 steps (horizontal axis 2 to 20): CPU usage, latency, and one untitled panel.
40 steps (horizontal axis 2 to 14): CPU usage, latency, and one untitled panel.]

FLOAT PERFORMANCE RESULTS III
[Charts, OpenMP vs Klepsydra.
50 steps (horizontal axis 2 to 10): CPU usage, latency, and one untitled panel.
60 steps (horizontal axis 2 to 8): CPU usage, latency, and one untitled panel.]

KLEPSYDRA AI DATA PROCESSING APPROACH
[Diagram: Klepsydra AI threading model. Input Data => Layer => Layer => Output Data, with braces labelled Thread 1 and Thread 2.]
Threading model consists of:
- Number of cores assigned to event loops
- Number of event loops per core
- Number of parallelisation threads for each layer
Most layers can be parallelised and are vectorised.
Event loops are assigned to cores.
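The three elements of the threading model can be captured in a small data structure. The sketch below is an assumption-laden illustration: the class `ThreadingModel`, its field names and the round-robin policy in `assign_layers` are hypothetical and are not taken from the Klepsydra AI API; the slides do not say how layers are mapped to event loops.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ThreadingModel:
    cores_for_event_loops: int        # number of cores assigned to event loops
    event_loops_per_core: int         # number of event loops per core
    parallel_threads_per_layer: int   # parallelisation threads for each layer

    @property
    def total_event_loops(self):
        return self.cores_for_event_loops * self.event_loops_per_core


def assign_layers(layer_names, model):
    """Map each layer to (core, event loop) round-robin; an illustrative policy only."""
    plan = {}
    for index, name in enumerate(layer_names):
        loop = index % model.total_event_loops
        core = loop % model.cores_for_event_loops
        plan[name] = (core, loop)
    return plan


if __name__ == "__main__":
    model = ThreadingModel(cores_for_event_loops=2,
                           event_loops_per_core=1,
                           parallel_threads_per_layer=1)
    print(assign_layers(["conv", "relu", "pool"], model))
```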

Performance tuning
Performance Criteria
• CPU usage
• RAM usage
• Throughput (output data rate)
• Latency
Performance parameters:
• pool_size
Size of the internal queues of the event loop publish/subscribe pairs. High throughput requires a large value and therefore more RAM; low throughput needs only a small value and therefore less RAM.
• number_of_cores
Number of cores across which event loops are distributed (by default, one event loop per core). High throughput requires more cores and therefore more CPU usage; low throughput needs only a few cores, which substantially reduces CPU usage.
• number_of_parallel_threads
Number of threads assigned to parallelise layers. For low-latency requirements, assign a large value (maximum = number of cores), which increases CPU usage. When there is no latency requirement, use a low value (minimum = 1), which substantially reduces CPU usage.
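The trade-offs above can be made concrete with a small configuration object. The parameter names `pool_size`, `number_of_cores` and `number_of_parallel_threads` come from the slides; everything else is an assumption for illustration, including the class name `PerformanceConfig`, the reading of "number of cores" as the cores available on the machine, and the two preset values, which are not measured recommendations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PerformanceConfig:
    pool_size: int                    # internal queue size: larger means more RAM
    number_of_cores: int              # cores running event loops: more means more CPU
    number_of_parallel_threads: int   # threads per layer: more lowers latency, costs CPU

    def validate(self, available_cores):
        """Raise ValueError if a parameter is outside the bounds given on the slide."""
        if self.pool_size < 1:
            raise ValueError("pool_size must be at least 1")
        if not 1 <= self.number_of_cores <= available_cores:
            raise ValueError("number_of_cores must be between 1 and available_cores")
        # Slide bounds: minimum = 1, maximum = number of cores (assumed: machine cores).
        if not 1 <= self.number_of_parallel_threads <= available_cores:
            raise ValueError("number_of_parallel_threads must be between 1 and available_cores")


# Hypothetical presets for a 4-core board, showing the direction of each trade-off.
HIGH_THROUGHPUT_LOW_LATENCY = PerformanceConfig(
    pool_size=64, number_of_cores=4, number_of_parallel_threads=4)
LOW_POWER = PerformanceConfig(
    pool_size=4, number_of_cores=1, number_of_parallel_threads=1)

if __name__ == "__main__":
    for preset in (HIGH_THROUGHPUT_LOW_LATENCY, LOW_POWER):
        preset.validate(available_cores=4)
        print(preset)
```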

Example of performance benchmarks
TensorFlow: latency 56 ms
Klepsydra AI: latency 35 ms

ROADMAP
Q2 2021
• No third party dependencies.
• Binaries are C/C++ only
• Custom format for models
Q3 2021
• FreeRTOS support (alpha version)
• Xilinx Ultrascale+ board
• Microchip SAM V71
Q4 2021
• PikeOS support (alpha version)
• Xilinx ZedBoard
Q1 2022
• NVIDIA Jetson TX2 support (alpha release)
• Quantisation support
Q2 2022
• Graphs support
• New memory allocation model
• C support
Legend:
Hard deadlines
Flexible dates

CONCLUSIONS
• The use of advanced lock-free algorithms for on-board data processing allows a substantial increase in real-time data throughput and a 50% reduction in power consumption.
• When combined with pipelining, it can enable groundbreaking performance improvements in AI algorithms.
• Further work will be done in the field of GPU and FPGA, self-tuning and graph AI models.

CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies
