DEEP NEURAL NETWORKS APPLIED TO LOW
POWER ONBOARD IMAGE COMPRESSION
OBPDC 2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Klepsydra Technologies
PRESENTATION CONTENT
• Overview
• Context
• Previous work
• AI for compression
• Research work
• Preliminary results
• Deployment to OBC
• Klepsydra AI
• Past results
• Application to AI for compression
• Conclusions and Future Work
Part 1
Overview
CONTEXT
* Quoted from OBDP2019-S04-01-ESA_Camarero_Introduction_to_CCSDS_compression_standards_and_implementations_offered_by_ESA
PREVIOUS WORK ON AI-BASED
COMPRESSION
Original | JPEG (ratio 68.4:1, PSNR 29.7 dB) | B-EED (ratio 70.4:1, PSNR 30.9 dB)
History
• In research since the late 1980s (see paper references)
Principle:
• Generative Adversarial Networks (GAN) with learned compression outperform the state of the art by 2x on bitrate at equivalent image quality.
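The ratio and PSNR figures used throughout these comparisons follow the standard definitions; a minimal sketch of both metrics (with synthetic data, not the slide's images):

```python
import numpy as np

def psnr(original: np.ndarray, reconstructed: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((original.astype(np.float64) - reconstructed.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

def compression_ratio(raw_bytes: int, compressed_bytes: int) -> float:
    """E.g. 68.4:1 means raw size is 68.4x the compressed size."""
    return raw_bytes / compressed_bytes

# Toy example with random data standing in for an image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
noise = rng.integers(-8, 9, size=img.shape)
noisy = np.clip(img.astype(np.int16) + noise, 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(img, noisy):.1f} dB")
print(f"ratio: {compression_ratio(64 * 64, 600):.1f}:1")
```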
AI-BASED COMPRESSION FOR SENTINEL DATA
(a) Original (b) Compressed with bmshj2018-1 (c) Compressed with b2018
Part 2
AI for
Compression
Part 2.1
Research
NONLINEAR TRANSFORM CODING WITH
NEURAL NETWORKS
• JPEG uses a Discrete Cosine
Transform (DCT) for g_a (analysis)
and Inverse DCT for g_s (synthesis).
• Instead, one approach can be to use Artificial Neural Networks for g_a and g_s.
• g_a and g_s need to be trained
together, but once trained they
can be run on separate machines.
• In AI, the combined DNN shown here is also referred to as an AutoEncoder (since its objective is to have output = input)
Learned Image Compression: https://www.youtube.com/watch?v=x_q7cZviXkY
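The split between g_a onboard and g_s on the ground can be sketched with a toy linear autoencoder. This is a hypothetical illustration, not the network used in this work: real weights come from joint training, whereas here a random projection and its pseudo-inverse stand in for the trained pair.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "trained" weights: g_a compresses an 8-dim signal to a
# 3-value latent code, g_s reconstructs it on the other machine.
W_a = rng.standard_normal((3, 8))   # analysis transform g_a
W_s = np.linalg.pinv(W_a)           # synthesis transform g_s

def g_a(x: np.ndarray) -> np.ndarray:
    """Runs onboard: image block -> latent code y."""
    return W_a @ x

def g_s(y: np.ndarray) -> np.ndarray:
    """Runs on the ground: latent code y -> reconstruction."""
    return W_s @ y

x = rng.standard_normal(8)
y = g_a(x)        # transmitted: 3 values instead of 8
x_hat = g_s(y)    # reconstructed on the ground
print("code size:", y.shape[0], "reconstruction error:", np.linalg.norm(x - x_hat))
```

Once trained together, the two halves only share the latent code y over the downlink, which is why they can run on separate machines. The reconstruction is lossy since the code is smaller than the input.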
NONLINEAR TRANSFORM CODING WITH
NEURAL NETWORKS
• Learned Image Compression: https://www.youtube.com/watch?v=x_q7cZviXkY
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
• The strategy to improve compression performance is to send additional information.
• The auto-encoder is extended into a hierarchical Bayesian model using a hyperprior `h_a`.
• The entropy model is improved by sending side information derived from the image.
• This is the overview of the compression model tested in this research.
• We used the two outputs of the encoder stage, y and z, for validation.
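The hierarchical structure can be sketched as a second encoder stage h_a that derives side information z from the latent y. These are illustrative linear stand-ins with assumed dimensions, not the actual trained transforms of the scale-hyperprior model:

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy stand-ins for the learned transforms (assumed shapes)
W_ga = rng.standard_normal((16, 64))   # g_a: image block x -> latent y
W_ha = rng.standard_normal((4, 16))    # h_a: latent y -> hyper-latent z

x = rng.standard_normal(64)            # flattened image block
y = W_ga @ x                           # main latent, entropy-coded with a
                                       # scale model parameterised by z
z = W_ha @ np.abs(y)                   # side information about the
                                       # distribution of scales in y
# Both y and z are transmitted; z is small compared to y.
print("y:", len(y), "z:", len(z))
```

The point of the hierarchy is that spending a few extra bits on z lets the decoder use a much better entropy model for y, lowering the total bitrate.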
TEST ARCHITECTURE
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
TEST ARCHITECTURE
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
OBC Performance Benchmarking
Part 2.2
Preliminary
Results
THE GENERALISED DIVISIVE NORMALISATION
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
• The trained network contains outdated code and layers not supported by inference engines (neither TFLite nor ONNX).
• We constructed an equivalent network by inspecting the internal code.
• The conv + GDN block is shown here.
• The rest of the network is relatively simple, with convolutional, ReLU, Abs and transposed convolutional layers.
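A minimal per-pixel sketch of the GDN nonlinearity as defined by Ballé et al., with illustrative parameter values (a trained model would supply beta and gamma; the deployed equivalent network expresses this through standard supported layers):

```python
import numpy as np

def gdn(x: np.ndarray, beta: np.ndarray, gamma: np.ndarray) -> np.ndarray:
    """Generalised Divisive Normalisation applied per pixel.

    x:     (channels, height, width) activations
    beta:  (channels,) learned offsets
    gamma: (channels, channels) learned cross-channel weights

    y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2)
    """
    c, h, w = x.shape
    x2 = (x ** 2).reshape(c, -1)                 # (c, h*w)
    norm = np.sqrt(beta[:, None] + gamma @ x2)   # (c, h*w)
    return (x.reshape(c, -1) / norm).reshape(c, h, w)

# Illustrative parameters
c = 4
x = np.random.default_rng(1).standard_normal((c, 5, 5))
beta = np.ones(c)
gamma = 0.1 * np.eye(c)
y = gdn(x, beta, gamma)
print(y.shape)
```

The inverse GDN used on the synthesis side multiplies by the same normalisation term instead of dividing.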
TEST ON SENTINEL IMAGE WITH UPDATED NETWORK
https://www.esa.int/ESA_Multimedia/Images/2022/06/Greenland_ice_sheet_melt
Trained network used on ESA Sentinel
Mission Images
• Image compressed on “ground OBC”
• Compressed data copied to local machine
(62 Kb)
• Generated image obtained from
compressed data on local machine
• Trained model present on “ground OBC” and
local machine
TEST ARCHITECTURE - PRELIM
PERFORMANCE RESULTS
Reported runtimes (as of 2018):
• On desktop CPU: ~330 ms
• On mid-range gaming GPU: ~20 ms
• Network not developed with edge devices in mind
This was consistent with our own benchmarks:
• Including conversion of GDN to equivalent layers
Part 3
Deployment
to OBC
Part 3.1
Klepsydra AI
LOCK-FREE AS ALTERNATIVE TO
PARALLELISATION
Parallelisation Pipeline
2-DIM THREADING MODEL
First dimension: pipelining
[Diagram: the layers of the Deep Neural Network structure are partitioned into sequential stages; Thread 1 (Core 1) runs the first group of layers and Thread 2 (Core 2) the second, with input data flowing through the pipeline to the output data]
2-DIM THREADING MODEL
Second dimension: matrix multiplication parallelisation
[Diagram: the computation within a single layer is split across Thread 1 (Core 1), Thread 2 (Core 2) and Thread 3 (Core 3), each computing part of the layer's matrix multiplication between input data and output data]
2-DIM THREADING MODEL
[Diagram: three alternative mappings of DNN layers onto Cores 1-4, trading CPU usage against latency]
• Low CPU usage, high throughput per CPU, high latency
• Mid CPU usage, mid throughput per CPU, mid latency
• High CPU usage, mid throughput per CPU, low latency
Threading model configuration
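The first threading dimension (pipelining) can be sketched with standard queues. This is a generic illustration of pipeline parallelism, not Klepsydra's lock-free implementation; `queue.Queue` uses internal locks, which is exactly what the lock-free approach avoids.

```python
import queue
import threading

def stage(fn, q_in, q_out):
    """Runs one group of layers (fn) on its own thread/core."""
    while True:
        item = q_in.get()
        if item is None:          # poison pill: shut down the stage
            q_out.put(None)
            break
        q_out.put(fn(item))

# Two pipeline stages standing in for two groups of DNN layers.
stage1 = lambda x: x * 2          # e.g. first half of the layers
stage2 = lambda x: x + 1          # e.g. second half of the layers

q0, q1, q2 = queue.Queue(), queue.Queue(), queue.Queue()
threads = [
    threading.Thread(target=stage, args=(stage1, q0, q1)),
    threading.Thread(target=stage, args=(stage2, q1, q2)),
]
for t in threads:
    t.start()

for x in range(5):                # stream of inputs (e.g. image tiles)
    q0.put(x)
q0.put(None)

results = []
while (r := q2.get()) is not None:
    results.append(r)
for t in threads:
    t.join()
print(results)                    # each input passed through both stages
```

With both stages busy on different cores, a new input can enter stage 1 while the previous one is still in stage 2, which is what raises throughput at the cost of per-item latency.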
Example of performance benchmarks
Klepsydra AI optimised for Latency vs CPU
[Chart: CPU usage vs data rate (0-30 Hz) for Latency Optimisation, CPU Optimisation and TF Lite]
• Latency Optimisation: latency = 29 ms
• CPU Optimisation: low CPU usage, saturation at latency = 97 ms
• TF Lite: saturation at latency = 120 ms
Part 3.2
Past Results
THE KATESU PROJECT
Klepsydra AI Test for Space Use
• Processors
• QorIQ® Layerscape LS1046A Multicore Processor
• ZedBoard Zynq-7000 ARM/FPGA SoC Development Board
• Operating systems:
• Yocto Linux
• PetaLinux
• DNN models:
• Standard models (AlexNet, MobileNet, YOLO and SSD) and
standard data.
• ESA provided models and data (CME and Cloud Detection)
QORIQ® LAYERSCAPE LS1046A
MULTICORE PROCESSOR
QorIQ® Layerscape LS1046A
Klepsydra AI Container
STATUS
• Successful installation of the following setup:
• LS1046 running Yocto Jethro
• Docker Installed on LS1046
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software fully supported (quantised and non-
quantised)
XILINX ZEDBOARD
ZedBoard
Klepsydra AI Container
PetaLinux
Klepsydra AI Container
STATUS
• Successful installation of the following setup:
• ZedBoard running PetaLinux 2019.2
• Docker Installed on ZedBoard
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software with quantised support only
PERFORMANCE RESULTS: CME ON LS1046
[Bar charts, TFLite + NEON vs Klepsydra: CPU usage per Hz (scale 0-26), Latency in ms (0-180), Throughput in Hz (0-18)]
PERFORMANCE RESULTS: CME-Q ON LS1046
[Bar charts, TFLite + NEON vs Klepsydra: CPU usage per Hz (scale 0-27), Latency in ms (0-120), Throughput in Hz (0-30)]
PERFORMANCE RESULTS: CME-Q ON ZEDBOARD
[Bar charts, TFLite + NEON vs Klepsydra: CPU usage per Hz (scale 0-50), Latency in ms (0-1000), Throughput in Hz (0-2.6)]
PERFORMANCE RESULTS: BSC ON LS1046
[Bar charts, TFLite + NEON vs Klepsydra: CPU usage per Hz (scale 0-80), Latency in ms (0-5000), Throughput in Hz (0-0.6)]
Part 3.3
Cloud Detection
Part 3.4
Application to
AI for
Compression
PRELIMINARY RESULTS
• Performance results carried out on
• x86 2-core OBC
• Ubuntu Linux 20.04
• Klepsydra AI v5.9
PERFORMANCE RESULTS
[Bar charts, TFLite + AVX2 vs Klepsydra: CPU usage per Hz (scale 0-170), Latency in ms (0-1), Throughput in Hz (0-1600)]
Part 4
Conclusions
and Future
work
CONCLUSIONS
• AI for compression can achieve a 2x compression ratio
compared to other lossy compression solutions like
JPEG2000.
• Compression and decompression are trained together in
one network that is then split: compression on the
OBC, decompression on the ground.
• Performance benchmarks show that these AI networks are
computationally heavy. With Klepsydra AI, execution on
the OBC becomes feasible.
FUTURE WORK
• ARM performance benchmarks: currently under research
and optimisation effort.
• AI compression training on Space images to increase the
compression ratio.
• Machine-to-machine AI Compression
• Compression performance benchmarks on Jetson NX
CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies
