DEEP NEURAL NETWORKS APPLIED TO LOW
POWER ONBOARD IMAGE COMPRESSION
OBPDC 2022
pablo.ghiglino@klepsydra.com
www.klepsydra.com
Klepsydra Technologies
PRESENTATION CONTENT
• Overview
• Context
• Previous work
• AI for compression
• Research work
• Preliminary results
• Deployment to OBC
• Klepsydra AI
• Past results
• Application to AI for compression
• Conclusions and Future Work
Part 1
Overview
CONTEXT
* Quoted from OBDP2019-S04-01-ESA_Camarero_Introduction_to_CCSDS_compression_standards_and_implementations_offered_by_ESA
PREVIOUS WORK ON AI-BASED
COMPRESSION
[Figure: Original; JPEG (ratio 68.4:1, PSNR 29.7 dB); B-EED (ratio 70.4:1, PSNR 30.9 dB)]
History
• In research since the late 1980s (see paper references)
Principle:
• Generative Adversarial Networks (GAN) with learned compression outperform the state of the art by 2x on the bitrate of compressed images of equivalent quality.
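The two figures of merit quoted on this slide, compression ratio and PSNR, can be computed as in the minimal sketch below. The function names and the 4-pixel example are hypothetical and are not data from the presentation; 8-bit pixels (peak value 255) are assumed.

```python
import math

def compression_ratio(original_bytes, compressed_bytes):
    """Original size divided by compressed size, e.g. 68.4 means 68.4:1."""
    return original_bytes / compressed_bytes

def psnr(original, reconstructed, max_value=255):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(original) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(max_value ** 2 / mse)

# Hypothetical values for illustration only.
original = [100, 100, 100, 100]
reconstructed = [110, 90, 100, 100]
example_psnr = psnr(original, reconstructed)
example_ratio = compression_ratio(1_000_000, 50_000)
```

A higher PSNR at the same ratio, or a higher ratio at the same PSNR, indicates a better codec, which is how the JPEG and B-EED panels are compared.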
AI-BASED COMPRESSION FOR SENTINEL DATA
(a) Original (b) Compressed with bmshj2018-1 (c) Compressed with b2018
Part 2
AI for
Compression
Part 2.1
Research
NONLINEAR TRANSFORM CODING WITH
NEURAL NETWORKS
• JPEG uses a Discrete Cosine Transform (DCT) for g_a (analysis) and the inverse DCT for g_s (synthesis).
• An alternative approach is to use artificial neural networks for g_a and g_s.
• g_a and g_s need to be trained together, but once trained they can be run on separate machines.
• In AI, the combined DNNs shown here are also referred to as autoencoders (since their objective is to have output = input).
Learned Image Compression: https://www.youtube.com/watch?v=x_q7cZviXkY
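The analysis/synthesis split can be illustrated with the classical case, where g_a is an 8x8 DCT and g_s its inverse. The sketch below is a simplification under stated assumptions: it uses a single uniform quantisation step instead of JPEG quantisation tables, omits entropy coding, and the gradient block is a hypothetical input. In a learned codec, g_a and g_s would be replaced by trained networks while the surrounding structure stays the same.

```python
import math

N = 8  # baseline JPEG works on 8x8 blocks

def dct_matrix(n=N):
    """Orthonormal DCT-II basis: row k holds the k-th cosine basis vector."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(m):
    return [list(col) for col in zip(*m)]

D = dct_matrix()
DT = transpose(D)

def g_a(block):
    """Analysis transform: 2-D DCT, computed as D * block * D^T."""
    return matmul(matmul(D, block), DT)

def g_s(coeffs):
    """Synthesis transform: 2-D inverse DCT, computed as D^T * coeffs * D."""
    return matmul(matmul(DT, coeffs), D)

def quantise(coeffs, step):
    return [[round(c / step) for c in row] for row in coeffs]

def dequantise(symbols, step):
    return [[s * step for s in row] for row in symbols]

# Hypothetical smooth gradient block (not taken from the slides).
block = [[float(8 * r + 4 * c) for c in range(N)] for r in range(N)]
step = 16
symbols = quantise(g_a(block), step)        # what the encoder side would transmit
restored = g_s(dequantise(symbols, step))   # what the decoder side reconstructs
max_error = max(abs(a - b) for ra, rb in zip(block, restored) for a, b in zip(ra, rb))
nonzero = sum(1 for row in symbols for s in row if s != 0)
```

Only `symbols` needs to cross the link: the encoder side needs g_a only and the decoder side needs g_s only, which is why the two halves can run on separate machines.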
NONLINEAR TRANSFORM CODING WITH
NEURAL NETWORKS
• Learned Image Compression: https://www.youtube.com/watch?v=x_q7cZviXkY
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
• The strategy to improve compression performance is to send additional information.
• The autoencoder is extended into a hierarchical Bayesian model using a hyperprior (`h_a`).
• The entropy model is improved by sending side information derived from the image.
• This is the overview of the compression model tested in this research.
• We used the two outputs of the encoder stage, y and z, for validation.
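Why side information helps can be seen from the ideal code length of the quantised latents. The sketch below assumes a zero-mean Gaussian entropy model integrated over each quantisation bin, as in the scale hyperprior paper cited above. The latent values and scales are hypothetical, and the cost of transmitting z itself is deliberately left out.

```python
import math

def gaussian_cdf(x, sigma):
    return 0.5 * (1 + math.erf(x / (sigma * math.sqrt(2))))

def bits_for_symbol(y, sigma):
    """Ideal code length of integer y under a zero-mean Gaussian of scale sigma,
    integrated over the quantisation bin [y - 0.5, y + 0.5]."""
    p = gaussian_cdf(y + 0.5, sigma) - gaussian_cdf(y - 0.5, sigma)
    return -math.log2(p)

def total_bits(latents, sigmas):
    return sum(bits_for_symbol(y, s) for y, s in zip(latents, sigmas))

# Hypothetical quantised latents: a flat region followed by a textured region.
y_hat = [0, 0, 1, 0, 12, -15, 9, -11]

# One global scale for every latent (no side information).
fixed_scales = [6.0] * len(y_hat)

# Per-latent scales, standing in for what would be decoded from z.
predicted_scales = [0.5] * 4 + [12.0] * 4

bits_fixed = total_bits(y_hat, fixed_scales)
bits_hyperprior = total_bits(y_hat, predicted_scales)
```

With scales that match the local statistics, `bits_hyperprior` comes out lower than `bits_fixed`; the scheme pays off when that saving exceeds the bits spent on z.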
TEST ARCHITECTURE
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
OBC Performance Benchmarking
Part 2.2
Preliminary
Results
THE GENERALISED DIVISIVE NORMALISATION
• Johannes Ballé, David Minnen, Saurabh Singh, Sung Jin Hwang, and Nick Johnston.
“Variational image compression with a scale hyperprior”. In: International Conference on
Learning Representations. 2018.
• The trained network contains outdated code, or layers not supported in inference engines (neither TFLite nor ONNX).
• We constructed an equivalent network by inspecting the internal code.
• The conv + GDN block is shown here.
• The rest of the network is relatively simple, with convolutional, ReLU, Abs and transposed convolutional layers.
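For reference, GDN in its commonly used form divides each channel by a learned combination of the squared channels: y_i = x_i / sqrt(beta_i + sum_j gamma_ij * x_j^2). The sketch below evaluates this at a single pixel and marks which standard operations each step corresponds to. It illustrates how the layer can be expressed with supported operations and is not necessarily the exact decomposition used in this work; the parameter values are hypothetical.

```python
import math

def gdn(x, beta, gamma):
    """GDN across the channels of one pixel:
    y_i = x_i / sqrt(beta_i + sum_j gamma[i][j] * x_j**2)."""
    squared = [v * v for v in x]  # element-wise square
    out = []
    for i, xi in enumerate(x):
        # Weighted sum over channels plus bias: equivalent to a 1x1 convolution.
        pooled = beta[i] + sum(g * s for g, s in zip(gamma[i], squared))
        out.append(xi / math.sqrt(pooled))  # square root, then element-wise divide
    return out

# Hypothetical 2-channel example.
x = [3.0, 4.0]
beta = [1.0, 1.0]
gamma = [[1.0, 1.0], [1.0, 1.0]]
y = gdn(x, beta, gamma)
```

Written this way, the block needs only a square, a 1x1 convolution, a square root and a division, all of which are widely available in inference engines.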
TEST ON SENTINEL IMAGE WITH UPDATED NETWORK
https://www.esa.int/ESA_Multimedia/Images/2022/06/Greenland_ice_sheet_melt
Trained network used on ESA Sentinel
Mission Images
• Image compressed on “ground OBC”
• Compressed data copied to local machine
(62 Kb)
• Generated image obtained from
compressed data on local machine
• Trained model present on “ground OBC” and
local machine
TEST ARCHITECTURE - PRELIMINARY PERFORMANCE RESULTS
Reported runtimes (as of 2018):
• On desktop CPU: ~330 ms
• On mid-range gaming GPU: ~20 ms
• Network not developed with edge devices in mind
This was consistent with our own benchmarks:
• Including conversion of GDN to equivalent layers
Part 3
Deployment
to OBC
Part 3.1
Klepsydra AI
LOCK-FREE AS ALTERNATIVE TO
PARALLELISATION
Parallelisation Pipeline
2-DIM THREADING MODEL
First dimension: pipelining
[Figure: Deep Neural Network Structure. Input data passes through a chain of layers to the output data; one group of layers runs on Thread 1 (Core 1) and the next group on Thread 2 (Core 2).]
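The pipelining dimension can be sketched as follows: each thread owns a group of consecutive layers and hands its result to the next thread, so a new input can enter the first group while the previous one is still in the second. This is a conceptual illustration only. It uses Python's lock-based `queue.Queue` rather than lock-free buffers, it is not the Klepsydra AI API, and the toy "layers" are hypothetical functions on numbers.

```python
import queue
import threading

STOP = object()  # sentinel marking the end of the input stream

def stage(layers, inbox, outbox):
    """Apply a group of layers to every item from inbox and forward the result."""
    while True:
        item = inbox.get()
        if item is STOP:
            outbox.put(STOP)  # propagate shutdown to the next stage
            return
        for layer in layers:
            item = layer(item)
        outbox.put(item)

def run_pipeline(stages, inputs):
    """Run each group of layers in its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=stage, args=(layers, queues[i], queues[i + 1]))
        for i, layers in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for x in inputs:
        queues[0].put(x)
    queues[0].put(STOP)
    results = []
    while True:
        out = queues[-1].get()
        if out is STOP:
            break
        results.append(out)
    for t in threads:
        t.join()
    return results

# Four toy layers split into two groups of two, one group per thread.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3, lambda v: v * v]
frames = [0, 1, 2, 3]
pipelined = run_pipeline([layers[:2], layers[2:]], frames)
```

Because every stage is a single thread reading from a FIFO queue, outputs arrive in input order. In CPython the global interpreter lock limits real parallelism for pure-Python work, so the sketch shows the structure rather than the speed-up.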
2-DIM THREADING MODEL
Second dimension: Matrix multiplication parallelisation
[Figure: between input data and output data, the work of a single layer is split across Thread 1 (Core 1), Thread 2 (Core 2) and Thread 3 (Core 3).]
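The second dimension can be sketched by splitting one layer's matrix-vector product by rows, with each worker computing a slice of the output. This is a minimal illustration using `concurrent.futures` from the standard library. The weights, the worker count and the function names are hypothetical, and it does not represent the Klepsydra AI implementation.

```python
from concurrent.futures import ThreadPoolExecutor

def rows_times_vector(rows, vector):
    """Dot product of each weight row with the input vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in rows]

def parallel_layer(weights, vector, workers=3):
    """Split the weight rows into contiguous chunks, one per worker,
    compute each chunk separately and concatenate the partial outputs."""
    chunk = -(-len(weights) // workers)  # ceiling division
    chunks = [weights[i:i + chunk] for i in range(0, len(weights), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(rows_times_vector, chunks, [vector] * len(chunks)))
    return [value for part in parts for value in part]

# Hypothetical 6x3 weight matrix and input vector.
weights = [[1, 0, 2], [0, 1, 0], [3, 1, 1], [2, 2, 2], [0, 0, 1], [1, 1, 1]]
vector = [1, 2, 3]
parallel_out = parallel_layer(weights, vector)
```

Each output element depends on a single row, so the chunks are independent and need no synchronisation beyond collecting the results in order.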
2-DIM THREADING MODEL
Threading model configuration
[Figure: three ways of distributing the layers of a network over four cores (Core 1 to Core 4).]
Configuration 1:
• Low CPU
• High throughput CPU
• High latency
Configuration 2:
• Mid CPU
• Mid throughput CPU
• Mid latency
Configuration 3:
• High CPU
• Mid throughput CPU
• Low latency
Example of performance benchmarks
Klepsydra AI optimised for Latency vs CPU
[Figure: CPU usage against data rate (Hz) for three series: Latency Optimisation, CPU Optimisation and TF Lite. Annotations: "Latency = 29ms"; "Low CPU Saturation, Latency = 97ms"; "TF Lite Saturation, Latency = 120ms".]
Part 3.2
Past Results
THE KATESU PROJECT
Klepsydra AI Test for Space Use
• Processors
• QorIQ® Layerscape LS1046A Multicore Processor
• ZedBoard Zynq-7000 ARM/FPGA SoC Development Board
• Operating systems:
• Yocto Linux
• PetaLinux
• DNN models:
• Standard models (AlexNet, MobileNet, YOLO and SSD) and standard data.
• ESA provided models and data (CME and Cloud Detection)
QORIQ® LAYERSCAPE LS1046A
MULTICORE PROCESSOR
[Figure: QorIQ® Layerscape LS1046A hosting the Klepsydra AI Container]
STATUS
• Successful installation of the following setup:
• LS1046 running Yocto Jethro
• Docker Installed on LS1046
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software fully supported (quantised and non-quantised)
XILINX ZEDBOARD
[Figure: ZedBoard running PetaLinux, hosting the Klepsydra AI Container]
STATUS
• Successful installation of the following setup:
• ZedBoard running PetaLinux 2019.2
• Docker Installed on ZedBoard
• Container with the following:
• Ubuntu 20.04
• Klepsydra AI software with quantised support only
PERFORMANCE RESULTS: CME ON LS1046
[Figure: three bar charts comparing TFLite + NEON with Klepsydra on CPU / Hz, Latency (ms) and Throughput (Hz)]
PERFORMANCE RESULTS: CME-Q ON LS1046
[Figure: three bar charts comparing TFLite + NEON with Klepsydra on CPU / Hz, Latency (ms) and Throughput (Hz)]
PERFORMANCE RESULTS: CME-Q ON ZEDBOARD
[Figure: three bar charts comparing TFLite + NEON with Klepsydra on CPU / Hz, Latency (ms) and Throughput (Hz)]
PERFORMANCE RESULTS: BSC ON LS1046
[Figure: three bar charts comparing TFLite + NEON with Klepsydra on CPU / Hz, Latency (ms) and Throughput (Hz)]
Part 3.3
Cloud Detection
Part 3.3
Application to
AI on
compression
PRELIMINARY RESULTS
• Performance results carried out on
• x86 2-core OBC
• Ubuntu Linux 20.04
• Klepsydra AI v5.9
PERFORMANCE RESULTS
[Figure: three bar charts comparing TFLite + AVX2 with Klepsydra on CPU / Hz, Latency (ms) and Throughput (Hz)]
Part 4
Conclusions
and Future
work
CONCLUSIONS
• AI for compression can achieve 2x the compression ratio of other lossy compression solutions such as JPEG2000.
• Compression and decompression are trained together in one network, which is then split: compression on the OBC, decompression on the ground.
• Performance benchmarks show that these AI networks are quite heavy. By using Klepsydra AI, execution on the OBC becomes feasible.
FUTURE WORK
• ARM performance benchmarks: research and optimisation effort currently under way.
• AI compression training on space images to increase the compression ratio.
• Machine-to-machine AI Compression
• Compression performance benchmarks on Jetson NX
CONTACT INFORMATION
Dr Pablo Ghiglino
pablo.ghiglino@klepsydra.com
+41786931544
www.klepsydra.com
linkedin.com/company/klepsydra-technologies
