Successfully reported this slideshow.
Your SlideShare is downloading. ×

ADCSS 2022

Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Ad
Upcoming SlideShare
Dasia 2022
Dasia 2022
Loading in …3
×

Check these out next

1 of 42 Ad

ADCSS 2022

Download to read offline

A novel approach to Artificial Intelligence On-Board
New generations of spacecrafts are required to perform tasks with an increased level of autonomy. Space exploration, Earth Observation, space robotics, etc. are all growing fields in Space that require more sensors and more computational power to perform these missions.

Sensors, embedded processors, and hardware in general have hugely evolved in the last decade, equipping embedded systems with large number of sensors that will produce data at rates that has not been seen before while simultaneously having computing power capable of large data processing on-board. Near-future spacecrafts will be equipped with large number of sensors that will produce data at high-speed rates in space and data processing power will be significantly increased.
Future missions such as Active Debris Removal will rely on novel high-performance avionics to support image processing and Artificial Intelligence algorithms with large workloads. Similar requirements come from Earth Observation applications, where data processing on-board can be critical in order to provide real-time reliable information to Earth. This new scenario has brought new challenges with it: low determinism, excessive power needs, data losses and large response latency.

In this project, Klepsydra AI is used as a novel approach to on-board artificial intelligence. It provides a very sophisticated threading model combination of pipeline and parallelization techniques applied to deep neural networks, making AI applications much more efficient and reliable. This new approach has been validated with several DNN models and two different computer architectures. The results show that the data processing rate and power saving of the applications increase substantially with respect to standard AI solutions.

A novel approach to Artificial Intelligence On-Board
New generations of spacecrafts are required to perform tasks with an increased level of autonomy. Space exploration, Earth Observation, space robotics, etc. are all growing fields in Space that require more sensors and more computational power to perform these missions.

Sensors, embedded processors, and hardware in general have hugely evolved in the last decade, equipping embedded systems with large number of sensors that will produce data at rates that has not been seen before while simultaneously having computing power capable of large data processing on-board. Near-future spacecrafts will be equipped with large number of sensors that will produce data at high-speed rates in space and data processing power will be significantly increased.
Future missions such as Active Debris Removal will rely on novel high-performance avionics to support image processing and Artificial Intelligence algorithms with large workloads. Similar requirements come from Earth Observation applications, where data processing on-board can be critical in order to provide real-time reliable information to Earth. This new scenario has brought new challenges with it: low determinism, excessive power needs, data losses and large response latency.

In this project, Klepsydra AI is used as a novel approach to on-board artificial intelligence. It provides a very sophisticated threading model combination of pipeline and parallelization techniques applied to deep neural networks, making AI applications much more efficient and reliable. This new approach has been validated with several DNN models and two different computer architectures. The results show that the data processing rate and power saving of the applications increase substantially with respect to standard AI solutions.

Advertisement
Advertisement

More Related Content

Similar to ADCSS 2022 (20)

Recently uploaded (20)

Advertisement

ADCSS 2022

  1. 1. ADCSS PRESENTATION 26.10.2022 pablo.ghiglino@klepsydra.com www.klepsydra.com Klepsydra Technologies
  2. 2. Part 1 Klepsydra Technology
  3. 3. Part 2.1 Lock-free programming
  4. 4. CONTEXT: PARALLEL PROCESSING • Recurrent mission failures due to software • Access to sensor data from Earth is time consuming. • Satellites struggle to meet power requirements Consequences for Space applications Challenges on on-board processing CPU Usage Low Medium Data volume Modern hardware and old software: • Computers max out with low to medium data volumes • Inef fi cient use of resources • Excessive power for low data processing
  5. 5. COMPARE AND SWAP • Compare-and-swap (CAS) is an instruction used in multithreading to achieve synchronisation. It compares the contents of a memory location with a given value and, only if they are the same, modi fi es the contents of that memory location to a new given value. This is done as a single atomic operation. • Compare-and-Swap has been an integral part of the IBM 370 architectures since 1970. • Maurice Herlihy (1991) proved that CAS can implement more of these algorithms than atomic read, write, and fetch-and-add
  6. 6. LOCK BASED PARALLELISATION VS LOCK FREE PARALLELISATION • Threads need to acquire lock to access resource. • Context switch: • Suspended while resource is locked by someone else • Awaken when resource is available. • Not deterministic, power consuming context switch. • Threads access resources using ‘Atomic Operations’ • Compare and Swap (CAS): • Try to update a memory entry • If not possible tried again • No locks involved, but ‘busy wait’ • No context switch required.
  7. 7. PROS AND CONS OF LOCK-FREE PROGRAMMING CPU Usage Data volume CPU Usage Data volume Lock-free programming Pros: • Less CPU consumption required • Lower latency and higher data throughput • Substantial increase in determinism Cons: • Extremely dif fi cult programming technique • Requires processor with CAS instructions (90% of the market have them, though)
  8. 8. Part 2.2 Klepsydra SDK
  9. 9. Lightweight, modular and compatible with most used operating systems Worldwide application Klepsydra SDK Klepsydra GPU Streaming Klepsydra AI Klepsydra ROS2 executor plugin SDK – Software Development Kit Boost data processing at the edge for general applications and processor intensive algorithms AI – Artificial Intelligence High performance deep neural network (DNN) engine to deploy any AI or machine learning module at the edge ROS2 Executor plugin Executor for ROS2 able to process up to 10 times more data with up to 50% reduction in CPU consumption. GPU (Graphic Processing Unit) High parallelisation of GPU to increase the processing data rate and GPU utilization THE PRODUCT
  10. 10. KLEPSYDRA SDK OVERVIEW SDK CORE Plugins: ROS, ROS2, OpenCV, CANopen Code generator Performance analysis Language bindings Applications Basic features Advanced features
  11. 11. Event Loop Sensor Multiplexer Two main data processing approaches Producer 1 Consumer 1 Consumer 2 Producer 2 Producer 3 Consumer Producer 1 11
  12. 12. Part 2.3 Algorithms in Klepsydra SDK
  13. 13. LOCK-FREE AS ALTERNATIVE TO PARALLELISATION Parallelisation Pipeline
  14. 14. APPROACH Input Matrix B = A x A C = B x B Output Matrix Input Matrix B = A x A Output Matrix C = B x B Klepsydra Parallel Streaming Setup OpenMP Sequential Setup { Thread 1 { Thread 2 { Parallelised { Parallelised
  15. 15. BENCHMARK DESCRIPTION Description • Given an input matrix, a number of sequential multiplications will be performed: • Step 1: A => B = A x A => Step 2 : C = B x B… • Matrix A randomly generated on each new sequence Parameters: • Matrix dimensions: 100x100 • Data type: Float, integer • Number of multiplications per matrix: [10, 60] • Processing frequency: [2Hz - 100Hz] Technical Spec • Computer: Odroid XU4 • OS: Ubuntu 18.04
  16. 16. FLOAT PERFORMANCE RESULTS II CPU Usage. 30 Steps 0,0 20,0 40,0 60,0 80,0 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra Throughput. 30 Steps 0,00 5,00 10,00 15,00 20,00 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra CPU Usage. 40 Steps 0,0 17,5 35,0 52,5 70,0 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Throughput. 40 Steps 0,00 3,50 7,00 10,50 14,00 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Latency. 40 Steps 0,00 60,00 120,00 180,00 240,00 Publishing Rate (Hz) 2,00 5,00 8,00 11,00 14,00 OpenMp Klepsydra Latency. 30 Steps 0,00 45,00 90,00 135,00 180,00 Publishing Rate (Hz) 2,00 6,50 11,00 15,50 20,00 OpenMp Klepsydra
  17. 17. Part 2.4 Klepsydra AI
  18. 18. 2-DIM THREADING MODEL Input Data Layer Output Data First dimension: pipelining { Thread 1 (Core 1) Layer Layer Layer Layer Layer { Thread 2 (Core 2) Layer Layer Layer Layer Layer Layer Layer Layer Layer Deep Neural Network Structure
  19. 19. 2-DIM THREADING MODEL Input Data Output Data Second dimension: Matrix multiplication parallelisation { T hread 1 (Core 1) Layer { T hread 2 (Core 2) { T hread 3 (Core 3)
  20. 20. 2-DIM THREADING MODEL Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Core 1 Core 2 Core 3 Core 4 Layer Layer Layer Layer Layer Layer Layer Layer Layer • Low CPU • High throughput CPU • High latency • Mid CPU • Mid throughput CPU • Mid latency • High CPU • Mid throughput CPU • Low latency Threading model con fi guration
  21. 21. 21 Example of performance benchmarks Klepsydra AI optimised for Latency vs CPU CPU Usage 0 30 60 90 Data Rate (Hz) 0 10 20 30 Latency Optimisation CPU Optimisaion TF Lite Latency = 29ms Low CPU Saturation Latency = 97ms TF Lite Saturation Latency = 120ms
  22. 22. Part 2.4 The auto-tuning software
  23. 23. KLEPSYDRA SDO PROCESS • Klepsydra Streaming Optimiser (KSO): • Runs on a separate computer • Executes several dry runs on the OBC • Collect statistics • Runs a genetic algorithm to fi nd the optimal solution for latency, power or throughput • The main variable to optimise is the distribution of layers on the HPDP cluster
  24. 24. KLEPSYDRA STREAMING DISTRIBUTION OPTIMISER (SDO)
  25. 25. ONNX API class KPSR_API OnnxDNNImporter { public: /** * @brief import an onnx file and uses a default eventloop factory for all processor cores * @param onnxFileName * @param testDNN * @return a share pointer to a DeepNeuralNetwork object * * When log level is debug, dumps the YAML configuration of the default factory. * It makes use of all processor cores. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName, bool testDNN = false); /** * @brief importForTest an onnx file and uses a default synchronous factory * @param onnxFileName * @param envFileName. Klepsydra AI configuration environment file. * @return a share pointer to a DeepNeuralNetwork object * * This method is intented to be used for testing purposes only. * */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & onnxFileName, const std::string & envFileName); }; 25
  26. 26. KPSR File API class KPSR_API DNNImporter { public: /** * @brief import a klepsydra model file and uses a default eventloop factory for all processor cores * @param modelFileName * @param testDNN * @return a shared pointer to a DeepNeuralNetwork object * * When log level is debug, dumps the YAML configuration of the default factory. * It makes use of all processor cores. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName, bool testDNN = false); /** * @brief importForTest a klepsydra model file and uses a default synchronous factory * @param modelFileName * @param envFileName. Klepsydra AI configuration environment file. * @return a shared pointer to a DeepNeuralNetwork object * * This method is intended to be used for testing purposes only. */ static std::shared_ptr<kpsr::ai::DeepNeuralNetworkFactory> createDNNFactory(const std::string & modelFileName, const std::string & envFileName); }; 26
  27. 27. Core API class DeepNeuralNetwork { public: /** * @brief setCallback * @param callback. Callback function for the prediction result. */ virtual void setCallback(std::function<void(const unsigned long &, const kpsr::ai::F32AlignedVector &)> callback) = 0; /** * @brief predict. Load input matrix as input to network. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const kpsr::ai::F32AlignedVector& inputVector) = 0; /** * @brief predict. Copy-less version of predict. * @param inputVector. An F32AlignedVector of floats containing network input. * * @return Unique id corresponding to the input vector */ virtual unsigned long predict(const std::shared_ptr<kpsr::ai::F32AlignedVector> & inputVector) = 0; }; 27
  28. 28. Part 3 KATESU Project
  29. 29. QORIQ® LAYERSCAPE LS1046A MULTICORE PROCESSOR QorIQ® Layerscape LS1046A Klepsydra AI Container
  30. 30. STATUS • Successful installation of the following setup: • LS1046 running Yocto Jethro • Docker Installed on LS1046 • Container with the following: • Ubuntu 20.04 • Klepsydra AI software fully supported (quantised and non- quantised)
  31. 31. XILINX ZEDBOARD ZedBoard Klepsydra AI Container PetaLinux Klepsydra AI Container
  32. 32. STATUS • Successful installation of the following setup: • ZedBoard running PetaLinux 2019.2 • Docker Installed on ZedBoard • Container with the following: • Ubuntu 20.04 • Klepsydra AI software with quantised support only
  33. 33. Part 3.2 Performance Results
  34. 34. PERFORMANCE RESULTS: CME ON LS1046 0 6,5 13 19,5 26 CPU / Hz TFLite + NEON Klepsydra 0 45 90 135 180 Latency (ms) TFLite + NEON Klepsydra 0 4,5 9 13,5 18 Throughput (Hz) TFLite + NEON Klepsydra
  35. 35. PERFORMANCE RESULTS: CME-Q ON LS1046 0 6,75 13,5 20,25 27 CPU / Hz TFLite + NEON Klepsydra 0 30 60 90 120 Latency (ms) TFLite + NEON Klepsydra 0 7,5 15 22,5 30 Throughput (Hz) TFLite + NEON Klepsydra
  36. 36. PERFORMANCE RESULTS: CME-Q ON ZEDBOARD 0 12,5 25 37,5 50 CPU / Hz TFLite + NEON Klepsydra 0 250 500 750 1000 Latency (ms) TFLite + NEON Klepsydra 0 0,65 1,3 1,95 2,6 Throughput (Hz) TFLite + NEON Klepsydra
  37. 37. PERFORMANCE RESULTS: BSC ON LS1046 0 20 40 60 80 CPU / Hz TFLite + NEON Klepsydra 0 1250 2500 3750 5000 Latency (ms) TFLite + NEON Klepsydra 0 0,15 0,3 0,45 0,6 Throughput (Hz) TFLite + NEON Klepsydra
  38. 38. Part 3.3 BSC Demo
  39. 39. Part 4 Conclusions and Future work
  40. 40. Vision-based navigation Earth Observation Telecommunications • Process more images per second • Increase con fi dence in the mission • Reduce power consumption up to 50% • Faster access to data from Earth • Increase processed request per second (increase revenue) • Enable AI telecomm (Cognitive radios) APPLICATION TO SPACE 40
  41. 41. ROADMAP • 2022: • Klepsydra AI for GPU (Q4 2022) • 2023: • Klepsydra AI For FGPA (Q2 2023) • Support for operating systems: FreeRTOS (Q3 2023) • Support for processor architectures: • GR740 and GR765 (Q4 2023) • RISC-V (Q4 2023) • HPDP (Q4 2023)
  42. 42. CONTACT INFORMATION Dr Pablo Ghiglino pablo.ghiglino@klepsydra.com +41786931544 www.klepsydra.com linkedin.com/company/klepsydra-technologies

×