Presentation - webinar embedded machine learning
- 2. 2 31/01/2024
©SIRRIS • CONFIDENTIAL •
Table of Contents
• Introduction: why embedded machine learning?
• Three main ingredients
• Training our model
• How to run inference on a Raspberry Pi PICO?
• Conclusion
- 4.
Why Machine Learning?
• Very good at finding patterns
• Less human input needed
• Broadly applicable
• More processing power available
• Lots and lots of data
• Explainability?
Can be a great tool, but not the only tool! I.e., let's avoid doing it for the fancy factor when traditional computer vision techniques are more suitable.
First question to ask: do we actually need machine learning?
- 5.
Today
• Machine learning, and in particular deep learning with convolutional networks, is a good tool for classification on images
• Let's look into such a classification task… embedded!
- 6.
Why Machine Learning on the Edge?
• Low cost → viability of the business case
• Data stays local → data privacy, security, control
• No cloud computations → less space usage
• Independent of internet connection → autonomous, reliable, no bandwidth limitations
• Low energy consumption → mobile applications, vehicles, …
• Low latency
- 7.
Embedded Machine Learning
• Only the inference will be embedded
• i.e., no on-device training in this presentation
- 10.
Choosing a Framework…
…BEFORE THE MICROCONTROLLER
TensorFlow:
• Similar accuracy (better with CNNs?)
• Better for deployment
• Harder without Keras, easier with Keras
• TF Lite / TFLμ for embedded systems
PyTorch:
• Similar accuracy (better with RNNs?)
• Better GPU support?
• Easier and more pythonic
• PyTorch Live/Mobile for ML on smartphones
- 12.
A Link: Good or Bad News?
• Sometimes, you can get the best of both worlds: a SOTA PyTorch model → ONNX → TF Lite deployment
• ONNX: Open Neural Network Exchange
• Allows exchange between frameworks
• Helps hardware providers with AI optimisation
Let's say you must use TensorFlow Lite for Microcontrollers (TFLμ)… That does not mean you can skip the TensorFlow vs PyTorch choice!
- 13.
Embedded Frameworks
• A dedicated embedded framework provides:
• Optimisation/compression
• An on-device inference « engine »
• Getting rid of as much as possible
• It is not mandatory
• i.e., one could directly use Python on an SBC like a Raspberry Pi
- 14.
Embedded Frameworks
TFLΜ VS. STM32 CUBE AI

| TFLμ | CUBE AI |
| Wider availability (not only for STM boards) | Better performance |
| CLI | CLI/GUI |
| Interpreter-based | Generated C++ code |
| Open-source | Tools for testing |
| More help available on the internet | More information on your model |
- 15.
Different Networks
MOBILENETV2: DEPTHWISE SEPARABLE CONVOLUTION
• Convolutional layer: 4*4*27*2 = 864 operations
• Depthwise separable convolution: 4*4*9*3 + 4*4*3*2 = 528 operations
• Thinner model possible (α scaling factor)
- 16.
Different Networks
SHUFFLENET: GROUP CONVOLUTION
• Normal convolution: 4*4*54*3 = 2592 operations
• Group convolution: 4*4*18*3 = 864 operations
• A channel shuffle then mixes information between the groups
- 18.
Different Hardware
→ The hardware can be microcontrollers not dedicated to AI
• Raspberry Pi PICO (RP2040 / Arm Cortex-M0+), STM32 boards, …
→ But it can be helped with a co-processor…
- 19.
Why NPUs?
• CPU
→ Always required
→ Fast & versatile
→ Improving every day
→ Can be a stand-alone good choice for inference speed in some cases (smaller models, sequential aspects of recurrent neural networks, small deep networks?)
• GPU, or DSP (Digital Signal Processor), …
→ Better for parallelization
→ Convolutional networks
• NPU (Neural Processing Unit)
→ Can be much faster, but never alone
→ Examples: TPU (Tensor Processing Unit) from Google, VPU (Vision Processing Unit) from Intel, Tensor Cores (Nvidia), FPGA (reconfigurable aspects!), …
- 20.
Our Setup
• Raspberry Pi 4 (note: the RPI5 has a PCIe port, and the Coral PCIe module is 20 €)
• PiCam
• TensorFlow Lite model & tflite_runtime (smaller Python package with only the TF Lite interpreter)
• MobileNetV2 (224, 224), full-integer quantized model
• Coral USB Accelerator, 60 € (USB 3.0 required, otherwise little gain due to low data transfer)
- 23.
Many Things Exist…
Feeling lost… but many available options is also a good thing!
• Let's avoid name-dropping!
• Many options… Let's pick!
- 24.
Three Ingredients: the path for today!
• Hardware: Raspberry Pi PICO
• Framework: TensorFlow Lite for Microcontrollers
• Model: MobileNetV2
- 26.
Training: dataset + pre-trained weights → MobileNetV2 (TF model) → reduction/optimization → TF Lite model → convert to .h file
Inference: TFLμ → TFLμ for PICO (PICO SDK, Makefile, C++ code) → TFLμ interpreter → "Plane!"
- 28.
Database
SOURCES
• Pascal VOC
• Image classification & object detection
• 11,540 images, 20 classes
• COCO
• Object detection
• 200,000 images, 80 classes
- 29.
Database
CUSTOM DATASET
• 8 classes: airplane, boat, bus, car, motorbike, none, person, train
• 800 images per class (600 training + 200 validation)
• Image size 224 x 224
• Bounding box
• Random size (minimum 25% of the image)
• Random location
- 30.
Transfer Learning
BUILD NEW APPLICATIONS
• Pretrained network
• Feature extraction
• Reuse for a different task
• Benefits
• Less data needed
• Lower training time
• Better generalization
• Fine-tuning
Note: we only re-train the last dense layer for our classification task
- 31.
Training
HYPERPARAMETERS & OVERFITTING
• Data augmentation
• Prevents overfitting
• Increases dataset size
• Improves accuracy
• Images: flip, crop, rotate, zoom, stretch, contrast, brightness…
• Batch size
• Smaller = less overfitting BUT slower training
• Learning rate
• Larger batches allow a higher learning rate
• Decrease it over time
- 32.
Database
INFLUENCE OF DATA AMOUNT
• MobileNetV2 (96 x 96), removing random training images (same amount per class)

| Training images per class | Accuracy |
| 600 | 0.868 |
| 150 | 0.825 |
| 75 | 0.812 |
| 37 | 0.794 |

→ Transfer learning is great!
- 33.
Training
EXAMPLE IN TENSORFLOW
• Pretrained network: MobileNetV2
• 2,257,984 parameters
• Pretrained on ImageNet (1,300,000 images)
• Classifier: fully connected layer
• 10,248 parameters
• Batch size = 20
• Learning rate = 0.0002
• Training time ≈ 2 minutes/epoch
- 34.
Comparison
RESNET VS. MOBILENET (224 X 224)

| | ResNet50 | MobileNetV2 |
| Parameters (base model) | 23,587,712 | 2,257,984 |
| Parameters (classifier) | 16,392 | 10,248 |
| Training time | 9 epochs, 6 minutes/epoch | 18 epochs, 2 minutes/epoch |
| Size (normal) | 94 MB | 12 MB |
| Size (TFLite) | 92 MB | 9 MB |
| Accuracy | 0.979 | 0.970 |
- 35.
MobileNetV2
COMPARISON

| Input resolution | Scaling factor | Size (TFLite) | Accuracy | F1 score | Inference time (PC) |
| 224 x 224 | 1 | 8,698 kB | 0.970 | 0.969 | 11.05 ms |
| 96 x 96 | 1 | 8,698 kB | 0.931 | 0.931 | 2.22 ms |
| 96 x 96 | 0.35 | 1,597 kB | 0.868 | 0.869 | 0.67 ms |
| 48 x 48 | 1 | 8,698 kB | 0.732 | 0.732 | 1.21 ms |
| 48 x 48 | 0.35 | 1,597 kB | 0.630 | 0.633 | 0.28 ms |

(Accuracy drop in the 48 x 48 model is partly due to pretrained weights not being available)
- 38.
TFLite
COMPRESSED FLATBUFFER FORMAT
• Benefits
• Reduced size
• Faster inference
• Includes optimization possibilities
• Works out-of-the-box for most models
• Not all TensorFlow operations supported
- 39.
Quantization
• Changing the datatype
• Like moving from RGB888 to RGB565 or YCbCr422 in computer vision
• Can be float16, dynamic range, …
• Here, we use 8-bit integers: [min, max] is mapped onto [-128, 127]
• Post-Training Quantization (PTQ) vs Quantization-Aware Training (QAT)
• The model is smaller & faster… at the cost of… an accuracy drop?
- 40.
Quantization
RESULTS
Model: MobileNetV2 (96 x 96, α = 0.35)

| Quantization | Size (TFLite) | Accuracy | F1 score | Inference time (RPI 4) |
| None | 1,590 kB | 0.869 | 0.830 | 4.17 ms |
| float16 | 825 kB | 0.870 | 0.831 | 4.12 ms |
| Dynamic range | 538 kB | 0.869 | 0.834 | 4.66 ms |
| Full integer | 611 kB | 0.845 | 0.805 | 3.46 ms |
| Full integer (quantization-aware) | 611 kB | 0.852 | 0.810 | 3.46 ms |
- 41.
Pruning
PRINCIPLE
• Weight pruning
• Gradually zero out weights
• Based on magnitude, activation, gradient, …
• Intermediate training for recalibration
• Structured pruning
• Remove neurons/filters
• α factor in the MobileNetV2 architecture
• Reduced size due to efficient compression
• Improved inference time (skip zero computations)
- 42.
Pruning
RESULTS
Model: MobileNetV2 (96 x 96, α = 0.35)

| Pruning | Size (TFLite) | Size (zip) | Accuracy | F1 score | Inference time (RPI 4) |
| None | 1,590 kB | 1,463 kB | 0.869 | 0.830 | 4.17 ms |
| Dense layer (80%) | 1,564 kB | 1,439 kB | 0.865 | 0.820 | 3.95 ms |
| Dense layer (90%) | 1,558 kB | 1,434 kB | 0.855 | 0.823 | 3.97 ms |
| Dense layer (80%) + 1/3 conv layers (50%) | 1,191 kB | 989 kB | 0.769 | 0.748 | 3.99 ms |

Less accuracy drop when pruning the latter stages of the model.
- 43.
Pruning and Quantization
COMBINED
Model: MobileNetV2 (96 x 96, α = 0.35)

| Optimization | Size (TFLite) | Size (zip) | Accuracy | F1 score | Inference time (RPI 4) |
| None | 1,590 kB | 1,463 kB | 0.869 | 0.830 | 4.17 ms |
| Full integer | 611 kB | | 0.845 | 0.805 | 3.46 ms |
| Dense layer (80%) + 1/3 conv layers (50%) | 1,191 kB | 989 kB | 0.769 | 0.748 | 3.99 ms |
| Dense layer (80%) + 1/3 conv layers (50%) + full integer | 611 kB | 365 kB | 0.734 | 0.709 | 3.62 ms |
- 44.
Weight Clustering
PRINCIPLE
• Cluster the weights in a layer into N clusters
• The cluster centroid value gets shared by all weights in the cluster
• Additional fine-tuning possible
- 45.
Weight Clustering
RESULTS
Model: MobileNetV2 (96 x 96, α = 0.35)

| Optimization | Size (TFLite) | Size (zip) | Accuracy | F1 score |
| None | 1,590 kB | 1,463 kB | 0.869 | 0.830 |
| Dense layer, 90% pruning | 1,558 kB | 1,434 kB | 0.855 | 0.823 |
| Dense layer, 16 clusters | 1,671 kB | 1,442 kB | 0.872 | 0.832 |

Note: the strip_clustering_wrapper function in TensorFlow was not working.
- 46. 46 31/01/2024
Ā©SIRRIS ā¢ CONFIDENTIAL ā¢
Parameter Accuracy Size Inference
Decreased input resolution - - = ++ Enough information necessary in pixels
Decreased model size (Ī±) - - - ++ +++ NAS can improve the accuracy loss
Full integer quantization - ++ + Often required for inference on MCU
Pruning - (-) + = Less impact on later layers
Weight Clustering - + =
Overview
INFLUENCEOF DIFFERENT PARAMETERS
- 47.
OK, I Lied.
I SAID WE HAD THE WAY, AND THEN I CAME WITH OTHER COMPARISON TABLES…
However…
→ We do not need to redo everything from scratch
→ Many tools, tutorials, etc. are available
→ A bunch of weights does not mean anything for us humans (hence all the work done on explainable AI), but we do not need to understand them…
• We don't need to create new model architectures: MobileNetV2!
• We don't need to implement them: TensorFlow!
• We don't even train most of that: transfer learning!
• Accepting abstraction + using available tools = simpler than it may look?
- 48.
Spoiler Alert!
Memory & accuracy are known from the model, BUT inference time depends on the hardware!
Some results for inference time:

| Model | RPI4 | Coral (USB 2.0) | Coral | Coral+ |
| (96, 96) | ~3.4 ms | NA | ~1.72 ms | NA |
| (224, 224) | ~100 ms | ~12.6 ms | ~5 ms | ~3.3 ms |

(Coral: 2x faster at 96 x 96 and 20x faster at 224 x 224 compared to the RPI4 alone)
• STM32H747I + MBV2 (res = 96, α = 0.35): … ms
• PICO + MBV2 (res = 48, α = 0.35): 250 ms
- 52.
Content
OVERVIEW
• Raspberry Pi Pico
1. Initial setup
• CMake file
• pico-sdk library
2. Blinking a LED
3. Run inference
• tflite-micro library
• Include model
• Execute the code
Environment: Ubuntu (WSL)
- 53.
Resources
WHERE TO BEGIN
• Datasheet
• "Getting Started" guide
• The GitHub READMEs
We are helped!
- 54.
Initial Setup
FOLDER STRUCTURE
• Libraries needed:
• pico-sdk (software development kit)
• Install the toolchain
• $ sudo apt install cmake gcc-arm-none-eabi libnewlib-arm-none-eabi build-essential
- 55.
Pico-sdk Library
INSTALLATION PROCEDURE
• Clone the repository and update the submodules
• $ git clone https://github.com/raspberrypi/pico-sdk.git --branch master
• $ cd pico-sdk
• $ git submodule update --init
• Copy pico_sdk_import.cmake from lib/pico-sdk/external to the main folder
• Update the PICO_SDK_PATH variable
• $ export PICO_SDK_PATH="<main_folder>/lib/pico-sdk"
- 58.
Blinking a LED
MAIN.CPP

#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/gpio.h"
#include "pico/binary_info.h"

/* As per the Raspberry Pi Pico pinout documentation. */
#define LED_PIN 28

/* Program entry point. */
int main() {
    /* Initialisation of the standard lib for input/output. */
    stdio_init_all();

    /* Initialisation of the LED pin as an output pin, with LOW initial value. */
    gpio_init(LED_PIN);
    gpio_set_dir(LED_PIN, GPIO_OUT);
    gpio_put(LED_PIN, 0);

    /* Forever loop. */
    while (true) {
        /* Blinking the LED. */
        gpio_put(LED_PIN, 1);
        sleep_ms(1000);
        gpio_put(LED_PIN, 0);
        sleep_ms(1000);
    }

    /* Unreachable code. */
    return 0;
}
- 59.
Blinking a LED
CMAKE FILE

cmake_minimum_required(VERSION 3.12)
# pico-sdk library
include(pico_sdk_import.cmake)
project(picoDemo C CXX ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)
pico_sdk_init()
add_compile_options(-Wall -Wno-format -Wno-unused-function -Wno-maybe-uninitialized)
add_executable(picoDemo src/main.cpp)
target_link_libraries(${PROJECT_NAME} pico_stdlib)
# Use the USB connection for communication
pico_enable_stdio_usb(picoDemo 1)
pico_enable_stdio_uart(picoDemo 0)
# uf2 file format to easily flash to the Pico
pico_add_extra_outputs(picoDemo)
- 60.
Blinking a LED
BUILDING THE PROJECT
• Navigate to the build folder
• $ cmake ..
• $ make
• Copy the created .uf2 file to the PICO to flash the program
- 62.
Pico-tflmicro
CLONE REPOSITORY
• Navigate to the lib folder
• $ git clone https://github.com/raspberrypi/pico-tflmicro.git
• This is a repository that:
• Includes the TensorFlow library (https://github.com/tensorflow/tflite-micro)
• BUT is already configured for the Pico
• Later we will show how to configure the TensorFlow library directly
- 64.
Pico-tflmicro
INCLUDE IN PROJECT
• In main.cpp add:

#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

• In CMakeLists.txt add:

add_subdirectory("lib/pico-tflmicro" EXCLUDE_FROM_ALL)
target_link_libraries(${PROJECT_NAME} pico_stdlib pico-tflmicro)
- 65.
Run Inference
INCLUDE TEST IMAGE
• Test image: an actual 48x48 image used for inference
• Included as image.h (an array with the data); eventually it should come from a camera
- 66.
Run Inference
INCLUDE MODEL
• A .tflite model can be converted to a .h file via a command:
• $ xxd -i model_name.tflite > new_model_name.h
• Open the generated header and make the array a const unsigned char!
- 67.
Run Inference
MAIN.CPP - NECESSARY INCLUDES

#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/gpio.h"
#include "pico/binary_info.h"

/* Specific includes for TensorFlow Lite for Microcontrollers. */
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

/* The image that will be tested. */
#include "image.h"

/* The trained model, converted for TFLu, within a C header file. */
#include "../models/model.h"

/* As per the Raspberry Pi Pico pinout documentation. */
#define LED_PIN 28

/* Instead of the full ops resolver (~128 operations), register only the 9 operations the model needs. */
using AllOpsResolver_t = tflite::MicroMutableOpResolver<9>;
- 68.
Run Inference
MAIN.CPP - FUNCTIONS TO PREPROCESS THE IMAGE (NEEDED FOR CORRECT INPUT TO THE MODEL)

/* Rescaling to perform operations on data of similar scale. */
float rescaling(float x, float scale, float offset) {
    return (x * scale) - offset;
}

/* Quantization procedure, i.e. moving from a number represented as a float to a number
   represented as an int8. */
int8_t quantize(float x, float scale, float zero_point) {
    return (x / scale) + zero_point;
}
- 69.
Run Inference
MAIN.CPP - INITIALIZE LED, LABELS AND IMAGE SIZE

/* Program entry point. */
int main() {
    /* Initialisation of the standard lib for input/output. */
    stdio_init_all();

    /* Initialisation of the LED pin, as an output pin, with LOW initial value. */
    gpio_init(LED_PIN);
    gpio_set_dir(LED_PIN, GPIO_OUT);
    gpio_put(LED_PIN, 0);

    /* Image dimensions (48, 48) on 3 channels (RGB). */
    int Npix = 48;
    int Nchan = 3;
    int Nlabels = 8;

    /* The 8 possible labels for the classifier, as strings for the serial output. */
    const char *label[] = {"aeroplane", "boat", "bus", "car", "motorbike", "none", "person", "train"};
- 70.
Run Inference
MAIN.CPP - INITIALIZE TFLITE-MICRO OBJECTS

    /* Initialisation of the TFLu interpreter. */
    static const tflite::Model* tflu_model = nullptr;
    static tflite::MicroInterpreter* tflu_interpreter = nullptr;
    static TfLiteTensor* tflu_i_tensor = nullptr;
    static TfLiteTensor* tflu_o_tensor = nullptr;

    /* The ops resolver and error reporter. */
    static AllOpsResolver_t op_resolver;
    static tflite::MicroErrorReporter micro_error_reporter;
    tflite::ErrorReporter* error_reporter = &micro_error_reporter;
- 71.
Run Inference
MAIN.CPP - ADD THE OPERATIONS INCLUDED IN YOUR MODEL

    op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
    op_resolver.AddDepthwiseConv2D(tflite::Register_DEPTHWISE_CONV_2D_INT8());
    op_resolver.AddPad();
    op_resolver.AddAdd(tflite::Register_ADD_INT8());
    op_resolver.AddRelu6();
    op_resolver.AddMean();
    op_resolver.AddSoftmax(tflite::Register_SOFTMAX_INT8());
    op_resolver.AddFullyConnected(tflite::Register_FULLY_CONNECTED_INT8());
    op_resolver.AddDequantize();
- 72.
Run Inference
MAIN.CPP - MORE INITIALIZATION + INCLUDING THE MODEL

    /* Allocation of the tensor arena, in the heap. */
    constexpr int tensor_arena_size = 144000;
    uint8_t *tensor_arena = nullptr;
    tensor_arena = (uint8_t *)malloc(tensor_arena_size);

    /* Initializing the scaling values. */
    float scaling_scale = 1.0f / 127.5f;
    float scaling_offset = -1.0f;

    /* Retrieving the model from the header file. */
    tflu_model = ::tflite::GetModel(mobilenet48_int_input_tflite);

    /* Creating the interpreter and allocating tensors. */
    static tflite::MicroInterpreter static_interpreter(tflu_model, op_resolver, tensor_arena, tensor_arena_size);
    tflu_interpreter = &static_interpreter;
    TfLiteStatus allocate_status = tflu_interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk) {
        printf("Issue when allocating the tensors.\r\n");
    }
- 73.
Run Inference
MAIN.CPP - LINK INPUT/OUTPUT AND GET QUANTIZATION PARAMETERS

    /* Linking the interpreter to the input/output tensors. */
    tflu_i_tensor = tflu_interpreter->input(0);
    tflu_o_tensor = tflu_interpreter->output(0);

    /* Retrieving the quantization parameters from the model. */
    const auto* i_quantization = reinterpret_cast<TfLiteAffineQuantization*>(tflu_i_tensor->quantization.params);
    float tfluQuant_scale = i_quantization->scale->data[0];
    int32_t tfluQuant_zeropoint = i_quantization->zero_point->data[0];

    /* Indices and value initialization. */
    int idx = 0;
    float value = 0;
    float value_scaled = 0;
    int8_t value_quant = 0;
    int idx_tf = 0;
- 74.
Run Inference
MAIN.CPP - RESCALE AND QUANTIZE THE IMAGE AND GIVE IT AS INPUT TO THE MODEL

    /* Forever loop. */
    while (true) {
        /* Blinking the LED. */
        gpio_put(LED_PIN, 1);
        sleep_ms(1000);
        gpio_put(LED_PIN, 0);
        sleep_ms(1000);

        /* Preparing the input. */
        for (int i(0); i < Npix; i++) {
            for (int j(0); j < Npix; j++) {
                for (int k(0); k < Nchan; k++) {
                    /* Compute the 1D index. */
                    idx = k*Npix*Npix + j*Npix + i;
                    value = test_image[idx];
                    /* Re-scale then quantize the result. */
                    value_scaled = rescaling(value, scaling_scale, scaling_offset);
                    value_quant = quantize(value_scaled, tfluQuant_scale, tfluQuant_zeropoint);
                    /* Put the result in the input tensor. */
                    tflu_i_tensor->data.int8[idx] = value_quant;
                }
            }
        }
- 75.
Run Inference
MAIN.CPP - RUN INFERENCE AND PRINT THE RESULT

        /* Call the interpreter to infer a label. */
        TfLiteStatus invoke_status = tflu_interpreter->Invoke();

        /* Print the probabilities for each label over the serial connection. */
        printf("Result: [%f; %f; %f; %f; %f; %f; %f; %f].\n",
               tflu_o_tensor->data.f[0], tflu_o_tensor->data.f[1], tflu_o_tensor->data.f[2],
               tflu_o_tensor->data.f[3], tflu_o_tensor->data.f[4], tflu_o_tensor->data.f[5],
               tflu_o_tensor->data.f[6], tflu_o_tensor->data.f[7]);

        /* Retrieve the label with maximum likelihood. */
        size_t ix_max = 0;
        float pb_max = 0;
        for (size_t ix = 0; ix < Nlabels; ix++) {
            if (tflu_o_tensor->data.f[ix] > pb_max) {
                ix_max = ix;
                pb_max = tflu_o_tensor->data.f[ix];
            }
        }

        /* Print the most likely label over the serial connection. */
        printf("Result of inference: %s with proba %f.\n", label[ix_max], pb_max);
    }
- 77.
TFLite-micro
RESULT
• Result:
• aeroplane: 0.488
• boat: 0.446
• bus: 0.012
• car: 0.023
• motorbike: 0.000
• none: 0.004
• person: 0.004
• train: 0.012
- 79.
Three GitHubs
• TensorFlow → tflite-micro (just a subset) → pico-tflmicro (library already prepared for the Pico)
• Route 1: pico-tflmicro, used in the demo until now
• Route 2: tflite-micro directly, the alternative route shown in the next slides
- 80.
TFLite-Micro
ONLY USING THE TENSORFLOW GITHUB
• Navigate to the lib folder
• $ git clone https://github.com/tensorflow/tflite-micro.git
• $ cd tflite-micro
• $ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m0plus OPTIMIZED_KERNEL_DIR=cmsis_nn microlite
• The TARGET and TARGET_ARCH are specific to the hardware (the PICO in this case)
- 81.
TFLite-Micro
CMAKELISTS.TXT

add_compile_definitions(TF_LITE_STATIC_MEMORY=1)
# tflite-micro library
target_link_libraries(${PROJECT_NAME} pico_stdlib
    $ENV{TFLITE_MICRO_PATH}/gen/cortex_m_generic_cortex-m0plus_default/lib/libtensorflow-microlite.a)
# Includes needed for the code in main.cpp
target_include_directories(${PROJECT_NAME} PRIVATE $ENV{TFLITE_MICRO_PATH}/)
target_include_directories(${PROJECT_NAME} PRIVATE
    $ENV{TFLITE_MICRO_PATH}/tensorflow/lite/micro/tools/make/downloads/flatbuffers/include/)
target_include_directories(${PROJECT_NAME} PRIVATE
    $ENV{TFLITE_MICRO_PATH}/tensorflow/lite/micro/tools/make/downloads/gemmlowp/)
- 82.
TFLite-Micro
MAIN.CPP
• Most of the code in main.cpp is the same
• The following slides show the things that change
- 83.
TFLite-Micro
MAIN.CPP
• main.cpp:

#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
- 84.
TFLite-Micro
MAIN.CPP
• main.cpp:

const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
constexpr int kTensorArenaSize = 160000;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];
tflite::InitializeTarget();  // added
static tflite::MicroMutableOpResolver<9> op_resolver;  // replaces AllOpsResolver
- 85.
TFLite-Micro
MAIN.CPP
• main.cpp:

op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
op_resolver.AddDepthwiseConv2D(tflite::Register_DEPTHWISE_CONV_2D_INT8());
op_resolver.AddPad();
op_resolver.AddAdd(tflite::Register_ADD_INT8());
op_resolver.AddRelu6();
op_resolver.AddMean();
op_resolver.AddSoftmax(tflite::Register_SOFTMAX_INT8());
op_resolver.AddFullyConnected(tflite::Register_FULLY_CONNECTED_INT8());
op_resolver.AddDequantize();
- 87.
Next Steps
We want to run inference live, with images coming from a ~5 € camera (OV7670 sensor).
How to get the images while keeping the corresponding CPU cost to a minimum?
• Counting on the CPU (software): efficient memory usage, fusing operations, …
• Not counting on the CPU (hardware): PIO, DMA (direct memory access)
- 88.
PIO
PROGRAMMABLE INPUT/OUTPUT
• A PIO instance contains 4 state machines
• A state machine is like a tiny processor that can execute a limited set of instructions
• The CPU loads the corresponding instructions, enabling/disabling the state machines
• It's a way to delegate some workload away from the CPU, e.g. the communication with the camera
• We wanted to show a quick example where the Raspberry Pi PICO blinks a LED without using the CPU, but… time is ticking!
- 89.
Edge AI is Multidisciplinary
Hardware, data science, software: all three are involved.
A. Wanting a first inference from scratch… Bad news: all 3 are needed.
B. Seeking to be state of the art… Bad news: mastering all 3 is needed.
C. A need is defined, hence technical specifications must be met (usually what must be achieved). Good news: you can insist on the most comfortable one(s).
- 90.
Edge AI Issues
Multidisciplinary expertise (hardware, data science, software) required… and…
• Fast evolving field (new models, new frameworks, new HW…)
• Many constraints to respect for the solution:
• Technical (inference time, memory usage, …)
• Human (appropriate to user expertise)
• Business (at an OK cost)
- 91.
Conclusion
• Beware of name-dropping
• First a need; then embedded machine learning is a possibility (and technical specifications are the finish line)
• Embedded machine learning is inherently multidisciplinary
• Very complex from scratch… but we are helped!
- 92.
Don't hesitate to contact us!
• Questions about this presentation?
• Miguel Lejeune (miguel.lejeune@sirris.be, +32 490 01 41 44)
• Vincent Lucas (vincent.lucas@sirris.be, +32 493 31 15 92)
• Questions about other technologies/Sirris offerings?
• Questions about funding?
• Bas Rottier (bas.rottier@sirris.be, +32 491 86 91 70)