Presentation - webinar embedded machine learning
- 2. 2 31/01/2024
©SIRRIS • CONFIDENTIAL •
Table of Contents
• Introduction: why embedded machine learning?
• Three main ingredients
• Training our model
• How to run inference on a Raspberry Pi PICO?
• Conclusion
- 4.
Why Machine Learning?
• Very good at finding patterns
• Less human input needed
• Broadly applicable
• More processing power available
• Lots and lots of data
• Explainability?
Can be a great tool, but not the only tool! I.e., let's avoid doing it for the fancy factor when traditional computer vision techniques are more suitable.
First question to ask: do we actually need machine learning?
- 5.
Today
• Machine learning, and in particular deep learning with convolutional networks, is a good tool for classification on images
• Let's look into such a classification task… embedded!
- 6.
Why Machine Learning on the Edge?
• Low cost → viability of the business case
• Data stays local → data privacy, security, control
• No cloud computations → less space usage
• Independent of internet connection → autonomous, reliable, no bandwidth limitations
• Low energy consumption → mobile applications, vehicles, …
• Low latency
- 7.
Embedded Machine Learning
• Only the inference will be embedded
• i.e., no on-device training in this presentation
- 10.
Choosing a Framework…
…BEFORE THE MICROCONTROLLER
TensorFlow:
• Similar accuracy (better with CNNs?)
• Better for deployment
• Harder without Keras, easier with Keras
• TF Lite / TFLμ for embedded systems
PyTorch:
• Similar accuracy (better with RNNs?)
• Better GPU support?
• Easier and more pythonic
• PyTorch Live/Mobile for ML on smartphones
- 12.
A Link: Good or Bad News?
• Sometimes, you can get the best of both worlds: a SOTA PyTorch model → ONNX → TF Lite deployment
• ONNX: Open Neural Network Exchange
• Allows exchange between frameworks
• Helps hardware providers with AI optimisation
Let's say you must use TensorFlow Lite for Microcontrollers (TFLμ)… That does not mean you can skip the TensorFlow vs PyTorch choice!
- 13.
Embedded Frameworks
• A dedicated embedded framework provides:
• Optimisation/compression
• An on-device inference « engine »
• Getting rid of as much as possible
• It is not mandatory
• i.e., one could directly use Python on an SBC like a Raspberry Pi
- 14.
Embedded Frameworks
TFLΜ VS. STM32 CUBE AI

| TFLμ | CUBE AI |
| Wider availability (not only for STM boards) | Better performance |
| CLI | CLI/GUI |
| Interpreter-based | Generated C++ code |
| Open-source | Tools for testing |
| More help available on the internet | More information on your model |
- 15.
Different Networks
MOBILENETV2: DEPTHWISE SEPARABLE CONVOLUTION
• Convolutional layer: 4*4*27*2 = 864 operations
• Depthwise separable convolution: 4*4*9*3 + 4*4*3*2 = 528 operations
• Thinner model possible (α scaling factor)
- 16.
Different Networks
SHUFFLENET: GROUP CONVOLUTION
• Normal convolution: 4*4*54*3 = 2592 operations
• Group convolution: 4*4*18*3 = 864 operations
• A channel shuffle then mixes information between the groups
- 18.
Different Hardware
→ The hardware can be microcontrollers not dedicated to AI
• Raspberry Pi PICO (RP2040 / Arm Cortex-M0+), STM32 boards, …
→ But it can be helped with a co-processor…
- 19.
Why NPUs?
• CPU
→ Always required
→ Fast & versatile
→ Improving every day
→ Can be a stand-alone good choice for inference speed in some cases (smaller models, sequential aspects of recurrent neural networks, small deep networks?)
• GPU, or DSP (Digital Signal Processor), …
→ Better for parallelization
→ Convolutional networks
• NPU (Neural Processing Unit)
→ Can be much faster, but never alone
→ Examples: TPU (Tensor Processing Unit) from Google, VPU (Vision Processing Unit) from Intel, Tensor Cores (Nvidia), FPGA (reconfigurable aspects!), …
- 20.
Our Setup
• Raspberry Pi 4 (note: the RPI5 has a PCIe port, and the Coral PCIe module is 20 €)
• PiCam
• TensorFlow Lite model & tflite_runtime (smaller Python package with only the TF Lite interpreter)
• MobileNetV2 (224, 224), full-integer quantized model
• Coral USB Accelerator, 60 € (USB 3.0 required, otherwise little gain due to low data transfer)
- 23.
Many Things Exist…
Feeling lost… but many available options is also a good thing!
• Let's avoid name-dropping!
• Many options… Let's pick!
- 24.
Three Ingredients: the path for today!
• Hardware: Raspberry Pi PICO
• Framework: TensorFlow Lite for Microcontrollers
• Model: MobileNetV2
- 26.
Training: dataset + pre-trained weights → MobileNetV2 (TF model) → reduction/optimization → TF Lite model → convert to .h file
Inference: TFLμ → TFLμ for PICO (PICO SDK, Makefile, C++ code) → TFLμ interpreter → "Plane!"
- 28.
Database
SOURCES
• Pascal VOC
• Image classification & object detection
• 11,540 images, 20 classes
• COCO
• Object detection
• 200,000 images, 80 classes
- 29.
Database
CUSTOM DATASET
• 8 classes: airplane, boat, bus, car, motorbike, none, person, train
• 800 images per class (600 training + 200 validation)
• Image size 224 x 224
• Bounding box
• Random size (minimum 25% of the image)
• Random location
- 30.
Transfer Learning
BUILD NEW APPLICATIONS
• Pretrained network
• Feature extraction
• Reuse for a different task
• Benefits
• Less data needed
• Lower training time
• Better generalization
• Fine-tuning
Note: we only re-train the last dense layer for our classification task
- 31.
Training
HYPERPARAMETERS & OVERFITTING
• Data augmentation
• Prevents overfitting
• Increases dataset size
• Improves accuracy
• Images: flip, crop, rotate, zoom, stretch, contrast, brightness…
• Batch size
• Smaller = less overfitting BUT slower training
• Learning rate
• Larger batches allow a higher learning rate
• Decrease it over time
- 32.
Database
INFLUENCE OF DATA AMOUNT
• MobileNetV2 (96 x 96), removing random training images (same amount per class)

| Training images per class | Accuracy |
| 600 | 0.868 |
| 150 | 0.825 |
| 75 | 0.812 |
| 37 | 0.794 |

→ Transfer learning is great!
- 33.
Training
EXAMPLE IN TENSORFLOW
• Pretrained network: MobileNetV2
• 2,257,984 parameters
• Pretrained on ImageNet (1,300,000 images)
• Classifier: fully connected layer
• 10,248 parameters
• Batch size = 20
• Learning rate = 0.0002
• Training time ≈ 2 minutes/epoch
- 34.
Comparison
RESNET VS. MOBILENET (224 X 224)

| | ResNet50 | MobileNetV2 |
| Parameters (base model) | 23,587,712 | 2,257,984 |
| Parameters (classifier) | 16,392 | 10,248 |
| Training time | 9 epochs, 6 minutes/epoch | 18 epochs, 2 minutes/epoch |
| Size (normal) | 94 MB | 12 MB |
| Size (TFLite) | 92 MB | 9 MB |
| Accuracy | 0.979 | 0.970 |
- 35.
MobileNetV2
COMPARISON

| Input resolution | Scaling factor | Size (TFLite) | Accuracy | F1 score | Inference time (PC) |
| 224 x 224 | 1 | 8,698 kB | 0.970 | 0.969 | 11.05 ms |
| 96 x 96 | 1 | 8,698 kB | 0.931 | 0.931 | 2.22 ms |
| 96 x 96 | 0.35 | 1,597 kB | 0.868 | 0.869 | 0.67 ms |
| 48 x 48 | 1 | 8,698 kB | 0.732 | 0.732 | 1.21 ms |
| 48 x 48 | 0.35 | 1,597 kB | 0.630 | 0.633 | 0.28 ms |

(Accuracy drop in the 48 x 48 model is partly due to pretrained weights not being available)
- 38.
TFLite
COMPRESSED FLATBUFFER FORMAT
• Benefits
• Reduced size
• Faster inference
• Includes optimization possibilities
• Works out-of-the-box for most models
• Not all TensorFlow operations supported
- 39.
Quantization
• Changing the datatype
• Like moving from RGB888 to RGB565 or YCbCr422 in computer vision
• Can be float16, dynamic range, …
• Here, we use 8-bit integers: [min, max] is mapped onto [-128, 127]
• Post-Training Quantization (PTQ) vs Quantization-Aware Training (QAT)
• The model is smaller & faster… at the cost of… an accuracy drop?
- 40.
Quantization
RESULTS
Model: MobileNetV2 (96 x 96, α = 0.35)

| Quantization | Size (TFLite) | Accuracy | F1 score | Inference time (RPI 4) |
| None | 1,590 kB | 0.869 | 0.830 | 4.17 ms |
| float16 | 825 kB | 0.870 | 0.831 | 4.12 ms |
| Dynamic range | 538 kB | 0.869 | 0.834 | 4.66 ms |
| Full integer | 611 kB | 0.845 | 0.805 | 3.46 ms |
| Full integer (quantization-aware) | 611 kB | 0.852 | 0.810 | 3.46 ms |
- 41.
Pruning
PRINCIPLE
• Weight pruning
• Gradually zero out weights
• Based on magnitude, activation, gradient, …
• Intermediate training for recalibration
• Structured pruning
• Remove neurons/filters
• α factor in the MobileNetV2 architecture
• Reduced size due to efficient compression
• Improved inference time (skip zero computations)
- 42.
Pruning
RESULTS
Model: MobileNetV2 (96 x 96, α = 0.35)

| Pruning | Size (TFLite) | Size (zip) | Accuracy | F1 score | Inference time (RPI 4) |
| None | 1,590 kB | 1,463 kB | 0.869 | 0.830 | 4.17 ms |
| Dense layer (80%) | 1,564 kB | 1,439 kB | 0.865 | 0.820 | 3.95 ms |
| Dense layer (90%) | 1,558 kB | 1,434 kB | 0.855 | 0.823 | 3.97 ms |
| Dense layer (80%) + 1/3 conv layers (50%) | 1,191 kB | 989 kB | 0.769 | 0.748 | 3.99 ms |

Less accuracy drop when pruning the latter stages of the model.
- 43.
Pruning and Quantization
COMBINED
Model: MobileNetV2 (96 x 96, α = 0.35)

| Optimization | Size (TFLite) | Size (zip) | Accuracy | F1 score | Inference time (RPI 4) |
| None | 1,590 kB | 1,463 kB | 0.869 | 0.830 | 4.17 ms |
| Full integer | 611 kB | | 0.845 | 0.805 | 3.46 ms |
| Dense layer (80%) + 1/3 conv layers (50%) | 1,191 kB | 989 kB | 0.769 | 0.748 | 3.99 ms |
| Dense layer (80%) + 1/3 conv layers (50%) + full integer | 611 kB | 365 kB | 0.734 | 0.709 | 3.62 ms |
- 44.
Weight Clustering
PRINCIPLE
• Cluster the weights in a layer into N clusters
• The cluster centroid value gets shared by all weights in the cluster
• Additional fine-tuning possible
- 45.
Weight Clustering
RESULTS
Model: MobileNetV2 (96 x 96, α = 0.35)

| Optimization | Size (TFLite) | Size (zip) | Accuracy | F1 score |
| None | 1,590 kB | 1,463 kB | 0.869 | 0.830 |
| Dense layer, 90% pruning | 1,558 kB | 1,434 kB | 0.855 | 0.823 |
| Dense layer, 16 clusters | 1,671 kB | 1,442 kB | 0.872 | 0.832 |

Note: the strip_clustering_wrapper function in TensorFlow was not working.
- 46. 46 31/01/2024
Ā©SIRRIS ā¢ CONFIDENTIAL ā¢
Parameter Accuracy Size Inference
Decreased input resolution - - = ++ Enough information necessary in pixels
Decreased model size (Ī±) - - - ++ +++ NAS can improve the accuracy loss
Full integer quantization - ++ + Often required for inference on MCU
Pruning - (-) + = Less impact on later layers
Weight Clustering - + =
Overview
INFLUENCEOF DIFFERENT PARAMETERS
- 47.
OK, I Lied.
I SAID WE HAD THE WAY, AND THEN I CAME WITH OTHER COMPARISON TABLES…
However…
→ We do not need to redo everything from scratch
→ Many tools, tutorials, etc. are available
→ A bunch of weights does not mean anything for us humans (hence all the work done on explainable AI), but we do not need to understand them…
• We don't need to create new model architectures: MobileNetV2!
• We don't need to implement them: TensorFlow!
• We don't even train most of that: transfer learning!
• Accepting abstraction + using available tools = simpler than it may look?
- 48.
Spoiler Alert!
Memory & accuracy are known from the model, BUT inference time depends on the hardware!
Some results for inference time:

| Model | RPI4 | Coral (USB 2.0) | Coral | Coral+ |
| (96, 96) | ~3.4 ms | NA | ~1.72 ms | NA |
| (224, 224) | ~100 ms | ~12.6 ms | ~5 ms | ~3.3 ms |

(Coral: 2x faster at 96 x 96 and 20x faster at 224 x 224 compared to the RPI4 alone)
• STM32H747I + MBV2 (res = 96, α = 0.35): … ms
• PICO + MBV2 (res = 48, α = 0.35): 250 ms
- 52.
Content
OVERVIEW
• Raspberry Pi Pico
1. Initial setup
• CMake file
• pico-sdk library
2. Blinking a LED
3. Run inference
• tflite-micro library
• Include model
• Execute the code
Environment: Ubuntu (WSL)
- 53.
Resources
WHERE TO BEGIN
• Datasheet
• "Getting Started" guide
• The GitHub READMEs
We are helped!
- 54.
Initial Setup
FOLDER STRUCTURE
• Libraries needed:
• pico-sdk (software development kit)
• Install the toolchain
• $ sudo apt install cmake gcc-arm-none-eabi libnewlib-arm-none-eabi build-essential
- 55.
Pico-sdk Library
INSTALLATION PROCEDURE
• Clone the repository and update the submodules
• $ git clone https://github.com/raspberrypi/pico-sdk.git --branch master
• $ cd pico-sdk
• $ git submodule update --init
• Copy pico_sdk_import.cmake from lib/pico-sdk/external to the main folder
• Update the PICO_SDK_PATH variable
• $ export PICO_SDK_PATH="<main_folder>/lib/pico-sdk"
- 58.
Blinking a LED
MAIN.CPP

#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/gpio.h"
#include "pico/binary_info.h"

/* As per the Raspberry Pi Pico pinout documentation. */
#define LED_PIN 28

/* Program entry point. */
int main() {
    /* Initialisation of the standard lib for input/output. */
    stdio_init_all();

    /* Initialisation of the LED pin as an output pin, with LOW initial value. */
    gpio_init(LED_PIN);
    gpio_set_dir(LED_PIN, GPIO_OUT);
    gpio_put(LED_PIN, 0);

    /* Forever loop. */
    while (true) {
        /* Blinking the LED. */
        gpio_put(LED_PIN, 1);
        sleep_ms(1000);
        gpio_put(LED_PIN, 0);
        sleep_ms(1000);
    }

    /* Unreachable code. */
    return 0;
}
- 59.
Blinking a LED
CMAKE FILE

cmake_minimum_required(VERSION 3.12)
# pico-sdk library
include(pico_sdk_import.cmake)
project(picoDemo C CXX ASM)
set(CMAKE_C_STANDARD 11)
set(CMAKE_CXX_STANDARD 17)
pico_sdk_init()
add_compile_options(-Wall -Wno-format -Wno-unused-function -Wno-maybe-uninitialized)
add_executable(picoDemo src/main.cpp)
target_link_libraries(${PROJECT_NAME} pico_stdlib)
# Use the USB connection for communication
pico_enable_stdio_usb(picoDemo 1)
pico_enable_stdio_uart(picoDemo 0)
# uf2 file format to easily flash to the Pico
pico_add_extra_outputs(picoDemo)
- 60.
Blinking a LED
BUILDING THE PROJECT
• Navigate to the build folder
• $ cmake ..
• $ make
• Copy the created .uf2 file to the PICO to flash the program
- 62.
Pico-tflmicro
CLONE REPOSITORY
• Navigate to the lib folder
• $ git clone https://github.com/raspberrypi/pico-tflmicro.git
• This is a repository that:
• Includes the TensorFlow library (https://github.com/tensorflow/tflite-micro)
• BUT is already configured for the Pico
• Later we will show how to configure the TensorFlow library directly
- 64.
Pico-tflmicro
INCLUDE IN PROJECT
• In main.cpp add:

#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

• In CMakeLists.txt add:

add_subdirectory("lib/pico-tflmicro" EXCLUDE_FROM_ALL)
target_link_libraries(${PROJECT_NAME} pico_stdlib pico-tflmicro)
- 65.
Run Inference
INCLUDE TEST IMAGE
• Test image: an actual 48x48 image used for inference
• Included as image.h (an array with the data); eventually it should come from a camera
- 66.
Run Inference
INCLUDE MODEL
• A .tflite model can be converted to a .h file via a command:
• $ xxd -i model_name.tflite > new_model_name.h
• Open the generated header and make the array a const unsigned char!
- 67.
Run Inference
MAIN.CPP - NECESSARY INCLUDES

#include <stdio.h>
#include "pico/stdlib.h"
#include "hardware/gpio.h"
#include "pico/binary_info.h"

/* Specific includes for TensorFlow Lite for Microcontrollers. */
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"

/* The image that will be tested. */
#include "image.h"

/* The trained model, converted for TFLu, within a C header file. */
#include "../models/model.h"

/* As per the Raspberry Pi Pico pinout documentation. */
#define LED_PIN 28

/* Instead of the full ops resolver (~128 operations), register only the 9 operations the model needs. */
using AllOpsResolver_t = tflite::MicroMutableOpResolver<9>;
- 68.
Run Inference
MAIN.CPP - FUNCTIONS TO PREPROCESS THE IMAGE (NEEDED FOR CORRECT INPUT TO THE MODEL)

/* Rescaling to perform operations on data of similar scale. */
float rescaling(float x, float scale, float offset) {
    return (x * scale) - offset;
}

/* Quantization procedure, i.e. moving from a number represented as a float to a number
   represented as an int8. */
int8_t quantize(float x, float scale, float zero_point) {
    return (x / scale) + zero_point;
}
- 69.
Run Inference
MAIN.CPP - INITIALIZE LED, LABELS AND IMAGE SIZE

/* Program entry point. */
int main() {
    /* Initialisation of the standard lib for input/output. */
    stdio_init_all();

    /* Initialisation of the LED pin, as an output pin, with LOW initial value. */
    gpio_init(LED_PIN);
    gpio_set_dir(LED_PIN, GPIO_OUT);
    gpio_put(LED_PIN, 0);

    /* Image dimensions (48, 48) on 3 channels (RGB). */
    int Npix = 48;
    int Nchan = 3;
    int Nlabels = 8;

    /* The 8 possible labels for the classifier, as strings for the serial output. */
    const char *label[] = {"aeroplane", "boat", "bus", "car", "motorbike", "none", "person", "train"};
- 70.
Run Inference
MAIN.CPP - INITIALIZE TFLITE-MICRO OBJECTS

    /* Initialisation of the TFLu interpreter. */
    static const tflite::Model* tflu_model = nullptr;
    static tflite::MicroInterpreter* tflu_interpreter = nullptr;
    static TfLiteTensor* tflu_i_tensor = nullptr;
    static TfLiteTensor* tflu_o_tensor = nullptr;

    /* The ops resolver and error reporter. */
    static AllOpsResolver_t op_resolver;
    static tflite::MicroErrorReporter micro_error_reporter;
    tflite::ErrorReporter* error_reporter = &micro_error_reporter;
- 71.
Run Inference
MAIN.CPP - ADD THE OPERATIONS INCLUDED IN YOUR MODEL

    op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
    op_resolver.AddDepthwiseConv2D(tflite::Register_DEPTHWISE_CONV_2D_INT8());
    op_resolver.AddPad();
    op_resolver.AddAdd(tflite::Register_ADD_INT8());
    op_resolver.AddRelu6();
    op_resolver.AddMean();
    op_resolver.AddSoftmax(tflite::Register_SOFTMAX_INT8());
    op_resolver.AddFullyConnected(tflite::Register_FULLY_CONNECTED_INT8());
    op_resolver.AddDequantize();
- 72.
Run Inference
MAIN.CPP - MORE INITIALIZATION + INCLUDING THE MODEL

    /* Allocation of the tensor arena, in the heap. */
    constexpr int tensor_arena_size = 144000;
    uint8_t *tensor_arena = nullptr;
    tensor_arena = (uint8_t *)malloc(tensor_arena_size);

    /* Initializing the scaling values. */
    float scaling_scale = 1.0f / 127.5f;
    float scaling_offset = -1.0f;

    /* Retrieving the model from the header file. */
    tflu_model = ::tflite::GetModel(mobilenet48_int_input_tflite);

    /* Creating the interpreter and allocating tensors. */
    static tflite::MicroInterpreter static_interpreter(tflu_model, op_resolver, tensor_arena, tensor_arena_size);
    tflu_interpreter = &static_interpreter;
    TfLiteStatus allocate_status = tflu_interpreter->AllocateTensors();
    if (allocate_status != kTfLiteOk) {
        printf("Issue when allocating the tensors.\r\n");
    }
- 73.
Run Inference
MAIN.CPP - LINK INPUT/OUTPUT AND GET QUANTIZATION PARAMETERS

    /* Linking the interpreter to the input/output tensors. */
    tflu_i_tensor = tflu_interpreter->input(0);
    tflu_o_tensor = tflu_interpreter->output(0);

    /* Retrieving the quantization parameters from the model. */
    const auto* i_quantization = reinterpret_cast<TfLiteAffineQuantization*>(tflu_i_tensor->quantization.params);
    float tfluQuant_scale = i_quantization->scale->data[0];
    int32_t tfluQuant_zeropoint = i_quantization->zero_point->data[0];

    /* Indices and value initialization. */
    int idx = 0;
    float value = 0;
    float value_scaled = 0;
    int8_t value_quant = 0;
    int idx_tf = 0;
- 74.
Run Inference
MAIN.CPP - RESCALE AND QUANTIZE THE IMAGE AND GIVE IT AS INPUT TO THE MODEL

    /* Forever loop. */
    while (true) {
        /* Blinking the LED. */
        gpio_put(LED_PIN, 1);
        sleep_ms(1000);
        gpio_put(LED_PIN, 0);
        sleep_ms(1000);

        /* Preparing the input. */
        for (int i(0); i < Npix; i++) {
            for (int j(0); j < Npix; j++) {
                for (int k(0); k < Nchan; k++) {
                    /* Compute the 1D index. */
                    idx = k*Npix*Npix + j*Npix + i;
                    value = test_image[idx];
                    /* Re-scale then quantize the result. */
                    value_scaled = rescaling(value, scaling_scale, scaling_offset);
                    value_quant = quantize(value_scaled, tfluQuant_scale, tfluQuant_zeropoint);
                    /* Put the result in the input tensor. */
                    tflu_i_tensor->data.int8[idx] = value_quant;
                }
            }
        }
- 75.
Run Inference
MAIN.CPP - RUN INFERENCE AND PRINT THE RESULT

        /* Call the interpreter to infer a label. */
        TfLiteStatus invoke_status = tflu_interpreter->Invoke();

        /* Print the probabilities for each label over the serial connection. */
        printf("Result: [%f; %f; %f; %f; %f; %f; %f; %f].\n",
               tflu_o_tensor->data.f[0], tflu_o_tensor->data.f[1], tflu_o_tensor->data.f[2],
               tflu_o_tensor->data.f[3], tflu_o_tensor->data.f[4], tflu_o_tensor->data.f[5],
               tflu_o_tensor->data.f[6], tflu_o_tensor->data.f[7]);

        /* Retrieve the label with maximum likelihood. */
        size_t ix_max = 0;
        float pb_max = 0;
        for (size_t ix = 0; ix < Nlabels; ix++) {
            if (tflu_o_tensor->data.f[ix] > pb_max) {
                ix_max = ix;
                pb_max = tflu_o_tensor->data.f[ix];
            }
        }

        /* Print the most likely label over the serial connection. */
        printf("Result of inference: %s with proba %f.\n", label[ix_max], pb_max);
    }
- 77.
TFLite-micro
RESULT
• Result:
• aeroplane: 0.488
• boat: 0.446
• bus: 0.012
• car: 0.023
• motorbike: 0.000
• none: 0.004
• person: 0.004
• train: 0.012
- 79.
Three GitHubs
• TensorFlow → tflite-micro (just a subset) → pico-tflmicro (library already prepared for the Pico)
• Route 1: pico-tflmicro, used in the demo until now
• Route 2: tflite-micro directly, the alternative route shown in the next slides
- 80.
TFLite-Micro
ONLY USING THE TENSORFLOW GITHUB
• Navigate to the lib folder
• $ git clone https://github.com/tensorflow/tflite-micro.git
• $ cd tflite-micro
• $ make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m0plus OPTIMIZED_KERNEL_DIR=cmsis_nn microlite
• The TARGET and TARGET_ARCH are specific to the hardware (the PICO in this case)
- 81.
TFLite-Micro
CMAKELISTS.TXT

add_compile_definitions(TF_LITE_STATIC_MEMORY=1)
# tflite-micro library
target_link_libraries(${PROJECT_NAME} pico_stdlib
    $ENV{TFLITE_MICRO_PATH}/gen/cortex_m_generic_cortex-m0plus_default/lib/libtensorflow-microlite.a)
# Includes needed for the code in main.cpp
target_include_directories(${PROJECT_NAME} PRIVATE $ENV{TFLITE_MICRO_PATH}/)
target_include_directories(${PROJECT_NAME} PRIVATE
    $ENV{TFLITE_MICRO_PATH}/tensorflow/lite/micro/tools/make/downloads/flatbuffers/include/)
target_include_directories(${PROJECT_NAME} PRIVATE
    $ENV{TFLITE_MICRO_PATH}/tensorflow/lite/micro/tools/make/downloads/gemmlowp/)
- 82.
TFLite-Micro
MAIN.CPP
• Most of the code in main.cpp is the same
• The following slides show the things that change
- 83.
TFLite-Micro
MAIN.CPP
• main.cpp:

#include "tensorflow/lite/micro/tflite_bridge/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
- 84.
TFLite-Micro
MAIN.CPP
• main.cpp:

const tflite::Model* model = nullptr;
tflite::MicroInterpreter* interpreter = nullptr;
TfLiteTensor* input = nullptr;
constexpr int kTensorArenaSize = 160000;
alignas(16) static uint8_t tensor_arena[kTensorArenaSize];
tflite::InitializeTarget();  // added
static tflite::MicroMutableOpResolver<9> op_resolver;  // replaces AllOpsResolver
- 85.
TFLite-Micro
MAIN.CPP
• main.cpp:

op_resolver.AddConv2D(tflite::Register_CONV_2D_INT8());
op_resolver.AddDepthwiseConv2D(tflite::Register_DEPTHWISE_CONV_2D_INT8());
op_resolver.AddPad();
op_resolver.AddAdd(tflite::Register_ADD_INT8());
op_resolver.AddRelu6();
op_resolver.AddMean();
op_resolver.AddSoftmax(tflite::Register_SOFTMAX_INT8());
op_resolver.AddFullyConnected(tflite::Register_FULLY_CONNECTED_INT8());
op_resolver.AddDequantize();
- 87.
Next Steps
We want to run inference live, with images coming from a ~5 € camera (OV7670 sensor).
How to get the images while keeping the corresponding CPU cost to a minimum?
• Counting on the CPU (software): efficient memory usage, fusing operations, …
• Not counting on the CPU (hardware): PIO, DMA (direct memory access)
- 88.
PIO
PROGRAMMABLE INPUT/OUTPUT
• A PIO instance contains 4 state machines
• A state machine is like a tiny processor that can execute a limited set of instructions
• The CPU loads the corresponding instructions, enabling/disabling the state machines
• It's a way to delegate some workload away from the CPU, e.g. the communication with the camera
• We wanted to show a quick example where the Raspberry Pi PICO blinks a LED without using the CPU, but… time is ticking!
- 89.
Edge AI is Multidisciplinary
Hardware, data science, software: all three are involved.
A. Wanting a first inference from scratch… Bad news: all 3 are needed.
B. Seeking to be state of the art… Bad news: mastering all 3 is needed.
C. A need is defined, hence technical specifications must be met (usually what must be achieved). Good news: you can insist on the most comfortable one(s).
- 90.
Edge AI Issues
Multidisciplinary expertise (hardware, data science, software) required… and…
• Fast evolving field (new models, new frameworks, new HW…)
• Many constraints to respect for the solution:
• Technical (inference time, memory usage, …)
• Human (appropriate to user expertise)
• Business (at an OK cost)
- 91.
Conclusion
• Beware of name-dropping
• First a need; then embedded machine learning is a possibility (and technical specifications are the finish line)
• Embedded machine learning is inherently multidisciplinary
• Very complex from scratch… but we are helped!
- 92.
Don't hesitate to contact us!
• Questions about this presentation?
• Miguel Lejeune (miguel.lejeune@sirris.be, +32 490 01 41 44)
• Vincent Lucas (vincent.lucas@sirris.be, +32 493 31 15 92)
• Questions about other technologies/Sirris offerings?
• Questions about funding?
• Bas Rottier (bas.rottier@sirris.be, +32 491 86 91 70)