
Deep learning on microcontrollers - IETF 101 - T2TRG

Machine learning is cool, but it usually requires clusters of GPUs. What if we could run machine learning on the edge instead? It would enable new use cases such as sensor fusion, federated learning, or aggressive compression through autoencoders. uTensor and CMSIS-NN make this possible on microcontrollers that cost less than $1.

Videos in this presentation:

Slide 12: https://www.youtube.com/watch?v=FhbCAd0sO1c
Slide 23: https://twitter.com/janjongboom/status/953014129580748800
Slide 25: https://www.youtube.com/watch?v=PdWi_fvY9Og

Presentation for the Thing-to-Thing Research Group (T2TRG) during IETF 101 in London.

Deep learning on microcontrollers - IETF 101 - T2TRG

  1. Slide 1: Deep learning on microcontrollers. Jan Jongboom, IETF 101, London, 22 March 2018. https://twitter.com/billyd/status/701804370560729089
  2. Slide 3: Machine learning
  3. Slide 4: Why machine learning on the edge? Sensor fusion. http://www.gierad.com/projects/supersensor/
  4. Slide 5: Why machine learning on the edge? Federated learning. https://research.googleblog.com/2017/04/federated-learning-collaborative.html
  5. Slide 6: Why machine learning on the edge? LPWANs (compressing sensor data with autoencoders). http://ufldl.stanford.edu/tutorial/unsupervised/Autoencoders/
  6. Slide 7: Why machine learning on the edge? Offline, self-contained systems. https://os.mbed.com/blog/entry/streaming-data-cows-dsa2017/
  7. Slide 8: Edge vs. cloud: environmental data model. American Society of Heating, Refrigerating and Air-Conditioning Engineers (ASHRAE) RP-884 Adaptive Model Project.
  8. Slide 9: Microcontrollers. Small (1 cm²), cheap (~$1), efficient (0.3 µA standby). Downsides: slow (max. 100 MHz) and limited memory (max. 256 KB RAM). (The pictured board measures 2.7 cm.)
  9. Slide 10: uTensor: machine learning for microcontrollers. Runs in <256 KB RAM; TensorFlow compatible; built on top of Mbed OS 5 (file systems, drivers, compatible with 150 boards); open source, Apache 2.0 license. (See the inference sketch below.)
  10. Slide 11: uTensor team: Neil Tan (Arm), Michael Bartling (Arm), Dboy Liao (Piniko), Kazami Hsieh (Academia Sinica).
  11. Slide 12: (video; see the link in the description above)
  12. Slide 13: How? The MNIST data set: a training set of 60,000 images, every drawing downsampled to 28x28 pixels. Supervised learning through backpropagation. https://github.com/uTensor/tf-node-viewer/blob/master/deep_mlp.py
  13. Slide 14: Multi-layer perceptron (MLP) classification. 28x28 = 784 input values per image; each neuron applies a matrix multiplication (weights) and a bias, then an activation function; 10 potential outputs (the digits 0-9). (See the dense-layer sketch below.)
  14. Slide 15: How?
  15. Slide 16: Quantization. 8-bit integers instead of 32-bit floats, applied only during classification. 79.9% vs. 80.3% accuracy (CIFAR-10). TensorFlow requires floating-point de-quantization between layers. https://petewarden.com/2016/05/03/how-to-quantize-neural-networks-with-tensorflow/ (See the quantization sketch below.)
  16. Slide 17: Memory usage. The matrix multiplication in the first hidden layer dominates RAM usage: input elements: 784; neurons in the first layer: 128; weights (input to first layer): 128 * 784; resulting pre-activation values: 128; data type: 8-bit integer (1 byte). 1 byte * (784 + (128 * 784) + 128) = 101,264 bytes, about 98.9 KiB. (Worked out below.)
  17. Slide 18: Other tricks. Paging of memory for larger models (sacrifices speed); keeping the graph in ROM (requires pre-processing); taking advantage of sparsity in the data, sacrificing accuracy (TBD).
  18. Slide 19: Operators. Add, Subtract; Min, Max, ArgMax; ReLU, matrix multiplication, Reshape, Quantization; Convolution (WIP); Pooling (WIP); Networked (TBD).
  19. Slide 20: Tensors. RAM tensor, Flash tensor, Sparse tensor, Networked tensor. Tensors can be paged to fit larger networks.
  20. Slide 21: Workflow (diagram): matrix multiplication data from the trained model; tensor graph.
  21. Slide 22: TensorBoard vs. uTensor
  22. Slide 23: Developing using the simulator (video; see the link in the description above)
  23. Slide 24: CMSIS-NN. New neural network kernel functions that leverage the DSP/SIMD instructions in silicon, delivering a 4-5x speedup. Hardware acceleration for convolution, pooling, etc. uTensor will be built on top of CMSIS-NN. (See the CMSIS-NN sketch below.)
  24. Slide 26: Speed. Measured on a Cortex-M7 at 216 MHz, 133 KB of memory used; 3 convolution layers, 1 fully connected layer, 32x32 color images. (Slide 25 is a video; see the link in the description above.)
  25. Slide 27: Recap. 1. Buy a development board (http://os.mbed.com/platforms) 2. Clone uTensor (https://github.com/uTensor/uTensor) 3. ??? 4. PROFIT!!!
  26. Slide 28: Thank you! Jan Jongboom, Arm. https://labs.mbed.com
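
A few sketches expanding on the technical slides follow.

Slide 10 in code: what uTensor inference looks like on an Mbed OS 5 board. This is a minimal sketch modeled on the uTensor MNIST example of the time; the generated names (`get_deep_mlp_ctx`, the `"y_pred:0"` output tensor, `models/deep_mlp.hpp`) come from that example and will differ for other models.

```cpp
// Sketch of uTensor inference on Mbed OS 5 (slide 10), modeled on the
// uTensor MNIST example; get_deep_mlp_ctx() and "y_pred:0" are names
// generated for that specific model, so treat them as placeholders.
#include "models/deep_mlp.hpp"  // graph code generated from the TF model
#include "tensor.hpp"

int predict_digit(float* pixels) {  // 784 grayscale values, one 28x28 image
    Context ctx;
    // Wrap the raw input buffer in a 1x784 RAM tensor
    Tensor* input_x = new WrappedRamTensor<float>({1, 784}, pixels);
    get_deep_mlp_ctx(ctx, input_x);  // build the tensor graph
    ctx.eval();                      // run inference on-device
    S_TENSOR pred = ctx.get("y_pred:0");
    return *(pred->read<int>(0, 0)); // index of the most likely digit
}
```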
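
Slide 14 in one function: a dense layer multiplies the input by a weight matrix, adds a bias, and applies an activation function (ReLU here). A plain float sketch; for the MNIST MLP the first layer has 784 inputs and 128 neurons.

```cpp
#include <algorithm>
#include <cstddef>

// One fully connected (dense) layer: out = relu(weights * in + bias).
// For the first MNIST MLP layer (slide 14): n_in = 784, n_out = 128.
void dense_relu(const float* in, std::size_t n_in,
                const float* weights,  // n_out x n_in, row-major
                const float* bias,     // n_out entries
                float* out, std::size_t n_out) {
    for (std::size_t o = 0; o < n_out; ++o) {
        float acc = bias[o];
        for (std::size_t i = 0; i < n_in; ++i) {
            acc += weights[o * n_in + i] * in[i];
        }
        out[o] = std::max(acc, 0.0f);  // ReLU activation
    }
}
```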
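
Slide 16's quantization scheme, as described in the linked Pete Warden post: record each tensor's float minimum and maximum, and map that range linearly onto 0..255. A sketch (assumes max > min):

```cpp
#include <cmath>
#include <cstddef>
#include <cstdint>

// Affine (min/max) quantization: map floats in [min, max] onto 0..255.
void quantize_u8(const float* x, std::size_t n, float min, float max,
                 std::uint8_t* q) {
    const float scale = (max - min) / 255.0f;
    for (std::size_t i = 0; i < n; ++i) {
        q[i] = static_cast<std::uint8_t>(std::lround((x[i] - min) / scale));
    }
}

// De-quantization back to float, the step TensorFlow inserts between layers.
void dequantize_u8(const std::uint8_t* q, std::size_t n, float min, float max,
                   float* x) {
    const float scale = (max - min) / 255.0f;
    for (std::size_t i = 0; i < n; ++i) {
        x[i] = min + q[i] * scale;
    }
}
```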
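
And slide 17's arithmetic, verified: with 1-byte values, the first dense layer needs buffers for the input vector, the weight matrix, and the pre-activation outputs.

```cpp
#include <cstdio>

// Memory accounting for the first dense layer (slide 17), 1 byte per value.
int main() {
    const unsigned inputs  = 784;               // 28x28 pixels
    const unsigned neurons = 128;               // first hidden layer
    const unsigned weights = neurons * inputs;  // 100,352 weights
    const unsigned total   = inputs + weights + neurons;
    std::printf("%u bytes = %.3f KiB\n", total, total / 1024.0);
    // Prints: 101264 bytes = 98.891 KiB
    return 0;
}
```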
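
Finally, slide 24: the same dense layer through CMSIS-NN's q7 (8-bit fixed-point) kernels. The `arm_fully_connected_q7` and `arm_relu_q7` calls below use the signatures from the early-2018 CMSIS-NN release; the `bias_shift` and `out_shift` values are placeholders, since the real ones depend on the fixed-point format chosen when the model was quantized.

```cpp
#include "arm_nnfunctions.h"  // CMSIS-NN kernel declarations (CMSIS 5)

// First MNIST dense layer via the CMSIS-NN q7 kernel (slide 24).
void dense_q7(const q7_t input[784], const q7_t weights[128 * 784],
              const q7_t bias[128], q7_t output[128]) {
    static q15_t scratch[784];  // working buffer required by the kernel
    arm_fully_connected_q7(input, weights,
                           784,  /* dim_vec: input length          */
                           128,  /* num_of_rows: number of neurons */
                           0,    /* bias_shift (placeholder)       */
                           7,    /* out_shift (placeholder)        */
                           bias, output, scratch);
    arm_relu_q7(output, 128);   // in-place ReLU on the 8-bit outputs
}
```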
