https://learn.xnextcon.com/event/eventdetails/W20040610
This talk explains how to practically bring the power of convolutional neural networks and deep learning to memory and power-constrained devices like smartphones. You will learn various strategies to circumvent obstacles and build mobile-friendly shallow CNN architectures that significantly reduce the memory footprint and therefore make them easier to store on a smartphone;
The talk also dives into how to use a family of model compression techniques to prune the network size for live image processing, enabling you to build a CNN version optimized for inference on mobile devices. Along the way, you will learn practical strategies to preprocess your data in a manner that makes the models more efficient in the real world.
27. @PRACTICALDLBOOK@PRACTICALDLBOOK
TensorFlow Lite is Fast
27
Takes advantage of on-device
hardware acceleration
FlatBuffers
•Reduces code footprint,
memory usage
•Reduces CPU cycles on
serialization and
deserialization
•Improves startup time
Pre-fused activations
Combining batch
normalization layer with
previous Convolution
Static memory and static
execution plan
Decreases load time
31. @PRACTICALDLBOOK@PRACTICALDLBOOK
ML Kit
31
Simple Abstraction over
TensorFlow Lite
Built in APIs for
Image Labeling, OCR, Face
Detection, Barcode
scanning, Landmark
detection, Smart reply
Model management
with Firebase
Upload model on web
interface to distribute
A/B Testing
41. @PRACTICALDLBOOK 41
Alchemy by Fritz
https://alchemy.fritz.ai/
Python library to analyze and
estimate mobile performance
No need to deploy on mobile
43. @PRACTICALDLBOOK 43
Battery
Standpoint
Won’t AI inference kill the battery quickly?
Answers: You don’t usually run AI models constantly, you run
it for a few seconds.
With a modern flagship phone, running Mobilenet
at 30 fps should burn battery in 2-3 hours.
Bigger question - Do you really need to run it at 30
FPS? Or could it be run 1 FPS?
46. @PRACTICALDLBOOK
Seeing AI
Audible Barcode recognition
Aim: Help blind users identify products using barcode
Issue: Blind users don’t know where the barcode is
Solution: Guide user in finding a barcode with audio cues
46
47. @PRACTICALDLBOOK
AR Hand Puppets
Hart Woolery from 2020CV
Object Detection (Hand) + Key Point Estimation
47
[https://twitter.com/2020cv_inc/status/1093219359676280832]
AR Hand Puppets, Hart Woolery from 2020CV, Object Detection (Hand) + Key Point Estimation
58. @PRACTICALDLBOOK
Model Pruning
Aim: Remove all connections with absolute weights below a threshold
58
Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
59. @PRACTICALDLBOOK
Pruning in Keras
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
59
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)),
tf.keras.layers.Dropout(0.2),
prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
])
60. @PRACTICALDLBOOK
So many techniques - so little time!
Quantization
Weight sharing
Channel pruning
Filter pruning (ThiNet)
Better Layers (Dilated Conv, HetConv, OctConv)
Knowledge Distillation
Binary networks (BNN, XNOR-Net)
Lottery Ticket Hypothesis
and many more ...
60
64. @PRACTICALDLBOOK@PRACTICALDLBOOK
AutoML – Let AI Design an Efficient Arch
64
Neural Architecture Search (NAS) - An
automated approach for designing models using
reinforcement learning while maximizing
accuracy.
Hardware Aware NAS = Maximizes accuracy
while minimizing run-time on device
Incorporates latency information into the reward
objective function
Measure real-world inference latency by
executing on a particular platform
1.5x faster than MobileNetV2 (MnasNet)
ResNet-50 accuracy with 19x less parameters
SSD300 mAP with 35x less FLOPs
66. @PRACTICALDLBOOK
ProxylessNAS – Per Hardware Tuned CNNs
66
Han Cai and Ligeng Zhu and Song Han, "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware", ICLR 2019
69. @PRACTICALDLBOOK
On-Device Training in Core ML
69
let updateTask = try MLUpdateTask(
forModelAt: modelUrl,
trainingData: trainingData,
configuration: configuration,
completionHandler: { [weak self]
self.model = context.model
context.model.write(to: newModelUrl)
})
⁻ Core ML 3 introduced on device learning
⁻ Never have to send training data to the server with the
help of MLUpdateTask.
⁻ Schedule training when device is charging to save power
72. @PRACTICALDLBOOK 72
TensorFlow
Federated
Train a global model using 1000s of devices
without access to data
Encryption + Secure Aggregation Protocol
Can take a few days to wait for aggregations to
build up
https://github.com/tensorflow/federated
73. @PRACTICALDLBOOK
What We Learnt Today
73
⁻ Why deep learning on mobile?
⁻ Building a model
⁻ Running a model
⁻ Hardware factors
⁻ Benchmarking
⁻ State-of-the-art applications
⁻ Making a model more efficient
⁻ Federated Learning