-
1.
@PRACTICALDLBOOK
Deep Learning On
Mobile
A PRACTITIONER’S GUIDE
-
2.
@PRACTICALDLBOOK 2
-
3.
@PRACTICALDLBOOK
Deep Learning On
Mobile
A PRACTITIONER’S GUIDE
-
4.
@PRACTICALDLBOOK
@SiddhaGanju
@MeherKasam
@AnirudhKoul
4
-
5.
@PRACTICALDLBOOK
Why Deep Learning on Mobile?
Privacy Reliability
Cost Latency
5
-
6.
@PRACTICALDLBOOK 6
https://media.giphy.com/media/fBzSGPMxD0isw/giphy.gif
-
7.
@PRACTICALDLBOOK
Latency Is Expensive!
7
100 milliseconds of latency → 1% loss
[Amazon 2008]
-
8.
@PRACTICALDLBOOK
Latency Is Expensive!
8
>3 sec load time → 53% bounce (mobile site visits)
[Google Research, Webpagetest.org]
-
9.
@PRACTICALDLBOOK
Power of 10
9
0.1 s - Seamless
1 s - Uninterrupted flow of thought
10 s - Limit of attention
[Miller 1968; Card et al. 1991; Nielsen 1993]
-
10.
@PRACTICALDLBOOK 10
Efficient Model + Efficient Mobile Inference Engine = DL App
-
11.
@PRACTICALDLBOOK
How to Train My
Model?
11
-
12.
@PRACTICALDLBOOK 12
Learn to Play
Melodica
3 Months
-
13.
@PRACTICALDLBOOK
Already
Play
Piano?
13
-
14.
@PRACTICALDLBOOK 14
FINE-TUNE
your skills
3 months → 1 week
-
15.
@PRACTICALDLBOOK
Fine-tuning
15
Assemble a dataset → Find a pre-trained model → Fine-tune the pre-trained model → Run using existing frameworks
“Don’t Be A Hero”
- Andrej Karpathy
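A minimal Keras sketch of this recipe (the data directory, class count, and hyperparameters are illustrative placeholders):
import tensorflow as tf

# Find a pre-trained model: MobileNetV2 trained on ImageNet, minus its classifier head
base = tf.keras.applications.MobileNetV2(weights="imagenet", include_top=False, pooling="avg")
base.trainable = False  # freeze the pre-trained weights

# Fine-tune: attach a small head for our own classes and train only that
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(2, activation="softmax"),  # e.g., two classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Assemble a dataset: one folder of images per class (path is illustrative)
datagen = tf.keras.preprocessing.image.ImageDataGenerator(
    preprocessing_function=tf.keras.applications.mobilenet_v2.preprocess_input)
train = datagen.flow_from_directory("data/train", target_size=(224, 224))
model.fit(train, epochs=5)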
-
16.
@PRACTICALDLBOOK
CustomVision.ai
16
Use the Fatkun browser extension to download images from a search engine, or use the Bing Image Search API to programmatically download photos with the proper usage rights
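A sketch of the programmatic route against the Bing Image Search v7 REST endpoint (the subscription key and query are placeholders):
import os
import requests

SUBSCRIPTION_KEY = "YOUR_KEY_HERE"  # placeholder
url = "https://api.cognitive.microsoft.com/bing/v7.0/images/search"
headers = {"Ocp-Apim-Subscription-Key": SUBSCRIPTION_KEY}
# license="public" asks Bing for images with permissive usage rights
params = {"q": "melodica", "license": "public", "imageType": "photo", "count": 50}

results = requests.get(url, headers=headers, params=params).json()
os.makedirs("images", exist_ok=True)
for i, img in enumerate(results["value"]):
    data = requests.get(img["contentUrl"]).content  # download each hit
    with open(f"images/{i}.jpg", "wb") as f:
        f.write(data)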
-
17.
@PRACTICALDLBOOK 17
-
18.
@PRACTICALDLBOOK 18
-
19.
@PRACTICALDLBOOK
Demo
19
-
20.
@PRACTICALDLBOOK
How Do I Run My Models?
20
-
21.
@PRACTICALDLBOOK 21
Core ML TF Lite ML Kit
-
22.
@PRACTICALDLBOOK
Apple Ecosystem
22
Metal
• 2014
BNNS + MPS
• 2016
Core ML
• 2017
Core ML 2
• 2018
Core ML 3
• 2019
- Tiny models (~ KB)!
- 1-bit model quantization support
- Batch API for improved performance
- Conversion support for MXNet, ONNX
- tf-coreml
-
23.
@PRACTICALDLBOOK
Apple Ecosystem
23
Metal
• 2014
BNNS + MPS
• 2016
Core ML
• 2017
Core ML 2
• 2018
Core ML 3
• 2019
- On-device training
- Personalization
- Create ML UI
-
24.
@PRACTICALDLBOOK
Core ML Benchmark
[Bar chart: execution time (ms) for ResNet-50, MobileNet, and SqueezeNet on iPhone 5s (2013), iPhone 6 (2014), iPhone 6s (2015), iPhone 7 (2016), iPhone X (2017), and iPhone XS (2018). GPUs became a thing here!]
24
https://heartbeat.fritz.ai/ios-12-core-ml-benchmarks-b7a79811aac1
-
25.
@PRACTICALDLBOOK
TensorFlow Ecosystem
25
TensorFlow
• 2015
TensorFlow Mobile
• 2016
TensorFlow Lite
• 2018
Smaller
Faster
Minimal dependencies
Allows running custom operators
-
26.
@PRACTICALDLBOOK
TensorFlow Lite is small
26
Core interpreter: 75 KB
Core interpreter + supported operations: 400 KB
TensorFlow Mobile: 1.5 MB
-
27.
@PRACTICALDLBOOK
TensorFlow Lite is Fast
27
Takes advantage of on-device hardware acceleration
FlatBuffers: reduce code footprint and memory usage, cut CPU cycles spent on serialization and deserialization, and improve startup time
Pre-fused activations: the batch normalization layer is folded into the preceding convolution
Static memory and static execution plan: decreases load time
-
28.
@PRACTICALDLBOOK
TensorFlow Ecosystem
28
TensorFlow
• 2015
TensorFlow Mobile
• 2016
TensorFlow Lite
• 2018
Smaller
Faster
Minimal dependencies
Allows running custom operators
-
29.
@PRACTICALDLBOOK
TensorFlow Ecosystem
29
TensorFlow
• 2015
TensorFlow Mobile
• 2016
TensorFlow Lite
• 2018
$ tflite_convert --keras_model_file=keras_model.h5 --output_file=foo.tflite
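The equivalent conversion from Python, using the TF 1.x-era converter API (newer releases use tf.lite.TFLiteConverter.from_keras_model on a live model object):
import tensorflow as tf

# Convert a saved Keras model to a TF Lite flat buffer
converter = tf.lite.TFLiteConverter.from_keras_model_file("keras_model.h5")
tflite_model = converter.convert()
with open("foo.tflite", "wb") as f:
    f.write(tflite_model)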
-
30.
@PRACTICALDLBOOK
TensorFlow Ecosystem
30
TensorFlow
• 2015
TensorFlow Mobile
• 2016
TensorFlow Lite
• 2018
Trained TensorFlow model → TF Lite Converter → .tflite model → Android app / iOS app
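Before wiring the .tflite file into either app, it can be sanity-checked with the Python interpreter (the dummy input simply matches whatever shape the model declares):
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path="foo.tflite")
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Feed a dummy tensor with the model's expected shape and dtype
dummy = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], dummy)
interpreter.invoke()
print(interpreter.get_tensor(output_details[0]["index"]))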
-
31.
@PRACTICALDLBOOK
ML Kit
31
Simple abstraction over TensorFlow Lite
Built-in APIs for image labeling, OCR, face detection, barcode scanning, landmark detection, and smart reply
Model management with Firebase: upload a model in the web interface to distribute it
A/B testing
-
32.
@PRACTICALDLBOOK 32
How Do I
Keep My IP
Safe?
-
33.
@PRACTICALDLBOOK
Fritz
Full-fledged mobile lifecycle support
Deployment, instrumentation, etc., from Python
33
-
34.
@PRACTICALDLBOOK
Recommendation for Product Development
34
Train a model
using
framework of
choice
Convert to
TensorFlow Lite
format
Upload to
Firebase
Deploy to
iOS/Android
apps with MLKit
-
35.
@PRACTICALDLBOOK
An Important Question
35
APP TOO BIG! WHAT DO?
Apple does not allow apps over 200 MB to be downloaded over a cellular network. Download the model on
demand, and interpret it on device instead.
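The on-demand pattern, sketched in Python for brevity (the hosting URL is a placeholder; an iOS app would do the download with URLSession, an Android app with DownloadManager):
import os
import requests
import tensorflow as tf

MODEL_URL = "https://example.com/models/foo.tflite"  # placeholder hosting URL
MODEL_PATH = "foo.tflite"

# Fetch the model the first time it is needed instead of bundling it in the binary
if not os.path.exists(MODEL_PATH):
    with open(MODEL_PATH, "wb") as f:
        f.write(requests.get(MODEL_URL).content)

interpreter = tf.lite.Interpreter(model_path=MODEL_PATH)
interpreter.allocate_tensors()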
-
36.
@PRACTICALDLBOOK 36
What Effect Does
Hardware have on
Performance?
-
37.
@PRACTICALDLBOOK
Big Things Come In Small Packages
37
-
38.
@PRACTICALDLBOOK
Effect of Hardware
L-R: iPhone XS,
iPhone X, iPhone 5
38
https://twitter.com/matthieurouif/status/1126575118812110854?s=11
-
39.
@PRACTICALDLBOOK
TensorFlow Lite Benchmarks
Numericcal's Alpha Lab: http://alpha.lab.numericcal.com/
-
40.
@PRACTICALDLBOOK 40
TensorFlow Lite Benchmarks
Crowdsourced AI Benchmark app by Andrey Ignatov from ETH Zurich: http://ai-benchmark.com/
-
41.
@PRACTICALDLBOOK 41
Alchemy by Fritz
https://alchemy.fritz.ai/
Python library to analyze and
estimate mobile performance
No need to deploy on mobile
-
42.
@PRACTICALDLBOOK 42
User
Experience
Standpoint
TO GET 95%+ USER COVERAGE,
SUPPORT PHONES RELEASED IN THE
PAST 3.5 YEARS
IF NOT POSSIBLE, OFFER GRACEFUL
DEGRADATION
-
43.
@PRACTICALDLBOOK 43
Battery
Standpoint
Won’t AI inference kill the battery quickly?
Answer: You don’t usually run AI models constantly; you run them for a few seconds at a time.
On a modern flagship phone, running MobileNet at 30 FPS would drain the battery in 2-3 hours.
Bigger question: do you really need to run it at 30 FPS, or could it run at 1 FPS?
-
44.
@PRACTICALDLBOOK
Energy Reduction from 30 FPS to 1 FPS
44
iPad Pro 2017
-
45.
@PRACTICALDLBOOK 45
What Exciting
Applications Can I
Build?
-
46.
@PRACTICALDLBOOK
Seeing AI
Audible barcode recognition
Aim: Help blind users identify products using barcodes
Issue: Blind users don’t know where the barcode is
Solution: Guide the user toward the barcode with audio cues
46
-
47.
@PRACTICALDLBOOK
AR Hand Puppets
Hart Woolery from 2020CV
Object Detection (Hand) + Key Point Estimation
47
[https://twitter.com/2020cv_inc/status/1093219359676280832]
-
48.
@PRACTICALDLBOOK 48
Zero-Gravity Space, Takahiro Horikawa, Mask R-CNN (segmentation) + PixMix (image in-painting) + Unity (physics)
-
49.
@PRACTICALDLBOOK 49
[HomeCourt.ai] Object Detection (Ball, Hoop, Player) + Body Pose + Perspective Transformation
-
50.
@PRACTICALDLBOOK 50
Polarr, Machine-Guided Composition: automated cropping with the highest aesthetic score
-
51.
@PRACTICALDLBOOK
Remove Objects
Brian Schulman, Adventurous Co.
Object Segmentation + Image In-Painting
51
https://twitter.com/smashfactory/status/1139461813710442496
-
52.
@PRACTICALDLBOOK
Magic Sudoku App
Edge Detection + Classification + ARKit
52
https://twitter.com/braddwyer/status/910030265006923776
-
53.
@PRACTICALDLBOOK
People Segmentation
ARKit
Abound Labs: https://www.aboundlabs.com/
53
https://twitter.com/nobbis/status/1135975245406515202
-
54.
@PRACTICALDLBOOK
Snapchat
54
Face Swap
GANs
-
55.
@PRACTICALDLBOOK
Can I Make My Model Even More
Efficient?
55
-
56.
@PRACTICALDLBOOK
How To Find Efficient Pre-Trained Models
56
Papers with Code
https://paperswithcode.com/sota
Model Zoo
https://modelzoo.co
-
57.
@PRACTICALDLBOOK 57
What you want vs. what you can afford
-
58.
@PRACTICALDLBOOK
Model Pruning
Aim: Remove all connections with absolute weights below a threshold
58
Song Han, Jeff Pool, John Tran, William J. Dally, "Learning both Weights and Connections for Efficient Neural Networks", 2015
-
59.
@PRACTICALDLBOOK
Pruning in Keras
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
tf.keras.layers.Dense(512, activation=tf.nn.relu),
tf.keras.layers.Dropout(0.2),
tf.keras.layers.Dense(10, activation=tf.nn.softmax)
])
59
model = tf.keras.models.Sequential([
tf.keras.layers.Flatten(),
prune.Prune(tf.keras.layers.Dense(512, activation=tf.nn.relu)),
tf.keras.layers.Dropout(0.2),
prune.Prune(tf.keras.layers.Dense(10, activation=tf.nn.softmax))
])
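To actually train and export the pruned model, the sparsity schedule must be advanced by a callback and the wrappers stripped afterward. A sketch continuing from the model above (the stand-in data shapes are illustrative):
import numpy as np
import tensorflow_model_optimization as tfmot

# Stand-in data; a real app would use its own training set
x_train = np.random.rand(256, 28, 28).astype("float32")
y_train = np.random.randint(0, 10, size=(256,))

model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# UpdatePruningStep advances the sparsity schedule on every training step
model.fit(x_train, y_train, epochs=2,
          callbacks=[tfmot.sparsity.keras.UpdatePruningStep()])

# Strip the pruning wrappers so the exported model is plain Keras again
final_model = tfmot.sparsity.keras.strip_pruning(model)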
-
60.
@PRACTICALDLBOOK
So many techniques - so little time!
Quantization (see the sketch after this list)
Weight sharing
Channel pruning
Filter pruning (ThiNet)
Better Layers (Dilated Conv, HetConv, OctConv)
Knowledge Distillation
Binary networks (BNN, XNOR-Net)
Lottery Ticket Hypothesis
and many more ...
60
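Of that list, post-training quantization is the cheapest to try: one flag on the TF Lite converter stores weights as 8-bit integers for a roughly 4x smaller model (a sketch; "model.h5" is a placeholder):
import tensorflow as tf

converter = tf.lite.TFLiteConverter.from_keras_model_file("model.h5")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables weight quantization
with open("model_quant.tflite", "wb") as f:
    f.write(converter.convert())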
-
61.
@PRACTICALDLBOOK 61
The one with
the best thing
ever
-
62.
@PRACTICALDLBOOK
PocketFlow – 1 Line to Make a Model Efficient
Tencent AI Lab created an Automatic Model Compression (AutoMC) framework
62
-
63.
@PRACTICALDLBOOK 63
Can I design a better
architecture myself?
Maybe? But AI can
do it much better!
-
64.
@PRACTICALDLBOOK
AutoML – Let AI Design an Efficient Arch
64
Neural Architecture Search (NAS): an automated approach to designing models using reinforcement learning while maximizing accuracy.
Hardware-aware NAS maximizes accuracy while minimizing run time on the device:
- Incorporates latency information into the reward objective function
- Measures real-world inference latency by executing on the target platform
Results: 1.5x faster than MobileNetV2 (MnasNet); ResNet-50 accuracy with 19x fewer parameters; SSD300 mAP with 35x fewer FLOPs
-
65.
@PRACTICALDLBOOK 65
Evolution of Mobile NAS Methods
Method          | Top-1 Acc (%) | Pixel 1 Runtime (ms) | Search Cost (GPU Hours)
MobileNetV1     | 70.6          | 113                  | Manual
MobileNetV2     | 72.0          | 75                   | Manual
MnasNet         | 74.0          | 76                   | 40,000 (4+ years)
ProxylessNAS-R  | 74.6          | 78                   | 200
Single-Path NAS | 74.9          | 79.5                 | 3.75
-
66.
@PRACTICALDLBOOK
ProxylessNAS – Per-Hardware Tuned CNNs
66
Han Cai, Ligeng Zhu, and Song Han, "ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware", ICLR 2019
-
67.
@PRACTICALDLBOOK 67
-
68.
@PRACTICALDLBOOK
Can I Train on a Mobile Device?
68
-
69.
@PRACTICALDLBOOK
On-Device Training in Core ML
69
let updateTask = try MLUpdateTask(
    forModelAt: modelUrl,
    trainingData: trainingData,
    configuration: configuration,
    completionHandler: { [weak self] context in
        self?.model = context.model
        try? context.model.write(to: newModelUrl)
    })
updateTask.resume()
- Core ML 3 introduced on-device training
- You never have to send training data to the server, thanks to MLUpdateTask
- Schedule training while the device is charging to save power
-
70.
@PRACTICALDLBOOK
Can I Train a Global Model
Without Access to User Data?
70
-
71.
@PRACTICALDLBOOK 71
FEDERATED LEARNING!!!
https://federated.withgoogle.com/
-
72.
@PRACTICALDLBOOK 72
TensorFlow Federated
Train a global model using thousands of devices without access to the data
Encryption + secure aggregation protocol
Can take a few days for enough aggregations to build up
https://github.com/tensorflow/federated
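A minimal sketch of the Federated Averaging loop in TensorFlow Federated (API names as in the early TFF releases; model_fn and the per-client datasets are placeholders you must supply):
import tensorflow_federated as tff

def model_fn():
    # Placeholder: must return a tff.learning.Model wrapping your Keras model
    ...

# Each round averages model updates from many clients without seeing their data
process = tff.learning.build_federated_averaging_process(model_fn)
state = process.initialize()
for round_num in range(10):
    # federated_train_data: a list of tf.data.Datasets, one per simulated client
    state, metrics = process.next(state, federated_train_data)
    print(round_num, metrics)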
-
73.
@PRACTICALDLBOOK
What We Learnt Today
73
- Why deep learning on mobile?
- Building a model
- Running a model
- Hardware factors
- Benchmarking
- State-of-the-art applications
- Making a model more efficient
- Federated learning
-
74.
@PRACTICALDLBOOK
How to Access the Slides
in 1 Second
HTTP://PRACTICALDL.AI
@PRACTICALDLBOOK
-
75.
@PRACTICALDLBOOK
@SiddhaGanju
@MeherKasam
@AnirudhKoul
75
-
76.
@PRACTICALDLBOOK
[M] Let’s just say that in the 21st century, the human attention span is a bit low.
Amazon published the results of an A/B test they ran way back in 2008. They found that every 100 ms increase in latency correlated with a 1% decrease in profits. Imagine what a couple of seconds can do? Actually, we don’t need to imagine.
We have the results of a Google study from about a decade later, which found that a load time of 3 seconds or more on a mobile website resulted in a 53% probability of the user leaving the page.
If you ever had to tell someone the Moore’s Law equivalent for human attention span, you can refer them to these numbers, which have been shown to hold true as early as 1968 and as late as 1993. It’s very possible it’s gotten worse since push notifications became a thing.
The study found that:
0.1 second is about the limit where the user feels that the system is reacting instantly, like typing characters on the keyboard and having them appear on the screen. No special feedback is necessary except to display the result.
1.0 second is about the limit for the user's flow of thought to stay uninterrupted, even though the user will notice the delay. Normally, no special feedback is necessary during delays of more than 0.1 but less than 1.0 second, although the user does lose the feeling of operating directly on the data. Or about the amount of time I spend thinking before buying the next Apple product.
10 seconds is about the limit for keeping the user's attention focused on the dialogue. For longer delays, users will want to perform other tasks while waiting for the computer to finish, so they should be given feedback indicating when the computer expects to be done. Feedback during the delay is especially important if the response time is likely to be highly variable, since users will then not know what to expect.
At a high level, the recipe for a great deep learning app consists of two things: an efficient inference engine and an efficient model. So how do we train a model?
You don't need Microsoft’s or Google’s ocean-boiling GPU clusters.
We’ve looked at how to do fine-tuning, but we figured it’d be cool to demo it live. They say doing a live demo is a bad idea, but what are we if not full of bad ideas? In the spirit of adventure, let’s do it.
[S] We’ve been discussing fine-tuning in the context of images, but let’s try it out on a harder input: audio. In the interest of time, we won’t show the process of actually collecting the data; all we did was collect or record some audio files for each class. We collected applause because we know there’s going to be a lot of it in this session.
It’s the same revolution that desktop GPUs brought about. Core ML models run 38% faster on iOS 12 compared to iOS 11. We’re just at the beginning of an incredible wave of mobile experiences powered by on-device machine learning. Processors like the A12 are going to make it happen.
Minimal dependencies -> Easier to package and deploy
Just one line of code!
On 2 billion devices. Google Assistant is on 1 billion devices. Photos, Gboard, Gmail, Nest!
Develop one .tflite model for both iOS and Android apps
[S] Even though an iPhone XS looks smaller than a MacBook Air, guess what: it’s stronger.
I could give you the numbers, but they say showing is better than telling. What’s the result of all that powerful hardware?
Additionally, it gives a layer-by-layer breakdown of how much processing power each layer needs.
2015 was the year of the GPU. Aim for 10 FPS on the oldest supported phones for a real-time UX.
[S]
As we have all painfully experienced, in real life what you really want is not always what you can afford.
And it’s the same in machine learning.
We all know deep learning works if you have large GPU servers, but what about when you want to run it on a three-year-old device?
What’s the number one limitation? It turns out to be memory.
If you look at ImageNet models over the years, AlexNet started at around 240 megabytes. VGG was over half a gig.
So the question we’ll solve now is how to get these neural networks to do these amazing things while keeping a very small memory footprint.
Pruning redundant, non-informative weights in a previously trained network reduces the size of the network at inference time.
Take a network, prune it, and then retrain the remaining connections.
Train, prune, and retrain, all in a loop.
Art Vandelay here asks a very important question.