Artificial intelligence at the edge

Jameson Toole
Artificial Intelligence at the Edge
ODSC Meetup, 2018

Jameson Toole · Artificial Intelligence at the Edge · ODSC
Before we get started...
Demo + Code
● Community: heartbeat.fritz.ai
● App: bit.ly/tryheartbeat
● Code: bit.ly/heartbeatsource
About me
● School: Michigan, MIT
● Background: Data science / ML
● Now: Cofounder / CEO, Fritz

Edge Intelligence with Centralized Learning
● Sense: everything, in real time
● Infer: AI extracts relevance
● Act: based on intelligence
● Learn: centralized
Source: The End of Cloud Computing by Peter Levine

60-frames-per-second problems

Compute doesn’t grow on trees

Edge devices will dominate AI inference

Growth of AI Fueled by Edge Computing
● Now: Edge intelligence
● Next: Mobile dominates AI inference
● Future: Training on the edge

State of Edge Computing
1. Stack Mismatch: Python in the cloud vs Swift on the edge.
2. Ninja Unicorns Wanted: Need engineers who know both ML and mobile.
3. Fragile Infrastructure: Long chain from R&D to production. Failures are silent.
4. Heterogeneous Hardware: Thousands of devices, processors, and operating systems.

Magic Sudoku

InstaSaber

Weightlifting App (iOS)

DensePose

https://www.eff.org/ai/metrics

Lifecycle Management
Expectation management
● What’s different between cloud and edge?
● Lots of talk about deep learning (but traditional ML is fun too!)
● Tools and concepts

Lifecycle Management: Overview
The Life of a Mobile Model
Collect -> Train -> Optimize -> Convert -> Protect -> Deploy -> Monitor

Collect
Open Data
● Image Recognition: ImageNet, CIFAR
● Object Detection: COCO, KITTI
● OCR: EMNIST, Tesseract
● Exhaustive List: deeplearning4j.org/opendata

Collect - Data Augmentation
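from PIL import Image, ImageFilter
import numpy as np

img = Image.open(img_data)  # img_data: path or file-like object for one training image
rot_img = img.rotate(85)  # rotate 85 degrees counterclockwise
blur_img = img.filter(ImageFilter.BLUR)  # soften detail, like an out-of-focus camera
dark_img = Image.fromarray(
    (np.array(img) / 2).astype('uint8')  # halve brightness, like a dim scene
)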
Training matches production

Collect - Input Sizes
● Beware of large inputs
● Explicit dimensions: TensorFlow / Keras [None, None, None, 3] -> Core ML [1, 1, 1, 3]
● Proportion and orientation
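Mobile runtimes want a fixed input shape, and naively resizing a camera frame stretches it. A minimal sketch, assuming a square 224x224 RGB model input and nearest-neighbour resampling (both illustrative choices, not from the slides); orientation correction (e.g. applying the photo's EXIF rotation) is assumed to happen before this step.

```python
import numpy as np

def center_crop_resize(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Turn an H x W x 3 image into a fixed [1, size, size, 3] float32 tensor."""
    h, w = img.shape[:2]
    side = min(h, w)
    # Center-crop to a square so the image is not stretched (keeps proportions)
    top, left = (h - side) // 2, (w - side) // 2
    crop = img[top:top + side, left:left + side]
    # Nearest-neighbour resize by sampling evenly spaced rows and columns
    idx = (np.arange(size) * side / size).astype(int)
    resized = crop[idx][:, idx]
    # Add an explicit batch dimension: [size, size, 3] -> [1, size, size, 3]
    return resized[np.newaxis].astype(np.float32)

frame = np.zeros((480, 640, 3), dtype=np.uint8)  # landscape camera frame
print(center_crop_resize(frame).shape)  # (1, 224, 224, 3)
```

The same crop and resize must run in the app before inference, otherwise the model sees differently shaped inputs in production than in training.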

Collect - Input / Output Sampling
● Cache model inputs and outputs
○ Buffer size
○ Sampling rate
○ Device characteristics
● Send back to the cloud
○ Connectivity
○ Bandwidth limitations
● Privacy
Data flow: Client -> API -> DB
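The caching bullets above (buffer size, sampling rate) can be sketched as a small bounded sampler. This is a hypothetical helper, not part of any SDK: the class name, the default rate, and the idea of flushing when the device is on Wi-Fi are assumptions, and anything uploaded should be stripped of personal data first.

```python
import random
from collections import deque

class IOSampler:
    """Bounded, randomly sampled cache of (model input, model output) pairs."""

    def __init__(self, buffer_size=100, sample_rate=0.01, seed=None):
        self.buffer = deque(maxlen=buffer_size)  # oldest pairs are dropped when full
        self.sample_rate = sample_rate           # fraction of predictions to keep
        self._rng = random.Random(seed)

    def record(self, model_input, model_output):
        # Keep only a random fraction of predictions to limit memory and bandwidth
        if self._rng.random() < self.sample_rate:
            self.buffer.append((model_input, model_output))

    def flush(self):
        """Return and clear buffered pairs, e.g. when the device is on Wi-Fi."""
        batch = list(self.buffer)
        self.buffer.clear()
        return batch

sampler = IOSampler(buffer_size=3, sample_rate=1.0)
for i in range(5):
    sampler.record(i, i * 2)
print(sampler.flush())  # [(2, 4), (3, 6), (4, 8)]
```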

Train
● Training environments should match production.
● Beware of pre- and post-processing.
● Keep an eye on distributed training.
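One way to keep pre-processing from drifting between training and the app is to define it once from explicit constants and export those constants alongside the model. A minimal sketch; the ImageNet-style mean/std values below are placeholders, not values from the talk, so substitute whatever your model was trained with.

```python
import json
import numpy as np

# Hypothetical normalization constants; use the values your model was trained with.
PREPROCESS = {"scale": 1 / 255.0,
              "mean": [0.485, 0.456, 0.406],
              "std": [0.229, 0.224, 0.225]}

def preprocess(pixels: np.ndarray, cfg: dict = PREPROCESS) -> np.ndarray:
    """Scale uint8 RGB pixels to [0, 1], then normalize each channel."""
    x = pixels.astype(np.float32) * cfg["scale"]
    mean = np.asarray(cfg["mean"], dtype=np.float32)
    std = np.asarray(cfg["std"], dtype=np.float32)
    return (x - mean) / std

# Export the exact same constants so the mobile port (Swift/Kotlin) cannot drift.
config_json = json.dumps(PREPROCESS)
```

The app then reads the shipped JSON instead of hard-coding its own copy of the numbers.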

Train - Mobile Friendly Tools
● Core ML
● TensorFlow Lite
● Windows ML
● Caffe2Go
● Turi Create
● IBM Watson Studio
● Azure Custom Vision

Optimize
● Architecture
○ Depthwise-separable convolutions
○ Hardware-aware design
○ Architecture search
● Pruning
● Compression
○ Quantization
○ Serialization

Optimize - Architecture
Iterative process
● Generate architecture
● Train network
● Benchmark performance
● Minimize Cost(speed, accuracy, size)
alchemy.fritz.ai
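The generate / train / benchmark / minimize loop can be sketched with plain random search standing in for a real architecture-search strategy. This does not describe how alchemy.fritz.ai works; the cost weights are arbitrary assumptions, and `evaluate` is a caller-supplied function that would train and benchmark each candidate.

```python
import random

def cost(latency_ms, accuracy, size_mb, weights=(1.0, 100.0, 0.5)):
    """Lower is better: penalize latency and size, reward accuracy.
    The weights are arbitrary assumptions; tune them for your product."""
    w_speed, w_acc, w_size = weights
    return w_speed * latency_ms - w_acc * accuracy + w_size * size_mb

def random_search(candidates, evaluate, n_trials=20, seed=0):
    """evaluate(arch) must train and benchmark arch, returning
    (latency_ms, accuracy, size_mb); it is supplied by the caller."""
    rng = random.Random(seed)
    best_arch, best_cost = None, float("inf")
    for _ in range(n_trials):
        arch = rng.choice(candidates)  # generate an architecture
        metrics = evaluate(arch)       # train + benchmark it
        c = cost(*metrics)             # score the speed/accuracy/size trade-off
        if c < best_cost:
            best_arch, best_cost = arch, c
    return best_arch, best_cost

# Toy, made-up metrics for a few width multipliers, just to exercise the loop
def toy_evaluate(alpha):
    return (10 * alpha, 0.5 + 0.2 * alpha, 4 * alpha ** 2)

best, best_cost = random_search([0.25, 0.5, 1.0], toy_evaluate, n_trials=50)
```

Real searches replace `rng.choice` with a smarter generator and cache evaluations, since training is by far the most expensive step.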

Optimize - Architecture
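Standard Convolution:
$$\text{FLOPs} = D_K \cdot D_K \cdot M \cdot N \cdot D_F \cdot D_F$$
Depthwise-separable Convolution:
$$\text{FLOPs} = D_K \cdot D_K \cdot M \cdot D_F \cdot D_F + M \cdot N \cdot D_F \cdot D_F$$
where $D_K$ = kernel size, $M$ = input channels, $N$ = output channels, $D_F$ = input size
MobileNets: https://arxiv.org/abs/1704.04861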
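Plugging numbers into the two cost formulas shows the savings. The layer dimensions below are illustrative, not taken from the slides; the ratio always works out to 1/N + 1/D_K², as noted in the MobileNets paper.

```python
def standard_conv_flops(dk, m, n, df):
    """D_K * D_K * M * N * D_F * D_F multiply-adds for a standard convolution."""
    return dk * dk * m * n * df * df

def separable_conv_flops(dk, m, n, df):
    """Depthwise (D_K*D_K*M*D_F*D_F) plus pointwise 1x1 (M*N*D_F*D_F) cost."""
    return dk * dk * m * df * df + m * n * df * df

# Example layer (values chosen for illustration): 3x3 kernel, 32 -> 64 channels, 112x112 map
std = standard_conv_flops(3, 32, 64, 112)
sep = separable_conv_flops(3, 32, 64, 112)
print(std, sep, round(sep / std, 4))  # 231211008 29302784 0.1267
```

Here 1/64 + 1/9 ≈ 0.127, so the separable layer needs roughly 8x fewer operations.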

Optimize - Pruning
MobileNet width multiplier
● “The role of the width multiplier α is to thin a network uniformly at each layer.”
● The number of input channels M becomes αM
● The number of output channels N becomes αN
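Applying α to both M and N in the depthwise-separable cost formula shows why the MobileNets paper describes the savings as roughly α²: the pointwise term M · N · D_F · D_F dominates and shrinks quadratically. The layer sizes are illustrative assumptions.

```python
def separable_flops(dk, m, n, df, alpha=1.0):
    """Depthwise-separable cost with MobileNet width multiplier alpha:
    M -> alpha*M input channels and N -> alpha*N output channels."""
    m, n = int(alpha * m), int(alpha * n)
    return dk * dk * m * df * df + m * n * df * df

full = separable_flops(3, 32, 64, 112)
half = separable_flops(3, 32, 64, 112, alpha=0.5)
print(round(half / full, 3))  # 0.281, close to alpha**2 = 0.25
```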

Optimize - Pruning
Goal: Remove operations that don’t contribute to accuracy
● Step 1: Compute importance
● Step 2: Prune
● Step 3: Rewire
● Step 4: Fine-tune
http://machinethink.net/blog/compressing-deep-neural-nets/
Original network size: 4,253,864 parameters
Compressed network size: 3,210,232 parameters
Compressed to: 75.5% of original size
Top-1 accuracy over 50000 images = 67.2%
Top-5 accuracy over 50000 images = 87.7%
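Steps 1 to 3 can be sketched for two dense layers with NumPy. The importance score here (L1 norm of each output channel's weights) is one common criterion and an assumption, not necessarily the one used in the linked post; step 4, fine-tuning the smaller network, would happen afterwards in your training framework.

```python
import numpy as np

def prune_channels(w1, b1, w2, keep_ratio=0.75):
    """Channel pruning between two dense layers (a sketch).

    w1: (inputs, channels) weights of the layer being pruned, b1: (channels,)
    w2: (channels, outputs) weights of the layer that consumes its output
    """
    # Step 1: importance = L1 norm of each output channel's weights
    importance = np.abs(w1).sum(axis=0)
    # Step 2: prune, keeping the most important channels (in original order)
    n_keep = max(1, int(round(keep_ratio * w1.shape[1])))
    keep = np.sort(np.argsort(importance)[-n_keep:])
    # Step 3: rewire, dropping the matching rows of the next layer's weights
    return w1[:, keep], b1[keep], w2[keep, :]

rng = np.random.default_rng(0)
p1, pb, p2 = prune_channels(rng.normal(size=(8, 16)), np.zeros(16), rng.normal(size=(16, 10)))
print(p1.shape, p2.shape)  # (8, 12) (12, 10)
```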

Optimize - Quantization
Fixed-point Quantization
● Smaller model size
● Faster runtime
● Less memory
● tf.contrib.quantize
tf.contrib.quantize.create_training_graph(...)
tf.contrib.quantize.create_eval_graph(...)
● mxnet.contrib.quantization
mxnet.contrib.quantization.quantize_model(
    sym, arg_params, aux_params, ...)
● 3-5x speedup, 2-10x compression, 0-5% accuracy loss
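The arithmetic behind fixed-point quantization can be shown without either framework: map float weights to uint8 with a scale and a zero point, then map back. This is a sketch of the general affine scheme, not a reimplementation of tf.contrib.quantize or MXNet's quantizer.

```python
import numpy as np

def quantize_uint8(w):
    """Affine 8-bit quantization: w is approximated by scale * (q - zero_point)."""
    lo, hi = min(float(w.min()), 0.0), max(float(w.max()), 0.0)  # range must contain 0
    scale = (hi - lo) / 255.0 or 1.0      # guard against an all-zero tensor
    zero_point = int(round(-lo / scale))  # the uint8 value that represents 0.0
    q = np.clip(np.round(w / scale) + zero_point, 0, 255).astype(np.uint8)
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return (q.astype(np.float32) - zero_point) * scale

w = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, scale, zp = quantize_uint8(w)
print(q.dtype, q.nbytes, w.nbytes)  # uint8 4 16  (4x smaller than float32)
```

Each weight is recovered to within about half a quantization step (`scale / 2`), which is where the small accuracy loss comes from.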

Convert
● Server side: TensorFlow, Keras, Caffe2, PyTorch, MXNet
● Mobile friendly: Core ML, TensorFlow Lite, Windows ML, Caffe2Go
● Converters: coremltools, TOCO, tf-coreml, torch2coreml, ONNX, mxnet-to-coreml

Protect
Once a model is on a device, assume anyone can access it.
● Encryption
○ Full model -> Can’t use mobile frameworks
○ Weight data -> Need to write custom layers
● Obfuscation
○ Proprietary pre-processing (e.g. rotate inputs 90 degrees)
○ Scramble weights or layer order
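Weight scrambling can be as simple as a keyed permutation applied before shipping and inverted at load time. A sketch under loud assumptions: the key-to-permutation scheme uses NumPy's generator, so the app would need an equivalent native implementation, and because the key lives in the app this only raises the effort for an attacker; it is obfuscation, not encryption.

```python
import numpy as np

def scramble(weights: np.ndarray, key: int) -> np.ndarray:
    """Shuffle weight values with a permutation derived from a secret key."""
    perm = np.random.default_rng(key).permutation(weights.size)
    return weights.ravel()[perm].reshape(weights.shape)

def unscramble(scrambled: np.ndarray, key: int) -> np.ndarray:
    """Invert scramble() at load time by regenerating the same permutation."""
    perm = np.random.default_rng(key).permutation(scrambled.size)
    flat = np.empty(scrambled.size, dtype=scrambled.dtype)
    flat[perm] = scrambled.ravel()  # put each value back at its original index
    return flat.reshape(scrambled.shape)
```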

Deploy
● Port pre- and post-processing
● Pre-loaded vs fetch at runtime
● Models are features
○ Staged rollouts
○ A/B testing
○ Versioning
● Put the right model on the right device
Integrate: bit.ly/heartbeatsource
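Treating models as features usually means deterministic bucketing: hash a device ID so each device consistently lands in or out of a staged rollout (the same bucketing works for A/B groups), then pick a variant that fits the hardware. Both helpers are hypothetical; the variant names and RAM thresholds are made up for illustration.

```python
import hashlib

def in_rollout(device_id: str, model_version: str, percent: int) -> bool:
    """Staged rollout: deterministically put `percent`% of devices on a version."""
    digest = hashlib.sha256(f"{model_version}:{device_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99 per device and version
    return bucket < percent

def choose_variant(ram_gb: float, has_accelerator: bool) -> str:
    """Pick a model variant for the device; thresholds and names are hypothetical."""
    if has_accelerator and ram_gb >= 4:
        return "full"
    if ram_gb >= 2:
        return "quantized"
    return "quantized-small"
```

Raising `percent` over time expands the rollout without reshuffling devices that already received the new version.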

Monitor
● Metrics
○ Runtime
○ Memory
○ Battery drain
○ Usage
● Slice by
○ OS
○ Device
○ Chipset
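Slicing metrics by OS, device, or chipset is a group-by over telemetry events. A minimal sketch for runtime; the event schema and the sample records are hypothetical, and memory or battery metrics would be summarized the same way.

```python
from collections import defaultdict
from statistics import median

def summarize(events, slice_by="device"):
    """Group on-device inference events by a field (os, device, chipset, ...)
    and report count, median and worst-case runtime for each slice."""
    runtimes = defaultdict(list)
    for event in events:
        runtimes[event[slice_by]].append(event["runtime_ms"])
    return {key: {"n": len(vals), "p50_ms": median(vals), "max_ms": max(vals)}
            for key, vals in runtimes.items()}

events = [  # hypothetical telemetry records
    {"os": "iOS 12", "device": "iPhone X", "chipset": "A11", "runtime_ms": 30},
    {"os": "iOS 12", "device": "iPhone X", "chipset": "A11", "runtime_ms": 50},
    {"os": "Android 9", "device": "Pixel 2", "chipset": "Snapdragon 835", "runtime_ms": 80},
]
print(summarize(events, "os"))
```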


Complete Platform for Edge Intelligence
Deploy ML/AI models on all your mobile devices (iOS & Android)
Build -> Release -> Manage
● Train your own model or use one of ours
● Optimize
● Native model
● Validation
● SDK
● OTA update
● Cross-platform portability
● Analytics and monitoring
● Developer API

“Interactive Sketch-Based Normal Map Generation with Deep Neural Networks”

Skydio R1

https://machinelearning.apple.com/2017/10/01/hey-siri.html

@ben_ferns

Want to be an edge expert?
Questions?
Jameson Toole, CEO
jameson@fritz.ai
Get started: www.fritz.ai
Community: Heartbeat
Tools: Alchemy
