Affordable AI
Connects To A Better Life
Bofu Chen, Sep 21, 2016
AGENDA
Intelligent Gateway
Affordable AI Techniques
Implementation
Example: Pepper Robot
Example: Campus Security System
Intelligent Gateway
Photo: Robert Bond
Cat Recognition
Deep Learning Inference
Cat!
No Backpropagation
Inference Essentials
Computing Time: Shortening the prediction time is always welcome
Memory Usage: Device memory is limited, but deep learning models can be huge
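To make the memory point concrete, here is a minimal back-of-the-envelope sketch of how weight storage scales with parameter count and bit width. The 60-million-parameter figure is a hypothetical model size chosen only for illustration, and the helper name `weight_memory_mb` is not from the talk.

```python
def weight_memory_mb(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in MB (1 MB = 10**6 bytes)."""
    return num_params * bits_per_weight / 8 / 1e6

# Hypothetical 60-million-parameter model, used only as an example size.
params = 60_000_000
for bits in (32, 16, 8, 1):
    print(f"{bits:>2}-bit weights: {weight_memory_mb(params, bits):.1f} MB")
```

This counts weights only; activations and framework overhead come on top, so the real footprint on a device is larger.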
Techniques To Make AI Affordable
Inference Research
Weight Storage: Reduce weight storage size without sacrificing accuracy
Hardware Usage: Utilize as many computing components (CPU, GPU, etc.) as possible simultaneously
Binarized Neural Networks, http://arxiv.org/abs/1602.02830 | XNOR-Net, http://arxiv.org/abs/1603.05279 | DoReFa-Net, https://arxiv.org/abs/1606.06160 | DeepX, http://niclane.org/pubs/deepx_ipsn.pdf
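As a rough illustration of the weight-storage idea behind the binarization papers cited above, the sketch below approximates a weight vector by its signs plus one scaling factor, in the spirit of XNOR-Net (scale = mean absolute value). It is a simplified assumption-laden example, not the full training procedure from any of those papers, and the function names are hypothetical.

```python
def binarize(weights):
    """Approximate weights as alpha * sign(w): one float plus one bit per weight."""
    alpha = sum(abs(w) for w in weights) / len(weights)  # mean absolute value
    signs = [1 if w >= 0 else -1 for w in weights]
    return alpha, signs

def reconstruct(alpha, signs):
    """Rebuild the approximate weights from the scale and the sign bits."""
    return [alpha * s for s in signs]

weights = [0.5, -1.5, 1.0, -1.0]
alpha, signs = binarize(weights)
print(alpha, signs)
print(reconstruct(alpha, signs))
```

Storing one bit per weight instead of a 32-bit float is where the storage reduction comes from; the accuracy impact depends on how the network is trained.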
Approaches
Compression
Nvidia
TensorRT
Optimization
Throughput
Power efficiency
Memory usage
Keep accuracy
Speed up
Low-level speed up
Nvidia TensorRT
Like a model compiler
Production Deep Learning with NVIDIA GPU Inference Engine, https://devblogs.nvidia.com/parallelforall/production-deep-learning-nvidia-gpu-inference-engine/
Pruning
Learning both Weights and Connections for Efficient Neural Networks, https://arxiv.org/abs/1506.02626
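A minimal sketch of magnitude-based pruning, assuming the simplest variant: weights whose absolute value falls below a threshold are set to zero. The cited paper also retrains the remaining connections after pruning, which this sketch omits. The function name and the example values are hypothetical.

```python
def prune_by_magnitude(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed."""
    if not 0.0 <= sparsity < 1.0:
        raise ValueError("sparsity must be in [0, 1)")
    k = int(sparsity * len(weights))  # number of weights to drop
    if k == 0:
        return list(weights)
    threshold = sorted(abs(w) for w in weights)[k - 1]
    # Weights tied with the threshold are dropped too, so slightly
    # more than k entries may end up zeroed.
    return [w if abs(w) > threshold else 0.0 for w in weights]

layer = [0.1, -2.0, 0.05, 3.0, -0.3, 0.8]  # illustrative values only
print(prune_by_magnitude(layer, 0.5))
```

The zeroed entries only save memory and time if they are stored and computed in a sparse format; a dense array full of zeros is no smaller.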
Quantization
How to Quantize Neural Networks with TensorFlow, https://github.com/tensorflow/tensorflow/blob/master/tensorflow/g3doc/how_tos/quantization/index.md
DNNs are noise tolerant
FP16 to INT8
Hardware speedup
FPU to ALU
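The quantization idea above can be sketched as symmetric linear quantization to signed 8-bit integers. This is one common scheme, assumed here for illustration; it is not necessarily the exact method in the TensorFlow guide or in TensorRT. Function names and sample values are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single scale factor."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]

weights = [-1.27, 0.0, 0.5, 1.27]  # illustrative values only
codes, scale = quantize_int8(weights)
restored = dequantize(codes, scale)
print(codes)
print(max(abs(a - b) for a, b in zip(weights, restored)))
```

The rounding error per weight is at most half the scale, which is the kind of small noise a trained network usually tolerates. Integer codes also let the arithmetic run on integer units instead of floating-point units.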
Inference Without DL Frameworks
Likely: a compiler intermediate representation for image recognition and heterogeneous computing, http://liblikely.org/
Implementation
Deep Learning Computer Vision
NVIDIA TX1
Pre-trained
On Server
End Devices (Sender)
Architecture
End Devices (Receiver)
Intelligent Gateway
NVIDIA TX1
Ubuntu
TensorFlow
REST
TensorRT
gRPC
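The gateway's REST side could look roughly like the sketch below: an end device POSTs image bytes and receives a JSON result. Everything here is an assumption for illustration: the `/classify` route, the `classify` stub, and the response fields are hypothetical, and the stub does not run a real model. A real gateway would call TensorFlow or TensorRT where the placeholder sits.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def classify(image_bytes: bytes) -> dict:
    """Hypothetical stand-in for the model call on the gateway."""
    # Placeholder logic only: no inference is performed here.
    return {"label": "unknown", "bytes_received": len(image_bytes)}

class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        if self.path != "/classify":  # hypothetical route name
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        result = classify(self.rfile.read(length))
        body = json.dumps(result).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet

def make_server(port: int = 0) -> HTTPServer:
    """Bind to localhost; port 0 lets the OS pick a free port."""
    return HTTPServer(("127.0.0.1", port), InferenceHandler)
```

To try it, call `make_server(8080).serve_forever()` and POST a file to `http://127.0.0.1:8080/classify`. Keeping this hop on the local network is what avoids sending every image to the cloud.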
Inference Choices
TensorRT
Fast Object
Slow Motion
TensorFlow on TX1
DONE
Model Server
Maximize Performance
NEXT
Inference Optimization
on Ubuntu
Other Attempts
Raspberry Pi 3
Qualcomm Snapdragon 801
0.9s/img
GoogLeNet
Real-Time
Inception v3
Pepper, The Emotional Robot
HW Specification
4-core
1.9 GHz
4 GB 790 MHz
Pepper motherboard specification, http://doc.aldebaran.com/2-4/family/pepper_technical/motherboard_pep.html
Vision and Speech Limitations
Face Recognition: instead of face identification
Speech Recognition: keywords instead of NLP
Cloud Solution Drawbacks
Cost: Huge amount of image transmission
Connectivity: Need to ensure bandwidth, stability, and latency are good enough
Privacy: You might want to keep family information local
Architecture
Pepper Gateway
NVIDIA TX1
Ubuntu
TensorFlow
REST
TensorRT
gRPC
Real World Gesture Recognition Algorithm
Campus Security System
Current Solution
Cloud, End Device
Current Solution
Cloud
NOT INTELLIGENT
NOT REAL-TIME
Architecture
Security Gateway
NVIDIA TX1
Ubuntu
TensorFlow
REST
TensorRT
gRPC
Detection labels: Student, Suspects
DT42
Violent Event
Kinect v2
Update USB firmware: fixes the data transmission issue
Open source libraries: libfreenect2 and pylibfreenect2 make enablement easier
MS Kinect v2 on Nvidia Jetson TX1, http://jetsonhacks.com/2016/07/11/ms-kinect-v2-nvidia-jetson-tx1/
We Are DT42
Affordable AI Connects To A Better Life