3. Confidential
● 30+ years in R&D
● 17 years in Israeli high-tech
● Worked at ECI, Telrad, RAD, AudioCodes
● HW, SW, and mechanical design engineer
● Project & product manager
● Business developer for EMEA & CIS countries
● Solution architect
● 22 publications, a US patent
● Consulting & SW development teaching
About us
● Over 7 years of IT experience
● Embedded Linux programming
● IoT-related projects
● C, Python, BLE, Mesh networking, IoT, Embedded, Linux, ZeroMQ, nRF51, STM8, UART, SPI
● National Technical University of Ukraine Kiev Polytechnic Institute
● MS in Electronics Engineering
4.
1. AI algorithms overview
2. Application examples and requirements for embedded installation
3. Intel Neural Compute Stick overview
4. NCS demonstration for Classification & Detection problems
5. Hardware for Embedded AI
Agenda
11.
● Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for human detection." IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2005.
● Felzenszwalb, Pedro, David McAllester, and Deva Ramanan. "A discriminatively trained, multiscale, deformable part model." IEEE CVPR, 2008.
● Everingham, Mark, et al. "The PASCAL Visual Object Classes (VOC) challenge." International Journal of Computer Vision 88.2 (2010): 303-338.
● Deng, Jia, et al. "ImageNet: A large-scale hierarchical image database." IEEE CVPR, 2009.
● Russakovsky, Olga, et al. "ImageNet Large Scale Visual Recognition Challenge." arXiv:1409.0575.
● Lin, Yuanqing, et al. "Large-scale image classification: Fast feature extraction and SVM training." IEEE CVPR, 2011.
● Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in Neural Information Processing Systems, 2012.
● Szegedy, Christian, et al. "Going deeper with convolutions." arXiv preprint arXiv:1409.4842 (2014).
● Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).
● He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." arXiv preprint arXiv:1406.4729 (2014).
● LeCun, Yann, et al. "Gradient-based learning applied to document recognition." Proceedings of the IEEE 86.11 (1998): 2278-2324.
● Fei-Fei, Li, et al. "What do we perceive in a glance of a real-world scene?" Journal of Vision 7.1 (2007): 10.
References
16.
• Safer for employees
• Much cheaper
• Detects and measures better than a human
Construction inspection
17.
• Power consumption
• Dimensions and weight
• Real-time operation
• No network connection
Such applications bring challenges, which call for:
• An optimized model
• Special hardware
18.
Limit the number of input channels by adding an extra 1x1 convolution before the 3x3 and 5x5 convolutions.
Factorize the 5x5 convolution into two 3x3 convolution operations to improve computational speed.
Inception model - the next level of engineering optimization
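A quick back-of-the-envelope count shows why both tricks pay off. This is only a sketch: the feature-map size and channel counts below are illustrative, not taken from the GoogLeNet paper.

```python
# Rough multiply-accumulate (MAC) count for a stride-1, same-padding
# convolution on an H x W feature map: H * W * C_in * C_out * k * k
def conv_macs(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

H = W = 28                 # illustrative feature-map size
C_IN, C_OUT = 192, 32      # illustrative channel counts
REDUCE = 16                # channels after the 1x1 bottleneck

# (a) 5x5 convolution applied directly to all input channels
direct = conv_macs(H, W, C_IN, C_OUT, 5)

# (b) 1x1 bottleneck first, then the 5x5 convolution
bottleneck = conv_macs(H, W, C_IN, REDUCE, 1) + conv_macs(H, W, REDUCE, C_OUT, 5)

# (c) same bottleneck, with the 5x5 factorized into two stacked 3x3 convs
#     (two 3x3 layers cover the same 5x5 receptive field)
factorized = (conv_macs(H, W, C_IN, REDUCE, 1)
              + conv_macs(H, W, REDUCE, REDUCE, 3)
              + conv_macs(H, W, REDUCE, C_OUT, 3))

print(f"direct 5x5:          {direct:,} MACs")
print(f"with 1x1 bottleneck: {bottleneck:,} MACs ({direct / bottleneck:.1f}x fewer)")
print(f"plus two 3x3:        {factorized:,} MACs ({direct / factorized:.1f}x fewer)")
```

With these numbers the bottleneck alone cuts the work by roughly an order of magnitude, and the 3x3 factorization saves a further ~25%.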
19.
1. Replace 3x3 filters with 1x1 filters - the Fire module
2. Decrease the number of input channels to the 3x3 filters
3. Use a pooling layer in place of the FC layer at the end
SqueezeNet - 510× smaller than AlexNet
Major principle - downsample late, so convolutions run only where activation maps are large
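A rough parameter count illustrates how a Fire module shrinks a plain 3x3 layer. This is a sketch: the channel numbers are illustrative (they happen to resemble an early SqueezeNet layer, but are not quoted from the paper).

```python
# Weights in a conv layer (biases ignored): C_in * C_out * k * k
def conv_params(c_in, c_out, k):
    return c_in * c_out * k * k

C_IN = C_OUT = 128         # illustrative channel counts

# Plain 3x3 convolution, 128 -> 128 channels
plain = conv_params(C_IN, C_OUT, 3)

# Fire module: 1x1 "squeeze" down to 16 channels, then parallel 1x1 and
# 3x3 "expand" paths of 64 channels each (concatenated back to 128)
SQUEEZE, EXPAND = 16, 64
fire = (conv_params(C_IN, SQUEEZE, 1)
        + conv_params(SQUEEZE, EXPAND, 1)
        + conv_params(SQUEEZE, EXPAND, 3))

print(f"plain 3x3: {plain:,} weights")
print(f"Fire:      {fire:,} weights ({plain / fire:.1f}x fewer)")
```

The squeeze step means the expensive 3x3 filters see 16 input channels instead of 128, which is where most of the saving comes from.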
21.
● Ultra-low power with over 1 TOPS
● Deep neural network processing unit
● VPU architecture that minimizes power by reducing on-chip data movement
● Imaging and vision hardware accelerators based on VLIW vector processors
● 16 programmable 128-bit VLIW vector processors
● 16 configurable MIPI lanes
● On-chip memory architecture allows up to 400 GB/s of internal bandwidth
Movidius VPU - Vision Processing Unit
25.
The Inference Engine deployment process assumes you used the Model
Optimizer to convert your trained model to an Intermediate Representation.
Deployment Workflow
27.
A summary of the steps for optimizing and deploying a trained model:
• Configure the Model Optimizer for your framework.
- Caffe models
- TensorFlow models
- MXNet models
- ONNX models
- Kaldi models
• Convert the trained model to produce an optimized Intermediate Representation (IR)
- Produce a valid Intermediate Representation (.xml and .bin)
- Produce an optimized Intermediate Representation (e.g., Dropout layers are removed)
• Test the model in the Intermediate Representation format using the Inference Engine
• Integrate the Inference Engine into your application to deploy the model in the target environment.
Model Optimizer
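As a concrete sketch, converting a Caffe model for the NCS could look like the commands below. The paths match a default 2019-era OpenVINO install; the model file names are illustrative, and `--data_type FP16` targets the Myriad VPU inside the Neural Compute Stick.

```shell
# One-time step: configure the Model Optimizer prerequisites for Caffe
cd /opt/intel/openvino/deployment_tools/model_optimizer/install_prerequisites
./install_prerequisites_caffe.sh

# Convert the trained model to an optimized IR (.xml + .bin)
python3 /opt/intel/openvino/deployment_tools/model_optimizer/mo.py \
    --input_model squeezenet.caffemodel \
    --input_proto deploy.prototxt \
    --data_type FP16 \
    --output_dir ./ir
```

The resulting `ir/squeezenet.xml` and `ir/squeezenet.bin` pair is what the Inference Engine loads onto the stick.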
34.
Next Step
Road map project - object classifier:
Integrate several sticks
The robot drives up to a toy and plays the relevant sound:
● Cat
● Dog
● Car, etc.
35.
Embedded World - March 2019, Nuremberg
Google comes to the arena - Coral
USB Accelerator
A USB accessory featuring the Edge TPU that brings ML inferencing to existing systems.
● Supported OS: Debian Linux
● Compatible with Raspberry Pi boards
● Supported framework: TensorFlow Lite
37.
• GCP AI based on Coral
• Only the TensorFlow Lite framework
Coral project
• Three types of pre-trained models:
- Image classification
• MobileNet V1/V2
• Inception V1/V2/V3/V4
- Object detection
• MobileNet V1/V2
- Embedding extractor (classification)
• MobileNet V1
• Possibility to retrain only the last layer or the full network
• Two frequency modes
38.
Real-time object detection with the Coral Dev Board
Edge TPU Performance Demo
The video demonstrates the real-time processing power of the Edge TPU by running a MobileNet SSD model that can identify and classify multiple objects.
The footage of the cars is a recording, but the MobileNet model is executing in real time on the Coral Dev Board, marking each detected car with a box (limited to 20 detected cars).
40.
Intel:
- Async & sync calls
- Many sticks can be integrated via a USB hub
- The OpenVINO library is an ML-framework-independent solution
- Requires OpenVINO installation
- User-friendly SDK
- No difference found between USB 2 and USB 3 for image classification
Google:
- 3 times lower power consumption in standby mode
- 4 times better performance with USB 3
- Only the TensorFlow Lite framework
- Quick training mode with pretrained models
- Two operation clock modes
- Nothing to be installed
Comparing the Intel and Google USB accelerators
41.
Image-detection video power consumption:
Intel Neural Compute Stick: 350 mA (1.75 W) with a 140 ms detection time
Google Coral stick: 60 mA (300 mW) with a 17 ms detection time
Power consumption and performance comparison
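The figures above can be combined into an energy-per-detection estimate. This is a rough sketch that simply multiplies the measured power by the detection time, assuming the draw stays constant during a detection:

```python
# Energy per detection: power (W) * time (ms) gives millijoules
def energy_mj(power_w, time_ms):
    return power_w * time_ms

intel_mj = energy_mj(1.75, 140)   # Intel Neural Compute Stick
coral_mj = energy_mj(0.30, 17)    # Google Coral stick

print(f"Intel: {intel_mj:.0f} mJ per detection")
print(f"Coral: {coral_mj:.1f} mJ per detection")
print(f"Coral uses ~{intel_mj / coral_mj:.0f}x less energy per detection")
```

So beyond the lower instantaneous power, the Coral's shorter detection time compounds into a roughly 48x energy advantage per detection under these measurements.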
42.
• Inference at the edge
• Offline inference
• Minimal latency - true real time
• Privacy and security
What this means
All tested models were trained on the ImageNet dataset with 1,000 classes and an input size of 224x224, except for Inception v4, which has an input size of 299x299.