Benchmark of common AI accelerators: NVIDIA GPU vs. Intel Movidius

byteLAKE

The document summarizes byteLAKE’s basic benchmark results between two different setups of example edge devices: with NVIDIA GPU and with Intel’s Movidius cards. Key takeaway: the comparison of Movidius and NVIDIA as two competing accelerators for AI workloads leads to a conclusion that these two are meant for different tasks.

AI on EDGE
GPU VS. VPU
byteLAKE’s basic benchmark results between two different setups of example edge devices: with NVIDIA GPU and with Intel’s Movidius cards.
Artificial Intelligence, HPC, Machine Learning, Deep Learning, Computer Vision, Edge Intelligence
byteLAKE
pl. Solny 14/3
50-062 Wroclaw, Poland
+48 508 091 885
+48 505 322 282
+1 650 735 2063
www.byteLAKE.com
AI on EDGE: GPU vs. VPU  Jul-18 2
Devices Configuration
Tests were run on two Lenovo Tiny PCs.
Tiny#1: Lenovo ThinkCentre M910x Tiny
• CPU: Intel Core i7-7700T vPro
• AI accelerator: 2 x Intel Movidius Myriad 2 VPU
• Memory: 4 GB LPDDR3
• System: Ubuntu 16.04 LTS
Tiny#2: Lenovo ThinkCentre M920x Tiny
• CPU: Intel Core™ i5-8500T
• AI accelerator: NVIDIA Quadro P1000
• Memory: 4 GB GDDR5
• System: Ubuntu 18.04 LTS
Software Configuration:
• Frameworks: Caffe, TensorFlow, OpenCV 3.4
• Drivers:
o Tiny#1: Intel Movidius Neural Compute SDK v1
o Tiny#2: NVIDIA GPU drivers ver. 390.48; CUDA Toolkit 8
Test procedure description:
We analyzed the performance of the two Tiny PCs using the state-of-the-art YOLO (You Only Look Once) real-time detection model [1]. In both cases we used a reduced version of the YOLO model, called Tiny YOLO.
The model consists of a single input layer, 8 convolution layers, 8 batch-norm layers, 8 ReLU layers and a single fully connected layer. Tiny YOLO recognizes objects from 20 classes: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train and tv-monitor. The size of the pre-trained Tiny YOLO detection model is 50 MB.
The deep neural net (DNN) used for this study was implemented using the Python interface of the Caffe AI framework.
Benchmarks were based on a real-time analysis of a sequence of frames captured from a camera. They were performed using three configurations of the AI devices:
• single Movidius Myriad accelerator enabled (Tiny#1);
• two Movidius Myriad cards enabled (Tiny#1);
• single NVIDIA GPU (Tiny#2).
The procedure used to assess the overall performance of the above Tiny PC configurations took into account all steps required to generate the resulting movie:
• grabbing the frames from the camera;
• preparing the frames;
• forwarding the images through the deep neural net;
• filtering the results of the analysis;
• drawing the results on the frame;
• presenting the results of the analysis.
To keep the measurements comparable across all configurations, the analysis was performed for a fixed number of frames. We used two performance criteria: (i) the average frames-per-second (FPS) rate, and (ii) the execution time of the AI computations, for each of the above-mentioned configurations.
Figure 1 below presents the measurement method in detail (sample code from the single-Movidius configuration; the other configurations were measured in a similar fashion).
[1] YOLO: Real-Time Object Detection, URL: https://pjreddie.com/darknet/yolo/
Figure 1. Adopted method of performance measurements for a single Movidius Myriad accelerator
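A minimal sketch of such a measurement loop follows. It is not the authors' code from Figure 1: the function name run_benchmark and the four stage callbacks (grab, prepare, infer, postprocess) are hypothetical placeholders for the camera, preprocessing, accelerator and display code of a given configuration. It only illustrates how the total time, the inference-only time and the average FPS can be collected over a fixed number of frames.

```python
import time


def run_benchmark(grab, prepare, infer, postprocess,
                  num_frames=500, clock=time.perf_counter):
    """Time the whole per-frame pipeline and the inference step alone.

    All four stage callbacks are hypothetical stand-ins supplied by the caller.
    """
    infer_time = 0.0
    total_start = clock()
    for _ in range(num_frames):
        frame = grab()             # grab a frame from the camera
        blob = prepare(frame)      # resize / convert for the network input
        t0 = clock()
        raw = infer(blob)          # forward pass on the accelerator
        infer_time += clock() - t0
        postprocess(frame, raw)    # filter detections, draw and present them
    total_time = clock() - total_start
    return {
        "total_s": total_time,                # overall analysis time (Ta)
        "infer_s": infer_time,                # time spent in AI computations only
        "fps_avg": num_frames / total_time,   # average FPS over the whole run
    }


if __name__ == "__main__":
    # Dummy stages so the sketch runs without a camera or an accelerator.
    result = run_benchmark(
        grab=lambda: [0] * 16,
        prepare=lambda frame: frame,
        infer=lambda blob: sum(blob),
        postprocess=lambda frame, raw: None,
    )
    print(result)
```

Passing the clock as a parameter is a design choice made here so that the loop can be checked with a deterministic fake clock; with the default, time.perf_counter is used.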
Results
The tests described above were based on RGB frames grabbed by a Creative Live! Cam Sync USB camera.
The original size of a single frame was 1080 x 720 (HD) pixels but due to the required structure of the
input layer of the YOLO detector, we resized the frames to 448 by 448 RGB pixels.
The benchmarks were carried out for a sequence of 500 frames.
The performance results for the different configurations of AI accelerators are presented in Table 1 below.
The average FPS factor was calculated using the following formula:
FPSavg = 500 / Ta
where Ta refers to the time of the overall analysis of 500 frames (as described above).
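As a quick check, the formula can be applied to the measured times reported in Table 1. The helper name average_fps is an assumption made for this sketch. Because the times in the table are rounded, the recomputed values may differ slightly in the last digits from the FPS figures printed there.

```python
def average_fps(num_frames, total_time_s):
    """FPSavg = num_frames / Ta, with Ta given in seconds."""
    if total_time_s <= 0:
        raise ValueError("total_time_s must be positive")
    return num_frames / total_time_s


# Measured times for 500 frames, taken from Table 1.
times_s = {
    "1 x Movidius Myriad 2": 123.1,
    "2 x Movidius Myriad 2": 69.8,
    "1 x NVIDIA GPU": 23.3,
}

for name, t in times_s.items():
    print(f"{name}: {average_fps(500, t):.2f} FPS")
```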
Table 1. Performance results
Configuration | Time [s] | Average FPS factor
1 x Movidius Myriad 2 | 123.1 | 4.05
2 x Movidius Myriad 2 | 69.8 | 7.16
1 x NVIDIA Quadro P1000 GPU | 23.3 | 21.3
As expected, the best performance was achieved with the GPU accelerator. Processing 500 frames took ca. 23 seconds, which corresponds to an average rate of ca. 21 frames per second. Consequently, a single GPU turned out to be 5.28 times faster than a single Myriad chip and 2.99 times faster than the configuration with two Movidius accelerators (at least for the given benchmark procedure).
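The speedup factors quoted above follow from the ratios of the measured execution times. A small sketch, assuming the hypothetical helper name speedup; the printed values depend on rounding, so the last digit may differ from the figures in the text.

```python
def speedup(slower_time_s, faster_time_s):
    """How many times faster the second configuration is than the first."""
    return slower_time_s / faster_time_s


# Execution times for 500 frames, taken from Table 1.
t_single_myriad = 123.1
t_dual_myriad = 69.8
t_gpu = 23.3

print(f"GPU vs. 1 x Myriad: {speedup(t_single_myriad, t_gpu):.2f}x")
print(f"GPU vs. 2 x Myriad: {speedup(t_dual_myriad, t_gpu):.2f}x")
```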
In the scenario where both Movidius cards were enabled, we developed an approach that allowed frames grabbed from the camera to be analyzed in parallel. As a result, this version was 1.76 times faster than the version with a single Myriad chip. In the given scenario, a single Intel Movidius reached only ca. 4 FPS, whereas the dual-Movidius configuration reached ca. 7 FPS.
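The source does not describe how the parallel analysis was implemented. One possible way to spread frames across several accelerators is sketched below, under the assumption that each device is wrapped in a blocking Python callable (the names analyze_parallel and infer_fns are hypothetical). A thread pool with one worker per device hands each frame to whichever device is idle and returns the results in frame order.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def analyze_parallel(frames, infer_fns):
    """Run inference on frames using one worker thread per device.

    infer_fns holds one blocking callable per accelerator (an assumption of
    this sketch). Results are returned in the same order as the input frames.
    """
    idle_devices = queue.Queue()
    for fn in infer_fns:
        idle_devices.put(fn)

    def task(frame):
        infer = idle_devices.get()      # take an idle device
        try:
            return infer(frame)
        finally:
            idle_devices.put(infer)     # hand the device back

    # At most len(infer_fns) tasks run at once, so a device is always available.
    with ThreadPoolExecutor(max_workers=len(infer_fns)) as pool:
        return list(pool.map(task, frames))   # map() preserves input order


if __name__ == "__main__":
    # Two dummy "devices" standing in for the two accelerators.
    devices = [lambda f: ("dev0", f * 2), lambda f: ("dev1", f * 2)]
    print(analyze_parallel(range(6), devices))
```

Note that pool.map submits all frames up front, which is fine for a fixed sequence of frames; a live camera stream would need a bounded queue instead.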
Conclusions
The results of this study show that using a GPU for object detection based on the YOLO model allows data to be analyzed in real time. At the same time, neither a single Intel Movidius nor two Intel Movidius chips provide the desired efficiency in the given scenario. However, Movidius can still be used successfully in applications where real-time processing is not necessary and near-real-time is enough.
The comparison of both devices is presented in Table 2 below. Based on the knowledge gained during this study, we conclude that the advantage of the NVIDIA GPU over the Intel Movidius VPU is not only in computational performance. The GPU allows for both training of DNNs and inference, whereas Movidius is designed only to work with pre-trained models.
Another difference between the two accelerators is their support for various AI libraries/frameworks. While Movidius supports two popular frameworks (Caffe and TensorFlow), the GPU supports more AI libraries, e.g. cuDNN or Theano.
The difference between these two accelerators can also be noticed in the programming process. In many cases, implementing an application that uses a GPU does not require any special knowledge about the accelerator itself. Most AI frameworks provide built-in support for GPU computing (both training and inference) out of the box. In the Movidius case, however, one also needs to learn its SDK. It is not a painful process, but it is yet another tool in the chain.
The two accelerators also differ in their area of usage. While the GPU is a powerful accelerator for AI computations, the electricity consumption and size of this kind of accelerator can be an obstacle in many areas. The GPU offers notably high computational performance (on the order of a few TFlops or more); however, it is usually dedicated to HPC solutions. Intel Movidius, in turn, is a low-power AI solution dedicated to on-device computer vision. Its size and power consumption make it attractive for many uses, e.g. IoT solutions, drones or smart security.
Given the context above, here are some additional remarks to consider when deciding which accelerator is a better fit for a given design. It is important to emphasize that Movidius and NVIDIA GPUs, although often compared as competing accelerators for AI workloads, are meant for different tasks. Looking at them only through the lens of performance benchmarks can therefore be misleading. To choose properly between Movidius and an NVIDIA GPU, one should first take into account the intended application rather than the benchmark results alone. Movidius is primarily designed to execute AI workloads based on trained models (inference). NVIDIA’s GPU, on the other hand, can do this plus training. The choice therefore depends on whether the planned device is to work in execute-only mode or should also be capable of updating/re-training its models (brains). And of course, all of this makes sense only as long as such tasks can be executed within a reasonable time frame.
Table 2. The comparison of NVIDIA GPU and Intel Movidius VPU
PROPERTY | INTEL MOVIDIUS | NVIDIA GPU
FOR INFERENCING | Yes | Yes
FOR TRAINING | No | Yes
AI FRAMEWORKS | Caffe / TensorFlow | Caffe / TensorFlow / cuDNN and more...
MAX MODEL SIZE | 320 MB | No limit
EASY TO CODE? | Besides knowledge of the AI framework/library, programmers need to learn the Movidius programming SDK. | Programming AI applications requires knowledge of the utilized library/framework, e.g. Caffe or TensorFlow.
FORM FACTOR | Small (i.e. mobile, IoT) | medium+
POWER CONSUMPTION | Low, ~1 W | medium+
HEATING | + | -
CAN WORK OFFLINE | Yes | Yes
MAIN PURPOSE | Classification and recognition of objects | General AI
OS | Ubuntu 16.04, Raspberry Pi 3 Raspbian Stretch | As long as the drivers are available (Windows, Linux)
COMPUTATIONAL POWER | 150 GFlops | Very high, TFlops and higher
OTHER | Imaging/vision accelerators included (12 specialized vector VLIW processors (SHAVEs) + 2 RISC processors). |
ARITHMETIC | 8/16/32 integer, 16/32 floating point | all
PRICE TAG | <$80 | $100+
Thank you!
Contact us at: welcome@byteLAKE.com
Learn how we work:
1. Listen Actively: We start with a consultancy session to better understand our client’s requirements & assumptions.
2. Suggest: We thoroughly analyze the gathered information and prepare a draft offer.
3. Agree: We fine-tune the offer further and wrap up everything into a binding contract.
4. Deliver: Finally, the execution starts. We deliver projects in a fully transparent, Agile (SCRUM-based) fashion.
Helping companies transform for the era of Artificial Intelligence.
We are a team of scientists, programmers, designers and technology enthusiasts helping industries incorporate AI techniques into products.
• We build Artificial Intelligence software and integrate that into products.
• We port and optimize algorithms for parallel, CPU+GPU HPC architectures.
• We deploy AI on data centers, the cloud and constrained, embedded devices (AI on Edge).
We are specialists in:
• Machine Learning
• Deep Learning
• Computer Vision
• High Performance Computing
• Heterogeneous Computing
• Edge Intelligence
byteLAKE
www.byteLAKE.com
