Benchmark of AI edge devices: powered by NVIDIA Quadro P1000 vs. AMD industry customized working sample GfX

The document summarizes byteLAKE’s benchmark results for two alternative setups of AI edge devices: the first equipped with an NVIDIA Quadro P1000, the other with an AMD industry customized working sample GfX.

Key takeaway: NVIDIA not only offers better performance but, thanks to a much wider software ecosystem, is also supported by more frameworks. It will be very interesting to see how the AMD sample develops further and enters the AI ecosystem, and Computer Vision in particular.

Benchmark: NVIDIA vs. AMD
byteLAKE’s Computer Vision benchmark results between two edge devices: one powered by NVIDIA, the other by an AMD industry customized working sample GfX.

Artificial Intelligence | Machine Learning | Computer Vision | Intelligent Devices | Cognitive Process Automation | RPA | HPC Optimization | Federated Learning for IoT

byteLAKE, Europe & USA: +48 508 091 885, +48 505 322 282, +1 650 735 2063
Specification of Target Platforms

Hardware Configuration
Tests were run on two edge devices. The parameters of these platforms are presented below.

Device with NVIDIA GPU
• CPU: Intel Core™ i5-8500T
• GPU accelerator: NVIDIA Quadro P1000
• Main memory: 4 GB GDDR5
• System: Ubuntu 18.04 LTS
• Drivers: CUDA 10.1, cuDNN 7.5, NVIDIA Graphics Driver 410.104

Device with AMD GPU
• CPU: Intel Core™ i5-8600T
• GPU accelerator: AMD industry customized working sample GfX
• Main memory: 8 GB GDDR5
• System: Ubuntu 18.04 LTS
• Drivers: AMDGPU-PRO Driver 19.10

Software Configuration
The OpenCV library and the Caffe AI framework were installed on both platforms. We used OpenCV 3.4.4 compiled and installed from the source code (https://github.com/opencv/opencv/archive/3.4.4.zip). In the case of the Caffe framework, we used the caffe-cuda implementation for the NVIDIA-based platform and the caffe-opencl version for the AMD-based platform. In both cases, the Caffe framework was downloaded from the official BVLC GitHub repository (https://github.com/BVLC/caffe).
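For reference, a minimal sanity check of this software stack could look as follows. This is our sketch, not part of the original report; it assumes that cv2 and pycaffe are importable on the device and that both the caffe-cuda and caffe-opencl builds expose the standard pycaffe device-selection calls.

```python
# Minimal sanity check of the software stack described above (a sketch, not from the report).
import cv2
import caffe

print('OpenCV version:', cv2.__version__)   # expected: 3.4.4 in this study

# Both the caffe-cuda (NVIDIA) and caffe-opencl (AMD) builds are assumed to
# expose the same pycaffe calls for selecting the GPU backend.
caffe.set_mode_gpu()
caffe.set_device(0)                          # use the first GPU in the system
print('Caffe GPU mode enabled on device 0')
```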
Remark: software installation
During this work, we ran into several problems with software installation and configuration. These issues were especially notable for the AMD GPU. While installation and configuration of the necessary software components (CUDA, cuDNN, etc.) were straightforward on the NVIDIA-based platform, setting up the required software on the device equipped with the AMD GPU was quite a challenge. The first problem on the AMD-based platform appeared during installation of the graphics drivers. It is hard to find a driver dedicated to the AMD industry customized working sample GfX on the AMD website; in our case, it was necessary to install several versions of the drivers in order to find the correct one. We considered both the official and the open-source drivers. In fact, most of the official drivers (including the latest ones) caused numerous problems during the system boot process, and the advice of users found on the internet (e.g. Stack Overflow) was not helpful for these issues. At the same time, the open-source drivers did not work well enough (i.e. there were problems with utilizing the GPU device for AI). Another problem is related to AI frameworks: the number of frameworks that work with AMD GPUs is lower than for NVIDIA GPUs, and the installation process is more complicated. We observed this while trying to install the Caffe framework from the AMD GitHub repository, where there were problems with the availability of additional software required for the installation. As a result, we decided to install the Caffe framework from the BVLC GitHub repository.

Description of Test Procedures
During the course of the study, we analyzed the performance using the state-of-the-art YOLO (You Only Look Once) real-time detection model [1]. In both cases we focused on a special smaller version of the YOLO network, called Tiny YOLO. The model consists of a single input layer, 8 convolution layers, 8 batch norm layers, 8 ReLU layers and a single fully-connected layer. Tiny YOLO is able to recognize objects from 20 classes, including: aeroplane, bicycle, bird, boat, bottle, bus, car, cat, chair, cow, dining table, dog, horse, motorbike, person, potted plant, sheep, sofa, train and TV monitor. The size of the pre-trained Tiny YOLO model is 50 MB. The deep neural net (DNN) used for this study was implemented using the Python Caffe AI framework. For the purpose of this work, we developed two test scenarios. The first assesses the overall performance of the whole platform (procedure #1), while the second evaluates only the performance of the GPU accelerators (procedure #2). The procedures are described in detail in the next subsections.

Procedure #1
This procedure assesses the overall performance of the configurations described above and takes into account all steps required to generate the resulting movie, including:
• reading frames of the movie;
• frame preparation;
• forwarding of the images through the Tiny YOLO net;
• filtering the results of the analysis;
• drawing the results on the frame;
• presenting the results of the analysis.
Figure 1 presents the Python implementation of the above measurement method. To ensure objectivity of the measurements across all configurations, the analysis was performed for a fixed number of frames. We assumed two performance criteria: (i) the execution time of the AI computations for all of the above configurations and (ii) the average value of the Frames Per Second (FPS) factor.

Figure 1. Adopted method of measurements used to evaluate the performance of the platform as a whole
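The original source-code figure is not reproduced in this transcript. Below is a minimal sketch of such a whole-pipeline measurement, written only from the description above; the file names (tiny_yolo.prototxt, tiny_yolo.caffemodel, input.mp4), the input blob name 'data' and the preprocessing details are assumptions, not taken from the report.

```python
# Sketch of procedure #1: time the whole pipeline (read, preprocess, forward,
# post-process) for a fixed number of frames. File names are placeholders.
import time
import cv2
import caffe

N_FRAMES = 500                       # fixed number of frames, as in the report

caffe.set_mode_gpu()
net = caffe.Net('tiny_yolo.prototxt', 'tiny_yolo.caffemodel', caffe.TEST)

cap = cv2.VideoCapture('input.mp4')
start = time.time()
for _ in range(N_FRAMES):
    ok, frame = cap.read()           # read a Full HD frame from the movie
    if not ok:
        break
    # Resize to the 448x448 input expected by Tiny YOLO, scale to [0, 1]
    # (illustrative normalization) and reorder to 1x3x448x448.
    blob = cv2.resize(frame, (448, 448)).astype('float32') / 255.0
    blob = blob.transpose(2, 0, 1)[None, ...]
    net.blobs['data'].data[...] = blob   # assumes the input blob is named 'data'
    out = net.forward()                  # inference on the GPU
    # Filtering and drawing of detections would go here (omitted in this sketch).
elapsed = time.time() - start
print('Analysis time [s]: %.2f, average FPS: %.2f' % (elapsed, N_FRAMES / elapsed))
cap.release()
```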
Procedure #2
The second procedure, on the other hand, evaluates the performance of each GPU accelerator itself. It is based on the measurement of the total time required to forward the images through the Tiny YOLO net. As in the previous scenario, the test is executed for a fixed number of frames. The source code of this method is presented in Figure 2.

Figure 2. The method used to assess the performance of the GPU accelerator only
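Again, the original figure is not reproduced here. A minimal sketch of such a GPU-only measurement, under the same placeholder assumptions as above, accumulates only the time spent in the forward pass:

```python
# Sketch of procedure #2: accumulate only the time spent forwarding frames
# through the Tiny YOLO net, excluding I/O, pre- and post-processing.
import time
import cv2
import caffe

N_FRAMES = 500

caffe.set_mode_gpu()
net = caffe.Net('tiny_yolo.prototxt', 'tiny_yolo.caffemodel', caffe.TEST)

cap = cv2.VideoCapture('input.mp4')
inference_time = 0.0
for _ in range(N_FRAMES):
    ok, frame = cap.read()
    if not ok:
        break
    blob = cv2.resize(frame, (448, 448)).astype('float32') / 255.0
    blob = blob.transpose(2, 0, 1)[None, ...]
    net.blobs['data'].data[...] = blob   # assumes the input blob is named 'data'
    t0 = time.time()
    net.forward()                        # only this call is timed
    inference_time += time.time() - t0
print('Inferencing time [s]: %.2f' % inference_time)
cap.release()
```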
Performance Results
The tests presented above were based on RGB frames read from the movie. The original size of a single frame was 1920 × 1080 pixels (Full HD), but due to the required structure of the input layer of the Tiny YOLO detector, it was necessary to resize the frames to 448 × 448 RGB pixels. The benchmarks were executed for a sequence of 500 frames. Figure 3 presents a resulting frame of the analyzed movie.

Figure 3. Single frame of the movie analyzed using the Tiny YOLO DNN

Table 1 presents the performance results obtained for the first testing scenario. The average FPS factor was calculated using the following formula:

FPS_avg = 500 / T_a

where T_a refers to the time of the overall analysis of 500 frames (as described above). To ensure the reliability of the performance results, the measurements of the execution time were repeated r = 10 times, and the median values are reported.

Table 1. Performance results obtained for the whole platform
                              NVIDIA GPU    AMD GPU
Analysis time [s]             14.54         21.16
Average FPS factor [frames/s] 34.39         23.62

When analyzing the results shown in Table 1, we can observe that the NVIDIA-based platform provides higher computational performance. The total time necessary to analyze the sequence of 500 frames is 14.54 seconds for the device with the NVIDIA GPU. At the same time, the analysis of the movie using the device equipped with the AMD GPU takes about 1.46 times as long, with an execution time of 21.16 seconds. The average FPS factor achieved for the platforms is 34.39 and 23.62 for NVIDIA and AMD, respectively.
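As a quick worked example of the formula, applied to the median analysis times from Table 1 (small differences versus the table can come from rounding of T_a):

```python
# Worked example of FPS_avg = 500 / T_a using the analysis times from Table 1.
N_FRAMES = 500
analysis_time = {'NVIDIA GPU': 14.54, 'AMD GPU': 21.16}   # T_a [s]
for device, t_a in analysis_time.items():
    print('%s: FPS_avg = %d / %.2f = %.2f' % (device, N_FRAMES, t_a, N_FRAMES / t_a))
# -> about 34.39 FPS for NVIDIA and about 23.63 FPS for AMD (the report states 23.62)
```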
Figure 4 presents screenshots of the system terminals showing the achieved performance and information about the devices.

Figure 4. Screenshots of the system terminals of the PCs equipped with: a) the NVIDIA GPU and b) the AMD GPU, respectively
Table 2 illustrates the performance comparison of the GPU devices. The achieved results show that the NVIDIA GPU provides higher inferencing performance. The time necessary to analyze the sequence of 500 frames is 7.36 seconds, while inferencing with the AMD GPU takes 11.86 seconds.

Table 2. The comparison of the performance of GPU devices
                      NVIDIA GPU    AMD GPU
Inferencing time [s]  7.36          11.86

When it comes to noise, we observed that the noise level is similar for both devices and is not disruptive.

Conclusions
The results of this study show that using an NVIDIA GPU for object detection based on the Tiny YOLO model makes it possible to analyze data in real time. At the same time, the performance of the AMD industry customized working sample GfX is about 40% lower than that of the NVIDIA Quadro P1000. However, considering the average FPS factor achieved with AMD, we can conclude that it offers near-real-time processing. Based on the knowledge gained during this study, we can also conclude that the difference between the two devices is not only about computational performance. It is also associated with software installation and configuration: for the NVIDIA GPUs there is a wide range of software that is quite easy to use, while for the AMD GPUs the software setup process can be a hassle. Another difference concerns support for various AI libraries/frameworks. While most AI frameworks are implemented in the CUDA programming model to support NVIDIA GPUs, it is usually hard to find an official and stable OpenCL implementation of these frameworks that would work with the AMD devices. An example of such a framework is Darknet, which implements the YOLO object detectors [2].

References
[1] YOLO: Real-Time Object Detection, URL: https://pjreddie.com/darknet/yolo/
[2] Darknet: Open Source Neural Networks in C, URL: https://pjreddie.com/darknet/
Thank you!
Contact us: welcome@byteLAKE.com
Learn how we work:
1. Listen Actively: We start with a consultancy session to better understand our client’s requirements & assumptions.
2. Suggest: We thoroughly analyze the gathered information and prepare a draft offer.
3. Agree: We fine-tune the offer further and wrap up everything into a binding contract.
4. Deliver: Finally, the execution starts. We deliver projects in a fully transparent, Agile (SCRUM-based) fashion.
We build AI and HPC solutions, focusing on software. We use machine/deep learning to bring automation and optimize operations in businesses across various industries. We create highly optimized software for supercomputers. Our researchers hold PhD and DSc degrees.

byteLAKE, www.byteLAKE.com
• AI (highly optimized AI engines to analyze text, image, video and time series data)
• HPC (highly optimized apps and kernels for HPC architectures)
Building solutions for real-life business problems.
