This document summarizes a presentation on pipelined compression in remote GPU virtualization systems using rCUDA. It introduces remote GPU virtualization and the challenges posed by slow networks, then describes a pipelined compression architecture that compresses data on the fly during transfers. Experimental results show that compression libraries reduce execution time by 1 to 6 minutes across several machine learning models. An analysis of the traffic finds that over 90% of transfers are small, between 1 byte and 1,023 bytes, and could benefit from further compression. The initial implementation shows potential for reducing execution time but leaves room for improvement.
Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early Experiences
1. Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early Experiences
Cristian Peñaranda, Carlos Reaño and Federico Silla
ICPP 2022 DUAC Workshop
August 29, 2022
6. Compression architecture without pipeline
[Figure: the client side compresses the whole data buffer from host memory and then sends it; the server side receives the complete buffer, decompresses it, and finally transfers it to GPU memory. Compression, transfer, and decompression happen one after another.]
7. Pipeline compression architecture
[Figure: the data buffer is split into chunks. On the client side a pipeline compresses chunks and sends each one as soon as it is ready; on the server side a pipeline receives, decompresses, and transfers the chunks to GPU memory. Compression, transfer, and decompression therefore overlap.]
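The chunked pipeline in the figure can be sketched in a few lines. The following is a minimal illustration, not rCUDA's actual implementation: it uses Python threads, zlib as a stand-in compressor, and hypothetical helper names (`pipelined_send`, `pipelined_recv`, and the `send` callback are assumptions), but it shows the key idea of overlapping compression of one chunk with the transfer of the previous one.

```python
import queue
import threading
import zlib

CHUNK_SIZE = 1024  # rCUDA transfers data in 1,024-byte chunks


def pipelined_send(data: bytes, send):
    """Compress fixed-size chunks while previously compressed
    chunks are being sent, so both stages overlap."""
    q: "queue.Queue" = queue.Queue(maxsize=4)  # bounded pipeline depth

    def sender():
        # Transfer compressed chunks as soon as they are produced.
        while (chunk := q.get()) is not None:
            send(chunk)

    t = threading.Thread(target=sender)
    t.start()
    for off in range(0, len(data), CHUNK_SIZE):
        # Compressing chunk i while chunk i-1 is still in flight.
        q.put(zlib.compress(data[off:off + CHUNK_SIZE]))
    q.put(None)  # end-of-stream marker
    t.join()


def pipelined_recv(chunks):
    """Server side: decompress chunks as they arrive."""
    return b"".join(zlib.decompress(c) for c in chunks)
```

In a real deployment the `send` callback would be a network write and the receiver would forward each decompressed chunk to GPU memory instead of concatenating it.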
8. Machine learning applications
Alexnet: evaluates the inference time using the AlexNet CNN model.
Cifar10: uses the Cifar10 dataset to evaluate the image classification of a simple CNN.
Mnist: evaluates the image classification using a LeNet-5-like CNN and the Mnist dataset.
Inception: uses the flowers dataset to evaluate the image classification using the Inception-V3 CNN model.
9. Compression libraries
Smash: Benchmark of compression libraries
● 41 different lossless compression libraries.
● Different options to configure compression libraries.
● Available at https://github.com/cpenaranda/smash
10. Compression libraries
Lz4: based on LZ77, focused on fast compression and decompression.
Zlib: uses a combination of LZ77 and Huffman coding.
Snappy: based on LZ77 and created by Google. It is focused on getting a shorter computation time.
Zstandard (Zstd): created by Meta and based on LZ77 with a combination of a fast Finite State Entropy and Huffman coding.
Gipfeli: based on LZ77 and developed by Google. It is focused on getting higher compression ratios.
FastLZ: an implementation of the LZ77 algorithm for lossless data compression.
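All of these libraries trade compression ratio against compression speed. A small way to see that trade-off without third-party bindings is Python's standard-library `zlib` module, which implements the same LZ77-plus-Huffman scheme as the Zlib library above (the payload here is synthetic illustration data, not the paper's traffic):

```python
import zlib

# 16 KiB of repetitive data: LZ77 back-references compress this well.
payload = bytes(range(256)) * 64

for level in (1, 6, 9):  # fastest, default, best ratio
    compressed = zlib.compress(payload, level)
    ratio = len(payload) / len(compressed)
    print(f"level {level}: {len(compressed)} bytes, ratio {ratio:.2f}")
```

Lz4, Snappy, Zstd, Gipfeli, and FastLZ expose similar level/speed knobs through their own APIs, which is what Smash benchmarks across all 41 libraries.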
11. Experimental setup
Edge device: Raspberry Pi 4 Model B, quad-core ARM Cortex-A72 64-bit 1.5GHz.
Server node: Intel(R) Xeon(R) CPU E5-2637 v2 3.50GHz with an NVIDIA V100 GPU.
Network: 10Mbps.
12. Results
- CPU results are better than the others except for Mnist.
- Compression libraries reduce the execution time between 1 and 6 minutes.
13. Results
- The [8B-16B[ data size range represents more than 35% of all data transfers.
- rCUDA is implemented with chunks of 1,024 bytes.
- More than 90% of data transfers have a size between 1 byte and 1,023 bytes.
(Compression is done without pipeline.)
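Bucketing transfer sizes into power-of-two ranges such as [8B-16B[, as the slide does, can be sketched as follows. The trace below is hypothetical illustration data, not the measured rCUDA traffic:

```python
from collections import Counter


def size_bucket(n: int) -> str:
    """Power-of-two bucket label, e.g. 10 -> '[8B-16B['."""
    lo = 1
    while lo * 2 <= n:
        lo *= 2
    return f"[{lo}B-{lo * 2}B["


# Hypothetical per-transfer sizes in bytes, skewed toward small transfers.
trace = [8, 12, 8, 9, 100, 1024, 15, 8, 300, 10]
hist = Counter(size_bucket(s) for s in trace)
for bucket, count in hist.most_common():
    print(f"{bucket}: {count / len(trace):.0%}")
```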
14. Analysis of data transfers in the range of [8B-16B[

TensorFlow application | Number of data transfers | Number of data transfers with different data values
Alexnet                | 15,218                   |  2,820
Cifar10                | 33,067                   | 10,479
Mnist                  | 83,665                   | 15,855
Inception              | 97,346                   | 25,530

- All data transfers have a size of 8 bytes (2^64 possible values).
- TF applications use less than 65,535 different data values (less than 2^16). Data could therefore be represented by 2 bytes instead of 8 bytes.
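The 8-bytes-to-2-bytes observation amounts to dictionary encoding: since fewer than 2^16 distinct 8-byte values occur, each value can be replaced by a 2-byte code plus a one-off dictionary. A minimal sketch under that assumption (the helper name `dict_encode` and the input values are illustrative, not the paper's encoder):

```python
import struct


def dict_encode(values):
    """Replace each distinct 8-byte value with a 2-byte code.
    Valid whenever there are at most 2**16 distinct values."""
    table = {}
    codes = []
    for v in values:
        codes.append(table.setdefault(v, len(table)))
    assert len(table) <= 1 << 16, "too many distinct values for 2-byte codes"
    # Dictionary: the distinct 8-byte values, in code order.
    dictionary = b"".join(struct.pack("<Q", v) for v in table)
    # Payload: one 2-byte code per transfer.
    payload = b"".join(struct.pack("<H", c) for c in codes)
    return dictionary, payload


values = [3, 7, 3, 3, 7, 42] * 1000        # many repeats, few distinct values
dictionary, payload = dict_encode(values)
raw = 8 * len(values)                       # original: 8 bytes per transfer
enc = len(dictionary) + len(payload)        # dictionary + 2-byte codes
print(f"{raw} B -> {enc} B")
```

The fewer distinct values relative to the number of transfers, the more the dictionary cost is amortized, which matches the per-application counts in the table above.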
15. Analysis of data transfers in the range of [8B-16B[
[Figure: the most repeated 8-byte values for Alexnet, Cifar10, Mnist and Inception. The data shown have a frequency greater than 0.2%.]
- Values could be represented using 1 byte.
- These data represent between 42.69% and 67.98% of all 8-byte data transfers.
16. Analysis of data transfers in the range of [8B-16B[

TensorFlow application | Number of data transfers | Number of data transfers with different data values | Size without compression | Size with proposed compression
Alexnet                | 15,218                   |  2,820                                              | 118.89KB                 | 19.62-23.38KB
Cifar10                | 33,067                   | 10,479                                              | 258.34KB                 | 42.63-50.80KB
Mnist                  | 83,665                   | 15,855                                              | 653.63KB                 | 107.87-128.53KB
Inception              | 97,346                   | 25,530                                              | 760.52KB                 | 125.50-149.55KB
17. Conclusions
- Initial pipelined implementation of on-the-fly data compression using rCUDA.
- We have evaluated four popular machine learning applications.
- This initial implementation is able to reduce the execution time.
- We have pointed out several ways to improve the performance of our pipelined on-the-fly data compression mechanism.
18. Contact: cripeace@gap.upv.es
Get a free copy of rCUDA at:
http://www.rcuda.net
Get a free copy of smash at:
https://github.com/cpenaranda/smash
THANK YOU!