Pipelined Compression in Remote GPU Virtualization Systems using rCUDA: Early Experiences
Cristian Peñaranda, Carlos Reaño and Federico Silla
ICPP 2022 DUAC Workshop
August 29, 2022
Introduction
IoT device
Pros:
Low energy consumption
Cheap devices
Cons:
Low performance
Low bandwidth networks
Introduction
Edge Computing
IoT device
Introduction
Machine learning applications
Edge devices don't usually have a GPU.
Solution: remote GPU virtualization.
Main problem
Remote GPU virtualization architecture
Compression System
Slow network
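Compression architecture without pipeline
[Diagram. Client side: data in host memory, with compression and send stages in one direction and receive and decompression stages in the other. Server side: receive and decompression stages, compression and send stages, and the transfer to GPU memory (GPU data).]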
Pipeline compression architecture
[Diagram. Client side and server side each have compression, decompression, send, and receive stages. The data in host memory is split into data chunks that flow through a client pipeline and a server pipeline before the transfer to GPU memory (GPU data).]
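To make the idea of the pipeline concrete, here is a minimal sketch, not rCUDA's actual implementation. It assumes zlib as the codec, a bounded in-process queue standing in for the network, and hypothetical function names (client_pipeline, server_pipeline, transfer). The 1,024-byte chunk size is the one reported for rCUDA later in the slides.

```python
import queue
import threading
import zlib

CHUNK_SIZE = 1024  # chunk size reported for rCUDA in the slides
_DONE = None       # sentinel marking the end of a transfer (illustrative)


def client_pipeline(data: bytes, link: queue.Queue) -> None:
    """Compress one chunk at a time and hand it to the 'network'."""
    for start in range(0, len(data), CHUNK_SIZE):
        chunk = data[start:start + CHUNK_SIZE]
        # Each chunk is compressed independently, so the server can start
        # decompressing before the client has finished the whole buffer.
        link.put(zlib.compress(chunk))
    link.put(_DONE)


def server_pipeline(link: queue.Queue, out: bytearray) -> None:
    """Decompress chunks as they arrive and append them to the output."""
    while True:
        item = link.get()
        if item is _DONE:
            break
        out += zlib.decompress(item)


def transfer(data: bytes) -> bytes:
    """Run both pipeline halves concurrently and return the received data."""
    link = queue.Queue(maxsize=4)  # bounded queue stands in for the slow link
    received = bytearray()
    server = threading.Thread(target=server_pipeline, args=(link, received))
    server.start()
    client_pipeline(data, link)
    server.join()
    return bytes(received)
```

Without the pipeline, the whole buffer would be compressed, then sent, then decompressed, one stage after another. Here the three stages overlap on different chunks.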
Machine learning applications
Alexnet: Evaluates the inference time using the Alexnet CNN model.
Cifar10: Uses the Cifar10 dataset to evaluate the image classification of a simple CNN.
Inception: Uses the flowers dataset to evaluate the image classification using the Inception-V3 CNN model.
Mnist: Evaluates the image classification using a LeNet-5-like CNN and the Mnist dataset.
Compression libraries
Smash: Benchmark of compression libraries
● 41 different lossless compression libraries.
● Different options to configure compression libraries.
● Available at https://github.com/cpenaranda/smash
Compression libraries
Lz4: Based on LZ77 focused on fast compression and decompression.
Zlib: Uses a combination of LZ77 and Huffman coding.
Snappy: Based on LZ77 and created by Google. It is focused on getting a shorter computation time.
Zstandard (Zstd): Created by Meta and is based on LZ77 with a combination of a fast Finite State Entropy and Huffman coding.
Gipfeli: Based on LZ77 and developed by Google. It is focused on getting higher compression ratios.
FastLZ: An implementation of the LZ77 algorithm for lossless data compression.
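Since every library above builds on LZ77, a toy version of the core idea may help. The sketch below is illustrative only and is not how any of the listed libraries is implemented: production codecs use hash tables to find matches and add entropy coding. The function names and the (offset, length, next byte) triple format are assumptions made for this example.

```python
def lz77_compress(data: bytes, window: int = 255, max_len: int = 255):
    """Toy LZ77: emit (offset, length, next_byte) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Search the sliding window for the longest match starting at i.
        for j in range(max(0, i - window), i):
            length = 0
            # Stop one byte early so a literal 'next byte' always remains.
            while (length < max_len and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        out.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples) -> bytes:
    """Rebuild the data by copying back-references byte by byte."""
    out = bytearray()
    for off, length, nxt in triples:
        for _ in range(length):
            out.append(out[-off])  # byte-wise copy allows overlapping matches
        out.append(nxt)
    return bytes(out)
```

Repetitive input collapses into a few triples, which is why highly repetitive transfers are good candidates for this family of compressors.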
Experimental setup
Edge device: Raspberry Pi 4 Model B, quad-core ARM Cortex-A72 64-bit 1.5GHz
Server node: Intel(R) Xeon(R) CPU E5-2637 v2 3.50GHz, NVIDIA V100 GPU
Network: 10Mbps
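A back-of-the-envelope model shows why a 10Mbps link makes compression attractive. This is a sketch under simplifying assumptions: 1 Mbps is taken as 10^6 bits per second, and latency and protocol overhead are ignored. The function names and the sample numbers are illustrative, not measurements from the talk.

```python
def transfer_seconds(num_bytes: float, mbps: float = 10.0) -> float:
    """Time to push num_bytes through a link of the given bandwidth."""
    return num_bytes * 8 / (mbps * 1_000_000)


def compression_pays_off(num_bytes: int, ratio: float,
                         codec_seconds: float, mbps: float = 10.0) -> bool:
    """True if compress + send + decompress beats sending the raw data.

    ratio is original size / compressed size; codec_seconds is the total
    time spent compressing and decompressing (both are assumed inputs).
    """
    compressed_bytes = num_bytes / ratio
    with_codec = codec_seconds + transfer_seconds(compressed_bytes, mbps)
    return with_codec < transfer_seconds(num_bytes, mbps)
```

For example, 1,250,000 bytes take 1 second at 10Mbps, so a codec that halves the data is worthwhile only if compression plus decompression costs less than 0.5 seconds.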
Results
- CPU results are better than the others except for Mnist.
- Compression libraries reduce the execution time by between 1 and 6 minutes.
Results
- The [8B-16B[ data size range (from 8 bytes up to, but not including, 16 bytes) represents more than 35% of all data transfers.
- rCUDA is implemented with chunks of 1,024 bytes.
- More than 90% of data transfers have a size between 1 byte and 1,023 bytes.
Compression is done without the pipeline.
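The [8B-16B[ notation denotes a half-open range. A transfer-size histogram of this kind can be sketched as follows, assuming power-of-two buckets (the slides only show the [8B-16B[ range, so the other bucket boundaries are an assumption) and hypothetical helper names.

```python
from collections import Counter


def size_bucket(num_bytes: int):
    """Return the half-open power-of-two range [lo, hi) containing num_bytes."""
    if num_bytes < 1:
        raise ValueError("transfer size must be at least 1 byte")
    lo = 1 << (num_bytes.bit_length() - 1)  # largest power of two <= num_bytes
    return lo, 2 * lo


def histogram(sizes):
    """Count how many transfers fall into each size range."""
    return Counter(size_bucket(s) for s in sizes)
```

With this bucketing, an 8-byte transfer and a 15-byte transfer both land in (8, 16), while a 16-byte transfer lands in (16, 32).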
Analysis of data transfers in the range of [8B-16B[
TensorFlow application   Number of data transfers   Number of data transfers with different data values
Alexnet                  15,218                     2,820
Cifar10                  33,067                     10,479
Mnist                    83,665                     15,855
Inception                97,346                     25,530
- All data transfers have a size of 8 bytes (2^64 possible values).
- TF applications use fewer than 65,535 different data values (fewer than 2^16). Data could therefore be represented by 2 bytes instead of 8 bytes.
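One way to read the 2-byte observation is as dictionary encoding: each distinct 8-byte value is replaced by a 16-bit index into a shared table. The sketch below is an assumption about how such a scheme could look, not the mechanism implemented in rCUDA. The function names are hypothetical, and how client and server would agree on the table is left open.

```python
import struct


def build_codebook(values):
    """Assign each distinct 8-byte value a 16-bit index, in first-seen order."""
    codebook = {}
    for v in values:
        if v not in codebook:
            codebook[v] = len(codebook)
    if len(codebook) > 0x10000:
        raise ValueError("more than 2^16 distinct values; 2 bytes are not enough")
    return codebook


def encode(values, codebook) -> bytes:
    """Replace every 8-byte value with its 2-byte little-endian index."""
    return b"".join(struct.pack("<H", codebook[v]) for v in values)


def decode(payload: bytes, codebook):
    """Map 2-byte indices back to the original 8-byte values."""
    table = list(codebook)  # dicts keep insertion order, so position == index
    return [table[i] for (i,) in struct.iter_unpack("<H", payload)]
```

The encoded payload is a quarter of the original size, at the cost of keeping the table in sync on both sides.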
Analysis of data transfers in the range of [8B-16B[
[Charts, one per application: Inception, Alexnet, Cifar10, Mnist.]
The values shown are the most frequently repeated ones; each has a frequency greater than 0.2%.
- These values could be represented using 1 byte.
- They represent between 42.69% and 67.98% of all 8-byte data transfers.
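A variable-length code is one possible way to give the most frequent values 1 byte and everything else 2 bytes. The slides do not specify an encoding, so the prefix scheme below is purely an assumption: a first byte below 0x80 is a 1-byte code, otherwise the code is 2 bytes with a 15-bit index. This caps the frequent table at 128 entries and the rare table at 32,768. All names are hypothetical.

```python
from collections import Counter


def build_tables(values, threshold: float = 0.002):
    """Split distinct values into frequent (1-byte) and rare (2-byte) tables."""
    counts = Counter(values)
    total = len(values)
    frequent = [v for v, c in counts.most_common() if c / total > threshold][:128]
    frequent_set = set(frequent)
    rare = [v for v in counts if v not in frequent_set]
    if len(rare) > 0x8000:
        raise ValueError("too many distinct values for a 15-bit index")
    return frequent, rare


def encode(values, frequent, rare) -> bytes:
    short = {v: i for i, v in enumerate(frequent)}
    wide = {v: i for i, v in enumerate(rare)}
    out = bytearray()
    for v in values:
        if v in short:
            out.append(short[v])                          # 0xxxxxxx
        else:
            out += (0x8000 | wide[v]).to_bytes(2, "big")  # 1xxxxxxx xxxxxxxx
    return bytes(out)


def decode(payload: bytes, frequent, rare):
    out, i = [], 0
    while i < len(payload):
        if payload[i] < 0x80:  # high bit clear: 1-byte code
            out.append(frequent[payload[i]])
            i += 1
        else:                  # high bit set: 2-byte code
            index = int.from_bytes(payload[i:i + 2], "big") & 0x7FFF
            out.append(rare[index])
            i += 2
    return out
```

The default threshold of 0.002 mirrors the 0.2% frequency cut-off mentioned on the slide.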
Analysis of data transfers in the range of [8B-16B[
TensorFlow application   Number of data transfers   Number of data transfers with different data values   Size without compression   Size with proposed compression
Alexnet                  15,218                     2,820                                                 118.89KB                   19.62-23.38KB
Cifar10                  33,067                     10,479                                                258.34KB                   42.63-50.80KB
Mnist                    83,665                     15,855                                                653.63KB                   107.87-128.53KB
Inception                97,346                     25,530                                                760.52KB                   125.50-149.55KB
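The sizes in the table can be reproduced with a simple model. The uncompressed size follows directly from 8 bytes per transfer. The compressed range matches the assumption (inferred from the numbers, not stated on the slide) that frequent values take 1 byte, all other values take 2 bytes, and the share of frequent values lies between 42.69% and 67.98%. Here 1KB is taken as 1,024 bytes, and the names are illustrative.

```python
# Transfer counts per application, taken from the table above.
TRANSFERS = {"Alexnet": 15_218, "Cifar10": 33_067,
             "Mnist": 83_665, "Inception": 97_346}

# Lowest and highest share of transfers carrying a frequent value.
FREQUENT_SHARE = (0.4269, 0.6798)


def kib(num_bytes: float) -> float:
    """Convert bytes to KB (1,024 bytes), rounded to two decimals."""
    return round(num_bytes / 1024, 2)


def sizes(count: int):
    """Return (uncompressed, best case, worst case) sizes in KB."""
    raw = kib(count * 8)
    # A share f of transfers costs 1 byte and the rest 2 bytes: f + 2(1 - f).
    best = kib(count * (2 - max(FREQUENT_SHARE)))
    worst = kib(count * (2 - min(FREQUENT_SHARE)))
    return raw, best, worst


for app, count in TRANSFERS.items():
    print(app, sizes(count))
```

Under this model the proposed encoding would shrink the 8-byte transfers to roughly a fifth to a sixth of their original volume.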
Conclusions
- Initial pipelined implementation of on-the-fly data compression using rCUDA.
- We have leveraged four popular machine learning applications.
- This initial implementation is able to reduce the execution time.
- We have pointed out several ways to improve the performance of our pipelined on-the-fly data compression mechanism.
Contact: cripeace@gap.upv.es
Get a free copy of rCUDA at:
http://www.rcuda.net
Get a free copy of smash at:
https://github.com/cpenaranda/smash
THANK YOU!
