Performance Evaluation and Comparison of Service-based Image Processing based on Software Rendering

Ole Wegen, Matthias Trapp, Jürgen Döllner
Hasso Plattner Institute, Faculty of Digital Engineering,
University of Potsdam, Germany
Sebastian Pasewaldt
Digital Masterpieces GmbH
Germany
In Cooperation with:
Funded by:

Image processing is a common task
Observations:
1. Increased usage of mobile hardware
2. Increase in cloud-processing capabilities and infrastructure
3. Increase of network throughput
Service-based provisioning of image processing functionality
for web-based or mobile applications:
▪ Accessible from anywhere
▪ Usable with any device (hardware independent)

Important aspects to account for:
1. Processing implementations often rely on hardware-acceleration (e.g. GPUs)
2. Scalability as a crucial factor when using a service-based approach
These requirements are often accompanied by high financial costs:
• Dedicated server hardware with the required specifications
• Limited availability
Software rendering (SWR) can reduce these costs:
• Rendering performed entirely on the CPU (affordable)
• Enables execution of GPU-based programs on hosts without GPU support

SWR was investigated for:
▪ Ma and Parker, 2001:
Visualizing large-scale datasets
▪ Mileff and Dudra, 2013:
Texture rendering
▪ Hayashi et al., 2018:
In-situ visualization for volume rendering system
Contribution A: SWR for Service-based image processing
Contribution B: Performance comparison between dedicated GPU server vs. SWR server:
▪ Main classes of image processing techniques
(point and neighbourhood, single vs. multipass)
▪ Different image resolutions
▪ Various server configurations

Mesa3D:
▪ 3D Graphics Library implementing
graphics API specifications
▪ Supports OpenGL, Vulkan and others
▪ Commonly used for SWR
Gallium3D:
▪ API to support driver development
▪ Abstracting from graphics API
▪ Abstracting from operating system
Available Gallium3D Drivers:
▪ softpipe
▪ LLVMpipe
▪ openSWR
https://en.wikipedia.org/wiki/Mesa_(computer_graphics)
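As a rough illustration of how one of the Gallium3D software drivers might be selected for a process, the sketch below builds an environment from Mesa's `LIBGL_ALWAYS_SOFTWARE`, `GALLIUM_DRIVER`, and `LP_NUM_THREADS` environment variables. The helper name `swr_environment` is hypothetical and not part of the presented system, and whether a given driver is available depends on how Mesa was built.

```python
import os

# Gallium3D software drivers from the list above, spelled as Mesa expects them
# in GALLIUM_DRIVER ("swr" selects openSWR).
SOFTWARE_DRIVERS = ("softpipe", "llvmpipe", "swr")


def swr_environment(driver="llvmpipe", threads=None, base=None):
    """Return a copy of the environment that requests software rendering.

    Hypothetical helper for illustration only.
    """
    if driver not in SOFTWARE_DRIVERS:
        raise ValueError(f"unknown software driver: {driver!r}")
    env = dict(os.environ if base is None else base)
    env["LIBGL_ALWAYS_SOFTWARE"] = "1"   # ask Mesa for a software rasterizer
    env["GALLIUM_DRIVER"] = driver       # pick the Gallium3D driver
    if threads is not None and driver == "llvmpipe":
        env["LP_NUM_THREADS"] = str(threads)  # llvmpipe rasterizer thread count
    return env
```

The returned dictionary could then be passed as `env=` to `subprocess.run` when launching the image processor.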

Test machine:
▪ Intel Core i5-8400
▪ 6 Cores at 2.8 GHz
▪ 16 GB DDR4 RAM
Operation: Morphological Closing (Kernel size 3)

Two Docker Containers:
▪ An instance of the image processor
▪ A NodeJS server exposing a REST [Winkler and Schlesiger, 2013] interface for communication
▪ Communication between these containers through WebSockets

▪ Processing Techniques:
• Color Invert (A)
• Point-based
• Single Pass
• Morphological Closing (B)
• Neighbourhood-based
• Separated Passes
• Tested kernel sizes: 3, 14, 90
• Oilpaint (C)
• Multipass, Neighbourhood-based
• # Passes: 18
▪ Tested spatial resolutions:
• 1280 x 720 (HD)
• 1920 x 1080 (FHD)
• 2560 x 1440 (QHD)
• 3840 x 2160 (4K)
[Figure: example results of techniques A, B, and C]
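The point-based and neighbourhood-based classes listed above can be illustrated on the CPU as follows. This is a minimal pure-Python sketch on a grayscale image stored as a list of rows, not the shader-based implementation evaluated in the talk. The function names, the `radius` parameter (how the tested kernel sizes map to a radius is not stated), the square structuring element, and the truncated windows at the image border are all assumptions.

```python
def invert(img):
    """Point-based, single pass: each output pixel depends on one input pixel."""
    return [[255 - p for p in row] for row in img]


def _filter_rows(img, radius, op):
    """Apply a 1D min/max window along every row (window truncated at borders)."""
    out = []
    for row in img:
        n = len(row)
        out.append([op(row[max(0, x - radius):min(n, x + radius + 1)])
                    for x in range(n)])
    return out


def _transpose(img):
    return [list(col) for col in zip(*img)]


def _separated(img, radius, op):
    """Neighbourhood-based filter split into a horizontal and a vertical pass."""
    horizontal = _filter_rows(img, radius, op)
    return _transpose(_filter_rows(_transpose(horizontal), radius, op))


def closing(img, radius=1):
    """Morphological closing: dilation (max) followed by erosion (min)."""
    dilated = _separated(img, radius, max)
    return _separated(dilated, radius, min)
```

With a square window, the 2D maximum and minimum filters separate exactly into two 1D passes, which is why the cost per pixel grows linearly rather than quadratically with the kernel size.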

“[…] each vCPU in an Amazon EC2 instance is a hyperthread of an Intel Xeon CPU core.”
Demystifying the Number of vCPUs for Optimal Workload Performance, Amazon, Sept. 2018
https://d1.awsstatic.com/whitepapers/Demystifying_vCPUs.pdf
Server configurations (the three EC2 instances run in the Amazon Elastic Compute Cloud):
                 GPU Server                   EC2 t2.large          EC2 c4.4xlarge        EC2 c5.18xlarge
CPU              Intel Xeon, 3.5 GHz          Intel Xeon, 3.0 GHz   Intel Xeon, 2.9 GHz   Intel Xeon Platinum, 3.0 GHz
# Cores/vCPUs    8 Cores                      2 vCPUs               16 vCPUs              72 vCPUs
RAM              64 GB                        8 GB                  30 GB                 144 GB
GPU              NVIDIA Quadro M6000, 24 GB   None                  None                  None

Test Procedure:
1. Six measurements for each combination of resolution and processing technique
2. Discarding the first measurement (setup costs of image processor)
3. Averaging the remaining five
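The three steps above can be sketched as a small timing helper. This is an illustration under assumptions: the slides do not state where or how the durations were measured, so the names `average_duration` and `process` and the use of `time.perf_counter` are hypothetical choices.

```python
import time
from statistics import mean


def average_duration(process, runs=6, warmup=1, clock=time.perf_counter):
    """Time `process` `runs` times, drop the first `warmup` runs, average the rest.

    `process` stands for one request for a fixed combination of resolution
    and processing technique (hypothetical callable).
    """
    durations = []
    for _ in range(runs):
        start = clock()
        process()
        durations.append(clock() - start)
    # The first measurement includes the setup costs of the image processor.
    return mean(durations[warmup:])
```

Dropping the first run keeps one-time initialization from skewing the average of the remaining five.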

▪ GPU-based rendering is significantly faster
▪ Same relations between processing techniques
▪ For Invert, software rendering is faster (possibly because no RAM-VRAM bus transfer costs occur)

▪ For complex techniques: the higher the resolution, the faster GPU-based rendering is compared to software rendering
▪ For simple techniques: stable speed factor
▪ Bend in the curve for FHD
Speed factor = duration SWR / duration GPU
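A minimal sketch of the speed factor as defined above; the function name and the example durations in the usage note are hypothetical, not measured values from the talk.

```python
def speed_factor(duration_swr, duration_gpu):
    """Ratio of software-rendering time to GPU time for the same task.

    Values above 1 mean the GPU server is faster; values below 1 mean
    software rendering is faster.
    """
    if duration_gpu <= 0:
        raise ValueError("GPU duration must be positive")
    return duration_swr / duration_gpu
```

For example, `speed_factor(3.0, 1.5)` yields 2.0, meaning software rendering took twice as long as the GPU for that combination of technique and resolution.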

▪ t2.large instance:
• 2 vCPUs
• 8 GB RAM
▪ c4.4xlarge instance:
• 16 vCPUs
• 30 GB RAM

More vCPUs reduce processing time
t2.large instance:
▪ 2 vCPUs
▪ 8 GB RAM
c4.4xlarge instance:
▪ 16 vCPUs
▪ 30 GB RAM

There is probably an upper bound on the benefit of additional vCPUs, but none could be observed

Without GPU:
Instance type      t2.large   c4.4xlarge   c5.18xlarge
Number of vCPUs    2          16 (x8)      72 (x36)
USD per hour       0.1008     0.905 (x9)   3.456 (x34)
With GPU:
Instance type      g3.4xlarge
GPU                Nvidia Tesla M60, 8 GB
USD per hour       1.21
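The multipliers in the table can be reproduced with a few lines of arithmetic. The sketch below uses only the prices and vCPU counts listed above; the names `CPU_INSTANCES` and `relative_to` are hypothetical.

```python
# (vCPUs, USD per hour) for the GPU-less instances listed in the table.
CPU_INSTANCES = {
    "t2.large": (2, 0.1008),
    "c4.4xlarge": (16, 0.905),
    "c5.18xlarge": (72, 3.456),
}


def relative_to(baseline, instances=CPU_INSTANCES):
    """Return vCPU ratio, price ratio, and USD per vCPU-hour for each instance."""
    base_vcpus, base_price = instances[baseline]
    return {
        name: {
            "vcpu_ratio": vcpus / base_vcpus,
            "price_ratio": price / base_price,
            "usd_per_vcpu_hour": price / vcpus,
        }
        for name, (vcpus, price) in instances.items()
    }
```

Relative to t2.large, the price grows roughly in proportion to the vCPU count, so the hourly price per vCPU stays close to 0.05 USD across all three instance types.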

Run-time performance costs are mainly determined by:
1. Complexity of the processing technique
2. Resolution of the input image
3. Number of virtual CPUs
The performance penalty can be attenuated
by increasing the number of vCPUs/Threads
Software rendering is a suitable approach for
reducing the financial costs for a scalable web-based provisioning of
image processing, but it comes with costs regarding performance

Contact:
▪ ole.wegen@student.hpi.de | trapp@hpi.de | doellner@hpi.de
▪ sebastian.pasewaldt@digitalmasterpieces.de
Funded by (01IS15041):
In Cooperation with:
