SlideShare a Scribd company logo
1 of 5
Download to read offline
Streaming Ultra High Resolution Images to Large Tiled
Display at Nearly Interactive Frame Rates with vl3
Jie Jiang
University of Illinois-Chicago
jjiang24@uic.edu
Mark Hereld
Argonne National Laboratory
hereld@anl.gov
Joseph Insley
Argonne National Laboratory
insley@anl.gov
Michael E. Papka
Argonne National Laboratory
Northern Illinois University
papka@anl.gov
Silvio Rizzi
Argonne National Laboratory
srizzi@anl.gov
Thomas Uram
Argonne National Laboratory
turam@anl.gov
Venkatram Vishwanath
Argonne National Laboratory
venkat@anl.gov
ABSTRACT
Visualization of large-scale simulations running on super-
computers requires ultra-high resolution images to capture
important features in the data. In this work, we present
a system for streaming ultra-high resolution images from a
visualization cluster to a remote tiled display at nearly in-
teractive frame rates. vl3, a modular framework for large
scale data visualization and analysis, provides the backbone
of our implementation. With this system we are able to
stream over the network volume renderings of a 20483
voxel
dataset at a resolution of 6144x3072 pixels with a frame rate
of approximately 3.3 frames per second.
Categories and Subject Descriptors
I.3.2 [Computing Methodologies]: Computer Graphics-
Graphics Systems: Distributed/network graphics
1. INTRODUCTION
The increasing computing power of leadership supercom-
puters enables scientific simulations at a very large scale
and produces enormous amounts of data. Large scale data
may be challenging to analyze and visualize. The first chal-
lenge is to efficiently recognize and perceive features in the
data. For this, large ultra-high resolution displays become a
prevalent tool for presenting big data with great detail. The
survey in [9] summarizes quantitative and qualitative eval-
uations regarding visual effects and user interaction with
large high-resolution displays. These studies validate the
positive effects of large high-resolution displays on human
performance exploring big datasets.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise, to
republish, to post on servers or to redistribute to lists, requires prior specific
permission and/or a fee.
SC ’15 Austin, Texas USA
Copyright 2015 ACM X-XXXXX-XX-X/XX/XX ...$15.00.
Figure 1: A 6144x3072 pixel image streamed to a 6x4
projector-based tiled display. Image rendered from a 40963
voxel dataset.
The second challenge is visualizing large scale data effi-
ciently. In many domains, including cosmology, astrophysics
and biosciences, large scale simulations running on lead-
ership supercomputers generate extremely large data sets.
For example, the Hardware/Hybrid Accelerated Cosmology
Code (HACC) framework [2] has modeled more than a tril-
lion particles in their simulations. Exploring data interac-
tively at such scales requires a visualization framework that
can render, composite, and stream frames at a sufficient rate.
Our system is built on vl3, an efficient parallel visual-
ization framework. It achieves a nearly-interactive stream-
ing frame rate of ultra-high resolution images by leverag-
ing a parallel compositing scheme and multiple streaming
channels between a visualization cluster and a large high-
resolution tiled display. Figure 1 shows a visualization of a
40963
voxel volumetric fluid simulation on a 6144x3072 pixel
tiled display.
2. BACKGROUND
The system presented in this work relies on large tiled
displays, parallel volume rendering, and remote rendering
and streaming. In this section we describe previous research
in these areas.
2.1 SAGE
The scalable adaptive graphics environment (SAGE) [6] is
a middleware developed by the Electronic Visualization Lab-
oratory at the University of Illinois at Chicago to support
distance collaboration in ultra resolution display environ-
ments. The SAGE architecture enables data, high-definition
video, and high-resolution graphics to be streamed in real-
time from remotely distributed rendering and storage clus-
ters to scalable display walls over high-speed networks. The
framework supports the streaming of multiple dynamic ap-
plication windows [4], but it does not support user interac-
tion within each application. It also lacks a native visual-
ization tool for large scale data. SAGE2 [7] is a complete
redesign and implementation of SAGE built on cloud-based
and web browser technologies, focusing on data intensive
co-located and remote collaboration.
2.2 DisplayCluster
DisplayCluster [5] is an interactive visualization environ-
ment for cluster-driven tiled displays. It is designed as a
desktop-like windowing system that can present media in na-
tive high-resolution and also stream graphics content. Dis-
playCluster has been used on Stallion, a 15 x 5 tiled dis-
play wall with a resolution of 2560 x 1600 for each tile, and
is driven by a visualization cluster consisting of 23 render
nodes and one head node. In contrast, our approach focuses
on driving a tiled display from a single node, though it also
supports a cluster configuration.
2.3 vl3
vl3 is a parallel visualization and data analysis framework
developed at Argonne National Laboratory and the Com-
putation Institute of the University of Chicago. It supports
hardware-accelerated rendering of point sprites for particle
data sets and ray casting for volume rendering of regular
grids. vl3 has been used to interactively render large scale
datasets [11]. Its modular design and extensible architecture
allow for rapid development and testing of new functional-
ities. Moreover, a high level of parallelism across all stages
of the rendering pipeline is crucial for achieving an excellent
scalability, as shown in [10].
2.4 Streaming
Support for streaming of large scale rendered images was
added to vl3 to enable remote visualization on tiled displays
[3]. This capability was first showcased at the SC09 con-
ference, where real time volume renderings of a 40963
Enzo
cosmology data set were generated on a visualization clus-
ter at Argonne National Laboratory and streamed to a 5x3
multi-panel display wall on the conference exhibit floor in
Portland, Oregon. This was demonstrated again at the SC10
conference, with the addition of interactive controls and used
to stream multiple visualizations of different variables from
the simulation for interactive exploration and comparison.
3. FRAMEWORK
vl3, a parallel framework for real-time interactive visual-
ization, can run on multiple hardware platforms. Here we
focus on high performance visualization clusters for remote
rendering and streaming of ultra high resolution visualiza-
tions.
Figure 2: System topology and pipeline.
3.1 Design
Our design consists of four components (Figure 2): visual-
ization cluster, tiled display, high speed network connection,
and interaction node.
The process starts at the visualization cluster generating
images and streaming them over a high speed network to the
tiled display. The tiled display presents the current image
to the user, while the interaction node provides a graphical
user interface that takes user input and sends control events
to the visualization cluster.
The system leverages parallelism during data loading, ren-
dering, compositing, streaming and displaying for maximum
performance. It exploits asynchronous streaming to over-
come the disparity between the rendering capacity of the
visualization cluster and the bandwidth of the high speed
network. With this architecture we have three frame rates
that measure performance in various stages of the pipeline:
(i) rendering frame rate on the visualization cluster, (ii)
streaming frame rate for the network communication, and
(iii) graphics update frame rate on the tiled display.
3.1.1 Visualization cluster
The visualization cluster provides computational power to
load, process, and render large data sets in parallel. The par-
allel direct send algorithm is used for compositing [1]. Each
compositor sends one or more streams of images to the tiled
display resource. The last step of gathering tiles from each
compositor and generating a single integrated image is left to
the displaying end. This approach reduces the communica-
tion overhead during compositing, increasing the rendering
frame rate. This adds no extra work on the displaying end
and works consistently with our multiple channel streaming.
3.1.2 Tiled display
The tiled display can be driven by either a single com-
puter with multiple graphics cards, or a cluster of worksta-
tions. An MPI-enabled parallel client runs on the tiled dis-
play single computer or cluster. Each client process receives
a chunk of the final image from the corresponding streamer
and shows it on its corresponding area of the tiled display.
The client application has been designed with flexibility in
mind, so that it can run on different hardware configura-
tions.
3.1.3 High speed network
The visualization cluster and tiled display are connected
through a high speed network. The network bandwidth
and latency determine an upper bound for the streaming
frame rate. We use a multi-channel streaming scheme to
take into account various hardware environments. We ad-
just the streaming configuration according to the network
topology between the visualization cluster and the tiled dis-
play, optimizing the bandwidth utilization and streaming
frame rate.
3.1.4 Interaction node
The interaction node provides a graphical user interface
(GUI) for intuitive user interactions. It sends control events
to the visualization cluster using the http protocol. Large
tiled displays do not necessarily support a common interac-
tion device. This poses a challenge for capturing user in-
put across a diverse set of display technologies. By decou-
pling the display and interaction components, we overcome
the variability of available interaction methods between tiled
displays. A consistent interaction application, with familiar
GUI components, should also have an easier learning curve
for the scientists using the system.
3.2 Implementation
vl3 runs on a visualization cluster and it is the core of our
system. In this section, we present extensions developed to
support driving the tiled display and communicating with
the interaction node. The system architecture is shown in
Figure 2.
3.2.1 Streaming Configuration
In this section we define two concepts: group stream-
ing partition and parallel compositing configuration. Group
streaming partition refers to the layout for dividing the im-
age into rectangular sub-images. It defines the total num-
ber of streams as well as the resolution and offset of each
streamed image. Adjusting the total number of streams af-
fects utilization of the aggregated network bandwidth to the
tiled display. Parallel compositing configuration defines the
total number of parallel compositing processes on the visu-
alization cluster, as well as the resolution and offset of the
partial image for each compositor. Altering the number of
compositors affects both the visualization performance and
bandwidth utilization for each compositor.
Combining the group streaming partition and the parallel
compositing configuration we generate a streaming configu-
ration. The streaming configuration contains an entry for
each stream along with information of the resolution, offset,
and hostname of the streaming source. The master process
on the visualization cluster generates the streaming config-
uration as a json package. The tiled display application re-
trieves the streaming configuration. With this configuration
file it has the information required to initiate group stream-
ing receivers and assemble the image tiles appropriately to
produce the final image. In this way, Streaming configura-
tion improves the usability of the system by facilitating the
configuration procedure.
3.2.2 Display application
We developed a Python-based and MPI-aware display client.
The Python code is light-weight and provides great cross-
platform compatibility. We use MPI for inter-process com-
munication, which makes our code compatible for both sin-
gle node or cluster driven tiled displays.
The display client launches in two stages. In stage one,
the display client connects to the master process on the vi-
sualization cluster to retrieve the streaming configuration.
Subsequently, in stage two, the client parses the stream-
ing configuration and launches MPI-aware instances of the
core threads that receive images from the visualization clus-
ter and display them at their corresponding position on the
tiled display.
3.2.3 Streaming
We use Celeritas [12] for high-throughput streaming of raw
pixel data over TCP. The visualization cluster uses asyn-
chronous rendering and streaming. We take into account
the difference between the streaming frame rate and render-
ing frame rate using a fixed-length frame queue. For that,
the streaming sender pops the head frame from the queue
and sends it out. In the meantime, if there is an empty slot
in the streaming queue, the renderer will push the current
frame into the queue; otherwise the current frame will be
discarded.
When the streaming frame rate is higher than the ren-
dering frame rate, every rendered frame will be sent out.
Otherwise, when the streaming frame rate is lower than the
rendering frame rate, new frames are pushed into the queue
faster than they are streamed to the client. In that case, once
the queue reaches its maximum capacity, newly rendered
frames are discarded until a slot in the streaming queue be-
comes available again. This ensures that the server always
streams the latest frame, also improving interactivity.
3.2.4 Synchronization
Synchronization among processes of the display client is
indispensable to ensure the image integrity on the tiled dis-
play and to avoid tearing between tiles while updating ren-
dering parameters. We implement a dual synchronization
mechanism similar to the one described in [8] : data syn-
chronization and swap buffer synchronization.
Data synchronization ensures that each streaming re-
ceiver (i.e a single client process) gets the same frame from
their streaming source. On the visualization cluster, the
streaming sender adds a frame id in increasing order at the
end of each send buffer. Streaming sources synchronize be-
fore pushing a rendered frame into the streaming queue; they
only push the current frame into the queue when every pro-
cess has an empty slot. This guarantees that every streaming
client will be receiving an identical frame sequence.
Swap buffer synchronization takes care of synchro-
nization of the receiving and graphics threads. The dis-
play client uses a double buffer for graphics update. Every
streaming receiver synchronizes at the end of each received
frame. They confirm everyone has an identical frame and
then update their graphics buffer. At the same time, graph-
ics threads synchronize before each buffer swap to ensure
that the whole display updates the displaying content syn-
chronously.
3.2.5 Interaction
A Qt-based interaction client runs on a separate node.
The interaction client provides a GUI that allows intuitive
and easy user interaction with the visualization cluster. The
user has precise control of camera position, transfer function,
color values, and other parameters. The interaction client
sends all user updates to visualization cluster and the modi-
fication will be seen on the tiled display in the subsequently
Figure 3: Network performance experiment results. Frame
buffer size is fixed at 6144x3072 pixels.
rendered frames.
4. EXPERIMENTS
We perform two experiments to evaluate our system per-
formance. Network bandwidth test validates the improve-
ment of bandwidth utilization with multi-channel streaming.
Weak scalability test shows the improvement of system per-
formance with multi-channel streaming at different scales.
4.1 Experimental Platform
We use computing and networking resources at the Ar-
gonne Leadership Computing Facility, Argonne National Lab-
oratory. Visualization cluster Cooley and tiled display Oc-
ular are both located in Theory and Computing Sciences
building and connected to each other with high speed net-
work.
4.1.1 Cooley
Cooley is a visualization cluster with a total of 126 com-
pute nodes; each node has two 2.4 GHz Intel Haswell E5-
2620 CPUs (6 cores per CPU, 12 cores total) and one NVIDIA
Tesla K80 dual-GPU card. Memory on Cooley is 384GB
RAM and 24GB GPU memory per node. Aggregate GPU
peak performance is over 293 teraflops double precision. Net-
work interconnect is FDR Infiniband. Each Cooley node has
an independent 10Gbps bandwidth ethernet connection to
one of three aggregating switches.
4.1.2 Ocular
Ocular is a display node with 8 graphics cards driving a
6x4 projector-based tiled display. Each tile has a resolution
of 1024x768 pixels. The full resolution for the tiled display is
6144x3072 pixels. Ocular has a 10Gbps ethernet over fiber
connection.
The 10Gbps connection between Ocular to Cooley is the
bottle neck of communication.
4.2 Results
Our first experiment studies the network capacity by as-
sessing quantitive improvement of aggregated bandwidth from
group streaming. We create a parallel streaming sender that
uses Celeritas and continuously streams an aggregate buffer
of 6144x3072 pixels in parallel. The streaming sender runs
on Cooley while our display client receives and displays the
buffer on Ocular. We change number of streams and mea-
Figure 4: System performance experiment results
sure the receiving frame rate and the corresponding band-
width utilization. The experiment results can be seen at
Figure 3.
We observe that the streaming frame rate increases with
the number of streams. As the number of streams increases
from 1 to 24, the streaming frame rate rises from around
4 FPS to 17 FPS, the equivalent bandwidth for 17 FPS
is around 7344 Mbps. The available bandwidth between
Cooley and Ocular is 10 Gbps. This result shows that group
streaming could improve the bandwidth utilization and with
24 streams our high speed network could deliver a streaming
frame rate sufficient for real-time user interaction.
Our second experiment tests weak scalability of the visu-
alization system. We keep a constant data load of 5123
voxel
volume on each GPU at the visualization cluster while the
full image resolution remains constant at 6144x3072 pixels.
We run 4 cases, where each case doubles the total number of
working GPUs. We assign a single stream for each compos-
itor. Within each case, we run multiple samples, doubling
the number of compositors for each sample. We measure
the streaming frame rate at Ocular. The first few frames
are discarded while the network connection stabilizes. We
then take 300 frames and calculate the average frame rate
every 5 seconds over that period.
Experiment result could be seen at Figure 4. Each line
draws the average frame rate and the maximum/minimum
frame rate observed during each case. For each case with
fixed number of GPUs, generally we observe performance
boost as we increase the number of compositors/streams ex-
cept for samples where number of streams/compositors are
equal to the number of GPUs. The flat part of the line
for the last sample in each test is expected. Each Cooley
node has 2 GPUs. While all GPUs are used for render-
ing for all tests, we double the number of GPUs used for
compositing/streaming at each sample. Increasing the num-
ber of nodes used for compositing/streaming increases the
available bandwidth for compositing communication. For
early samples, only one GPU per node is used for composit-
ing/streaming, and we see an increase in performance. How-
ever, in the last sample both GPUs on each node are used for
compositing/streaming. In this case the two GPUs share the
available bandwidth on that node, so performance remains
relatively constant.
Moreover, the system performance does not reach the
frame rates achieved in our network experiments, indicating
that it is not the network capacity, or the sending and re-
ceiving components, but rather the rendering performance
that is the bottleneck. Overall scaling performance of vl3
was modeled and measured in [10], where the network com-
munication component of compositing emerged as the bot-
tleneck. So in our current experiment, as we assign more
compositors, we conclude that it is again this network com-
munication which jeopardizes scalability.
5. CONCLUSIONS
Our system visualizes ultra high resolution images of large
scale simulation data on a large tiled display at nearly in-
teractive frame rates. Our experiment give very promising
results and shows the potential of streaming ultra high res-
olution visualization at an interactive frame rate. It verifies
that combining multi-channel streaming and parallelism of
visualization pipeline could practically raise the efficiency
and utilization of existing hardware. Our methodology and
design could be applied to existing parallel visualization sys-
tem to improve the capability of handling large data set.
Meanwhile, the modular design of our distributed system
could be modified and extended to fit specific visualization
tasks.
The future work of this project is to improve the scala-
bility of this system by optimizing the performance of the
compositor within vl3. This will enable the visualization
of larger data sets and provide higher resolution images at
interactive frame rates. Exploring the application of our sys-
tem for collaboration among geologically distributed groups
would also be an interesting topic.
6. ACKNOWLEDGMENTS
This work was supported by the Office of Advanced Sci-
entific Computing Research, Office of Science, U.S. Depart-
ment of Energy, under Contract DE-AC02-06CH11357 in-
cluding the Scientific Discovery through Advanced Com-
puting (SciDAC) Institute for Scalable Data Management,
Analysis, and Visualization. This research has been funded
in part and used resources of the Argonne Leadership Com-
puting Facility at Argonne National Laboratory, which is
supported by the Office of Science of the U.S. Department
of Energy under contract DE-AC02-06CH11357.
7. REFERENCES
[1] S. Eilemann and R. Pajarola. Direct send compositing
for parallel sort-last rendering. In Proceedings of the
7th Eurographics conference on Parallel Graphics and
Visualization, pages 29–36. Eurographics Association,
2007.
[2] S. Habib, V. Morozov, N. Frontiere, H. Finkel,
A. Pope, and K. Heitmann. Hacc: extreme scaling and
performance across diverse architectures. In
Proceedings of the International Conference on High
Performance Computing, Networking, Storage and
Analysis, page 6. ACM, 2013.
[3] M. Hereld, J. Insley, E. C. Olson, M. E. Papka,
V. Vishwanath, M. L. Norman, and R. Wagner.
Exploring large data over wide area networks. In Large
Data Analysis and Visualization (LDAV), 2011 IEEE
Symposium on, pages 133–134. IEEE, 2011.
[4] B. Jeong, L. Renambot, R. Jagodic, R. Singh,
J. Aguilera, A. Johnson, and J. Leigh.
High-performance dynamic graphics streaming for
scalable adaptive graphics environment. In SC 2006
Conference, Proceedings of the ACM/IEEE, pages
24–24. IEEE, 2006.
[5] G. P. Johnson, G. D. Abram, B. Westing, P. Navr’til,
and K. Gaither. Displaycluster: An interactive
visualization environment for tiled displays. In Cluster
Computing (CLUSTER), 2012 IEEE International
Conference on, pages 239–247. IEEE, 2012.
[6] J. Leigh, L. Renambot, A. Johnson, R. Jagodic,
H. Hur, E. Hofer, and D. Lee. Scalable adaptive
graphics middleware for visualization streaming and
collaboration in ultra resolution display environments.
In Proc. of Workshop on Ultrascale Visualization,
pages 47–54, 2008.
[7] T. Marrinan, J. Aurisano, A. Nishimoto,
K. Bharadwaj, V. Mateevitsi, L. Renambot, L. Long,
A. Johnson, and J. Leigh. Sage2: A new approach for
data intensive collaboration using scalable resolution
shared displays. In Collaborative Computing:
Networking, Applications and Worksharing
(CollaborateCom), 2014 International Conference on,
pages 177–186. IEEE, 2014.
[8] S. Nam, S. Deshpande, V. Vishwanath, B. Jeong,
L. Renambot, and J. Leigh. Multi-application
inter-tile synchronization on ultra-high-resolution
display walls. In Proceedings of the first annual ACM
SIGMM conference on Multimedia systems, pages
145–156. ACM, 2010.
[9] T. Ni, G. S. Schmidt, O. G. Staadt, M. Livingston,
R. Ball, and R. May. A survey of large high-resolution
display technologies, techniques, and applications. In
Virtual Reality Conference, 2006, pages 223–236.
IEEE, 2006.
[10] S. Rizzi, M. Hereld, J. Insley, M. E. Papka, T. Uram,
and V. Vishwanath. Performance Modeling of vl3
Volume Rendering on GPU-Based Clusters. In
M. Amor and M. Hadwiger, editors, Eurographics
Symposium on Parallel Graphics and Visualization.
The Eurographics Association, 2014.
[11] S. Rizzi, M. Hereld, J. Insley, M. E. Papka, T. Uram,
and V. Vishwanath. Large-Scale Parallel Visualization
of Particle-Based Simulations using Point Sprites and
Level-Of-Detail. In C. Dachsbacher and P. Navr˜Aatil,
editors, Eurographics Symposium on Parallel Graphics
and Visualization. The Eurographics Association,
2015.
[12] V. Vishwanath. LambdaRAM: A HighPerformance,
MultiDimensional, Distributed Cache Over UltraHigh
Speed Networks. PhD thesis, University of Illinois at
Chicago, 2009.

More Related Content

What's hot

Heterogeneous Networks of Remote Monitoring with High Availability and Resili...
Heterogeneous Networks of Remote Monitoring with High Availability and Resili...Heterogeneous Networks of Remote Monitoring with High Availability and Resili...
Heterogeneous Networks of Remote Monitoring with High Availability and Resili...IJCSIS Research Publications
 
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY ijccsa
 
Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...
Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...
Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...AtakanAral
 
An Overview and Classification of Approaches to Information Extraction in Wir...
An Overview and Classification of Approaches to Information Extraction in Wir...An Overview and Classification of Approaches to Information Extraction in Wir...
An Overview and Classification of Approaches to Information Extraction in Wir...M H
 
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...M H
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Dataijccsa
 
An Overview of Information Extraction from Mobile Wireless Sensor Networks
An Overview of Information Extraction from Mobile Wireless Sensor NetworksAn Overview of Information Extraction from Mobile Wireless Sensor Networks
An Overview of Information Extraction from Mobile Wireless Sensor NetworksM H
 
On network throughput variability in microsoft azure cloud
On network throughput variability in microsoft azure cloudOn network throughput variability in microsoft azure cloud
On network throughput variability in microsoft azure cloudssuser79fc19
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...Bomm Kim
 
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor NetworksMobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor NetworksEswar Publications
 
F233842
F233842F233842
F233842irjes
 
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...neirew J
 
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...ijccsa
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)inventionjournals
 
Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...
Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...
Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...HarshitParkar6677
 
Integrating sensors with the cloud using dynamic proxies
Integrating sensors with the cloud using dynamic proxiesIntegrating sensors with the cloud using dynamic proxies
Integrating sensors with the cloud using dynamic proxiesduythangbk01
 
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...IJERA Editor
 

What's hot (19)

Heterogeneous Networks of Remote Monitoring with High Availability and Resili...
Heterogeneous Networks of Remote Monitoring with High Availability and Resili...Heterogeneous Networks of Remote Monitoring with High Availability and Resili...
Heterogeneous Networks of Remote Monitoring with High Availability and Resili...
 
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
LOCALITY SIM: CLOUD SIMULATOR WITH DATA LOCALITY
 
Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...
Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...
Resource Mapping Optimization for Distributed Cloud Services - PhD Thesis Def...
 
An Overview and Classification of Approaches to Information Extraction in Wir...
An Overview and Classification of Approaches to Information Extraction in Wir...An Overview and Classification of Approaches to Information Extraction in Wir...
An Overview and Classification of Approaches to Information Extraction in Wir...
 
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
An Integrated Inductive-Deductive Framework for Data Mapping in Wireless Sens...
 
Data Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big DataData Distribution Handling on Cloud for Deployment of Big Data
Data Distribution Handling on Cloud for Deployment of Big Data
 
An Overview of Information Extraction from Mobile Wireless Sensor Networks
An Overview of Information Extraction from Mobile Wireless Sensor NetworksAn Overview of Information Extraction from Mobile Wireless Sensor Networks
An Overview of Information Extraction from Mobile Wireless Sensor Networks
 
On network throughput variability in microsoft azure cloud
On network throughput variability in microsoft azure cloudOn network throughput variability in microsoft azure cloud
On network throughput variability in microsoft azure cloud
 
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...(Im2col)accelerating deep neural networks on low power heterogeneous architec...
(Im2col)accelerating deep neural networks on low power heterogeneous architec...
 
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor NetworksMobile Agents based Energy Efficient Routing for Wireless Sensor Networks
Mobile Agents based Energy Efficient Routing for Wireless Sensor Networks
 
F233842
F233842F233842
F233842
 
International Journal of Engineering Inventions (IJEI)
International Journal of Engineering Inventions (IJEI)International Journal of Engineering Inventions (IJEI)
International Journal of Engineering Inventions (IJEI)
 
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...
Neuro-Fuzzy System Based Dynamic Resource Allocation in Collaborative Cloud C...
 
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...
NEURO-FUZZY SYSTEM BASED DYNAMIC RESOURCE ALLOCATION IN COLLABORATIVE CLOUD C...
 
International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)International Journal of Engineering and Science Invention (IJESI)
International Journal of Engineering and Science Invention (IJESI)
 
Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...
Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...
Mobile Fog: A Programming Model for Large–Scale Applications on the Internet ...
 
Integrating sensors with the cloud using dynamic proxies
Integrating sensors with the cloud using dynamic proxiesIntegrating sensors with the cloud using dynamic proxies
Integrating sensors with the cloud using dynamic proxies
 
ICICCE0298
ICICCE0298ICICCE0298
ICICCE0298
 
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
Achieving High Performance Distributed System: Using Grid, Cluster and Cloud ...
 

Viewers also liked (12)

Grupos de rock más influyentes del siglo xx
Grupos de rock más influyentes del siglo xxGrupos de rock más influyentes del siglo xx
Grupos de rock más influyentes del siglo xx
 
Ed Daily 1
Ed Daily 1Ed Daily 1
Ed Daily 1
 
CV_Harry_Taylor
CV_Harry_TaylorCV_Harry_Taylor
CV_Harry_Taylor
 
Intrapreneuring
IntrapreneuringIntrapreneuring
Intrapreneuring
 
Life and Basketball
Life and BasketballLife and Basketball
Life and Basketball
 
Japan Effect
Japan EffectJapan Effect
Japan Effect
 
Taller 1
Taller 1Taller 1
Taller 1
 
11A2 Primeras Sociedades Recolectoras
11A2 Primeras Sociedades Recolectoras11A2 Primeras Sociedades Recolectoras
11A2 Primeras Sociedades Recolectoras
 
IP routing in linux
IP routing in linuxIP routing in linux
IP routing in linux
 
RAK Public Transport Conceptional
RAK Public Transport ConceptionalRAK Public Transport Conceptional
RAK Public Transport Conceptional
 
Telephone system & multiplexing
Telephone system & multiplexingTelephone system & multiplexing
Telephone system & multiplexing
 
Wooden Flooring
Wooden FlooringWooden Flooring
Wooden Flooring
 

Similar to vistech2015

Visual Cryptography Industrial Training Report
Visual Cryptography Industrial Training ReportVisual Cryptography Industrial Training Report
Visual Cryptography Industrial Training ReportMohit Kumar
 
Hardware virtualized flexible network for wireless data center optical interc...
Hardware virtualized flexible network for wireless data center optical interc...Hardware virtualized flexible network for wireless data center optical interc...
Hardware virtualized flexible network for wireless data center optical interc...ieeepondy
 
A Framework To Generate 3D Learning Experience
A Framework To Generate 3D Learning ExperienceA Framework To Generate 3D Learning Experience
A Framework To Generate 3D Learning ExperienceNathan Mathis
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Otávio Carvalho
 
High-End Visualisation System (HEVS)
High-End Visualisation System (HEVS) High-End Visualisation System (HEVS)
High-End Visualisation System (HEVS) Tomasz Bednarz
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfKishaKiddo
 
Turn InSecure And High Speed Intra-Cloud and Inter-Cloud Communication
Turn InSecure And High Speed Intra-Cloud and Inter-Cloud CommunicationTurn InSecure And High Speed Intra-Cloud and Inter-Cloud Communication
Turn InSecure And High Speed Intra-Cloud and Inter-Cloud CommunicationRichard Jung
 
OptIPuter Overview
OptIPuter OverviewOptIPuter Overview
OptIPuter OverviewLarry Smarr
 
Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)Iaetsd Iaetsd
 
Multiple Screens
Multiple ScreensMultiple Screens
Multiple Screensgraphitech
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud ComputingAnimesh Chaturvedi
 
Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...
Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...
Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...Dr. Richard Otieno
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 

Similar to vistech2015 (20)

poster
posterposter
poster
 
Visual Cryptography Industrial Training Report
Visual Cryptography Industrial Training ReportVisual Cryptography Industrial Training Report
Visual Cryptography Industrial Training Report
 
Hardware virtualized flexible network for wireless data center optical interc...
Hardware virtualized flexible network for wireless data center optical interc...Hardware virtualized flexible network for wireless data center optical interc...
Hardware virtualized flexible network for wireless data center optical interc...
 
Grid Computing
Grid ComputingGrid Computing
Grid Computing
 
A Framework To Generate 3D Learning Experience
A Framework To Generate 3D Learning ExperienceA Framework To Generate 3D Learning Experience
A Framework To Generate 3D Learning Experience
 
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
Distributed Near Real-Time Processing of Sensor Network Data Flows for Smart ...
 
Paper444012-4014
Paper444012-4014Paper444012-4014
Paper444012-4014
 
High-End Visualisation System (HEVS)
High-End Visualisation System (HEVS) High-End Visualisation System (HEVS)
High-End Visualisation System (HEVS)
 
CS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdfCS8603_Notes_003-1_edubuzz360.pdf
CS8603_Notes_003-1_edubuzz360.pdf
 
Virtualization in Distributed System: A Brief Overview
Virtualization in Distributed System: A Brief OverviewVirtualization in Distributed System: A Brief Overview
Virtualization in Distributed System: A Brief Overview
 
GRID COMPUTING
GRID COMPUTINGGRID COMPUTING
GRID COMPUTING
 
Turn InSecure And High Speed Intra-Cloud and Inter-Cloud Communication
Turn InSecure And High Speed Intra-Cloud and Inter-Cloud CommunicationTurn InSecure And High Speed Intra-Cloud and Inter-Cloud Communication
Turn InSecure And High Speed Intra-Cloud and Inter-Cloud Communication
 
OptIPuter Overview
OptIPuter OverviewOptIPuter Overview
OptIPuter Overview
 
Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)Iaetsd survey on big data analytics for sdn (software defined networks)
Iaetsd survey on big data analytics for sdn (software defined networks)
 
Multiple Screens
Multiple ScreensMultiple Screens
Multiple Screens
 
Grid computing
Grid computingGrid computing
Grid computing
 
Introduction to Cloud Computing
Introduction to Cloud ComputingIntroduction to Cloud Computing
Introduction to Cloud Computing
 
Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...
Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...
Enhancing Data Security in Cloud Computation Using Addition-Composition Fully...
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and Development
 
H1075460
H1075460H1075460
H1075460
 

vistech2015

  • 1. Streaming Ultra High Resolution Images to Large Tiled Display at Nearly Interactive Frame Rates with vl3 Jie Jiang University of Illinois-Chicago jjiang24@uic.edu Mark Hereld Argonne National Laboratory hereld@anl.gov Joseph Insley Argonne National Laboratory insley@anl.gov Michael E. Papka Argonne National Laboratory Northern Illinois University papka@anl.gov Silvio Rizzi Argonne National Laboratory srizzi@anl.gov Thomas Uram Argonne National Laboratory turam@anl.gov Venkatram Vishwanath Argonne National Laboratory venkat@anl.gov ABSTRACT Visualization of large-scale simulations running on super- computers requires ultra-high resolution images to capture important features in the data. In this work, we present a system for streaming ultra-high resolution images from a visualization cluster to a remote tiled display at nearly in- teractive frame rates. vl3, a modular framework for large scale data visualization and analysis, provides the backbone of our implementation. With this system we are able to stream over the network volume renderings of a 20483 voxel dataset at a resolution of 6144x3072 pixels with a frame rate of approximately 3.3 frames per second. Categories and Subject Descriptors I.3.2 [Computing Methodologies]: Computer Graphics- Graphics Systems: Distributed/network graphics 1. INTRODUCTION The increasing computing power of leadership supercom- puters enables scientific simulations at a very large scale and produces enormous amounts of data. Large scale data may be challenging to analyze and visualize. The first chal- lenge is to efficiently recognize and perceive features in the data. For this, large ultra-high resolution displays become a prevalent tool for presenting big data with great detail. The survey in [9] summarizes quantitative and qualitative eval- uations regarding visual effects and user interaction with large high-resolution displays. These studies validate the positive effects of large high-resolution displays on human performance exploring big datasets. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. SC ’15 Austin, Texas USA Copyright 2015 ACM X-XXXXX-XX-X/XX/XX ...$15.00. Figure 1: A 6144x3072 pixel image streamed to a 6x4 projector-based tiled display. Image rendered from a 40963 voxel dataset. The second challenge is visualizing large scale data effi- ciently. In many domains, including cosmology, astrophysics and biosciences, large scale simulations running on lead- ership supercomputers generate extremely large data sets. For example, the Hardware/Hybrid Accelerated Cosmology Code (HACC) framework [2] has modeled more than a tril- lion particles in their simulations. Exploring data interac- tively at such scales requires a visualization framework that can render, composite, and stream frames at a sufficient rate. Our system is built on vl3, an efficient parallel visual- ization framework. It achieves a nearly-interactive stream- ing frame rate of ultra-high resolution images by leverag- ing a parallel compositing scheme and multiple streaming channels between a visualization cluster and a large high- resolution tiled display. Figure 1 shows a visualization of a 40963 voxel volumetric fluid simulation on a 6144x3072 pixel tiled display. 2. BACKGROUND The system presented in this work relies on large tiled displays, parallel volume rendering, and remote rendering and streaming. In this section we describe previous research
  • 2. in these areas. 2.1 SAGE The scalable adaptive graphics environment (SAGE) [6] is a middleware developed by the Electronic Visualization Lab- oratory at the University of Illinois at Chicago to support distance collaboration in ultra resolution display environ- ments. The SAGE architecture enables data, high-definition video, and high-resolution graphics to be streamed in real- time from remotely distributed rendering and storage clus- ters to scalable display walls over high-speed networks. The framework supports the streaming of multiple dynamic ap- plication windows [4], but it does not support user interac- tion within each application. It also lacks a native visual- ization tool for large scale data. SAGE2 [7] is a complete redesign and implementation of SAGE built on cloud-based and web browser technologies, focusing on data intensive co-located and remote collaboration. 2.2 DisplayCluster DisplayCluster [5] is an interactive visualization environ- ment for cluster-driven tiled displays. It is designed as a desktop-like windowing system that can present media in na- tive high-resolution and also stream graphics content. Dis- playCluster has been used on Stallion, a 15 x 5 tiled dis- play wall with a resolution of 2560 x 1600 for each tile, and is driven by a visualization cluster consisting of 23 render nodes and one head node. In contrast, our approach focuses on driving a tiled display from a single node, though it also supports a cluster configuration. 2.3 vl3 vl3 is a parallel visualization and data analysis framework developed at Argonne National Laboratory and the Com- putation Institute of the University of Chicago. It supports hardware-accelerated rendering of point sprites for particle data sets and ray casting for volume rendering of regular grids. vl3 has been used to interactively render large scale datasets [11]. Its modular design and extensible architecture allow for rapid development and testing of new functional- ities. Moreover, a high level of parallelism across all stages of the rendering pipeline is crucial for achieving an excellent scalability, as shown in [10]. 2.4 Streaming Support for streaming of large scale rendered images was added to vl3 to enable remote visualization on tiled displays [3]. This capability was first showcased at the SC09 con- ference, where real time volume renderings of a 40963 Enzo cosmology data set were generated on a visualization clus- ter at Argonne National Laboratory and streamed to a 5x3 multi-panel display wall on the conference exhibit floor in Portland, Oregon. This was demonstrated again at the SC10 conference, with the addition of interactive controls and used to stream multiple visualizations of different variables from the simulation for interactive exploration and comparison. 3. FRAMEWORK vl3, a parallel framework for real-time interactive visual- ization, can run on multiple hardware platforms. Here we focus on high performance visualization clusters for remote rendering and streaming of ultra high resolution visualiza- tions. Figure 2: System topology and pipeline. 3.1 Design Our design consists of four components (Figure 2): visual- ization cluster, tiled display, high speed network connection, and interaction node. The process starts at the visualization cluster generating images and streaming them over a high speed network to the tiled display. The tiled display presents the current image to the user, while the interaction node provides a graphical user interface that takes user input and sends control events to the visualization cluster. The system leverages parallelism during data loading, ren- dering, compositing, streaming and displaying for maximum performance. It exploits asynchronous streaming to over- come the disparity between the rendering capacity of the visualization cluster and the bandwidth of the high speed network. With this architecture we have three frame rates that measure performance in various stages of the pipeline: (i) rendering frame rate on the visualization cluster, (ii) streaming frame rate for the network communication, and (iii) graphics update frame rate on the tiled display. 3.1.1 Visualization cluster The visualization cluster provides computational power to load, process, and render large data sets in parallel. The par- allel direct send algorithm is used for compositing [1]. Each compositor sends one or more streams of images to the tiled display resource. The last step of gathering tiles from each compositor and generating a single integrated image is left to the displaying end. This approach reduces the communica- tion overhead during compositing, increasing the rendering frame rate. This adds no extra work on the displaying end and works consistently with our multiple channel streaming. 3.1.2 Tiled display The tiled display can be driven by either a single com- puter with multiple graphics cards, or a cluster of worksta- tions. An MPI-enabled parallel client runs on the tiled dis- play single computer or cluster. Each client process receives a chunk of the final image from the corresponding streamer and shows it on its corresponding area of the tiled display. The client application has been designed with flexibility in mind, so that it can run on different hardware configura- tions. 3.1.3 High speed network The visualization cluster and tiled display are connected through a high speed network. The network bandwidth
  • 3. and latency determine an upper bound for the streaming frame rate. We use a multi-channel streaming scheme to take into account various hardware environments. We ad- just the streaming configuration according to the network topology between the visualization cluster and the tiled dis- play, optimizing the bandwidth utilization and streaming frame rate. 3.1.4 Interaction node The interaction node provides a graphical user interface (GUI) for intuitive user interactions. It sends control events to the visualization cluster using the http protocol. Large tiled displays do not necessarily support a common interac- tion device. This poses a challenge for capturing user in- put across a diverse set of display technologies. By decou- pling the display and interaction components, we overcome the variability of available interaction methods between tiled displays. A consistent interaction application, with familiar GUI components, should also have an easier learning curve for the scientists using the system. 3.2 Implementation vl3 runs on a visualization cluster and it is the core of our system. In this section, we present extensions developed to support driving the tiled display and communicating with the interaction node. The system architecture is shown in Figure 2. 3.2.1 Streaming Configuration In this section we define two concepts: group stream- ing partition and parallel compositing configuration. Group streaming partition refers to the layout for dividing the im- age into rectangular sub-images. It defines the total num- ber of streams as well as the resolution and offset of each streamed image. Adjusting the total number of streams af- fects utilization of the aggregated network bandwidth to the tiled display. Parallel compositing configuration defines the total number of parallel compositing processes on the visu- alization cluster, as well as the resolution and offset of the partial image for each compositor. Altering the number of compositors affects both the visualization performance and bandwidth utilization for each compositor. Combining the group streaming partition and the parallel compositing configuration we generate a streaming configu- ration. The streaming configuration contains an entry for each stream along with information of the resolution, offset, and hostname of the streaming source. The master process on the visualization cluster generates the streaming config- uration as a json package. The tiled display application re- trieves the streaming configuration. With this configuration file it has the information required to initiate group stream- ing receivers and assemble the image tiles appropriately to produce the final image. In this way, Streaming configura- tion improves the usability of the system by facilitating the configuration procedure. 3.2.2 Display application We developed a Python-based and MPI-aware display client. The Python code is light-weight and provides great cross- platform compatibility. We use MPI for inter-process com- munication, which makes our code compatible for both sin- gle node or cluster driven tiled displays. The display client launches in two stages. In stage one, the display client connects to the master process on the vi- sualization cluster to retrieve the streaming configuration. Subsequently, in stage two, the client parses the stream- ing configuration and launches MPI-aware instances of the core threads that receive images from the visualization clus- ter and display them at their corresponding position on the tiled display. 3.2.3 Streaming We use Celeritas [12] for high-throughput streaming of raw pixel data over TCP. The visualization cluster uses asyn- chronous rendering and streaming. We take into account the difference between the streaming frame rate and render- ing frame rate using a fixed-length frame queue. For that, the streaming sender pops the head frame from the queue and sends it out. In the meantime, if there is an empty slot in the streaming queue, the renderer will push the current frame into the queue; otherwise the current frame will be discarded. When the streaming frame rate is higher than the ren- dering frame rate, every rendered frame will be sent out. Otherwise, when the streaming frame rate is lower than the rendering frame rate, new frames are pushed into the queue faster than they are streamed to the client. In that case, once the queue reaches its maximum capacity, newly rendered frames are discarded until a slot in the streaming queue be- comes available again. This ensures that the server always streams the latest frame, also improving interactivity. 3.2.4 Synchronization Synchronization among processes of the display client is indispensable to ensure the image integrity on the tiled dis- play and to avoid tearing between tiles while updating ren- dering parameters. We implement a dual synchronization mechanism similar to the one described in [8] : data syn- chronization and swap buffer synchronization. Data synchronization ensures that each streaming re- ceiver (i.e a single client process) gets the same frame from their streaming source. On the visualization cluster, the streaming sender adds a frame id in increasing order at the end of each send buffer. Streaming sources synchronize be- fore pushing a rendered frame into the streaming queue; they only push the current frame into the queue when every pro- cess has an empty slot. This guarantees that every streaming client will be receiving an identical frame sequence. Swap buffer synchronization takes care of synchro- nization of the receiving and graphics threads. The dis- play client uses a double buffer for graphics update. Every streaming receiver synchronizes at the end of each received frame. They confirm everyone has an identical frame and then update their graphics buffer. At the same time, graph- ics threads synchronize before each buffer swap to ensure that the whole display updates the displaying content syn- chronously. 3.2.5 Interaction A Qt-based interaction client runs on a separate node. The interaction client provides a GUI that allows intuitive and easy user interaction with the visualization cluster. The user has precise control of camera position, transfer function, color values, and other parameters. The interaction client sends all user updates to visualization cluster and the modi- fication will be seen on the tiled display in the subsequently
  • 4. Figure 3: Network performance experiment results. Frame buffer size is fixed at 6144x3072 pixels. rendered frames. 4. EXPERIMENTS We perform two experiments to evaluate our system per- formance. Network bandwidth test validates the improve- ment of bandwidth utilization with multi-channel streaming. Weak scalability test shows the improvement of system per- formance with multi-channel streaming at different scales. 4.1 Experimental Platform We use computing and networking resources at the Ar- gonne Leadership Computing Facility, Argonne National Lab- oratory. Visualization cluster Cooley and tiled display Oc- ular are both located in Theory and Computing Sciences building and connected to each other with high speed net- work. 4.1.1 Cooley Cooley is a visualization cluster with a total of 126 com- pute nodes; each node has two 2.4 GHz Intel Haswell E5- 2620 CPUs (6 cores per CPU, 12 cores total) and one NVIDIA Tesla K80 dual-GPU card. Memory on Cooley is 384GB RAM and 24GB GPU memory per node. Aggregate GPU peak performance is over 293 teraflops double precision. Net- work interconnect is FDR Infiniband. Each Cooley node has an independent 10Gbps bandwidth ethernet connection to one of three aggregating switches. 4.1.2 Ocular Ocular is a display node with 8 graphics cards driving a 6x4 projector-based tiled display. Each tile has a resolution of 1024x768 pixels. The full resolution for the tiled display is 6144x3072 pixels. Ocular has a 10Gbps ethernet over fiber connection. The 10Gbps connection between Ocular to Cooley is the bottle neck of communication. 4.2 Results Our first experiment studies the network capacity by as- sessing quantitive improvement of aggregated bandwidth from group streaming. We create a parallel streaming sender that uses Celeritas and continuously streams an aggregate buffer of 6144x3072 pixels in parallel. The streaming sender runs on Cooley while our display client receives and displays the buffer on Ocular. We change number of streams and mea- Figure 4: System performance experiment results sure the receiving frame rate and the corresponding band- width utilization. The experiment results can be seen at Figure 3. We observe that the streaming frame rate increases with the number of streams. As the number of streams increases from 1 to 24, the streaming frame rate rises from around 4 FPS to 17 FPS, the equivalent bandwidth for 17 FPS is around 7344 Mbps. The available bandwidth between Cooley and Ocular is 10 Gbps. This result shows that group streaming could improve the bandwidth utilization and with 24 streams our high speed network could deliver a streaming frame rate sufficient for real-time user interaction. Our second experiment tests weak scalability of the visu- alization system. We keep a constant data load of 5123 voxel volume on each GPU at the visualization cluster while the full image resolution remains constant at 6144x3072 pixels. We run 4 cases, where each case doubles the total number of working GPUs. We assign a single stream for each compos- itor. Within each case, we run multiple samples, doubling the number of compositors for each sample. We measure the streaming frame rate at Ocular. The first few frames are discarded while the network connection stabilizes. We then take 300 frames and calculate the average frame rate every 5 seconds over that period. Experiment result could be seen at Figure 4. Each line draws the average frame rate and the maximum/minimum frame rate observed during each case. For each case with fixed number of GPUs, generally we observe performance boost as we increase the number of compositors/streams ex- cept for samples where number of streams/compositors are equal to the number of GPUs. The flat part of the line for the last sample in each test is expected. Each Cooley node has 2 GPUs. While all GPUs are used for render- ing for all tests, we double the number of GPUs used for compositing/streaming at each sample. Increasing the num- ber of nodes used for compositing/streaming increases the available bandwidth for compositing communication. For early samples, only one GPU per node is used for composit- ing/streaming, and we see an increase in performance. How- ever, in the last sample both GPUs on each node are used for compositing/streaming. In this case the two GPUs share the available bandwidth on that node, so performance remains relatively constant. Moreover, the system performance does not reach the frame rates achieved in our network experiments, indicating that it is not the network capacity, or the sending and re- ceiving components, but rather the rendering performance that is the bottleneck. Overall scaling performance of vl3
  • 5. was modeled and measured in [10], where the network com- munication component of compositing emerged as the bot- tleneck. So in our current experiment, as we assign more compositors, we conclude that it is again this network com- munication which jeopardizes scalability. 5. CONCLUSIONS Our system visualizes ultra high resolution images of large scale simulation data on a large tiled display at nearly in- teractive frame rates. Our experiment give very promising results and shows the potential of streaming ultra high res- olution visualization at an interactive frame rate. It verifies that combining multi-channel streaming and parallelism of visualization pipeline could practically raise the efficiency and utilization of existing hardware. Our methodology and design could be applied to existing parallel visualization sys- tem to improve the capability of handling large data set. Meanwhile, the modular design of our distributed system could be modified and extended to fit specific visualization tasks. The future work of this project is to improve the scala- bility of this system by optimizing the performance of the compositor within vl3. This will enable the visualization of larger data sets and provide higher resolution images at interactive frame rates. Exploring the application of our sys- tem for collaboration among geologically distributed groups would also be an interesting topic. 6. ACKNOWLEDGMENTS This work was supported by the Office of Advanced Sci- entific Computing Research, Office of Science, U.S. Depart- ment of Energy, under Contract DE-AC02-06CH11357 in- cluding the Scientific Discovery through Advanced Com- puting (SciDAC) Institute for Scalable Data Management, Analysis, and Visualization. This research has been funded in part and used resources of the Argonne Leadership Com- puting Facility at Argonne National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under contract DE-AC02-06CH11357. 7. REFERENCES [1] S. Eilemann and R. Pajarola. Direct send compositing for parallel sort-last rendering. In Proceedings of the 7th Eurographics conference on Parallel Graphics and Visualization, pages 29–36. Eurographics Association, 2007. [2] S. Habib, V. Morozov, N. Frontiere, H. Finkel, A. Pope, and K. Heitmann. Hacc: extreme scaling and performance across diverse architectures. In Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis, page 6. ACM, 2013. [3] M. Hereld, J. Insley, E. C. Olson, M. E. Papka, V. Vishwanath, M. L. Norman, and R. Wagner. Exploring large data over wide area networks. In Large Data Analysis and Visualization (LDAV), 2011 IEEE Symposium on, pages 133–134. IEEE, 2011. [4] B. Jeong, L. Renambot, R. Jagodic, R. Singh, J. Aguilera, A. Johnson, and J. Leigh. High-performance dynamic graphics streaming for scalable adaptive graphics environment. In SC 2006 Conference, Proceedings of the ACM/IEEE, pages 24–24. IEEE, 2006. [5] G. P. Johnson, G. D. Abram, B. Westing, P. Navr’til, and K. Gaither. Displaycluster: An interactive visualization environment for tiled displays. In Cluster Computing (CLUSTER), 2012 IEEE International Conference on, pages 239–247. IEEE, 2012. [6] J. Leigh, L. Renambot, A. Johnson, R. Jagodic, H. Hur, E. Hofer, and D. Lee. Scalable adaptive graphics middleware for visualization streaming and collaboration in ultra resolution display environments. In Proc. of Workshop on Ultrascale Visualization, pages 47–54, 2008. [7] T. Marrinan, J. Aurisano, A. Nishimoto, K. Bharadwaj, V. Mateevitsi, L. Renambot, L. Long, A. Johnson, and J. Leigh. Sage2: A new approach for data intensive collaboration using scalable resolution shared displays. In Collaborative Computing: Networking, Applications and Worksharing (CollaborateCom), 2014 International Conference on, pages 177–186. IEEE, 2014. [8] S. Nam, S. Deshpande, V. Vishwanath, B. Jeong, L. Renambot, and J. Leigh. Multi-application inter-tile synchronization on ultra-high-resolution display walls. In Proceedings of the first annual ACM SIGMM conference on Multimedia systems, pages 145–156. ACM, 2010. [9] T. Ni, G. S. Schmidt, O. G. Staadt, M. Livingston, R. Ball, and R. May. A survey of large high-resolution display technologies, techniques, and applications. In Virtual Reality Conference, 2006, pages 223–236. IEEE, 2006. [10] S. Rizzi, M. Hereld, J. Insley, M. E. Papka, T. Uram, and V. Vishwanath. Performance Modeling of vl3 Volume Rendering on GPU-Based Clusters. In M. Amor and M. Hadwiger, editors, Eurographics Symposium on Parallel Graphics and Visualization. The Eurographics Association, 2014. [11] S. Rizzi, M. Hereld, J. Insley, M. E. Papka, T. Uram, and V. Vishwanath. Large-Scale Parallel Visualization of Particle-Based Simulations using Point Sprites and Level-Of-Detail. In C. Dachsbacher and P. Navr˜Aatil, editors, Eurographics Symposium on Parallel Graphics and Visualization. The Eurographics Association, 2015. [12] V. Vishwanath. LambdaRAM: A HighPerformance, MultiDimensional, Distributed Cache Over UltraHigh Speed Networks. PhD thesis, University of Illinois at Chicago, 2009.