Uncertainty Visualization in Simulation
Ensemble Decomposition
Christopher J. Johnson, Paul Rosen
Our understanding of many scientific models is incomplete, thus leading to some inherent
uncertainty in that understanding. By varying input conditions, a large number of simulations
can be combined into an ensemble that scientists can use to cope with this inherent
uncertainty. By generating large numbers of simulations, the scientist also creates the
opportunity to study the behavior of the numerical model that created them. Doing so
provides insight into model stability, understanding of the model’s reaction to input values,
as well as other previously unknown features of the dataset. Because of the large amount of
data encapsulated in such quantities of simulations, visualizing this data requires both static
and dynamic approaches. In this paper, we present our development of Non-Negligible Distance
Signatures (NNDS), a technique that uses a single isocontour line to represent a complete
simulation’s relationship to the rest of the dataset.
We show the usefulness of NNDS by applying it to create static and dynamic images of the
Utah Torso Dataset, which is a model of the propagation of electricity across a human torso
[1]. Our explorations of this dataset will culminate in using NNDS to visually denote the
most useful external points on the torso model for predicting internal electrical behavior.
INTRODUCTION
Our understanding of many scientific models is incomplete, thus leading to some inherent
uncertainty in that understanding. By varying input conditions, a large number of simulations can be
combined into an ensemble that scientists can use to cope with this inherent uncertainty. As our
reliance upon this tactic grows, it is increasingly important to understand the behavior of the
simulations that make up the presented ensembles. The most pressing issue with the use of
ensembles is that the size and complexity of the data means that a large amount of important
information is never communicated to the designer of the experiment. For example, the study of
individual simulations enables the scientist to discover areas of an experiment that are most
influenced by input parameters. These could be outliers or unstable areas inordinately affected by
small changes in input values. Understanding how these individual simulations relate to the
overall ensemble can allow the scientist to recognize unacceptable behaviors or to further
understand and replicate correct results. Addressing these challenges will require both static and
dynamic approaches.
In this paper, we present our explorations into creating both a static and dynamic representation
of the underlying simulations that make up an aggregate. We do this through the development of
Non-Negligible Distance Signatures (NNDS), which clearly designate the areas of highest
variance, similarity between simulation groups and individuals, and the relationships of input
values to final results. Using these signatures, we will be able to visualize complete datasets of
thousands of simulations in a single static image. We will also demonstrate the usefulness of
dynamic images or animations in conjunction with NNDS in order to visualize relationships
between the input values of a simulation and the final results.
For the duration of this paper, we will be discussing our findings gleaned from the Utah Torso
Dataset [1] which is a collection of simulations of the propagation of electricity across the human
torso. The model of the torso is split up into different components based on the kinds of tissue
located in that region.
Figure 1: Tissue distribution of torso model
Each of these regions had specific conductivity properties produced by actually running a
current through the tissue. Then, using the method of Generalized Polynomial Chaos, a model
was produced. The data set consists of 100,000 individual simulations using 961 unique input
values in a uniform distribution to account for differences between individuals, aleatoric
uncertainty,(a) etc. The actual values stored at each point are the final amount of voltage once the
model reached a state of equilibrium.
We selected this model because of the large quantity of simulations available to us and the
inherent uncertainty contained within it. The overarching goal was to visualize and predict
internal torso behavior using only external readings. In a medical situation, it is not feasible to
perform surgery to ascertain electrical behavior, but it is generally possible to get a reading of
conductivity on the surface with some sort of sensor. Because the model used provided us with
the complete torso, our general approach was to find a correlation between external and internal
values. The model also provided us with the input value used to create a given simulation.
Finding a correlation between the input values used to create a torso and the internal torso values
was also judged to be a viable option for extrapolating internal torso values. In both cases, real-
world applications could include defibrillators and electric shock therapy, in addition to many
other kinds of possible medical situations.
(a) Aleatoric uncertainty refers to statistical uncertainty that often accompanies random or other types of algorithms
METHODOLOGY
Our exploration of this problem began with exploring specific statistical moments of the data and
their respective positions on the data mesh. In particular, our first attempts were focused on
analyzing and visualizing the standard deviation of the data. As shown in figure 2, a simple color
mapping allowed us to easily visualize areas of the torso with relatively high standard deviation,
which appeared in the area representing the left back and right chest of the torso.
Figure 2: Point-wise standard deviation across the complete dataset
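The point-wise standard deviation mapping can be sketched as follows. The data here is a synthetic stand-in (the real dataset is 100,000 simulations over the torso mesh), and every array shape and scale is an assumption for illustration only:

```python
import numpy as np

# Synthetic stand-in: rows are simulations, columns are mesh points.
# (The real dataset has 100,000 simulations over the torso mesh.)
rng = np.random.default_rng(0)
point_scales = np.linspace(0.1, 2.0, 50)   # hypothetical per-point spread
sims = rng.normal(loc=1.0, scale=point_scales, size=(1000, 50))

# Point-wise standard deviation across the whole ensemble.
std_per_point = sims.std(axis=0)

# Normalize to [0, 1] so the values can drive a color map on the mesh,
# as in figure 2.
lo, hi = std_per_point.min(), std_per_point.max()
normed = (std_per_point - lo) / (hi - lo)
```

The normalized values would then be fed to whatever color map the mesh renderer uses.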
While this gave us insight as to which areas of the simulation were more prone to vary, we felt
that we could not reasonably assume a correlation between external and internal values from this
model. The next step taken was to explore the correspondence of the input values to internal
torso behavior.
We felt that we needed a way to compare simulations and that an aggregate simulation would be
useful as a point of reference. For our purposes an aggregate simulation will be defined as a
simulation generated by calculating the mean at every point for some set of simulations. We
considered S to be the set of available simulations and P to be the set of points on the torso
mesh; each point p ∈ P of the aggregate simulation A was defined as the following:

A(p) = (1/|S|) Σ_{s ∈ S} s(p)
That is, for every point on the torso mesh, we summed up the values over all of the simulations
in the set and then divided that final value by the total number of simulations in order to create a
final aggregate simulation. Effectively speaking, this aggregate simulation represents the mean
of all of the simulations in the dataset.
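A minimal sketch of this aggregate computation, using a small synthetic set of simulations in place of the real ensemble:

```python
import numpy as np

# S: a small hypothetical set of simulations (rows), each defined on
# 25 mesh points (columns), standing in for the real ensemble.
rng = np.random.default_rng(1)
S = rng.normal(size=(100, 25))

# Sum the values over all simulations at each point, then divide by the
# total number of simulations, per the definition above.
aggregate = S.sum(axis=0) / S.shape[0]
```

As the text notes, this is exactly the point-wise mean, i.e. `S.mean(axis=0)`.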
Using an aggregate simulation generated from our set of 100,000 simulations, it was decided to
measure the differences between individual simulations and that aggregate. The theory behind
this was that outlier simulations would be readily visible while simulations that were generally
closer to the mean would not be as prominent. To visualize this, we implemented the marching
squares algorithm [2] to draw contour lines on differing thresholds based on a simulation’s
Euclidean distance or L2 Norm from the aggregate.
Figure 3: Contour Lines of a sample simulation’s distance from the 100,000 simulation aggregate
This method revealed two interesting aspects of the data set. Firstly, a torso’s distance from the
mean was shown to originate from the areas of high variance. Secondly, continuous threshold
lines passing through the interior of the torso to the exterior can be observed. This, above all else,
confirmed the possibility of extrapolating internal torso values using external measurements.
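The distance measurement behind figure 3 can be sketched as follows, again on synthetic stand-in data. The marching squares pass that traces the actual threshold contours is omitted here; this only computes the per-simulation L2 distances:

```python
import numpy as np

# Synthetic stand-in ensemble: 500 simulations over 40 mesh points.
rng = np.random.default_rng(2)
sims = rng.normal(size=(500, 40))
aggregate = sims.mean(axis=0)

# Euclidean (L2) distance of each simulation from the aggregate;
# outlier simulations show up as large values.
distances = np.linalg.norm(sims - aggregate, axis=1)

# The simulations farthest from the mean are candidate outliers.
outlier_ids = np.argsort(distances)[-5:]
```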
While both methods described above illustrated areas of potential interest, neither was able to
give a cohesive idea of what kinds of relationships existed between simulations and their input
values.
As we continued to work with the dataset, we found that using all 100,000 of the provided
simulations proved to be unwieldy and we began searching for a suitable way to reduce our
dataset into a more manageable form. By observing that the distribution of input values across
the dataset was roughly uniform, we decided to pursue the option of creating input value
aggregates made up of the mean of simulations with similar input values. We followed the same
procedure described above for creating these input value aggregates, but instead of including the
entire data set, we created the input value aggregates by using only the simulations with the same
input value. By calculating each individual simulation’s Euclidean distance to its own input
value aggregate, it was satisfactorily shown that differences within input value groups were
negligible as shown in figure 4.
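The construction of the input value aggregates and the within-group distance check can be sketched as follows, assuming a hypothetical dataset where three input values stand in for the 961 real ones:

```python
import numpy as np

# Hypothetical dataset: each simulation is close to a constant field at
# its input value; three values stand in for the 961 real ones.
rng = np.random.default_rng(3)
n_sims, n_points = 300, 20
input_values = rng.choice([0.1, 0.5, 0.9], size=n_sims)
sims = input_values[:, None] + rng.normal(scale=0.01, size=(n_sims, n_points))

# Input value aggregates: the mean of the simulations sharing an input value.
group_aggregates = {
    v: sims[input_values == v].mean(axis=0)
    for v in np.unique(input_values)
}

# Maximum within-group L2 distance to the group's own aggregate; small
# values justify replacing each group with its aggregate (cf. figure 4).
max_within = max(
    np.linalg.norm(sims[input_values == v] - agg, axis=1).max()
    for v, agg in group_aggregates.items()
)
```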
Figure 4: A plot showing the distance of simulations to their input value aggregate.
Each line represents the extreme and average simulations within an input value group. As the
maximum Euclidean distance or L2 norm was relatively small for the number of points in the
torso mesh, the input value aggregates were judged to be a fair representation of the complete
data set.
The use of these input value aggregates also provided the opportunity to view each
aggregate’s overall distance from the mean via Euclidean distance. Figure 5 shows a simple plot
representing the input value aggregate’s distance from the mean.
Figure 5: A plot showing each input value aggregate’s Euclidean distance to the mean of the
entire dataset
An interesting observation from this is that simulations with lower input values tend to have a
greater amount of distance from the mean. We will later show other visualizations that confirmed
this finding.
Moving forward with this reduced dataset, we began trying different methods to represent an
individual input value’s relationship to the aggregate mean of the dataset. This again was done
with a simple color mapping that readily showed that simulations with lower input values tended
to have lower voltage value representations in the final data set, which confirmed the observation
in figure 5. Cycling through all of these input value simulations showed the predicted correlation
between input and final values, but we still felt that we were lacking a cohesive, static visual that
could communicate a broad overview of how each simulation related to the aggregate. Our
solution to this again involved an isocontour approach, but this time our goal was to represent
each simulation as a single contour line on the torso and to represent all of the simulations at
once. Our previous experimentation with the point classification methods effectively showed that
differences between the simulations and the aggregate appeared to propagate from two specific
areas on the torso, therefore, we decided to designate a signature for each simulation based on
those same absolute differences. This signature was decided to be the line on the torso where
non-negligible differences began to appear for each simulation. Given some simulation X and an
aggregate simulation A, we defined the difference at each point p as:

D(p) = |X(p) − A(p)|

We drew an isocontour line on the torso at the threshold D(p) = ε for some arbitrarily small constant ε,
effectively showing where non-negligible distance begins to occur. A good analog for this is the
contour line on an elevation map that denotes an elevation just above zero, marking where
elevation begins to increase. One of the great advantages of using Non-Negligible Distance
Signatures (NNDS) is that we could reduce an entire simulation into one line that can be drawn
on a data mesh in conjunction with other simulations, thus avoiding informational overload in
creating static images.
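A minimal sketch of an NNDS on a flat 2D grid (the real signatures live on the torso mesh). The aggregate field, the simulation field, and the threshold are all illustrative assumptions; the boundary cells of the thresholded region stand in for the contour line a marching squares pass would trace exactly:

```python
import numpy as np

# Flat 2D grid standing in for the torso mesh; A is a hypothetical
# aggregate field and X a simulation differing from it near one "source".
x, y = np.meshgrid(np.linspace(-1, 1, 64), np.linspace(-1, 1, 64))
A = np.zeros_like(x)
X = np.exp(-8 * ((x - 0.3) ** 2 + y ** 2))

# Absolute difference field D(p) = |X(p) - A(p)| and an arbitrarily small
# threshold eps; the NNDS is the isocontour D = eps.
D = np.abs(X - A)
eps = 0.05
inside = D >= eps

# Cells of the non-negligible region whose 4-neighborhood is not entirely
# inside approximate the signature line.
edge = inside & ~(
    np.roll(inside, 1, 0) & np.roll(inside, -1, 0)
    & np.roll(inside, 1, 1) & np.roll(inside, -1, 1)
)
```

Because each simulation reduces to one such line, many simulations can share a single static image without overload.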
For our first experiment with NNDS, we decided to apply it to the input value aggregates to
understand how the input value affected where non-negligible distance began on the torso. We
chose to vary the color of our isocontour lines based on the order of simulations and by varying
this order we were able to visually confirm some of what we had already suspected about the
higher standard deviation areas of the torso.
Figure 6: Non-Negligible Distance signatures for all 961 input value aggregates with respect to
the mean of the complete dataset
As shown in figure 6, the signatures of the simulations closest to the aggregate move towards
the left part of the subject’s back (lower right area of the graphic) as the mean input value is
approached, signifying that the area of highest variance is the starting point from which
variance spreads outward. It is also possible
to look at the outer limits of the simulation signatures when they are related to their input values.
Using our Euclidean distance graph that we had generated previously, we already knew that the
simulations with a lower input value had a greater amount of Euclidean distance between them
and the aggregate, but figure 6 confirms that observation. It is readily seen that the blue lines
representing the lower input values extend past the higher input value red lines, signifying that
the area of non-negligible difference begins farther away from the source, allowing more room
for contribution to distance from the mean.
One unfortunate problem with the visualization described above is transparency. Using the
aggregate simulations that we created from the input values and then ordering them by their
input values, we created figure 6, which shows that where different lines overlap, the colors
blend together and obscure the results. To combat this, we decided to create a video
representation of the contour line signatures. By doing this, it was readily seen that the
simulation signature converges on the areas of high variance as the input values reach their
median value. Once the simulations begin to diverge from the aggregate, we could see that the
signature again diverged from the high variance areas. By doing this, it was easily seen that our
original conclusion regarding the sources of variance was true. While this was not a static image,
we did feel that it was useful to integrate this feature into our software because of the clarity that
the aspect of motion provided to the user.
Figure 7: Use of NNDS in a dynamic setting to create an animation. (One frame of animation per
input value aggregate)
With our method of non-negligible distance signatures explained, we now take the opportunity to
confirm our original claim that the aggregate of each input value is a viable representation of all
of the individual simulations with that same input value. In figure 4, we visualized the average,
maximum, and minimum amounts of Euclidean distance between each input value group and
their aggregate. The amount of Euclidean distance was trivial for so many points, and by looking
at the simulation signatures for a single group of simulations sharing the same input value, we
find that this simulation signature is very consistent between all of the runs. This re-confirms our
claim that we are safe to use an aggregate of these runs to cut down on the amount of processed
data. This also demonstrated the usefulness of NNDS in visualizing the similarities of several
different simulations and could be applied in outlier detection scenarios.
Figure 8: NNDS calculated on a single input value group with respect to the mean of the same
group, showing that variance within the group was minimal.
RESULTS
With the above mentioned methods for visualization at our disposal, we now turn our attention to
our original goal of predicting internal torso values based on external readings. Recalling that
simulations with similar input values were very similar, our principal strategy was to classify an
unknown torso by assigning it an input value and to predict internal torso behavior based on the
aggregate of other simulations with the same input value. This essentially turned the problem
into an exercise in clustering.
Given a simulation with an unknown input value, it was possible to cluster the interior points
according to which input value aggregate they were closest to. Then, after determining the most
likely input value used to generate the simulation, the interior points were filled in by the input
value aggregate. By following this greedy approach, single external points achieved clustering
accuracy of up to 98.6% over all 100,000 simulations. The average error of the extrapolated
internal points was on the order of 10^-4, which was judged to be satisfactory.
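The greedy classification step can be sketched as follows. The number of input values, the split into external and interior points, and the data are all hypothetical stand-ins:

```python
import numpy as np

# Hypothetical setup: 10 input values (standing in for 961), each with an
# aggregate on a 30-point "torso"; the first 5 points act as external.
rng = np.random.default_rng(4)
values = np.linspace(0.0, 1.0, 10)
aggregates = values[:, None] * np.ones((10, 30))

unknown = 0.42 * np.ones(30) + rng.normal(scale=0.01, size=30)
external = slice(0, 5)

# Greedy step: pick the input value whose aggregate is nearest at the
# external points only.
dists = np.linalg.norm(aggregates[:, external] - unknown[external], axis=1)
best = int(np.argmin(dists))

# Fill in the interior from the chosen aggregate and measure the error.
predicted_interior = aggregates[best, 5:]
mean_error = np.abs(predicted_interior - unknown[5:]).mean()
```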
Figure 9: Classification Error Percentages using a single external point
By examining figure 9 in comparison to figure 6, we could see that points with relatively low
accuracy were the points where the non-negligible difference signatures rarely or never reached.
These points were also shown to have a relatively low level of variance from the standard
deviation mapping in figure 2.
In order to improve upon the classification accuracy and definitively find the most useful points
for assigning input values to unknown simulations, we decided to turn to principal component
analysis using the 105 surface points over the 961 input value aggregate simulations. By looking
at the singular values of the singular value decomposition, we found that the spectrum decayed
sharply: 99.99% of the variance lay in the first five components, and the first component
alone accounted for more than 95% of the total variance.
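This PCA step can be sketched via the SVD as follows, using synthetic data constructed so that one mode dominates, mirroring the behavior observed on the surface points:

```python
import numpy as np

# Synthetic matrix of input value aggregates (rows) over surface points
# (columns), built so a single mode carries nearly all the variance.
rng = np.random.default_rng(5)
n_aggs, n_surface = 200, 105
mode = rng.normal(size=n_surface)
coeffs = rng.normal(scale=5.0, size=n_aggs)
data = np.outer(coeffs, mode) + rng.normal(scale=0.1, size=(n_aggs, n_surface))

# SVD of the centered data; squared singular values give the variance
# explained by each principal component.
centered = data - data.mean(axis=0)
_, s, vt = np.linalg.svd(centered, full_matrices=False)
explained = s ** 2 / (s ** 2).sum()

# Surface points weighted heavily in the top component are the strongest
# candidates for external measurement locations.
top_points = np.argsort(np.abs(vt[0]))[::-1][:10]
```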
Knowing that only a few components really contributed to the variance, we decided to take the
top points from the first ten components and use them as an ensemble. By following the same
clustering algorithm as before, at least one of the 10 points had the correct answer 99.5% of the
time while using every single point on the outside of the torso only improved potential accuracy
to 99.7%. This showed that the dimensionality reduction successfully chose the points that were
the greatest contributors to variance and thus the best candidates to improve accuracy. Further
analysis could be pursued in order to determine how to get one correct answer from the
ensemble, but that was judged to be out of the scope for this project.
DISCUSSION
Our exploration and understanding of the Utah Torso Dataset is not complete, but we have made
some significant achievements in our understanding of it. Primarily, we have found important
correlations between the external and internal values of the torso. These findings were made
possible by our prerequisite work in reducing the large number of simulations into a manageable
format after proving that this did not impact the integrity of the data.
Along with this understanding, we have developed several techniques for visualizing these new
findings. Among these, we felt that our work has found a compelling technique for the use of
isocontours in understanding large amounts of data, namely, the use of NNDS. In static
situations, NNDS was able to visually confirm uniformity within input value groups, show
sources of variance within simulations and to give a static overview of the behavior of
simulations with respect to their input values. While the issue of opacity needs to be resolved in
order to achieve optimal clarity, the use of NNDS has the capacity to be applied to a dynamic
viewing setting, which also provides the added channel of motion to aid the user’s
interpretation. The method of NNDS proved useful both as a static and dynamic representation
of the Utah Torso Dataset and we look forward to its further development and application to
other datasets.
ACKNOWLEDGMENT
The authors would like to thank the Undergraduate Office of Research Opportunities at the
University of Utah for their financial support.
REFERENCES
[1] R.S. MacLeod, C.R. Johnson, and P.R. Ershler. "Construction of an Inhomogeneous Model
of the Human Torso for Use in Computational Electrocardiography." In IEEE Engineering in
Medicine and Biology Society 13th Annual International Conference, pages 688--689. IEEE
Press, 1991.
[2] W.E. Lorensen and H.E. Cline. "Marching Cubes: A High Resolution 3D Surface
Construction Algorithm." In Proc. of ACM SIGGRAPH 1987, pages 163--169. ACM, 1987.