Uncertainty Visualization in Simulation Ensemble Decomposition
Christopher J. Johnson, Paul Rosen
Our understanding of many scientific models is incomplete, leading to inherent uncertainty in their results. By varying input conditions, a large number of simulations can be combined into an ensemble that scientists can use to cope with this uncertainty. Generating large numbers of simulations also creates the opportunity to study the behavior of the numerical model that produced them, providing insight into model stability, the model's reaction to input values, and other previously unknown features of the dataset. Because of the large amount of data encapsulated in such quantities of simulations, visualizing this data requires both static and dynamic approaches. In this paper, we present our development of Non-Negligible Distance Signatures (NNDS), a technique that represents a complete simulation's relationship to the rest of the dataset as a single isocontour line. We show the usefulness of NNDS by applying it to create static and dynamic images of the Utah Torso Dataset, a model of the propagation of electricity across a human torso [1]. Our explorations of this dataset culminate in using NNDS to visually denote the most useful external points on the torso model for predicting internal electrical behavior.
INTRODUCTION
Our understanding of many scientific models is incomplete, leading to inherent uncertainty in their results. By varying input conditions, a large number of simulations can be combined into an ensemble that scientists can use to cope with this inherent uncertainty. As our
reliance upon this tactic grows, it is increasingly important to understand the behavior of the
simulations that make up the presented ensembles. The most pressing issue with the use of
ensembles is that the size and complexity of the data means that a large amount of important
information is never communicated to the designer of the experiment. For example, the study of
individual simulations enables the scientist to discover areas of an experiment that are most
influenced by input parameters. These could be outliers or unstable areas inordinately affected by
small changes in input values. Understanding how these individual simulations relate to the
overall ensemble can allow the scientist to recognize unacceptable behaviors or to further
understand and replicate correct results. Addressing these challenges will require both static and
dynamic approaches.
In this paper, we present our explorations into creating both a static and dynamic representation
of the underlying simulations that make up an aggregate. We do this through the development of
Non-Negligible Distance Signatures (NNDS), which clearly designate the areas of highest
variance, similarity between simulation groups and individuals, and the relationships of input
values to final results. Using these signatures, we will be able to visualize complete datasets of
thousands of simulations in a single static image. We will also demonstrate the usefulness of
dynamic images or animations in conjunction with NNDS in order to visualize relationships
between the input values of a simulation and the final results.
For the duration of this paper, we will be discussing our findings gleaned from the Utah Torso Dataset [1], which is a collection of simulations of the propagation of electricity across the human torso. The model of the torso is split into different components based on the kind of tissue located in each region.
Figure 1: Tissue distribution of torso model
Each of these regions was given specific conductivity properties produced by physically running a current through the corresponding tissue. Then, using the method of Generalized Polynomial Chaos, a model was produced. The data set consists of 100,000 individual simulations using 961 unique input values in a uniform distribution to account for differences between individuals, aleatoric(a) uncertainty, and other sources of variation. The value stored at each point is the final voltage once the model reached a state of equilibrium.
We selected this model because of the large quantity of simulations available to us and the inherent uncertainty contained within it. The overarching goal was to visualize and predict internal torso behavior using only external readings. In a medical situation, it is not feasible to perform surgery to ascertain electrical behavior, but it is generally possible to get a reading of conductivity on the surface with some sort of sensor. Because the model provided us with the complete torso, our general approach was to find a correlation between external and internal values. The model also provided us with the input value used to create a given simulation, so finding a correlation between that input value and the internal torso values was judged to be another viable option for extrapolating internal behavior. In both cases, real-world applications could reach into the realm of defibrillators and electric shock therapy, in addition to many other kinds of possible medical situations.
(a) Aleatoric uncertainty refers to the statistical uncertainty that arises from inherent randomness in a process or its inputs.
METHODOLOGY
Our exploration of this problem began by examining specific statistical moments of the data and their respective positions on the data mesh. In particular, our first attempts focused on analyzing and visualizing the standard deviation of the data. As shown in figure 2, a simple color mapping allowed us to easily visualize areas of the torso with relatively high standard deviation, which appeared in the areas representing the left back and right chest of the torso.
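As a minimal sketch of this first step, assuming the ensemble is stored as a NumPy array of per-point voltages together with the 2D mesh geometry (the file names and array layout below are assumptions, not the actual format of the dataset):

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.tri as mtri

# Assumed layout: values[i, p] = equilibrium voltage of simulation i at mesh point p.
values = np.load("torso_ensemble.npy")        # shape: (num_simulations, num_points)
xy = np.load("torso_vertices.npy")            # shape: (num_points, 2), mesh vertex positions
triangles = np.load("torso_triangles.npy")    # shape: (num_triangles, 3), vertex indices

# Point-wise standard deviation across all simulations.
std_per_point = values.std(axis=0)

# Color-map the standard deviation onto the torso mesh.
triang = mtri.Triangulation(xy[:, 0], xy[:, 1], triangles)
plt.tripcolor(triang, std_per_point, shading="gouraud", cmap="viridis")
plt.colorbar(label="standard deviation")
plt.title("Point-wise standard deviation across the ensemble")
plt.show()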
Figure 2: Point-wise standard deviation across the complete dataset
While this gave us insight as to which areas of the simulation were more prone to vary, we felt
that we could not reasonably assume a correlation between external and internal values from this
model. The next step taken was to explore the correspondence of the input values to internal
torso behavior.
We felt that we needed a way to compare simulations and that an aggregate simulation would be useful as a point of reference. For our purposes, an aggregate simulation is defined as a simulation generated by calculating the mean at every point over some set of simulations. Taking S to be the set of available simulations and P to be the set of points on the torso mesh, each point of the aggregate simulation A was defined as follows:

A(p) = \frac{1}{|S|} \sum_{s \in S} s(p), \quad p \in P
That is, for every point on the torso mesh, we summed up the values over all of the simulations
in the set and then divided that final value by the total number of simulations in order to create a
final aggregate simulation. Effectively speaking, this aggregate simulation represents the mean
of all of the simulations in the dataset.
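Continuing the sketch above (same assumed array layout), the aggregate simulation is simply the per-point mean over the chosen set of simulations:

# Aggregate simulation: the mean value at every mesh point over a set of simulations,
# i.e. A(p) = (1/|S|) * sum of s(p) over all s in S.
aggregate = values.mean(axis=0)               # shape: (num_points,)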
Using an aggregate simulation generated from our set of 100,000 simulations, we decided to measure the differences between individual simulations and that aggregate. The theory behind this was that outlier simulations would be readily visible, while simulations generally closer to the mean would be less prominent. To visualize this, we implemented the marching squares algorithm [2] to draw contour lines at differing thresholds based on a simulation's Euclidean distance, or L2 norm, from the aggregate.
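The following sketch illustrates this step using matplotlib's triangulation-based contouring in place of our own marching squares implementation, and it continues to use the hypothetical variables introduced above:

# Per-point absolute difference between one sample simulation and the aggregate.
sample = values[0]
distance_field = np.abs(sample - aggregate)

# Contour lines at several thresholds of that distance field, drawn on the torso mesh.
thresholds = np.linspace(0.1 * distance_field.max(), 0.9 * distance_field.max(), 5)
plt.tricontour(triang, distance_field, levels=thresholds, cmap="plasma")
plt.title("Contours of a sample simulation's distance from the aggregate")
plt.show()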
Figure 3: Contour Lines of a sample simulation’s distance from the 100,000 simulation aggregate
This method revealed two interesting aspects of the data set. Firstly, a torso's distance from the mean was shown to originate from the areas of high variance. Secondly, continuous threshold lines passing from the interior of the torso to the exterior can be observed. This, above all else, confirmed the possibility of extrapolating internal torso values using external measurements.
While both methods described above illustrated areas of potential interest, neither was able to
give a cohesive idea of what kinds of relationships existed between simulations and their input
values.
As we continued to work with the dataset, we found that using all 100,000 of the provided simulations was unwieldy, and we began searching for a suitable way to reduce our dataset into a more manageable form. Observing that the distribution of input values across the dataset was roughly uniform, we decided to create input value aggregates made up of the mean of simulations sharing an input value. We followed the same procedure described above for creating these aggregates, but instead of including the entire data set, we built each input value aggregate from only the simulations with that same input value. Calculating each individual simulation's Euclidean distance to its own input value aggregate showed that differences within input value groups were negligible, as shown in figure 4.
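A sketch of building the input value aggregates and the within-group distance statistics behind figure 4, assuming an additional array input_values (a hypothetical name) holding each simulation's input value:

input_values = np.load("torso_input_values.npy")   # shape: (num_simulations,), assumed file

group_aggregates = {}   # input value -> aggregate simulation for that group
group_stats = {}        # input value -> (min, mean, max) L2 distance to the group aggregate

for v in np.unique(input_values):
    group = values[input_values == v]               # all simulations sharing this input value
    agg = group.mean(axis=0)
    group_aggregates[v] = agg
    dists = np.linalg.norm(group - agg, axis=1)     # L2 distance of each run to its own aggregate
    group_stats[v] = (dists.min(), dists.mean(), dists.max())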
Figure 4: A plot showing the distance of simulations to their input value aggregate.
Each line represents the extreme and average simulations within an input value group. As the maximum Euclidean distance (L2 norm) was small relative to the number of points in the torso mesh, the input value aggregates were judged to be a fair representation of the complete data set.
The use of these input value aggregates also provided the opportunity to view each aggregate's overall Euclidean distance from the mean. Figure 5 shows a simple plot representing each input value aggregate's distance from the mean.
Figure 5: A plot showing each input value aggregate’s Euclidean distance to the mean of the
entire dataset
An interesting observation from this is that simulations with lower input values tend to lie farther from the mean. We will later show other visualizations that confirm this finding.
Moving forward with this reduced dataset, we began trying different methods to represent an individual input value's relationship to the aggregate mean of the dataset. This was again done with a simple color mapping, which readily showed that simulations with lower input values tended to have lower voltage values in the final data set, confirming the observation in figure 5. Cycling through all of these input value simulations showed the predicted correlation between input and final values, but we still felt that we were lacking a cohesive, static visual that could communicate a broad overview of how each simulation related to the aggregate. Our solution to this again involved an isocontour approach, but this time our goal was to represent each simulation as a single contour line on the torso and to show all of the simulations at once. Our previous experimentation with the point classification methods showed that differences between the simulations and the aggregate appeared to propagate from two specific areas on the torso; we therefore decided to designate a signature for each simulation based on those same absolute differences. We defined this signature as the line on the torso where non-negligible differences begin to appear for each simulation. Given some simulation X and the aggregate simulation A, the per-point difference is

D_X(p) = \left| X(p) - A(p) \right|, \quad p \in P
We drew an isocontour line on the torso at a threshold equal to some arbitrarily small constant,
effectively showing where non-negligible distance begins to occur. A good analog for this is the
contour line on an elevation map that denotes an elevation just above zero, marking where
elevation begins to increase. One of the great advantages of using Non-Negligible Distance
Signatures (NNDS) is that we could reduce an entire simulation into one line that can be drawn
on a data mesh in conjunction with other simulations, thus avoiding informational overload in
creating static images.
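A minimal sketch of extracting and overlaying NNDS for the input value aggregates, continuing the variables from the sketches above; the threshold value here is an arbitrary placeholder, not the constant used in our experiments:

from matplotlib import cm

epsilon = 1e-3                                      # assumed "non-negligible" threshold
dataset_mean = values.mean(axis=0)                  # aggregate of the complete dataset

ordered_inputs = sorted(group_aggregates)           # order the aggregates by input value
colors = cm.coolwarm(np.linspace(0, 1, len(ordered_inputs)))

for color, v in zip(colors, ordered_inputs):
    diff = np.abs(group_aggregates[v] - dataset_mean)
    # The NNDS is the single isocontour where the difference first exceeds epsilon.
    plt.tricontour(triang, diff, levels=[epsilon], colors=[tuple(color)], linewidths=0.5)

plt.title("NNDS of every input value aggregate, colored by input value order")
plt.show()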
For our first experiment with NNDS, we applied it to the input value aggregates to understand how the input value affected where non-negligible distance began on the torso. We chose to vary the color of our isocontour lines based on the order of the simulations, and by varying this order we were able to visually confirm some of what we had already suspected about the higher standard deviation areas of the torso.
Figure 6: Non-Negligible Distance signatures for all 961 input value aggregates with respect to
the mean of the complete dataset
As shown in figure 6, the signatures of the simulations with the smallest distance to the aggregate move towards the left part of the subject's back (lower right area of the graphic) as the mean input value is approached, signifying that the areas of higher variance are the starting points from which differences from the mean spread out. It is also possible to look at the outer limits of the simulation signatures in relation to their input values. From the Euclidean distance graph that we had generated previously, we already knew that the simulations with a lower input value had a greater Euclidean distance to the aggregate, and figure 6 confirms that observation. The blue lines representing the lower input values extend past the red lines of the higher input values, signifying that the area of non-negligible difference begins farther away from the source, allowing more room for contribution to the distance from the mean.
One unfortunate problem with the visualization described above is transparency. Using the aggregate simulations that we created from the input values and ordering them by input value, we created figure 6, in which overlapping lines blend together and obscure the results. To combat this, we decided to create a video representation of the contour line signatures. In the animation it is readily seen that the simulation signature converges on the areas of high variance as the input values reach their median value, and that once the simulations begin to diverge from the aggregate, the signature again moves away from the high variance areas. This made it easy to see that our original conclusion regarding the sources of variance was correct. While this is not a static image, we felt it was useful to integrate this feature into our software because of the clarity that the aspect of motion provided to the user.
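A sketch of the dynamic version using matplotlib's animation support, with one frame per input value aggregate; it reuses the hypothetical variables above, and the output settings are assumptions:

from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()

def draw_frame(frame_index):
    # Clear the previous frame and draw the NNDS of the next input value aggregate.
    ax.clear()
    v = ordered_inputs[frame_index]
    diff = np.abs(group_aggregates[v] - dataset_mean)
    ax.tricontour(triang, diff, levels=[epsilon], colors="black")
    ax.set_title(f"NNDS for input value {v}")

anim = FuncAnimation(fig, draw_frame, frames=len(ordered_inputs), interval=50)
anim.save("nnds_animation.mp4")                     # saving to video requires ffmpeg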
Figure 7: Use of NNDS in a dynamic setting to create an animation. (One frame of animation per
input value aggregate)
With our method of non-negligible distance signatures explained, we now take the opportunity to confirm our original claim that the aggregate of each input value is a viable representation of all of the individual simulations with that same input value. In figure 4, we visualized the average, maximum, and minimum Euclidean distances between each input value group and its aggregate. These distances were small relative to the number of points, and by looking at the simulation signatures for a single group of simulations sharing the same input value (figure 8), we find that the signature is very consistent across all of the runs. This re-confirms our claim that we are safe to use an aggregate of these runs to cut down on the amount of processed data. It also demonstrates the usefulness of NNDS in visualizing the similarities of several different simulations, and it could be applied in outlier detection scenarios.
Figure 8: NNDS calculated on a single input value group with respect to the mean of the same
group, showing that variance within the group was minimal.
RESULTS
With the above-mentioned methods for visualization at our disposal, we now turn our attention to our original goal of predicting internal torso values based on external readings. Recalling that simulations with similar input values were very similar, our principal strategy was to classify an unknown torso by assigning it an input value and then to predict internal torso behavior based on the aggregate of other simulations with the same input value. This essentially turned the problem into an exercise in clustering.
Given a simulation with an unknown input value, it was possible to classify it according to which input value aggregate its exterior points were closest to. Then, after determining the most likely input value used to generate the simulation, the interior points were filled in from that input value aggregate. Following this greedy approach, classification from a single point achieved accuracy of up to 98.6% over all 100,000 simulations. The average error of the extrapolated internal points was on the order of 10^-4, which was judged to be satisfactory.
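A sketch of this greedy single-point classification, assuming index arrays surface_idx and interior_idx (hypothetical names) that mark the external and internal mesh points:

# surface_idx / interior_idx: assumed integer index arrays for external / internal mesh points.
agg_matrix = np.stack([group_aggregates[v] for v in ordered_inputs])   # (num_groups, num_points)

def classify_and_fill(unknown, point_index):
    # Assign an input value from a single external point, then fill the interior
    # from the matching input value aggregate.
    nearest = np.argmin(np.abs(agg_matrix[:, point_index] - unknown[point_index]))
    filled = unknown.copy()
    filled[interior_idx] = agg_matrix[nearest, interior_idx]
    return ordered_inputs[nearest], filled

# Example: classification accuracy of one external point over the whole ensemble.
point = surface_idx[0]
predictions = np.array([classify_and_fill(sim, point)[0] for sim in values])
accuracy = np.mean(predictions == input_values)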
Figure 9: Classification Error Percentages using a single external point
By examining figure 9 in comparison to figure 6, we could see that the points with relatively low accuracy were the points that the non-negligible difference signatures rarely or never reached. These points were also shown to have a relatively low level of variance in the standard deviation mapping of figure 2.
In order to improve the classification accuracy and definitively find the most useful points for assigning input values to unknown simulations, we turned to principal component analysis using the 105 surface points over the 961 input value aggregate simulations. By looking at the singular values of the singular value decomposition, we found that the variance was heavily concentrated: 99.99% of it lay in the first five components, and the first component alone accounted for more than 95% of the total variance.
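A sketch of this analysis using NumPy's singular value decomposition on the surface values of the input value aggregates; centering the matrix before the SVD is an assumption about the exact procedure:

# One row per input value aggregate, one column per external (surface) point.
surface_matrix = agg_matrix[:, surface_idx]          # shape: (961, 105) for our data
centered = surface_matrix - surface_matrix.mean(axis=0)

U, singular_values, Vt = np.linalg.svd(centered, full_matrices=False)

explained = singular_values**2 / np.sum(singular_values**2)
print("variance explained by the first five components:", explained[:5].sum())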
Knowing that only a few components really contributed to the variance, we decided to take the top points from the first ten components and use them as an ensemble. Following the same clustering algorithm as before, at least one of the 10 points gave the correct answer 99.5% of the time, while using every single point on the outside of the torso only improved potential accuracy to 99.7%. This showed that the dimensionality reduction successfully chose the points that were the greatest contributors to variance and thus the best candidates to improve accuracy. Further analysis could be pursued to determine how to select a single correct answer from the ensemble, but that was judged to be out of scope for this project.
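A sketch of forming that candidate ensemble by taking the surface point with the largest loading in each of the first ten right singular vectors, then checking how often at least one candidate classifies a simulation correctly; the exact selection rule is an assumption based on the description above:

# Highest-magnitude loading in each of the first ten components, mapped back to mesh indices.
top_points = [surface_idx[np.argmax(np.abs(Vt[k]))] for k in range(10)]

def any_candidate_correct(sim, true_input):
    # True if any candidate point, used alone, assigns the correct input value.
    for point in top_points:
        nearest = np.argmin(np.abs(agg_matrix[:, point] - sim[point]))
        if ordered_inputs[nearest] == true_input:
            return True
    return False

hit_rate = np.mean([any_candidate_correct(sim, v) for sim, v in zip(values, input_values)])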
DISCUSSION
Our exploration and understanding of the Utah Torso Dataset is not complete, but we have made significant progress in understanding it. Primarily, we have found important correlations between the external and internal values of the torso. These findings were made possible by our prerequisite work in reducing the large number of simulations into a manageable format after showing that this did not impact the integrity of the data.
Along with this understanding, we have developed several techniques for visualizing these new findings. Among these, we feel that our work has produced a compelling technique for using isocontours to understand large amounts of data, namely NNDS. In static settings, NNDS was able to visually confirm uniformity within input value groups, show sources of variance within simulations, and give a static overview of the behavior of simulations with respect to their input values. While the issue of opacity needs to be resolved in order to achieve optimal clarity, NNDS can also be applied in a dynamic viewing setting, which provides the added channel of motion to aid the user's interpretation. The method of NNDS proved useful both as a static and as a dynamic representation of the Utah Torso Dataset, and we look forward to its further development and application to other datasets.
ACKNOWLEDGMENT
The authors would like to thank the Undergraduate Office of Research Opportunities at the
University of Utah for their financial support.
REFERENCES
[1] R.S. MacLeod, C.R. Johnson, and P.R. Ershler. "Construction of an Inhomogeneous Model of the Human Torso for Use in Computational Electrocardiography." In IEEE Engineering in Medicine and Biology Society 13th Annual International Conference, pages 688-689. IEEE Press, 1991.
[2] W.E. Lorensen and H.E. Cline. "Marching Cubes: A High Resolution 3D Surface Construction Algorithm." In Proceedings of ACM SIGGRAPH 1987, pages 163-169, 1987.
