Simulation Informatics
Analyzing Large Datasets from Scientific Simulations

DAVID F. GLEICH, Purdue University, Computer Science Department
PAUL G. CONSTANTINE, Stanford University
JOE RUTHRUFF & JEREMY TEMPLETON, Sandia National Labs





This talk is a story …




How I learned to stop
worrying and love the
simulation!




I asked …
Can we do UQ on PageRank?




PageRank by Google

[Figure: a directed graph on six nodes illustrating the random surfer.]

The Model
1. Follow edges uniformly with probability α, and
2. randomly jump with probability 1 − α; we'll assume everywhere is
   equally likely.

The places we find the surfer most often are important pages.
Random alpha PageRank
RAPr, or PageRank meets UQ

Sensitivity to the links: examined and understood.
Sensitivity to the jump parameter alpha?

    (I − A P) x(A) = (1 − A) v

Model PageRank as the random variable x(A), where the teleportation
parameter A is random, and look at E[x(A)] and Std[x(A)].

Explored in Constantine and Gleich, WAW 2007; and
Constantine and Gleich, J. Internet Mathematics, 2011.
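
To make the model concrete, here is a minimal Monte Carlo sketch (not the code
from the papers): it samples the teleportation parameter from a made-up Beta
distribution, solves the PageRank system for each sample, and estimates
E[x(A)] and Std[x(A)]. The toy transition matrix and the Beta parameters are
assumptions for illustration only.

    # Minimal sketch: Monte Carlo estimate of E[x(A)] and Std[x(A)]
    # for random-alpha PageRank on a toy graph.
    import numpy as np

    # Column-stochastic transition matrix P for a made-up 4-node graph.
    P = np.array([[0.0, 0.5, 0.0, 0.0],
                  [0.5, 0.0, 0.5, 1.0],
                  [0.5, 0.0, 0.0, 0.0],
                  [0.0, 0.5, 0.5, 0.0]])
    n = P.shape[0]
    v = np.ones(n) / n                 # uniform teleportation vector

    rng = np.random.default_rng(0)
    samples = []
    for _ in range(2000):
        alpha = rng.beta(2.0, 16.0)    # hypothetical Beta distribution for A
        # Solve (I - alpha P) x = (1 - alpha) v
        x = np.linalg.solve(np.eye(n) - alpha * P, (1 - alpha) * v)
        samples.append(x)

    samples = np.array(samples)
    print("E[x(A)]   ~", samples.mean(axis=0))
    print("Std[x(A)] ~", samples.std(axis=0))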




Random alpha PageRank has a rigorous convergence theory.

Method                  Conv. rate         Work required                   What is N?
Monte Carlo             1/√N               N PageRank systems              number of samples from A
Path damping            r^(N+2)/(N+1)      N + 1 matrix-vector products    terms of the Neumann series
  (without Std[x(A)])
Gaussian quadrature     r^(2N)             N PageRank systems              number of quadrature points

l and r are parameters of the Beta(a, b, l, r) distribution for A.




Working with
PageRank showed us
how to treat UQ more
generally …




We studied parameterized matrices.

Constantine, Gleich, and Iaccarino. Spectral Methods for Parameterized
Matrix Equations, SIMAX, 2010.

A discretized PDE with explicit parameters gives a parameterized matrix equation

    A(s) x(s) = b(s).

Its parameterized solution can be characterized by solving at sample points,

    A(s_1) x(s_1) = b(s_1), …, A(s_N) x(s_N) = b(s_N),

or through an order-N Galerkin approximation x_N with

    A_N(s_j) x_N(s_j) = b_N(s_j).

Constantine, Gleich, and Iaccarino. A factorization of the spectral Galerkin
system for parameterized matrix equations: derivation and applications,
SISC 2011. How to compute the Galerkin solution in a weakly intrusive manner.
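
As a hedged illustration of the setup, the sketch below solves a toy
parameterized matrix equation A(s) x(s) = b(s) at a handful of sample points;
the particular A(s) and b(s) are invented for the example and have nothing to
do with the PDEs studied in the papers.

    # Toy parameterized matrix equation A(s) x(s) = b(s), solved at samples of s.
    import numpy as np

    def A(s):
        # Made-up parameterized operator: a tridiagonal matrix whose diagonal depends on s.
        n = 20
        return (np.diag(np.full(n, 2.0 + s))
                - np.diag(np.ones(n - 1), 1)
                - np.diag(np.ones(n - 1), -1))

    def b(s):
        return np.full(20, s)

    samples = np.linspace(0.1, 1.0, 5)
    solutions = np.column_stack([np.linalg.solve(A(s), b(s)) for s in samples])
    print(solutions.shape)   # one solution column per sample of s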




Simulation
The Third Pillar of Science

21st Century Science in a nutshell
    Experiments are not practical or feasible.
    Simulate things instead.
But do we trust the simulations?

We're trying
    Model Fidelity
    Verification & Validation (V&V)
    Uncertainty Quantification (UQ)




The message
Insight and confidence require multiple runs.




The problem
A simulation run ain’t cheap!




Another problem
It’s very hard to “modify”
current codes.




Large-scale nonlinear, time-dependent heat transfer problem
    10^5 nodes
    10^3 time steps
    30 minutes on 16 cores

Questions
    What is the probability of failure?
    Which input values cause failure?




It's time to ask
"What can science learn from Google?"
– Wired Magazine (2008)




"We can throw the numbers into the biggest computing clusters the world has
ever seen and let statistical algorithms find patterns where science cannot."
– Wired (again)

21st Century Science in a nutshell?
    Simulations are too expensive.
    Let data provide a surrogate.
Our approach!
Construct an interpolating
reduced order model from a
budget-constrained ensemble of
runs for uncertainty and
optimization studies.




That is, we store the runs.

Supercomputer → data computing cluster → engineer

Each multi-day HPC simulation generates gigabytes of data. A data cluster can
hold hundreds or thousands of old simulations, enabling engineers to query and
analyze months of simulation data for statistical studies and uncertainty
quantification, and to build the interpolant from the pre-computed data.




The Database

Input parameters s → time history of simulation f:
    s_1 → f_1,  s_2 → f_2,  …,  s_k → f_k

The simulation as a vector (each block of n entries is a single simulation
snapshot at one time step):

    f(s) = [ q(x_1, t_1, s); …; q(x_n, t_1, s);
             q(x_1, t_2, s); …; q(x_n, t_2, s);
             …;
             q(x_n, t_k, s) ]

The database as a matrix:

    X = [ f(s_1)  f(s_2)  …  f(s_p) ]
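
A minimal sketch of how such a database might be assembled: each run's
space-time field is flattened into one long column f(s_i), and the columns are
stacked into X. The mesh size, time grid, and the stand-in run_simulation
function are placeholders, not the actual solver.

    # Sketch only: flatten each run into a column and stack columns into X.
    import numpy as np

    n_nodes, n_steps = 1000, 50            # placeholder mesh / time sizes

    def run_simulation(s):
        """Stand-in for an expensive solver: returns q(x_i, t_j, s) as (n_steps, n_nodes)."""
        x = np.linspace(0.0, 1.0, n_nodes)
        t = np.linspace(0.0, 1.0, n_steps)[:, None]
        return np.exp(-s * t) * np.sin(np.pi * x)   # made-up field

    params = [0.5, 1.0, 1.5, 2.0]           # the sample points s_1..s_p
    # f(s) stacks the snapshots time step by time step, matching the ordering above.
    X = np.column_stack([run_simulation(s).reshape(-1) for s in params])
    print(X.shape)    # (n_nodes * n_steps, p)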




The interpolant

Motivation: let the data give you the basis. This idea was inspired by the
success of other reduced order models like POD, and by Paul's residual
minimizing idea.

    X = [ f(s_1)  f(s_2)  …  f(s_p) ]

Then find the right combination

    f(s) ≈ Σ_{j=1}^{r} u_j α_j(s),

where the u_j are the left singular vectors from X.




Why the SVD?
Let's study a simple case.

    X = [ g(x_i, s_j) ]  (i = 1..m, j = 1..p)  =  U Σ V^T

    g(x_i, s_j) = Σ_{ℓ=1}^{r} U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s_j)

Treat each right singular vector as samples of an unknown basis function; this
splits x and s. For a general parameter s,

    g(x_i, s) = Σ_{ℓ=1}^{r} u_ℓ(x_i) σ_ℓ v_ℓ(s),    v_ℓ(s) ≈ Σ_{j=1}^{p} v_ℓ(s_j) φ_j^{(ℓ)}(s).

Interpolate v_ℓ any way you wish.




Method summary



Compute the SVD of X.
Compute an interpolant of the right singular vectors.
Approximate a new value of f(s).
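
A hedged numpy/scipy sketch of these three steps; the one-dimensional
parameter, the cubic interpolant, and the truncation rank r are illustrative
assumptions rather than the method's prescribed choices.

    # Sketch of the method summary: SVD of X, interpolate the right singular
    # vectors over the parameter, then reassemble an approximate f(s).
    import numpy as np
    from scipy.interpolate import interp1d

    def build_rom(X, params, r):
        """X: columns are runs f(s_1..s_p); params: 1-D parameter samples."""
        U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
        U, sigma, Vt = U[:, :r], sigma[:r], Vt[:r, :]
        # One interpolant per right singular vector v_l(s) (cubic is an assumption).
        v_interp = [interp1d(params, Vt[l, :], kind='cubic') for l in range(r)]
        def predict(s):
            coeffs = np.array([sigma[l] * v_interp[l](s) for l in range(r)])
            return U @ coeffs          # f(s) ~ sum_l u_l * sigma_l * v_l(s)
        return predict

    # Usage with the toy database X and params from the earlier sketch:
    # rom = build_rom(X, np.array(params), r=3); f_new = rom(1.25)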




A quiz!
Which section would you rather
try and interpolate, A or B?




[Figure: a plotted curve with two highlighted sections labeled A and B.]




How predictable is a singular vector?

Folk Theorem (O'Leary 2011)
The singular vectors of a matrix of "smooth" data become more oscillatory as
the index increases.

Implication
The gradient of the singular vectors increases as the index increases.

    v_1(s), v_2(s), …, v_t(s)      predictable
    v_{t+1}(s), …, v_r(s)          unpredictable





A refined method with an error model

Don't even try to interpolate the unpredictable modes; model them with noise.

    f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s)  +  Σ_{j=t(s)+1}^{r} u_j σ_j η_j,     η_j ~ N(0, 1)
             (predictable)               (unpredictable)

    Variance[f] = diag( Σ_{j=t(s)+1}^{r} σ_j² u_j u_j^T )

But now, how to choose t(s)?




Our current approach to choosing the predictability

t(s) is the largest τ such that

    (1/σ_1) Σ_{i=1}^{τ} σ_i ‖∂v_i/∂s‖ < threshold.
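
One possible way to evaluate such a criterion, sketched below: approximate
∂v_i/∂s by finite differences over the parameter samples and accumulate the
weighted terms until the threshold is crossed. The finite-difference gradient,
the max-abs norm, and the default threshold are assumptions made for the
illustration.

    # Sketch: pick t as the largest index whose accumulated, weighted
    # singular-vector gradients stay below a threshold.
    import numpy as np

    def choose_t(sigma, Vt, params, threshold=1.0):
        """sigma: singular values; Vt[l, j] = v_l(s_j); params: sorted 1-D samples."""
        total, t = 0.0, 0
        for i in range(len(sigma)):
            dv_ds = np.gradient(Vt[i, :], params)        # finite-difference dv_i/ds
            total += sigma[i] * np.abs(dv_ds).max() / sigma[0]
            if total >= threshold:
                break
            t = i + 1
        return t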




An experimental test case

A heat equation problem.
Two parameters that control the material properties.




Experiments




20-point Latin hypercube sample




Our Reduced Order Model

[Figure: our reduced order model compared with the truth, highlighting where
the error is the worst.]




A Large Scale Example




Nonlinear heat transfer model
80k nodes, 300 time steps
104 basis runs
SVD of a 24M × 104 data matrix
500× reduction in wall-clock time
(100× including the SVD)




PART 2

Tall-and-skinny QR (and SVD)
on MapReduce


Quick review of QR

QR Factorization
Let A be a real m × n matrix with m ≥ n. Then A = QR, where
    Q is m × n and orthogonal (Q^T Q = I), and
    R is n × n and upper triangular.

Using QR for regression
The least-squares solution of min ‖Ax − b‖ is given by solving Rx = Q^T b.

QR is block normalization
"Normalizing" a vector usually generalizes to computing Q in the QR
factorization of a block of columns.

    A = Q R
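
A small numpy illustration of both points; the matrix sizes are arbitrary.

    # Illustration: least squares via QR, and QR as "block normalization".
    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((1000, 5))       # tall-and-skinny
    b = rng.standard_normal(1000)

    Q, R = np.linalg.qr(A)                   # A = Q R, Q^T Q = I, R upper triangular
    x = np.linalg.solve(R, Q.T @ b)          # solves min ||Ax - b||
    print(np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0]))

    # Block normalization: Q has orthonormal columns spanning range(A),
    # the matrix analogue of q = a / ||a|| for a single vector.
    print(np.allclose(Q.T @ Q, np.eye(5)))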




Intro to MapReduce

Originated at Google for indexing web pages and computing PageRank.

The idea: bring the computations to the data.
Express algorithms in data-local operations.
Implement one type of communication: shuffle.
Shuffle moves all data with the same key to the same reducer.

Data scalable
[Diagram: input splits feed mappers (M), the shuffle routes keys, and
reducers (R) produce the output.]

Fault-tolerance by design
    Input stored in triplicate.
    Map output persisted to disk before shuffle.
    Reduce input/output on disk.
Mesh point variance in MapReduce

[Figure: three simulation runs, each with snapshots at T=1, T=2, T=3, stored
as separate records.]




Mesh point variance in MapReduce

[Figure: the three runs and their snapshots feed mappers (M); the shuffle
routes each mesh point's values to reducers (R).]

1. Each mapper outputs the mesh points, keyed by mesh point.
2. Shuffle moves all values from the same mesh point to the same reducer.
3. Reducers just compute a numerical variance.

Bring the computations to the data!
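
A minimal sketch of that mapper/reducer pair, assuming each input record is a
(run, time_step, mesh_point, value) tuple; the record format and the tiny
in-memory driver are invented for the example and are not the cluster code.

    # Sketch: per-mesh-point variance as a map/reduce pair.
    import numpy as np
    from itertools import groupby

    def mapper(record):
        run_id, time_step, mesh_point, value = record
        yield mesh_point, value          # key on the mesh point

    def reducer(mesh_point, values):
        vals = np.fromiter(values, dtype=float)
        yield mesh_point, vals.var()     # one variance per mesh point

    # Tiny in-memory driver to show the dataflow (the shuffle groups by key).
    records = [(r, t, p, np.random.rand())
               for r in range(3) for t in range(3) for p in range(4)]
    mapped = sorted(kv for rec in records for kv in mapper(rec))
    for key, group in groupby(mapped, key=lambda kv: kv[0]):
        for k, var in reducer(key, (v for _, v in group)):
            print(k, var)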




Communication avoiding TSQR
(Demmel et al. 2008)

First, do QR factorizations of each local matrix A_i.
Second, compute a QR factorization of the new "R" factors stacked together.
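
A numpy sketch of this two-step idea on two local blocks (sizes chosen
arbitrarily): the R from the stacked local R factors agrees, up to signs, with
the R of the full matrix.

    # Sketch of communication-avoiding TSQR on two local blocks.
    import numpy as np

    rng = np.random.default_rng(1)
    A1 = rng.standard_normal((500, 10))
    A2 = rng.standard_normal((500, 10))

    _, R1 = np.linalg.qr(A1)                    # first: local QR factorizations
    _, R2 = np.linalg.qr(A2)
    _, R  = np.linalg.qr(np.vstack([R1, R2]))   # second: QR of the stacked "R"s

    _, R_direct = np.linalg.qr(np.vstack([A1, A2]))
    # Agreement up to the sign of each row.
    print(np.allclose(np.abs(R), np.abs(R_direct)))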




Demmel et al. 2008. Communication-avoiding parallel and sequential QR.
Serial QR factorizations
Fully serial TSQR (Demmel et al. 2008)

Compute QR of A_1, read A_2, update QR, …




Demmel et al. 2008. Communication-avoiding parallel and sequential QR.
Tall-and-skinny matrix storage in MapReduce

Key is an arbitrary row-id.
Value is the array for a row.
Each submatrix A_i is an input split.

[Diagram: the matrix stored as stacked submatrices A_1, A_2, A_3, A_4.]




Algorithm

Data: rows of a matrix
Map: QR factorization of rows
Reduce: QR factorization of rows

[Diagram: Mapper 1 runs serial TSQR over A_1..A_4, folding each block into the
current R and emitting R_4; Mapper 2 does the same over A_5..A_8 and emits R_8;
Reducer 1 runs serial TSQR on R_4 and R_8 and emits the final Q, R.]




Key Limitations

Computes only R and not Q.

Can get Q via Q = A R⁺ with another MapReduce iteration
(we currently use this for computing the SVD).
Dubious numerical stability; iterative refinement helps.

Working on better ways to compute Q
(with Austin Benson, Jim Demmel).
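
A small dense-matrix sketch of the Q = A R⁺ recovery and one
re-orthogonalization pass; this illustrates the idea only and is not the
MapReduce implementation (the refinement shown is simply a second QR of the
computed Q).

    # Sketch: recover Q from R via Q = A R^+, then refine.
    import numpy as np

    rng = np.random.default_rng(2)
    A = rng.standard_normal((2000, 20))
    _, R = np.linalg.qr(A)               # pretend only R came back from TSQR

    Q = A @ np.linalg.pinv(R)            # Q = A R^+  (can lose orthogonality)
    print("before refinement:", np.linalg.norm(Q.T @ Q - np.eye(20)))

    Q2, R2 = np.linalg.qr(Q)             # one refinement pass: re-orthogonalize
    print("after refinement: ", np.linalg.norm(Q2.T @ Q2 - np.eye(20)))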




Full code in hadoopy

import random, numpy, hadoopy

class SerialTSQR:
    def __init__(self, blocksize, isreducer):
        self.bsize = blocksize
        self.data = []
        if isreducer: self.__call__ = self.reducer
        else: self.__call__ = self.mapper

    def compress(self):
        R = numpy.linalg.qr(
                numpy.array(self.data), 'r')
        # reset data and re-initialize to R
        self.data = []
        for row in R:
            self.data.append([float(v) for v in row])

    def collect(self, key, value):
        self.data.append(value)
        if len(self.data) > self.bsize * len(self.data[0]):
            self.compress()

    def close(self):
        self.compress()
        for row in self.data:
            key = random.randint(0, 2000000000)
            yield key, row

    def mapper(self, key, value):
        self.collect(key, value)

    def reducer(self, key, values):
        for value in values: self.mapper(key, value)

if __name__ == '__main__':
    mapper = SerialTSQR(blocksize=3, isreducer=False)
    reducer = SerialTSQR(blocksize=3, isreducer=True)
    hadoopy.run(mapper, reducer)




Too many maps? Lots of data? Add an iteration!

[Diagram, Iteration 1: Mappers 1-1 through 1-4 each run serial TSQR over their
input splits and emit local factors R_1..R_4; the shuffle groups them into
splits for Reducers 1-1 through 1-3, which run serial TSQR and emit
R_{2,1}..R_{2,3}. Iteration 2: an identity map and a final Reducer 2-1 run
serial TSQR on the remaining factors and emit the final R.]




mrtsqr – summary of parameters

Blocksize: how many rows to read before computing a QR factorization,
expressed as a multiple of the number of columns (see paper).

Splitsize: the size of each local matrix A_i.

Reduction tree: the number of reducers and iterations to use.

[Diagram: a map iteration followed by one or two reduce iterations that
successively shrink the intermediate factors S(1), S(2).]




Varying splitsize and the tree
Data: synthetic

Cols.   Iters.   Split (MB)   Maps    Secs.
50      1        64           8000    388
–       –        256          2000    184
–       –        512          1000    149
–       2        64           8000    425
–       –        256          2000    220
–       –        512          1000    191
1000    1        512          1000    666
–       2        64           6000    590
–       –        256          2000    432
–       –        512          1000    337

Increasing split size improves performance (accounts for Hadoop data movement).
Increasing iterations helps for problems with many columns.
(1000 columns with a 64-MB split size overloaded the single reducer.)




MapReduce TSQR summary
MapReduce is great for TSQR!

Data: a tall-and-skinny (TS) matrix, stored by rows
Map: QR factorization of local rows
Reduce: QR factorization of local rows

Demmel et al. showed that this construction works to compute a QR
factorization with minimal communication.

Input: 500,000,000-by-100 matrix
Each record: 1-by-100 row
HDFS size: 423.3 GB
Time to compute the norm of each column: 161 sec.
Time to compute R in a QR factorization: 387 sec.




(On a 64-node Hadoop cluster with 4×2 TB disks, one Core i7-920, and 12 GB RAM per node.)
Our vision

To enable analysts and engineers to hypothesize from data computations instead
of expensive HPC computations.

Paul G. Constantine
Sandia: Jeremy Templeton, Joe Ruthruff
… and you? …




