My talk at the massive data and signal processing workshop at ICASSP 2012 in Kyoto, Japan.

- 1. Distinguishing signal from noise in an SVD of simulation data. David F. Gleich (Purdue University, Computer Science) and Paul G. Constantine (Stanford University).
- 2. A large-scale nonlinear, time-dependent heat transfer problem: 10^5 nodes, 10^3 time steps, 30 minutes on 16 cores, ~1 GB of data per run. Questions: What is the probability of failure? Which input values cause failure?
- 3. Insight and confidence require multiple runs, which hits the curse of dimensionality. The problem: a single simulation run is time-consuming. Our solution: use "big-data" techniques and platforms.
- 4. We store a few runs, and build an interpolant from the data. Supercomputer: run 100-1000 simulations. Data computing cluster: store them on the MapReduce cluster. Engineer: run 10,000-100,000 interpolated simulations for approximate statistics and computational steering.
- 5. The database. Inputs: the simulation parameters s (5-10 of them); outputs: the time history f of the simulation ("a few gigabytes" per run). Each run maps s_k -> f_k. A single simulation, stacked over all nodes and time steps, becomes a vector: f(s) = [q(x_1, t_1, s), ..., q(x_n, t_1, s), q(x_1, t_2, s), ..., q(x_n, t_2, s), ..., q(x_n, t_k, s)]^T, where q(x_i, t_j, s) is the value at one node and one time step. The database as a matrix: X = [f(s_1) f(s_2) ... f(s_p)], 100 GB - 100 TB.
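As a sketch of how one run's space-time history becomes a column of X, here is a minimal NumPy illustration; the placeholder `q` and all grid sizes are made up for the example and are not from the talk:

```python
import numpy as np

# Hypothetical stand-in for the solver output q(x, t, s): one field value
# at node x, time t, parameter s. Placeholder physics only.
def q(x, t, s):
    return np.sin(x + t) * s

nodes = np.linspace(0.0, 1.0, 50)   # n spatial nodes
times = np.linspace(0.0, 1.0, 20)   # k time steps
params = np.array([0.1, 0.5, 0.9])  # p parameter samples s_1..s_p

def f(s):
    # Stack the space-time history of one run into a single column:
    # [q(x_1,t_1,s), ..., q(x_n,t_1,s), q(x_1,t_2,s), ..., q(x_n,t_k,s)].
    return np.concatenate([q(nodes, t, s) for t in times])

# The database as a matrix: one column per simulation run.
X = np.column_stack([f(s) for s in params])
print(X.shape)  # (n*k, p) = (1000, 3)
```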
- 6. A one-dimensional test problem: f(x, s) = log[1 + 4s(x^2 - x)], sampled as X_{i,j} = f(x_i, s_j). Plotting the columns f_1, ..., f_5 ("plot(X)") and the matrix itself ("imagesc(X)") shows the structure.
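The test problem is easy to reproduce; a short NumPy sketch of building X from the formula (the grid sizes and parameter range are arbitrary choices, not the talk's):

```python
import numpy as np

def f(x, s):
    # The one-dimensional test problem from the slide.
    return np.log(1.0 + 4.0 * s * (x**2 - x))

x = np.linspace(0.0, 1.0, 200)   # spatial grid x_1..x_n
s = np.linspace(0.05, 0.95, 5)   # five parameter samples s_1..s_5
X = f(x[:, None], s[None, :])    # X[i, j] = f(x_i, s_j) via broadcasting
print(X.shape)                   # (200, 5)
```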
- 7. The interpolant: let the data give you the basis. With X = [f(s_1) f(s_2) ... f(s_p)], find the right combination f(s) ≈ Σ_{j=1}^r u_j α_j(s), where the u_j are the left singular vectors of X. Motivation: this idea was inspired by the success of other reduced-order models like POD, and by Paul's residual-minimizing idea.
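A minimal sketch of this construction on the 1-D test problem, assuming plain linear interpolation for the coefficients α_j(s) (the slides leave the interpolation scheme open, and the rank and sample counts here are arbitrary):

```python
import numpy as np

def f(x, s):
    # The 1-D test problem stands in for real simulation data.
    return np.log(1.0 + 4.0 * s * (x**2 - x))

x = np.linspace(0.0, 1.0, 200)
s_train = np.linspace(0.05, 0.8, 9)        # training parameters s_1..s_p
X = f(x[:, None], s_train[None, :])        # snapshot matrix

U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
r = 4                                      # truncation rank

# alpha_j sampled at the training parameters: alpha_j(s_k) = sigma_j * v_j(s_k).
alpha_train = sigma[:r, None] * Vt[:r, :]

def surrogate(s_new):
    # Interpolate each coefficient in s, then combine with the u_j.
    a = np.array([np.interp(s_new, s_train, alpha_train[j]) for j in range(r)])
    return U[:, :r] @ a

err = np.abs(surrogate(0.425) - f(x, 0.425)).max()
```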
- 8. Why the SVD? It splits "space-time" from "parameters": f(x_i, s_j) = Σ_{ℓ=1}^r U_{i,ℓ} σ_ℓ V_{j,ℓ} = Σ_{ℓ=1}^r u_ℓ(x_i) σ_ℓ v_ℓ(s_j), where x is the "space-time" index. Treat each right singular vector as samples of an unknown basis function; splitting x from a general parameter s gives f(x_i, s) = Σ_{ℓ=1}^r u_ℓ(x_i) σ_ℓ v_ℓ(s), with v_ℓ(s) ≈ Σ_{j=1}^p v_ℓ(s_j) φ_j^{(ℓ)}(s). Interpolate v any way you wish. The SVD also has a "smoothness" property.
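The splitting identity on this slide can be checked numerically; a small sketch, with random data standing in for a simulation database:

```python
import numpy as np

# Verify X[i, j] = sum_l U[i, l] * sigma[l] * V[j, l]: the SVD splits the
# "space-time" factors u_l from the "parameter" factors v_l.
rng = np.random.default_rng(0)
X = rng.standard_normal((40, 6))
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)

X_rebuilt = sum(sigma[l] * np.outer(U[:, l], Vt[l, :])
                for l in range(len(sigma)))
print(np.allclose(X, X_rebuilt))  # True
```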
- 9. MapReduce and interpolation. The database (s_1 -> f_1, s_2 -> f_2, ..., s_k -> f_k) lives on the MapReduce cluster, where the SVD is computed to get the singular-vector basis. Interpolation of the coefficients runs on just one machine. New samples (s_a -> f_a, s_b -> f_b, s_c -> f_c) are then formed as linear combinations of the singular vectors, back on the MapReduce cluster, giving the surrogate.
- 10. A quiz: which section would you rather try to interpolate, A or B?
- 11. How predictable is a singular vector? Folk theorem (O'Leary 2011): the singular vectors of a matrix of "smooth" data become more oscillatory as the index increases. Implication: the gradient of the singular vectors increases as the index increases. So v_1(s), ..., v_t(s) are predictable signal, while v_{t+1}(s), ..., v_r(s) are unpredictable noise. [Figures: plots of v_1(s), v_2(s), v_3(s), and v_7(s) for the example in Section 3. We might have some confidence in an interpolation of v_1(s) and v_2(s); interpolating v_3(s) is problematic for nearby s, and interpolating v_7(s) anywhere is dubious. A finer discretization confirms that interpolating v_7(s) near 1 is difficult.]
- 12. A refined method with an error model: don't even try to interpolate the unpredictable modes. f(s) ≈ Σ_{j=1}^{t(s)} u_j α_j(s) + Σ_{j=t(s)+1}^r u_j σ_j η_j, with η_j ~ N(0, 1); the first sum covers the predictable modes and the second the unpredictable ones. Then Variance[f] = diag( Σ_{j=t(s)+1}^r σ_j^2 u_j u_j^T ). But now, how to choose t(s)?
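A sketch of evaluating this error model, with illustrative stand-ins for the interpolated coefficients; the random data, the split t = 3, and the chosen parameter column are assumptions for the example, not the authors' setup:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 8))
U, sigma, Vt = np.linalg.svd(X, full_matrices=False)
t = 3                                  # number of predictable modes, t(s)
r = len(sigma)

# Stand-in for the interpolated coefficients alpha_j(s) = sigma_j * v_j(s):
# here we just read them off at one training parameter.
alpha = sigma[:t] * Vt[:t, 3]
eta = rng.standard_normal(r - t)       # eta_j ~ N(0, 1)

# Predictable part plus noise model for the unpredictable modes.
f_hat = U[:, :t] @ alpha + U[:, t:] @ (sigma[t:] * eta)

# Variance[f] = diag( sum_{j>t} sigma_j^2 u_j u_j^T ), computed entrywise.
variance = (U[:, t:] ** 2) @ (sigma[t:] ** 2)
```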
- 13. Our current approach to choosing the predictability: t(s) is the largest τ such that Σ_{i=1}^τ (σ_i / σ_1) ‖∂v_i/∂s‖ < threshold. [Figure: the gradients of v_1, v_2, v_3, v_7; we can use more black gradients than red gradients, so the error will be higher for red.] Better ideas? Come talk to me!
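A sketch of this selection rule, assuming the weights σ_i/σ_1, a finite-difference gradient, and a max-norm; the exact normalization on the slide is not fully legible in the transcript, so treat these choices as assumptions:

```python
import numpy as np

def choose_t(s_grid, V, sigma, threshold):
    # t is the largest tau with sum_{i<=tau} (sigma_i/sigma_1)*||dv_i/ds||
    # below the threshold.
    total, t = 0.0, 0
    for i in range(V.shape[1]):
        grad = np.gradient(V[:, i], s_grid)   # dv_i/ds on the sample grid
        total += (sigma[i] / sigma[0]) * np.abs(grad).max()
        if total < threshold:
            t = i + 1
        else:
            break
    return t

s_grid = np.linspace(0.0, 1.0, 50)
# Smooth low modes, oscillatory high modes: the folk-theorem picture.
V = np.column_stack([np.sin((k + 1) * np.pi * s_grid) for k in range(5)])
sigma = np.array([1.0, 0.5, 0.25, 0.12, 0.06])
t = choose_t(s_grid, V, sigma, threshold=6.0)
print(t)
```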
- 14. An experimental test case: a heat equation problem with two parameters that control the material properties.
- 15. Where the error is the worst: a histogram of errors for our reduced-order model against the truth, with errors ranging from about 10^-3 to 10^-2.
- 16. A large-scale example: a nonlinear heat transfer model with 80k nodes and 300 time steps; 104 basis runs; an SVD of a 24M x 104 data matrix. The result is a 500x reduction in wall-clock time (100x including the SVD).
- 17. SVD from QR: the R-SVD. An old algorithm: let A = QR, take the SVD of the small factor, R = U_R Σ_R V_R^T, so that A = (Q U_R) Σ_R V_R^T. This helps when A is tall and skinny.
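The R-SVD is a few lines of NumPy; a minimal sketch of the recombination:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((10_000, 50))   # tall and skinny

Q, R = np.linalg.qr(A)                  # Q: 10000x50, R: 50x50
U_R, Sigma_R, Vt_R = np.linalg.svd(R)   # SVD of the small factor only
U = Q @ U_R                             # left singular vectors of A

# A = (Q U_R) Sigma_R V_R^T
print(np.allclose(A, U @ np.diag(Sigma_R) @ Vt_R))  # True
```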
- 18. Intro to MapReduce. Originated at Google for indexing web pages and computing PageRank. The idea: bring the computations to the data, and express algorithms in data-local operations. Only one type of communication is implemented: the shuffle, which moves all data with the same key to the same reducer. The design is data-scalable and fault-tolerant: input is stored in triplicate, reduce input/output lives on disk, and map output is persisted to disk before the shuffle.
- 19. MapReduce TSQR summary: MapReduce is great for TSQR! Data: a tall-and-skinny (TS) matrix stored by rows. Map: QR factorization of local rows. Reduce: QR factorization of local rows. Demmel et al. showed that this construction computes a QR factorization with minimal communication. Benchmark: a 500,000,000-by-100 matrix (each record a 1-by-100 row; 423.3 GB on HDFS). Time to compute the norm of each column: 161 sec. Time to compute qr(): 387 sec. On a 64-node Hadoop cluster with 4x2TB disks, one Core i7-920, and 12 GB RAM per node.
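The map/reduce structure can be mimicked on a single machine; a sketch of a two-level TSQR that, like the slide's version, produces only R (the block count and matrix sizes are arbitrary, and this is a toy stand-in for the Hadoop implementation):

```python
import numpy as np

def tsqr_r(blocks):
    # Map: QR-factor each local block of rows, keep only the R factors.
    local_rs = [np.linalg.qr(b)[1] for b in blocks]
    # Reduce: QR-factor the stacked R factors to get the global R.
    return np.linalg.qr(np.vstack(local_rs))[1]

rng = np.random.default_rng(0)
A = rng.standard_normal((8_000, 20))
R = tsqr_r(np.array_split(A, 8))        # 8 "map tasks"

# R matches a direct QR up to the sign of each row.
R_direct = np.linalg.qr(A)[1]
print(np.allclose(np.abs(R), np.abs(R_direct)))  # True
```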
- 20. Key limitations: TSQR computes only R and not Q. We can get Q via Q = AR^+ with another MapReduce iteration (we currently use this for computing the SVD), but it is not numerically orthogonal; iterative refinement helps. We are working on better ways to compute Q (with Austin Benson and Jim Demmel).
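A sketch of the Q = AR^+ recovery and one refinement step; here R^+ is an explicit inverse for simplicity, and the refinement strategy is an illustration rather than the authors' exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5_000, 30))
R = np.linalg.qr(A)[1]                 # R from TSQR (Q is not formed)

# One extra pass over A (the "MR iteration" on the slide): Q = A R^{-1}.
Q = A @ np.linalg.inv(R)

# One step of refinement: re-orthogonalize Q and fold its R into R.
Q2, R2 = np.linalg.qr(Q)
R_refined = R2 @ R                     # so A = Q2 @ R_refined
```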
- 21. Our vision: to enable analysts and engineers to hypothesize from data computations instead of expensive HPC computations. (With Paul G. Constantine.)
