Slide notes:
  • When classical regression models are applied to spatial data, prediction accuracy is low because the spatial dependencies among the samples are ignored. Solving the SAR model in the general case requires a maximum likelihood estimation procedure involving a large number of matrix inversion and determinant computations over very large matrices for real-world applications.
  • Computing eigenvalues is very hard, so we claim that there are quicker but approximate methods for solving the SAR model.
  • The dotted boxed term is the term added to linear regression to get spatial auto-regression.
  • There is only one prior parallel implementation of the SAR model, based on maximum likelihood estimation via eigenvalue computation [Li, 1996].
  • The random number generator can generate a very long sequence of normal random numbers with the desired mean and standard deviation. Such generators are rarely found.
  • The short-cut comes from doing two matrix-vector multiplications instead of two matrix-matrix multiplications: D^(-1/2) is stored as a vector instead of a diagonal matrix, and W_tilde is symmetric with the same eigenvalues as W.
  • Please go back to slide #10 for the second term in the log-likelihood function.
  • Please refer to slide #10 for the function to optimize.
  • Back-up slides
  • Transcript

    • 1. AHPCRC SPATIAL DATA-MINING TUTORIAL on Scalable Parallel Formulations of Spatial Auto-Regression (SAR) Models for Mining Regular Grid Geospatial Data Shashi Shekhar, Barış M. Kazar, David J. Lilja EECS Department @ University of Minnesota Army High Performance Computing Research Center (AHPCRC) Minnesota Supercomputing Institute (MSI) 05.14.2003
    • 2. Outline
      • Motivation for Parallel SAR Models
      • Background on Spatial Auto-regression Model
      • Our Contributions
      • Problem Definition & Hypothesis
      • Introduction to the SAR Software
      • Experimental Design
      • Related Work
      • Conclusions and Future Work
    • 3. Motivation for Parallel SAR Models
      • Linear regression models assume that learning data samples are independent and identically distributed (i.i.d.)
      • On spatial data this assumption fails, so prediction accuracy is low
      • SAR model = generalization of linear regression model with an auto-correlation term
      • Incorporating the auto-correlation term:
        • Results in better prediction accuracy
        • However, computational complexity increases due to the logarithm-determinant (a.k.a. Jacobian) term of the maximum likelihood estimation procedure
        • Parallel processing can reduce execution time
    • 4. Our Contributions
      • This is the first study to offer a scalable parallel SAR formulation and to evaluate its scalability
      • All of the eigenvalues of any type of dense neighborhood (square) matrix can be computed in parallel
      • Scalable parallel formulations of spatial auto-regression (SAR) models for 1-D and 2-D location prediction problems for planar surface partitionings using the eigenvalue computation
      • Hand-parallelized EISPACK routines, the pre-parallelized LAPACK-based NAG linear algebra libraries, and shared-memory parallel programming (OpenMP) are used
    • 5. Background on Spatial Autoregression Model
      • Mixed Regressive Spatial Auto-regressive (SAR) Model:
      • y = ρWy + xβ + ε
      • where:
      • y : n-by-1 vector of observations on the dependent variable
      • ρ : spatial autoregression parameter (coefficient)
      • W : n-by-n matrix of spatial weights
      • (i.e. contiguity, neighborhood matrix)
      • x : n-by-k matrix of observations on the explanatory variables
      • β : k-by-1 vector of regression coefficients
      • I : n-by-n identity matrix
      • ε : n-by-1 vector of unobservable error terms, ε ~ N(0, σ²I )
      • σ² : common variance of the error term
      • When ρ = 0, the SAR model collapses to the classical regression model
    • 6. Background on SAR Model – Cont’d
          • If x = 0 and W₂ = 0, then the first-order spatial autoregressive model is obtained:
      • y = ρW₁y + ε
      • ε ~ N(0, σ²I )
          • If W₂ = 0, then the mixed regressive spatial autoregressive model (a.k.a. spatial lag model) is obtained:
      • y = ρW₁y + xβ + ε
      • ε ~ N(0, σ²I )
          • If W₁ = 0, then the regression model with spatial autocorrelation in the disturbances (a.k.a. spatial error model) is obtained:
      • y = xβ + u
      • u = ρW₂u + ε
      • ε ~ N(0, σ²I )
    • 7. Problem Definition
      • Given:
          • A Sequential solution procedure: “Serial Dense Matrix Approach”
      • Same as Prof. Bin Li’s method in Fortran 77
      • Find:
          • A faster serial formulation called the “Serial Sparse Matrix Approach”
          • A new parallel formulation of the Serial Dense Matrix Approach, different from Prof. Bin Li’s method in CM-Fortran
      • Constraints: ε ~ N(0, σ²I ), IID
          • Raster data (resulting in a binary neighborhood/contiguity matrix W )
          • Parallel platform (HW: Origin 3800, IBM SP, IBM Regatta, Cray X1; SW: Fortran, OpenMP)
          • Size of W (big vs. small and dense vs. sparse)
      • Objective:
          • Maximize speedup ( T_serial / T_parallel )
          • Scalability better than (Li, 1996)’s formulation
    • 8. Hypotheses
      • There are a number of sequential algorithms for the SAR model, most of which are based on maximum likelihood estimation, which solves for the spatial autoregression parameter (ρ) and the regression coefficients (β).
      • As the problem size grows, the sequential methods become incapable of solving the problem due to
        • the extensive number of computations and
        • the large memory requirement.
      • The new parallel formulation proposed in this study will outperform the previous parallel implementation in terms of:
          • Speedup (S),
          • Scalability,
          • Problem size (PS) and,
          • Memory requirement.
    • 9. Serial SAR Model Solution
      • Starting with the normal density function:
      • The maximum likelihood (ML) function:
      • The explicit form of the maximum likelihood (ML) function:
    • 10. Serial SAR Model Solution - (Cont’d)
      • The logarithm of the maximum likelihood function is called the log-likelihood function
      • The ML estimates of the SAR parameters:
      • The function to optimize (reconstructed below):
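      • The formula image did not survive this transcript. A standard concentrated form of the function minimized over ρ, consistent with the eigenvalue step (slide 20) and the K-term on slide 24, is (assumed reconstruction):

        \[ \hat{\rho} = \arg\min_{\rho} \; \frac{n}{2}\,\ln\!\left[(Ky-\rho KWy)^{\top}(Ky-\rho KWy)\right] \;-\; \sum_{i=1}^{n}\ln(1-\rho\lambda_i) \]

      • where the λ_i are the eigenvalues of W and K = I − x ( x^T x )^-1 x^T ; the log-determinant (Jacobian) term uses ln| I − ρ W | = Σ_i ln(1 − ρλ_i), which is why all eigenvalues of W are needed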
    • 11. System Diagram: Pre-processing Step → (A) Compute Eigenvalues of the symmetric eigenvalue-equivalent neighborhood matrix (the eigenvalues of W ) → (B) Golden Section Search to find the ρ that minimizes the ML function (calculates the ML function) → (C) Compute β and σ² given the best ρ estimate, using least squares
    • 12. Introduction to SAR Software
      • The following slides will describe:
        • how we implemented the SAR model solution
        • an execution trace
      • Matlab is adequate only for small problem sizes:
        • memory is not enough for large problems
        • execution time is too long
      • A compiled language such as Fortran is needed for larger problems:
        • up to 80 GB of memory can be used on the supercomputers
        • execution time decreases considerably due to parallel processing
    • 13. 1. Pre-processing Step
      • It consists of four sub-steps:
        • Form the epsilon (ε) vector
        • Form W , the row-normalized neighborhood matrix
        • Symmetrize W
        • Form the y & x vectors
      (Flow: form training data set y → form W → symmetrize W , feeding boxes A and B)
    • 14. 1.0 Form Epsilon (ε) Vector
      • Input: p =4; q =4; n = pq =16
      • Output: a 16-by-1 column vector ε (epsilon)
      • The ε term, i.e. the 16-by-1 vector of unobservable error terms ~ N(0, σ²I ) in the SAR equation
      • y = ρWy + xβ + ε
      • The elements are IID normal random numbers with zero mean and unit standard deviation
      • Prof. Isaku Wada’s normal random number generator, written in C, is used; it is much better than Matlab’s randn function
      • ε^T = [0.9914 -1.2780 -0.4976 1.2271 0.4740 -0.0700 0.0834 -0.8789 0.3378 -1.5901 0.0638 -0.2717 2.3235 -0.2973 -0.1964 0.1416]
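      • For illustration only: a minimal Box-Muller sketch of drawing N(0, 1) samples in Fortran. This is not Prof. Wada’s generator (which is written in C and not reproduced in this transcript); it only shows the kind of output this pre-processing step consumes.

        program form_epsilon
          implicit none
          integer, parameter :: n = 16
          real(8), parameter :: pi = 3.14159265358979d0
          real(8) :: eps(n), u1, u2
          integer :: i
          call random_seed()
          do i = 1, n
             call random_number(u1)
             call random_number(u2)
             ! map u1 from [0,1) to (0,1] so log() stays finite
             eps(i) = sqrt(-2.0d0*log(1.0d0 - u1)) * cos(2.0d0*pi*u2)
          end do
          print '(4f10.4)', eps   ! 16 N(0,1) samples, as on this slide
        end program form_epsilon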
    • 15. 1.1 Form W , the Neighborhood Matrix
      • Input: p =4; q =4; n = pq =16; neighborhood type (4- Neighbors)
      • Output: The binary (non-row-normalized) 16-by-16 C matrix; the row-sum in a 16-by-1 column vector D_onehalf; the row-normalized 16-by-16 neighborhood matrix W
      • The neighborhood matrix W is formed using the following neighborhood relationship, where ( i , j ) is the current pixel: the 4-neighbors of ( i , j ) are ( i −1, j ), ( i +1, j ), ( i , j −1), and ( i , j +1), subject to the grid boundaries (a sketch of this construction follows this list)
      • The solution can handle more cases, such as:
        • 1-D 2-neighbors
        • 2-D 8, 16, 24, etc. neighbors
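      • A sketch of this construction (hypothetical subroutine name form_w; dense storage, matching the Serial Dense Matrix Approach, with row-major cell numbering k = (i-1)*q + j):

        ! Build the binary 4-neighbor matrix C on a p-by-q grid
        ! and row-normalize it to W.
        subroutine form_w(p, q, c, w)
          implicit none
          integer, intent(in)  :: p, q
          real(8), intent(out) :: c(p*q,p*q), w(p*q,p*q)
          integer :: i, j, k, n
          real(8) :: rowsum
          n = p*q
          c = 0.0d0
          do i = 1, p
             do j = 1, q
                k = (i-1)*q + j                 ! current pixel (i,j)
                if (i > 1) c(k, k-q) = 1.0d0    ! north neighbor (i-1,j)
                if (i < p) c(k, k+q) = 1.0d0    ! south neighbor (i+1,j)
                if (j > 1) c(k, k-1) = 1.0d0    ! west neighbor  (i,j-1)
                if (j < q) c(k, k+1) = 1.0d0    ! east neighbor  (i,j+1)
             end do
          end do
          w = 0.0d0
          do k = 1, n
             rowsum = sum(c(k,:))
             if (rowsum > 0.0d0) w(k,:) = c(k,:) / rowsum
          end do
        end subroutine form_w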
    • 16. 1.1 Matrices for 4-by-4 Regular Grid Space
      • (a) The spatial framework which is p -by- q (4-by-4) where p may or may not be equal to q
      • (b) the pq -by- pq (16-by-16) non-row-normalized (non-stochastic) neighborhood matrix C with 4 nearest neighbors, and
      • (c) the row-normalized version i.e. W which is also pq -by- pq (16-by-16). The product pq is equal to n (16), i.e. the problem size.
      (In the figure, the 6th row is highlighted in both C and W .)
    • 17. 1.1 Neighborhood Structures (figure): 1-D 2-neighborhood; 2-D 4-, 8-, 16-, and 24-neighborhoods
    • 18. 1.2 Symmetrize W
      • Input: p , q , n , W , C , D_onehalf
      • Output: the 16-by-16 symmetric eigenvalue-equivalent matrix of W
      • Matlab programs do not need this step, since the eig function can find the eigenvalues of a dense non-symmetric matrix
      • EISPACK’s subroutines find all eigenvalues of a symmetric dense matrix; therefore, W needs to be converted to its eigenvalue-equivalent symmetric form
      • The following short-cut algorithm achieves this task (reconstructed below):
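      • The algorithm itself did not survive the transcript. Consistent with the slide notes (D^(-1/2) kept as a vector; W_tilde symmetric with the same eigenvalues as W ), a standard construction is:

        \[ W = D^{-1}C, \qquad \tilde{W} = D^{-1/2}\, C\, D^{-1/2} = D^{1/2}\, W\, D^{-1/2}, \qquad \tilde{w}_{ij} = \frac{c_{ij}}{\sqrt{d_i\, d_j}} \]

      • where D = diag(row sums of C ). Since W_tilde is similar to W , it has the same eigenvalues, and it is symmetric because C is. D^(-1/2) need only be stored as a vector, so the scaling costs two matrix-vector-style sweeps rather than two matrix-matrix multiplications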
    • 19. 1.3 Form y & x Vectors
      • Input: p , q , n , ε , W
      • Output: y and x vectors, each of which is 16-by-1
      • They are the synthetic (training) data used to test the SAR model
      • Fortran programs perform this task separately in another program using NAG SMP library subroutines
      • For n=16:
      • x^T = [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16]
      • y^T = [8.2439 7.9720 11.7226 16.4201 17.9623 19.5719 22.8666 24.9425 29.9876 30.6348 35.1528 37.4712 42.1301 42.7142 45.7171 47.8384]
    • 20. 2. Find All of the Eigenvalues of W
      • Matlab programs use the eig function, which finds all of the eigenvalues of a non-symmetric dense matrix
      • Fortran programs use the tred2 and tql1 EISPACK subroutines, an efficient route to all of the eigenvalues (a sketch of the call sequence follows this list)
      • There are two sub-steps:
        • Convert dense symmetric matrix to tridiagonal matrix
        • Find all eigenvalues of the tridiagonal matrix
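      • A hedged sketch of the two-call sequence named above, using the classic EISPACK argument lists (the deck’s actual driver code is not reproduced here; wtilde is the symmetric eigenvalue-equivalent matrix of W ):

        ! nm = leading dimension, n = order; d, e are n-vectors, z is nm-by-n
        call tred2(nm, n, wtilde, d, e, z)   ! 2.1 Householder reduction to tridiagonal form
        call tql1(n, d, e, ierr)             ! 2.2 all eigenvalues of the tridiagonal matrix
        if (ierr .ne. 0) stop 'tql1 did not converge'
        ! d now holds all eigenvalues of wtilde, hence of W

      • (tred1 would suffice when only eigenvalues are needed, since tred2 also accumulates the transformation matrix z; the slide names tred2, so it is kept here)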
    • 21. 2.1 Convert symmetric matrix to tridiagonal matrix
      • Input: n , the symmetric eigenvalue-equivalent matrix of W (from step 1.2)
      • Output: the diagonal elements of the resulting tridiagonal matrix in the 16-by-1 column vector d , and the sub-diagonal elements in the 16-by-1 column vector e
      • This is the Householder transformation, which is used only in the Fortran programs
      • This step is the most time-consuming part (99% of the total execution time)
    • 22. 2.2 Find All Eigenvalues of the Tridiagonal Matrix
      • Input: the diagonal elements of the tridiagonal matrix in the 16-by-1 column vector d , and the sub-diagonal elements in the 16-by-1 column vector e
      • Output: all of the eigenvalues of the neighborhood matrix W
      • This is the QL transformation, which is used only in the Fortran programs
      • This step is left serial in the Fortran programs
    • 23. 3. Fit for the SAR Parameter rho (  )
      • There are two subroutines in this step
        • Calculating constant statistics terms
          • Saves time by calculating the non-changing terms during the golden section search algorithm
        • Golden section search
          • Similar to binary search
    • 24. 3.1 Calculating constant statistics terms
      • Input: x,y , n , W
      • Output: KY , KWY column vectors
      • Fortran programs use this subroutine to perform K-optimization, where the constant term K = [ I − x ( x^T x )^-1 x^T ], which is 16-by-16 for n=16
        • The second term in the log-likelihood expression
        • = y^T ( I − ρ W )^T [ I − x ( x^T x )^-1 x^T ]^T [ I − x ( x^T x )^-1 x^T ] ( I − ρ W ) y
        • = y^T ( I − ρ W )^T K^T K ( I − ρ W ) y
        • = ( K ( I − ρ W ) y )^T ( K ( I − ρ W ) y )
        • = ( Ky − ρ KWy )^T ( Ky − ρ KWy )
        • which saves many matrix-vector multiplications
      • Matlab programs directly calculate all (constant and non-constant) terms in the log-likelihood function over and over again, so they do not need this operation
        • Those terms are not expensive to calculate in Matlab, but they are expensive in Fortran
    • 25. 3.2 Golden Section Search
      • Input: x,y,n,eigenn, W , KY , KWY ,ax,bx,cx,tol
      • Output: fgld, xmin, niter
      • The best estimate for rho (xmin) is found
      • The optimization function is the log-likelihood function
      • Similar to binary search and bisection method
      • The detailed outputs are given in the execution traces
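      • A minimal golden-section search sketch, modeled on the classic Numerical Recipes routine; the deck’s actual subroutine also threads x, y, n, eigenn, W , KY , KWY through to the log-likelihood evaluation, which is abstracted here as an external function f:

        ! Minimize f over [ax, cx] given a bracketing triple ax < bx < cx.
        subroutine golden(ax, bx, cx, tol, f, xmin, fmin)
          implicit none
          real(8), intent(in)  :: ax, bx, cx, tol
          real(8), intent(out) :: xmin, fmin
          real(8), external    :: f            ! e.g. the log-likelihood of slide 10
          real(8), parameter :: r = 0.61803399d0, c = 1.0d0 - r
          real(8) :: x0, x1, x2, x3, f1, f2
          x0 = ax; x3 = cx
          if (abs(cx-bx) > abs(bx-ax)) then    ! place the new point in the larger segment
             x1 = bx; x2 = bx + c*(cx-bx)
          else
             x2 = bx; x1 = bx - c*(bx-ax)
          end if
          f1 = f(x1); f2 = f(x2)
          do while (abs(x3-x0) > tol*(abs(x1)+abs(x2)))
             if (f2 < f1) then                 ! minimum lies in (x1, x3)
                x0 = x1; x1 = x2; x2 = r*x1 + c*x3
                f1 = f2; f2 = f(x2)
             else                              ! minimum lies in (x0, x2)
                x3 = x2; x2 = x1; x1 = r*x2 + c*x0
                f2 = f1; f1 = f(x1)
             end if
          end do
          if (f1 < f2) then
             xmin = x1; fmin = f1
          else
             xmin = x2; fmin = f2
          end if
        end subroutine golden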
    • 26. 4. Calculate Beta (β) and Sigma (σ)
      • Input: x , y , n , niter , rho_cap , W , KY , KWY
      • Output: beta_cap and sigmasqr_cap, which are both scalars
      • niter (the number of iterations) and rho_cap are scalars
      • Calculating the best estimates of β and σ²
      • The formulas are (reconstructed below):
      • As n (i.e. the problem size) increases, the estimates become more accurate
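      • The formula images did not survive the transcript. The standard ML estimates given the fitted ρ̂ (rho_cap), consistent with the model on slide 5, are (assumed reconstruction):

        \[ \hat{\beta} = (x^{\top}x)^{-1} x^{\top} (I - \hat{\rho} W)\, y, \qquad \hat{\sigma}^2 = \frac{1}{n}\,(y - \hat{\rho} W y - x\hat{\beta})^{\top}(y - \hat{\rho} W y - x\hat{\beta}) \]

      • equivalently, σ̂² = ( Ky − ρ̂ KWy )^T ( Ky − ρ̂ KWy ) / n , using the precomputed KY and KWY vectors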
    • 27. Sample Output from Matlab Programs (n=16)
      • Comparison of methods:
        • First row: Dense straight log-det calculation
        • Second row: Log-det calculation via exact eigenvalue calculation
        • Third row: Log-det calculation via approximate eigenvalue calculation (Griffith)
        • Fourth row: Log-det calculation via better approximate eigenvalue calculation (Griffith)
        • A fifth row, Markov Chain Monte Carlo approximated SAR, is forthcoming
      • Columns are defined as follows (actual rho=0.412 & beta=1.91):
        • ML value   min_log   rho_cap   niter   beta_cap   sigma_sqr
      •   2.6687    -1.0000   0.4014    77      1.9465     0.8509
      •   2.6687    -1.0000   0.4014    77      1.9465     0.8509
      •   2.6645    -1.0000   0.4018    77      1.9452     0.8508
      •   2.6641    -1.0000   0.4019    77      1.9451     0.8508
    • 28. Sample Output from Matlab Programs (n=2500)
      • Comparison of methods:
        • First row: Dense straight log-det calculation
        • Second row: Log-det calculation via exact eigenvalue calculation
        • Third row: Log-det calculation via approximate eigenvalue calculation (Griffith)
        • Fourth row: Log-det calculation via better approximate eigenvalue calculation (Griffith)
        • A fifth row, Markov Chain Monte Carlo approximated SAR, is forthcoming
      • Columns are defined as follows (actual rho=0.412 & beta=1.91):
        • ML value   min_log   rho_cap   niter   beta_cap   sigma_sqr
      •   7.8685    -1.0000   0.4127    77      1.9079     0.9985
      •   7.8685    -1.0000   0.4127    77      1.9079     0.9985
      •   7.8683    -1.0000   0.4127    77      1.9079     0.9985
      •   7.8681    -1.0000   0.4127    77      1.9079     0.9985
    • 29. Instructions on How-To-Run Matlab Code
      • On SGI Origins
        • train_test_SAR (interactively)
        • matlab -nodisplay < train_test_SAR.m > output
      • On IBM SP
        • same
      • On IBM Regatta
        • same
      • On Cray X1
        • Not available yet
    • 30. Instructions on How-To-Run Fortran Code
      • On SGI Origins with 4 threads (processors)
        • f77 -64 -Ofast -mp sar_exactlogdeteigvals_2DN4_2500.f
        • setenv OMP_NUM_THREADS 4
        • time a.out
      • On IBM SP with 4 threads (processors)
        • xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f
        • setenv OMP_NUM_THREADS 4
        • time a.out
      • On IBM Regatta with 4 threads (processors)
        • xlf_r -O3 -qstrict -q64 -qsmp=omp sar_exactlogdeteigvals_2DN4_2500.f
        • setenv OMP_NUM_THREADS 4
        • time a.out
      • On Cray X1
        • Coming soon…
    • 31. Illustration on a 2-by-2 Regular Grid Space
      • The aim is to show how SAR works, i.e. an execution trace
      • Forming the training data y :
        • with known rho (0.1), beta (1.0), x ([1;2;3;4]), epsilon (=0.01*rand(4,1)), and
      • the neighborhood matrix W for 1-D, compute y (the observed variable)
      • (Matlab notation is used here)
        • y = inv(eye(4,4) - rho.*W)*(beta.*x + epsilon)
      • Testing SAR
        • Solving SAR model = Finding eigenvalues & Fitting for SAR parameters
        • Run SAR model with inputs as y, epsilon, W and with a range of rho [0,1)
        • Find beta as well
        • The prediction for rho is very close to 0.1 (with an error of 0.01%!)
      (Figure: the 1-D landscape with cells 1–4, its binary neighborhood matrix, and the row-normalized (stochastic) form.)
    • 32. Summary of the SAR Software i.e. The Big Picture
    • 33. New Parallel SAR Model Solution Formulation
      • The approach follows Prof. Mark Bull’s “Expert Programmer vs Parallelizing Compiler” paper (Scientific Programming, 1996)
      • Loops 240 and 280 are the major bottlenecks; parallelizing them covers most of the code, as will be shown
      • The data distribution for the two loops should be similar, to benefit from value re-use
      • Loop 280 cannot benefit from block-wise partitioning; it should use interleaved scheduling for load balance. Thus, both loops use interleaved scheduling
      • Parallelizing the initialization phase imitates the manual data distribution, page-placement, and page-migration utilities of the SGI Origin machines
      • The variable etemp enables a reduction operation on the variable e , which is updated by different processors (a minimal OpenMP sketch follows this list)
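      • A toy sketch (not the deck’s loop 240/280 code) of the two OpenMP idioms described above: interleaved scheduling, i.e. schedule(static,1), and the etemp-style reduction on e :

        program omp_sketch
          use omp_lib
          implicit none
          integer, parameter :: n = 10000
          real(8) :: a(n), e
          integer :: i
          a = 1.0d0
          e = 0.0d0
          ! schedule(static,1) deals iterations out round-robin for load balance;
          ! reduction(+:e) gives each thread a private etemp-like accumulator
          ! that is combined into e at the end of the loop
        !$omp parallel do schedule(static,1) reduction(+:e)
          do i = 1, n
             e = e + a(i)*a(i)
          end do
        !$omp end parallel do
          print *, 'e =', e, ' threads =', omp_get_max_threads()
        end program omp_sketch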
    • 34. Experimental Design – Response Variables & Factors
      • Speedup: S = T_serial / T_parallel
        • the time taken to solve a problem on a single processor over the time to solve the same problem on a parallel computer with p identical processors
      • Scalability of a parallel system:
        • the measure of an algorithm’s capacity to increase S in proportion to p on a particular parallel system
        • scalable systems have the ability to maintain efficiency (i.e. S / p ) at a fixed value by simultaneously increasing p and PS
        • reflects a parallel system’s ability to utilize increasing processing resources effectively
      • Name of Factor: Factor’s Parameter Domain
        • Problem size ( n ): {400, 2500, 6400, 10000}
        • Neighborhood structure: {2-D 4-neighbors}
        • Range of rho: {[0,1)}
        • Parallelization options: {static scheduling for boxes B and C, dynamic with chunk size 4 for box A}
        • Number of processors: {1, …, 16}
        • Algorithm used: {Householder transformation, followed by the QL algorithm, followed by golden section search}
    • 35. Related Work
      • Eigenvalue software on the web are studied in depth at the following web sites:
        • http://www-users.cs.umn.edu/~kazar/sar/sar_presentations/eigenvalue_solvers.html
        • http://www.netlib.org/utk/people/JackDongarra/la-sw.html
      • [Griffith, 1995] analytically computed the eigenvalues for the 1-D case and the 2-D {4,8}-neighbor cases. However, this is tedious for other cases
        • The approximate and better approximate eigenvalues are from this source
        • However, there is no closed-form expression for many other cases
      • Bin Li [1996] implemented parallel SAR using a constant mean model
        • Those programs can no longer run (CM-Fortran)
        • Our model is more general and harder to solve
        • Our programs are the only running implementation in the literature
    • 36. SAR Model Solutions are cross-classified
    • 37. References
      • [ICP03] Baris Kazar, Shashi Shekhar, and David J. Lilja, “Parallel Formulation of Spatial Auto-Regression”, submitted to ICPP 2003 [under review]
      • [IEE02] S. Shekhar, P. Schrater, R. Vatsavai, W. Wu, and S. Chawla, “Spatial Contextual Classification and Prediction Models for Mining Geospatial Data”, IEEE Transactions on Multimedia (special issue on Multimedia Databases), 2002
    • 38. Conclusions & Future Work
      • We are able to solve the SAR model for any type of W matrix, up to the memory limit, by finding all of its eigenvalues
      • Finding eigenvalues is hard
      • The sparsity of W should be exploited if the eigenvalue subroutines allow it
      • Efficient partitioning of very large sparse matrices via ParMETIS
      • New methods will be studied:
        • Markov Chain Monte Carlo Estimates (parSARmcmc)
        • parSALE = parallel spatial auto-regression local estimation
        • Characteristic Polynomial Approach
      • Contributions to the spatial statistics package (Version 2.0) of Prof. Kelley Pace will continue
    • 39. Short Tutorial on OpenMP
      • Fork-join model of parallel execution
      • The parallel regions are denoted by directives in Fortran and pragmas in C/C++
      • Data environment: (first/last/thread) Private, shared variables and their scopes across parallel and serial regions
      • Work-sharing constructs: parallel do, sections (Static, dynamic scheduling with/without chunks)
      • Synchronization: atomic, critical, barrier, flush, ordered, implicit synchronization after each parallel for loop
      • Run-time library functions e.g. to determine which thread is executing at some time
      (Figure: fork-join execution alternating serial regions, run by the master thread, with parallel regions, run by the master and child threads.)
    • 40. Short Tutorial on Eigenvalues
      • Let A be a linear transformation represented by a matrix. If there is a nonzero vector X such that:
      • AX = λX
      • for some scalar λ , then λ is called an eigenvalue of A with corresponding (right) eigenvector X .
      • Eigenvalues are also known as characteristic roots, proper values, or latent roots (Marcus and Minc 1988, p. 144).
      • det( A − λ I ) = 0 is the characteristic equation (polynomial), and the roots of this polynomial are the eigenvalues.
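      • A quick worked example (not from the slides): for

        \[ A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad \det(A - \lambda I) = (2-\lambda)^2 - 1 = 0 \;\Rightarrow\; \lambda_1 = 1,\ \lambda_2 = 3, \]

      • with (right) eigenvectors (1, −1)^T and (1, 1)^T respectively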
    • 41. Hands-On Part
      • Please go to:
        • http://www.cs.umn.edu/~kazar/sar/index.html
        • Find the 05.14.2003 entry
        • Type in “shashi” for the username
        • Type in “shashi” for the password
      • To run the programs, we need to log in to one of the SGI Origins, the IBM SP, or the IBM Regatta (the Cray X1 is not ready yet)
      • All programs are run by submitting them to a queue
      • Interactive runs are not recommended for the Fortran programs, due to the system load and the high number of processors needed for execution
