Future of LAPACK and ScaLAPACK

Progress and directions for LAPACK and ScaLAPACK as of 2005.


Transcript

  • 1. The Future of LAPACK and ScaLAPACK. Jason Riedy, Yozo Hida, James Demmel; EECS Department, University of California, Berkeley; November 18, 2005.
  • 2. Outline. Survey responses: what users want. Improving LAPACK and ScaLAPACK: improved numerics, improved performance, improved functionality, improved engineering and community. Two example improvements. Numerics: iterative refinement for Ax = b. Performance: the MRRR algorithm for Ax = λx.
  • 3. Survey: What users want. Survey available from http://www.netlib.org/lapack-dev/. 212 responses, over 100 different, non-anonymous groups. Problem sizes: 100 (8%), 1K (26%), 10K (24%), 100K (12%), 1M (6%), other (24%). More than 80% interested in small-to-medium SMPs; more than 40% interested in large distributed-memory systems. Vendor libraries seen as faster but buggier. Over 20% want more than double precision; 70% want out-of-core. Requests: high-level interfaces, low-level interfaces, parallel redistribution, and tuning.
  • 4. Participants. UC Berkeley: Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye Li, Osni Marques, Christof Vömel, David Bindel, Yozo Hida, Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . . U Tennessee, Knoxville: Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek, Stan Tomov, . . . Other academic institutions: UT Austin, UC Davis, U Kansas, U Maryland, North Carolina SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen, U Madrid, U Manchester, U Umeå, U Wuppertal, U Zagreb. Research institutions: CERFACS, LBL, UEC (Japan). Industrial partners: Cray, HP, Intel, MathWorks, NAG, SGI. You?
  • 5. Improved Numerics. Improved accuracy with standard asymptotic speed; some are faster! Iterative refinement for linear systems, least squares (Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan). Pivoting and scaling for symmetric systems, definite and indefinite. Jacobi SVD, and faster (Drmač / Veselić). Condition numbers and estimators (Higham / Cheng / Tisseur). Useful approximate error estimates.
  • 6. Improved Performance. Improved performance with at least standard accuracy. MRRR algorithm for eigenvalues, SVD (Parlett / Dhillon / Vömel / Marques / Willems / Katagiri). Fast Hessenberg QR & QZ (Byers / Mathias / Braman, Kågström / Kressner). Fast reductions and BLAS2.5 (van de Geijn, Bischof / Lang, Howell / Fulton). Recursive data layouts (Gustavson / Kågström / Elmroth / Jonsson). Generalized SVD (Bai, Wang). Polynomial roots from semi-separable form (Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel). Automated tuning, optimizations in ScaLAPACK, . . .
  • 7. Improved Functionality. Algorithms: updating / downdating factorizations (Stewart, Langou); more generalized SVDs: products, CSD (Bai, Wang); more generalized Sylvester, Lyapunov solvers (Kågström, Jonsson, Granat); quadratic eigenproblems (Mehrmann); matrix functions (Higham). Implementations: add “missing” features to ScaLAPACK; generate LAPACK, ScaLAPACK for higher precisions.
  • 8. Improved Engineering and Community. Use new features without a rewrite. Use modern Fortran 95, maybe 2003: DO . . . END DO, recursion, allocation (in wrappers). Provide higher-level wrappers for common languages: F95, C, C++. Automatic generation of precisions, bindings; full automation (FLAME, etc.) not quite ready for all functions. Tests for algorithms, implementations, installations. Open development: need a community for long-term evolution. http://www.netlib.org/lapack-dev/ Lots of work to do, research and development.
  • 9. Two Example Improvements. Recent, locally developed improvements. Improved numerics: iterative refinement for linear systems Ax = b; extra precision ⇒ small error, dependable estimate; both normwise and componentwise (see LAWN 165 for full details). Improved performance: MRRR algorithm for eigenvalue, SVD problems; optimal complexity: O(n) per value/vector (see LAWNs 162, 163, 166, 167, . . . for more details).
  • 10. Numerics: Iterative Refinement. Improve solution to Ax = b. Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx. Until: good enough. Not-too-ill-conditioned ⇒ error O(√n ε). [Scatter plots: normwise error vs. condition number κnorm, 2,000,000 cases.]
  • 11. Numerics: Iterative Refinement. Improve solution to Ax = b. Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx. Until: good enough. Dependable normwise relative error estimate. [Scatter plots: normwise error vs. bound, 2,000,000 cases.]
  • 12. Numerics: Iterative Refinement. Improve solution to Ax = b. Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx. Until: good enough. Also small componentwise errors and dependable estimates. [Scatter plots: componentwise error vs. condition number κcomp and componentwise error vs. bound, 2,000,000 cases.]
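The refinement loop on these slides (r = b − Ax, dx = A⁻¹r, x = x + dx) can be sketched in a few lines. This is an illustrative pure-Python version on a made-up 2 × 2 system: the `solve2` helper stands in for a stored LU factorization, and the real LAPACK routines (LAWN 165) compute the residual in extra precision rather than working precision.

```python
# Sketch of iterative refinement for Ax = b (pure Python, 2x2 example).
# In LAPACK's version the residual r = b - A x is computed in extra
# precision; here everything is ordinary double precision.

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def solve2(A, b):
    # Cramer's rule for a 2x2 system; stands in for a stored LU factorization.
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [(b[0] * A[1][1] - b[1] * A[0][1]) / det,
            (A[0][0] * b[1] - A[1][0] * b[0]) / det]

def refine(A, b, x, steps=3):
    for _ in range(steps):
        r = [bi - axi for bi, axi in zip(b, mat_vec(A, x))]  # r = b - A x
        dx = solve2(A, r)                                    # dx = A^{-1} r
        x = [xi + dxi for xi, dxi in zip(x, dx)]             # x = x + dx
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = refine(A, b, solve2(A, b))  # exact solution is (1/11, 7/11)
```

In practice the loop stops once dx stops shrinking ("good enough"); a fixed step count keeps the sketch short.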
  • 13. Relying on Condition Numbers. Need condition numbers for dependable estimates. Picking the right condition number and estimating it well. [Scatter plot: κnorm in single vs. double precision, 2,000,000 cases.]
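For a concrete sense of the quantity on this slide: the normwise condition number is κ(A) = ‖A‖ · ‖A⁻¹‖. A minimal pure-Python sketch for the 2 × 2, infinity-norm case follows; the matrix is made up for the example, and LAPACK itself estimates ‖A⁻¹‖ without forming the inverse (e.g. via xGECON).

```python
# Illustrative sketch: normwise condition number kappa = ||A|| * ||A^{-1}||
# in the infinity norm, 2x2 case. LAPACK estimates ||A^{-1}|| without
# forming the inverse explicitly; this sketch just forms it.

def norm_inf(A):
    # Infinity norm: maximum absolute row sum.
    return max(sum(abs(v) for v in row) for row in A)

def inverse2(A):
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return [[ A[1][1] / det, -A[0][1] / det],
            [-A[1][0] / det,  A[0][0] / det]]

def cond_inf(A):
    return norm_inf(A) * norm_inf(inverse2(A))

A = [[1.0, 2.0], [3.0, 4.0]]
kappa = cond_inf(A)  # large kappa means weak error guarantees
```

A large κ warns that the refinement's error bounds may not be dependable, which is why the slide stresses picking and estimating the right condition number.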
  • 14. Performance: The MRRR Algorithm. Multiple Relatively Robust Representations; 1999 Householder Award honorable mention for Dhillon. Optimal complexity with small error! O(nk) flops for k eigenvalues/vectors of an n × n tridiagonal matrix. Small residuals: ||T x_i − λ_i x_i|| = O(nε). Orthogonal eigenvectors: x_i^T x_j = O(nε). Similar algorithm for SVD. Eigenvectors computed independently ⇒ naturally parallelizable. (LAPACK release 3 had bugs, missing cases.)
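The two guarantees above (small residuals, orthogonal eigenvectors) are easy to state as checks. Here is a pure-Python sketch on a tiny symmetric tridiagonal matrix with known eigenpairs; the matrix and its pairs are invented for illustration, and a real computation would obtain them from LAPACK's MRRR drivers (xSTEGR, later xSTEMR) rather than by hand.

```python
import math

# Check the two MRRR accuracy guarantees on a tiny symmetric tridiagonal
# matrix with known eigenpairs. The matrix is invented for illustration;
# a real run would get the pairs from an MRRR driver.
T = [[2.0, 1.0], [1.0, 2.0]]
s = 1.0 / math.sqrt(2.0)
pairs = [(1.0, [s, -s]), (3.0, [s, s])]  # (eigenvalue, eigenvector)

def mat_vec(A, x):
    return [sum(A[i][j] * x[j] for j in range(len(x))) for i in range(len(A))]

def residual(T, lam, x):
    # || T x - lam x ||_2, which MRRR keeps at O(n * eps)
    Tx = mat_vec(T, x)
    return math.sqrt(sum((Tx[i] - lam * x[i]) ** 2 for i in range(len(x))))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

max_resid = max(residual(T, lam, x) for lam, x in pairs)
orth = abs(dot(pairs[0][1], pairs[1][1]))  # should be O(n * eps)
```

Because each eigenvector is computed independently, these per-pair checks mirror how the work itself parallelizes naturally.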
  • 15. Performance: The MRRR Algorithm. [Performance plot; “fast DC”: Wilkinson, deflate like crazy.]
  • 16. Summary. LAPACK and ScaLAPACK are open for improvement! Planned improvements in numerics, performance, functionality, and engineering. Forming a community for long-term development.