Future of LAPACK and ScaLAPACK

Progress and directions for LAPACK and ScaLAPACK as of 2005. See http://www.netlib.org/lapack-dev/.

Transcript

  • 1. The Future of LAPACK and ScaLAPACK
       Jason Riedy, Yozo Hida, James Demmel
       EECS Department, University of California, Berkeley
       November 18, 2005
  • 2. Outline
       Survey responses: what users want
       Improving LAPACK and ScaLAPACK:
         Improved Numerics
         Improved Performance
         Improved Functionality
         Improved Engineering and Community
       Two example improvements:
         Numerics: iterative refinement for Ax = b
         Performance: the MRRR algorithm for Ax = λx
  • 3. Survey: What users want
       Survey available from http://www.netlib.org/lapack-dev/.
       212 responses, from over 100 different, non-anonymous groups.
       Problem sizes:  100   1K    10K   100K   1M    (other)
                       8%    26%   24%   12%    6%    (24%)
       More than 80% are interested in small-to-medium SMPs.
       More than 40% are interested in large distributed-memory systems.
       Vendor libraries are seen as faster, but buggier.
       Over 20% want more than double precision; 70% want out-of-core.
       Requests: high-level interfaces, low-level interfaces, parallel
       redistribution, and tuning.
  • 4. Participants
       UC Berkeley: Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett,
       Xiaoye Li, Osni Marques, Christof Vömel, David Bindel, Yozo Hida,
       Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . .
       U Tennessee, Knoxville: Jack Dongarra, Julien Langou, Julie Langou,
       Piotr Luszczek, Stan Tomov, . . .
       Other academic institutions: UT Austin, UC Davis, U Kansas,
       U Maryland, North Carolina SU, San Jose SU, UC Santa Barbara,
       TU Berlin, FU Hagen, U Madrid, U Manchester, U Umeå, U Wuppertal,
       U Zagreb
       Research institutions: CERFACS, LBL, UEC (Japan)
       Industrial partners: Cray, HP, Intel, MathWorks, NAG, SGI
       You?
  • 5. Improved Numerics
       Improved accuracy at standard asymptotic speed; some routines are
       faster!
       Iterative refinement for linear systems and least squares
         Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan
       Pivoting and scaling for symmetric systems, definite and indefinite
       Jacobi SVD (and faster) — Drmač / Veselić
       Condition numbers and estimators — Higham / Cheng / Tisseur
       Useful approximate error estimates
  • 6. Improved Performance
       Improved performance with at least standard accuracy
       MRRR algorithm for eigenvalues, SVD
         Parlett / Dhillon / Vömel / Marques / Willems / Katagiri
       Fast Hessenberg QR and QZ
         Byers / Mathias / Braman, Kågström / Kressner
       Fast reductions and BLAS2.5
         van de Geijn, Bischof / Lang, Howell / Fulton
       Recursive data layouts
         Gustavson / Kågström / Elmroth / Jonsson
       Generalized SVD — Bai, Wang
       Polynomial roots from semi-separable form
         Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel
       Automated tuning, optimizations in ScaLAPACK, . . .
  • 7. Improved Functionality
       Algorithms:
         Updating / downdating of factorizations — Stewart, Langou
         More generalized SVDs: products, CSD — Bai, Wang
         More generalized Sylvester and Lyapunov solvers
           — Kågström, Jonsson, Granat
         Quadratic eigenproblems — Mehrmann
         Matrix functions — Higham
       Implementations:
         Add “missing” features to ScaLAPACK
         Generate LAPACK and ScaLAPACK for higher precisions
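
For context on the Sylvester item above: LAPACK already solves the standard equation AX + XB = C once A and B are in (quasi-triangular) Schur form, via DTRSYL; the generalized and Lyapunov variants listed are extensions of that coverage. A minimal sketch of the existing call, with declarations assumed to match problem sizes m and n:

    ! Solve A*X + X*B = scale*C for Schur-form A (m-by-m) and B (n-by-n).
    ! DTRSYL overwrites C with X; scale <= 1 is chosen to avoid overflow.
    double precision :: A(m,m), B(n,n), C(m,n), scale
    integer :: info

    call dtrsyl('N', 'N', 1, m, n, A, m, B, n, C, m, scale, info)
    ! On exit C holds X satisfying A*X + X*B = scale*C; divide by scale
    ! (when nonzero) to recover the solution for the original right-hand
    ! side.  info = 1 warns that A and -B have nearly common eigenvalues
    ! and the solve was perturbed.
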
  • 8. Improved Engineering and Community
       Use new features without a rewrite
       Use modern Fortran 95, maybe 2003:
         DO . . . END DO, recursion, allocation (in wrappers)
       Provide higher-level wrappers for common languages: F95, C, C++
       Automatic generation of precisions and bindings
         Full automation (FLAME, etc.) not quite ready for all functions
       Tests for algorithms, implementations, installations
       Open development: need a community for long-term evolution.
       http://www.netlib.org/lapack-dev/
       Lots of work to do, in both research and development.
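
As an illustration of the higher-level wrappers meant here, in the spirit of the LAPACK95-style interfaces: a Fortran 95 driver can take assumed-shape arrays and hide pivot allocation inside the wrapper. The routine name and interface below are a hypothetical sketch, not an existing LAPACK name:

    ! Hypothetical F95 wrapper over DGESV: shapes come from the arguments,
    ! workspace is allocated inside, and the status argument is optional.
    subroutine solve(A, b, info)
      implicit none
      double precision, intent(inout) :: A(:,:)  ! overwritten by LU factors
      double precision, intent(inout) :: b(:)    ! right-hand side, then x
      integer, intent(out), optional :: info
      integer, allocatable :: ipiv(:)
      integer :: n, stat
      n = size(A, 1)
      allocate(ipiv(n))
      call dgesv(n, 1, A, n, ipiv, b, n, stat)
      if (present(info)) info = stat
    end subroutine solve
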
  • 9. Two Example Improvements
       Recent, locally developed improvements:
       Improved numerics: iterative refinement for linear systems Ax = b.
         Extra precision ⇒ small error and a dependable error estimate,
         both normwise and componentwise.
         (See LAWN 165 for full details.)
       Improved performance: the MRRR algorithm for eigenvalue and SVD
         problems. Optimal complexity: O(n) per value/vector.
         (See LAWNs 162, 163, 166, 167, . . . for more details.)
  • 10. Numerics: Iterative Refinement
        Improve the solution to Ax = b:
          Repeat: r = b − Ax; dx = A⁻¹r; x = x + dx
          Until: good enough
        Not too ill-conditioned ⇒ error O(√n ε)
        [Two plots: normwise error vs. condition number κnorm, over
        2,000,000 test cases]
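
A minimal Fortran sketch of this loop, built from stock LAPACK pieces (DGEMV for the residual, DGETRS to reuse an existing DGETRF factorization). The LAWN 165 routines carry the residual r in extra precision; this working-precision version shows the structure only, and the stopping test and iteration cap are illustrative:

    ! Refine an approximate solution x of A*x = b, given A's LU factors
    ! (LU and ipiv from a prior DGETRF on a copy of A).
    subroutine refine(n, A, LU, ipiv, b, x, maxit)
      implicit none
      integer, intent(in) :: n, maxit, ipiv(n)
      double precision, intent(in) :: A(n,n), LU(n,n), b(n)
      double precision, intent(inout) :: x(n)
      double precision :: r(n), dx(n)
      integer :: iter, info
      do iter = 1, maxit
        ! r = b - A*x  (the step LAWN 165 performs in extra precision)
        r = b
        call dgemv('N', n, n, -1d0, A, n, x, 1, 1d0, r, 1)
        ! dx = A^{-1} r, reusing the LU factors
        dx = r
        call dgetrs('N', n, 1, LU, n, ipiv, dx, n, info)
        x = x + dx
        ! "good enough": stop once the correction is negligible
        if (maxval(abs(dx)) <= epsilon(1d0) * maxval(abs(x))) exit
      end do
    end subroutine refine
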
  • 11. Numerics: Iterative Refinement
        Improve the solution to Ax = b:
          Repeat: r = b − Ax; dx = A⁻¹r; x = x + dx
          Until: good enough
        Dependable normwise relative error estimate.
        [Two plots: normwise error vs. bound, over 2,000,000 test cases]
  • 12. Numerics: Iterative Refinement
        Improve the solution to Ax = b:
          Repeat: r = b − Ax; dx = A⁻¹r; x = x + dx
          Until: good enough
        Also small componentwise errors and dependable estimates.
        [Two plots: componentwise error vs. condition number κcomp, and
        componentwise error vs. bound, over 2,000,000 test cases]
  • 13. Relying on Condition Numbers
        Need condition numbers for dependable estimates.
        The issues: picking the right condition number and estimating it
        well.
        [Plot: κnorm in single vs. double precision, over 2,000,000 test
        cases]
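
Estimating a condition number already uses standard LAPACK machinery; as a reference point, a sketch for κ₁(A) with DLANGE, DGETRF, and DGECON follows (array declarations assume the documented workspace minimums):

    double precision :: A(n,n), work(4*n), anorm, rcond
    integer :: ipiv(n), iwork(n), info
    double precision, external :: dlange

    anorm = dlange('1', n, n, A, n, work)  ! ||A||_1 of the original matrix
    call dgetrf(n, n, A, n, ipiv, info)    ! A overwritten by its LU factors
    call dgecon('1', n, A, n, anorm, rcond, work, iwork, info)
    ! rcond is approximately 1/kappa_1(A); a value near machine epsilon
    ! warns that computed solutions and error estimates cannot be trusted.
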
  • 14. Performance: The MRRR Algorithm
        Multiple Relatively Robust Representations
        (1999 Householder Award honorable mention for Dhillon)
        Optimal complexity with small error!
          O(nk) flops for k eigenvalues/vectors of an n × n tridiagonal
          matrix
          Small residuals: ||T x_i − λ_i x_i|| = O(nε)
          Orthogonal eigenvectors: x_i^T x_j = O(nε) for i ≠ j
        Similar algorithm for the SVD.
        Eigenvectors computed independently ⇒ naturally parallelizable
        (LAPACK release 3 had bugs and missing cases)
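
LAPACK's tridiagonal MRRR driver is DSTEGR; a sketch computing all eigenpairs of a symmetric tridiagonal matrix with diagonal d and off-diagonal e. Workspace sizes follow the documented minimums, and the range arguments are ignored for RANGE = 'A':

    ! All eigenvalues and eigenvectors of an n-by-n symmetric tridiagonal
    ! matrix via the MRRR algorithm.
    double precision :: d(n), e(n), w(n), z(n,n), work(18*n)
    integer :: isuppz(2*n), iwork(10*n), m, info

    call dstegr('V', 'A', n, d, e, 0d0, 0d0, 0, 0, 0d0, &
                m, w, z, n, isuppz, work, 18*n, iwork, 10*n, info)
    ! w(1:m) holds the eigenvalues and z(:,1:m) the orthogonal
    ! eigenvectors, computed in O(n) time per eigenpair.
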
  • 15. Performance: The MRRR Algorithm
        [Performance comparison plot; legend note: “fast DC”: Wilkinson,
        deflate like crazy]
  • 16. Summary
        LAPACK and ScaLAPACK are open for improvement!
        Planned improvements in numerics, performance, functionality, and
        engineering.
        Forming a community for long-term development.
