Progress and directions for LAPACK and ScaLAPACK as of 2005. See <a href="http://www.netlib.org/lapack/">LAPACK at Netlib</a> and <a href="http://www.netlib.org/lapack-dev/">lapack-dev</a> for more current information.
Future of LAPACK and ScaLAPACK
1. The Future of LAPACK and ScaLAPACK
Jason Riedy, Yozo Hida, James Demmel
EECS Department
University of California, Berkeley
November 18, 2005
2. Outline
Survey responses: What users want
Improving LAPACK and ScaLAPACK
Improved Numerics
Improved Performance
Improved Functionality
Improved Engineering and Community
Two Example Improvements
Numerics: Iterative Refinement for Ax = b
Performance: The MRRR Algorithm for Ax = λx
3. Survey: What users want
Survey available from
http://www.netlib.org/lapack-dev/.
212 responses, over 100 different, non-anonymous groups
Problem sizes:
100 1K 10K 100K 1M (other)
8% 26% 24% 12% 6% (24%)
>80% interested in small-medium SMPs
>40% interested in large distributed-memory systems
Vendor libs seen as faster, buggier
over 20% want > double precision, 70% out-of-core
Requests: High-level interfaces, low-level interfaces,
parallel redistribution∗ and tuning
4. Participants
UC Berkeley
Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye Li, Osni Marques, Christof Vömel, David Bindel, Yozo Hida, Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . .
U Tennessee, Knoxville
Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek,
Stan Tomov, . . .
Other Academic Institutions
UT Austin, UC Davis, U Kansas, U Maryland, North Carolina SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen, U Madrid, U Manchester, U Umeå, U Wuppertal, U Zagreb
Research Institutions
CERFACS, LBL, UEC (Japan)
Industrial Partners
Cray, HP, Intel, MathWorks, NAG, SGI
You?
5. Improved Numerics
Improved accuracy with standard asymptotic speed:
Some are faster!
Iterative refinement for linear systems, least squares
Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan
Pivoting and scaling for symmetric systems
Definite and indefinite
Jacobi SVD (and faster) — Drmač / Veselić
Condition numbers and estimators
Higham / Cheng / Tisseur
Useful approximate error estimates
6. Improved Performance
Improved performance with at least standard accuracy
MRRR algorithm for eigenvalues, SVD
Parlett / Dhillon / Vömel / Marques / Willems / Katagiri
Fast Hessenberg QR & QZ
Byers / Mathias / Braman, Kågström / Kressner
Fast reductions and BLAS2.5
van de Geijn, Bischof / Lang, Howell / Fulton
Recursive data layouts
Gustavson / Kågström / Elmroth / Jonsson
Generalized SVD — Bai, Wang
Polynomial roots from semi-separable form
Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel
Automated tuning, optimizations in ScaLAPACK, . . .
7. Improved Functionality
Algorithms
Updating / downdating factorizations — Stewart, Langou
More generalized SVDs: products, CSD — Bai, Wang
More generalized Sylvester, Lyapunov solvers
Kågström, Jonsson, Granat
Quadratic eigenproblems — Mehrmann
Matrix functions — Higham
Implementations
Add “missing” features to ScaLAPACK
Generate LAPACK, ScaLAPACK for higher precisions
8. Improved Engineering and Community
Use new features without a rewrite
Use modern Fortran 95, maybe 2003
DO . . . END DO, recursion, allocation (in wrappers)
Provide higher-level wrappers for common languages
F95, C, C++
Automatic generation of precisions, bindings
Full automation (FLAME, etc.) not quite ready for all
functions
Tests for algorithms, implementations, installations
Open development
Need a community for long-term evolution.
http://www.netlib.org/lapack-dev/
Lots of work to do, research and development.
9. Two Example Improvements
Recent, locally developed improvements
Improved Numerics
Iterative refinement for linear systems Ax = b:
Extra precision ⇒ small error, dependable estimate
Both normwise and componentwise
(See LAWN 165 for full details.)
Improved Performance
MRRR algorithm for eigenvalue, SVD problems
Optimal complexity: O(n) per value/vector
(See LAWNs 162, 163, 166, 167. . . for more details.)
10. Numerics: Iterative Refinement
Improve solution to Ax = b
Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx
Until: good enough
Not-too-ill-conditioned ⇒ error O(√n · ε)
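The refinement loop can be sketched in plain Python. This is an illustration of the idea only, not the LAPACK routine; `math.fsum` stands in for the extra-precision residual accumulation, and the matrix and right-hand side are made up for the example.

```python
import math

def solve_lu(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def refine(A, b, x, steps=3):
    """Repeat: r = b - A x;  dx = A^{-1} r;  x = x + dx."""
    n = len(b)
    for _ in range(steps):
        # fsum accumulates the residual with reduced rounding error,
        # playing the role of the extra-precision residual on the slide
        r = [math.fsum([b[i]] + [-A[i][j] * x[j] for j in range(n)])
             for i in range(n)]
        dx = solve_lu(A, r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = refine(A, b, solve_lu(A, b))  # exact solution is (2/9, 1/9, 13/9)
```

A real expert driver also returns error bounds along with the refined solution; the point here is only the structure of the loop.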
[Figure: two panels of normwise error (log10 Enorm) vs. condition number (log10 κnorm), 2,000,000 cases]
12. Numerics: Iterative Refinement
Improve solution to Ax = b
Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx
Until: good enough
Also small componentwise errors and dependable estimates
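The two error measures can be sketched directly, assuming the usual infinity-norm definitions (normwise: ‖x − x̂‖∞ / ‖x̂‖∞; componentwise: maxᵢ |xᵢ − x̂ᵢ| / |x̂ᵢ|); the example data is made up to show why the componentwise measure matters.

```python
def norm_error(x, xt):
    """Normwise relative error in the infinity norm."""
    return max(abs(a - t) for a, t in zip(x, xt)) / max(abs(t) for t in xt)

def comp_error(x, xt):
    """Componentwise relative error: worst relative error in any entry."""
    return max(abs(a - t) / abs(t) for a, t in zip(x, xt))

# A solution with one tiny component: the componentwise measure flags
# that the small entry is 100% wrong, while the normwise measure hides it.
xt = [1.0, 1e-8]   # reference ("true") solution
x  = [1.0, 2e-8]   # computed solution; small entry off by a factor of 2
print(norm_error(x, xt))   # tiny: about 1e-8
print(comp_error(x, xt))   # large: 1.0
```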
[Figures: componentwise error (log10 Ecomp) vs. condition number (log10 κcomp), and componentwise error vs. its bound (log10 Bcomp), 2,000,000 cases]
13. Relying on Condition Numbers
Need condition numbers for dependable estimates.
Picking the right condition number and estimating it well.
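For flavor, here is a Hager-style 1-norm condition estimator in Python — roughly the idea behind LAPACK's condition estimates, though this sketch is not the library code. It estimates ‖A⁻¹‖₁ from a few solves with A and Aᵀ instead of forming the inverse.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, small n)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def est_inv_norm1(A, iters=5):
    """Hager-style estimate of ||A^{-1}||_1 using solves with A and A^T."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    x = [1.0 / n] * n          # start with a uniform vector
    est = 0.0
    for _ in range(iters):
        y = solve(A, x)        # y = A^{-1} x
        est = sum(abs(v) for v in y)
        s = [1.0 if v >= 0.0 else -1.0 for v in y]
        z = solve(At, s)       # z = A^{-T} sign(y): an ascent direction
        j = max(range(n), key=lambda i: abs(z[i]))
        if abs(z[j]) <= sum(zi * xi for zi, xi in zip(z, x)):
            break              # no ascent direction left: converged
        x = [0.0] * n
        x[j] = 1.0             # move to the most promising unit vector
    return est

def cond1(A):
    """Estimated kappa_1(A) = ||A||_1 * est(||A^{-1}||_1)."""
    n = len(A)
    a1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    return a1 * est_inv_norm1(A)
```

In practice this runs on an already-computed factorization, so the estimate costs only a handful of triangular solves on top of the original solve.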
[Figure: log10 κnorm computed in double vs. single precision (2,000,000 cases), with the fraction of cases falling in each region]
14. Performance: The MRRR Algorithm
Multiple Relatively Robust Representations
1999 Householder Award honorable mention for Dhillon
Optimal complexity with small error!
O(nk) flops for k eigenvalues/vectors of an n × n tridiagonal matrix
Small residuals: ‖T x_i − λ_i x_i‖ = O(nε)
Orthogonal eigenvectors: |x_iᵀ x_j| = O(nε)
Similar algorithm for SVD.
Eigenvectors computed independently ⇒ naturally
parallelizable
(the LAPACK release 3 implementation had bugs, missing cases)
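MRRR itself is intricate, but its tridiagonal setting can be illustrated with the classical Sturm-sequence bisection it descends from (a predecessor, not MRRR): a pivot-sign count gives the number of eigenvalues below any shift, so each eigenvalue can be bracketed independently — the same independence that makes the computation naturally parallel. A minimal sketch:

```python
def sturm_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal T(d, e) below x,
    counted from the signs of the pivots of T - x I (Sturm / LDL^T count)."""
    count, q = 0, 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 / q if i > 0 else 0.0
        q = d[i] - x - off
        if q == 0.0:
            q = -1e-300        # nudge off an exact-eigenvalue hit
        if q < 0.0:
            count += 1
    return count

def gersh_radius(d, e, i):
    """Gershgorin radius of row i of T(d, e)."""
    r = abs(e[i - 1]) if i > 0 else 0.0
    if i < len(d) - 1:
        r += abs(e[i])
    return r

def eigval_k(d, e, k, tol=1e-12):
    """k-th smallest eigenvalue (0-based) of T(d, e) by bisection."""
    lo = min(d[i] - gersh_radius(d, e, i) for i in range(len(d)))
    hi = max(d[i] + gersh_radius(d, e, i) for i in range(len(d)))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) <= k:
            lo = mid            # fewer than k+1 eigenvalues below mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Second-difference matrix: diagonal 2, off-diagonal -1; its eigenvalues
# are 2 - 2 cos(j*pi/4) for j = 1, 2, 3, i.e. 2 - sqrt(2), 2, 2 + sqrt(2)
d, e = [2.0, 2.0, 2.0], [-1.0, -1.0]
```

Each `eigval_k` call is independent of the others, which is the property MRRR pushes much further to reach O(n) per eigenpair with orthogonal eigenvectors.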
16. Summary
LAPACK and ScaLAPACK are open for improvement!
Planned improvements in
numerics,
performance,
functionality, and
engineering.
Forming a community for long-term development.