Progress and directions for LAPACK and ScaLAPACK as of 2005. See <a href="http://www.netlib.org/lapack/">LAPACK at Netlib</a> and <a href="http://www.netlib.org/lapack-dev/">lapack-dev</a> for more current information.
Future of LAPACK and ScaLAPACK
1. The Future of LAPACK and ScaLAPACK
Jason Riedy, Yozo Hida, James Demmel
EECS Department
University of California, Berkeley
November 18, 2005
2. Outline
Survey responses: What users want
Improving LAPACK and ScaLAPACK
Improved Numerics
Improved Performance
Improved Functionality
Improved Engineering and Community
Two Example Improvements
Numerics: Iterative Refinement for Ax = b
Performance: The MRRR Algorithm for Ax = λx
3. Survey: What users want
Survey available from
http://www.netlib.org/lapack-dev/.
212 responses, over 100 different, non-anonymous groups
Problem sizes:
100 1K 10K 100K 1M (other)
8% 26% 24% 12% 6% (24%)
>80% interested in small-medium SMPs
>40% interested in large distributed-memory systems
Vendor libs seen as faster, buggier
over 20% want > double precision, 70% out-of-core
Requests: High-level interfaces, low-level interfaces,
parallel redistribution∗ and tuning
4. Participants
UC Berkeley
Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye Li, Osni Marques, Christof Vömel, David Bindel, Yozo Hida, Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . .
U Tennessee, Knoxville
Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek,
Stan Tomov, . . .
Other Academic Institutions
UT Austin, UC Davis, U Kansas, U Maryland, North Carolina SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen, U Madrid, U Manchester, U Umeå, U Wuppertal, U Zagreb
Research Institutions
CERFACS, LBL, UEC (Japan)
Industrial Partners
Cray, HP, Intel, MathWorks, NAG, SGI
You?
5. Improved Numerics
Improved accuracy with standard asymptotic speed:
Some are faster!
Iterative refinement for linear systems, least squares
Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan
Pivoting and scaling for symmetric systems
Definite and indefinite
Jacobi SVD (and faster) — Drmač / Veselić
Condition numbers and estimators
Higham / Cheng / Tisseur
Useful approximate error estimates
6. Improved Performance
Improved performance with at least standard accuracy
MRRR algorithm for eigenvalues, SVD
Parlett / Dhillon / Vömel / Marques / Willems / Katagiri
Fast Hessenberg QR & QZ
Byers / Mathias / Braman, Kågström / Kressner
Fast reductions and BLAS2.5
van de Geijn, Bischof / Lang, Howell / Fulton
Recursive data layouts
Gustavson / Kågström / Elmroth / Jonsson
Generalized SVD — Bai, Wang
Polynomial roots from semi-separable form
Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel
Automated tuning, optimizations in ScaLAPACK, . . .
7. Improved Functionality
Algorithms
Updating / downdating factorizations — Stewart, Langou
More generalized SVDs: products, CSD — Bai, Wang
More generalized Sylvester, Lyapunov solvers
Kågström, Jonsson, Granat
Quadratic eigenproblems — Mehrmann
Matrix functions — Higham
Implementations
Add “missing” features to ScaLAPACK
Generate LAPACK, ScaLAPACK for higher precisions
8. Improved Engineering and Community
Use new features without a rewrite
Use modern Fortran 95, maybe 2003
DO . . . END DO, recursion, allocation (in wrappers)
Provide higher-level wrappers for common languages
F95, C, C++
Automatic generation of precisions, bindings
Full automation (FLAME, etc.) not quite ready for all
functions
Tests for algorithms, implementations, installations
Open development
Need a community for long-term evolution.
http://www.netlib.org/lapack-dev/
Lots of work to do, research and development.
9. Two Example Improvements
Recent, locally developed improvements
Improved Numerics
Iterative refinement for linear systems Ax = b:
Extra precision ⇒ small error, dependable estimate
Both normwise and componentwise
(See LAWN 165 for full details.)
Improved Performance
MRRR algorithm for eigenvalue, SVD problems
Optimal complexity: O(n) per value/vector
(See LAWNs 162, 163, 166, 167. . . for more details.)
10. Numerics: Iterative Refinement
Improve solution to Ax = b
Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx
Until: good enough
Not-too-ill-conditioned ⇒ error O(√n · ε)
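The refinement loop can be sketched in plain Python. This is an illustration of the idea only, not the LAPACK routine; `math.fsum` stands in for the extra-precision residual accumulation, and the matrix and right-hand side are made up for the example.

```python
import math

def solve_lu(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]  # augmented matrix
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x

def refine(A, b, x, steps=3):
    """Repeat: r = b - A x;  dx = A^{-1} r;  x = x + dx."""
    n = len(b)
    for _ in range(steps):
        # fsum accumulates the residual with reduced rounding error,
        # playing the role of the extra-precision residual on the slide
        r = [math.fsum([b[i]] + [-A[i][j] * x[j] for j in range(n)])
             for i in range(n)]
        dx = solve_lu(A, r)
        x = [xi + di for xi, di in zip(x, dx)]
    return x

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = refine(A, b, solve_lu(A, b))  # exact solution is (2/9, 1/9, 13/9)
```

A real expert driver also returns error bounds along with the refined solution; the point here is only the structure of the loop.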
[Figure: two panels of normwise error (log10 Enorm) vs. condition number (log10 κnorm), 2,000,000 cases]
12. Numerics: Iterative Refinement
Improve solution to Ax = b
Repeat: r = b − Ax, dx = A⁻¹r, x = x + dx
Until: good enough
Also small componentwise errors and dependable estimates
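The two error measures can be sketched directly, assuming the usual infinity-norm definitions (normwise: ‖x − x̂‖∞ / ‖x̂‖∞; componentwise: maxᵢ |xᵢ − x̂ᵢ| / |x̂ᵢ|); the example data is made up to show why the componentwise measure matters.

```python
def norm_error(x, xt):
    """Normwise relative error in the infinity norm."""
    return max(abs(a - t) for a, t in zip(x, xt)) / max(abs(t) for t in xt)

def comp_error(x, xt):
    """Componentwise relative error: worst relative error in any entry."""
    return max(abs(a - t) / abs(t) for a, t in zip(x, xt))

# A solution with one tiny component: the componentwise measure flags
# that the small entry is 100% wrong, while the normwise measure hides it.
xt = [1.0, 1e-8]   # reference ("true") solution
x  = [1.0, 2e-8]   # computed solution; small entry off by a factor of 2
print(norm_error(x, xt))   # tiny: about 1e-8
print(comp_error(x, xt))   # large: 1.0
```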
[Figures: componentwise error (log10 Ecomp) vs. condition number (log10 κcomp), and componentwise error vs. its bound (log10 Bcomp), 2,000,000 cases]
13. Relying on Condition Numbers
Need condition numbers for dependable estimates.
Picking the right condition number and estimating it well.
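For flavor, here is a Hager-style 1-norm condition estimator in Python — roughly the idea behind LAPACK's condition estimates, though this sketch is not the library code. It estimates ‖A⁻¹‖₁ from a few solves with A and Aᵀ instead of forming the inverse.

```python
def solve(A, b):
    """Gaussian elimination with partial pivoting (dense, small n)."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            f = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= f * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def est_inv_norm1(A, iters=5):
    """Hager-style estimate of ||A^{-1}||_1 using solves with A and A^T."""
    n = len(A)
    At = [list(col) for col in zip(*A)]
    x = [1.0 / n] * n          # start with a uniform vector
    est = 0.0
    for _ in range(iters):
        y = solve(A, x)        # y = A^{-1} x
        est = sum(abs(v) for v in y)
        s = [1.0 if v >= 0.0 else -1.0 for v in y]
        z = solve(At, s)       # z = A^{-T} sign(y): an ascent direction
        j = max(range(n), key=lambda i: abs(z[i]))
        if abs(z[j]) <= sum(zi * xi for zi, xi in zip(z, x)):
            break              # no ascent direction left: converged
        x = [0.0] * n
        x[j] = 1.0             # move to the most promising unit vector
    return est

def cond1(A):
    """Estimated kappa_1(A) = ||A||_1 * est(||A^{-1}||_1)."""
    n = len(A)
    a1 = max(sum(abs(A[i][j]) for i in range(n)) for j in range(n))
    return a1 * est_inv_norm1(A)
```

In practice this runs on an already-computed factorization, so the estimate costs only a handful of triangular solves on top of the original solve.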
[Figure: log10 κnorm computed in double vs. single precision (2,000,000 cases), with the fraction of cases falling in each region]
14. Performance: The MRRR Algorithm
Multiple Relatively Robust Representations
1999 Householder Award honorable mention for Dhillon
Optimal complexity with small error!
O(nk) flops for k eigenvalues/vectors of an n × n tridiagonal matrix
Small residuals: ‖T x_i − λ_i x_i‖ = O(nε)
Orthogonal eigenvectors: |x_iᵀ x_j| = O(nε)
Similar algorithm for SVD.
Eigenvectors computed independently ⇒ naturally
parallelizable
(the LAPACK release 3 implementation had bugs, missing cases)
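MRRR itself is intricate, but its tridiagonal setting can be illustrated with the classical Sturm-sequence bisection it descends from (a predecessor, not MRRR): a pivot-sign count gives the number of eigenvalues below any shift, so each eigenvalue can be bracketed independently — the same independence that makes the computation naturally parallel. A minimal sketch:

```python
def sturm_count(d, e, x):
    """Number of eigenvalues of the symmetric tridiagonal T(d, e) below x,
    counted from the signs of the pivots of T - x I (Sturm / LDL^T count)."""
    count, q = 0, 1.0
    for i in range(len(d)):
        off = e[i - 1] ** 2 / q if i > 0 else 0.0
        q = d[i] - x - off
        if q == 0.0:
            q = -1e-300        # nudge off an exact-eigenvalue hit
        if q < 0.0:
            count += 1
    return count

def gersh_radius(d, e, i):
    """Gershgorin radius of row i of T(d, e)."""
    r = abs(e[i - 1]) if i > 0 else 0.0
    if i < len(d) - 1:
        r += abs(e[i])
    return r

def eigval_k(d, e, k, tol=1e-12):
    """k-th smallest eigenvalue (0-based) of T(d, e) by bisection."""
    lo = min(d[i] - gersh_radius(d, e, i) for i in range(len(d)))
    hi = max(d[i] + gersh_radius(d, e, i) for i in range(len(d)))
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if sturm_count(d, e, mid) <= k:
            lo = mid            # fewer than k+1 eigenvalues below mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Second-difference matrix: diagonal 2, off-diagonal -1; its eigenvalues
# are 2 - 2 cos(j*pi/4) for j = 1, 2, 3, i.e. 2 - sqrt(2), 2, 2 + sqrt(2)
d, e = [2.0, 2.0, 2.0], [-1.0, -1.0]
```

Each `eigval_k` call is independent of the others, which is the property MRRR pushes much further to reach O(n) per eigenpair with orthogonal eigenvectors.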
16. Summary
LAPACK and ScaLAPACK are open for improvement!
Planned improvements in
numerics,
performance,
functionality, and
engineering.
Forming a community for long-term development.