SlideShare a Scribd company logo
The Future of LAPACK and
       ScaLAPACK

 Jason Riedy, Yozo Hida, James Demmel

               EECS Department
        University of California, Berkeley


          November 18, 2005
Outline

  Survey responses: What users want

  Improving LAPACK and ScaLAPACK
     Improved Numerics
     Improved Performance
     Improved Functionality
     Improved Engineering and Community

  Two Example Improvements
    Numerics: Iterative Refinement for Ax = b
    Performance: The MRRR Algorithm for Ax = λx




                                                  2 / 16
Survey: What users want
    Survey available from
    http://www.netlib.org/lapack-dev/.
    212 responses, over 100 different, non-anonymous groups
    Problem sizes:
              100 1K 10K 100K 1M (other)
              8% 26% 24% 12% 6% (24%)
    >80% interested in small-medium SMPs
    >40% interested in large distributed-memory systems
    Vendor libs seen as faster, buggier
    over 20% want > double precision, 70% out-of-core
    Requests: High-level interfaces, low-level interfaces,
    parallel redistribution∗ and tuning


                                                             3 / 16
Participants
     UC Berkeley
         Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye
         Li, Osni Marques, Christof V¨mel, David Bindel, Yozo Hida,
                                       o
         Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . .
     U Tennessee, Knoxville
         Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek,
         Stan Tomov, . . .
     Other Academic Institutions
         UT Austin, UC Davis, U Kansas, U Maryland, North Carolina
         SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen,
         U Madrid, U Manchester, U Ume˚, U Wuppertal, U Zagreb
                                        a
     Research Institutions
         CERFACS, LBL, UEC (Japan)
     Industrial Partners
         Cray, HP, Intel, MathWorks, NAG, SGI
                              You?
                                                                       4 / 16
Improved Numerics

  Improved accuracy with standard asymptotic speed:
  Some are faster!
      Iterative refinement for linear systems, least squares
      Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan
      Pivoting and scaling for symmetric systems
           Definite and indefinite
      Jacobi SVD (and faster) — Drmaˇ / Veseli´
                                     c        c
      Condition numbers and estimators
      Higham / Cheng / Tisseur
      Useful approximate error estimates




                                                                   5 / 16
Improved Performance
  Improved performance with at least standard accuracy
      MRRR algorithm for eigenvalues, SVD
      Parlett / Dhillon / V¨mel / Marques / Willems / Katagiri
                           o
      Fast Hessenberg QR & QZ
      Byers / Mathias / Braman, K˚gstr¨m / Kressner
                                 a    o
      Fast reductions and BLAS2.5
      van de Geijn, Bischof / Lang, Howell / Fulton
      Recursive data layouts
      Gustavson / K˚gstr¨m / Elmroth / Jonsson
                   a    o
      generalized SVD — Bai, Wang
      Polynomial roots from semi-separable form
      Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel
      Automated tuning, optimizations in ScaLAPACK, . . .
                                                                    6 / 16
Improved Functionality
  Algorithms
     Updating / downdating factorizations — Stewart, Langou
     More generalized SVDs: products, CSD — Bai, Wang
     More generalized Sylvester, Lyupanov solvers
     K˚gstr¨m, Jonsson, Granat
      a    o
     Quadratic eigenproblems — Mehrmann
     Matrix functions — Higham

  Implementations
     Add “missing” features to ScaLAPACK
     Generate LAPACK, ScaLAPACK for higher precisions


                                                              7 / 16
Improved Engineering and Community
  Use new features without a rewrite
      Use modern Fortran 95, maybe 2003
          DO . . . END DO, recursion, allocation (in wrappers)
      Provide higher-level wrappers for common languages
          F95, C, C++
      Automatic generation of precisions, bindings
          Full automation (FLAME, etc.) not quite ready for all
          functions
      Tests for algorithms, implementations, installations

  Open development
         Need a community for long-term evolution.
          http://www.netlib.org/lapack-dev/
        Lots of work to do, research and development.
                                                                  8 / 16
Two Example Improvements
  Recent, locally developed improvements
  Improved Numerics
  Iterative refinement for linear systems Ax = b:
       Extra precision ⇒ small error, dependable estimate
       Both normwise and componentwise
       (See LAWN 165 for full details.)

  Improved Performance
  MRRR algorithm for eigenvalue, SVD problems
      Optimal complexity: O(n) per value/vector
      (See LAWNs 162, 163, 166, 167. . . for more details.)



                                                              9 / 16
Numerics: Iterative Refinement
  Improve solution to Ax = b
                 Repeat: r = b − Ax, dx = A−1 r , x = x + dx
                   Until: good enough
                                                          √
                        Not-too-ill-conditioned ⇒ error O( n ε)
                Normwise error vs. condition number κnorm . (2000000 cases)                        Normwise error vs. condition number κnorm . (2000000 cases)
                1                                                                                  1
                  167212                                       1156099        104                                                                    40377
                0                                                                                  0                                                             104

                -1                                                                                -1
                                                                              103
                -2                                                                                -2
                                                                                                                                                                 103
                -3                                                                                -3
  log10 Enorm




                                                                                    log10 Enorm
                -4 463800                                          22544      102                 -4                                                  1539
                                                                                                                                                                 102
                -5                                                                                -5

                -6                                                                                -6
                                                                              101                                                                                101
                -7                                                                                -7

                -8                                                                                -8
                      190085                                        260                                 821097                                    1136987
                -9                                                            100                 -9                                                             100
                  0             5               10            15                                    0              5               10            15
                                        log10 κnorm                                                                        log10 κnorm




                                                                                                                                                                       10 / 16
Numerics: Iterative Refinement
  Improve solution to Ax = b
                 Repeat: r = b − Ax, dx = A−1 r , x = x + dx
                   Until: good enough
                                  Dependable normwise relative error estimate
                            Normwise Error vs. Bound (2000000 cases)                                        Normwise Error vs. Bound (2000000 cases)
                 1                                                                                 1
                     133497              485930                            105                                           100
                 0                                                                                 0
                                                                                                                                                                105
                -1                                                                                -1
                                                                           104
                -2                                                                                -2
                                                                                                                                                                104
                -3                                                                                -3
  log10 Bnorm




                                                                                    log10 Bnorm
                                                                 1323311   103                                                                     40359
                -4 56848                                                                          -4 343                                                        103

                -5                                                         10   2                 -5
                                                  414                                                                           1361                       18   102
                -6                                                                                -6

                -7                                                         101                    -7
                                                                                                                                                                101
                -8                                                                                -8
                                                                                                            1957741                78
                -9                                                         100                    -9                                                            100
                       -8           -6           -4       -2           0                               -8           -6              -4       -2        0
                                            log10 Enorm                                                                        log10 Enorm




                                                                                                                                                                      11 / 16
Numerics: Iterative Refinement
  Improve solution to Ax = b
                                                                   PSfrag replacemen
                   Repeat: r = b − Ax, dx = A−1 r , x = x + dx
                     Until: good enough
                Also small componentwise errors and dependable estimates
                Componentwise error vs. condition number κcomp . (2000000 cases)                                Componentwise Error vs. Bound (2000000 cases)
                  1                                                                                     1
                                                                    41755                                   1                   236
                  0                                                                                     0
                                                                                   104                                                                               105
                  -1                                                                                   -1

                 -2                                                                                    -2                                                            104
                                                                                   103
                 -3                                                                                    -3




                                                                                         log10 Bcomp
  log10 Ecomp




                                                                                                                                                          41702
                 -4                                                 32249                              -4 13034                                                      103
                                                                                   102
                 -5                                                                                    -5
                                                                                                                                      25307                     53   102
                 -6                                                                                    -6

                 -7                                                                101                 -7
                                                                                                                                                                     101
                  -8                                                                                   -8
                       545427                                     1380569                                            1912961           6706
                 -9                                                                100                 -9                                                            100
                   0                 5                  10            15                                        -8         -6              -4       -2     0
                                          log10 κcomp                                                                                 log10 Ecomp




                                                                                                                                                                           12 / 16
Relying on Condition Numbers
  Need condition numbers for dependable estimates.
  Picking the right condition number and estimating it well.
                                                        κnorm : single vs. double precision (2000000 cases)
                                                14
                                                      0.0%                    28.9%

                                                12
                                                                                                                   104
               log10 κnorm (double precision)




                                                10

                                                                                                                   103
                                                 8

                                                                                                       30.1%
                                                 6 19.2%                                                           102


                                                 4
                                                                                                                   101
                                                 2

                                                                      21.8%                               0.0%
                                                 0                                                                 100
                                                  0          2      4       6        8       10      12       14
                                                                  log10 κnorm (single precision)




                                                                                                                         13 / 16
Performance: The MRRR Algorithm
  Multiple Relatively Robust Representations
      1999 Householder Award honorable mention for Dhillon
      Optimal complexity with small error!
          O(nk) flops for k eigenvalues/vectors of n × n
          tridiagonal matrix
          Small residuals: Txi − λi xi = O(nε)
          Orthogonal eigenvectors: xiT xj = O(nε)
      Similar algorithm for SVD.
      Eigenvectors computed independently ⇒ naturally
      parallelizable
      (LAPACK r3 had bugs, missing cases)



                                                             14 / 16
Performance: The MRRR Algorithm




        “fast DC”: Wilkinson, deflate like crazy


                                                  15 / 16
Summary


    LAPACK and ScaLAPACK are open for improvement!
    Planned improvements in
        numerics,
        performance,
        functionality, and
        engineering.
    Forming a community for long-term development.




                                                     16 / 16

More Related Content

Similar to Future of LAPACK and ScaLAPACK

Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchGreg Makowski
 
OIF 112G Panel at DesignCon 2017
OIF 112G Panel at DesignCon 2017OIF 112G Panel at DesignCon 2017
OIF 112G Panel at DesignCon 2017Deborah Porchivina
 
Keynote: Machine Learning for Design Automation at DAC 2018
Keynote:  Machine Learning for Design Automation at DAC 2018Keynote:  Machine Learning for Design Automation at DAC 2018
Keynote: Machine Learning for Design Automation at DAC 2018Manish Pandey
 
Compressed learning for time series classification
Compressed learning for time series classificationCompressed learning for time series classification
Compressed learning for time series classification學翰 施
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsThe Linux Foundation
 
Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...
Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...
Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...AEI / Affiliated Engineers
 
Optimization of water distribution systems design parameters using genetic al...
Optimization of water distribution systems design parameters using genetic al...Optimization of water distribution systems design parameters using genetic al...
Optimization of water distribution systems design parameters using genetic al...Tanay Kulkarni
 
2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptx2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptxGhazal Tashakor
 
ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015Guilherme Vaz
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...Pooyan Jamshidi
 
Jag Trasgo Helsinki091002
Jag Trasgo Helsinki091002Jag Trasgo Helsinki091002
Jag Trasgo Helsinki091002Miguel Morales
 
UMBC Research Day Presentation
UMBC Research Day PresentationUMBC Research Day Presentation
UMBC Research Day PresentationSDavis7
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
 

Similar to Future of LAPACK and ScaLAPACK (20)

Heuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient searchHeuristic design of experiments w meta gradient search
Heuristic design of experiments w meta gradient search
 
OIF 112G Panel at DesignCon 2017
OIF 112G Panel at DesignCon 2017OIF 112G Panel at DesignCon 2017
OIF 112G Panel at DesignCon 2017
 
Catapult DOE Case Study
Catapult DOE Case StudyCatapult DOE Case Study
Catapult DOE Case Study
 
Keynote: Machine Learning for Design Automation at DAC 2018
Keynote:  Machine Learning for Design Automation at DAC 2018Keynote:  Machine Learning for Design Automation at DAC 2018
Keynote: Machine Learning for Design Automation at DAC 2018
 
Compressed learning for time series classification
Compressed learning for time series classificationCompressed learning for time series classification
Compressed learning for time series classification
 
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core SystemsTackling the Management Challenges of Server Consolidation on Multi-core Systems
Tackling the Management Challenges of Server Consolidation on Multi-core Systems
 
ACM 2013-02-25
ACM 2013-02-25ACM 2013-02-25
ACM 2013-02-25
 
Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...
Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...
Emerging Hazards: Renewables and Microgrids, U.S. Department of Energy, Energ...
 
Optimization of water distribution systems design parameters using genetic al...
Optimization of water distribution systems design parameters using genetic al...Optimization of water distribution systems design parameters using genetic al...
Optimization of water distribution systems design parameters using genetic al...
 
Introduction to Genetic algorithm and its significance in VLSI design and aut...
Introduction to Genetic algorithm and its significance in VLSI design and aut...Introduction to Genetic algorithm and its significance in VLSI design and aut...
Introduction to Genetic algorithm and its significance in VLSI design and aut...
 
2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptx2022_01_TSMP_HPC_Asia_Final.pptx
2022_01_TSMP_HPC_Asia_Final.pptx
 
23AFMC_Beamer.pdf
23AFMC_Beamer.pdf23AFMC_Beamer.pdf
23AFMC_Beamer.pdf
 
Self healing data
Self healing dataSelf healing data
Self healing data
 
ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015ReFRESCO-General-Jan2015
ReFRESCO-General-Jan2015
 
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn... A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
A Framework for Robust Control of Uncertainty in Self-Adaptive Software Conn...
 
DefenseTalk_Trimmed
DefenseTalk_TrimmedDefenseTalk_Trimmed
DefenseTalk_Trimmed
 
Jag Trasgo Helsinki091002
Jag Trasgo Helsinki091002Jag Trasgo Helsinki091002
Jag Trasgo Helsinki091002
 
UMBC Research Day Presentation
UMBC Research Day PresentationUMBC Research Day Presentation
UMBC Research Day Presentation
 
Prediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source toolsPrediction of pKa from chemical structure using free and open source tools
Prediction of pKa from chemical structure using free and open source tools
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 

More from Jason Riedy

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13Jason Riedy
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFJason Riedy
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architecturesJason Riedy
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and EmusJason Riedy
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...Jason Riedy
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureJason Riedy
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondJason Riedy
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksJason Riedy
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateJason Riedy
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Jason Riedy
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsJason Riedy
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisJason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs Jason Riedy
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsJason Riedy
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraJason Riedy
 

More from Jason Riedy (20)

Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
LAGraph 2021-10-13
LAGraph 2021-10-13LAGraph 2021-10-13
LAGraph 2021-10-13
 
Lucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoFLucata at the HPEC GraphBLAS BoF
Lucata at the HPEC GraphBLAS BoF
 
Graph analysis and novel architectures
Graph analysis and novel architecturesGraph analysis and novel architectures
Graph analysis and novel architectures
 
GraphBLAS and Emus
GraphBLAS and EmusGraphBLAS and Emus
GraphBLAS and Emus
 
Reproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to ArchitectureReproducible Linear Algebra from Application to Architecture
Reproducible Linear Algebra from Application to Architecture
 
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
PEARC19: Wrangling Rogues: A Case Study on Managing Experimental Post-Moore A...
 
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to ArchitectureICIAM 2019: Reproducible Linear Algebra from Application to Architecture
ICIAM 2019: Reproducible Linear Algebra from Application to Architecture
 
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph AnalysisICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
ICIAM 2019: A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
Novel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and BeyondNovel Architectures for Applications in Data Science and Beyond
Novel Architectures for Applications in Data Science and Beyond
 
Characterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with MicrobenchmarksCharacterization of Emu Chick with Microbenchmarks
Characterization of Emu Chick with Microbenchmarks
 
CRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery UpdateCRNCH 2018 Summit: Rogues Gallery Update
CRNCH 2018 Summit: Rogues Gallery Update
 
Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018Augmented Arithmetic Operations Proposed for IEEE-754 2018
Augmented Arithmetic Operations Proposed for IEEE-754 2018
 
Graph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New ArchitecturesGraph Analysis: New Algorithm Models, New Architectures
Graph Analysis: New Algorithm Models, New Architectures
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing PlatformsCRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
CRNCH Rogues Gallery: A Community Core for Novel Computing Platforms
 
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph AnalysisA New Algorithm Model for Massive-Scale Streaming Graph Analysis
A New Algorithm Model for Massive-Scale Streaming Graph Analysis
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
High-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming GraphsHigh-Performance Analysis of Streaming Graphs
High-Performance Analysis of Streaming Graphs
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 

Recently uploaded

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...BookNet Canada
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Ramesh Iyer
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backElena Simperl
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsVlad Stirbu
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Frank van Harmelen
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Thierry Lestable
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersSafe Software
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutesconfluent
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxDavid Michel
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...Product School
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform EngineeringJemma Hussein Allen
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsExpeed Software
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 

Recently uploaded (20)

Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptxUnpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
Unpacking Value Delivery - Agile Oxford Meetup - May 2024.pptx
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...How world-class product teams are winning in the AI era by CEO and Founder, P...
How world-class product teams are winning in the AI era by CEO and Founder, P...
 
The Future of Platform Engineering
The Future of Platform EngineeringThe Future of Platform Engineering
The Future of Platform Engineering
 
In-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT ProfessionalsIn-Depth Performance Testing Guide for IT Professionals
In-Depth Performance Testing Guide for IT Professionals
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 

Future of LAPACK and ScaLAPACK

  • 1. The Future of LAPACK and ScaLAPACK Jason Riedy, Yozo Hida, James Demmel EECS Department University of California, Berkeley November 18, 2005
  • 2. Outline Survey responses: What users want Improving LAPACK and ScaLAPACK Improved Numerics Improved Performance Improved Functionality Improved Engineering and Community Two Example Improvements Numerics: Iterative Refinement for Ax = b Performance: The MRRR Algorithm for Ax = λx 2 / 16
  • 3. Survey: What users want Survey available from http://www.netlib.org/lapack-dev/. 212 responses, over 100 different, non-anonymous groups Problem sizes: 100 1K 10K 100K 1M (other) 8% 26% 24% 12% 6% (24%) >80% interested in small-medium SMPs >40% interested in large distributed-memory systems Vendor libs seen as faster, buggier over 20% want > double precision, 70% out-of-core Requests: High-level interfaces, low-level interfaces, parallel redistribution∗ and tuning 3 / 16
  • 4. Participants UC Berkeley Jim Demmel, Ming Gu, W. Kahan, Beresford Parlett, Xiaoye Li, Osni Marques, Christof V¨mel, David Bindel, Yozo Hida, o Jason Riedy, Jianlin Xia, Jiang Zhu, undergrads, . . . U Tennessee, Knoxville Jack Dongarra, Julien Langou, Julie Langou, Piotr Luszczek, Stan Tomov, . . . Other Academic Institutions UT Austin, UC Davis, U Kansas, U Maryland, North Carolina SU, San Jose SU, UC Santa Barbara, TU Berlin, FU Hagen, U Madrid, U Manchester, U Ume˚, U Wuppertal, U Zagreb a Research Institutions CERFACS, LBL, UEC (Japan) Industrial Partners Cray, HP, Intel, MathWorks, NAG, SGI You? 4 / 16
  • 5. Improved Numerics Improved accuracy with standard asymptotic speed: Some are faster! Iterative refinement for linear systems, least squares Demmel / Hida / Kahan / Li / Mukherjee / Riedy / Sarkisyan Pivoting and scaling for symmetric systems Definite and indefinite Jacobi SVD (and faster) — Drmaˇ / Veseli´ c c Condition numbers and estimators Higham / Cheng / Tisseur Useful approximate error estimates 5 / 16
  • 6. Improved Performance Improved performance with at least standard accuracy MRRR algorithm for eigenvalues, SVD Parlett / Dhillon / V¨mel / Marques / Willems / Katagiri o Fast Hessenberg QR & QZ Byers / Mathias / Braman, K˚gstr¨m / Kressner a o Fast reductions and BLAS2.5 van de Geijn, Bischof / Lang, Howell / Fulton Recursive data layouts Gustavson / K˚gstr¨m / Elmroth / Jonsson a o generalized SVD — Bai, Wang Polynomial roots from semi-separable form Gu / Chandrasekaran / Zhu / Xia / Bindel / Garmire / Demmel Automated tuning, optimizations in ScaLAPACK, . . . 6 / 16
  • 7. Improved Functionality Algorithms Updating / downdating factorizations — Stewart, Langou More generalized SVDs: products, CSD — Bai, Wang More generalized Sylvester, Lyupanov solvers K˚gstr¨m, Jonsson, Granat a o Quadratic eigenproblems — Mehrmann Matrix functions — Higham Implementations Add “missing” features to ScaLAPACK Generate LAPACK, ScaLAPACK for higher precisions 7 / 16
  • 8. Improved Engineering and Community Use new features without a rewrite Use modern Fortran 95, maybe 2003 DO . . . END DO, recursion, allocation (in wrappers) Provide higher-level wrappers for common languages F95, C, C++ Automatic generation of precisions, bindings Full automation (FLAME, etc.) not quite ready for all functions Tests for algorithms, implementations, installations Open development Need a community for long-term evolution. http://www.netlib.org/lapack-dev/ Lots of work to do, research and development. 8 / 16
  • 9. Two Example Improvements Recent, locally developed improvements Improved Numerics Iterative refinement for linear systems Ax = b: Extra precision ⇒ small error, dependable estimate Both normwise and componentwise (See LAWN 165 for full details.) Improved Performance MRRR algorithm for eigenvalue, SVD problems Optimal complexity: O(n) per value/vector (See LAWNs 162, 163, 166, 167. . . for more details.) 9 / 16
  • 10. Numerics: Iterative Refinement Improve solution to Ax = b Repeat: r = b − Ax, dx = A−1 r , x = x + dx Until: good enough √ Not-too-ill-conditioned ⇒ error O( n ε) Normwise error vs. condition number κnorm . (2000000 cases) Normwise error vs. condition number κnorm . (2000000 cases) 1 1 167212 1156099 104 40377 0 0 104 -1 -1 103 -2 -2 103 -3 -3 log10 Enorm log10 Enorm -4 463800 22544 102 -4 1539 102 -5 -5 -6 -6 101 101 -7 -7 -8 -8 190085 260 821097 1136987 -9 100 -9 100 0 5 10 15 0 5 10 15 log10 κnorm log10 κnorm 10 / 16
  • 11. Numerics: Iterative Refinement Improve solution to Ax = b Repeat: r = b − Ax, dx = A−1 r , x = x + dx Until: good enough Dependable normwise relative error estimate Normwise Error vs. Bound (2000000 cases) Normwise Error vs. Bound (2000000 cases) 1 1 133497 485930 105 100 0 0 105 -1 -1 104 -2 -2 104 -3 -3 log10 Bnorm log10 Bnorm 1323311 103 40359 -4 56848 -4 343 103 -5 10 2 -5 414 1361 18 102 -6 -6 -7 101 -7 101 -8 -8 1957741 78 -9 100 -9 100 -8 -6 -4 -2 0 -8 -6 -4 -2 0 log10 Enorm log10 Enorm 11 / 16
  • 12. Numerics: Iterative Refinement Improve solution to Ax = b PSfrag replacemen Repeat: r = b − Ax, dx = A−1 r , x = x + dx Until: good enough Also small componentwise errors and dependable estimates Componentwise error vs. condition number κcomp . (2000000 cases) Componentwise Error vs. Bound (2000000 cases) 1 1 41755 1 236 0 0 104 105 -1 -1 -2 -2 104 103 -3 -3 log10 Bcomp log10 Ecomp 41702 -4 32249 -4 13034 103 102 -5 -5 25307 53 102 -6 -6 -7 101 -7 101 -8 -8 545427 1380569 1912961 6706 -9 100 -9 100 0 5 10 15 -8 -6 -4 -2 0 log10 κcomp log10 Ecomp 12 / 16
  • 13. Relying on Condition Numbers Need condition numbers for dependable estimates. Picking the right condition number and estimating it well. κnorm : single vs. double precision (2000000 cases) 14 0.0% 28.9% 12 104 log10 κnorm (double precision) 10 103 8 30.1% 6 19.2% 102 4 101 2 21.8% 0.0% 0 100 0 2 4 6 8 10 12 14 log10 κnorm (single precision) 13 / 16
  • 14. Performance: The MRRR Algorithm Multiple Relatively Robust Representations 1999 Householder Award honorable mention for Dhillon Optimal complexity with small error! O(nk) flops for k eigenvalues/vectors of n × n tridiagonal matrix Small residuals: Txi − λi xi = O(nε) Orthogonal eigenvectors: xiT xj = O(nε) Similar algorithm for SVD. Eigenvectors computed independently ⇒ naturally parallelizable (LAPACK r3 had bugs, missing cases) 14 / 16
  • 15. Performance: The MRRR Algorithm “fast DC”: Wilkinson, deflate like crazy 15 / 16
  • 16. Summary LAPACK and ScaLAPACK are open for improvement! Planned improvements in numerics, performance, functionality, and engineering. Forming a community for long-term development. 16 / 16