
Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System

In this deck from PASC18, Hatem Ltaief from KAUST presents: Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System.

"The real-time correction of telescopic images in the search for exoplanets is highly sensitive to atmospheric aberrations. The pseudo-inverse algorithm is an efficient mathematical method to filter out these turbulences. We introduce a new partial singular value decomposition (SVD) algorithm based on QR-based Dynamically Weighted Halley (QDWH) iteration for the pseudo-inverse method of adaptive optics. The QDWH partial SVD algorithm selectively calculates the most significant singular values and their corresponding singular vectors. We develop a high performance implementation and demonstrate the numerical robustness of the QDWH-based partial SVD method. We also perform a benchmarking campaign on various generations of GPU hardware accelerators and compare against the state-of-the-art SVD implementation SGESDD from the MAGMA library. Numerical accuracy and performance results are reported using synthetic and real observational datasets from the Subaru telescope. Our implementation outperforms SGESDD by up to fivefold and fourfold performance speedups on ill-conditioned synthetic matrices and real observational datasets, respectively. The pseudo-inverse simulation code will be deployed on-sky for the Subaru telescope during observation nights scheduled early 2018."

Watch the video: https://wp.me/p3RLHQ-iWN

Learn more: https://pasc18.pasc-conference.org/program/schedule/

  1. Extreme Computing for Extreme Adaptive Optics: the Key to Finding Life Outside our Solar System. H. Ltaief¹, D. Sukkari¹, O. Guyon²,³,⁴, and D. Keyes¹ (¹Extreme Computing Research Center, KAUST, Saudi Arabia; ²National Institutes of Natural Sciences, Tokyo, Japan; ³Steward Observatory, University of Arizona, Tucson, USA; ⁴National Astronomical Observatory of Japan, Subaru Telescope). HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
  2. Outline: 1 Motivation; 2 State-of-the-art Approach; 3 QDWH-Based Partial SVD; 4 Environment Settings; 5 Numerical Accuracy; 6 Performance Results; 7 Conclusion and Future Work
  3. Outline: Section 1, Motivation
  4. The Subaru Telescope
     • Represents a flagship telescope of the National Astronomical Observatory of Japan
     • Has an 8.2-meter (320 in) diameter primary mirror
     • Contains a high-contrast imaging system for directly imaging exoplanets
     • Operates perhaps the most advanced HPC facility in computational astronomy
     • Is located at the Mauna Kea Observatory in Hawaii
  5. The Subaru Telescope
  6. (Perhaps) the highest-altitude GPU system on record, at 14,000 feet!
  7. Atmospheric Turbulence and Optical Aberration
  8. The Astronomical Challenge
     • Turbulence in the atmosphere limits the performance of astronomical telescopes
     • Without active correction of these defects, images would be blurred to approximately one arcsecond
     • To recover the lost angular resolution, adaptive optics (AO) systems measure and correct atmospheric turbulence
     • In the absence of optical aberrations, the telescope would provide an angular resolution of $\lambda/D$ ($D$: telescope diameter, $\lambda$: wavelength)
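To put the $\lambda/D$ figure in perspective, the sketch below evaluates it for Subaru's 8.2 m aperture and compares it with one arcsecond of uncorrected blur. The 1.6 µm wavelength (near infrared) is an illustrative assumption, not a value from the talk, and the function name is ours.

```python
import math

ARCSEC_PER_RAD = 180.0 / math.pi * 3600.0  # ~206264.8 arcseconds per radian

def diffraction_limit_arcsec(wavelength_m: float, diameter_m: float) -> float:
    """Angular resolution lambda/D (as quoted on the slide), converted to arcseconds."""
    return wavelength_m / diameter_m * ARCSEC_PER_RAD

# Subaru: D = 8.2 m; lambda = 1.6 um is an assumed, illustrative wavelength
res = diffraction_limit_arcsec(1.6e-6, 8.2)
print(f"lambda/D = {res:.3f} arcsec, about {1.0 / res:.0f}x finer than 1 arcsec of blur")
```

At this wavelength the diffraction limit is roughly 0.04 arcsec, so without AO the turbulence, not the mirror, sets the resolution by more than an order of magnitude.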
  9. Adaptive Optics 101
     • Wavefront sensor(s) (WFS): measure details of the blurring from a "guide star" near the object you want to observe
     • Real-time controller (RTC): processes the WFS signals and computes the control matrix based on the pseudo-inverse
     • Light from both the guide star and the astronomical object is reflected from a deformable mirror; distortions are removed
     Source: https://www.unige.ch/sciences/astro/index.php/download_file/view/34/168/
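To make the loop concrete, here is a minimal sketch of a classical integrator controller whose control matrix is the pseudo-inverse of an interaction matrix `D` mapping DM commands to WFS signals. The noiseless linear WFS model, the static aberration, the gain of 0.5, and all names are illustrative assumptions, not details of the Subaru RTC.

```python
import numpy as np

def run_integrator(D, aberration, gain=0.5, steps=50):
    """Integrator loop: the WFS sees s = D @ (c + a); the controller applies
    c <- c - gain * (R @ s), with control matrix R = pinv(D)."""
    R = np.linalg.pinv(D)                  # control matrix from the pseudo-inverse
    c = np.zeros(D.shape[1])               # deformable-mirror (DM) commands
    residuals = []
    for _ in range(steps):
        s = D @ (c + aberration)           # simulated, noiseless WFS signals
        c = c - gain * (R @ s)             # MVM followed by the integrator update
        residuals.append(np.linalg.norm(c + aberration))  # uncorrected part, in DM units
    return c, residuals

rng = np.random.default_rng(0)
D = rng.standard_normal((120, 40))         # hypothetical interaction matrix: 40 actuators, 120 signals
a = rng.standard_normal(40)                # hypothetical static aberration, in DM units
c, hist = run_integrator(D, a)
print(hist[0] / np.linalg.norm(a), hist[-1] / np.linalg.norm(a))
```

Because `R @ D` is the identity for a full-column-rank `D`, each step scales the residual by `1 - gain`, so it decays geometrically (by half per step here).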
  10. How Does AO Work?
  11. And Here Comes the Linear Algebra...
     • Compute the pseudo-inverse $A^+$, which satisfies $AA^+A = A$, for $A \in \mathbb{R}^{m \times n}$ ($m \ge n$)
     • The challenges of the pseudo-inverse are twofold:
       - Numerical: dealing with a rectangular matrix may engender numerical instabilities (e.g., working with $A^TA = V\Lambda V^T$ squares the condition number)
       - Computational: despite its high algorithmic complexity, the computation must keep up with the overall throughput of the AO framework
     • Using the SVD, $A = U\Sigma V^T$, then $A^+ = V\Sigma^{-1}U^T$
     • Only the most significant singular values and their associated singular vectors are required (≈10%)
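A minimal NumPy sketch of the truncated SVD route: keep only the `keep` largest singular triplets and form $A^+ = V_k\Sigma_k^{-1}U_k^T$. It uses a full LAPACK SVD, i.e., the standard approach the talk sets out to replace, so it only illustrates the formula. The function name and the rank-6 test matrix are ours.

```python
import numpy as np

def truncated_pinv(A: np.ndarray, keep: int) -> np.ndarray:
    """Pseudo-inverse from the `keep` largest singular triplets: A+ = V_k diag(1/s_k) U_k^T."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # s is sorted in descending order
    Uk, sk, Vtk = U[:, :keep], s[:keep], Vt[:keep, :]
    return Vtk.T @ (Uk.T / sk[:, None])               # scale row i of U_k^T by 1/s_i

rng = np.random.default_rng(0)
A = rng.standard_normal((60, 6)) @ rng.standard_normal((6, 40))  # rank-6, 60 x 40
A_pinv = truncated_pinv(A, keep=6)                                # keep 6 of 40 triplets (15%)
print(np.allclose(A @ A_pinv @ A, A))
```

For an exactly rank-6 matrix, keeping 6 triplets loses nothing, so the Penrose condition $AA^+A = A$ holds to rounding error; for a full-rank matrix, the discarded triplets are exactly what the truncation filters out.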
  12. Outline: Section 2, State-of-the-art Approach
  13. SVD Algorithm: Standard Approach (LAPACK)
     • This is SGESDD (divide-and-conquer algorithm) for computing the SVD
     • Share of the Level-2 BLAS operations ($\frac{4}{3}n^3$ flops) in each computation:
       Computation                Flops               Level-2 BLAS share
       Bidiagonal reduction       $\frac{8}{3}n^3$    50% of flops, 90% of time
       SGESDD ($\Sigma$ only)     $\frac{8}{3}n^3$    50% of flops, 85% of time
       SGESDD ($U\Sigma V^T$)     $22n^3$             6% of flops, 30% of time
  14. Hardware Trends: Energy Matters! (source: John Shalf, LBNL)
       Operation              2011       2018
       DP FLOP                100 pJ     10 pJ
       DP DRAM read           4800 pJ    1920 pJ
       Local interconnect     7500 pJ    2500 pJ
       Cross system           9000 pJ    3500 pJ
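Reading the energy table as ratios makes the trend explicit: relative to one double-precision flop, reading an operand from DRAM became four times more expensive between 2011 and 2018. The arithmetic below uses only the numbers in the table.

```latex
\frac{E_{\text{DRAM read}}}{E_{\text{DP flop}}}\bigg|_{2011} = \frac{4800\ \text{pJ}}{100\ \text{pJ}} = 48,
\qquad
\frac{E_{\text{DRAM read}}}{E_{\text{DP flop}}}\bigg|_{2018} = \frac{1920\ \text{pJ}}{10\ \text{pJ}} = 192 = 4 \times 48
```

A growing gap of this kind favors algorithms built on compute-bound kernels such as GEMM over memory-bound Level-2 BLAS, even at the price of extra flops.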
  15. The Big Picture (similar with SVD): chart, Cray LibSci 17.11.1
  16. Outline: Section 3, QDWH-Based Partial SVD
  17. What Is the Polar Decomposition?
     • The polar decomposition: $A = U_pH$, $A \in \mathbb{R}^{m \times n}$ ($m \ge n$), where $U_p$ has orthonormal columns (it is orthogonal when $m = n$) and $H = \sqrt{A^TA}$ is a symmetric positive semidefinite matrix
     • The polar decomposition is a critical numerical algorithm for various applications, including aerospace computations, chemistry, and factor analysis
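As a reference point, both polar factors can be read off a thin SVD: $A = U\Sigma V^T$ gives $U_p = UV^T$ and $H = V\Sigma V^T$. The sketch below (the name `polar_via_svd` is ours) checks the defining properties. QDWH obtains the same factors without computing an SVD, which is what makes it usable as a first step toward one.

```python
import numpy as np

def polar_via_svd(A):
    """Polar decomposition A = Up @ H from a thin SVD A = U diag(s) V^T:
    Up = U V^T (orthonormal columns), H = V diag(s) V^T (symmetric PSD)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Up = U @ Vt
    H = (Vt.T * s) @ Vt        # V diag(s) V^T via column scaling
    return Up, H

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 5))
Up, H = polar_via_svd(A)
print(np.allclose(Up @ H, A), np.allclose(Up.T @ Up, np.eye(5)))
```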
  18. QDWH Polar Decomposition Algorithm
     • The QR-based Dynamically Weighted Halley (QDWH) iteration: $X_0 = A/\alpha$, $\begin{bmatrix} \sqrt{c_k}\,X_k \\ I \end{bmatrix} = \begin{bmatrix} Q_1 \\ Q_2 \end{bmatrix} R$, $X_{k+1} = \frac{b_k}{c_k}X_k + \frac{1}{\sqrt{c_k}}\left(a_k - \frac{b_k}{c_k}\right)Q_1Q_2^T$, $k \ge 0$
     • The iterative procedure converges, $X_k \to U_p$, yielding $A = U_pH$, where $U_p^TU_p = I_n$ and $H$ is symmetric positive semidefinite
     • Backward stable algorithm for computing the polar decomposition
     • Based on conventional computational kernels, i.e., Cholesky/QR factorizations (≤ 6 iterations for double precision) and GEMM
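The iteration can be sketched in NumPy as follows. This is a minimal double-precision illustration, not the authors' GPU implementation. The weights $a_k$, $b_k$, $c_k$, which the slide does not spell out, follow the standard dynamic-weighting formulas driven by a lower bound `l` on the smallest singular value of $X_k$. For brevity, $\alpha$ and the initial bound come from SVD-based NumPy calls where practical codes use cheap estimates, and every step uses QR, omitting the Cholesky-based update that QDWH switches to in later iterations.

```python
import numpy as np

def qdwh_polar(A, tol=1e-12, max_iter=20):
    """Sketch of the QDWH iteration for A = Up @ H (float64, m >= n, full column rank).
    Every step uses the QR form; the later Cholesky-based variant is omitted."""
    m, n = A.shape
    alpha = np.linalg.norm(A, 2)            # exact 2-norm here; production codes estimate it
    X = A / alpha
    l = 1.0 / np.linalg.cond(A)             # sigma_min(X0); SVD-based here for brevity
    I = np.eye(n)
    for _ in range(max_iter):
        # Dynamic weights a_k, b_k, c_k from the current bound l
        g = (4.0 * (1.0 - l**2) / l**4) ** (1.0 / 3.0)
        a = np.sqrt(1.0 + g) + 0.5 * np.sqrt(
            8.0 - 4.0 * g + 8.0 * (2.0 - l**2) / (l**2 * np.sqrt(1.0 + g)))
        b = (a - 1.0) ** 2 / 4.0
        c = a + b - 1.0
        # Thin QR of the stacked (m + n) x n matrix [sqrt(c) X; I]
        Q, _ = np.linalg.qr(np.vstack([np.sqrt(c) * X, I]))
        Q1, Q2 = Q[:m], Q[m:]
        X_new = (b / c) * X + (a - b / c) / np.sqrt(c) * (Q1 @ Q2.T)
        l = min(1.0, l * (a + b * l**2) / (1.0 + c * l**2))  # bound for the next iterate
        done = np.linalg.norm(X_new - X, "fro") < tol
        X = X_new
        if done:
            break
    H = X.T @ A
    return X, 0.5 * (H + H.T)               # symmetrize H against rounding

rng = np.random.default_rng(2)
A = rng.standard_normal((50, 30))
Up, H = qdwh_polar(A)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
print(np.linalg.norm(Up - U @ Vt))          # agreement with the SVD-based polar factor
```

Each step is one QR factorization plus one GEMM, which is why the method maps well onto GPUs compared with the memory-bound bidiagonal reduction.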
  19. Numerical Algorithm. Algorithm 1: Pseudo-inverse using the QDWH-based partial SVD
      1. Compute the polar decomposition $A = U_pH$ using QDWH
      2. Calculate $[Q, R] = \mathrm{QR}(U_p + I)$
      3. Find the index $ind = \min(\mathrm{find}(|\mathrm{diag}(R)| < threshold))$
      4. Extract $\tilde{Q} = Q(:, ind{:}end)$
      5. Reduce the original matrix problem: $\tilde{A} = A\tilde{Q}$
      6. Compute the SVD of the reduced matrix problem: $\tilde{A} = U\Sigma\tilde{V}^T$
      7. Compute the right singular vectors: $V = \tilde{Q}\tilde{V}$
      8. Calculate the pseudo-inverse: $A^+ = V\Sigma^{-1}U^T$
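Step 2 of Algorithm 1 adds $U_p$ to an identity, which only makes sense for a square polar factor, so the slide leaves some details implicit. The sketch below is one hypothetical reading, not the authors' code. It assumes the polar decomposition is taken of the shifted symmetric matrix $B = \tau^2 I - A^TA$, whose polar factor equals its matrix sign ($+1$ on singular values below a cutoff `tau`, $-1$ above it). It also substitutes a plain Newton sign iteration for QDWH to stay short. Forming $A^TA$ squares the condition number, so this is for illustration only. Unpivoted QR further assumes the leading columns of $U_p + I$ are linearly independent, which holds generically but not always. The names `sign_newton`, `pinv_partial`, `tau`, and `threshold` are ours.

```python
import numpy as np

def sign_newton(B, tol=1e-12, max_iter=100):
    """Polar factor of a symmetric nonsingular B (equal to its matrix sign) via
    Newton's iteration X <- (X + X^{-1}) / 2. Swapped in for QDWH for brevity."""
    X = B.copy()
    for _ in range(max_iter):
        X_new = 0.5 * (X + np.linalg.inv(X))
        X_new = 0.5 * (X_new + X_new.T)        # keep the iterate symmetric
        if np.linalg.norm(X_new - X, "fro") < tol:
            return X_new
        X = X_new
    return X

def pinv_partial(A, tau, threshold=1e-8):
    """Hypothetical reading of Algorithm 1: keep singular triplets with sigma > tau."""
    n = A.shape[1]
    I = np.eye(n)
    Up = sign_newton(tau**2 * I - A.T @ A)     # +1 where sigma < tau, -1 where sigma > tau
    Q, R = np.linalg.qr(Up + I)                # Up + I = 2 x projector onto the sigma < tau space
    # First negligible diagonal entry of R (raises ValueError if no sigma exceeds tau)
    ind = np.flatnonzero(np.abs(np.diag(R)) < threshold).min()
    Q_t = Q[:, ind:]                           # orthonormal basis of the dominant subspace
    U, s, Vt_t = np.linalg.svd(A @ Q_t, full_matrices=False)  # reduced problem
    V = Q_t @ Vt_t.T                           # right singular vectors of A
    return V @ (U.T / s[:, None])              # A+ = V diag(1/s) U^T

rng = np.random.default_rng(3)
U0, _ = np.linalg.qr(rng.standard_normal((12, 8)))
V0, _ = np.linalg.qr(rng.standard_normal((8, 8)))
sig = np.array([10.0, 8.0, 6.0, 4.0, 1e-2, 5e-3, 1e-3, 5e-4])
A = (U0 * sig) @ V0.T                          # 12 x 8 with known singular values
A_pinv = pinv_partial(A, tau=1.0)
ref = (V0[:, :4] / sig[:4]) @ U0[:, :4].T      # truncated pseudo-inverse, top 4 triplets
print(np.allclose(A_pinv, ref))
```

Only the small $\tilde{A} = A\tilde{Q}$ (here 12 x 4) goes through a dense SVD, which mirrors the cost argument of the next slide.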
  20. Algorithmic Complexity ($n$: matrix size; $s$: number of selected singular values/vectors, $s \ll n$)
       Standard full SVD:        $22n^3$
       QDWH-based full SVD:      $43n^3$
       QDWH-based partial SVD:   QDWH: $(4 + \frac{1}{3})\,n^3 \times \#\mathrm{it}_{\mathrm{Chol}}$; QR and GEMM: $\frac{4}{3}n^3 + 2sn^2 + 2ns^2$; SVD: $22s^3$
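Plugging numbers into the cost model shows how the terms trade off. The sketch transcribes the table term by term; `it_chol` stands for the slide's $\#\mathrm{it}_{\mathrm{Chol}}$, and the values $n = 10000$, $s = 10\%$ of $n$, and three iterations are illustrative assumptions.

```python
def flops_standard_svd(n: int) -> float:
    """Standard full SVD: 22 n^3 flops."""
    return 22.0 * n**3

def flops_qdwh_partial_svd(n: int, s: int, it_chol: int) -> float:
    """QDWH-based partial SVD, term by term from the slide's cost model."""
    qdwh = (4.0 + 1.0 / 3.0) * n**3 * it_chol                       # QDWH iterations
    qr_gemm = (4.0 / 3.0) * n**3 + 2.0 * s * n**2 + 2.0 * n * s**2  # QR and GEMM
    svd_small = 22.0 * s**3                                          # SVD of the s-column problem
    return qdwh + qr_gemm + svd_small

n, s, it_chol = 10_000, 1_000, 3   # hypothetical: s = 10% of n, 3 iterations
ratio = flops_standard_svd(n) / flops_qdwh_partial_svd(n, s, it_chol)
print(f"standard / partial flop ratio: {ratio:.2f}")
```

The flop ratio alone (about 1.5 here) is smaller than the measured speedups of up to 5X; slide 13 explains the gap, since the standard approach spends most of its time in memory-bound Level-2 BLAS, while QDWH relies on QR, Cholesky, and GEMM.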
  21. Outline: Section 4, Environment Settings
  22. Environment Settings
     • Software: GCC compilers, MAGMA v2.3, and CUDA v9.0 (including cuBLAS); single-precision (SP) arithmetic is used; ill-conditioned matrices are generated with the MAGMA routine SLATMS
     • Hardware: K80 GPU (12 GB of memory) hosted on a two-socket 14-core Intel Broadwell system with 128 GB of main memory; P100 and V100 GPUs (16 GB of memory each) hosted on two-socket 16-core Intel Haswell systems with 128 GB of main memory
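For readers without MAGMA, a test matrix with a prescribed condition number can be built as $A = U\,\mathrm{diag}(\sigma)\,V^T$ with random orthonormal factors. The sketch below spaces the singular values geometrically from 1 down to $1/\mathrm{cond}$, similar in spirit to one of the distributions LAPACK's xLATMS test generator offers. It does not call SLATMS, and the function name and parameters are ours.

```python
import numpy as np

def ill_conditioned(m, n, cond, seed=0, dtype=np.float32):
    """Random m x n matrix whose singular values decay geometrically from 1
    to 1/cond (not a call into SLATMS)."""
    rng = np.random.default_rng(seed)
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))   # m x n, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))   # n x n orthogonal
    sigma = cond ** (-np.arange(n) / (n - 1))          # sigma_i = cond^(-i/(n-1))
    return ((U * sigma) @ V.T).astype(dtype)           # build in float64, then round

A = ill_conditioned(200, 100, cond=1e6, dtype=np.float64)
print(f"{np.linalg.cond(A):.3e}")
```

The default `dtype=np.float32` mirrors the single-precision setting above; the example uses float64 so that the condition number can be checked accurately.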
  23. Outline: Section 5, Numerical Accuracy
  24. Synthetic Ill-Conditioned Matrices, K80 [figure: (a) singular value accuracy and (b) backward error (residual of the SVD, left and right singular vectors) versus matrix size, for SGESDD and QDWHpartial computing 3%, 7%, 10%, and 13% of the SVD, plus 13% with QR+PO]
  25. Real Observational Datasets, K80 [figure: (c) singular value accuracy and (d) backward error (left and right residuals of the SVD) versus matrix size (1161, 5805, 11610, and 17415), for SGESDD and QDWHpartial]
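The slides do not define their accuracy metrics precisely, so the sketch below uses common, assumed definitions: the maximum relative error of the computed singular values against a reference, and the relative left and right residuals of the computed triplets. The example mimics a 10% partial SVD in single precision against a double-precision reference; the function name and matrix sizes are ours.

```python
import numpy as np

def svd_accuracy(A, U, s, V, s_ref):
    """Assumed metrics (not the slides' exact definitions):
    singular value accuracy = max_i |s_i - s_ref_i| / |s_ref_i|,
    right residual = ||A V - U diag(s)||_F / ||A||_F,
    left residual  = ||A^T U - V diag(s)||_F / ||A||_F."""
    sv_acc = np.max(np.abs(s - s_ref) / np.abs(s_ref))
    norm_a = np.linalg.norm(A, "fro")
    right = np.linalg.norm(A @ V - U * s, "fro") / norm_a
    left = np.linalg.norm(A.T @ U - V * s, "fro") / norm_a
    return sv_acc, left, right

rng = np.random.default_rng(4)
A = rng.standard_normal((300, 200))
k = 20                                               # keep 10% of the singular triplets
s_ref = np.linalg.svd(A, compute_uv=False)[:k]       # float64 reference values
U32, s32, Vt32 = np.linalg.svd(A.astype(np.float32), full_matrices=False)
U = U32[:, :k].astype(np.float64)
s = s32[:k].astype(np.float64)
V = Vt32[:k].T.astype(np.float64)
sv_acc, left, right = svd_accuracy(A, U, s, V, s_ref)
print(f"{sv_acc:.1e} {left:.1e} {right:.1e}")
```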
  26. Outline: Section 6, Performance Results
  27. Synthetic Ill-Conditioned Matrices, K80 [figure: (e) time in seconds and (f) performance in Gflop/s versus matrix size, for SGESDD and QDWHpartial computing 3%, 7%, 10%, and 13% of the SVD, plus 13% with QR+PO]. Up to 3X speedup, 1.8 Tflop/s, 45% of the theoretical peak performance
  28. Synthetic Ill-Conditioned Matrices, P100 [figure: (g) time in seconds and (h) performance in Gflop/s versus matrix size, for SGESDD and QDWHpartial computing 3%, 7%, 10%, and 13% of the SVD, plus 13% with QR+PO]. Up to 4X speedup, 7 Tflop/s, 75% of the theoretical peak performance
  29. Synthetic Ill-Conditioned Matrices, V100 [figure: (i) time in seconds and (j) performance in Gflop/s versus matrix size, for SGESDD and QDWHpartial computing 3%, 7%, 10%, and 13% of the SVD, plus 13% with QR+PO]. Up to 5X speedup, 9 Tflop/s, 65% of the theoretical peak performance
  30. Real Observational Datasets, V100 [figure: (k) time in seconds and (l) performance in Gflop/s versus matrix size (1161, 5805, 11610, and 17415), for SGESDD and QDWHpartial]. Up to 4X speedup
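The K80, P100, and V100 slides each pair an achieved rate with a fraction of theoretical peak; dividing one by the other recovers the single-precision peak the slides implicitly use. Both inputs are rounded on the slides, so these are back-of-the-envelope values, not vendor specifications.

```latex
\text{peak} \approx \frac{\text{achieved rate}}{\text{fraction of peak}}:\qquad
\text{K80: } \frac{1.8}{0.45} = 4.0,\qquad
\text{P100: } \frac{7}{0.75} \approx 9.3,\qquad
\text{V100: } \frac{9}{0.65} \approx 13.8 \quad (\text{Tflop/s})
```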
  31. Outline: Section 7, Conclusion and Future Work
  32. Conclusion and Future Work
     • Comprehensive accuracy/performance analysis of a novel QDWH-based partial SVD algorithm
     • Significant performance improvement of the QDWH-based partial SVD: up to 5X and 4X speedups against state-of-the-art implementations on synthetic ill-conditioned matrices and real datasets, respectively, across various hardware technologies
     • The pseudo-inverse simulation code has been deployed at the Subaru Telescope and has been operating since June 24, 2018!
     • Future work includes an asynchronous task-based QDWH-based partial SVD implementation and a multi-GPU QDWH-based partial SVD implementation
  33. Acknowledgments: Yuji Nakatsukasa, National Institute of Informatics, Tokyo, Japan; NVIDIA GPU Research Center; Cray Center of Excellence; Intel Parallel Computing Center
  34. The World’s Biggest Eye on the Sky (credits: ESO, http://www.eso.org/public/teles-instr/e-elt/)
  35. The World’s Biggest Eye on the Sky (credits: ESO, http://www.eso.org/public/teles-instr/e-elt/). It will be the largest optical/near-infrared telescope in the world, weighing about 2700 tons, with a main mirror diameter of 39 m. Location: Chile, South America. H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing one of the most challenging instruments (MOSAIC)
  36. Exciting Times for Astronomy at KAUST/ECRC! Supporting two major worldwide ground-based astronomy efforts: the E-ELT and the Subaru Telescope
  37. Bringing Astronomy Back Home ;-) (courtesy of CEMSE Communications, KAUST)
  38. The Hourglass Revisited. @KAUST_ECRC, https://www.facebook.com/ecrckaust
  39. Questions?
  40. Moving Forward with Extreme AO (diagram)
     • Standard control: last WFS measurement ($n$) → MVM with the control matrix ($m \times n$) → DM state ($m$)
     • Predictive control: last $N$ WFS measurements ($N \times n$) → MVM with the predictive control matrix ($m \times (N \times n)$) → DM state ($m$)
     • Sensor fusion and predictive control: last $N$ WFS measurements from each of sensors 1 to $K$ ($N \times n$ each) → MVM with the sensor fusion and predictive control matrix ($m \times (K \times N \times n)$) → DM state ($m$)
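The diagram describes each controller as a single matrix-vector multiply (MVM). Below is a minimal sketch of the predictive case, assuming the last $N$ measurements are stacked newest first (the slide does not specify an ordering). The class name `PredictiveController` and the sizes in the example are hypothetical. With $N = 1$ it reduces to the standard $m \times n$ control matrix.

```python
from collections import deque

import numpy as np

class PredictiveController:
    """Hypothetical sketch: the DM state (length m) is one MVM with an m x (N*n)
    matrix applied to the last N WFS measurements (each of length n), stacked."""

    def __init__(self, control_matrix: np.ndarray, n: int, N: int):
        assert control_matrix.shape[1] == N * n
        self.M = control_matrix
        self.history = deque([np.zeros(n)] * N, maxlen=N)  # newest measurement first

    def step(self, measurement: np.ndarray) -> np.ndarray:
        self.history.appendleft(measurement)                # oldest one drops off the end
        stacked = np.concatenate(list(self.history))        # length N * n
        return self.M @ stacked                             # DM state, length m

m, n, N = 4, 6, 3
rng = np.random.default_rng(5)
M = rng.standard_normal((m, N * n))                         # hypothetical predictive matrix
ctrl = PredictiveController(M, n, N)
meas = [rng.standard_normal(n) for _ in range(5)]
outs = [ctrl.step(x) for x in meas]
```

Sensor fusion would follow the same pattern: concatenate the histories of all $K$ sensors and apply a single $m \times (K \times N \times n)$ MVM, so the per-frame cost grows linearly with $K$ and $N$.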
