SlideShare a Scribd company logo
Extreme Computing for Extreme Adaptive Optics:
the Key to Finding Life Outside our Solar System
H. Ltaief1, D. Sukkari1, O. Guyon2,3,4, and D. Keyes1
1Extreme Computing Research Center, KAUST, Saudi Arabia
3Steward Observatory, University of Arizona, Tucson, USA
2National Institutes of Natural Sciences, Tokyo, Japan
4National Astronomical Observatory of Japan, Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 2 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 3 / 40
The Subaru Telescope
Represents a flagship telescope of the National Astronomical
Observatory of Japan
Carries a 8.2-meter (320in) diameter telescope
Contains a high-contrast imaging system for directly imaging
exoplanets
Operates perhaps the most advanced HPC facility in computational
astronomy
Lives at the Mauna Kea Observatory in Hawaii
HL, DS, OG, DK QDWH-Based Partial SVD 4 / 40
The Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 5 / 40
(Perhaps) The Highest in Altitude GPU System Recorded
at 14,000 feet!
HL, DS, OG, DK QDWH-Based Partial SVD 6 / 40
The Atmosphere Turbulence and The Optical Aberration
HL, DS, OG, DK QDWH-Based Partial SVD 7 / 40
The Astronomical Challenge
Turbulence in the atmosphere limits the
performance of astronomical telescopes
Without active correction of such defects, images
would be blurred to approximately one arcsecond
angle
To recover the loss of angular resolution, adaptive
optics (AO) systems measure and correct
atmospheric turbulence
In the absence of optical aberrations, the telescope
should provide λ
D angular resolution (D:telescope
diameter, λ wavelength)
HL, DS, OG, DK QDWH-Based Partial SVD 8 / 40
Adaptive Optics 101
Wavefront sensor(s)
(WFS)
Measure details of
blurring from ’guide
star’ near the object
you want to observe
A real time controller
(RTC)
Processes the WFS
signals to compute
the control matrix
based on the pseudo
inverse
Light from both guide
star and astronomical
object is reflected
from deformable
mirror; distortions are
removed
https : //www.uniдe.ch/sciences/astro/index .php/download_f ile/view/34/168/
HL, DS, OG, DK QDWH-Based Partial SVD 9 / 40
How AO Works?
HL, DS, OG, DK QDWH-Based Partial SVD 10 / 40
And Here Comes the Linear Algebra...
Compute the pseudo inverse A+
AA+
A = A ,A ∈ Rm×n
(m ≥ n)
The numerical challenge of the pseudo inverse are twofold:
Numerical: dealing with rectangular matrix which may engender
numerical instabilities
A A = V ΛV
Computational: high algorithmic complexity, it should still be able to
keep up with the overall throughput of the AO framework
Using SVD: A = U ΣV then:
A+
= V Σ−1
U
Only most significant singular values with their associated singular
vectors are required (≈10%)
HL, DS, OG, DK QDWH-Based Partial SVD 11 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 12 / 40
SVD Algorithm - Standard Approach (LAPACK)
This forms the SGESDD (divide and conquer algorithm) for the computation
of SVD.
Bidiagonal Reduction (8/3n3) SGESDD (Σ)(8/3n3) SGESDD (UΣV )(22n3)
Level-2 BLAS (4/3n3)
50%flops 50%flops 6%flops
90%time 85%time 30%time
HL, DS, OG, DK QDWH-Based Partial SVD 13 / 40
Hardware Trends: Energy Matters!
2011 2018
DP FLOP 100 pJ 10 pJ
DP DRAM Read 4800 pJ 1920 pJ
Local interconnect 7500 pJ 2500 pJ
Cross system 9000 pJ 3500 pJ
John Shalf, LBNL
HL, DS, OG, DK QDWH-Based Partial SVD 14 / 40
The Big Picture (Similar w/ SVD)
Cray
LibSci17.11.1
HL, DS, OG, DK QDWH-Based Partial SVD 15 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 16 / 40
What is The Polar Decomposition?
The polar decomposition:
A = UpH , A ∈ Rm×n
(m ≥ n),
where Up is an orthogonal matrix and H =
√
A A is a symmetric positive
semidefinite matrix
The polar decomposition is a critical numerical algorithm for various
applications, including aerospace computations, chemistry, factor
analysis
HL, DS, OG, DK QDWH-Based Partial SVD 17 / 40
QDWH Polar Decomposition Algorithm
The QR-Dynamically Weighted Halley iterations:
X0 = A/α,
√
ckXk
I
=
Q1
Q2
R, Xk+1 =
bk
ck
Xk +
1
√
ck
ak −
bk
ck
Q1Q2
, k ≥ 0
The iterative procedure converges:
A = UpH,
where, UpUp = In, H is symmetric positive semidefinite
Backward stable algorithm for computing the polar decomposition
Based on conventional computational kernels, i.e., Cholesky/QR
factorizations (≤ 6 iterations for double precision) and GEMM
HL, DS, OG, DK QDWH-Based Partial SVD 18 / 40
Numerical Algorithm
Algorithm 1 Pseudo-Inverse using the QDWH-Based Partial SVD.
Compute the polar decomposition A = UpH using QDWH
Calculate [Q R] = QR(Up + Id)
Find the index ind = min(f ind(abs(diaд(R)) < threshold))
Extract ˜Q = Q(:,ind : end)
Reduce the original matrix problem ˜A = A × ˜Q
Compute the SVD of the reduced matrix problem ˜A = U Σ ˜VT
Compute the right singular vectors V = ˜QT × ˜V
Calculate the pseudo-inverse A+ = V Σ−1UT
HL, DS, OG, DK QDWH-Based Partial SVD 19 / 40
Algorithmic Complexity
Standard QDWH-based QDWH-based
SVD Full SVD Partial SVD
QDWH: (4+1/3)Nn3 x #itChol
Algorithmic 22Nn3 43Nn3 QR and GEMM: 4/3Nn3 + 2sNn2 + 2Nns2
complexity SVD: 22s3
Where, Nn is the matrix size, and s is the number of the selected singular values/vectors
(s << Nn)
HL, DS, OG, DK QDWH-Based Partial SVD 20 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 21 / 40
Environment Settings
Software:
GCC compilers
MAGMA v2.3 and CUDA v9.0 (including cuBLAS)
Single precision (SP) arithmetics is used
Ill-conditioned matrices generated using SLATMS MAGMA routine
Hardware:
The K80 GPU:
12GB of memory,
Two-socket 14-core system
Intel Broadwell system with 128GB of main memory
The P100 and V100 GPUs:
16GB of memory
Two-socket 16-core system
Intel Haswell systems with 128GB of main memory
HL, DS, OG, DK QDWH-Based Partial SVD 22 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 23 / 40
Synthetic Ill-Conditioned matrices, K80
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000011000120001300014000150001600017000180001900020000
AccuracySingularValues
Matrix size
SGESDD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(a) Singular Value Accuracy.
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000011000120001300014000150001600017000180001900020000
ResidualofSVD Matrix Size
QDWHpartial, Left, 13% SVD
QDWHpartial, Right, 13% SVD
QDWHpartial, Left, 13% SVD, QR+PO
QDWHpartial, Right, 13% SVD, QR+PO
QDWHpartial, Left, 10% SVD
QDWHpartial, Right, 10% SVD
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000011000120001300014000150001600017000180001900020000
ResidualofSVD Matrix Size
QDWHpartial, Left, 7% SVD
QDWHpartial, Right, 7% SVD
QDWHpartial, Left, 3% SVD
QDWHpartial, Right, 3% SVD
SGESDD, Left
SGESDD, Right
(b) Backward Error.
HL, DS, OG, DK QDWH-Based Partial SVD 24 / 40
Real Observational Datasets, K80
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1161 5805 11610 17415
AccuracySingularValues
Matrix size
SGESDD
QDWHpartial
(c) Singular Value Accuracy.
1e-12
1e-11
1e-10
1e-09
1e-08
1e-07
1e-06
1e-05
0.0001
0.001
0.01
0.1
1
1161 5805 11610 17415
ResidualofSVD Matrix size
Right
Left
SGESDD, Left
SGESDD, Right
(d) Backward Error.
HL, DS, OG, DK QDWH-Based Partial SVD 25 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 26 / 40
Synthetic Ill-Conditioned matrices, K80
0.1
1
10
100
1000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Time(s)
Matrix size
SGESDD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 13% SVD
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(e) In Seconds.
0
500
1000
1500
2000
2500
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Gflop/s
Matrix size
QDWHpartial, 3% SVD
QDWHpartial, 7% SVD
QDWHpartial, 10% SVD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
SGESDD
(f) In Gflops/s.
Up to 3X speedup, 1.8Tflop/s, 45% of the theoretical peak performance
HL, DS, OG, DK QDWH-Based Partial SVD 27 / 40
Synthetic Ill-Conditioned matrices, P100
0.01
0.1
1
10
100
1000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Time(s)
Matrix size
SGESDD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 13% SVD
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(g) In Seconds.
0
1000
2000
3000
4000
5000
6000
7000
8000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Gflop/s
Matrix size
QDWHpartial, 3% SVD
QDWHpartial, 7% SVD
QDWHpartial, 10% SVD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
SGESDD
(h) In Gflops/s.
Up to 4X speedup, 7Tflop/s, 75% of the theoretical peak performance
HL, DS, OG, DK QDWH-Based Partial SVD 28 / 40
Synthetic Ill-Conditioned matrices, V100
0.01
0.1
1
10
100
1000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Time(s)
Matrix size
SGESDD
QDWHpartial, 13% SVD, QR+PO
QDWHpartial, 13% SVD
QDWHpartial, 10% SVD
QDWHpartial, 7% SVD
QDWHpartial, 3% SVD
(i) In Seconds.
0
1000
2000
3000
4000
5000
6000
7000
8000
9000
1000
2000
3000
4000
5000
6000
7000
8000
9000
10000
11000
12000
13000
14000
15000
16000
17000
18000
19000
20000
Gflop/s
Matrix size
QDWHpartial, 3% SVD
QDWHpartial, 7% SVD
QDWHpartial, 10% SVD
QDWHpartial, 13% SVD
QDWHpartial, 13% SVD, QR+PO
SGESDD
(j) In Gflops/s.
Up to 5X speedup, 9Tflop/s, 65% of the theoretical peak performance
HL, DS, OG, DK QDWH-Based Partial SVD 29 / 40
Real Observational Datasets, V100
0.1
1
10
100
1161
5805
11610
17415
Time(s)
Matrix size
SGESDD
QDWHpartial
(k) In Seconds.
0
1000
2000
3000
4000
5000
6000
7000
8000
1161
5805
11610
17415
Gflop/s
Matrix size
QDWHpartial
SGESDD
(l) In Gflops/s.
Up to 4X speedup
HL, DS, OG, DK QDWH-Based Partial SVD 30 / 40
Outline
1 Motivation
2 State-of-the-art Approach
3 QDWH-Based Partial SVD
4 Environment Settings
5 Numerical Accuracy
6 Performance Results
7 Conclusion and Future Work
HL, DS, OG, DK QDWH-Based Partial SVD 31 / 40
Conclusion and Future Work
Comprehensive accuracy/performance analysis of a novel
QDWH-based partial SVD algorithm
Significant performance improvement of the QDWH-based partial
SVD: up to 5X and 4X against state-of-the-art implementations on
synthetic ill-conditioned matrices and real datasets across various
hardware technologies
The pseudo inverse simulation code has been deployed at the Subaru
telescope and operating since June 24th 2018!
Future work includes:
Asynchronous task-based QDWH-based partial SVD implementation
Multi-GPUs QDWH-based partial SVD implementation
HL, DS, OG, DK QDWH-Based Partial SVD 32 / 40
Acknowledgments
Yuji Nakatsukasa, National Institute of Informatics @ Tokyo, Japan
NVIDIA GPU Research Center
Cray Center of Excellence
Intel Parallel Computing Center
HL, DS, OG, DK QDWH-Based Partial SVD 33 / 40
The World’s Biggest Eye on The Sky
Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/)
HL, DS, OG, DK QDWH-Based Partial SVD 34 / 40
The World’s Biggest Eye on The Sky
Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/)
The largest optical/near-infrared telescope in the world.
It will weigh about 2700 tons with a main mirror diameter of 39m.
Location: Chile, South America.
H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics
Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing
one of the most challenging instruments (MOSAIC)
HL, DS, OG, DK QDWH-Based Partial SVD 35 / 40
Exciting Time for Astronomy at KAUST/ECRC!
Supporting two major worldwide ground-based astronomy efforts
The E-ELT Telescope The Subaru Telescope
HL, DS, OG, DK QDWH-Based Partial SVD 36 / 40
Bringing Astronomy Back Home ;-)
Courtesy from CEMSE Communications, KAUST
HL, DS, OG, DK QDWH-Based Partial SVD 37 / 40
The Hourglass Revisited
@KAUST_ECRC
https://www.facebook.com/ecrckaust
HL, DS, OG, DK QDWH-Based Partial SVD 38 / 40
Questions?
HL, DS, OG, DK QDWH-Based Partial SVD 39 / 40
Moving Forward with Extreme AO
Last N WFS
measurements
sensor 1
N x n
Last N WFS
measurements
sensor K
N x n
MVM
Last N WFS
measurements
N x n
MVM
Last WFS
measurement
n
MVM
DM state
m
DM state
m
DM state
m
Control Matrix
m x n
Predictive Control Matrix
m x ( N x n )
Sensor Fusion and
Predictive control Matrix
m x ( K x N x n )
HL, DS, OG, DK QDWH-Based Partial SVD 40 / 40

More Related Content

Similar to Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System

QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
Austin Benson
 
A next-generation ground array for the detection of ultrahigh-energy cosmic r...
A next-generation ground array for the detection of ultrahigh-energy cosmic r...A next-generation ground array for the detection of ultrahigh-energy cosmic r...
A next-generation ground array for the detection of ultrahigh-energy cosmic r...
Toshihiro FUJII
 
vision navigation
vision navigationvision navigation
vision navigation
Ranjeet Raushan
 
Xray interferometry
Xray interferometryXray interferometry
Xray interferometry
Clifford Stone
 
Polynomial Matrix Decompositions
Polynomial Matrix DecompositionsPolynomial Matrix Decompositions
Polynomial Matrix Decompositions
Förderverein Technische Fakultät
 
16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf
Nishant835443
 
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computationDSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
Deltares
 
Hairong Qi V Swaminathan
Hairong Qi V SwaminathanHairong Qi V Swaminathan
Hairong Qi V Swaminathan
FNian
 
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfreservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
RTEFGDFGJU
 
Presentació renovables
Presentació renovablesPresentació renovables
Presentació renovables
Jordi Cusido
 
Be lab manual csvtu
Be lab manual csvtuBe lab manual csvtu
Be lab manual csvtu
Vivek Kumar Sinha
 
4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx
grssieee
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
N. I. Md. Ashafuddula
 
Jim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour LandscapesJim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour Landscapes
James T. Metz
 
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORTCAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
Massimo Garavaglia
 
Svd filtered temporal usage clustering
Svd filtered temporal usage clusteringSvd filtered temporal usage clustering
Svd filtered temporal usage clustering
Liang Xie, PhD
 
Differential Modulation and Non-Coherent Detection in Wireless Relay Networks
Differential Modulation and Non-Coherent Detection in Wireless Relay NetworksDifferential Modulation and Non-Coherent Detection in Wireless Relay Networks
Differential Modulation and Non-Coherent Detection in Wireless Relay Networks
mravendi
 
Task - Surround - Ambient Lighting
 Task - Surround - Ambient Lighting Task - Surround - Ambient Lighting
Task - Surround - Ambient Lighting
Cindy Foster-Warthen
 
final.pptx
final.pptxfinal.pptx
final.pptx
RobertStephen14
 
2 simulation in aodv and dsr
2 simulation in aodv and dsr2 simulation in aodv and dsr
2 simulation in aodv and dsr
Abhishek Kesharwani
 

Similar to Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System (20)

QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
QR Factorizations and SVDs for Tall-and-skinny Matrices in MapReduce Architec...
 
A next-generation ground array for the detection of ultrahigh-energy cosmic r...
A next-generation ground array for the detection of ultrahigh-energy cosmic r...A next-generation ground array for the detection of ultrahigh-energy cosmic r...
A next-generation ground array for the detection of ultrahigh-energy cosmic r...
 
vision navigation
vision navigationvision navigation
vision navigation
 
Xray interferometry
Xray interferometryXray interferometry
Xray interferometry
 
Polynomial Matrix Decompositions
Polynomial Matrix DecompositionsPolynomial Matrix Decompositions
Polynomial Matrix Decompositions
 
16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf16Mitch-BrachyPrimaryStandards.pdf
16Mitch-BrachyPrimaryStandards.pdf
 
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computationDSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
DSD-INT - SWAN Advanced Course - 02 - Setting up a SWAN computation
 
Hairong Qi V Swaminathan
Hairong Qi V SwaminathanHairong Qi V Swaminathan
Hairong Qi V Swaminathan
 
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdfreservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
reservoir-modeling-using-matlab-the-matalb-reservoir-simulation-toolbox-mrst.pdf
 
Presentació renovables
Presentació renovablesPresentació renovables
Presentació renovables
 
Be lab manual csvtu
Be lab manual csvtuBe lab manual csvtu
Be lab manual csvtu
 
4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx4_IGARSS11_younis_NoGIFv4.pptx
4_IGARSS11_younis_NoGIFv4.pptx
 
PCA and SVD in brief
PCA and SVD in briefPCA and SVD in brief
PCA and SVD in brief
 
Jim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour LandscapesJim Metz.SAR Contour Landscapes
Jim Metz.SAR Contour Landscapes
 
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORTCAPACITIVE SENSORS ELECTRICAL WAFER SORT
CAPACITIVE SENSORS ELECTRICAL WAFER SORT
 
Svd filtered temporal usage clustering
Svd filtered temporal usage clusteringSvd filtered temporal usage clustering
Svd filtered temporal usage clustering
 
Differential Modulation and Non-Coherent Detection in Wireless Relay Networks
Differential Modulation and Non-Coherent Detection in Wireless Relay NetworksDifferential Modulation and Non-Coherent Detection in Wireless Relay Networks
Differential Modulation and Non-Coherent Detection in Wireless Relay Networks
 
Task - Surround - Ambient Lighting
 Task - Surround - Ambient Lighting Task - Surround - Ambient Lighting
Task - Surround - Ambient Lighting
 
final.pptx
final.pptxfinal.pptx
final.pptx
 
2 simulation in aodv and dsr
2 simulation in aodv and dsr2 simulation in aodv and dsr
2 simulation in aodv and dsr
 

More from inside-BigData.com

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
inside-BigData.com
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
inside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
inside-BigData.com
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
inside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
inside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
inside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
inside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
inside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
inside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
inside-BigData.com
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
inside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
inside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
inside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
inside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
inside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
inside-BigData.com
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
inside-BigData.com
 

More from inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 

Recently uploaded

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
Chart Kalyan
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
DanBrown980551
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
Pixlogix Infotech
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Alpen-Adria-Universität
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
Antonios Katsarakis
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
Tomaz Bratanic
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
ssuserfac0301
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
Jason Packer
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
marufrahmanstratejm
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
saastr
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
Miro Wengner
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
tolgahangng
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
innovationoecd
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
Edge AI and Vision Alliance
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
alexjohnson7307
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
Postman
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Wask
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
Tatiana Kojar
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
Zilliz
 

Recently uploaded (20)

How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdfHow to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
How to Interpret Trends in the Kalyan Rajdhani Mix Chart.pdf
 
5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides5th LF Energy Power Grid Model Meet-up Slides
5th LF Energy Power Grid Model Meet-up Slides
 
Best 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERPBest 20 SEO Techniques To Improve Website Visibility In SERP
Best 20 SEO Techniques To Improve Website Visibility In SERP
 
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing InstancesEnergy Efficient Video Encoding for Cloud and Edge Computing Instances
Energy Efficient Video Encoding for Cloud and Edge Computing Instances
 
Dandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity serverDandelion Hashtable: beyond billion requests per second on a commodity server
Dandelion Hashtable: beyond billion requests per second on a commodity server
 
GraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracyGraphRAG for Life Science to increase LLM accuracy
GraphRAG for Life Science to increase LLM accuracy
 
Taking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdfTaking AI to the Next Level in Manufacturing.pdf
Taking AI to the Next Level in Manufacturing.pdf
 
Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024Columbus Data & Analytics Wednesdays - June 2024
Columbus Data & Analytics Wednesdays - June 2024
 
Public CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptxPublic CyberSecurity Awareness Presentation 2024.pptx
Public CyberSecurity Awareness Presentation 2024.pptx
 
TrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy SurveyTrustArc Webinar - 2024 Global Privacy Survey
TrustArc Webinar - 2024 Global Privacy Survey
 
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
Overcoming the PLG Trap: Lessons from Canva's Head of Sales & Head of EMEA Da...
 
JavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green MasterplanJavaLand 2024: Application Development Green Masterplan
JavaLand 2024: Application Development Green Masterplan
 
Serial Arm Control in Real Time Presentation
Serial Arm Control in Real Time PresentationSerial Arm Control in Real Time Presentation
Serial Arm Control in Real Time Presentation
 
Presentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of GermanyPresentation of the OECD Artificial Intelligence Review of Germany
Presentation of the OECD Artificial Intelligence Review of Germany
 
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
“Temporal Event Neural Networks: A More Efficient Alternative to the Transfor...
 
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
leewayhertz.com-AI in predictive maintenance Use cases technologies benefits ...
 
WeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation TechniquesWeTestAthens: Postman's AI & Automation Techniques
WeTestAthens: Postman's AI & Automation Techniques
 
Digital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying AheadDigital Marketing Trends in 2024 | Guide for Staying Ahead
Digital Marketing Trends in 2024 | Guide for Staying Ahead
 
Skybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoptionSkybuffer SAM4U tool for SAP license adoption
Skybuffer SAM4U tool for SAP license adoption
 
Generating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and MilvusGenerating privacy-protected synthetic data using Secludy and Milvus
Generating privacy-protected synthetic data using Secludy and Milvus
 

Extreme Computing for Extreme Adaptive Optics: The Key to Finding Life Outside our Solar System

  • 1. Extreme Computing for Extreme Adaptive Optics: the Key to Finding Life Outside our Solar System H. Ltaief1, D. Sukkari1, O. Guyon2,3,4, and D. Keyes1 1Extreme Computing Research Center, KAUST, Saudi Arabia 3Steward Observatory, University of Arizona, Tucson, USA 2National Institutes of Natural Sciences, Tokyo, Japan 4National Astronomical Observatory of Japan, Subaru Telescope HL, DS, OG, DK QDWH-Based Partial SVD 1 / 40
  • 2. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 2 / 40
  • 3. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 3 / 40
  • 4. The Subaru Telescope Represents a flagship telescope of the National Astronomical Observatory of Japan Carries a 8.2-meter (320in) diameter telescope Contains a high-contrast imaging system for directly imaging exoplanets Operates perhaps the most advanced HPC facility in computational astronomy Lives at the Mauna Kea Observatory in Hawaii HL, DS, OG, DK QDWH-Based Partial SVD 4 / 40
  • 5. The Subaru Telescope HL, DS, OG, DK QDWH-Based Partial SVD 5 / 40
  • 6. (Perhaps) The Highest in Altitude GPU System Recorded at 14,000 feet! HL, DS, OG, DK QDWH-Based Partial SVD 6 / 40
  • 7. The Atmosphere Turbulence and The Optical Aberration HL, DS, OG, DK QDWH-Based Partial SVD 7 / 40
  • 8. The Astronomical Challenge Turbulence in the atmosphere limits the performance of astronomical telescopes Without active correction of such defects, images would be blurred to approximately one arcsecond angle To recover the loss of angular resolution, adaptive optics (AO) systems measure and correct atmospheric turbulence In the absence of optical aberrations, the telescope should provide λ D angular resolution (D:telescope diameter, λ wavelength) HL, DS, OG, DK QDWH-Based Partial SVD 8 / 40
  • 9. Adaptive Optics 101 Wavefront sensor(s) (WFS) Measure details of blurring from ’guide star’ near the object you want to observe A real time controller (RTC) Processes the WFS signals to compute the control matrix based on the pseudo inverse Light from both guide star and astronomical object is reflected from deformable mirror; distortions are removed https : //www.uniдe.ch/sciences/astro/index .php/download_f ile/view/34/168/ HL, DS, OG, DK QDWH-Based Partial SVD 9 / 40
  • 10. How AO Works? HL, DS, OG, DK QDWH-Based Partial SVD 10 / 40
  • 11. And Here Comes the Linear Algebra... Compute the pseudo inverse A+ AA+ A = A ,A ∈ Rm×n (m ≥ n) The numerical challenge of the pseudo inverse are twofold: Numerical: dealing with rectangular matrix which may engender numerical instabilities A A = V ΛV Computational: high algorithmic complexity, it should still be able to keep up with the overall throughput of the AO framework Using SVD: A = U ΣV then: A+ = V Σ−1 U Only most significant singular values with their associated singular vectors are required (≈10%) HL, DS, OG, DK QDWH-Based Partial SVD 11 / 40
  • 12. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 12 / 40
  • 13. SVD Algorithm - Standard Approach (LAPACK) This forms the SGESDD (divide and conquer algorithm) for the computation of SVD. Bidiagonal Reduction (8/3n3) SGESDD (Σ)(8/3n3) SGESDD (UΣV )(22n3) Level-2 BLAS (4/3n3) 50%flops 50%flops 6%flops 90%time 85%time 30%time HL, DS, OG, DK QDWH-Based Partial SVD 13 / 40
  • 14. Hardware Trends: Energy Matters! 2011 2018 DP FLOP 100 pJ 10 pJ DP DRAM Read 4800 pJ 1920 pJ Local interconnect 7500 pJ 2500 pJ Cross system 9000 pJ 3500 pJ John Shalf, LBNL HL, DS, OG, DK QDWH-Based Partial SVD 14 / 40
  • 15. The Big Picture (Similar w/ SVD) Cray LibSci17.11.1 HL, DS, OG, DK QDWH-Based Partial SVD 15 / 40
  • 16. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 16 / 40
  • 17. What is The Polar Decomposition? The polar decomposition: A = UpH , A ∈ Rm×n (m ≥ n), where Up is an orthogonal matrix and H = √ A A is a symmetric positive semidefinite matrix The polar decomposition is a critical numerical algorithm for various applications, including aerospace computations, chemistry, factor analysis HL, DS, OG, DK QDWH-Based Partial SVD 17 / 40
  • 18. QDWH Polar Decomposition Algorithm The QR-Dynamically Weighted Halley iterations: X0 = A/α, √ ckXk I = Q1 Q2 R, Xk+1 = bk ck Xk + 1 √ ck ak − bk ck Q1Q2 , k ≥ 0 The iterative procedure converges: A = UpH, where, UpUp = In, H is symmetric positive semidefinite Backward stable algorithm for computing the polar decomposition Based on conventional computational kernels, i.e., Cholesky/QR factorizations (≤ 6 iterations for double precision) and GEMM HL, DS, OG, DK QDWH-Based Partial SVD 18 / 40
  • 19. Numerical Algorithm Algorithm 1 Pseudo-Inverse using the QDWH-Based Partial SVD. Compute the polar decomposition A = UpH using QDWH Calculate [Q R] = QR(Up + Id) Find the index ind = min(f ind(abs(diaд(R)) < threshold)) Extract ˜Q = Q(:,ind : end) Reduce the original matrix problem ˜A = A × ˜Q Compute the SVD of the reduced matrix problem ˜A = U Σ ˜VT Compute the right singular vectors V = ˜QT × ˜V Calculate the pseudo-inverse A+ = V Σ−1UT HL, DS, OG, DK QDWH-Based Partial SVD 19 / 40
  • 20. Algorithmic Complexity Standard QDWH-based QDWH-based SVD Full SVD Partial SVD QDWH: (4+1/3)Nn3 x #itChol Algorithmic 22Nn3 43Nn3 QR and GEMM: 4/3Nn3 + 2sNn2 + 2Nns2 complexity SVD: 22s3 Where, Nn is the matrix size, and s is the number of the selected singular values/vectors (s << Nn) HL, DS, OG, DK QDWH-Based Partial SVD 20 / 40
  • 21. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 21 / 40
  • 22. Environment Settings Software: GCC compilers MAGMA v2.3 and CUDA v9.0 (including cuBLAS) Single precision (SP) arithmetics is used Ill-conditioned matrices generated using SLATMS MAGMA routine Hardware: The K80 GPU: 12GB of memory, Two-socket 14-core system Intel Broadwell system with 128GB of main memory The P100 and V100 GPUs: 16GB of memory Two-socket 16-core system Intel Haswell systems with 128GB of main memory HL, DS, OG, DK QDWH-Based Partial SVD 22 / 40
  • 23. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 23 / 40
  • 24. Synthetic Ill-Conditioned matrices, K80 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000011000120001300014000150001600017000180001900020000 AccuracySingularValues Matrix size SGESDD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (a) Singular Value Accuracy. 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000011000120001300014000150001600017000180001900020000 ResidualofSVD Matrix Size QDWHpartial, Left, 13% SVD QDWHpartial, Right, 13% SVD QDWHpartial, Left, 13% SVD, QR+PO QDWHpartial, Right, 13% SVD, QR+PO QDWHpartial, Left, 10% SVD QDWHpartial, Right, 10% SVD 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000011000120001300014000150001600017000180001900020000 ResidualofSVD Matrix Size QDWHpartial, Left, 7% SVD QDWHpartial, Right, 7% SVD QDWHpartial, Left, 3% SVD QDWHpartial, Right, 3% SVD SGESDD, Left SGESDD, Right (b) Backward Error. HL, DS, OG, DK QDWH-Based Partial SVD 24 / 40
  • 25. Real Observational Datasets, K80 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1161 5805 11610 17415 AccuracySingularValues Matrix size SGESDD QDWHpartial (c) Singular Value Accuracy. 1e-12 1e-11 1e-10 1e-09 1e-08 1e-07 1e-06 1e-05 0.0001 0.001 0.01 0.1 1 1161 5805 11610 17415 ResidualofSVD Matrix size Right Left SGESDD, Left SGESDD, Right (d) Backward Error. HL, DS, OG, DK QDWH-Based Partial SVD 25 / 40
  • 26. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 26 / 40
  • 27. Synthetic Ill-Conditioned matrices, K80 0.1 1 10 100 1000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Time(s) Matrix size SGESDD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 13% SVD QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (e) In Seconds. 0 500 1000 1500 2000 2500 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Gflop/s Matrix size QDWHpartial, 3% SVD QDWHpartial, 7% SVD QDWHpartial, 10% SVD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO SGESDD (f) In Gflops/s. Up to 3X speedup, 1.8Tflop/s, 45% of the theoretical peak performance HL, DS, OG, DK QDWH-Based Partial SVD 27 / 40
  • 28. Synthetic Ill-Conditioned matrices, P100 0.01 0.1 1 10 100 1000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Time(s) Matrix size SGESDD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 13% SVD QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (g) In Seconds. 0 1000 2000 3000 4000 5000 6000 7000 8000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Gflop/s Matrix size QDWHpartial, 3% SVD QDWHpartial, 7% SVD QDWHpartial, 10% SVD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO SGESDD (h) In Gflops/s. Up to 4X speedup, 7Tflop/s, 75% of the theoretical peak performance HL, DS, OG, DK QDWH-Based Partial SVD 28 / 40
  • 29. Synthetic Ill-Conditioned matrices, V100 0.01 0.1 1 10 100 1000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Time(s) Matrix size SGESDD QDWHpartial, 13% SVD, QR+PO QDWHpartial, 13% SVD QDWHpartial, 10% SVD QDWHpartial, 7% SVD QDWHpartial, 3% SVD (i) In Seconds. 0 1000 2000 3000 4000 5000 6000 7000 8000 9000 1000 2000 3000 4000 5000 6000 7000 8000 9000 10000 11000 12000 13000 14000 15000 16000 17000 18000 19000 20000 Gflop/s Matrix size QDWHpartial, 3% SVD QDWHpartial, 7% SVD QDWHpartial, 10% SVD QDWHpartial, 13% SVD QDWHpartial, 13% SVD, QR+PO SGESDD (j) In Gflops/s. Up to 5X speedup, 9Tflop/s, 65% of the theoretical peak performance HL, DS, OG, DK QDWH-Based Partial SVD 29 / 40
  • 30. Real Observational Datasets, V100 0.1 1 10 100 1161 5805 11610 17415 Time(s) Matrix size SGESDD QDWHpartial (k) In Seconds. 0 1000 2000 3000 4000 5000 6000 7000 8000 1161 5805 11610 17415 Gflop/s Matrix size QDWHpartial SGESDD (l) In Gflops/s. Up to 4X speedup HL, DS, OG, DK QDWH-Based Partial SVD 30 / 40
  • 31. Outline 1 Motivation 2 State-of-the-art Approach 3 QDWH-Based Partial SVD 4 Environment Settings 5 Numerical Accuracy 6 Performance Results 7 Conclusion and Future Work HL, DS, OG, DK QDWH-Based Partial SVD 31 / 40
  • 32. Conclusion and Future Work Comprehensive accuracy/performance analysis of a novel QDWH-based partial SVD algorithm Significant performance improvement of the QDWH-based partial SVD: up to 5X and 4X against state-of-the-art implementations on synthetic ill-conditioned matrices and real datasets across various hardware technologies The pseudo inverse simulation code has been deployed at the Subaru telescope and operating since June 24th 2018! Future work includes: Asynchronous task-based QDWH-based partial SVD implementation Multi-GPUs QDWH-based partial SVD implementation HL, DS, OG, DK QDWH-Based Partial SVD 32 / 40
  • 33. Acknowledgments Yuji Nakatsukasa, National Institute of Informatics @ Tokyo, Japan NVIDIA GPU Research Center Cray Center of Excellence Intel Parallel Computing Center HL, DS, OG, DK QDWH-Based Partial SVD 33 / 40
  • 34. The World’s Biggest Eye on The Sky Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/) HL, DS, OG, DK QDWH-Based Partial SVD 34 / 40
  • 35. The World’s Biggest Eye on The Sky Credits: ESO (http://www.eso.org/public/teles-instr/e-elt/) The largest optical/near-infrared telescope in the world. It will weigh about 2700 tons with a main mirror diameter of 39m. Location: Chile, South America. H. Ltaief et al., Real-Time Massively Distributed Multi-Object Adaptive Optics Simulations for the European Extremely Large Telescope, IEEE IPDPS 2018: designing one of the most challenging instruments (MOSAIC) HL, DS, OG, DK QDWH-Based Partial SVD 35 / 40
  • 36. Exciting Time for Astronomy at KAUST/ECRC! Supporting two major worldwide ground-based astronomy efforts The E-ELT Telescope The Subaru Telescope HL, DS, OG, DK QDWH-Based Partial SVD 36 / 40
  • 37. Bringing Astronomy Back Home ;-) Courtesy from CEMSE Communications, KAUST HL, DS, OG, DK QDWH-Based Partial SVD 37 / 40
  • 39. Questions? HL, DS, OG, DK QDWH-Based Partial SVD 39 / 40
  • 40. Moving Forward with Extreme AO Last N WFS measurements sensor 1 N x n Last N WFS measurements sensor K N x n MVM Last N WFS measurements N x n MVM Last WFS measurement n MVM DM state m DM state m DM state m Control Matrix m x n Predictive Control Matrix m x ( N x n ) Sensor Fusion and Predictive control Matrix m x ( K x N x n ) HL, DS, OG, DK QDWH-Based Partial SVD 40 / 40