H-Cholesky on Manycore
- 2. Copyright © 2015, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others. Optimization Notice
Background
If A is a symmetric positive definite matrix, its Cholesky factorization is A = LLᵀ, where L is lower triangular.
Data matrices representing numerical observations, such as proximity or correlation matrices, are often huge and hard to analyze. Decomposing them into lower-order or lower-rank canonical forms reveals the inherent characteristics and structure of the matrices and makes their meaning easier to interpret.
- 3.
Hierarchical Matrix
Hierarchical matrices (H-matrices) are a powerful tool to represent dense
matrices coming from integral equations or partial differential equations in a
hierarchical, block-oriented, data-sparse way with log-linear memory costs.
- 4.
Hierarchical Matrix
- 5.
Intel Confidential
Implementation: Inadmissible Leaves
The product index set resolves into admissible and inadmissible leaves of the tree. Assembly, storage, and matrix-vector multiplication differ for the two corresponding classes of sub-matrices.
Inadmissible Leaves:
- 6.
Implementation: Admissible Leaves
The product index set resolves into admissible and inadmissible leaves of the tree. Assembly, storage, and matrix-vector multiplication differ for the two corresponding classes of sub-matrices.
Admissible Leaves:
- 7.
Hierarchical Matrix Representation
- 8.
Profiling
- 9.
Compiler Optimization – Full matrix
icc opt1: icc with basic optimizations such as -O2.
icc opt2: icc with further optimizations such as -msse4.2 -O3.
icc mkl: icc opt2 plus the corresponding MKL function.
- 10.
Numerical Libraries Optimization – Full matrix
dpotrf_ vs. plasma_dpotrf vs. magma_dpotrf
MKL: Intel Math Kernel Library (Intel MKL) accelerates math processing routines.
PLASMA: Parallel Linear Algebra Software for Multicore Architectures
MAGMA: Matrix Algebra on GPU and Multicore Architectures
- 11.
Parallel Optimization
The concept of task-based DAG computation is used to split the H-Cholesky factorization into individual tasks and to define the corresponding dependencies between them, forming a DAG.
- 12.
Code Analysis
- 13.
Multicore Optimization – H-Cholesky Factorization
Example 1:
Example 2:
- 14.
Manycore Optimization – H-Cholesky Factorization
1. Allocate & Copy r->a[row_offset] and r->b[col_offset] into accelerators.
2. Copy result ft->e from accelerators into CPU host memory.
- 15.
Result & Conclusion
[Figure: H-Cholesky decomposition time (sec) vs. nmin (leaf size) for a problem size of 10002 vertices, comparing the MKL and Hybrid implementations.]
H-Cholesky factorization on many-core accelerators is extremely efficient and also scales well on large H-matrices.