1. Analysis and Optimization of CGPOP
Hongtao Cai, Xiaoxiang Hu, Haoruo Peng
Department of CST, Tsinghua University
SIAM Annual Meeting, July 9, 2012
2. Acknowledgment
Prof. Xiaoge Wang, Prof. Wei Xue
Support from the State 863 Project Fund
Support from the Explore-100, Tianhe-1A, and Shenwei supercomputer systems
Support from SIAM
3. Outline
Background
Research
Analysis of original PCG Method in CGPOP
Optimizations:
Chebyshev
Richardson-PCG
Richardson-Chebyshev
Experiments
Future Work
5. Parallel Ocean Program
The crucial role of oceans in global climate
Cover about 70% of the Earth's surface
Water has about 1000 times the heat capacity of air
Repository of carbon (93%)
Transport heat
POP: surface pressure of the oceans [1]
6. Conjugate Gradient Parallel Ocean Program (CGPOP)
Three computation parts: Barotropic, 3D-update, Baroclinic
Barotropic computation dominates when the core count exceeds 10,000 [2]
CGPOP contains the core part of the Barotropic computation
7. Conjugate Gradient Parallel Ocean Program (CGPOP)
Linear equation system in every time step
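$$\left[ \nabla \cdot H \nabla - \frac{1}{g \alpha \tau \Delta t} \right] \eta^{n+1} = \nabla \cdot H \left[ \frac{U}{g \alpha \tau} + \nabla \eta^{n-1} \right] - \frac{\eta^{n}}{g \alpha \tau \Delta t} - \frac{q_W^{n}}{g \alpha \tau}$$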
Ax = b
(A is a real, sparse, symmetric, positive-definite matrix)
Our work: exploring new algorithms in CGPOP, with experiments on some of the top supercomputers in the world.
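To make the Ax = b step concrete, here is a minimal sketch of a preconditioned conjugate gradient (PCG) solve for a real, symmetric, positive-definite system. It is not the CGPOP code: the toy 5-point stencil matrix built by `laplacian_2d`, the diagonal (Jacobi) preconditioner, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np


def laplacian_2d(n):
    """Toy SPD matrix: 5-point stencil on an n x n grid plus a small shift.

    This is a hypothetical stand-in for the barotropic operator, not POP's.
    """
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    I = np.eye(n)
    return np.kron(I, T) + np.kron(T, I) + 0.1 * np.eye(n * n)


def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=500):
    """PCG with a diagonal preconditioner given as the inverse diagonal."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = m_inv_diag * r              # apply the preconditioner
    p = z.copy()
    rz = r @ z                      # dot product (a global reduction in parallel)
    b_norm = np.linalg.norm(b)
    for k in range(1, max_iter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)       # second dot product per iteration
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:   # norm is a third reduction
            return x, k
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, max_iter


A = laplacian_2d(8)
rng = np.random.default_rng(0)
b = rng.standard_normal(A.shape[0])
x, iters = pcg(A, b, 1.0 / np.diag(A))
print("iterations:", iters)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

Each iteration needs several dot products. On a distributed machine every one of them is a global reduction across all cores.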
19. Richardson-PCG
Single precision: faster [5]
A processor can operate on 2 doubles or 4 singles at a time
Less memory pressure
Double precision: more accurate
Mix the two
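One common way to mix the two precisions is iterative refinement in the spirit of [5]: an outer Richardson-style loop keeps the solution and residual in double precision, while the inner correction solve runs in single precision. The sketch below assumes that structure; the slides do not spell out the actual Richardson-PCG formulation, so the inner unpreconditioned CG, the tolerances, the toy matrix, and all names here are illustrative assumptions.

```python
import numpy as np


def laplacian_2d(n):
    """Toy SPD matrix (hypothetical stand-in for the barotropic operator)."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    I = np.eye(n)
    return np.kron(I, T) + np.kron(T, I) + 0.1 * np.eye(n * n)


def cg(A, b, tol, max_iter):
    """Plain conjugate gradient; runs in whatever dtype A and b carry."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    b_norm = np.sqrt(rr)
    if b_norm == 0:
        return x
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) <= tol * b_norm:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x


def mixed_precision_solve(A, b, tol=1e-12, max_outer=50):
    """Outer refinement in float64, inner correction solve in float32."""
    A32 = A.astype(np.float32)          # half the memory traffic of float64
    x = np.zeros_like(b)                # float64 solution
    b_norm = np.linalg.norm(b)
    for outer in range(max_outer):
        r = b - A @ x                   # residual computed in float64
        if np.linalg.norm(r) <= tol * b_norm:
            return x, outer
        d = cg(A32, r.astype(np.float32), tol=1e-4, max_iter=200)
        x += d.astype(np.float64)       # correction accumulated in float64
    return x, max_outer


A = laplacian_2d(8)
rng = np.random.default_rng(0)
b = rng.standard_normal(A.shape[0])
x, outers = mixed_precision_solve(A, b)
print("outer iterations:", outers)
print("bytes float64 / float32:", A.nbytes, A.astype(np.float32).nbytes)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The inner solve only needs a loose tolerance, because the double-precision residual in the outer loop corrects the error left by single-precision arithmetic.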
32. Conclusion
Two techniques
Reducing dot-products
Effective at large core counts (more than 5000)
Mixed precision
Effective at small core counts (fewer than 1000)
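As an illustration of the first technique, a Chebyshev iteration replaces the dot products of PCG with a fixed recurrence built from bounds on the spectrum of A, so the loop contains no global reductions. The sketch below follows the standard textbook recurrence and assumes this is the kind of dot-product reduction meant here. The toy matrix, the exact eigenvalue bounds from `np.linalg.eigvalsh` (in practice they would be estimated), and the fixed iteration count are assumptions for illustration.

```python
import numpy as np


def laplacian_2d(n):
    """Toy SPD matrix (hypothetical stand-in for the barotropic operator)."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    I = np.eye(n)
    return np.kron(I, T) + np.kron(T, I) + 0.1 * np.eye(n * n)


def chebyshev(A, b, lam_min, lam_max, n_iter):
    """Chebyshev iteration for SPD A with spectrum in [lam_min, lam_max].

    Requires lam_min < lam_max. The loop uses only matrix-vector products
    and vector updates, with no inner products.
    """
    theta = 0.5 * (lam_max + lam_min)   # center of the spectrum interval
    delta = 0.5 * (lam_max - lam_min)   # half-width of the interval
    sigma1 = theta / delta
    x = np.zeros_like(b)
    r = b.copy()                        # residual for the zero initial guess
    rho = 1.0 / sigma1
    d = r / theta
    for _ in range(n_iter):
        x = x + d
        r = r - A @ d
        rho_new = 1.0 / (2.0 * sigma1 - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * r
        rho = rho_new
    return x


A = laplacian_2d(8)
eigs = np.linalg.eigvalsh(A)            # exact bounds, only feasible for a toy
rng = np.random.default_rng(0)
b = rng.standard_normal(A.shape[0])
x = chebyshev(A, b, eigs[0], eigs[-1], n_iter=80)
print("relative residual:", np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```

The trade-off is that Chebyshev needs eigenvalue bounds up front and usually takes more iterations than PCG, which pays off only when global reductions dominate the run time.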
34. Future Work
Complete the investigation of the current code
Integrate optimization techniques into our ocean modeling programs
Apply our methods to other parallel programs
35. References
[1] R. Smith, P. Gent, “Reference Manual for the Parallel Ocean Program (POP)”, May 2002, pp. 1-74.
[2] A. Stone, J. M. Dennis, M. M. Strout, “The CGPOP Miniapp, Version 1.0”, July 2011, pp. 4-5.
[3] Y. Saad, A. Sameh, P. Saylor, “Solving elliptic difference equations on a linear array of processors”, SIAM J. Sci. Stat. Comput., Vol. 6, No. 4, October 1985, pp. 1049-1063.
[4] E. Stiefel, “Kernel polynomials in linear algebra and their numerical applications”, Nat. Bur. Standards, Appl. Math. Series 49, 1958, pp. 1-22.
[5] A. Buttari, E. Lyon, J. Dongarra, “Using Mixed Precision for Sparse Matrix Computations to Enhance the Performance while Achieving 64-bit Accuracy”, ACM Transactions on Math. Software, Vol. 34, No. 4, Article 17, pp. 1-8.