
Programming Languages & Tools for Higher Performance & Productivity


By Hitoshi Murai, RIKEN AICS

For higher performance and productivity of HPC systems, it is important to provide users with a good programming environment, including languages, compilers, and tools. This talk presents the programming model of the post-K supercomputer.

Hitoshi Murai Bio
Hitoshi Murai received a master's degree in information science from Kyoto University in 1996. He worked as a software developer at NEC from 1996 to 2010 and received a Ph.D. in computer science from the University of Tsukuba in 2010. He is currently a research scientist in the Programming Environment Research Team and the Flagship 2020 Project at the RIKEN Advanced Institute for Computational Science (AICS). His research interests include compilers and parallel programming languages.

Email
h-murai@riken.jp

For more information on the Linaro High Performance Computing (HPC) SIG, visit https://www.linaro.org/sig/hpc/


Programming Languages & Tools for Higher Performance & Productivity

  1. Programming Languages & Tools for Higher Performance & Productivity
     Hitoshi Murai (RIKEN), Shun Kamatsuka (Fujitsu), Tomotake Nakamura (Fujitsu)
     ARM HPC Workshop, Dec. 13, 2017

  2. Introduction of this Session
     • For higher performance & productivity on HPC systems, programming environments play a crucial role:
       ⦁ languages
       ⦁ compilers
       ⦁ tools
       ⦁ libraries
     • RIKEN AICS and Fujitsu are collaborating to design the programming environment of the upcoming post-K computer.

  3. Agenda of this Session
     1. XcalableMP PGAS Language
        ⦁ by Hitoshi Murai
     2. Advantages of the Compiler for Post-K Computer
        ⦁ by Shun Kamatsuka
     3. Overview of Programming Assistance Tools for Post-K Computer
        ⦁ by Tomotake Nakamura

  4. XcalableMP PGAS Language
     Hitoshi Murai (RIKEN)

  5. Introduction
     • Message Passing Interface (MPI) is a de facto standard for programming distributed-memory HPC systems.
     • Programming with MPI is very hard work.
     We are developing the XcalableMP (XMP) PGAS language for post-K, which could provide both high performance and productivity.

  6. What's PGAS?
     • Partitioned Global Address Space
     • "Global"
       ⦁ All processes or threads share one address space and can access any data in it.
     • "Partitioned"
       ⦁ Remote and local data are distinguished and may have different manners and costs of access.
     [Figure: processes p0 to p3 sharing one partitioned global address space, in contrast to separate private address spaces]

  7. What's XcalableMP?
     • A directive-based PGAS language
       ⦁ Extension for C/Fortran.
       ⦁ Latest ver. 1.3 is available at: www.xcalablemp.org
       ⦁ Defined by the XMP WG of the PC Cluster Consortium.
     • Two models of PGAS for distributed-memory parallel programming:
       ⦁ Global view (data/work mapping directives)
       ⦁ Local view (coarray)
     • Interoperable with other languages and models (e.g. Python, MPI, OpenMP, OpenACC)

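     As a first taste of the directive-based style, here is a minimal global-view sketch in XMP/C; the node count, array size, and values are made-up illustrations, not taken from the slides:

       #include <stdio.h>

       #pragma xmp nodes p[4]
       #pragma xmp template t[100]
       #pragma xmp distribute t[block] onto p

       double a[100];
       #pragma xmp align a[i] with t[i]

       int main(void) {
           double sum = 0.0;
       /* work mapping: each node executes only the iterations it owns;
          the reduction clause combines the partial sums across nodes */
       #pragma xmp loop on t[i] reduction(+:sum)
           for (int i = 0; i < 100; i++) {
               a[i] = 0.5 * i;
               sum += a[i];
           }
           printf("sum = %f\n", sum);   /* the reduced value, on every node */
           return 0;
       }

     Stripping the pragmas leaves a valid serial C program, which is the essence of the directive-based approach.
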
  8. Two Parallelization Models in XMP
     • Global view
       ⦁ Users specify how a set of nodes cooperates to solve a whole problem.
       ⦁ Rich directives for data/work mapping and communication.
       ⦁ Highly productive, but suited mainly to data parallelism.
     • Local view
       ⦁ Users specify how each node works to solve a partial problem.
       ⦁ Coarrays of Fortran 2008.
       ⦁ Less productive but more flexible.

  9. Example of a Global-view XMP Program
     The serial base code (the XMP directives are added on the next slide):

       real, dimension(lx,ly,lz) :: sr, se, ...
       ...
       do iz = 1, lz-1
         do iy = 1, ly
           do ix = 1, lx
             wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
             wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
             wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
             ...

  10. Example of a Global-view XMP Program

      ! data mapping
      !$xmp nodes p(npx,npy,npz)
      !$xmp template (lx,ly,lz) :: t
      !$xmp distribute (block,block,block) onto p :: t
            real, dimension(lx,ly,lz) :: sr, se, ...
      !$xmp align (ix,iy,iz) with t(ix,iy,iz) ::
      !$xmp&  sr, se, sm, sp, sn, sl, ...
      !$xmp shadow (1,1,1) ::
      !$xmp&  sr, se, sm, sp, sn, sl, ...
            ...
      ! stencil communication
      !$xmp reflect (sr, sm, sp, se, sn, sl)
      ! work mapping (parallel loops)
      !$xmp loop (ix,iy,iz) on t(ix,iy,iz)
            do iz = 1, lz-1
              do iy = 1, ly
                do ix = 1, lx
                  wu0 = sm(ix,iy,iz  ) / sr(ix,iy,iz  )
                  wu1 = sm(ix,iy,iz+1) / sr(ix,iy,iz+1)
                  wv0 = sn(ix,iy,iz  ) / sr(ix,iy,iz  )
                  ...

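      For C programmers, the same three ingredients (data mapping, stencil communication, work mapping) can be written in XMP/C. The following is a hypothetical 1-D smoothing sketch, not taken from the slides; N and the stencil itself are invented for illustration:

        #define N 1024

        #pragma xmp nodes p[*]
        #pragma xmp template t[N]
        #pragma xmp distribute t[block] onto p

        double u[N], unew[N];
        /* data mapping: both arrays are block-distributed over the nodes */
        #pragma xmp align u[i] with t[i]
        #pragma xmp align unew[i] with t[i]
        #pragma xmp shadow u[1]   /* one halo element on each side of the local block */

        void smooth(void) {
        /* stencil communication: refresh the halo (shadow) elements of u */
        #pragma xmp reflect (u)
        /* work mapping: each node updates only the elements it owns */
        #pragma xmp loop on t[i]
            for (int i = 1; i < N - 1; i++)
                unew[i] = 0.5 * (u[i-1] + u[i+1]);
        }
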
  11. Local-view Programming
      • Coarray, a PGAS feature of Fortran 2008, is available in XMP/C as well as in XMP/Fortran.
      • Basic idea: data declared as a coarray can be accessed by remote nodes.

      XMP/Fortran:
        real a(1024)[*], b(1024)
        a(513:1024)[1] = b(1:512)
        sync all

      XMP/C:
        float a[1024]:[*], b[1024];
        a[512:512]:[0] = b[0:512];
        xmp_sync_all(NULL);

      1. The array a is declared as a coarray.
      2. The local array section b(1:512) is put to the remote array section a(513:1024) on image 1.
      3. A memory fence and a barrier synchronization are performed.

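      Wrapped in a complete program, the XMP/C fragment above might look like the sketch below. The two-node configuration and the guards are illustrative assumptions; xmp_node_num() is XMP's query for the (1-origin) number of the executing node, while XMP/C image indices in coarray references are 0-origin, as on the slide:

        #include <stdio.h>
        #include <xmp.h>

        #pragma xmp nodes p[2]

        float a[1024]:[*];   /* coarray: remotely accessible from every image */
        float b[1024];       /* ordinary, purely local array */

        int main(void) {
            for (int i = 0; i < 512; i++)
                b[i] = (float)i;

            if (xmp_node_num() == 2)           /* only the second node performs the put */
                a[512:512]:[0] = b[0:512];     /* write b[0..511] into a[512..1023] on image 0 */

            xmp_sync_all(NULL);                /* memory fence + barrier, as on the slide */

            if (xmp_node_num() == 1)           /* image 0 now sees the transferred data */
                printf("a[512] = %f\n", a[512]);
            return 0;
        }
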
  12. Omni XcalableMP Compiler
      • An open-source reference implementation being developed by RIKEN & U. Tsukuba.
      • Latest ver. 1.2.2 is available at: omni-compiler.org
      • Supported platforms include: K, Fujitsu FX100, NEC SX, IBM BlueGene, Hitachi SR, Cray, Linux clusters, etc.
      • Proven applications include:
        ⦁ Plasma (3D fluid)
        ⦁ Seismic Imaging (3D stencil)
        ⦁ Fusion (Particle-in-Cell)
        ⦁ etc.
      [Figure: compilation flow. An XMP program goes through the Omni frontend, translator, and backend into a C/Fortran+MPI program, which the native C/Fortran compiler turns into an executable linked with the XMP runtime and communication libraries.]

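      Since the translated program is ordinary C/Fortran plus MPI, building and running should feel familiar. As a hedged example (the driver names xmpcc and xmpf90 follow the Omni documentation, but check your installation), compiling the stencil sketch above and running it on four nodes might look like:

        xmpcc -O2 smooth.c -o smooth
        mpirun -np 4 ./smooth
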
  13. HPL (of the HPC Challenge Benchmarks)
      • Written in the global view of XMP/C.
      • Data is distributed in a block-cyclic manner, and DGEMM is invoked for each block.
      • Overlapping communication and computation using asynchronous gmove:

        double A_L[N][NB];
        #pragma xmp align A_L[i][*] with t(*,i)
          :
        #pragma xmp gmove async(1)
        A_L[k:len][0:NB] = A[k:len][j:NB];
          :
        for (m = j+NB; m < N; m += NB) {
          for (n = j+NB; n < N; n += NB) {
            cblas_dgemm(&A[m][n], ..);
            if (xmp_test_async(1)) {
              // receive A[k:len][j:NB];
          :

      [Figure: HPL performance vs. number of nodes (256 to 16,384): 423 TFlops (80.7% efficiency) on 4,096 nodes and 971 TFlops (46.3%) on 16,384 nodes]

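      The elided fragment boils down to the overlap skeleton below. The update_block() helper and the closing wait are hypothetical additions for completeness; wait_async is the XMP directive that completes a tagged asynchronous operation:

        #pragma xmp gmove async(1)        /* start the panel copy in the background, tag 1 */
        A_L[k:len][0:NB] = A[k:len][j:NB];

        for (m = j + NB; m < N; m += NB)
            for (n = j + NB; n < N; n += NB) {
                update_block(m, n);       /* hypothetical wrapper around cblas_dgemm */
                if (xmp_test_async(1)) {
                    /* tag 1 has completed: the panel in A_L is ready for use */
                }
            }
        #pragma xmp wait_async(1)         /* guarantee completion before the next iteration */
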
  14. NICAM-DC (of the Fiber Miniapps)
      • Written in the local view of XMP/Fortran with coarrays.
      • The coarray-based implementation is almost comparable to the original MPI-based one.
      [Figure: speedup vs. number of MPI processes (10 to 40), normalized so that MPI at 10 processes = 10; the XMP curve closely tracks the MPI curve]

  15. XcalableMP 2.0
      • Dynamic multitasking for manycore processors
        ⦁ Breakaway from the Bulk Synchronous Parallel (BSP) model.
        ⦁ More chances for overlapping communication and computation.
      • Enhancements of loop parallelization
      • Support for newer versions of the base languages (Fortran 2008, C99, and C++11)

  16. Summary
      • PGAS languages are promising alternatives to MPI.
      • XMP is a directive-based PGAS extension for Fortran and C.
      • XMP supports global- and local-view programming to achieve both high performance and productivity.
      • XMP will be available on post-K.

      More information is available at: www.xcalablemp.org and omni-compiler.org
