Transcript of "Some experiences for porting application to Intel Xeon Phi"
Porting application to Intel Xeon Phi: some experiences RIKEN Advanced Center for Computing and Communication 2012/11 Super Computing 2012 @ Intel Booth, Salt lake city, US email@example.com Other side of my face maho@FreeBSD.org (FreeBSD committer) firstname.lastname@example.org (Apache OpenOffice committer) 2012/11 Super Computing 201212年11月15日木曜日
Aims of my talk •Proof of concept: - Intel says, “One source base, tuned to many targets” - Is it true or not? - my answer is TRUE. •Native model is considered - Just compile with Intel Composer XE 2013 :-) - Offload model is extremely demanding for modern complicated programs - CUDA expertises say: to get performance, do everything on GPU, do not transfer data between CPU and GPU. - Modern applications use a lot of external open source / free software packages. Very complex structure! - Not realistic! •Providing Porting tips - Gaussian09, povray, sdpa... Super Computing 2012 @ Intel Booth12年11月15日木曜日
What is Intel Xeon Phi ?? • Intel Xeon Phi is a co-processor, connected via PCI-express slot. • Peak performance is 1TFlops in double precision - many cores : 64 cores, 4 threads each, 512bit AVX, GDDR5 8GB of RAM... • We can see as if there are another cluster of computer inside a Linux box. - Linux micro OS is provided • Better programability - x86 based (64bit) - Development tool: Intel Composer XE 2013 - C, C++, Fortran - compile and run same code to CPU - familiar parallelism : OpenMP, MPI, OpenCL - Various programming model - MIC centric - CPU centric -CAUTION: BINARY IS INCOMPATIBLE! -Recompile is needed for Xeon Phi! Super Computing 2012 @ Intel Booth12年11月15日木曜日
How to build your program on Xeon Phi •Very easy. •Just passing -mmic flags to Compilers -icc -mmic -icpc -mmic -ifort -mmic •How to link against optimized BLAS and LAPACK? -just add -mkl -same for CPU case. Super Computing 2012 @ Intel Booth12年11月15日木曜日
DGEMM benchmark: sorry, no free lunch, tune Needed. • DGEMM is a matrix-matrix multiplication routine. It uses almost 100% of CPU performance (if tuned) so it is used for benchmarking. - not see the memory bandwidth • Intel Xeon Phi’s theoretical peak performance is 1TFlops. • Do we need some tunes for Intel Xeon Phi? - YES. Otherwise 40% of peak is attained: ~400GFlops - If tuned we attain ~816GFlops. - memory allocation, thread affinity • How to obtain the data? - just malloc and fill random values - no alignment is specified - CPU’s case it is sufficient, but - not sufficient for Xeon Phi. Super Computing 2012 @ Intel Booth12年11月15日木曜日
SDPA : How to cheat “configure” part I • SDPA is a highly efficient semidefinite programming solver. - distributed at http://sdpa.sourceforge.net/, under GPL. • ./configure ; make (on CPU) • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this? - almost the same environment... - Two pass strategy. First pass, pass dummy “-DDMIC” to configure, then replace to “-mmic”, then compile. #!/bin/sh CC="icc"; export CC CXX="icpc"; export CXX FC="ifort"; export FC CFLAGS="-DMMIC" ; export CFLAGS CXXFLAGS="-DMMIC" ; export CXXFLAGS FFLAGS="-DMMIC" ; export FFLAGS ./conﬁgure --with-blas="-mkl" --with-lapack="-mkl" ﬁles=$(ﬁnd ./* -name Makeﬁle) perl -p -i -e s/-DMMIC/-mmic/g $ﬁles Super Computing 2012 @ Intel Booth12年11月15日木曜日
Povray: how to cheat configure part II • The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics; a famous ray tracing program. • This treat how to build Povray 3.7 RC - This version is the first pthread parallelized Povray. • Requires some external libraries other than provided to Intel Xeon Phi. Super Computing 2012 @ Intel Booth12年11月15日木曜日
Povray: how to cheat configure : part II • Prerequisites - boost, zlib, jpeg, tiff and libpng. - all libraries should be build for Phi :-( :-( :-( • How to build boost and zlib: We took the same strategy as povray. - First build and install host version of boost to /home/maho/HOST then Phi version to /home/maho/MIC - Next, build and install host version of zlib to /home/maho/HOST - then, build Phi version as follows: - backup /home/maho/MIC to /home/maho/MIC.org - copy /home/maho/HOST to /home/maho/MIC - run configure for host and pass -DMMIC flag to CFLAGS and CXXFLAGS. - be sure LD_LIBRARY_FLAGS points /home/maho/MIC! - remove /home/maho/MIC - rename /home/maho/MIC.org to /home/maho/MIC - replace -DMMIC to -mmic - make for Xeon Phi binary. - Done. • Building tiff and png for Phi is similar to above procedure. Super Computing 2012 @ Intel Booth12年11月15日木曜日
Povray: how to cheat configure : part II • Prerequisites - boost, zlib, jpeg, tiff and libpng. - all libraries should be build for Phi :-( :-( :-( • Strategy: do build twice: host build then Xeon Phi build - build and install host version of libraries to /home/maho/HOST - build and install Phi version of libraires to /home/maho/MIC - actually, • Final configure for Povray should be done as follows: - backup /home/maho/MIC to /home/maho/MIC.org - copy /home/maho/HOST to /home/maho/MIC - run configure for host and pass -DMMIC flag to CFLAGS and CXXFLAGS. - be sure LD_LIBRARY_FLAGS points /home/maho/MIC! - remove /home/maho/MIC - rename /home/maho/MIC.org to /home/maho/MIC - replace -DMMIC to -mmic - make for Xeon Phi binary. - Done. Super Computing 2012 @ Intel Booth12年11月15日木曜日
Gaussian09 Partially Runs on Intel Xeon Phi! • Gaussian09 is a famous quantum chemical program package and it provides state- of the-art capabilities for electronic structure modeling. • Very large source code: 1.7 million lines - $ cat *F | wc -l - 1714217 • Intel Composer XE is not officially supported compiler - Gaussian Inc. only supports PGI compiler. - Patches are made by M.N. (sorry, we cannot provide the patches to public) - Small set of patches enable us to build - -rw-r--r--. 1 maho users 463 1 30 10:53 2012 patch-bsd+buldg09 - -rw-r--r--. 1 maho users 692 1 30 10:53 2012 patch-bsd+fsplit.c - -rw-r--r-- 1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make - -rw-r--r--. 1 maho users 643 1 30 10:53 2012 patch-bsd+mdutil.F - -rw-r--r--. 1 maho users 240 1 30 10:53 2012 patch-bsd+mygau - -rw-r--r--. 1 maho users 486 1 30 10:53 2012 patch-bsd+set-mflags - patches are almost the same as hosts’ one. - almost merely adding -mmic - somehow shared libs don’t work?? - utils.a should be a static library. - Intel MKL should also be linked statically. - shared libs of MKL should be located at /lib64? LD_LIBRARY_PATH doesn’t parsed? - Resultant binaries occupy approximately 2GB Super Computing 2012 @ Intel Booth12年11月15日木曜日
Gaussian09 Partially Runs on Intel Xeon Phi! • Just run • Still very unstable with -O3 - l303.exe (just wish your luck) - l401.exe (should be built with -O0) - Passed:(just test000.com-test200.com) test001,023,024,025,026,027,028,029,030,031,032,033,034,035,036,037,03 8,039,040,042,056,076,077,078,079,081,091,092,093,099,101,102,104,108,11 5,116,119,120,130,131,140,142,144,145,149,150,151,153,162,163,165,168,169,17 0,172,177,184,188,195 Super Computing 2012 @ Intel Booth12年11月15日木曜日
A packaging system (pkgsrc) porting effort on Intel Phi!!! • What is the pkgsrc? - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http:// www.pkgsrc.org/ • NAKATA, Maho has over ten years of FreeBSD ports committer experience. • Why pkgsrc? - We need MORE software packages on Intel Phi! - Currently HPC program packages depend on other free software packages. - RPM, deb are too complex (to me). - Native tool chain for Intel Phi is really important - ./configure (autotools) is a good one but cross building is rarely supported. - ./configure looks some parameters of the host machine. - Intel Composer can be used as if it is a native toolkit with a small trick. - highly portable packaging system: works on *BSD (Net, DragonFly, Free), various Linux variants, AIX, MacOSX, FreeBSD • Status: - ./bootstrap : done • How to get? - I’ll provide ASAP on sourceforge.net or somewhere...12年11月15日木曜日
Summary and outlook • We tested Intel Xeon Phi, especially how to build Phi native binary. -“One source base, tuned to many targets” is TRUE! • We regard Intel Xeon Phi as a small Linux cluster. - but no binary compatibility inbetween. • We provided a porting tip; how to build gaussian, povray and sdpa. • For packages using autotools (./configure) or similar things, our approach requires two pass configure to cheat - if configure looks Phi specific stuffs like availability of FMA, then this strategy doesn’t work. - Yoshikazu Kamoshida’s strategy solves for configure or build system which requires run small programs on target machine (SWoPP 2012; Development of middleware which facilitate tuning while installation under cross compile environment). • More packages are needed! - Poring NetBSD’s pkgsrc might be good idea for cross compiling environment like Intel Xeon Phi. - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/12年11月15日木曜日
A particular slide catching your eye?
Clipping is a handy way to collect important slides you want to go back to later.