Some experiences for porting application to Intel Xeon Phi
Published in: Technology
Transcript

  • 1. Porting applications to Intel Xeon Phi: some experiences
      RIKEN Advanced Center for Computing and Communication
      2012/11, Super Computing 2012 @ Intel Booth, Salt Lake City, US (Thursday, November 15, 2012)
      maho@riken.jp
      My other hats: maho@FreeBSD.org (FreeBSD committer), maho@apache.org (Apache OpenOffice committer)
  • 2. Aims of my talk
      • Proof of concept:
        - Intel says, “One source base, tuned to many targets.”
        - Is it true or not?
        - My answer is: TRUE.
      • Only the native model is considered
        - Just compile with Intel Composer XE 2013 :-)
        - The offload model is extremely demanding for modern, complicated programs.
        - CUDA experts say: to get performance, do everything on the GPU and do not transfer data between CPU and GPU.
        - Modern applications use a lot of external open source / free software packages. Very complex structure!
        - Not realistic!
      • Providing porting tips
        - Gaussian09, povray, sdpa...
  • 3. What is Intel Xeon Phi?
      • Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
      • Peak performance is 1 TFlops in double precision
        - many cores: 64 cores, 4 threads each, 512-bit SIMD vector units, 8 GB of GDDR5 RAM...
      • It looks as if there is another cluster of computers inside a Linux box.
        - A Linux micro OS is provided
      • Better programmability
        - x86 based (64-bit)
        - Development tool: Intel Composer XE 2013
        - C, C++, Fortran
        - compile and run the same code as on the CPU
        - familiar parallelism: OpenMP, MPI, OpenCL
        - Various programming models
          - MIC centric
          - CPU centric
        - CAUTION: BINARIES ARE INCOMPATIBLE!
        - Recompilation is needed for Xeon Phi!
  • 4. How to build your program on Xeon Phi
      • Very easy.
      • Just pass the -mmic flag to the compilers
        - icc -mmic
        - icpc -mmic
        - ifort -mmic
      • How to link against optimized BLAS and LAPACK?
        - just add -mkl
        - same as in the CPU case.
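The build recipe on this slide can be sketched as a small helper that prints the command line for a given source file. This is only an illustration: the function name `mic_build_cmd`, the `.mic` output suffix, and the example file names are assumptions, not part of the talk. The helper only prints the command, so it can be tried on a machine without Intel Composer XE installed.

```shell
#!/bin/sh
# Hypothetical helper: print the native Xeon Phi build command for one source file.
mic_build_cmd() {
    src=$1
    case "$src" in
        *.c)            comp=icc ;;     # C
        *.cpp|*.cc)     comp=icpc ;;    # C++
        *.f|*.F|*.f90)  comp=ifort ;;   # Fortran
        *) echo "unsupported source: $src" >&2; return 1 ;;
    esac
    # -mmic targets the coprocessor; -mkl links Intel MKL for BLAS and LAPACK.
    # The ".mic" output suffix is an assumed naming convention.
    echo "$comp -mmic -mkl $src -o ${src%.*}.mic"
}

mic_build_cmd dgemm_test.c
mic_build_cmd solver.cpp
mic_build_cmd kernel.f90
```

Dropping `-mmic` from the printed command gives the corresponding host build, which is the sense in which the same source serves both targets.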
  • 5. DGEMM benchmark: sorry, no free lunch, tuning is needed.
      • DGEMM is a matrix-matrix multiplication routine. If tuned, it uses almost 100% of CPU performance, so it is used for benchmarking.
        - it is not limited by memory bandwidth
      • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
      • Do we need some tuning for Intel Xeon Phi?
        - YES. Otherwise only about 40% of peak is attained: ~400 GFlops
        - If tuned, we attain ~816 GFlops (about 82% of peak).
        - What to tune: memory allocation, thread affinity
      • How was the input data prepared?
        - just malloc and fill with random values
        - no alignment is specified
        - on the CPU this is sufficient, but
        - it is not sufficient for Xeon Phi.
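The slide names thread affinity as one tuning knob but does not show the settings used. The sketch below covers only that half of the tuning (memory alignment has to be handled in the benchmark source itself). `OMP_NUM_THREADS` and `KMP_AFFINITY` are real variables read by the Intel OpenMP runtime, but the helper name `phi_thread_env`, the choice of `compact` pinning, and the use of every hardware thread are assumptions for illustration, not the author's measured configuration.

```shell
#!/bin/sh
# Hypothetical helper: derive OpenMP settings from a core count and threads per core.
phi_thread_env() {
    cores=$1
    threads_per_core=$2
    OMP_NUM_THREADS=$((cores * threads_per_core))
    # "compact" pinning is an assumed choice; the talk does not state which policy was used.
    KMP_AFFINITY=granularity=fine,compact
    export OMP_NUM_THREADS KMP_AFFINITY
}

# Core and thread counts as quoted on the "What is Intel Xeon Phi?" slide.
phi_thread_env 64 4
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_AFFINITY=$KMP_AFFINITY"
```

These variables would be set in the coprocessor's shell before launching the natively built benchmark; the best values are worth finding by measurement rather than taken from this sketch.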
  • 6. SDPA: how to cheat “configure”, part I
      • SDPA is a highly efficient semidefinite programming solver.
        - distributed at http://sdpa.sourceforge.net/, under the GPL.
      • On the CPU: ./configure ; make
      • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this?
        - the host and Phi environments are almost the same...
        - Two-pass strategy. In the first pass, hand the dummy flag “-DMMIC” to configure; then replace it with “-mmic” in the generated Makefiles; then compile.

          #!/bin/sh
          # Pass 1: configure with a harmless placeholder flag, so configure's test programs still run on the host.
          CC="icc"; export CC
          CXX="icpc"; export CXX
          FC="ifort"; export FC
          CFLAGS="-DMMIC" ; export CFLAGS
          CXXFLAGS="-DMMIC" ; export CXXFLAGS
          FFLAGS="-DMMIC" ; export FFLAGS
          ./configure --with-blas="-mkl" --with-lapack="-mkl"
          # Pass 2: swap the placeholder for the real Xeon Phi flag in every generated Makefile.
          files=$(find ./* -name Makefile)
          perl -p -i -e 's/-DMMIC/-mmic/g' $files
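The second pass of this trick, rewriting the placeholder flag, can be tried in isolation on a throwaway directory. The sketch below is a stand-in, not the script from the talk: it uses `sed` with a temporary file instead of `perl -p -i`, and the helper name `swap_mic_flag` and the two fake Makefiles are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the placeholder -DMMIC with -mmic in every Makefile under a directory.
swap_mic_flag() {
    dir=$1
    find "$dir" -name Makefile -type f | while IFS= read -r mk; do
        # Write to a temporary file first; in-place sed options differ between GNU and BSD.
        sed 's/-DMMIC/-mmic/g' "$mk" > "$mk.tmp" && mv "$mk.tmp" "$mk"
    done
}

# Demo: a temporary tree standing in for a source directory after ./configure has run.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CFLAGS = -O2 -DMMIC\n' > "$demo/Makefile"
printf 'CXXFLAGS = -O2 -DMMIC\n' > "$demo/src/Makefile"

swap_mic_flag "$demo"
cat "$demo/Makefile" "$demo/src/Makefile"
```

The placeholder works because `-DMMIC` only defines an unused preprocessor macro, so the test programs that configure compiles still build and run on the host.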
  • 7. Povray: how to cheat configure, part II
      • The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics; a famous ray tracing program.
      • This part covers how to build Povray 3.7 RC
        - This is the first version of Povray parallelized with pthreads.
      • It requires external libraries beyond those provided for Intel Xeon Phi.
  • 8. Povray: how to cheat configure, part II
      • Prerequisites
        - boost, zlib, jpeg, tiff and libpng.
        - all libraries must be built for Phi :-( :-( :-(
      • How to build boost and zlib: we took the same strategy as for povray.
        - First build and install the host version of boost to /home/maho/HOST, then the Phi version to /home/maho/MIC
        - Next, build and install the host version of zlib to /home/maho/HOST
        - then build the Phi version as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
          - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - run make to produce the Xeon Phi binary.
          - Done.
      • Building tiff and png for Phi is similar to the above procedure.
  • 9. Povray: how to cheat configure, part II
      • Prerequisites
        - boost, zlib, jpeg, tiff and libpng.
        - all libraries must be built for Phi :-( :-( :-(
      • Strategy: build twice, first for the host, then for Xeon Phi
        - build and install the host versions of the libraries to /home/maho/HOST
        - build and install the Phi versions of the libraries to /home/maho/MIC
        - actually,
      • The final configure for Povray should be done as follows:
        - back up /home/maho/MIC to /home/maho/MIC.org
        - copy /home/maho/HOST to /home/maho/MIC
        - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
        - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
        - remove /home/maho/MIC
        - rename /home/maho/MIC.org to /home/maho/MIC
        - replace -DMMIC with -mmic
        - run make to produce the Xeon Phi binary.
        - Done.
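The directory swap in this procedure (host libraries temporarily sitting at the Phi prefix while configure runs, Phi libraries restored before make) can be sketched as a wrapper function. The name `with_host_libs` and the demo files are hypothetical; in a real build the wrapped command would be the `./configure` invocation with `-DMMIC` in CFLAGS and CXXFLAGS, whereas here `cat` stands in for it so the sketch runs anywhere.

```shell
#!/bin/sh
# Hypothetical helper: run a command while the Phi prefix temporarily holds the host libraries.
# Usage: with_host_libs HOST_PREFIX MIC_PREFIX command [args...]
with_host_libs() {
    host=$1
    mic=$2
    shift 2
    mv "$mic" "$mic.org" || return 1   # back up the Phi libraries
    cp -R "$host" "$mic"               # host libraries now sit at the Phi path
    "$@"                               # e.g. ./configure, whose test programs run on the host
    status=$?
    rm -rf "$mic"                      # remove the temporary host copy
    mv "$mic.org" "$mic"               # restore the Phi libraries for the real build
    return $status
}

# Demo with throwaway directories standing in for /home/maho/HOST and /home/maho/MIC.
root=$(mktemp -d)
mkdir "$root/HOST" "$root/MIC"
echo host > "$root/HOST/libz.a"
echo phi > "$root/MIC/libz.a"

seen=$(with_host_libs "$root/HOST" "$root/MIC" cat "$root/MIC/libz.a")
echo "configure saw: $seen"
echo "make will see: $(cat "$root/MIC/libz.a")"
```

After the wrapper returns, the Makefiles still reference the Phi prefix, so replacing `-DMMIC` with `-mmic` and running make links against the Phi builds of the libraries.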
  • 10. Gaussian09 partially runs on Intel Xeon Phi!
      • Gaussian09 is a famous quantum chemistry program package; it provides state-of-the-art capabilities for electronic structure modeling.
      • Very large source code: 1.7 million lines
        - $ cat *F | wc -l
        - 1714217
      • Intel Composer XE is not an officially supported compiler
        - Gaussian Inc. only supports the PGI compiler.
        - Patches were made by M.N. (sorry, we cannot make the patches public)
        - A small set of patches enables us to build:
          -rw-r--r--. 1 maho users 463 1 30 10:53 2012 patch-bsd+buldg09
          -rw-r--r--. 1 maho users 692 1 30 10:53 2012 patch-bsd+fsplit.c
          -rw-r--r-- 1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
          -rw-r--r--. 1 maho users 643 1 30 10:53 2012 patch-bsd+mdutil.F
          -rw-r--r--. 1 maho users 240 1 30 10:53 2012 patch-bsd+mygau
          -rw-r--r--. 1 maho users 486 1 30 10:53 2012 patch-bsd+set-mflags
        - the patches are almost the same as the ones for the host.
        - they do little more than add -mmic
        - somehow shared libs don’t work??
          - utils.a should be a static library.
          - Intel MKL should also be linked statically.
          - should the shared libs of MKL be located in /lib64? Is LD_LIBRARY_PATH not parsed?
        - The resulting binaries occupy approximately 2 GB
  • 11. Gaussian09 partially runs on Intel Xeon Phi!
      • Just run it
      • Still very unstable with -O3
        - l303.exe (just wish for luck)
        - l401.exe (should be built with -O0)
        - Passed (only test000.com to test200.com were tried): test001, 023, 024, 025, 026, 027, 028, 029, 030, 031, 032, 033, 034, 035, 036, 037, 038, 039, 040, 042, 056, 076, 077, 078, 079, 081, 091, 092, 093, 099, 101, 102, 104, 108, 115, 116, 119, 120, 130, 131, 140, 142, 144, 145, 149, 150, 151, 153, 162, 163, 165, 168, 169, 170, 172, 177, 184, 188, 195
  • 12. A packaging system (pkgsrc) porting effort on Intel Phi!!!
      • What is pkgsrc?
        - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/
      • NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
      • Why pkgsrc?
        - We need MORE software packages on Intel Phi!
        - Current HPC program packages depend on other free software packages.
        - RPM and deb are too complex (for me).
        - A native tool chain for Intel Phi is really important
          - ./configure (autotools) is a good tool, but cross building is rarely supported.
          - ./configure inspects some parameters of the host machine.
          - Intel Composer can be used as if it were a native toolkit, with a small trick.
        - a highly portable packaging system: works on *BSD (Net, DragonFly, Free), various Linux variants, AIX, MacOSX
      • Status:
        - ./bootstrap : done
      • How to get it?
        - I’ll provide it ASAP on sourceforge.net or somewhere...
  • 13. Summary and outlook
      • We tested Intel Xeon Phi, especially how to build Phi native binaries.
        - “One source base, tuned to many targets” is TRUE!
      • We regard Intel Xeon Phi as a small Linux cluster.
        - but there is no binary compatibility between host and Phi.
      • We provided porting tips: how to build gaussian, povray and sdpa.
      • For packages using autotools (./configure) or similar, our approach requires a two-pass configure to cheat
        - if configure checks for Phi-specific features such as the availability of FMA, this strategy doesn’t work.
        - Yoshikazu Kamoshida’s strategy solves this for configure or build systems that need to run small programs on the target machine (SWoPP 2012; Development of middleware which facilitates tuning during installation under a cross-compile environment).
      • More packages are needed!
        - Porting NetBSD’s pkgsrc might be a good idea for cross-compiling environments like Intel Xeon Phi.
        - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/
