Some experiences for porting application to Intel Xeon Phi
 


    Presentation Transcript

    • Porting applications to Intel Xeon Phi: some experiences
      RIKEN Advanced Center for Computing and Communication
      2012/11, Super Computing 2012 @ Intel Booth, Salt Lake City, US
      maho@riken.jp
      Other side of my face: maho@FreeBSD.org (FreeBSD committer), maho@apache.org (Apache OpenOffice committer)
      Thursday, November 15, 2012
    • Aims of my talk
      • Proof of concept
        - Intel says, “One source base, tuned to many targets”.
        - Is it true or not?
        - My answer is TRUE.
      • The native model is considered
        - Just compile with Intel Composer XE 2013 :-)
        - The offload model is extremely demanding for modern, complicated programs.
        - CUDA experts say: to get performance, do everything on the GPU and do not transfer data between CPU and GPU.
        - Modern applications use a lot of external open source / free software packages. Very complex structure!
        - Not realistic!
      • Providing porting tips
        - Gaussian09, povray, sdpa...
    • What is Intel Xeon Phi?
      • Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
      • Peak performance is 1 TFlops in double precision
        - many cores: 64 cores, 4 threads each, 512-bit AVX, 8 GB of GDDR5 RAM...
      • It looks as if there were another cluster of computers inside a Linux box.
        - A Linux micro OS is provided.
      • Better programmability
        - x86 based (64-bit)
        - Development tool: Intel Composer XE 2013
        - C, C++, Fortran
        - Compile and run the same code as on the CPU
        - Familiar parallelism: OpenMP, MPI, OpenCL
        - Various programming models
          - MIC centric
          - CPU centric
        - CAUTION: BINARY IS INCOMPATIBLE!
        - Recompilation is needed for Xeon Phi!
    • How to build your program on Xeon Phi
      • Very easy.
      • Just pass the -mmic flag to the compilers
        - icc -mmic
        - icpc -mmic
        - ifort -mmic
      • How to link against optimized BLAS and LAPACK?
        - Just add -mkl
        - Same as in the CPU case.
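The flags on this slide can be put together as in the following minimal sketch. It only prints the command lines (a dry run), so it can be tried on a machine without Intel Composer XE installed. The helper name `phi_compile_cmd` and the file names are hypothetical and do not come from the talk; only `-mmic` and `-mkl` do.

```shell
#!/bin/sh
# Sketch: assemble a native Xeon Phi compile command.
# phi_compile_cmd is a hypothetical helper; -mmic and -mkl are the
# flags named on the slide. Nothing is compiled here, the command
# line is only printed.
# Usage: phi_compile_cmd COMPILER SOURCE OUTPUT
phi_compile_cmd() {
    compiler=$1
    src=$2
    out=$3
    printf '%s -mmic -mkl %s -o %s\n' "$compiler" "$src" "$out"
}

# Example file names are placeholders.
phi_compile_cmd icc   dgemm_test.c dgemm_test.mic
phi_compile_cmd icpc  solver.cpp   solver.mic
phi_compile_cmd ifort kernel.f90   kernel.mic
```

To actually build, run the printed line on a host where the Intel compilers are on the PATH; the resulting binary then has to be copied to the coprocessor, since it will not run on the host CPU.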
    • DGEMM benchmark: sorry, no free lunch, tuning needed
      • DGEMM is a matrix-matrix multiplication routine. If tuned, it uses almost 100% of CPU performance, so it is used for benchmarking.
        - It does not reflect memory bandwidth.
      • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
      • Do we need some tuning for Intel Xeon Phi?
        - YES. Otherwise only 40% of peak is attained: ~400 GFlops.
        - If tuned, we attain ~816 GFlops.
        - Tuning points: memory allocation, thread affinity.
      • How to obtain the data?
        - Just malloc and fill with random values
        - No alignment is specified
        - On the CPU this is sufficient, but
        - it is not sufficient for Xeon Phi.
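The slide names thread affinity as one tuning point but gives no settings. The sketch below shows one way the thread environment might be set before running an MKL-linked benchmark natively. The core and thread counts (64 and 4) are taken from the earlier slide; the choice of `KMP_AFFINITY=compact` is an assumption, not the author's setting, and the helper names `phi_thread_env` and `percent_of_peak` are hypothetical. Memory alignment, the other tuning point, is done in the benchmark source and is not shown here.

```shell
#!/bin/sh
# Sketch: set OpenMP thread count and affinity for a native run.
# KMP_AFFINITY and OMP_NUM_THREADS are environment variables read by
# the Intel OpenMP runtime; the value "compact" is an assumption.
# Usage: phi_thread_env CORES THREADS_PER_CORE
phi_thread_env() {
    cores=$1
    threads_per_core=$2
    OMP_NUM_THREADS=$((cores * threads_per_core))
    KMP_AFFINITY=compact
    export OMP_NUM_THREADS KMP_AFFINITY
}

# Fraction of theoretical peak, in percent, with one decimal place.
# Usage: percent_of_peak ATTAINED_GFLOPS PEAK_GFLOPS
percent_of_peak() {
    awk -v a="$1" -v p="$2" 'BEGIN { printf "%.1f\n", 100 * a / p }'
}

phi_thread_env 64 4
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS KMP_AFFINITY=$KMP_AFFINITY"

# Figures quoted on the slide: ~400 GFlops untuned, ~816 GFlops tuned,
# against a 1 TFlops (1000 GFlops) peak.
percent_of_peak 400 1000
percent_of_peak 816 1000
```

Other affinity values (for example `scatter`) may suit other workloads better; the right setting has to be found by measurement.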
    • SDPA: how to cheat “configure”, part I
      • SDPA is a highly efficient semidefinite programming solver.
        - Distributed at http://sdpa.sourceforge.net/ under the GPL.
      • On the CPU: ./configure ; make
      • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this?
        - The environment is almost the same...
        - Two-pass strategy: in the first pass, give the dummy flag “-DMMIC” to configure; then replace it with “-mmic”; then compile.

      #!/bin/sh
      # First pass: configure as if for the host, with a harmless dummy flag.
      CC="icc"; export CC
      CXX="icpc"; export CXX
      FC="ifort"; export FC
      CFLAGS="-DMMIC"; export CFLAGS
      CXXFLAGS="-DMMIC"; export CXXFLAGS
      FFLAGS="-DMMIC"; export FFLAGS
      ./configure --with-blas="-mkl" --with-lapack="-mkl"
      # Second pass: turn the dummy flag into the real cross-compile flag.
      files=$(find ./* -name Makefile)
      perl -p -i -e 's/-DMMIC/-mmic/g' $files
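The substitution step of the two-pass strategy can be tried in isolation on throwaway Makefiles, as in the sketch below. The helper name `swap_placeholder_flag` and the demo directory are hypothetical; the perl expression is the one from the slide. Using `find -exec` instead of a `$files` variable is a small variation that also copes with paths containing spaces.

```shell
#!/bin/sh
# Sketch: replace the dummy -DMMIC flag with -mmic in every Makefile
# under a directory. swap_placeholder_flag is a hypothetical helper.
# Usage: swap_placeholder_flag DIR
swap_placeholder_flag() {
    find "$1" -type f -name Makefile \
        -exec perl -p -i -e 's/-DMMIC/-mmic/g' {} +
}

# Demonstration on a temporary tree standing in for a configured package.
demo=$(mktemp -d)
mkdir -p "$demo/src"
printf 'CFLAGS = -O2 -DMMIC\n'   > "$demo/Makefile"
printf 'CXXFLAGS = -O2 -DMMIC\n' > "$demo/src/Makefile"

swap_placeholder_flag "$demo"
cat "$demo/Makefile" "$demo/src/Makefile"

rm -rf "$demo"
```

After the substitution, a plain `make` in the real source tree builds with `-mmic`, which is the point of the trick: configure never sees a flag that would make its test programs unrunnable on the host.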
    • Povray: how to cheat configure, part II
      • The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics; a famous ray tracing program.
      • This part covers how to build Povray 3.7 RC
        - This version is the first pthread-parallelized Povray.
      • It requires some external libraries beyond those provided for Intel Xeon Phi.
    • Povray: how to cheat configure, part II
      • Prerequisites
        - boost, zlib, jpeg, tiff and libpng.
        - All libraries must be built for Phi :-( :-( :-(
      • How to build boost and zlib: we took the same strategy as for povray.
        - First, build and install the host version of boost to /home/maho/HOST, then the Phi version to /home/maho/MIC
        - Next, build and install the host version of zlib to /home/maho/HOST
        - Then build the Phi version as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
          - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - run make to get the Xeon Phi binary.
          - Done.
      • Building tiff and png for Phi is similar to the above procedure.
    • Povray: how to cheat configure, part II
      • Prerequisites
        - boost, zlib, jpeg, tiff and libpng.
        - All libraries must be built for Phi :-( :-( :-(
      • Strategy: build twice, first for the host and then for Xeon Phi
        - build and install the host version of the libraries to /home/maho/HOST
        - build and install the Phi version of the libraries to /home/maho/MIC
        - actually,
      • The final configure for Povray should be done as follows:
        - back up /home/maho/MIC to /home/maho/MIC.org
        - copy /home/maho/HOST to /home/maho/MIC
        - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
        - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
        - remove /home/maho/MIC
        - rename /home/maho/MIC.org to /home/maho/MIC
        - replace -DMMIC with -mmic
        - run make to get the Xeon Phi binary.
        - Done.
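The directory swap in the procedure above (back up the Phi prefix, put the host libraries in its place while configure runs, then restore it) can be sketched as a small wrapper. The function name `configure_against_host_libs` is hypothetical, and the demonstration uses a temporary directory with marker files in place of /home/maho/HOST and /home/maho/MIC, so nothing real is touched.

```shell
#!/bin/sh
# Sketch: run a command while the host-built libraries temporarily
# occupy the Phi install prefix, then restore the Phi libraries.
# configure_against_host_libs is a hypothetical helper name.
# Usage: configure_against_host_libs HOST_PREFIX MIC_PREFIX COMMAND [ARGS...]
configure_against_host_libs() {
    host=$1
    mic=$2
    shift 2
    # Back up the Phi prefix (MIC -> MIC.org).
    mv "$mic" "$mic.org" || return 1
    # Put a copy of the host libraries where the Phi ones were.
    if ! cp -R "$host" "$mic"; then
        rm -rf "$mic"
        mv "$mic.org" "$mic"
        return 1
    fi
    # Run the given command (for example ./configure) and keep its status.
    "$@"
    rc=$?
    # Remove the host copy and restore the Phi prefix.
    rm -rf "$mic"
    mv "$mic.org" "$mic"
    return $rc
}

# Demonstration with marker files standing in for real libraries.
work=$(mktemp -d)
mkdir "$work/HOST" "$work/MIC"
echo host > "$work/HOST/which"
echo mic  > "$work/MIC/which"

# While the command runs, the MIC prefix holds the host files: prints "host".
configure_against_host_libs "$work/HOST" "$work/MIC" cat "$work/MIC/which"
# Afterwards the Phi files are back: prints "mic".
cat "$work/MIC/which"

rm -rf "$work"
```

In a real build the wrapped command would be something like `./configure CFLAGS=-DMMIC CXXFLAGS=-DMMIC`, followed by the `-DMMIC` to `-mmic` substitution in the generated Makefiles and then `make`.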
    • Gaussian09 partially runs on Intel Xeon Phi!
      • Gaussian09 is a famous quantum chemistry program package; it provides state-of-the-art capabilities for electronic structure modeling.
      • Very large source code: 1.7 million lines
        - $ cat *F | wc -l
        - 1714217
      • Intel Composer XE is not an officially supported compiler
        - Gaussian Inc. only supports the PGI compiler.
        - Patches were made by M.N. (sorry, we cannot make the patches public).
        - A small set of patches enables us to build:
          -rw-r--r--. 1 maho users 463 1 30 10:53 2012 patch-bsd+buldg09
          -rw-r--r--. 1 maho users 692 1 30 10:53 2012 patch-bsd+fsplit.c
          -rw-r--r-- 1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
          -rw-r--r--. 1 maho users 643 1 30 10:53 2012 patch-bsd+mdutil.F
          -rw-r--r--. 1 maho users 240 1 30 10:53 2012 patch-bsd+mygau
          -rw-r--r--. 1 maho users 486 1 30 10:53 2012 patch-bsd+set-mflags
        - The patches are almost the same as the ones for the host
        - They do little more than add -mmic
        - Somehow shared libs don’t work??
        - utils.a should be a static library.
        - Intel MKL should also be linked statically.
        - Should the shared libs of MKL be located at /lib64? Is LD_LIBRARY_PATH not parsed?
        - The resulting binaries occupy approximately 2 GB
    • Gaussian09 partially runs on Intel Xeon Phi!
      • Just run
      • Still very unstable with -O3
        - l303.exe (just wish you luck)
        - l401.exe (should be built with -O0)
        - Passed (only test000.com to test200.com were tried): test001, 023, 024, 025, 026, 027, 028, 029, 030, 031, 032, 033, 034, 035, 036, 037, 038, 039, 040, 042, 056, 076, 077, 078, 079, 081, 091, 092, 093, 099, 101, 102, 104, 108, 115, 116, 119, 120, 130, 131, 140, 142, 144, 145, 149, 150, 151, 153, 162, 163, 165, 168, 169, 170, 172, 177, 184, 188, 195
    • A packaging system (pkgsrc) porting effort on Intel Phi!!!
      • What is pkgsrc?
        - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/
      • NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
      • Why pkgsrc?
        - We need MORE software packages on Intel Phi!
        - Current HPC program packages depend on other free software packages.
        - RPM and deb are too complex (for me).
        - A native tool chain for Intel Phi is really important
          - ./configure (autotools) is a good one, but cross building is rarely supported.
          - ./configure inspects some parameters of the host machine.
          - Intel Composer can be used as if it were a native toolkit, with a small trick.
        - A highly portable packaging system: works on *BSD (Net, DragonFly, Free), various Linux variants, AIX, MacOSX
      • Status:
        - ./bootstrap : done
      • How to get it?
        - I’ll provide it ASAP on sourceforge.net or somewhere...
    • Summary and outlook
      • We tested Intel Xeon Phi, especially how to build Phi native binaries.
        - “One source base, tuned to many targets” is TRUE!
      • We regard Intel Xeon Phi as a small Linux cluster.
        - but there is no binary compatibility between host and Phi.
      • We provided porting tips: how to build gaussian, povray and sdpa.
      • For packages using autotools (./configure) or similar, our approach requires a two-pass configure to cheat
        - If configure probes for Phi-specific features such as the availability of FMA, this strategy doesn’t work.
        - Yoshikazu Kamoshida’s strategy is a solution for configure or build systems that need to run small programs on the target machine (SWoPP 2012; Development of middleware which facilitate tuning while installation under cross compile environment).
      • More packages are needed!
        - Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment like Intel Xeon Phi.
        - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/