Porting applications to Intel Xeon Phi: some experiences

    RIKEN Advanced Center for Computing and Communication
    2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US

    maho@riken.jp

    My other roles
    maho@FreeBSD.org (FreeBSD committer)
    maho@apache.org (Apache OpenOffice committer)

Thursday, November 15, 2012
Aims of my talk

    • Proof of concept:
       - Intel says, “One source base, tuned to many targets”
       - Is it true or not?
          - My answer is TRUE.
    • Only the native model is considered
       - Just compile with Intel Composer XE 2013 :-)
       - The offload model is extremely demanding for modern, complicated programs
          - CUDA experts say: to get performance, do everything on the GPU and do not
            transfer data between CPU and GPU.
          - Modern applications use a lot of external open source / free software
            packages. Very complex structure!
          - Not realistic!
    • Porting tips are provided
       - Gaussian09, povray, sdpa...

What is Intel Xeon Phi?
    • Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
    • Peak performance is 1 TFlops in double precision
       - many cores: 64 cores, 4 threads each, 512-bit SIMD, 8 GB of GDDR5 RAM...
    • It looks like another computer cluster inside a Linux box.
       - A Linux micro OS is provided
    • Better programmability
       - x86 based (64-bit)
       - Development tool: Intel Composer XE 2013
          - C, C++, Fortran
          - compile and run the same code as on the CPU
          - familiar parallel programming models: OpenMP, MPI, OpenCL
       - Various programming models
          - MIC centric
          - CPU centric
       - CAUTION: BINARIES ARE INCOMPATIBLE WITH THE HOST CPU!
       - Recompilation is needed for Xeon Phi!

How to build your program for Xeon Phi
    • Very easy.
    • Just pass the -mmic flag to the compilers
       - icc -mmic
       - icpc -mmic
       - ifort -mmic
    • How to link against optimized BLAS and LAPACK?
       - Just add -mkl
       - Same as for the CPU case.
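
As a rough sketch of the above, the helper below only prints the compile line for each target, so the difference between a host build and a Phi build is visible side by side. The file name hello.c, the output suffixes, and the function name build_cmd are assumptions made for illustration; only icc, -mmic, and -mkl come from the slide.

```sh
#!/bin/sh
# Print the compile command for a given target ("host" or "mic").
# Nothing is compiled here; the file and output names are hypothetical.
build_cmd() {
    target=$1; src=$2
    case "$target" in
        host) echo "icc -mkl -o ${src%.c}.host $src" ;;
        mic)  echo "icc -mmic -mkl -o ${src%.c}.mic $src" ;;
        *)    echo "unknown target: $target" >&2; return 1 ;;
    esac
}

build_cmd host hello.c
build_cmd mic hello.c
```

The only difference between the two printed lines is -mmic (and the output name chosen here), which is the point of the slide: the source and the rest of the command line stay the same.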




DGEMM benchmark: sorry, no free lunch; tuning is needed.
    • DGEMM is a matrix-matrix multiplication routine. It can reach almost 100% of the
      CPU's peak performance (if tuned), so it is used for benchmarking.
       - It is compute bound, so it does not measure memory bandwidth.
    • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
    • Do we need some tuning for Intel Xeon Phi?
       - YES. Otherwise only about 40% of peak is attained: ~400 GFlops
       - If tuned, we attain ~816 GFlops.
       - What to tune: memory allocation and thread affinity
    • How is the matrix data prepared?
       - The naive way: just malloc and fill with random values
       - No alignment is specified
       - For the CPU this is sufficient, but
       - it is not sufficient for Xeon Phi.
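
The fraction of peak behind these figures is simple arithmetic. A minimal sketch, taking the 1 TFlops (1000 GFlops) peak and the two measured values from the slide; the function name pct_of_peak is made up for this example.

```sh
#!/bin/sh
# Percentage of theoretical peak reached by a measured DGEMM rate.
# Arguments: measured GFlops, peak GFlops.
pct_of_peak() {
    awk -v measured="$1" -v peak="$2" 'BEGIN { printf "%.1f\n", 100 * measured / peak }'
}

pct_of_peak 400 1000   # untuned run
pct_of_peak 816 1000   # tuned run
```

With the slide's numbers this prints 40.0 and 81.6, so the tuning roughly doubles the achieved rate.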




SDPA: How to cheat “configure”, part I
    • SDPA is a highly efficient semidefinite programming solver.
       - distributed at http://sdpa.sourceforge.net/, under the GPL.
    • On the CPU: ./configure ; make
    • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how to do this?
       - The host and the Phi are almost the same environment...
       - Two-pass strategy: in the first pass, give the dummy flag “-DMMIC” to configure;
         then replace it with “-mmic” in the generated Makefiles and compile.
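                           #!/bin/sh

                           # Use the Intel compilers
                           CC="icc"; export CC
                           CXX="icpc"; export CXX
                           FC="ifort"; export FC

                           # Pass 1: a dummy define stands in for -mmic, so that
                           # configure's test programs still build and run on the host
                           CFLAGS="-DMMIC" ; export CFLAGS
                           CXXFLAGS="-DMMIC" ; export CXXFLAGS
                           FFLAGS="-DMMIC" ; export FFLAGS

                           ./configure --with-blas="-mkl" --with-lapack="-mkl"

                           # Pass 2: swap the dummy define for the real flag in every Makefile
                           find . -name Makefile -exec perl -p -i -e 's/-DMMIC/-mmic/g' {} +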
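
The second pass can be tried without any Intel tools. The following dry run is only a sketch: it fabricates two tiny Makefiles in a temporary directory and applies the same substitution with sed instead of perl. The function name swap_flag and the Makefile contents are invented for the example.

```sh
#!/bin/sh
# Replace a placeholder flag with the real one in every Makefile under a directory.
# Arguments: root directory, placeholder, replacement.
# The placeholder is used as a sed regular expression, so keep it simple.
swap_flag() {
    root=$1; placeholder=$2; real=$3
    find "$root" -name Makefile | while IFS= read -r mf; do
        sed "s/$placeholder/$real/g" "$mf" > "$mf.tmp" && mv "$mf.tmp" "$mf"
    done
}

# Throwaway tree standing in for a configured source directory
work=$(mktemp -d)
mkdir -p "$work/src"
printf 'CFLAGS = -O2 -DMMIC\n' > "$work/Makefile"
printf 'CXXFLAGS = -O2 -DMMIC\n' > "$work/src/Makefile"

swap_flag "$work" "-DMMIC" "-mmic"
cat "$work/Makefile" "$work/src/Makefile"
rm -rf "$work"
```

After the substitution both Makefiles carry -mmic where the dummy define used to be, so a following make would build for Xeon Phi.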
Povray: How to cheat “configure”, part II
    • The Persistence of Vision Raytracer is a high-quality, totally free tool for
      creating stunning three-dimensional graphics; a famous ray tracing program.
    • This part covers how to build Povray 3.7 RC
       - This is the first version of Povray parallelized with pthreads.
    • It requires some external libraries that are not provided for Intel Xeon Phi.




Povray: How to cheat “configure”, part II
    • Prerequisites
       - boost, zlib, jpeg, tiff and libpng.
       - All libraries must be built for Phi :-( :-( :-(
    • How to build boost and zlib: we took the same strategy as for povray.
       - First build and install the host version of boost to /home/maho/HOST, then the Phi
         version to /home/maho/MIC
       - Next, build and install the host version of zlib to /home/maho/HOST
       - Then, build the Phi version as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
              - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - run make to build the Xeon Phi binary.
          - Done.
    • Building tiff and png for Phi is similar to the above procedure.

Povray: How to cheat “configure”, part II
    • Prerequisites
       - boost, zlib, jpeg, tiff and libpng.
       - All libraries must be built for Phi :-( :-( :-(
    • Strategy: build twice, first for the host and then for Xeon Phi
       - build and install the host version of the libraries to /home/maho/HOST
       - build and install the Phi version of the libraries to /home/maho/MIC
          - actually,
    • The final configure for Povray should be done as follows:
       - back up /home/maho/MIC to /home/maho/MIC.org
       - copy /home/maho/HOST to /home/maho/MIC
       - run configure for the host, passing the -DMMIC flag in CFLAGS and CXXFLAGS.
          - be sure LD_LIBRARY_FLAGS points to /home/maho/MIC!
       - remove /home/maho/MIC
       - rename /home/maho/MIC.org to /home/maho/MIC
       - replace -DMMIC with -mmic
       - run make to build the Xeon Phi binary.
       - Done.
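
The swap in the steps above can be sketched as a small wrapper. This is only an illustration under assumptions: the function name with_host_prefix is invented, the demo uses throwaway directories in place of /home/maho/HOST and /home/maho/MIC, and cat stands in for the real configure run.

```sh
#!/bin/sh
# Usage: with_host_prefix HOST_DIR MIC_DIR command [args...]
# Runs the command while MIC_DIR temporarily holds a copy of HOST_DIR,
# then restores the original MIC_DIR and returns the command's status.
with_host_prefix() {
    host=$1; mic=$2; shift 2
    mv "$mic" "$mic.org" || return 1        # back up the Phi libraries
    status=1
    if cp -R "$host" "$mic"; then           # host libraries now sit at the Phi path
        if "$@"; then status=0; else status=$?; fi   # e.g. configure with -DMMIC
    fi
    rm -rf "$mic"                           # remove the host copy
    mv "$mic.org" "$mic"                    # put the Phi libraries back
    return "$status"
}

# Demo with throwaway directories (hypothetical stand-ins for the real prefixes)
top=$(mktemp -d)
mkdir "$top/HOST" "$top/MIC"
echo "host build" > "$top/HOST/libz.a"
echo "phi build"  > "$top/MIC/libz.a"

with_host_prefix "$top/HOST" "$top/MIC" cat "$top/MIC/libz.a"   # what configure would see
cat "$top/MIC/libz.a"                                            # restored afterwards
rm -rf "$top"
```

The demo prints "host build" first and "phi build" second: during the wrapped command the Phi path resolves to host libraries, and afterwards the Phi libraries are back in place for the real make.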
Gaussian09 Partially Runs on Intel Xeon Phi!
    • Gaussian09 is a famous quantum chemical program package and it provides
      state-of-the-art capabilities for electronic structure modeling.
    • Very large source code: 1.7 million lines
       - $ cat *F | wc -l
       - 1714217
    • Intel Composer XE is not an officially supported compiler
       - Gaussian, Inc. only supports the PGI compiler.
       - Patches were made by M.N. (sorry, we cannot make the patches public)
       - A small set of patches enables us to build it:
         -   -rw-r--r--. 1 maho users  463  1 30 10:53 2012 patch-bsd+buldg09
         -   -rw-r--r--. 1 maho users  692  1 30 10:53 2012 patch-bsd+fsplit.c
         -   -rw-r--r--  1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
         -   -rw-r--r--. 1 maho users  643  1 30 10:53 2012 patch-bsd+mdutil.F
         -   -rw-r--r--. 1 maho users  240  1 30 10:53 2012 patch-bsd+mygau
         -   -rw-r--r--. 1 maho users  486  1 30 10:53 2012 patch-bsd+set-mflags

       - The patches are almost the same as those for the host.
         - They mostly just add -mmic.
       - Somehow shared libraries don’t work??
         - utils.a should be a static library.
         - Intel MKL should also be linked statically.
         - Should the MKL shared libraries be located in /lib64? Is LD_LIBRARY_PATH not parsed?
         - The resulting binaries occupy approximately 2 GB.

Gaussian09 Partially Runs on Intel Xeon Phi!
    • Just run it.
    • Still very unstable with -O3
       - l303.exe (you just have to be lucky)
       - l401.exe (should be built with -O0)
       - Passed (only test000.com to test200.com were tried):
         test001,023,024,025,026,027,028,029,030,031,032,033,034,035,036,037,
         038,039,040,042,056,076,077,078,079,081,091,092,093,099,101,102,104,108,
         115,116,119,120,130,131,140,142,144,145,149,150,151,153,162,163,165,168,169,
         170,172,177,184,188,195




Porting a packaging system (pkgsrc) to Intel Xeon Phi!!!

    • What is pkgsrc?
         - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000
           packages. It is used to enable freely available software to be configured and built easily on supported platforms;
           http://www.pkgsrc.org/

    • NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
    • Why pkgsrc?
       - We need MORE software packages on Intel Xeon Phi!
          - Current HPC program packages depend on other free software packages.
       - RPM and deb are too complex (for me).
       - A native toolchain for Intel Xeon Phi is really important
          - ./configure (autotools) is good, but cross building is rarely supported.
          - ./configure inspects some parameters of the host machine.
          - With a small trick, Intel Composer can be used as if it were a native toolchain.
       - pkgsrc is a highly portable packaging system: it works on *BSD (Net, DragonFly, Free),
         various Linux distributions, AIX and Mac OS X
    • Status:
       - ./bootstrap : done
    • How to get it?
       - I’ll provide it ASAP on sourceforge.net or somewhere...

Summary and outlook
    • We tested Intel Xeon Phi, especially how to build native Phi binaries.
       - “One source base, tuned to many targets” is TRUE!
    • We regard Intel Xeon Phi as a small Linux cluster.
       - but there is no binary compatibility between the host and the Phi.
    • We provided porting tips: how to build Gaussian09, Povray and SDPA.
    • For packages using autotools (./configure) or similar tools, our approach
      requires a two-pass configure to cheat
       - If configure checks for Phi-specific features, such as the availability of FMA,
         this strategy doesn’t work.
       - Yoshikazu Kamoshida’s strategy handles configure or build systems that need to
         run small programs on the target machine (SWoPP 2012; Development of
         middleware which facilitate tuning while installation under cross compile
         environment).
    • More packages are needed!
       - Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment
         like Intel Xeon Phi.
               - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over
                 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms;
                 http://www.pkgsrc.org/

More Related Content

What's hot

Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architectureinside-BigData.com
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsWim Vanderbauwhede
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersKazuaki Ishizaki
 
Omp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacOmp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacGanesan Narayanasamy
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterLinaro
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...MLconf
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseKazuaki Ishizaki
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterLinaro
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingJonathan Dursi
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Jim Dowling
 
TinyML as-a-Service
TinyML as-a-ServiceTinyML as-a-Service
TinyML as-a-ServiceHiroshi Doyu
 
Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...waqarnabi
 
Everything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI LibraryEverything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI LibraryIntel® Software
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Anne Nicolas
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learningAdam Gibson
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainabilitygeetachauhan
 

What's hot (20)

Assisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated ArchitectureAssisting User’s Transition to Titan’s Accelerated Architecture
Assisting User’s Transition to Titan’s Accelerated Architecture
 
On the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC ApplicationsOn the Capability and Achievable Performance of FPGAs for HPC Applications
On the Capability and Achievable Performance of FPGAs for HPC Applications
 
Easy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java ProgrammersEasy and High Performance GPU Programming for Java Programmers
Easy and High Performance GPU Programming for Java Programmers
 
Omp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdacOmp tutorial cpugpu_programming_cdac
Omp tutorial cpugpu_programming_cdac
 
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to ClusterBKK16-408B Data Analytics and Machine Learning From Node to Cluster
BKK16-408B Data Analytics and Machine Learning From Node to Cluster
 
Getting started with AMD GPUs
Getting started with AMD GPUsGetting started with AMD GPUs
Getting started with AMD GPUs
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
Funda Gunes, Senior Research Statistician Developer & Patrick Koch, Principal...
 
Making Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to UseMaking Hardware Accelerator Easier to Use
Making Hardware Accelerator Easier to Use
 
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to ClusterBKK16-404B Data Analytics and Machine Learning- from Node to Cluster
BKK16-404B Data Analytics and Machine Learning- from Node to Cluster
 
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big ComputingEuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
EuroMPI 2016 Keynote: How Can MPI Fit Into Today's Big Computing
 
Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)Distributed TensorFlow on Hops (Papis London, April 2018)
Distributed TensorFlow on Hops (Papis London, April 2018)
 
TinyML as-a-Service
TinyML as-a-ServiceTinyML as-a-Service
TinyML as-a-Service
 
Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...Towards Automated Design Space Exploration and Code Generation using Type Tra...
Towards Automated Design Space Exploration and Code Generation using Type Tra...
 
Available HPC resources at CSUC
Available HPC resources at CSUCAvailable HPC resources at CSUC
Available HPC resources at CSUC
 
Exploiting GPUs in Spark
Exploiting GPUs in SparkExploiting GPUs in Spark
Exploiting GPUs in Spark
 
Everything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI LibraryEverything You Need to Know About the Intel® MPI Library
Everything You Need to Know About the Intel® MPI Library
 
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
Kernel Recipes 2018 - XDP: a new fast and programmable network layer - Jesper...
 
Advanced spark deep learning
Advanced spark deep learningAdvanced spark deep learning
Advanced spark deep learning
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 

Viewers also liked

Post-processing SAR images on Xeon Phi - a porting exercise
Post-processing SAR images on Xeon Phi - a porting exercisePost-processing SAR images on Xeon Phi - a porting exercise
Post-processing SAR images on Xeon Phi - a porting exerciseIntel IT Center
 
Intel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare pptIntel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare pptIntel IT Center
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloadsinside-BigData.com
 
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Sean Everett
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana
 

Viewers also liked (6)

Post-processing SAR images on Xeon Phi - a porting exercise
Post-processing SAR images on Xeon Phi - a porting exercisePost-processing SAR images on Xeon Phi - a porting exercise
Post-processing SAR images on Xeon Phi - a porting exercise
 
Intel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare pptIntel xeon phi coprocessor slideshare ppt
Intel xeon phi coprocessor slideshare ppt
 
Using Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC WorkloadsUsing Xeon + FPGA for Accelerating HPC Workloads
Using Xeon + FPGA for Accelerating HPC Workloads
 
Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16Intel Nervana Artificial Intelligence Meetup 11/30/16
Intel Nervana Artificial Intelligence Meetup 11/30/16
 
Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016Nervana AI Overview Deck April 2016
Nervana AI Overview Deck April 2016
 
Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17Intel Nervana Artificial Intelligence Meetup 1/31/17
Intel Nervana Artificial Intelligence Meetup 1/31/17
 

Similar to Porting applications to Intel Xeon Phi: tips and experiences

Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer George Markomanolis
 
the NML project
the NML projectthe NML project
the NML projectLei Yang
 
Hacking and Forensics on the Go - 44CON 2012
Hacking and Forensics on the Go - 44CON 2012Hacking and Forensics on the Go - 44CON 2012
Hacking and Forensics on the Go - 44CON 201244CON
 
Linux as a gaming platform, ideology aside
Linux as a gaming platform, ideology asideLinux as a gaming platform, ideology aside
Linux as a gaming platform, ideology asideLeszek Godlewski
 
The Deck by Phil Polstra GrrCON2012
The Deck by Phil Polstra GrrCON2012The Deck by Phil Polstra GrrCON2012
The Deck by Phil Polstra GrrCON2012Philip Polstra
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterTim Ellison
 
Hardwear.io 2018 BLE Security Essentials workshop
Hardwear.io 2018 BLE Security Essentials workshopHardwear.io 2018 BLE Security Essentials workshop
Hardwear.io 2018 BLE Security Essentials workshopSlawomir Jasek
 
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...Toradex
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementGanesan Narayanasamy
 
fpga1 - What is.pptx
fpga1 - What is.pptxfpga1 - What is.pptx
fpga1 - What is.pptxssuser0de10a
 
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...jaxLondonConference
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesDr. Fabio Baruffa
 
Building SuperComputers @ Home
Building SuperComputers @ HomeBuilding SuperComputers @ Home
Building SuperComputers @ HomeAbhishek Parolkar
 
(phpconftw2012) PHP as a Middleware in Embedded Systems
(phpconftw2012) PHP as a Middleware in Embedded Systems(phpconftw2012) PHP as a Middleware in Embedded Systems
(phpconftw2012) PHP as a Middleware in Embedded Systemssosorry
 
The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...
The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...
The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...NRB
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Peter Hlavaty
 

Similar to Porting applications to Intel Xeon Phi: tips and experiences (20)

Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer Exploring the Programming Models for the LUMI Supercomputer
Exploring the Programming Models for the LUMI Supercomputer
 
the NML project
the NML projectthe NML project
the NML project
 
Polstra 44con2012
Polstra 44con2012Polstra 44con2012
Polstra 44con2012
 
Hacking and Forensics on the Go - 44CON 2012
Hacking and Forensics on the Go - 44CON 2012Hacking and Forensics on the Go - 44CON 2012
Hacking and Forensics on the Go - 44CON 2012
 
PowerAI Deep Dive ( key points )
PowerAI Deep Dive ( key points )PowerAI Deep Dive ( key points )
PowerAI Deep Dive ( key points )
 
Linux as a gaming platform, ideology aside
Linux as a gaming platform, ideology asideLinux as a gaming platform, ideology aside
Linux as a gaming platform, ideology aside
 
The Deck by Phil Polstra GrrCON2012
The Deck by Phil Polstra GrrCON2012The Deck by Phil Polstra GrrCON2012
The Deck by Phil Polstra GrrCON2012
 
Five cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark fasterFive cool ways the JVM can run Apache Spark faster
Five cool ways the JVM can run Apache Spark faster
 
Hardwear.io 2018 BLE Security Essentials workshop
Hardwear.io 2018 BLE Security Essentials workshopHardwear.io 2018 BLE Security Essentials workshop
Hardwear.io 2018 BLE Security Essentials workshop
 
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
First Steps Developing Embedded Applications using Heterogeneous Multi-core P...
 
CAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablementCAPI and OpenCAPI Hardware acceleration enablement
CAPI and OpenCAPI Hardware acceleration enablement
 
fpga1 - What is.pptx
fpga1 - What is.pptxfpga1 - What is.pptx
fpga1 - What is.pptx
 
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
Do You Like Coffee with Your dessert? Java and the Raspberry Pi - Simon Ritte...
 
Hands on OpenCL
Hands on OpenCLHands on OpenCL
Hands on OpenCL
 
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core ArchitecturesPerformance Optimization of SPH Algorithms for Multi/Many-Core Architectures
Performance Optimization of SPH Algorithms for Multi/Many-Core Architectures
 
Thotcon2013
Thotcon2013Thotcon2013
Thotcon2013
 
Building SuperComputers @ Home
Building SuperComputers @ HomeBuilding SuperComputers @ Home
Building SuperComputers @ Home
 
(phpconftw2012) PHP as a Middleware in Embedded Systems
(phpconftw2012) PHP as a Middleware in Embedded Systems(phpconftw2012) PHP as a Middleware in Embedded Systems
(phpconftw2012) PHP as a Middleware in Embedded Systems
 
The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...
The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...
The NRB Group mainframe day 2021 - New Programming Languages on Z - Frank Van...
 
Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!Ice Age melting down: Intel features considered usefull!
Ice Age melting down: Intel features considered usefull!
 

More from Maho Nakata

quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)
quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)quantum chemistry on quantum computer handson by Q# (2019/8/4@MDR Hongo, Tokyo)
Porting applications to Intel Xeon Phi: tips and experiences

  • 1. Porting application to Intel Xeon Phi: some experiences
      RIKEN Advanced Center for Computing and Communication
      2012/11 Super Computing 2012 @ Intel Booth, Salt Lake City, US
      maho@riken.jp
      Other side of my face:
        maho@FreeBSD.org (FreeBSD committer)
        maho@apache.org (Apache OpenOffice committer)
      Thursday, November 15, 2012
  • 2. Aims of my talk
      • Proof of concept:
          - Intel says, “One source base, tuned to many targets”
          - Is it true or not?
              - My answer is TRUE.
      • Native model is considered
          - Just compile with Intel Composer XE 2013 :-)
          - The offload model is extremely demanding for modern, complicated programs
              - CUDA experts say: to get performance, do everything on the GPU and do not transfer data between CPU and GPU.
              - Modern applications use many external open source / free software packages. Very complex structure!
              - Not realistic!
      • Providing porting tips
          - Gaussian09, povray, sdpa...
  • 3. What is Intel Xeon Phi?
      • Intel Xeon Phi is a co-processor, connected via a PCI Express slot.
      • Peak performance is 1 TFlops in double precision
          - many cores: 64 cores, 4 threads each, 512-bit SIMD units, 8GB of GDDR5 RAM...
      • It can be viewed as another small cluster of computers inside a Linux box.
          - A Linux micro OS is provided
      • Better programmability
          - x86 based (64bit)
          - Development tool: Intel Composer XE 2013
              - C, C++, Fortran
              - compile and run the same code as on the CPU
              - familiar parallelism: OpenMP, MPI, OpenCL
          - Various programming models
              - MIC centric
              - CPU centric
          - CAUTION: BINARIES ARE INCOMPATIBLE!
          - Code must be recompiled for Xeon Phi!
  • 4. How to build your program on Xeon Phi
      • Very easy.
      • Just pass the -mmic flag to the compilers:
          - icc -mmic
          - icpc -mmic
          - ifort -mmic
      • How to link against optimized BLAS and LAPACK?
          - Just add -mkl
          - Same as in the CPU case.
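
The flags on this slide can be put together as in the following minimal sketch. It only prints the compile commands (a dry run), so it does not need the Intel compiler to be installed. The function name `build_cmd`, the source file `hello.c`, and the output naming scheme are hypothetical and do not come from the talk.

```shell
#!/bin/sh
# Dry-run sketch: print the icc command line for a host build and for a
# native Xeon Phi build of the same source file.
# build_cmd, hello.c and the ".host"/".mic" suffixes are illustrative only.
build_cmd() {
    target=$1
    src=$2
    case "$target" in
        host) flags="" ;;          # ordinary CPU build
        mic)  flags=" -mmic" ;;    # native Xeon Phi build
        *)    echo "unknown target: $target" >&2; return 1 ;;
    esac
    # -mkl links Intel MKL (BLAS/LAPACK); it is the same for both targets.
    echo "icc${flags} -mkl -o ${src%.c}.${target} ${src}"
}

build_cmd host hello.c
build_cmd mic hello.c
```

The only difference between the two printed commands is the -mmic flag, which is the point the slide makes.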
  • 5. DGEMM benchmark: sorry, no free lunch, tuning is needed.
      • DGEMM is a matrix-matrix multiplication routine. When tuned, it uses almost 100% of the CPU's performance, so it is used for benchmarking.
          - It does not measure memory bandwidth
      • Intel Xeon Phi’s theoretical peak performance is 1 TFlops.
      • Do we need some tuning for Intel Xeon Phi?
          - YES. Without tuning, only 40% of peak is attained: ~400 GFlops
          - With tuning, we attain ~816 GFlops.
          - Tuning points: memory allocation, thread affinity
      • How was the input data prepared?
          - just malloc and fill with random values
          - no alignment is specified
          - This is sufficient on the CPU, but
          - not sufficient for Xeon Phi.
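
As a small worked example, the sketch below converts the two DGEMM figures quoted on this slide into a percentage of the 1 TFlops theoretical peak. The helper name `percent_of_peak` is hypothetical. The last line shows one common way to pin threads with the Intel OpenMP runtime; the value `compact` is an illustrative assumption, not the setting used in the talk.

```shell
#!/bin/sh
# Percentage of theoretical peak for the DGEMM numbers quoted on the slide.
# percent_of_peak is a hypothetical helper: $1 = achieved GFlops, $2 = peak GFlops.
percent_of_peak() {
    LC_ALL=C awk -v achieved="$1" -v peak="$2" \
        'BEGIN { printf "%.1f\n", 100 * achieved / peak }'
}

untuned=$(percent_of_peak 400 1000)   # malloc without alignment, default affinity
tuned=$(percent_of_peak 816 1000)     # after tuning allocation and affinity

echo "untuned: ${untuned}% of peak"
echo "tuned:   ${tuned}% of peak"

# Thread affinity for the Intel OpenMP runtime is set through the environment.
# "compact" is only an example value (assumption), not taken from the talk.
export KMP_AFFINITY=compact
```

Tuning roughly doubles the achieved fraction of peak, which is why the slide warns that there is no free lunch.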
  • 6. SDPA: How to cheat “configure”, part I
      • SDPA is a highly efficient semidefinite programming solver.
          - distributed at http://sdpa.sourceforge.net/, under the GPL.
      • On the CPU: ./configure ; make
      • But Intel Composer XE 2013 for Xeon Phi is a cross-compiler... how do we do this?
          - The host and Phi environments are almost the same...
          - Two-pass strategy: in the first pass, give the dummy flag “-DMMIC” to configure; then replace it with “-mmic” in the generated Makefiles; then compile.

            #!/bin/sh
            CC="icc"; export CC
            CXX="icpc"; export CXX
            FC="ifort"; export FC
            CFLAGS="-DMMIC" ; export CFLAGS
            CXXFLAGS="-DMMIC" ; export CXXFLAGS
            FFLAGS="-DMMIC" ; export FFLAGS
            ./configure --with-blas="-mkl" --with-lapack="-mkl"
            files=$(find ./* -name Makefile)
            perl -p -i -e 's/-DMMIC/-mmic/g' $files
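
The second pass of this trick can be tried without the Intel compiler. The sketch below writes a stand-in Makefile into a temporary directory and performs the -DMMIC to -mmic replacement. The Makefile contents are invented for illustration, and the sketch uses sed with a temporary file rather than the perl one-liner from the slide.

```shell
#!/bin/sh
# Demonstrate the second pass of the two-pass configure trick on a fake Makefile.
# The Makefile below is invented for illustration; it is not SDPA's.
workdir=$(mktemp -d)

cat > "$workdir/Makefile" <<'EOF'
CC = icc
CFLAGS = -O2 -DMMIC
CXXFLAGS = -O2 -DMMIC
EOF

# Replace the dummy define with the real cross-compilation flag in every
# Makefile under the work directory (sed instead of perl -p -i -e).
find "$workdir" -name Makefile | while read -r f; do
    sed 's/-DMMIC/-mmic/g' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

cat "$workdir/Makefile"
```

During the first pass configure sees -DMMIC, which is just a harmless preprocessor define, so its test programs compile and run on the host. Only the final make uses -mmic.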
  • 7. Povray: how to cheat configure, part II
      • The Persistence of Vision Raytracer is a high-quality, totally free tool for creating stunning three-dimensional graphics; a famous ray tracing program.
      • This part covers how to build Povray 3.7 RC
          - This is the first Povray version parallelized with pthreads.
      • It requires some external libraries beyond those provided for Intel Xeon Phi.
  • 8. Povray: how to cheat configure, part II
      • Prerequisites
          - boost, zlib, jpeg, tiff and libpng.
          - all libraries must be built for Phi :-( :-( :-(
      • How to build boost and zlib: we took the same strategy as for povray.
          - First build and install the host version of boost to /home/maho/HOST, then the Phi version to /home/maho/MIC
          - Next, build and install the host version of zlib to /home/maho/HOST
          - Then build the Phi version as follows:
              - back up /home/maho/MIC to /home/maho/MIC.org
              - copy /home/maho/HOST to /home/maho/MIC
              - run configure for the host and pass the -DMMIC flag in CFLAGS and CXXFLAGS.
              - make sure the library search path (LD_LIBRARY_PATH) points to /home/maho/MIC!
              - remove /home/maho/MIC
              - rename /home/maho/MIC.org to /home/maho/MIC
              - replace -DMMIC with -mmic
              - run make to build the Xeon Phi binary.
              - Done.
      • Building tiff and png for Phi is similar to the above procedure.
  • 9. Povray: how to cheat configure, part II
      • Prerequisites
          - boost, zlib, jpeg, tiff and libpng.
          - all libraries must be built for Phi :-( :-( :-(
      • Strategy: build twice, first for the host and then for Xeon Phi
          - build and install the host versions of the libraries to /home/maho/HOST
          - build and install the Phi versions of the libraries to /home/maho/MIC
          - actually,
      • The final configure for Povray should be done as follows:
          - back up /home/maho/MIC to /home/maho/MIC.org
          - copy /home/maho/HOST to /home/maho/MIC
          - run configure for the host and pass the -DMMIC flag in CFLAGS and CXXFLAGS.
          - make sure the library search path (LD_LIBRARY_PATH) points to /home/maho/MIC!
          - remove /home/maho/MIC
          - rename /home/maho/MIC.org to /home/maho/MIC
          - replace -DMMIC with -mmic
          - run make to build the Xeon Phi binary.
          - Done.
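
The directory swap in the procedure above can be sketched as a script. To keep it runnable anywhere, the sketch works in a temporary directory instead of /home/maho, uses marker files in place of real libraries, and replaces the real configure with a hypothetical `fake_configure` function that only records which library flavour it found and writes a one-line Makefile.

```shell
#!/bin/sh
# Sketch of the HOST/MIC prefix swap used to "cheat" configure.
# All paths, marker files and fake_configure are stand-ins for illustration.
root=$(mktemp -d)
HOST="$root/HOST"
MIC="$root/MIC"
mkdir -p "$HOST/lib" "$MIC/lib"
echo host > "$HOST/lib/libz.marker"   # stands for the host build of a library
echo phi  > "$MIC/lib/libz.marker"    # stands for the Xeon Phi build

fake_configure() {
    # A real configure would compile and run test programs against the
    # libraries found under the MIC prefix; here we only read the marker.
    printf 'CFLAGS = -DMMIC -I%s/include\n' "$MIC" > "$root/Makefile"
    cat "$MIC/lib/libz.marker"
}

mv "$MIC" "$MIC.org"                  # back up the Phi libraries
cp -R "$HOST" "$MIC"                  # host libraries now sit at the MIC prefix
seen_by_configure=$(fake_configure)   # configure runs against host libraries
rm -rf "$MIC"                         # remove the temporary host copy
mv "$MIC.org" "$MIC"                  # restore the Phi libraries

# Switch the dummy define to the real flag before running make.
sed 's/-DMMIC/-mmic/g' "$root/Makefile" > "$root/Makefile.tmp" \
    && mv "$root/Makefile.tmp" "$root/Makefile"

echo "configure saw: $seen_by_configure"
echo "make will link against: $(cat "$MIC/lib/libz.marker")"
```

The point of the swap is that the paths recorded by configure stay valid: the test programs link against host libraries that can run on the host, while the final make finds the Phi libraries at exactly the same prefix.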
  • 10. Gaussian09 partially runs on Intel Xeon Phi!
      • Gaussian09 is a famous quantum chemistry program package that provides state-of-the-art capabilities for electronic structure modeling.
      • Very large source code: 1.7 million lines
            $ cat *F | wc -l
            1714217
      • Intel Composer XE is not an officially supported compiler
          - Gaussian Inc. only supports the PGI compiler.
          - Patches were made by M.N. (sorry, we cannot make the patches public)
          - A small set of patches enables the build:
                -rw-r--r--. 1 maho users  463  1 30 10:53 2012 patch-bsd+buldg09
                -rw-r--r--. 1 maho users  692  1 30 10:53 2012 patch-bsd+fsplit.c
                -rw-r--r--  1 maho users 5674 10 18 16:41 2012 patch-bsd+i386.make
                -rw-r--r--. 1 maho users  643  1 30 10:53 2012 patch-bsd+mdutil.F
                -rw-r--r--. 1 maho users  240  1 30 10:53 2012 patch-bsd+mygau
                -rw-r--r--. 1 maho users  486  1 30 10:53 2012 patch-bsd+set-mflags
          - The patches are almost the same as those for the host build.
              - They do little more than add -mmic
          - Somehow shared libs don’t work??
              - utils.a should be a static library.
              - Intel MKL should also be linked statically.
              - Should the MKL shared libs be located in /lib64? Is LD_LIBRARY_PATH not parsed?
          - The resulting binaries occupy approximately 2GB
  • 11. Gaussian09 partially runs on Intel Xeon Phi!
      • Just run it
      • Still very unstable with -O3
          - l303.exe (just wish for luck)
          - l401.exe (should be built with -O0)
          - Passed (only test000.com through test200.com were tried):
            test001,023,024,025,026,027,028,029,030,031,032,033,034,035,036,037,
            038,039,040,042,056,076,077,078,079,081,091,092,093,099,101,102,104,
            108,115,116,119,120,130,131,140,142,144,145,149,150,151,153,162,163,
            165,168,169,170,172,177,184,188,195
  • 12. A packaging system (pkgsrc) porting effort on Intel Phi!!!
      • What is pkgsrc?
          - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/
      • NAKATA, Maho has over ten years of experience as a FreeBSD ports committer.
      • Why pkgsrc?
          - We need MORE software packages on Intel Phi!
          - Current HPC program packages depend on other free software packages.
          - RPM and deb are too complex (to me).
          - A native tool chain for Intel Phi is really important
              - ./configure (autotools) is a good tool, but cross building is rarely supported.
              - ./configure inspects some parameters of the host machine.
              - Intel Composer can be used as if it were a native toolkit with a small trick.
          - A highly portable packaging system: works on *BSD (Net, DragonFly, Free), various Linux variants, AIX, MacOSX
      • Status:
          - ./bootstrap: done
      • How to get it?
          - I’ll provide it ASAP on sourceforge.net or somewhere...
  • 13. Summary and outlook
      • We tested Intel Xeon Phi, especially how to build Phi native binaries.
          - “One source base, tuned to many targets” is TRUE!
      • We regard Intel Xeon Phi as a small Linux cluster.
          - but there is no binary compatibility between host and Phi.
      • We provided porting tips: how to build gaussian, povray and sdpa.
      • For packages using autotools (./configure) or similar tools, our approach requires a two-pass configure to cheat
          - if configure checks for Phi-specific features, such as the availability of FMA, this strategy doesn’t work.
          - Yoshikazu Kamoshida’s strategy handles configure scripts or build systems that need to run small programs on the target machine (SWoPP 2012; Development of middleware which facilitate tuning while installation under cross compile environment).
      • More packages are needed!
          - Porting NetBSD’s pkgsrc might be a good idea for a cross-compiling environment like Intel Xeon Phi.
          - pkgsrc is a framework for building third-party software on NetBSD and other UNIX-like systems, currently containing over 12000 packages. It is used to enable freely available software to be configured and built easily on supported platforms; http://www.pkgsrc.org/