Unleash performance through parallelism - Intel® Math Kernel Library
Transcript of "Unleash performance through parallelism - Intel® Math Kernel Library"

  1. Unleash performance through parallelism – Intel® Math Kernel Library
     Kent Moffat, Product Manager, Intel Developer Products Division
     ISC ’13, Leipzig, Germany, June 2013
  2. Agenda
     • What is Intel® Math Kernel Library (Intel® MKL)?
     • Intel® MKL support for the Intel® Xeon Phi™ coprocessor
     • Intel® MKL usage models on Intel® Xeon Phi™: Automatic Offload, Compiler Assisted Offload, Native Execution
     • Performance roadmap
     • Where to find more information
  3. Intel® MKL is the industry’s leading math library*
     • Linear Algebra: BLAS, LAPACK, sparse solvers, ScaLAPACK
     • Fast Fourier Transforms: multidimensional (up to 7D), FFTW interfaces, cluster FFT
     • Vector Math: trigonometric, hyperbolic, exponential, logarithmic, power/root, rounding
     • Vector Random Number Generators: congruential, recursive, Wichmann-Hill, Mersenne Twister, Sobol, Niederreiter, non-deterministic
     • Summary Statistics: kurtosis, variation coefficient, quantiles, order statistics, min/max, variance-covariance, …
     • Data Fitting: splines, interpolation, cell search
     [Figure: a single source base targets multicore CPUs, the Intel® MIC coprocessor, multicore clusters, and clusters combining multicore and many-core nodes.]
     * 2011 and 2012 Evans Data N. American developer surveys
  4. Intel® MKL Support for Intel® Xeon Phi™ Coprocessor
     • Intel® MKL 11.0 supports the Intel® Xeon Phi™ coprocessor
     • Heterogeneous computing: takes advantage of both the multicore host and the many-core coprocessors
     • Optimized for wider (512-bit) SIMD instructions
     • Flexible usage models:
        – Automatic Offload: offers transparent heterogeneous computing
        – Compiler Assisted Offload: allows fine offloading control
        – Native execution: uses the coprocessors as independent nodes
     Using Intel® MKL on Intel® Xeon Phi™:
     • Performance scales from multicore to many-core
     • Familiar architecture and programming models
     • Code re-use and faster time-to-market
  5. Usage Models on Intel® Xeon Phi™ Coprocessor
     • Automatic Offload
        – No code changes required
        – Automatically uses both host and target
        – Transparent data transfer and execution management
     • Compiler Assisted Offload
        – Explicit control of data transfer and remote execution using compiler offload pragmas/directives
        – Can be used together with Automatic Offload
     • Native Execution
        – Uses the coprocessors as independent nodes
        – Input data is copied to targets in advance
  6. Execution Models
     From multicore-centric to many-core-centric:
     • Multicore Hosted: general-purpose serial and parallel computing on the multicore host (Intel® Xeon®)
     • Offload: codes with highly parallel phases
     • Symmetric: codes with balanced needs
     • Many-Core Hosted: highly parallel codes on the many-core coprocessor (Intel® Xeon Phi™)
  7. Automatic Offload (AO)
     • Offloading is automatic and transparent
     • By default, Intel MKL decides:
        – When to offload
        – Work division between host and targets
     • Users enjoy host and target parallelism automatically
     • Users can still control work division to fine-tune performance
  8. How to Use Automatic Offload
     • Using Automatic Offload is easy: call a function, mkl_mic_enable(), or set an environment variable, MKL_MIC_ENABLE=1
     • What if there is no coprocessor in the system? The code runs on the host as usual, without any penalty!
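     A minimal sketch of this in C (not from the slides; the matrix size and the explicit 50% work division via mkl_mic_set_workdivision() are illustrative assumptions — by default MKL picks the division itself):

     #include <stdio.h>
     #include <mkl.h>

     int main(void) {
         const MKL_INT n = 4096;   /* large enough that AO can pay off */
         double *A = (double *)mkl_malloc(n * n * sizeof(double), 64);
         double *B = (double *)mkl_malloc(n * n * sizeof(double), 64);
         double *C = (double *)mkl_malloc(n * n * sizeof(double), 64);
         MKL_INT i;
         for (i = 0; i < n * n; i++) { A[i] = 1.0; B[i] = 2.0; C[i] = 0.0; }

         mkl_mic_enable();         /* same effect as MKL_MIC_ENABLE=1 */

         /* Optional: pin 50% of the work on coprocessor 0 instead of
            letting MKL decide (the default). */
         mkl_mic_set_workdivision(MKL_TARGET_MIC, 0, 0.5);

         /* DGEMM is AO-enabled: MKL may split this call transparently
            between the host CPU and the coprocessor. */
         cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                     n, n, n, 1.0, A, n, B, n, 0.0, C, n);

         printf("C[0] = %f\n", C[0]);
         mkl_free(A); mkl_free(B); mkl_free(C);
         return 0;
     }

     Built and linked as usual (no offload flags required), the same binary runs host-only on a machine without a coprocessor.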
  9. Automatic Offload Enabled Functions
     • A select subset of MKL functions is AO-enabled; only functions with sufficient computation to offset the data transfer overhead are subject to AO
     • In 11.0, AO-enabled functions include:
        – Level-3 BLAS: ?GEMM, ?TRSM, ?TRMM
        – The LAPACK “3 amigos”: LU, QR, Cholesky
  10. Compiler Assisted Offload (CAO)
     • Offloading is explicitly controlled by compiler pragmas or directives
     • All MKL functions can be offloaded in CAO (in comparison, only a subset of MKL is subject to AO)
     • Can leverage the full potential of the compiler’s offloading facility
     • More flexibility in data transfer and remote execution management; a big advantage is data persistence: reusing transferred data for multiple operations (see the sketch after the C example below)
  11. How to Use Compiler Assisted Offload
     • The same way you would offload any function call to the coprocessor.
     • An example in C:

     #pragma offload target(mic) \
             in(transa, transb, N, alpha, beta) \
             in(A : length(matrix_elements)) \
             in(B : length(matrix_elements)) \
             in(C : length(matrix_elements)) \
             out(C : length(matrix_elements) alloc_if(0))
     {
         sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N,
               &beta, C, &N);
     }
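     As a hedged sketch of the data-persistence pattern mentioned on the previous slide (not from the deck; variable names follow the example above, while the matrix size and the two-call structure are illustrative assumptions), A and B stay resident on the card between offloads via the alloc_if/free_if modifiers:

     #include <stdio.h>
     #include <stdlib.h>
     #include <mkl.h>

     int main(void) {
         int n = 2048, elems = 2048 * 2048;   /* assumed sizes */
         char transa = 'N', transb = 'N';
         float alpha = 1.0f, beta = 0.0f;
         float *A  = malloc(sizeof(float) * elems);
         float *B  = malloc(sizeof(float) * elems);
         float *C1 = malloc(sizeof(float) * elems);
         float *C2 = malloc(sizeof(float) * elems);
         int i;
         for (i = 0; i < elems; i++) { A[i] = 1.0f; B[i] = 1.0f; C1[i] = C2[i] = 0.0f; }

         /* First offload: transfer A and B, keep them resident (free_if(0)). */
         #pragma offload target(mic:0) \
                 in(transa, transb, n, alpha, beta) \
                 in(A : length(elems) alloc_if(1) free_if(0)) \
                 in(B : length(elems) alloc_if(1) free_if(0)) \
                 inout(C1 : length(elems))
         {
             sgemm(&transa, &transb, &n, &n, &n, &alpha, A, &n, B, &n, &beta, C1, &n);
         }

         /* Second offload: A and B are already on the coprocessor, so no
            re-allocation or re-transfer (alloc_if(0)); free them when done. */
         #pragma offload target(mic:0) \
                 in(transa, transb, n, alpha, beta) \
                 nocopy(A : length(elems) alloc_if(0) free_if(1)) \
                 nocopy(B : length(elems) alloc_if(0) free_if(1)) \
                 inout(C2 : length(elems))
         {
             sgemm(&transa, &transb, &n, &n, &n, &alpha, A, &n, B, &n, &beta, C2, &n);
         }

         printf("C1[0]=%f C2[0]=%f\n", C1[0], C2[0]);
         free(A); free(B); free(C1); free(C2);
         return 0;
     }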
  12. How to Use Compiler Assisted Offload
     • An example in Fortran:

     !DEC$ ATTRIBUTES OFFLOAD : TARGET( MIC ) :: SGEMM
     !DEC$ OMP OFFLOAD TARGET( MIC ) &
     !DEC$ IN( TRANSA, TRANSB, M, N, K, ALPHA, BETA, LDA, LDB, LDC ), &
     !DEC$ IN( A: LENGTH( NCOLA * LDA )), &
     !DEC$ IN( B: LENGTH( NCOLB * LDB )), &
     !DEC$ INOUT( C: LENGTH( N * LDC ))
     !$OMP PARALLEL SECTIONS
     !$OMP SECTION
           CALL SGEMM( TRANSA, TRANSB, M, N, K, ALPHA, &
                       A, LDA, B, LDB, BETA, C, LDC )
     !$OMP END PARALLEL SECTIONS
  13. Native Execution
     • Programs can be built to run only on the coprocessor by using the –mmic build option.
     • Tips for using MKL in native execution:
        – Use all threads to get the best performance (for a 60-core coprocessor): MIC_OMP_NUM_THREADS=240
        – Thread affinity setting: KMP_AFFINITY=explicit,proclist=[1-240:1,0,241,242,243],granularity=fine
        – Use huge pages for memory allocation (more details on the Intel Developer Zone): the mmap() system call, or the open-source libhugetlbfs.so library: http://libhugetlbfs.sourceforge.net/
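     A minimal sketch of the mmap() route the slide mentions (not from the deck; MAP_HUGETLB is the Linux flag, the 64 MB size and 2 MB huge-page granularity are assumptions about the system, and huge pages must have been reserved by the administrator, e.g. via vm.nr_hugepages):

     #define _GNU_SOURCE
     #include <stdio.h>
     #include <string.h>
     #include <sys/mman.h>

     int main(void) {
         /* 64 MB; keep it a multiple of the assumed 2 MB huge-page size */
         size_t bytes = 64UL << 20;
         void *buf = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
         if (buf == MAP_FAILED) {
             perror("mmap(MAP_HUGETLB)");  /* e.g. no huge pages reserved */
             return 1;                     /* a real code would fall back to malloc() */
         }
         memset(buf, 0, bytes);            /* touch the pages to fault them in */
         munmap(buf, bytes);
         return 0;
     }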
  14. Suggestions on Choosing Usage Models
     • Choose native execution if:
        – The code is highly parallel
        – You want to use the coprocessors as independent compute nodes
     • Choose AO when:
        – A sufficient Byte/FLOP ratio makes offload beneficial
        – Using Level-3 BLAS functions: ?GEMM, ?TRMM, ?TRSM
        – Using LU and QR factorization (in upcoming release updates)
     • Choose CAO when either:
        – There is enough computation to offset the data transfer overhead
        – Transferred data can be reused by multiple operations
     • You can always run on the host
  15. Intel® MKL Link Line Advisor
     • A web tool that helps users choose the correct link-line options
     • http://software.intel.com/sites/products/mkl/
     • Also available offline in the MKL product package
  16. Where to Find MKL Documentation
     • How to use MKL on Intel® Xeon Phi™ coprocessors:
        – “Using Intel Math Kernel Library on Intel® Xeon Phi™ Coprocessors” section in the User’s Guide
        – “Support Functions for Intel Many Integrated Core Architecture” section in the Reference Manual
     • Tips, app notes, etc. can also be found on the Intel Developer Zone:
        – http://www.intel.com/software/mic
        – http://www.intel.com/software/mic-developer
