Unleash performance through parallelism - Intel® Math Kernel Library

Unleashing performance through parallelism with Intel Math Kernel Library

Transcript

  • 1. Unleash performance through parallelism – Intel® Math Kernel Library
    Kent Moffat, Product Manager, Intel Developer Products Division
    ISC ’13, Leipzig, Germany, June 2013
  • 2. Agenda
    • What is Intel® Math Kernel Library (Intel® MKL)?
    • Intel® MKL supports the Intel® Xeon Phi™ coprocessor
    • Intel® MKL usage models on Intel® Xeon Phi™: Automatic Offload, Compiler Assisted Offload, Native Execution
    • Performance roadmap
    • Where to find more information?
  • 3. Intel® MKL is the industry’s leading math library*
    Linear Algebra: BLAS • LAPACK • Sparse solvers • ScaLAPACK
    Fast Fourier Transforms: Multidimensional (up to 7D) • FFTW interfaces • Cluster FFT
    Vector Math: Trigonometric • Hyperbolic • Exponential, Logarithmic • Power/Root • Rounding
    Vector Random Number Generators: Congruential • Recursive • Wichmann-Hill • Mersenne Twister • Sobol • Niederreiter • Non-deterministic
    Summary Statistics: Kurtosis • Variation coefficient • Quantiles, order statistics • Min/max • Variance-covariance • …
    Data Fitting: Splines • Interpolation • Cell search
    [Figure: a single Intel® MKL source serves multicore CPUs, the Intel® MIC coprocessor, and clusters combining multicore and many-core.]
    * 2011 and 2012 Evans Data N. American developer survey
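    As a concrete taste of the Vector Math component, a minimal sketch calling vdExp, MKL’s vectorized elementwise exponential (the array contents are illustrative):

      #include <stdio.h>
      #include <mkl.h>

      int main(void)
      {
          double a[4] = {0.0, 1.0, 2.0, 3.0};
          double y[4];
          vdExp(4, a, y);   /* y[i] = exp(a[i]) for the whole array in one call */
          for (int i = 0; i < 4; ++i)
              printf("exp(%g) = %g\n", a[i], y[i]);
          return 0;
      }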
  • 4. Intel® MKL Support for the Intel® Xeon Phi™ Coprocessor
    • Intel® MKL 11.0 supports the Intel® Xeon Phi™ coprocessor
    • Heterogeneous computing: takes advantage of both the multicore host and the many-core coprocessors
    • Optimized for wider (512-bit) SIMD instructions
    • Flexible usage models:
      – Automatic Offload: offers transparent heterogeneous computing
      – Compiler Assisted Offload: allows fine offloading control
      – Native execution: uses the coprocessors as independent nodes
    Using Intel® MKL on Intel® Xeon Phi™:
    • Performance scales from multicore to many-core
    • Familiar architecture and programming models
    • Code re-use and faster time-to-market
  • 5. Usage Models on the Intel® Xeon Phi™ Coprocessor
    • Automatic Offload
      – No code changes required
      – Automatically uses both host and target
      – Transparent data transfer and execution management
    • Compiler Assisted Offload
      – Explicit control of data transfer and remote execution using compiler offload pragmas/directives
      – Can be used together with Automatic Offload
    • Native Execution
      – Uses the coprocessors as independent nodes
      – Input data is copied to targets in advance
  • 6. Execution Models
    • Multicore Hosted: general-purpose serial and parallel computing
    • Offload: codes with highly parallel phases
    • Many-Core Hosted: highly parallel codes
    • Symmetric: codes with balanced needs
    The spectrum runs from multicore-centric (Intel® Xeon®) to many-core-centric (Intel® Xeon Phi™).
  • 7. Automatic Offload (AO)
    • Offloading is automatic and transparent
    • By default, Intel MKL decides:
      – When to offload
      – Work division between host and targets
    • Users enjoy host and target parallelism automatically
    • Users can still control work division to fine-tune performance (see the sketch below)
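    A minimal sketch of that last point, assuming the MKL 11.x Automatic Offload service API declared in mkl.h (mkl_mic_enable and mkl_mic_set_workdivision); the 0.5 fraction is only illustrative:

      #include <mkl.h>

      void tune_offload(void)
      {
          mkl_mic_enable();                                  /* same effect as MKL_MIC_ENABLE=1 */
          mkl_mic_set_workdivision(MKL_TARGET_MIC, 0, 0.5);  /* pin ~50% of AO work to card 0 */
      }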
  • 8. How to Use Automatic Offload
    • Using Automatic Offload is easy. Either call a function:
        mkl_mic_enable()
      or set an environment variable:
        MKL_MIC_ENABLE=1
    • What if there is no coprocessor in the system? The code runs on the host as usual, without any penalty!
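    Putting this together, a hedged, self-contained sketch: enable AO in code, then call a large DGEMM exactly as on the host. The 4096×4096 size is arbitrary, but AO only pays off for large problems; without a coprocessor the call simply runs on the host.

      #include <mkl.h>

      int main(void)
      {
          int n = 4096;
          double *A = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
          double *B = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
          double *C = (double *)mkl_malloc((size_t)n * n * sizeof(double), 64);
          /* ... initialize A and B ... */

          mkl_mic_enable();             /* turn on Automatic Offload */
          cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                      n, n, n, 1.0, A, n, B, n, 0.0, C, n);

          mkl_free(A); mkl_free(B); mkl_free(C);
          return 0;
      }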
  • 9. Automatic Offload Enabled Functions
    • A selective set of MKL functions are AO-enabled.
      – Only functions with sufficient computation to offset the data transfer overhead are subject to AO.
    • In 11.0, AO-enabled functions include:
      – Level-3 BLAS: ?GEMM, ?TRSM, ?TRMM
      – LAPACK “3 amigos”: LU, QR, Cholesky
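    On the LAPACK side, a short sketch of an LU factorization through MKL’s LAPACKE interface; with AO enabled and a sufficiently large matrix, MKL may split the work between host and coprocessor, while the source stays identical to host-only code:

      #include <mkl.h>

      /* Factor a row-major n x n matrix in place; ipiv receives the pivot indices. */
      lapack_int lu_factor(double *A, lapack_int n, lapack_int *ipiv)
      {
          return LAPACKE_dgetrf(LAPACK_ROW_MAJOR, n, n, A, n, ipiv);
      }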
  • 10. Compiler Assisted Offload (CAO)
    • Offloading is explicitly controlled by compiler pragmas or directives.
    • All MKL functions can be offloaded in CAO; in comparison, only a subset of MKL is subject to AO.
    • Can leverage the full potential of the compiler’s offloading facility.
    • More flexibility in data transfer and remote execution management.
      – A big advantage is data persistence: reusing transferred data for multiple operations (see the sketch after the C example below).
  • 11. How to Use Compiler Assisted Offload
    • The same way you would offload any function call to the coprocessor.
    • An example in C:

      #pragma offload target(mic) \
          in(transa, transb, N, alpha, beta) \
          in(A:length(matrix_elements)) \
          in(B:length(matrix_elements)) \
          in(C:length(matrix_elements)) \
          out(C:length(matrix_elements) alloc_if(0))
      {
          sgemm(&transa, &transb, &N, &N, &N, &alpha, A, &N, B, &N, &beta, C, &N);
      }
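    Building on this example, a hedged sketch of the data-persistence pattern from slide 10: A is uploaded once with alloc_if(1) free_if(0), reused across several offloaded SGEMM calls via nocopy, then freed. Variable names and the loop bound are illustrative.

      #include <mkl.h>

      void reuse_matrix(float *A, float *B, float *C,
                        MKL_INT N, int matrix_elements, int reps)
      {
          float alpha = 1.0f, beta = 0.0f;
          char transa = 'N', transb = 'N';

          /* Upload A once: allocate on the card and keep it resident. */
          #pragma offload_transfer target(mic:0) \
              in(A : length(matrix_elements) alloc_if(1) free_if(0))

          for (int i = 0; i < reps; ++i) {
              /* Reuse the resident copy of A; only B and C move each iteration. */
              #pragma offload target(mic:0) \
                  nocopy(A : length(matrix_elements) alloc_if(0) free_if(0)) \
                  in(B : length(matrix_elements)) \
                  inout(C : length(matrix_elements))
              {
                  sgemm(&transa, &transb, &N, &N, &N, &alpha,
                        A, &N, B, &N, &beta, C, &N);
              }
          }

          /* Release the card-side copy of A. */
          #pragma offload_transfer target(mic:0) \
              nocopy(A : length(matrix_elements) alloc_if(0) free_if(1))
      }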
  • 12. How to Use Compiler Assisted Offload
    • An example in Fortran:

      !DEC$ ATTRIBUTES OFFLOAD : TARGET( MIC ) :: SGEMM
      !DEC$ OMP OFFLOAD TARGET( MIC ) &
      !DEC$   IN( TRANSA, TRANSB, M, N, K, ALPHA, BETA, LDA, LDB, LDC ), &
      !DEC$   IN( A: LENGTH( NCOLA * LDA )), &
      !DEC$   IN( B: LENGTH( NCOLB * LDB )), &
      !DEC$   INOUT( C: LENGTH( N * LDC ))
      !$OMP PARALLEL SECTIONS
      !$OMP SECTION
            CALL SGEMM( TRANSA, TRANSB, M, N, K, ALPHA, &
                        A, LDA, B, LDB, BETA, C, LDC )
      !$OMP END PARALLEL SECTIONS
  • 13. Native Execution
    • Programs can be built to run only on the coprocessor by using the -mmic build option.
    • Tips for using MKL in native execution:
      – Use all threads to get the best performance (for a 60-core coprocessor):
          MIC_OMP_NUM_THREADS=240
      – Set thread affinity:
          KMP_AFFINITY=explicit,proclist=[1-240:1,0,241,242,243],granularity=fine
      – Use huge pages for memory allocation (more details on the Intel Developer Zone):
        • The mmap() system call (see the sketch below)
        • The open-source libhugetlbfs.so library: http://libhugetlbfs.sourceforge.net/
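    For the huge-pages tip, a minimal Linux-specific sketch using mmap with MAP_HUGETLB; it assumes huge pages have been reserved by the administrator and falls back to malloc otherwise:

      #define _GNU_SOURCE
      #include <stdlib.h>
      #include <sys/mman.h>

      /* Allocate `bytes` backed by huge pages when possible. */
      void *alloc_huge(size_t bytes)
      {
          void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
          return (p == MAP_FAILED) ? malloc(bytes) : p;
      }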
  • 14. Suggestions on Choosing Usage Models
    • Choose native execution if:
      – The code is highly parallel.
      – You want to use the coprocessors as independent compute nodes.
    • Choose AO when:
      – A sufficient compute-to-data-transfer (FLOP/byte) ratio makes offload beneficial.
      – You use Level-3 BLAS functions (?GEMM, ?TRMM, ?TRSM) or LU and QR factorization (in upcoming release updates).
    • Choose CAO when either:
      – There is enough computation to offset the data transfer overhead.
      – Transferred data can be reused by multiple operations.
    • You can always run on the host.
  • 15. Intel® MKL Link Line Advisor
    • A web tool that helps users choose the correct link-line options.
    • http://software.intel.com/sites/products/mkl/
    • Also available offline in the MKL product package.
  • 16. Where to Find MKL Documentation
    • How to use MKL on Intel® Xeon Phi™ coprocessors:
      – “Using Intel Math Kernel Library on Intel® Xeon Phi™ Coprocessors” section in the User’s Guide.
      – “Support Functions for Intel Many Integrated Core Architecture” section in the Reference Manual.
    • Tips, app notes, etc. can also be found on the Intel Developer Zone:
      – http://www.intel.com/software/mic
      – http://www.intel.com/software/mic-developer