Accelerators: the good, the bad, and the ugly

NAG talks accelerators, and getting the most out of Xeon Phi

Published in: Technology

  1. Accelerators: the good, the bad and the ugly!
     • Experts in numerical algorithms and HPC services
     • Dr Ian Reid, Ian.Reid@nag.co.uk
  2. Agenda
     • NAG Introduction
     • Accelerators – NAG experience
     • NAG on Intel Xeon Phi
     • Summary
  3. NAG Background
     • Founded 1970
     • Not-for-profit organisation
     • Surpluses fund on-going R&D
     • Mathematical and Statistical Expertise
     • Libraries of components
     • Consulting
     • HPC Services
     • Computational Science and Engineering (CSE) support
     • Procurement advice, market watch, benchmarking
  4. Where has my Escalator gone?
     • Escalator?: Want more performance? Buy the next processor!
     • To get performance/efficiency we have to go (massively) parallel
     • Disruption causing serious look at ‘other’ technologies (and algorithms!)
     • Even CPUs with tens of cores
     • Hybrid, shared-memory and distributed-memory parallelism
     • Painful whichever way we turn!
  5. Accelerators
     • Loose definition: hardware on which to run your software better than on your (general purpose) CPU
     • Generally NOT an easy win
     • Significant learning curve and effort
     • Offload disadvantages…
     • The good: put some effort in; get a great result!
     • The bad: put effort in, get an OK result, but learn lessons which can be re-used (often good!)
     • The ugly: put significant effort in, get a poor result and don’t learn anything substantive
  6. Intel Xeon Phi
     • The Intel Xeon Phi is a co-processor attached to a host system via the PCI Express bus
     • Highly parallel architecture
     • Compiler support for OpenMP parallelism
     • It has a distinct memory system from the host
     • Several use cases to consider:
     • Automatic Offloading
     • Explicit Offloading
     • Native Applications
  7. NAG Experience with Intel Xeon Phi
     • Relatively easy to take existing OpenMP-based code and port to Phi
     • Tuning for Phi takes some learning and expertise
     • … but feedback into Xeon code is often very strong
     • NAG Library for Intel Xeon Phi supports all models
     • Offload (supports automatic and explicit) and Native libs
     • Windows version for Intel Xeon Phi now in beta
  8. Automatic Offload
     • Offload OpenMP regions to Phi when problem sizes are above some threshold
     • Estimating problem size can be complex
     • Required data is transferred to/from the host prior/post executing OpenMP region
     • Data transfer takes time, eats into the benefit of running the OpenMP on the Phi
     • Transparent to the user of the Library
     • Just recompile code containing NAG Library function calls to benefit.
  9. Explicit Offload
     • All NAG functions can be explicitly offloaded by user
     • user code modified to include relevant offload statements
     • allows control of which functions are offloaded
     • Data transfers to Phi can be dissociated from function offloading, allowing data to remain on the Phi
     • user responsible for data movement
     • reduces penalty of offloading data by allowing its use by multiple offloaded function calls before returning to host
     • Effort required by the user to re-code application
  10. Native Applications
     • Users may choose to port their entire application
     • user code modified to include relevant offload statements
     • allows complete control of which functions are offloaded
     • Data transfers to Phi can be dissociated from function offloading, allowing data to remain on the Phi
     • user responsible for data movement
     • reduces penalty of offloading data by allowing its use by multiple offloaded function calls before returning to host
     • Effort required by the user to re-code application
  11. Performance Examples and Lessons
     • Sandy Bridge CPUs (typically using 32 threads)
     • Knights Corner Phi processor (typically using 240 threads)
  12. Hierarchical Cluster Analysis (g03ec)
     • [Chart: Time (s) against Problem Size (n); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
     • n=30k; m=3k: Xeon 32t: 1,412s; Phi 240t*: 1,259s; Xeon 32t*: 1,073s
     • For this size of problem it is best to stay on the CPU, but take the 25%!
  13. Distance Matrix (g03ea)
     • [Chart: Time (s) against Problem Size (n); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
     • n=30k; m=3k: Xeon 32t: 192s; Phi 240t*: 40.6s; Xeon 32t*: 75.7s
     • Phi gain ~2x (~5x over original)
  14. Uniform RNG - Mersenne Twister (g05sa)
     • [Chart: Time (s) against Size of problem (n, log scale); series: 8 threads original, Native Phi original, Native Phi opt, 8 threads opt]
     • n=500m: Xeon 8t: 0.25s; Phi 240t*: 0.08s; Xeon 8t*: 0.22s
     • Phi gain ~3x
  15. Maximum Likelihood Estimates (g03ca)
     • [Chart: Time (s) against Problem Size (weighted); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
     • n=2500; m=2500; nfac=30; nvar=200: Xeon 32t: 256s; Phi 240t*: 53.6s; Xeon 32t*: 54.7s
     • Phi gain 4x, but also Xeon speed-up (green line under red)
  16. Solve real symmetric positive definite simultaneous linear equations using iterative refinement (f04af)
     • [Chart: Time (s) against Problem Size (n); series: 32 threads original, Phi offload original, Phi offload opt, 32 threads opt]
     • n=6,000; nrhs=1,000: Xeon 32t: 171s; Phi 240t*: 66s; Xeon 32t*: 86s
     • Phi gain ~1.3x (~3x original)
  17. Summary
     • Parallelism is a real issue we all face
     • Exciting for some. Challenging for others!
     • Accelerators are interesting and can offer spectacular wins
     • Intel Phi claiming less spectacular performance gains
     • Less effort than on other Accelerators
     • … and often repays on CPU as well!
     • Acid test is always solving your (complete) problem!
     • NAG can help you try out this technology
     • NAG Library for Phi
     • NAG expertise
  18. Thank You
     • Questions?
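The offload slides (8 and 9) say that data transfer eats into the benefit of running on the Phi, and that keeping data resident across several offloaded calls reduces that penalty. A minimal sketch of that trade-off follows. It assumes a simple latency-plus-bandwidth transfer model; all function names and all numbers are hypothetical and are not taken from the NAG Library or from the talk.

```python
# Illustrative cost model only: names and figures are assumptions,
# not part of the NAG Library or of any measured result in the slides.

def transfer_seconds(n_bytes, bandwidth_bytes_per_s, latency_s=0.0):
    """Time to move n_bytes over the bus: fixed latency plus size / bandwidth."""
    return latency_s + n_bytes / bandwidth_bytes_per_s


def should_offload(host_s, phi_s, transfer_s):
    """Offload only if coprocessor compute plus data movement beats the host."""
    return phi_s + transfer_s < host_s


def total_seconds(calls, phi_s, transfer_s, keep_resident):
    """Total time for `calls` offloaded calls that all use the same data.

    keep_resident=True  -> data is moved once (explicit offload, data stays on the Phi)
    keep_resident=False -> data is moved for every call
    """
    transfers = 1 if keep_resident else calls
    return calls * phi_s + transfers * transfer_s


if __name__ == "__main__":
    # Hypothetical workload: 800 MB of input over a 4 GB/s link.
    t = transfer_seconds(n_bytes=800e6, bandwidth_bytes_per_s=4e9)
    print("large problem, offload?", should_offload(host_s=1.0, phi_s=0.5, transfer_s=t))
    print("small problem, offload?", should_offload(host_s=0.6, phi_s=0.5, transfer_s=t))
    print("5 calls, data moved each time:", total_seconds(5, 0.5, t, keep_resident=False))
    print("5 calls, data kept on the Phi:", total_seconds(5, 0.5, t, keep_resident=True))
```

In this model the threshold is the problem size at which the host time first exceeds the Phi time plus the transfer time. Estimating those three quantities in advance is the hard part, which is consistent with the slide's remark that estimating problem size can be complex.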

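The gains quoted on the performance slides (12 to 16) can be checked against the timings listed there. The sketch below recomputes them. It assumes the starred figures are the optimised ("opt") runs, which the slides do not state outright; the dictionary layout and function name are illustrative.

```python
# Timings in seconds, copied from slides 12-16.
# Assumption: the starred values (e.g. "Phi 240t*") are the optimised runs.
# Tuple order: (Xeon original, Phi optimised, Xeon optimised)
TIMINGS = {
    "g03ec": (1412.0, 1259.0, 1073.0),
    "g03ea": (192.0, 40.6, 75.7),
    "g05sa": (0.25, 0.08, 0.22),
    "g03ca": (256.0, 53.6, 54.7),
    "f04af": (171.0, 66.0, 86.0),
}


def speedups(xeon_orig, phi_opt, xeon_opt):
    """Return the three ratios the slides discuss for one routine."""
    return {
        "phi_vs_xeon_opt": xeon_opt / phi_opt,     # Phi gain over tuned CPU code
        "phi_vs_xeon_orig": xeon_orig / phi_opt,   # Phi gain over original CPU code
        "xeon_opt_vs_orig": xeon_orig / xeon_opt,  # CPU gain from the tuning work
    }


if __name__ == "__main__":
    for name, times in TIMINGS.items():
        s = speedups(*times)
        print(
            f"{name}: Phi vs tuned Xeon {s['phi_vs_xeon_opt']:.2f}x, "
            f"Phi vs original Xeon {s['phi_vs_xeon_orig']:.2f}x, "
            f"tuned vs original Xeon {s['xeon_opt_vs_orig']:.2f}x"
        )
```

The third ratio is the one behind "take the 25%" on slide 12: for g03ec the tuned Xeon run takes about 24% less time than the original, even though the Phi is slower than the tuned CPU at that problem size.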