
Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases

Intel Xeon Phi is a new x86-compatible coprocessor architecture that allows legacy applications to run with minimal changes to the code. Using two real applications as examples, we evaluated the effort needed to run them on Xeon Phi with minimal code changes and compared the results against the performance of the host.

Published in: Technology

  1. Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases
  2. Objective
     • Check the amount of work needed to use Intel Xeon Phi.
     • Minimal modifications using only pragmas.
     • Two applications:
       – CalcunetW: tests the MKL libraries.
       – GammaMaps: tests pragmas.
     • Two modes:
       – Native: compiled to execute only on Xeon Phi.
       – Offload: uses host + Xeon Phi.
  3. CalcuNetw: Calculate Measurements in Complex Networks
     • Complex networks consist of sets of nodes or vertices joined together in pairs by links or edges.
     • For each network, the application calculates:
       – Subgraph Centrality (SC): characterizes the participation of each node in all subgraphs of the network.
       – SC odd: accounts only for paths of odd length.
       – SC even: accounts only for paths of even length.
       – Bipartivity: the proportion of even closed walks to the total number of closed walks in the network.
       – Network Communicability for Connected Nodes, C(p,q): measures how well two nodes in the network communicate.
       – Network Communicability, C(G): the mean of all the C(p,q).
     Mouriño J.C., Estrada E., Gomez A., “CalcuNetw: Calculate Measurements in Complex Networks”, Informe Técnico CESGA-2005-003.
  4. CalcuNetW
  5. GammaMaps: A figure-of-merit in Radiation Therapy
     [Figure: two X, Y, Z voxel grids; dose in voxel i,j,k]
  6. GammaMaps: A figure-of-merit in Radiation Therapy
     Workflow: read doses → initialise and normalise → compute gamma → store gamma.
     • Application written in Fortran 90
     • Parallelised using OpenMP
     • Geometric algorithm*
     • 512 x 512 x 128 = 33,554,432 voxels
     • Auto-vectorization
     • Pragmas for offload
     * T. Ju, T. Simpson, J. O. Deasy, and D. A. Low, “Geometric interpretation of the γ dose distribution comparison technique: Interpolation-free calculation,” Medical Physics, vol. 35, no. 3, p. 879, 2008.
  7. Results of Experiments
  8. Platform
     Host
       CPU model: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
       Nr. of cores: 16
       Memory: 32788 MB
       Operating system: Linux 2.6.32-279.el6.x86_64
       Compiler version: 2013U2
     Intel Xeon Phi
       Model: Beta0 Engineering Sample
       Nr. of cores: 61 at 1.09GHz
       Memory: 7936 MB
       Operating system: MPSS Gold U1
       Compiler version: 2013U2
       GDDR technology: GDDR5
       GDDR frequency: 2750000 kHz
     • Remote access to Intel systems
     • Feb. 2013
  9. Intel Xeon Phi Affinity Policies
     [Diagram: four cores (C1 to C4) with four hardware threads (HT1 to HT4) each; thread IDs placed under each policy as shown on the slide]
     COMPACT - FINE: 0 1 2 3 4 5 6 7
     SCATTER - FINE: 0 4 1 5 2 6 3 7
     BALANCED - FINE: 0 1 2 3 4 5 6 7
     BALANCED - CORE: {0,1} {2,3} {4,5} {6,7}
     • Type: Compact, Scatter, Balanced
     • Granularity: Fine (or Thread), Core
  10. Results for CalcunetW
  11. CalcunetW
  12. CalcunetW
  13. CalcunetW
  14. Results for GammaMaps
  15. GammaMaps
  16. Host
     [Plot: elapsed time (s) vs. number of threads; series: Host, local-compact-core, local-compact-fine, local-scatter-fine, local-scatter-core]
  17. GammaMaps
  18. Xeon Phi poor I/O
  19. Conclusions
     • Using the MKL library is easy and does not require changes to the code.
     • Simple pragmas in the code permit fast usage.
     • I/O performance issues on Xeon Phi.
     • 1 Xeon Phi ~ 1 Xeon E5-2680.
     • Improving performance requires additional work.
  20. Acknowledgements
     The authors would like to thank Intel for providing access to the Intel Xeon Phi coprocessor.
  21. Questions
     The team: Andrés Gómez, José Carlos Mouriño, Carmen Cotelo, Aurelio Rodríguez
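
The network measures listed on slide 3 can be illustrated with a small sketch. It assumes the standard definitions from the complex-networks literature: subgraph centrality as the diagonal of the matrix exponential of the adjacency matrix, its even and odd parts as the cosh and sinh series, and communicability C(p,q) as the off-diagonal entries. Whether CalcuNetw uses exactly these definitions is an assumption, and so is taking the mean C(G) over distinct node pairs. The slides say the real application relies on MKL; this sketch swaps in a truncated power series in pure Python, which is only sensible for tiny graphs. All function names are hypothetical.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_series(adj, terms=40):
    """Return (even, odd): sums of A^k / k! over even k and over odd k."""
    n = len(adj)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        target = even if k % 2 == 0 else odd
        weight = math.factorial(k)
        for i in range(n):
            for j in range(n):
                target[i][j] += power[i][j] / weight
        power = mat_mul(power, adj)
    return even, odd


def network_measures(adj):
    """Compute the slide-3 measures for an undirected graph (assumed definitions)."""
    n = len(adj)
    even, odd = walk_series(adj)
    sc_even = [even[i][i] for i in range(n)]      # closed walks of even length
    sc_odd = [odd[i][i] for i in range(n)]        # closed walks of odd length
    sc = [e + o for e, o in zip(sc_even, sc_odd)]
    comm = [[even[p][q] + odd[p][q] for q in range(n)] for p in range(n)]
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    mean_comm = sum(comm[p][q] for p, q in pairs) / len(pairs) if pairs else 0.0
    return {
        "sc": sc,
        "sc_even": sc_even,
        "sc_odd": sc_odd,
        "bipartivity": sum(sc_even) / sum(sc),
        "communicability": comm,
        "mean_communicability": mean_comm,
    }


if __name__ == "__main__":
    triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
    print(network_measures(triangle)["bipartivity"])
```

A bipartite graph has no closed walks of odd length, so its bipartivity under these definitions is 1; the triangle in the usage example gives a value below 1.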

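
The "compute gamma" step on slide 6 can be sketched from the usual definition of the gamma index: for each reference voxel, take the minimum over evaluated voxels of the combined distance and dose difference, each scaled by its tolerance. This is a brute-force illustration and not the interpolation-free geometric algorithm of Ju et al. that GammaMaps uses. The function name, the nested-list layout, the absolute dose tolerance, and every parameter value are assumptions made for the example.

```python
import math
from itertools import product


def gamma_map(reference, evaluated, spacing, dose_tol, dist_tol):
    """Brute-force gamma for two equally shaped 3D dose grids.

    reference, evaluated: nested lists indexed [x][y][z]
    spacing: voxel size along (x, y, z), in the same unit as dist_tol
    dose_tol: dose tolerance, in the same unit as the doses
    dist_tol: distance tolerance
    """
    nx, ny, nz = len(reference), len(reference[0]), len(reference[0][0])
    voxels = list(product(range(nx), range(ny), range(nz)))
    gamma = [[[0.0] * nz for _ in range(ny)] for _ in range(nx)]
    for i, j, k in voxels:
        best = math.inf
        for a, b, c in voxels:
            # Squared physical distance between the two voxel centres.
            dist2 = sum(((u - v) * s) ** 2
                        for u, v, s in zip((a, b, c), (i, j, k), spacing))
            dose2 = (evaluated[a][b][c] - reference[i][j][k]) ** 2
            best = min(best, dist2 / dist_tol ** 2 + dose2 / dose_tol ** 2)
        gamma[i][j][k] = math.sqrt(best)
    return gamma


if __name__ == "__main__":
    ref = [[[1.0]], [[2.0]]]   # 2 x 1 x 1 grid
    ev = [[[2.0]], [[1.0]]]    # same doses, swapped along x
    print(gamma_map(ref, ev, spacing=(1.0, 1.0, 1.0), dose_tol=0.5, dist_tol=2.0))
```

The double loop visits every pair of voxels, so the cost grows with the square of the voxel count. For the 33,554,432 voxels quoted on the slide this would be impractical, which is one reason a dedicated algorithm and parallelisation matter.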
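
The affinity policies on slide 9 can be mimicked with a toy placement model. It assumes four cores with four hardware threads each, as drawn on the slide, and is one reading of the diagram: the real OpenMP runtime may number hardware threads differently. The function names are hypothetical.

```python
def place_threads(policy, n_threads, n_cores=4, ht_per_core=4):
    """Return {thread_id: (core, hw_thread)} using zero-based indices."""
    if n_threads > n_cores * ht_per_core:
        raise ValueError("more threads than hardware threads")
    placement = {}
    if policy == "compact":
        # Fill every hardware thread of a core before moving to the next core.
        for t in range(n_threads):
            placement[t] = (t // ht_per_core, t % ht_per_core)
    elif policy == "scatter":
        # Deal threads round-robin across cores.
        for t in range(n_threads):
            placement[t] = (t % n_cores, t // n_cores)
    elif policy == "balanced":
        # Spread threads evenly, keeping consecutive IDs on the same core.
        base, extra = divmod(n_threads, n_cores)
        t = 0
        for core in range(n_cores):
            for ht in range(base + (1 if core < extra else 0)):
                placement[t] = (core, ht)
                t += 1
    else:
        raise ValueError("unknown policy: " + policy)
    return placement


def slots_in_core_order(placement):
    """Thread IDs read core by core, hardware thread by hardware thread."""
    return [t for t, _ in sorted(placement.items(), key=lambda kv: kv[1])]


def threads_per_core(placement):
    """Group thread IDs by core, as in the BALANCED - CORE panel."""
    cores = {}
    for t, (core, _) in sorted(placement.items()):
        cores.setdefault(core, []).append(t)
    return cores


if __name__ == "__main__":
    for policy in ("compact", "scatter", "balanced"):
        print(policy, slots_in_core_order(place_threads(policy, 8)))
```

With eight threads, compact and balanced list the thread IDs in the same order but differ in how many cores they occupy: two under compact, four under balanced.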