This document discusses lessons learned from porting two applications, CalcuNetw and GammaMaps, to the Intel Xeon Phi coprocessor. CalcuNetw calculates measurements in complex networks using the Intel MKL library, while GammaMaps computes the gamma figure of merit for comparing dose distributions in radiation therapy and is parallelised with OpenMP. With minimal modifications using only pragmas, both applications ran on the Xeon Phi, either natively or by offloading work from the host. Results showed one Xeon Phi delivering performance similar to one Xeon E5-2680 processor, with poor I/O performance on the coprocessor. Further optimization work is required to fully leverage the Xeon Phi's capabilities.
1. Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases
2. Objective
• Assess the amount of work needed to use the Intel Xeon Phi.
• Minimal modifications using only pragmas.
• Two applications:
– CalcuNetw: tests the MKL libraries.
– GammaMaps: tests pragmas.
• Two modes:
– Native: compiled to execute only on the Xeon Phi.
– Offload: uses host + Xeon Phi.
3. CalcuNetw: Calculate Measurements in Complex Networks
• Complex networks consist of sets of nodes or vertices joined together in pairs by links or edges.
• The application calculates for each network:
– Subgraph Centrality (SC): characterizes the participation of each node in all subgraphs of the network.
– SC odd: accounts only for closed walks of odd length.
– SC even: accounts only for closed walks of even length.
– Bipartivity: the proportion of even closed walks to the total number of closed walks in the network.
– Network Communicability for Connected Nodes, C(p,q): measures how well two nodes in the network communicate.
– Network Communicability C(G): the mean of all the C(p,q).
Mouriño J.C., Estrada E., Gomez A. “CalcuNetw: Calculate Measurements in Complex Networks”, Informe Técnico CESGA-2005-003.
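The measures listed above can all be read off the matrix exponential of the adjacency matrix A, assuming the standard Estrada definitions: SC(i) = (e^A)_ii, its even and odd parts are cosh(A)_ii and sinh(A)_ii, bipartivity is Tr cosh(A) / Tr e^A, and C(p,q) = (e^A)_pq. The sketch below is a minimal pure-Python illustration using a truncated Taylor series. It is not the CalcuNetw code, which relies on MKL routines; the function names, the number of series terms, and the choice to average C(p,q) over distinct pairs p < q are assumptions made for this example.

```python
def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def even_odd_exp(adj, terms=30):
    """Split a truncated Taylor series of exp(adj) into even and odd powers.

    even approximates cosh(adj) (walks of even length, weighted by 1/k!),
    odd approximates sinh(adj) (walks of odd length, weighted by 1/k!).
    """
    n = len(adj)
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    for k in range(terms):
        target = even if k % 2 == 0 else odd
        for i in range(n):
            for j in range(n):
                target[i][j] += term[i][j]
        # Next term of the series: adj^(k+1) / (k+1)!
        term = [[v / (k + 1) for v in row] for row in mat_mul(term, adj)]
    return even, odd


def network_measures(adj, terms=30):
    """Return SC, SC even, SC odd, bipartivity and communicability for adj."""
    n = len(adj)
    even, odd = even_odd_exp(adj, terms)
    sc_even = [even[i][i] for i in range(n)]
    sc_odd = [odd[i][i] for i in range(n)]
    sc = [e + o for e, o in zip(sc_even, sc_odd)]
    comm = [[even[p][q] + odd[p][q] for q in range(n)] for p in range(n)]
    # Assumption: C(G) averages C(p,q) over distinct node pairs p < q.
    pairs = [(p, q) for p in range(n) for q in range(p + 1, n)]
    mean_comm = sum(comm[p][q] for p, q in pairs) / len(pairs) if pairs else 0.0
    return {
        "sc": sc,
        "sc_even": sc_even,
        "sc_odd": sc_odd,
        "bipartivity": sum(sc_even) / sum(sc),
        "communicability": comm,
        "mean_communicability": mean_comm,
    }
```

For a bipartite graph such as a single edge, `network_measures([[0, 1], [1, 0]])` gives a bipartivity of 1, because every closed walk has even length; a triangle gives a value below 1. The dense matrix products in this sketch are the kind of kernel that a library such as MKL replaces in a real implementation.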
6. GammaMaps: A figure-of-merit in Radiation Therapy
Workflow: Read doses → Initialise and normalise → Compute gamma → Store gamma
• Application in FORTRAN 90
• Parallelised using OpenMP
• Geometric algorithm*
• 512 x 512 x 128 = 33,554,432 voxels
• Auto-vectorization
• Pragmas for offload
* T. Ju, T. Simpson, J. O. Deasy, and D. A. Low, “Geometric interpretation of the γ dose distribution comparison technique: Interpolation-free calculation,” Medical Physics, vol. 35, no. 3, p. 879, 2008.
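To make the "Compute gamma" step concrete, the sketch below evaluates the gamma figure of merit on a one-dimensional grid: for each reference point it takes the minimum, over all evaluated points, of sqrt((distance / distance tolerance)^2 + (dose difference / dose tolerance)^2). This is a plain brute-force search over grid points, not the interpolation-free geometric algorithm of Ju et al. that GammaMaps uses, and it is Python rather than Fortran 90. The function name, the 1D layout, and the absolute dose tolerance are assumptions for illustration.

```python
import math


def gamma_map_1d(ref, evl, spacing_mm, dose_tol, dist_tol_mm):
    """Brute-force gamma value for each point of a 1D reference dose profile.

    ref, evl     -- reference and evaluated doses on the same regular grid
    spacing_mm   -- grid spacing in millimetres
    dose_tol     -- dose-difference tolerance, same units as the doses
    dist_tol_mm  -- distance-to-agreement tolerance in millimetres
    """
    gammas = []
    # Each reference point is independent of the others, so this outer loop
    # is the natural candidate for thread-level parallelism.
    for i, d_ref in enumerate(ref):
        best_sq = math.inf
        for j, d_evl in enumerate(evl):
            dist = (j - i) * spacing_mm
            diff = d_evl - d_ref
            g_sq = (dist / dist_tol_mm) ** 2 + (diff / dose_tol) ** 2
            best_sq = min(best_sq, g_sq)
        gammas.append(math.sqrt(best_sq))
    return gammas
```

A point is usually considered to pass when its gamma value is at most 1. The cost of this naive search grows with the square of the number of voxels, which is one reason a volume of 33,554,432 voxels calls for a smarter algorithm and parallel hardware.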
8. Platform
• Host
– CPU model: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
– Nr. of cores: 16
– Memory: 32788 MB
– Operating system: Linux 2.6.32-279.el6.x86_64
– Compiler version: 2013U2
• Intel Xeon Phi
– Model: Beta0 Engineering Sample
– Nr. of cores: 61 at 1.09GHz
– Memory: 7936 MB
– Operating system: MPSS Gold U1
– Compiler version: 2013U2
– GDDR technology: GDDR5
– GDDR frequency: 2750000 kHz
• Remote access to Intel systems
• Feb. 2013
9. Intel Xeon Phi Affinity Policies
[Figure: four panels, each showing four cores (C1 to C4) with four hardware threads per core (HT1 to HT4) and the placement of eight threads. Thread labels as read left to right in each panel:]
• COMPACT - FINE: 0 1 2 3 4 5 6 7
• SCATTER - FINE: 0 4 1 5 2 6 3 7
• BALANCED - FINE: 0 1 2 3 4 5 6 7
• BALANCED - CORE: {0,1} {2,3} {4,5} {6,7}
• Type
– Compact
– Scatter
– Balanced
• Granularity
– Fine or Thread
– Core
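The three affinity types can be illustrated with a small placement model. With the Intel OpenMP runtime these policies are selected through the `KMP_AFFINITY` environment variable (for example `granularity=fine,balanced`); the Python sketch below only mimics which core each thread lands on and is not the runtime's actual algorithm. The function name `place_threads` and the simplified rules (no reserved cores, cores filled in index order) are assumptions for this example.

```python
def place_threads(policy, n_threads, n_cores, threads_per_core=4):
    """Return the core index assigned to each thread id (simplified model)."""
    if n_threads > n_cores * threads_per_core:
        raise ValueError("more threads than hardware threads")
    if policy == "compact":
        # Fill every hardware thread of a core before moving to the next core.
        return [t // threads_per_core for t in range(n_threads)]
    if policy == "scatter":
        # Deal threads round-robin across cores.
        return [t % n_cores for t in range(n_threads)]
    if policy == "balanced":
        # Spread threads evenly, keeping consecutive thread ids on one core.
        base, extra = divmod(n_threads, n_cores)
        cores = []
        for c in range(n_cores):
            cores.extend([c] * (base + (1 if c < extra else 0)))
        return cores
    raise ValueError("unknown policy: " + policy)
```

With 8 threads on 4 cores this model places threads 0 to 3 on the first core and 4 to 7 on the second for compact, pairs threads 0 and 4 on the first core for scatter, and pairs threads 0 and 1 on the first core for balanced. Granularity then decides how tightly each thread is pinned: fine (or thread) binds it to one hardware thread, while core lets it move among the hardware threads of its assigned core.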
19. Conclusions
• Using the MKL library is easy and does not require changes to the code.
• Simple pragmas in the code allow the Xeon Phi to be used quickly.
• I/O performance issues on the Xeon Phi.
• 1 Xeon Phi ~ 1 Xeon E5-2680
• Improving performance requires additional work.