- 1. Can You Get Performance from Xeon Phi Easily? Lessons Learned from Two Real Cases
- 2. Objective
  - Assess the amount of work needed to use Intel Xeon Phi.
  - Make minimal modifications, using only pragmas.
  - Two applications:
    - CalcuNetw: tests the MKL libraries.
    - GammaMaps: tests pragmas.
  - Two modes:
    - Native: compiled to execute only on the Xeon Phi.
    - Offload: uses the host plus the Xeon Phi.
- 3. CalcuNetw: Calculate Measurements in Complex Networks
  - Complex networks consist of sets of nodes (vertices) joined together in pairs by links (edges).
  - For each network, the application calculates:
    - Subgraph Centrality (SC): characterizes the participation of each node in all subgraphs of the network.
    - SC odd: accounts only for paths of odd length.
    - SC even: accounts only for paths of even length.
    - Bipartivity: the proportion of even closed walks to the total number of closed walks in the network.
    - Network Communicability for Connected Nodes, C(p,q): measures how well two nodes in the network communicate.
    - Network Communicability, C(G): the mean of all the C(p,q).
  - Reference: Mouriño J.C., Estrada E., Gomez A. "CalcuNetw: Calculate Measurements in Complex Networks", Informe Técnico CESGA-2005-003.
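The measures listed on this slide can be illustrated with a small pure-Python sketch. It assumes the commonly used matrix-function definitions (SC from the diagonal of exp(A), the even and odd parts from cosh(A) and sinh(A), communicability from the off-diagonal entries of exp(A)); the slides do not state the formulas, so treat these as assumptions. The function names `walk_series` and `network_measures` are hypothetical. The real application relies on MKL, whereas this sketch uses a plain Taylor series that is only suitable for tiny graphs.

```python
import math


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def walk_series(adj, terms=40):
    """Return (cosh(A), sinh(A)) from the Taylor series of exp(A).

    Even powers of A count even-length walks, odd powers odd-length walks.
    """
    n = len(adj)
    even = [[0.0] * n for _ in range(n)]
    odd = [[0.0] * n for _ in range(n)]
    # term holds A^k / k!, starting with the identity matrix (k = 0).
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(terms):
        target = even if k % 2 == 0 else odd
        for i in range(n):
            for j in range(n):
                target[i][j] += term[i][j]
        term = [[v / (k + 1) for v in row] for row in mat_mul(term, adj)]
    return even, odd


def network_measures(adj):
    """Compute the measures under the assumed definitions (see lead-in)."""
    n = len(adj)
    even, odd = walk_series(adj)
    sc_even = [even[i][i] for i in range(n)]
    sc_odd = [odd[i][i] for i in range(n)]
    sc = [e + o for e, o in zip(sc_even, sc_odd)]
    comm = [[even[p][q] + odd[p][q] for q in range(n)] for p in range(n)]
    # Assumption: C(G) averages C(p,q) over distinct node pairs p < q.
    pairs = [comm[p][q] for p in range(n) for q in range(p + 1, n)]
    return {
        "sc": sc,
        "sc_even": sc_even,
        "sc_odd": sc_odd,
        "bipartivity": sum(sc_even) / sum(sc),
        "communicability": comm,
        "mean_communicability": sum(pairs) / len(pairs) if pairs else 0.0,
    }


# A triangle contains odd cycles, so its bipartivity falls below 1.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
print(network_measures(triangle)["bipartivity"])
```

For a bipartite graph such as a single edge, every closed walk has even length, so the bipartivity under these definitions is 1.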
- 4. CalcuNetW
- 5. GammaMaps: A figure-of-merit in Radiation Therapy [Figure: 3D dose grids with X, Y, Z axes, showing the dose in voxel (i, j, k)]
- 6. GammaMaps: A figure-of-merit in Radiation Therapy
  - Workflow: read doses, initialise and normalise, compute gamma, store gamma
  - Application written in Fortran 90
  - Parallelised using OpenMP
  - Geometric algorithm (*)
  - 512 x 512 x 128 = 33,554,432 voxels
  - Auto-vectorization
  - Pragmas for offload
  - (*) T. Ju, T. Simpson, J. O. Deasy, and D. A. Low, "Geometric interpretation of the γ dose distribution comparison technique: Interpolation-free calculation," Medical Physics, vol. 35, no. 3, p. 879, 2008.
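To make the "compute gamma" step concrete, here is a minimal sketch of a gamma comparison on a 1D dose profile. It is not the interpolation-free geometric algorithm of Ju et al. used by GammaMaps. It is a brute-force search based on the usual gamma definition, in which each reference point takes the minimum combined distance and dose difference over all evaluated points, each scaled by its tolerance. The names `normalise` and `gamma_1d`, and the parameters `spacing`, `dose_tol` and `dist_tol`, are hypothetical.

```python
import math


def normalise(dose, reference_max):
    """Scale a dose profile by the maximum of the reference profile."""
    return [d / reference_max for d in dose]


def gamma_1d(ref, ev, spacing, dose_tol, dist_tol):
    """Brute-force gamma value for each point of a 1D reference profile.

    ref, ev  : reference and evaluated doses on the same regular grid
    spacing  : grid spacing (same length unit as dist_tol)
    dose_tol : dose-difference tolerance (same unit as the doses)
    dist_tol : distance-to-agreement tolerance
    """
    gammas = []
    for i, d_ref in enumerate(ref):
        best = math.inf
        for j, d_ev in enumerate(ev):
            dist = (j - i) * spacing
            diff = d_ev - d_ref
            g = math.sqrt((dist / dist_tol) ** 2 + (diff / dose_tol) ** 2)
            best = min(best, g)
        gammas.append(best)
    return gammas


# Workflow from the slide on synthetic data: read, normalise, compute, store.
reference = [0.0, 50.0, 100.0, 50.0, 0.0]
evaluated = [0.0, 48.0, 101.0, 52.0, 0.0]
ref_max = max(reference)
gamma_map = gamma_1d(normalise(reference, ref_max),
                     normalise(evaluated, ref_max),
                     spacing=1.0, dose_tol=0.03, dist_tol=3.0)
print(gamma_map)
```

The outer loop over reference points is independent per point, which is the kind of loop that an OpenMP pragma can parallelise over the 33,554,432 voxels of the real case.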
- 7. Results of Experiments
- 8. Platform
  - Host
    - CPU model: Intel(R) Xeon(R) CPU E5-2680 0 @ 2.70GHz
    - Nr. of cores: 16
    - Memory: 32788 MB
    - Operating system: Linux 2.6.32-279.el6.x86_64
    - Compiler version: 2013U2
  - Intel Xeon Phi
    - Model: Beta0 Engineering Sample
    - Nr. of cores: 61 at 1.09 GHz
    - Memory: 7936 MB
    - Operating system: MPSS Gold U1
    - Compiler version: 2013U2
    - GDDR technology: GDDR5
    - GDDR frequency: 2750000 kHz
  - Remote access to Intel systems
  - Feb. 2013
- 9. Intel Xeon Phi Affinity Policies
  - Type: compact, scatter, or balanced
  - Granularity: fine (thread) or core
  - [Figure: placement of threads 0 to 7 on four cores (C1 to C4), each with four hardware threads (HT1 to HT4), for each policy; thread order as shown in each panel]
    - COMPACT - FINE: 0 1 2 3 4 5 6 7
    - SCATTER - FINE: 0 4 1 5 2 6 3 7
    - BALANCED - FINE: 0 1 2 3 4 5 6 7
    - BALANCED - CORE: {0,1} {2,3} {4,5} {6,7}
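The three placement types can be sketched as a small function that assigns thread ids to cores. This is an illustration of the general idea behind each policy, not the OpenMP runtime's actual implementation, and it ignores granularity (whether a thread is pinned to one hardware thread or may float within its core). The function name `place_threads` and its parameters are hypothetical.

```python
def place_threads(policy, n_threads, n_cores, ht_per_core=4):
    """Return one list of thread ids per core for the given policy.

    compact  : fill all hardware threads of a core before using the next core
    scatter  : deal threads round-robin across cores
    balanced : spread threads across cores, keeping consecutive ids together
    """
    if n_threads > n_cores * ht_per_core:
        raise ValueError("more threads than hardware threads")
    cores = [[] for _ in range(n_cores)]
    if policy == "compact":
        for t in range(n_threads):
            cores[t // ht_per_core].append(t)
    elif policy == "scatter":
        for t in range(n_threads):
            cores[t % n_cores].append(t)
    elif policy == "balanced":
        base, extra = divmod(n_threads, n_cores)
        next_id = 0
        for c in range(n_cores):
            count = base + (1 if c < extra else 0)
            cores[c] = list(range(next_id, next_id + count))
            next_id += count
    else:
        raise ValueError("unknown policy: " + policy)
    return cores


# Eight threads on four cores, as in the slide's diagrams.
for policy in ("compact", "scatter", "balanced"):
    print(policy, place_threads(policy, 8, 4))
```

With eight threads on four cores, compact leaves two cores idle, while scatter and balanced both use every core but differ in which thread ids share a core.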
- 10. Results for CalcunetW
- 11. CalcunetW
- 12. CalcunetW
- 13. CalcunetW
- 14. Results for GammaMaps
- 15. GammaMaps
- 16. Host [Figure: elapsed time (s) versus number of threads on the host; series: local-compact-core, local-compact-fine, local-scatter-fine, local-scatter-core]
- 17. GammaMaps
- 18. Xeon Phi poor I/O
- 19. Conclusions
  - Using the MKL library is easy and does not require changes to the code.
  - Simple pragmas in the code allow Xeon Phi to be used quickly.
  - Xeon Phi has I/O performance issues.
  - 1 Xeon Phi ~ 1 Xeon E5-2680
  - Improving performance requires additional work.
- 20. Acknowledgements: The authors would like to thank Intel for providing access to the Intel Xeon Phi coprocessor.
- 21. Questions. The team: Andrés Gómez, José Carlos Mouriño, Carmen Cotelo, Aurelio Rodríguez
