ZIB uses the Xeon Phi to implement its Connected Component Labeling strategy. #ISC13 #HPC

No Downloads

Total views

2,095

On SlideShare

0

From Embeds

0

Number of Embeds

19

Shares

0

Downloads

0

Comments

0

Likes

2

No embeds

No notes for slide

- 1. Connected Component Labeling on Xeon Phi: Parallelization & Vectorization. Florian Wende, Zuse-Institute Berlin
- 2. Connected Component Labeling: Suppose we are given the following image . . .
  (Slide footer: wende@zib.de, Connected Component Labeling on Xeon Phi, ISC13, Leipzig)
- 3-4. Connected Component Labeling: . . . and we are to assign unique labels to different connected regions! . . . In parallel?
  - Computer vision: detect connected regions in images
  - Computational physics: cluster algorithms for the Ising model
  - Percolation theory

  How to achieve the labeling? . . .
- 5. Connected Component Labeling: Strategy
  1. Labeling algorithm
  2. Parallelization
     a. Parallel implementation on the CPU
     b. Run the CPU code on the Xeon Phi
     c. Adapt the code for the Xeon Phi
  3. Vectorization (SIMD)
     d. Leave it to the compiler (auto-vectorization)
     e. SIMD intrinsic functions

  Xeon Phi: 512-bit SIMD unit, i.e. 16 x 32-bit words per register
- 6. Connected Component Labeling: Algorithm
  - Breadth-/depth-first search algorithms, multi-pass algorithms
  - Hoshen-Kopelman algorithm [2]
  - Cluster self-labeling algorithm by Coddington and Baillie [1]:
    1. Assign a unique label to each pixel of the image.
    2. For each pixel, consider its adjacent connected pixels in the positive 1-, 2-, . . . directions and set the labels of each connected pair to their minimum value.
    3. If the minimum operation leaves every label unchanged: finished! Otherwise, continue with step 2.

  Used here: Hoshen-Kopelman on the CPU; Hoshen-Kopelman vs. cluster self-labeling on the Xeon Phi.
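The three self-labeling steps above can be illustrated with a minimal scalar Python sketch. It assumes (beyond what the slides state) that the image is a list of equal-length rows of integer colors, that "connected" means equal-colored 4-neighbours (positive 1-direction = right, positive 2-direction = down), and that the initial label is the row-major pixel index; `self_label` is a hypothetical name. It shows the algorithm only, not the vectorized Xeon Phi code.

```python
def self_label(image):
    """Cluster self-labeling sketch: every connected, equal-colored region
    ends up carrying the smallest initial label among its pixels.
    Assumes 4-connectivity and row-major indices as initial labels."""
    rows, cols = len(image), len(image[0])
    # Step 1: assign a unique label to each pixel (its row-major index).
    labels = [[r * cols + c for c in range(cols)] for r in range(rows)]
    changed = True
    while changed:  # Step 3: stop once a full sweep changes no label.
        changed = False
        for r in range(rows):
            for c in range(cols):
                # Step 2: neighbours in the positive 1-direction (right)
                # and positive 2-direction (down).
                for dr, dc in ((0, 1), (1, 0)):
                    nr, nc = r + dr, c + dc
                    if nr < rows and nc < cols and image[r][c] == image[nr][nc]:
                        m = min(labels[r][c], labels[nr][nc])
                        if labels[r][c] != m or labels[nr][nc] != m:
                            labels[r][c] = labels[nr][nc] = m
                            changed = True
    return labels
```

For example, `self_label([[1, 1, 0], [0, 1, 0], [1, 0, 0]])` returns `[[0, 0, 2], [3, 0, 2], [6, 2, 2]]`. Labels only ever decrease to values already present in a region, so at the fixed point each region holds its own minimum.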
- 7-8. Connected Component Labeling: Parallelization
  Partition the image into equal-sized sub-images and label them independently using multiple threads:
  - Labels are unique across different sub-images.
  - Connected regions that extend over multiple sub-images are merged after the labeling using atomic primitives.

  (Figure: image partitioned into sub-images assigned to Thread 0 through Thread 7.)
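A minimal sketch of the partition-then-merge scheme follows. It makes several assumptions. The image is split into horizontal strips, a simplification of the thread layout in the figure. Each strip is self-labeled in its own worker thread. The boundary merge uses a sequential union-find in place of the atomic primitives mentioned above. Using the global row-major pixel index as the initial label keeps labels unique across strips without coordination. `label_strip` and `parallel_label` are hypothetical names. Because of Python's global interpreter lock, the threads only show the structure, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def label_strip(image, r0, r1):
    """Self-label rows r0..r1-1 independently. Initial labels are global
    row-major indices, so they are unique across strips."""
    cols, h = len(image[0]), r1 - r0
    labels = [[r * cols + c for c in range(cols)] for r in range(r0, r1)]
    changed = True
    while changed:
        changed = False
        for i in range(h):
            for c in range(cols):
                for di, dc in ((0, 1), (1, 0)):  # right, down (inside strip)
                    ni, nc = i + di, c + dc
                    if ni < h and nc < cols and image[r0 + i][c] == image[r0 + ni][nc]:
                        m = min(labels[i][c], labels[ni][nc])
                        if labels[i][c] != m or labels[ni][nc] != m:
                            labels[i][c] = labels[ni][nc] = m
                            changed = True
    return labels


def find(parent, x):
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def parallel_label(image, n_strips):
    rows, cols = len(image), len(image[0])
    step = -(-rows // n_strips)  # ceiling division
    bounds = [(r0, min(r0 + step, rows)) for r0 in range(0, rows, step)]
    # Label all strips independently (one task per strip).
    with ThreadPoolExecutor(max_workers=len(bounds)) as pool:
        strips = list(pool.map(lambda b: label_strip(image, *b), bounds))
    labels = [row for strip in strips for row in strip]
    # Merge phase: union labels of equal-colored pixels across strip borders,
    # always linking the larger root to the smaller one.
    parent = {lab: lab for row in labels for lab in row}
    for _, r1 in bounds[:-1]:
        for c in range(cols):
            if image[r1 - 1][c] == image[r1][c]:
                a = find(parent, labels[r1 - 1][c])
                b = find(parent, labels[r1][c])
                if a != b:
                    parent[max(a, b)] = min(a, b)
    return [[find(parent, lab) for lab in row] for row in labels]
```

Since every root is the smallest label in its set, `parallel_label` gives the same result as labeling the whole image at once, regardless of the number of strips.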
- 9-17. Connected Component Labeling: Vectorization
  Process multiple data simultaneously using SIMD instructions. Example: self-labeling within the sub-image of thread 2.
  1. Initialize the labeling (label = array index).
  2. Load row[0] into reg0, and create a mask for adjacent entries in the positive 1-direction: 1 if equal-colored, 0 otherwise.
  3. Shift reg0 by one position so that each element lines up with its adjacent element in the positive 1-direction, and write the result to reg1.
  4. Determine the pairwise minimum of the entries in reg0 and reg1 using the mask, and write the result to reg1.
  5. Write back the entries in reg1 to row[0] using the mask.
  6. Shift all elements in reg1 one position in the positive 1-direction, shifting in the 0-th element, and write the result to reg1.
  7. Shift all bits in the mask one position up, and write the pairwise minimum of the entries in row[0] and reg1 to row[0] using the shifted mask.
  8. Did labels change?
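The eight vectorization steps can be emulated lane by lane in plain Python to check the masking logic. This sketch rests on several assumptions. Python lists stand in for 16-lane 512-bit registers and for the masked `_mm512_[mask]_XXX()` intrinsics. A row holds exactly 16 pixels. The last lane has no neighbour inside the register, so nothing carries over to the next vector. Step 3 is read as `reg1[i] = reg0[i+1]`. `row_pass_1dir` and `label_row` are hypothetical names.

```python
VLEN = 16  # lanes: 512-bit register / 32-bit words


def row_pass_1dir(colors, labels):
    """One emulated masked-SIMD pass (steps 2-8) in the positive 1-direction.
    Returns (new_labels, changed)."""
    assert len(colors) == len(labels) == VLEN
    reg0 = list(labels)                              # step 2: load row[0]
    # step 2: mask bit i is set if pixels i and i+1 are equal-colored
    mask = [i + 1 < VLEN and colors[i] == colors[i + 1] for i in range(VLEN)]
    # step 3: line up each lane with its neighbour (last lane keeps its own)
    reg1 = reg0[1:] + reg0[-1:]
    # step 4: masked pairwise minimum; unmasked lanes keep their reg1 value
    reg1 = [min(a, b) if m else b for a, b, m in zip(reg0, reg1, mask)]
    # step 5: masked write-back of reg1 into the row
    row = [b if m else a for a, b, m in zip(reg0, reg1, mask)]
    # step 6: shift reg1 up by one lane, shifting in element 0
    reg1 = reg1[:1] + reg1[:-1]
    # step 7: shift the mask up by one bit; masked min of row and reg1
    smask = [False] + mask[:-1]
    row = [min(a, b) if m else a for a, b, m in zip(row, reg1, smask)]
    # step 8: did labels change?
    return row, row != reg0


def label_row(colors):
    labels = list(range(VLEN))                       # step 1: label = index
    changed = True
    while changed:
        labels, changed = row_pass_1dir(colors, labels)
    return labels
```

Each pass sets `row[i]` to the minimum of its own label and the labels of its equal-colored neighbours `i-1` and `i+1`. A label therefore moves at most one lane per pass. For example, `label_row([0]*8 + [1]*8)` converges to eight 0s followed by eight 8s.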
- 18. Connected Component Labeling: Vectorization
  Result of the operations so far: adjacent connected elements in row[0] are each set to their pairwise minimum value. (Figure: row[0] before and after, with the 1-direction and 2-direction marked.)
  Repeat the procedure for the 2-direction.
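For the 2-direction, the slide only says to repeat the procedure. A plausible reading, which is an assumption, is that lane i of row[j] neighbours lane i of row[j+1], so no in-register shift is needed. You build the mask from the two rows' colors, take the pairwise minimum, and store it into both rows under the mask. `pass_2dir` is a hypothetical name, and lists again stand in for vector registers.

```python
def pass_2dir(colors_a, labels_a, colors_b, labels_b):
    """Emulated masked-SIMD step between two vertically adjacent rows
    (row[j] -> a, row[j+1] -> b), lane by lane."""
    mask = [ca == cb for ca, cb in zip(colors_a, colors_b)]          # equal-colored
    vmin = [min(a, b) for a, b in zip(labels_a, labels_b)]           # pairwise min
    new_a = [v if m else a for v, a, m in zip(vmin, labels_a, mask)]  # masked store row[j]
    new_b = [v if m else b for v, b, m in zip(vmin, labels_b, mask)]  # masked store row[j+1]
    return new_a, new_b
```

For example, `pass_2dir([0, 1, 1, 0], [0, 1, 2, 3], [0, 0, 1, 1], [4, 5, 6, 7])` returns `([0, 1, 2, 3], [0, 5, 2, 7])`. Only lanes 0 and 2 are equal-colored, so only those lanes are updated.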
- 19. Connected Component Labeling: Vectorization
  Repeat the procedure for all other rows as long as labels change . . . (Figure: labels before and after.)
  Now merge labels across different sub-images using atomics. Finished!
- 20. Connected Component Labeling: Benchmark
  - CPU: Xeon E5-2670, 8 cores + 2-way Hyper-Threading @ 2.6 GHz
    - Hoshen-Kopelman algorithm + atomics for label merging
    - Vectorization was left to the compiler: there are no masked SIMD intrinsics!
  - Xeon Phi [4]: 60 cores + 4-way Hyper-Threading @ 1.1 GHz
    - Hoshen-Kopelman vs. cluster self-labeling + atomics for label merging
    - Vectorization by means of `_mm512_[mask]_XXX()` intrinsics
  - Parallelization by means of OpenMP: `#pragma omp parallel {...}`
  - Programming effort: approx. 2-3 days for the CPU code (incl. optimization); less than 1 day for the Xeon Phi code (based on the CPU code)
- 21-22. Connected Component Labeling: Benchmark
  - CPU: Intel Xeon E5-2670, 8 cores + 2-way Hyper-Threading @ 2.6 GHz
  - Xeon Phi: 60 cores + 4-way Hyper-Threading @ 1.1 GHz
  - Application: Swendsen-Wang cluster algorithm [3] for the 2D Ising model
- 23. Acknowledgement: Work partially funded by BMBF Grant No. 01IH11004G.
  - Dr. Thomas Steinke, Zuse-Institute Berlin (ZIB)
  - Dr. Michael Klemm, Intel GmbH, Germany
- 24. References
  [1] C. F. Baillie and P. D. Coddington. Cluster Identification Algorithms for Spin Models – Sequential and Parallel, 1991.
  [2] J. Hoshen and R. Kopelman. Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm. Phys. Rev. B 14, 3438–3445, 1976.
  [3] R. H. Swendsen and J.-S. Wang. Nonuniversal Critical Dynamics in Monte Carlo Simulations. Phys. Rev. Lett. 58, 86–88, January 1987.
  [4] Intel Corp. Intel Xeon Phi Coprocessor 5110P, Product Brief, 2012.
