Connected Component Labeling on Intel Xeon Phi Coprocessors – Parallelization and Vectorization
ZIB uses the Xeon Phi to implement its Connected Component Labeling strategy. #ISC13 #HPC

Transcript

  • 1. Connected Component Labeling on Xeon Phi: Parallelization & Vectorization
     Florian Wende, Zuse-Institute Berlin
  • 2. Connected Component Labeling
     Suppose we are given the following image . . .
     (wende@zib.de | Connected Component Labeling on Xeon Phi | ISC13, Leipzig)
  • 4. Connected Component Labeling
     . . . and we are to assign unique labels to different connected regions! . . . In parallel?
     Applications:
       - Computer Vision: detect connected regions in images
       - Computational Physics: cluster algorithms for the Ising model
       - Percolation Theory
     How to achieve the labeling? . . .
  • 5. Connected Component Labeling: Strategy
     1. Labeling algorithm
     2. Parallelization
        a. Parallel implementation on CPU
        b. Run the CPU code on the Xeon Phi
        c. Adapt the code for the Xeon Phi
     3. Vectorization (SIMD)
        d. Leave it to the compiler (auto-vectorization)
        e. SIMD intrinsic functions
     Xeon Phi: 512-bit SIMD unit for 16 x 32-bit words
  • 6. Connected Component Labeling: Algorithm
     Breadth/depth-first search algorithms, multi-pass algorithms; the Hoshen-Kopelman algorithm; the cluster self-labeling algorithm by Coddington and Baillie:
     1. Assign a unique label to each pixel of the image.
     2. For each pixel, consider its adjacent connected pixels in the positive 1-, 2-, . . . directions and set the respective labels to the minimum value each.
     3. If the minimum operation is the identity function for all pixels: finished! Otherwise: continue with step 2.
     CPU: Hoshen-Kopelman. Xeon Phi: Hoshen-Kopelman vs. cluster self-labeling.
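The three self-labeling steps can be sketched sequentially in C as follows. This is a minimal illustration, not the ZIB code: it assumes a row-major image of integer colors, treats two pixels as connected when they are 4-neighbours with equal color, uses the pixel index as the initial unique label, and updates labels in place during each sweep. The function name `self_label` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Cluster self-labeling sketch (hypothetical helper).
 * color: w x h image, row-major; label: output, one entry per pixel.
 * On return every pixel of a connected region carries the smallest
 * pixel index of that region. */
void self_label(const int *color, int *label, int w, int h)
{
    assert(w > 0 && h > 0);

    /* Step 1: unique label per pixel = its array index. */
    for (int i = 0; i < w * h; ++i)
        label[i] = i;

    bool changed = true;
    while (changed) {                     /* step 3: repeat until stable */
        changed = false;
        for (int y = 0; y < h; ++y) {
            for (int x = 0; x < w; ++x) {
                int i = y * w + x;
                /* Step 2: neighbours in positive 1-direction (x+1)
                 * and positive 2-direction (y+1). */
                int nb[2] = { x + 1 < w ? i + 1 : -1,
                              y + 1 < h ? i + w : -1 };
                for (int k = 0; k < 2; ++k) {
                    int j = nb[k];
                    if (j < 0 || color[j] != color[i] || label[j] == label[i])
                        continue;
                    int m = label[i] < label[j] ? label[i] : label[j];
                    label[i] = label[j] = m;  /* both get the pairwise minimum */
                    changed = true;
                }
            }
        }
    }
}
```

For the 3 x 3 image `{1,1,0, 0,1,0, 1,0,0}` this yields the labels `{0,0,2, 3,0,2, 6,2,2}`: the three regions of size greater than one are labeled 0 and 2, and the two isolated pixels keep their own indices.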
  • 8. Connected Comp. Labeling: Parallelization
     Partition the image into equal-sized sub-images and label them independently using multiple threads.
       - Labels are unique across different sub-images.
       - Connected regions that extend over multiple sub-images are merged after the labeling using atomic primitives.
     (Figure: the image split into eight sub-images, Thread 0 to Thread 7.)
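A possible shape for this scheme is sketched below. Several choices here are assumptions, since the slides do not spell them out: the image is split into horizontal strips (not the 2D tiling in the figure), each strip is labeled with scalar cluster self-labeling, labels are global pixel indices (which makes them unique across sub-images for free), and the "atomic primitives" are a lock-free union-find built on C11 `atomic_compare_exchange_strong`. Threads come from OpenMP `parallel for`; without `-fopenmp` the pragmas are ignored and the code runs serially with the same result. All function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Self-label rows [y0, y1) only; labels are global pixel indices. */
static void label_strip(const int *color, int *label, int w, int y0, int y1)
{
    for (int i = y0 * w; i < y1 * w; ++i) label[i] = i;
    bool changed = true;
    while (changed) {
        changed = false;
        for (int y = y0; y < y1; ++y)
            for (int x = 0; x < w; ++x) {
                int i = y * w + x;
                int nb[2] = { x + 1 < w ? i + 1 : -1, y + 1 < y1 ? i + w : -1 };
                for (int k = 0; k < 2; ++k) {
                    int j = nb[k];
                    if (j < 0 || color[j] != color[i] || label[j] == label[i]) continue;
                    int m = label[i] < label[j] ? label[i] : label[j];
                    label[i] = label[j] = m;
                    changed = true;
                }
            }
    }
}

static int find_root(_Atomic int *parent, int x)
{
    int p;
    while ((p = atomic_load(&parent[x])) != x) x = p;
    return x;
}

/* Lock-free union: hang the larger root below the smaller one via CAS;
 * retry if another thread re-linked one of the roots in the meantime. */
static void unite(_Atomic int *parent, int a, int b)
{
    for (;;) {
        a = find_root(parent, a);
        b = find_root(parent, b);
        if (a == b) return;
        if (a > b) { int t = a; a = b; b = t; }
        int expected = b;
        if (atomic_compare_exchange_strong(&parent[b], &expected, a)) return;
    }
}

void parallel_label(const int *color, int *label, int w, int h, int nstrips)
{
    assert(w > 0 && h > 0 && nstrips > 0 && nstrips <= h);
    _Atomic int *parent = malloc(sizeof(_Atomic int) * (size_t)w * h);
    assert(parent != NULL);

    /* Label each strip independently (one strip per loop iteration). */
    #pragma omp parallel for
    for (int s = 0; s < nstrips; ++s) {
        int y0 = s * h / nstrips, y1 = (s + 1) * h / nstrips;
        label_strip(color, label, w, y0, y1);
        for (int i = y0 * w; i < y1 * w; ++i) atomic_init(&parent[i], i);
    }

    /* Merge regions that cross a strip boundary. */
    #pragma omp parallel for
    for (int s = 1; s < nstrips; ++s) {
        int y = s * h / nstrips;            /* first row of strip s */
        for (int x = 0; x < w; ++x) {
            int i = y * w + x, j = i - w;   /* j: pixel just above */
            if (color[i] == color[j]) unite(parent, label[i], label[j]);
        }
    }

    /* Replace every local label by its global root (the region minimum). */
    #pragma omp parallel for
    for (int i = 0; i < w * h; ++i) label[i] = find_root(parent, label[i]);
    free(parent);
}
```

Because `unite` always links the larger root below the smaller one, the final label of every region is its smallest pixel index, so the result matches a purely sequential labeling regardless of thread scheduling.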
  • 12. Connected Comp. Labeling: Vectorization
     Process multiple data simultaneously using SIMD instructions. Example: self-labeling within the sub-image of thread 2.
     1. Initialize the labeling (array index).
     2. Load row[0] into reg0 and create a mask for adjacent entries in the positive 1-direction: 1 if equal-colored, 0 otherwise.
     3. Overlap each element in reg0 with its adjacent element in the positive 1-direction, and write the result to reg1.
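To make steps 1 to 3 concrete without Xeon Phi hardware, the sketch below emulates a 512-bit register as 16 `int32_t` lanes and the mask as a 16-bit integer. On the coprocessor the talk used `_mm512_[mask]_XXX()` intrinsics; here each loop stands in for one vector instruction. The names `vec16`, `mask16`, `init_labels`, `load_row`, `make_mask` and `overlap_right` are hypothetical, and reading "overlap" as "lane i receives lane i+1" is an assumption.

```c
#include <assert.h>
#include <stdint.h>

#define VLEN 16                               /* 512 bit / 32 bit = 16 lanes */

typedef struct { int32_t v[VLEN]; } vec16;    /* emulated SIMD register */
typedef uint16_t mask16;                      /* one mask bit per lane */

/* Step 1: label each pixel of the row with its array index. */
void init_labels(int32_t *row, int32_t base)
{
    assert(base >= 0);
    for (int i = 0; i < VLEN; ++i) row[i] = base + i;
}

/* Step 2a: load a row of labels into a register. */
vec16 load_row(const int32_t *row)
{
    vec16 r;
    for (int i = 0; i < VLEN; ++i) r.v[i] = row[i];
    return r;
}

/* Step 2b: bit i set if pixel i and pixel i+1 have equal color.
 * Lane 15 has no right neighbour inside the register, so bit 15 stays 0. */
mask16 make_mask(const int32_t *color)
{
    mask16 m = 0;
    for (int i = 0; i + 1 < VLEN; ++i)
        if (color[i] == color[i + 1]) m |= (mask16)(1u << i);
    return m;
}

/* Step 3: lane i of the result holds lane i+1 of the input (the
 * neighbour in positive 1-direction); the last lane keeps its value. */
vec16 overlap_right(vec16 r)
{
    vec16 out;
    for (int i = 0; i + 1 < VLEN; ++i) out.v[i] = r.v[i + 1];
    out.v[VLEN - 1] = r.v[VLEN - 1];
    return out;
}
```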
  • 17. Connected Comp. Labeling: Vectorization
     4. Determine the pairwise minimum of the entries in reg0 and reg1 using the mask, and write the result to reg1.
     5. Write back the entries in reg1 to row[0] using the mask.
     6. Shift all elements in reg1 one position in the positive 1-direction, shifting in the 0-th element, and write the result to reg1.
     7. Shift all bits in the mask one position up, and write the pairwise minimum of the entries in row[0] and reg1 to row[0] using the shifted mask.
     8. Did labels change?
  • 18. Connected Comp. Labeling: Vectorization
     Result of the operations up to now . . . adjacent connected elements in row[0] are set to the pairwise minimum value each. (Figure: Before/After, 1-direction.)
     Repeat the procedure for the 2-direction.
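In the 2-direction, lane i of row y sits directly above lane i of row y+1, so, under the same emulation assumptions as for the 1-direction, no lane shifts are needed: a hypothetical `column_pass` only builds the mask from the two color rows and writes the masked pairwise minimum back to both rows.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VLEN 16
typedef uint16_t mask16;

/* 2-direction step for two vertically adjacent 16-pixel rows
 * (up = row y, down = row y+1). Returns true if a label changed. */
bool column_pass(int32_t *up, int32_t *down,
                 const int32_t *cup, const int32_t *cdown)
{
    assert(up != down);
    mask16 m = 0;
    for (int i = 0; i < VLEN; ++i)            /* mask: equal-colored lanes */
        if (cup[i] == cdown[i]) m |= (mask16)(1u << i);

    bool changed = false;
    for (int i = 0; i < VLEN; ++i) {          /* masked pairwise minimum */
        if (!((m >> i) & 1u)) continue;
        int32_t lo = up[i] < down[i] ? up[i] : down[i];
        if (up[i] != lo || down[i] != lo) changed = true;
        up[i] = down[i] = lo;                 /* masked write-back to both rows */
    }
    return changed;
}
```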
  • 19. Connected Comp. Labeling: Vectorization
     Repeat the procedure for all other rows as long as labels change . . . (Figure: Before/After.)
     Now: merge labels across different sub-images using atomics! Finished!
  • 20. Connected Comp. Labeling: Benchmark
     CPU: Xeon E5-2670, 8 cores + 2-way Hyper-Threading @ 2.6 GHz
       - Hoshen-Kopelman algorithm + atomics for label merging
       - Vectorization was left to the compiler: there are no masked SIMD intrinsics!
     Xeon Phi: 60 cores + 4-way Hyper-Threading @ 1.1 GHz
       - Hoshen-Kopelman vs. cluster self-labeling + atomics for label merging
       - Vectorization by means of _mm512_[mask]_XXX() intrinsics
     Parallelization by means of OpenMP: #pragma omp parallel {...}
     Programming effort: approx. 2-3 days for the CPU code (incl. optimization); less than 1 day for the Xeon Phi code (based on the CPU code).
  • 21. Connected Comp. Labeling: Benchmark
     CPU: Intel Xeon E5-2670, 8 cores + 2-way Hyper-Threading @ 2.6 GHz
     Xeon Phi: 60 cores + 4-way Hyper-Threading @ 1.1 GHz
     Application: Swendsen-Wang cluster algorithm for the 2D Ising model
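To show where connected component labeling sits inside the benchmark application, here is a minimal sequential sketch of one Swendsen-Wang update [3]. It is not the benchmarked code. Assumptions: an L x L lattice of spins +1/-1 with open boundaries, bonds between aligned neighbours activated with probability p = 1 - exp(-2*beta*J) (computed by the caller), scalar cluster self-labeling where "connected" means "joined by an active bond", and a toy xorshift32 generator. `sw_update`, `xorshift32` and `uniform01` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Toy xorshift32 PRNG (state must be nonzero); illustration only. */
static uint32_t xorshift32(uint32_t *s)
{
    uint32_t x = *s;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *s = x;
}

/* Uniform double in [0, 1). */
static double uniform01(uint32_t *s) { return xorshift32(s) / 4294967296.0; }

/* One Swendsen-Wang update; p = 1 - exp(-2*beta*J) is supplied by the caller. */
void sw_update(int8_t *spin, int L, double p, uint32_t *seed)
{
    int n = L * L;
    bool *bx = malloc(sizeof(bool) * n);      /* active bond to x+1 */
    bool *by = malloc(sizeof(bool) * n);      /* active bond to y+1 */
    int *label = malloc(sizeof(int) * n);
    bool *flip = malloc(sizeof(bool) * n);
    assert(bx && by && label && flip && *seed != 0);

    /* Activate bonds between aligned neighbours with probability p. */
    for (int i = 0; i < n; ++i) {
        int x = i % L, y = i / L;
        bx[i] = x + 1 < L && spin[i] == spin[i + 1] && uniform01(seed) < p;
        by[i] = y + 1 < L && spin[i] == spin[i + L] && uniform01(seed) < p;
        label[i] = i;
    }

    /* Cluster self-labeling over active bonds. */
    bool changed = true;
    while (changed) {
        changed = false;
        for (int i = 0; i < n; ++i) {
            int nb[2] = { bx[i] ? i + 1 : -1, by[i] ? i + L : -1 };
            for (int k = 0; k < 2; ++k) {
                int j = nb[k];
                if (j < 0 || label[i] == label[j]) continue;
                int m = label[i] < label[j] ? label[i] : label[j];
                label[i] = label[j] = m;
                changed = true;
            }
        }
    }

    /* Each cluster is labeled by its smallest site index; flip every
     * cluster as a whole with probability 1/2. */
    for (int i = 0; i < n; ++i) flip[i] = uniform01(seed) < 0.5;
    for (int i = 0; i < n; ++i)
        if (flip[label[i]]) spin[i] = (int8_t)-spin[i];

    free(bx); free(by); free(label); free(flip);
}
```

With p = 1 and all spins aligned, every bond is active, the whole lattice forms one cluster, and the update either keeps or flips all spins together; with p = 0 every site is its own cluster and flips independently.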
  • 23. Acknowledgement
     Work partially funded by BMBF Grant No. 01IH11004G.
     Dr. Thomas Steinke, Zuse-Institute Berlin (ZIB)
     Dr. Michael Klemm, Intel GmbH, Germany
  • 24. References
     [1] C. F. Baillie and P. D. Coddington. Cluster Identification Algorithms for Spin Models: Sequential and Parallel. 1991.
     [2] J. Hoshen and R. Kopelman. Percolation and Cluster Distribution. I. Cluster Multiple Labeling Technique and Critical Concentration Algorithm. Phys. Rev. B 14, 3438–3445, 1976.
     [3] R. H. Swendsen and J.-S. Wang. Nonuniversal Critical Dynamics in Monte Carlo Simulations. Phys. Rev. Lett. 58, 86–88, Jan 1987.
     [4] Intel Corp. Intel Xeon Phi Coprocessor 5110P, Product Brief, 2012.
