Connected Components Labeling



  1. Connected Components Labeling
     Term Project: CS395T, Software for Multicore Processors
     Hemanth Kumar Mantri, Siddharth Subramanian, Kumar Ashish
  2. Big Picture
     • Studied, implemented, and evaluated several parallel algorithms for Connected Components Labeling in graphs
     • Two architectures: CPU (OpenMP) and GPU (CUDA)
     • Different types of graphs
     • Propose a simple autotuned approach for choosing the best technique for a given graph
  3. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Autotuning • Future Scope
  4. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Autotuning • Future Scope
  5. Why Connected Components?
     • Identify vertices that form a connected set in a graph
     • Used in:
       – Pattern recognition
       – Physics: identify clusters
       – Biology: DNA components
       – Social network analysis
  6. Applications
     • Physics: identify clusters
     • Biology: components in DNA
     • Image processing
     • Pattern recognition
     • Gesture recognition
  7. Sequential Implementation
     • Disjoint Set Union: MakeSet, Union, Link, FindSet
     • Depth-First Search
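The disjoint-set primitives the slide lists could look like the following minimal serial sketch; the names (MakeSet, FindSet, Link, Union) follow the slide, while the union-by-rank and path-halving details are assumptions, not the authors' exact implementation.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Minimal serial disjoint-set (union-find) sketch.
struct DisjointSet {
    std::vector<int> parent, rank_;
    explicit DisjointSet(int n) : parent(n), rank_(n, 0) {
        for (int i = 0; i < n; ++i) parent[i] = i;  // MakeSet for every vertex
    }
    int FindSet(int x) {                            // find the root, halving the path
        while (parent[x] != parent[parent[x]]) parent[x] = parent[parent[x]];
        return parent[x];
    }
    void Link(int a, int b) {                       // attach the lower-rank root
        if (rank_[a] < rank_[b]) std::swap(a, b);
        parent[b] = a;
        if (rank_[a] == rank_[b]) ++rank_[a];
    }
    void Union(int x, int y) {                      // merge the sets of x and y
        int rx = FindSet(x), ry = FindSet(y);
        if (rx != ry) Link(rx, ry);
    }
};
```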
  8. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Autotuning • Future Scope
  9. Rooted Star
     • A directed tree of height h = 1
     • The root points to itself
     • All children point to the root
     • The root is called the representative of a connected component
  10. Hooking
     • (i, j) is an edge in the graph
     • If i and j are currently in different trees, merge the two trees into one
     • Make the representative of one point to the representative of the other
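One hooking pass over the edge list can be sketched serially as below, assuming each vertex i carries a parent pointer P[i] (with P[i] == i for a representative) and that trees are already stars from the previous iteration; the "lower index wins" tie-break is just one of the choices the next slide lists.

```cpp
#include <cassert>
#include <utility>
#include <vector>

using Edge = std::pair<int, int>;

// One serial hooking pass: returns false once no edge joins two trees.
bool HookOnce(std::vector<int>& P, const std::vector<Edge>& edges) {
    bool hooked = false;
    for (const auto& [i, j] : edges) {
        int ri = P[i], rj = P[j];          // representatives (trees assumed to be stars)
        if (ri != rj) {                    // i and j are currently in different trees
            if (ri < rj) P[rj] = ri;       // lower-index root wins the tie
            else         P[ri] = rj;
            hooked = true;
        }
    }
    return hooked;
}
```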
  11. Breaking Ties
     • When merging two trees T1 and T2, whose representative should be changed?
       – Toss a coin and choose a winner
       – The tree with the lower (or higher) index always wins
       – Alternate between iterations (even, odd)
       – The tree with the greater height wins
  12. Pointer Jumping
     • Move a node higher in the tree
     • Single level
     • Multi level
     • Final aim: form rooted stars
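Both flavors of pointer jumping on a parent array P can be sketched serially: a single-level jump replaces each parent with the grandparent once, while the multi-level variant repeats until every tree is a rooted star. This is an illustrative sketch, not the authors' parallel code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Single level: each node adopts its grandparent once.
void JumpSingleLevel(std::vector<int>& P) {
    for (std::size_t v = 0; v < P.size(); ++v)
        P[v] = P[P[v]];
}

// Multi level: repeat until every tree has height 1 (a rooted star).
void JumpToStars(std::vector<int>& P) {
    bool changed = true;
    while (changed) {
        changed = false;
        for (std::size_t v = 0; v < P.size(); ++v)
            if (P[v] != P[P[v]]) { P[v] = P[P[v]]; changed = true; }
    }
}
```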
  13. Example
  14. Start from Singletons
  15. Hooking
  16. Pointer Jumping
  17. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Autotuning • Future Scope
  18. SV Algorithm
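The SV (Shiloach–Vishkin) loop the slide refers to alternates hooking and pointer jumping until no parent changes. A serial sketch under the "lower root wins" tie-break is below; in the parallel version each loop body is distributed across threads with a shared changed flag.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Serial sketch of the Shiloach–Vishkin style labeling loop.
std::vector<int> SVLabel(int n, const std::vector<std::pair<int, int>>& edges) {
    std::vector<int> P(n);
    for (int i = 0; i < n; ++i) P[i] = i;          // start from singletons
    bool changed = true;
    while (changed) {
        changed = false;
        for (auto [i, j] : edges) {                // hooking: lower root wins
            int ri = P[i], rj = P[j];
            if (ri < rj)      { P[rj] = ri; changed = true; }
            else if (rj < ri) { P[ri] = rj; changed = true; }
        }
        for (int v = 0; v < n; ++v)                // pointer jumping to stars
            while (P[v] != P[P[v]]) P[v] = P[P[v]];
    }
    return P;                                      // P[v] = component representative
}
```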
  19. Revised Deterministic Algorithm
  20. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Autotuning • Future Scope
  21. CPU Optimizations
     • Single-instance edge storage
       – (u, v) is the same as (v, u)
       – Reduced memory footprint: supports large graphs
       – Smaller traversal overhead: every iteration needs to see all edges
     • Unconditional hooking
       – Calling it at the appropriate iteration helps decrease the number of iterations
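The single-instance storage idea can be sketched as a preprocessing step: store each undirected edge once as (min(u, v), max(u, v)) and drop duplicates, so every iteration traverses roughly half the edge records. The helper name is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Canonicalize undirected edges so (v, u) and (u, v) collapse to one record.
std::vector<std::pair<int, int>>
Canonicalize(std::vector<std::pair<int, int>> edges) {
    for (auto& [u, v] : edges)
        if (u > v) std::swap(u, v);        // (v, u) becomes (u, v)
    std::sort(edges.begin(), edges.end());
    edges.erase(std::unique(edges.begin(), edges.end()), edges.end());
    return edges;
}
```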
  22. Multi-Level Pointer Jumping
     • Forms only stars in every iteration
     • No overhead in determining whether a node is part of a star
  23. OpenMP Scheduling
     • Static
     • Dynamic
     • Guided scheduling gave the best performance
  24. Hide Inactive Edges
     • If the two ends of an edge are part of the same connected component, hide the edge
     • Saves time in subsequent iterations
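Edge hiding amounts to a filter between iterations: once both endpoints of an edge share a representative in the parent array P, that edge can never trigger another hook, so it is dropped from the working set. A serial sketch, assuming P already maps each vertex to its representative:

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Remove edges whose endpoints are already in the same component.
void HideInactive(std::vector<std::pair<int, int>>& edges,
                  const std::vector<int>& P) {
    edges.erase(std::remove_if(edges.begin(), edges.end(),
                               [&](const std::pair<int, int>& e) {
                                   return P[e.first] == P[e.second];
                               }),
                edges.end());
}
```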
  25. For GPU
     • Different from the PRAM model
       – Threads are grouped into thread blocks (TBs)
       – Requires explicit synchronization across TBs
     • 64 bits for representing an edge
       – Reduced random reads
       – An edge is read in a single memory transaction
     • In the first iteration, hook neighbors instead of their parents
       – Reduced irregular reads
     • GeForce GTX 480: use 1024 threads per block
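The 64-bit edge representation presumably packs the two 32-bit vertex ids into one word, so a thread fetches a whole edge in a single memory transaction; the exact layout below (u in the high half) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Pack two 32-bit vertex ids into one 64-bit edge word.
inline std::uint64_t PackEdge(std::uint32_t u, std::uint32_t v) {
    return (static_cast<std::uint64_t>(u) << 32) | v;
}
inline std::uint32_t EdgeU(std::uint64_t e) {   // high half: first endpoint
    return static_cast<std::uint32_t>(e >> 32);
}
inline std::uint32_t EdgeV(std::uint64_t e) {   // low half: second endpoint
    return static_cast<std::uint32_t>(e);
}
```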
  26. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Autotuning • Future Scope
  27. Datasets
     • Random graphs: 1M to 7M nodes, average degree 5
     • R-MAT graphs (synthetic social networks): 1M to 7M nodes
     • Real-world data (from SNAP, by Leskovec)
       – Road networks: California, Pennsylvania, Texas
       – Web graphs: Google web, Berkeley–Stanford domains
  28. Execution Environment
     • CPU (Faraday): a 48-core Intel Xeon E7540 (2.00 GHz) with 18 MB cache and 132 GB RAM
     • GPU (Gleim): GeForce GTX 480 with 1.5 GB of device memory and 177.4 GB/s memory bandwidth, attached to a quad-core Intel Xeon CPU (2.40 GHz) running CUDA Toolkit/SDK version 4.1; the host machine had 6 GB RAM
  29. Random Graphs: CPU scaling with threads
  30. R-MAT Graphs: CPU scaling with threads
  31. Web Graphs: CPU scaling with threads
  32. Road Networks: CPU scaling with threads
  33. Random Graphs: scaling with vertices
  34. R-MAT: scaling with vertices
  35. GPU on Random and R-MAT
  36. Real-World Graphs
  37. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Analysis and Autotuning • Future Scope
  38. What is Autotuning?
     • An automatic process for selecting one of several possible solutions to a computational problem
     • The solutions may differ in the
       – algorithm (quicksort vs. selection sort)
       – implementation (loop unrolling)
     • The versions may result from transformations (unroll, tile, interchange)
     • The versions could be generated
       – manually by the programmer (coding or directives)
       – automatically by the compiler
  39. How?
     • Have various ways to do hooking and pointer jumping
     • Characterize graphs based on some features
     • Employ the best technique for a given graph
  40. Performance Deciders
     • Number of iterations: each iteration needs to traverse the whole set of edges
     • Pointer jumps: the higher the root node, the more the work
     • Trade-off
       – More iterations with a single-level jump in each iteration
       – Fewer iterations with multi-level jumps
  41. Choosing the Right Approach
     • More iterations with a single-level jump per iteration
       – Good for graphs with fewer edges and a small diameter
       – If the edge count is constant, works well for social networks
     • Fewer iterations with multi-level jumps
       – Good for graphs with a large diameter, e.g. road networks
       – Very good scalability
       – Good for the GPU
  42. Graph Types
     • Road networks: large diameter, form very deep trees
     • R-MAT and social networks: more cliques
     • Web graphs: dense graphs
  43. Other Findings
     • Multi-level pointer jumping
       – Fewer iterations
       – The star check is not required
       – Good for high-diameter graphs
       – Good scalability on R-MAT graphs
     • Even–odd hooking
       – Works well with random and R-MAT graphs
       – Performance quite similar to optimized SV in most cases
  44. Our Approach
     • Given: a graph whose type is unknown
     • Training phase: generate models of known graph types by running and profiling the feature values
     • Test phase
       – Run the initial algorithm for a few iterations
       – Find the known graph type most similar to the current profile
       – Switch to the best algorithm for that graph type
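The test phase could be sketched as a nearest-profile lookup: profile a few iterations, then pick the stored graph-type model whose feature value is closest. Everything here is a hypothetical illustration; the struct, the function name, and reducing the profile to the single "pointer jumps per hook" feature are all assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <string>
#include <vector>

// Hypothetical per-graph-type model built during the training phase.
struct Model {
    std::string graph_type;
    double jumps_per_hook;   // feature value profiled on that graph type
};

// Return the graph type whose profiled feature is closest to the observation.
std::string ClosestType(const std::vector<Model>& models, double observed) {
    const Model* best = &models.front();
    for (const auto& m : models)
        if (std::fabs(m.jumps_per_hook - observed) <
            std::fabs(best->jumps_per_hook - observed))
            best = &m;
    return best->graph_type;  // caller then switches to that type's algorithm
}
```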
  45. Feature Selection
     • Pointer jumps per hook: captures the amount of work per iteration
     • Percentage of pointer jumps done per iteration
       – Might give insights about the type of graph
       – Problem: needs information from future iterations
  46. Effectiveness of features: pointer jumps per hook
  47. Percentage of pointer jumps
  48. Percentage of pointer jumps (modified)
  49. Simple Tool
     • parallel_ccl: optimizations supplied as command-line arguments
  50. Our Menu • Motivation • Definitions • Basic Algorithms • Optimizations • Datasets and Experiments • Analysis and Autotuning • Future Scope
  51. Future Scope
     • More sophisticated autotuning
       – Reduce profiling overhead
       – Introduce more intelligent modeling based on better graph features
     • Heterogeneous algorithm
       – Start by running on the GPU
       – Parallelism falls after a few iterations (fewer active edges)
       – Switch to the CPU to save power
  52. GPU power profile
