Connected Components Labeling

  Term Project: CS395T, Software for Multicore Processors


                  Hemanth Kumar Mantri
                  Siddharth Subramanian
                      Kumar Ashish
Big Picture
• Studied, implemented and evaluated
  various parallel algorithms for connected
  components labeling in graphs
• Two architectures
  – CPU (OpenMP) and GPU (CUDA)
• Different types of graphs
• Proposed a simple autotuning approach for
  choosing the best technique for a given graph
Our Menu
•   Motivation
•   Definitions
•   Basic Algorithms
•   Optimizations
•   Datasets and Experiments
•   Autotuning
•   Future Scope
Why Connected Components?
• Identify the groups of vertices
  that are connected to each other
  in a graph
• Used in:
   – Pattern Recognition
   – Physics
      • Identify Clusters
   – Biology
      • DNA components
   – Social Network Analysis
Applications
• Physics
  – Identify Clusters
• Biology
  – Components in DNA
• Image Processing
• Pattern Recognition
• Gesture Recognition
Sequential Implementation
• Disjoint Set Union (union-find)
  –   MakeSet
  –   Union
  –   Link
  –   FindSet
• Depth-First Search
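The two sequential baselines can be sketched as follows. This is a minimal Python illustration, not the project's implementation: it assumes vertices numbered 0..n-1 and an edge list of pairs. `DisjointSet` uses the slide's operation names with union by rank and path compression (common choices the slide does not specify), and `cc_dfs` is an iterative depth-first search so that deep components do not hit Python's recursion limit.

```python
from typing import List, Tuple


class DisjointSet:
    """Sequential union-find: MakeSet, FindSet, Link and Union."""

    def __init__(self, n: int) -> None:
        self.parent: List[int] = []
        self.rank: List[int] = []
        for v in range(n):
            self.make_set(v)

    def make_set(self, v: int) -> None:
        # Vertex v starts as a singleton set and is its own representative.
        self.parent.append(v)
        self.rank.append(0)

    def find_set(self, v: int) -> int:
        # Walk up to the root, then point every node on the path at it
        # (path compression).
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        while self.parent[v] != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def link(self, x: int, y: int) -> None:
        # x and y must be roots; the lower-rank tree goes under the other
        # (union by rank).
        if self.rank[x] > self.rank[y]:
            self.parent[y] = x
        else:
            self.parent[x] = y
            if self.rank[x] == self.rank[y]:
                self.rank[y] += 1

    def union(self, x: int, y: int) -> None:
        rx, ry = self.find_set(x), self.find_set(y)
        if rx != ry:
            self.link(rx, ry)


def cc_union_find(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Label each vertex with the representative of its set."""
    ds = DisjointSet(n)
    for u, v in edges:
        ds.union(u, v)
    return [ds.find_set(v) for v in range(n)]


def cc_dfs(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Label each vertex with the smallest vertex ID in its component."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    label = [-1] * n
    for s in range(n):  # s is the smallest unvisited ID of a new component
        if label[s] != -1:
            continue
        label[s] = s
        stack = [s]
        while stack:
            u = stack.pop()
            for w in adj[u]:
                if label[w] == -1:
                    label[w] = s
                    stack.append(w)
    return label
```

The two functions may return different label values (a root versus the smallest vertex ID), but they induce the same partition into components.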
Rooted Star
• Directed tree of height 1
• Root points to itself
• All children point to the
  root
• Root is called the
  representative of a
  connected component
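Whether every tree is a rooted star can be tested with the classic PRAM-style star check, sketched sequentially below. It assumes a hypothetical `parent` array in which `parent[v]` is v's parent and a root satisfies `parent[r] == r`; the slides do not show how the project implements this check.

```python
from typing import List


def star_check(parent: List[int]) -> List[bool]:
    """Return star[v] == True iff v lies in a tree of height at most 1."""
    n = len(parent)
    star = [True] * n
    # A vertex whose grandparent differs from its parent is at depth >= 2:
    # it is not in a star, and neither is its grandparent.
    for v in range(n):
        grandparent = parent[parent[v]]
        if parent[v] != grandparent:
            star[v] = False
            star[grandparent] = False
    # Vertices still marked True are roots or depth-1 children; each takes
    # its parent's verdict, which propagates a root's False to its children.
    for v in range(n):
        if star[v]:
            star[v] = star[parent[v]]
    return star
```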
Hooking
• (i, j) is an edge in the
  graph
• If i and j are currently
  in different trees
• Merge the two trees
  into one
• Make the representative
  of one tree point to the
  representative of the
  other
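A single hooking pass over the edge list might look like this sequential sketch. `find_root` and `hook_pass` are illustrative names, and the tie-break (the larger root is hooked under the smaller one) is an assumption. Parallel versions usually read only `parent[i]` and `parent[j]` rather than walking all the way to the root, relying on pointer jumping to keep trees shallow.

```python
from typing import List, Tuple


def find_root(parent: List[int], v: int) -> int:
    # Follow parent pointers until reaching a vertex that points to itself.
    while parent[v] != v:
        v = parent[v]
    return v


def hook_pass(parent: List[int], edges: List[Tuple[int, int]]) -> int:
    """For each edge whose endpoints lie in different trees, make one
    representative point to the other. Returns the number of hooks."""
    hooks = 0
    for i, j in edges:
        ri, rj = find_root(parent, i), find_root(parent, j)
        if ri != rj:
            # Assumed tie-break: the lower index wins, so the larger root
            # is hooked under the smaller one.
            parent[max(ri, rj)] = min(ri, rj)
            hooks += 1
    return hooks
```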
Breaking Ties
• When merging two trees T1 and T2,
  whose representative should be
  changed?
  – Toss a coin and choose a winner
  – The tree with the lower (or higher) index always wins
  – Alternate between iterations (even, odd)
  – The tree with greater height wins
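The four tie-breaking rules can be expressed as one hypothetical helper that returns the root whose representative changes. The policy strings, and which direction wins on even versus odd iterations, are assumptions made for illustration.

```python
import random
from typing import Optional, Sequence


def choose_loser(ri: int, rj: int, policy: str, iteration: int = 0,
                 height: Optional[Sequence[int]] = None,
                 rng: Optional[random.Random] = None) -> int:
    """Return the root that gets hooked (its representative changes).

    ri and rj are the roots of the two trees being merged.
    """
    if policy == "coin":
        # Toss a coin: either root may lose.
        rng = rng if rng is not None else random.Random()
        return ri if rng.random() < 0.5 else rj
    if policy == "lower_wins":
        return max(ri, rj)
    if policy == "higher_wins":
        return min(ri, rj)
    if policy == "even_odd":
        # Assumed convention: lower index wins on even iterations,
        # higher index wins on odd ones.
        return max(ri, rj) if iteration % 2 == 0 else min(ri, rj)
    if policy == "taller_wins":
        if height is None:
            raise ValueError("taller_wins needs the height of each root")
        if height[ri] != height[rj]:
            return ri if height[ri] < height[rj] else rj
        return max(ri, rj)  # equal heights: fall back to lower index wins
    raise ValueError(f"unknown policy: {policy}")
```

A caller would then hook with `parent[loser] = ri + rj - loser`, since the winner is whichever root was not chosen.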
Pointer Jumping
• Move a node higher
  in the tree
• Single level
• Multi level
• Final aim
  – Form rooted stars
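Both forms of pointer jumping can be sketched on a `parent` array (an assumed representation). A single-level round reads the old array and sets `parent[v] = parent[parent[v]]` for every v at once, as a synchronous parallel step would; multi-level jumping repeats such rounds until all trees are rooted stars.

```python
from typing import List


def single_level_jump(parent: List[int]) -> bool:
    """One synchronous round: every node moves up one level.
    Returns True if any pointer changed."""
    new_parent = [parent[parent[v]] for v in range(len(parent))]
    changed = new_parent != parent
    parent[:] = new_parent
    return changed


def multi_level_jump(parent: List[int]) -> int:
    """Repeat rounds until every tree is a rooted star. Returns the rounds
    that changed at least one pointer."""
    rounds = 0
    while single_level_jump(parent):
        rounds += 1
    return rounds
```

Because each round doubles how far every pointer reaches, a tree of depth d becomes a star after about ceil(log2 d) rounds; the depth-4 path in the test needs 2.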
EXAMPLE
Start From Singletons
Hooking
Pointer Jumping
SV (Shiloach-Vishkin) Algorithm
Revised Deterministic Algorithm
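The SV and revised deterministic algorithms appear on these slides only as figures, so the sketch below is neither of them. It is a minimal sequential simulation of the shared hook-then-jump structure, assuming the "lower index wins" tie-break and multi-level pointer jumping; unlike SV, it has no separate conditional and unconditional hooking steps.

```python
from typing import List, Tuple


def connected_components(n: int, edges: List[Tuple[int, int]]) -> List[int]:
    """Hook-and-jump labeling: each vertex ends up labeled with the
    smallest vertex ID in its component."""
    parent = list(range(n))  # start from singletons
    while True:
        hooked = False
        # Hooking: for an edge whose endpoints have different
        # representatives, hook the larger root under the smaller one.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu == pv:
                continue
            hi, lo = max(pu, pv), min(pu, pv)
            if parent[hi] == hi:  # only roots are hooked
                parent[hi] = lo
                hooked = True
        if not hooked:
            break  # trees are stars and no edge joins two of them
        # Multi-level pointer jumping: flatten every tree into a star.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent
```

Hooks only ever point a root at a smaller ID and jumps only shorten paths, so `parent[v] <= v` holds throughout. That rules out cycles, guarantees termination, and leaves each component labeled with its minimum vertex ID.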
CPU Optimizations
• Single-instance edge storage
  – (u, v) is the same as (v, u), so store it once
  – Reduced memory footprint
     • Supports larger graphs
  – Smaller traversal overhead
     • Every iteration needs to scan all edges
• Unconditional hooking
  – Invoking it at the appropriate iteration
    reduces the number of iterations
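Single-instance storage amounts to canonicalizing each undirected pair before the first iteration. A minimal sketch, assuming the input may list both (u, v) and (v, u); dropping self-loops is an extra assumption, justified because a self-loop can never merge two trees. The hooking step must then treat both endpoints symmetrically, as a min/max tie-break does.

```python
from typing import Iterable, List, Set, Tuple


def single_instance_edges(edges: Iterable[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep one copy of each undirected edge, stored as (min, max)."""
    seen: Set[Tuple[int, int]] = set()
    out: List[Tuple[int, int]] = []
    for u, v in edges:
        if u == v:
            continue  # a self-loop never merges two components
        key = (u, v) if u < v else (v, u)
        if key not in seen:
            seen.add(key)
            out.append(key)
    return out
```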
Multi-Level Pointer Jumping
• Every iteration flattens all
  trees into stars
• No overhead for checking
  whether a node is part of a star
OpenMP Scheduling
• Static
• Dynamic
• Guided
  – Gave the best performance
Hide Inactive Edges
• If both endpoints of an edge
  are already in the same
  connected component, hide
  the edge
• Saves time in later
  iterations
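Hiding is safe because components only ever merge: once both endpoints share a representative, the edge can never cause another hook. A minimal sketch, assuming every tree has already been flattened into a star so that `parent[v]` is v's representative (the function name is hypothetical).

```python
from typing import List, Tuple


def hide_inactive_edges(parent: List[int],
                        edges: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep only edges whose endpoints are still in different trees."""
    return [(u, v) for u, v in edges if parent[u] != parent[v]]
```

On a GPU the same filter would typically be done as a parallel stream compaction; the list comprehension here only shows which edges survive.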
For GPU
• Different from the PRAM model
   – Threads are grouped into thread blocks (TBs)
   – Requires explicit synchronization across TBs

• 64 bits to represent an edge
   – Fewer random reads
   – An edge is read in a single memory transaction

• In the first iteration, hook neighbors instead of their parents
   – Fewer irregular reads

• GeForce GTX 480
   – 1024 threads per block
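The 64-bit edge layout can be sketched as below: two 32-bit vertex IDs share one word, so one load fetches both endpoints. Putting u in the high half is an assumption, since the slides do not give the layout, and the Python `array("Q")` only stands in for a device buffer. In CUDA the same idea is usually expressed with an unsigned 64-bit integer or an `int2` vector load.

```python
from array import array
from typing import Iterable, Tuple

MASK32 = (1 << 32) - 1


def pack_edge(u: int, v: int) -> int:
    """Pack two 32-bit vertex IDs into one 64-bit word (u high, v low)."""
    if not (0 <= u <= MASK32 and 0 <= v <= MASK32):
        raise ValueError("vertex IDs must fit in 32 bits")
    return (u << 32) | v


def unpack_edge(word: int) -> Tuple[int, int]:
    """Recover (u, v) from a packed 64-bit word."""
    return word >> 32, word & MASK32


def pack_edges(edges: Iterable[Tuple[int, int]]) -> array:
    """Store an edge list as a contiguous array of unsigned 64-bit integers."""
    return array("Q", (pack_edge(u, v) for u, v in edges))
```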
Datasets
• Random Graphs
  – 1M to 7M nodes, average degree 5
• RMAT Graphs
  – Synthetic Social Networks
  – 1M to 7M nodes
• Real World Data (From SNAP, by Leskovec)
  – Road Networks:
     • California
     • Pennsylvania
     • Texas
  – Web Graphs
     • Google Web
     • Berkeley-Stanford domains
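Both synthetic families can be generated with short routines like the sketch below. Assumptions: "average degree 5" is read as n * 5 / 2 undirected edges, and the R-MAT probabilities default to the commonly used Graph500 values (a = 0.57, b = 0.19, c = 0.19); the slides give neither the generator nor its parameters. Neither function removes self-loops or duplicate edges.

```python
import random
from typing import List, Tuple


def random_graph(n: int, avg_degree: float, seed: int = 0) -> List[Tuple[int, int]]:
    """Uniform random graph with n * avg_degree / 2 undirected edges."""
    rng = random.Random(seed)
    m = int(n * avg_degree / 2)  # each edge adds 2 to the total degree
    return [(rng.randrange(n), rng.randrange(n)) for _ in range(m)]


def rmat_graph(scale: int, edge_factor: int, a: float = 0.57, b: float = 0.19,
               c: float = 0.19, seed: int = 0) -> List[Tuple[int, int]]:
    """R-MAT: 2**scale vertices and edge_factor * 2**scale edges. At each bit
    level an edge picks a quadrant of the adjacency matrix with probability
    a, b, c or d = 1 - a - b - c."""
    if a + b + c > 1:
        raise ValueError("a + b + c must not exceed 1")
    rng = random.Random(seed)
    n = 1 << scale
    edges = []
    for _ in range(edge_factor * n):
        u = v = 0
        for _ in range(scale):
            r = rng.random()
            u <<= 1
            v <<= 1
            if r < a:
                pass            # top-left quadrant
            elif r < a + b:
                v |= 1          # top-right
            elif r < a + b + c:
                u |= 1          # bottom-left
            else:
                u |= 1          # bottom-right
                v |= 1
        edges.append((u, v))
    return edges
```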
Execution Environment
• CPU (Faraday): a 48-core Intel Xeon
  E7540 (2.00 GHz) machine with 18 MB cache and 132
  GB RAM
• GPU (Gleim): GeForce GTX 480 with 1.5
  GB of device memory and 177.4 GB/s
  memory bandwidth, attached to a
  quad-core Intel Xeon CPU (2.40 GHz)
  running CUDA Toolkit/SDK version 4.1.
  The host machine had 6 GB RAM.
Random Graphs CPU – Scaling with threads
RMAT-Graphs CPU – Scaling with threads
Web graphs CPU – Scaling with threads
Road network CPU – Scaling with threads
Random graph – Scaling with vertices
R-MAT – Scaling with vertices
GPU on Random and RMAT
Real World Graphs
Our Menu
•   Motivation
•   Definitions
•   Basic Algorithms
•   Optimizations
•   Datasets and Experiments
•   Analysis and Autotuning
•   Future Scope
What is Autotuning?
• Automatic process for selecting one out of several
  possible solutions to a computational problem.
• The solutions may differ in the
   – algorithm (quicksort vs. selection sort)
   – implementation (loop unrolling)
• The versions may result from
   – transformations (unroll, tile, interchange)
• The versions could be generated by
   – programmer manually (coding or directives)
   – compiler automatically
How?
• Implement several variants of hooking and
  pointer jumping
• Characterize graphs by a few
  features
• Employ the best technique for a given
  graph
Performance Deciders
• Number of iterations
  – Each iteration needs to traverse the whole set
    of edges
• Pointer jumps
  – The farther a node is from its root, the more work
• Trade-off
  – More iterations, with a single-level jump in each
    iteration
  – Fewer iterations, with multi-level jumps
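The trade-off can be observed by counting work. The sketch below is an assumed instrumentation (names and counters are hypothetical): one loop that hooks roots under the smaller representative and then does either multi-level or single-level pointer jumping, counting iterations, hooks, and pointer updates.

```python
from typing import Dict, List, Tuple


def run_instrumented(n: int, edges: List[Tuple[int, int]],
                     multi_level: bool) -> Tuple[List[int], Dict[str, int]]:
    """Hook-and-jump labeling that counts iterations, hooks and jumps.

    multi_level=True:  after hooking, flatten every tree into a star.
    multi_level=False: after hooking, do one synchronous jumping round.
    """
    parent = list(range(n))
    stats = {"iterations": 0, "hooks": 0, "jumps": 0}
    while True:
        stats["iterations"] += 1
        changed = False
        # Hooking: a root is hooked under the smaller representative.
        for u, v in edges:
            pu, pv = parent[u], parent[v]
            if pu != pv:
                hi, lo = max(pu, pv), min(pu, pv)
                if parent[hi] == hi:
                    parent[hi] = lo
                    stats["hooks"] += 1
                    changed = True
        # Pointer jumping; every pointer update counts as one jump.
        if multi_level:
            for x in range(n):
                while parent[x] != parent[parent[x]]:
                    parent[x] = parent[parent[x]]
                    stats["jumps"] += 1
                    changed = True
        else:
            new_parent = [parent[parent[x]] for x in range(n)]
            moved = sum(1 for x in range(n) if new_parent[x] != parent[x])
            parent = new_parent
            stats["jumps"] += moved
            changed = changed or moved > 0
        if not changed:
            return parent, stats
```

On a path whose edges arrive in reverse order, the single-level variant takes more iterations than the multi-level one. The absolute counts depend on edge order and on the in-place sequential jumping used here, so they illustrate the bookkeeping rather than the measured results.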
Choosing the Right Approach
• More iterations, with a single-level jump in each
  iteration
  – Good for graphs with fewer edges and smaller
    diameter
  – For a fixed number of edges, works well for social
    networks
• Fewer iterations, with multi-level jumps
  – Good for graphs with large diameter
  – Very good scalability, so good for GPU
  – Road networks
Graph Types
• Road Networks
  – Large diameter
  – Form very deep trees

• R-MAT and Social Networks
  – Many cliques

• Web Graphs
  – Dense graphs
Other Findings
• Multi-level Pointer Jumping
  – Fewer iterations
  – Star check is not required
  – Good for high-diameter graphs
  – Good scalability for R-MAT graphs
• Even-Odd Hooking
  – Works well with random and R-MAT graphs
  – Performance quite similar to optimized SV in
    most cases
Our approach
• Given: a graph whose type is unknown
• Training phase: generate models of
  known graph types by running the algorithms
  and profiling the feature values
• Test phase:
  – Run an initial algorithm for a few iterations
  – Find the known graph type whose profile is most similar to the current one
  – Switch to the best algorithm for that graph type
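The slides do not specify the model or the similarity measure, so the following is only one plausible reading of the test phase: each trained model is a feature vector per graph type, "similar" means nearest in Euclidean distance, and `best_algorithm` maps each type to the variant that performed best in training. All names are hypothetical.

```python
import math
from typing import Dict, Sequence


def closest_graph_type(profile: Sequence[float],
                       models: Dict[str, Sequence[float]]) -> str:
    """Return the known graph type whose trained profile is nearest to `profile`."""
    if not models:
        raise ValueError("no trained models")
    best_type, best_dist = "", math.inf
    for graph_type, model in models.items():
        if len(model) != len(profile):
            raise ValueError(f"feature length mismatch for {graph_type}")
        d = math.dist(profile, model)  # Euclidean distance (Python 3.8+)
        if d < best_dist:
            best_type, best_dist = graph_type, d
    return best_type


def pick_algorithm(profile: Sequence[float],
                   models: Dict[str, Sequence[float]],
                   best_algorithm: Dict[str, str]) -> str:
    """Switch to the variant recorded as best for the closest graph type."""
    return best_algorithm[closest_graph_type(profile, models)]
```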
Feature selection
• Pointer jumps per hook
  – Captures the amount of work per iteration
• Percentage of pointer jumps done per
  iteration
  – Might give insight into the type of graph
  – Problem: needs information from future
    iterations
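Given per-iteration counts of hooks and pointer jumps (for example, from an instrumented run), both features are simple ratios. A minimal sketch with hypothetical names; how a zero-hook iteration is handled is an assumption. The second function divides by the total over the whole run, which is exactly why it needs information from future iterations.

```python
from typing import List, Sequence


def jumps_per_hook(hooks: Sequence[int], jumps: Sequence[int]) -> List[float]:
    """Per-iteration ratio of pointer jumps to hooks; computable online."""
    out: List[float] = []
    for h, j in zip(hooks, jumps):
        if h:
            out.append(j / h)
        else:
            # Assumed convention for iterations with no hooks.
            out.append(float("inf") if j else 0.0)
    return out


def jump_percentage(jumps: Sequence[int]) -> List[float]:
    """Share of all pointer jumps done in each iteration, in percent.
    Needs the total over the whole run, so it is only known at the end."""
    total = sum(jumps)
    return [100.0 * j / total if total else 0.0 for j in jumps]
```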
Effectiveness of features – Pointer jumps per hook
Percentage of pointer jumps
Percentage of pointer jumps (modified)
Simple tool
• parallel_ccl
  – Optimizations supplied as command-line arguments
Future Scope
• More sophisticated autotuning
  – Reduce profiling overhead
  – Introduce more intelligent modeling based on
    better features of the graphs
• Heterogeneous algorithm
  – Start by running on the GPU
  – Parallelism falls after a few iterations
     • Fewer active edges
  – Switch to the CPU to save power
GPU power profile