Published in Technology
Transcript

  • 1. Large-Scale Network Analysis with the Boost Graph Libraries. Douglas Gregor, Open Systems Lab, Indiana University, [email_address]
  • 2. What are the BGLs?
    • A collection of libraries for computation on graphs/networks.
      • Graph data structures
      • Graph algorithms
      • Graph input/output
    • Common design
      • Flexibility/customizability throughout
      • Obsessed with performance
      • Common interfaces throughout the collection
    • All open source, freely available online
  • 3. The BGL Family
    • The Original (sequential) BGL
    • BGL-Python
    • The Parallel BGL
    • Parallel BGL-Python
  • 4. The Original BGL
    • The largest and most mature BGL
      • ~7 years of research and development
      • Many users, contributors outside of the OSL
      • Steadily evolving
    • Written in C++
      • Generic
      • Highly customizable
      • Efficient (both storage and execution)
  • 5. BGL: Graph Data Structures
    • Graphs:
      • adjacency_list : highly configurable with user-specified containers for vertices and edges
      • adjacency_matrix
      • compressed_sparse_row
    • Adaptors:
      • subgraphs, filtered graphs, reverse graphs
      • LEDA and Stanford GraphBase
    • Or, use your own…
  • 6. Original BGL: Algorithms
    • Searches (breadth-first, depth-first, A*)
    • Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
    • All-pairs shortest paths (Johnson, Floyd-Warshall)
    • Minimum spanning tree (Kruskal, Prim)
    • Components (connected, strongly connected, biconnected)
    • Maximum cardinality matching
    • Max-flow (Edmonds-Karp, push-relabel)
    • Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
    • Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
    • Betweenness centrality
    • PageRank
    • Isomorphism
    • Vertex coloring
    • Transitive closure
    • Dominator tree
  • 7. Task: Biconnected Components. [Figure: input graph and output graph.] Articulation points: B, G, A
  • 8. Define a Graph Type
    • Determine vertex/edge properties: struct Vertex { string name; }; struct Edge { int bicomponent; };
    • Determine the graph type: typedef adjacency_list< /*OutEdgeListS=*/ vecS, /*VertexListS=*/ vecS, /*DirectedS=*/ undirectedS, /*VertexProperty=*/ Vertex, /*EdgeProperty=*/ Edge> Graph;
  • 9. Read in a GraphViz DOT File
    • Build an empty graph: Graph g;
    • Map vertex properties: dynamic_properties dyn; dyn.property("node_id", get(&Vertex::name, g));
    • Read in the GraphViz graph: ifstream in("biconnected_components.dot"); read_graphviz(in, g, dyn);
  • 10. Run Biconnected Components
    • Keep track of the articulation points: vector<Graph::vertex_descriptor> art_points;
    • Compute biconnected components: biconnected_components(g, get(&Edge::bicomponent, g), back_inserter(art_points));
  • 11. Output results
    • Attach bicomponent number to the "label" property of edges: dyn.property("label", get(&Edge::bicomponent, g));
    • Write results to another GraphViz file: ofstream out("bc_out.dot"); write_graphviz(out, g, dyn);
    • Show articulation points: cout << "Articulation points: "; for (size_t i = 0; i < art_points.size(); ++i) { cout << g[art_points[i]].name << ' '; }
  • 12. Task: Biconnected Components. [Figure: input graph and output graph.] Articulation points: B, G, A
  • 13. Original BGL Summary
    • The original BGL is large, stable, efficient
      • Lots of algorithms, graph types
      • Peer-reviewed code with many users, nightly regression testing, etc.
      • Performance comparable to FORTRAN.
    • Who should use the BGL?
      • Programmers comfortable with C++
      • Users with graph sizes from tens of vertices to millions of vertices
  • 14. BGL-Python
    • Python is ideal for rapid prototyping:
      • It’s a scripting language (no compiler)
      • Dynamically typed means less typing for you
      • Easy to use: you already know Python…
    • BGL-Python provides access to the BGL from within Python
      • Similar interfaces to C++ BGL
      • Easier to learn than C++
      • Great for scripting, GUI applications
      • help(bgl.dijkstra_shortest_paths)
  • 15. Example: Biconnected Components
        import boost.graph as bgl  # Pull in the BGL bindings
        g = bgl.Graph.read_graphviz("biconnected_components.dot")
        # Compute biconnected components and articulation points
        bicomponent = g.edge_property_map('int')
        art_points = bgl.biconnected_components(g, bicomponent)
        # Save results with bicomponent numbers as edge labels
        g.edge_properties['label'] = bicomponent
        g.write_graphviz("biconnected_components_out.dot")
        print "Articulation points: ",
        node_id = g.vertex_properties['node_id']
        for v in art_points:
            print node_id[v], ' ',
        print ""
  • 16. Wrapping the BGL in Python
    • BGL-Python is not a…
      • “port”
      • reimplementation
    • BGL-Python wraps the C++ BGL
      • Python calls translate to C++ calls
      • C++ can call back into Python
    • Most of the speed of C++
    • Most of the flexibility of Python
  • 17. Performance: Shortest Paths
  • 18. BGL-Python Summary
    • BGL-Python is all about tradeoffs:
      • More gradual learning curve
      • Faster time-to-solution
      • Lower performance
    • Our typical approach:
      • Prototype in Python to get your ideas down
      • Port to C++ when performance matters
  • 19.  
  • 20. The Parallel BGL
    • A version of the C++ BGL for computational clusters
      • Distributed memory for huge graphs
      • Parallel processing for improved performance
    • An active research project
    • Closely related to the original BGL
      • Parallelizing BGL programs should be “easy”
  • 21. Parallel BGL: Distributed Graphs. A simple, directed graph… distributed across 3 processors.
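One way to picture a graph spread over several processors is a simple block distribution, sketched below with the standard library only. This is an illustration of the idea, not the Parallel BGL API: owner_of and count_remote_edges are hypothetical names, and the slides do not say which distribution the library actually uses.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical block distribution: vertices 0..n-1 are split into p
// contiguous blocks, and block k lives on processor k.
std::size_t owner_of(std::size_t v, std::size_t n, std::size_t p) {
  assert(p > 0 && v < n);
  std::size_t block = (n + p - 1) / p;  // ceiling of n / p
  return v / block;
}

// Counts directed edges whose source and target live on different
// processors; following such an edge would need communication.
std::size_t count_remote_edges(
    const std::vector<std::pair<std::size_t, std::size_t> >& edges,
    std::size_t n, std::size_t p) {
  std::size_t remote = 0;
  for (std::size_t i = 0; i < edges.size(); ++i) {
    if (owner_of(edges[i].first, n, p) != owner_of(edges[i].second, n, p)) {
      ++remote;
    }
  }
  return remote;
}
```

With 9 vertices on 3 processors, each processor owns 3 consecutive vertices, and only edges that cross a block boundary count as remote.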
  • 22. Parallel Graph Algorithms
    • Breadth-first search
    • Eager Dijkstra’s single-source shortest paths
    • Crauser et al. single-source shortest paths
    • Depth-first search
    • Minimum spanning tree (Boruvka, Dehne & Götz)
    • Connected components
    • Strongly connected components
    • Biconnected components
    • PageRank
    • Graph coloring
    • Fruchterman-Reingold layout
    • Max-flow (Dinic’s)
  • 23. Performance: Sparse graphs
  • 24. Scalability (~547k vertices/node). [Figure: small-world graph, up to 70M vertices and 1B edges.]
  • 25. Performance vs. CGM graph. [Figure: Erdos-Renyi graph, 96k vertices, 10M edges; chart annotations: 17x, 30x.]
  • 26. Parallel BGL Summary
    • The Parallel BGL is built for huge graphs
      • Millions to hundreds of millions of nodes
      • Distributed-memory parallel processing on clusters
      • Future work will permit larger graphs…
    • Parallel programming has a learning curve
      • Parallel graph algorithms much harder to write
      • Distributed graph manipulation can be tricky
    • Parallel BGL is an active research library
  • 27. Distributed Graph Layout
  • 28. Parallel BGL in Python
    • Preliminary support for the Parallel BGL in Python
      • Just import boost.graph.distributed
      • Similar interface to sequential BGL-Python
    • Several options for usage with MPI:
      • Straight MPI: mpirun -np 2 python script.py
      • pyMPI: allows interactive use of the interpreter
    • Initially used to prototype our distributed Fruchterman-Reingold implementation.
  • 29. Porting for Performance
  • 30. Which BGL is Right for You?
    • Is any BGL right for you?
    • Depends on how large your networks are:
      • Up to 1/2 million vertices, any BGL will do
      • C++ BGL can push to a couple million vertices
      • For tens of millions or larger, Parallel BGL only
    • Other considerations:
      • You can prototype in Python, port to C++
      • Algorithm authors might prefer the original BGL
      • Parallelism is very hard to manage
  • 31. Conclusion
    • The Boost Graph Library family is a collection of full-featured graph libraries
      • All are flexible, customizable, efficient
      • Easy to port from Python to C++
      • Can port from sequential to parallel
      • Always growing, improving
    • Is one of the BGLs right for you?
      • A typical “build or buy” decision
  • 32. For More Information…
    • (Original) Boost Graph Library http://www.boost.org/libs/graph/doc
    • Parallel Boost Graph Library http://www.osl.iu.edu/research/pbgl
    • Python Bindings for (Parallel) BGL http://www.osl.iu.edu/~dgregor/bgl-python
    • Contact us!
      • Douglas Gregor <[email_address].iu.edu>
      • Andrew Lumsdaine <[email_address]>
  • 33. Other BGL Variants
    • QuickGraph (C#) http://www.codeproject.com/cs/miscctrl/quickgraph.asp
    • Ruby Graph Library http://rubyforge.org/projects/rgl/
    • Rooster Graph (Scheme) http://savannah.nongnu.org/projects/rgraph/
    • RBGL (an R interface to the C++ BGL) http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
    • Disclaimer: These are all separate projects. We do not maintain them.
  • 34. Comparative Performance