  • Large-Scale Network Analysis with the Boost Graph Libraries
    Douglas Gregor, Open Systems Lab, Indiana University, [email_address]
  • What are the BGLs?
    • A collection of libraries for computation on graphs/networks.
      • Graph data structures
      • Graph algorithms
      • Graph input/output
    • Common design
      • Flexibility/customizability throughout
      • Obsessed with performance
      • Common interfaces throughout the collection
    • All open source, freely available online
  • The BGL Family
    • The Original (sequential) BGL
    • BGL-Python
    • The Parallel BGL
    • Parallel BGL-Python
  • The Original BGL
    • The largest and most mature BGL
      • ~7 years of research and development
      • Many users, contributors outside of the OSL
      • Steadily evolving
    • Written in C++
      • Generic
      • Highly customizable
      • Efficient (both storage and execution)
  • BGL: Graph Data Structures
    • Graphs:
      • adjacency_list: highly configurable, with user-specified containers for vertices and edges
      • adjacency_matrix
      • compressed_sparse_row
    • Adaptors:
      • subgraphs, filtered graphs, reverse graphs
      • LEDA and Stanford GraphBase
    • Or, use your own…
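The compressed_sparse_row entry above refers to a compact, read-optimized layout. As a minimal pure-Python sketch of the general CSR idea (not BGL code; `build_csr` and `out_neighbors` are hypothetical names), assume a directed graph given as (source, target) pairs over vertices 0..n-1. One offsets array of length n+1 and one flat targets array suffice, and the out-neighbors of u are the slice targets[offsets[u]:offsets[u+1]].

```python
def build_csr(num_vertices, edges):
    """Build CSR arrays (offsets, targets) from (source, target) pairs.

    Out-neighbors of vertex u are targets[offsets[u]:offsets[u + 1]].
    """
    # Count the out-degree of each vertex, shifted by one slot.
    offsets = [0] * (num_vertices + 1)
    for u, _ in edges:
        offsets[u + 1] += 1
    # A prefix sum turns degree counts into starting positions.
    for u in range(num_vertices):
        offsets[u + 1] += offsets[u]
    # Scatter each target into its row, using a moving cursor per vertex.
    targets = [0] * len(edges)
    cursor = offsets[:-1]  # slicing makes a copy
    for u, v in edges:
        targets[cursor[u]] = v
        cursor[u] += 1
    return offsets, targets


def out_neighbors(offsets, targets, u):
    """Return the out-neighbors of u as a list slice."""
    return targets[offsets[u]:offsets[u + 1]]


offsets, targets = build_csr(4, [(0, 1), (0, 2), (2, 3), (1, 2)])
print(offsets)                             # [0, 2, 3, 4, 4]
print(out_neighbors(offsets, targets, 0))  # [1, 2]
```

The trade-off this illustrates: CSR uses two flat arrays and no per-edge allocation, which is why it suits large static graphs, but inserting an edge means shifting everything after it.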
  • Original BGL: Algorithms
    • Searches (breadth-first, depth-first, A*)
    • Single-source shortest paths (Dijkstra, Bellman-Ford, DAG)
    • All-pairs shortest paths (Johnson, Floyd-Warshall)
    • Minimum spanning tree (Kruskal, Prim)
    • Components (connected, strongly connected, biconnected)
    • Maximum cardinality matching
    • Max-flow (Edmonds-Karp, push-relabel)
    • Sparse matrix ordering (Cuthill-McKee, King, Sloan, minimum degree)
    • Layout (Kamada-Kawai, Fruchterman-Reingold, Gursoy-Atun)
    • Betweenness centrality
    • PageRank
    • Isomorphism
    • Vertex coloring
    • Transitive closure
    • Dominator tree
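To make the single-source shortest-paths entry concrete, here is a minimal pure-Python sketch of Dijkstra's algorithm using a binary heap (the standard library's heapq) with lazy deletion of stale heap entries. It illustrates the algorithm only and is not the BGL's dijkstra_shortest_paths; the adjacency-dict input format and the function name are assumptions.

```python
import heapq


def dijkstra(graph, source):
    """Return (dist, pred) for a graph with non-negative edge weights.

    graph maps each vertex to a list of (neighbor, weight) pairs.
    """
    dist = {source: 0}
    pred = {source: None}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale entry: u was already settled with a shorter distance
        for v, w in graph.get(u, []):
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd  # relax the edge (u, v)
                pred[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, pred


g = {
    "A": [("B", 4), ("C", 1)],
    "C": [("B", 2), ("D", 5)],
    "B": [("D", 1)],
}
dist, pred = dijkstra(g, "A")
print(dist)  # {'A': 0, 'B': 3, 'C': 1, 'D': 4}
```

Following pred back from "D" gives the path A, C, B, D.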
  • Task: Biconnected Components
    [Figure: Input Graph, Output Graph]
    Articulation points: B G A
  • Define a Graph Type
    • Determine vertex/edge properties: struct Vertex { string name; }; struct Edge { int bicomponent; };
    • Determine the graph type: typedef adjacency_list</*OutEdgeListS=*/ vecS, /*VertexListS=*/ vecS, /*DirectedS=*/ undirectedS, /*VertexProperty=*/ Vertex, /*EdgeProperty=*/ Edge> Graph;
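For intuition about what the typedef above describes, here is a rough pure-Python analogue: vertices live in a list, so a vertex is identified by its index (loosely like VertexListS=vecS), each undirected edge is stored once with an Edge record, and both endpoints refer to it. This is a hypothetical sketch; the class and method names are made up, and it does not reflect how adjacency_list is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Vertex:
    name: str = ""


@dataclass
class Edge:
    bicomponent: int = -1  # -1 means "not yet assigned" (an arbitrary choice)


@dataclass
class UndirectedGraph:
    vertices: list = field(default_factory=list)  # index = vertex identifier
    edges: list = field(default_factory=list)     # entries are (u, v, Edge)
    incident: list = field(default_factory=list)  # per-vertex lists of edge indices

    def add_vertex(self, props):
        """Append a vertex record and return its index."""
        self.vertices.append(props)
        self.incident.append([])
        return len(self.vertices) - 1

    def add_edge(self, u, v, props):
        """Store the edge once; both endpoints reference it by index."""
        self.edges.append((u, v, props))
        idx = len(self.edges) - 1
        self.incident[u].append(idx)
        self.incident[v].append(idx)
        return idx


g = UndirectedGraph()
a = g.add_vertex(Vertex("A"))
b = g.add_vertex(Vertex("B"))
e = g.add_edge(a, b, Edge())
g.edges[e][2].bicomponent = 0  # the Edge record is mutable, like a bundled property
print(g.vertices[b].name, g.edges[e][2])  # B Edge(bicomponent=0)
```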
  • Read in a GraphViz DOT File
    • Build an empty graph: Graph g;
    • Map vertex properties: dynamic_properties dyn; dyn.property("node_id", get(&Vertex::name, g));
    • Read in the GraphViz graph: ifstream in("biconnected_components.dot"); read_graphviz(in, g, dyn);
  • Run Biconnected Components
    • Keep track of the articulation points: vector<Graph::vertex_descriptor> art_points;
    • Compute biconnected components: biconnected_components(g, get(&Edge::bicomponent, g), back_inserter(art_points));
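For readers curious about what a biconnected-components routine computes, here is a minimal pure-Python sketch of the classic Hopcroft-Tarjan depth-first search. It labels every edge of an undirected graph with a bicomponent number and collects the articulation points. It is an independent illustration, not the BGL's implementation. Assumptions: a simple graph (no parallel edges or self-loops), edges keyed as frozensets, and recursion, so it suits only small graphs under Python's default recursion limit. The function name and the example graph are hypothetical.

```python
def hopcroft_tarjan_bicomponents(adj):
    """Return (component, art_points) for an undirected simple graph.

    adj maps each vertex to an iterable of neighbors. component maps
    frozenset({u, v}) to a bicomponent number; art_points is a set.
    """
    disc, low = {}, {}
    component, art_points = {}, set()
    edge_stack = []
    counter = [0, 0]  # [next discovery time, next component number]

    def dfs(u, parent):
        disc[u] = low[u] = counter[0]
        counter[0] += 1
        children = 0
        for v in adj[u]:
            if v not in disc:  # tree edge
                children += 1
                edge_stack.append(frozenset((u, v)))
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] >= disc[u]:
                    # v's subtree cannot reach above u: u closes a bicomponent.
                    if parent is not None or children > 1:
                        art_points.add(u)
                    while True:
                        e = edge_stack.pop()
                        component[e] = counter[1]
                        if e == frozenset((u, v)):
                            break
                    counter[1] += 1
            elif v != parent and disc[v] < disc[u]:
                # Back edge to an ancestor (pushed once, from the deeper end).
                edge_stack.append(frozenset((u, v)))
                low[u] = min(low[u], disc[v])

    for s in adj:
        if s not in disc:
            dfs(s, None)
    return component, art_points


# Hypothetical example: two triangles joined by the bridge C-D.
adj = {
    "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
    "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
}
component, art_points = hopcroft_tarjan_bicomponents(adj)
print(sorted(art_points))            # ['C', 'D']
print(len(set(component.values())))  # 3
```

The bridge C-D forms a bicomponent of its own, which is why both of its endpoints are articulation points.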
  • Output results
    • Attach the bicomponent number to the "label" property of edges: dyn.property("label", get(&Edge::bicomponent, g));
    • Write results to another GraphViz file: ofstream out("bc_out.dot"); write_graphviz(out, g, dyn);
    • Show articulation points: cout << "Articulation points: "; for (std::size_t i = 0; i < art_points.size(); ++i) { cout << g[art_points[i]].name << ' '; }
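The file written in the output step is ordinary GraphViz DOT text in which each edge carries a label attribute. To show roughly what such a file looks like, here is a small hypothetical pure-Python helper that renders an undirected graph with labeled edges. It is not write_graphviz, its exact formatting is an assumption, and it does not escape quotes inside vertex names.

```python
def to_dot(edge_labels):
    """Render {(u, v): label} as an undirected GraphViz DOT string.

    Vertex names are quoted but not escaped, so names containing '"' would break.
    """
    lines = ["graph G {"]
    for (u, v), label in edge_labels.items():
        lines.append(f'  "{u}" -- "{v}" [label="{label}"];')
    lines.append("}")
    return "\n".join(lines)


print(to_dot({("A", "B"): 0, ("B", "C"): 0, ("C", "D"): 1}))
```

Rendering the result with the GraphViz `dot` or `neato` tools then shows each edge annotated with its component number.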
  • Original BGL Summary
    • The original BGL is large, stable, efficient
      • Lots of algorithms, graph types
      • Peer-reviewed code with many users, nightly regression testing, etc.
      • Performance comparable to FORTRAN.
    • Who should use the BGL?
      • Programmers comfortable with C++
      • Users with graph sizes from tens of vertices to millions of vertices
  • BGL-Python
    • Python is ideal for rapid prototyping:
      • It’s a scripting language (no compiler)
      • Dynamically typed means less typing for you
      • Easy to use: you already know Python…
    • BGL-Python provides access to the BGL from within Python
      • Similar interfaces to C++ BGL
      • Easier to learn than C++
      • Great for scripting, GUI applications
      • Built-in help, e.g., help(bgl.dijkstra_shortest_paths)
  • Example: Biconnected Components
      import boost.graph as bgl  # Pull in the BGL bindings
      g = bgl.Graph.read_graphviz("biconnected_components.dot")
      # Compute biconnected components and articulation points
      bicomponent = g.edge_property_map('int')
      art_points = bgl.biconnected_components(g, bicomponent)
      # Save results with bicomponent numbers as edge labels
      g.edge_properties['label'] = bicomponent
      g.write_graphviz("biconnected_components_out.dot")
      print "Articulation points: ",
      node_id = g.vertex_properties['node_id']
      for v in art_points:
          print node_id[v], ' ',
      print ""
  • Wrapping the BGL in Python
    • BGL-Python is not a…
      • “port”
      • reimplementation
    • BGL-Python wraps the C++ BGL
      • Python calls translate to C++ calls
      • C++ can call back into Python
    • Most of the speed of C++
    • Most of the flexibility of Python
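The ability of C++ to call back into Python is what lets user code hook into a running algorithm. The BGL does this with visitors: the algorithm invokes callbacks such as discover_vertex, tree_edge, and finish_vertex at fixed event points. The pure-Python sketch below imitates that shape with a plain breadth-first search. It does not call the BGL or BGL-Python, and the class and function names are hypothetical.

```python
from collections import deque


class BFSVisitor:
    """No-op callbacks, named after a subset of the BGL's BFS visitor events."""

    def discover_vertex(self, u):
        pass

    def tree_edge(self, u, v):
        pass

    def finish_vertex(self, u):
        pass


def breadth_first_search(adj, source, visitor):
    """Plain BFS that invokes visitor callbacks at each event point."""
    seen = {source}
    visitor.discover_vertex(source)
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                visitor.tree_edge(u, v)
                visitor.discover_vertex(v)
                queue.append(v)
        visitor.finish_vertex(u)


class DistanceRecorder(BFSVisitor):
    """User-defined visitor: records hop counts from tree edges."""

    def __init__(self, source):
        self.dist = {source: 0}

    def tree_edge(self, u, v):
        self.dist[v] = self.dist[u] + 1


adj = {0: [1, 2], 1: [3], 2: [3], 3: []}
rec = DistanceRecorder(0)
breadth_first_search(adj, 0, rec)
print(rec.dist)  # {0: 0, 1: 1, 2: 1, 3: 2}
```

The design point is that the traversal stays fixed while behavior is injected through the visitor, which is the part a wrapper has to route back into the scripting language.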
  • Performance: Shortest Paths [figure]
  • BGL-Python Summary
    • BGL-Python is all about tradeoffs:
      • More gradual learning curve
      • Faster time-to-solution
      • Lower performance
    • Our typical approach:
      • Prototype in Python to get your ideas down
      • Port to C++ when performance matters
  • The Parallel BGL
    • A version of the C++ BGL for computational clusters
      • Distributed memory for huge graphs
      • Parallel processing for improved performance
    • An active research project
    • Closely related to the original BGL
      • Parallelizing BGL programs should be “easy”
  • Parallel BGL: Distributed Graphs
    [Figure: a simple, directed graph… distributed across 3 processors]
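One simple way to picture a distributed graph is a block distribution: vertices 0..n-1 are split into contiguous ranges, one per processor, and each processor stores the out-edges of the vertices it owns. The pure-Python sketch below illustrates that idea under stated assumptions. The function names and layout are hypothetical, and this is not necessarily how the Parallel BGL partitions a graph.

```python
def block_owner(vertex, num_vertices, num_procs):
    """Processor that owns `vertex` under a block distribution.

    The first (num_vertices % num_procs) processors get one extra vertex.
    """
    base, extra = divmod(num_vertices, num_procs)
    # Vertices below `cutoff` live in the larger blocks of size base + 1.
    cutoff = extra * (base + 1)
    if vertex < cutoff:
        return vertex // (base + 1)
    return extra + (vertex - cutoff) // base


def distribute_edges(num_vertices, edges, num_procs):
    """Give each processor the out-edges of the vertices it owns."""
    local = [[] for _ in range(num_procs)]
    for u, v in edges:
        local[block_owner(u, num_vertices, num_procs)].append((u, v))
    return local


owners = [block_owner(v, 8, 3) for v in range(8)]
print(owners)  # [0, 0, 0, 1, 1, 1, 2, 2]
local = distribute_edges(8, [(0, 3), (3, 6), (6, 7), (7, 0)], 3)
print(local)   # [[(0, 3)], [(3, 6)], [(6, 7), (7, 0)]]
```

Edges such as (0, 3) cross a processor boundary: processor 0 stores the edge, but processor 1 owns its target, so any algorithm that follows it must send a message.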
  • Parallel Graph Algorithms
    • Breadth-first search
    • Eager Dijkstra’s single-source shortest paths
    • Crauser et al. single-source shortest paths
    • Depth-first search
    • Minimum spanning tree (Boruvka, Dehne & Götz)
    • Connected components
    • Strongly connected components
    • Biconnected components
    • PageRank
    • Graph coloring
    • Fruchterman-Reingold layout
    • Max-flow (Dinic’s)
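Breadth-first search on a distributed graph is often described as level-synchronous. Each processor expands the frontier vertices it owns and sends newly reached vertices to their owners, then all processors synchronize before the next level begins. The sketch below simulates that pattern sequentially in pure Python, with per-processor lists standing in for messages. It illustrates the general technique and is not the Parallel BGL's algorithm. The cyclic owner function (v % num_procs) and all names are assumptions.

```python
def simulated_parallel_bfs(adj, source, num_procs):
    """Level-synchronous BFS, simulating num_procs processors sequentially.

    Each simulated processor records distances only for vertices it owns.
    """
    def owner(v):
        return v % num_procs  # hypothetical cyclic distribution

    local_dist = [{} for _ in range(num_procs)]
    frontier = [[] for _ in range(num_procs)]
    local_dist[owner(source)][source] = 0
    frontier[owner(source)].append(source)
    level = 0
    while any(frontier):
        # Phase 1: each processor expands its own frontier and buffers
        # "visit v" messages addressed to v's owner.
        outbox = [[] for _ in range(num_procs)]
        for p in range(num_procs):
            for u in frontier[p]:
                for v in adj.get(u, []):
                    outbox[owner(v)].append(v)
        # Phase 2 (after a barrier): owners handle their incoming messages.
        level += 1
        frontier = [[] for _ in range(num_procs)]
        for p in range(num_procs):
            for v in outbox[p]:
                if v not in local_dist[p]:
                    local_dist[p][v] = level
                    frontier[p].append(v)
    # Gather per-processor results for inspection.
    merged = {}
    for d in local_dist:
        merged.update(d)
    return merged


adj = {0: [1, 2], 1: [3], 2: [3, 4], 3: [5], 4: [5], 5: []}
result = simulated_parallel_bfs(adj, 0, 2)
print(sorted(result.items()))  # [(0, 0), (1, 1), (2, 1), (3, 2), (4, 2), (5, 3)]
```

Duplicate messages (vertex 3 is reached from both 1 and 2) are filtered by the owner, which is one reason distributed BFS sends more traffic than its sequential counterpart does work.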
  • Performance: Sparse graphs [figure]
  • Scalability (~547k vertices/node) [figure: small-world graph, up to 70M vertices and 1B edges]
  • Performance vs. CGMgraph [figure: Erdős-Rényi graph, 96k vertices, 10M edges; chart annotations 17x and 30x]
  • Parallel BGL Summary
    • The Parallel BGL is built for huge graphs
      • Millions to hundreds of millions of nodes
      • Distributed-memory parallel processing on clusters
      • Future work will permit larger graphs…
    • Parallel programming has a learning curve
      • Parallel graph algorithms are much harder to write
      • Distributed graph manipulation can be tricky
    • Parallel BGL is an active research library
  • Distributed Graph Layout [figure]
  • Parallel BGL in Python
    • Preliminary support for the Parallel BGL in Python
      • Just import boost.graph.distributed
      • Similar interface to sequential BGL-Python
    • Several options for usage with MPI:
      • Straight MPI: mpirun -np 2 python script.py
      • pyMPI: allows interactive use of the interpreter
    • Initially used to prototype our distributed Fruchterman-Reingold implementation.
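The slide mentions prototyping the distributed Fruchterman-Reingold layout in Python first. As a sketch of what such a prototype might start from, here is a plain sequential Fruchterman-Reingold loop in pure Python. It applies pairwise repulsion k^2/d, attraction d^2/k along edges, and a linearly cooling cap on each vertex's step. It is not the Parallel BGL's implementation; the parameter names, initial temperature, cooling schedule, and clamping to the frame are assumptions.

```python
import math
import random


def fruchterman_reingold(nodes, edges, width=1.0, height=1.0,
                         iterations=50, seed=0):
    """Sequential force-directed layout, O(n^2) work per iteration.

    nodes must be a non-empty list. Returns {node: (x, y)} inside the frame.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    k = math.sqrt(width * height / len(nodes))  # ideal edge length
    temp = width / 10  # maximum displacement per step, cooled linearly
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        # Repulsion between every pair of vertices: magnitude k^2 / d.
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along edges: magnitude d^2 / k.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each vertex at most `temp`, then clamp it to the frame.
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / d * step))
        temp -= width / 10 / iterations
    return {v: tuple(p) for v, p in pos.items()}


layout = fruchterman_reingold(["a", "b", "c", "d"],
                              [("a", "b"), ("b", "c"), ("c", "d")])
for v, (x, y) in sorted(layout.items()):
    print(v, round(x, 3), round(y, 3))
```

The all-pairs repulsion loop is the expensive part, and it is also the part that needs positions of vertices owned by other processors once the graph is distributed.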
  • Porting for Performance [figure]
  • Which BGL is Right for You?
    • Is any BGL right for you?
    • Depends on how large your networks are:
      • Up to 1/2 million vertices, any BGL will do
      • C++ BGL can push to a couple million vertices
      • For tens of millions or larger, Parallel BGL only
    • Other considerations:
      • You can prototype in Python, port to C++
      • Algorithm authors might prefer the original BGL
      • Parallelism is very hard to manage
  • Conclusion
    • The Boost Graph Library family is a collection of full-featured graph libraries
      • All are flexible, customizable, efficient
      • Easy to port from Python to C++
      • Can port from sequential to parallel
      • Always growing, improving
    • Is one of the BGLs right for you?
      • A typical “build or buy” decision
  • For More Information…
    • (Original) Boost Graph Library http://www.boost.org/libs/graph/doc
    • Parallel Boost Graph Library http://www.osl.iu.edu/research/pbgl
    • Python Bindings for (Parallel) BGL http://www.osl.iu.edu/~dgregor/bgl-python
    • Contact us!
      • Douglas Gregor <[email_address].iu.edu>
      • Andrew Lumsdaine <[email_address]>
  • Other BGL Variants
    • QuickGraph (C#) http://www.codeproject.com/cs/miscctrl/quickgraph.asp
    • Ruby Graph Library http://rubyforge.org/projects/rgl/
    • Rooster Graph (Scheme) http://savannah.nongnu.org/projects/rgraph/
    • RBGL (an R interface to the C++ BGL) http://www.bioconductor.org/packages/bioc/1.8/html/RBGL.html
    • Disclaimer: These are all separate projects. We do not maintain them.
  • Comparative Performance [figure]