# Graph libraries in Matlab: MatlabBGL and gaimc

Slides from my talk at the Mathworks summit at Stanford

1. A tale of two Matlab libraries for graph algorithms: MatlabBGL and gaimc. David F. Gleich, Purdue University
2. The Setting: recursive spectral graph partitioning
3. The Setting: recursive spectral graph partitioning. To store an m×n sparse matrix M, Matlab uses compressed column format [Gilbert et al.]. Matlab never stores a 0 value in a sparse matrix; it always "re-compresses" the data structure in these cases. If M is the adjacency matrix of a graph, then storing the matrix by columns corresponds to storing the graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in a figure. Most graph algorithms are designed to work with out-edge lists instead of in-edge lists. Before running an algo…
   - Example matrix: `A = [0 16 13 0 0 0; 0 0 10 12 0 0; 0 4 0 0 14 0; 0 0 9 0 0 20; 0 0 0 7 0 4; 0 0 0 0 0 0]`
   - Compressed sparse row: `rp = [1 3 5 7 9 11 11]`, `ci = [2 3 3 4 2 5 3 6 4 6]`, `ai = [16 13 10 12 4 14 9 20 7 4]`
   - Compressed sparse column: `cp = [1 1 3 6 8 9 11]`, `ri = [1 3 1 2 4 2 5 3 4 5]`, `ai = [16 4 13 10 9 12 7 14 20 4]`
4. The Setting: recursive spectral graph partitioning. `A = load_adjacency_matrix; L = diag(sum(A,2)) - A; [V,D] = eigs(L,2,'SA'); f = V(:,2); A1 = A(f<=0,f<=0); A2 = A(f>0,f>0);`
5. The Setting: recursive spectral graph partitioning. `A = load_adjacency_matrix; L = diag(sum(A,2)) - A; [V,D] = eigs(L,2,'SA'); f = V(:,2); A1 = A(f<=0,f<=0); A2 = A(f>0,f>0);` (Warning: we can do much better than this split!)
6. The Problem: disconnected components
7. The Problem: disconnected components. `C = components(A);` gives `??? Undefined function or method 'components' for input arguments of type 'double'.`
8. The Problem: disconnected components. `C = components(A);` gives `??? Undefined function or method 'components' for input arguments of type 'double'.` (Warning: strictly speaking, this isn't a problem. However, it's inefficient to solve larger eigenproblems than required.)
9. The Rescue: disconnected components. The MESHPART toolkit by John Gilbert and Shang-Hua Teng provides `C = components(A);`, which uses Matlab's dmperm function.
10. The Failed Rescue: disconnected components. `C = components(A);` caused Matlab to crash randomly. I wanted a fast max-flow routine too.
11. Matlab and the Boost graph library: MatlabBGL
12. The Recoup: working recursive spectral partitioning code using the Boost graph library in C++, including a max-flow heuristic extension. The Boost graph library has a components function and many other graph algorithms. Boost has a "generic" graph data type.
13. The Idea: add graph algorithms to Matlab naturally using the Boost graph library
14. The Plan: graph data type = Matlab sparse matrix; results = "natural" Matlab types
15. The Plan: `A = load_adjacency_matrix; d = bfs(A,1); d = dijkstra(A,size(A,1)); T = mst(A); c = components(A); F = maxflow(A,s,t); test_dag(A); [flag,K] = test_planarity(A);`
16. The Plan: suitable for large problems (on the order of 10 million edges, circa 2006); avoid copying data
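17. The Catch: Boost graph operations and their Matlab sparse-matrix (compressed sparse column) equivalents

    | Boost graph type | Matlab sparse type (compressed sparse column) |
    |---|---|
    | `vertices(G)` | `1:n` |
    | `edges(G)` | `[i,j,w] = find(A);` |
    | `num_vertices(G)` | `size(A,1)` |
    | `out_edges(G,v)` | `[~,j,w] = find(A(v,:))` |
    | `adjacent(G,v)` | `[~,j] = find(A(v,:))` |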
18. (Thesis excerpt) … graph as an in-edge list. We briefly illustrate compressed row and column storage schemes in a figure. Most graph algorithms are designed to work with out-edge lists instead of in-edge lists.
    - Compressed sparse row: `rp = [1 3 5 7 9 11 11]`, `ci = [2 3 3 4 2 5 3 6 4 6]`, `ai = [16 13 10 12 4 14 9 20 7 4]`
    - Compressed sparse column: `cp = [1 1 3 6 8 9 11]`, `ri = [1 3 1 2 4 2 5 3 4 5]`, `ai = [16 4 13 10 9 12 7 14 20 4]`
19. The Compromise: make a transpose when it's required, but let "smart" users bypass it
20. The Details (thesis excerpt). … BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example. Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions.
    - Figure: MatlabBGL architecture, with layers labeled Matlab (M code), mex code, libmbgl (extern C code), and Boost (C++ code); other labels include Sparse Matrix, CSR Graph, dfs, bfs, mst, and primmst.
21. MatlabBGL
    - Version 1.0 released April 2006 on the Matlab File Exchange
    - July '06: v2.0 added visitors
    - April '07: v2.1, 64-bit Matlab
    - April '08: v3.0, performance improvement
    - Oct '08: v4.0, planarity testing, layout, structural zeros
    - Jan '12: v5.0 update forthcoming?
22. Impact: downloaded over 20,000 times; used in over 10 publications by others, including a PNAS article on brain topology; identified numerous bugs in the Boost graph library
23. Impact
24. Network Partitioning … and now for a demo …
25. The Devil of the Details (thesis excerpt). … BGL is largely irrelevant to MatlabBGL. There is no need for the copy_graph function from Boost, for example. Next, the figure shows the high-level architecture of MatlabBGL. There are four main components: m-files, mex-files, libmbgl, and BGL functions. Let's illustrate a typical call to a MatlabBGL function: dfs for a depth-first search through the graph.
    - Figure: the MatlabBGL architecture (Matlab M code, mex code, libmbgl extern C code, Boost C++ code), annotated "Compile mex files on OSX/Linux/Win in 32-bit and 64-bit mode" and "Compile libmbgl on OSX/Linux/Win in 32-bit and 64-bit mode".
26. The Devil of the Details: hard to keep up with changes in Matlab; hard for users to compile themselves (changes in Boost and changes in Matlab); hard to play around with new algorithms; Mathworks graph library in the Bioinformatics Toolbox
27. gaimc: graph algorithms in Matlab code
28. A vision (Note: R2007b on 64-bit Linux)
    - `function n=my1norm(x), n = 0; for i=1:numel(x), n=n+abs(x(i)); end`
    - `x = randn(1e7,1); tic, n1=my1norm(x); toc` gives "Elapsed time is 0.16 seconds"
    - `tic, n1 = norm(x,1); toc;` gives "Elapsed time is 0.32 seconds"
30. A vision (Note: R2011a on 64-bit OS X)
    - `function n=my1norm(x), n = 0; for i=1:numel(x), n=n+abs(x(i)); end`
    - `x = randn(1e7,1); tic, n1=my1norm(x); toc` gives "Elapsed time is 0.15 seconds"
    - `tic, n1 = norm(x,1); toc;` gives "Elapsed time is 0.1 seconds"
31. Quite impressed: get within spitting distance of vectorized performance using Matlab for loops; even faster than some things in Python
32. Another idea: implement graph algorithms in pure Matlab code; should only be "somewhat" slower; much more portable
33. More problems: function calls make things REALLY slow (unless the function is built-in, e.g. abs); mst and dijkstra need a heap. A heap in Matlab?
34. Problem specifics (Note: R2011a on 64-bit OS X)
    - `function n=my1normfunc(x), n = 0; for i=1:numel(x), n=n+myabs(x(i)); end`
    - `function a=myabs(a), if a<0, a=-a; end`
    - `tic, n1=my1norm(x); toc` gives "Elapsed time is 0.15 seconds"
    - `tic, n1=my1normfunc(x); toc` gives "Elapsed time is 3.16 seconds"
35. A heap in Matlab code (thesis excerpt, partly lost in extraction). … implementation of a heap. The implementation is inspired by Kahaner [Old reference: D. K. Kahaner, Algorithm 561: Fortran implementation of heap programs, ACM TOMS 1980]. More generally speaking, algorithms written in Fortran are excellent candidates for the Matlab just-in-time compiler. A heap is a binary tree where smaller elements are … It supports the following operations: adding an element to the heap; removing the element with the smallest value from the heap; and changing the value of an element in the heap. … arrays (or vectors), and a common way to store a … associate … the left child … the tree node of index j with index 2j + 1. See Figure 6.3 (Binary trees as arrays) for an example. The Matlab heap will consist of four arrays and one …; … stores the identifiers of the items in the heap, … the element in tree node i, and T(1) is the id of the root of the heap tree; … stores ids of elements in D so that D(T(i)) is …
36. Graph access, take 1: simple, efficient neighbor access: `At = A'; [v,~,w] = find(At(:,u));`
37. Graph access, take 2: complicated neighbor access: `[i,j,w] = find(A); [ai,aj,a] = indexed2csr(i,j,w,size(A,1)); v = aj(ai(u):ai(u+1)-1);`
38. Graph access
    - bfs, take 1: `At=A'; for w=find(At(:,v))'`, then `tic, d=bfs(A,1), toc` gives "Elapsed time is 0.05 seconds"
    - bfs, take 2: `indexed2csr(A); for ci=rp(v):rp(v+1)-1 …`, then `tic, d=bfs(A,1), toc` gives "Elapsed time is 0.007 seconds"
40. gaimc: convert input to CSR arrays, then run graph algorithms on the CSR arrays: bfs, clustering coefficients, core numbers, cosine knn, dfs, dijkstra, floyd warshall, mst, strong components, and bipartite_matching (thanks to Ying Wang)
41. The pudding
    - `function s=mysumsq(x), s = 0; for i=1:numel(x), s = s + x(i)^2; end`
    - `x = randn(1e7,1); tic, s1 = mysumsq(x); toc; tic, s2 = x'*x; toc`
    - Figure 6.4: Performance of the gaimc library. Bar chart of the slowdown (Standard vs. Fast) for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs. Caption excerpt: "An experimental comparison of the perform…"; text excerpt: "…instances of a random symmetric graph with average degree … and …0, and 10000 vertices. The aggregated results of all these tests are shown in the figure."
42. The pudding changes
    - `function s=mysumsq(x), s = 0; for i=1:numel(x), s = s + x(i)^2; end`
    - `x = randn(1e7,1); tic, s1 = mysumsq(x); toc; tic, s2 = x'*x; toc`
    - Figure 6.4: Performance of the gaimc library. Two bar charts of the slowdown (Standard vs. Fast) for dfs, scomponents, dijkstra, dirclustercoeffs, mst_prim, and clustercoeffs. Caption excerpt: "An experimental comparison of the perform…"; text excerpt: "…instances of a random symmetric graph with average degree … and …0, and 10000 vertices. The aggregated results of all these tests are shown in the figure."
43. Afterword: "putting the graph into Matlab". Matlab could just as easily have been called "Graphlab" with a few extra functions. It's a great environment to play with graphs as matrices.
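
Slides 3, 17, and 18 contrast Matlab's compressed sparse column storage with the compressed sparse row (out-edge) layout that most graph algorithms expect. The minimal sketch below builds both layouts from coordinate triplets for the 6-vertex example. It is written in Python so it runs without Matlab, and it is not MatlabBGL or gaimc code. Indices are 0-based, and the function name `triplets_to_csr` is hypothetical.

```python
def triplets_to_csr(n, rows, cols, vals):
    """Return compressed sparse row arrays (rp, ci, ai) for an n-row matrix.

    rows, cols and vals are 0-based coordinate triplets, similar to the
    output of Matlab's [i,j,w] = find(A) shifted down by one. A stable
    counting sort by row keeps each row's entries in their input order.
    """
    rp = [0] * (n + 1)
    for r in rows:
        rp[r + 1] += 1              # count the entries in each row
    for r in range(n):
        rp[r + 1] += rp[r]          # prefix sums: row r is rp[r]:rp[r+1]
    nxt = rp[:n]                    # next free slot per row (slice is a copy)
    ci = [0] * len(vals)
    ai = [0] * len(vals)
    for r, c, v in zip(rows, cols, vals):
        ci[nxt[r]] = c
        ai[nxt[r]] = v
        nxt[r] += 1
    return rp, ci, ai


# The 6-vertex example from slides 3 and 18, 0-based: (tail, head, weight).
edges = [(0, 1, 16), (0, 2, 13), (1, 2, 10), (1, 3, 12), (2, 1, 4),
         (2, 4, 14), (3, 2, 9), (3, 5, 20), (4, 3, 7), (4, 5, 4)]
rows = [u for u, _, _ in edges]
cols = [v for _, v, _ in edges]
vals = [w for _, _, w in edges]

rp, ci, ai = triplets_to_csr(6, rows, cols, vals)   # CSR: out-edge lists
cp, ri, ac = triplets_to_csr(6, cols, rows, vals)   # CSC of A = CSR of A'

# Out-neighbors of vertex 1 are one contiguous slice of the CSR arrays ...
print(ci[rp[1]:rp[2]])                                      # [2, 3]
# ... but with CSC storage every column has to be searched for row 1.
print([j for j in range(6) if 1 in ri[cp[j]:cp[j + 1]]])   # [2, 3]
```

Adding 1 to every entry of `rp`, `ci`, `cp`, and `ri` gives the 1-based arrays shown in slide 18. Building the CSC arrays as the CSR arrays of the transpose is the "make a transpose when it's required" step of slide 19.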
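
Slides 4 and 5 split a graph by the sign of the second eigenvector of the Laplacian L = D - A. To keep the example dependency-free, this Python sketch replaces `eigs` with plain power iteration on cI - L and projects out the constant vector at every step. It assumes a connected, undirected, unweighted graph. The function name `fiedler_split` and the iteration count are hypothetical choices.

```python
import math


def fiedler_split(n, edges, iters=500):
    """Split vertices 0..n-1 by the sign of an approximate Fiedler vector.

    Power iteration on M = c*I - L, where L = D - A is the graph Laplacian
    and c bounds its largest eigenvalue. The all-ones vector (eigenvalue 0
    of L) is projected out every step, so the iteration converges to the
    eigenvector of the second-smallest eigenvalue of L.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    c = 2.0 * max(len(nbrs) for nbrs in adj)   # c >= lambda_max(L)
    x = [float(i) for i in range(n)]            # arbitrary non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]             # project out the ones vector
        y = [c * x[i] - (len(adj[i]) * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]                 # y = (c*I - L) x
        norm = math.sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
    part1 = [i for i in range(n) if x[i] <= 0]   # like A(f<=0, f<=0)
    part2 = [i for i in range(n) if x[i] > 0]    # like A(f>0, f>0)
    return part1, part2


# Two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
print(fiedler_split(6, edges))    # ([0, 1, 2], [3, 4, 5])
```

The number of zero eigenvalues of L equals the number of connected components. That is why slides 6 to 8 care about disconnected graphs: on such a graph the "second" eigenvector already has eigenvalue 0 and merely separates components.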
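
Slides 7 to 10 need `C = components(A)`. MESHPART computes it with `dmperm`, and MatlabBGL delegates to Boost. The Python sketch below swaps in a different technique: union-find over an undirected edge list. The function name and the 1-based labels, chosen to resemble a Matlab-style result, are assumptions.

```python
def components(n, edges):
    """Connected-component labels for an undirected graph on vertices 0..n-1.

    Union-find with path halving; labels are numbered 1, 2, ... in order of
    each component's smallest vertex.
    """
    parent = list(range(n))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]    # path halving
            u = parent[u]
        return u

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv                  # merge the two trees

    label_of_root = {}
    labels = []
    for u in range(n):
        r = find(u)
        if r not in label_of_root:
            label_of_root[r] = len(label_of_root) + 1
        labels.append(label_of_root[r])
    return labels


print(components(7, [(0, 1), (1, 2), (3, 4), (5, 5)]))   # [1, 1, 1, 2, 2, 3, 4]
```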
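
Slides 33 and 35 call for a heap stored in flat arrays that supports adding an element, removing the smallest one, and changing a value. This Python sketch keeps the excerpt's names `T` (tree node to item id) and `D` (item values). It adds an assumed inverse array `L` and a size counter `n`. The excerpt says the Matlab version uses four arrays, and this sketch does not reproduce that exact layout. With 0-based indices the children of node j are 2j + 1 and 2j + 2. The class name `IndexedMinHeap` is hypothetical.

```python
class IndexedMinHeap:
    """Binary min-heap over item ids 0..capacity-1 stored in flat arrays.

    T[i]  : id of the item in tree node i (T[0] is the root, the minimum)
    L[id] : position of item id in T, or -1 if the item is not in the heap
    D[id] : current value of item id
    """

    def __init__(self, capacity):
        self.T = [0] * capacity
        self.L = [-1] * capacity
        self.D = [0.0] * capacity
        self.n = 0

    def _swap(self, i, j):
        T, L = self.T, self.L
        T[i], T[j] = T[j], T[i]
        L[T[i]], L[T[j]] = i, j             # keep the inverse map in sync

    def _sift_up(self, i):
        while i > 0:
            p = (i - 1) // 2
            if self.D[self.T[p]] <= self.D[self.T[i]]:
                break
            self._swap(i, p)
            i = p

    def _sift_down(self, i):
        while True:
            c = 2 * i + 1
            if c >= self.n:
                break
            if c + 1 < self.n and self.D[self.T[c + 1]] < self.D[self.T[c]]:
                c += 1                      # pick the smaller child
            if self.D[self.T[i]] <= self.D[self.T[c]]:
                break
            self._swap(i, c)
            i = c

    def insert(self, item, value):
        self.D[item] = value
        self.T[self.n] = item
        self.L[item] = self.n
        self.n += 1
        self._sift_up(self.n - 1)

    def pop_min(self):
        item = self.T[0]
        self.n -= 1
        self._swap(0, self.n)               # move the last leaf to the root
        self.L[item] = -1
        self._sift_down(0)
        return item, self.D[item]

    def update(self, item, value):
        """Change the value of an item that is currently in the heap."""
        old = self.D[item]
        self.D[item] = value
        if value < old:
            self._sift_up(self.L[item])
        else:
            self._sift_down(self.L[item])


h = IndexedMinHeap(5)
for item, value in enumerate([5.0, 8.0, 7.0, 1.0, 9.0]):
    h.insert(item, value)
h.update(4, 0.5)    # decrease a value: item 4 moves up to the root
h.update(3, 6.0)    # increase a value: item 3 moves down
print([h.pop_min() for _ in range(5)])
# [(4, 0.5), (0, 5.0), (3, 6.0), (2, 7.0), (1, 8.0)]
```

The inverse array `L` is what makes `update` possible. Dijkstra's algorithm and Prim's MST need to find an item's tree node in constant time before sifting it.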
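
Slides 37 to 39 show that scanning a vertex's out-edges directly in CSR arrays made `bfs` much faster than calling `find` on a transposed matrix. The following Python sketch shows that inner loop. It assumes 0-based CSR arrays `rp` and `ci` for the slide 18 example. The name `bfs_csr` is hypothetical, and giving unreached vertices distance -1 is an assumption.

```python
from collections import deque


def bfs_csr(rp, ci, source):
    """Breadth-first search distances on a graph stored as CSR arrays.

    rp and ci are 0-based row pointers and column indices; the out-neighbors
    of u are ci[rp[u]:rp[u+1]] (1-based Matlab: ci(rp(u):rp(u+1)-1)).
    Unreached vertices get distance -1.
    """
    n = len(rp) - 1
    dist = [-1] * n
    dist[source] = 0
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for k in range(rp[u], rp[u + 1]):   # scan u's out-edges in place
            v = ci[k]
            if dist[v] < 0:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# 0-based CSR arrays of the slide 18 example.
rp = [0, 2, 4, 6, 8, 10, 10]
ci = [1, 2, 2, 3, 1, 4, 2, 5, 3, 5]
print(bfs_csr(rp, ci, 0))   # [0, 1, 1, 2, 2, 3]
```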
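
Slide 33 notes that `dijkstra` needs a heap. The Python sketch below runs Dijkstra's algorithm on the same CSR arrays and uses the `ai` values as edge lengths. It substitutes Python's built-in `heapq` with lazy deletion for a heap that can update values in place: stale entries are simply skipped when popped. The name `dijkstra_csr` is hypothetical, and weights are assumed nonnegative.

```python
import heapq
import math


def dijkstra_csr(rp, ci, ai, source):
    """Shortest-path distances from source on a CSR graph with nonnegative
    edge weights ai; unreachable vertices keep distance math.inf."""
    n = len(rp) - 1
    dist = [math.inf] * n
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                      # stale entry: u was already improved
        for k in range(rp[u], rp[u + 1]):
            v, nd = ci[k], d + ai[k]
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


# 0-based CSR arrays of the slide 18 example.
rp = [0, 2, 4, 6, 8, 10, 10]
ci = [1, 2, 2, 3, 1, 4, 2, 5, 3, 5]
ai = [16, 13, 10, 12, 4, 14, 9, 20, 7, 4]
print(dijkstra_csr(rp, ci, ai, 0))   # [0, 16, 13, 28, 27, 31]
```

Lazy deletion trades extra heap entries for not needing an update operation. The array-based heap in slide 35 avoids that extra memory.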
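
Slides 10, 12, and 15 mention a max-flow routine (`F = maxflow(A,s,t)`). This Python sketch computes only the flow value. It uses Edmonds-Karp (shortest augmenting paths found by breadth-first search) on a dense capacity matrix built from the slide 3 example. It is not MatlabBGL's algorithm, `max_flow` is a hypothetical name, and it assumes s differs from t.

```python
import math
from collections import deque


def max_flow(cap, s, t):
    """Maximum s-t flow value via Edmonds-Karp (BFS augmenting paths).

    cap is a dense n-by-n list of lists of nonnegative capacities, like
    full(A) for a weighted adjacency matrix. Assumes s != t.
    """
    n = len(cap)
    res = [row[:] for row in cap]          # residual capacities
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] < 0:
            u = queue.popleft()
            for v in range(n):
                if parent[v] < 0 and res[u][v] > 0:
                    parent[v] = u
                    queue.append(v)
        if parent[t] < 0:
            return flow                    # no augmenting path remains
        aug, v = math.inf, t               # bottleneck along the BFS path
        while v != s:
            aug = min(aug, res[parent[v]][v])
            v = parent[v]
        v = t
        while v != s:                      # push flow, update residuals
            u = parent[v]
            res[u][v] -= aug
            res[v][u] += aug
            v = u
        flow += aug


# The slide 3 example matrix (0-based), read as edge capacities.
cap = [
    [0, 16, 13, 0, 0, 0],
    [0, 0, 10, 12, 0, 0],
    [0, 4, 0, 0, 14, 0],
    [0, 0, 9, 0, 0, 20],
    [0, 0, 0, 7, 0, 4],
    [0, 0, 0, 0, 0, 0],
]
print(max_flow(cap, 0, 5))   # 23
```

The value 23 matches the capacity of the cut that separates vertices {3, 5} from the rest (12 + 7 + 4).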
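
Slide 40 lists clustering coefficients among the gaimc routines. This Python sketch computes the local clustering coefficient of each vertex from symmetric 0-based CSR arrays. A marker array lets it test neighbor-of-neighbor membership in constant time. The name `clustering_coeffs` is hypothetical. Self-loops are ignored, and vertices of degree below 2 get 0 by assumption.

```python
def clustering_coeffs(rp, ci):
    """Local clustering coefficient of every vertex of a simple undirected
    graph stored as symmetric 0-based CSR arrays (rp, ci)."""
    n = len(rp) - 1
    mark = [False] * n
    cc = [0.0] * n
    for v in range(n):
        nbrs = [w for w in ci[rp[v]:rp[v + 1]] if w != v]
        for w in nbrs:
            mark[w] = True              # flag v's neighbors
        links = 0
        for w in nbrs:                  # count edges among v's neighbors
            for k in range(rp[w], rp[w + 1]):
                x = ci[k]
                if x != w and mark[x]:
                    links += 1          # each such edge is seen from both ends
        for w in nbrs:
            mark[w] = False             # reset the markers for the next vertex
        d = len(nbrs)
        if d > 1:
            cc[v] = links / (d * (d - 1))   # = triangles / (d choose 2)
    return cc


# A triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2.
rp = [0, 2, 4, 7, 8]
ci = [1, 2, 0, 2, 0, 1, 3, 2]
print(clustering_coeffs(rp, ci))   # [1.0, 1.0, 0.3333333333333333, 0.0]
```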