Upcoming SlideShare
×

# Sparse Data Structures for Weighted Bipartite Matching

1,311 views

Published on

A stab at optimizing the inner loop for auctiion-based, sparse bipartite matching.

Published in: Technology, Education
0 Likes
Statistics
Notes
• Full Name
Comment goes here.

Are you sure you want to Yes No
• Be the first to comment

• Be the first to like this

Views
Total views
1,311
On SlideShare
0
From Embeds
0
Number of Embeds
8
Actions
Shares
0
13
0
Likes
0
Embeds 0
No embeds

No notes for slide

### Sparse Data Structures for Weighted Bipartite Matching

1. 1. Sparse Data Structures for Weighted Bipartite Matching E. Jason Riedy Dr. James Demmel (. . . and thanks to the BeBOP group) SIAM Workshop on Combinatorial Scientiﬁc Computing 2004
2. 2. Use Sparse Matrix Optimizations... Take a ﬁxed, simple algorithm: Auction alg. for matchings Repeated iterations over a sparse graph. What’s expensive, and is there anything we can do about it? Take an idea from optimizing sparse matrix-vector products. A little speed-up in some cases, but there are more ideas available...
3. 3. Where’s the Time Going? Auction algorithm: Iterative, greedy algorithm bipartite matching: 1. An unmatched row i ﬁnds a “most proﬁtable” column j π(i) = maxj b(i, j) − p(i) 2. Row i places a bid for column j. Bid price raised until j is no longer the best choice. (Min. increment µ) 3. Highest bid gets the matching (i, j).
4. 4. Time linear in entries examined... Number of entries examined is problem-dependent. 4.5 4 3.5 3 time (s) 2.5 2 1.5 1 0.5 0 0 5e+06 1e+07 1.5e+07 2e+07 2.5e+07 3e+07 number of entries examined
5. 5. Expensive Inner Loop! 1.3 GHz Itanium 2 10 8 6 4 2 0 100 150 200 250 300 350 400 450 cycles per entry
6. 6. Verifying... Using kcachegrind (N.Nethercote and J.Weidendorfer) and valgrind (J. Seward).
7. 7. And Locating... No obvious culprits in the instructions...
8. 8. And Locating... But considering cache eﬀects!
9. 9. Auction’s Inner Loop Price Index Entry value = entry - price save largest...
10. 10. Auction’s Inner Loop Same accesses as sparse matrix-vector multiplication! Price Index Entry y += a(i,j) * x(j)
11. 11. Performance Through Blocking? (Images swiped from Berkeley’s BeBOP group.)
12. 12. Performance Through Blocking? (Images swiped from Berkeley’s BeBOP group.)
13. 13. Performance Through Blocking? More entries, but 1.5× performance on Pentium 3! (Images swiped from Berkeley’s BeBOP group.)
14. 14. Blocking Speeds Some Matches Finite element matrix from Vavasis (in UF collection):
15. 15. Blocking Speeds Some Matches
16. 16. Blocking Speeds Some Matches
17. 17. Observations A blocked graph data structure may provide additional performance if: you iterate over whole rows, the graph / matrix has runs of columns, and you’re willing to use an automated tuning system. Maximizing the runs: linear arrangement. Hard, but there may be cheap heuristics. Only worth-while if you’re performing many iterations. (For mat-vec, often > 50 computations of Ax.)