Budget-Match: Cost Effective Subgraph Matching on Large Networks

Transcript

  • 1. Budget-Match: Cost Effective Subgraph Matching on Large Networks. Matthias Bröcheler, Andrea Pugliese & V.S. Subrahmanian. © Adam Perer
  • 2. [Figure: example social network. People (Bob, Mark, John, Peter, Francis, Jennifer, Ashley, Jessie, Alice, Melissa, Emily, Jon) are linked by friend edges; they attend or organize events (e.g., Peter’s Bday party, Homecoming 09, Halloween 2008, Chill-out Night, Spring Break Trip) and like movies (e.g., Star Wars IV, Titanic, The Godfather, Pulp Fiction, Gone with the Wind, Mrs. Doubtfire, Inception, Harry Potter, The Lion King, Toy Story), each of which has a type (Sci-Fi, Drama, Thriller, Comedy, Mystery, Family).]
  • 3. Linked Data - RDF
  • 4. 500 million users; 50M tweets / day
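  • 5. Subgraph Matching Queries [Figure: query graph with variable nodes ?p, ?u, ?f, ?b and constant nodes Francis, Peter, Drama, connected by two attended edges, one friend edge, one organized edge, two likes edges, and one type edge]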
  • 6. Prior Work
    - Systems (storage, index, query answering): Jena, Sesame, RDF-3X, YARS, DOGMA, COSI, Hexastore, column stores, etc.; AllegroGraph, Neo4J, OWLIM, etc.
    - Query optimization: Stocker (WWW’08) and others; similar to RDBMS with schema discovery (selectivity estimation, query plan search, and join ordering)
  • 7. [Figure: the example network from slide 2]
  • 8. Network Characteristics
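  • 9. Network Statistics
    - Most real-world networks have power-law degree distributions.
    - Hence average statistics are not helpful: the mean says little about the long tail of high-degree nodes.
    [Figure: degree distribution with the mean marked and a long tail]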
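To make the point concrete, here is a small Python sketch on a synthetic degree sequence (an assumption for illustration, not data from the talk): the number of nodes with degree k is set to ⌊n/k²⌋, a power law with exponent 2. The mean lands above what most nodes have, while a few hubs sit far out in the tail.

```python
import statistics

def power_law_degrees(n=100_000, max_degree=1_000):
    """Deterministic degree sequence: floor(n / k**2) nodes have degree k
    (a synthetic power law, chosen only for illustration)."""
    degrees = []
    for k in range(1, max_degree + 1):
        degrees.extend([k] * (n // (k * k)))  # adds nothing once k*k > n
    return degrees

degrees = power_law_degrees()
mean = statistics.mean(degrees)
median = statistics.median(degrees)
hub = max(degrees)
# Share of nodes whose degree is below the "average" node.
below_mean = sum(d < mean for d in degrees) / len(degrees)
print(f"mean={mean:.2f} median={median} max={hub} below mean={below_mean:.0%}")
```

The median stays at 1 while the largest hub has degree 316, so a cost estimate built from the mean overestimates the typical node and underestimates the hubs by nearly two orders of magnitude.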
  • 10. Subgraph Matching
    - On networks with power-law degree distributions, subgraph matching algorithms that use static cost models will visit high-degree nodes.
    - Statistics won’t help us avoid those.
    - Existing subgraph matching cost models are static.
  • 11. [Figure: the example network from slide 2, with a question mark added as an unknown node]
  • 12. BudgetMatchBudgetMatch IDEA: Use a dynamic cost model which updates its cost estimates as it learns more about the network -  Assigns an initial cost estimate •  Fixed or based on average statistics -  Processes nodes using its current cost estimate as a budget for processing -  If budget is exceeded, processing is aborted and the cost estimate updated
  • 13. BudgetMatch
    - Depth-first search query answering algorithm: memory efficient and parallelizable.
    - Based on the DOGMA query answering algorithm (ISWC’09).
    - Provably correct.
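The depth-first backtracking behind this kind of query answering can be sketched in a few lines of Python. This is a generic DFS matcher over (subject, predicate, object) triples, not the DOGMA or BudgetMatch code; the triples are hypothetical and only borrow names from the example network. Variables start with `?`, and the only state kept per recursion level is the current partial substitution, which is what makes DFS memory efficient.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")

def bind(term: str, value: str, binding: Binding) -> Optional[Binding]:
    """Extend binding so that term matches value, or return None on conflict."""
    if not is_var(term):
        return binding if term == value else None
    if term in binding:
        return binding if binding[term] == value else None
    return {**binding, term: value}  # copy: shallower levels keep their own state

def dfs_match(query: List[Triple], triples: List[Triple],
              binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every substitution under which all query edges exist in the data."""
    binding = {} if binding is None else binding
    if not query:
        yield binding  # every query edge matched
        return
    (qs, qp, qo), rest = query[0], query[1:]
    for s, p, o in triples:
        if p != qp:
            continue
        extended = bind(qs, s, binding)
        if extended is not None:
            extended = bind(qo, o, extended)
        if extended is not None:
            yield from dfs_match(rest, triples, extended)  # one level deeper

# Hypothetical data and query (illustrative only).
triples = [
    ("Peter", "friend", "Mark"), ("Peter", "friend", "John"),
    ("Mark", "likes", "Titanic"), ("Mark", "likes", "Star Wars IV"),
    ("John", "likes", "Toy Story"), ("Titanic", "type", "Drama"),
    ("Star Wars IV", "type", "Sci-Fi"), ("Toy Story", "type", "Family"),
]
query = [("Peter", "friend", "?f"), ("?f", "likes", "?b"), ("?b", "type", "Drama")]
print(list(dfs_match(query, triples)))
```

This sketch processes the query edges in a fixed order, i.e. a static plan; the idea on slide 12 is instead to choose what to process next dynamically, guided by budgets.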
  • 14. Example Query [Figure: the query graph from slide 5]
  • 15. [Figure: the example network from slide 2]
  • 16. BudgetMatch Example I (initial state): every query node has cost estimate c = 5. Candidate sets: R = {francis} for Francis, R = {peter} for Peter, R = {drama} for Drama, and R = {} for ?p, ?u, ?f, ?b. ANS = {}, θ = {}.
  • 17. BudgetMatch Example II: all cost estimates are still c = 5. ?p: R = {}, R’ = {Peter’s bday party, Homecoming 09, Silvester 2009}. ?f: R = {}, R’ = {Mark, John}. Francis, Peter, and Drama keep R = {francis}, {peter}, {drama}. ANS = {}, θ = {}.
  • 18. BudgetMatch Example III: ?p: c = 5, R = {Peter’s bday party, Silvester 2009}. ?f: c = 5, R = {Mark, John}. Drama: c = 25, R = {drama}, R’ = {drama}. ?u and ?b: c = 5, R = {}. Peter: c = 5, R = {peter}. ANS = {}, θ = {}.
  • 19. BudgetMatch Example IV: ?f is bound, θ = {?f/Mark}, and its R = {}. ?p: c = 5, R = {Peter’s bday party, Silvester 2009}. ?b: c = 25, R = {Titanic, Star Wars IV}, R’ = {Titanic, Star Wars IV}. Drama: c = 25, R = {drama}. ?u: c = 5, R = {}. ANS = {}.
  • 20. BudgetMatch Example V: θ = {?f/Mark, ?p/Peter’s bday party, ?u/Jennifer}. ?u: R = {Francis, Jennifer, Ashley}. ?b: c = 25, R = {Titanic, Star Wars IV}, R’ = {Titanic}. Drama: c = 25, R = {drama}. ?p and ?f: R = {}. ANS = {}.
  • 21. BudgetMatch Example VI: θ = {?f/Mark, ?p/Peter’s bday party, ?u/Jennifer, ?b/Titanic} is a complete match, so ANS = {θ}. ?b and Drama: c = 25; all other query nodes: c = 5. All candidate sets are now empty (R = {}) except Peter’s R = {peter}.
  • 22. Cost Initialization & Update
    - Initialize cost: a constant initial cost, or one based on average degree statistics.
    - Update the cost estimate: multiply it by a constant.
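The two initialization options and the multiplicative update can be written down directly. In this Python sketch the function names, the default constant 5, and the factor 5 are assumptions; the walkthrough's jump from c = 5 to c = 25 is consistent with a factor of 5, but the slides do not state the constant. The hypothetical helper `overruns_until_fits` counts how many aborted attempts a node with a given true cost triggers.

```python
def constant_init(query_nodes, c0=5):
    """Fixed initial estimate for every query node."""
    return {q: c0 for q in query_nodes}

def average_degree_init(query_nodes, degrees):
    """Initial estimate = average degree of the data graph (degrees: node -> degree)."""
    avg = sum(degrees.values()) / len(degrees)
    return {q: avg for q in query_nodes}

def update(estimate, factor=5):
    """Multiplicative update applied after a budget overrun."""
    return estimate * factor

def overruns_until_fits(actual_cost, start, factor=5):
    """Count aborted attempts before the budget first covers actual_cost."""
    if start <= 0 or factor <= 1:
        raise ValueError("need start > 0 and factor > 1, or the loop never ends")
    overruns, budget = 0, start
    while budget < actual_cost:
        budget = update(budget, factor)
        overruns += 1
    return overruns

degrees = {"a": 3, "b": 4, "c": 20, "hub": 120}  # hypothetical data graph
fixed = constant_init(["?p", "?b"])
stats = average_degree_init(["?p", "?b"], degrees)
print(fixed, stats)
print(overruns_until_fits(120, fixed["?b"]), overruns_until_fits(120, stats["?b"]))
```

Because the update is multiplicative, the number of overruns grows only logarithmically in the true cost, roughly ⌈log_factor(actual / start)⌉.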
  • 23. Budget Assignment
  • 24. Experiments
    - Evaluated on a network with 1.12 billion edges: a (partial) crawl of the Delicious social network.
    - Used Neo4J as the storage engine, with custom batch loading and degree lookup.
    - Compared against the DOGMA algorithm.
    - Evaluated on a set of 9 diverse benchmark queries (5-12 edges).
  • 25. Query Times (Warm Cache)
    [Chart: query time in ms (logarithmic scale) for Query1 (5 edges), Query2 (7), Query3 (6), Query4 (9), Query5 (9), Query6 (7), Query7 (12), Query8 (12), and Query9 (11) under nine configurations:]
    1) assignBudget4, λ=100
    2) assignBudget4, λ=500
    3) assignBudget4, λ=2000
    4) assignBudget4, λ=15000
    5) assignBudget4, λ=100, statistics
    6) assignBudget2, λ=500
    7) assignBudget3, λ=500
    8) assignBudget1, λ=500
    9) SN-3: DOGMA+statistics
  • 26. Query Times (Warm Cache) [same chart as slide 25]
  • 27. Query Times (Warm Cache) [same chart as slide 25]
  • 28. Query Times (Cold Cache) [Chart: query time in ms (logarithmic scale, up to 100,000 ms) for the same nine configurations and nine queries as slide 25]
  • 29. Comparison
    Configuration 4 compared against Neo4J subgraph matching (SN-1), DOGMA without statistics (SN-2), and DOGMA with statistics (SN-3):
                    SN-1       SN-2    SN-3
      Cold Cache    12,867 x   12 x    11 x
      Warm Cache    44,794 x   18 x    14 x
  • 30. DOGMA Index [Figure: DOGMA index built over an example graph of legislative data (bills, amendments, sponsors, terms, offices, linked by edges such as sponsor, amendmentTo, hasRole, forOffice, subject, gender); panels labeled "Graph Locality" and "Disk Pages"]
  • 31. COSI Architecture [Figure: graph data is loaded, automatically partitioned, and distributed across storage nodes; a client query (over variables ?X, ?Y, ?Z) is received, dispatched, and forwarded to the storage nodes, which exchange data and answer queries with the complexity hidden from the client; the query answer is returned to the client]
  • 32. COSI Partitioning
    Key Theorem: Suppose vertex retrieval and inter-node communication costs are uniform across storage nodes. Then the partition of the database that minimizes query execution time coincides with the partition that minimizes the edge-cut cost in the complete graph (V, V × V) with weight function w(u,v) = (E(u,v)) + (E(v,u)).
    So minimizing edge cuts in complete graphs is closely related to minimizing query execution time.
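For a tiny graph, the edge-cut side of the theorem can be checked by brute force. In this Python sketch the weights are made-up numbers standing in for w(u,v); each unordered pair is stored once because w(u,v) already combines both directions, and pairs that are not listed have weight 0, so the graph is effectively complete. We also assume two storage nodes holding equal halves, since without a balance constraint putting everything on one node would trivially cut nothing.

```python
from itertools import combinations

def cut_cost(side_a, weights):
    """Total weight of node pairs that the partition separates."""
    return sum(w for (u, v), w in weights.items() if (u in side_a) != (v in side_a))

def min_balanced_cut(nodes, weights):
    """Brute-force the 2-way split into equal halves with the smallest cut cost."""
    nodes = sorted(nodes)
    best_cost, best_side = None, None
    for side in combinations(nodes, len(nodes) // 2):
        side_a = set(side)
        cost = cut_cost(side_a, weights)
        if best_cost is None or cost < best_cost:
            best_cost, best_side = cost, side_a
    return best_cost, best_side

# Hypothetical weights w(u, v); unlisted pairs weigh 0.
weights = {("A", "B"): 5, ("C", "D"): 4, ("A", "C"): 1, ("B", "D"): 1}
cost, side = min_balanced_cut(["A", "B", "C", "D"], weights)
print(cost, side)
```

Exhaustive search is exponential in |V|, so it only shows what is being minimized; a real system would rely on a graph-partitioning heuristic.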
  • 33. Further Information
    - COSI: Cloud Oriented Subgraph Identification in Massive Social Networks. Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian. The 2010 International Conference on Advances in Social Networks Analysis and Mining. Patent pending.
    - DOGMA: A Disk-Oriented Graph Matching Algorithm. Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian. Proceedings of the 8th International Semantic Web Conference. Patent pending.
  • 34. dogma.umiacs.umd.edu
  • 35. Conclusion
    - Dynamic cost models are beneficial for networks with heavy-tailed distributions.
    - Developed the BudgetMatch query answering algorithm, which dynamically updates cost estimates during execution.
    - BudgetMatch yields huge improvements over standard static approaches for some queries.
  • 36. Questions? Comments?