
Budget-Match: Cost Effective Subgraph Matching on Large Networks


Transcript of "Budget-Match: Cost Effective Subgraph Matching on Large Networks"

  1. Budget-Match: Cost Effective Subgraph Matching on Large Networks. Matthias Bröcheler, Andrea Pugliese & V.S. Subrahmanian. © Adam Perer
  2. [Figure: example social network linking people (e.g. Peter, Francis, Jennifer, Mark, John), events (e.g. Peter's Bday party, Homecoming 09), movies (e.g. Titanic, Star Wars IV, The Godfather), and genres (e.g. Drama, Comedy, Thriller) via likes, friend, attended, organized, and type edges]
  3. Linked Data: RDF
  4. 500 million users; 50M tweets / day
  5. Subgraph Matching Queries
     [Figure: example query graph with variable vertices ?p, ?u, ?f, ?b and constant vertices Francis, Peter, and Drama, connected by attended, friend, organized, likes, and type edges]
  6. Prior Work
     Systems (storage, index, query answering):
     - Jena, Sesame, RDF-3X, YARS, DOGMA, COSI, Hexastore, column stores, etc.
     - AllegroGraph, Neo4J, OWLIM, etc.
     Query optimization:
     - Stocker (WWW'08) and others
     - Similar to RDBMS, with schema discovery
       • Selectivity estimation, query plan search, and join ordering
  7. [Figure: the example social network from slide 2, repeated]
  8. Network Characteristics
  9. Network Statistics
     Most real-world networks have power-law degree distributions.
     - Hence average statistics are not helpful.
     [Figure: degree distribution with the mean marked and a long tail]
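To see why an average says little about a heavy-tailed network, here is a small Python illustration. The degree sequence is hypothetical and hand-picked to have a long tail; it is not taken from the Delicious data or any other dataset in these slides.

```python
import statistics

# Hypothetical, hand-picked degree sequence with a long tail:
# many low-degree vertices and a handful of hubs (illustrative numbers only).
degrees = [1] * 500 + [2] * 250 + [4] * 125 + [8] * 60 + [100] * 10 + [10_000] * 5

mean_degree = statistics.mean(degrees)
median_degree = statistics.median(degrees)
max_degree = max(degrees)

# Fraction of all edge endpoints that sit on the five hubs.
hub_share = sum(d for d in degrees if d >= 10_000) / sum(degrees)

print(f"vertices={len(degrees)} mean={mean_degree:.1f} median={median_degree} max={max_degree}")
print(f"hub share of edge endpoints: {hub_share:.1%}")
```

Here the mean degree (about 55.8) describes almost no vertex: the median is 1, and the five hubs account for roughly 94% of all edge endpoints, so a cost estimate built from the mean is far too high for most vertices and far too low for the hubs.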
  10. Subgraph Matching
      On networks with power-law degree distributions, subgraph matching algorithms that use static cost models will visit high-degree nodes.
      - Statistics won't help us avoid those.
      - Existing subgraph matching cost models are static.
  11. [Figure: the example social network from slide 2 again, annotated with a question mark]
  12. 12. BudgetMatchBudgetMatch IDEA: Use a dynamic cost model which updates its cost estimates as it learns more about the network -  Assigns an initial cost estimate •  Fixed or based on average statistics -  Processes nodes using its current cost estimate as a budget for processing -  If budget is exceeded, processing is aborted and the cost estimate updated
  13. BudgetMatch
      Depth-first search query answering algorithm
      - Memory efficient
      - Parallelizable
      Based on the DOGMA query answering algorithm
      - ISWC'09
      Provably correct
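For readers unfamiliar with depth-first query answering, here is a generic backtracking subgraph matcher in Python. It is a plain textbook technique, not DOGMA or BudgetMatch (it has no index and no cost model, and it scans every triple at each step), and the data triples and query below are hypothetical, loosely modeled on names from the example network. It keeps only the current partial binding θ on the recursion stack, which is what makes depth-first evaluation memory efficient.

```python
from typing import Dict, Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(query: List[Triple], data: Set[Triple],
          theta: Optional[Binding] = None) -> Iterator[Binding]:
    """Depth-first backtracking: bind one query edge at a time, yield each complete binding."""
    theta = {} if theta is None else theta
    if not query:
        yield dict(theta)
        return
    (s, p, o), rest = query[0], query[1:]
    for ds, dp, do in sorted(data):  # sorted only to make the output order deterministic
        if dp != p:
            continue
        extended = dict(theta)
        consistent = True
        for term, value in ((s, ds), (o, do)):
            if is_var(term):
                # setdefault binds a fresh variable, or returns its existing value to compare
                if extended.setdefault(term, value) != value:
                    consistent = False
            elif term != value:
                consistent = False
        if consistent:
            yield from match(rest, data, extended)  # go deeper; backtrack on return


# Hypothetical toy data and query (illustrative, not the exact graphs on the slides).
data: Set[Triple] = {
    ("Francis", "attended", "Peter's bday party"),
    ("Francis", "attended", "Silvester 2009"),
    ("Jennifer", "organized", "Peter's bday party"),
    ("Peter", "friend", "Mark"),
    ("Peter", "friend", "John"),
    ("Mark", "likes", "Titanic"),
    ("Mark", "likes", "Star Wars IV"),
    ("John", "likes", "Pulp Fiction"),
    ("Titanic", "type", "Drama"),
    ("Star Wars IV", "type", "Sci-Fi"),
    ("Pulp Fiction", "type", "Thriller"),
}
query: List[Triple] = [
    ("Francis", "attended", "?p"),
    ("?u", "organized", "?p"),
    ("Peter", "friend", "?f"),
    ("?f", "likes", "?b"),
    ("?b", "type", "Drama"),
]
answers = list(match(query, data))
print(answers)
```

The single answer has the same shape as the final θ in the walkthrough that follows, but only because the toy triples were chosen that way.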
  14. Example Query
      [Figure: the query graph from slide 5, with variable vertices ?p, ?u, ?f, ?b and constant vertices Francis, Peter, and Drama]
  15. [Figure: the example social network from slide 2, repeated]
  16. BudgetMatch Example I
      [Figure: the example query graph annotated with per-vertex state]
      Every query vertex starts with cost c = 5. The variable vertices ?p, ?u, ?f, ?b have empty candidate sets R = {}; the constant vertices have R = {francis}, R = {peter}, and R = {drama}.
      ANS = {}, θ = {}
  17. BudgetMatch Example II
      ?p: c = 5, R = {}, R' = {Peter's bday party, Homecoming 09, Silvester 2009}
      Francis: c = 5, R = {francis}, R' = {}
      ?f: c = 5, R = {}, R' = {Mark, John}
      All other vertices unchanged; ANS = {}, θ = {}
  18. BudgetMatch Example III
      ?p: c = 5, R = {Peter's bday party, Silvester 2009}
      Francis: R' = {}
      ?f: c = 5, R = {Mark, John}
      Drama: c = 25, R = {drama}, R' = {drama}
      ANS = {}, θ = {}
  19. BudgetMatch Example IV
      ?p: c = 5, R = {Peter's bday party, Silvester 2009}
      ?f: c = 5, R = {} (bound to Mark)
      ?b: c = 25, R = {Titanic, Star Wars IV}, R' = {Titanic, Star Wars IV}
      Drama: c = 25, R = {drama}
      ANS = {}, θ = {?f/Mark}
  20. BudgetMatch Example V
      ?p: c = 5, R = {}
      ?u: c = 5, R = {Francis, Jennifer, Ashley}
      ?b: c = 25, R = {Titanic, Star Wars IV}, R' = {Titanic}
      Drama: c = 25, R = {drama}
      ANS = {}, θ = {?f/Mark, ?p/Peter's bday party, ?u/Jennifer}
  21. BudgetMatch Example VI
      ?b: c = 25, R = {}
      Drama: c = 25
      ANS = {θ}, θ = {?f/Mark, ?p/Peter's bday party, ?u/Jennifer, ?b/Titanic}
  22. Cost Initialization & Update
      Initialize cost:
      - Constant initial cost
      - Using average degree statistics
      Cost estimate update:
      - Multiply by a constant
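Under a multiplicative update, the budgets tried for one vertex form a geometric sequence. The Python sketch below assumes a constant initial cost of 5 and a factor of 5 (both guesses based on the c = 5 to c = 25 step in the examples; the slides do not state the constant), and uses hypothetical helpers `initial_cost` and `budget_schedule`.

```python
import math
from typing import List


def initial_cost(strategy: str, constant: int = 5, avg_degree: float = 0.0) -> int:
    """Hypothetical helper covering the two initialization options on the slide."""
    if strategy == "constant":
        return constant
    if strategy == "average_degree":
        return max(1, math.ceil(avg_degree))
    raise ValueError(f"unknown strategy: {strategy}")


def budget_schedule(initial: int, degree: int, factor: int) -> List[int]:
    """Budgets tried for a vertex of the given degree, assuming each abort multiplies
    the estimate by `factor` and an expansion succeeds once budget >= degree."""
    if initial < 1 or factor < 2:
        raise ValueError("need initial >= 1 and factor >= 2 for the schedule to terminate")
    budgets = [initial]
    while budgets[-1] < degree:
        budgets.append(budgets[-1] * factor)
    return budgets


schedule = budget_schedule(initial_cost("constant"), degree=1000, factor=5)
wasted = sum(schedule[:-1])  # roughly one budget's worth of neighbor reads per aborted attempt
print(schedule, wasted)
print(initial_cost("average_degree", avg_degree=55.8))
```

For a hypothetical hub of degree 1,000 this prints `[5, 25, 125, 625, 3125] 780`: four aborted attempts costing about 780 neighbor reads in total. As a back-of-the-envelope bound (not a result from the slides), the aborted budgets sum to c0·(α^k − 1)/(α − 1), which is less than the final budget divided by α − 1, so geometric growth keeps the overhead of wrong guesses bounded.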
  23. Budget Assignment
  24. Experiments
      - Evaluated on a network with 1.12 billion edges
        • Delicious social network crawl (partial)
      - Used Neo4J as storage engine
        • Custom batch loading, degree lookup
      - Compared against the DOGMA algorithm
      - Evaluated on a set of 9 diverse benchmark queries (5 to 12 edges)
  25. Query Times (Warm Cache)
      [Figure: query time in ms, logarithmic scale from 1 to 10,000, for Query1 (5E), Query2 (7E), Query3 (6E), Query4 (9E), Query5 (9E), Query6 (7E), Query7 (12E), Query8 (12E), Query9 (11E), where nE is the number of query edges]
      Configurations:
      1) assignBudget4, λ=100
      2) assignBudget4, λ=500
      3) assignBudget4, λ=2000
      4) assignBudget4, λ=15000
      5) assignBudget4, λ=100, statistics
      6) assignBudget2, λ=500
      7) assignBudget3, λ=500
      8) assignBudget1, λ=500
      9) SN-3: DOGMA+statistics
  26. Query Times (Warm Cache)
      [Figure: same chart and nine configurations as slide 25]
  27. Query Times (Warm Cache)
      [Figure: same chart and nine configurations as slide 25]
  28. Query Times (Cold Cache)
      [Figure: query time in ms, axis from 1 to 100,000, for Query1 to Query9 under the same nine configurations as slide 25]
  29. Comparison
      Compared configuration 4 against:
      - Neo4J subgraph matching (SN-1)
      - DOGMA without statistics (SN-2)
      - DOGMA with statistics (SN-3)

                     SN-1       SN-2    SN-3
      Cold Cache     12,867 x   12 x    11 x
      Warm Cache     44,794 x   18 x    14 x
  30. DOGMA Index
      [Figure: DOGMA index, grouping the vertices of an example graph (legislators, bills, terms, and amendments connected by sponsor, hasRole, forOffice, amendmentTo, and similar edges) by graph locality onto disk pages]
  31. COSI Architecture
      [Figure: a client sends a query pattern (vertices A, B, C and variables ?X, ?Y, ?Z) and receives results; graph data is loaded, partitioned automatically, and distributed across storage nodes; queries are dispatched and forwarded to the nodes, which exchange data and answer queries with the complexity hidden from the client]
  32. COSI Partitioning
      Key Theorem: Suppose vertex retrieval and inter-node communication costs are uniform across storage nodes. Then the partition of the DB that minimizes query execution time coincides with the partition that minimizes the edge-cut cost in the complete graph (V, V×V) with weight function w(u,v) = (E(u,v)) + (E(v,u)).
      So minimum edge cuts in complete graphs are closely related to minimizing query execution time.
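The edge-cut objective in the theorem can be made concrete with a brute-force Python sketch. The four vertices and pair weights are hypothetical stand-ins for w(u,v) (the slide's exact weighting is not recoverable here), pairs that are not listed have weight 0 as in the complete graph (V, V×V), and exhaustive search over bisections is only practical for toy inputs. The helper names are illustrative, not COSI's API.

```python
from itertools import combinations
from typing import Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def edge_cut_cost(weights: Dict[Pair, int], assignment: Dict[str, int]) -> int:
    """Total weight of vertex pairs whose endpoints land on different storage nodes."""
    return sum(w for (u, v), w in weights.items() if assignment[u] != assignment[v])


def best_bisection(vertices: List[str], weights: Dict[Pair, int]) -> Tuple[int, Dict[str, int]]:
    """Try every split into two equal halves; keep the cheapest cut (first found on ties)."""
    best: Optional[Tuple[int, Dict[str, int]]] = None
    for half in combinations(vertices, len(vertices) // 2):
        assignment = {v: 0 if v in half else 1 for v in vertices}
        cost = edge_cut_cost(weights, assignment)
        if best is None or cost < best[0]:
            best = (cost, assignment)
    assert best is not None
    return best


# Hypothetical symmetric weights w(u, v); unlisted pairs weigh 0.
weights = {("A", "B"): 5, ("C", "D"): 4, ("B", "C"): 1, ("A", "D"): 1}
cost, assignment = best_bisection(["A", "B", "C", "D"], weights)
print(cost, assignment)
```

Keeping the heavily weighted pairs A–B and C–D on the same node leaves only the two weight-1 pairs cut, for a cost of 2. Under the theorem's uniformity assumptions, this minimum-cut split would also be the partition that minimizes query execution time.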
  33. Further Information
      - COSI: Cloud Oriented Subgraph Identification in Massive Social Networks. Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian. The 2010 International Conference on Advances in Social Networks Analysis and Mining. Patent pending.
      - DOGMA: A Disk-Oriented Graph Matching Algorithm. Matthias Bröcheler, Andrea Pugliese and V.S. Subrahmanian. Proceedings of the 8th International Semantic Web Conference. Patent pending.
  34. dogma.umiacs.umd.edu
  35. Conclusion
      - Dynamic cost models are beneficial for networks with heavy-tailed degree distributions.
      - We developed the BudgetMatch query answering algorithm, which dynamically updates its cost estimates during execution.
      - BudgetMatch yields huge improvements over standard static approaches for some queries.
  36. Questions? Comments?
