Processing Reachability Queries with Realistic Constraints on Massive Networks by Hong Cheng

Massive graphs are ubiquitous in application domains such as social networks, road networks, communication networks, biological networks, and RDF graphs. Such graphs are massive (for example, with hundreds of millions of nodes and edges or even more) and contain rich information (for example, node/edge weights, labels and textual contents). On such graphs, an important class of problems is processing queries about graph structure. Graph reachability, for example, asks whether one node can reach another in a graph. However, the sheer scale of these graphs presents new challenges for efficient query processing.

In this talk, I will introduce two new yet important types of graph reachability queries: weight constraint reachability, which imposes an edge weight constraint on the answer path, and k-hop reachability, which imposes a length constraint on the answer path. With such realistic constraints, we can find more meaningful and practically feasible answers. These two reachability queries have wide applications in many real-world problems, such as QoS routing and trip planning.

  1. Processing Reachability Queries with Realistic Constraints on Massive Networks. Hong Cheng, The Chinese University of Hong Kong
  2. Massive Networks are Everywhere!
  3. Graph Reachability
     • Query: can node u reach node v in a directed graph?
     • [Figure: a directed graph with nodes 1 to 15; example queries $Q = (3, 9)$ and $Q = (1, 11)$, labeled "A: Yes" and "A: No".]
  4. 4. Graph Reachability  Has been studied extensively in the literature  A comprehensive survey by Yu and Cheng [1]  Main idea  Find all strongly connected components (SCCs) in a graph G  Compress G into a DAG by replacing each SCC with a node  Compute the edge transitive closure on the DAG
  5. Graph Reachability with Realistic Constraints
     • The general reachability query is not expressive enough, and its answers may not be meaningful or practically feasible.
     • We study, for the first time, graph reachability when realistic constraints are imposed:
     • Weight constraint [VLDBJ’13]
     • Distance constraint [PVLDB’12]
  6. Weight Constraint Reachability (WCR)
     • Input: a weighted undirected graph $G = (V, E, w)$.
     • Query: can node u reach node v with every edge weight on the path satisfying a constraint c?
     • Example: $Q = (a, g, \le 4)$. A: Yes!
  7. Applications of WCR
     • QoS routing: is there a route from one node to another in a communication network such that each link has a bandwidth ≥ x?
     • Trip planning: is there a route from one city to another in a road network such that each segment has a speed limit within [50, 80] miles/hour?
     • Distribution network: is there a feasible delivery route between two locations such that each intermediate warehouse, storage point or distribution center has a handling capacity ≥ x?
  8. A Straightforward Solution
     • Perform a BFS/DFS from node u, traversing only edges that satisfy the constraint, until v is reached or no unvisited nodes remain.
     • Example: $Q = (a, g, \le 4)$.
     • Cost: $O(m + n)$ time per query!
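A minimal sketch of the straightforward solution, assuming an adjacency dict that maps each vertex to (neighbor, weight) pairs and a constraint passed as a predicate on edge weights. The names `build_adj` and `wcr_bfs` are hypothetical.

```python
from collections import deque

def build_adj(edges):
    """Undirected adjacency dict from (a, b, weight) triples."""
    adj = {}
    for a, b, w in edges:
        adj.setdefault(a, []).append((b, w))
        adj.setdefault(b, []).append((a, w))
    return adj

def wcr_bfs(adj, u, v, ok):
    """True if u reaches v using only edges whose weight w satisfies ok(w)."""
    if u == v:
        return True
    visited = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for y, w in adj.get(x, ()):
            if ok(w) and y not in visited:   # skip edges that violate the constraint
                if y == v:
                    return True
                visited.add(y)
                queue.append(y)
    return False
```

Passing the constraint as a predicate lets the same routine handle $\ge x$ and $[x, y]$ constraints, for example `lambda w: 50 <= w <= 80` for the trip-planning query. Every query still touches up to $O(m + n)$ vertices and edges.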
  9. A Nice Property Based on the MST
     • Theorem: two vertices u and v are reachable w.r.t. the weight constraint $\le y$ in G $\Leftrightarrow$ u and v are reachable w.r.t. the constraint $\le y$ in the minimum spanning tree (MST) of G.
     • Example: $Q = (a, g, \le 4)$.
     • Query time drops from $O(m + n)$ to $O(n)$, since the MST has only $n - 1$ edges.
     • With this property, can we further reduce the query time, and how?
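The theorem suggests the following sketch: build a minimum spanning forest once with Kruskal's algorithm (union-find), then answer each query by checking the heaviest edge on the unique forest path between u and v. The helper names are hypothetical; a forest rather than a tree is built so that disconnected inputs are handled. For $\ge x$ constraints one would build a maximum spanning tree instead.

```python
def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def kruskal_forest(vertices, edges):
    """Minimum spanning forest as {vertex: [(neighbor, weight), ...]}."""
    parent = {x: x for x in vertices}
    forest = {x: [] for x in vertices}
    for a, b, w in sorted(edges, key=lambda e: e[2]):
        ra, rb = find(parent, a), find(parent, b)
        if ra != rb:                      # edge joins two components: keep it
            parent[ra] = rb
            forest[a].append((b, w))
            forest[b].append((a, w))
    return forest

def path_max_weight(forest, u, v):
    """Maximum edge weight on the forest path u..v, or None if not connected."""
    stack = [(u, None, float("-inf"))]
    while stack:
        x, prev, best = stack.pop()
        if x == v:
            return best
        for y, w in forest[x]:
            if y != prev:                 # a forest has no cycles, so this suffices
                stack.append((y, x, max(best, w)))
    return None

def wcr_mst(forest, u, v, y):
    """Answer Q = (u, v, <= y) on the spanning forest in O(n) time."""
    m = path_max_weight(forest, u, v)
    return m is not None and m <= y
```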
  10. 10. Proof of Theorem  Given and its MST , for any vertices , denote  The removal of creates two connected components and .  Define an edge cut in as Then and ),,( wEVG = T Vvu ∈, )(maxarg ),(max ewe vuPe T∈= maxe uT vT G },|),({ vuuv TbTaEbaeC ∈∈∈= uvCe ∈max )(min)( max ewew uvCe∈= according to the cut property of minimum spanning tree.
  11. 11. Proof of Theorem  For any path , we have .  For any , we have  Thus if , we can conclude and are not reachable w.r.t. the constraint . ),( vuP Φ≠uvCvuP ),( uvCvuPe ),('∈ ),()'()( max vuPewew ≤≤ yew >)( max u v y≤
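  12. The Maximum Edge Weight on the MST
     • [Figure: removing $e_{\max}$, of weight 4, splits the MST T into subtrees $T_1$ and $T_2$.]
     • For any $u \in T_1$, $v \in T_2$: $P_T(u, v) = \max_{e \in P_T(u,v)} w(e) = w(e_{\max}) = 4$, where $P_T(u, v)$ also denotes the maximum edge weight on the tree path.
     • Given $Q = (u, v, \le y)$: if $P_T(u, v) = 4 \le y$, then yes!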
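  13. This Property Can Be Applied Recursively
     • [Figure: within $T_1$, removing $e_{\max}$ of weight 3 splits it into $T_{11}$ and $T_{12}$.]
     • For any $u \in T_{11}$, $v \in T_{12}$: $P_T(u, v) = \max_{e \in P_T(u,v)} w(e) = w(e_{\max}) = 3$.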
  14. Building a Hierarchical Edge Index
     • [Figure: the edge index tree.]
  15. Query on the Edge Index Tree
     • Given $Q = (u, v, \le y)$, we compute $P_T(u, v) = w(\mathrm{LCA}(u, v))$, where $\mathrm{LCA}(u, v)$ is the lowest common ancestor of u and v in the edge index tree.
     • $\mathrm{LCA}(u, v)$ can be computed in $O(1)$ time based on an $O(n)$-size index.
     • Then we only need to test whether $P_T(u, v) = w(\mathrm{LCA}(u, v)) \le y$ to answer Q.
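One way to realize the edge index tree is a Kruskal reconstruction tree, sketched below under that assumption (not necessarily the authors' construction): MST edges are processed in increasing weight order, and each becomes a new internal node whose children are the current roots of the two components it joins. The root is therefore the heaviest MST edge, and each subtree is rooted at the heaviest edge inside it, as in slides 12 to 14. For brevity the LCA is found by walking up from both leaves in $O(depth)$ time rather than with the $O(1)$ scheme on the slide (which would need, for example, an Euler tour with range-minimum queries). All names are hypothetical.

```python
def find(dsu, x):
    """Union-find root lookup with path halving."""
    while dsu[x] != x:
        dsu[x] = dsu[dsu[x]]
        x = dsu[x]
    return x

def build_edge_index_tree(vertices, edges):
    """Kruskal reconstruction tree over (a, b, weight) edges.

    Leaves are the graph vertices; each internal node ("e", i) stands for one
    MST edge and stores its weight. Returns (parent, weight, depth) dicts.
    """
    dsu = {x: x for x in vertices}   # union-find over graph vertices
    top = {x: x for x in vertices}   # tree node on top of each set (valid for set roots)
    parent, weight, created = {}, {}, []
    for i, (a, b, w) in enumerate(sorted(edges, key=lambda e: e[2])):
        ra, rb = find(dsu, a), find(dsu, b)
        if ra == rb:
            continue                     # not an MST edge
        node = ("e", i)
        weight[node] = w
        parent[top[ra]] = node
        parent[top[rb]] = node
        dsu[ra] = rb
        top[rb] = node
        created.append(node)
    # A parent is always created after its children, so fill depths top-down.
    depth = {}
    for node in reversed(created):
        depth[node] = depth[parent[node]] + 1 if node in parent else 0
    for x in vertices:
        depth[x] = depth[parent[x]] + 1 if x in parent else 0
    return parent, weight, depth

def wcr_edge_index(parent, weight, depth, u, v, y):
    """Answer Q = (u, v, <= y) by testing w(LCA(u, v)) <= y (naive LCA)."""
    if u == v:
        return True
    a, b = u, v
    while depth[a] > depth[b]:
        a = parent[a]
    while depth[b] > depth[a]:
        b = parent[b]
    while a != b:
        if a not in parent or b not in parent:
            return False                 # u and v lie in different spanning trees
        a, b = parent[a], parent[b]
    return weight[a] <= y
```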
  16. Query Processing: Examples
     • $Q = (a, g, \le 4)$. A: Yes!
     • $Q = (a, d, \le 2)$. A: No!
  17. Complexity Analysis
       Query Time   Index Size   Index Time
       O(1)         O(n)         O(n)
     • These bounds hold for queries $Q = (u, v, \le y)$ or $Q = (u, v, \ge x)$. The method can be easily extended to process $Q = (u, v, [x, y])$.
  18. Answering WCR with a Disk-Resident Index
     • What happens if the edge index tree is too large to fit in memory?
     • Problem: if the edge index tree is stored on disk, each query costs a large constant number of random I/O accesses.
     • Our solution: design a disk-resident index and an I/O-efficient algorithm to answer a WCR query.
  19. A Vertex Coding Idea
     • We pick an “arbitrary” node of the MST as the root to get a rooted MST.
     • Examples: $P_T(a, g) = \max\{P_T(a, b), P_T(b, g)\} = 4$ and $P_T(a, e) = \max\{P_T(a, f), P_T(f, e)\} = 2$.
     • In general: $P_T(u, v) = \max\{P_T(u, \mathrm{LCA}(u, v)), P_T(v, \mathrm{LCA}(u, v))\}$.
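  20. Vertex Coding
     • $code(a) = \{(b, 3), (d, 3), (f, 2)\}$ and $code(h) = \{(b, 4), (c, 4), (g, 4)\}$; each entry pairs an ancestor with the maximum edge weight on the tree path to that ancestor.
     • $P_T(a, h) = \max\{P_T(a, b), P_T(b, h)\} = \max\{3, 4\} = 4$.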
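A sketch of vertex coding on a rooted MST, assuming each code is a list of (ancestor, maximum edge weight up to that ancestor) pairs ordered from the root down to the vertex itself. The slides show codes as unordered sets without the vertex itself; the ordering and the self entry are assumptions that make the LCA easy to read off. A query reads only the two codes: their last common entry is the LCA, and the answer is the larger of the two stored weights. The function names are hypothetical.

```python
import math

def vertex_codes(tree, root):
    """code[x] = [(ancestor, max edge weight from x up to it), ...], root first, x last."""
    codes = {root: [(root, -math.inf)]}
    stack = [root]
    while stack:
        x = stack.pop()
        for y, w in tree[x]:
            if y not in codes:           # y is a child of x in the rooted tree
                # Every path from y to an ancestor of x extends through edge (x, y).
                codes[y] = [(anc, max(m, w)) for anc, m in codes[x]] + [(y, -math.inf)]
                stack.append(y)
    return codes

def wcr_codes(code_u, code_v, y):
    """Answer Q = (u, v, <= y) from code(u) and code(v) alone."""
    best = None
    for (au, mu), (av, mv) in zip(code_u, code_v):
        if au != av:
            break                        # the previous common entry was the LCA
        best = max(mu, mv)               # max{P_T(u, LCA), P_T(v, LCA)}
    return best is not None and best <= y
```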
  21. A Complexity Issue in Vertex Coding
     • We store the code of every vertex on disk.
     • Given a query $Q = (u, v, \le y)$, $code(u)$ and $code(v)$ are read from disk to compute $P_T(u, v)$.
     • Space complexity: $O(n \cdot depth) \subseteq O(n^2)$.
     • Query I/O complexity: $O(depth / B) \subseteq O(n / B)$, where B is the page size.
     • Both bounds degrade when the rooted MST is deep, since its depth can reach $n - 1$.
  22. Bound the Tree Depth by Balancing
     • We balance the rooted MST.
     • Definition (Median Node): given an MST T, a node $v \in V(T)$ is a median node of T if, for each neighbor $v'$ of v, $|V(T_{v'})| \le |V(T)| / 2$, where $T_{v'}$ is the component containing $v'$ after v is removed from T.
     • A median node always exists in a tree (a tree has at most two). We use a median node of the MST as its root.
     • For each subtree underneath the root, we use the median node concept to balance the subtree recursively.
  23. Tree Balancing: Example
     • Theorem: the depth of the balanced tree is at most $\log_2 n$.
     • Corollary: code(u) for any node u contains at most $\log_2 n$ entries, and thus fits into one page (i.e., $\log_2 n \le B$, where B = 1024 or 4096 bytes).
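The median node is what is usually called a tree centroid, and the recursive balancing is a centroid decomposition. The sketch below, with hypothetical names, builds balanced codes that way: each code lists, from the top-level median down, every median whose piece contains the vertex, paired with the maximum edge weight on the original tree path to it. For the deepest median c shared by u and v, the tree path from u to v passes through c, so the same two-code query still applies, and each code has at most $\lfloor \log_2 n \rfloor + 1$ entries (the extra one is the vertex itself).

```python
import math

def balanced_codes(tree):
    """Median-node (centroid) decomposition of a tree {v: [(nbr, weight), ...]}."""
    removed = set()
    code = {x: [] for x in tree}

    def component(start):
        # BFS over the current piece (removed medians block it); the list grows while iterated.
        order, par = [start], {start: None}
        for x in order:
            for y, _ in tree[x]:
                if y not in removed and y not in par:
                    par[y] = x
                    order.append(y)
        return order, par

    def median(order, par):
        # Subtree sizes bottom-up; return a node whose largest remaining piece is <= n/2.
        n = len(order)
        size = {x: 1 for x in order}
        for x in reversed(order[1:]):
            size[par[x]] += size[x]
        for x in order:
            biggest = n - size[x]        # piece on the parent side
            for y, _ in tree[x]:
                if y not in removed and par.get(y) == x:
                    biggest = max(biggest, size[y])
            if biggest <= n // 2:
                return x

    for s in tree:
        if s in removed:
            continue
        pending = [s]
        while pending:
            order, par = component(pending.pop())
            c = median(order, par)
            best = {c: -math.inf}        # max edge weight on the tree path c..x
            stack = [c]
            while stack:
                x = stack.pop()
                for y, w in tree[x]:
                    if y not in removed and y not in best:
                        best[y] = max(best[x], w)
                        stack.append(y)
            for x in order:
                code[x].append((c, best[x]))
            removed.add(c)
            pending.extend(y for y, _ in tree[c] if y not in removed)
    return code

def wcr_balanced(code, u, v, y):
    """Deepest common median c lies on the u..v tree path; compare max weight with y."""
    best = None
    for (cu, mu), (cv, mv) in zip(code[u], code[v]):
        if cu != cv:
            break
        best = max(mu, mv)
    return best is not None and best <= y
```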
  24. Complexity Analysis
               Query Time   Index Size    Index Time
       Memory  O(1)         O(n)          O(n)
       Disk    2 I/Os       O(n log n)    O(n log n)
     • These bounds hold for queries $Q = (u, v, \le y)$ or $Q = (u, v, \ge x)$. The method can be easily extended to process $Q = (u, v, [x, y])$.
  25. 25. Experiments  Network statistics Network |V| |E| Facebook New Orleans 63,731 440,384 USARN 23,947,347 29,166,672
  26. Experiment Settings
     • 2.67 GHz CPU, 12 GB memory; 10,000 test queries.
     • Memory-based methods: BFS/DFS on the graph, MST-Index, Edge-Index.
     • Disk-based methods: external BFS/DFS on the graph, External MST, Balanced Tree Index.
  27. Memory-based Algorithms: Query Time
       Query time in microseconds (10^-6 seconds)
       Network     DFS       BFS       MST-Index   Edge-Index
       Facebook    1,098     1,429     1           1
       USARN       32,462    30,868    1,382       4
  28. Memory-based Algorithms: Index Size
       Index size in GB
       Network     DFS       BFS       MST-Index   Edge-Index
       Facebook    0.01      0.01      0.0008      0.0025
       USARN       0.89      0.89      0.28        0.95
  29. Memory-based Algorithms: Index Time
       Index time in seconds
       Network     DFS       BFS       MST-Index   Edge-Index
       Facebook    0.4       0.4       0.03        0.06
       USARN       33.7      33.7      9.9         39.2
  30. Disk-based Algorithms: Query Time
       Query time in microseconds (10^-6 seconds)
       Network     Ext-DFS   Ext-BFS   Ext-MST     Balanced-Index
       Facebook    31,368    48,152    772         11
       USARN       294,521   64,471    422,810     18
  31. Disk-based Algorithms: Index Size
       Index size in GB
       Network     Ext-DFS   Ext-BFS   Ext-MST     Balanced-Index
       Facebook    0.01      0.01      0.0008      0.0035
       USARN       0.89      0.89      0.28        0.52
  32. Disk-based Algorithms: Index Time
       Index time in seconds
       Network     Ext-DFS   Ext-BFS   Ext-MST     Balanced-Index
       Facebook    0.6       0.6       0.048       0.146
       USARN       48.8      48.8      12.2        118.8
  33. 33. Summary and Contribution  The first study on WCR query  Computing Weight Constraint Reachability in Large Networks. The VLDB Journal, 22(3):275- 294, 2013.  Design two novel and efficient solutions  Memory: edge index tree for O(1) query time  Disk: balanced tree + vertex coding for 2 I/O query cost
  34. K-Hop Reachability (K-Reach)
     • Input: an unweighted directed graph.
     • Query: can node u reach node v via a path of length at most k?
     • Examples: $Q: a \to_3 f$, A: Yes! $Q: a \to_3 g$, A: No!
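For reference, a bounded BFS answers $u \to_k v$ directly without any index, which is the baseline an index has to beat. This is a minimal sketch assuming an out-adjacency dict; `k_reach_bfs` is a hypothetical name.

```python
from collections import deque

def k_reach_bfs(out_adj, u, v, k):
    """True if v is reachable from u by a directed path with at most k edges."""
    if u == v:
        return True
    dist = {u: 0}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        if dist[x] == k:
            continue                     # expanding x would exceed k hops
        for y in out_adj.get(x, ()):
            if y not in dist:
                if y == v:
                    return True
                dist[y] = dist[x] + 1
                queue.append(y)
    return False
```

BFS visits vertices in order of hop distance, so the first time v is discovered it is at its shortest distance, and the depth cutoff keeps the search inside the k-hop neighborhood of u.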
  35. Applications of K-Reach
     • In a wireless or sensor network, where a broadcast message may get lost at any hop, the probability of reception degrades exponentially with the number of hops.
     • In social networks, the degree of acquaintance may decrease even super-exponentially (e.g., two persons may hardly know each other if they are just 3 hops apart).
     • K-Reach is helpful because it can model the level and sphere of influence.
  36. 36. Vertex Cover  A set of vertices is a vertex cover of a graph , if for every edge , we have .  The problem of computing the minimum vertex cover is NP-hard.  But there is a polynomial time algorithm for computing a 2-approxiamte minimum vertex cover. VS ⊆ ),( EVG = Evu ∈),( Φ≠Svu },{
  37. A Vertex Cover-based Index
     • [Figure: the K-Reach index for k = 3.]
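The figure's exact encoding is not recoverable here, so the following is a sketch under an assumption: the index records, for every cover vertex s, the hop distance to each cover vertex reachable within k hops, found by a BFS from s truncated at depth k. This is consistent with the $O(\sum_{u \in S} |G_k(u)|)$ construction cost in the complexity slide. Each cover vertex also maps to itself with distance 0, which simplifies query processing. The function name is hypothetical.

```python
from collections import deque

def build_k_reach_index(out_adj, cover, k):
    """index[s][t] = hop distance from s to t (<= k), for cover vertices s and t."""
    index = {}
    for s in cover:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            if dist[x] == k:
                continue                 # stop at depth k
            for y in out_adj.get(x, ()):
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        # Keep only cover vertices: the index is a weighted graph on S.
        index[s] = {t: d for t, d in dist.items() if t in cover}
    return index
```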
  38. 38. Query Processing  Given , there are four cases:  Case 1  Case 2  Case 3  Case 4 vuQ k→: SvSu ∈∈ , SvSu ∉∈ , SvSu ∈∉ , SvSu ∉∉ ,
  39. 39.  Let k=3,  if , we have  if , we have Case 1: Sgv ∈= SvSu ∈∈ , Sbu ∈= gb 3→ Siv ∈= ib 3→
  40. 40.  Let k=3,  if , we have  if , we have Case 2: Shv ∉= SvSu ∉∈ , Sdu ∈= hd 3→ Sjv ∉= jd 3→
  41. 41.  Let k=3,  if , we have  if , we have Case 3: Sdv ∈= SvSu ∈∉ , Sau ∉= da 3→ Sgv ∈= ga 3→
  42. 42.  Let k=3,  if , we have  if , we have Case 4: Sfv ∉= SvSu ∉∉ , Scu ∉= fc 3→ Shv ∉= hc 3→
  43. 43. Complexity Analysis  Index construction  2-approximate minimum vertex cover  K-Reach index  Query processing  Case 1  Case 2  Case 3  Case 4 )( nmO + )|)(|(∑ ∈Su k uGO )),(deg(log IuO out )),(deg),((deg GvIuO inout + )),(deg),((deg IvGuO inout + ))),(deg),(deg(( ),( GvIwO inoutGuoutNeiw +∑ ∈
  44. 44. Experiments  For processing k-hop reachability queries  For processing classic reachability queries (setting k=n)
  45. 45. Network Statistics
  46. 46. K-Reach: Query Processing Time
  47. 47. Query Breakdown
  48. 48. Classic Reachability: Query Processing Time
  49. 49. Index Construction Time
  50. 50. Index Size
  51. 51. Overall Performance
  52. Summary and Contribution
     • The first study on the K-Reach query: K-Reach: Who is in Your Small World. Proceedings of the VLDB Endowment, 5(11):1292-1303, 2012.
     • An efficient vertex cover-based index can answer both classic reachability and k-hop reachability queries.
  53. Conclusions
     • We study two graph reachability queries, WCR and K-Reach, under realistic constraints. These constraints make the query answers more meaningful and practically useful in many applications.
     • We exploit a key property of each query type and design efficient indices for processing both types of queries.
  54. Joint work with (in alphabetical order): Lijun Chang, James Cheng, Miao Qiao, Lu Qin, Zechao Shang, Haixun Wang, Jeffrey Xu Yu, Philip S. Yu
  55. References
     [1] Jeffrey Xu Yu, Jiefeng Cheng: Graph Reachability Queries: A Survey. Managing and Mining Graph Data: 181-215 (2010).
     [2] Miao Qiao, Hong Cheng, Lu Qin, Jeffrey Xu Yu, Philip S. Yu, Lijun Chang: Computing Weight Constraint Reachability in Large Networks. VLDB J. 22(3): 275-294 (2013).
     [3] James Cheng, Zechao Shang, Hong Cheng, Haixun Wang, Jeffrey Xu Yu: K-Reach: Who is in Your Small World. PVLDB 5(11): 1292-1303 (2012).
