Algorithms for Big Data: Graphs and Memory Errors 4 (Lecture by Giuseppe Italiano)

The first part of my lectures will be devoted to the design of practical algorithms for very large graphs. The second part will be devoted to algorithms resilient to memory errors. Modern memory devices may suffer from faults, where some bits flip arbitrarily and corrupt the values of the affected memory cells. Such faults may seriously compromise the correctness and performance of computations, and the larger the memory usage, the higher the probability of incurring memory errors. In recent years, many algorithms for computing in the presence of memory faults have been introduced in the literature: in particular, an algorithm or a data structure is called resilient if it is able to work correctly on the set of uncorrupted values. This part will cover recent work on resilient algorithms and data structures.

Published in: Education, Technology

  1. Computing the Diameter of Large-Scale Graphs (based on work by Crescenzi, Grossi, Habib, Lanzi & Marino). ALMADA School, July – August 2013. ALMADA School Diameter of Large-Scale Graphs
  2. Definition
     (Un)weighted, (un)directed graph $G = (V, E)$ (with weights $w : E \to \mathbb{R}$), (strongly) connected.
     The distance $d(u, v)$ is the number of edges (or the sum of the edge weights) along a shortest path from u to v.
     The diameter D of a graph is the length of the longest shortest path: $D = \max_{u,v \in V} d(u, v)$.
  3. Definition
     Forward eccentricity of u (in how many hops can u reach any node?): $\mathrm{ecc}_F(u) = \max_{v \in V} d(u, v)$.
     Backward eccentricity of u (in how many hops can u be reached from any node?): $\mathrm{ecc}_B(u) = \max_{v \in V} d(v, u)$.
     Diameter: maximum eccF or eccB.
     [Figure: example directed graph on the nodes v1, ..., v12]
     Distance matrix of the example (row = source, column = target), with eccF in the last column and eccB in the last row:
            v1  v2  v3  v4  v5  v6  v7  v8  v9 v10 v11 v12   eccF
     v1      0   2   1   3   1   3   2   3   2   4   1   2      4
     v2      1   0   2   1   2   3   2   3   3   4   2   3      4
     v3      2   1   0   2   3   2   1   2   4   3   3   4      4
     v4      1   3   2   0   2   2   1   2   3   3   2   3      3
     v5      3   2   1   2   0   3   2   2   1   3   4   5      5
     v6      2   4   3   1   3   0   2   3   4   4   3   4      4
     v7      3   4   3   2   2   1   0   1   3   2   4   5      5
     v8      4   3   2   3   1   4   3   0   2   1   5   6      6
     v9      2   4   3   1   2   3   2   1   0   2   3   4      4
     v10     5   4   3   4   2   5   4   1   3   0   6   7      7
     v11     2   4   3   5   3   5   4   5   4   6   0   1      6
     v12     1   3   2   4   2   4   3   4   3   5   2   0      5
     eccB    5   4   3   5   3   5   4   5   4   6   6   7
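The two eccentricity definitions above can be checked with one BFS each. The following is a minimal sketch, not the authors' code: the names `GRAPH`, `bfs_distances`, `reverse`, `ecc_forward` and `ecc_backward` are illustrative, and the arc list is an assumption, read off the distance-1 entries of the matrix since the figure itself did not survive extraction.

```python
from collections import deque

# Assumed encoding of the example: node i stands for v_i, and there is an
# arc u -> v exactly where the distance matrix has d(u, v) = 1.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def reverse(adj):
    """Graph with every arc reversed (used for backward BFS)."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].append(u)
    return rev

def ecc_forward(adj, u):
    """eccF(u): largest distance from u to any node."""
    return max(bfs_distances(adj, u).values())

def ecc_backward(adj, u):
    """eccB(u): largest distance from any node to u."""
    return max(bfs_distances(reverse(adj), u).values())

print(ecc_forward(GRAPH, 1), ecc_backward(GRAPH, 1))  # eccF(v1), eccB(v1)
```

On this encoding the values for v1 agree with the first row and first column of the matrix (eccF = 4, eccB = 5).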
  4. Forward BFS tree [O(m) time]: for any i, the forward fringe $F^F_i(u)$ is the set of nodes at distance i from u.
     Backward BFS tree [O(m) time]: for any i, the backward fringe $F^B_i(u)$ is the set of nodes from which u is at distance i.
     Fringes of v1 in the example:
     i   FF_i(v1)           FB_i(v1)
     1   v3, v5, v11        v2, v4, v12
     2   v2, v7, v9, v12    v3, v6, v9, v11
     3   v4, v6, v8         v5, v7
     4   v10                v8
     5   -                  v10
     [Figures: the example graph, the forward BFS tree of v1, and the backward BFS tree of v1]
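The fringes in the table above are simply the levels of a BFS. Here is a small sketch under the same assumptions as before: `GRAPH` is an assumed encoding of the example (arcs taken from the distance-1 entries of the distance matrix), and `fringes` and `reverse` are illustrative names.

```python
# Assumed encoding of the 12-node example: node i stands for v_i.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def reverse(adj):
    """Graph with every arc reversed."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].append(u)
    return rev

def fringes(adj, source):
    """Level-by-level BFS: entry i is the set of nodes at distance i."""
    levels = [{source}]
    seen = {source}
    while True:
        nxt = set()
        for u in levels[-1]:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.add(v)
        if not nxt:
            return levels
        levels.append(nxt)

forward = fringes(GRAPH, 1)            # forward fringes of v1
backward = fringes(reverse(GRAPH), 1)  # backward fringes of v1
for i in range(1, len(backward)):
    ff = sorted(forward[i]) if i < len(forward) else []
    print(i, ff, sorted(backward[i]))
```

Running the backward BFS on the reversed graph is one simple way to get the backward fringes; an implementation that stores in-neighbours directly would avoid building the reversed copy.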
  5. Known approaches
     Textbook algorithm: perform n fbfs and return the maximum eccF. Each fbfs takes O(m) time, for a total of O(mn) time. Too expensive.
     Several other approaches (see [Zwick, 2001]) solve all-pairs shortest paths. Still too expensive: $O(n^{(3+\omega)/2} \log n)$, where $\omega$ is the exponent of matrix multiplication.
     Empirically finding a lower bound L and an upper bound U, that is, L ≤ D ≤ U. D is found when L = U.
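As a baseline, the textbook algorithm is just one forward BFS per node. A minimal sketch follows; `GRAPH` is again an assumed encoding of the 12-node example (arcs from the distance-1 entries of the distance matrix), and the function names are illustrative.

```python
from collections import deque

# Assumed encoding of the 12-node example: node i stands for v_i.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def bfs_distances(adj, source):
    """Hop distances from source; one call costs O(m) time."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def textbook_diameter(adj):
    """n forward BFSes, O(mn) time in total; assumes strong connectivity."""
    return max(max(bfs_distances(adj, u).values()) for u in adj)

print(textbook_diameter(GRAPH))
```

This is the cost that the bounding techniques on the following slides try to avoid: here it needs 12 BFSes, and on a graph with millions of nodes it needs millions.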
  6. Known approaches
     In the case of undirected graphs:
     Exact algorithm by Takes & Kosters [CIKM 2012]: up to 1M nodes, 100M edges.
     Approximation algorithm by Ajwani, Meyer, Veith [ESA 2012]: $O(m \sqrt{n} + n^2)$ time to produce an estimate $\hat{D}$ such that $\lfloor \tfrac{2}{3} D \rfloor \le \hat{D} \le D$.
     Roditty and Vassilevska Williams [STOC 2013]:
     the same estimate in $O(m \sqrt{n})$ expected running time;
     if for some $\epsilon > 0$ an algorithm for undirected unweighted graphs runs in $O(m^{2-\epsilon})$ time and produces an approximation $\hat{D}$ with $(2/3 + \epsilon) D \le \hat{D} \le D$, then SAT for CNF formulas on n variables can be solved in $O((2 - \delta)^n)$ time for some constant $\delta > 0$, and thus the widely believed strong exponential time hypothesis (SETH) of Impagliazzo, Paturi & Zane [JCSS'01] fails.
  7. Easy lower and upper bounds
     By using single-source (fbfs) and single-target (bbfs) shortest paths.
     [Figures: the example graph, the forward BFS tree of v1, and the backward BFS tree of v1]
     Lower bound: the maximum between the forward eccentricity eccF and the backward eccentricity eccB of a node. In the example, the lower bound is 5: at least one pair of nodes is at distance 5.
     Upper bound: the forward eccentricity eccF plus the backward eccentricity eccB (height of the bbfs tree) of a node. In the example, the upper bound is 9: every node can reach any other node by going to v1 in ≤ 5 steps and then to the destination in ≤ 4 steps.
     If $d(x, u) = i$ (that is, $x \in F^B_i(u)$) and $d(u, y) = j$ (that is, $y \in F^F_j(u)$), then $d(x, y) \le i + j$, because $i + j$ is the length of a path from x to y passing through u.
     Very often L < D < U (as the experiments will show). In the example, the diameter is 7: d(v10, v12) = 7.
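The two easy bounds cost one forward and one backward BFS from a single node. The sketch below illustrates this on an assumed encoding of the example (`GRAPH`, with arcs taken from the distance-1 entries of the distance matrix); `easy_bounds` is an illustrative name, not something defined in the slides.

```python
from collections import deque

# Assumed encoding of the 12-node example: node i stands for v_i.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def eccentricity(adj, source):
    """Largest BFS distance from source."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def reverse(adj):
    """Graph with every arc reversed."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].append(u)
    return rev

def easy_bounds(adj, u):
    """Return (L, U) with L <= D <= U, using two BFSes from u."""
    ecc_f = eccentricity(adj, u)            # forward BFS
    ecc_b = eccentricity(reverse(adj), u)   # backward BFS
    return max(ecc_f, ecc_b), ecc_f + ecc_b

print(easy_bounds(GRAPH, 1))  # bounds obtained from v1
```

For v1 this gives L = 5 and U = 9, the values quoted on the slide, while the true diameter 7 lies strictly between them.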
  8. Good lower bounds in undirected graphs: 2-Sweep
     1. Run a bfs from a random node r: let a be the farthest node.
     2. Run a bfs from a: let b be the farthest node.
     3. Return the length of the path from a to b, that is, d(a, b).
     [Figure: nodes r, a and b]
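The three steps of 2-Sweep translate almost directly into code. This is a hedged sketch: the start node is passed in as `r` instead of being drawn at random, ties for "farthest node" are broken arbitrarily, and `TREE` is a made-up undirected test graph (the path 0-1-2-3-4 with an extra leaf 5 attached to node 2), not one of the networks from the slides.

```python
from collections import deque

def bfs_distances(adj, source):
    """Hop distances from source in an undirected graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def farthest(adj, source):
    """A node at maximum distance from source, and that distance."""
    dist = bfs_distances(adj, source)
    node = max(dist, key=dist.get)
    return node, dist[node]

def two_sweep(adj, r):
    """Lower bound on the diameter: d(a, b) after two BFSes."""
    a, _ = farthest(adj, r)      # step 1: bfs from r
    _, d_ab = farthest(adj, a)   # step 2: bfs from a
    return d_ab                  # step 3: d(a, b) = ecc(a)

# Hypothetical test graph: path 0-1-2-3-4 plus the leaf 5 hanging off node 2.
TREE = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
print(two_sweep(TREE, 2))
```

On a tree the returned value is always the exact diameter; on general graphs it is only a lower bound, as the bad cases on a later slide show.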
  9. Experiments
     Machine: Pentium Dual-Core CPU (Intel E5200 @ 2.50GHz), 8GB shared memory.
     Software: Debian GNU/Linux 6.0, Linux kernel version 2.6.32, gcc version 4.4.5.
     Code and data set available at http://piluc.dsi.unifi.it/lasagne/
  10. Experiments: effectiveness of 2-Sweep
      By starting from the highest degree node (2-dSweepHdOut).
      Category                       # of networks   # of networks in which lb is tight   Maximum error
      Protein-Protein Interaction    14              11                                   1
      Collaboration                  14              12                                   1
      Undirected Social              4               4                                    0
      Undirected Communication       36              34                                   2
      Autonomous System              2               1                                    1
      Road                           3               1                                    14
      Word Adjacency                 7               4                                    1
  11. Bad cases for 2-Sweep
      [Figure: modified grid with marked nodes x1, ..., xp and y]
      In this modified grid with k rows and 1 + 3k/2 columns, the algorithm returns k + 1, whereas the diameter of the network is 3k/2.
  12. Good lower bounds in directed graphs: 2-dSweep
      1. Run a forward bfs from a random node r: let a1 be the farthest node.
      2. Run a backward bfs from a1: let b1 be the farthest node.
      3. Run a backward bfs from r: let a2 be the farthest node.
      4. Run a forward bfs from a2: let b2 be the farthest node.
      5. If eccB(a1) > eccF(a2), then return the length of the path from b1 to a1. Otherwise return the length of the path from a2 to b2.
      [Figures: nodes r, a1, b1 and nodes r, a2, b2]
      In other words, return the maximum between d(a2, b2) and d(b1, a1).
      First used on directed graphs by Broder et al. in their study "Graph structure in the web".
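The five steps of 2-dSweep can be sketched as four BFSes, two on the graph and two on its reverse. As before this is only an illustration under stated assumptions: `GRAPH` is an assumed encoding of the 12-node example (arcs from the distance-1 entries of the distance matrix), the start node `r` is a parameter instead of a random choice, and all function names are made up for this sketch.

```python
from collections import deque

# Assumed encoding of the 12-node example: node i stands for v_i.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def reverse(adj):
    """Graph with every arc reversed."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].append(u)
    return rev

def farthest(adj, source):
    """A node at maximum BFS distance from source, and that distance."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    node = max(dist, key=dist.get)
    return node, dist[node]

def two_dsweep(adj, r):
    """Lower bound on the diameter of a strongly connected digraph."""
    rev = reverse(adj)
    a1, _ = farthest(adj, r)          # step 1: forward bfs from r
    _, ecc_b_a1 = farthest(rev, a1)   # step 2: backward bfs from a1, d(b1, a1)
    a2, _ = farthest(rev, r)          # step 3: backward bfs from r
    _, ecc_f_a2 = farthest(adj, a2)   # step 4: forward bfs from a2, d(a2, b2)
    return max(ecc_b_a1, ecc_f_a2)    # step 5

print(two_dsweep(GRAPH, 1))
```

Starting from v1, both sweeps pick v10 as the farthest node, and the forward BFS from v10 already finds the diametral pair (v10, v12).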
  13. Lower bound: experiments (snap.stanford.edu and webgraph.dsi.unimi.it datasets)
      Columns: network name, diameter D, number of runs (out of 10) in which LB = D, worst LB found.
      Network name            D     Runs   Worst LB
      Wiki-Vote               9     10     9
      p2p-Gnutella08          19    9      18
      p2p-Gnutella09          19    9      18
      p2p-Gnutella06          19    10     19
      p2p-Gnutella05          22    9      21
      p2p-Gnutella04          25    7      22
      p2p-Gnutella25          21    8      20
      p2p-Gnutella24          28    10     28
      p2p-Gnutella30          23    2      22
      p2p-Gnutella31          30    9      29
      s.s.Slashdot081106      15    10     15
      s.s.Slashdot090216      15    10     15
      s.s.Slashdot090221      15    10     15
      soc-Epinions1           16    9      15
      Email-EuAll             10    10     10
      soc-sign-epinions       16    10     16
      web-NotreDame           93    10     93
      Slashdot0811            12    10     12
      Slashdot0902            13    3      12
      WikiTalk                10    9      9
      web-Stanford            210   10     210
      web-BerkStan            679   10     679
      web-Google              51    10     51
      wordassociation-2011    10    9      9
      enron                   10    10     10
      uk-2007-05@100000       7     10     7
      cnr-2000                81    10     81
      uk-2007-05@1000000      40    10     40
      in-2004                 56    10     56
      amazon-2008             47    10     47
      eu-2005                 82    10     82
      indochina-2004          235   10     235
      uk-2002                 218   10     218
      arabic-2005             133   10     133
      uk-2005                 166   10     166
      it-2004                 873   10     873
  14. Experiments: effectiveness of 2-dSweep
      By starting from the highest out-degree node (2-dSweepHdOut) or the highest in-degree node (2-dSweepHdIn).
      Columns: category, # of networks, then for each variant the # of networks in which lb is tight and the max error.
                                               2-dSweepHdOut        2-dSweepHdIn
      Category                 # of networks   Tight   Max error    Tight   Max error
      Metabolic Bipartite      76              73      19           75      19
      Metabolic Compound       76              73      9            75      9
      Metabolic Reaction       76              73      10           75      10
      Directed Social          10              10      0            10      0
      Web                      16              16      0            16      0
      Citation                 2               2       0            2       0
      Communication            3               3       0            2       1
      P2P                      9               8       1            7       1
      Product co-Purchasing    5               5       0            5       0
      Word-association         1               1       0            1       0
  15. Directed iterative fringe upper bound (difub)
      Recall that the trivial algorithm runs a fbfs from every node and returns the maximum eccF found (or a bbfs from every node and returns the maximum eccB found).
      difub is just a special case in which we:
      specify the order in which the bfses have to be executed;
      refine a lower bound, that is, the maximum eccF or eccB found so far;
      upper bound the eccentricities of the remaining nodes;
      stop when the remaining nodes cannot have eccentricity higher than the current lower bound.
  16. Finding a good order for bfses
      1. Find a starting node u:
         highest out-degree node
         highest in-degree node
         "central" node
      2. Find a good order by analyzing how nodes are placed in the fbfs or bbfs tree of u.
  17. Finding a "central" node with the heuristic 2-dSweep
      1. Run a forward bfs from a random node r: let a1 be the farthest node.
      2. Run a backward bfs from a1: let b1 be the farthest node.
      3. Run a backward bfs from r: let a2 be the farthest node.
      4. Run a forward bfs from a2: let b2 be the farthest node.
      5. If eccB(a1) > eccF(a2), then set u as the middle node between a1 and b1 and the lower bound ℓ equal to eccB(a1). Otherwise, set u as the middle node between a2 and b2 and the lower bound ℓ = eccF(a2).
      [Figures: nodes r, a1, b1 and nodes r, a2, b2]
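Step 5 needs the "middle node" of a shortest path, which a BFS with parent pointers provides. The sketch below is one possible reading: the slides do not say how to round when the path has an odd number of arcs, so taking the node at position floor(length / 2) is an assumption, as are the names `bfs_parents`, `shortest_path`, `middle_node` and the encoding `GRAPH` of the example. The pair (v10, v12) is the diametral pair quoted on slide 7.

```python
from collections import deque

# Assumed encoding of the 12-node example: node i stands for v_i.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def bfs_parents(adj, source):
    """BFS tree of source as a child -> parent map."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in parent:
                parent[v] = u
                queue.append(v)
    return parent

def shortest_path(adj, src, dst):
    """One shortest path from src to dst (dst must be reachable)."""
    parent = bfs_parents(adj, src)
    path = [dst]
    while path[-1] != src:
        path.append(parent[path[-1]])
    return path[::-1]

def middle_node(adj, src, dst):
    """Node halfway along a shortest path (rounding down: an assumption)."""
    path = shortest_path(adj, src, dst)
    return path[(len(path) - 1) // 2]

path = shortest_path(GRAPH, 10, 12)
print(path, middle_node(GRAPH, 10, 12))
```

Which shortest path is found, and hence which middle node, depends on the order in which neighbours are scanned; any node halfway along some shortest path serves the heuristic equally well.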
  18. Main idea for bounding eccentricities
      Theorem. For any integer i with $1 < i \le \mathrm{ecc}_B(u)$, for any integer k with $1 \le k < i$, and for any node $x \in F^B_{i-k}(u)$ such that $\mathrm{ecc}_F(x) > 2(i - 1)$, there exists $y \in F^F_j(u)$, for some $j \ge i$, such that $d(x, y) = \mathrm{ecc}_F(x)$.
      [Figure: the bbfs and fbfs trees of u, with x above level i and y at level j]
      If the forward eccentricity of x is > 2(i − 1), the node y such that d(x, y) > 2(i − 1) is at level i or below in the fbfs tree of u.
      Analogously:
      Theorem. For any integer i with $1 < i \le \mathrm{ecc}_F(u)$, for any integer k with $1 \le k < i$, and for any node $x \in F^F_{i-k}(u)$ such that $\mathrm{ecc}_B(x) > 2(i - 1)$, there exists $y \in F^B_j(u)$, for some $j \ge i$, such that $d(y, x) = \mathrm{ecc}_B(x)$.
  19. What the theorems say:
      1. For each node x above level i in bbfs(u) with eccF(x) > 2(i − 1), there must be a corresponding node y on or below level i in fbfs(u), with eccB(y) ≥ eccF(x).
      2. For each node t above level i in fbfs(u) with eccB(t) > 2(i − 1), there must be a corresponding node z on or below level i in bbfs(u), with eccF(z) ≥ eccB(t).
      [Figure: the trees of u with level i and the nodes x, y, z, t]
      The theorems above suggest the following algorithm:
      1. Perform a forward and a backward bfs from a node u, and visit the trees fbfs(u) and bbfs(u) bottom-up.
      2. For each level i, compute the eccentricities of all nodes at level i. At this point we have all the eccB of the nodes y and the eccF of the nodes z on or below level i. Let the lower bound $\ell_i$ be the current maximum.
      3. If $\ell_i$ is already bigger than 2(i − 1), then no node still to be examined can have eccF or eccB bigger than $\ell_i$: stop and output $\ell_i$ as the diameter!
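The three-step algorithm above can be sketched as a loop over the levels, from the deepest one upwards. This is an illustrative reading of the slide, not the authors' implementation: `GRAPH` is an assumed encoding of the 12-node example (arcs from the distance-1 entries of the distance matrix), the start node `u` is given instead of being chosen by 2-dSweep, and `difub` returns the diameter together with the number of BFSes it performed.

```python
# Assumed encoding of the 12-node example: node i stands for v_i.
GRAPH = {
    1: [3, 5, 11], 2: [1, 4], 3: [2, 7], 4: [1, 7],
    5: [3, 9], 6: [4], 7: [6, 8], 8: [5, 10],
    9: [4, 8], 10: [8], 11: [12], 12: [1],
}

def reverse(adj):
    """Graph with every arc reversed."""
    rev = {u: [] for u in adj}
    for u, outs in adj.items():
        for v in outs:
            rev[v].append(u)
    return rev

def fringes(adj, source):
    """Level-by-level BFS: entry i is the set of nodes at distance i."""
    levels, seen = [{source}], {source}
    while True:
        nxt = {v for u in levels[-1] for v in adj[u] if v not in seen}
        if not nxt:
            return levels
        seen |= nxt
        levels.append(nxt)

def ecc(adj, source):
    """Eccentricity of source = depth of its BFS tree."""
    return len(fringes(adj, source)) - 1

def difub(adj, u):
    """Diameter of a strongly connected digraph and the BFS count."""
    rev = reverse(adj)
    fwd, bwd = fringes(adj, u), fringes(rev, u)   # fbfs(u) and bbfs(u)
    visits = 2
    i = max(len(fwd), len(bwd)) - 1               # deepest level
    lb, ub = i, 2 * i                             # lb <= D <= ub
    while ub > lb:
        for y in (fwd[i] if i < len(fwd) else ()):
            lb = max(lb, ecc(rev, y))             # eccB of level-i nodes in fbfs(u)
            visits += 1
        for z in (bwd[i] if i < len(bwd) else ()):
            lb = max(lb, ecc(adj, z))             # eccF of level-i nodes in bbfs(u)
            visits += 1
        if lb > 2 * (i - 1):
            return lb, visits                     # nodes above level i cannot do better
        ub = 2 * (i - 1)
        i -= 1
    return lb, visits

print(difub(GRAPH, 1))
```

Starting from v1 the loop stops after level 4, having run 5 BFSes in total instead of the 24 needed to compute every eccF and eccB. The stopping test relies on the fact that two nodes both above level i are joined through u by a path of length at most 2(i − 1).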
  20. Upper bound: experiments (snap.stanford.edu dataset)
      Network name          n        m         Avg. visits   Visits worst run
      Wiki-Vote             1300     39456     17            17
      p2p-Gnutella08        2068     9313      45.9          64
      p2p-Gnutella09        2624     10776     202.1         230
      p2p-Gnutella06        3226     13589     236.6         279
      p2p-Gnutella05        3234     13453     60.4          94
      p2p-Gnutella04        4317     18742     36.7          38
      p2p-Gnutella25        5153     17695     85.1          161
      p2p-Gnutella24        6352     22928     13            13
      p2p-Gnutella30        8490     31706     255.4         516
      p2p-Gnutella31        14149    50916     208.7         255
      s.s.Slashdot081106    26996    337351    22.3          25
      s.s.Slashdot090216    27222    342747    21.5          26
      s.s.Slashdot090221    27382    346652    22.8          26
      soc-Epinions1         32223    443506    6.1           7
      Email-EuAll           34203    151930    6             6
      soc-sign-epinions     41441    693737    6             6
      web-NotreDame         53968    304685    7             7
      Slashdot0811          70355    888662    40            40
      Slashdot0902          71307    912381    32.9          40
      WikiTalk              111881   1477893   13.6          19
      web-Stanford          150532   1576314   6             6
      web-BerkStan          334857   4523232   7             7
      web-Google            434818   3419124   9.4           10
  21. Upper bound: experiments (webgraph.dsi.unimi.it dataset)
      Network name            n          m           Avg. visits   Visits worst run
      wordassociation-2011    4845       61567       412.5         423
      enron                   8271       147353      19            22
      uk-2007-05@100000       53856      1683102     14            14
      cnr-2000                112023     1646332     17            17
      uk-2007-05@1000000      480913     22057738    6             6
      in-2004                 593687     7827263     14            14
      amazon-2008             627646     4706251     136.3         598
      eu-2005                 752725     17933415    6             6
      indochina-2004          3806327    98815195    8             8
      uk-2002                 12090163   232137936   6             6
      arabic-2005             15177163   473619298   58            58
      uk-2005                 25711307   704151756   170           170
      it-2004                 29855421   938694394   87            87
  22. Experiments for directed graphs
      [Plot: number of visits (1 to 100000) versus number of nodes (100 to 1e+008), both on logarithmic axes; series: diFUBHdOut, diFUBHdIn, diFUB+2dSweepHdOut, diFUB+2SweepHdIn]
  23. Experiments for directed graphs
      Performance gain increases (exponentially) with graph size.
      For big graphs (≥ 10,000 nodes), ≤ 0.001n visits (instead of n).
      Is the number of visits performed asymptotically constant?
  24. Experiments for undirected graphs
      [Plot: number of visits (1 to 100000) versus number of nodes (100 to 1e+007), both on logarithmic axes; series: iFUBHd, iFUB+2SweepHd]
      The undirected version of difub (called ifub) computed the diameter of Facebook with just 17 bfses.
  25. Bad cases for difub and ifub
      Cases in which nodes have close eccentricities.
      1. A cycle.
      All nodes have eccentricity equal to the diameter D.
      D/2 + 1 iterations will always be executed.
  26. Bad cases for difub and ifub
      Cases in which nodes have close eccentricities.
      2. Special regular graphs (such as Moore graphs).
      All nodes have eccentricity equal to the diameter D.
      D/2 + 1 iterations will always be executed.
  27. Future work?
      2-Sweep (both directed and undirected) seems effective in finding tight lower bounds (except for road networks). Why? It is known that in chordal graphs the error of 2-Sweep can be at most 1. Do real-world networks (except for road networks) have some property that may be related to some sort of chordality measure?
      Understand why the difub and ifub methods work so well in general. Might this be related to the distribution of eccentricities?
      Design faster (external memory / parallel) implementations of bfs.
