Two Unrelated Talks

  1. Two unrelated talks
       • Local computation of PageRank: the ranking side
       • psort, yet another fast stable external sorting software
     Marco Bressan, January 30, 2012
  2. Outline
       1. Local computation of PageRank: the ranking side
            • Introduction
            • Motivations
            • Local ranking in theory
            • Local ranking in practice
            • Conclusions
       2. psort, yet another fast stable external sorting software
            • Introduction
            • Making sorting a complicated task
            • Inside psort
            • Conclusions
       3. Conclusions
  3. Local computation of PageRank: the ranking side
  5. Ranking robustly
     Rank a graph's nodes according to:
       1. the graph
       2. external factors
            • (varying) parameters
            • graph availability
            • ...
     Is ranking robust? How is ranking influenced by external factors?
  8. PageRank
     PageRank of node v:
       $P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$
     where n = |G|, α is the damping factor and o(u) is the out-degree of u.
     Applications: web search, web crawling, web spam detection, personalized web search, social network mining, ranking in databases, structural re-ranking, opinion mining, word sense disambiguation, credit and reputation systems, bibliometrics, gene ranking, ...
     Among the top data mining algorithms: Wu et al. Top 10 algorithms in data mining. Knowl. and Inform. Systems, 2007.
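The formula can be evaluated by power iteration: start from the uniform vector and apply the right-hand side until the scores stop changing. The following is a minimal Python sketch, assuming the graph is a dict mapping each node to the list of its out-neighbours and that every node has at least one out-link (dangling nodes would need extra handling). The function name, `tol` and `max_iter` are illustrative choices, not taken from the talk.

```python
def pagerank(out_links, alpha=0.85, tol=1e-12, max_iter=1000):
    """Power iteration for P(v) = alpha * sum_{u->v} P(u)/o(u) + (1 - alpha)/n.

    out_links maps every node to the list of its successors; every node must
    have at least one out-link and every successor must itself be a key.
    """
    nodes = list(out_links)
    n = len(nodes)
    p = {v: 1.0 / n for v in nodes}                    # uniform starting vector
    for _ in range(max_iter):
        nxt = {v: (1.0 - alpha) / n for v in nodes}    # teleportation term
        for u, succs in out_links.items():
            if not succs:
                raise ValueError(f"dangling node {u!r} is not supported")
            share = alpha * p[u] / len(succs)          # alpha * P(u) / o(u)
            for v in succs:
                nxt[v] += share
        delta = sum(abs(nxt[v] - p[v]) for v in nodes)  # L1 change this round
        p = nxt
        if delta < tol:
            break
    return p


print(pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]}))
```

On this toy graph with α = 0.85 the scores come out at approximately 0.397, 0.388 and 0.215 for a, b and c. Because the error shrinks by a factor of α per round, smaller damping factors converge faster.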
  10. Choose the damping, choose the ranking?
      Is PageRank's ranking, $P(v) = \alpha \sum_{u \to v} \frac{P(u)}{o(u)} + \frac{1-\alpha}{n}$, robust to small variations in α?
      Results
        1. not robust in theory (permutation theorem, reversal theorem)
        2. novel tools for checking robustness (lineage analysis)
        3. somewhat robust in real-world graphs (experiments)
      Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? J. Discrete Algorithms 8(2): 199-213 (2010).
      Marco Bressan, Enoch Peserico. Choose the damping, choose the ranking? Proc. of WAW 2009: 76-89.
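One simple way to quantify whether a ranking survives a change in α is to count the node pairs that two score vectors order differently. This count is the numerator of Kendall's tau distance. The sketch below assumes the scores for the two damping factors have already been computed and are given as dicts over the same nodes. It is a generic comparison, not the lineage analysis listed in the results, and the example scores are hypothetical.

```python
from itertools import combinations


def discordant_pairs(scores_a, scores_b):
    """Count node pairs ordered one way by scores_a and the opposite way by scores_b.

    Pairs tied under either score vector are not counted.
    """
    count = 0
    for u, v in combinations(scores_a, 2):
        diff_a = scores_a[u] - scores_a[v]
        diff_b = scores_b[u] - scores_b[v]
        if diff_a * diff_b < 0:  # strictly opposite orders
            count += 1
    return count


# Hypothetical scores of three nodes under two nearby damping factors
low = {"x": 0.50, "y": 0.30, "z": 0.20}
high = {"x": 0.40, "y": 0.45, "z": 0.15}
print(discordant_pairs(low, high))  # 1: only the pair (x, y) swaps
```

The quadratic loop is fine for small node sets. For large graphs, a merge-sort-based inversion count does the same job in O(n log n).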
  13. Is it possible to compute the rank locally?
      [Figure, panels "Local computation" and "Ranking": a graph whose nodes have PageRank scores 0.3, 0.25, 0.2, 0.15 and 0.1, i.e. ranks 1st to 5th; two nodes, u and v, are highlighted.]
      In many applications only the rank matters!
      Is it possible to compute the rank locally?
        • stated by Chen et al. (CIKM 2004)
        • restated by Bar-Yossef and Mashiach (CIKM 2008)
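To make "only the rank matters" concrete, the sketch below turns scores into ranks using the five scores from the figure. The node names a to e are hypothetical labels, since the figure names only u and v. It also shows that any order-preserving change to the scores leaves the ranking untouched.

```python
def ranks(scores):
    """Map each node to its 1-based rank by decreasing score (ties keep input order)."""
    order = sorted(scores, key=scores.get, reverse=True)  # reverse=True keeps sort stable
    return {node: position for position, node in enumerate(order, start=1)}


scores = {"a": 0.15, "b": 0.30, "c": 0.10, "d": 0.20, "e": 0.25}
print(ranks(scores))  # {'b': 1, 'e': 2, 'd': 3, 'a': 4, 'c': 5}

# Scaling every score by the same positive factor does not change the ranking
scaled = {node: 10 * s for node, s in scores.items()}
print(ranks(scaled) == ranks(scores))  # True
```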
  17. Motivating examples (I): crawling
      The visited graph expands starting from seed nodes. Which red (frontier) nodes should be visited now, and in what order?
      Order the nodes with PageRank!
      Cho et al. Efficient crawling through URL ordering. Computer Networks, 1998.
      Is it possible to rank the red frontier at low cost, without visiting the whole crawled graph?
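As a baseline for "order the nodes with PageRank", the sketch below computes PageRank on the graph discovered so far and sorts the frontier by it. It is written in the spirit of Cho et al., not as their exact method. It assumes the known graph is a dict from visited nodes to their out-neighbours. Nodes with no known out-links, such as frontier nodes, are treated as dangling, and their score is spread uniformly. That convention and the name `order_frontier` are assumptions made for illustration.

```python
def order_frontier(known_links, frontier, alpha=0.85, iterations=100):
    """Order frontier nodes by PageRank computed on the graph discovered so far.

    known_links: dict mapping each visited node to its list of out-neighbours.
    Nodes without known out-links (frontier nodes, for instance) are treated as
    dangling: their score is spread uniformly over all known nodes.
    """
    nodes = set(known_links) | set(frontier)
    for succs in known_links.values():
        nodes.update(succs)
    n = len(nodes)
    p = dict.fromkeys(nodes, 1.0 / n)
    for _ in range(iterations):
        dangling = sum(p[u] for u in nodes if not known_links.get(u))
        base = (1 - alpha) / n + alpha * dangling / n  # teleport + dangling mass
        nxt = dict.fromkeys(nodes, base)
        for u, succs in known_links.items():
            for v in succs:
                nxt[v] += alpha * p[u] / len(succs)
        p = nxt
    return sorted(frontier, key=p.get, reverse=True)


known = {"s": ["a", "x"], "a": ["x", "y"]}  # visited: s, a; frontier: x, y
print(order_frontier(known, ["y", "x"]))    # ['x', 'y']
```

This baseline reads the whole known graph on every call. That is exactly the cost the question on the slide asks whether we can avoid.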
  22. Motivating examples (II): ranking with competitors
      Retrieve the graph structure using, e.g., Google's link: operator.
      Bar-Yossef and Mashiach. Local approximation of PageRank and reverse PageRank. Proc. ACM CIKM, 2008.
      Is it possible to compute this rank efficiently, using few queries?
  27. Motivating examples (III): social network mining
      Rank key users in social networks.
      Heidemann et al. Identifying key users in online social networks: A PageRank based approach. Proc. ICIS, 2010.
      The full graph is not available (privacy settings).
      Is it still possible to guarantee a correct output ranking?
  28. Formal definition of the problem
      Input
        • graph G of size n
        • target nodes v1, ..., vk
        • score separation ε > 0
      Output
        • ranking of {v1, v2, ..., vk}
        • if $(1-\epsilon) P(v_i) < P(v_j) < (1+\epsilon) P(v_i)$, any ranking of {vi, vj} is valid
      Cost model
        • computation is free
        • but visiting G costs (a query to the link server)
        • cost of ranking = |queries| = |nodes visited|
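The output condition and the cost model can be made concrete in a few lines. The sketch below checks an output ranking against true scores under ε-separation, taking the node listed first as $v_i$. It also models the link server as a hypothetical object that answers in-link queries and charges one unit per distinct node visited. Returning in-links rather than out-links is an assumption, as are the names `is_valid_ranking` and `CountingLinkServer`.

```python
def is_valid_ranking(ranking, scores, eps):
    """Check an output ranking (best first) against the true scores P(.).

    A pair may appear in either order when
    (1 - eps) * P(vi) < P(vj) < (1 + eps) * P(vi), taking vi as the node listed
    first; otherwise the node with the higher score must come first.
    """
    for i, first in enumerate(ranking):
        for second in ranking[i + 1:]:
            p_i, p_j = scores[first], scores[second]
            separated = not ((1 - eps) * p_i < p_j < (1 + eps) * p_i)
            if separated and p_j > p_i:
                return False
    return True


class CountingLinkServer:
    """Hypothetical link server that charges one unit per distinct node queried."""

    def __init__(self, in_links):
        self._in_links = in_links
        self.visited = set()

    def query(self, node):
        self.visited.add(node)
        return list(self._in_links.get(node, ()))

    @property
    def cost(self):
        return len(self.visited)  # cost = |queries| = |nodes visited|


scores = {"a": 1.0, "b": 0.98, "c": 0.5}
print(is_valid_ranking(["b", "a", "c"], scores, eps=0.05))  # True: a, b within eps
print(is_valid_ranking(["a", "c", "b"], scores, eps=0.05))  # False: c before b
```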
  30. Is it possible to compute the rank locally? Our contribution: NO!
      NO in theory: lower bounds
        1. Every deterministic local ranking algorithm has an adversarial graph forcing Ω(n) queries (and the bound can be tightened).
        2. Every randomized local ranking algorithm has an adversarial graph forcing Ω(n) queries,
      even to rank the top k nodes, even if their scores are highly separated!
      ⟹ a general low-cost local ranking algorithm does not exist.
  31. Is it possible to compute the rank locally? Our contribution: NO!
      NO in practice: experimental results
        1. Real web/social graphs behave like worst-case input instances for local ranking.
        2. Approximating is not trivial: state-of-the-art local score approximation algorithms do not turn into low-cost local rank approximation algorithms.
  36. Lower bounds (I): deterministic algorithms
      Every deterministic algorithm has an adversarial graph forcing cost Ω(n), in fact $n(1 - O(\epsilon k))$.
      Theorem 1 (paper Thm. 4). Choose integers k > 1 and $n_0 \ge k^2$, a damping factor $\alpha \in (0, 1)$, and $\epsilon \le \frac{\alpha^2}{20k}$. For any deterministic local algorithm A there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \dots, v_{k-1}$ are ε-separated and, to compute their relative ranking according to $P_\alpha(\cdot)$, algorithm A performs Ω(n) queries, in fact $n(1 - O(\epsilon k))$ queries.
  38. Lower bounds (II): randomized algorithms
      Every randomized (Las Vegas or Monte Carlo) algorithm has an adversarial graph forcing cost $\Omega(\alpha \sqrt{n/\epsilon})$, which can reach Ω(n).
      [Figure: algorithm A receives target nodes v1, v2, ..., v20, queries a link server for a random graph G with 10^9 nodes, and outputs a ranking [v3 v10 ... v7]; about 10^4.5 queries, up to 10^8.]
      Theorem 2 (paper Thm. 3). Choose k > 1, $n_0 \ge 6k^3$, a damping factor $\alpha \in (0, 1)$, and $\frac{\alpha^2 k^2}{4 n_0} \le \epsilon \le \frac{\alpha^2}{24k}$. Then
        1. for any Las Vegas local algorithm A, and
        2. for any Monte Carlo local algorithm A with constant confidence,
      there exists a graph of size $n \in \Theta(n_0)$ where the top k nodes $v_0, \dots, v_{k-1}$ are ε-separated and, to compute their relative ranking, A performs in expectation $\Omega(\alpha \sqrt{n/\epsilon})$ queries.
  39. What happens in practice?
      Two experiments
        1. Hardness of real-world graphs: compute the minimum number of nodes that an algorithm must visit to always guarantee a correct ranking.
        2. Performance of approximation algorithms: evaluate the cost and accuracy of local ranking algorithms derived from state-of-the-art local score approximation algorithms.
      Datasets (publicly available from LAW, Univ. Milan, http://law.dsi.unimi.it)
        dataset       nodes   arcs    crawled
        .it           40M     1150M   2004
        LiveJournal   5M      79M     2008
  41. Exp. 1: hardness of real-world graphs (1/2)
      Breakdown of a local ranking algorithm:
        1. Visit ancestors. Thm.: it must visit at least |minset(G, u, v)| ancestors.
        2. Compute the ranking. Thm.: it must agree with the natural PageRank score approximation.
      Hence |minset(G, u, v)| ≤ cost of ranking u, v in graph G.
  42. Exp. 1: hardness of real-world graphs (2/2)
      [Plot: average number of visited nodes (log scale) versus ε (2.56, 1.28, 0.64, ..., 0.01) for the .it web graph and the LiveJournal graph.]
  45. Exp. 2: performance of approximation algorithms
      Improved variant of the pruned brute-force algorithm: limit the PageRank computation to ancestors giving a high contribution.
      [Figure: ancestors of v annotated with their contribution to v's score (35%, 24%, 17%, 10%, and several below 10%); pruning threshold = 10%.]
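The pruning idea can be sketched as a backward walk over ancestors that only expands nodes whose contribution is large enough. For a graph without dangling nodes, $P(v)$ is proportional to the sum, over all nodes u, of the weights of the paths u → ... → v, where each arc w → x contributes a factor $\alpha / o(w)$. The sketch uses that path weight as the "contribution". It is an illustration of threshold pruning, not necessarily the exact variant evaluated in the talk. `pruned_ancestor_walk`, `max_depth` and the dict inputs are hypothetical. In the local setting, each expanded node costs one link-server query, and out-degrees are assumed to be known.

```python
from collections import defaultdict


def pruned_ancestor_walk(in_links, out_degree, target, alpha=0.85,
                         threshold=0.1, max_depth=30):
    """Explore ancestors of `target`, expanding only high-contribution nodes.

    weight[u] accumulates, over paths u -> ... -> target, the product of
    alpha / o(w) over the arcs w -> x of the path. A node's in-links are
    queried only if the weight reaching it at the current path length is at
    least `threshold`.
    """
    weight = defaultdict(float)
    queried = set()
    frontier = {target: 1.0}  # weights of paths of the current length
    for _ in range(max_depth):
        nxt = defaultdict(float)
        for node, w in frontier.items():
            weight[node] += w
            if w < threshold:
                continue  # prune: contribution too small, do not expand
            queried.add(node)  # one query to the link server
            for parent in in_links.get(node, []):
                nxt[parent] += w * alpha / out_degree[parent]
        if not nxt:
            break
        frontier = nxt
    return dict(weight), queried


# Chain a -> b -> v, every node with out-degree 1
weights, queried = pruned_ancestor_walk(
    {"v": ["b"], "b": ["a"]}, {"a": 1, "b": 1}, "v", threshold=0.8)
print(sorted(queried))  # ['b', 'v']: a's weight 0.85**2 is below 0.8
```

Lowering the threshold trades more queries for a more accurate score. Sweeping that trade-off is the knob behind the pruning-threshold axis in the following plots.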
  46. Exp. 2: performance of approximation algorithms (.it web graph)
      [Plot: average cost (log scale) versus pruning threshold (10^-1 to 10^-7), one curve per ε range from (0.01, 0.02) to (2.56, 5.12).]
  47. Exp. 2: performance of approximation algorithms (LiveJournal graph)
      [Plot: average cost (log scale) versus pruning threshold (10^-1 to 10^-7), one curve per ε range from (0.01, 0.02) to (2.56, 5.12).]
  48. Exp. 2: performance of approximation algorithms (.it web graph)
      [Plot: fraction of correctly ranked node pairs versus pruning threshold (10^-1 to 10^-7), one curve per ε range from (0.01, 0.02) to (2.56, 5.12).]
  49. Exp. 2: performance of approximation algorithms (LiveJournal graph)
      [Plot: fraction of correctly ranked node pairs versus pruning threshold (10^-1 to 10^-7), one curve per ε range from (0.01, 0.02) to (2.56, 5.12).]
  50. Conclusions
        1. Local computation of the PageRank ranking is infeasible.
        2. The cost of exact local ranking algorithms is bounded from below by minsets.
        3. The tested real web/social graphs are near worst-case.
        4. And approximation is not trivial.
      Marco Bressan, Luca Pretto. Local computation of PageRank: the ranking side. Proc. of CIKM 2011: 631-640.
  51. psort, yet another fast stable external sorting software
  53. In a nutshell
      The psort sorting library
        • written in C++
        • handles large datasets (over a terabyte)
        • stable sorting
        • fast
        • designed for PC-class machines
      Ideal applications of psort
        • sorting large databases
        • sorting large log files
        • sorting on commodity machines
        • ...
  54. psort and the Sort Benchmark (1/2)
      The PennySort Benchmark: sort what you can in 0.01 USD of computing time.
      [Chart: yearly PennySort record (Sort Benchmark), in GB, for 1998, 1999, 2000, 2002, 2003, 2007, 2008, 2009 and 2011; psort's result is marked. Source: http://sortbenchmark.org]
      Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16 (2011).
  55. psort and the Sort Benchmark (2/2)
      The Datamation Benchmark: sort 100 MB disk-to-disk as fast as you can.
      [Chart: Datamation times for thunder (1987), NOW-sort (2001) and psort (2011); the values 980 s and 440 ms appear on the chart.]
      Paolo Bertasi, Michele Bonazza, Marco Bressan, Enoch Peserico. Datamation: A Quarter of a Century and Four Orders of Magnitude Later. CLUSTER 2011: 605-609.
  56. psort and the STXXL library
      [Plot: sort speed (MB/s) versus sort size (MB, log scale) for stxxl on disks, stxxl on RAID and psort on RAID, each with parameters (8,8), (8,32) and (8,128).]
  57. Machine budget for Sort Benchmark 2011
        • RAM: 60 EUR
        • motherboard: 47 EUR
        • CPU: 38 EUR
        • case: 22 EUR
        • power supply unit: 15 EUR
        • assembly fee: 35 EUR
        • hard disks: 215 EUR
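The budget determines how much time a PennySort run may take. The sketch below assumes the Sort Benchmark rule that the system price is depreciated over three years (94,608,000 seconds). Under that rule, one penny buys that many seconds divided by the price expressed in pennies. The benchmark prices systems in US dollars, so the exchange rate here is a hypothetical placeholder to be replaced with the actual one.

```python
# The seven amounts (EUR) shown on the budget slide
PRICES_EUR = [60, 47, 38, 22, 15, 35, 215]

# Assumed PennySort rule: the system price is depreciated over three years
SECONDS_IN_3_YEARS = 3 * 365 * 24 * 3600  # 94,608,000 s


def pennysort_time_budget(system_price, penny=0.01):
    """Seconds of run time that one penny buys for a system costing system_price."""
    return SECONDS_IN_3_YEARS * penny / system_price


total_eur = sum(PRICES_EUR)
print(total_eur)  # 432
usd_per_eur = 1.0  # hypothetical placeholder exchange rate
print(pennysort_time_budget(total_eur * usd_per_eur))  # about 2190 s at this rate
```

A cheaper machine gets proportionally more time. This is why every euro in the budget matters as much as raw speed.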
  58. The big picture
      psort execution diagram
        • CPU/cache: 1 MB, 10 GB/s
        • main memory: 1 GB, 3 GB/s
        • external memory: 1 TB, 0.7 GB/s
      Phases shown over time: mergesort, heap merge, heap merge; 1st disk pass, 2nd disk pass.
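The diagram suggests the classic two-pass external merge sort. The first pass sorts memory-sized runs and writes them to disk. The second pass merges all runs with a heap. The minimal Python sketch below illustrates only that general idea, not psort's C++ implementation. It assumes records are text lines without embedded newlines and that `run_size` records fit in memory. For brevity it collects the merged output in a list, where a real sorter would stream it back to disk. `external_sort` is a hypothetical name.

```python
import heapq
import os
import tempfile
from itertools import islice


def external_sort(records, run_size, key=lambda r: r):
    """Stable two-pass external sort sketch: sorted runs on disk, then a heap merge."""
    if run_size < 1:
        raise ValueError("run_size must be positive")
    run_paths = []
    it = iter(records)
    try:
        # 1st pass: cut the input into runs, sort each in memory, spill to disk
        while True:
            chunk = list(islice(it, run_size))
            if not chunk:
                break
            chunk.sort(key=key)  # list.sort is stable
            fd, path = tempfile.mkstemp(text=True)
            run_paths.append(path)
            with os.fdopen(fd, "w") as f:
                f.writelines(rec + "\n" for rec in chunk)
        # 2nd pass: k-way merge of all runs through a heap
        files = [open(p) for p in run_paths]
        try:
            runs = [(line.rstrip("\n") for line in f) for f in files]
            # heapq.merge breaks ties by iterable order; runs are listed in input
            # order, so records with equal keys keep their original order
            return list(heapq.merge(*runs, key=key))
        finally:
            for f in files:
                f.close()
    finally:
        for p in run_paths:
            os.remove(p)


print(external_sort(["b1", "a1", "b2", "a2", "c"], run_size=2, key=lambda s: s[0]))
# ['a1', 'a2', 'b1', 'b2', 'c']
```

Stability comes for free here: each run is sorted stably, and the merge prefers earlier runs on ties.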
  59. The big picture - now complicated
      Hardware/software details you must deal with:
        • I/O: HDD quality, file system, scheduling, buffer size, direct transfer, data placement
        • memory: size, bandwidth, latency, page size, access pattern, conflicts
        • cache: size, speed, line size, associativity
  60. Hard disks
      The speed curve of 13 "identical" WD1600JS disks
      [Plot: bandwidth (MB/s) versus distance from the outer rim (GB).]
  61. Memory 33/43
  Why main memory is not really a RAM
  [Plot: bandwidth (in GB/s, up to 4.5) vs. struct size (in bytes, 2^0 to 2^18) for sequential read, random read, sequential write and random write; the L2 cache line size is marked.]
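One way to read this plot (an interpretation, not stated on the slide): memory is transferred a whole cache line at a time, so a random read of a struct smaller than a line still pays for the full line. The C++ sketch below, with hypothetical helpers `lines_touched` and `useful_fraction` and an assumed 64-byte line in the example, computes what fraction of the transferred bytes is actually used.

```cpp
#include <cassert>
#include <cstddef>

// Number of cache lines touched when reading one struct of `struct_size`
// bytes that starts `offset` bytes past a line boundary.
std::size_t lines_touched(std::size_t struct_size, std::size_t offset,
                          std::size_t line_size) {
    assert(struct_size > 0 && line_size > 0);
    std::size_t first = offset / line_size;
    std::size_t last = (offset + struct_size - 1) / line_size;
    return last - first + 1;
}

// Fraction of the bytes moved from memory that the program actually uses
// when it reads one line-aligned struct (offset 0) at a random position.
double useful_fraction(std::size_t struct_size, std::size_t line_size) {
    std::size_t moved = lines_touched(struct_size, 0, line_size) * line_size;
    return static_cast<double>(struct_size) / static_cast<double>(moved);
}
```

With 64-byte lines, a random 4-byte read uses only 1/16 of the traffic it causes, while a sequential scan eventually uses every byte of each line it brings in.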
  62. CPU 34/43
  Is a dual-core always worth its price?
  [Plot: bandwidth (in MB/s, up to 3e+10) vs. log2( bytes visited ) from 16 to 30, for Intel dual core read/write and AMD single core read/write.]
  67. A list of psort’s tricks 35/43
   • general: fast polling, payload detachment, key pre/post processing, ...
   • disk access: O_DIRECT, independent disks, uniform fetching, ...
   • mergesort: smart merging, quasi-in-place, special base case, ...
   • heapsort: key caching, key offsetting, payload interleaving, ...
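The slide only names the tricks. As one possible reading of “payload detachment” (an assumption, not a description of psort's code), the C++ sketch below sorts small (key, index) pairs and moves each bulky record only once, at the end. `Record`, its field sizes and `sort_detached` are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record: a short sort key plus a bulky payload (sizes are illustrative).
struct Record {
    std::uint64_t key;
    std::array<char, 90> payload;
};

// Sort by moving only small (key, index) pairs, then gather the full
// records once. std::stable_sort keeps records with equal keys in input order.
std::vector<Record> sort_detached(const std::vector<Record>& in) {
    assert(in.size() <= UINT32_MAX);
    struct KeyRef { std::uint64_t key; std::uint32_t index; };
    std::vector<KeyRef> refs;
    refs.reserve(in.size());
    for (std::uint32_t i = 0; i < in.size(); ++i) refs.push_back({in[i].key, i});

    std::stable_sort(refs.begin(), refs.end(),
                     [](const KeyRef& a, const KeyRef& b) { return a.key < b.key; });

    std::vector<Record> out;
    out.reserve(in.size());
    for (const KeyRef& r : refs) out.push_back(in[r.index]);  // one payload copy per record
    return out;
}
```

The sort itself moves 16-byte pairs instead of 98-byte records; the price is one random-access gather pass at the end.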
  70. Smart merging (1/3) 36/43
  Naive merging:

      void merge(T *s1, T *s2, T *out, int size) {
          int i = 0, j = 0, k = 0;
          while ((i < size) & (j < size)) {
              if (s1[i] > s2[j]) {     // READ + READ
                  out[k] = s2[j];      // READ
                  j++;
              } else {
                  out[k] = s1[i];      // (READ)
                  i++;
              }
              k++;
          ...

  total mem READs per iteration: 3
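The slide elides the end of the naive merge. A complete, compilable version is sketched below; the name `naive_merge` and the tail-copy loops are additions of this sketch, written to match the slide's interface (two sorted runs of `size` elements each, merged into `out`).

```cpp
#include <cassert>

// Naive stable merge of s1[0..size) and s2[0..size) into out[0..2*size).
// Every iteration reads s1[i] and s2[j] from memory again.
template <typename T>
void naive_merge(const T* s1, const T* s2, T* out, int size) {
    assert(size >= 0);
    int i = 0, j = 0, k = 0;
    while ((i < size) & (j < size)) {  // '&' evaluates both tests (no short-circuit)
        if (s1[i] > s2[j]) {           // READ + READ
            out[k] = s2[j];            // READ
            j++;
        } else {
            out[k] = s1[i];            // (READ); ties taken from s1 keep the merge stable
            i++;
        }
        k++;
    }
    // Copy whatever is left in the run that was not exhausted.
    while (i < size) out[k++] = s1[i++];
    while (j < size) out[k++] = s2[j++];
}
```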
  72. Smart merging (2/3) 37/43
  Smart merging:

      void merge(T* s1, T* s2, T* out, int size) {
          int i = 0, j = 0, k = 0;
          T cache[2];                  // current head of each run
          cache[0] = s1[0];
          cache[1] = s2[0];
          while ((i < size) & (j < size)) {
              if (cache[0] > cache[1]) {
                  out[k] = cache[1];
                  j++;
                  cache[1] = s2[j];    // READ (next element of s2; past the end when j == size)
              } else {
                  out[k] = cache[0];
                  i++;
                  cache[0] = s1[i];    // (READ)
              }
              k++;
          ...

  total mem READs per iteration: 1
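A complete version of the smart merge, again a sketch with the hypothetical name `smart_merge`: the two run heads stay in a local `cache`, so each iteration loads one element from memory. Instead of relying on padding past the end of each run (which a tuned implementation might use; this is an assumption), the sketch refills the cache only while the run still has elements.

```cpp
#include <cassert>

// Stable merge of s1[0..size) and s2[0..size) into out[0..2*size), keeping
// the current head of each run in a two-element local cache.
template <typename T>
void smart_merge(const T* s1, const T* s2, T* out, int size) {
    assert(size > 0);
    int i = 0, j = 0, k = 0;
    T cache[2] = {s1[0], s2[0]};
    while ((i < size) & (j < size)) {
        if (cache[0] > cache[1]) {
            out[k++] = cache[1];
            if (++j < size) cache[1] = s2[j];  // READ, only if s2 has more elements
        } else {
            out[k++] = cache[0];               // ties taken from s1: stable
            if (++i < size) cache[0] = s1[i];  // (READ)
        }
    }
    // Copy the tail of the run that was not exhausted.
    while (i < size) out[k++] = s1[i++];
    while (j < size) out[k++] = s2[j++];
}
```

The bounds checks add a branch per iteration; a sentinel at the end of each run would remove it at the cost of one extra slot per run.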
  73. Smart merging (3/3) 38/43
  Time required to merge two sequences
  [Plot: time (in microseconds, 0 to 800000) vs. log2( merge size ) from 10 to 24, for smart merge and naive merge.]
  75. Quasi-in-place mergesort (1/3) 39/43
  Traditional mergesort:

      void mergesort(T* input, T* output, int size) {
          // size is a power of two; each pass merges pairs of sorted runs of length subsize
          for (int subsize = 1; subsize < size; subsize *= 2) {
              for (int j = 0; j < size; j += 2 * subsize)
                  merge(&input[j], &input[j + subsize], &output[j], subsize);
              T* tmp = input;          // swap input and output after each pass
              input = output;
              output = tmp;
          }
          // the sorted data is now in the buffer that input points to
      }

  extra space = N
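For reference, a self-contained version of the traditional scheme: a bottom-up mergesort that alternates between the input and a second buffer of the same size (hence extra space N). The names `merge_runs` and `pingpong_mergesort`, and the power-of-two size restriction, are assumptions of this sketch.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stable merge of a[0..n) and b[0..n) into out[0..2n).
template <typename T>
static void merge_runs(const T* a, const T* b, T* out, int n) {
    int i = 0, j = 0, k = 0;
    while (i < n && j < n) out[k++] = (a[i] > b[j]) ? b[j++] : a[i++];
    while (i < n) out[k++] = a[i++];
    while (j < n) out[k++] = b[j++];
}

// Bottom-up mergesort with a scratch buffer of the same size. Each pass merges
// adjacent runs of length `run` from src into dst, then the buffers swap roles.
template <typename T>
void pingpong_mergesort(std::vector<T>& data) {
    const int n = static_cast<int>(data.size());
    assert(n > 0 && (n & (n - 1)) == 0);  // power-of-two size only
    std::vector<T> scratch(n);
    T* src = data.data();
    T* dst = scratch.data();
    for (int run = 1; run < n; run *= 2) {
        for (int j = 0; j < n; j += 2 * run)
            merge_runs(&src[j], &src[j + run], &dst[j], run);
        T* tmp = src;  // swap input and output
        src = dst;
        dst = tmp;
    }
    if (src != data.data())  // odd number of passes: result is in scratch
        std::copy(src, src + n, data.data());
}
```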
  77. Quasi-in-place mergesort (2/3) 40/43
  “Quasi-in-place” mergesort:

      void mergesort(T* input, T* output, int size) {
          // size is a power of two; input is preceded by at least size/2 free slots
          for (int subsize = 1; subsize < size / 2; subsize *= 2) {
              for (int j = 0; j < size; j += 2 * subsize)
                  /* merge, overwriting the input vector subsize slots to the left */
                  merge(&input[j], &input[j + subsize], &input[j - subsize], subsize);
              input = &input[-subsize];   // shift input left
          }
          // finally merge into the output vector
          merge(input, &input[size/2], output, size/2);
      }

  extra space = N/2
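The quasi-in-place trick can be checked with a small runnable sketch. Here the caller places the n input elements after n/2 free slots; each pass writes its merged runs `run` slots to the left of where it read them. This is safe because the k-th write of a merge never lands on an element of the first run that has not been read yet. The names `merge_into` and `quasi_inplace_mergesort` are hypothetical, and n must be a power of two.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Stable merge of a[0..n) and b[0..n) into out[0..2n). out may start up to
// n slots before a: the k-th write goes to out + k, which stays strictly
// before the next unread element of a while the main loop runs.
template <typename T>
static void merge_into(const T* a, const T* b, T* out, int n) {
    int i = 0, j = 0, k = 0;
    while (i < n && j < n) out[k++] = (a[i] > b[j]) ? b[j++] : a[i++];
    while (i < n) out[k++] = a[i++];
    while (j < n) out[k++] = b[j++];
}

// Precondition (not checkable here): input[-(n/2 - 1)] .. input[-1] is writable.
template <typename T>
void quasi_inplace_mergesort(T* input, T* output, int n) {
    assert(n >= 2 && (n & (n - 1)) == 0);
    for (int run = 1; run < n / 2; run *= 2) {
        for (int j = 0; j < n; j += 2 * run)
            merge_into(&input[j], &input[j + run], &input[j - run], run);
        input -= run;  // the sorted runs now start run slots further left
    }
    // Last step: merge the two sorted halves into the separate output buffer.
    merge_into(input, &input[n / 2], output, n / 2);
}
```

With this layout the passes use at most n/2 - 1 free slots (1 + 2 + ... + n/4), and the last merge writes the two sorted halves into `output`.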
  78. Quasi-in-place mergesort (3/3) 41/43
  Average time required to compare two keys
  [Plot: time in relative units (0 to 4) vs. log2( input size in bytes ) from 10 to 24; the legend shows the quasi-in-place curve.]
  79. Conclusions 42/43
   1. Solving old problems really fast is still tricky
   2. To do it, you must match today’s hardware
   3. Solution: software engineering and tuning
  Paolo Bertasi, Marco Bressan, Enoch Peserico. psort, yet another fast stable sorting software. ACM Journal of Experimental Algorithmics 16 (2011).
  83. Conclusions 43/43
  Ranking
   1. Local computation of the PageRank ranking is infeasible in theory
   2. On tested web/social graphs, it is infeasible in practice too
   3. Rank analysis requires novel tools!
  Sorting
   1. Solving old problems really fast is still tricky
   2. To do it, you must match today’s hardware
   3. Software engineering and tuning are the way
  And of course now you should pay me twice! :-)

×