
- 1. Local Ranking Problem on the BrowseGraph. Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
- 2. Local Ranking Problem: “when the centrality-like ranks computed on a local graph differ from the ones on the global graph”. Related work: Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”; Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and reverse PageRank”; Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”.
- 3. The BrowseGraph: “a graph where nodes are webpages and edges are browsing transitions”. Construction: user navigation (e.g. Flickr), then user sessions, then the BrowseGraph.
- 4. Centrality Metrics applied to the BrowseGraph. Increasing popularity in recent years: Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”; Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”; Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”. They provide higher-quality rankings compared to standard hyperlink graphs: Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance.”
- 5. Local Ranking Problem on the BrowseGraph: why?
- 6. Local Ranking Problem on the BrowseGraph: why? Image Ranking in Flickr (SIGIR 2012): we compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank among others). How much could our rank vary if we had more information (i.e. nodes)?
- 7. BrowseGraph and ReferrerGraphs. ReferrerGraphs: domain-dependent BrowseGraphs. Construct different BrowseGraphs based on the referrer domain (e.g. Twitter ReferrerGraph, Facebook ReferrerGraph). Recommend news articles following the ReferrerGraphs. Can we rely on centrality-based algorithms to infer news importance?
- 8. Local Ranking Problem on the BrowseGraph. Study of the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment). How to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph. Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper).
- 9. Local Ranking Problem on the BrowseGraph. Yahoo News BrowseGraph: ~500M pageviews. Referrers: Social Networks, Search Engines, News Homepage. 1. Construct the BrowseGraph (our “global graph”). 2. Construct the ReferrerGraphs (our “local graphs”).
- 10. Subgraph Comparison. Very different dimensions. Very well connected (also Reddit, the smallest one).
- 11. Subgraph Comparison. Cross-distance: Kendall-tau among common nodes (min overlap 1k). In general the similarities are very low (<0.3), which suggests different content or different users’ interests. Search engines are the most similar (>0.5).
- 12. Growing Rings Experiment: study of the LRP on the BrowseGraph by incrementally expanding the local graph. 1. For each ReferrerGraph: 2. Compare the PageRank values with the global ones (Kendall-tau). 3. Expand with the next neighborhood of nodes. 4. Iterate until the Kendall-tau gets close to 1. Example: K(local+0, global) ~0.307; K(local+1, global) ~0.524; K(local+2, global) ~0.740; K(local+3, global) ~0.912.
- 13. Growing Rings Experiment. Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing). Same-size referrer-based (SRB): to measure the impact of the graph size. Random (R): 7 random graphs reflecting the size of the original RB graphs.
- 14. Growing Rings Experiment: ReferrerGraphs.
- 15. Growing Rings Experiment: ReferrerGraphs, same-size RGs, Random.
- 16. Growing Rings Experiment with Selection of Nodes. How does the expansion influence convergence if only a few of the most representative nodes are selected? Hypothesis 1: adding all the nodes means adding more information, therefore it should lead to a faster convergence (Boldi et al. [6] in the paper). Hypothesis 2: the most representative nodes bring less noise and therefore a quicker convergence (Cho et al. [13] in the paper).
- 17. Growing Rings Experiment with Selection of Nodes (plot legend: 5, 10, 30, 50, 100). Fewer, more representative nodes lead to a better estimation of PageRank values in the first iteration. In the long run, expansions with the highest number of nodes present the best convergence.
- 18. Growing Rings Expansion, also with Selected Nodes: ~1 or 2 steps can be enough to estimate the PageRank score of the global graph. Predicting Kendall-tau Distance: can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?
- 19. Predicting Kendall-tau Distance. Can we estimate the distance between the local and global PageRank considering only information available in the local graph? Hypothesis: some structural properties of the graph could be good proxies for the tau value difference between local and global ranks.
- 20. Predicting Kendall-tau Distance: Training Set Construction. ReferrerGraph (e.g. homepage); Jackknife resampling (1%, 5%, 10%, 20%); Kendall-tau distance between the ReferrerGraph and the reduced subgraphs.
- 21. Predicting Kendall-tau Distance: Extract Structural Properties of each Graph. We compute 62 structural graph metrics for each training instance. Size and Connectivity (S): basic statistics. Assortativity (A): tendency of nodes with a certain degree to be linked with nodes of similar degree. Degree (D): statistics on the degree distribution. Weighted degree (W): same as degree but considering the weight on edges (transitions). Local PageRank (P): statistics on the PageRank values. Closeness centralization (C): statistics on the distance (number of hops). References: A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”; S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”.
- 22. Predicting Kendall-tau Distance. Regression analysis (RF) in a five-fold CV over 10 iterations. Weighted degree: most predictive features, roughly better than using all the features. Assortativity: less predictive power. Too many features and too little training data?
- 23. Predicting Kendall-tau Distance. Most important features in weighted degree: features based on the distribution of in- and out-degree. Very straightforward to compute; information always available in the local graph.
- 24. Predicting Kendall-tau Distance. Can we estimate the distance between the local and global PageRank considering only information available in the local graph? YES, with just a few structural properties of the local graph.
- 25. Summary. How the LRP behaves on the BrowseGraph: expanding the local graph with the whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”). It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph.
- 26. Local Ranking Problem on the BrowseGraph. Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco. Thanks.
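
The “Growing Rings” procedure from the slides (build the BrowseGraph from sessions, rank a local subgraph with PageRank, compare against the global ranking with Kendall-tau, expand by one neighborhood, repeat) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the toy sessions, the damping factor, the choice of the tau-b variant, the expansion in both edge directions, and the comparison over the nodes of the current local graph are all assumptions made here.

```python
import math
from collections import defaultdict
from itertools import combinations

def build_browse_graph(sessions):
    """Nodes are pages; edge weights count browsing transitions."""
    graph = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        for page in session:
            graph[page]  # register every visited page as a node
        for src, dst in zip(session, session[1:]):
            graph[src][dst] += 1
    return {page: dict(targets) for page, targets in graph.items()}

def induced_subgraph(graph, nodes):
    """Keep only the given nodes and the edges between them."""
    nodes = set(nodes)
    return {u: {v: w for v, w in graph[u].items() if v in nodes} for u in nodes}

def pagerank(graph, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration (damping is an assumed default)."""
    nodes = sorted(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new = {u: (1 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            total = sum(graph[u].values())
            for v, w in graph[u].items():
                new[v] += damping * rank[u] * w / total
        rank = new
    return rank

def kendall_tau(rank_a, rank_b):
    """Kendall tau-b over the nodes present in both rankings."""
    common = sorted(set(rank_a) & set(rank_b))
    concordant = discordant = ties_a = ties_b = 0
    for u, v in combinations(common, 2):
        da = rank_a[u] - rank_a[v]
        db = rank_b[u] - rank_b[v]
        ties_a += da == 0
        ties_b += db == 0
        if da == 0 or db == 0:
            continue
        if (da > 0) == (db > 0):
            concordant += 1
        else:
            discordant += 1
    pairs = len(common) * (len(common) - 1) // 2
    denom = math.sqrt((pairs - ties_a) * (pairs - ties_b))
    return (concordant - discordant) / denom if denom else 0.0

def expand_one_ring(graph, nodes):
    """Add every node one hop away (either edge direction) from the set."""
    nodes = set(nodes)
    ring = set()
    for u, targets in graph.items():
        for v in targets:
            if u in nodes and v not in nodes:
                ring.add(v)
            elif v in nodes and u not in nodes:
                ring.add(u)
    return nodes | ring

def growing_rings(graph, seed_nodes, max_steps=5):
    """Return Kendall-tau(local+k, global) for k = 0, 1, 2, ..."""
    global_rank = pagerank(graph)
    nodes = set(seed_nodes)
    taus = []
    for _ in range(max_steps + 1):
        local_rank = pagerank(induced_subgraph(graph, nodes))
        taus.append(kendall_tau(local_rank, global_rank))
        grown = expand_one_ring(graph, nodes)
        if grown == nodes:  # nothing left to add
            break
        nodes = grown
    return taus

# Toy sessions, invented purely for illustration.
sessions = [["a", "b", "c"], ["b", "c", "d"], ["c", "a"],
            ["d", "e", "a"], ["e", "b"]]
graph = build_browse_graph(sessions)
taus = growing_rings(graph, seed_nodes={"a", "b"})
print(taus)
```

The pairwise Kendall-tau loop is quadratic in the number of nodes, so it only suits small examples; at the scale mentioned in the slides a library implementation would be the more realistic choice.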
