Local Ranking Problem on the BrowseGraph
Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco

Presentation @SIGIR2015


The “Local Ranking Problem” (LRP) is related to the computation of a centrality-like rank on a local graph, where the scores of the nodes could significantly differ from the ones computed on the global graph. Previous work has studied the LRP on the hyperlink graph but never on the BrowseGraph, namely a graph where nodes are webpages and edges are browsing transitions. Recently, this graph has received more and more attention in many different tasks such as ranking, prediction and recommendation. However, a webserver has only the browsing traffic performed on its pages (local BrowseGraph) and, as a consequence, the local computation can lead to estimation errors, which hinders the increasing number of applications in the state of the art. Also, although the divergence between the local and global ranks has been measured, the possibility of estimating such divergence using only local knowledge has been largely overlooked. These aspects are of great interest for online service providers who want to: (i) gauge their ability to correctly assess the importance of their resources only based on their local knowledge, and (ii) take into account real user browsing fluxes that better capture the actual user interest than the static hyperlink network. We study the LRP on a BrowseGraph from a large news provider, considering as subgraphs the aggregations of browsing traces of users coming from different domains. We show that the distance between rankings can be accurately predicted based only on structural information of the local graph, achieving an average rank correlation as high as 0.8.


  1. Local Ranking Problem on the BrowseGraph. Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco
  2. Local Ranking Problem: “when the centrality-like ranks computed on a local graph differ from the ones on the global graph”. [Figure: example graphs annotated with node scores]
 - Bressan et al. in WWW 2013, “The Power of Local Information in PageRank”
 - Bar-Yossef and Mashiach in CIKM 2008, “Local Approximation of PageRank and reverse PageRank”
 - Chen et al. in CIKM 2004, “Local Methods for Estimating PageRank Values”
  3. The BrowseGraph: “a graph where nodes are webpages and edges are browsing transitions”. [Figure: user navigation (e.g. Flickr), user session, BrowseGraph construction]
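The construction described on this slide can be sketched in a few lines. This is only an illustration under stated assumptions: the session format (an ordered list of page identifiers per user session), the function name `build_browsegraph`, and the choice to skip page reloads are all hypothetical and are not taken from the paper.

```python
from collections import Counter

def build_browsegraph(sessions):
    """Return {(src, dst): transition count} from ordered page lists."""
    edges = Counter()
    for pages in sessions:
        # Every pair of consecutive pages in a session is one transition.
        for src, dst in zip(pages, pages[1:]):
            if src != dst:  # assumption: a reload is not a transition
                edges[(src, dst)] += 1
    return edges

# Hypothetical toy sessions; page names are placeholders.
sessions = [
    ["home", "a", "b"],
    ["home", "a", "c"],
    ["b", "a"],
]
graph = build_browsegraph(sessions)
```

The edge weights are transition counts, which is what makes the BrowseGraph reflect user traffic rather than static links.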
  4. Centrality Metrics applied to the BrowseGraph
 Increasing popularity in recent years:
 - Chiarandini et al. in ICWSM 2013, “Leveraging browsing patterns for topic discovery and photostream recommendation”
 - Trevisiol et al. in SIGIR 2012, “Image ranking based on user browsing behavior”
 - Liu et al. in CIKM 2011, “User browsing behavior-driven web crawling”
 Provide higher-quality rankings compared to standard hyperlink graphs:
 - Y. Liu et al. in SIGIR 2008, “BrowseRank: letting web users vote for page importance”
  5. Local Ranking Problem on the BrowseGraph: WHY?
  6. Local Ranking Problem on the BrowseGraph: WHY? Image Ranking in Flickr (SIGIR 2012): we compared different ranking approaches on the BrowseGraph (PageRank and BrowseRank among others). How much could our ranking vary if we had more information (i.e. more nodes)?
  7. BrowseGraph and ReferrerGraphs. ReferrerGraphs: domain-dependent BrowseGraphs. Construct different BrowseGraphs based on the referrer domain. Recommend news articles following the ReferrerGraphs. Can we rely on centrality-based algorithms to infer news importance? [Figure: BrowseGraph, Twitter ReferrerGraph, Facebook ReferrerGraph]
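As a rough sketch of the idea of one BrowseGraph per referrer domain, sessions can be grouped by the domain of the page the user arrived from. The input format (a referrer URL paired with the ordered pages of the session) and the name `build_referrer_graphs` are assumptions made for illustration; the slides do not describe how the data was actually processed.

```python
from collections import Counter, defaultdict
from urllib.parse import urlparse

def build_referrer_graphs(sessions):
    """Return {referrer domain: {(src, dst): transition count}}."""
    graphs = defaultdict(Counter)
    for referrer_url, pages in sessions:
        # Assumption: the referrer domain is the URL's network location.
        domain = urlparse(referrer_url).netloc.lower()
        for src, dst in zip(pages, pages[1:]):
            graphs[domain][(src, dst)] += 1
    return graphs

# Hypothetical sessions: (referrer URL, ordered list of visited pages).
sessions = [
    ("https://twitter.com/some/status", ["n1", "n2", "n3"]),
    ("https://www.facebook.com/", ["n1", "n3"]),
    ("https://twitter.com/other", ["n2", "n3"]),
]
referrer_graphs = build_referrer_graphs(sessions)
```

Each value in `referrer_graphs` is then a local graph in the sense of the LRP, while the union of all sessions gives the global graph.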
  8. Local Ranking Problem on the BrowseGraph
 - Study the LRP on the BrowseGraph by incrementally expanding the local graph (“Growing Rings” experiment)
 - Estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
 - Discover the referrer domain when it is not available (not discussed in the presentation; please see the paper)
  9. Local Ranking Problem on the BrowseGraph. [Figure labels: Social Networks; Search Engines; News Homepage; Yahoo News BrowseGraph, ~500M pageviews]
 1. Construct the BrowseGraph (our “global graph”)
 2. Construct the ReferrerGraphs (our “local graphs”)
  10. Subgraph Comparison: very different dimensions; very well connected (also Reddit, the smallest one)
  11. Subgraph Comparison: cross-distance Kendall-tau among common nodes (min overlap 1k). In general the similarities are very low (<0.3), possibly due to different content or different users’ interests. Search engines are the most similar (>0.5).
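To make the comparison concrete, here is a minimal pure-Python sketch of a Kendall-tau computed only over the nodes two graphs share. It uses the simple tau-a variant (tied pairs count as neither concordant nor discordant) and a quadratic loop, which is an assumption for readability; the slides do not say which variant or implementation was used, and the scores below are made up.

```python
from itertools import combinations

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a between two score dicts, restricted to common nodes."""
    common = sorted(set(scores_a) & set(scores_b))
    concordant = discordant = 0
    for u, v in combinations(common, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(common) * (len(common) - 1) // 2
    return (concordant - discordant) / pairs if pairs else 0.0

# Hypothetical PageRank-like scores for two ReferrerGraphs.
twitter = {"n1": 0.5, "n2": 0.3, "n3": 0.2, "n9": 0.1}
facebook = {"n1": 0.4, "n2": 0.1, "n3": 0.5}
tau = kendall_tau(twitter, facebook)
```

A minimum-overlap filter such as the 1k threshold on the slide could be added by returning no value when `common` is too small.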
  12. Growing Rings Experiment: study of the LRP on the BrowseGraph by incrementally expanding the local graph.
 1. For each ReferrerGraph
 2. Compare the PageRank values with the global ones (Kendall-tau)
 3. Expand with the next neighborhood of nodes
 4. Iterate until the Kendall-tau gets close to 1
 Example: K(local+0, global) ~0.307; K(local+1, global) ~0.524; K(local+2, global) ~0.740; K(local+3, global) ~0.912
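The four steps of the Growing Rings experiment can be sketched as follows. Everything here is a simplified stand-in rather than the authors' code: PageRank is a plain power iteration on unweighted edges (the real BrowseGraph has transition weights), the Kendall-tau is the tau-a variant, and the six-node graph and all function names are hypothetical.

```python
from itertools import combinations

def pagerank(nodes, edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on the subgraph induced by `nodes`."""
    order = sorted(set(nodes))
    members = set(order)
    out = {u: [] for u in order}
    for src, dst in edges:
        if src in members and dst in members:
            out[src].append(dst)
    rank = {u: 1.0 / len(order) for u in order}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in order if not out[u])
        base = (1.0 - damping + damping * dangling) / len(order)
        new_rank = {u: base for u in order}
        for u in order:
            for v in out[u]:
                new_rank[v] += damping * rank[u] / len(out[u])
        rank = new_rank
    return rank

def kendall_tau(scores_a, scores_b):
    """Kendall tau-a over the nodes present in both score dicts."""
    common = sorted(set(scores_a) & set(scores_b))
    total = 0
    for u, v in combinations(common, 2):
        sign = (scores_a[u] - scores_a[v]) * (scores_b[u] - scores_b[v])
        total += (sign > 0) - (sign < 0)
    pairs = len(common) * (len(common) - 1) // 2
    return total / pairs if pairs else 0.0

def grow_ring(local, edges):
    """Add every node one hop away from the current local node set."""
    ring = set(local)
    for src, dst in edges:
        if src in local or dst in local:
            ring.update((src, dst))
    return ring

def growing_rings(seed, edges, global_rank):
    """Return the tau of local+0, local+1, ... against the global rank."""
    local, taus = set(seed), []
    while True:
        taus.append(kendall_tau(pagerank(local, edges), global_rank))
        expanded = grow_ring(local, edges)
        if expanded == local:  # nothing left to add
            return taus
        local = expanded

# Hypothetical toy graph standing in for the global BrowseGraph.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"),
         ("d", "e"), ("e", "a"), ("e", "f"), ("f", "a")]
global_rank = pagerank({n for edge in edges for n in edge}, edges)
taus = growing_rings({"a", "b"}, edges, global_rank)
```

On this toy graph the local node set reaches the whole graph after two expansions, at which point the local and global rankings coincide.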
  13. Growing Rings Experiment, graph sets:
 - Referrer-based (RB): the 7 ReferrerGraphs (Facebook, Twitter, Reddit, Homepage, Yahoo, Google, Bing)
 - Same-size referrer-based (SRB): to measure the impact of the graph size
 - Random (R): 7 random graphs reflecting the size of the original RB graphs
  14. Growing Rings Experiment. [Figure: ReferrerGraphs]
  15. Growing Rings Experiment. [Figure: same-size RGs; Random ReferrerGraphs]
  16. Growing Rings Experiment with Selection of Nodes
 - Hypothesis 1: adding all the nodes means adding more information, therefore it should lead to a faster convergence (Boldi et al. [6] in the paper)
 - Hypothesis 2: the most representative nodes bring less noise and therefore a quicker convergence (Cho et al. [13] in the paper)
 How does the expansion influence convergence if only a few of the most representative nodes are selected?
  17. Growing Rings Experiment with Selection of Nodes. [Figure legend: 5, 10, 30, 50, 100] Fewer, more representative nodes lead to a better estimation of PageRank values in the first iteration. In the long run, expansions with the highest number of nodes present the best convergence.
  18. Growing Rings Expansion with Selected Nodes: ~1 or 2 steps can be enough to estimate the PageRank scores of the global graph. Predicting Kendall-tau Distance: can we estimate the “distance” between the local and global PageRank considering only information available in the local graph?
  19. Predicting Kendall-tau Distance: can we estimate the distance between the local and global PageRank considering only information available in the local graph? Hypothesis: some structural properties of the graph could be good proxies for the tau value difference between local and global ranks.
  20. Predicting Kendall-tau Distance: Training Set Construction. Jackknife resampling (1%, 5%, 10%, 20%) of each ReferrerGraph; Kendall-tau distance between the ReferrerGraph and the reduced subgraphs. [Figure label: homepage]
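One possible reading of the training set construction is sketched below: each training instance is a reduced subgraph obtained by leaving out a random share of the nodes. Treating the percentages as the fraction of nodes removed is an assumption, as are the function name `reduced_subgraph`, the fixed seed and the ring-shaped toy graph; the paper should be consulted for the actual procedure.

```python
import random

def reduced_subgraph(edges, fraction, seed=0):
    """Drop a random `fraction` of nodes and every edge touching them."""
    nodes = sorted({n for edge in edges for n in edge})
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    dropped = set(rng.sample(nodes, round(fraction * len(nodes))))
    return [(s, d) for s, d in edges if s not in dropped and d not in dropped]

# Hypothetical 100-node ring standing in for a ReferrerGraph.
edges = [(i, (i + 1) % 100) for i in range(100)]
samples = {f: reduced_subgraph(edges, f) for f in (0.01, 0.05, 0.10, 0.20)}
```

The label of each instance would then be the Kendall-tau between PageRank on the full ReferrerGraph and PageRank on the reduced subgraph.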
  21. Predicting Kendall-tau Distance: Extract Structural Properties of each Graph. We compute 62 structural graph metrics for each training instance:
 - Size and Connectivity (S): basic statistics
 - Assortativity (A): tendency of nodes with a certain degree to be linked with nodes with a similar degree
 - Degree (D): statistics on the degree distribution
 - Weighted degree (W): same as degree but considering the weight on edges (transitions)
 - Local PageRank (P): statistics on the PageRank values
 - Closeness centralization (C): statistics on the distance (number of hops)
 References:
 - A. Barrat et al. in Cambridge Univ. Press 2008, “Dynamical Processes on Complex Networks”
 - S. Wasserman and K. Faust in Cambridge Univ. Press 1994, “Social Network Analysis: Methods and Applications”
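As an illustration of the Degree (D) and Weighted degree (W) families, the sketch below derives a handful of summary statistics from a weighted edge dictionary. It covers only a small, hypothetical subset of features (mean, maximum and population standard deviation); the 62 metrics used in the paper are not listed on the slides, and the feature names are invented here.

```python
from collections import Counter
from statistics import mean, pstdev

def degree_features(weighted_edges):
    """Summary statistics of (weighted) in- and out-degree distributions."""
    counters = {name: Counter() for name in
                ("in_degree", "out_degree", "in_weight", "out_weight")}
    for (src, dst), weight in weighted_edges.items():
        counters["out_degree"][src] += 1
        counters["in_degree"][dst] += 1
        counters["out_weight"][src] += weight
        counters["in_weight"][dst] += weight
    nodes = sorted({n for edge in weighted_edges for n in edge})
    features = {}
    for name, counter in counters.items():
        values = [counter[n] for n in nodes]  # missing nodes count as 0
        features[name + "_mean"] = mean(values)
        features[name + "_max"] = max(values)
        features[name + "_std"] = pstdev(values)
    return features

# Hypothetical local graph: {(src, dst): number of transitions}.
local_graph = {("home", "a"): 2, ("a", "b"): 1, ("a", "c"): 1, ("b", "a"): 1}
features = degree_features(local_graph)
```

All of these values are computed from the local graph alone, which is the property the prediction task relies on.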
  22. Predicting Kendall-tau Distance: Regression Analysis (RF) in a five-fold CV over 10 iterations.
 - Weighted degree: most predictive features (~better than using all the features)
 - Assortativity: less predictive power (~too many features and too little training data?)
  23. Predicting Kendall-tau Distance: most important features in weighted degree: features based on the distribution of in- and out-degree. Very straightforward to compute; information always available in the local graph.
  24. Predicting Kendall-tau Distance: can we estimate the distance between the local and global PageRank considering only information available in the local graph? YES. With just a few structural features of the local graph.
  25. Summary
 - How the LRP behaves on the BrowseGraph: expanding the local graph with the whole neighborhoods (“Growing Rings” experiment) or with the most representative nodes (“Growing Rings with Selection of Nodes”)
 - It is possible to estimate the “distance” between the local and global PageRank by exploiting the structural properties of the local graph
  26. Local Ranking Problem on the BrowseGraph. Michele Trevisiol, Luca Maria Aiello, Paolo Boldi, Roi Blanco. Thanks.
