Basis topics [c1, ..., c16]: the 16 top-level categories of the ODP
List of rank vectors [r1, ..., r16]
rj : page -> importance of the page with respect to topic cj
rj = IPR (W, vj)
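A minimal sketch of how one rank vector rj might be computed, assuming that IPR(W, vj) denotes a standard PageRank power iteration over the link graph W with teleport vector vj. The damping factor 0.85, the uniform choice of vj over the pages listed under topic cj, the function names, and the toy graph are all assumptions for illustration, not taken from the slides.

```python
def biased_pagerank(links, v, damping=0.85, iterations=100):
    """Power iteration for PageRank with teleport vector v.

    links: dict mapping each page to the list of pages it links to
           (every link target must also be a key of links).
    v:     dict mapping each page to its teleport probability (sums to 1).
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        # Teleport step: jump to a page according to v.
        new_rank = {p: (1.0 - damping) * v.get(p, 0.0) for p in pages}
        for p in pages:
            out = links[p]
            if out:
                # Follow a link: split the rank evenly over the out-links.
                share = damping * rank[p] / len(out)
                for q in out:
                    new_rank[q] += share
            else:
                # Dangling page: redistribute its rank according to v.
                for q in pages:
                    new_rank[q] += damping * rank[p] * v.get(q, 0.0)
        rank = new_rank
    return rank


def topic_vector(pages, topic_pages):
    """Assumed form of vj: uniform over pages listed under the topic."""
    members = [p for p in pages if p in topic_pages]
    return {p: (1.0 / len(members) if p in topic_pages else 0.0) for p in pages}


# Hypothetical four-page web graph and two hypothetical topics.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
r_sports = biased_pagerank(links, topic_vector(links, {"a"}))
r_arts = biased_pagerank(links, topic_vector(links, {"d"}))
```

In this toy graph, page "a" scores higher in the vector biased toward its own topic than in the other one, which is the effect the per-topic vectors are meant to capture.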
Content Query Processing
Goal : compute a distribution of weights over the 16 basis topics
Use a multinomial Naïve Bayes classifier
Training set : pages listed in ODP
Input : the query alone, or the query together with its context
Output : probability distribution (weights) over the basis topics
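The query-processing step can be sketched with a small multinomial Naive Bayes classifier. This is only an illustration: the add-one (Laplace) smoothing, the document-frequency priors, the function names, and the two-topic training data are assumptions, not details given in the slides.

```python
import math
from collections import Counter


def train(docs_by_topic):
    """docs_by_topic: dict mapping a topic to a list of token lists."""
    counts = {
        topic: Counter(tok for doc in docs for tok in doc)
        for topic, docs in docs_by_topic.items()
    }
    vocab = {tok for c in counts.values() for tok in c}
    total_docs = sum(len(docs) for docs in docs_by_topic.values())
    priors = {t: len(docs) / total_docs for t, docs in docs_by_topic.items()}
    return counts, vocab, priors


def topic_weights(query_tokens, model):
    """Return P(topic | query) for every topic as a dict."""
    counts, vocab, priors = model
    log_scores = {}
    for topic, c in counts.items():
        total = sum(c.values())
        score = math.log(priors[topic])
        for tok in query_tokens:
            if tok in vocab:  # ignore terms never seen in training
                # Add-one smoothing (an assumption of this sketch).
                score += math.log((c[tok] + 1) / (total + len(vocab)))
        log_scores[topic] = score
    # Normalize in log space to avoid underflow.
    top = max(log_scores.values())
    exp = {t: math.exp(s - top) for t, s in log_scores.items()}
    z = sum(exp.values())
    return {t: e / z for t, e in exp.items()}


# Hypothetical training pages, already tokenized.
model = train({
    "Sports": [["football", "score", "team"], ["team", "league"]],
    "Arts": [["painting", "gallery"], ["museum", "painting", "artist"]],
})
w = topic_weights(["team", "score"], model)
```

A query with context would be handled the same way by appending the context terms to `query_tokens`.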
Content Composite Link Score
Use the distribution w to weight the respective topic-specific ranks, forming the topic-sensitive PageRank score for document d: s(d) = Σj wj · rj(d)
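The weighted combination described above can be sketched in a few lines. The function name and the numbers are hypothetical; only the idea of weighting each topic-specific rank of a document by the topic weight comes from the slides.

```python
def composite_score(weights, topic_ranks, page):
    """Weighted sum of the topic-specific ranks of one page.

    weights:     dict topic -> weight wj (a probability distribution)
    topic_ranks: dict topic -> dict page -> rank rj(page)
    """
    return sum(
        w * topic_ranks[topic].get(page, 0.0)
        for topic, w in weights.items()
    )


# Hypothetical weights and rank vectors for two topics and two pages.
weights = {"Sports": 0.8, "Arts": 0.2}
topic_ranks = {
    "Sports": {"a": 0.5, "b": 0.1},
    "Arts": {"a": 0.1, "b": 0.6},
}
score_a = composite_score(weights, topic_ranks, "a")
score_b = composite_score(weights, topic_ranks, "b")
```

Because the rank vectors are computed offline, the query-time work is limited to this weighted sum per candidate page.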
Test set of 10 queries
5 users were each shown the top 10 results for each query, ranked using
- Standard PageRank
- Topic-sensitive PageRank
A page was considered relevant if at least 3 of the 5 users judged it to be relevant.
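The relevance rule and the resulting precision can be sketched as follows. The URLs and votes are made up for illustration, and reading "3 of the 5" as "at least 3" is an assumption of this sketch.

```python
def is_relevant(votes, threshold=3):
    """votes: list of 0/1 judgments, one per user."""
    return sum(votes) >= threshold


def precision(ranked_urls, judgments, threshold=3):
    """Fraction of the shown results that count as relevant."""
    relevant = sum(
        1 for url in ranked_urls
        if is_relevant(judgments.get(url, []), threshold)
    )
    return relevant / len(ranked_urls)


# Hypothetical judgments from 5 users for 4 result URLs.
judgments = {
    "u1": [1, 1, 1, 0, 0],
    "u2": [1, 0, 0, 0, 0],
    "u3": [1, 1, 1, 1, 1],
    "u4": [0, 1, 1, 0, 0],
}
p = precision(["u1", "u2", "u3", "u4"], judgments)
```

Averaging this value over all test queries, once per ranking method, gives the average precision being compared.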
Training set : 280,000 of the 3 million URLs in the ODP
Experiment
The average precision of the rankings induced by topic-sensitive PageRank scores is substantially higher than that of unbiased PageRank.
Novelty
- Flexibility: uniformly treats a variety of sources of context and personalization
- Transparency: topic weights are easily interpreted by the user
- Privacy: topic weights reveal less unintentionally
- Efficiency: low query-time cost, with a small additional preprocessing cost
Discussion
Why did the author choose ODP?
- Rocchio
- K-Nearest Neighbor
- Decision Tree
- Naïve Bayes
- Support Vector Machine
We do not really need the names of the topics.
Discussion
Why are the weights calculated by counting the number of terms?
In topic-sensitive PageRank, the weights depend on how many times each query term occurs in each topic.
Is there any other way to calculate the weights?
Count, for each topic, the number of documents that match query X.
If many of the retrieved documents fall in topic A, then X is strongly related to topic A.
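The document-counting alternative raised here could look like the sketch below. It assumes that a document "matches" when it contains every query term and falls back to uniform weights when nothing matches; both choices, the function name, and the sample data are hypothetical.

```python
def document_count_weights(query_terms, docs_by_topic):
    """Weight each topic by its share of the matching documents.

    docs_by_topic: dict topic -> list of documents, each a set of terms.
    """
    counts = {
        topic: sum(1 for doc in docs if all(t in doc for t in query_terms))
        for topic, docs in docs_by_topic.items()
    }
    total = sum(counts.values())
    if total == 0:
        # No document matches: fall back to uniform weights (assumption).
        return {topic: 1.0 / len(counts) for topic in counts}
    return {topic: c / total for topic, c in counts.items()}


# Hypothetical mini-collection.
docs_by_topic = {
    "Sports": [{"football", "team"}, {"team", "league"}, {"tennis"}],
    "Arts": [{"painting"}, {"team", "portrait"}],
}
dw = document_count_weights(["team"], docs_by_topic)
```

Unlike term counting, this variant ignores how often a term occurs inside a document, so one long page cannot dominate a topic's weight.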
Discussion
What if there are relevant documents in topics that have a weight of 0?
If the query is judged unrelated to Shopping, no document can be retrieved through the Shopping topic.
Calculate how similar the topics are to each other, and use that similarity to cover the zero-weight topics.
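One way to picture this suggestion is to spread each topic's weight to similar topics and renormalize. This is only a sketch of the idea raised in the discussion: the similarity values, the function name, and the spreading rule itself are assumptions, not part of the original method.

```python
def spread_weights(weights, similarity):
    """Let each topic pass weight on to similar topics, then renormalize.

    weights:    dict topic -> weight
    similarity: dict topic -> dict topic -> similarity in [0, 1],
                with similarity[t][t] == 1
    """
    spread = {
        target: sum(weights[src] * similarity[src][target] for src in weights)
        for target in weights
    }
    total = sum(spread.values())
    return {topic: value / total for topic, value in spread.items()}


# Hypothetical weights: Shopping starts at 0 but resembles Business.
weights = {"Shopping": 0.0, "Business": 0.7, "Sports": 0.3}
similarity = {
    "Shopping": {"Shopping": 1.0, "Business": 0.5, "Sports": 0.0},
    "Business": {"Shopping": 0.5, "Business": 1.0, "Sports": 0.0},
    "Sports": {"Shopping": 0.0, "Business": 0.0, "Sports": 1.0},
}
smoothed = spread_weights(weights, similarity)
```

After spreading, Shopping receives a nonzero weight, so documents ranked highly in that topic can contribute to the composite score.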