1. Query Dependent Ranking Using K-Nearest Neighbor
   Xiubo Geng, Tie-Yan Liu, Tao Qin, Andrew Arnold, Hang Li, Heung-Yeung Shum, SIGIR '08
   Presenter: Yueng Sheng Su
   Date: 2011-11-18
2. Outline
   • Introduction
   • Related Work
   • Ranking Using K-Nearest Neighbor
   • Experiments
   • Conclusion and Future Work
3. Introduction
   • Most existing methods do not take into consideration the fact that significant differences exist between queries, and resort to a single function for ranking documents.
   • It is necessary to employ different ranking models for different queries and conduct what is called query-dependent ranking.
4. Related Work
   • There has not been much previous work on query dependent ranking. The most relevant research topics are query classification and learning to rank.
5. Illustration of Algorithm
6. KNN Online & KNN Offline-1 Algorithm
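The core idea of KNN Online is to find the k training queries nearest to the test query in a query feature space and train a local ranking model on their labeled data. A minimal sketch, assuming Euclidean distance (the metric the paper uses) and a plug-in `train_ranker` standing in for the base learner (Ranking SVM in the paper):

```python
import math

def knn_online_rank(test_query_feat, train_queries, k, train_ranker):
    """Train a local ranking model for one test query (KNN Online sketch).

    train_queries: list of (query_feature_vector, labeled_data) pairs.
    train_ranker:  stand-in for the base learner (Ranking SVM in the
                   paper); takes labeled data, returns a scoring model.
    """
    # Euclidean distance in the query feature space.
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Find the k nearest training queries (linear scan: O(m) per query).
    neighbors = sorted(train_queries,
                       key=lambda q: dist(test_query_feat, q[0]))[:k]

    # Pool the neighbors' labeled data and train a local model on it.
    local_data = [pair for _, data in neighbors for pair in data]
    return train_ranker(local_data)
```

KNN Offline-1 follows the same neighborhood idea but moves the model training from query time to a preprocessing stage, so that only the neighbor lookup remains online.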
7. Time Complexity
   • n denotes the number of documents to be ranked for the test query.
   • k denotes the number of nearest neighbors.
   • m denotes the number of queries in the training data.
8. Experiments (Dataset)
   • The experiments used data obtained from a commercial search engine.
   • There are two data sets:
     – Dataset1, containing 1,500 training queries and 400 test queries.
     – Dataset2, containing 3,000 training queries and 800 test queries.
   • Each query is associated with its retrieved documents, along with labels representing the relevance of those documents.
   • There are five levels of relevance: perfect, excellent, good, fair, and bad.
   • For each query-document pair, a feature vector is defined. There are in total 200 features, including term frequency, PageRank, etc.
9. Experiments (Parameter Selection)
   • The experiments adopted Ranking SVM as the basic learning algorithm.
     – Ranking SVM has only one parameter, λ, representing the trade-off between empirical loss and model complexity. λ = 0.01 was set for all methods.
   • In KNN, BM25 was used as the reference model to rank documents.
     – The top T = 50 documents were chosen, and query features were then created from them.
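The slide says query features are created from the top T = 50 documents ranked by the BM25 reference model. One natural construction, assumed here for illustration, is to take the mean of the document feature vectors of those top-T documents:

```python
def query_features(doc_feats, bm25_scores, T=50):
    """Build a query feature vector from its top-T documents.

    doc_feats:   one feature vector per retrieved document.
    bm25_scores: BM25 score of each document under the reference model.
    Assumption (not stated on the slide): the query feature vector is
    the mean of the feature vectors of the T documents ranked highest
    by BM25.
    """
    # Rank documents by BM25 score, highest first.
    ranked = sorted(zip(bm25_scores, doc_feats), key=lambda p: p[0],
                    reverse=True)
    top = [f for _, f in ranked[:T]]
    # Average the top-T document feature vectors component-wise.
    dim = len(top[0])
    return [sum(f[i] for f in top) / len(top) for i in range(dim)]
```

This maps every query into the same fixed-dimensional space as the 200 document features, which is what makes the nearest-neighbor search over queries possible.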
10. Experiments (Evaluation Measure)
   • NDCG is used as the evaluation measure.
   • NDCG at position n is calculated as
       NDCG@n = Zn · Σ_{j=1..n} (2^r(j) − 1) / log₂(1 + j)
     – j is the position in the document list.
     – r(j) is the relevance score of the j-th document in the list (the scores of perfect, excellent, good, fair, and bad are represented as 4, 3, 2, 1, and 0, respectively).
     – Zn is a normalizing factor, chosen so that NDCG at each position equals one for the perfect list.
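With the 4…0 gains above, NDCG@n can be computed as follows (a sketch using the standard 2^r − 1 gain and log₂(1 + j) discount; the normalizer Zn is realized by dividing by the DCG of the ideally ordered list):

```python
import math

def dcg(rels, n):
    """DCG@n with gain 2^r - 1 and discount log2(1 + j), j = 1-based rank."""
    return sum((2 ** r - 1) / math.log2(1 + j)
               for j, r in enumerate(rels[:n], start=1))

def ndcg(rels, n):
    """NDCG@n: DCG normalized so a perfectly ordered list scores 1."""
    ideal = dcg(sorted(rels, reverse=True), n)
    return dcg(rels, n) / ideal if ideal > 0 else 0.0
```

For example, a list already sorted from perfect (4) down to bad (0) scores exactly 1.0 at every position, while any mis-ordering scores strictly less.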
11. Comparisons with Baselines
   • Single (single model approach)
   • QC (query classification based approach)
     – A query type classifier was implemented to classify queries into three categories (topic distillation, named page finding, and homepage finding).
     – One ranking model was then trained for each category.
   • KNN (KNN methods)
     – k = 400 for Dataset1,
     – k = 800 for Dataset2.
12. Effects of Different k Values
13. Change Ratio between KNN Online and KNN Offline
   • If the training query sets used by KNN Online and KNN Offline have a large overlap (i.e., γ is very small), then the performances of KNN Online and KNN Offline will be very close to each other.
   • We call γ the change ratio.
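The change ratio γ quantifies how much the two methods' neighbor-based training sets differ. The slide does not give the exact formula, so the definition below is an illustrative assumption: the fraction of one method's training queries that the other method does not share, with γ = 0 meaning identical training sets.

```python
def change_ratio(online_set, offline_set):
    """Illustrative change ratio between the training query sets of
    KNN Online and KNN Offline (the paper's exact formula may differ).

    Returns the fraction of online training queries absent from the
    offline training set: 0.0 for identical sets, up to 1.0 for
    disjoint ones.
    """
    online, offline = set(online_set), set(offline_set)
    return len(online - offline) / len(online)
```

Under any such definition, a small γ means the two methods train on nearly the same data, which explains why their rankings end up close to each other.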
14. Conclusion & Future Work
   • Ranking of documents in search should be conducted by using different models based on different properties of queries.
   • The complexity of the offline processing is still high.
   • The complexity of online processing in the KNN methods can be further reduced by using KD-trees or other advanced data structures for nearest neighbor search.
   • Future work includes defining other query features and investigating their effects on performance.
   • Euclidean distance was used as the metric in the KNN methods; it would be interesting to see whether other metrics work better for the task.
   • It is also common practice to use a fixed radius instead of a fixed k in KNN.
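The last bullet contrasts two ways of picking a neighborhood. A minimal sketch of both selection rules over a brute-force linear scan (a KD-tree would speed up either lookup, but the selection logic is the same):

```python
import math

def euclid(a, b):
    """Euclidean distance, the metric used by the KNN methods here."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def neighbors_by_k(query, points, k):
    """Fixed-k selection: the k closest training queries."""
    return sorted(points, key=lambda p: euclid(query, p))[:k]

def neighbors_by_radius(query, points, r):
    """Fixed-radius selection: all training queries within distance r.
    Unlike fixed k, the neighborhood size adapts to the local density
    of training queries around the test query."""
    return [p for p in points if euclid(query, p) <= r]
```

Fixed k guarantees enough training data for the local model; a fixed radius instead guarantees that every selected query is genuinely similar, at the cost of a variable (possibly empty) training set.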