
Time-aware Evaluation of Cumulative Citation Recommendation Systems

Work presented at the SIGIR 2013 workshop on Time-aware Information Access (#TAIA2013)


  1. 1. Time-aware Evaluation of Cumulative Citation Recommendation Systems - Krisztian Balog (University of Stavanger); Laura Dietz, Jeffrey Dalton (CIIR, University of Massachusetts, Amherst) - SIGIR 2013 workshop on Time-aware Information Access (#TAIA2013) | Dublin, Ireland, Aug 2013
  2. 2. CCR @TREC 2012 KBA
  3. 3. Evaluation methodology [Figure: system output for the target entity Aharon Barak, with columns urlname (Aharon_Barak), stream_id, and score (scores from 1000 down to 263); a cutoff on the score separates positive from negative documents.]
  4. 4. CCR @TREC 2012 KBA - Cumulative citation recommendation - Filter a time-ordered corpus for documents that are highly relevant to a predefined set of entities - For each entity, provide a ranked list of documents based on their “citation-worthiness”
  5. 5. CCR @TREC 2012 KBA - Results are evaluated in a single batch (temporal aspects are not considered)
  6. 6. CCR @TREC 2012 KBA - Evaluation metrics are set-based (using a confidence cut-off)
  7. 7. Aims - Develop a time-aware evaluation paradigm for streaming collections - Capture how retrieval effectiveness changes over time - Deal with ground truth of bursty nature - Accommodate various underlying user models - Test the ideas on CCR
  8. 8. Overview - 1. Slicing time - 2. Measuring slice relevance - 3. Aggregating slice relevance [Figure: time axis annotated with the example values .87 and .65 and the label "Slice importance".]
  9. 9. Overview - 1. Slicing time
  10. 10. Slicing time - Simplifying assumptions - Slices are non-overlapping - Unconcerned about slices that don’t contain any relevant documents - (A) Uniform slicing: slices of equal length - (B) Non-uniform slicing: slices of varying length [Figure: number of relevant documents (#relevant) over time, sliced according to (A) and (B), with a slice marked $t_i$.]
  11. 11. Overview - 2. Measuring slice relevance
  12. 12. Measuring slice relevance - Ranked list of documents within a given slice: $d = \langle d_1, \ldots, d_n \rangle$ - Evaluation metric: $m(d_i, q)$ - Standard IR metrics - MAP, R-Prec, NDCG
  13. 13. Overview - 3. Aggregating slice relevance
  14. 14. Aggregating slice relevance - Probabilistic formulation to estimate the likelihood of relevance: $P(r = 1 \mid d, q, m) = \sum_{i \in I} P(r = 1 \mid d_i, q, i)\, P(i \mid q)$ - Slice-based relevance: $P(r = 1 \mid d_i, q, i) \approx m(d_i, q)$ - Slice importance: $P(i \mid q)$
  15. 15. Slice importance - Uniform slicing - All slices are equally important: $P(i \mid q) = \frac{1}{|I|}$ - Non-uniform slicing - Bursty periods (i.e., slices with more relevant documents) are more important: $P(i \mid q) = \frac{\#R(i, q)}{\sum_{i' \in I} \#R(i', q)}$
  16. 16. Experiments - Official TREC 2012 KBA CCR runs - 8 systems, best run for each system - Only uniform time slicing - Binary relevance
  17. 17. Results - Atemporal vs. temporal ranking (MAP, weekly slicing) [Figure: bar chart of MAP (axis from 0 to 0.6) per system (UvA, udel_fang, LSIS, CWI, UMass_CIIR, uiucGSLIS, hltcoe, igpi2012, helsinki) for three settings: Atemporal, Temporal (uniform slice weighting), Temporal (non-uniform slice weighting).]
  18. 18. Results - Atemporal vs. temporal ranking (MAP, daily slicing) [Figure: bar chart of MAP (axis from 0 to 0.7) per system (UvA, udel_fang, LSIS, CWI, UMass_CIIR, uiucGSLIS, hltcoe, igpi2012, helsinki) for three settings: Atemporal, Temporal (uniform slice weighting), Temporal (non-uniform slice weighting).]
  19. 19. Zooming in (all values are MAP)
      System  Atemporal  Weekly, uniform  Weekly, non-uniform  Daily, uniform  Daily, non-uniform
      LSIS    0.48       0.52             0.54                 0.60            0.62
      CWI     0.45       0.48             0.51                 0.62            0.63
  20. 20. Findings - Top performing teams are (almost) always the same, independent of the metric - Temporal evaluation provides additional insights
  21. 21. Wrap-up - Framework for temporal evaluation - Applied to the evaluation of TREC 2012 KBA CCR systems - Future work - Non-uniform slice weighting - Other streaming tasks/collections (e.g., microblog search) - Generalize to other time-aware information access tasks
  22. 22. Questions? Online appendix: http://ciir.cs.umass.edu/~dietz/streameval/
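
The three steps from slides 10 to 15 (slicing time, measuring slice relevance, aggregating with slice importance) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the function names, the `(doc_id, timestamp, score)` run format, and the `"uniform"` / `"burst"` weighting labels are all hypothetical. It assumes uniform slicing with a fixed slice width, binary relevance, and average precision as the per-slice metric $m(d_i, q)$. Slices without relevant documents are skipped, following the simplifying assumption on slide 10.

```python
from collections import defaultdict


def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked list against a set of relevant ids."""
    hits, total = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0


def slice_index(timestamp, start, width):
    """Uniform slicing: map a timestamp to the index of its slice."""
    return (timestamp - start) // width


def temporal_score(run, relevant, start, width, weighting="uniform"):
    """Aggregate per-slice AP as sum_i m(d_i, q) * P(i | q).

    run:      list of (doc_id, timestamp, score) tuples (assumed format)
    relevant: dict mapping relevant doc_id -> timestamp
    """
    # Group the ground truth and the system output by slice.
    rel_by_slice = defaultdict(set)
    for doc_id, ts in relevant.items():
        rel_by_slice[slice_index(ts, start, width)].add(doc_id)
    run_by_slice = defaultdict(list)
    for doc_id, ts, score in run:
        run_by_slice[slice_index(ts, start, width)].append((score, doc_id))

    total_relevant = sum(len(ids) for ids in rel_by_slice.values())
    result = 0.0
    # Only slices that contain relevant documents contribute.
    for i, rel_ids in rel_by_slice.items():
        ranked = [d for _, d in sorted(run_by_slice[i], key=lambda x: -x[0])]
        slice_relevance = average_precision(ranked, rel_ids)
        if weighting == "uniform":
            importance = 1.0 / len(rel_by_slice)      # P(i|q) = 1/|I|
        elif weighting == "burst":
            importance = len(rel_ids) / total_relevant  # P(i|q) ∝ #R(i, q)
        else:
            raise ValueError(f"unknown weighting: {weighting}")
        result += slice_relevance * importance
    return result


if __name__ == "__main__":
    # Toy data: timestamps are plain integers, slices are 10 units wide.
    toy_run = [("a", 1, 0.9), ("x", 3, 0.8), ("b", 2, 0.5),
               ("c", 15, 0.7), ("y", 16, 0.9)]
    toy_relevant = {"a": 1, "b": 2, "c": 15}
    for w in ("uniform", "burst"):
        print(w, temporal_score(toy_run, toy_relevant, 0, 10, weighting=w))
```

With epoch-second timestamps, such as those that prefix the stream_id values on slide 3, weekly slicing would correspond to `width = 7 * 86400` and daily slicing to `width = 86400`. Swapping `average_precision` for another per-slice metric (R-Prec, NDCG) leaves the aggregation unchanged.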
