Time-aware Evaluation of Cumulative Citation Recommendation Systems

Krisztian Balog, University of Stavanger
Laura Dietz, Jeffrey Dalton, CIIR, University of Massachusetts, Amherst

SIGIR 2013 workshop on Time-aware Information Access (#TAIA2013) | Dublin, Ireland, Aug 2013
CCR @TREC 2012 KBA

Evaluation methodology
- Target entity: Aharon Barak
- [Figure: ranked list of stream documents (urlname, stream_id, score) for the target entity, annotated as Positive or Negative, with a confidence cutoff separating returned from discarded documents]
CCR @TREC 2012 KBA
- Cumulative citation recommendation
- Filter a time-ordered corpus for documents that are highly relevant to a predefined set of entities
- For each entity, provide a ranked list of documents based on their "citation-worthiness"
- Results are evaluated in a single batch (temporal aspects are not considered)
- Evaluation metrics are set-based (using a confidence cut-off); a sketch of such a cutoff-based evaluation follows below
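
To make the set-based evaluation concrete, here is a minimal Python sketch. It is not the official KBA scorer: the run entries, judgments, and the set_based_eval helper are all illustrative assumptions, and only precision/recall over the cutoff set are computed.

    # Minimal sketch of set-based CCR evaluation with a confidence cutoff.
    # All document IDs, scores, and judgments below are illustrative.
    run = [("doc-a", 1000), ("doc-b", 500), ("doc-c", 480), ("doc-d", 260)]
    qrels = {"doc-a": True, "doc-b": False, "doc-c": True, "doc-d": True}

    def set_based_eval(run, qrels, cutoff):
        """Precision and recall over the set of documents scored at or above `cutoff`."""
        returned = {doc for doc, score in run if score >= cutoff}
        relevant = {doc for doc, rel in qrels.items() if rel}
        hits = returned & relevant
        precision = len(hits) / len(returned) if returned else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    print(set_based_eval(run, qrels, cutoff=480))  # (0.666..., 0.666...)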
Aims
- Develop a time-aware evaluation paradigm for streaming collections
- Capture how retrieval effectiveness changes over time
- Deal with ground truth of bursty nature
- Accommodate various underlying user models
- Test the ideas on CCR
Overview
[Figure: a document stream laid out over time, divided into slices, illustrating the three steps: 1. slicing time, 2. measuring slice relevance, 3. aggregating slice relevance using slice importance weights (e.g., .87, .65)]
1. Slicing time
- Simplifying assumptions
  - Slices are non-overlapping
  - Unconcerned about slices that don't contain any relevant documents
- (A) Uniform slicing: slices of equal length (a bucketing sketch follows below)
- (B) Non-uniform slicing: slices of varying length
[Figure: number of relevant documents over time, with (A) equal-length and (B) varying-length slice boundaries]
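
A minimal sketch of uniform slicing, assuming each document carries a Unix timestamp; the uniform_slices helper and the toy stream are illustrative, not part of the TREC KBA tooling.

    from collections import defaultdict

    def uniform_slices(docs, slice_seconds):
        """Bucket (timestamp, doc_id) pairs into non-overlapping, equal-length slices.

        Returns slice index -> list of doc_ids. Slices containing no documents
        never appear, which matches the "ignore empty slices" simplification.
        """
        start = min(ts for ts, _ in docs)
        slices = defaultdict(list)
        for ts, doc_id in docs:
            slices[(ts - start) // slice_seconds].append(doc_id)
        return dict(slices)

    # Weekly slicing of a toy stream (timestamps are illustrative).
    docs = [(1328055120, "d1"), (1328057880, "d2"), (1329000000, "d3")]
    print(uniform_slices(docs, slice_seconds=7 * 24 * 3600))  # {0: ['d1', 'd2'], 1: ['d3']}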
2. Measuring slice relevance
- Ranked list of documents within a given slice: d = <d_1, ..., d_n>
- Evaluation metric: m(d_i, q)
  - Standard IR metrics: MAP, R-Prec, NDCG (an average-precision sketch follows below)
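
To illustrate one choice of m(d_i, q), here is a plain average-precision implementation over a single slice's ranked list, assuming binary relevance; any other standard metric (R-Prec, NDCG) could be substituted.

    def average_precision(ranked_doc_ids, relevant_ids):
        """AP of one slice's ranking against the relevant documents of that slice."""
        hits, precision_sum = 0, 0.0
        for rank, doc_id in enumerate(ranked_doc_ids, start=1):
            if doc_id in relevant_ids:
                hits += 1
                precision_sum += hits / rank
        return precision_sum / len(relevant_ids) if relevant_ids else 0.0

    # Toy slice: three retrieved documents, two of which are relevant.
    print(average_precision(["d2", "d5", "d1"], relevant_ids={"d1", "d2"}))  # 0.833...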
3. Aggregating slice relevance
- Probabilistic formulation to estimate the likelihood of relevance:

  P(r = 1 | d, q, m) = \sum_{i \in I} P(r = 1 | d_i, q, i) P(i | q)

  where P(r = 1 | d_i, q, i) is the slice-based relevance, approximated by m(d_i, q), and P(i | q) is the slice importance (a sketch of the weighted sum follows below)
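
Once P(r = 1 | d_i, q, i) is approximated by m(d_i, q), the aggregation reduces to an importance-weighted sum of per-slice metric values. The sketch below assumes the per-slice scores and slice weights have already been computed; both dictionaries are illustrative inputs.

    def temporal_score(slice_scores, slice_weights):
        """Approximate P(r=1|d,q,m) as sum_i m(d_i, q) * P(i|q) over the scored slices."""
        return sum(slice_scores[i] * slice_weights[i] for i in slice_scores)

    slice_scores = {0: 0.87, 1: 0.65}   # m(d_i, q), e.g. per-slice AP values
    slice_weights = {0: 0.5, 1: 0.5}    # P(i|q), uniform over two slices
    print(temporal_score(slice_scores, slice_weights))  # 0.76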
Slice importance
- Uniform slicing
  - All slices are equally important: P(i | q) = 1 / |I|
- Non-uniform slicing
  - Bursty periods (i.e., slices with more relevant documents) are more important: P(i | q) = #R(i, q) / \sum_{i' \in I} #R(i', q)
  - Both weighting schemes are sketched in code below
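
Both weighting schemes are a few lines of code. Here #R(i, q) is taken to be the number of relevant documents judged in slice i, and relevant_counts is an illustrative input.

    def uniform_weights(slice_ids):
        """P(i|q) = 1/|I|: every slice is equally important."""
        return {i: 1.0 / len(slice_ids) for i in slice_ids}

    def bursty_weights(relevant_counts):
        """P(i|q) proportional to #R(i, q): bursty slices receive more weight."""
        total = sum(relevant_counts.values())
        return {i: count / total for i, count in relevant_counts.items()}

    print(uniform_weights([0, 1, 2]))          # {0: 0.333..., 1: 0.333..., 2: 0.333...}
    print(bursty_weights({0: 8, 1: 1, 2: 1}))  # {0: 0.8, 1: 0.1, 2: 0.1}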
Experiments
- Official TREC 2012 KBA CCR runs
- 8 systems, best run for each system
- Only uniform time slicing
- Binary relevance
Results
Atemporal vs. temporal ranking (MAP, weekly slicing)
[Bar chart: MAP of the 8 systems (UvA, udel_fang, LSIS, CWI, UMass_CIIR, uiucGSLIS, hltcoe, igpi2012, helsinki) under atemporal evaluation, temporal evaluation with uniform slice weighting, and temporal evaluation with non-uniform slice weighting; y-axis 0 to 0.6]

Results
Atemporal vs. temporal ranking (MAP, daily slicing)
[Bar chart: same comparison with daily slicing; y-axis 0 to 0.7]
Zooming in (MAP)

           atemporal   weekly slicing           daily slicing
                       uniform   non-uniform    uniform   non-uniform
  LSIS     0.48        0.52      0.54           0.60      0.62
  CWI      0.45        0.48      0.51           0.62      0.63
Findings
- Top performing teams are (almost) always the same, independent of the metric
- Temporal evaluation provides additional insights
Wrap-up
- Framework for temporal evaluation
- Applied to the evaluation of TREC 2012 KBA CCR
systems
- Future work
  - Non-uniform slice weighting
  - Other streaming tasks/collections (e.g., microblog search)
  - Generalize to other time-aware information access tasks
Questions?
Online appendix:
http://ciir.cs.umass.edu/~dietz/streameval/