This document provides an overview of evaluation measures for information retrieval systems. It explains why evaluation matters, both for improving systems and for measuring user satisfaction. Key points include:
- Common set-based measures include precision, recall, and the F-measure. Ranked-retrieval measures include average precision (AP) for binary relevance and, for graded relevance, normalized discounted cumulative gain (nDCG), expected reciprocal rank (ERR), and the Q-measure.
- Measures for diversified search reward rankings that balance relevance against coverage of the different intents behind an ambiguous or underspecified query. Examples include α-nDCG, ERR-IA, D#-nDCG, and U-IA.
- Statistical significance testing helps determine whether observed differences between systems are likely to be real or merely due to chance. The t
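The set-based measures in the list above ignore ranking order and compare the retrieved set with the relevant set. The sketch below uses the standard definitions: precision is hits divided by the number retrieved, recall is hits divided by the number relevant, and F-beta is their weighted harmonic mean. The function name `precision_recall_f` is illustrative and does not come from any library.

```python
def precision_recall_f(retrieved, relevant, beta=1.0):
    """Set-based precision, recall and F-beta for a single query.

    retrieved, relevant: collections of document IDs (order is ignored).
    beta > 1 weights recall more heavily; beta = 1 gives F1 (harmonic mean).
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        return precision, recall, 0.0  # avoid 0/0 in the F formula
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f


p, r, f = precision_recall_f(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(round(p, 4), round(r, 4), round(f, 4))  # 0.5 0.6667 0.5714
```

Two of the four retrieved documents are relevant, which gives P = 1/2. Two of the three relevant documents are found, which gives R = 2/3. F1 is then 4/7.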
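Average precision and the Q-measure from the list above are closely related. The Q-measure replaces precision at each relevant rank r with a "blended ratio" (C(r) + β·cg(r)) / (r + β·cg*(r)). Here C(r) is the number of relevant documents in the top r, cg(r) is the system's cumulative gain, and cg*(r) is the cumulative gain of the ideal ranking. With β = 0 the blended ratio reduces to precision at r, so the Q-measure becomes AP. The minimal sketch below assumes β = 1 by default and takes gains as plain numbers. The function names are illustrative only.

```python
def q_measure(ranked_gains, judged_gains, beta=1.0):
    """Q-measure for one ranked list; beta=0 reduces it to average precision.

    ranked_gains: gain of each retrieved document in rank order (0 = non-relevant).
    judged_gains: positive gains of ALL judged relevant documents for the query;
                  R = len(judged_gains) and the ideal ranking is built from them.
    """
    R = len(judged_gains)
    if R == 0:
        return 0.0
    ideal = sorted(judged_gains, reverse=True)
    total = 0.0
    hits = 0        # C(r): relevant documents in the top r
    cg = 0.0        # cg(r): cumulative gain of the system ranking
    cg_ideal = 0.0  # cg*(r): cumulative gain of the ideal ranking (flat after rank R)
    for r, g in enumerate(ranked_gains, start=1):
        cg += g
        if r <= R:
            cg_ideal += ideal[r - 1]
        if g > 0:
            hits += 1
            total += (hits + beta * cg) / (r + beta * cg_ideal)
    return total / R


def average_precision(ranked_rel, num_relevant):
    """Binary AP: mean of precision@r over the ranks r of relevant documents."""
    return q_measure([1 if x else 0 for x in ranked_rel], [1] * num_relevant, beta=0.0)


print(round(average_precision([1, 0, 1, 0, 0], 3), 4))  # 0.5556
print(round(q_measure([1, 0, 1], [1, 1]), 4))           # 0.9
```

Dividing by R, the total number of relevant documents, rather than by the number retrieved is what makes both measures penalize relevant documents that never appear in the list.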
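For nDCG, a ranking's discounted cumulative gain is divided by that of the ideal ranking, i.e. all judged documents sorted by gain. Several variants exist. The sketch below assumes linear gains and a log2(rank + 1) discount. Another common choice uses exponential gains (2^g - 1), so results can differ between toolkits. `dcg` and `ndcg` are illustrative names.

```python
import math


def dcg(gains, k):
    """DCG@k. Assumption: linear gains and a log2(rank + 1) discount."""
    return sum(g / math.log2(r + 1) for r, g in enumerate(gains[:k], start=1))


def ndcg(ranked_gains, judged_gains, k):
    """nDCG@k: system DCG divided by the DCG of the ideal (gain-sorted) ranking."""
    ideal = dcg(sorted(judged_gains, reverse=True), k)
    return dcg(ranked_gains, k) / ideal if ideal > 0 else 0.0


print(round(ndcg([0, 1], [1], k=2), 4))  # 0.6309: the only relevant doc is at rank 2
```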
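ERR rests on a cascade user model. The user scans down the list and stops at the first satisfying document. A document with grade g satisfies the user with probability (2^g - 1) / 2^g_max, and each stopping rank r contributes 1/r. The sketch below follows that formulation. It also includes the intent-aware variant ERR-IA from the diversity measures listed above, computed as ERR for each intent and weighted by an assumed intent distribution P(i | q). Function and parameter names are illustrative.

```python
def err(grades, max_grade, k=None):
    """Expected reciprocal rank under a cascade model.

    grades: relevance grade of each ranked document, in 0..max_grade.
    """
    if k is not None:
        grades = grades[:k]
    score = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for r, g in enumerate(grades, start=1):
        p_stop = (2 ** g - 1) / 2 ** max_grade  # satisfaction probability at rank r
        score += p_reach * p_stop / r
        p_reach *= 1 - p_stop
    return score


def err_ia(grades_per_intent, intent_probs, max_grade, k=None):
    """ERR-IA: ERR per intent, weighted by an assumed P(intent | query)."""
    return sum(p * err(grades_per_intent[i], max_grade, k)
               for i, p in intent_probs.items())


print(err([0, 1], max_grade=1))  # 0.25
print(err_ia({"a": [1, 0], "b": [0, 1]}, {"a": 0.5, "b": 0.5}, max_grade=1))  # 0.375
```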
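α-nDCG, listed above among the diversity measures, discounts redundancy. Each time an intent has already been covered by an earlier document, another document's contribution for that intent is multiplied by (1 - α). The rank discount is the same log2(rank + 1) as in nDCG. Computing the exact ideal ranking is NP-hard, so the sketch below uses the common greedy approximation instead. It assumes binary per-intent judgments supplied as a dict from document ID to a set of intents. That input format and the function names are hypothetical.

```python
import math


def _gain(doc, intents, seen, alpha):
    """Novelty-discounted gain of doc given how often each intent was already seen."""
    return sum((1 - alpha) ** seen.get(i, 0) for i in intents.get(doc, ()))


def alpha_dcg(ranking, intents, alpha, k):
    seen, score = {}, 0.0
    for r, doc in enumerate(ranking[:k], start=1):
        score += _gain(doc, intents, seen, alpha) / math.log2(r + 1)
        for i in intents.get(doc, ()):
            seen[i] = seen.get(i, 0) + 1
    return score


def greedy_ideal(intents, alpha, k):
    """Greedy approximation of the ideal ranking (the exact ideal is NP-hard)."""
    remaining, seen, ideal = list(intents), {}, []
    while remaining and len(ideal) < k:
        best = max(remaining, key=lambda d: _gain(d, intents, seen, alpha))
        ideal.append(best)
        remaining.remove(best)
        for i in intents[best]:
            seen[i] = seen.get(i, 0) + 1
    return ideal


def alpha_ndcg(ranking, intents, alpha=0.5, k=10):
    ideal = alpha_dcg(greedy_ideal(intents, alpha, k), intents, alpha, k)
    return alpha_dcg(ranking, intents, alpha, k) / ideal if ideal > 0 else 0.0


judged = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}
print(alpha_ndcg(["d1", "d3", "d2"], judged))            # 1.0: both intents covered early
print(round(alpha_ndcg(["d1", "d2", "d3"], judged), 4))  # lower: d2 repeats intent "a"
```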
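Significance tests in IR are usually paired, because both systems are scored on the same topics. One widely used option is the paired randomization (permutation) test. The sketch below implements that test; it is not necessarily the test the overview goes on to describe. Under the null hypothesis the two systems are interchangeable, so the sign of each per-topic difference is flipped at random. The p-value is the fraction of such relabelings whose mean difference is at least as extreme as the observed one. The trial count, seed, and function name are assumptions.

```python
import random


def paired_randomization_test(scores_a, scores_b, trials=10000, seed=0):
    """Two-sided paired randomization test on per-topic scores of two systems."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two equal-length, non-empty score lists")
    n = len(scores_a)
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs)) / n
    rng = random.Random(seed)  # fixed seed so results are reproducible
    extreme = 0
    for _ in range(trials):
        # Randomly swap the two systems' labels on each topic (flip the sign).
        permuted = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(permuted) / n >= observed - 1e-12:
            extreme += 1
    # Counting the observed assignment itself keeps the p-value strictly positive.
    return (extreme + 1) / (trials + 1)


print(paired_randomization_test([0.4, 0.5, 0.3], [0.4, 0.5, 0.3]))  # 1.0: identical systems
```

With only a handful of topics even large differences rarely reach significance, since there are few distinct sign assignments. In practice, topic sets of 50 or more are typical.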