This document summarizes key aspects of evaluating information retrieval systems, including:
- Precision and recall are the most common performance measures: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that are retrieved.
- Other measures include mean average precision (MAP), which is the mean over all queries of each query's average precision (the average of the precision values at the ranks where relevant documents appear), and R-precision, which measures precision after the top R documents are retrieved, where R is the total number of relevant documents for the query.
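- Precision and recall can be plotted against each other to show their tradeoff. Interpolation is used to compute precision at standard recall levels, taking the interpolated precision at a given recall level to be the highest precision observed at that level or any higher one, so that systems can be compared on a common scale.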
- Relevance judgments can be subjective, situational, and dynamic, making evaluation of IR systems challenging.
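
The measures listed above can be sketched in a few lines of Python. This is an illustrative sketch rather than a reference implementation: the function names, the document-ID strings, and the choice of 11 recall levels (0.0, 0.1, ..., 1.0) are assumptions made for the example, and relevance is treated as binary and fixed, which sidesteps the subjectivity noted in the last point.

```python
from typing import List, Sequence, Set, Tuple


def precision_recall(retrieved: Sequence[str], relevant: Set[str]) -> Tuple[float, float]:
    """Set-based precision and recall for one query."""
    hits = sum(1 for doc in retrieved if doc in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average of the precision values at each rank where a relevant document appears.

    Relevant documents that are never retrieved contribute zero, because the
    sum is divided by the total number of relevant documents.
    """
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: Sequence[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


def r_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Precision over the top R ranks, where R is the number of relevant documents."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for doc in ranking[:r] if doc in relevant) / r


def interpolated_precision(ranking: Sequence[str], relevant: Set[str],
                           levels: int = 11) -> List[float]:
    """Interpolated precision at evenly spaced recall levels from 0.0 to 1.0.

    The value at each level is the maximum precision observed at any rank
    whose recall is at least that level (0.0 if no rank reaches it).
    The default of 11 levels is an assumption of this sketch.
    """
    if not relevant:
        return [0.0] * levels
    points = []  # (recall, precision) after each rank
    hits = 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
        points.append((hits / len(relevant), hits / rank))
    result = []
    for i in range(levels):
        level = i / (levels - 1)
        result.append(max((p for r, p in points if r >= level), default=0.0))
    return result


if __name__ == "__main__":
    # Hypothetical ranking and relevance judgments for a single query.
    ranking = ["d1", "d2", "d3", "d4"]
    relevant = {"d1", "d3"}
    print(precision_recall(ranking, relevant))
    print(average_precision(ranking, relevant))
    print(r_precision(ranking, relevant))
    print(interpolated_precision(ranking, relevant))
```

For this hypothetical ranking, the relevant documents sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2, while R-precision looks only at the top two ranks and therefore counts a single hit out of two.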