This document surveys evaluation measures used in information retrieval and natural language processing. For unranked retrieval sets, it describes precision, recall, and the F1 score as the fundamental measures, and also covers averaged precision and recall, accuracy, and the novelty and coverage ratios. For ranked retrieval sets, it discusses recall-precision graphs, interpolated recall-precision, precision at k, R-precision, ROC curves, and normalized discounted cumulative gain (NDCG). It also covers agreement measures such as the kappa statistic and parser evaluation measures such as Parseval and attachment scores.
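
As a concrete illustration of the three fundamental unranked measures named above, the following is a minimal sketch using the standard set-based definitions: precision is the fraction of retrieved items that are relevant, recall is the fraction of relevant items that are retrieved, and F1 is their harmonic mean. The function name `precision_recall_f1`, the document IDs, and the convention of returning 0.0 for an empty denominator are assumptions made for this example, not something the document prescribes.

```python
def precision_recall_f1(retrieved, relevant):
    """Set-based precision, recall, and F1 for an unranked result set.

    `retrieved` and `relevant` are iterables of hashable item IDs.
    Returning 0.0 when a denominator is empty is an assumed convention.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    true_positives = len(retrieved & relevant)  # relevant items that were retrieved

    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0

    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 4 documents retrieved, 3 documents relevant, 2 in common.
p, r, f = precision_recall_f1(retrieved=[1, 2, 3, 4], relevant=[2, 4, 5])
print(f"precision={p:.3f} recall={r:.3f} f1={f:.3f}")
```

In this hypothetical example two of the four retrieved documents are relevant and two of the three relevant documents are retrieved, so precision is 2/4 and recall is 2/3.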