The document examines the validity and reliability of using Cranfield-like test collections to evaluate information retrieval systems. It outlines the key assumptions behind using system-based measures as proxies for user-based measures of satisfaction. In particular, the approach assumes that system measures correlate with user satisfaction, and the document questions whether that mapping is accurate. It also notes that reliability depends on the size of the test collection and on how representative it is. The document then describes an experiment that compares user preferences with system measure scores, in order to assess how well different measures and relevance scales predict user satisfaction with single or multiple retrieval systems.
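The comparison of user preferences with system measure scores can be illustrated with a small sketch. It is not the document's actual protocol: it assumes binary relevance judgments, uses precision at k as a stand-in system measure, and counts how often the measure prefers the same ranking that a user preferred. The function names (`precision_at_k`, `agreement_rate`) and the data in `EXAMPLE_PAIRS` are hypothetical and invented purely for illustration.

```python
from typing import List, Tuple

# One comparison: relevance judgments for ranking A, for ranking B,
# and the ranking the user preferred ("A" or "B").
Comparison = Tuple[List[int], List[int], str]


def precision_at_k(relevance: List[int], k: int) -> float:
    """Fraction of the top-k results judged relevant (binary judgments)."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r > 0) / k


def agreement_rate(comparisons: List[Comparison], k: int) -> float:
    """Share of comparisons where the measure and the user prefer the same ranking.

    Comparisons where the measure scores both rankings equally are skipped,
    because the measure expresses no preference there (an assumption of this
    sketch, not something stated in the document).
    """
    agreed = 0
    counted = 0
    for rel_a, rel_b, user_pref in comparisons:
        score_a = precision_at_k(rel_a, k)
        score_b = precision_at_k(rel_b, k)
        if score_a == score_b:
            continue
        counted += 1
        predicted = "A" if score_a > score_b else "B"
        if predicted == user_pref:
            agreed += 1
    return agreed / counted if counted else float("nan")


# Hypothetical data, made up for illustration only.
EXAMPLE_PAIRS: List[Comparison] = [
    ([1, 1, 0, 0, 1], [0, 1, 0, 0, 0], "A"),  # measure prefers A, user prefers A
    ([0, 0, 1, 0, 0], [1, 1, 1, 0, 0], "B"),  # measure prefers B, user prefers B
    ([1, 0, 0, 0, 0], [0, 0, 0, 1, 1], "A"),  # measure prefers B, user prefers A
    ([1, 1, 0, 0, 0], [0, 0, 0, 1, 1], "B"),  # measure tie, skipped
]

if __name__ == "__main__":
    print(f"Agreement at k=5: {agreement_rate(EXAMPLE_PAIRS, 5):.2f}")
```

A high agreement rate would suggest that the measure is a reasonable proxy for user preference on this sample, while a rate near 0.5 would suggest it predicts little better than chance. Swapping `precision_at_k` for another measure, or for graded rather than binary judgments, would allow the same comparison across measures and relevance scales.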