Identifying and curating documentary evidence from textual corpora is an essential part of empirical research in the humanities. We first discuss "themed" evidence, that is, traces of a fact or situation relevant to a theme of interest, and then focus on the problem of identifying such traces in texts. To that end, we combine statistical NLP, background knowledge, and Semantic Web technologies in a hybrid approach. We illustrate the method's effectiveness in a case study on a database of evidence of music-listening experiences. We also demonstrate its generality by testing it on a different use case in the digital humanities. Finally, we consider the applicability of knowledge extraction techniques to automatically populating a database of documentary evidence and discuss the challenges from the point of view of scientific knowledge acquisition.