The document presents a methodology for cleansing natural language processing (NLP) data using constraints derived from linguistic ontologies, focusing on two vocabularies: lemon (the Lexicon Model for Ontologies) and NIF (the NLP Interchange Format). It describes the challenges and common errors found in existing NLP datasets, proposes a test-driven evaluation approach to improve data quality, and provides examples of successful test cases. The findings indicate a substantial number of errors in current datasets, and the document proposes future work to extend the methodology to additional NLP ontologies.