Alice: "What version of ChEMBL are we using?"
Bob: "Er…let me check. It's going to take a while, I'll get back to you."
This simple question took us the best part of a month to resolve and involved several individuals. Knowing the provenance of your data is essential, especially when using large, complex systems that process multiple datasets.
The issues underlying this simple question motivated us to improve the provenance data in the Open PHACTS project. We developed guidelines for dataset descriptions in which the metadata is carried with the data. In this talk I will highlight the challenges we faced and give an overview of our metadata guidelines.
Presentation given to the W3C Semantic Web for Health Care and Life Sciences Interest Group on 14 January 2013.