The Internet has been widely lauded as a great equalizer of information access. However, the absence of any central authority on content places the burden of verifying the quality of that information on the end user. We have examined the accuracy of the chemical structures of ca. 200 major pharmaceutical products as they appear on the Internet. We have demonstrated that, while erroneous structures are commonplace, the correct structures can be determined by applying a carefully defined structure validation workflow. In addition, we and others have shown that the use of uncurated structures compromises the accuracy of cheminformatics investigations such as QSAR modeling. Furthermore, models built on carefully curated datasets can be used to correct erroneously reported biological data. We posit that chemical datasets must be carefully curated prior to any cheminformatics investigation. We summarize best practices for data curation developed in our groups.