Be the first to like this
Christopher Southan (The IUPHAR/BPS Guide to PHARMACOLOGY, UK)
As of August 2017, the major automated patent chemistry extractions (in ascending size, NextMove, SCRIPDB, IBM and SureChEMBL) are included submitters for 21.5 million CIDs from the PubChem total of 93.8. The following aspects will be expanded in this presentation, starting with advantages; a) while the relative coverage between open and commercial sources is difficult to determine (PMID 26457120) it is clear that the majority of patent-exemplified structures of medicinal chemistry interest (i.e. from C07 plus A61) are now in PubChem b) this allows most first-filings of lead series and clinical candidates to be tracked d) the PubChem tool box has query, analysis, clustering and linking features difficult to match in commercial sources, e) many structures can be associated with bioactivity data f) connections between manually curated papers and patents can be made via the 0.48 million CID intersects with ChEMBL. However, looking more closely also indicates disadvantages; a) extraction coverage is compromised by dense image tables and poor OCR quality of WO documents, b) SureChEMBL is the only major open pipeline continuously running in situ but has a PubChem updating lag, c) automated extraction generates structural “noise” that degrades chemistry quality d) PubChem patent document metadata indexing is patchy (although better for SureChEMBL in situ) d) nothing in the records indicateas IP status, e) continual re-extraction of common chemistry results in over-mapping (e.g. 126,949 patents for aspirin and 14,294 for atorvastatin), f) authentic compounds are contaminated with spurious mixtures and never-made virtuals, including 1000s of deuterated drugs g) linking between assay data and targets is still a manual exercise. However, all things considered the PubChem patent “big bang” presents users with the best of both worlds (PMID 26194581). Academics or smaller enterprises who cannot afford commercial solutions can now patent mine extensively. For those who have such subscriptions, PubChem has become an essential adjunct/complementary source for the analysis of patent chemistry and associated bio entities such as diseases and drug targets.