Comparison of Compounds-to-targets between Databases

Bio-IT_2011

Published in: Health & Medicine, Technology
  • Primary, secondary and tertiary literature; overlap between BDB, ChEMBL and PC
  • MLSN collection and/or NGSC screening collections
  • No ChEMBL activity flag; FDA label is hemi-calcium trihydrate; PDB is not an assay
  • Yeast growth assay as new screen
  • Expression pattern - party hub or date hub
  • No TTD

    1. Comparison of Compound-to-Target Relationships in Chemogenomic and Drug Databases
       • April 2012 update: FYI, these two blog posts are on the same theme: http://cdsouthan.blogspot.se/2012/01/our-human-beta-lactamase-is-not_09.html and http://cdsouthan.blogspot.se/2011/08/compound-to-target-mappings-part-i.html
       • Chris Southan, ChrisDS Consulting, Göteborg, Sweden
       • Presented to the NCBI PubChem team on 11 April, at the BioIT World Chemogenomics and Toxicogenomics Workshop on 12 April in Boston, USA, and, as a shorter version, at the ChEMBL users meeting at the EBI on 27 May 2011
    2. Acknowledgments and Context
       • I profoundly appreciate the efforts of those who develop, manage and maintain the public resources specified here, and the many others I enjoy accessing
       • I have some history in evaluating the utility, exploitation and content quality of both bioinformatics and cheminformatics databases. I thus enjoy the dual roles (roughly in equal parts) of both fan and critic
       • All databases have imperfections. This presentation investigates a selection of these, but critical analysis should not be misinterpreted as disparaging either the quality of primary sources or the work of curators and database teams
    3. Outline
       • Mapping concepts, sources and challenges
       • Extremes of the distribution
       • Atorvastatin, drug-to-targets
       • HMG-CoA reductase, target-to-drugs
       • Equivocal mapping examples
       • Exploring data intersects
       • Complex targets
       • Conclusions and outlook
    4. Activity-to-compound-to-protein Mapping: Capturing Relationships Between Four Concepts
       • Figure labels: Document, Assay, Result, Compound, Protein
       • Expert extraction and curation: unstructured data (papers & patents) to structured data (databases)
       • Protein sequence shown on slide: MAQALPWLLLWMGAGVLPAHGTQHGIRLPLRSGLGG APLGLRLPRETDEEPEEPGRRGSFVEMVDNLRGKSGQ GYYVEMTVGSPPQTLNILVDTGSSNFAVGAAPHPFLHR YYQRQLSSTYRDLRKGVYVPYTQGKWEGELGTDLVSI PHGPNVTVRANIAAITESDKFFINGSNWEGILGLAYAEI ARPDDSLEPFFDSLVKQTHVPNLFSLQLCGAGFPLNQSE VLASVGGSMIIGGIDHSLYTGSLWYTPIRREWYYEVIIV RVEINGQDLKMDCKEYNYDKSIVDSGTTNLRLPKKVFE AAVKSIKAASSTEKFPDGFWLGEQLVCWQAGTTPWNI FPVISLYLMGEVTNQSFRITILPQQYLRPVEDVATSQDD CYKFAISQSSTGTVMGAVIMEGFYVVFDRARKRIGFAV SACHVHDEFRTAAVEGPFVTLDMEDCGYNIPQTDESTL MTIAYVMAAICALFMLPLCLMVCQWRCLRCLRQQHD DFADDISLLK
    5. The D-A-R-C-P Axis
       • Pathway/module/system
    6. Compound and drug-to-target Collations
       • D-A-R-C-P: targets = 5,662 protein targets, cpds = 284,206, data points = 648,915
       • D-A-R-C-P: targets = 8,091, small molecules = 658,075, data points = 3,030,317
       • (D)-A-R-C-P-S: BioAssays extracted from literature (ChEMBL) = 499,520, direct screening assays = 3,208, active compounds = 23,677, targets = 447
       • D-C-P-S: approved cpds = 1,431, targets = 1,458, experimental cpds = 5,212, research targets = 3,206
       • D-C-P: targets = 358 successful, 251 clinical trial and 1,254 research; drugs = 1,511 approved, 1,118 clinical trial and 2,331 experimental
    7. PDB Drug-to-Protein Mappings in DrugPort
    8. Target Mapping: Curatorial Challenges
       • Target = (inferred) direct binding
       • Primary (bona fide) target = therapeutic causality
       • Polytargets = multiple
       • Para-target = sub-family specificity
       • Ortho-target = cross-species specificity
       • Cross-screen = non-homologous
       • Non-target (e.g. trypsin, albumin)
       • Off-target = liability (ADR or side effect)
       • Anti-target = known liability (e.g. HERG)
       • Indirect target = non-binding (e.g. APP)
       • Complex = resolvable to sequence IDs (e.g. proteasome)
       • Complex = experimentally unresolved (e.g. PDE5s)
       • Ambiguous = lack of metadata or curatorial judgment (e.g. BACE)
       • Non-canonical = where metadata specifies mutation, splice or PTM
       • Metabo-target = metabolic interactions
       • Transport-target = transporters
    9. Drug-target Networks
    10. One target-to-many compounds: Dopamine Receptor D2
    11. One compound-to-(367)-proteins
    12. Mapping sources for the top-selling drug
    13. Target Matrix for Atorvastatin
       • Columns: Swiss-Prot, ChEMBL, TTD, DrugBank, PubChem (BindingDB)
       • HMDH_HUMAN: X, X, X (PDB), X
       • HMDH_RAT: X, X
       • DPP4_HUMAN: X
       • DPP4_PIG: X
       • AHR_HUMAN: X
    14. Other Statins: Different BioAssay Coverages
    15. Different PubChem CIDs map to different submissions, structures and activity profiles
       • Atorvastatin -> 10 CID name matches
       • Substances: 397 links (same structure: 33 links; mixture: 364 links)
       • CID 60823 39 canonical Substances: 19 Links
    16. Vice-versa, Compounds-to-target: HMG-CoA
    17. Drugs mapped to HMG-CoA as target: Swiss-Prot cross-reference
    18. Equivocal Mappings
    19. Swiss-Prot Target Intersects
       • 1,627 results for database:(type:drugbank)
       • 297 results for database:(type:bindingdb)
       • 45 results for database:(type:bindingdb) AND database:(type:drugbank) AND organism:"Homo sapiens"
    20. Mixed Mappings
    21. Mannitol: drug? yes, ligand? yes, target? no
    22. Polypropylene Glycol: drug? no, ligand? maybe, target? no
    23. E-2012: False-negative?
    24. Antifreeze: drug? no, ligand? no, 154 targets? no
       • Wikipedia: Ethylene glycol is moderately toxic with an oral LDLo = 786 mg/kg for humans
    25. Crowdsourcing Works!
    26. Curation Challenges
    27. Secretase matches in TTD: mixed-concept targets but no small-molecule true positives
    28. Gamma Secretase Activity: Variable Subunit Mappings
    29. APP: Indirect Target, three mechanisms
       • “for small molecules that suppress the Amyloid Precursor Protein (APP) translation by binding to the 5' Untranslated Region of the APP mRNA”
    30. Proteasome: Target Descriptions and Cross-screens for Bortezomib
    31. PubChem Compound Intersects: Primary Drug Targets with Screening Data
    32. Mycophenolic acid
    33. Mycophenolic acid and Prodrug: Complex mappings
       • Primary target: human IMPDH2
       • IMPDH1?
       • IMPDH2 hamster
       • IMPDH2 Tritrichomonas
       • Myfortic is an enteric-coated formulation of MPA in a delayed-release tablet.
    34. Conclusions
       • Compared to what we had even a few years ago, let alone in LBPC (life-before-PubChem), these compound-to-protein sources are fantastic
       • However, most things that could go wrong have
       • We don’t often see QC statistics
       • Data coverage is patchy, ad hoc and can be circular
       • If you operate on these data at large scale you have no choice but to “trust and filter”
       • If detailed relationships are important you need to “verify and judge” back to the primary source
       • You can only really do this if you have at least some in vitro background rather than just in silico
    35. Wouldn’t it be nice if we had ...
       • Interpreted mapping distribution statistics for each database
       • Details about extraction triages, curation rules and parsing logic
       • Harmonisation of mapping rules and cross-comparison of content
       • Clear declarations and statistics of circularity between databases
       • Curator judgments overruling document primacy
       • Consolidated and extended Swiss-Prot cross-references
       • Assay and target ontologies (Pistoia? Open Phacts?)
       • “Standardization of Enzyme Data” (STRENDA, http://www.beilstein-institut.de/en/projekte/strenda/)
       • “Minimum Information About a Bioactive Entity” (MIABE, http://www.psidev.info/index.php?q=node/394)
    36. Our Efforts
       • http://www.jcheminf.com/content/3/1/14
       • http://www.jcheminf.com/content/1/1/10
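The "Swiss-Prot Target Intersects" slide and the "trust and filter" conclusion can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the `Mapping` record layout, the source names `db_a`, `db_b`, `db_c`, and the toy compound and protein identifiers are hypothetical and are not taken from any database release. It groups compound-to-protein assertions by source, computes pairwise target intersects, and keeps only the pairs asserted by a minimum number of sources.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Mapping:
    """One compound-to-protein assertion (hypothetical record layout)."""
    source: str    # database asserting the mapping
    compound: str  # compound name or identifier
    protein: str   # protein identifier, e.g. a Swiss-Prot entry name


# Toy records for illustration only; not real database content.
RECORDS = [
    Mapping("db_a", "cpd1", "PROT1_HUMAN"),
    Mapping("db_b", "cpd1", "PROT1_HUMAN"),
    Mapping("db_c", "cpd1", "PROT1_HUMAN"),
    Mapping("db_a", "cpd1", "PROT1_RAT"),
    Mapping("db_b", "cpd1", "PROT2_HUMAN"),
    Mapping("db_c", "cpd2", "PROT3_HUMAN"),
]


def targets_by_source(records):
    """Collect the set of protein targets asserted by each source."""
    out = defaultdict(set)
    for r in records:
        out[r.source].add(r.protein)
    return dict(out)


def pairwise_intersects(target_sets):
    """Return the shared targets for every pair of sources."""
    return {
        (a, b): target_sets[a] & target_sets[b]
        for a, b in combinations(sorted(target_sets), 2)
    }


def supported_pairs(records, min_sources=2):
    """Keep (compound, protein) pairs asserted by at least min_sources sources."""
    sources = defaultdict(set)
    for r in records:
        sources[(r.compound, r.protein)].add(r.source)
    return {pair for pair, s in sources.items() if len(s) >= min_sources}


if __name__ == "__main__":
    sets = targets_by_source(RECORDS)
    for pair, shared in pairwise_intersects(sets).items():
        print(pair, sorted(shared))
    print(sorted(supported_pairs(RECORDS)))
```

Agreement between sources is only a filter, not a verification: as the conclusions note, coverage can be circular, so two databases may agree because one was derived from the other. Checking a specific relationship still means going back to the primary source.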
