The BADAPPLE promiscuity plugin for BARD

The BADAPPLE promiscuity plugin for BARD; Evidence-based promiscuity scores (ACS National Meeting, Indianapolis, IN, Sept. 9, 2013)


  1. The BADAPPLE promiscuity plugin for BARD: Evidence-based promiscuity scores. Jeremy Yang, UNM; Oleg Ursu, UNM; Cristian Bologa, UNM; Anna Waller, UNM; Larry Sklar, UNM; Tudor Oprea, UNM. ACS National Meeting, Indianapolis, IN, Sep. 8-12, 2013. CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources
  2. What is BADAPPLE? • BioActivity Data Associative Promiscuity Pattern Learning Engine • Bioassay data analysis algorithm • Scaffold association patterns • Evidence-based • Robust to noise and errors
  3. What is promiscuity? • Unselective bioactivity • Normally undesirable • Evolving conceptions: – Polypharmacology – Systems biology – Systems chemical biology
  4. Promiscuity & bioassay data analysis: Selected references • Frequent hitters (2002, Schneider et al.) • Aggregators (2003, Shoichet et al.) • ALARM NMR (2004, Hajduk et al.) • Promiscuous scaffolds [talk] (2007, Bologa) • Pan-Assay Interference (2010, Baell et al.) • PubChem Promiscuity (2011, Canny et al.)
  5. Prerequisites for Promiscuity Data Analysis • Definition of unique chemical entity? – Yes • Definition of unique biological entity? – Challenging • I.e., a rigorously calibrated bioactivity matrix. Example chemical entity (SMILES): CN/C(=C[N+](=O)[O-])/NCCSCC1=CC=C(O1)CN(C)C.Cl Example biological entity (protein sequence): VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAHVDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
  6. 2211 Chemicals On 159 Targets. Bioactivity matrices depend on rigorous informatics. Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011
  7. 2211 Chemicals On 159 Targets. Bioactivity matrices depend on rigorous informatics. Matrix axes: chemistry, biology. • “1 chemistry” = unique compound, or unique scaffold? • “1 biology” = unique target, or unique assay? • Chemical biology space: to define a space, one must define a point. Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011
  8. Promiscuity: related concepts • Assay-interferers • Experimental artifacts • Frequent hitters • False positives • True positives • True non-selective actives • Aggregators • Reactives • Cytotoxic
  9. Badapple promiscuity score: a practical definition for bioassay data analysis • Purpose: Streamline molecular discovery project workflow. • Hence: Score designed to detect "false trails" (true-promiscuous OR false positives) unlikely to be productive leads.
  10. What is evidence-based? • Empirical data analysis • Mistakes can be un-learned. • Data driven • Not: Expert systems
  11. Drawbacks of evidence-based • Theories proven wrong. Ouch. • Reality is messy. • GIGO.
  12. Benefits of evidence-based (*) • More wins. • More knowledge. • Lower costs. • Progress. (*) N.B. key role of Dick Cramer, ACS CINF 2013 Skolnik award winner.
  13. BADAPPLE scoring function • Scaffold score • By scoring scaffolds, more relevant evidence • Substances, assays and samples considered • Penalize under-sampling • Score is a statistic, not a "model"! • Inherently "validated"
  14. Avoiding overfitting; being skeptical of scanty evidence
  15. Avoiding overfitting: Moneyball style. Sabermetrics leads the way. The Signal and the Noise: Why So Many Predictions Fail-but Some Don't, Nate Silver (2012). http://mark.stubbornlights.org/phils/archives/2004_05.html
  16. BADAPPLE public web app: http://pasilla.health.unm.edu/tomcat/biocomp/badapple
  17. Why Scaffolds? • biology - birds, bees, nature • chemistry - chemists • SAR - drugs • IP - lawyers
  18. Molecular scaffolds are special and useful guides for discovery. Jeremy Yang, UNM & IU; Cristian Bologa, UNM; David Wild, IU; Tudor Oprea, UNM. ACS National Meeting, Sept. 8-12, 2013, Indianapolis, IN. CINF grad student symposium 9/8/13 & ACS SciMix 9/9/13 8pm
  19. There's something about scaffolds... Especially the "privileged few"... [Figure: promiscuity score distribution; axes: pScore, Count.] Dataset: BARD, MLSMR, MLP HTS. Totals: compounds: 373,802; scaffolds: 146,024; assays: 528; wells/results: 30,612,714
  20. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... [Figure: % of active samples vs. i_scaf, in descending pScore order.] Dataset: BARD, MLSMR, MLP HTS. Totals: compounds: 373,802; scaffolds: 146,024; assays: 528; wells/results: 30,612,714; drugs: 283; drugscafs: 1958
  21. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... Dataset: BARD, MLSMR, MLP HTS. Totals: compounds: 373,802; scaffolds: 146,024; assays: 528; wells/results: 30,612,714; drugs: 283; drugscafs: 1958. “total activity” = active scaffold-instances.

        Set     % total activity   # scaffolds   % scaffolds
        All     50%                1,979         1.4%
        All     75%                11,645        8%
        Drugs   50%                54            2.8%
        Drugs   90%                327           16.7%

  22. Scaffold visualization: Molecule Cloud (Badapple-scaled). The Molecule Cloud - compact visualization of large collections of molecules, Peter Ertl and Bernd Rohde, J. Cheminformatics, 2012, 4:12.
  23. What is BARD? • BioAssay Research Database, http://bard.nih.gov • MLP: 1000+ assays, 400k+ cpds, 200M data • Manual assay annotations + QA • BARD Assay Ontology • Assay Data Standard • Community platform for bioassay data analysis & computation
  24. BARD + Badapple Synergy • BARD ontology, based on BAO • BARD is raising the "semantic IQ" of public bioassay data. • Evidence = information = data + metadata. http://bard.nih.gov/
  25. BARD Plugin Platform: community development, enterprise deployment • BARD Plugin spec (IPlugin interface) • BARD PluginValidator class • Java, JAX-RS, Jersey, REST • BARD REST API • WAR deployment, discoverable. Fig. c/o Steve Brudz, Broad
  26. Badapple Plugin via BARD web client. Discovery scenario: Rapidly flags potentially problematic (notorious) scaffolds.
  27. BARD Synergy, next steps: Raising Badapple to the next level. Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology. [Diagram labels: Assay Definition, Assay Protocol, Biology, BARD Classes; matrix axes: biology, chemistry.]
  28. BARD Synergy, next steps: Raising Badapple to the next level. Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology. Some re-calibrations of interest: Biology Types, Assay Formats, Protein Classes.
  29. BARD Synergy, next steps: Raising Badapple to the next level. Assays by assay format (category, count, share): cell-based format, 540, 60%; single protein format, 161, 18%; biochemical format, 76, 8%; protein complex format, 38, 4%; whole-cell lysate format, 32, 4%; small-molecule format, 11, 1%; protein format, 10, 1%. Legend categories without data labels: tissue-based format, nucleic acid format, organism-based format, cell membrane format, microsome format, cell-free format, mitochondrion format, (blank).
  30. BARD Synergy, next steps: Raising Badapple to the next level. Biology Types (category, count, share): PROTEIN, 514, 47%; PROCESS, 341, 31%; GENE, 217, 20%. Legend categories without data labels: biology, AASUBSTITUTION, NCBI.
  31. BARD Synergy, next steps: Raising Badapple to the next level. Protein Classes (category, count, share): transporter, 112, 28%; receptor, 56, 14%; nucleic acid binding, 52, 13%; enzyme modulator, 45, 11%; cell adhesion molecule, 24, 6%; transferase, 23, 6%; hydrolase, 22, 6%; signaling molecule, 19, 5%; extracellular matrix protein, 16, 4%; oxidoreductase, 11, 3%; membrane traffic protein, 5, 1%. Legend categories without data labels: transfer/carrier protein, transcription factor, cytoskeletal protein, storage protein, isomerase, defense/immunity protein.
  32. Conclusions • Badapple exemplar as BARD plugin • Badapple exemplar as evidence-based algorithm • New BARD semantic capabilities will elevate Badapple to the next level. • Promiscuity is a complex issue.
  33. Acknowledgements • Steve Mathias, UNM • Chris Lipinski, Melior • Rajarshi Guha, NCATS • Stephan Schurer, UMiami • Uma Vempati, UMiami • Mark Southern, Scripps • BARD Engineering Working Group
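
The scoring ideas on slides 13 and 14 (a scaffold-level score built from substance, assay and sample counts, with a penalty for under-sampling) can be illustrated with a short Python sketch. The slides do not state the formula, so everything here is an assumption made for illustration: the `ScaffoldStats` fields, the product-of-ratios form, the use of dataset medians as damping terms in the denominators, and the `scale` constant. It shows the general technique and is not necessarily Badapple's exact function.

```python
from dataclasses import dataclass


@dataclass
class ScaffoldStats:
    """Hypothetical per-scaffold evidence counts (field names are assumptions)."""
    s_tested: int  # substances containing the scaffold that were tested
    s_active: int  # ...of which active
    a_tested: int  # assays in which the scaffold was tested
    a_active: int  # ...of which had at least one active
    w_tested: int  # samples (wells) tested
    w_active: int  # ...of which active


def damped_ratio(active, tested, median_tested):
    """Active fraction with the dataset median added to the denominator.

    The extra term shrinks the ratio for scaffolds with little evidence,
    so a 2-of-2 hit rate counts for much less than 200-of-200.
    """
    denominator = tested + median_tested
    if denominator == 0:
        return 0.0
    return active / denominator


def promiscuity_score(stats, med_s, med_a, med_w, scale=1e5):
    """Assumed score: product of damped ratios at three evidence levels."""
    return (
        damped_ratio(stats.s_active, stats.s_tested, med_s)
        * damped_ratio(stats.a_active, stats.a_tested, med_a)
        * damped_ratio(stats.w_active, stats.w_tested, med_w)
        * scale
    )


# Two scaffolds with the same 100% raw hit rate but very different evidence.
sparse = ScaffoldStats(2, 2, 2, 2, 2, 2)
dense = ScaffoldStats(200, 200, 200, 200, 200, 200)
print(promiscuity_score(sparse, 10, 10, 10))
print(promiscuity_score(dense, 10, 10, 10))
```

With medians of 10, the sparsely tested scaffold scores far below the heavily tested one even though both have the same raw hit rate. The score is a statistic computed directly from counts, so there is nothing to fit.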

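
The coverage figures on slides 20 and 21 (how few scaffolds account for a given share of "total activity") come from ranking scaffolds by pScore and accumulating active scaffold-instances. A minimal Python sketch of that calculation follows. The function names, the `(pscore, active_instances)` tuple layout and the toy data are assumptions for illustration, not taken from the Badapple code.

```python
def scaffolds_needed(scaffolds, fraction):
    """Count top-ranked scaffolds needed to cover `fraction` of all activity.

    `scaffolds` is a list of (pscore, active_instances) pairs. Scaffolds are
    ranked by descending pscore and active instances are accumulated until
    the requested share of the total is reached.
    """
    ranked = sorted(scaffolds, key=lambda s: s[0], reverse=True)
    total = sum(active for _, active in ranked)
    if total == 0:
        return 0
    running = 0
    for n, (_, active) in enumerate(ranked, start=1):
        running += active
        if running >= fraction * total:
            return n
    return len(ranked)


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Toy data (hypothetical): five scaffolds, 100 active instances in total.
toy = [(10, 4), (900, 50), (100, 15), (500, 30), (0, 1)]
for share in (0.5, 0.75, 0.9):
    n = scaffolds_needed(toy, share)
    print(share, n, percent(n, len(toy)))
```

Applied to the slide's own totals, `percent(1979, 146024)` reproduces the 1.4% figure for all scaffolds and `percent(327, 1958)` the 16.7% figure for drug scaffolds.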