The BADAPPLE promiscuity plugin for BARD
Upcoming SlideShare
Loading in...5
×
 

Like this? Share it with your network

Share

The BADAPPLE promiscuity plugin for BARD

on

  • 659 views

The BADAPPLE promiscuity plugin for BARD; Evidence-based promiscuity scores (ACS National Meeting, Indianapolis, IN, Sept. 9, 2013)

The BADAPPLE promiscuity plugin for BARD; Evidence-based promiscuity scores (ACS National Meeting, Indianapolis, IN, Sept. 9, 2013)

Statistics

Views

Total Views
659
Views on SlideShare
659
Embed Views
0

Actions

Likes
0
Downloads
12
Comments
1

0 Embeds 0

No embeds

Accessibility

Categories

Upload Details

Uploaded via as Adobe PDF

Usage Rights

© All Rights Reserved

Report content

Flagged as inappropriate Flag as inappropriate
Flag as inappropriate

Select your reason for flagging this presentation as inappropriate.

Cancel
  • Full Name Full Name Comment goes here.
    Are you sure you want to
    Your message goes here
    Processing…
Post Comment
Edit your comment

The BADAPPLE promiscuity plugin for BARD Presentation Transcript

  • 1. The BADAPPLE promiscuity plugin for BARD Evidence-based promiscuity scores Jeremy Yang, UNM Oleg Ursu, UNM Cristian Bologa, UNM Anna Waller, UNM Larry Sklar, UNM Tudor Oprea, UNM ACS National Meeting – Indianapolis, IN -- Sep. 8-12, 2013 CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources
  • 2. What is BADAPPLE? • BioActivity Data Associative Promiscuity Pattern Learning Engine • Bioassay data analysis algorithm • Scaffold association patterns • Evidence-based • Robust to noise and errors UNM
  • 3. What is promiscuity? • Un-selective bioactivity • Normally undesirable • Evolving conceptions: – Polypharmacology – Systems biology – Systems chemical biology 3
  • 4. Promiscuity & bioassay data analysis: Selected references • Frequent hitters (2002, Schneider &al.) • Aggregators (2003, Shoichet &al.) • ALARM NMR (2004, Hajduk &al.) • Promiscuous scaffolds [talk] (2007, Bologa) • Pan-Assay Interference (2010, Baell &al.) • PubChem Promiscuity (2011, Canny &al.) 4
  • 5. Prerequisites for Promiscuity Data Analysis • Definition of unique chemical entity? – Yes • Definition of unique biological entity? – Challenging • I.e., Rigorously calibrated bioactivity matrix 5 CN/C(=C[N+](=O)[O-])/NCCSCC1=CC=C(O1)CN(C)C.Cl VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
  • 6. 2211 Chemicals On 159 Targets Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011 Bioactivity matrices depend on rigorous informatics 6
  • 7. 2211 Chemicals On 159 Targets Bioactivity matrices depend on rigorous informatics chemistry biology “1 chemistry” = unique compound or… = unique scaffold “1 biology” = unique target? or… = unique assay? Chemical biology space: To define a space must define a point 7 Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011
  • 8. Promiscuity: related concepts • Assay-interferers • Experimental artifacts • Frequent hitters • False positives • True positives • True non-selective actives • Aggregators • Reactives • Cytotoxic 8
  • 9. Badapple promiscuity score: a practical definition for bioassay data analysis • Purpose: Streamline molecular discovery project workflow. • Hence: Score designed to detect "false trails" (true-promiscuous OR false positives) unlikely to be productive leads. 9
  • 10. What is evidence-based? • Empirical data analysis • Mistakes can be un-learned. • Data driven • Not: Expert systems 10
  • 11. Drawbacks of evidence-based • Theories proven wrong. Ouch. • Reality is messy. • GIGO. 11
  • 12. Benefits of evidence-based More wins. More knowledge. Lower costs. Progress. (*) N.B. key role of Dick Cramer, ACS CINF 2013 Skolnik award winner. 12 (*)
  • 13. BADAPPLE scoring function • Scaffold score • By scoring scaffolds, more relevant evidence • Substances, assays and samples considered • Penalize under-sampling • Score is a statistic, not a "model"! • Inherently "validated" 13
  • 14. Avoiding overfitting; being skeptical of scanty evidence 14
  • 15. Avoiding overfitting: Moneyball style Sabermetrics leads the way. The Signal and the Noise: Why So Many Predictions Fail—But Some Don't, Nate Silver (2012). http://mark.stubbornlights.org/phils/archive s/2004_05.html 15
  • 16. *http://pasilla.health.unm.edu/tomcat/biocomp/badapple BADAPPLE public web app 16
  • 17. Why Scaffolds? biology - birds, bees, nature chemistry - chemists SAR - drugs IP - lawyers
  • 18. Molecular scaffolds are special and useful guides for discovery Jeremy Yang, UNM & IU Cristian Bologa, UNM David Wild, IU Tudor Oprea, UNM ACS National Meeting - Sept. 8-12, 2013 - Indianapolis, IN CINF grad student symposium 9/8/13 & ACS SciMix 9/9/13 8pm
  • 19. There's something about scaffolds... Especially the "privileged few"... pScore Count Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714 19 Promiscuity score distribution
  • 20. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... % of active samples i_scaf, descending pScore order Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958 20
  • 21. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958 % total activity # scaffolds % scaffolds All 50% 1979 1.4% All 75% 11,645 8% Drugs 50% 54 2.8% Drugs 90% 327 16.7% “total activity” = active scaffold-instances 21
  • 22. Scaffold visualization: Molecule Cloud (Badapple-scaled) The Molecule Cloud - compact visualization of large collections of molecules, Peter Ertl and Bernd Rohde, J. Cheminformatics, 2012, 4:12. 22
  • 23. What is BARD? • BioAssay Research Database, http://bard.nih.gov • MLP: 1000+ assays, 400k+ cpds, 200M data • Manual assay annotations + QA • BARD Assay Ontology • Assay Data Standard • Community platform for bioassay data analysis & computation 25
  • 24. BARD + Badapple Synergy • BARD ontology, based on BAO • BARD raising "semantic IQ" of public bioassay data. • Evidence = information = data + metadata http://bard.nih.gov/
  • 25. BARD Plugin Platform: Community development Enterprise deployment • BARD Plugin spec (IPlugin interface) • BARD PluginValidator class • Java, JAX-RS, Jersey, REST • BARD REST API • WAR deployment, discoverable fig c/o Steve Brudz, Broad
  • 26. Badapple Plugin via BARD web client 28 Discovery scenario: Rapidly flags potentially problematic (notorious) scaffolds.
  • 27. BARD Synergy, next steps: Raising Badapple to the next level Assay Definition Assay Protocol Biology BARD Classes biology chemistry Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology.
  • 28. BARD Synergy, next steps: Raising Badapple to the next level Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology. Biology Types Assay Formats Protein Classes Some re-calibrations of interest:
  • 29. BARD Synergy, next steps: Raising Badapple to the next level cell-based format, 540, 60%single protein format, 161, 18% biochemical format, 76, 8% protein complex format, 38, 4% whole-cell lysate format, 32, 4% small-molecule format, 11, 1% protein format, 10, 1% Assays: assay format cell-based format single protein format biochemical format protein complex format whole-cell lysate format small-molecule format protein format tissue-based format nucleic acid format organism-based format cell membrane format microsome format cell-free format mitochondrion format (blank)
  • 30. BARD Synergy, next steps: Raising Badapple to the next level PROTEIN, 514, 47% PROCESS, 341, 31% GENE, 217, 20% Biology Types PROTEIN PROCESS GENE biology AASUBSTITUTION NCBI
  • 31. BARD Synergy, next steps: Raising Badapple to the next level 33 transporter, 112, 28% receptor, 56, 14% nucleic acid binding, 52, 13% enzyme modulator, 45, 11% cell adhesion molecule, 24, 6% transferase, 23, 6% hydrolase, 22, 6% signaling molecule, 19, 5% extracellular matrix protein, 16, 4% oxidoreductase, 11, 3% membrane traffic protein, 5, 1% Protein Classes transporter receptor nucleic acid binding enzyme modulator cell adhesion molecule transferase hydrolase signaling molecule extracellular matrix protein oxidoreductase membrane traffic protein transfer/carrier protein transcription factor cytoskeletal protein storage protein isomerase defense/immunity protein
  • 32. Conclusions • Badapple exemplar as BARD plugin • Badapple exemplar as evidence-based algorithm • New BARD semantic capabilities will elevate Badapple to next level. • Promiscuity a complex issue. 34
  • 33. Acknowledgements • Steve Mathias, UNM • Chris Lipinski, Melior • Rajarshi Guha, NCATS • Stephan Schurer, UMiami • Uma Vempati, UMiami • Mark Southern, Scripps • BARD Engineering Working Group 35