The BADAPPLE promiscuity
plugin for BARD
Evidence-based promiscuity
scores
Jeremy Yang, UNM
Oleg Ursu, UNM
Cristian Bologa...
What is BADAPPLE?
• BioActivity Data Associative Promiscuity Pattern
Learning Engine
• Bioassay data analysis algorithm
• ...
What is promiscuity?
• Un-selective bioactivity
• Normally undesirable
• Evolving conceptions:
– Polypharmacology
– System...
Promiscuity & bioassay data analysis:
Selected references
• Frequent hitters (2002, Schneider &al.)
• Aggregators (2003, S...
Prerequisites for Promiscuity Data
Analysis
• Definition of unique chemical entity?
– Yes
• Definition of unique biologica...
2211 Chemicals On 159 Targets
Matrix c/o Tudor
Oprea, Scott Boyer
(AZ), 2011
Bioactivity matrices depend on rigorous
infor...
2211 Chemicals On 159 Targets
Bioactivity matrices depend on rigorous
informatics
chemistry
biology
“1 chemistry”
= unique...
Promiscuity: related concepts
• Assay-interferers
• Experimental
artifacts
• Frequent hitters
• False positives
• True pos...
Badapple promiscuity score:
a practical definition for bioassay
data analysis
• Purpose: Streamline molecular discovery
pr...
What is evidence-based?
• Empirical data analysis
• Mistakes can be un-learned.
• Data driven
• Not: Expert systems
10
Drawbacks of evidence-based
• Theories proven wrong. Ouch.
• Reality is messy.
• GIGO.
11
Benefits of evidence-based
More wins. More knowledge. Lower costs. Progress.
(*) N.B. key role of Dick Cramer, ACS CINF 20...
BADAPPLE scoring function
• Scaffold score
• By scoring scaffolds, more relevant evidence
• Substances, assays and samples...
Avoiding overfitting; being skeptical
of scanty evidence
14
Avoiding overfitting: Moneyball style
Sabermetrics leads the way.
The Signal and the Noise: Why So Many
Predictions Fail—B...
*http://pasilla.health.unm.edu/tomcat/biocomp/badapple
BADAPPLE public web app
16
Why Scaffolds?
biology - birds, bees, nature
chemistry - chemists
SAR - drugs
IP - lawyers
Molecular scaffolds are special
and useful guides for discovery
Jeremy Yang, UNM & IU
Cristian Bologa, UNM
David Wild, IU
...
There's something about scaffolds...
Especially the "privileged few"...
pScore
Count
Dataset:
BARD,
MLSMR,
MLP HTS
Totals:...
Scaffolds & drug-scaffolds, the privileged few
explaining a lot of activity...
% of
active
samples
i_scaf, descending pSco...
Scaffolds & drug-scaffolds, the privileged few
explaining a lot of activity...
Dataset:
BARD,
MLSMR,
MLP HTS
Totals: compo...
Scaffold visualization: Molecule Cloud
(Badapple-scaled)
The Molecule Cloud - compact visualization of large collections o...
What is BARD?
• BioAssay Research Database, http://bard.nih.gov
• MLP: 1000+ assays, 400k+ cpds, 200M data
• Manual assay ...
BARD + Badapple Synergy
• BARD ontology, based on BAO
• BARD raising "semantic IQ" of public bioassay
data.
• Evidence = i...
BARD Plugin Platform:
Community development
Enterprise deployment
• BARD Plugin spec (IPlugin interface)
• BARD PluginVali...
Badapple Plugin via BARD web client
28
Discovery
scenario:
Rapidly flags
potentially
problematic
(notorious)
scaffolds.
BARD Synergy, next steps:
Raising Badapple to the next level
Assay Definition
Assay Protocol
Biology
BARD Classes
biology
...
BARD Synergy, next steps:
Raising Badapple to the next level
Re-calibrating the bioactivity matrix for improved rigor,
acc...
BARD Synergy, next steps:
Raising Badapple to the next level
cell-based format, 540,
60%single protein format,
161, 18%
bi...
BARD Synergy, next steps:
Raising Badapple to the next level
PROTEIN, 514, 47%
PROCESS, 341, 31%
GENE, 217, 20%
Biology Ty...
BARD Synergy, next steps:
Raising Badapple to the next level
33
transporter, 112, 28%
receptor, 56, 14%
nucleic acid bindi...
Conclusions
• Badapple exemplar as BARD plugin
• Badapple exemplar as evidence-based
algorithm
• New BARD semantic capabil...
Acknowledgements
• Steve Mathias, UNM
• Chris Lipinski, Melior
• Rajarshi Guha, NCATS
• Stephan Schurer, UMiami
• Uma Vemp...
Upcoming SlideShare
Loading in...5
×

The BADAPPLE promiscuity plugin for BARD

625

Published on

The BADAPPLE promiscuity plugin for BARD; Evidence-based promiscuity scores (ACS National Meeting, Indianapolis, IN, Sept. 9, 2013)

Published in: Technology
1 Comment
0 Likes
Statistics
Notes
  • Be the first to like this

No Downloads
Views
Total Views
625
On Slideshare
0
From Embeds
0
Number of Embeds
0
Actions
Shares
0
Downloads
20
Comments
1
Likes
0
Embeds 0
No embeds

No notes for slide

The BADAPPLE promiscuity plugin for BARD

  1. 1. The BADAPPLE promiscuity plugin for BARD Evidence-based promiscuity scores Jeremy Yang, UNM Oleg Ursu, UNM Cristian Bologa, UNM Anna Waller, UNM Larry Sklar, UNM Tudor Oprea, UNM ACS National Meeting – Indianapolis, IN -- Sep. 8-12, 2013 CINF: Integrative Chemogenomics Knowledge Mining Using NIH Open Access Resources
  2. 2. What is BADAPPLE? • BioActivity Data Associative Promiscuity Pattern Learning Engine • Bioassay data analysis algorithm • Scaffold association patterns • Evidence-based • Robust to noise and errors UNM
  3. 3. What is promiscuity? • Un-selective bioactivity • Normally undesirable • Evolving conceptions: – Polypharmacology – Systems biology – Systems chemical biology 3
  4. 4. Promiscuity & bioassay data analysis: Selected references • Frequent hitters (2002, Schneider &al.) • Aggregators (2003, Shoichet &al.) • ALARM NMR (2004, Hajduk &al.) • Promiscuous scaffolds [talk] (2007, Bologa) • Pan-Assay Interference (2010, Baell &al.) • PubChem Promiscuity (2011, Canny &al.) 4
  5. 5. Prerequisites for Promiscuity Data Analysis • Definition of unique chemical entity? – Yes • Definition of unique biological entity? – Challenging • I.e., Rigorously calibrated bioactivity matrix 5 CN/C(=C[N+](=O)[O-])/NCCSCC1=CC=C(O1)CN(C)C.Cl VLSPADKTNVKAAWGKVGAHAGEYGAEALERMFLSFPTTKTYFPHFDLSHGSAQVKGHGKKVADALTNAVAH VDDMPNALSALSDLHAHKLRVDPVNFKLLSHCLLVTLAAHLPAEFTPAVHASLDKFLASVSTVLTSKYR
  6. 6. 2211 Chemicals On 159 Targets Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011 Bioactivity matrices depend on rigorous informatics 6
  7. 7. 2211 Chemicals On 159 Targets Bioactivity matrices depend on rigorous informatics chemistry biology “1 chemistry” = unique compound or… = unique scaffold “1 biology” = unique target? or… = unique assay? Chemical biology space: To define a space must define a point 7 Matrix c/o Tudor Oprea, Scott Boyer (AZ), 2011
  8. 8. Promiscuity: related concepts • Assay-interferers • Experimental artifacts • Frequent hitters • False positives • True positives • True non-selective actives • Aggregators • Reactives • Cytotoxic 8
  9. 9. Badapple promiscuity score: a practical definition for bioassay data analysis • Purpose: Streamline molecular discovery project workflow. • Hence: Score designed to detect "false trails" (true-promiscuous OR false positives) unlikely to be productive leads. 9
  10. 10. What is evidence-based? • Empirical data analysis • Mistakes can be un-learned. • Data driven • Not: Expert systems 10
  11. 11. Drawbacks of evidence-based • Theories proven wrong. Ouch. • Reality is messy. • GIGO. 11
  12. 12. Benefits of evidence-based More wins. More knowledge. Lower costs. Progress. (*) N.B. key role of Dick Cramer, ACS CINF 2013 Skolnik award winner. 12 (*)
  13. 13. BADAPPLE scoring function • Scaffold score • By scoring scaffolds, more relevant evidence • Substances, assays and samples considered • Penalize under-sampling • Score is a statistic, not a "model"! • Inherently "validated" 13
  14. 14. Avoiding overfitting; being skeptical of scanty evidence 14
  15. 15. Avoiding overfitting: Moneyball style Sabermetrics leads the way. The Signal and the Noise: Why So Many Predictions Fail—But Some Don't, Nate Silver (2012). http://mark.stubbornlights.org/phils/archive s/2004_05.html 15
  16. 16. *http://pasilla.health.unm.edu/tomcat/biocomp/badapple BADAPPLE public web app 16
  17. 17. Why Scaffolds? biology - birds, bees, nature chemistry - chemists SAR - drugs IP - lawyers
  18. 18. Molecular scaffolds are special and useful guides for discovery Jeremy Yang, UNM & IU Cristian Bologa, UNM David Wild, IU Tudor Oprea, UNM ACS National Meeting - Sept. 8-12, 2013 - Indianapolis, IN CINF grad student symposium 9/8/13 & ACS SciMix 9/9/13 8pm
  19. 19. There's something about scaffolds... Especially the "privileged few"... pScore Count Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714 19 Promiscuity score distribution
  20. 20. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... % of active samples i_scaf, descending pScore order Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958 20
  21. 21. Scaffolds & drug-scaffolds, the privileged few explaining a lot of activity... Dataset: BARD, MLSMR, MLP HTS Totals: compounds: 373,802 ; scaffolds: 146,024 ; assays: 528 ; wells/results: 30,612,714; drugs: 283; drugscafs: 1958 % total activity # scaffolds % scaffolds All 50% 1979 1.4% All 75% 11,645 8% Drugs 50% 54 2.8% Drugs 90% 327 16.7% “total activity” = active scaffold-instances 21
  22. 22. Scaffold visualization: Molecule Cloud (Badapple-scaled) The Molecule Cloud - compact visualization of large collections of molecules, Peter Ertl and Bernd Rohde, J. Cheminformatics, 2012, 4:12. 22
  23. 23. What is BARD? • BioAssay Research Database, http://bard.nih.gov • MLP: 1000+ assays, 400k+ cpds, 200M data • Manual assay annotations + QA • BARD Assay Ontology • Assay Data Standard • Community platform for bioassay data analysis & computation 25
  24. 24. BARD + Badapple Synergy • BARD ontology, based on BAO • BARD raising "semantic IQ" of public bioassay data. • Evidence = information = data + metadata http://bard.nih.gov/
  25. 25. BARD Plugin Platform: Community development Enterprise deployment • BARD Plugin spec (IPlugin interface) • BARD PluginValidator class • Java, JAX-RS, Jersey, REST • BARD REST API • WAR deployment, discoverable fig c/o Steve Brudz, Broad
  26. 26. Badapple Plugin via BARD web client 28 Discovery scenario: Rapidly flags potentially problematic (notorious) scaffolds.
  27. 27. BARD Synergy, next steps: Raising Badapple to the next level Assay Definition Assay Protocol Biology BARD Classes biology chemistry Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology.
  28. 28. BARD Synergy, next steps: Raising Badapple to the next level Re-calibrating the bioactivity matrix for improved rigor, accuracy & sensitivity, using the BARD ontology. Biology Types Assay Formats Protein Classes Some re-calibrations of interest:
  29. 29. BARD Synergy, next steps: Raising Badapple to the next level cell-based format, 540, 60%single protein format, 161, 18% biochemical format, 76, 8% protein complex format, 38, 4% whole-cell lysate format, 32, 4% small-molecule format, 11, 1% protein format, 10, 1% Assays: assay format cell-based format single protein format biochemical format protein complex format whole-cell lysate format small-molecule format protein format tissue-based format nucleic acid format organism-based format cell membrane format microsome format cell-free format mitochondrion format (blank)
  30. 30. BARD Synergy, next steps: Raising Badapple to the next level PROTEIN, 514, 47% PROCESS, 341, 31% GENE, 217, 20% Biology Types PROTEIN PROCESS GENE biology AASUBSTITUTION NCBI
  31. 31. BARD Synergy, next steps: Raising Badapple to the next level 33 transporter, 112, 28% receptor, 56, 14% nucleic acid binding, 52, 13% enzyme modulator, 45, 11% cell adhesion molecule, 24, 6% transferase, 23, 6% hydrolase, 22, 6% signaling molecule, 19, 5% extracellular matrix protein, 16, 4% oxidoreductase, 11, 3% membrane traffic protein, 5, 1% Protein Classes transporter receptor nucleic acid binding enzyme modulator cell adhesion molecule transferase hydrolase signaling molecule extracellular matrix protein oxidoreductase membrane traffic protein transfer/carrier protein transcription factor cytoskeletal protein storage protein isomerase defense/immunity protein
  32. 32. Conclusions • Badapple exemplar as BARD plugin • Badapple exemplar as evidence-based algorithm • New BARD semantic capabilities will elevate Badapple to next level. • Promiscuity a complex issue. 34
  33. 33. Acknowledgements • Steve Mathias, UNM • Chris Lipinski, Melior • Rajarshi Guha, NCATS • Stephan Schurer, UMiami • Uma Vempati, UMiami • Mark Southern, Scripps • BARD Engineering Working Group 35
  1. A particular slide catching your eye?

    Clipping is a handy way to collect important slides you want to go back to later.

×